Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a custom implementation LocalFileSystem::list_with_offset #7019

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

corwinjoy
Copy link

Which issue does this PR close?

Closes #7018.

Rationale for this change

Performance enhancement needed for network file system as described in the PR.

What changes are included in this PR?

Add an implementation LocalFileSystem::list_with_offset so that it can intelligently scan files and drastically reduce the number of statx system calls. Basically, we add a prefilter to pull only what is needed in-place.

Are there any user-facing changes?

None.

@github-actions github-actions bot added the object-store Object Store Interface label Jan 24, 2025
@corwinjoy
Copy link
Author

The initial work for this PR was done by @dplasto, who reported the issue and provided an initial implementation. Further suggestions were provided by @adamreeve .

@tustvold
Copy link
Contributor

It would appear this is legitimately failing CI

@alamb alamb marked this pull request as draft January 25, 2025 17:13
@alamb
Copy link
Contributor

alamb commented Jan 25, 2025

Thanks @corwinjoy and @tustvold -- marking this as a draft as we work through the test failures

@corwinjoy
Copy link
Author

corwinjoy commented Jan 25, 2025 via email

@etseidl
Copy link
Contributor

etseidl commented Jan 26, 2025

I'm not sure what is happening with the CI. It seems to be failing formatting. But, I ran 'cargo format' and clippy as specified in Contributing.md so I am not sure what is wrong.

Are you running from the arrow-rs directory or the object_store directory. When I run cargo fmt --all -- --check from the object_store directory I get similar errors to CI.

@corwinjoy
Copy link
Author

corwinjoy commented Jan 26, 2025

Ah, thanks for the clarification @etseidl! From the guidelines I didn't realize that I had to run these commands in the object-store rather than the arrow-rs directory. Doing that, I do get suggested changes which I have applied. Hopefully this will clear the CI checks.

@alamb
Copy link
Contributor

alamb commented Jan 27, 2025

Ah, thanks for the clarification @etseidl! From the guidelines I didn't realize that I had to run these commands in the object-store rather than the arrow-rs directory. Doing that, I do get suggested changes which I have applied. Hopefully this will clear the CI checks.

The emulator tests appear to be failing: https://github.com/apache/arrow-rs/actions/runs/12970828706/job/36180799417?pr=7019

---- local::tests::test_path_with_offset stdout ----
thread 'local::tests::test_path_with_offset' panicked at src/local.rs:1440:66:
called `Result::unwrap()` on an `Err` value: Generic { store: "LocalFileSystem", source: UnableToCanonicalize { path: "../testing/data/arrow-ipc-file", source: Os { code: 2, kind: NotFound, message: "No such file or directory" } } }
stack backtrace:

…or testing rather than pointing to existing dir.
@corwinjoy
Copy link
Author

OK. I have changed this test to use a temporary test and files which should make it more portable. We will see what the emulator says about this.

@alamb
Copy link
Contributor

alamb commented Jan 29, 2025

OK. I have changed this test to use a temporary test and files which should make it more portable. We will see what the emulator says about this.

Looks like the tests are good

@alamb
Copy link
Contributor

alamb commented Jan 29, 2025

This PR looks ready for review to me. @corwinjoy shall we mark it as ready for review?

@corwinjoy
Copy link
Author

@alamb Yes please. Mark it as ready for review.

@alamb alamb marked this pull request as ready for review January 29, 2025 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LocalFileSystem::list_with_offset is very slow over network file system
4 participants