
ListingTable silently drops paths when given many files #18242

@colinmarc

Description


Describe the bug

In our system, we pass a large list of individual S3 paths (taken from an Iceberg manifest) to ListingTable. With high levels of parallelism, we've observed ListingTable silently dropping 5-10% of those files, which leads to incorrect results.
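To see which files go missing, not just how many, one option is to diff the input paths against the paths the scan actually opened. The following is a minimal std-only Rust sketch. `missing_paths` is a hypothetical helper, not a DataFusion API, and it assumes you can already collect the list of opened paths yourself (for example from an instrumented object store).

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of DataFusion): returns the input paths that
/// never appear in `read`, preserving the original input order.
fn missing_paths<'a>(inputs: &[&'a str], read: &[&str]) -> Vec<&'a str> {
    // Set of paths the scan actually opened.
    let seen: HashSet<&str> = read.iter().copied().collect();
    // Keep every input path that the scan never opened.
    inputs
        .iter()
        .copied()
        .filter(|p| !seen.contains(*p))
        .collect()
}

fn main() {
    let inputs = ["s3://bucket/a.parquet", "s3://bucket/b.parquet", "s3://bucket/c.parquet"];
    // Illustrative only: pretend the scan touched two of the three files.
    let read = ["s3://bucket/c.parquet", "s3://bucket/a.parquet"];
    let missing = missing_paths(&inputs, &read);
    // prints: missing 1 of 3: ["s3://bucket/b.parquet"]
    println!("missing {} of {}: {:?}", missing.len(), inputs.len(), missing);
}
```

Comparing paths rather than counts makes it possible to check whether the dropped files share anything (prefix, position in the input list, size) across runs.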

To Reproduce

Here is a test you can run without AWS credentials: https://gist.github.com/colinmarc/2e1c62a1909b2aa63b6bdbad4b81ce64

Expected behavior

There are 3283 paths in the input and the scan is unfiltered, so it should read all of them. Instead, it reads around 2950 on my machine, and the number is different on every run.
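Because the count varies between runs, a single passing run proves little; a regression check would need to repeat the scan and flag every run that comes up short. Below is a minimal std-only Rust sketch under that assumption. `short_runs` is hypothetical, and `scan` is a stand-in closure for "build the ListingTable from all paths, run an unfiltered scan, and count the files read"; it is not a DataFusion API.

```rust
/// Hypothetical regression check: call `scan` `runs` times and return every
/// run whose file count differs from `expected`, as (run index, count) pairs.
fn short_runs<F>(expected: usize, runs: usize, mut scan: F) -> Vec<(usize, usize)>
where
    F: FnMut() -> usize,
{
    (0..runs)
        .map(|i| (i, scan()))
        .filter(|&(_, count)| count != expected)
        .collect()
}

fn main() {
    // Illustrative stand-in counts, not measurements: one run comes up short.
    let observed = [3283, 2950, 3283];
    let mut next = observed.iter().copied();
    let bad = short_runs(3283, observed.len(), || next.next().unwrap());
    // prints: 1 of 3 runs were short: [(1, 2950)]
    println!("{} of {} runs were short: {:?}", bad.len(), observed.len(), bad);
}
```

In a real test the assertion would simply be that `short_runs(3283, n, scan)` returns an empty vector for a reasonably large `n`.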

Additional context

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
