-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add support for large JSON documents (> Int32::MAX
)
#16144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
People do actual JSON that large without resorting to JSONL or similar? Yikes, that is the kind of size where a streaming parser would be nice to have (like SAX parsers for XML. Dunno what is possible for JSON). |
|
Is there a reason not to use unsigned here? |
Unsigned types are more susceptible to errors on math operations. Technically, the domain range would be covered by unsigned values. But they propagate into other calculations and cause issues there. |
I've marked the line/column accessors experimental, and the i32 methods now raise on overflow instead of saturate. I think this PR can probably be reviewed in more detail unless there's still concerns about the overall approach. |
Int32::MAX
)
@RX14 We're in the middle of the freeze period for 1.18.0. This is a feature addition and shouldn't have been merged into master until after the release. It seems I forgot adding it to the milestone after I approved it. It should've been added to 1.19.0 though. |
This reverts commit c4b8bbf.
The commit has been reverted. |
Sorry, that's my bad for not reading the processes. I've re-made the PR in #16211. |
Addresses #12946
This is a WIP, mostly to discuss the general approach now something is compiling. I'd like to avoid nitpicks about the code and focus on if I've missed an i32 anywhere, and the resulting API. This code has been tested parsing a 150GiB JSON file, but I can't be sure I didn't miss something.
If saturating the line/column numbers is the consensus, the
JSON.saturate_to_i32
helper would be nice to generalise toIntXX#saturate_to_ixx
.