-
-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csv: remove old workaround in UTF8Reader #546
base: 2.19
Are you sure you want to change the base?
Conversation
@@ -403,19 +403,6 @@ private boolean loadMore(int available) throws IOException | |||
// Bytes that need to be moved to the beginning of buffer? | |||
if (available > 0) { | |||
if (_inputPtr > 0) { | |||
if (!canModifyBuffer()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you feel this is no longer needed? This is to prevent reader from modifying buffer caller gave, in case moving would occur.
If anything, looks like it should avoid copying even if _inputSource
was not null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. It'd solve #497?
We do need to avoid modifying buffer, however, I don't think this PR quite works in that sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test pass without this check. The existing tests and we gain an extra working test (the 497 test).
I don't know how to rewrite this because I don't have a broken test to work with. My main aim is not to have the 497 test (which is a valid test) fail. I'm not against revisiting this but since this change breaks no tests (other than minor woring diff in exception message on 1 test) and fixes one broken test, I'd prefer to remove this check for now. If this was merged, I would follow up and remove the same check in the TOML and YAML code bases because there is no proof that this check helps there either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to have a test to verify underlying buffer is not modified when it shouldn't be -- which is what check does -- but even without that we should not make change that will now unexpectedly do that.
Specifically this is the case where caller passes byte[]
containing UTF-8 content: expectation is that this buffer is not modified. And as such it should not be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ok, so I extended the tests to see if array input gets modified and it does
- is there anyway that the parser could be modified to clone the array when we hit this case and to work with the cloned array - to avoid modifying the input array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added back the code check for whether the buffer is modifiable - but instead of throwing an exception, I clone the buffer to avoid modifying the original. The cloning only happens in rare cases but I think it is worth the runtime hit to fix the broken test.
UTF8Reader
throws"Need to move partially decoded character; buffer not modifiable" when read only one chinese char #497