Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csv: remove old workaround in UTF8Reader #546

Open
wants to merge 3 commits into
base: 2.19
Choose a base branch
from

Conversation

pjfanning
Copy link
Member

@@ -403,19 +403,6 @@ private boolean loadMore(int available) throws IOException
// Bytes that need to be moved to the beginning of buffer?
if (available > 0) {
if (_inputPtr > 0) {
if (!canModifyBuffer()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you feel this is no longer needed? This is to prevent reader from modifying buffer caller gave, in case moving would occur.

If anything, looks like it should avoid copying even if _inputSource was not null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. It'd solve #497?

We do need to avoid modifying buffer, however, I don't think this PR quite works in that sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test pass without this check. The existing tests and we gain an extra working test (the 497 test).

I don't know how to rewrite this because I don't have a broken test to work with. My main aim is not to have the 497 test (which is a valid test) fail. I'm not against revisiting this but since this change breaks no tests (other than minor woring diff in exception message on 1 test) and fixes one broken test, I'd prefer to remove this check for now. If this was merged, I would follow up and remove the same check in the TOML and YAML code bases because there is no proof that this check helps there either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have a test to verify underlying buffer is not modified when it shouldn't be -- which is what check does -- but even without that we should not make change that will now unexpectedly do that.

Specifically this is the case where caller passes byte[] containing UTF-8 content: expectation is that this buffer is not modified. And as such it should not be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ok, so I extended the tests to see if array input gets modified and it does
  • is there anyway that the parser could be modified to clone the array when we hit this case and to work with the cloned array - to avoid modifying the input array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back the code check for whether the buffer is modifiable - but instead of throwing an exception, I clone the buffer to avoid modifying the original. The cloning only happens in rare cases but I think it is worth the runtime hit to fix the broken test.

@pjfanning pjfanning marked this pull request as draft March 7, 2025 12:20
@pjfanning pjfanning marked this pull request as ready for review March 7, 2025 12:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants