
@bilichboris
Contributor

  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
  • I have reviewed and followed all the contribution guidelines
  • If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

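This PR fixes inconsistent behavior of JsonReader with lines=True when chunksize does not evenly divide nrows, and when nrows=0. In the first case the reader returns more than nrows rows; in the second it returns all rows instead of none. The following two snippets illustrate both problems: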

import pandas as pd
from io import StringIO

jsonl = """{"a": 1, "b": 2}
{"a": 3, "b": 4}
{"a": 5, "b": 6}
{"a": 7, "b": 8}"""

reader = pd.read_json(StringIO(jsonl), lines=True, nrows=3, chunksize=2)
result = pd.concat(reader)

# Should be exactly 3 rows; pre-fix returns 4.
assert len(result) == 3, f"expected 3 rows, got {len(result)}"

jsonl = """{"a": 1, "b": 2}
{"a": 3, "b": 4}
"""

result = pd.read_json(StringIO(jsonl), lines=True, nrows=0)
assert len(result) == 0, f"expected 0 rows, got {len(result)}"  # pre-fix returns 2
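The expected semantics can be modeled without pandas. The sketch below is a simplified stand-in, not the actual JsonReader code, and both function names are hypothetical. `read_chunks_unbounded` checks the row budget only before pulling each full chunk and uses a truthiness test on nrows, which reproduces both symptoms above (4 rows instead of 3, and all rows for nrows=0). `read_chunks_bounded` shrinks the last chunk to the remaining budget and yields nothing when nrows=0. It assumes chunksize is a positive int.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def read_chunks_unbounded(
    lines: Iterable[str], chunksize: int, nrows: Optional[int]
) -> Iterator[List[str]]:
    """Hypothetical model of the buggy behavior: the last chunk can overshoot nrows."""
    it = iter(lines)
    seen = 0
    # Truthiness test: nrows=0 is treated like "no limit".
    while not nrows or seen < nrows:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        seen += len(chunk)
        yield chunk


def read_chunks_bounded(
    lines: Iterable[str], chunksize: int, nrows: Optional[int]
) -> Iterator[List[str]]:
    """Hypothetical model of the fixed behavior: never yields more than nrows rows."""
    it = iter(lines)
    seen = 0
    while True:
        size = chunksize
        if nrows is not None:
            # Never request more rows than the remaining budget.
            size = min(chunksize, nrows - seen)
            if size <= 0:
                return
        chunk = list(islice(it, size))
        if not chunk:
            return
        seen += len(chunk)
        yield chunk


rows = ['{"a": 1}', '{"a": 3}', '{"a": 5}', '{"a": 7}']
print(sum(len(c) for c in read_chunks_unbounded(rows, 2, 3)))  # 4
print(sum(len(c) for c in read_chunks_bounded(rows, 2, 3)))    # 3
print(sum(len(c) for c in read_chunks_unbounded(rows, 2, 0)))  # 4
print(sum(len(c) for c in read_chunks_bounded(rows, 2, 0)))    # 0
```

With nrows=3 and chunksize=2 the bounded version yields chunks of sizes 2 and 1.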

@bilichboris marked this pull request as ready for review February 4, 2026 14:44
Member

@rhshadrach left a comment

Looking good!

  # If self.chunksize, we prepare the data for the `__next__` method.
  # Otherwise, we read it into memory for the `read` method.
- if not (self.chunksize or self.nrows):
+ if self.chunksize is None and self.nrows is None:
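The reason the old condition misbehaves for nrows=0 is Python truthiness: `0` is falsy, so `not (chunksize or nrows)` cannot tell "nrows not given" apart from "nrows=0". A minimal side-by-side check of the two conditions (the helper names are illustrative, not pandas functions):

```python
from typing import Optional


def old_condition(chunksize: Optional[int], nrows: Optional[int]) -> bool:
    # Truthiness test: nrows=0 is falsy, so it looks the same as nrows=None.
    return not (chunksize or nrows)


def new_condition(chunksize: Optional[int], nrows: Optional[int]) -> bool:
    # Identity test: only an explicit None counts as "not given".
    return chunksize is None and nrows is None


print(old_condition(None, 0))  # True: nrows=0 takes the read-everything branch
print(new_condition(None, 0))  # False: nrows=0 no longer takes that branch
```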
Member

Do we validate that if these are not None, then nrows is nonnegative and chunksize is positive?
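If the answer turns out to be no, a check along these lines would cover it. This is a standalone sketch: `validate_chunk_args` is a hypothetical name, and the real reader may already perform an equivalent check elsewhere.

```python
import numbers
from typing import Optional


def validate_chunk_args(nrows: Optional[int], chunksize: Optional[int]) -> None:
    """Hypothetical check: nrows must be >= 0 and chunksize >= 1 when given."""

    def is_int(value: object) -> bool:
        # Accept any numbers.Integral (this includes NumPy integer types), but not bool.
        return isinstance(value, numbers.Integral) and not isinstance(value, bool)

    if nrows is not None and (not is_int(nrows) or nrows < 0):
        raise ValueError(f"nrows must be a non-negative integer, got {nrows!r}")
    if chunksize is not None and (not is_int(chunksize) or chunksize < 1):
        raise ValueError(f"chunksize must be a positive integer, got {chunksize!r}")


validate_chunk_args(nrows=0, chunksize=2)  # passes: nrows=0 is allowed
```

Allowing nrows=0 while rejecting chunksize=0 matches the distinction in the question: zero rows is a meaningful request, a zero-sized chunk is not.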

Comment on lines +1094 to 1096
if self.nrows is not None and self.nrows_seen >= self.nrows:
    self.close()
    raise StopIteration
Member

I think this block can just be removed
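One way such a guard becomes redundant, assuming the line iterator is capped once up front (for example with `itertools.islice(lines, nrows)`): the capped iterator simply runs dry, and an empty-chunk check alone ends iteration. A minimal sketch of that idea; `capped_chunks` is a hypothetical name, not pandas code, and it omits the `close()` call the real reader makes when stopping.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def capped_chunks(
    lines: Iterable[str], chunksize: int, nrows: Optional[int] = None
) -> Iterator[List[str]]:
    it = iter(lines)
    if nrows is not None:
        # Cap the source once; islice(it, 0) is immediately exhausted.
        it = islice(it, nrows)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            # The only stop condition needed: no nrows_seen bookkeeping.
            return
        yield chunk


rows = ["r1", "r2", "r3", "r4"]
print([len(c) for c in capped_chunks(rows, 2, 3)])  # [2, 1]
print([len(c) for c in capped_chunks(rows, 2, 0)])  # []
```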

Comment on lines +30 to +32
- Fixed :func:`read_json` with ``lines=True`` and ``chunksize`` to respect ``nrows``
  when the requested row count is not a multiple of the chunk size.
- Fixed :func:`read_json` with ``lines=True`` and ``nrows=0`` returning all rows.
Member

Can you move the note to 3.1.0, since this is not a regression?
