
Reduce copies when reading files in pyio, match behavior of _io #129005

Open · cmaloney opened this issue Jan 18, 2025 · 5 comments
Labels
performance (Performance or resource usage) · stdlib (Python modules in the Lib dir) · type-feature (A feature request or enhancement)

Comments

cmaloney (Contributor) commented Jan 18, 2025

Feature or enhancement

Proposal:

Currently _pyio uses ~2x as much memory to read all data from a file compared to _io. This is because it makes more than one copy of the data.

Details from a test_largefile run

$ ./python -m test -M8g -uall test_largefile -m test_large_read -vvv
== CPython 3.14.0a4+ (heads/main-dirty:3829104ab41, Jan 17 2025, 21:40:47) [Clang 19.1.6 ]
== Linux-6.12.9-arch1-1-x86_64-with-glibc2.40 little-endian
== Python build: debug
== cwd: <$HOME>/python/build/build/test_python_worker_32392
== CPU count: 32
== encodings: locale=UTF-8 FS=utf-8
== resources: all

Using random seed: 1740056613
0:00:00 load avg: 0.53 Run 1 test sequentially in a single process
0:00:00 load avg: 0.53 [1/1] test_largefile
test_large_read (test.test_largefile.CLargeFileTest.test_large_read) ... 
 ... expected peak memory use: 4.7G
 ... process data size: 2.3G
ok
test_large_read (test.test_largefile.PyLargeFileTest.test_large_read) ... 
 ... expected peak memory use: 4.7G
 ... process data size: 2.3G
 ... process data size: 4.3G
 ... process data size: 4.7G
ok

----------------------------------------------------------------------
Ran 2 tests in 3.711s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 3.7 sec
Total tests: run=2 (filtered)
Total test files: run=1/1 (filtered)
Result: SUCCESS

Plan:

  1. Switch to os.readv() (later revised to os.readinto()) to do a readinto like the C _Py_read used by _io does; os.read() can't take a buffer to fill. This aligns behavior between _io.FileIO.readall and _pyio.FileIO.readall. os.readv works well today and takes a caller-allocated buffer rather than needing a new os API; readv(2) mirrors the behavior and errors of read(2), so this should keep the same end behavior. (A sketch follows this list.)
  2. Update _pyio.BufferedIO to not force a copy of the buffer in readall when its internal buffer is empty. Currently it always slices its internal buffer and then appends the result of _pyio.FileIO.readall to it.
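
To make the direction concrete, here is a minimal sketch of a readall built on a caller-allocated buffer (illustrative names and a simplified growth strategy; written against os.readinto(), which this thread ends up adding, and os.readv(fd, [view]) would be the equivalent call today):

```python
import os

def readall_sketch(fd, size_estimate=0):
    """Read fd to EOF into one growing bytearray, no per-chunk bytes objects."""
    bufsize = max(size_estimate, 8192)
    result = bytearray(bufsize)
    bytes_read = 0
    while True:
        if bytes_read >= bufsize:
            # Buffer full but EOF not seen: grow it and keep reading.
            bufsize *= 2
            result.extend(bytes(bufsize - len(result)))
        # The temporary memoryview slice lets the OS write directly into
        # result at the right offset; it is released once the call returns.
        n = os.readinto(fd, memoryview(result)[bytes_read:])
        if n == 0:  # EOF
            break
        bytes_read += n
    del result[bytes_read:]   # trim the unused tail
    return bytes(result)      # one final copy to an immutable bytes
```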

For iterating, I'm using a small tracemalloc script to find where copies are:

from _pyio import open

import tracemalloc

with open("README.rst", 'rb') as file:
    # Start tracing after open() so only the read() allocations are captured.
    tracemalloc.start()
    data = file.read()
    snap = tracemalloc.take_snapshot()

# Report allocation sites to spot where the copies happen.
stats = snap.statistics('lineno')
for stat in stats:
    print(stat)

Loose Ends

  • os.readv seems to be well supported but is currently guarded by a configure check. I'd like to just make pyio require readv, but can add conditional code if needed. If making readv non-optional in general is feasible, I'm happy to work on that.
    • os.readv is not supported on WASI, so conditional code is needed there; a rough sketch follows.
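
A rough sketch of the conditional shape this implies (hypothetical helper name _readinto; falls back to os.read() where os.readv() is missing, e.g. on WASI):

```python
import os

if hasattr(os, "readv"):
    def _readinto(fd, buffer):
        # readv(2) with a single buffer mirrors read(2)'s behavior and errors.
        return os.readv(fd, [buffer])
else:
    def _readinto(fd, buffer):
        # WASI and similar platforms: fall back to os.read, paying one
        # extra allocation and copy on this path.
        data = os.read(fd, len(buffer))
        buffer[:len(data)] = data
        return len(data)
```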

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@cmaloney cmaloney added the type-feature (A feature request or enhancement) label Jan 18, 2025
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 18, 2025
`os.read` allocated and filled a buffer by calling `read(2)`, then that
data was copied into the user-provided buffer. Read directly into the
caller's buffer instead by using `os.readv`. `self.read()` was doing the
closed and readable checks, so move those into `readinto`.
cmaloney (Contributor, Author) commented Jan 22, 2025

I took a tangent and looked at the code complexity of adding os.readinto vs. the conditionals around os.readv not existing on some platforms, plus needing to pass a sequence of buffers to os.readv.

  1. _pyio Using os.readv with conditional: https://github.com/python/cpython/pull/129006/files
  2. Adding os.readinto, _pyio using it: cmaloney@52c6013

Happy to work on either course; slight preference for os.readinto, since adding that one function removes a lot of new conditional code while making these cases more efficient. Code-wise, with the Buffer Protocol, the code to me gets quite a bit easier to follow (see the comparison below). Not sure it's worth another public os API to maintain, though.

I know moving to a full PR for os.readinto would need tests and news; for now I'm just focusing on "adding os.readinto" vs. "using os.readv".
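
For comparison, the two call shapes side by side (a sketch; os.readinto here is the proposed function from the branch above, not something os had at the time):

```python
import os

fd = os.open("README.rst", os.O_RDONLY)
buf = bytearray(16384)

# os.readv takes a *sequence* of buffers, so a plain readinto needs a
# one-element list (plus a hasattr check, since readv is optional).
n = os.readv(fd, [buf])

# The proposed os.readinto takes any writable bytes-like object directly.
n = os.readinto(fd, buf)

os.close(fd)
```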

cc: @tomasr8 , @vstinner , @gpshead (reviewers where os.read vs. _Py_read / readinto has come up).

vstinner (Member) commented Jan 22, 2025

Can you please open a separate issue for os.readinto()? This function looks useful and simple.

@picnixz picnixz added the performance (Performance or resource usage) and stdlib (Python modules in the Lib dir) labels Jan 23, 2025
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 26, 2025
`os.read` allocated and filled a buffer by calling `read(2)`, then that
data was copied into the user-provided buffer. Read directly into the
caller's buffer instead by using `os.readinto`.

`os.readinto` uses `PyObject_GetBuffer` to make sure the passed-in
buffer is writable and bytes-like; drop the manual check.
vstinner pushed a commit that referenced this issue Jan 28, 2025
`os.read()` allocated and filled a buffer by calling `read(2)`, then that
data was copied into the user-provided buffer. Read directly into the
caller's buffer instead by using `os.readinto()`.

`os.readinto()` uses `PyObject_GetBuffer()` to make sure the passed-in
buffer is writable and bytes-like; drop the manual check.
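
One consequence of the PyObject_GetBuffer-based validation, illustrated (assuming a Python where os.readinto() has landed): read-only objects are rejected by the buffer request itself, so no manual type check is needed.

```python
import os

fd = os.open("README.rst", os.O_RDONLY)
try:
    os.readinto(fd, bytearray(8))      # writable bytes-like object: accepted
    try:
        os.readinto(fd, b"\x00" * 8)   # read-only bytes: rejected up front
    except (TypeError, BufferError):
        print("read-only buffer rejected")
finally:
    os.close(fd)
```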
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 29, 2025
Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto, and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`
from ~3.7 seconds to ~3.3 seconds
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 29, 2025
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 29, 2025
Slicing buf and appending chunk would always result in a copy. Commonly
in a readall there is no already-read data in buf, and the amount of
data read may be large, so the copy is expensive.
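
The shape of that change, as a hedged sketch (the _read_buf/_read_pos names follow _pyio.BufferedReader, but this is illustrative, not the exact patch):

```python
def readall(self):
    # Old: self._read_buf[self._read_pos:] + self.raw.readall() slices and
    # concatenates unconditionally, copying even when nothing is buffered.
    buffered = self._read_buf[self._read_pos:]
    rest = self.raw.readall()
    if not buffered:
        return rest               # common case: no buffered data, no copy
    if rest is None:
        return bytes(buffered)
    return bytes(buffered) + rest
```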
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 29, 2025
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 29, 2025
This aligns the memory usage between _pyio and _io. Both use the same amount
of memory.
cmaloney (Contributor, Author) commented

The full set of changes on my Linux machine (debug build) reduces ./python -m test -M8g -uall test_largefile -m test_large_read -v from ~3.8 seconds to ~2.4 seconds, with peak memory usage of 2.3GB (identical memory usage for the C and Python implementations).

vstinner pushed a commit that referenced this issue Jan 30, 2025
Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto(), and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`

from ~3.7 seconds to ~3.4 seconds.
vstinner pushed a commit that referenced this issue Jan 30, 2025
Slicing buf and appending chunk would always result in a copy. Commonly
in a readall() there is no already read data in buf, and the amount of
data read may be large, so the copy is expensive.
cmaloney added a commit to cmaloney/cpython that referenced this issue Jan 31, 2025
This aligns the memory usage between _pyio and _io. Both now use the
same amount of memory when reading a file.
SunderSingh27 added a commit to SunderSingh27/cpython that referenced this issue Jan 31, 2025
@SunderSingh27 SunderSingh27 mentioned this issue Jan 31, 2025
cmaloney (Contributor, Author) commented Jan 31, 2025

The _pyio changes seem to have broken a number of buildbots; currently investigating:

ERROR: test_nonblock_pipe_write_bigbuf (test.test_io.PyMiscIOTest.test_nonblock_pipe_write_bigbuf)
ERROR: test_nonblock_pipe_write_smallbuf (test.test_io.PyMiscIOTest.test_nonblock_pipe_write_smallbuf)

https://buildbot.python.org/#/builders/338/builds/7968/steps/6/logs/stdio

(Failed on the latest commit in main, 10ee2d9, but I think it relates to these changes.)

cc: @vstinner

cmaloney (Contributor, Author) commented Feb 1, 2025

I think I found the issue in #129458. I tried using `result[bytes_read:bufsize] = b'\0'` to append `bufsize - bytes_read` null bytes to the bytearray, but that isn't what slice assignment does:

>>> a = bytearray()
>>> a[0:5] = b'\0'
>>> a
bytearray(b'\x00')
>>> a[5:16] = b'\01'
>>> a
bytearray(b'\x00\x01')
>>> len(a)
2

(Pipes in the test are a case where we don't have a stat.st_size estimate and can't seek to figure out the size, so reads always go through the resizing code.) Not sure why that commit worked at first but later stopped working... Guessing the system needs to be somewhat loaded to hit it.

Validated by adding an assert locally, which fails every time:

                result[bytes_read:bufsize] = b'\0'
                assert len(result) == bufsize, f"Should have expanded in size. {len(result)=}, {bufsize=}"
$ make && ./python -m test -uall test_io -vvv -m 'test_nonblock*'
<...>
AssertionError: Should have expanded in size. len(result)=8193, bufsize=16640

I don't see a clear way to resize the bytearray without making a whole array of null bytes which get copied into it... The tests just use the C API to resize: https://github.com/python/cpython/blob/main/Lib/test/test_capi/test_bytearray.py#L134-L160

Would adding a .resize() member be reasonable here? It's probably possible to do this efficiently with slicing, but not in a way I know how...

cc: @vstinner

cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 1, 2025
cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 1, 2025
cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 1, 2025
Move to a linear slice append with an iterator which has a length hint.
This is more expensive than PyByteArray_Resize, but I think as efficient
as can get without a new bytearray Python API to resize.

The previous code didn't append as I had intended:

```python
>>> a = bytearray()
>>> a[0:5] = b'\0'
>>> a
bytearray(b'\x00')
>>> a[5:16] = b'\01'
>>> a
bytearray(b'\x00\x01')
>>> len(a)
2
```
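
Concretely, the length-hint approach that commit describes presumably looks like this (a sketch, not the exact patch): itertools.repeat(0, n) yields exactly n zeros and reports a __length_hint__, so the slice assignment can grow the bytearray in one linear pass without first materializing a throwaway bytes object.

```python
import itertools

result = bytearray(b"data read so far")
bytes_read = len(result)
bufsize = 64

# Extends result to exactly bufsize bytes, zero-filling the new tail.
result[bytes_read:bufsize] = itertools.repeat(0, bufsize - bytes_read)
assert len(result) == bufsize
```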
cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 2, 2025