Add stream parameters in pylibcudf IO APIs #17620

Matt711 · 2024-12-18T06:42:23Z

Description

Part of #15163. As #13744 comes to a close, we can begin exposing stream parameters in pylibcudf. This PR focuses on the IO APIs.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2024-12-18T06:42:27Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Matt711 · 2024-12-18T14:33:27Z

/ok to test

Matt711 · 2024-12-18T15:08:21Z

/ok to test

python/pylibcudf/pylibcudf/io/csv.pyx

python/pylibcudf/pylibcudf/io/json.pyx

Matt711 · 2024-12-18T16:27:28Z

/ok to test

python/pylibcudf/pylibcudf/io/csv.pyx

Matt711 · 2024-12-18T19:39:24Z

/ok to test

Matt711 · 2024-12-18T20:13:42Z

/ok to test

Matt711 · 2024-12-18T20:32:27Z

/ok to test

bdice

How should we test this? We have a pretty convoluted stream testing framework in libcudf right now. I wonder if there is a cleaner way to do that kind of thing in pylibcudf, or if there is potential for reuse of the custom library.

bdice · 2024-12-18T20:34:51Z

python/pylibcudf/pylibcudf/io/avro.pxd

@@ -20,4 +20,4 @@ cdef class AvroReaderOptionsBuilder:
    cpdef AvroReaderOptionsBuilder num_rows(self, size_type num_rows)
    cpdef AvroReaderOptions build(self)

-cpdef TableWithMetadata read_avro(AvroReaderOptions options)
+cpdef TableWithMetadata read_avro(AvroReaderOptions options, Stream stream = *)


What does = * do in Cython? I don't think I've used this syntax but I have seen it in a few places.

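It indicates that the argument is optional: in a .pxd declaration the default value is written as *, and the actual default is given in the .pyx implementation.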

bdice · 2024-12-18T20:37:51Z

python/pylibcudf/pylibcudf/io/avro.pyx

+    if stream is None:
+        stream = Stream()


We need to streamline this. Maybe we can use a default besides None, or perhaps we need a cdef stream_or_default(stream) function.
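A minimal sketch of what such a helper could look like, written in plain Python rather than Cython so it runs anywhere. The `Stream` class below is a stand-in, not RMM's class, and `DEFAULT_STREAM` and the `read_avro` body are hypothetical; only the name `stream_or_default` comes from the suggestion above.

```python
from typing import Optional


class Stream:
    """Stand-in for a CUDA stream wrapper (hypothetical, not rmm's Stream)."""

    def __init__(self, handle: int = 0):
        # Handle 0 conventionally denotes the default CUDA stream.
        self.handle = handle


# One shared default instance, so callers that pass None do not
# build a new wrapper on every call.
DEFAULT_STREAM = Stream()


def stream_or_default(stream: Optional[Stream] = None) -> Stream:
    """Return `stream` unchanged, or the shared default when it is None."""
    return DEFAULT_STREAM if stream is None else stream


def read_avro(options, stream: Optional[Stream] = None):
    # Hypothetical reader body: the two-line None check collapses to one call.
    stream = stream_or_default(stream)
    return options, stream
```

Centralizing the check this way would keep every IO entry point to a single line of stream handling, and would leave one place to change if the default ever stops being `None`.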

Matt711 · 2024-12-18T20:46:25Z

> How should we test this? We have a pretty convoluted stream testing framework in libcudf right now. I wonder if there is a cleaner way to do that kind of thing in pylibcudf, or if there is potential for reuse of the custom library.

We can create streams from numba and cupy (as well as default streams). I think we should add tests to each of the IO test modules that check the stream parameter works with cupy, numba, and default streams. E.g.:

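import pylibcudf as plc
from numba import cuda
from rmm._cuda.stream import Stream  # private module at the time; see rapidsai/rmm#1770

def test_read_parquet_with_numba_stream(options):
    # `options` is assumed to be reader options supplied elsewhere, e.g. by a fixture
    numba_stream = cuda.stream()  # create a numba stream
    plc.io.parquet.read_parquet(options, Stream(numba_stream))
    ...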
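To cover all three stream sources without repeating the test body, the stream creation could be factored into a small table of factories. This is only a sketch: the function name `stream_sources` and the convention that `None` means "use the default stream" are assumptions, and libraries that are not installed are skipped rather than imported.

```python
import importlib.util


def stream_sources():
    """Map a label to a zero-argument factory for a foreign stream object.

    The "default" entry returns None, standing for "let the API pick its
    default stream". Libraries that are not installed are left out.
    """
    sources = {"default": lambda: None}

    if importlib.util.find_spec("numba") is not None:
        def make_numba_stream():
            from numba import cuda  # imported lazily; needs a GPU when called
            return cuda.stream()
        sources["numba"] = make_numba_stream

    if importlib.util.find_spec("cupy") is not None:
        def make_cupy_stream():
            import cupy  # imported lazily; needs a GPU when called
            return cupy.cuda.Stream()
        sources["cupy"] = make_cupy_stream

    return sources
```

A test module could then parametrize over the keys of `stream_sources()`, call the matching factory inside the test, and wrap the result in `Stream(...)` before passing it to the reader, as in the numba snippet above.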

Matt711 · 2024-12-19T02:47:56Z

/ok to test

python/pylibcudf/pylibcudf/io/json.pyx

Matt711 · 2024-12-19T14:59:39Z

/ok to test

Matt711 · 2024-12-19T16:11:35Z

/ok to test

vyasr · 2024-12-20T19:21:06Z

We should address rapidsai/rmm#1770 before we merge this PR or anything like it in cudf.

Add stream parameters in pylibcudf IO APIs

e27cad2

Matt711 added the feature request and non-breaking labels Dec 18, 2024

github-actions bot assigned Matt711 Dec 18, 2024

add remaining streams

a72988f

github-actions bot added the Python and pylibcudf labels Dec 18, 2024

Merge branch 'branch-25.02' into fea/plc/io/streams

76ffc61

clean up

48fc8a9

Matt711 mentioned this pull request Dec 18, 2024

[FEA] Make the rmm._cuda.stream.Stream a part of the public API rapidsai/rmm#1770

Open

Matt711 added 3 commits December 18, 2024 11:03

use stream

8cf0eb1

add stream to parquet_chunked_writer

dee9fda

add stream to orc_chunked_writer

b06dabc

Matt711 and others added 5 commits December 18, 2024 11:08

Update python/pylibcudf/pylibcudf/io/csv.pyx

4fa1a1a

Update python/pylibcudf/pylibcudf/io/csv.pyx

a60f800

Update python/pylibcudf/pylibcudf/io/json.pyx

5727aa1

Merge branch 'branch-25.02' into fea/plc/io/streams

a8da383

clean up

9d1cc92

fix typo

8477869

add stream param to cpp_read_orc

e117d64

Merge branch 'branch-25.02' into fea/plc/io/streams

0fd96db

Matt711 and others added 2 commits December 18, 2024 18:01

Merge branch 'branch-25.02' into fea/plc/io/streams

8758baa

add a test

3913229

Matt711 added 2 commits December 19, 2024 09:59

Update python/pylibcudf/pylibcudf/io/json.pyx

5affbbc

Update python/pylibcudf/pylibcudf/io/json.pyx

218e73b

Matt711 added 3 commits December 19, 2024 16:06

remove stream param from avro reader

c29cdc4

clean up

9f7e87e

clean up

3f84e5c

Merge branch 'branch-25.02' into fea/plc/io/streams

f34b519
