Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stream parameters in pylibcudf IO APIs #17620

Draft
wants to merge 23 commits into
base: branch-25.02
Choose a base branch
from

Conversation

Matt711
Copy link
Contributor

@Matt711 Matt711 commented Dec 18, 2024

Description

Apart of #15163. As #13744 comes to a close, we can begin exposing the streams parameter to pylibcudf. This PR will focus on the IO APIs.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Matt711 Matt711 added feature request New feature or request non-breaking Non-breaking change labels Dec 18, 2024
Copy link

copy-pr-bot bot commented Dec 18, 2024

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added Python Affects Python cuDF API. pylibcudf Issues specific to the pylibcudf package labels Dec 18, 2024
@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

python/pylibcudf/pylibcudf/io/csv.pyx Outdated Show resolved Hide resolved
python/pylibcudf/pylibcudf/io/csv.pyx Show resolved Hide resolved
python/pylibcudf/pylibcudf/io/json.pyx Outdated Show resolved Hide resolved
@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

/ok to test

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we test this? We have a pretty convoluted stream testing framework in libcudf right now. I wonder if there is a cleaner way to do that kind of thing in pylibcudf, or if there is potential for reuse of the custom library.

@@ -20,4 +20,4 @@ cdef class AvroReaderOptionsBuilder:
cpdef AvroReaderOptionsBuilder num_rows(self, size_type num_rows)
cpdef AvroReaderOptions build(self)

cpdef TableWithMetadata read_avro(AvroReaderOptions options)
cpdef TableWithMetadata read_avro(AvroReaderOptions options, Stream stream = *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does = * do in Cython? I don't think I've used this syntax but I have seen it in a few places.

Copy link
Contributor Author

@Matt711 Matt711 Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indicates the argument is optional: ref

Comment on lines 146 to 147
if stream is None:
stream = Stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to streamline this. Maybe we can use a default besides None, or perhaps we need a cdef stream_or_default(stream) function.

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 18, 2024

How should we test this? We have a pretty convoluted stream testing framework in libcudf right now. I wonder if there is a cleaner way to do that kind of thing in pylibcudf, or if there is potential for reuse of the custom library.

We can create streams from numba and cupy (and default streams). I think we should add tests to each of the IO test modules. And test that the stream param works from cupy, numba, and default streams. Eg.

def test_read_parquet_with_numba_stream():
    import numba
    numba_stream = # create numba stream
    plc.io.parquet.read_parquet(options, Stream(numba_stream))
    ...

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 19, 2024

/ok to test

python/pylibcudf/pylibcudf/io/json.pyx Outdated Show resolved Hide resolved
python/pylibcudf/pylibcudf/io/json.pyx Outdated Show resolved Hide resolved
@Matt711
Copy link
Contributor Author

Matt711 commented Dec 19, 2024

/ok to test

@Matt711
Copy link
Contributor Author

Matt711 commented Dec 19, 2024

/ok to test

@vyasr
Copy link
Contributor

vyasr commented Dec 20, 2024

We should address rapidsai/rmm#1770 before we merge this PR or anything like it in cudf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change pylibcudf Issues specific to the pylibcudf package Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants