Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream? #208

Open
dshorthouse opened this issue Nov 18, 2024 · 5 comments
Open

stream? #208

dshorthouse opened this issue Nov 18, 2024 · 5 comments

Comments

@dshorthouse
Copy link

I see a client connection benchmark for the mysql client to stream a query. Is there something comparable for the trilogy client? I used this via the mysql2 gem to stream millions of records out to csv and was hoping there might be something similar with trilogy.

@brianmario
Copy link
Contributor

The Trilogy C library itself supports "streaming" by design in that you have to read every row in a loop and deal with each of them yourself. But in the Ruby library, the choice was made to buffer the entire result set in memory. If I recall, the original design decision about that was to free server resources ASAP.

Streaming support could technically be added, but it's not there today that I'm aware of.

@dshorthouse
Copy link
Author

Thanks for the response @brianmario. Bummer as this was a great way for me to stream out several extremely large csv files via background processing through sidekiq using mysql2. Buffering the entire result set in memory in my case is the exact opposite of freeing server resources; it kills the server.

@brianmario
Copy link
Contributor

Sorry, I meant freeing resources on the mysql server not the web server or whatever.

But yeah, your use case is exactly why I added streaming support to mysql2 in the first place. I've been pretty busy lately, but I'll see if I can find some time to open a pull request here for that.

If one of the folks with commit access has time, I think we could refactor this block into a read_row method (could even be private) that is used to buffer by default, but is skipped if a stream option is passed. I think having Trilogy#query accept some options after the query param would be the best place.

In mysql2 I made it so you can specify defaults for things like streaming at client initialization and override those per-query. Not sure that complexity is worth adding vs just a named parameter to the query method though...

@brianmario
Copy link
Contributor

I should add, to work around this if you're already using Trilogy is to paginate your query/results so you work through the records in batches instead of one huge set. It's potentially a lot more round-trip traffic, but unless the mysql server is under insane load it should be fine.

@dshorthouse
Copy link
Author

That'd be really awesome to have a named parameter on the query method as a drop-in replacement for this particular use case. I'm aware of the work-around but in my case, I queue up several thousand jobs that each kick off with a heavy join & so streaming out the result set is way more efficient than paginating. So yes, it'd be under insane load 😃.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants