Performance against dask read_sql_table #263

Open
argenisleon opened this issue May 23, 2020 · 4 comments

Comments

@argenisleon

Hi,

Is there any benchmark against pandas or dask? I am thinking about using turbodbc in https://github.com/ironmussa/optimus to move data from databases to cudf and dask-cudf.

Any idea?

@xhochy
Collaborator

xhochy commented May 23, 2020

Have a look at @MathMagique's presentation, starting at 20:00: http://2017.de.pycon.org/schedule/talks/turbodbc-turbocharged-database-access-for-data-scientists/

The PEP-249 performance should be roughly similar to the pandas.read_sql_table performance, so the talk shows you what the performance differences are. Going to cudf with the Turbodbc Arrow adapter might be even more efficient, as you should be able to avoid the roundtrip through pandas: cudf also uses Arrow as its memory layout.
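
To make the Arrow route concrete, here is a minimal sketch, not a tested recipe. It assumes turbodbc was built with Arrow support and that cudf and a GPU are available for the second helper; the helper names `fetch_arrow_table` and `arrow_to_cudf` are made up for illustration, and the DSN in the usage note is a placeholder.

```python
def fetch_arrow_table(connection, query):
    """Run `query` and return the whole result set as a pyarrow.Table.

    `connection` is assumed to be a turbodbc connection created by
    turbodbc.connect(...), with turbodbc built with Arrow support.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # fetchallarrow() hands back a pyarrow.Table directly, so no
        # pandas DataFrame is built along the way.
        return cursor.fetchallarrow()
    finally:
        cursor.close()


def arrow_to_cudf(table):
    """Convert a pyarrow.Table to a cudf.DataFrame (needs cudf and a GPU)."""
    import cudf  # imported lazily so the fetch helper works without cudf

    return cudf.DataFrame.from_arrow(table)
```

Usage would look something like `arrow_to_cudf(fetch_arrow_table(turbodbc.connect(dsn="my_dsn"), "SELECT * FROM my_table"))`, where `my_dsn` and `my_table` are placeholders. For results that do not fit in memory at once, turbodbc also offers `fetcharrowbatches()`.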

@dhirschfeld

I did a comparison against mssql/sqlalchemy, fetching 1e6 records from a SQL Server database and got a 6x speedup with turbodbc:

image

...and that includes the cost of converting to pandas. The plan is to try to avoid that overhead with fletcher.
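
For anyone who wants to repeat this kind of comparison, a rough timing harness could look like the sketch below. It is not the code behind the numbers above; `time_call` and `speedup` are hypothetical helper names, and the two fetch functions you pass in (one per driver) are left to you.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds
```

Each `fn` should run the same query and end with a pandas DataFrame, so that the conversion cost is included on both sides.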

@xhochy
Collaborator

xhochy commented May 23, 2020

Thanks for doing this @dhirschfeld !

@argenisleon
Author

Amazing talk, @MathMagique, and thanks for the info, @xhochy and @dhirschfeld.

Internally, dask uses the table index to parallelize reading the data. Any idea how this could work with turbodbc? Could there be any gain from using dask for this?
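
To illustrate what I mean, here is a rough sketch of index-based partitioning with one turbodbc query per partition. It assumes an integer index column; `partition_bounds`, `partition_queries`, `read_partitions` and the `connect` callable are hypothetical names, this is not how dask implements it internally, and whether it is actually faster is exactly the open question.

```python
def partition_bounds(min_id, max_id, npartitions):
    """Split the inclusive integer range [min_id, max_id] into contiguous
    half-open (lower, upper) ranges, at most one per requested partition."""
    if npartitions < 1:
        raise ValueError("npartitions must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    npartitions = min(npartitions, total)
    step, extra = divmod(total, npartitions)
    bounds = []
    lower = min_id
    for i in range(npartitions):
        # The first `extra` partitions take one additional id each.
        upper = lower + step + (1 if i < extra else 0)
        bounds.append((lower, upper))
        lower = upper
    return bounds


def partition_queries(table, index_col, bounds):
    """Build one SELECT per range. `table` and `index_col` are interpolated
    as-is, so they must come from trusted code, not from user input."""
    return [
        f"SELECT * FROM {table} WHERE {index_col} >= {lo} AND {index_col} < {hi}"
        for lo, hi in bounds
    ]


def read_partitions(connect, queries):
    """Return one lazy dask task per query.

    `connect` is assumed to be a zero-argument callable that opens a new
    turbodbc connection, so each task uses its own connection.
    """
    import dask  # imported lazily; only needed for the parallel part

    @dask.delayed
    def _load(query):
        connection = connect()
        try:
            cursor = connection.cursor()
            cursor.execute(query)
            return cursor.fetchallarrow().to_pandas()
        finally:
            connection.close()

    return [_load(query) for query in queries]
```

The delayed parts could then be combined with `dask.dataframe.from_delayed(parts)`. Each partition still pays for a separate query and connection, so any gain would depend on the database and driver.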
