Add Pyspark.pandas to benchmark #246

Open
TjommeVergauwen opened this issue Apr 2, 2022 · 1 comment

Are there any plans to add pyspark.pandas to the benchmark?

jangorecki (Contributor) commented Apr 2, 2022

Do you expect pyspark.pandas to perform differently from pyspark.sql? Do you think it will be faster or slower?
I think it makes sense to keep only one of them rather than maintain both. Running each solution costs a couple of hours on a high-spec machine, so I would avoid benchmarking the individual Spark interfaces (SQL/pandas) and focus on the engine. I am sure they share the same Spark engine.
