Miscellaneous ArrayData Cleanup #5612

Merged
merged 3 commits into from
Mar 16, 2023

Conversation

tustvold
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core (Core DataFusion crate) and physical-expr (Changes to the physical-expr crates) labels on Mar 15, 2023
let rows = self
.aggregate(
vec![],
vec![datafusion_expr::count(Expr::Literal(ScalarValue::Null))],
Contributor Author

This was working by accident, because the null count for NullArray was computed incorrectly.
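
To illustrate why an incorrect null count could make count(NULL) look like a row count, here is a toy sketch. It is not the arrow-rs or DataFusion code: `ToyColumn` and `count` are hypothetical names, and the only assumption is the usual SQL rule that COUNT(expr) counts the non-null values of expr.

```rust
// Toy model only; not the arrow-rs or DataFusion implementation.

/// Hypothetical column summary: total values and how many are null.
struct ToyColumn {
    len: usize,
    null_count: usize,
}

/// COUNT(expr) semantics: number of non-null values.
fn count(col: &ToyColumn) -> usize {
    col.len - col.null_count
}

fn main() {
    let rows = 5;
    // A column of NULL literals whose null count is wrongly reported as 0.
    let buggy_nulls = ToyColumn { len: rows, null_count: 0 };
    // The same column with the correct null count: every value is null.
    let fixed_nulls = ToyColumn { len: rows, null_count: rows };
    // A column of non-null literals, as count(1) would see.
    let ones = ToyColumn { len: rows, null_count: 0 };

    println!("count(NULL), wrong null count: {}", count(&buggy_nulls));
    println!("count(NULL), fixed null count: {}", count(&fixed_nulls));
    println!("count(1): {}", count(&ones));
}
```

In this model, once the null count is right, counting a NULL literal yields zero, so a row count has to be expressed some other way (for example by counting a non-null literal).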

"Unexpected output when collecting for count()".to_string(),
))? as usize;
Ok(len)
let mut stream = self.execute_stream().await?;
Contributor Author

@tustvold tustvold Mar 15, 2023

I'm not sure if there is a better way to express this. In particular, I'm not sure what select count(*) from table would compile down to. I can't help feeling there should be a better way to express this for the optimiser.

Contributor

You could keep the old behaviour and use datafusion_expr::utils::COUNT_STAR_EXPANSION instead of ScalarValue::Null, since count(*) is expanded to that in the SQL planner (a related bug, #5518, prevents directly using Count(Expr::Wildcard) at the moment).

The reason I had it that way was to let the optimizer prune column projections down to the minimum needed, which I think is theoretically faster than just executing the plan as is and counting the rows.
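
A rough sketch of the projection argument, using a toy expression type rather than DataFusion's `Expr`. `ToyExpr` and `required_columns` are hypothetical names, and `Literal(1)` merely stands in for a non-null constant such as the count(*) expansion (that it is a non-null literal is an assumption here).

```rust
use std::collections::BTreeSet;

/// Hypothetical, minimal expression type (not DataFusion's `Expr`).
enum ToyExpr {
    Column(String),
    Literal(u8),
}

/// Columns an aggregate over `args` has to read from its input.
fn required_columns(args: &[ToyExpr]) -> BTreeSet<String> {
    args.iter()
        .filter_map(|e| match e {
            ToyExpr::Column(name) => Some(name.clone()),
            // A literal references no input column.
            ToyExpr::Literal(_) => None,
        })
        .collect()
}

fn main() {
    let count_literal = [ToyExpr::Literal(1)];
    let count_column = [ToyExpr::Column("a".to_string())];
    println!("count(1) reads {:?}", required_columns(&count_literal));
    println!("count(a) reads {:?}", required_columns(&count_column));
}
```

Because an aggregate over a literal references no input columns, an optimizer is in principle free to drop every column from the scan, whereas executing the plan as is and counting rows materialises all of them.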

Contributor

@Jefffrey Jefffrey left a comment

Thanks for the fix to DataFrame count 👍

Contributor

@alamb alamb left a comment

This looks great -- thank you @tustvold and @Jefffrey

@alamb alamb merged commit e92fc5a into apache:main Mar 16, 2023
3 participants