Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
I'm trying to remove some explicit DataFrame references in delta-rs' codebase and operate directly on logical plans instead, so we can stream and materialize later in the process. Most methods on DataFrame are thin wrappers around LogicalPlanBuilder, except for the with_column function.
Describe the solution you'd like
I propose that we add a new trait, LogicalPlanBuilderExt, that adds convenience methods such as with_column (and select, etc.) that perform the exact operations DataFrame currently uses.
datafusion/datafusion/core/src/dataframe/mod.rs
Lines 2135 to 2187 in d13d891
    pub fn with_column(self, name: &str, expr: Expr) -> Result<DataFrame> {
        let window_func_exprs = find_window_exprs([&expr]);
        let original_names: HashSet<String> = self
            .plan
            .schema()
            .iter()
            .map(|(_, f)| f.name().clone())
            .collect();
        // Maybe build window plan
        let plan = if window_func_exprs.is_empty() {
            self.plan
        } else {
            LogicalPlanBuilder::window_plan(self.plan, window_func_exprs)?
        };
        let new_column = expr.alias(name);
        let mut col_exists = false;
        let mut fields: Vec<(Expr, bool)> = plan
            .schema()
            .iter()
            .filter_map(|(qualifier, field)| {
                // Skip new fields introduced by window_plan
                if !original_names.contains(field.name()) {
                    return None;
                }
                if field.name() == name {
                    col_exists = true;
                    Some((new_column.clone(), true))
                } else {
                    let e = col(Column::from((qualifier, field)));
                    Some((e, self.projection_requires_validation))
                }
            })
            .collect();
        if !col_exists {
            fields.push((new_column, true));
        }
        let project_plan = LogicalPlanBuilder::from(plan)
            .project_with_validation(fields)?
            .build()?;
        Ok(DataFrame {
            session_state: self.session_state,
            plan: project_plan,
            projection_requires_validation: false,
        })
    }
This also helps DataFrame become a loose wrapper around LogicalPlanBuilder and lets the logic live in datafusion_expr.
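To make the shape of the proposal concrete, here is a minimal sketch of the extension-trait pattern, reduced to the "replace the column if it exists, otherwise append it" step of the snippet quoted above. It does not use DataFusion's types: `ToyExpr` and `ToyPlanBuilder` are hypothetical stand-ins invented for this illustration, the window-expression handling and projection validation are left out, and only the trait name `LogicalPlanBuilderExt` comes from the proposal itself.

```rust
/// Hypothetical stand-in for an expression: an output name plus a textual
/// definition. This is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyExpr {
    pub name: String,
    pub definition: String,
}

impl ToyExpr {
    /// A plain reference to an existing column.
    pub fn column(name: &str) -> Self {
        ToyExpr { name: name.to_string(), definition: name.to_string() }
    }

    /// Rename the output of this expression, keeping its definition.
    pub fn alias(self, name: &str) -> Self {
        ToyExpr { name: name.to_string(), ..self }
    }
}

/// Hypothetical stand-in for a plan builder: it only tracks the projection
/// list. This is NOT DataFusion's `LogicalPlanBuilder`.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyPlanBuilder {
    pub projection: Vec<ToyExpr>,
}

/// Extension trait carrying the convenience method, so callers can stay on
/// the builder instead of going through a DataFrame-like wrapper.
pub trait LogicalPlanBuilderExt: Sized {
    fn with_column(self, name: &str, expr: ToyExpr) -> Result<Self, String>;
}

impl LogicalPlanBuilderExt for ToyPlanBuilder {
    // The toy version never fails; `Result` only mirrors the fallible
    // signature of the quoted `with_column`.
    fn with_column(self, name: &str, expr: ToyExpr) -> Result<Self, String> {
        let new_column = expr.alias(name);
        let mut col_exists = false;

        // Keep every existing column in place, swapping in the new
        // expression where the name matches.
        let mut projection: Vec<ToyExpr> = self
            .projection
            .into_iter()
            .map(|existing| {
                if existing.name == name {
                    col_exists = true;
                    new_column.clone()
                } else {
                    existing
                }
            })
            .collect();

        // No existing column had that name, so append the new one.
        if !col_exists {
            projection.push(new_column);
        }

        Ok(ToyPlanBuilder { projection })
    }
}
```

With a real implementation, the same trait would be implemented for LogicalPlanBuilder and DataFrame::with_column could delegate to it; whether the window-plan step and the validation flag fit cleanly behind such a trait is left open here.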
Describe alternatives you've considered
N/A
Additional context
N/A