Add helper functions from DataFrame to LogicalPlanBuilder #19564

@abhiaagarwal

Is your feature request related to a problem or challenge?

I'm trying to remove some explicit DataFrame references in delta-rs' codebase and operate directly on logical plans instead, so we can stream and materialize later in the process. Most methods on DataFrame are thin wrappers around LogicalPlanBuilder; the with_column function is an exception, since its logic lives in DataFrame itself.

Describe the solution you'd like

I propose that we add a new trait, LogicalPlanBuilderExt, that adds convenience methods such as with_column (and select, etc.) that perform the exact operations DataFrame performs today.

pub fn with_column(self, name: &str, expr: Expr) -> Result<DataFrame> {
    let window_func_exprs = find_window_exprs([&expr]);
    let original_names: HashSet<String> = self
        .plan
        .schema()
        .iter()
        .map(|(_, f)| f.name().clone())
        .collect();
    // Maybe build window plan
    let plan = if window_func_exprs.is_empty() {
        self.plan
    } else {
        LogicalPlanBuilder::window_plan(self.plan, window_func_exprs)?
    };
    let new_column = expr.alias(name);
    let mut col_exists = false;
    let mut fields: Vec<(Expr, bool)> = plan
        .schema()
        .iter()
        .filter_map(|(qualifier, field)| {
            // Skip new fields introduced by window_plan
            if !original_names.contains(field.name()) {
                return None;
            }
            if field.name() == name {
                col_exists = true;
                Some((new_column.clone(), true))
            } else {
                let e = col(Column::from((qualifier, field)));
                Some((e, self.projection_requires_validation))
            }
        })
        .collect();
    if !col_exists {
        fields.push((new_column, true));
    }
    let project_plan = LogicalPlanBuilder::from(plan)
        .project_with_validation(fields)?
        .build()?;
    Ok(DataFrame {
        session_state: self.session_state,
        plan: project_plan,
        projection_requires_validation: false,
    })
}

This also helps DataFrame become a loose wrapper around LogicalPlanBuilder and lets the logic live in datafusion_expr.
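
To make the shape of the proposal concrete, here is a rough sketch of the extension-trait pattern using stand-in types only. ToyPlanBuilder, ToyPlanBuilderExt, and the string-based "expressions" are all hypothetical and are not DataFusion APIs. The sketch also leaves out the window-expression handling and the per-field validation flag from the function above. It only shows the replace-or-append projection logic hanging off the builder instead of DataFrame.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a plan builder: it only tracks the output
/// columns as (name, expression-as-text) pairs. Not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyPlanBuilder {
    columns: Vec<(String, String)>,
}

impl ToyPlanBuilder {
    pub fn new(columns: &[(&str, &str)]) -> Self {
        let columns = columns
            .iter()
            .map(|(n, e)| (n.to_string(), e.to_string()))
            .collect();
        Self { columns }
    }

    pub fn columns(&self) -> &[(String, String)] {
        &self.columns
    }

    /// Toy projection: replaces the output columns, rejecting duplicate names.
    pub fn project(self, columns: Vec<(String, String)>) -> Result<Self, String> {
        let mut seen = HashSet::new();
        for (name, _) in &columns {
            if !seen.insert(name.as_str()) {
                return Err(format!("duplicate column name: {name}"));
            }
        }
        Ok(Self { columns })
    }
}

/// Hypothetical extension trait that carries the convenience methods,
/// so callers can stay at the builder level.
pub trait ToyPlanBuilderExt: Sized {
    fn with_column(self, name: &str, expr: &str) -> Result<Self, String>;
}

impl ToyPlanBuilderExt for ToyPlanBuilder {
    fn with_column(self, name: &str, expr: &str) -> Result<Self, String> {
        let mut col_exists = false;
        // Keep every existing column, swapping in the new expression
        // when a column with the same name already exists.
        let mut fields: Vec<(String, String)> = self
            .columns
            .iter()
            .map(|(n, e)| {
                if n.as_str() == name {
                    col_exists = true;
                    (name.to_string(), expr.to_string())
                } else {
                    (n.clone(), e.clone())
                }
            })
            .collect();
        // Otherwise append the new column at the end.
        if !col_exists {
            fields.push((name.to_string(), expr.to_string()));
        }
        self.project(fields)
    }
}
```

With something along these lines, DataFrame::with_column could in principle reduce to a call into the builder followed by wrapping the resulting plan, though the real version would still need to carry over the window-plan and validation handling.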

Describe alternatives you've considered

N/A

Additional context

N/A

Labels: enhancement
