-
Notifications
You must be signed in to change notification settings - Fork 68
feat: add aggregate expressions and evaluator #335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7a646ab to
0281987
Compare
wgtmac
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! I just scanned through the design and it seems that the current design is not consistent with both C++ and Java impls. What is in my mind is like something below:
template <typename T>
concept TermType = std::derived_from<T, Term>;
template <TermType T>
class ICEBERG_EXPORT Aggregate : public virtual Expression {
public:
Expression::Operation op() const override { return operation_; }
const std::shared_ptr<T>& term() const { return term_; }
protected:
Expression::Operation operation_;
std::shared_ptr<T> term_;
};
class ICEBERG_EXPORT UnboundAggregate : public virtual Expression,
public Unbound<Expression> {
public:
Result<std::shared_ptr<Expression>> Bind(const Schema& schema,
bool case_sensitive) const override = 0;
bool is_unbound_aggregate() const override { return true; }
};
template <typename B>
class ICEBERG_EXPORT UnboundAggregateImpl : public UnboundAggregate,
public Aggregate<UnboundTerm<B>> {
using BASE = Aggregate<UnboundTerm<B>>;
public:
std::shared_ptr<NamedReference> reference() override {
return BASE::term() ? BASE::term()->reference() : nullptr;
}
Result<std::shared_ptr<Expression>> Bind(const Schema& schema,
bool case_sensitive) const override;
};
class ICEBERG_EXPORT BoundAggregate : public Aggregate<BoundTerm>, public Bound {
public:
using Aggregate<BoundTerm>::op;
using Aggregate<BoundTerm>::term;
std::shared_ptr<BoundReference> reference() override {
return term_ ? term_->reference() : nullptr;
}
Result<Literal> Evaluate(const StructLike& data) const override;
bool is_bound_aggregate() const override { return true; }
enum class Kind : int8_t {
// Count aggregates (COUNT, COUNT_STAR, COUNT_NULL)
kCount = 0,
// Value aggregates (MIN, MAX)
kValue,
};
virtual Kind kind() const = 0;
};
class ICEBERG_EXPORT CountAggregate : public BoundAggregate {
public:
Result<Literal> Evaluate(const StructLike& data) const override;
Kind kind() const override { return Kind::kCount; }
};
class ICEBERG_EXPORT ValueAggregate : public BoundAggregate {
public:
Result<Literal> Evaluate(const StructLike& data) const override;
Kind kind() const override { return Kind::kValue; }
};
src/iceberg/expression/aggregate.h
Outdated
| }; | ||
|
|
||
| /// \brief COUNT aggregate variants. | ||
| class ICEBERG_EXPORT CountAggregate : public UnboundAggregate { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to define subclass for unbound aggregates. Please check the current design of Predicate in both C++ and Java implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! In progress.
Co-authored-by: Gang Wu <[email protected]>
|
I've updated the design to align with the current Predicate pattern and the Java implementation:
|
This PR addresses issue #330 by introducing aggregate expressions & execution support.
Add aggregate expression family (count / count_null / count_star / max / min)
with bound/unbound types, visitor and binder support.
Add
AggregateEvaluatorfor count/max/min execution overStructLikerows.Expose aggregate factories in
Expressionsand wire into CMake/Meson buildswith new aggregate tests.