Skip to content

Conversation

@SuKi2cn
Copy link

@SuKi2cn SuKi2cn commented Nov 21, 2025

This PR addresses issue #330 by introducing aggregate expressions & execution support.

  • Add aggregate expression family (count / count_null / count_star / max / min)
    with bound/unbound types, visitor and binder support.

  • Add AggregateEvaluator for count/max/min execution over StructLike rows.

  • Expose aggregate factories in Expressions and wire into CMake/Meson builds
    with new aggregate tests.

@SuKi2cn SuKi2cn mentioned this pull request Nov 21, 2025
6 tasks
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! I just scanned through the design and it seems that the current design is not consistent with both C++ and Java impls. What is in my mind is like something below:

template <typename T>
concept TermType = std::derived_from<T, Term>;

template <TermType T>
class ICEBERG_EXPORT Aggregate : public virtual Expression {
 public:
  Expression::Operation op() const override { return operation_; }

  const std::shared_ptr<T>& term() const { return term_; }

 protected:
  Expression::Operation operation_;
  std::shared_ptr<T> term_;
};

class ICEBERG_EXPORT UnboundAggregate : public virtual Expression,
                                        public Unbound<Expression> {
 public:
  Result<std::shared_ptr<Expression>> Bind(const Schema& schema,
                                           bool case_sensitive) const override = 0;

  bool is_unbound_aggregate() const override { return true; }
};

template <typename B>
class ICEBERG_EXPORT UnboundAggregateImpl : public UnboundAggregate,
                                            public Aggregate<UnboundTerm<B>> {
  using BASE = Aggregate<UnboundTerm<B>>;

 public:
  std::shared_ptr<NamedReference> reference() override {
    return BASE::term() ? BASE::term()->reference() : nullptr;
  }

  Result<std::shared_ptr<Expression>> Bind(const Schema& schema,
                                           bool case_sensitive) const override;
};

class ICEBERG_EXPORT BoundAggregate : public Aggregate<BoundTerm>, public Bound {
 public:
  using Aggregate<BoundTerm>::op;
  using Aggregate<BoundTerm>::term;

  std::shared_ptr<BoundReference> reference() override {
    return term_ ? term_->reference() : nullptr;
  }

  Result<Literal> Evaluate(const StructLike& data) const override;

  bool is_bound_aggregate() const override { return true; }

  enum class Kind : int8_t {
    // Count aggregates (COUNT, COUNT_STAR, COUNT_NULL)
    kCount = 0,
    // Value aggregates (MIN, MAX)
    kValue,
  };

  virtual Kind kind() const = 0;
};

class ICEBERG_EXPORT CountAggregate : public BoundAggregate {
 public:
  Result<Literal> Evaluate(const StructLike& data) const override;

  Kind kind() const override { return Kind::kCount; }
};

class ICEBERG_EXPORT ValueAggregate : public BoundAggregate {
 public:
  Result<Literal> Evaluate(const StructLike& data) const override;
  
  Kind kind() const override { return Kind::kValue; }
};

};

/// \brief COUNT aggregate variants.
class ICEBERG_EXPORT CountAggregate : public UnboundAggregate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to define subclass for unbound aggregates. Please check the current design of Predicate in both C++ and Java implementations.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! In progress.

@SuKi2cn
Copy link
Author

SuKi2cn commented Nov 21, 2025

I've updated the design to align with the current Predicate pattern and the Java implementation:

  • Removed unbound aggregate subclasses (e.g. CountAggregate on the unbound side).
  • Introduced UnboundAggregateImpl to handle all unbound aggregates, similar to Predicate.
  • Refactored BoundAggregate to be the primary execution unit and moved row-level logic into its subclasses.
  • Updated AggregateEvaluator to act as a driver over bound aggregates instead of owning aggregate semantics.

@SuKi2cn SuKi2cn requested a review from wgtmac November 21, 2025 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants