Skip to content

Add missing aggregate functions#1471

Open
timsaucer wants to merge 7 commits intoapache:mainfrom
timsaucer:feat/expose-agg-fns
Open

Add missing aggregate functions#1471
timsaucer wants to merge 7 commits intoapache:mainfrom
timsaucer:feat/expose-agg-fns

Conversation

@timsaucer
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #861
Closes #925
Closes #1454

Rationale for this change

These functions exist upstream but were not exposed to the Python API

What changes are included in this PR?

Expose functions to Python API
Add unit tests

Are there any user-facing changes?

New addition only.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR exposes several upstream DataFusion aggregate functions that were previously missing from the Python API, and adds unit tests to validate the new bindings.

Changes:

  • Expose grouping and percentile_cont from Rust bindings to the Python API.
  • Add Python-level wrapper for var_population as an alias of var_pop.
  • Add unit tests covering percentile_cont, grouping, and var_population.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
crates/core/src/functions.rs Enables the grouping aggregate binding and adds a new percentile_cont pyfunction export.
python/datafusion/functions.py Adds public Python wrappers/exports for grouping, percentile_cont, and var_population.
python/tests/test_functions.py Adds unit tests for the newly exposed aggregate functions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

timsaucer and others added 2 commits April 3, 2026 16:04
…ation

Expose upstream DataFusion aggregate functions that were not yet
available in the Python API. Closes apache#1454.

- grouping: returns grouping set membership indicator (rewritten by
  the ResolveGroupingFunction analyzer rule before physical planning)
- percentile_cont: computes exact percentile using continuous
  interpolation (unlike approx_percentile_cont which uses t-digest)
- var_population: alias for var_pop

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@timsaucer timsaucer force-pushed the feat/expose-agg-fns branch from 0c8dc78 to d16cff1 Compare April 3, 2026 20:22
@timsaucer timsaucer marked this pull request as ready for review April 5, 2026 12:38
Copy link
Copy Markdown
Contributor

@ntjohnson1 ntjohnson1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouping examples would make it a lot clearer. Without looking around or thinking about the description more deeply the usage isn't clear to me.

Args:
expression: The column to check grouping status for
distinct: If True, compute on distinct values only
filter: If provided, only compute against rows for which the filter is True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No examples

assert result.column(0)[0].as_py() == 3.0


def test_percentile_cont_with_filter():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Would be nicer to just be a parametrized test since the only diff is the filter and the result

assert result.column(0)[0].as_py() == 3.5


def test_grouping():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add more examples then probably don't need to also add tests here but this seems to only test one of the grouping cases

timsaucer and others added 5 commits April 6, 2026 09:00
Add docstring example to grouping(), parametrize percentile_cont tests,
and add multi-column grouping test case.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Expose ROLLUP, CUBE, and GROUPING SETS via the DataFrame API by adding
static methods on GroupingSet that construct the corresponding Expr
variants. Update grouping() docstring and tests to use the new API.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants