Pull request overview
This PR aims to close #1453 by exposing several DataFusion scalar functions that exist upstream but were not previously available in the Python API, along with adding Python unit tests for the new bindings.
Changes:
- Added Python wrappers and exports for `arrow_metadata`, `get_field`, `union_extract`, `union_tag`, `version`, plus a Python-level `row` alias for `struct`.
- Added unit tests covering the newly exposed functions (notably union functions and `version`).
- Updated `codespell` skip paths formatting in `pyproject.toml`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/datafusion/functions.py | Adds new Python-level function wrappers/exports (arrow_metadata, get_field, union_*, version, row). |
| crates/core/src/functions.rs | Exposes new functions from the Rust extension module to Python via pyo3 (arrow_metadata, get_field, union_extract, union_tag, version). |
| python/tests/test_functions.py | Adds tests for the newly exposed functions. |
| pyproject.toml | Normalizes codespell skip path entries (removes ./ prefixes). |
192593f to 2771621
…row_metadata, version, row

Expose upstream DataFusion scalar functions that were not yet available in the Python API. Closes apache#1453.

- get_field: extracts a field from a struct or map by name
- union_extract: extracts a value from a union type by field name
- union_tag: returns the active field name of a union type
- arrow_metadata: returns Arrow field metadata (all or by key)
- version: returns the DataFusion version string
- row: alias for the struct constructor

Note: arrow_try_cast was listed in the issue but does not exist in DataFusion 53, so it is not included.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
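To make the descriptions of `get_field`, `union_tag`, and `union_extract` concrete, here is a minimal plain-Python model of what each one returns for a single value. It is only an illustration of the semantics described in the commit message, not the binding code: `UnionValue` is a hypothetical stand-in for one union value, and returning `None` when the requested field is not the active one is an assumption based on the upstream behaviour of returning NULL in that case.

```python
from typing import Any, Mapping, NamedTuple, Optional


class UnionValue(NamedTuple):
    """Hypothetical model of one union value: active field name plus payload."""

    tag: str
    value: Any


def get_field(struct: Mapping[str, Any], name: str) -> Any:
    """Model of get_field: look up one field of a struct-like value by name.

    Missing-field handling is not modelled here.
    """
    return struct[name]


def union_tag(u: UnionValue) -> str:
    """Model of union_tag: the name of the field that is currently active."""
    return u.tag


def union_extract(u: UnionValue, name: str) -> Optional[Any]:
    """Model of union_extract: the payload if `name` is active, else None (assumed NULL)."""
    return u.value if u.tag == name else None
```

In this model, `union_extract(UnionValue("int", 7), "str")` yields `None` because the active field is `int`, while `union_tag` on the same value yields `"int"`.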
Tests for get_field, arrow_metadata, version, row, union_tag, and union_extract. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Allow arrow_cast, get_field, and union_extract to accept plain str arguments instead of requiring Expr wrappers. Also improve arrow_metadata test coverage and fix parameter shadowing. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
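The argument handling described in this commit can be sketched roughly as follows. This is a stdlib-only illustration of the pattern, assuming a small helper that wraps plain strings as string literals; the `Expr` class and the `_to_expr` helper are hypothetical stand-ins, not the actual code in `python/datafusion/functions.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for datafusion.Expr; it only records a description."""

    text: str

    @staticmethod
    def string_literal(value: str) -> Expr:
        return Expr(f"lit({value!r})")


def _to_expr(value: Expr | str) -> Expr:
    """Wrap plain strings as string literals; pass Expr values through unchanged."""
    return Expr.string_literal(value) if isinstance(value, str) else value


def get_field(expr: Expr, name: Expr | str) -> Expr:
    """Sketch of a wrapper that accepts either a str or an Expr for `name`."""
    return Expr(f"get_field({expr.text}, {_to_expr(name).text})")
```

With this shape, `get_field(col, "a")` and `get_field(col, Expr.string_literal("a"))` build the same expression, so callers no longer need the explicit wrapper for the common case.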
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
4384c1f to df1ead1
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
ntjohnson1 left a comment
Claude didn't do as good a job maintaining existing structure as the last one. Not sure how pedantic we want to be about some of the formatting stuff since there isn't a ruff rule around it. A Copilot setting or custom lint rule could help enforce it if desired.
import numpy as np
import pyarrow as pa
import pytest
from datafusion import SessionContext, column, literal, string_literal
Love that this is no longer needed
Replace Args/Returns sections with doctest Examples blocks for arrow_metadata, get_field, union_extract, union_tag, and version to match existing codebase conventions. Simplify row to alias-style docstring with See Also reference. Document that arrow_cast accepts both str and Expr for data_type. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Allow arrow_cast to accept a pyarrow DataType in addition to str and Expr. The DataType is converted to its string representation before being passed to DataFusion. Adds test coverage for the new input type. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Note that expr["field"] is a convenient alternative when the field name is a static string, and get_field is needed for dynamic expressions. Add a second doctest example showing the bracket syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Use the existing Rust-side PyArrowType<DataType> conversion via Expr.cast() instead of str() which produces pyarrow type names that DataFusion does not recognize. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
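The reason `str()` is not enough is that pyarrow spells type names differently from DataFusion: `str(pa.int32())` is `"int32"`, while `arrow_cast` expects the Arrow name `"Int32"`. A rough sketch of the resulting dispatch is below. `DataType` and `Expr` are hypothetical stand-ins so the block runs without pyarrow or datafusion installed, and the exact branching in the real wrapper is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """Hypothetical stand-in for pyarrow.DataType."""

    name: str

    def __str__(self) -> str:
        # pyarrow-style spelling such as "int32", which arrow_cast would not accept.
        return self.name


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for datafusion.Expr; records the call that was built."""

    text: str

    def cast(self, to: DataType) -> "Expr":
        # In the real binding the type object is converted on the Rust side,
        # so no type-name string is ever produced.
        return Expr(f"cast({self.text} as <DataType {to.name}>)")


def arrow_cast(expr: Expr, data_type) -> Expr:
    """Sketch: DataType goes through Expr.cast(); str and Expr go to arrow_cast."""
    if isinstance(data_type, DataType):
        return expr.cast(data_type)
    if isinstance(data_type, str):
        data_type = Expr(f"lit({data_type!r})")
    return Expr(f"arrow_cast({expr.text}, {data_type.text})")
```

The string and `Expr` branches are unchanged from before; only the `DataType` branch avoids the round trip through a type-name string.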
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Thanks @ntjohnson1!
Which issue does this PR close?
Closes #1453
Rationale for this change
These functions exist upstream but were not exposed to Python.
What changes are included in this PR?
Expose functions to Python
Add unit tests
Are there any user-facing changes?
New addition only.