feat: aggregate nbrs api by SemyonSinchenko · Pull Request #792 · graphframes/graphframes

SemyonSinchenko · 2026-02-03T21:08:14Z

What changes were proposed in this pull request?

New API as described in #785

Why are the changes needed?

Close #785

core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala

core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala

core/src/main/scala/org/graphframes/GraphFrame.scala

core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala

SemyonSinchenko · 2026-02-08T20:35:11Z

Hi @james-willis ! I addressed most of your comments. For some I left my answers. It looks like we are close to a final version. Could be nice if you can take another look, so I can work on bindings and docs.

see scaladoc issue

codecov-commenter · 2026-02-08T21:07:11Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 70.30303% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.80%. Comparing base (f1db6f4) to head (88001b9).
⚠️ Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
...park/sql/graphframes/GraphFramesConnectUtils.scala	0.00%	36 Missing ⚠️
...scala/org/graphframes/lib/AggregateNeighbors.scala	89.84%	13 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #792      +/-   ##
==========================================
- Coverage   84.94%   80.80%   -4.15%     
==========================================
  Files          68       78      +10     
  Lines        3507     4428     +921     
  Branches      453      527      +74     
==========================================
+ Hits         2979     3578     +599     
- Misses        528      850     +322

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

SemyonSinchenko · 2026-02-11T10:28:06Z

I will work on tests, bindings and docs.

SemyonSinchenko · 2026-02-26T16:31:01Z

@james-willis this one is ready for review

james-willis · 2026-02-28T20:30:28Z

Looking at this PR, here are my recommendations:

Code Quality & Performance

Memory Management Concerns (AggregateNeighbors.scala:267-268):
The hardcoded warning for maxHops > 10 should be configurable or include memory estimation logic rather than an arbitrary threshold.
Potential Optimization (AggregateNeighbors.scala:303):
The repartition by SRC could be expensive for smaller datasets. Consider making this optional or using coalesce when appropriate.

API Design

Required Parameter Validation:
The API requires either stoppingCondition or targetCondition but this isn't enforced until run(). Consider validation during builder setup for better UX.
Default Accumulator:
Most use cases will want basic path tracking. Consider adding a convenience method for common path accumulation patterns.

Testing & Documentation

Edge Case Testing: Add tests for:
- Very large graphs (performance characteristics)
- Disconnected components
- Graphs with cycles under different stopping conditions
Python API Consistency:
Ensure Python parameter names follow conventions consistently.

Implementation Robustness

Convergence Logic (AggregateNeighbors.scala:400):
Currently only checks if frontier is empty, but doesn't handle cases where states exist but no new edges can be traversed.

Overall, this is a solid implementation that follows GraphFrames patterns well. The main areas for improvement are around performance optimization configuration and edge case handling. The comprehensive test suite and documentation are excellent.

Comment by Claude (AI Assistant)

SemyonSinchenko · 2026-03-02T11:02:05Z

@james-willis

It is a warning, I don't understand how the warning can be configurable.
That is false. The join mean repartition by src anyway, so even on a single step it is better to keep the repartition
It is impossible to achieve, because we do not know the order of setter-calls
I do not understand, what does it mean? If it is about adding implementations, I'm going to do it in follow-up PRs. I'm going to add at least allPaths to find all paths between two nodes
I'm not going to add huge graphs (performance testing) to unit tests. All tests are 10+ minutes already, do we want to have 30 minutes tests?
All the Python API follows the snake_case way
In this case the DataFrame will be empty, so I don't fully understand the comment...

python/graphframes/graphframe.py

python/graphframes/classic/graphframe.py

core/src/main/scala/org/graphframes/GraphFrame.scala

docs/src/04-user-guide/05-traversals.md

python/graphframes/classic/graphframe.py

james-willis · 2026-03-10T21:21:19Z

If it is about adding implementations, I'm going to do it in follow-up PRs. I'm going to add at least allPaths to find all paths between two nodes

yeah that was it. sgtm.

All other answer sgtm on that Claude review.

…andom walks, and PG changes Resolved merge conflicts and integrated updates from graphframes/main into branch 785-aggregate-nbrs. Key changes: - Added sampling and convolution primitives (KMinSampling, SamplingConvolution) and related tests. - Introduced embeddings and random-walk features: Hash2Vec, RandomWalkEmbeddings, RandomWalkBase, RandomWalkWithRestart, and example runners. - Added new library components and algorithms: TwoPhase, Updated ConnectedComponents/KCore/Pregel/RandomizedContraction logic. - Added benchmarks and benchmark infrastructure for various algorithms. - Python API improvements: new property-graph package (python/graphframes/pg), internal utilities, updated client and protobuf bindings. - Build, docs, and packaging updates: build.sbt, docs (including graph-ml page), NOTICE, AGENTS.md, and pre-commit config. - Updated Spark shims and connect utilities to support the new features. - Added and updated tests across Scala and Python to cover the new functionality. All conflicts were fixed and changes staged for commit.

Copilot

Pull request overview

Adds a new multi-hop traversal API (“AggregateNeighbors”) to GraphFrames, implemented in Scala core with Spark Connect support and exposed via Python (classic + connect), along with documentation and tests to close #785.

Changes:

Introduces org.graphframes.lib.AggregateNeighbors (Scala) and wires it into GraphFrame.aggregateNeighbors.
Extends Spark Connect protocol + server/client plumbing to invoke AggregateNeighbors remotely.
Adds Python API wrappers (plus helper for attribute references), docs, and unit tests.

Reviewed changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala`	New Scala implementation + builder-style API and attribute helpers (`srcAttr`/`dstAttr`/`edgeAttr`).
`core/src/main/scala/org/graphframes/GraphFrame.scala`	Exposes `aggregateNeighbors` entrypoint on `GraphFrame`.
`core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala`	Scala unit tests for traversal behavior (paths, filters, stopping, etc.).
`connect/src/main/protobuf/graphframes.proto`	Adds `AggregateNeighbors` message and oneof method field.
`connect/src/main/scala/.../GraphFramesConnectUtils.scala`	Server-side Connect planner support for AggregateNeighbors.
`python/graphframes/connect/graphframes_client.py`	Python Spark Connect client method building the protobuf plan.
`python/graphframes/connect/proto/graphframes_pb2.py` / `.pyi`	Regenerated protobuf bindings including AggregateNeighbors message.
`python/graphframes/graphframe.py`	Public Python `GraphFrame.aggregate_neighbors` + `AggregateNeighbors` helper class; docstring for API.
`python/graphframes/classic/graphframe.py`	Classic (Py4J) implementation of `aggregate_neighbors` calling JVM builder.
`python/tests/test_graphframes.py`	Adds Python tests for basic AggregateNeighbors scenarios.
`docs/src/04-user-guide/05-traversals.md`	User guide section documenting Aggregate Neighbors with examples.
`project/LaikaCustoms.scala`	Doc site navigation tweak (page navigation depth).
`build.sbt`	Adjusts protoc version selection for Spark 3 builds.
`python/dev/build_jar.py`	Updates default Spark version used by the dev jar build script.
`.gitignore`	Ignores additional AI-tool related files/directories.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

python/graphframes/classic/graphframe.py

python/tests/test_graphframes.py

docs/src/04-user-guide/05-traversals.md

python/graphframes/graphframe.py

docs/src/04-user-guide/05-traversals.md

SemyonSinchenko added 6 commits February 2, 2026 21:22

wip

f507233

Merge remote-tracking branch 'graphframes/main' into 785-aggregate-nbrs

25d8e99

wip

d0c0f09

wip

cc2e0f1

wip

f333462

working API

c9e9f63

SemyonSinchenko self-assigned this Feb 3, 2026

SemyonSinchenko added scala pyspark-classic GraphFrames on PySpark Classic pyspark-connect GraphFrames on PySpark Connect labels Feb 3, 2026

SemyonSinchenko mentioned this pull request Feb 3, 2026

feat: all paths between two nodes #200

Open

james-willis reviewed Feb 4, 2026

View reviewed changes

core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala Outdated Show resolved Hide resolved

james-willis reviewed Feb 4, 2026

View reviewed changes

core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala Outdated Show resolved Hide resolved

james-willis reviewed Feb 4, 2026

View reviewed changes

core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala Outdated Show resolved Hide resolved

james-willis reviewed Feb 4, 2026

View reviewed changes

core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala Show resolved Hide resolved

james-willis requested changes Feb 4, 2026

View reviewed changes

SemyonSinchenko added 2 commits February 8, 2026 21:29

from comments

ca1e80a

from comments 2

8b62eb5

SemyonSinchenko requested a review from james-willis February 8, 2026 20:34

fix docs

7e1f9e6

see scaladoc issue

SemyonSinchenko added 3 commits February 26, 2026 13:53

Merge remote-tracking branch 'graphframes/main' into 785-aggregate-nbrs

b618d8e

feat: add AggregateNeighbors functionality with comprehensive test suite

e67285d

feat: add Aggregate Neighbors algorithm with multi-hop traversal support

929a28f

SemyonSinchenko marked this pull request as ready for review February 26, 2026 15:49

test: remove redundant error handling test for aggregate_neighbors

0118cb9

james-willis reviewed Mar 10, 2026

View reviewed changes

SemyonSinchenko added 2 commits March 26, 2026 21:26

fix: James' comments

88001b9

SemyonSinchenko requested review from Copilot and james-willis March 26, 2026 21:12

Copilot started reviewing on behalf of SemyonSinchenko March 26, 2026 21:18 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

SemyonSinchenko added 2 commits March 27, 2026 17:48

fix: copilot comments

ae55a86

fix: typo

b7f1be0

james-willis approved these changes Mar 27, 2026

View reviewed changes

SemyonSinchenko merged commit 82f600b into graphframes:main Mar 27, 2026
7 checks passed

SemyonSinchenko deleted the 785-aggregate-nbrs branch March 27, 2026 20:44

Conversation

SemyonSinchenko commented Feb 3, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SemyonSinchenko commented Feb 8, 2026

Uh oh!

codecov-commenter commented Feb 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

SemyonSinchenko commented Feb 11, 2026

Uh oh!

SemyonSinchenko commented Feb 26, 2026

Uh oh!

james-willis commented Feb 28, 2026

Code Quality & Performance

API Design

Testing & Documentation

Implementation Robustness

Uh oh!

SemyonSinchenko commented Mar 2, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

james-willis commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov-commenter commented Feb 8, 2026 •

edited

Loading

james-willis commented Mar 10, 2026 •

edited

Loading