Conversation
core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/org/graphframes/lib/AggregateNeighborsSuite.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/graphframes/lib/AggregateNeighbors.scala
Outdated
Show resolved
Hide resolved
|
Hi @james-willis ! I addressed most of your comments. For some I left my answers. It looks like we are close to a final version. Could be nice if you can take another look, so I can work on bindings and docs. |
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #792 +/- ##
==========================================
- Coverage 84.90% 84.28% -0.62%
==========================================
Files 68 69 +1
Lines 3444 3672 +228
Branches 431 459 +28
==========================================
+ Hits 2924 3095 +171
- Misses 520 577 +57 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I will work on tests, bindings and docs. |
|
@james-willis this one is ready for review |
|
Looking at this PR, here are my recommendations: Code Quality & Performance
API Design
Testing & Documentation
Implementation Robustness
Overall, this is a solid implementation that follows GraphFrames patterns well. The main areas for improvement are around performance optimization configuration and edge case handling. The comprehensive test suite and documentation are excellent. Comment by Claude (AI Assistant) |
|
| - Source vertex attributes via ``srcAttr("attrName")`` | ||
| - Destination vertex attributes via ``dstAttr("attrName")`` | ||
| - Edge attributes via ``edgeAttr("attrName")`` |
There was a problem hiding this comment.
I think these are snake case in python, eg src_attr
| builder.setRemoveLoops(remove_loops) | ||
|
|
||
| if checkpoint_interval > 0: | ||
| builder.setCheckpointInterval(checkpoint_interval) |
There was a problem hiding this comment.
if we dont set this wont checkpoint default to 2? how does user disable checkpoints?
I think default should match scala (2) then we pass 0 (ie remove this if)
| * - Maximum number of hops via `setMaxHops()` | ||
| * - Accumulators to maintain state during traversal via `setAccumulators()` or | ||
| * `addAccumulator()` | ||
| * - Stopping conditions to terminate traversal early via `setStoppingConditions()` |
There was a problem hiding this comment.
should be setStoppingCondition
| * `addAccumulator()` | ||
| * - Stopping conditions to terminate traversal early via `setStoppingConditions()` | ||
| * - Target conditions to collect results when specific conditions are met via | ||
| * `setTargetConditions()` |
There was a problem hiding this comment.
should be setTargetCondition
| max_hops=5, | ||
| accumulator_names=["path"], | ||
| accumulator_inits=[F.lit("1")], | ||
| accumulator_updates=[F.concat(F.col("path"), F.lit("->"), F.col("dst_id").cast("string"))], |
There was a problem hiding this comment.
should this example use something like dstAttr("id") instead of F.col("dst_id")?
| .addAccumulator( | ||
| "path", | ||
| lit("1"), | ||
| concat(col("path"), lit("->"), col("dst_id").cast("string")) |
There was a problem hiding this comment.
should this example use something like dstAttr("id") instead of F.col("dst_id")?
| def aggregate_neighbors( | ||
| self, | ||
| starting_vertices: Column | str, | ||
| max_hops: int, |
There was a problem hiding this comment.
I guess this cannot have a default because the subsequent arguments dont have a default? A bit of API asymmetry between scala and python
| remove_loops: bool = False, | ||
| checkpoint_interval: int = 0, | ||
| use_local_checkpoints: bool = False, | ||
| storage_level: StorageLevel = StorageLevel.MEMORY_AND_DISK_DESER, |
There was a problem hiding this comment.
I think scala default to MEMORY_AND_DISK. should this match?
yeah that was it. sgtm. All other answer sgtm on that Claude review. |
What changes were proposed in this pull request?
New API as described in #785
Why are the changes needed?
Close #785