Create Pregel and Neo4j Integration tutorials and add Pregel API to user guide #578
rjurney wants to merge 89 commits into graphframes:main
…s.txt and split out requirements-dev.txt. Version bumps.
…ney/build-upgrades
…ney/build-upgrades
I renamed it to |
…ney/pregel-tutorial
@rjurney could you please deploy the docs to your fork? It would be much easier to review the rendered version.
@rjurney if you need any help with adapting it for Laika, I can help.
I think I just need to convert the format?
You need to move the file to docs/src/03-tutorials |
| Before starting this tutorial, ensure you have:
|
| - **GraphFrames installed**: `pip install graphframes-py`
+ graphframes core
| Before starting this tutorial, ensure you have:
|
| - **GraphFrames installed**: `pip install graphframes-py`
| - **Apache Spark 3.x**: Compatible with your Python version
+ Spark should be compatible with the GF version
| </figure>
| </center>
|
| ## Prerequisites
I would point people to the installation section of the docs.
| # Create a GraphFrame to get access to AggregateMessages API
| g: GraphFrame = GraphFrame(nodes_df, edges_df)
|
| msgToDst = AM.src["start_degree"]
Why not just lit(1)?
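To make the suggestion concrete: if every edge sends the constant 1 to its destination and the messages are summed per vertex, the result is the in-degree, so no source attribute is needed. Below is a minimal pure-Python sketch of that idea. It does not use Spark or GraphFrames, and the function name and toy edge list are made up for illustration.

```python
from collections import defaultdict

def in_degree_by_messages(edges):
    """Send the constant 1 along every edge and sum the messages per destination."""
    totals = defaultdict(int)
    for _src, dst in edges:
        totals[dst] += 1  # the message is a literal 1, independent of the source vertex
    return dict(totals)

# Hypothetical toy graph: a -> b, a -> c, b -> c
edges = [("a", "b"), ("a", "c"), ("b", "c")]
in_degrees = in_degree_by_messages(edges)
print(in_degrees)
```

In the tutorial's snippet this would presumably mean replacing `AM.src["start_degree"]` with `lit(1)` as the message sent to the destination, assuming the aggregation is a sum over the messages.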
| g_simple = GraphFrame(vertices_simple, edges_simple)
|
| # Calculate in-degree using Pregel API
| pregel_result = g_simple.pregel \
It is better to wrap chains in () instead of using backslash line continuations, imo. And in previous snippets you were using ().
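A small illustration of the two styles, using plain string methods as a stand-in for the Pregel builder chain (the string and variable names are arbitrary):

```python
# Backslash continuations: a stray space after any "\" is a syntax error,
# and individual lines cannot carry trailing comments.
slug_backslash = "  Pregel Tutorial  " \
    .strip() \
    .lower() \
    .replace(" ", "-")

# Parenthesized chain: no continuation characters, and each step can be commented.
slug_parens = (
    "  Pregel Tutorial  "
    .strip()             # drop surrounding whitespace
    .lower()             # normalize case
    .replace(" ", "-")   # spaces to hyphens
)

print(slug_backslash == slug_parens)
```

Both forms evaluate to the same value; the parenthesized one is easier to edit and matches the earlier snippets.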
| # +---+-------+---------+
| ```
|
| ### Understanding the Pregel API
I was expecting you to highlight the whole API, early stopping conditions especially. For example, PageRank may be run for a fixed number of iterations, but also until convergence based on a tolerance factor. I thought you would put it at the end, but you didn't. Can you add a couple of words about the different early stopping strategies and maybe a simple example?
That is the docs section, jfyi: https://graphframes.io/04-user-guide/10-pregel.html#termination-conditions
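As a rough picture of the two strategies mentioned above, here is a pure-Python toy PageRank that stops either after a fixed number of supersteps or earlier once the largest per-vertex change falls below a tolerance. This is only a sketch of the idea, not the GraphFrames Pregel API: the function name, its parameters, and the toy graph are assumptions, and the linked docs section describes the actual termination conditions.

```python
def pagerank(edges, num_vertices, damping=0.85, max_iter=100, tol=None):
    """Toy PageRank on vertices 0..num_vertices-1 (every vertex needs an out-edge).

    Stops after max_iter supersteps, or earlier if tol is set and the
    largest per-vertex change in a superstep drops below it.
    """
    out_degree = [0] * num_vertices
    for src, _dst in edges:
        out_degree[src] += 1

    ranks = [1.0 / num_vertices] * num_vertices
    for iteration in range(1, max_iter + 1):
        # "Send" each vertex's rank share along its out-edges and sum per destination.
        incoming = [0.0] * num_vertices
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]

        new_ranks = [(1 - damping) / num_vertices + damping * m for m in incoming]
        delta = max(abs(new - old) for new, old in zip(new_ranks, ranks))
        ranks = new_ranks

        # Early stop: convergence based on the tolerance factor.
        if tol is not None and delta < tol:
            return ranks, iteration
    # Fallback stop: the fixed iteration budget was used up.
    return ranks, max_iter

# Hypothetical toy graph with no dangling vertices.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
fixed_ranks, fixed_iters = pagerank(edges, 3, max_iter=10)
conv_ranks, conv_iters = pagerank(edges, 3, max_iter=100, tol=1e-6)
print(fixed_iters, conv_iters)
```

With `tol=None` the loop always runs the full budget; with a tolerance it usually stops well before `max_iter`, which then acts only as a safety cap.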
| F.coalesce(Pregel.msg(), Pregel.src("label"))) \
| .sendMsgToDst(Pregel.src("label")) \
| .sendMsgToSrc(Pregel.dst("label")) \
| .aggMsgs(F.expr("mode(collect_list(msg))")) \
And there is a small tricky story here :)
Deterministic versus not: to pass the LDBC tests you need to run mode with deterministic=True (it was added in Spark 4.x).
Actually, that is the reason why the GF implementation does not use mode (tl;dr: we will switch to mode after Spark 3.5.x EOL).
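Why determinism matters for label propagation: when two labels are equally frequent among a vertex's incoming messages, an unconstrained mode may return either one, so results can differ between runs. A minimal pure-Python sketch of a tie-stable mode follows. The rule of breaking ties by the smallest label is an assumption chosen for illustration, and the function name and sample data are hypothetical.

```python
from collections import Counter

def deterministic_mode(labels):
    """Most frequent label; ties go to the smallest label, so arrival order never matters."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)

# Labels 3 and 7 tie with two votes each; message arrival order differs between runs.
run_a = [3, 7, 7, 3, 5]
run_b = [7, 3, 5, 3, 7]
print(deterministic_mode(run_a), deterministic_mode(run_b))
```

Both calls return the same label regardless of ordering, which is the property a reproducible benchmark comparison needs.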
…y/pregel-tutorial
…mes into rjurney/pregel-tutorial
What changes were proposed in this pull request?
Why are the changes needed?