Create Pregel and Neo4j Ingeration tutorials and add Pregel API to user guide #578

Open: rjurney wants to merge 89 commits into graphframes:main from rjurney:rjurney/pregel-tutorial

Conversation

@rjurney (Collaborator) commented Apr 15, 2025:

What changes were proposed in this pull request?

Why are the changes needed?

…s.txt and split out requirements-dev.txt. Version bumps.

@rjurney (Collaborator, Author) commented Apr 18, 2025:

I renamed it to Pregel and AggregateMessages API Tutorial.

@SemyonSinchenko (Collaborator) commented:

@rjurney could you please deploy the docs to your fork? It would be much easier to review the rendered version.

@rjurney rjurney changed the title Creates a Pregel tutorial and add Pregel API to user guide Create a Pregel tutorial and add Pregel API to user guide Sep 4, 2025

@SemyonSinchenko (Collaborator) commented:

@rjurney if you need any help with adapting it for Laika, I can help.

@rjurney (Collaborator, Author) commented Sep 26, 2025:

I think I just need to convert the format?

@SemyonSinchenko (Collaborator) commented:

> I think I just need to convert the format?

You need to move the file to `docs/src/03-tutorials`.
After that, try running `./build/sbt docs/laikaPreview`.


Before starting this tutorial, ensure you have:

- **GraphFrames installed**: `pip install graphframes-py`

@SemyonSinchenko (Collaborator) commented Dec 10, 2025:

Also required: the GraphFrames core package.

Before starting this tutorial, ensure you have:

- **GraphFrames installed**: `pip install graphframes-py`
- **Apache Spark 3.x**: Compatible with your Python version

@SemyonSinchenko (Collaborator) commented Dec 10, 2025:

Also, Spark should be compatible with the GraphFrames version.

</figure>
</center>

## Prerequisites

Review comment (Collaborator):

I would point people to the installation section of the docs.

# Create a GraphFrame to get access to AggregateMessages API
g: GraphFrame = GraphFrame(nodes_df, edges_df)

msgToDst = AM.src["start_degree"]

Review comment (Collaborator):

Why not just lit(1)?
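
To illustrate the suggestion, here is a minimal sketch in plain Python (no Spark), assuming a hypothetical toy edge list that is not taken from the tutorial. It shows that in-degree only needs a constant message of 1 per edge, summed at the destination; no source column has to be read.

```python
from collections import defaultdict

# Hypothetical toy edge list of (src, dst) pairs; not taken from the tutorial.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]


def in_degree(edge_list):
    """Send the constant 1 along every edge and sum at the destination."""
    totals = defaultdict(int)
    for _src, dst in edge_list:
        totals[dst] += 1  # the message is a literal 1; the source is never inspected
    return dict(totals)


degrees = in_degree(edges)
```

In GraphFrames terms this would correspond to sending a constant such as `F.lit(1)` to the destination and summing the messages, rather than carrying a source column like `start_degree`.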

g_simple = GraphFrame(vertices_simple, edges_simple)

# Calculate in-degree using Pregel API
pregel_result = g_simple.pregel \

Review comment (Collaborator):

It is better to wrap chained calls in `()` instead of using backslash line continuations, imo. The previous snippets already use `()`.
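
A small sketch of the two styles, using a hypothetical `ChainSketch` builder as a stand-in (it is not the GraphFrames `Pregel` class, and its method names are made up for illustration). Both chains produce the same result; the parenthesized form simply tolerates per-line comments and avoids fragile trailing backslashes.

```python
class ChainSketch:
    """Hypothetical fluent builder used only to compare chaining styles."""

    def __init__(self):
        self.steps = []

    def set_max_iter(self, n):
        self.steps.append(("max_iter", n))
        return self

    def send_msg_to_dst(self, expr):
        self.steps.append(("send_to_dst", expr))
        return self

    def run(self):
        return list(self.steps)


# Backslash continuation: nothing may follow the backslash on the same line,
# so per-step comments are not possible and stray whitespace breaks the chain.
with_backslashes = ChainSketch() \
    .set_max_iter(1) \
    .send_msg_to_dst("1") \
    .run()

# Parenthesized chain: no continuation characters, and each step can be commented.
with_parens = (
    ChainSketch()
    .set_max_iter(1)  # a single superstep
    .send_msg_to_dst("1")  # constant message
    .run()
)
```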

# +---+-------+---------+
```

### Understanding the Pregel API

Review comment (Collaborator):

I was expecting you to cover the whole API, especially the early stopping conditions. For example, PageRank may be run for a fixed number of iterations, but also until convergence based on a tolerance factor. I thought you would put it at the end, but you didn't. Can you add a few words about the different early stopping strategies, and maybe a simple example?
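
The two stopping strategies mentioned here can be sketched in plain Python, independent of the Pregel API. This is an illustrative PageRank loop under stated assumptions: the `pagerank` function, its parameters (`damping`, `max_iter`, `tol`), and the toy graph are all hypothetical, and every vertex is assumed to have at least one outgoing edge.

```python
def pagerank(out_links, damping=0.85, max_iter=100, tol=None):
    """Return (ranks, iterations_run).

    With tol=None the loop always runs max_iter times. Otherwise it stops
    as soon as the total absolute change in ranks drops below tol.
    Assumes every vertex has at least one outgoing edge.
    """
    nodes = list(out_links)
    n = len(nodes)
    ranks = {v: 1.0 / n for v in nodes}
    for iteration in range(1, max_iter + 1):
        # "Send" each vertex's rank share along its outgoing edges.
        incoming = {v: 0.0 for v in nodes}
        for src, targets in out_links.items():
            share = ranks[src] / len(targets)
            for dst in targets:
                incoming[dst] += share
        new_ranks = {v: (1 - damping) / n + damping * incoming[v] for v in nodes}
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in nodes)
        ranks = new_ranks
        if tol is not None and delta < tol:
            return ranks, iteration  # converged early
    return ranks, max_iter


# Hypothetical toy graph: vertex -> list of vertices it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

fixed_ranks, fixed_iters = pagerank(graph, max_iter=20)
converged_ranks, converged_iters = pagerank(graph, max_iter=100, tol=1e-6)
```

The fixed-iteration run gives a predictable cost, while the tolerance-based run stops once further supersteps no longer change the result meaningfully; `max_iter` still acts as a safety cap in the second case.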

F.coalesce(Pregel.msg(), Pregel.src("label"))) \
.sendMsgToDst(Pregel.src("label")) \
.sendMsgToSrc(Pregel.dst("label")) \
.aggMsgs(F.expr("mode(collect_list(msg))")) \

Review comment (Collaborator):

Why not F.mode?

Review comment (Collaborator):

There is a small, tricky story here :)
It is about deterministic versus non-deterministic behavior: to pass the LDBC tests you need to run `mode` with `deterministic=True` (added in Spark 4.x).

Review comment (Collaborator):

Actually, that is the reason why the GF implementation does not use `mode` (tl;dr: we will switch to `mode` after Spark 3.5.x EOL).
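
Why determinism matters for label propagation can be shown with a small plain-Python sketch. When two labels are equally frequent among the incoming messages, an unspecified tie-break can give different answers from run to run. The `deterministic_mode` helper below is hypothetical, and its rule (smallest label wins a tie) is an assumption chosen for illustration, not a statement of what Spark or the LDBC reference does.

```python
from collections import Counter


def deterministic_mode(labels):
    """Most frequent label; ties are broken by taking the smallest label."""
    counts = Counter(labels)
    # Key: highest count first, then smallest label, so input order never matters.
    return min(counts, key=lambda label: (-counts[label], label))


# The same multiset of messages in two different arrival orders.
first_order = deterministic_mode([3, 1, 3, 1])
second_order = deterministic_mode([1, 3, 1, 3])
clear_winner = deterministic_mode([2, 2, 5])
```

Because the result depends only on the counts and the labels themselves, the order in which messages arrive from different partitions cannot change the outcome.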

@rjurney rjurney changed the title Create a Pregel tutorial and add Pregel API to user guide Create Pregel and Neo4j Ingeration tutorials and add Pregel API to user guide Dec 25, 2025
3 participants