Knowledge on Neo4j, the native graph database utilizing index-free adjacency (IFA).
- Graph Data Modeling
- Index-Free Adjacency (IFA)
- Definitions
- Neo4j Browser Commands
- Cypher Syntax
- Creating / Updating Nodes
- Creating / Updating Relationships
- Common Cypher Queries
- Merging Data in the Graph
- Using Indexes
- Query Best Practices
- Importing Data
- APOC
Two types of models are required when preforming the graph data modeling process for an application: 1. Data model - describes the labels, relationships, and properties for the graph; does not have specific data. 2. Instance model - set of sample data to test the data model against any number of use cases.
Utilized in graph databases; relationships are stored at write-time in a graph database, meaning the query time will remain consistent to the size of the data that is actually touched during a query.
| Definition | Meaning |
|---|---|
| node | vertices in a data graph; represent objects, entities, or things. |
| node label | identifies the subset a node belongs to. |
| node property | key, value pair of info providing additional context about the node. |
| relationship | line or edge in a data graph; describes how nodes are connected together. |
| relationship direction | identifies the direction of connection between nodes, important in context of hierarchy. |
| relationship type | identifies which part of the graph to traverse. |
| relationship property | key, value pair of info providing additional context about the relationship. |
| undirected graph | graph with bi-directional or symmetric relationships. |
| directed graph | graph with single directional or asymmetrical relationships. |
| weighted graph | graph with relationships that carry a value tha represents a weight or measure. |
| unweighted graph | graph with relationships that do not carry a measurable value. |
| graph traversal | the path or relationships traveled to obtain the graph. |
| Command | Definition |
|---|---|
:clear |
removes all frames from the stream. |
:config |
displays configuration settings. |
:history |
displays history of executed commands. |
:schema |
displays info about the database schema indexes and constraints. |
:sysinfo |
displays info about store size, id allocation, page cache, etc. |
CALL db.schema.visualization() |
Visualizes the data model of the graph, to better understand the nodes, labels, and relationships of the graph. |
CALL db.constraints() or CALL CONSTRAINTS |
Displays the set of constraints that have been defined for the graph. |
CALL db.indexes() or SHOW INDEXES |
Displays a list of all full-text schema indexes for the graph. |
CALL gds.graph.list() |
Displays all created named graphs and their related graph information. |
PROFILE MATCH... |
Runs the Cypher query statement, and provides run-time performance metrics. |
EXPLAIN MATCH... |
Provides the Cypher query plan; provides estimates of the graph engine processing that will occur (doesn't run query). |
| Node Syntax | Meaning |
|---|---|
() or (n) |
use (n) to denote an "anonymous" node for further query processing in the query. |
(n:NodeLabel) or (:NodeLabel) |
node labels can also be used with anonymous nodes and more than one label can be denoted. |
()--() |
2 nodes and any symmetrical relationship. |
()-[:RELATIONSHIP_TYPE]-() |
2 nodes and a specific symmetrical relationship type. |
(first)-[]->(second) |
the first node with any relationship type to the second node. |
(first)<-[]-(second) |
the second node with any relationship type to the first node. |
Create a node using CREATE; this method does not look up the primary key before adding the node and can create duplicates.
CREATE (nodeVariable: NodeLabels {optionalProperties})
// adding a 'hero' node to the graph w/name property
CREATE (h:Hero {name: 'Peter Parker'})Create a node using MERGE; this method looks up the primary key before adding the node, therefore avoiding duplication.
// adding a 'hero' node to the graph if not existing
MERGE (h:Hero {name: 'Peter Parker'})When defining label for a node, dominant nouns are assigned to the entities in the graph (ie. ingredient, movie, person, etc.). Node labels serve as an anchor point for a query; using a label helps reduce the amount of data that is retrieved.
Add/Remove node labels using SET and REMOVE
MATCH (h:Hero)
WHERE h.name = 'Peter Parker'
// adding label to the hero node if not already existing
SET h:Avenger, h:Amateur
// removing label from the hero node
REMOVE h:Amateur
// returns labels associated w/the node
RETURN labels(h)Add node label based on a relationship.
MATCH (h:Hero)
WHERE exists ((h)-[:WON_AGAINST]-())
// adding label to the hero node w/WON_AGAINST relationship
SET h:ChampionProperties for a node uniquely identify a node, provide flags, and return data.
Add/Remove node properties using SET and REMOVE
MATCH (h:Hero)
WHERE h.name = 'Peter Parker'
// adding one property to hero node
SET h.alias = 'Spiderman'
// this method must include all properties/values as all existing properties are overwritten
SET h = {name: 'Peter Parker', alias: 'Spiderman', employment: 'Unemployed'}
// this method will update existing and add properties not existing
SET h += {employment: 'Daily Bugle', symbiote: False}
// removing property from the hero node
REMOVE m.symbiote
// removing property from the hero node by setting to NULL
SET h.employment = null
RETURN hDelete a node using DELETE; this method is successful provided no relationships w/the node exist.
MATCH (h:Hero)
WHERE h.name = 'Peter Parker'
DELETE hDelete a node using DETACH DELETE; this method removes a node w/existing relationships.
MATCH (h:Hero)
WHERE h.name = 'Peter Parker'
DETACH DELETE hRelationships are the connection between entities or nodes. When defining a relationship type, verbs are assigned to the connections in the graph. (ie. USES, MARRIED_TO, FOUGHT_AGAINST).
A direction must either be specified explicitly, or inferred by the left-to-right direction in the pattern specified.
Fanout occurs when entities are spread out and represented as a network or linked nodes. Fanout can lead to supernodes, or very dense nodes; the splitting up of nodes should be done only to answer questions or use cases.
Create a relationship using CREATE; this method does not account for existing relationships and can create duplicates.
MATCH (h:Hero), (v:Villain)
WHERE h.name = 'Peter Parker'
AND v.alias = 'Green Goblin'
// creates the relationship between nodes
CREATE (h)-[:FOUGHT_AGAINST]->(v)
RETURN h, vCreate a relationship using MERGE; this method accounts for existing relationships, therefore avoiding duplication.
MATCH (h:Hero), (v:Villain), (s:Sidekick)
WHERE h.hame = 'Peter Parker'
AND v.alias = 'Green Goblin'
AND s.alias = 'Shockwave'
// creates multiple relationships to the nodes, if not existing
MERGE (h)-[:FOUGHT_AGAINST]->(v)<-[:SIDEKICK_OF]-(s)
RETURN h, v, sAdd/Remove relationship properties using SET and REMOVE
MATCH (h:Hero)-[rel:FOUGHT_AGAINST]->(v:Villain)
WHERE h.name = 'Peter Parker'
AND v.alias = 'Green Goblin'
// adding location and times properties for relationship
SET
rel.location = ['Times Square', 'MSG', 'Avenger Tower'],
rel.times = [2002, 2006, 2010]
// removing property from relationship
REMOVE rel.times
// removing property from relationship by setting to NULL
rel.location = null
RETURN h, rel, v| Command | Definition |
|---|---|
CALL db.propertyKeys() |
Displays all properties of a graph; does not what define which each property key belongs to. |
MATCH (n:Node) RETURN keys(n) |
Displays the properties of a node. |
:params {paramName: 'paramValue, ...'} OR :param paramName => paramValue |
Sets defined parameter and its value in the current session. |
:params |
Displays the current parameters in session and their values |
:queries |
Displays current running queries to allow for monitoring/troubleshooting |
CALL gds.graph.drop(<graph-name>) |
Drops the named graph specified. |
Create/Update properties depending on if a node exists or not using ON CREATE SET and ON MATCH SET
MERGE (h:Hero {name: 'Peter Parker'})
// property is created if node is created
ON CREATE SET h.alias = 'Spiderman'
// property is updated if node is existing
ON MATCH SET h.employed = 'Daily Bugle'
// property is set regardless
SET h.loveInterest = 'Gwen Stacy'
RETURN hCreating a MATCH clause that includes a pattern specified by two paths.
// returning actors and director from movie(s) released in 2000.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie), (m)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
Return a.name as ACTOR, m.title AS TITLE, d.name AS DIRECTOR
// when multi patterns are specified in a MATCH clause, no relationship is traversed more than one time
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
Return a.name as ACTOR, m.title AS TITLE, d.name AS DIRECTORUsing two patterns in a MATCH statement
// matching keanu reeves' movies and hugo weaving node
MATCH (keanu:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(n:Person), (hugo:Person)
WHERE keanu.name = 'Keanu Reeves'
AND hugo.name = 'Hugo Weaving'
AND NOT (hugo)-[:ACTED_IN]->(m) // filtering for movies w/out hugo weaving
RETURN n.name AS actorTraversal using multiple patterns in a MATCH clause
MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie), //pattern retrieves TopGun node
(actor:Person)-[:ACTED_IN]->(m) // pattern retrieves TopGun actor nodes
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title as movie , actor.name
// the result does not include the Val Kilmer nodeMATCH clause that defines the number of hops for relationship on a path
// retrieves all person nodes that are two hops away
MATCH (follower:Person)-[:FOLLOWS*2]->(p:Person)
WHERE follower.name = 'Paul Blythe'
RETURN p.name
// using [:FOLLOWS*] would return all Person nodes that are in the :FOLLOWS path from 'Paul Blythe'The shortestPath() function finds the shortest path between nodes to improve the query.
// specify variable p as result of calling shortestPath() specify * for any relationship between nodes
MATCH p = shortestPath((m1:Movie)-[*]-(m2:Movie))
WHERE m1.title = 'A Few Good Men' AND
m2.title = 'The Matrix'
RETURN pCreating a subgraph, or a set of paths derived from the MATCH clause
MATCH paths = (m:Movie)--(p:Person)
WHERE m.title = 'The Replacements'
RETURN pathsOPTIONAL MATCH (similar to SQL outer join) returns Nulls for missing parts of the pattern
MATCH (p:Person)
WHERE p.name STARTS WITH 'James'
OPTIONAL MATCH (p)-[r:REVIEWED]->(m:Movie) // if 'James' is an actor, null will be returned
RETURN p.name, type(r), m.titleCommon way to aggregate data using count() function
// counts # of movies where actor and director included in same movie
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name, d.name, count(m)The collect() method will aggregate a value into a list.
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Cruise'
RETURN collect(m.title) AS `movies for Tom Cruise` // movies are returned as a list of values
// The collect() method can also be used to collect nodes
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Cruise'
RETURN collect(m) AS `movies for Tom Cruise` // movies returned as nodesCounting the paths found for each actor/director collaboration and returning movies as list.
MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(director:Person)
RETURN
actor.name,
director.name,
count(m) AS collaborations, // count of collaboration
collect(m.title) AS movies // list of movies for each actor/director collaborationUsing a map projection when retrieving nodes to return some of the information and not all of it.
MATCH (m:Movie)
WHERE m.title CONTAINS 'Matrix'
RETURN m { .title, .released } AS movie // tagline property is not includedThe date() and datetime() functions store their values as strings, so date properties can be extracted.
// returning properties of today's date; day, year, hour, and minute values
RETURN date().day, date().year, datetime().year, datetime().hour, datetime().minuteThe timestamp() function stores its value as a long integer, requiring conversion before values can be extracted.
// returning the day, year, and, month values from the timestamp value.
RETURN datetime({epochmillis:timestamp()}).day,
datetime({epochmillis:timestamp()}).year,
datetime({epochmillis:timestamp()}).monthUsing the WITH clause allows you to specify intermediate calculations or values that will be used further in the query.
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS numMovies, collect(m.title) as movies // defined variables in WITH clause can be used later in query
WHERE 1 < numMovies < 4
RETURN a.name, numMovies, moviesUsing the UNWIND clause will provide nodes collected with collect() function to be displayed in rows
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
WITH collect(p) AS actors, count(p) AS actorCount, m
UNWIND actors AS actor // "unwinding" the names of actors that were collected
RETURN m.title, actorCount, actor.nameSubqueries can be produced using the WITH clause, so the second query can access the properties required
// first match clause finds all reviewers of movies
MATCH (m:Movie)<-[rv:REVIEWED]-(r:Person)
WITH m, rv, r
// second match clause finds directors of movies
MATCH (m)<-[:DIRECTED]-(d:Person)
// returns movie, reviewer rating, reviewer, and names of directors
RETURN m.title, rv.rating, r.name, collect(d.name)Subquery with the OPTIONAL MATCH clause will provide additional data from second query if data is available; similar to SQL outer join.
MATCH (p:Person)
WITH p, size((p)-[:ACTED_IN]->()) AS movies
WHERE movies >= 5
// optionally will return values if p (actor) is also found to have directed, else return 'null'
OPTIONAL MATCH (p)-[:DIRECTED]->(m:Movie)
RETURN p.name, m.titleSubqueries can also be achieved using the CALL function with { } around subquery
// first query collects movies that have been reviewed
CALL {
MATCH (p:Person)-[:REVIEWED]->(m:Movie)
RETURN m
}
// second query shows movies (that were reviewed) released in 2000
MATCH (m)
WHERE m.released=2000
RETURN m.title, m.releasedUse the MERGE clause to create/update a node; does take into account if the node is existing.
// generic framework
MERGE (variable:Label{nodeProperties})
RETURN variable
MERGE (a:Actor {name: 'Michael Caine'}) // creates new Actor node if not existing
SET a.born = 1933
RETURN aUse the MERGE clause to create / update a relationship as well.
// generic framework
MERGE (variable1:Label1 {nodeProperties1})-[:REL_TYPE]->(variable2:Label2 {nodeProperties2})
RETURN variable1, variable2
MATCH
(p:Person {name: 'Michael Caine'}),
(m:Movie {title:'Batman Begins'})
MERGE (p)-[:ACTED_IN]->(m) // creates 'ACTED_IN' relationship between 'michael caine' and 'batman begins' if not existing
RETURN p,mSpecify the creation behavior when merging by using the ON CREATE SET / ON MATCH SET clauses.
MERGE (a:Person {name: 'Sir Michael Caine'})
ON CREATE SET
a.birthPlace = 'London', // if node doesn't exist, it will create a new node
a.born = 1934 // with defined 'birthPlace', 'born' node properties
ON MATCH SET
a.birthPlace = 'UK' // if node does exist, it will set/update the 'birthPlace' node property
RETURN aRelationships can be created using the MERGE clause; can be expensive as this has the potential of creating two new nodes and relationship.
MATCH (p:Person), (m:Movie)
WHERE m.title = 'Batman Begins' AND p.name ENDS WITH 'Caine'
MERGE (p)-[:ACTED_IN]->(m) // if relationship doesn't exist, it will create it
RETURN p, mAdd a uniqueness constraint to the graph that asserts that a particular node property is unique for that node type. Uniqueness constraints can be only created for nodes.
// this will prevent another Node being created with the same 'title' node property
CREATE CONSTRAINT UniqueConstraintName ON (v:NodeVariable) ASSERT v.title IS UNIQUEAdd an existence constraint to the graph that asserts that a particular type of node or relationship property must exist in the graph when a node/relationship of that type is created/updated.
// this will prevent a Node being created without the 'tagline' node property being defined
CREATE CONSTRAINT ExistsConstraintName ON (v:NodeVariable) ASSERT exists(v.tagline)An existence constraint can also be used with relationship properties.
// this will prevent a 'REVIEWED' relationship from being created without the 'rating' relationship property being defined
CREATE CONSTRAINT ExistsREVIEWEDRating
ON ()-[rel:REVIEWED]-() ASSERT exists(rel.rating) // defining relationship pattern, asserting rating existsBoth a uniqueness constraint and an existence constraint can be removed using the DROP clause.
DROP CONSTRAINT ExistingConstraintNameA node key is used to define the uniqueness and existence constraint for multiple properties of a node of a certain type. A node key is also used as a composite index in the graph.
CREATE CONSTRAINT UniqueNameBornConstraint
ON (p:Person) ASSERT (p.name, p.born) IS NODE KEY // asserting both name and born existsIndexes are used to improve initial node lookup performance, but they require additional storage in the graph to maintain and also add to the cost of creating or modifying property values that are indexed. Indexes store redundant data that points to nodes with the specific property value or values.
Create a single-property index; used for equality checks, range comparisons, listing membership, string comparisons, etc.
CREATE INDEX MovieReleased FOR (m:Movie) ON (m.released)Create a composite index; used for when there can be duplication for a set of property values and you want faster access to them.
CREATE INDEX MovieReleasedVideoFormat FOR (m:Movie) ON (m.released, m.videoFormat)Both a single-property index and a composite index can be removed using the DROP clause.
DROP INDEX MovieReleasedVideoFormatA full-text schema index is based upon string values only, and can be used for:
- node/relationship properties
- single/multiple properties
- single/multiple types of nodes(labels)
- single/multiple types of relationships
Create a full-text schema index for nodes by using the createNodeIndex() function.
CALL db.index.fulltext.createNodeIndex(
'MovieTitlePersonName', // name of full-text schema index
['Movie', 'Person'], // nodes to be indexed
['title', 'name']) // node properties to be indexedCall the query procedure using the full-text schema index for nodes by using the queryNodes() function.
CALL db.index.fulltext.queryNodes('MovieTitlePersonName', 'Jerry')
YIELD node // YIELD allows access to the property values in the nodes returned
RETURN node // returns nodes in the graph with either a title property or name property containing 'Jerry'Use the AND , OR clauses to query multiple strings in the full-text schema index.
CALL db.index.fulltext.queryNodes('MovieTitlePersonName', 'Jerry OR Matrix') // will find titles/names with either 'Jerry' or 'Matrix' in them
YIELD node // YIELD allows access to the property values in the nodes returned
RETURN node // returns nodes in the graph with either a title property or name property containing 'Jerry'You can specify which property you want to search for in the defined full-text schema index.
CALL db.index.fulltext.queryNodes('MovieTitlePersonName', 'name: Jerry')
YIELD node // returns nodes in the graph with ONLY the name property containing 'Jerry'
RETURN nodeYou can retrieve a "hit score" that represents the closeness of the values in the graph to the query string.
CALL db.index.fulltext.queryNodes('MovieTitlePersonName', 'Matrix')
YIELD node, score // returns a Lucene (index provider) score based upon how much 'Matrix' was part of the title property
RETURN node.title, scoreCreate a full-text schema index for relationships by using the createRelationshipIndex() function.
// creating a full-text schema index for relationship
CALL db.index.fulltext.createRelationshipIndex(
'ActedRoleWriterName', // name of full-text schema index
['ACTED', 'WROTE'], // relationships to be indexed
['role', 'pseudonym']) // relationship properties to be indexedYou can drop a full-text schema index by using the DROP clause.
CALL db.index.fulltext.drop('MovieTitlePersonName')Use parameters (denoted by $) in your queries as a change to a Cypher statement requires recompilation of the code which can be expensive
:param actorName => 'Tom Hanks'MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName
RETURN m.released, m.title ORDER BY m.released DESCYou can set multiple parameters to be used in a query.
:params {actorName: 'Tom Cruise', movieName: 'Top Gun'} // JSON style syntax for multiple parameters is also allowedMATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name = $actorName AND m.title = $movieName
RETURN p, mView the first 10 lines of a file/URL, with headers included and ',' as the delimeter
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/v4.0-intro-neo4j/people.csv'
AS line
RETURN line LIMIT 10You can filter the data prior to import using conditionals
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/v4.0-intro-neo4j/people.csv'
AS line
WITH line WHERE line.birthYear > "1999" // filters csv for rows where conditional is met
RETURN line LIMIT 10Transform field values as needed as LOAD CSV as data is default interpreted as a string or null.
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/v4.0-intro-neo4j/movies1.csv'
AS line
RETURN
toFloat(line.avgVote), // converts field values to FLOAT
line.genres,
toInteger(line.movieId), // converts field values to INTEGER
line.title,
toInteger(line.releaseYear) // converts field values to INTEGER
LIMIT 10You can split lists (delimited with :) using the split() and coalesce() methods
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/v4.0-intro-neo4j/movies1.csv'
AS line
RETURN
toFloat(line.avgVote),
split(coalesce(line.genres,""), ":"), // if some fields have an empty list, use split() with coalesce()
toInteger(line.movieId),
line.title,
toInteger(line.releaseYear)
LIMIT 10Use auto: USING PERIODIC COMMIT N clause to import data over 100k rows; affected by "eager" operators (ie. collect(), count(), ORDER BY, DISTINCT)
:auto USING PERIODIC COMMIT 500 // 500 rows will be uploaded at one time
LOAD CSV WITH HEADERS FROM
'https://data.neo4j.com/v4.0-intro-neo4j/movies1.csv' as row
MERGE (m:Movie {id:toInteger(row.movieId)})
ON CREATE SET // assigning the string csv data to properties of Movie node
m.title = row.title,
m.avgVote = toFloat(row.avgVote),
m.releaseYear = toInteger(row.releaseYear),
m.genres = split(row.genres,":")Importing relationships related to data already imported (Movie data)
LOAD CSV WITH HEADERS FROM
'https://data.neo4j.com/v4.0-intro-neo4j/directors.csv' AS row
MATCH (movie:Movie {id:toInteger(row.movieId)}) // for each row read, find the Movie node
MATCH (person:Person {id: toInteger(row.personId)}) // for each row read, find the Person node
MERGE (person)-[:DIRECTED]->(movie) // create DIRECTED relationship between nodes
ON CREATE SET person:Director // add DIRECTOR label to Person node| Syntax | Meaning |
|---|---|
CALL apoc.meta.graph() |
Retrieves high-level metadata from teh graph to inspect how nodes and relationships are used in the graph |
CALL apoc.meta.stats() |
Retrieves count store data which provides information about the number of nodes/relationships of each type |
Use APOC clear the database of all constraints, indexes, nodes, and relationships; helpful when preforming multiple imports to database during testing.
CALL apoc.schema.assert( {}, {}, true); // removes all constraints and indexes
CALL apoc.periodic.iterate( // removes all nodes and relationships
'MATCH (n) RETURN n',
'DETACH DELETE n',
{batchSize: 500}
)Use APOC to create a conditional when uploading denormalized data using the apoc.do.when() procedure
CREATE CONSTRAINT UniqueMovieIdConstraint ON (m:Movie) ASSERT m.id IS UNIQUE;
CREATE CONSTRAINT UniquePersonIdConstraint ON (p:Person) ASSERT p.id IS UNIQUE;
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/v4.0-intro-neo4j/movies2.csv' AS row
WITH
row.movieId as movieId, // properties to be used for Movie node
row.title AS title,
row.genres AS genres,
toInteger(row.releaseYear) AS releaseYear,
toFloat(row.avgVote) AS avgVote,
collect({ // properties to be used for Person node; being collected to define later in the query
id: row.personId,
name:row.name,
born: toInteger(row.birthYear),
died: toInteger(row.deathYear),personType: row.personType,
roles: split(coalesce(row.characters,""),':')}) AS people
MERGE (m:Movie {id:movieId}) // creating the Movie node, and defining its properties
ON CREATE SET
m.title=title,
m.avgVote=avgVote,
m.releaseYear=releaseYear,
m.genres=split(genres,":")
WITH * // carries all variables forward in the query
UNWIND people AS person // all "peope" collected unwinded as rows
MERGE (p:Person {id: person.id}) // creating the Person node, and defining its properties
ON CREATE SET
p.name = person.name,
p.born = person.born,
p.died = person.died
WITH m, person, p
CALL apoc.do.when(person.personType = 'ACTOR', // personType property used as conditional
"MERGE (p)-[:ACTED_IN {roles: person.roles}]->(m) // create 'ACTED_IN' relationship if personType == 'ACTOR'
ON CREATE SET p:Actor",
"MERGE (p)-[:DIRECTED]->(m) // create 'DIRECTED' relationship if personType != 'ACTOR'
ON CREATE SET p:Director",
{m:m, p:p, person:person}) YIELD value // describes mapping of variables outside/inside of the call; for simplicity the same
RETURN count(*) // certain apoc calls cannot end a cypher query so count(*) is placed at the endUse APOC to import data from a CSV that cannot be uploaded using the LOAD CSV function
CALL apoc.periodic.iterate(
"CALL apoc.load.csv('https://data.neo4j.com/v4.0-intro-neo4j/movies2.csv' ) YIELD map AS row
RETURN row", // using apoc.load.csv() to upload data, returns as row
"WITH
row.movieId as movieId, // properties to be used for Movie node
row.title AS title,
row.genres AS genres,
toInteger(row.releaseYear) AS releaseYear,
toFloat(row.avgVote) AS avgVote,
collect({ // properties to be used for Person node; being collected to define later in the query
id: row.personId,
name:row.name,
born: toInteger(row.birthYear),
died: toInteger(row.deathYear),personType: row.personType,
roles: split(coalesce(row.characters,""),':')}) AS people
MERGE (m:Movie {id:movieId}) // creating the Movie node, and defining its properties
ON CREATE SET
m.title=title,
m.avgVote=avgVote,
m.releaseYear=releaseYear,
m.genres=split(genres,":")
WITH * // carries all variables forward in the query
UNWIND people AS person // all "peope" collected unwinded as rows
MERGE (p:Person {id: person.id}) // creating the Person node, and defining its properties
ON CREATE SET
p.name = person.name,
p.born = person.born,
p.died = person.died
WITH m, person, p
CALL apoc.do.when(person.personType = 'ACTOR', // personType property used as conditional
"MERGE (p)-[:ACTED_IN {roles: person.roles}]->(m) // create 'ACTED_IN' relationship if personType == 'ACTOR'
ON CREATE SET p:Actor",
"MERGE (p)-[:DIRECTED]->(m) // create 'DIRECTED' relationship if personType != 'ACTOR'
ON CREATE SET p:Director",
{m:m, p:p, person:person}) YIELD value // describes mapping of variables outside/inside of the call; for simplicity the same
RETURN count(*)",
{batchsize: 500} // defines the size of the batch
)