Release Graph Data Science 2.4.0 PREVIEW · neo4j/graph-data-science

Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.

Breaking changes

The concurrency setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
- gds.beta.pipeline.linkPrediction.train
- gds.beta.pipeline.nodeClassification.train
- gds.alpha.pipeline.nodeClassification.train

New features

You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map of the form nodeProperty: 'renamedProperty'.
Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
- gds.wcc.stream
- gds.louvain.stream
- gds.labelPropagation.stream
- gds.beta.k1coloring.[stream|write]
- gds.beta.leiden.[stream|write]
- gds.beta.modularityOptimization.[stream|write]
- gds.alpha.maxkcut.stream
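As a rough illustration of what filtering by minimum community size means, the following Python sketch drops every node whose community has fewer members than a threshold. It is not the GDS implementation; the function name and the sample data are hypothetical and only mimic a streamed nodeId-to-communityId result.

```python
from collections import Counter


def filter_by_min_community_size(assignments, min_community_size):
    """Keep only nodes whose community has at least min_community_size members.

    assignments maps a node id to its community id (hypothetical sample shape).
    """
    sizes = Counter(assignments.values())
    return {
        node: community
        for node, community in assignments.items()
        if sizes[community] >= min_community_size
    }


# Hypothetical streamed result: node id -> community id.
assignments = {0: 10, 1: 10, 2: 10, 3: 20, 4: 30, 5: 30}
filtered = filter_by_min_community_size(assignments, min_community_size=2)
```

In this sample, node 3 is removed because its community has a single member.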
Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases
Added Bellman-Ford algorithm:
- gds.bellmanFord.stream
- gds.bellmanFord.stream.estimate
- gds.bellmanFord.stats
- gds.bellmanFord.stats.estimate
- gds.bellmanFord.mutate
- gds.bellmanFord.mutate.estimate
- gds.bellmanFord.write
- gds.bellmanFord.write.estimate
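For readers unfamiliar with Bellman-Ford, here is a minimal textbook sketch in Python. It computes single-source shortest paths with possibly negative weights and reports whether a negative cycle is reachable. This is a sequential illustration under assumed input shapes (an edge list of tuples), not the code behind the gds.bellmanFord procedures.

```python
def bellman_ford(node_count, edges, source):
    """Return (distances, has_negative_cycle) for a weighted directed graph.

    edges is a list of (source, target, weight) tuples; nodes are 0..node_count-1.
    """
    inf = float("inf")
    dist = [inf] * node_count
    dist[source] = 0.0

    # Relax every edge up to node_count - 1 times; stop early once stable.
    for _ in range(node_count - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break

    # If any edge can still be relaxed, a negative cycle is reachable.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle


# Illustrative graph with one negative edge and no negative cycle.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, -2.0), (1, 3, 3.0)]
distances, has_cycle = bellman_ford(4, edges, source=0)
```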
Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects gds.alpha.model.store and gds.alpha.model.load.
Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes if their degree is higher than the provided value.
Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.
Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
Added K-Core Decomposition algorithm:
- gds.kcore.stats
- gds.kcore.stats.estimate
- gds.kcore.stream
- gds.kcore.stream.estimate
- gds.kcore.mutate
- gds.kcore.mutate.estimate
- gds.kcore.write
- gds.kcore.write.estimate
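The idea behind K-Core Decomposition can be sketched with a simple peeling loop: repeatedly remove the node of lowest remaining degree and record the largest such degree seen so far as its core value. The Python below is a small, unoptimized illustration for an undirected graph given as an adjacency dictionary (an assumed input shape), not the GDS implementation.

```python
def core_numbers(adjacency):
    """Compute the core value of each node in an undirected graph by peeling.

    adjacency maps each node to the set of its neighbors.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Pick the remaining node with the smallest current degree.
        node = min(remaining, key=degree.get)
        # The core value never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# A triangle (0, 1, 2) with a pendant node 3 attached to node 2.
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cores = core_numbers(adjacency)
```

Here the triangle nodes end up in the 2-core, while the pendant node only belongs to the 1-core.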
Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
Added new Common Neighbour Aware Random Walk graph sampling algorithm gds.graph.sample.cnarw. Available under beta tier.
Cypher Aggregation has graduated, which comes with a new name and API changes:
- The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
  - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
- The procedure name is losing the alpha qualifier and is now called gds.graph.project.
- The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
- The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
- The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
- The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
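The parameter changes listed above can be pictured as a small map transformation. The following Python sketch merges an old-style nodeConfig and relationshipConfig into a single dataConfig and renames the properties key to relationshipProperties. The helper name is hypothetical, and the other keys in the sample maps are illustrative assumptions rather than a statement of the supported configuration keys.

```python
def merge_projection_configs(node_config, relationship_config):
    """Merge legacy nodeConfig and relationshipConfig maps into one dataConfig map.

    The legacy 'properties' key is renamed to 'relationshipProperties'.
    """
    data_config = dict(node_config)
    for key, value in relationship_config.items():
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value
    return data_config


# Illustrative legacy arguments (key names other than 'properties' are assumptions).
node_config = {"sourceNodeLabels": ["Person"], "targetNodeLabels": ["Person"]}
relationship_config = {"relationshipType": "KNOWS", "properties": {"weight": 1.0}}
data_config = merge_projection_configs(node_config, relationship_config)
```

With the merged map in the 4th position, the overall projection configuration (for example readConcurrency) would then follow as the 5th parameter.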

Bug fixes

Fixed: The Arrow server no longer allows projecting graphs with blank names.
Fixed: Arrow validates dangling relationships when creating an in-memory graph.

Improvements

Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
Improve automatic conversion of array property values during graph projection.
The Yens algorithm can now be run in parallel.
Node regression now verifies upfront that all targetProperty values provided are valid when calling gds.alpha.pipeline.nodeRegression.train.
The scale properties algorithm has been promoted:
- Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
  - The scalers L1Norm and L2Norm are not supported in the new procedures.
- Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
- Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
- Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
- Added new parameter offset to the log scaler. This also affects procedures:
  - gds.pageRank
  - gds.eigenvector
  - gds.articleRank
- Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
- Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
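To make the last two points concrete, here is a small Python sketch of a log scaler with an offset that passes missing values through as NaN. It assumes the scaler computes the natural logarithm of value + offset, which is an assumption for illustration and not confirmed by these notes; the function name is hypothetical.

```python
import math


def log_scale(values, offset=0.0):
    """Apply log(value + offset); missing inputs (None or NaN) yield NaN.

    Assumes a natural-log scaler; the exact GDS formula may differ.
    """
    scaled = []
    for value in values:
        if value is None or math.isnan(value):
            # Missing values are skipped in the computation and reported as NaN.
            scaled.append(math.nan)
        else:
            scaled.append(math.log(value + offset))
    return scaled


# An offset of 1.0 keeps a zero-valued property inside the logarithm's domain.
scaled = log_scale([0.0, 9.0, None, float("nan")], offset=1.0)
```

An offset is useful when property values can be zero, since the logarithm of zero is undefined.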
Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
Reduced memory allocation for the Spanning Tree algorithm.
A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
Improve runtime of gds.alpha.hits for concurrency > 1 due to a better partitioning.
Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
- FastRP
- HashGNN
- Leiden
- Approxmaxkcut
- Conductance
- LinkPrediction training
- ToUndirected
Improve parallel runtime of gds.beta.graph.project.subgraph when filtering relationships due to a better partitioning.
Improve parallel runtime of gds.beta.pipeline.linkPrediction.predict if sampleRate = 0 due to a better partitioning.
Improve memory usage when projecting very large graphs with very high degree nodes.
Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.