Graph Data Science 2.4.0 PREVIEW
Pre-release
Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.
Breaking changes
- The `concurrency` used when training a pipeline is now passed to the node property steps. Previously, these steps ran with the default concurrency of `4` unless overridden. This affects:
  - `gds.beta.pipeline.linkPrediction.train`
  - `gds.beta.pipeline.nodeClassification.train`
  - `gds.alpha.pipeline.nodeClassification.train`
New features
- You can rename node properties when writing them back to the Neo4j database using `gds.nodeProperties.write` by placing them inside a map of the form `nodeProperty: 'renamedProperty'`.
- Added a `minCommunitySize` / `minComponentSize` parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
  - `gds.wcc.stream`
  - `gds.louvain.stream`
  - `gds.labelPropagation.stream`
  - `gds.beta.k1coloring.[stream|write]`
  - `gds.beta.leiden.[stream|write]`
  - `gds.beta.modularityOptimization.[stream|write]`
  - `gds.alpha.maxkcut.stream`
- Added new procedure `gds.alpha.drop.cypherdb` to drop created in-memory databases.
- Added the Bellman-Ford algorithm:
  - `gds.bellmanFord.stream`
  - `gds.bellmanFord.stream.estimate`
  - `gds.bellmanFord.stats`
  - `gds.bellmanFord.stats.estimate`
  - `gds.bellmanFord.mutate`
  - `gds.bellmanFord.mutate.estimate`
  - `gds.bellmanFord.write`
  - `gds.bellmanFord.write.estimate`
- Added Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects `gds.alpha.model.store` and `gds.alpha.model.load`.
- Added an `upperDegreeCutoff` parameter to the Node Similarity and filtered Node Similarity algorithms, which skips nodes whose degree is higher than the provided value.
- Added an `aggregation` parameter to `gds.beta.toUndirected` to control how the new undirected relationships are aggregated.
- Added a new optional parameter `storeModelToDisk` that automatically saves serializable models after training, for licensed users. This affects `gds.beta.pipeline.[linkPrediction|nodeClassification].train` and `gds.beta.graphSage.train`.
- Added the K-Core Decomposition algorithm:
  - `gds.kcore.stats`
  - `gds.kcore.stats.estimate`
  - `gds.kcore.stream`
  - `gds.kcore.stream.estimate`
  - `gds.kcore.mutate`
  - `gds.kcore.mutate.estimate`
  - `gds.kcore.write`
  - `gds.kcore.write.estimate`
- Added procedure `gds.graph.relationshipProperties.write`, which allows writing relationships with multiple properties to Neo4j.
- Added the new Common Neighbour Aware Random Walk graph sampling algorithm `gds.graph.sample.cnarw`. It is available in the beta tier.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The projection method is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (`gds.graph.project.cypher`) is now called "Legacy Cypher projection".
  - The procedure name loses the `alpha` qualifier and is now `gds.graph.project`.
  - The old name `gds.alpha.graph.project` is deprecated; calls to it are forwarded to the new name and adapted to the new API.
  - The 4th and 5th parameters, `nodeConfig` and `relationshipConfig`, have been merged into a single `dataConfig` parameter.
  - The `properties` configuration key in this merged `dataConfig` parameter has been renamed to `relationshipProperties`.
  - The overall projection configuration (e.g. `readConcurrency`) has moved from the 6th parameter to the 5th parameter.
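As a rough illustration of what the `minCommunitySize` filter does to a stream result, the following Python sketch drops every node whose community has fewer members than the threshold. It assumes the result is a plain mapping from node ID to community ID; `filter_small_communities` is a hypothetical name and this is not the GDS implementation.

```python
from collections import Counter


def filter_small_communities(node_to_community, min_community_size):
    """Keep only nodes whose community has at least min_community_size members."""
    # Count members per community once, then filter rows against that count.
    sizes = Counter(node_to_community.values())
    return {
        node: community
        for node, community in node_to_community.items()
        if sizes[community] >= min_community_size
    }


result = filter_small_communities({0: 7, 1: 7, 2: 7, 3: 9, 4: 9, 5: 11}, 2)
print(result)  # {0: 7, 1: 7, 2: 7, 3: 9, 4: 9}
```

Node `5` is the only member of community `11`, so it is filtered out with a threshold of `2`.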
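The Bellman-Ford procedures listed above compute single-source shortest paths on graphs that may contain negative relationship weights. A minimal textbook version is sketched here in Python, assuming an edge list of `(source, target, weight)` triples and integer node IDs `0..n-1`; it is not necessarily how GDS implements the algorithm internally, and `bellman_ford` is a hypothetical helper name.

```python
import math


def bellman_ford(num_nodes, edges, source):
    """Single-source shortest paths that tolerate negative weights.

    edges is a list of (u, v, weight). Returns (distances, has_negative_cycle).
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    # Relax every edge up to |V| - 1 times; stop early once nothing changes.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # One extra pass: any further improvement means a reachable negative cycle.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle


dist, has_cycle = bellman_ford(4, [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 2)], source=0)
print(dist, has_cycle)  # [0.0, 4.0, 1.0, 3.0] False
```

The negative weight on `1 -> 2` makes the detour through node `1` cheaper than the direct relationship `0 -> 2`, which Dijkstra's algorithm would not handle correctly.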
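To make the effect of `upperDegreeCutoff` concrete, here is a small Python sketch of Jaccard-based node similarity that skips nodes whose degree exceeds the cutoff. The neighbour-set input, the `similarity_cutoff` handling and the function name are assumptions for illustration; other options of the real algorithm are not modelled.

```python
from itertools import combinations


def node_similarity(neighbors, upper_degree_cutoff=None, similarity_cutoff=0.0):
    """Pairwise Jaccard similarity over neighbour sets.

    Nodes with no neighbours, or whose degree exceeds upper_degree_cutoff,
    are skipped entirely and never appear in the result.
    """
    candidates = [
        node
        for node, nbrs in neighbors.items()
        if nbrs and (upper_degree_cutoff is None or len(nbrs) <= upper_degree_cutoff)
    ]
    results = {}
    for a, b in combinations(sorted(candidates), 2):
        shared = len(neighbors[a] & neighbors[b])
        union = len(neighbors[a] | neighbors[b])
        score = shared / union
        if score > similarity_cutoff:
            results[(a, b)] = score
    return results


neighbors = {"a": {1, 2}, "b": {1, 2, 3}, "c": {1, 2, 3, 4, 5}}
print(node_similarity(neighbors, upper_degree_cutoff=3))  # {('a', 'b'): 0.6666666666666666}
```

Without the cutoff all three pairs are compared; with `upper_degree_cutoff=3` the degree-5 node `c` is skipped, which is where the savings come from on graphs with a few very high-degree nodes.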
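Conceptually, the new `aggregation` option of `gds.beta.toUndirected` decides what happens when the conversion produces several undirected relationships between the same pair of nodes (for example from `A->B` and `B->A`). The Python sketch keeps them all for `NONE` and otherwise collapses them into one relationship with an aggregated weight. The aggregation names mirror those used in GDS projections but are assumptions here, as are the triple-based input format and the `to_undirected` name.

```python
from collections import defaultdict

# Reducers keyed by aggregation name (names assumed, mirroring GDS projection options).
REDUCERS = {
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "COUNT": len,
    "SINGLE": lambda weights: weights[0],  # keep one of the parallel weights
}


def to_undirected(relationships, aggregation="NONE"):
    """Turn directed (source, target, weight) triples into undirected ones.

    Each relationship is keyed by its unordered endpoint pair. With NONE, parallel
    relationships are all kept; otherwise they collapse into one aggregated relationship.
    """
    buckets = defaultdict(list)
    for source, target, weight in relationships:
        buckets[(min(source, target), max(source, target))].append(weight)
    if aggregation == "NONE":
        return sorted((a, b, w) for (a, b), weights in buckets.items() for w in weights)
    reduce = REDUCERS[aggregation]
    return sorted((a, b, reduce(weights)) for (a, b), weights in buckets.items())


rels = [(0, 1, 2.0), (1, 0, 3.0), (1, 2, 1.0)]
print(to_undirected(rels, "SUM"))  # [(0, 1, 5.0), (1, 2, 1.0)]
```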
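The K-Core procedures listed above assign each node a core value: the largest `k` such that the node belongs to a subgraph in which every node has degree at least `k`. A simple (quadratic) peeling sketch in Python follows, assuming an undirected simple graph given as a dict of neighbour sets; `core_numbers` is a hypothetical name and GDS's implementation will differ.

```python
def core_numbers(adjacency):
    """Peeling algorithm for k-core decomposition of an undirected simple graph.

    adjacency maps node -> set of neighbours. Repeatedly remove a node of minimum
    remaining degree; its core value is the largest minimum degree seen so far.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing a node lowers the remaining degree of its surviving neighbours.
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# A triangle (0, 1, 2) with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(core_numbers(adj))  # core values: node 3 -> 1, nodes 0, 1, 2 -> 2
```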
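The transition rule behind Common Neighbour Aware Random Walk sampling can be sketched as follows: from the current node `u`, a neighbour `v` is proposed uniformly at random and accepted with probability `1 - |N(u) ∩ N(v)| / min(deg(u), deg(v))`, so neighbours that share many neighbours with `u` are visited less often. This Python sketch follows that formulation from the original CNARW paper as we understand it, and covers only the walk itself, not the start-node selection, restarts or sampling-ratio logic of `gds.graph.sample.cnarw`. It assumes an undirected simple graph given as neighbour sets; `cnarw_walk` is a hypothetical name.

```python
import random


def cnarw_walk(adjacency, start, steps, seed=None):
    """Common Neighbour Aware Random Walk (rejection-sampling formulation).

    From node u, propose a neighbour v uniformly and accept it with probability
    1 - |N(u) & N(v)| / min(deg(u), deg(v)); otherwise propose again.
    """
    rng = random.Random(seed)
    walk = [start]
    current = start
    for _ in range(steps):
        nbrs = adjacency[current]
        if not nbrs:
            break  # isolated node: the walk cannot continue
        while True:
            candidate = rng.choice(sorted(nbrs))
            common = len(nbrs & adjacency[candidate])
            accept = 1 - common / min(len(nbrs), len(adjacency[candidate]))
            if rng.random() < accept:
                break
        walk.append(candidate)
        current = candidate
    return walk


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
print(cnarw_walk(adj, start=0, steps=10, seed=42))
```

Because a simple graph has no self-loops, the shared-neighbour count is always smaller than both degrees, so every proposal has a positive acceptance probability and the inner loop terminates with probability 1.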
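The Cypher projection parameter changes listed above amount to a mechanical rewrite of the old arguments. The hypothetical Python helper shows that mapping on plain dictionaries: it merges `nodeConfig` and `relationshipConfig` into `dataConfig`, renames `properties` to `relationshipProperties`, and passes the overall configuration through so it can be supplied as the 5th argument instead of the 6th. Keys such as `sourceNodeLabels` and `relationshipType` in the example are assumptions about the old config maps, not taken from these notes.

```python
def migrate_cypher_aggregation_args(node_config, relationship_config, configuration):
    """Map the old 4th-6th arguments of gds.alpha.graph.project to the new 4th-5th.

    Hypothetical helper for illustration; returns (data_config, configuration).
    """
    rel = dict(relationship_config or {})
    # The relationship 'properties' key is renamed in the merged dataConfig.
    if "properties" in rel:
        rel["relationshipProperties"] = rel.pop("properties")
    data_config = {**(node_config or {}), **rel}
    # The overall configuration (e.g. readConcurrency) is unchanged; only its position moves.
    return data_config, dict(configuration or {})


data_config, configuration = migrate_cypher_aggregation_args(
    {"sourceNodeLabels": "Person"},
    {"relationshipType": "KNOWS", "properties": {"since": 2020}},
    {"readConcurrency": 4},
)
print(data_config)
print(configuration)  # {'readConcurrency': 4}
```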
Bug fixes
- Fixed: the Arrow server no longer allows projecting graphs with blank names.
- Fixed: Arrow now validates dangling relationships when creating an in-memory graph.
Improvements
- Improve progress tracking for `gds.beta.graphSage.train`. This enables progress bars in the Python client.
- Improve the error message for invalid `nodeLabels` and `relationshipTypes` in procedures that support memory estimation.
- Allow `gds.debug.sysInfo` and `gds.debug.arrow` to run against the system database.
- Improve automatic conversion of array property values during graph projection.
- The Yens algorithm can now be run in parallel.
- Node regression now verifies up front that all provided `targetProperty` values are valid when calling `gds.alpha.pipeline.nodeRegression.train`.
- The scale properties algorithm has been promoted:
  - Added new procedures `gds.scaleProperties.[stream,mutate]`, which replace the now-deprecated `gds.alpha.scaleProperties.[stream,mutate]`.
    - The scalers `L1Norm` and `L2Norm` are not supported in the new procedures.
  - Added new procedures `gds.scaleProperties.[stats,write]` to return statistics from a scale properties computation and to write scaled properties back to the database, respectively.
  - Procedures `gds.scaleProperties.[mutate,stats,stream,write]` support progress tracking with volumes. This enables progress bars in the Python client.
  - Procedures `gds.scaleProperties.[mutate,stats,write]` return statistics from the performed scale computation.
  - Added a new parameter `offset` to the `log` scaler. This also affects the procedures `gds.pageRank`, `gds.eigenvector` and `gds.articleRank`.
  - Added new procedures `gds.scaleProperties.[mutate|stats|stream|write].estimate` for estimating the memory requirements of running the scale properties algorithm.
  - Nodes with missing properties (`null` or `NaN`) are now omitted from the scale computation. Their scaled value is set to `NaN` in the output.
- Reduce the memory footprint of the binary embeddings saved by `gds.beta.hashgnn.mutate`.
- Promote the random forest classifier to the beta tier. Added `gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest`, which replaces the now-deprecated `gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest`.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve the runtime of `gds.alpha.hits` for concurrency > 1 thanks to better partitioning.
- Improve the parallel runtime of several algorithms thanks to improvements in our degree-based partitioning. Note that the effect is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approxmaxkcut
  - Conductance
  - LinkPrediction training
  - ToUndirected
- Improve the parallel runtime of `gds.beta.graph.project.subgraph` when filtering relationships, thanks to better partitioning.
- Improve the parallel runtime of `gds.beta.pipeline.linkPrediction.predict` when `sampleRate = 0`, thanks to better partitioning.
- Improve memory usage when projecting very large graphs with very high-degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
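The scale properties behaviour described above (statistics computed only over present values, missing values mapped to `NaN`, and the new `offset` for the `log` scaler) can be sketched in Python, with a min-max scaler standing in for a scaler that needs statistics. The formulas `(x - min) / (max - min)` and `log(x + offset)` with a natural logarithm, and mapping a constant input to `0` in min-max scaling, are assumptions for illustration rather than GDS's exact definitions; the function names are hypothetical.

```python
import math


def _is_missing(x):
    return x is None or (isinstance(x, float) and math.isnan(x))


def min_max_scale(values):
    """Min-max scaling; statistics ignore missing values, which come back as NaN."""
    present = [x for x in values if not _is_missing(x)]  # assumes at least one value
    lo, hi = min(present), max(present)
    span = hi - lo
    # Assumption: a constant input (span == 0) maps every present value to 0.
    return [
        math.nan if _is_missing(x) else (0.0 if span == 0 else (x - lo) / span)
        for x in values
    ]


def log_scale(values, offset=0.0):
    """Log scaler with an additive offset: log(x + offset), NaN for missing values."""
    return [math.nan if _is_missing(x) else math.log(x + offset) for x in values]


print(min_max_scale([1.0, None, 3.0, float("nan"), 5.0]))  # [0.0, nan, 0.5, nan, 1.0]
print(log_scale([0.0, 9.0], offset=1.0))  # [0.0, 2.302585092994046]
```

An offset of `1` lets zero-valued properties map to `0` instead of failing on `log(0)`.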
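Degree-based partitioning splits the node range into contiguous batches with roughly equal total degree rather than equal node counts, so a thread that receives a few high-degree nodes is not overloaded. The greedy Python sketch illustrates the idea under the assumption that node IDs are `0..n-1` and degrees are known up front; `degree_partitions` is hypothetical and GDS's partitioning logic will differ in detail.

```python
def degree_partitions(degrees, num_partitions):
    """Split node IDs 0..n-1 into contiguous ranges with roughly equal total degree."""
    total = sum(degrees)
    target = max(1, -(-total // num_partitions))  # ceiling division, at least 1
    partitions, start, acc = [], 0, 0
    for node, degree in enumerate(degrees):
        acc += degree
        # Close the current batch once it carries its share of the total degree.
        if acc >= target:
            partitions.append(range(start, node + 1))
            start, acc = node + 1, 0
    # Any trailing nodes form a final, lighter batch.
    if start < len(degrees):
        partitions.append(range(start, len(degrees)))
    return partitions


print(degree_partitions([10, 1, 1, 1, 1, 1, 1, 1, 1, 2], 2))  # [range(0, 1), range(1, 10)]
```

Splitting by node count would give the first batch nodes `0..4` with total degree 14 and the second batch a total degree of only 6; the degree-based split gives each batch a total degree of 10.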