Graph Data Science 2.4.0 PREVIEW

Pre-release

@Mats-SX Mats-SX released this 02 Jun 13:00
· 5481 commits to master since this release

Neo4j Graph Data Science version 2.4.0 is compatible with Neo4j version 4.4 and Neo4j versions 5.1 through 5.8.

Breaking changes

  • The concurrency setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
    • gds.beta.pipeline.linkPrediction.train
    • gds.beta.pipeline.nodeClassification.train
    • gds.alpha.pipeline.nodeClassification.train

New features

  • You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map of the form nodeProperty: 'renamedProperty'.
  • Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon) This includes:
    • gds.wcc.stream
    • gds.louvain.stream
    • gds.labelPropagation.stream
    • gds.beta.k1coloring.[stream|write]
    • gds.beta.leiden.[stream|write]
    • gds.beta.modularityOptimization.[stream|write]
    • gds.alpha.maxkcut.stream
  • Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases
  • Added Bellman-Ford algorithm:
    • gds.bellmanFord.stream
    • gds.bellmanFord.stream.estimate
    • gds.bellmanFord.stats
    • gds.bellmanFord.stats.estimate
    • gds.bellmanFord.mutate
    • gds.bellmanFord.mutate.estimate
    • gds.bellmanFord.write
    • gds.bellmanFord.write.estimate
  • Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable. This affects gds.alpha.model.store and gds.alpha.model.load.
  • Added upperDegreeCutoff parameter to the Node-Similarity and filtered Node-Similarity algorithms, which allows skipping nodes whose degree is higher than the provided value.
  • Added aggregation parameter to gds.beta.toUndirected to control how the new undirected relationships are aggregated.
  • Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
  • Added K-Core Decomposition algorithm:
    • gds.kcore.stats
    • gds.kcore.stats.estimate
    • gds.kcore.stream
    • gds.kcore.stream.estimate
    • gds.kcore.mutate
    • gds.kcore.mutate.estimate
    • gds.kcore.write
    • gds.kcore.write.estimate
  • Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.
  • Added new Common Neighbour Aware Random Walk graph sampling algorithm gds.graph.sample.cnarw. Available under beta tier.
  • Cypher Aggregation has graduated, which comes with a new name and API changes:
    • The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
      • The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
    • The procedure name is losing the alpha qualifier and is now called gds.graph.project.
    • The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
    • The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
    • The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
    • The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
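
The parameter changes for the graduated Cypher projection can be pictured as a small argument migration. The following is a minimal Python sketch and not part of GDS: migrate_projection_args is a hypothetical helper, and the example keys sourceNodeLabels and relationshipType are assumed for illustration only. Only the merge into dataConfig, the rename of properties to relationshipProperties, and the position change of the overall configuration come from the notes above.

```python
def migrate_projection_args(node_config, relationship_config, configuration=None):
    """Map the old 4th, 5th and 6th parameters onto the new 4th and 5th ones.

    Hypothetical helper for illustration only; it is not a GDS API.
    """
    # nodeConfig and relationshipConfig are merged into one dataConfig map.
    data_config = dict(node_config or {})
    for key, value in (relationship_config or {}).items():
        # The `properties` key is renamed to `relationshipProperties`.
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value
    # The overall configuration keeps its content but is now the 5th parameter.
    return data_config, dict(configuration or {})


# Example values below are assumptions made up for this sketch.
old_node_config = {"sourceNodeLabels": ["Person"]}
old_relationship_config = {"relationshipType": "KNOWS", "properties": {"weight": 1.0}}
old_configuration = {"readConcurrency": 4}

data_config, configuration = migrate_projection_args(
    old_node_config, old_relationship_config, old_configuration
)
print(data_config)
print(configuration)
```

Calls through the deprecated gds.alpha.graph.project name are forwarded and adapted by GDS itself, so a helper like this would only matter when rewriting queries by hand.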

Bug fixes

  • Fixed: The Arrow server no longer allows projecting graphs with blank names.
  • Fixed: The Arrow server now validates dangling relationships when creating an in-memory graph.

Improvements

  • Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the Python client.
  • Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
  • Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
  • Improve automatic conversion of array property values during graph projection.
  • Yen's algorithm can now be run in parallel.
  • Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
  • The scale properties algorithm has been promoted:
    • Added new procedures gds.scaleProperties.[stream,mutate], which replace the now-deprecated gds.alpha.scaleProperties.[stream,mutate].
      • The scalers L1Norm and L2Norm are not supported in the new procedures.
    • Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and to write scaled properties back to a database, respectively.
    • Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the Python client.
    • Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
    • Added new parameter offset to the log scaler. This also affects procedures:
      • gds.pageRank
      • gds.eigenvector
      • gds.articleRank
    • Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
    • Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
  • Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
  • Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest, which replace the now-deprecated gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest.
  • Reduced memory allocation for the Spanning Tree algorithm.
  • A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
  • Improve runtime of gds.alpha.hits for concurrency > 1 due to better partitioning.
  • Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
    • FastRP
    • HashGNN
    • Leiden
    • Approximate Maximum k-cut
    • Conductance
    • LinkPrediction training
    • ToUndirected
  • Improve parallel runtime of gds.beta.graph.project.subgraph when filtering relationships due to better partitioning.
  • Improve parallel runtime of gds.beta.pipeline.linkPrediction.predict if sampleRate = 0 due to better partitioning.
  • Improve memory usage when projecting very large graphs with very high degree nodes.
  • Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
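
Two of the scale properties changes listed above, the new offset parameter of the log scaler and the handling of missing values, can be illustrated with a short Python sketch. This is not the GDS implementation. The functions log_scale and min_max_scale are hypothetical, and the sketch assumes that the log scaler computes log(value + offset), that a missing value appears as None or NaN, and that a min-max scaler returns 0 when all present values are equal.

```python
import math


def is_missing(value):
    """Treat None and NaN as a missing property value."""
    return value is None or math.isnan(value)


def log_scale(values, offset=0.0):
    """Scale each value as log(value + offset); missing values yield NaN."""
    return [
        math.nan if is_missing(v) else math.log(v + offset)
        for v in values
    ]


def min_max_scale(values):
    """Min-max scale; missing values do not contribute to min or max."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return [math.nan for _ in values]
    low, high = min(present), max(present)
    span = high - low
    return [
        math.nan if is_missing(v) else ((v - low) / span if span else 0.0)
        for v in values
    ]


log_result = log_scale([0.0, 1.0, None, math.e - 1.0], offset=1.0)
min_max_result = min_max_scale([2.0, None, 4.0, math.nan, 3.0])
print(log_result)
print(min_max_result)
```

In this sketch an offset of 1.0 lets zero-valued properties pass through the log scaler, since log(0) is undefined. The missing entries stay in place as NaN, so the output keeps one value per node.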