Releases: neo4j/graph-data-science
GDS 1.6 Preview
Release Date: 20 May 2021
GDS 1.6 is compatible with Neo4j 4.0, 4.1, and 4.2 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Breaking changes
- Degree centrality has been promoted to the product tier
  - Added procedures: `gds.degree.stream.estimate`, `gds.degree.write.estimate`, `gds.degree.mutate`, `gds.degree.mutate.estimate`, `gds.degree.stats`, `gds.degree.stats.estimate`
  - Removed alpha procedures: `gds.alpha.degree.stream`, `gds.alpha.degree.write`
- Article Rank has been promoted to the product tier
  - Added procedures: `gds.articleRank.stream`, `gds.articleRank.stream.estimate`, `gds.articleRank.write`, `gds.articleRank.write.estimate`, `gds.articleRank.mutate`, `gds.articleRank.mutate.estimate`, `gds.articleRank.stats`, `gds.articleRank.stats.estimate`
  - Removed alpha procedures: `gds.alpha.articleRank.stream`, `gds.alpha.articleRank.write`
- Eigenvector Centrality has been promoted to the product tier
  - Added procedures: `gds.eigenvector.stream`, `gds.eigenvector.stream.estimate`, `gds.eigenvector.write`, `gds.eigenvector.write.estimate`, `gds.eigenvector.mutate`, `gds.eigenvector.mutate.estimate`, `gds.eigenvector.stats`, `gds.eigenvector.stats.estimate`
  - Removed alpha procedures: `gds.alpha.eigenvector.stream`, `gds.alpha.eigenvector.write`
- AStar has been promoted to the product tier
  - Added procedures: `gds.shortestPath.astar.stream`, `gds.shortestPath.astar.stream.estimate`, `gds.shortestPath.astar.write`, `gds.shortestPath.astar.write.estimate`, `gds.shortestPath.astar.mutate`, `gds.shortestPath.astar.mutate.estimate`
  - Removed beta procedures: `gds.beta.shortestPath.astar.stream`, `gds.beta.shortestPath.astar.stream.estimate`, `gds.beta.shortestPath.astar.write`, `gds.beta.shortestPath.astar.write.estimate`, `gds.beta.shortestPath.astar.mutate`, `gds.beta.shortestPath.astar.mutate.estimate`
  - The parameter `path` was removed. The path computation is controlled by the Cypher YIELD sub-clause.
- Yens K Shortest Paths has been promoted to the product tier:
  - Added procedures: `gds.shortestPath.yens.stream`, `gds.shortestPath.yens.stream.estimate`, `gds.shortestPath.yens.write`, `gds.shortestPath.yens.write.estimate`, `gds.shortestPath.yens.mutate`, `gds.shortestPath.yens.mutate.estimate`
  - Removed beta procedures: `gds.beta.shortestPath.yens.stream`, `gds.beta.shortestPath.yens.stream.estimate`, `gds.beta.shortestPath.yens.write`, `gds.beta.shortestPath.yens.write.estimate`, `gds.beta.shortestPath.yens.mutate`, `gds.beta.shortestPath.yens.mutate.estimate`
  - The parameter `path` was removed. The path computation is controlled by the Cypher YIELD sub-clause.
- Dijkstra Source-Target has been promoted to the product tier:
  - Added procedures: `gds.shortestPath.dijkstra.stream`, `gds.shortestPath.dijkstra.stream.estimate`, `gds.shortestPath.dijkstra.write`, `gds.shortestPath.dijkstra.write.estimate`, `gds.shortestPath.dijkstra.mutate`, `gds.shortestPath.dijkstra.mutate.estimate`
  - Removed beta procedures: `gds.beta.shortestPath.dijkstra.stream`, `gds.beta.shortestPath.dijkstra.stream.estimate`, `gds.beta.shortestPath.dijkstra.write`, `gds.beta.shortestPath.dijkstra.write.estimate`, `gds.beta.shortestPath.dijkstra.mutate`, `gds.beta.shortestPath.dijkstra.mutate.estimate`
  - The parameter `path` was removed. The path computation is controlled by the Cypher YIELD sub-clause.
- Dijkstra Single-Source has been promoted to the product tier:
  - Added procedures: `gds.allShortestPaths.dijkstra.stream`, `gds.allShortestPaths.dijkstra.stream.estimate`, `gds.allShortestPaths.dijkstra.write`, `gds.allShortestPaths.dijkstra.write.estimate`, `gds.allShortestPaths.dijkstra.mutate`, `gds.allShortestPaths.dijkstra.mutate.estimate`
  - Removed beta procedures: `gds.beta.allShortestPaths.dijkstra.stream`, `gds.beta.allShortestPaths.dijkstra.stream.estimate`, `gds.beta.allShortestPaths.dijkstra.write`, `gds.beta.allShortestPaths.dijkstra.write.estimate`, `gds.beta.allShortestPaths.dijkstra.mutate`, `gds.beta.allShortestPaths.dijkstra.mutate.estimate`
  - The parameter `path` was removed. The path computation is controlled by the Cypher YIELD sub-clause.
- Node2Vec has been promoted to the beta tier
  - Added procedures: `gds.beta.node2vec.stream`, `gds.beta.node2vec.stream.estimate`, `gds.beta.node2vec.write`, `gds.beta.node2vec.write.estimate`, `gds.beta.node2vec.mutate`, `gds.beta.node2vec.mutate.estimate`
  - Removed alpha procedures: `gds.alpha.node2vec.stream`, `gds.alpha.node2vec.write`
- The parameter `centerSamplingFactor` is renamed to `positiveSamplingFactor`
- The parameter `contextSamplingExponent` is renamed to `negativeSamplingExponent`
- The `maxStreakCount` configuration parameter is renamed to `patience`. It is used in the train modes of Node Classification and Link Prediction.
- The `maxIterations` and `minIterations` configuration parameters are renamed to `maxEpochs` and `minEpochs`. They are used in the train modes of Node Classification and Link Prediction.
- The `windowSize` configuration parameter is removed from the train modes of Node Classification and Link Prediction.
- The `gds.alpha.ml.linkPrediction.train` configuration parameter `classRatio` is renamed to `negativeClassWeight`. It is also mandatory now.
- Removed the `degreeAsProperty` configuration parameter from GraphSAGE
  - The same effect can be achieved by running `gds.degree.mutate` and using the mutated property as a feature for GraphSAGE training.
  - Important: GraphSAGE models persisted with earlier versions of GDS are not compatible with this version.
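The configuration renames and removals listed above can be applied mechanically to configuration maps that were written for earlier versions. The following is a minimal Python sketch, assuming configurations are kept as plain dictionaries. The helper `migrate_config`, the constant names, and the sample values are hypothetical and not part of GDS.

```python
# Hypothetical helper, not part of GDS: rewrites configuration keys
# according to the renames and removals listed in the notes above.

# Renames for the train modes of Node Classification and Link Prediction.
TRAIN_RENAMES = {
    "maxStreakCount": "patience",
    "maxIterations": "maxEpochs",
    "minIterations": "minEpochs",
}
# Parameter removed from those train modes.
TRAIN_REMOVED = {"windowSize"}

# gds.alpha.ml.linkPrediction.train has one additional rename.
LINK_PREDICTION_TRAIN_RENAMES = {**TRAIN_RENAMES, "classRatio": "negativeClassWeight"}

# Sampling parameter renames from the same list.
SAMPLING_RENAMES = {
    "centerSamplingFactor": "positiveSamplingFactor",
    "contextSamplingExponent": "negativeSamplingExponent",
}


def migrate_config(config, renames, removed=frozenset()):
    """Return a new dict with renamed keys applied and removed keys dropped."""
    migrated = {}
    for key, value in config.items():
        if key in removed:
            continue  # the parameter no longer exists
        migrated[renames.get(key, key)] = value
    return migrated


# Sample values are made up for illustration.
old_config = {"maxStreakCount": 3, "maxIterations": 100, "windowSize": 5, "classRatio": 1.5}
new_config = migrate_config(old_config, LINK_PREDICTION_TRAIN_RENAMES, TRAIN_REMOVED)
print(new_config)
```

Because `negativeClassWeight` is now mandatory, a migrated Link Prediction configuration that never set `classRatio` still needs a value supplied by hand.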
New features
- New ScaleProperties procedures to transform and scale node properties. Available scalers: Min-max, Max, Mean, Log, Standard Score, L1 Norm, L2 Norm
  - `gds.alpha.scaleProperties.stream`
  - `gds.alpha.scaleProperties.mutate`
- Added ability to create new in-memory graphs by filtering existing named graphs based on node and relationship properties with new catalog procedure `gds.beta.graph.create.subgraph`
- Two new centrality algorithms for influence maximization were contributed by community member @xkitsios
  - `gds.alpha.influenceMaximization.celf.stream`
  - `gds.alpha.influenceMaximization.greedy.stream`
- Link Prediction:
  - Added support for storing, loading and publishing Link Prediction models.
  - Added progress logging for `gds.alpha.ml.linkPrediction.train` and `gds.alpha.ml.linkPrediction.predict`.
  - Added write and stream modes to `gds.alpha.ml.linkPrediction.predict`: `gds.alpha.ml.linkPrediction.stream`, `gds.alpha.ml.linkPrediction.write`
  - Added estimate mode for Link Prediction: `gds.alpha.ml.linkPrediction.train.estimate`, gds.alpha.ml.lin...
1.5.2
Release Date: 11 May 2021
GDS 1.5 is compatible with Neo4j 4.0, 4.1, and 4.2 but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Bug fixes
- Fixed a bug in FastRPExtended concerning implementation internals. In particular, the output contained NaNs when propertyDimension == embeddingDimension.
- Fixed a bug where Alpha similarity algorithms in some cases could fail on division by 0 when writing results back.
- Fixed an issue where gds.graph.drop could take a long time when the graph contained node embeddings.
- Fixed a bug where gds.beta.graphSage.train was failing in the presence of array properties.
1.5.1
Release Date: 3 March, 2021
GDS 1.5 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Bug fixes
- Fixed a bug which caused `gds.graph.list` and `gds.graph.drop` to throw an error when specifying a graph with duplicate property keys by failing early.
- Fixed potential ArrayIndexOutOfBoundsException when running `gds.triangleCount` on a relationship-filtered graph.
- Fixed a bug that can lead to inconsistencies when writing or mutating new relationships created from a label-filtered graph.
Improvements
- Progress logging: Removed a "disabled" log message from the database startup when GDS was running in its default configuration. It is replaced with a more elaborate "enabled" message when the progress tracking feature is enabled.
- We now return the name of the current database in the error message if graph is not found.
1.5.0
Release Date: 9 February, 2021
GDS 1.5 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Breaking changes
- Promote several shortest path algorithms to `beta` tier: Dijkstra, A*, and Yens k-shortest paths. The APIs have been standardized, and all include the ability to return source/target nodes, nodes traversed, and paths.
  - This adds procedures
    - `gds.beta.shortestPath.dijkstra.mutate`, `gds.beta.shortestPath.dijkstra.mutate.estimate`, `gds.beta.shortestPath.dijkstra.stream`, `gds.beta.shortestPath.dijkstra.stream.estimate`, `gds.beta.shortestPath.dijkstra.write`, `gds.beta.shortestPath.dijkstra.write.estimate`
    - `gds.beta.shortestPath.astar.mutate`, `gds.beta.shortestPath.astar.mutate.estimate`, `gds.beta.shortestPath.astar.stream`, `gds.beta.shortestPath.astar.stream.estimate`, `gds.beta.shortestPath.astar.write`, `gds.beta.shortestPath.astar.write.estimate`
    - `gds.beta.shortestPath.yens.mutate`, `gds.beta.shortestPath.yens.mutate.estimate`, `gds.beta.shortestPath.yens.stream`, `gds.beta.shortestPath.yens.stream.estimate`, `gds.beta.shortestPath.yens.write`, `gds.beta.shortestPath.yens.write.estimate`
    - `gds.beta.allShortestPaths.dijkstra.mutate`, `gds.beta.allShortestPaths.dijkstra.mutate.estimate`, `gds.beta.allShortestPaths.dijkstra.stream`, `gds.beta.allShortestPaths.dijkstra.stream.estimate`, `gds.beta.allShortestPaths.dijkstra.write`, `gds.beta.allShortestPaths.dijkstra.write.estimate`
  - And removes alpha procedures
    - `gds.alpha.shortestPath.stream`, `gds.alpha.shortestPath.write`, `gds.alpha.shortestPath.astar.stream`, `gds.alpha.kShortestPaths.stream`, `gds.alpha.kShortestPaths.write`, `gds.alpha.shortestPaths.stream`, `gds.alpha.shortestPaths.write`
- GDS will now throw an error when a user tries to use a mutate procedure on graphs not stored in the graph catalog (anonymous graphs)
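The standardized shortest path procedures added in this release follow a regular naming pattern: an algorithm prefix, an execution mode, and an optional `.estimate` suffix. The Python sketch below enumerates the beta procedure names from that pattern. The function name `beta_path_procedures` is hypothetical, and the prefixes and modes are copied from the list of added procedures above.

```python
from itertools import product

# Algorithm prefixes and execution modes taken from the added procedures above.
ALGORITHMS = [
    "gds.beta.shortestPath.dijkstra",
    "gds.beta.shortestPath.astar",
    "gds.beta.shortestPath.yens",
    "gds.beta.allShortestPaths.dijkstra",
]
MODES = ["mutate", "stream", "write"]


def beta_path_procedures():
    """Enumerate every <algorithm>.<mode> name together with its .estimate variant."""
    names = []
    for algorithm, mode in product(ALGORITHMS, MODES):
        names.append(f"{algorithm}.{mode}")
        names.append(f"{algorithm}.{mode}.estimate")
    return names


procedures = beta_path_procedures()
# 4 algorithms x 3 modes x 2 variants (plain and estimate)
print(len(procedures))
for name in procedures:
    print(name)
```

A list like this can be handy when searching existing queries for the removed alpha names and deciding which beta procedure replaces each call.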
New Features
- Introduced machine learning based multi-class node classification procedures:
  - Add `gds.alpha.ml.nodeClassification.train` to train a model to predict a node label
  - Add `gds.alpha.ml.nodeClassification.predict.mutate` to make predictions using a trained model
- Introduced machine learning based link prediction procedures:
  - Add `gds.alpha.linkPrediction.train` procedure for training Link Prediction models.
  - Added `gds.alpha.linkPrediction.predict.mutate` procedure for predicting relationships based on a trained Link Prediction model.
- Added support for list properties as features for
  - `gds.alpha.nodeClassification`
  - `gds.beta.fastRPExtended`
  - `gds.beta.graphSage`
- Added support for storing trained models on disk (Enterprise only)
  - `gds.alpha.model.store`
  - `gds.alpha.model.load`
  - `gds.alpha.model.delete`
- Added procedure for publishing trained models (Enterprise only)
  - `gds.alpha.model.publish`
- Added HITS algorithm to the alpha tier
  - `gds.alpha.hits.mutate` and `gds.alpha.hits.mutate.estimate`
  - `gds.alpha.hits.stats` and `gds.alpha.hits.stats.estimate`
  - `gds.alpha.hits.stream` and `gds.alpha.hits.stream.estimate`
  - `gds.alpha.hits.write` and `gds.alpha.hits.write.estimate`
- Added Speaker-Listener Label Propagation Algorithm (SLLPA) to the alpha tier
  - `gds.alpha.sllpa.mutate` and `gds.alpha.sllpa.mutate.estimate`
  - `gds.alpha.sllpa.stats` and `gds.alpha.sllpa.stats.estimate`
  - `gds.alpha.sllpa.stream` and `gds.alpha.sllpa.stream.estimate`
  - `gds.alpha.sllpa.write` and `gds.alpha.sllpa.write.estimate`
- Added CSV export capabilities with the `gds.beta.graph.export.csv` procedure to allow users to export their in-memory graph to CSV
- Add message reducer capability to Pregel framework to improve memory consumption and computation runtime.
- Added a progress logging procedure with `gds.beta.listProgress`, to return status of running algorithms. This is turned off by default, but can be enabled with `gds.progress_tracking_enabled` in the config.
- Add a new `BitIdMap` data structure to represent node id mappings (Enterprise only)
  - The data structure can lead to a significant reduction in required heap space for an in-memory graph.
  - The data structure is used for native graph projections and in some algorithms, e.g., Louvain.
  - The data structure is not used in Cypher projections.
  - The feature is enabled by default on GDS Enterprise Edition and can be disabled using the `USE_BIT_ID_MAP` feature toggle.
Bug fixes
- Adding projection parameters as additional configuration in `gds.graph.create` and `gds.graph.create.cypher` will throw an exception if improperly configured, instead of being silently ignored.
- Fixed a bug in `gds.alpha.articleRank` where centrality scores were not normalized correctly
- Fixed a bug in path stream procedures where the path object (`path: true`) used incorrect node identifiers.
- Fixed a bug in path write procedures where the relationship property `nodeIds` contained incorrect node identifiers.
- Fixed a race condition that could cause exceptions thrown by scheduled tasks to be suppressed.
Improvements
- Improved progress logging to write progress per individual node label in `gds.graph.writeNodeProperties`.
- When a named graph does not exist, the graph catalog will display similarly named stored graphs.
- When a saved model does not exist, the model catalog will display similarly named stored models.
- Added `centralityDistribution` to the return fields for the write mode of the alpha centrality algorithms.
- `gds.beta.graph.generate` using `relationshipDistribution: 'POWER_LAW'` applies the distribution to the native orientation.
- Added `centralityDistribution` as a return field in `gds.betweenness.[write/mutate/stats]`
- Added `getNeighbours` and `isMultiGraph` to the Pregel-API.
- Added new message queue implementations for the Pregel framework, which
  - replace the previously used JCTools queue and work with primitive double arrays instead of boxed values.
  - lead to 3x to 5x faster runtimes for Pregel based algorithms.
  - reduce GC pressure due to fewer object allocations, which leads to more predictable runtimes.
  - support synchronous and asynchronous Pregel computations.
Other Changes
- The PageRank configuration parameter `cacheWeights` has been deprecated. The parameter had no effect.
- Deprecated `minimumScore`, `maximumScore`, `scoreSum` return fields in `gds.betweenness.[write/mutate/stats]`
GDS 1.5 Preview
Release date: 29 January, 2021
Warning: This is a preview release and not intended for production use. If you have any feedback, please let us know: https://github.com/neo4j/graph-data-science/issues
GDS 1.5 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Breaking changes
- Promote several shortest path algorithms to `beta` tier: Dijkstra, A*, and Yens k-shortest paths. The APIs have been standardized, and all include the ability to return source/target nodes, nodes traversed, and paths.
  - This adds procedures
    - `gds.beta.shortestPath.dijkstra.mutate`, `gds.beta.shortestPath.dijkstra.mutate.estimate`, `gds.beta.shortestPath.dijkstra.stream`, `gds.beta.shortestPath.dijkstra.stream.estimate`, `gds.beta.shortestPath.dijkstra.write`, `gds.beta.shortestPath.dijkstra.write.estimate`
    - `gds.beta.shortestPath.astar.mutate`, `gds.beta.shortestPath.astar.mutate.estimate`, `gds.beta.shortestPath.astar.stream`, `gds.beta.shortestPath.astar.stream.estimate`, `gds.beta.shortestPath.astar.write`, `gds.beta.shortestPath.astar.write.estimate`
    - `gds.beta.shortestPath.yens.mutate`, `gds.beta.shortestPath.yens.mutate.estimate`, `gds.beta.shortestPath.yens.stream`, `gds.beta.shortestPath.yens.stream.estimate`, `gds.beta.shortestPath.yens.write`, `gds.beta.shortestPath.yens.write.estimate`
    - `gds.beta.allShortestPaths.dijkstra.mutate`, `gds.beta.allShortestPaths.dijkstra.mutate.estimate`, `gds.beta.allShortestPaths.dijkstra.stream`, `gds.beta.allShortestPaths.dijkstra.stream.estimate`, `gds.beta.allShortestPaths.dijkstra.write`, `gds.beta.allShortestPaths.dijkstra.write.estimate`
  - And removes alpha procedures
    - `gds.alpha.shortestPath.stream`, `gds.alpha.shortestPath.write`, `gds.alpha.shortestPath.astar.stream`, `gds.alpha.kShortestPaths.stream`, `gds.alpha.kShortestPaths.write`, `gds.alpha.shortestPaths.stream`, `gds.alpha.shortestPaths.write`
- GDS will now throw an error when a user tries to use a mutate procedure on graphs not stored in the graph catalog (anonymous graphs)
New Features
- Introduced machine learning based multi-class node classification procedures:
  - Add `gds.alpha.ml.nodeClassification.train` to train a model to predict a node label
  - Add `gds.alpha.ml.nodeClassification.predict.mutate` to make predictions using a trained model
- Introduced machine learning based link prediction procedures:
  - Add `gds.alpha.linkPrediction.train` procedure for training Link Prediction models.
  - Added `gds.alpha.linkPrediction.predict.mutate` procedure for predicting relationships based on a trained Link Prediction model.
- Added support for list properties as features for
  - `gds.alpha.nodeClassification`
  - `gds.beta.fastRPExtended`
  - `gds.beta.graphSage`
- Added support for storing trained models on disk (Enterprise only)
  - `gds.alpha.model.store`
  - `gds.alpha.model.load`
  - `gds.alpha.model.delete`
- Added procedure for publishing trained models (Enterprise only)
  - `gds.alpha.model.publish`
- Added HITS algorithm to the alpha tier
  - `gds.alpha.hits.mutate` and `gds.alpha.hits.mutate.estimate`
  - `gds.alpha.hits.stats` and `gds.alpha.hits.stats.estimate`
  - `gds.alpha.hits.stream` and `gds.alpha.hits.stream.estimate`
  - `gds.alpha.hits.write` and `gds.alpha.hits.write.estimate`
- Added Speaker-Listener Label Propagation Algorithm (SLLPA) to the alpha tier
  - `gds.alpha.sllpa.mutate` and `gds.alpha.sllpa.mutate.estimate`
  - `gds.alpha.sllpa.stats` and `gds.alpha.sllpa.stats.estimate`
  - `gds.alpha.sllpa.stream` and `gds.alpha.sllpa.stream.estimate`
  - `gds.alpha.sllpa.write` and `gds.alpha.sllpa.write.estimate`
- Added CSV export capabilities with the `gds.beta.graph.export.csv` procedure to allow users to export their in-memory graph to CSV
- Added a progress logging procedure with `gds.beta.listProgress`, to return status of running algorithms. This is turned off by default, but can be enabled with `gds.progress_tracking_enabled` in the config.
- Add message reducer capability to Pregel framework to improve memory consumption and computation runtime.
- Add a new `BitIdMap` data structure to represent node id mappings (Enterprise only)
  - The data structure can lead to a significant reduction in required heap space for an in-memory graph.
  - The data structure is used for native graph projections and in some algorithms, e.g., Louvain.
  - The data structure is not used in Cypher projections.
  - The feature is enabled by default on GDS Enterprise Edition and can be disabled using the `USE_BIT_ID_MAP` feature toggle.
Bug fixes
- Adding projection parameters as additional configuration in `gds.graph.create` and `gds.graph.create.cypher` will throw an exception if improperly configured, instead of being silently ignored.
- Fixed a bug in `gds.alpha.articleRank` where centrality scores were not normalized correctly
Improvements
- Improved progress logging to write progress per individual node label in `gds.graph.writeNodeProperties`.
- When a named graph does not exist, the graph catalog will display similarly named stored graphs.
- When a saved model does not exist, the model catalog will display similarly named stored models.
- Add `centralityDistribution` to the return fields for the write mode of the alpha centrality algorithms.
- `gds.beta.graph.generate` using `relationshipDistribution: 'POWER_LAW'` applies the distribution to the native orientation.
- Add `getNeighbours` and `isMultiGraph` to the Pregel-API.
- Add `centralityDistribution` as a return field in `gds.betweenness.[write/mutate/stats]`
Other Changes
- The PageRank configuration parameter `cacheWeights` has been deprecated. The parameter had no effect.
- Deprecate `minimumScore`, `maximumScore`, `scoreSum` return fields in `gds.betweenness.[write/mutate/stats]`
GDS 1.4.1
Release date: 7 December, 2020
GDS 1.4.1 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Bug fixes
- Fixed a bug in progress logging for `gds.graph.writeNodeProperties()` and `gds.graph.writeRelationships()` where some percentages were missed, or others reported multiple times.
- Fixed a bug where `gds.graph.writeNodeProperties()` and `gds.alpha.shortestPathDeltaStepping.write()` were single threaded by default
- Fixed a bug where `gds.alpha.node2vec` ignored relationships for graphs with multiple projected relationship types.
- Fixed a bug where `gds.pageRank.*.estimate` would fail for very large node counts.
- Fixed a bug where using float array node properties (e.g. after running `gds.fastRP.mutate`) would fail in some situations.
- Fixed a bug where a graph with multiple labels and all nodes sharing at least one label could lead to either an exception or a wrongly mapped Neo4j id.
Improvements
- `gds.pageRank` will now select batches more dynamically to properly respect the requested concurrency.
1.3.5
Release date: 23 November, 2020
GDS 1.3.5 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x or 4.2. For a 3.5 compatible release, please see GDS 1.1.6. For a 4.2 compatible release, please see GDS 1.4.0.
See also 1.3.0 release notes, 1.3.1 release notes, 1.3.2 release notes, 1.3.3 release notes, and 1.3.4 release notes.
Bug fixes
- Fixed a bug in `gds.graph.export` where at most one relationship property per relationship type would be exported.
- Fixed a bug in Louvain where changes to `maxIterations` were ignored.
- Fixed a bug where `gds.alpha.node2vec` would ignore relationships for graphs with multiple projected relationship types.
GDS 1.4.0
Release date: 5 November, 2020
GDS 1.4.0 is compatible with Neo4j 4.0 and 4.1, but not Neo4j 3.5.x. For a 3.5 compatible release, please see GDS 1.1.6.
Breaking changes
- License key configuration was renamed from `licenseFile` to `license_file` for consistency with Bloom
- Removed sparsity parameter from `gds.alpha.randomProjection.*`
- Renamed `gds.alpha.randomProjection` to `gds.fastRP` due to productization.
- Renamed `embeddingSize` parameter to `embeddingDimension` for fastRP, GraphSAGE and Node2Vec.
- Renamed `projectedFeatureSize` to `projectedFeatureDimension` for GraphSAGE
- Renamed `nodePropertyNames` to `featureProperties` in `gds.beta.fastRPExtended` and `gds.beta.graphSage.train`
- Default parameters for `gds.fastRP` have changed on the following configuration parameters:
  - `iterationWeights` now has default `[0.0, 1.0, 1.0]`
  - `normalizeL2` has been removed and its effect is always applied
- Removed alpha procedures for GraphSage (replaced with `beta` tier, see New Features section)
  - `gds.alpha.graphSage.stream`
  - `gds.alpha.graphSage.write`
- GraphSage no longer directly calculates embeddings, instead it has been split into `train` (to generate a named model) and `write`, `mutate`, and `stream` to apply the model predictions to your data.
- Due to the creation of a `train` mode for GraphSage, the following configuration parameters were moved to `gds.beta.graphSage.train`: `embeddingSize`, `aggregator`, `activationFunction`, `sampleSizes`, `nodePropertyNames`, `tolerance`, `learningRate`, `epochs`, `maxIterations`, `searchDepth`, `negativeSampleWeight`, `degreeAsProperty`
- `gds.beta.graphSage.stream` procedure now requires `modelName` configuration parameter.
- `gds.beta.graphSage.write` procedure requires `modelName` configuration parameter.
- Removed `startLoss` and `epochLosses` from the result columns of `gds.beta.graphSage.write`.
- Added the graph create config as a return field to the train procedure, affecting `gds.beta.graphSage.train`
- Fixed result column name `embeddings` to `embedding` in GraphSAGE, to align with the other embeddings.
- Removed configuration parameter `maxCost` from `gds.alpha.bfs/dfs`.
- Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
- Removed `degreeDistribution` from `gds.graph.drop` return columns.
- `gds.pageRank` now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
- Alpha similarity algorithms no longer accept graph name as a parameter. The algorithm never used the named graph, and now the possibility to specify one is removed.
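The GraphSage split into a `train` mode and separate `write`, `mutate`, and `stream` modes means that a configuration written for the old single-step procedures has to be divided in two. The Python sketch below illustrates one way to do that, using the moved parameters and renames listed above. The helper `split_graphsage_config` and the sample values are hypothetical, and passing `modelName` to the train configuration as well is an assumption, based on the note that `train` generates a named model.

```python
# Parameters that moved to gds.beta.graphSage.train, as listed above.
TRAIN_ONLY = {
    "embeddingSize", "aggregator", "activationFunction", "sampleSizes",
    "nodePropertyNames", "tolerance", "learningRate", "epochs",
    "maxIterations", "searchDepth", "negativeSampleWeight", "degreeAsProperty",
}

# Renames from the same release that affect GraphSage parameters.
RENAMES = {
    "embeddingSize": "embeddingDimension",
    "projectedFeatureSize": "projectedFeatureDimension",
    "nodePropertyNames": "featureProperties",
}


def split_graphsage_config(old_config, model_name):
    """Split a pre-1.4 GraphSage config into (train_config, apply_config).

    Hypothetical helper, not part of GDS. Both results carry modelName;
    for the train config this is an assumption, see the text above.
    """
    train_config = {"modelName": model_name}
    apply_config = {"modelName": model_name}
    for key, value in old_config.items():
        target = train_config if key in TRAIN_ONLY else apply_config
        target[RENAMES.get(key, key)] = value
    return train_config, apply_config


# Sample values are made up for illustration.
old_config = {"embeddingSize": 64, "epochs": 5, "writeProperty": "embedding"}
train_config, apply_config = split_graphsage_config(old_config, "myModel")
print(train_config)
print(apply_config)
```

Keys outside the moved list stay with the configuration for the `write`, `mutate`, or `stream` call, which then refers to the trained model by name.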
New features
- Promote GraphSage to `beta` tier and added support for inductive models with the `train` mode
  - This adds procedures: `gds.beta.graphSage.mutate`, `gds.beta.graphSage.mutate.estimate`, `gds.beta.graphSage.stream`, `gds.beta.graphSage.stream.estimate`, `gds.beta.graphSage.train`, `gds.beta.graphSage.train.estimate`, `gds.beta.graphSage.write`, `gds.beta.graphSage.write.estimate`
  - And removes alpha procedures: `gds.alpha.graphSage.stream`, `gds.alpha.graphSage.write`
- GraphSage supports relationship weights, driven by `relationshipWeightProperty`
- GraphSage supports node labels via `projectedFeatureSize`
- Introduced the model catalog to manage trained models, including:
  - `gds.beta.model.exists` - a procedure to check if a model exists in the catalog
  - `gds.beta.model.list` - list all available models
  - `gds.beta.model.drop` - removes a model from the catalog
- The Random Projection algorithm has been promoted to the product tier and we have added:
  - `gds.fastRP.stats`
  - `gds.fastRP.mutate`
  - `gds.fastRP.estimate`
  - Added procedures for `stats` and `mutate` mode, as well as `estimates` for all modes.
- FastRP has been extended to support relationship weights and directions
- FastRP supports integer configuration for iteration weights.
- We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended: `gds.beta.fastRPExtended.mutate`, `gds.beta.fastRPExtended.stream`, `gds.beta.fastRPExtended.stats`, `gds.beta.fastRPExtended.write`, `gds.beta.fastRPExtended.mutate.estimate`, `gds.beta.fastRPExtended.stream.estimate`, `gds.beta.fastRPExtended.stats.estimate`, `gds.beta.fastRPExtended.write.estimate`
- We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier
  - `gds.beta.knn.mutate` and `gds.beta.knn.mutate.estimate`
  - `gds.beta.knn.stats` and `gds.beta.knn.stats.estimate`
  - `gds.beta.knn.stream` and `gds.beta.knn.stream.estimate`
  - `gds.beta.knn.write` and `gds.beta.knn.write.estimate`
- The in-memory graph can now support list properties, enabling embedding results to be stored in memory, or loading embeddings from nodes for KNN or similarity calculations.
- Pregel framework
  - Added Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
  - Pregel now supports long and double array node values.
  - Add support for composite node state to allow complex data types on nodes.
  - Reduced memory consumption.
  - Improved memory estimation.
  - Simplified message iteration in `compute` methods.
  - Split context into Init- and ComputeContext and simplified API.
  - Removed `K1ColoringExample` standalone project.
  - Added `pregel-bootstrap` standalone project.
  - Added `pregel-examples` module.
- Licensing: GDS Enterprise edition now requires license keys issued by Neo4j to unlock enterprise features
- Added `density` property to the output of graph in `graph.list`.
- Added a `failIfMissing` flag to `gds.graph.drop`
Bug fixes
- Pregel:
  - Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
  - Fix cast exception when returning array node properties in generated Pregel procedures.
- Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
  - `gds.alpha.closeness`
  - `gds.alpha.closeness.harmonic`
  - `gds.alpha.allShortestPaths`
- Fixed a bug in `gds.alpha.shortestPath.deltaStepping` where large relationship weights led to incorrect results
- Weakly connected components:
  - Fixed a bug in WCC where `componentCount` would be negative when the graph is empty.
  - Fixed a regression where WCC could run more slowly with increased concurrency.
- Fixed bugs in Louvain:
  - `communityCount` is no longer negative when the graph is empty.
  - changes to `maxIterations` are no longer ignored.
- Fixed a bug in LabelPropagation where `communityCount` would be negative when the graph is empty.
- Fixed a bug in KNN where it failed when run on graphs with filtere...
GDS 1.4 Preview
Breaking changes
- Removed sparsity parameter from `gds.alpha.randomProjection.*`
- Renamed `gds.alpha.randomProjection` to `gds.fastRP` due to productization.
- Renamed `embeddingSize` parameter to `embeddingDimension` for fastRP, GraphSAGE and Node2Vec.
- Default parameters for `gds.fastRP` have changed on the following configuration parameters:
  - `iterationWeights` now has default `[0.0, 1.0, 1.0]`
  - `normalizeL2` has been removed and its effect is always applied
- Removed alpha procedures for GraphSage (replaced with `beta` tier, see New Features section)
  - `gds.alpha.graphSage.stream`
  - `gds.alpha.graphSage.write`
- GraphSage no longer directly calculates embeddings, instead it has been split into `train` (to generate a named model) and `write`, `mutate`, and `stream` to apply the model predictions to your data.
- Due to the creation of a `train` mode for GraphSage, the following configuration parameters were moved to `gds.beta.graphSage.train`: `embeddingSize`, `aggregator`, `activationFunction`, `sampleSizes`, `nodePropertyNames`, `tolerance`, `learningRate`, `epochs`, `maxIterations`, `searchDepth`, `negativeSampleWeight`, `degreeAsProperty`
- `gds.beta.graphSage.stream` procedure now requires `modelName` configuration parameter.
- `gds.beta.graphSage.write` procedure requires `modelName` configuration parameter.
- Removed `startLoss` and `epochLosses` from the result columns of `gds.beta.graphSage.write`.
- Added the graph create config as a return field to the train procedure, affecting `gds.beta.graphSage.train`
- Fixed result column name `embeddings` to `embedding` in GraphSAGE, to align with the other embeddings.
- Removed configuration parameter `maxCost` from `gds.alpha.bfs/dfs`.
- Unlocking the Enterprise Edition of the Graph Data Science library requires a license key. The previous config setting has been removed.
- Removed `degreeDistribution` from `gds.graph.drop` return columns.
- `gds.pageRank` now respects the concurrency setting. It will not run if there is insufficient memory for the given concurrency setting.
- Alpha similarity algorithms no longer accept graph name as a parameter. The algorithm never used the named graph, and now the possibility to specify one is removed.
New features
- Promote GraphSage to `beta` tier and added support for inductive models with the `train` mode
  - This adds procedures: `gds.beta.graphSage.mutate`, `gds.beta.graphSage.mutate.estimate`, `gds.beta.graphSage.stream`, `gds.beta.graphSage.stream.estimate`, `gds.beta.graphSage.train`, `gds.beta.graphSage.train.estimate`, `gds.beta.graphSage.write`, `gds.beta.graphSage.write.estimate`
  - And removes alpha procedures: `gds.alpha.graphSage.stream`, `gds.alpha.graphSage.write`
- GraphSage supports relationship weights, driven by `relationshipWeightProperty`
- GraphSage supports node labels via `projectedFeatureSize`
- Introduced the model catalog to manage trained models, including:
  - `gds.beta.model.exists` - a procedure to check if a model exists in the catalog
  - `gds.beta.model.list` - list all available models
  - `gds.beta.model.drop` - removes a model from the catalog
- The Random Projection algorithm has been promoted to the product tier and we have added:
  - `gds.fastRP.stats`
  - `gds.fastRP.mutate`
  - `gds.fastRP.estimate`
  - Added procedures for `stats` and `mutate` mode, as well as `estimates` for all modes.
- FastRP has been extended to support relationship weights and directions
- FastRP supports integer configuration for iteration weights.
- We’ve added support for node property features for FastRP in the beta namespace with FastRPExtended: `gds.beta.fastRPExtended.mutate`, `gds.beta.fastRPExtended.stream`, `gds.beta.fastRPExtended.stats`, `gds.beta.fastRPExtended.write`, `gds.beta.fastRPExtended.mutate.estimate`, `gds.beta.fastRPExtended.stream.estimate`, `gds.beta.fastRPExtended.stats.estimate`, `gds.beta.fastRPExtended.write.estimate`
- We’ve added the K-Nearest Neighbors (KNN) algorithm to the beta tier
  - `gds.beta.knn.mutate` and `gds.beta.knn.mutate.estimate`
  - `gds.beta.knn.stats` and `gds.beta.knn.stats.estimate`
  - `gds.beta.knn.stream` and `gds.beta.knn.stream.estimate`
  - `gds.beta.knn.write` and `gds.beta.knn.write.estimate`
- The in-memory graph can now support list properties, enabling embedding results to be stored in memory, or loading embeddings from nodes for KNN or similarity calculations.
- Pregel framework
  - Added Pregel annotation processor to generate GDS procedures for custom Pregel algorithms.
  - Pregel now supports long and double array node values.
  - Added support for composite node state to allow complex data types on nodes.
  - Reduced memory consumption.
  - Improved memory estimation.
  - Simplified message iteration in `compute` methods.
  - Split context into Init- and ComputeContext and simplified API.
  - Removed `K1ColoringExample` standalone project.
  - Added `pregel-bootstrap` standalone project.
  - Added `pregel-examples` module.
- Licensing: GDS Enterprise edition now requires license keys issued by Neo4j to unlock enterprise features
- Added a `density` property to the graph output of `gds.graph.list`.
- Added a `failIfMissing` flag to `gds.graph.drop`
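To picture what the KNN and list-property items above make possible, here is a minimal brute-force sketch in Python. It is not the GDS implementation: the function names, the sample vectors, and the choice of cosine similarity as the metric are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length float lists; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_neighbors(embeddings, k):
    """For every node, return its k most similar other nodes as (neighbor, score) pairs."""
    result = {}
    for node, vec in embeddings.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in embeddings.items()
            if other != node
        ]
        # Highest similarity first, then keep only the top k.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[node] = scored[:k]
    return result

# Hypothetical list-valued node properties (for example embeddings), keyed by node id.
embeddings = {
    0: [1.0, 0.0],
    1: [0.9, 0.1],
    2: [0.0, 1.0],
}
neighbors = top_k_neighbors(embeddings, k=1)
```

This compares every pair of nodes, so it only suits small inputs. The library's own KNN procedures may use a different strategy and metric.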
Bug fixes
- Pregel:
  - Fixed a bug in Pregel that could lead to incorrect results when running in parallel.
  - Fixed a cast exception when returning array node properties in generated Pregel procedures.
- Fixed a bug in a multi-source BFS traversal strategy that could affect the following procedures:
  - `gds.alpha.closeness`
  - `gds.alpha.closeness.harmonic`
  - `gds.alpha.allShortestPaths`
- Weakly connected components:
  - Fixed a bug in WCC where `componentCount` would be negative when the graph is empty.
  - Fixed a regression where WCC could run more slowly with increased concurrency.
- Fixed bugs in Louvain:
  - `communityCount` is no longer negative when the graph is empty.
  - Changes to `maxIterations` are no longer ignored.
- Fixed a bug in LabelPropagation where `communityCount` would be negative when the graph is empty.
- Fixed a bug in `gds.graph.export` where at most one relationship property per relationship type would be exported.
- Graph loading:
  - Fixed a bug where using node label projections including properties on large graphs and high concurrency could lead to loss of some properties.
  - Fixed a bug in graph creation which could cause an ArrayIndexOutOfBoundsException (AIOOB) during node loading.
  - The `readConcurrency` config parameter can no longer be overwritten by the `concurrency` param when it is explicitly set in an implicit graph creation config
- Fixed a bug in memory estimation of large anonymous fictitious graphs.
- Fixed a bug in `gds.alpha.dfs/bfs` where the algorithm did not terminate for graphs containing loops.
- Fixed result column name `embeddings` to `embedding` in GraphSAGE, to align with the other embeddings.
- Fixed a bug in Node2Vec where many disconnected nodes would cause a StackOverflowError
- Fixed a bug in RandomProjection where each iteration weight was multiplied by all previous iteration weights.
- Similarity algorithms:
- Fixed a bug where Alpha Similarity algorithms would load a graph even though it was not needed
- Fixed a bug where similarity algorithms would not remove the placeholder graph if config validation fails on invalid user input.
- Fixed a bug where community statistic computation could overflow for large community ids.
- Fixed a bug where DegreeCentrality returned incorrect values when concurrency > 1.
- Fixed a bug where ClosenessCentrality was using a slightly incorrect formula for the Wasserman-Faust algorithm.
- Fixed a bug that affected `gds.triangleCount()` and `gds.alpha.triangles()` where not all triangles would be counted under certain conditions.
- Parallel edges in a graph no longer lead to incorrect Local Clustering Coefficient and Triangle Count results.
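The RandomProjection fix listed above is easier to see with numbers. The sketch below is one reading of the bug description, not the library's code: the function names are hypothetical, and the "buggy" variant assumes that each configured weight was scaled by the product of all earlier weights.

```python
def effective_weights_buggy(iteration_weights):
    """Assumed old behaviour: each weight is multiplied by all previous weights."""
    effective = []
    running = 1.0
    for weight in iteration_weights:
        running *= weight
        effective.append(running)
    return effective

def effective_weights_fixed(iteration_weights):
    """Assumed fixed behaviour: each iteration uses exactly its configured weight."""
    return [float(weight) for weight in iteration_weights]

# A leading weight of 0 is a common way to skip the first iteration.
weights = [0, 1, 2]
buggy = effective_weights_buggy(weights)
fixed = effective_weights_fixed(weights)
```

Under this reading, a leading weight of `0` would have zeroed out every later iteration, which is why the fix matters for configurations that deliberately skip early iterations.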
Improvements
- `gds.fastRP` now accepts integer `iterationWeights`
- If `graphSage.train` is run on a graph without relationships, GDS now fails gracefully with an appropriate error message
- Added validation that properties used by GraphSage exist on the graph
- Added validation for `embeddingSize` >= 1
- Added a failIfExists flag to graph creation to enable a user to specify that if a graph already exists, it should be overwritten without failing.
- Progress logging:
- We now log progress in equally spaced percentages. This is 0-100% either in steps of 1, or in ...
1.4.0-alpha06
Tagging for 1.4.0-alpha06