| 1 | +--- |
| 2 | +title: K-nearest neighbours (KNN) |
| 3 | +description: Discover Memgraph's k-nearest neighbors algorithm for finding similar nodes based on cosine similarity. Access comprehensive documentation and examples to efficiently identify the most similar nodes in your graph data. |
| 4 | +--- |
| 5 | + |
| 6 | +import { Callout } from 'nextra/components' |
| 7 | +import { Steps } from 'nextra/components' |
| 8 | +import { Cards } from 'nextra/components' |
| 9 | +import GitHub from '/components/icons/GitHub' |
| 10 | + |
| 11 | +# K-nearest neighbours |
| 12 | + |
| 13 | +The **k-nearest neighbors** algorithm finds the k most similar nodes to each node in the graph based |
| 14 | +on cosine similarity between their properties. The algorithm is based on NN-Descent, introduced in the paper |
| 15 | +["Efficient k-nearest neighbor graph construction for generic similarity measures"](https://dl.acm.org/doi/abs/10.1145/1963405.1963487), |
| 16 | +and can run in parallel across multiple threads (see the `concurrency` parameter). |
| 17 | + |
| 18 | +Cosine similarity is computed separately for each specified property. If multiple properties are specified, |
| 19 | +the per-property similarities are averaged to produce a single similarity score. This makes the algorithm particularly useful for |
| 20 | +finding nodes with similar embeddings, features, or other vector-based properties. |
| 21 | + |
| 22 | +<Cards> |
| 23 | + <Cards.Card |
| 24 | + icon={<GitHub />} |
| 25 | + title="Source code" |
| 26 | + href="https://github.com/memgraph/mage/tree/main/cpp/knn_module" |
| 27 | + /> |
| 28 | +</Cards> |
| 29 | + |
| 30 | +| Trait | Value | |
| 31 | +| ------------------- | ------------------- | |
| 32 | +| **Module type** | algorithm | |
| 33 | +| **Implementation** | C++ | |
| 34 | +| **Graph direction** | directed/undirected | |
| 35 | +| **Edge weights** | unweighted | |
| 36 | +| **Parallelism** | parallel | |
| 37 | + |
| 38 | +## Procedures |
| 39 | + |
| 40 | +<Callout type="info"> |
| 41 | +You can execute this algorithm on [graph projections, subgraphs or portions of the graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph). |
| 42 | +</Callout> |
| 43 | + |
| 44 | +### `get()` |
| 45 | + |
| 46 | +The procedure finds the k most similar neighbors for each node based on cosine similarity between their properties. |
| 47 | + |
| 48 | +{<h4 className="custom-header"> Input: </h4>} |
| 49 | + |
| 50 | +- `subgraph: Graph` (**OPTIONAL**) ➡ A specific subgraph, which is an [object of type Graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph) returned by the `project()` function, on which the algorithm is run. |
| 51 | +If subgraph is not specified, the algorithm is computed on the entire graph by default. |
| 52 | + |
| 53 | +- `nodeProperties: string | List[string]` ➡ Name(s) of the properties used to calculate similarity. If multiple properties are provided, their similarities are averaged. This field is required, and every property must be of type `List[double]`. |
| 54 | +- `topK: integer` ➡ Number of nearest neighbors to find for each node. |
| 55 | +- `similarityCutoff: double (default=0.0)` ➡ Minimum similarity threshold. Neighbors with similarity below this value will not be returned. |
| 56 | +- `concurrency: integer (default=1)` ➡ Number of parallel threads to use for computation. |
| 57 | +- `maxIterations: integer (default=100)` ➡ Maximum number of iterations the algorithm performs if it has not yet converged. |
| 58 | +- `sampleRate: double (default=0.5)` ➡ Sampling rate used when introducing new candidate neighbors to each node. |
| 59 | +- `deltaThreshold: double (default=0.001)` ➡ Early-termination threshold for convergence, as described in the algorithm paper. |
| 60 | +- `randomSeed: integer` ➡ Random seed for deterministic results. If not specified, the seed will be randomly generated. |
| 61 | + |
| 62 | +{<h4 className="custom-header"> Output: </h4>} |
| 63 | + |
| 64 | +- `node` ➡ Source node for which neighbors are found. |
| 65 | +- `neighbour` ➡ Neighbor node that is similar to the source node. |
| 66 | +- `similarity` ➡ Cosine similarity score between the source node and neighbor (0.0 to 1.0). |
| 67 | + |
| 68 | +{<h4 className="custom-header"> Usage: </h4>} |
| 69 | + |
| 70 | +To find k-nearest neighbors for all nodes in the graph: |
| 71 | + |
| 72 | +```cypher |
| 73 | +CALL knn.get({nodeProperties: "embedding", concurrency: 10, topK: 2}) |
| 74 | +YIELD node, neighbour, similarity |
| 75 | +CREATE (node)-[:IS_SIMILAR_TO {similarity: similarity}]->(neighbour); |
| 76 | +``` |
| 77 | + |
| 78 | +To find k-nearest neighbors on a subgraph: |
| 79 | + |
| 80 | +```cypher |
| 81 | +MATCH (n:SpecialNode) |
| 82 | +WITH collect(n) as special_nodes |
| 83 | +WITH project(special_nodes, []) as subgraph |
| 84 | +CALL knn.get(subgraph, {nodeProperties: "embedding", concurrency: 10, topK: 2}) |
| 85 | +YIELD node, neighbour, similarity |
| 86 | +CREATE (node)-[:IS_SIMILAR_TO {similarity: similarity}]->(neighbour); |
| 87 | +``` |
| 88 | + |
| 89 | +## Example |
| 90 | + |
| 91 | +<Steps> |
| 92 | + |
| 93 | +{<h3 className="custom-header"> Database state </h3>} |
| 94 | + |
| 95 | +The database contains nodes with embedding properties: |
| 96 | + |
| 97 | +```cypher |
| 98 | +CREATE (:Node {id: 1, embedding: [0.1, 0.2, 0.3, 0.4]}); |
| 99 | +CREATE (:Node {id: 2, embedding: [0.15, 0.25, 0.35, 0.45]}); |
| 100 | +CREATE (:Node {id: 3, embedding: [0.9, 0.8, 0.7, 0.6]}); |
| 101 | +CREATE (:Node {id: 4, embedding: [0.95, 0.85, 0.75, 0.65]}); |
| 102 | +CREATE (:Node {id: 5, embedding: [0.2, 0.1, 0.4, 0.3]}); |
| 103 | +``` |
| 104 | + |
| 105 | +{<h3 className="custom-header"> Find k-nearest neighbors </h3>} |
| 106 | + |
| 107 | +Find the 2 most similar neighbors for each node: |
| 108 | + |
| 109 | +```cypher |
| 110 | +CALL knn.get({nodeProperties: "embedding", topK: 2, concurrency: 4}) |
| 111 | +YIELD node, neighbour, similarity |
| 112 | +RETURN node.id as source_node, neighbour.id as neighbor_node, similarity |
| 113 | +ORDER BY source_node, similarity DESC; |
| 114 | +``` |
| 115 | + |
| 116 | +Results: |
| 117 | + |
| 118 | +```plaintext |
| 119 | ++-------------------------+-------------------------+-------------------------+ |
| 120 | +| source_node | neighbor_node | similarity | |
| 121 | ++-------------------------+-------------------------+-------------------------+ |
| 122 | +| 1 | 2 | 0.999999 | |
| 123 | +| 1 | 5 | 0.894427 | |
| 124 | +| 2 | 1 | 0.999999 | |
| 125 | +| 2 | 5 | 0.894427 | |
| 126 | +| 3 | 4 | 0.999999 | |
| 127 | +| 3 | 1 | 0.447214 | |
| 128 | +| 4 | 3 | 0.999999 | |
| 129 | +| 4 | 2 | 0.447214 | |
| 130 | +| 5 | 1 | 0.894427 | |
| 131 | +| 5 | 2 | 0.894427 | |
| 132 | ++-------------------------+-------------------------+-------------------------+ |
| 133 | +``` |
| 134 | + |
| 135 | +{<h3 className="custom-header"> Create similarity relationships </h3>} |
| 136 | + |
| 137 | +Create relationships between similar nodes: |
| 138 | + |
| 139 | +```cypher |
| 140 | +CALL knn.get({nodeProperties: "embedding", topK: 2, similarityCutoff: 0.8}) |
| 141 | +YIELD node, neighbour, similarity |
| 142 | +CREATE (node)-[:SIMILAR_TO {similarity: similarity}]->(neighbour); |
| 143 | +``` |
| 144 | + |
| 145 | +{<h3 className="custom-header"> Multiple properties example </h3>} |
| 146 | + |
| 147 | +When using multiple properties, similarities are averaged: |
| 148 | + |
| 149 | +```cypher |
| 150 | +CALL knn.get({nodeProperties: ["embedding", "features"], topK: 3, concurrency: 8}) |
| 151 | +YIELD node, neighbour, similarity |
| 152 | +RETURN node.id as source_node, neighbour.id as neighbor_node, similarity |
| 153 | +ORDER BY source_node, similarity DESC; |
| 154 | +``` |
| 155 | + |
| 156 | +</Steps> |
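
The way per-property cosine similarities are averaged, and how `topK` and `similarityCutoff` shape the output, can be illustrated outside the database with a brute-force reference. The sketch below is not the `knn` module's implementation (the docs above say that follows the cited paper and runs in parallel). It compares every pair of nodes, so it is only suitable for sanity-checking small inputs. The function names, the in-memory `nodes` dictionary, and the sample vectors are hypothetical, and scoring zero-length vectors as 0.0 is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def averaged_similarity(node_a, node_b, properties):
    """Average the per-property cosine similarities into one score."""
    scores = [cosine_similarity(node_a[p], node_b[p]) for p in properties]
    return sum(scores) / len(scores)


def brute_force_knn(nodes, properties, top_k, similarity_cutoff=0.0):
    """Return, per node, up to top_k (neighbour_id, similarity) pairs."""
    result = {}
    for source_id, source in nodes.items():
        scored = [
            (other_id, averaged_similarity(source, other, properties))
            for other_id, other in nodes.items()
            if other_id != source_id
        ]
        # Drop neighbours whose similarity is below the cutoff.
        kept = [(i, s) for i, s in scored if s >= similarity_cutoff]
        # Highest similarity first, then keep at most top_k entries.
        kept.sort(key=lambda pair: pair[1], reverse=True)
        result[source_id] = kept[:top_k]
    return result


# Hypothetical sample data: two list-valued properties per node.
nodes = {
    "a": {"embedding": [1.0, 0.0], "features": [1.0, 0.0]},
    "b": {"embedding": [1.0, 0.0], "features": [0.0, 1.0]},
    "c": {"embedding": [0.0, 1.0], "features": [0.0, 1.0]},
}

neighbours = brute_force_knn(
    nodes, ["embedding", "features"], top_k=2, similarity_cutoff=0.25
)
for node_id, found in neighbours.items():
    print(node_id, found)
```

With these sample vectors, `a` and `b` point in the same `embedding` direction but have orthogonal `features`, so their averaged score is 0.5. The pair `a` and `c` scores 0.0 on both properties and is removed by the cutoff, which is why `a` ends up with a single neighbour even though `top_k` is 2.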