Commit 1db83bb

Josipmrden and matea16 authored
Add knn implementation (#1440)
* Add knn implementation
* Update pages/advanced-algorithms/available-algorithms/knn.mdx

---------

Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com>
1 parent 03b8ca8 commit 1db83bb

2 files changed: 157 additions and 0 deletions


pages/advanced-algorithms/available-algorithms/_meta.ts

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ export default {
   "igraphalg": "igraphalg",
   "import_util": "import_util",
   "json_util": "json_util",
+  "knn": "knn",
   "katz_centrality_online": "katz_centrality_online",
   "katz_centrality": "katz_centrality",
   "kmeans_clustering": "kmeans_clustering",

pages/advanced-algorithms/available-algorithms/knn.mdx

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
---
title: K-nearest neighbours (KNN)
description: Discover Memgraph's k-nearest neighbors algorithm for finding similar nodes based on cosine similarity. Access comprehensive documentation and examples to efficiently identify the most similar nodes in your graph data.
---

import { Callout } from 'nextra/components'
import { Steps } from 'nextra/components'
import { Cards } from 'nextra/components'
import GitHub from '/components/icons/GitHub'

# K-nearest neighbours

The **k-nearest neighbors** algorithm finds the k most similar nodes to each node in the graph based
on cosine similarity between their properties. The algorithm is based on the paper
["Efficient k-nearest neighbor graph construction for generic similarity measures"](https://dl.acm.org/doi/abs/10.1145/1963405.1963487)
and offers efficient parallel computation.

Similarity is computed as the cosine similarity between node properties. If multiple properties are specified,
the per-property similarities are averaged to produce a single similarity score. This makes the algorithm particularly useful for
finding nodes with similar embeddings, features, or other vector-based properties.

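The scoring described above can be sketched in a few lines of Python. This is only an illustration of plain cosine similarity and of averaging across properties, not Memgraph's C++ implementation; the helper names `cosine_similarity` and `averaged_similarity`, the dictionary-based node representation, and the handling of zero vectors are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def averaged_similarity(node_a, node_b, property_names):
    """Mean of the per-property cosine similarities of two nodes."""
    scores = [cosine_similarity(node_a[p], node_b[p]) for p in property_names]
    return sum(scores) / len(scores)


# Hypothetical nodes with two list-valued properties each.
a = {"embedding": [1.0, 0.0], "features": [1.0, 1.0]}
b = {"embedding": [0.0, 1.0], "features": [2.0, 2.0]}

# "embedding" vectors are orthogonal and "features" vectors are parallel,
# so the averaged score is approximately 0.5.
print(averaged_similarity(a, b, ["embedding", "features"]))
```

Averaging means that a pair of nodes scoring high on one property and low on another ends up in the middle, so properties with very different meanings may be better queried separately.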
<Cards>
  <Cards.Card
    icon={<GitHub />}
    title="Source code"
    href="https://github.com/memgraph/mage/tree/main/cpp/knn_module"
  />
</Cards>

| Trait               | Value               |
| ------------------- | ------------------- |
| **Module type**     | algorithm           |
| **Implementation**  | C++                 |
| **Graph direction** | directed/undirected |
| **Edge weights**    | unweighted          |
| **Parallelism**     | parallel            |

## Procedures

<Callout type="info">
You can execute this algorithm on [graph projections, subgraphs or portions of the graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph).
</Callout>

### `get()`

The procedure finds the k most similar neighbors for each node based on cosine similarity between their properties.

{<h4 className="custom-header"> Input: </h4>}

- `subgraph: Graph` (**OPTIONAL**) ➡ A specific subgraph, which is an [object of type Graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph) returned by the `project()` function, on which the algorithm is run.
  If subgraph is not specified, the algorithm is computed on the entire graph by default.

- `nodeProperties: string | List[string]` ➡ Property name(s) to calculate similarity on. If multiple properties are provided, their similarities are averaged. This field is required, and all listed properties must be of type List[double].
- `topK: integer` ➡ Number of nearest neighbors to find for each node.
- `similarityCutoff: double (default=0.0)` ➡ Minimum similarity threshold. Neighbors with similarity below this value are not returned.
- `concurrency: integer (default=1)` ➡ Number of parallel threads to use for computation.
- `maxIterations: integer (default=100)` ➡ Maximum number of iterations the algorithm performs if it has not yet converged.
- `sampleRate: double (default=0.5)` ➡ Sampling rate used when introducing new candidate neighbors to each node.
- `deltaThreshold: double (default=0.001)` ➡ Convergence threshold for early termination, as described in the algorithm paper.
- `randomSeed: integer` ➡ Random seed for deterministic results. If not specified, the seed is randomly generated.

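To give a feel for how `maxIterations`, `sampleRate`, `deltaThreshold` and `randomSeed` interact, here is a heavily simplified Python sketch in the spirit of the neighbor-refinement loop from the referenced paper. It is not Memgraph's implementation: the function names, the per-neighbor coin flip used for sampling, and the exact stopping rule are assumptions chosen to keep the example short, and the result is an approximation that is not guaranteed to match an exhaustive search.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def try_insert(neighbours, candidate, sim, k):
    """Add candidate to a size-bounded neighbour dict; return 1 if it changed."""
    if k <= 0 or candidate in neighbours:
        return 0
    if len(neighbours) < k:
        neighbours[candidate] = sim
        return 1
    worst = min(neighbours, key=neighbours.get)
    if sim > neighbours[worst]:
        del neighbours[worst]
        neighbours[candidate] = sim
        return 1
    return 0


def nn_descent_sketch(vectors, top_k, max_iterations=100, sample_rate=0.5,
                      delta_threshold=0.001, random_seed=None):
    rng = random.Random(random_seed)  # a fixed seed makes runs repeatable
    ids = sorted(vectors)
    n = len(ids)
    k = min(top_k, n - 1)
    # Start from random neighbour lists: {node: {neighbour: similarity}}.
    knn = {}
    for u in ids:
        others = [v for v in ids if v != u]
        knn[u] = {v: cosine(vectors[u], vectors[v]) for v in rng.sample(others, k)}

    for _ in range(max_iterations):
        # Sample forward and reverse neighbours of every node.
        candidates = {u: set() for u in ids}
        for u in ids:
            for v in knn[u]:
                if rng.random() < sample_rate:
                    candidates[u].add(v)
                    candidates[v].add(u)
        # Local join: two nodes sharing a neighbour are compared directly.
        updates = 0
        for u in ids:
            local = sorted(candidates[u])
            for i, a in enumerate(local):
                for b in local[i + 1:]:
                    sim = cosine(vectors[a], vectors[b])
                    updates += try_insert(knn[a], b, sim, k)
                    updates += try_insert(knn[b], a, sim, k)
        # Stop early once an iteration changes only a tiny share of entries.
        if updates <= delta_threshold * n * k:
            break
    return knn


# Embeddings reused from the example section of this page.
vectors = {
    1: [0.1, 0.2, 0.3, 0.4],
    2: [0.15, 0.25, 0.35, 0.45],
    3: [0.9, 0.8, 0.7, 0.6],
    4: [0.95, 0.85, 0.75, 0.65],
    5: [0.2, 0.1, 0.4, 0.3],
}
result = nn_descent_sketch(vectors, top_k=2, random_seed=42)
for node, neighbours in result.items():
    print(node, neighbours)
```

In this sketch a higher `sample_rate` compares more candidate pairs per iteration, while a larger `delta_threshold` stops the loop sooner at the cost of a rougher result.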
{<h4 className="custom-header"> Output: </h4>}

- `node` ➡ Source node for which neighbors are found.
- `neighbour` ➡ Neighbor node that is similar to the source node.
- `similarity` ➡ Cosine similarity score between the source node and neighbor (0.0 to 1.0).

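The shape of the output can be illustrated with an exhaustive Python reference that emits one `(node, neighbour, similarity)` row per retained neighbor. This is a sketch for reasoning about `topK` and `similarityCutoff` only; it compares every pair of nodes, which the actual procedure is designed to avoid, and the function name `brute_force_knn` and the sample vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_knn(vectors, top_k, similarity_cutoff=0.0):
    """Return (node, neighbour, similarity) rows, best neighbours first."""
    rows = []
    for node, vec in vectors.items():
        scored = [
            (other, cosine(vec, other_vec))
            for other, other_vec in vectors.items()
            if other != node
        ]
        # Drop candidates below the cutoff, then keep the top_k best.
        kept = [(other, sim) for other, sim in scored if sim >= similarity_cutoff]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        rows.extend((node, other, sim) for other, sim in kept[:top_k])
    return rows


# Hypothetical two-dimensional vectors keyed by node name.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
rows = brute_force_knn(vectors, top_k=1, similarity_cutoff=0.5)
for row in rows:
    print(row)
```

Here node `c` produces no rows because none of its candidates reaches the cutoff, so a node can end up with fewer than `top_k` neighbors.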
{<h4 className="custom-header"> Usage: </h4>}

To find k-nearest neighbors for all nodes in the graph:

```cypher
CALL knn.get({nodeProperties: "embedding", concurrency: 10, topK: 2})
YIELD node, neighbour, similarity
CREATE (node)-[:IS_SIMILAR_TO {similarity: similarity}]->(neighbour);
```

To find k-nearest neighbors on a subgraph:

```cypher
MATCH (n:SpecialNode)
WITH collect(n) as special_nodes
WITH project(special_nodes, []) as subgraph
CALL knn.get(subgraph, {nodeProperties: "embedding", concurrency: 10, topK: 2})
YIELD node, neighbour, similarity
CREATE (node)-[:IS_SIMILAR_TO {similarity: similarity}]->(neighbour);
```

## Example

<Steps>

{<h3 className="custom-header"> Database state </h3>}

The database contains nodes with embedding properties:

```cypher
CREATE (:Node {id: 1, embedding: [0.1, 0.2, 0.3, 0.4]});
CREATE (:Node {id: 2, embedding: [0.15, 0.25, 0.35, 0.45]});
CREATE (:Node {id: 3, embedding: [0.9, 0.8, 0.7, 0.6]});
CREATE (:Node {id: 4, embedding: [0.95, 0.85, 0.75, 0.65]});
CREATE (:Node {id: 5, embedding: [0.2, 0.1, 0.4, 0.3]});
```

{<h3 className="custom-header"> Find k-nearest neighbors </h3>}

Find the 2 most similar neighbors for each node:

```cypher
CALL knn.get({nodeProperties: "embedding", topK: 2, concurrency: 4})
YIELD node, neighbour, similarity
RETURN node.id as source_node, neighbour.id as neighbor_node, similarity
ORDER BY source_node, similarity DESC;
```

Results:

```plaintext
+-------------------------+-------------------------+-------------------------+
| source_node             | neighbor_node           | similarity              |
+-------------------------+-------------------------+-------------------------+
| 1                       | 2                       | 0.999999                |
| 1                       | 5                       | 0.894427                |
| 2                       | 1                       | 0.999999                |
| 2                       | 5                       | 0.894427                |
| 3                       | 4                       | 0.999999                |
| 3                       | 1                       | 0.447214                |
| 4                       | 3                       | 0.999999                |
| 4                       | 2                       | 0.447214                |
| 5                       | 1                       | 0.894427                |
| 5                       | 2                       | 0.894427                |
+-------------------------+-------------------------+-------------------------+
```

{<h3 className="custom-header"> Create similarity relationships </h3>}

Create relationships between similar nodes:

```cypher
CALL knn.get({nodeProperties: "embedding", topK: 2, similarityCutoff: 0.8})
YIELD node, neighbour, similarity
CREATE (node)-[:SIMILAR_TO {similarity: similarity}]->(neighbour);
```

{<h3 className="custom-header"> Multiple properties example </h3>}

When using multiple properties, similarities are averaged:

```cypher
CALL knn.get({nodeProperties: ["embedding", "features"], topK: 3, concurrency: 8})
YIELD node, neighbour, similarity
RETURN node.id as source_node, neighbour.id as neighbor_node, similarity
ORDER BY source_node, similarity DESC;
```

</Steps>
