Commit 1fe5530

docs: readme add toc
周哲超 committed
1 parent 315a3df commit 1fe5530

1 file changed: +52 -13 lines changed

README.md

Lines changed: 52 additions & 13 deletions
@@ -5,7 +5,31 @@
 
 A library implementing different string similarity, distance and sortMatch measures. A dozen algorithms (including Levenshtein edit distance and siblings, Longest Common Subsequence, cosine similarity etc.) are currently implemented. Check the summary table below for the complete list...
 
-[TOC]
+- [string-comparison](#string-comparison)
+- [Download & Usage](#download--usage)
+- [OverView](#overview)
+- [Normalized, metric, similarity and distance](#normalized-metric-similarity-and-distance)
+- [(Normalized) similarity and distance](#normalized-similarity-and-distance)
+- [Levenshtein](#levenshtein)
+- [Longest Common Subsequence](#longest-common-subsequence)
+- [Metric Longest Common Subsequence](#metric-longest-common-subsequence)
+- [Cosine similarity](#cosine-similarity)
+- [Sorensen-Dice coefficient](#sorensen-dice-coefficient)
+- [API](#api)
+- [similarity](#similarity)
+- [params](#params)
+- [return](#return)
+- [distance](#distance)
+- [params](#params-1)
+- [return](#return-1)
+- [sortMatch](#sortmatch)
+- [params](#params-2)
+- [return](#return-2)
+- [Params](#params)
+- [Return](#return)
+- [Release Notes](#release-notes)
+- [1.x version](#1x-version)
+- [MIT](#mit)
 
 ## Download & Usage
 
@@ -33,7 +57,8 @@ console.log(cos.sortMatch(Thanos, Avengers))
 
 ```
 
-## OverViews
+## OverView
+
 The main characteristics of each implemented algorithm are presented below. The "cost" column gives an estimation of the computational cost to compute the similarity between two strings of length m and n respectively.
 
 | | Measure(s) | Normalized? | Metric? | Type | Cost | Typical usage |
@@ -45,6 +70,7 @@ The main characteristics of each implemented algorithm are presented below. The
 | [Jaro-Winkler](https://github.com/luozhouyang/python-string-similarity/blob/master/README.md#jaro-winkler) | similarity distance<br />sortMatch | Yes | No | | O(m*n) | typo correction |
 
 ## Normalized, metric, similarity and distance
+
 Although the topic might seem simple, a lot of different algorithms exist to measure text similarity or distance. Therefore the library defines some interfaces to categorize them.
 
 ### (Normalized) similarity and distance
@@ -118,6 +144,7 @@ console.log(lcs.sortMatch(Thanos, Avengers))
 ```
 
 ## Metric Longest Common Subsequence
+
 Distance metric based on Longest Common Subsequence, from the notes "An LCS-based string metric" by Daniel Bakkelund.
 http://heim.ifi.uio.no/~danielry/StringMetric.pdf
 
@@ -142,6 +169,7 @@ console.log(mlcs.sortMatch(Thanos, Avengers))
 { member: 'sealed', index: 1, rating: 0.8333333333333334 }
 ]
 ```
+
 ## Cosine similarity
 
 Like Q-Gram distance, the input strings are first converted into sets of n-grams (sequences of n characters, also called k-shingles), but this time the cardinality of each n-gram is not taken into account. Each input string is simply a set of n-grams. The Jaccard index is then computed as |V1 inter V2| / |V1 union V2|.
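
The set-based computation described in the line above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not the library's code: the names `ngramSet`, `jaccardSimilarity` and `jaccardDistance` are hypothetical, the default n-gram size of 2 is an arbitrary choice, and treating two strings with no n-grams as identical is an assumption.

```javascript
// Hypothetical helper (not part of the library): the set of character n-grams of a string.
function ngramSet(text, n = 2) {
  const grams = new Set();
  for (let i = 0; i + n <= text.length; i++) {
    grams.add(text.slice(i, i + n));
  }
  return grams;
}

// Jaccard index: |V1 inter V2| / |V1 union V2| over the two n-gram sets.
function jaccardSimilarity(s1, s2, n = 2) {
  const v1 = ngramSet(s1, n);
  const v2 = ngramSet(s2, n);
  let inter = 0;
  for (const gram of v1) {
    if (v2.has(gram)) inter++;
  }
  const union = v1.size + v2.size - inter;
  // Assumption: two inputs without any n-gram count as identical.
  return union === 0 ? 1 : inter / union;
}

// Distance is computed as 1 - similarity.
function jaccardDistance(s1, s2, n = 2) {
  return 1 - jaccardSimilarity(s1, s2, n);
}

// "night" and "nacht" share one bigram ("ht") out of seven distinct bigrams.
console.log(jaccardSimilarity('night', 'nacht'));
```

Because each string is reduced to a set, repeated n-grams count once, which matches the "cardinality is not taken into account" wording above.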
@@ -150,45 +178,58 @@ Distance is computed as 1 - similarity.
 Jaccard index is a metric distance.
 
 ## Sorensen-Dice coefficient
+
 Similar to Jaccard index, but this time the similarity is computed as 2 * |V1 inter V2| / (|V1| + |V2|).
 
 Distance is computed as 1 - similarity.
 
-
 ## API
 * `similarity`.
 * `distance`.
 * `sortMatch`
 
-### `similarity`
+### similarity
 
 Implementing algorithms define a similarity between strings
 
-#### Params
+#### params
 
 1. thanos [String]
 2. rival [String]
 
-#### Return
+#### return
 
 Return a similarity between 0.0 and 1.0
 
-### `distance`
+### distance
 
 Implementing algorithms define a distance between strings (0 means strings are identical)
 
-#### Params
+#### params
 
 1. thanos [String]
 2. rival [String]
 
-#### Return
+#### return
 
 Return a number
 
-### `sortMatch`
+### sortMatch
+
+#### params
 
-Introduction
+1. thanos [String]
+2. avengers [Array]
+
+#### return
+Return an array of objects
+```js
+[
+{ member: 'edward', index: 0, rating: 0.5 },
+{ member: 'theatre', index: 2, rating: 0.6153846153846154 },
+{ member: 'sealed', index: 1, rating: 0.8333333333333334 }
+]
+```
 
 #### Params
 
@@ -220,7 +261,5 @@ Return an array of objects. ex:
 * Add function sortMatch()
 
 
-
-
 ## MIT
 [MIT](./LICENCE)
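
As a rough illustration of the `sortMatch` contract documented in this change (a string `thanos` and an array `avengers` in, an array of `{ member, index, rating }` objects out), here is a minimal sketch. It is not the library's implementation: `sortMatchSketch` and `prefixSimilarity` are hypothetical names, the toy similarity merely stands in for a real algorithm, and the ascending sort is an assumption inferred from the example output in the diff.

```javascript
// Hypothetical stand-in for an algorithm's similarity(): the share of the
// longer string covered by the common prefix, a number between 0.0 and 1.0.
function prefixSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  let same = 0;
  while (same < a.length && same < b.length && a[same] === b[same]) same++;
  return same / longest;
}

// Rate every candidate against `thanos`, remember its original position,
// and sort by ascending rating (assumed from the README example output).
function sortMatchSketch(thanos, avengers, similarity = prefixSimilarity) {
  return avengers
    .map((member, index) => ({ member, index, rating: similarity(thanos, member) }))
    .sort((x, y) => x.rating - y.rating);
}

console.log(sortMatchSketch('heal', ['edward', 'healed', 'sealed']));
```

Keeping `index` alongside `member` lets a caller map each rating back to its position in the original `avengers` array after sorting.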
