Commit c4ae6aa

Ke Chen committed
add readme and example inputs
1 parent 4856756 commit c4ae6aa

4 files changed (+52 -0 lines)

experiments/.github/README.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+### Code and scripts for reproducing experimental results
+
+- Comparison with Order Min Hash (OMH):
+  - OMH sketches are computed with `omh_sketch` of [omh_compute-0.0.2](https://github.com/Kingsford-Group/omhismb2019/releases/tag/v0.0.2). To accommodate its input format, `N` pairs of length-`n` strings with edit distance `d` are stored in a FASTA file with `2N` records; records labeled `>x_1` and `>x_2` form a pair (see [20mers-ed3.fa](../example_data/20mers-ed3.fa) for an example).
+  - To generate hash codes with OMH, run `omh_sketch -k5 -l2 -m20 -o 20mers-ed3.omh-out 20mers-ed3.fa`.
+  - To compute the number of pairs assigned to the same bucket according to the OMH sketches, run `python countOMHCollisions.py 20mers-ed3.omh-out`.
+- Comparison with WFA:
+  - The [WFA2-lib](https://github.com/smarco/WFA2-lib) library is used to compare running times with WFA on the barcode experiment.
+  - Once WFA2-lib is compiled, [pairwise_ed.cpp](../pairwise_ed.cpp) can be compiled (assuming it is placed inside `examples/` of the WFA2-lib directory) with `g++ -L../lib -I.. pairwise_ed.cpp -o bin/pairwise_ed.out -lwfacpp -fopenmp -lm`.
+  - The program takes two files as input and computes pairwise edit distances with WFA between the sequences in the two files. Pairs with an edit distance of at most the provided threshold are written to an output file. Two sample input files are provided in [example_data](../example_data).
+  - For the barcode experiment, run `pairwise_ed.out example_data/whitelist.txt example_data/mismatches.txt 2 output.txt`.
+  - For the larger-scale pairwise comparison among the mismatched barcodes, run `pairwise_ed.out example_data/mismatches.txt example_data/mismatches.txt 2 output.txt`.
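The OMH bullets above rely on `omh_sketch` from omh_compute. To illustrate what an order min hash sketch contains, here is a minimal Python sketch of the idea. Each k-mer occurrence is tagged with its occurrence count so that repeats stay distinct. For each of `m` hash functions, the `l` elements with the smallest hash are kept and listed in sequence order. This is not omh_compute's implementation. The blake2b-based seeded hash, the function names, and reading `-k5 -l2 -m20` as k=5, l=2, m=20 are all assumptions. We also assume that two strings "collide" under a hash function when their `l`-tuples are identical; `countOMHCollisions.py` may define a bucket differently.

```python
import hashlib
from collections import defaultdict
from typing import List, Tuple

Sketch = List[Tuple[str, ...]]  # one l-tuple of k-mers per hash function


def _hash(seed: int, kmer: str, occ: int) -> int:
    """Seeded 64-bit hash of a (k-mer, occurrence index) element.

    blake2b with a per-seed salt stands in for OMH's random hash functions
    (an assumption, not omh_compute's choice of hash).
    """
    digest = hashlib.blake2b(
        f"{kmer}#{occ}".encode(), digest_size=8, salt=seed.to_bytes(16, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def omh_sketch(seq: str, k: int, l: int, m: int) -> Sketch:
    """Order min hash sketch of seq: m tuples of l k-mers each."""
    counts = defaultdict(int)
    elems = []  # (position, k-mer, occurrence index of this k-mer so far)
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        elems.append((pos, kmer, counts[kmer]))
        counts[kmer] += 1
    if len(elems) < l:
        raise ValueError("sequence must contain at least l k-mers")
    sketch = []
    for seed in range(m):
        # Keep the l elements with the smallest hash under hash function `seed`...
        smallest = sorted(elems, key=lambda e: _hash(seed, e[1], e[2]))[:l]
        # ...then report their k-mers in the order they occur in the sequence.
        smallest.sort(key=lambda e: e[0])
        sketch.append(tuple(kmer for _, kmer, _ in smallest))
    return sketch


def matching_tuples(a: Sketch, b: Sketch) -> int:
    """Number of hash functions under which the two sketches agree."""
    return sum(x == y for x, y in zip(a, b))


if __name__ == "__main__":
    # First pair of the example FASTA file; agreement / m estimates similarity.
    s1, s2 = "TTAGCATCCGGGTTAGTTGG", "TTAGCTTCCGGGTTAGTTAC"
    sk1, sk2 = omh_sketch(s1, 5, 2, 20), omh_sketch(s2, 5, 2, 20)
    print(matching_tuples(sk1, sk2) / 20)
```

Keeping the selected k-mers in sequence order is what lets OMH reflect the arrangement of k-mers rather than only their presence. That ordering is the reason it is compared against edit distance here.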
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+>1_1
+TTAGCATCCGGGTTAGTTGG
+>1_2
+TTAGCTTCCGGGTTAGTTAC
+>2_1
+ACACTTCGGCCCCGCCTTTA
+>2_2
+AGACTTTCGGCCCCCCTTTA
+>3_1
+ACGAAAGGTACGGAAAGCTA
+>3_2
+ACGAAAAGTACCGAAATCTA
+>4_1
+TATATGAACTGGCCTAGGGA
+>4_2
+TAGATGAACTCGACTAGGGA
+>5_1
+CTTCTTTCCCGGCGTACATT
+>5_2
+CTCTTTCCCGTCTGTACATT
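The file above follows the pairing convention described in the README: `2N` records, where `>x_1` and `>x_2` hold the two strings of pair `x`. A minimal Python sketch for writing and reading that layout could look as follows. The function names are hypothetical, and numbering pairs from 1 simply mirrors the example file.

```python
import re
from typing import Iterable, List, Tuple

Pair = Tuple[str, str, str]  # (pair id x, string from >x_1, string from >x_2)


def to_fasta(pairs: Iterable[Tuple[str, str]]) -> str:
    """Write N string pairs as 2N FASTA records labeled >x_1 / >x_2."""
    lines = []
    for x, (s1, s2) in enumerate(pairs, start=1):
        lines += [f">{x}_1", s1, f">{x}_2", s2]
    return "\n".join(lines) + "\n"


def parse_pairs(text: str) -> List[Pair]:
    """Group >x_1 / >x_2 records back into pairs, in file order."""
    seqs = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:]
            seqs[label] = ""
        elif label is None:
            raise ValueError("sequence line before the first header")
        else:
            seqs[label] += line  # tolerate sequences wrapped over several lines
    halves = {}  # pair id -> {"1": seq, "2": seq}; dicts keep insertion order
    for label, seq in seqs.items():
        m = re.fullmatch(r"(.+)_([12])", label)
        if m is None:
            raise ValueError(f"unexpected record label: {label}")
        x, half = m.groups()
        halves.setdefault(x, {})[half] = seq
    pairs = []
    for x, h in halves.items():
        if set(h) != {"1", "2"}:
            raise ValueError(f"pair {x} is missing one of its records")
        pairs.append((x, h["1"], h["2"]))
    return pairs
```

For example, `parse_pairs(open("20mers-ed3.fa").read())` would return the five pairs listed above.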
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+AAAAAAAAAAAAAAAA
+AAAAAAAAAAAAAGCA
+AAAAAAAAAAAAGACA
+AAAAAAAAAAATAGTA
+AAAAAAAAAAGACATT
+AAAAAAAAAAGTTTAG
+AAAAAAAAACAGAAAG
+AAAAAAAAACCAAAAA
+AAAAAAAAAGACATTT
+AAAAAAAACGAACAGC
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+AGAGCAGTCTGGGCCA
+CATTCATAGGACTAAT
+TCATTTGAGGTACATA
+CACTAAGCACATTCTT
+ACACAGTCAGCCGGTT
+CGGCAGTCAACTTCTT
+TGATCAGCACATGGTT
+AGGGTTTAGGCCATAG
+CCTAACCGTGGACCTC
+ATGAAAGAGAGCATAT
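The two barcode files above are the inputs that the README passes to `pairwise_ed.out` with threshold 2. For readers without WFA2-lib at hand, the following Python sketch performs the same all-against-all thresholded comparison under stated assumptions. It swaps WFA for a plain banded Levenshtein DP, in which only cells with |i - j| <= t can hold a distance <= t, so it says nothing about WFA's running time. The script name, function names, and tab-separated output format are hypothetical, because the README does not describe the output format of `pairwise_ed`.

```python
import sys
from typing import Iterable, Iterator, List, Optional, Tuple


def within_distance(a: str, b: str, t: int) -> Optional[int]:
    """Levenshtein distance of a and b if it is <= t, else None (banded DP)."""
    if abs(len(a) - len(b)) > t:
        return None
    cap = t + 1  # every value is clamped to t + 1, which means "more than t"
    prev = [j if j <= t else cap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [cap] * (len(b) + 1)  # cells outside the band stay at cap
        if i <= t:
            cur[0] = i
        for j in range(max(1, i - t), min(len(b), i + t) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1, cap)
        if min(cur) > t:
            return None  # row minima never decrease, so the threshold is exceeded
        prev = cur
    return prev[len(b)] if prev[len(b)] <= t else None


def read_seqs(path: str) -> List[str]:
    """One sequence per line, blank lines ignored."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def pairwise_within(xs: Iterable[str], ys: List[str], t: int) -> Iterator[Tuple[str, str, int]]:
    """Yield every (x, y, distance) with edit distance <= t."""
    for x in xs:
        for y in ys:
            d = within_distance(x, y, t)
            if d is not None:
                yield x, y, d


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical usage: python pairwise_ed_sketch.py whitelist.txt mismatches.txt 2 output.txt
    first, second, threshold, out_path = sys.argv[1], sys.argv[2], int(sys.argv[3]), sys.argv[4]
    with open(out_path, "w") as out:
        for x, y, d in pairwise_within(read_seqs(first), read_seqs(second), threshold):
            out.write(f"{x}\t{y}\t{d}\n")
```

Clamping every cell at `t + 1` keeps the recurrence exact for all distances up to `t`. It also lets a pair be rejected as soon as a whole DP row exceeds the threshold.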