|
1 | | -# test-datasets: `circdna` |
| 1 | +#  |
| 2 | +Test data to be used for automated testing with the nf-core pipelines |
2 | 3 |
|
3 | | -This branch contains test data to be used for automated testing with the [nf-core/circdna](https://github.com/nf-core/circdna) pipeline. |
| 4 | +## Introduction |
4 | 5 |
|
5 | | -## Content of this repository |
| 6 | +nf-core is a collection of high-quality Nextflow pipelines. This repository contains various files for CI and unit testing of nf-core pipelines and infrastructure.
6 | 7 |
|
7 | | -`reference/`: Genome reference files (iGenomes R64-1-1 Ensembl release) |
| 8 | +The guiding principle for nf-core test data is: as small as possible, as large as necessary. Always ask for guidance on the [nf-core slack](https://nf-co.re/join) before adding new test data.
8 | 9 |
|
9 | | -`testdata/` : 200,000 FastQ paired-end reads |
| 10 | +## Documentation |
10 | 11 |
|
11 | | -## Minimal test dataset origin |
12 | | -The data set was generated using Circle-Map Simulate (see [Circle-Map](https://github.com/iprada/Circle-Map) and InSilicoSeq (see [InSilicoSeq](https://github.com/HadrienG/InSilicoSeq). Circle-Map simulated 120,000 paired-end reads originated from circle-seq data and InSilicoSeq simulated 80,000 random reads from the reference genome. |
| 12 | +nf-core/test-datasets comes with documentation in the `docs/` directory: |
13 | 13 |
|
14 | | -### Data Generation |
| 14 | +01. [Add a new test dataset](https://github.com/nf-core/test-datasets/blob/master/docs/ADD_NEW_DATA.md) |
| 15 | +02. [Use an existing test dataset](https://github.com/nf-core/test-datasets/blob/master/docs/USE_EXISTING_DATA.md) |
15 | 16 |
|
16 | | -The example below was used to generate the raw paired-end FastQ files. |
| 17 | +## Downloading test data |
17 | 18 |
|
18 | | -``` bash |
19 | | -Circle-Map Simulate -c 200 -g genome.fa -N 120000 -r 150 -b cm_1 -p 10 |
20 | | -Circle-Map Simulate -c 200 -g genome.fa -N 120000 -r 150 -b cm_2 -p 10 |
21 | | -Circle-Map Simulate -c 200 -g genome.fa -N 120000 -r 150 -b cm_3 -p 10 |
22 | | -wgsim -1 150 -2 150 -N 80000 genome.fa wgsim_1_R1.fastq wgsim_1_R2.fastq -S 1 |
23 | | -wgsim -1 150 -2 150 -N 80000 genome.fa wgsim_2_R1.fastq wgsim_2_R2.fastq -S 1 |
24 | | -wgsim -1 150 -2 150 -N 80000 genome.fa wgsim_3_R1.fastq wgsim_3_R2.fastq -S 1 |
25 | | -cat cm_1_2.fastq wgsim_1_R2.fastq | gzip --no-name > ../testdata/circdna_1_R2.fastq.gz |
26 | | -cat cm_2_2.fastq wgsim_2_R2.fastq | gzip --no-name > ../testdata/circdna_2_R2.fastq.gz |
27 | | -cat cm_3_2.fastq wgsim_3_R2.fastq | gzip --no-name > ../testdata/circdna_3_R2.fastq.gz |
| 19 | +Due to the large number of large files in this repository for each pipeline, we highly recommend cloning only the branches you will use.
28 | 20 |
|
29 | | -cat cm_1_1.fastq wgsim_1_R1.fastq | gzip --no-name > ../testdata/circdna_1_R1.fastq.gz |
30 | | -cat cm_2_1.fastq wgsim_2_R1.fastq | gzip --no-name > ../testdata/circdna_2_R1.fastq.gz |
31 | | -cat cm_3_1.fastq wgsim_3_R1.fastq | gzip --no-name > ../testdata/circdna_3_R1.fastq.gz |
| 21 | +```bash |
| 22 | +git clone <url> --single-branch --branch <pipeline/modules/branch_name> |
32 | 23 | ``` |
33 | 24 |
|
34 | | -### Expected output |
| 25 | +To subsequently fetch other branches into the same clone[^1]:
35 | 26 |
|
36 | | -To track and test the reproducibility of the pipeline with default parameters below are some of the expected outputs. |
37 | | - |
38 | | -### Number of `Circle-Map Realign` circles |
39 | | - |
40 | | -| sample | circles | |
41 | | -|-----------------------|-------| |
42 | | -| circdna_1 | 275 | |
43 | | -| circdna_2 | 280 | |
44 | | -| circdna_3 | 279 | |
45 | | - |
46 | | -### Number of `Circexplorer2` circles |
47 | | - |
48 | | -| sample | circles | |
49 | | -|-----------------------|-------| |
50 | | -| circdna_1 | 392 | |
51 | | -| circdna_2 | 328 | |
52 | | -| circdna_3 | 393 | |
53 | | - |
54 | | -### Number of `circle_finder` circles |
55 | | - |
56 | | -| sample | circles | |
57 | | -|-----------------------|-------| |
58 | | -| circdna_1 | 267 | |
59 | | -| circdna_2 | 275 | |
60 | | -| circdna_3 | 266 | |
61 | | - |
62 | | -### Number of `unicycler` circles |
| 27 | +```bash |
| 28 | +git remote set-branches --add origin [remote-branch] |
| 29 | +git fetch |
| 30 | +``` |
63 | 31 |
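As a rough illustration of the two steps above, the sketch below runs them end to end against a throwaway local repository, so it works offline. The branch names `pipeline_a` and `pipeline_b`, the temporary directory, and the dummy commit identity are hypothetical stand-ins, not part of nf-core/test-datasets. For real use, substitute the repository URL and an actual branch name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical local stand-in for the remote repository (assumption: not the real nf-core repo).
work=$(mktemp -d)
git init --quiet "$work/origin"
cd "$work/origin"
git config user.email "ci@example.com"   # dummy identity so the commit succeeds
git config user.name "CI"
git commit --quiet --allow-empty -m "initial commit"
git branch pipeline_a
git branch pipeline_b

# Step 1: clone only the branch that is needed.
git clone --quiet --single-branch --branch pipeline_a "file://$work/origin" "$work/clone"
cd "$work/clone"

# Step 2: later, register a second branch with the remote and fetch it.
git remote set-branches --add origin pipeline_b
git fetch --quiet

# Resolve the newly fetched remote-tracking branch; this fails if the fetch did not bring it in.
fetched=$(git rev-parse --verify refs/remotes/origin/pipeline_b)
echo "origin/pipeline_b is at $fetched"
```

After the fetch, `git checkout pipeline_b` should create a local branch tracking `origin/pipeline_b`.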
|
64 | | -| sample | circles | |
65 | | -|-----------------------|-------| |
66 | | -| circdna_1 | 1 | |
67 | | -| circdna_2 | 0 | |
68 | | -| circdna_3 | 0 | |
| 32 | +## Support |
69 | 33 |
|
70 | | -These are just guidelines and will change with the use of different software, and with any restructuring of the pipeline away from the current defaults. |
| 34 | +For further information or help, don't hesitate to get in touch on our [Slack organisation](https://nf-co.re/join/slack) (a tool for instant messaging). |
71 | 35 |
|
| 36 | +[^1]: From [stackoverflow](https://stackoverflow.com/a/60846265/11502856) |