If you stop a BDS pipeline with `Ctrl+C` while peaks are being called with `spp`, the temporary files generated by `Rscript` are not removed and remain in `$TMP` (or `/tmp` if `$TMP` is not explicitly exported). You need to remove them manually.
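
A minimal cleanup sketch in Python, assuming the leftover files sit inside R's per-session temporary directories, which R names `Rtmp` followed by random characters. That naming is R's default behaviour, not something this pipeline documents, and `find_r_session_dirs` and `cleanup` are hypothetical helpers. The sketch checks `$TMP` and then `/tmp`, as described above. Note that R itself looks at `TMPDIR` before `TMP`, so if you export `TMPDIR`, pass that directory instead. It only lists matches by default. Before deleting anything, make sure none of your other R jobs are still running, because their session directories match the same pattern.

```python
import os
import shutil
from pathlib import Path


def temp_root() -> Path:
    """Return $TMP if it is set, otherwise /tmp."""
    return Path(os.environ.get("TMP") or "/tmp")


def find_r_session_dirs(root: Path) -> list[Path]:
    """List directories under root named 'Rtmp*' that belong to the current user.

    Assumption: leftover Rscript files live in R's per-session temp
    directories, which R names 'Rtmp' plus a random suffix.
    """
    uid = os.getuid()
    found = []
    for path in root.glob("Rtmp*"):
        try:
            if path.is_dir() and path.stat().st_uid == uid:
                found.append(path)
        except OSError:
            # The entry vanished or is unreadable; skip it.
            continue
    return sorted(found)


def cleanup(root: Path, dry_run: bool = True) -> list[Path]:
    """Print the matching directories and delete them unless dry_run is True."""
    dirs = find_r_session_dirs(root)
    for d in dirs:
        print(("would remove " if dry_run else "removing ") + str(d))
        if not dry_run:
            shutil.rmtree(d, ignore_errors=True)
    return dirs


if __name__ == "__main__":
    # Dry run by default: review the list before deleting anything.
    cleanup(temp_root(), dry_run=True)
```

After you review the dry-run output, call `cleanup(temp_root(), dry_run=False)` to delete the listed directories.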

# Output directory structure and file naming

For more details, refer to the file table section in an HTML report generated by the pipeline.

```
out                                 # root dir. of outputs
│
├ *report.html                      # HTML report
├ *tracks.json                      # Tracks datahub (JSON) for WashU browser
├ ENCODE_summary.json               # Metadata of all datafiles and QC results
│ ...
│ │ ├ *.nodup.flagstat.qc           # Flagstat QC for filtered bam
│ │ ├ *M.cc.qc                      # Cross-correlation analysis score for tagAlign
│ │ └ *M.cc.plot.pdf/png            # Cross-correlation analysis plot for tagAlign
│ ...
│
├ signal                            # signal tracks
│ ├ macs2                           # signal tracks generated by MACS2
│ │ ├ rep1                          # for true replicate 1
│ │ │ ├ *.pval.signal.bigwig (E)    # signal track for p-val
│ │ │ └ *.fc.signal.bigwig (E)      # signal track for fold change
│ ...
│ └ pooled_rep                      # for pooled replicate
│
└ report                            # files for HTML report
```
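
To find these outputs from a script, you can walk an output directory with Python's `pathlib` and collect files that match the name patterns shown in the tree. This is a minimal sketch. The labels in `OUTPUT_PATTERNS` and the `locate_outputs` helper are hypothetical, and the patterns cover only the entries listed above, not every file the pipeline writes.

```python
from pathlib import Path

# Filename patterns taken from the directory tree above. The dictionary keys
# are illustrative labels, not names used by the pipeline.
OUTPUT_PATTERNS = {
    "html_report": "*report.html",
    "washu_tracks": "*tracks.json",
    "qc_summary": "ENCODE_summary.json",
    "cc_score": "*M.cc.qc",
    "pval_signal": "*.pval.signal.bigwig",
    "fc_signal": "*.fc.signal.bigwig",
}


def locate_outputs(out_dir: str) -> dict[str, list[Path]]:
    """Recursively collect the files under out_dir that match each pattern."""
    root = Path(out_dir)
    return {
        label: sorted(root.rglob(pattern))
        for label, pattern in OUTPUT_PATTERNS.items()
    }
```

For example, `locate_outputs("out")["pval_signal"]` returns the p-value bigWig tracks for every replicate under `out/`, sorted by path.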

## QC metrics spreadsheet (TSV) generation

For each pipeline run, an `ENCODE_summary.json` file is generated under the output directory (`-out_dir`). This JSON file includes all metadata and QC metrics.

`./utils/parse_summary_qc_recursively.py` recursively finds `ENCODE_summary.json` files and parses them to generate a single TSV spreadsheet of QC metrics.

```
$ python parse_summary_qc_recursively.py -h
usage: ENCODE_summary.json parser for QC [-h] [--out-file OUT_FILE]
                                         [--search-dir SEARCH_DIR]
                                         [--json-file JSON_FILE]

Recursively find ENCODE_summary.json, parse it and make a TSV spreadsheet of
QC metrics.

optional arguments:
  -h, --help            show this help message and exit
  --out-file OUT_FILE   Output TSV filename
  --search-dir SEARCH_DIR
                        Root directory to search for ENCODE_summary.json
  --json-file JSON_FILE
                        Specify json file name to be parsed
```
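
The find, parse and tabulate steps can be sketched in plain Python as follows. This is not the code of `parse_summary_qc_recursively.py`. The internal layout of `ENCODE_summary.json` is not described here, so the sketch flattens any nested keys it finds into dotted column names. `flatten` and `summaries_to_tsv` are hypothetical helpers whose parameters only mirror the `--search-dir`, `--out-file` and `--json-file` options.

```python
import csv
import json
from pathlib import Path


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0.c": value} pairs."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}.{i}" if prefix else str(i)))
    else:
        items[prefix] = obj
    return items


def summaries_to_tsv(search_dir, out_file, json_file="ENCODE_summary.json"):
    """Find every json_file under search_dir and write one TSV row per file.

    Returns the number of rows written (one per JSON file found).
    """
    rows = []
    for path in sorted(Path(search_dir).rglob(json_file)):
        with open(path) as fp:
            row = flatten(json.load(fp))
        row["json_path"] = str(path)  # record which run each row came from
        rows.append(row)

    # Columns are the union of keys across all files; missing keys become empty cells.
    columns = sorted({key for row in rows for key in row})
    with open(out_file, "w", newline="") as fp:
        writer = csv.DictWriter(fp, fieldnames=columns, delimiter="\t", restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A file that lacks a given metric gets an empty cell in that column. The union-of-keys design lets runs with different numbers of replicates share one spreadsheet.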

# Programming with BDS

* [Using genomic pipeline modules in Kundaje lab](https://kundajelab.github.io/bds_pipeline_modules/programming.html)

```
"This format is used to provide called regions of signal enrichment based on pooled, normalized (interpreted) data where the regions may be spliced or incorporate gaps in the genomic sequence. It is a BED12+3 format."
(
string chrom;       "Reference sequence chromosome or scaffold"
```