Commit fb2a56b

update README
1 parent 76368ed commit fb2a56b

2 files changed: +20 -9 lines changed

README.md

Lines changed: 16 additions & 8 deletions
@@ -12,19 +12,23 @@ somatic variant calling. These workflows are implemented in the Nextflow (Di Tom
 Find the documentation here [![Documentation Status](https://readthedocs.org/projects/tronflow-docs/badge/?version=latest)](https://tronflow-docs.readthedocs.io/en/latest/?badge=latest)


-The aim of this workflow is the preprocessing of BAM files based on Picard and GATK (DePristo, 2011).
+The aim of this workflow is to preprocess BAM files based on Picard and GATK (DePristo, 2011) best practices.


 ## Background

-In order to have a variant calling ready BAM file there are a number of operations that need to be applied on the BAM. This pipeline depends on the particular variant caller, but there are some common operations.
+In order to have a variant calling ready BAM file there are a number of operations that need to be applied on the BAM.
+This pipeline depends on the particular variant caller, but there are some common operations.

-GATK has been providing a well known best practices document on BAM preprocessing, the latest best practices for GATK4 (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165) does not perform anymore realignment around indels as opposed to best practices for GATK3 (https://software.broadinstitute.org/gatk/documentation/article?id=3238). This pipeline is based on both Picard and GATK. These best practices have been implemented a number of times, see for instance this implementation in Workflow Definition Language https://github.com/gatk-workflows/gatk4-data-processing/blob/master/processing-for-variant-discovery-gatk4.wdl.
+GATK has been providing a well known best practices document on BAM preprocessing, the latest best practices for
+GATK4 (https://software.broadinstitute.org/gatk/best-practices/workflow?id=11165) does not perform anymore realignment around indels as opposed to best practices for GATK3 (https://software.broadinstitute.org/gatk/documentation/article?id=3238). This pipeline is based on both Picard and GATK. These best practices have been implemented a number of times, see for instance this implementation in Workflow Definition Language https://github.com/gatk-workflows/gatk4-data-processing/blob/master/processing-for-variant-discovery-gatk4.wdl.


 ## Objectives

-We aim at providing a single implementation of the BAM preprocessing pipeline that can be used across different situations. For this purpose there are some required steps and some optional steps. This is implemented as a Nextflow pipeline to simplify parallelization of execution in the cluster.
+We aim at providing a single implementation of the BAM preprocessing pipeline that can be used across different
+use cases.
+For this purpose there are some required steps and some optional steps.

 The input can be either a tab-separated values file (`--input_files`) where each line corresponds to one input BAM or a single BAM (`--input_bam` and `--input_name`).

@@ -38,7 +42,7 @@ Steps:
 * **Mark duplicates** (optional). Identify the PCR and the optical duplications and marks those reads. This uses the parallelized version on Spark, it is reported to scale linearly up to 16 CPUs.
 * **Realignment around indels** (optional). This procedure is important for locus based variant callers, but for any variant caller doing haplotype assembly it is not needed. This is computing intensive as it first finds regions for realignment where there are indication of indels and then it performs a local realignment over those regions. Implemented in GATK3, deprecated in GATK4
 * **Base Quality Score Recalibration (BQSR)** (optional). It aims at correcting systematic errors in the sequencer when assigning the base call quality errors, as these scores are used by variant callers it improves variant calling in some situations. Implemented in GATK4
-* **Metrics** (optional). A number of metrics are obtained over the BAM file with Picard's CollectMetrics (eg: duplication, insert size, alignment, etc.).
+* **Metrics** (optional). A number of metrics are obtained from the BAM file with Picard's CollectMetrics, CollectHsMetrics and samtools' coverage and depth.

 ![Pipeline](figures/bam_preprocessing2.png)

@@ -49,8 +53,9 @@ Base Quality Score Recalibration (BQSR) requires dbSNP to avoid extracting error
 Realignment around indels requires a set of known indels (`--known_indels1` and `--known_indels2`).
 These resources can be fetched from the GATK bundle https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle.

-Optionally, in order to run Picard's CollectHsMetrics an intervals file will need to be provided (`--intervals`).
-This can be built from a BED file using Picard's BedToIntervalList (https://gatk.broadinstitute.org/hc/en-us/articles/360036883931-BedToIntervalList-Picard-)
+Optionally, in order to run Picard's CollectHsMetrics a BED file will need to be provided (`--intervals`).
+This BED file will also be used for `samtools coverage`.
+

 ## How to run it

@@ -108,8 +113,11 @@ Computational resources:

 Optional output:
 * Recalibration report
+* Deduplication metrics
 * Realignment intervals
-* Metrics
+* GATK multiple metrics
+* HS metrics
+* Horizontal and vertical coverage metrics
 ```

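
The README text in this diff describes `--input_files` as a tab-separated values file with one line per input BAM. A minimal Python sketch of writing such a file is shown here. The column layout (sample name, then BAM path), the helper name `write_input_table` and the sample paths are all assumptions made for illustration; they are not taken from this commit, so check the pipeline documentation for the columns it actually expects.

```python
import csv
import io


def write_input_table(rows, handle):
    """Write one tab-separated line per input BAM.

    `rows` is an iterable of (sample_name, bam_path) tuples. This column
    layout is hypothetical and not confirmed by the pipeline documentation.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for sample_name, bam_path in rows:
        writer.writerow([sample_name, bam_path])


# Hypothetical sample names and paths, for illustration only.
buffer = io.StringIO()
write_input_table(
    [("sample_1", "/data/sample_1.bam"), ("sample_2", "/data/sample_2.bam")],
    buffer,
)
table_text = buffer.getvalue()
```

Writing to a file handle instead of an in-memory buffer produces a file that could be passed to `--input_files`, provided the assumed columns match what the pipeline expects.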

nextflow.config

Lines changed: 4 additions & 1 deletion
@@ -111,6 +111,9 @@ Computational resources:

 Optional output:
 * Recalibration report
+* Deduplication metrics
 * Realignment intervals
-* Metrics
+* GATK multiple metrics
+* HS metrics
+* Horizontal and vertical coverage metrics
 """
