docs/output.md: 5 additions & 5 deletions
@@ -11,7 +11,7 @@ nextflow run nf-core/rnaseq -profile test_full,<docker/singularity/institute>
 The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
 
 :::tip
-Many of the BAM files produced by this pipeline can be reused as input for future runs. This is particularly useful for reprocessing data or running downstream analysis steps without repeating computationally expensive alignment. See the [usage documentation](https://nf-co.re/rnaseq/usage#using-bam-files-as-input) for details on using BAM files as input.
+Many of the BAM files produced by this pipeline can be reused as input for future runs with `--skip_alignment`. This is particularly useful for reprocessing data or running downstream analysis steps without repeating computationally expensive alignment. See the [usage documentation](https://nf-co.re/rnaseq/usage#bam-input-for-reprocessing-workflow) for details on using BAM files as input.
 :::
 
 ## Pipeline overview
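The reprocessing run described in the tip above can be sketched as a small command builder. This is a minimal illustration, not part of the pipeline: `build_reprocess_command` is a hypothetical helper, the `results/` and `results_reprocessed` paths are placeholders, and `--input` / `--outdir` are assumed to be the standard nf-core parameters for the samplesheet and output directory.

```python
import shlex


def build_reprocess_command(samplesheet, outdir, aligner="star_salmon", profile="docker"):
    """Return the argv list for a BAM-reprocessing run (hypothetical helper)."""
    return [
        "nextflow", "run", "nf-core/rnaseq",
        "--input", samplesheet,      # assumed: samplesheet that lists BAM paths
        "--outdir", outdir,          # assumed: placeholder output directory
        "--aligner", aligner,        # keep the same aligner as the run that made the BAMs
        "--skip_alignment",          # reuse the provided BAMs instead of re-aligning
        "-profile", profile,
    ]


cmd = build_reprocess_command(
    "results/samplesheets/samplesheet_with_bams.csv",
    "results_reprocessed",
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so paths containing spaces are quoted correctly when the command is printed or passed to `subprocess.run`.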
@@ -280,16 +280,16 @@ The STAR section of the MultiQC report shows a bar plot with alignment rates: go
 - `rsem.merged.transcript_tpm.tsv`: Matrix of isoform-level TPM values across all samples.
 - `*.genes.results`: RSEM gene-level quantification results for each sample.
 - `*.isoforms.results`: RSEM isoform-level quantification results for each sample.
-- `*.STAR.genome.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory. These files can be reused as `genome_bam` input in future pipeline runs.
-- `*.transcript.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory. These files can be reused as `transcriptome_bam` input in future pipeline runs.
+- `*.STAR.genome.bam`: If `--save_align_intermeds` is specified the BAM file from STAR alignment containing read alignments to the reference genome will be placed in this directory. These files can be reused as `genome_bam` input in future pipeline runs.
+- `*.transcript.bam`: If `--save_align_intermeds` is specified the BAM file from STAR alignment containing read alignments to the transcriptome will be placed in this directory. These files can be reused as `transcriptome_bam` input in future pipeline runs.
 - `star_rsem/<SAMPLE>.stat/`
 - `*.cnt`, `*.model`, `*.theta`: RSEM counts and statistics for each sample.
 - `star_rsem/log/`
 - `*.log`: STAR alignment report containing the mapping results summary.
 
 </details>
 
-[RSEM](https://github.com/deweylab/RSEM) is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. RSEM wraps other popular tools to map the reads to the genome (i.e. STAR, Bowtie2, HISAT2; STAR is used in this pipeline) which are then subsequently filtered relative to a transcriptome before quantifying at the gene- and isoform-level. Other advantages of using RSEM are that it performs both the alignment and quantification in a single package and its ability to effectively use ambiguously-mapping reads.
+[RSEM](https://github.com/deweylab/RSEM) is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. When using `--aligner star_rsem`, the pipeline first runs STAR alignment with RSEM-compatible parameters to generate genome and transcriptome BAM files, then RSEM quantifies expression using these pre-aligned BAMs via the `--alignments` mode. This approach ensures optimal compatibility while maintaining RSEM's ability to effectively use ambiguously-mapping reads.
 
 You can choose to align and quantify your data with RSEM by providing the `--aligner star_rsem` parameter.
 
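The two-step flow described in the added RSEM paragraph (STAR writes a transcriptome BAM, then RSEM quantifies from it in `--alignments` mode) can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual commands: the helper names, index paths and sample names are hypothetical, and only a small subset of STAR options is shown rather than the full set of RSEM-compatible parameters.

```python
def star_command(index_dir, reads, prefix):
    """STAR call that also writes a transcriptome BAM (illustrative subset of options)."""
    return [
        "STAR",
        "--genomeDir", index_dir,
        "--readFilesIn", *reads,
        "--readFilesCommand", "zcat",        # reads are assumed to be gzipped
        "--outSAMtype", "BAM", "Unsorted",
        "--quantMode", "TranscriptomeSAM",   # emit alignments in transcript coordinates
        "--outFileNamePrefix", prefix,
    ]


def rsem_command(transcriptome_bam, rsem_reference, sample, paired_end=True):
    """RSEM call that quantifies from pre-aligned reads instead of aligning itself."""
    cmd = ["rsem-calculate-expression", "--alignments"]
    if paired_end:
        cmd.append("--paired-end")
    return cmd + [transcriptome_bam, rsem_reference, sample]


# Hypothetical sample and index locations, for illustration only.
star = star_command("star_index", ["SAMPLE_R1.fastq.gz", "SAMPLE_R2.fastq.gz"], "SAMPLE.")
rsem = rsem_command("SAMPLE.Aligned.toTranscriptome.out.bam", "rsem_index/genome", "SAMPLE")
print(" ".join(star))
print(" ".join(rsem))
```

Keeping the two steps separate is what allows the transcriptome BAM to be saved and fed back in later as `transcriptome_bam` without repeating the STAR step.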
@@ -869,7 +869,7 @@ A number of genome-specific files are generated by the pipeline because they are
 - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
 - Parameters used by the pipeline run: `params.json`.
 - `samplesheets/`
-- `samplesheet_with_bams.csv`: **Auto-generated complete samplesheet** (only created when using `--save_align_intermeds`) containing all samples with BAM file paths. For samples processed from FASTQ, includes paths to newly generated BAMs; for samples that were BAM input, preserves the original input paths. This comprehensive samplesheet can be used directly for future pipeline runs, enabling efficient reprocessing without re-alignment.
+- `samplesheet_with_bams.csv`: **Auto-generated samplesheet for BAM reprocessing** (only created when using `--save_align_intermeds`) containing all samples with BAM file paths. For samples processed from FASTQ, includes paths to newly generated BAMs; for samples that were BAM input, preserves the original input paths. This samplesheet can be used directly for future pipeline runs with `--skip_alignment`, enabling efficient reprocessing without re-alignment.
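Before reusing a samplesheet such as `samplesheet_with_bams.csv`, it can help to see which samples actually carry BAM paths, since those are the ones that depend on `--skip_alignment`. The sketch below is a hypothetical pre-flight check, not something the pipeline provides. The `genome_bam` and `transcriptome_bam` column names come from the text above; the remaining columns (`sample`, `fastq_1`, `fastq_2`, `strandedness`) and all file paths are assumptions made for the example.

```python
import csv
import io

BAM_COLUMNS = ("genome_bam", "transcriptome_bam")


def samples_with_bams(csv_text):
    """Return the names of samples that list at least one BAM path."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["sample"]
        for row in reader
        # A missing or empty cell counts as "no BAM provided".
        if any((row.get(col) or "").strip() for col in BAM_COLUMNS)
    ]


# Made-up samplesheet: one sample with BAMs, one with FASTQ files only.
example = """sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam
CTRL_1,,,auto,/data/CTRL_1.bam,/data/CTRL_1.transcript.bam
TREAT_1,/data/TREAT_1_R1.fastq.gz,/data/TREAT_1_R2.fastq.gz,auto,,
"""

bam_samples = samples_with_bams(example)
print(bam_samples)
```

If the returned list is non-empty and the run is launched without `--skip_alignment`, the listed BAM files would not be used.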
docs/usage.md: 8 additions & 4 deletions
@@ -136,7 +136,7 @@ nextflow run nf-core/rnaseq \
 -profile docker
 ```
 
-The pipeline will skip alignment and indexing steps, putting the BAM files through post-processing and quantification only.
+The `--skip_alignment` flag tells the pipeline to skip alignment, and in this situation it will use any provided BAM files instead of performing alignment, putting them through post-processing and quantification only.
 > **⚠️ Warning**: This feature is designed specifically for BAM files generated by this pipeline. Using arbitrary BAM files from other sources is **not officially supported** and will likely only work via the two-step workflow described above. Users attempting to use other BAMs do so at their own risk.
 
+> **⚠️ Warning**: You cannot mix quantifier types between BAM generation and reprocessing runs. BAM files generated with `--aligner star_salmon` must be reprocessed with `--aligner star_salmon`. Similarly, BAM files from `--aligner star_rsem` must be reprocessed with `--aligner star_rsem`. Mixing quantifier types will likely produce incorrect results due to incompatible alignment parameters.
+
 **Key technical details:**
 
+- BAM files are only used when `--skip_alignment` is specified
 - The pipeline automatically indexes provided BAM files
 - You can provide just `genome_bam`, just `transcriptome_bam`, or both
-- Mixed samplesheets (some samples with FASTQ, others with BAM) are supported
+- Mixed samplesheets are supported, but samples with BAM files require `--skip_alignment`
+- Without `--skip_alignment`, the pipeline will perform alignment even if BAM files are provided
 - For BAM file locations from pipeline outputs, see the [output documentation](https://nf-co.re/rnaseq/output)
 
 This workflow is ideal for tweaking downstream processing steps (quantification methods, QC parameters, differential expression analysis) without repeating time-consuming alignment.
@@ -181,7 +185,7 @@ If you would like to reduce the number of reads used in the analysis, for exampl
 The `--aligner hisat2` option is not currently supported using ARM architecture ('-profile arm')
 :::
 
-By default, the pipeline uses [STAR](https://github.com/alexdobin/STAR) (i.e. `--aligner star_salmon`) to map the raw FastQ reads to the reference genome, project the alignments onto the transcriptome and to perform the downstream BAM-level quantification with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html). STAR is fast but requires a lot of memory to run, typically around 38GB for the Human GRCh37 reference genome. Since the [RSEM](https://github.com/deweylab/RSEM) (i.e. `--aligner star_rsem`) workflow in the pipeline also uses STAR you should use the [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) aligner (i.e. `--aligner hisat2`) if you have memory limitations.
+By default, the pipeline uses [STAR](https://github.com/alexdobin/STAR) (i.e. `--aligner star_salmon`) to map the raw FastQ reads to the reference genome, project the alignments onto the transcriptome and to perform the downstream BAM-level quantification with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html). STAR is fast but requires a lot of memory to run, typically around 38GB for the Human GRCh37 reference genome. Both `--aligner star_salmon` and `--aligner star_rsem` use STAR for alignment, so you should use the [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) aligner (i.e. `--aligner hisat2`) if you have memory limitations.
 
 You also have the option to pseudoalign and quantify your data directly with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) or [Kallisto](https://pachterlab.github.io/kallisto/) by specifying `salmon` or `kallisto` to the `--pseudo_aligner` parameter. The selected pseudoaligner will then be run in addition to the standard alignment workflow defined by `--aligner`, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. However, you can provide the `--skip_alignment` parameter if you would like to run Salmon or Kallisto in isolation. By default, the pipeline will use the genome fasta and gtf file to generate the transcripts fasta file, and then to build the Salmon index. You can override these parameters using the `--transcript_fasta` and `--salmon_index` parameters, respectively.
 
@@ -313,7 +317,7 @@ Notes:
 - If `--gff` is provided as input then this will be converted to a GTF file, or the latter will be used if both are provided.
 - If `--gene_bed` is not provided then it will be generated from the GTF file.
 - If `--additional_fasta` is provided then the features in this file (e.g. ERCC spike-ins) will be automatically concatenated onto both the reference FASTA file as well as the GTF annotation before building the appropriate indices.
-- When using `--aligner star_rsem`, both the STAR and RSEM indices should be present in the path specified by `--rsem_index` (see [#568](https://github.com/nf-core/rnaseq/issues/568)).
+- When using `--aligner star_rsem`, the pipeline will build separate STAR and RSEM indices. STAR performs alignment with RSEM-compatible parameters, then RSEM quantifies from the resulting BAM files using `--alignments` mode.
 - If the `--skip_alignment` option is used along with `--transcript_fasta`, the pipeline can technically run without providing the genomic FASTA (`--fasta`). However, this approach is **not recommended** with `--pseudo_aligner salmon`, as any dynamically generated Salmon index will lack decoys. To ensure optimal indexing with decoys, it is **highly recommended** to include the genomic FASTA (`--fasta`) with Salmon, unless a pre-existing decoy-aware Salmon index is supplied. For more details on the benefits of decoy-aware indexing, refer to the [Salmon documentation](https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode).