Commit 9738a2d

Merge pull request #1607 from nf-core/dev
Dev -> Master for 3.21.0
2 parents b971ba1 + 13c6e21 commit 9738a2d

66 files changed, +3215 -3021 lines changed

.nf-core.yml

Lines changed: 3 additions & 3 deletions
@@ -14,11 +14,11 @@ nf_core_version: 3.3.2
 repository_type: pipeline
 template:
   author: "Harshil Patel, Phil Ewels, Rickard Hammarén"
-  description: RNA sequencing analysis pipeline for gene/isoform quantification
-    and extensive quality control.
+  description: RNA sequencing analysis pipeline for gene/isoform quantification and
+    extensive quality control.
   force: false
   is_nfcore: true
   name: rnaseq
   org: nf-core
   outdir: .
-  version: 3.20.0
+  version: 3.21.0

CHANGELOG.md

Lines changed: 26 additions & 1 deletion
@@ -3,7 +3,32 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## 3.20.0
+## [[3.21.0](https://github.com/nf-core/rnaseq/releases/tag/3.21.0)] - 2025-09-18
+
+### Credits
+
+Special thanks to the following for their contributions to the release:
+
+- [Edmund Miller](https://github.com/edmundmiller)
+- [Friederike Hanssen](https://github.com/friederikehanssen)
+- [Maxime Garcia](https://github.com/maxulysse)
+- [Jonathan Manning](https://github.com/pinin4fjords)
+
+### Enhancements & fixes
+
+- [PR #1597](https://github.com/nf-core/rnaseq/pull/1597) - Bump version after release 3.20.0
+- [PR #1603](https://github.com/nf-core/rnaseq/pull/1603) - Add bam input pathway
+- [PR #1604](https://github.com/nf-core/rnaseq/pull/1604) - Enable BAM input for RSEM
+- [PR #1605](https://github.com/nf-core/rnaseq/pull/1605) - Fix default for umi_discard_read to prevent validation errors in Platform
+- [PR #1606](https://github.com/nf-core/rnaseq/pull/1606) - Bump version to 3.21.0 ahead of release
+
+### Software dependencies
+
+| Dependency | Old version | New version |
+| ---------- | ----------- | ----------- |
+| `MultiQC` | 1.30 | 1.31 |
+
+## [[3.20.0](https://github.com/nf-core/rnaseq/releases/tag/3.20.0)] - 2025-08-18

 ### Credits

README.md

Lines changed: 3 additions & 1 deletion
@@ -20,7 +20,7 @@

 ## Introduction

-**nf-core/rnaseq** is a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation. It takes a samplesheet and FASTQ files as input, performs quality control (QC), trimming and (pseudo-)alignment, and produces a gene expression matrix and extensive QC report.
+**nf-core/rnaseq** is a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation. It takes a samplesheet with FASTQ files or pre-aligned BAM files as input, performs quality control (QC), trimming and (pseudo-)alignment, and produces a gene expression matrix and extensive QC report.

 ![nf-core/rnaseq metro map](docs/images/nf-core-rnaseq_metro_map_grey_animated.svg)

@@ -76,6 +76,8 @@ CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,a

 Each row represents a fastq file (single-end) or a pair of fastq files (paired end). Rows with the same sample identifier are considered technical replicates and merged automatically. The strandedness refers to the library preparation and will be automatically inferred if set to `auto`.

+The pipeline supports a two-step reprocessing workflow using BAM files from previous runs. Run initially with `--save_align_intermeds` to generate a samplesheet with BAM paths, then reprocess using `--skip_alignment` for efficient downstream analysis without repeating expensive alignment steps. This feature is designed specifically for pipeline-generated BAMs.
+
 > [!WARNING]
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

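The two-step reprocessing workflow added to the README in the hunk above could be driven roughly as follows. This is a minimal sketch, not taken from the commit: the helper name `build_commands`, the `docker` profile, the `_reprocessed` output suffix, and the assumption that `samplesheets/samplesheet_with_bams.csv` sits directly under the first run's output directory are all illustrative. `--input` and `--outdir` are the usual nf-core parameters but do not appear in this diff.

```python
import shlex


def build_commands(samplesheet, outdir, profile="docker"):
    """Return argv lists for the initial run and the BAM reprocessing run."""
    # Step 1: align from FASTQ and keep the intermediate BAM files.
    first_run = [
        "nextflow", "run", "nf-core/rnaseq",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "--save_align_intermeds",
    ]
    # Assumption: the generated samplesheet is written under <outdir>/samplesheets/.
    bam_samplesheet = f"{outdir}/samplesheets/samplesheet_with_bams.csv"
    # Step 2: reuse the BAMs and skip the alignment step.
    second_run = [
        "nextflow", "run", "nf-core/rnaseq",
        "-profile", profile,
        "--input", bam_samplesheet,
        "--outdir", f"{outdir}_reprocessed",  # hypothetical second output directory
        "--skip_alignment",
    ]
    return first_run, second_run


first_run, second_run = build_commands("samplesheet.csv", "results")
for command in (first_run, second_run):
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

Building the commands as argument lists rather than strings keeps paths with unusual characters safe if they are later passed to `subprocess.run`.
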
assets/schema_input.json

Lines changed: 21 additions & 0 deletions
@@ -32,6 +32,27 @@
         "errorMessage": "Strandedness must be provided and be one of 'auto', 'forward', 'reverse' or 'unstranded'",
         "enum": ["forward", "reverse", "unstranded", "auto"],
         "meta": ["strandedness"]
+      },
+      "genome_bam": {
+        "type": "string",
+        "format": "file-path",
+        "exists": true,
+        "pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.(bam|BAM)$",
+        "errorMessage": "Genome BAM file cannot contain spaces and must have extension '.bam'"
+      },
+      "transcriptome_bam": {
+        "type": "string",
+        "format": "file-path",
+        "exists": true,
+        "pattern": "^([\\S\\s]*\\/)?[^\\s\\/]+\\.(bam|BAM)$",
+        "errorMessage": "Transcriptome BAM file cannot contain spaces and must have extension '.bam'"
+      },
+      "percent_mapped": {
+        "type": "number",
+        "minimum": 0,
+        "maximum": 100,
+        "errorMessage": "Percent mapped must be a number between 0 and 100",
+        "meta": "percent_mapped"
       }
     },
     "required": ["sample", "fastq_1", "strandedness"]

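To see what the new `genome_bam` / `transcriptome_bam` pattern and the `percent_mapped` bounds accept, the checks can be approximated in plain Python. This is only a sketch: the pipeline validates samplesheets with its own schema tooling, the function names here are hypothetical, and the `"exists": true` file check is deliberately not reproduced.

```python
import re

# Same pattern as in the schema, with the JSON escaping removed.
BAM_PATTERN = re.compile(r"^([\S\s]*/)?[^\s/]+\.(bam|BAM)$")


def is_valid_bam_path(path: str) -> bool:
    """True if the path matches the schema's BAM pattern (existence is not checked)."""
    return BAM_PATTERN.fullmatch(path) is not None


def is_valid_percent_mapped(value) -> bool:
    """True for a number in the closed range [0, 100]; booleans are rejected."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0 <= value <= 100


for candidate in ["star_salmon/S1.Aligned.out.bam", "S1.BAM", "my sample.bam", "S1.bam.bai"]:
    print(candidate, is_valid_bam_path(candidate))
```

Note that the pattern only forbids whitespace in the file name itself. A directory component such as `dir with space/S1.bam` still matches, even though the error message says the file "cannot contain spaces".
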
conf/test.config

Lines changed: 0 additions & 1 deletion
@@ -35,7 +35,6 @@ params {
     bbsplit_fasta_list = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a/reference/bbsplit_fasta_list.txt'
     hisat2_index = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a/reference/hisat2.tar.gz'
     salmon_index = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a/reference/salmon.tar.gz'
-    rsem_index = 'https://raw.githubusercontent.com/nf-core/test-datasets/626c8fab639062eade4b10747e919341cbf9b41a/reference/rsem.tar.gz'

     // Other parameters
     skip_bbsplit = false

docs/output.md

Lines changed: 13 additions & 7 deletions
@@ -10,6 +10,10 @@ nextflow run nf-core/rnaseq -profile test_full,<docker/singularity/institute>

 The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

+:::tip
+Many of the BAM files produced by this pipeline can be reused as input for future runs with `--skip_alignment`. This is particularly useful for reprocessing data or running downstream analysis steps without repeating computationally expensive alignment. See the [usage documentation](https://nf-co.re/rnaseq/usage#bam-input-for-reprocessing-workflow) for details on using BAM files as input.
+:::
+
 ## Pipeline overview

 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
@@ -213,8 +217,8 @@ When `--remove_ribo_rna` is specified, the pipeline uses [SortMeRNA](https://git
 <summary>Output files</summary>

 - `star_salmon/`
-  - `*.Aligned.out.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory.
-  - `*.Aligned.toTranscriptome.out.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory.
+  - `*.Aligned.out.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory. These files can be reused as `genome_bam` input in future pipeline runs.
+  - `*.Aligned.toTranscriptome.out.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory. These files can be reused as `transcriptome_bam` input in future pipeline runs.
   - `salmon.merged.gene_counts.tsv`: Matrix of gene-level raw counts across all samples.
   - `salmon.merged.gene_tpm.tsv`: Matrix of gene-level TPM values across all samples.
   - `salmon.merged.gene.SummarizedExperiment.rds`: RDS object that can be loaded in R that contains a [SummarizedExperiment](https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html) container with the abundance TPM (`tpm`), estimated counts (`counts`) and gene length (`length`), estimated library size-scaled counts (`counts_scaled`), estimated length-scaled counts (`counts_length_scaled`) in the assays slot for genes.
@@ -276,16 +280,16 @@ The STAR section of the MultiQC report shows a bar plot with alignment rates: go
   - `rsem.merged.transcript_tpm.tsv`: Matrix of isoform-level TPM values across all samples.
   - `*.genes.results`: RSEM gene-level quantification results for each sample.
   - `*.isoforms.results`: RSEM isoform-level quantification results for each sample.
-  - `*.STAR.genome.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory.
-  - `*.transcript.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory.
+  - `*.STAR.genome.bam`: If `--save_align_intermeds` is specified the BAM file from STAR alignment containing read alignments to the reference genome will be placed in this directory. These files can be reused as `genome_bam` input in future pipeline runs.
+  - `*.transcript.bam`: If `--save_align_intermeds` is specified the BAM file from STAR alignment containing read alignments to the transcriptome will be placed in this directory. These files can be reused as `transcriptome_bam` input in future pipeline runs.
 - `star_rsem/<SAMPLE>.stat/`
   - `*.cnt`, `*.model`, `*.theta`: RSEM counts and statistics for each sample.
 - `star_rsem/log/`
   - `*.log`: STAR alignment report containing the mapping results summary.

 </details>

-[RSEM](https://github.com/deweylab/RSEM) is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. RSEM wraps other popular tools to map the reads to the genome (i.e. STAR, Bowtie2, HISAT2; STAR is used in this pipeline) which are then subsequently filtered relative to a transcriptome before quantifying at the gene- and isoform-level. Other advantages of using RSEM are that it performs both the alignment and quantification in a single package and its ability to effectively use ambiguously-mapping reads.
+[RSEM](https://github.com/deweylab/RSEM) is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. When using `--aligner star_rsem`, the pipeline first runs STAR alignment with RSEM-compatible parameters to generate genome and transcriptome BAM files, then RSEM quantifies expression using these pre-aligned BAMs via the `--alignments` mode. This approach ensures optimal compatibility while maintaining RSEM's ability to effectively use ambiguously-mapping reads.

 You can choose to align and quantify your data with RSEM by providing the `--aligner star_rsem` parameter.

@@ -299,7 +303,7 @@ You can choose to align and quantify your data with RSEM by providing the `--ali
 <summary>Output files</summary>

 - `hisat2/`
-  - `<SAMPLE>.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory.
+  - `<SAMPLE>.bam`: If `--save_align_intermeds` is specified the original BAM file containing read alignments to the reference genome will be placed in this directory. These files can be reused as `genome_bam` input in future pipeline runs.
 - `hisat2/log/`
   - `*.log`: HISAT2 alignment report containing the mapping results summary.
 - `hisat2/unmapped/`
@@ -323,7 +327,7 @@ The pipeline has been written in a way where all the files generated downstream
 <summary>Output files</summary>

 - `<ALIGNER>/`
-  - `<SAMPLE>.sorted.bam`: If `--save_align_intermeds` is specified the original coordinate sorted BAM file containing read alignments will be placed in this directory.
+  - `<SAMPLE>.sorted.bam`: If `--save_align_intermeds` is specified the original coordinate sorted BAM file containing read alignments will be placed in this directory. These files can be reused as `genome_bam` input in future pipeline runs.
   - `<SAMPLE>.sorted.bam.bai`: If `--save_align_intermeds` is specified the BAI index file for the original coordinate sorted BAM file will be placed in this directory.
   - `<SAMPLE>.sorted.bam.csi`: If `--save_align_intermeds --bam_csi_index` is specified the CSI index file for the original coordinate sorted BAM file will be placed in this directory.
 - `<ALIGNER>/samtools_stats/`
@@ -864,6 +868,8 @@ A number of genome-specific files are generated by the pipeline because they are
   - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline.
   - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
   - Parameters used by the pipeline run: `params.json`.
+- `samplesheets/`
+  - `samplesheet_with_bams.csv`: **Auto-generated samplesheet for BAM reprocessing** (only created when using `--save_align_intermeds`) containing all samples with BAM file paths. For samples processed from FASTQ, includes paths to newly generated BAMs; for samples that were BAM input, preserves the original input paths. This samplesheet can be used directly for future pipeline runs with `--skip_alignment`, enabling efficient reprocessing without re-alignment.

 </details>

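Before reusing the auto-generated `samplesheet_with_bams.csv`, it may help to check which samples actually carry BAM paths. The sketch below assumes the file uses the column names from `assets/schema_input.json` (`sample`, `genome_bam`, `transcriptome_bam`, and so on); the helper name `split_by_bam_availability` and the example rows are made up for illustration.

```python
import csv
import io


def split_by_bam_availability(handle):
    """Return (samples with at least one BAM path, samples with none)."""
    with_bams, without_bams = [], []
    for row in csv.DictReader(handle):
        # Empty or missing cells count as "no BAM available".
        has_bam = bool(row.get("genome_bam") or row.get("transcriptome_bam"))
        (with_bams if has_bam else without_bams).append(row["sample"])
    return with_bams, without_bams


# Hypothetical samplesheet content; a real one would be opened with open(path, newline="").
example = io.StringIO(
    "sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam,percent_mapped\n"
    "CONTROL_REP1,c_1.fastq.gz,c_2.fastq.gz,auto,"
    "CONTROL_REP1.sorted.bam,CONTROL_REP1.Aligned.toTranscriptome.out.bam,92.5\n"
    "TREATED_REP1,t_1.fastq.gz,t_2.fastq.gz,auto,,,\n"
)
with_bams, without_bams = split_by_bam_availability(example)
print("reusable:", with_bams)
print("needs alignment:", without_bams)
```
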