Add support for Cell Ranger #101

arteymix · 2025-09-13T18:32:43Z

TODO

integrate Cell Ranger in bioluigi
detect layout of runs in SRA
prevent Cell Ranger from creating MRO files in the current directory; we should probably ditch the --output-dir option and change the working directory for the execution instead
run bamtofastq for BAM-file SRA submissions
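
The third TODO item (changing the directory for the execution rather than relying on an output option) could look roughly like the sketch below. `run_in_directory` is a hypothetical helper name, not something taken from this PR or from bioluigi; it only relies on the `cwd` argument of `subprocess.run`, so any file the tool writes relative to its current directory lands in the run directory.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_in_directory(args: Sequence[str], working_dir: Path) -> subprocess.CompletedProcess:
    """Run a command with working_dir as its current directory.

    Hypothetical helper for illustration: files the tool writes relative to
    its current directory end up under working_dir instead of the directory
    the pipeline was launched from.
    """
    # Create the run directory up front so the child process can start in it.
    working_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(list(args), cwd=working_dir, check=True)
```

With this approach the caller passes the usual Cell Ranger command line as `args` and a per-run directory as `working_dir`; the launching directory stays untouched.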

arteymix · 2025-09-16T18:17:28Z

.github/workflows/build.yml

-      run: |
-        conda env update --file environment.yml --name base
+        activate-environment: rnaseq-pipeline
+        environment-file: environment.yml


This should be included immediately in the trunk.

setup.py

Parse SRA metadata from its XML format so that we can infer the role that each file plays in fastq-dump output.
Add typing and fix many bugs.
Retrieve the SRA public dir from a configuration
Improve layout detection from SRA metadata
Detect bcl2fastq standard filenames and also commonly used names.
Add a fallback that checks for the presence of I1/I2/R1/R2, but warns since this is very unreliable.
Track issues encountered in runs using an enumerated flag.
Make resolution of test resources relative
Allow some of the parameters for filtering cells to be overwritten if needed.
Use CellRangerCount task from bioluigi
Fix unpacking of singleton for single-run experiments
Remove cell_ranger_bin from config, it's declared in bioluigi
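
To illustrate the filename-based part of the layout detection described above, here is a minimal sketch. It assumes the standard bcl2fastq naming scheme (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`) and falls back to looking for an I1/I2/R1/R2 token with a warning. The names `ReadType`, `detect_read_type` and both patterns are made up for this example and are not the code in this PR.

```python
import enum
import logging
import re
from typing import Optional

logger = logging.getLogger(__name__)


class ReadType(enum.Enum):
    """Role of a FASTQ file within a run (names chosen for this sketch)."""
    I1 = 'I1'
    I2 = 'I2'
    R1 = 'R1'
    R2 = 'R2'


# bcl2fastq convention: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
BCL2FASTQ_PATTERN = re.compile(r'_S\d+_L\d{3}_([IR][12])_001\.fastq(?:\.gz)?$')

# Fallback: an I1/I2/R1/R2 token delimited by '_', '.' or '-'. This is
# unreliable because arbitrary sample names may contain such a token.
FALLBACK_PATTERN = re.compile(r'(?:^|[_.\-])([IR][12])(?=[_.\-]|$)')


def detect_read_type(filename: str) -> Optional[ReadType]:
    """Guess the read type of a FASTQ file from its name, or return None."""
    match = BCL2FASTQ_PATTERN.search(filename)
    if match:
        return ReadType(match.group(1))
    match = FALLBACK_PATTERN.search(filename)
    if match:
        logger.warning('%s: read type %s was guessed from a non-standard name.',
                       filename, match.group(1))
        return ReadType(match.group(1))
    return None
```

A name such as `SRR123_1.fastq.gz` yields `None` here, which is the case where the SRA metadata has to supply the read types instead.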

Add more metadata.

Rename fastq_file_types to read_types and add an enumerated type for possible values.

Detect which pipeline branch to take by looking up the assay type of a dataset.
Add a special case for FAC-sorted single-cell datasets that should be treated as bulk.
Add support for 10x BAM SRA submissions. This is done by looking up the header of the BAM files to infer the sequencing layout and calling bamtofastq downstream on the original submission.
Temporarily use the branch of bioluigi with improved sratools support and Cell Ranger.

arteymix · 2025-10-09T16:43:01Z

example.luigi.cfg

+[rnaseq_pipeline.sources.sra]
+# location where tools like prefetch and fastq-dump will store downloaded SRA files
+# you can get this value with vdb-config -p
+ncbi_public_dir=/cosmos/scratch/ncbi/public


I've encountered issues with parsing the output of vdb-config, so this is a more robust solution overall.
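
For reference, a minimal sketch of reading this setting. It uses the standard-library `configparser` on the INI text directly; the pipeline itself presumably goes through luigi's configuration machinery, so `get_ncbi_public_dir` is a hypothetical helper for illustration only. The section and option names are the ones from the diff above.

```python
import configparser
from pathlib import Path

SECTION = 'rnaseq_pipeline.sources.sra'


def get_ncbi_public_dir(config_text: str) -> Path:
    """Return the configured SRA public directory from luigi-style INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Raises NoSectionError / NoOptionError if the setting is missing, which
    # is easier to diagnose than a failure to parse vdb-config output.
    return Path(parser.get(SECTION, 'ncbi_public_dir'))


EXAMPLE_CONFIG = """
[rnaseq_pipeline.sources.sra]
# location where tools like prefetch and fastq-dump will store downloaded SRA files
ncbi_public_dir=/cosmos/scratch/ncbi/public
"""
```

Calling `get_ncbi_public_dir(EXAMPLE_CONFIG)` returns the path given in the example configuration.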

arteymix · 2025-10-09T16:44:35Z

rnaseq_pipeline/rnaseq_utils.py

+                  is_single_end: bool = False, is_paired: bool = False):
+    """Detects the layout of the sequencing run files based on their names and various additional information.
+
+    :param run_id: Identifier for the run


Mention here that a run is akin to a lane.


arteymix force-pushed the feature-cell-ranger branch 3 times, most recently from a9daf79 to 89ec252 on September 15, 2025 22:31

This was linked to issues Sep 18, 2025

Handle SRA experiments with multiple lanes mapped on distinct runs #94

Closed

Remove --clip and --skip-technical from fastq-dump #87

Closed

arteymix force-pushed the feature-cell-ranger branch 2 times, most recently from 9e01957 to f250d94 on September 21, 2025 23:50

arteymix added 6 commits September 23, 2025 16:15

Use efetch directly with -id instead of esearch

8a285bc

Use conda-incubator/setup-miniconda

4a35d2b

Fix missing GemmaTaskMixin import

6b21de9

Skip checking GemmaDatasetHasBatch since it requires credentials

8ad0e47

Ignore SRA runs that do not contain transcriptomic RNA-Seq data

749eea6

arteymix force-pushed the feature-cell-ranger branch from 43c04ed to a987102 on September 23, 2025 23:21

Parse the --readTypes option

1d0932d

Add more metadata.

arteymix force-pushed the feature-cell-ranger branch from a987102 to 1d0932d on September 23, 2025 23:41

arteymix and others added 4 commits September 24, 2025 11:09

Improve and fix logging for extracting SRA metadata

bb11982

Rename fastq_file_types to read_types and add an enumerated type for possible values.

Validate SRA metadata by reading it prior to writing it to disk

7861dcc

Do not open the browser in Google OAuth flow

0035f85

arteymix added 6 commits October 9, 2025 10:14

Update Python to 3.12

819368a

Improvements for local source

9310e18

fixup! Add support for single-cell RNA-Seq datasets

f4d0588

Mark test data as generated

46cd60e

Add missing test data file

77f595b

Fix Makefile

1396137

arteymix mentioned this pull request Oct 9, 2025

Add a task for retrieving GEO platform metadata #99

Closed

arteymix added 4 commits October 9, 2025 11:46

sra: Cache BAM headers

f8df3d2

Delete organized single-cell data implement remove() to DownloadRunTarget

0757134

Fix double-printing of the task summary

f4877ef

Use the new RNASEQ_PIPELINE_REPORT file type

c31b580

arteymix self-assigned this Oct 9, 2025

arteymix added this to the 2.2.0 milestone Oct 9, 2025

arteymix added 6 commits October 20, 2025 12:35

Add wrapped tools

1e25051

Add a task to reorganize a split experiment

3428a5d

Remove unused ALIGNQCDIR

32139a5

Rename output files of bamtofastq not ending in '_001.fastq.gz'

5fb0325

Check if read_types is provided when detecting layout

5e8006a

Rename wrapped tools config section

8e3fbaa

arteymix linked an issue Oct 23, 2025 that may be closed by this pull request

Copy the Cell Ranger reference locally #103

Closed

5 tasks

arteymix and others added 16 commits October 26, 2025 08:47

More work

cfd4a30

Add missing test data

ba16abc

Make it possible to delete an entire run directory instead of individual files

31f87bd

sra: Include the SRA run identifier when dumping FASTQ files from a BAM

bd43c52

Reduce the amount of configuration needed for the pipeline

f84c3b1

Add a minimal configuration for tests and provide a mock for gemma-cli.

gemma: Add targets for specific QTs existing and use those as target for submitting data

04af7e2

Update cutadapt and MultiQC

a3a7806

Remove redundant task definition and add keyword parameters

d8fa72c

Migrate to pyproject.toml

c9f2945

Move gsheet and webviewer in optional dependencies

dd711b6

Remove unused IlluminaFastqHeader and CheckAfterCompleteMixin

6ba8539

Add a chemistry option to AlignSingleCellSample

2e9afee

Fix types and imports in tasks.py

916e168

fixup! Remove unused IlluminaFastqHeader and CheckAfterCompleteMixin

251b7b6

Fix incorrect logger usage in sra.py

d5e46ae

Downgrade warning for no fastq-load.py options to info

dcadc15
