Add support for Cell Ranger #101
Draft: arteymix wants to merge 45 commits into master from feature-cell-ranger
Commits
8a285bc Use efetch directory with -id instead of esearch (arteymix)
4a35d2b Use conda-incubator/setup-miniconda (arteymix)
6b21de9 Fix missing GemmaTaskMixin import (arteymix)
8ad0e47 Skip checking GemmaDatasetHasBatch since it requires credentials (arteymix)
960c2a3 Add support for single-cell RNA-Seq datasets (arteymix)
749eea6 Ignore SRA runs that do not contain transcriptomic RNA-Seq data (arteymix)
1d0932d Parse the --readTypes option (arteymix)
bb11982 Improve and fix logging for extracting SRA metadata (arteymix)
7861dcc Validate SRA metadata by reading it prior to writing it to disk (arteymix)
0035f85 Do not open the browser in Google OAuth flow (arteymix)
3e88020 Add support for 10x BAM submissions to SRA (arteymix)
819368a Update Python to 3.12 (arteymix)
9310e18 Improvements for local source (arteymix)
f4d0588 fixup! Add support for single-cell RNA-Seq datasets (arteymix)
46cd60e Mark test data as generated (arteymix)
77f595b Add missing test data file (arteymix)
1396137 Fix Makefile (arteymix)
29a3cfb Replace luigi-wrapper with a simple CLI tool (arteymix)
cb32904 Skip fac-sorted dataset test since it's not public (arteymix)
f8df3d2 sra: Cache BAM headers (arteymix)
0757134 Delete organized single-cell data implement remove() to DownloadRunTa… (arteymix)
f4877ef Fix double-printing of the task summary (arteymix)
c31b580 Use the new RNASEQ_PIPELINE_REPORT file type (arteymix)
1e25051 Add wrapped tools (arteymix)
3428a5d Add a task to reorganize a split experiment (arteymix)
32139a5 Remove unused ALIGNQCDIR (arteymix)
5fb0325 Rename output files of bamtofastq not ending in '_001.fastq.gz' (arteymix)
5e8006a Check if read_types is provided when detecting layout (arteymix)
8e3fbaa Rename wrapped tools config section (arteymix)
cfd4a30 More work (arteymix)
ba16abc Add missing test data (arteymix)
31f87bd Make it possible to delete an entire run directory instead of individ… (arteymix)
bd43c52 sra: Include the SRA run identifier when dumping FASTQ files from a BAM (arteymix)
f84c3b1 Reduce the amount of configuration needed for the pipeline (arteymix)
04af7e2 gemma: Add targets for specific QTs existing and use those as target … (arteymix)
a3a7806 Update cutadapt and MultiQC (arteymix)
d8fa72c Remove redundant task definition and add keyword parameters (arteymix)
c9f2945 Migrate to pyproject.toml (arteymix)
dd711b6 Move gsheet and webviewer in optional dependencies (arteymix)
6ba8539 Remove unused IlluminaFastqHeader and CheckAfterCompleteMixin (arteymix)
2e9afee Add a chemistry option to AlignSingleCellSample (arteymix)
916e168 Fix types and imports in tasks.py (arteymix)
251b7b6 fixup! Remove unused IlluminaFastqHeader and CheckAfterCompleteMixin (arteymix)
d5e46ae Fix incorrect logger usage in sra.py (arteymix)
dcadc15 Downgrade warning for no fastq-load.py options to info (arteymix)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| /tests/data/* linguist-generated=true |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,9 +27,12 @@ submit_data_jobs=1 | |
| submit_batch_info_jobs=2 | ||
|
|
||
| [bioluigi] | ||
| scheduler=slurm | ||
| scheduler=local | ||
| scheduler_partition= | ||
| scheduler_extra_args=[] | ||
| # Default tools, override as needed | ||
| #cutadapt_bin=cutadapt | ||
| #cell_ranger_bin=cellranger | ||
|
|
||
| # | ||
| # This section contains the necessary variables for the pipeline execution | ||
|
|
@@ -40,19 +43,33 @@ scheduler_extra_args=[] | |
| OUTPUT_DIR=pipeline-output | ||
| GENOMES=genomes | ||
| REFERENCES=references | ||
| SINGLE_CELL_REFERENCES=references-single-cell | ||
| METADATA=metadata | ||
| DATA=data | ||
| DATAQCDIR=data-qc | ||
| ALIGNDIR=aligned | ||
| ALIGNQCDIR=aligned-qc | ||
| QUANTDIR=quantified | ||
| BATCHINFODIR=batch-info | ||
|
|
||
| # RSEM | ||
| RSEM_DIR=contrib/RSEM | ||
| rsem_calculate_expression_bin=contrib/RSEM/rsem-calculate-expression | ||
|
|
||
| SLACK_WEBHOOK_URL= | ||
|
|
||
| [rnaseq_pipeline.wrapped_tools] | ||
| rsem_calculate_expression_bin=rsem-calculate-expression | ||
| cellranger_bin=cellranger | ||
|
|
||
| [rnaseq_pipeline.sources.sra] | ||
| # location where tools like prefetch and fastq-dump will store downloaded SRA files | ||
| # you can get this value with vdb-config -p | ||
| ncbi_public_dir=/cosmos/scratch/ncbi/public | ||
|
Review comment: I've encountered issues with parsing the output of vdb-config, so this is a more robust solution overall. |
||
| samtools_bin=samtools | ||
| bamtofastq_bin=bamtofastq | ||
| # location where BAM headers downloaded from SRA will be cached | ||
| bam_headers_cache_dir=bam_headers | ||
|
|
||
| [rnaseq_pipeline.gemma] | ||
| cli_bin=gemma-cli | ||
| # values for $JAVA_HOME and $JAVA_OPTS environment variables | ||
|
|
@@ -63,3 +80,6 @@ appdata_dir=/space/gemmaData | |
| human_reference_id=hg38_ncbi | ||
| mouse_reference_id=mm10_ncbi | ||
| rat_reference_id=rn7_ncbi | ||
| human_single_cell_reference_id=refdata-gex-GRCh38-2024-A | ||
| mouse_single_cell_reference_id=refdata-gex-GRCm39-2024-A | ||
| rat_single_cell_reference_id=refdata-gex-mRatBN7-2-2024-A | ||
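The new keys in this hunk can be read with the standard library. The following is a minimal sketch, not the pipeline's own loader (the pipeline reads its settings through luigi). The helper names `load_config`, `tool_binary`, and `single_cell_reference_id` are hypothetical, and placing the `*_single_cell_reference_id` keys in the `[rnaseq_pipeline.gemma]` section is an assumption suggested by the hunk context rather than shown in it.

```python
import configparser

# Fragment mirroring keys added in this diff. The section that holds the
# *_single_cell_reference_id keys is an assumption (see the lead-in).
EXAMPLE_CFG = """
[rnaseq_pipeline.wrapped_tools]
rsem_calculate_expression_bin=rsem-calculate-expression
cellranger_bin=cellranger

[rnaseq_pipeline.gemma]
human_single_cell_reference_id=refdata-gex-GRCh38-2024-A
mouse_single_cell_reference_id=refdata-gex-GRCm39-2024-A
rat_single_cell_reference_id=refdata-gex-mRatBN7-2-2024-A
"""


def load_config(text):
    """Parse an INI-style configuration string (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def tool_binary(cfg, name):
    """Return the configured binary for a tool, or the bare command name."""
    return cfg.get('rnaseq_pipeline.wrapped_tools', f'{name}_bin', fallback=name)


def single_cell_reference_id(cfg, taxon):
    """Look up the single-cell reference for a taxon such as 'human'."""
    return cfg.get('rnaseq_pipeline.gemma', f'{taxon}_single_cell_reference_id')


cfg = load_config(EXAMPLE_CFG)
print(tool_binary(cfg, 'cellranger'))
print(tool_binary(cfg, 'bamtofastq'))  # not in this fragment, so the fallback is used
print(single_cell_reference_id(cfg, 'mouse'))
```

Falling back to the bare command name mirrors the "Default tools, override as needed" comment in the configuration: an unset key means the tool is looked up on `PATH`.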
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| [build-system] | ||
| requires = ["setuptools"] | ||
| build-backend = "setuptools.build_meta" | ||
|
|
||
| [project] | ||
| name = "rnaseq-pipeline" | ||
| version = "2.1.12" | ||
| description = "RNA-Seq pipeline for the Pavlidis Lab" | ||
| authors = [ | ||
| {name = "Guillaume Poirier-Morency", email = "poirigui@msl.ubc.ca"} | ||
| ] | ||
| readme = "README.md" | ||
| license = "Unlicense" | ||
| license-files = ["LICENSE"] | ||
| requires-python = "==3.12.*" | ||
| dependencies = ['luigi', 'python-daemon<3.0.0', | ||
| 'bioluigi@git+https://github.com/PavlidisLab/bioluigi@master', | ||
| 'requests', 'pandas'] | ||
|
|
||
| [project.optional-dependencies] | ||
| gsheet = ['google-api-python-client', 'google-auth-httplib2', 'google-auth-oauthlib', 'pyxdg'] | ||
| webviewer = ['Flask', 'gunicorn'] | ||
|
|
||
| [dependency-groups] | ||
| dev = ["pytest", "mypy"] | ||
|
|
||
| [project.scripts] | ||
| rnaseq-pipeline-cli = "rnaseq_pipeline.cli:main" | ||
| rnaseq-pipeline-cellranger = "rnaseq_pipeline.wrapped_tools:cellranger_wrapper" | ||
| rnaseq-pipeline-rsem-calculate-expression = "rnaseq_pipeline.wrapped_tools:rsem_calculate_expression_wrapper" | ||
|
|
||
| [tool.setuptools] | ||
| packages = ["rnaseq_pipeline", "rnaseq_pipeline.sources", "rnaseq_pipeline.webviewer"] | ||
|
|
||
| [tool.mypy] | ||
| plugins = ["luigi.mypy"] |
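Each `[project.scripts]` entry has the form `module:function`. The sketch below shows roughly how such a reference is resolved and how the function's return value typically becomes the process exit status. It is only an illustration of the convention: `resolve_entry_point` and `exit_status` are hypothetical names, and the example resolves a standard-library function rather than the pipeline's own modules.

```python
import importlib
import os.path
import sys


def resolve_entry_point(spec):
    """Resolve a 'package.module:attribute' reference to the object it names."""
    module_name, _, attr_path = spec.partition(':')
    obj = importlib.import_module(module_name)
    for name in attr_path.split('.'):
        obj = getattr(obj, name)
    return obj


def exit_status(func):
    """Report the code that sys.exit(func()) would raise, as a launcher would."""
    try:
        sys.exit(func())
    except SystemExit as exc:
        return exc.code


# Standard-library stand-in for an entry such as "rnaseq_pipeline.cli:main".
resolved = resolve_entry_point('os.path:join')
print(resolved('pipeline-output', 'data'))

# A function returning 1 yields exit status 1; returning None means success.
print(exit_status(lambda: 1))
print(exit_status(lambda: None))
```

Under this convention, a `main()` that returns 1 for an unknown command makes the installed script exit with a non-zero status, while a command function that returns nothing exits with 0.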
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| [pytest] | ||
| testpaths=tests | ||
| log_cli=1 | ||
| log_cli_level=info |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,7 @@ | ||
| import luigi | ||
|
|
||
| luigi.auto_namespace(scope=__name__) | ||
|
|
||
| from rnaseq_pipeline.tasks import * | ||
| from rnaseq_pipeline.sources.sra import * | ||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| import argparse | ||
| import sys | ||
| import os | ||
| from contextlib import contextmanager | ||
|
|
||
| import luigi | ||
| import luigi.cmdline | ||
|
|
||
| from rnaseq_pipeline.tasks import SubmitExperimentToGemma, SubmitExperimentsFromGoogleSpreadsheetToGemma, \ | ||
| SubmitExperimentBatchInfoToGemma | ||
|
|
||
| @contextmanager | ||
| def umask(umask): | ||
| print(f'Setting umask to 0o{umask:03o}') | ||
| prev_umask = os.umask(umask) | ||
| try: | ||
| yield None | ||
| finally: | ||
| print(f'Restoring umask to 0o{prev_umask:03o}') | ||
| os.umask(prev_umask) | ||
|
|
||
| def parse_octal(s): | ||
| return int(s, 8) | ||
|
|
||
| def run_luigi_task(task, args): | ||
| with umask(args.umask): | ||
| luigi.build([task], workers=args.workers, detailed_summary=True, local_scheduler=args.local_scheduler) | ||
|
|
||
| def run(args): | ||
| with umask(0o002): | ||
| luigi.run(args) | ||
|
|
||
| def submit_experiment(argv): | ||
| parser = argparse.ArgumentParser() | ||
| parser.add_argument('--experiment-id', required=True, help='Experiment ID to submit to Gemma') | ||
| parser.add_argument('--rerun', action='store_true', default=False, help='Rerun the experiment') | ||
| parser.add_argument('--priority', type=int, default=100) | ||
| parser.add_argument('--umask', type=parse_octal, default='002', | ||
| help='Set a umask (defaults to 002 to make created files group-writable)') | ||
| parser.add_argument('--workers', type=int, default=30, help='Number of workers to use (defaults to 30)') | ||
| parser.add_argument('--local-scheduler', action='store_true', default=False) | ||
| args = parser.parse_args(argv) | ||
| run_luigi_task(SubmitExperimentToGemma(experiment_id=args.experiment_id, rerun=args.rerun, priority=args.priority), | ||
| args) | ||
|
|
||
| def submit_experiment_batch_info(argv): | ||
| parser = argparse.ArgumentParser() | ||
| parser.add_argument('--experiment-id', required=True, help='Experiment ID to submit to Gemma') | ||
| parser.add_argument('--ignored-samples', nargs='+', default=[]) | ||
| parser.add_argument('--rerun', action='store_true', default=False, help='Rerun the experiment') | ||
| parser.add_argument('--umask', type=parse_octal, default='002', | ||
| help='Set a umask (defaults to 002 to make created files group-writable)') | ||
| parser.add_argument('--workers', type=int, default=30, help='Number of workers to use (defaults to 30)') | ||
| parser.add_argument('--local-scheduler', action='store_true', default=False) | ||
| args = parser.parse_args(argv) | ||
| print(args.ignored_samples) | ||
| run_luigi_task( | ||
| SubmitExperimentBatchInfoToGemma(experiment_id=args.experiment_id, ignored_samples=args.ignored_samples, | ||
| rerun=args.rerun), args) | ||
|
|
||
| def submit_experiments_from_gsheet(argv): | ||
| parser = argparse.ArgumentParser() | ||
| parser.add_argument('--spreadsheet-id', required=True, help='Spreadsheet ID') | ||
| parser.add_argument('--sheet-name', required=True, help='Sheet name') | ||
| parser.add_argument('--umask', type=parse_octal, default='002', | ||
| help='Set a umask (defaults to 002 to make created files group-writable)') | ||
| parser.add_argument('--workers', type=int, default=200, help='Number of workers to use (defaults to 200)') | ||
| parser.add_argument('--ignore-priority', action='store_true', help='Ignore the priority column in the spreadsheet') | ||
| parser.add_argument('--local-scheduler', action='store_true', default=False) | ||
| args = parser.parse_args(argv) | ||
| run_luigi_task(SubmitExperimentsFromGoogleSpreadsheetToGemma(args.spreadsheet_id, args.sheet_name, | ||
| ignore_priority=args.ignore_priority), args) | ||
|
|
||
| def main(): | ||
| if len(sys.argv) < 2: | ||
| print('Usage: rnaseq-pipeline-cli <command>') | ||
| return 1 | ||
| command = sys.argv[1] | ||
| if command == 'run': | ||
| return run(sys.argv[2:]) | ||
| elif command == 'submit-experiment': | ||
| return submit_experiment(sys.argv[2:]) | ||
| elif command == 'submit-experiment-batch-info': | ||
| return submit_experiment_batch_info(sys.argv[2:]) | ||
| elif command == 'submit-experiments-from-gsheet': | ||
| return submit_experiments_from_gsheet(sys.argv[2:]) | ||
| else: | ||
| print( | ||
| f'Unknown command {command}. Possible values are: run, submit-experiment, submit-experiment-batch-info, submit-experiments-from-gsheet.') | ||
| return 1 |
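The `--umask` option combines two behaviours that are easy to miss: argparse passes a string default through `type`, so `default='002'` is parsed as octal, and the context manager restores the previous mask on exit. The following is a self-contained sketch of both. `current_umask` is a hypothetical helper added only for the demonstration, and the context manager here omits the logging done by the one in the diff.

```python
import argparse
import os
from contextlib import contextmanager


def parse_octal(s):
    """Interpret a string such as '002' as an octal number."""
    return int(s, 8)


@contextmanager
def umask(mask):
    """Temporarily set the process umask and restore the previous one on exit."""
    prev = os.umask(mask)
    try:
        yield prev
    finally:
        os.umask(prev)


def current_umask():
    """Read the umask; os.umask can only read by setting, so set it back."""
    cur = os.umask(0)
    os.umask(cur)
    return cur


parser = argparse.ArgumentParser()
parser.add_argument('--umask', type=parse_octal, default='002')

# String defaults go through `type`, so both results are integers.
default_args = parser.parse_args([])
custom_args = parser.parse_args(['--umask', '027'])

# Mode a new regular file would get when created with a requested mode of 0o666.
default_file_mode = 0o666 & ~default_args.umask
custom_file_mode = 0o666 & ~custom_args.umask

before = current_umask()
with umask(default_args.umask):
    inside = current_umask()
after = current_umask()

print(f'umask 0o{default_args.umask:03o} gives files mode 0o{default_file_mode:03o}')
print(f'umask 0o{custom_args.umask:03o} gives files mode 0o{custom_file_mode:03o}')
```

With the default mask of 002 the group write bit stays set, which matches the help text's claim that created files are group-writable.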
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,21 +1,28 @@ | ||
| from typing import Optional | ||
|
|
||
| import luigi | ||
|
|
||
| # see luigi.cfg for details | ||
| class rnaseq_pipeline(luigi.Config): | ||
| task_namespace = '' | ||
| class Config(luigi.Config): | ||
| @classmethod | ||
| def get_task_family(cls): | ||
| return 'rnaseq_pipeline' | ||
|
|
||
| OUTPUT_DIR: str = luigi.Parameter(default='pipeline-output') | ||
|
|
||
| GENOMES = luigi.Parameter() | ||
| GENOMES: str = luigi.Parameter(default='genomes') | ||
| REFERENCES: str = luigi.Parameter(default='references') | ||
| SINGLE_CELL_REFERENCES: str = luigi.Parameter(default='references-single-cell') | ||
| METADATA: str = luigi.Parameter(default='metadata') | ||
| DATA: str = luigi.Parameter(default='data') | ||
| DATAQCDIR: str = luigi.Parameter(default='data-qc') | ||
| ALIGNDIR: str = luigi.Parameter(default='aligned') | ||
| QUANTDIR: str = luigi.Parameter(default='quantified') | ||
| QUANT_SINGLE_CELL_DIR: str = luigi.Parameter(default='quantified-single-cell') | ||
| BATCHINFODIR: str = luigi.Parameter(default='batch-info') | ||
|
|
||
| OUTPUT_DIR = luigi.Parameter() | ||
| REFERENCES = luigi.Parameter() | ||
| METADATA = luigi.Parameter() | ||
| DATA = luigi.Parameter() | ||
| DATAQCDIR = luigi.Parameter() | ||
| ALIGNDIR = luigi.Parameter() | ||
| ALIGNQCDIR = luigi.Parameter() | ||
| QUANTDIR = luigi.Parameter() | ||
| BATCHINFODIR = luigi.Parameter() | ||
| RSEM_DIR: str = luigi.Parameter(default='contrib/RSEM') | ||
|
|
||
| RSEM_DIR = luigi.Parameter() | ||
| rsem_calculate_expression_bin: str = luigi.Parameter(default='contrib/RSEM/rsem-calculate-expression') | ||
|
|
||
| SLACK_WEBHOOK_URL = luigi.OptionalParameter(default=None) | ||
| SLACK_WEBHOOK_URL: Optional[str] = luigi.OptionalParameter(default=None) |
This should be included immediately in the trunk.