Add support for Cell Ranger #101
base: master
a9daf79 to 89ec252
    run: |
      conda env update --file environment.yml --name base
    activate-environment: rnaseq-pipeline
    environment-file: environment.yml
This should be included immediately in the trunk.
9e01957 to f250d94
Parse SRA metadata from its XML format so that we can infer the role that each file plays in fastq-dump output.
Add typing and fix many bugs.
Retrieve the SRA public dir from a configuration.
Improve layout detection from SRA metadata.
Detect bcl2fastq standard filenames and also commonly used names. Add a fallback that checks for the presence of I1/I2/R1/R2, but warns since this is very unreliable.
Track issues encountered in runs using an enumerated flag.
Make resolution of test resources relative.
Allow some of the parameters for filtering cells to be overwritten if needed.
Use CellRangerCount task from bioluigi.
Fix unpacking of singleton for single-run experiments.
Remove cell_ranger_bin from config, it's declared in bioluigi.
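For readers following along, the filename-based detection with its unreliable fallback could be sketched roughly as below. This is only an illustration, not the code in this PR: `RunIssue`, `detect_read_type` and both regular expressions are hypothetical names, and the only thing taken as given is the default bcl2fastq naming scheme (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`).

```python
import enum
import re
import warnings
from typing import Optional, Tuple


class RunIssue(enum.Flag):
    """Hypothetical flag for problems noticed while inspecting a run."""
    NONE = 0
    UNRELIABLE_READ_TYPE = 1
    UNKNOWN_READ_TYPE = 2


# Default bcl2fastq naming: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq(.gz)
BCL2FASTQ_RE = re.compile(r"^.+_S\d+_L\d{3}_(?P<read>[RI][12])_001\.fastq(?:\.gz)?$")
# Fallback (assumption): any R1/R2/I1/I2 token delimited by '_' or '.'
FALLBACK_RE = re.compile(r"(?:^|[_.])(?P<read>[RI][12])(?:[_.]|$)")


def detect_read_type(filename: str) -> Tuple[Optional[str], RunIssue]:
    """Return the read type found in a FASTQ filename and any issue raised."""
    match = BCL2FASTQ_RE.match(filename)
    if match:
        return match.group("read"), RunIssue.NONE
    match = FALLBACK_RE.search(filename)
    if match:
        # The fallback only looks for a bare token, so flag it and warn.
        warnings.warn(f"Read type of {filename} was guessed from a bare token.")
        return match.group("read"), RunIssue.UNRELIABLE_READ_TYPE
    return None, RunIssue.UNKNOWN_READ_TYPE
```

Because the issues are flag members, the ones collected over the files of a run can be combined with `|` and checked with `in`.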
43c04ed to a987102
    Add more metadata.
a987102 to 1d0932d
    Rename fastq_file_types to read_types and add an enumerated type for possible values.
Detect which pipeline branch to take by looking up the assay type of a dataset.
Add a special case for FACS-sorted single-cell datasets that should be treated as bulk.
Add support for 10x BAM SRA submissions. This is done by reading the header of the BAM files to infer the sequencing layout and calling bamtofastq downstream on the original submission.
Temporarily use the branch of bioluigi with improved sratools support and Cell Ranger.
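As a rough illustration of inferring the layout from a BAM header, the sketch below scans header text (for example, the output of `samtools view -H`) for comment lines describing how each read is rebuilt. It assumes, without confirmation from this PR, that 10x BAM files carry `@CO` lines of the form `10x_bam_to_fastq:R1(CR:CY,UR:UY)`; the function name and return shape are hypothetical.

```python
import re
from typing import Dict, List

# Assumed comment format: "@CO<TAB>10x_bam_to_fastq:<read>(<tag pairs>)"
_COMMENT_RE = re.compile(
    r"^@CO\t10x_bam_to_fastq:(?P<read>[RI]\d)\((?P<tags>[^)]*)\)$"
)


def infer_reads_from_header(header_text: str) -> Dict[str, List[str]]:
    """Map each read named in the header to the tag pairs it is rebuilt from."""
    reads: Dict[str, List[str]] = {}
    for line in header_text.splitlines():
        match = _COMMENT_RE.match(line.rstrip())
        if match:
            reads[match.group("read")] = match.group("tags").split(",")
    return reads


if __name__ == "__main__":
    example = (
        "@HD\tVN:1.4\tSO:coordinate\n"
        "@CO\t10x_bam_to_fastq:R1(CR:CY,UR:UY)\n"
        "@CO\t10x_bam_to_fastq:R2(SEQ:QUAL)\n"
    )
    print(infer_reads_from_header(example))
```

A header with both R1 and R2 entries would suggest a paired layout; an empty result means the file does not look like a 10x BAM under this assumption.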
    [rnaseq_pipeline.sources.sra]
    # location where tools like prefetch and fastq-dump will store downloaded SRA files
    # you can get this value with vdb-config -p
    ncbi_public_dir=/cosmos/scratch/ncbi/public
I've encountered issues with parsing the output of vdb-config, so this is a more robust solution overall.
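To make the trade-off concrete, reading the value from an INI-style section takes only the standard library and involves no parsing of tool output. The sketch below uses `configparser` for illustration; the pipeline itself may load its configuration through a different mechanism, and `get_ncbi_public_dir` is a hypothetical helper.

```python
import configparser
from typing import Optional

EXAMPLE_CONFIG = """
[rnaseq_pipeline.sources.sra]
# location where tools like prefetch and fastq-dump will store downloaded SRA files
ncbi_public_dir=/cosmos/scratch/ncbi/public
"""


def get_ncbi_public_dir(config_text: str, default: Optional[str] = None) -> Optional[str]:
    """Return ncbi_public_dir from the SRA section, or `default` if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("rnaseq_pipeline.sources.sra", "ncbi_public_dir", fallback=default)


if __name__ == "__main__":
    print(get_ncbi_public_dir(EXAMPLE_CONFIG))
```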
    is_single_end: bool = False, is_paired: bool = False):
        """Detects the layout of the sequencing run files based on their names and various additional information.
        :param run_id: Identifier for the run
Mention here that a run is akin to a lane.
Add a minimal configuration for tests and provide a mock for gemma-cli.
…for submitting data
TODO
- --output-dir option and change the directory for the execution
- bamtofastq for BAM-file SRA submissions