nf-core/chipseq is a bioinformatics analysis pipeline for Chromatin ImmunoPrecipitation sequencing (ChIP-seq) data.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- Alignment (BWA)
- Mark duplicates (picard)
- Merge alignments from multiple libraries of the same sample (picard)
- Re-mark duplicates (picard)
- Filtering to remove:
  - reads mapping to blacklisted regions (SAMtools, BEDTools)
  - reads that are marked as duplicates (SAMtools)
  - reads that aren't marked as primary alignments (SAMtools)
  - reads that are unmapped (SAMtools)
  - reads that map to multiple locations (SAMtools)
  - reads containing > 4 mismatches (BAMTools)
  - reads that have an insert size > 2 kb (BAMTools; paired-end only)
  - reads that map to different chromosomes (Pysam; paired-end only)
  - reads that aren't in FR orientation (Pysam; paired-end only)
  - reads where only one read of the pair fails the above criteria (Pysam; paired-end only)
- Alignment-level QC and estimation of library complexity (picard, Preseq)
- Create normalised bigWig files scaled to 1 million mapped reads (BEDTools, bedGraphToBigWig)
- Generate gene-body meta-profile from bigWig files (deepTools)
- Calculate genome-wide IP enrichment relative to control (deepTools)
- Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC (phantompeakqualtools)
- Call broad/narrow peaks (MACS2)
- Annotate peaks relative to gene features (HOMER)
- Create consensus peakset across all samples and create tabular file to aid in the filtering of the data (BEDTools)
- Count reads in consensus peaks (featureCounts)
- Differential binding analysis, PCA and clustering (R, DESeq2)
- Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation (IGV).
- Present QC for raw read, alignment, peak-calling and differential binding results (MultiQC, R)
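The read-filtering criteria in the summary above can be illustrated with a small, self-contained Python sketch. It deliberately does not use pysam or BAMTools: a hypothetical `Read` dataclass stands in for an aligned record, and SAM FLAG bits (as defined in the SAM specification) drive the decisions. The 4-mismatch and 2 kb thresholds come from the summary; treating MAPQ 0 as "maps to multiple locations" and using the `NM` tag as the mismatch count are assumptions, and blacklist filtering is omitted. This is not the pipeline's own implementation.

```python
from dataclasses import dataclass
from typing import Optional

# SAM FLAG bits, as defined in the SAM specification
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_MATE_REVERSE = 0x20
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800

MAX_MISMATCHES = 4      # from the summary: drop reads with > 4 mismatches
MAX_INSERT_SIZE = 2000  # from the summary: drop pairs with insert size > 2 kb
MIN_MAPQ = 1            # ASSUMPTION: MAPQ 0 treated as "maps to multiple locations"


@dataclass
class Read:
    """Hypothetical stand-in for one aligned record (not a pysam object)."""
    flag: int
    mapq: int
    nm: int                        # NM tag (edit distance), ASSUMED as mismatch count
    ref: str                       # chromosome of this read
    pos: int                       # 0-based leftmost position
    mate_ref: Optional[str] = None
    mate_pos: Optional[int] = None
    tlen: int = 0                  # signed template length (SAM TLEN)


def passes_read_filters(read: Read) -> bool:
    """Criteria that apply to every read (single- or paired-end)."""
    # unmapped, non-primary (secondary/supplementary) or duplicate records
    if read.flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY | FLAG_DUPLICATE):
        return False
    if read.mapq < MIN_MAPQ:
        return False
    return read.nm <= MAX_MISMATCHES


def is_fr(read: Read) -> bool:
    """True if the pair is forward/reverse with the forward mate leftmost."""
    rev = bool(read.flag & FLAG_REVERSE)
    mate_rev = bool(read.flag & FLAG_MATE_REVERSE)
    if rev == mate_rev or read.mate_pos is None:
        return False
    return read.mate_pos <= read.pos if rev else read.pos <= read.mate_pos


def passes_pair_filters(read: Read) -> bool:
    """Paired-end-only criteria: insert size, same chromosome, FR orientation."""
    if abs(read.tlen) > MAX_INSERT_SIZE:
        return False
    if read.mate_ref != read.ref:
        return False
    return is_fr(read)


def keep_pair(r1: Read, r2: Read) -> bool:
    """Keep a pair only if both mates pass; if one fails, both are dropped."""
    return all(passes_read_filters(r) and passes_pair_filters(r) for r in (r1, r2))
```

`keep_pair` drops both mates when either one fails, mirroring the last paired-end criterion in the list.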
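Scaling bigWig tracks to 1 million mapped reads amounts to multiplying every coverage value by 1,000,000 divided by the number of mapped reads. A minimal sketch, assuming coverage is already available as bedGraph-style `(chrom, start, end, value)` tuples; `scale_factor` and `scale_bedgraph` are illustrative names, and exactly how the pipeline counts mapped reads (for example reads versus fragments in paired-end data) is not covered here.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int, float]  # (chrom, start, end, coverage)


def scale_factor(mapped_reads: int, target: int = 1_000_000) -> float:
    """Factor that rescales raw coverage to 'per `target` mapped reads'."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return target / mapped_reads


def scale_bedgraph(intervals: Iterable[Interval], mapped_reads: int) -> List[Interval]:
    """Multiply each bedGraph coverage value by the scale factor."""
    sf = scale_factor(mapped_reads)
    return [(chrom, start, end, cov * sf) for chrom, start, end, cov in intervals]
```

For example, a sample with 20 million mapped reads gets a scale factor of 0.05, so a raw depth of 40 becomes 2.0, which makes tracks from differently sequenced libraries directly comparable.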
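The NSC and RSC measures reported by phantompeakqualtools are derived from the strand cross-correlation profile: NSC is the cross-correlation at the fragment-length peak divided by the minimum (background) cross-correlation, and RSC is the fragment-length peak minus the minimum, divided by the phantom (read-length) peak minus the minimum. The sketch below computes both from a precomputed profile. Choosing the fragment peak as the highest point outside a hypothetical `exclude` window around the read length is a simplification, not phantompeakqualtools' exact peak-finding procedure.

```python
from typing import Sequence, Tuple


def cc_quality(
    shifts: Sequence[int],
    cc: Sequence[float],
    read_length: int,
    exclude: int = 10,
) -> Tuple[int, float, float]:
    """Return (fragment_length, NSC, RSC) from a strand cross-correlation profile.

    `exclude` is a hypothetical window (in bp) around the read length that is
    skipped when searching for the fragment-length peak.
    """
    if len(shifts) != len(cc):
        raise ValueError("shifts and cc must have the same length")
    profile = dict(zip(shifts, cc))
    if read_length not in profile:
        raise ValueError("profile must contain the read-length (phantom) shift")

    cc_min = min(cc)                    # background cross-correlation
    cc_phantom = profile[read_length]   # phantom peak at the read length
    candidates = [s for s in shifts if abs(s - read_length) > exclude]
    fragment = max(candidates, key=lambda s: profile[s])
    cc_fragment = profile[fragment]

    nsc = cc_fragment / cc_min
    rsc = (cc_fragment - cc_min) / (cc_phantom - cc_min)
    return fragment, nsc, rsc
```

An RSC above 1 means the fragment-length peak rises further above background than the phantom peak does, which is what a well-enriched ChIP is expected to show.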
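The consensus-peakset step can be pictured as merging overlapping peaks from all samples into a single set of intervals while recording which samples support each one. A pure-Python sketch of that idea, assuming BED-style half-open `(chrom, start, end)` tuples per sample and merging book-ended intervals as `bedtools merge` does by default; the pipeline itself uses BEDTools, and its exact tabular output is not reproduced here.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # BED-style (chrom, start, end), half-open


def consensus_peaks(peaks_by_sample: Dict[str, List[Peak]]) -> List[Tuple[str, int, int, List[str]]]:
    """Merge overlapping or book-ended peaks across samples.

    Returns (chrom, start, end, supporting_samples) for each merged interval.
    """
    # pool every peak with its sample label, sorted by chrom then start
    records = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged = []
    for chrom, start, end, sample in records:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current merged interval
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [(c, s, e, sorted(samples)) for c, s, e, samples in merged]
```

The per-interval sample list is the kind of information that makes downstream filtering easy, for example keeping only peaks supported by at least two samples (an illustrative threshold, not a pipeline default).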
- Install `nextflow`
- Install any of Docker, Singularity or Podman for full pipeline reproducibility (please only use Conda as a last resort; see docs)
- Download the pipeline and test it on a minimal dataset with a single command:
  `nextflow run nf-core/chipseq -profile test,<docker/singularity/podman/conda/institute>`
  Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
  `nextflow run nf-core/chipseq -profile <docker/singularity/podman/conda/institute> --input design.csv --genome GRCh37`
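The `--input design.csv` argument points to a comma-separated design file. A hedged sketch of writing one with Python's `csv` module, assuming the column layout `group,replicate,fastq_1,fastq_2,antibody,control` described in the v1.x usage docs (verify this against the usage docs for the release you run). `write_design` and `check_controls` are hypothetical helpers, not part of the pipeline; the latter lists any `control` value that does not match an existing `group`.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# ASSUMPTION: v1.x design-file columns; confirm against the usage docs.
DESIGN_COLUMNS = ["group", "replicate", "fastq_1", "fastq_2", "antibody", "control"]


def write_design(rows: Iterable[Dict[str, object]], handle: TextIO) -> None:
    """Write design rows as CSV; missing columns are left empty."""
    writer = csv.DictWriter(handle, fieldnames=DESIGN_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in DESIGN_COLUMNS})


def check_controls(handle: TextIO) -> List[str]:
    """Hypothetical pre-flight check: return control names that match no group."""
    rows = list(csv.DictReader(handle))
    groups = {row["group"] for row in rows}
    return sorted({row["control"] for row in rows if row["control"] and row["control"] not in groups})


if __name__ == "__main__":
    # Demo: an IP sample paired with its input control (file names are made up)
    buf = io.StringIO()
    write_design(
        [
            {"group": "WT_IP", "replicate": 1, "fastq_1": "wt_ip_R1.fastq.gz",
             "antibody": "BCATENIN", "control": "WT_INPUT"},
            {"group": "WT_INPUT", "replicate": 1, "fastq_1": "wt_input_R1.fastq.gz"},
        ],
        buf,
    )
    print(buf.getvalue())
```

To write a real file, open it with `newline=""` as the `csv` module recommends, e.g. `with open("design.csv", "w", newline="") as fh: write_design(rows, fh)`.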
See the usage docs for all of the available options when running the pipeline.
The nf-core/chipseq pipeline comes with documentation covering its usage and output.
These scripts were originally written by Chuan Wang (@chuan-wang) and Phil Ewels (@ewels) for use at the National Genomics Infrastructure at SciLifeLab in Stockholm, Sweden. The pipeline has since been re-implemented by Harshil Patel (@drpatelh) from The Bioinformatics & Biostatistics Group at The Francis Crick Institute, London.
Many thanks to others who have helped out and contributed along the way too, including (but not limited to): @apeltzer, @bc2zb, @crickbabs, @drejom, @houghtos, @KevinMenden, @mashehu, @pditommaso, @Rotholandus, @sofiahaglund, @tiagochst and @winni2k.
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #chipseq channel (you can join with this invite).
If you use nf-core/chipseq for your analysis, please cite it using the following DOI: 10.5281/zenodo.3240506
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
