
Commit a4b02b9

Updated README
1 parent f4f0bde commit a4b02b9

1 file changed

+142
-5
lines changed

README.md

Lines changed: 142 additions & 5 deletions
@@ -29,10 +29,13 @@ User can check the following papers or links for understanding ChIP-seq QCs:
2929

3030
3) https://www.biostars.org/p/205576/
3131

32-
Required packages
33-
-------------------
32+
4) https://sites.google.com/site/anshulkundaje/projects/idr#TOC-Latest-pipeline (for IDR analysis)
3433

35-
ChIPLine requires the following packages / libraries to be installed in the system.
34+
Required packages for executing basic ChIP-seq pipeline
35+
-------------------------------------------------------
36+
37+
To run the basic ChIP-seq pipeline, the user should install the following
38+
packages / libraries on the system:
3639

3740
1) Bowtie2 (we have used version 2.3.3.1) http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
3841

@@ -57,8 +60,22 @@ for example, is provided in this link: http://hgdownload.soe.ucsc.edu/admin/exe/
5760
User should include the PATH of above mentioned libraries / packages inside their SYSTEM PATH variable.
5861
Some of these PATHS are also to be mentioned in a separate configuration file (mentioned below).
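
A quick way to confirm that the required executables are reachable through the PATH variable is sketched below. The helper name `missing_tools` is hypothetical and not part of this package; it only reports which of the given command names cannot be found.

```shell
# Hypothetical helper (not part of this package): print every command name
# given as an argument that cannot be found through the current PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

For example, `missing_tools bowtie2` prints nothing when the `bowtie2` executable is on the PATH, and prints `bowtie2` otherwise.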
5962

60-
Execution
61-
------------
63+
64+
Required packages for executing IDR code
65+
------------------------------------------
66+
67+
In addition, to execute the IDR code,
68+
the following packages / libraries must be installed on the system:
69+
70+
1) sambamba (we have used version 0.6.7) http://lomereiter.github.io/sambamba/
71+
72+
2) The package IDRCode (available in https://drive.google.com/file/d/0B_ssVVyXv8ZSX3luT0xhV3ZQNWc/view?usp=sharing). Unzip the archive and store it in a convenient location. The path of this
73+
unzipped archive is to be provided when executing the IDR code.
74+
75+
76+
77+
Execution of basic ChIP-seq pipeline
78+
------------------------------------
6279

6380
The current package includes a sample script file "pipeline_exec.sh". It contains sample commands required to
6481
invoke the main executable named "pipeline.sh", which is provided within the folder "bin".
@@ -331,6 +348,126 @@ Sample execution command:
331348

332349
Rscript ResSummary2.r /home/sourya/ChIPResults/ 0 1 0 2
333350

351+
This means that
352+
353+
OutBaseDir=/home/sourya/ChIPResults/
354+
355+
BAMRead=0
356+
357+
Tagmentation=1
358+
359+
OldMethod=0
360+
361+
ControlPeak=2
362+
363+
364+
365+
366+
Command for executing IDR codes
367+
---------------------------------
368+
369+
The current pipeline supports IDR analysis between either a list of ChIP-seq peak files
370+
or a list of alignment (BAM) files. In the second case, the BAM files are first
371+
analyzed and subsampled to contain an equal number of reads (the minimum number of reads
372+
contained in the inputs), and subsequently, peaks are estimated from these
373+
(subsampled) BAM files using MACS2. These peaks are then used for IDR analysis.
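
The arithmetic behind the subsampling step can be sketched as follows. This is only an illustration under stated assumptions: the function names `min_count` and `keep_fraction` are hypothetical, and the read counts are passed in as plain integers. In practice the counts would come from a tool such as sambamba, whose exact invocation inside the pipeline is not shown here.

```shell
# Hypothetical helpers (not part of this package).

# Print the smallest of the integer read counts given as arguments.
min_count() {
    min=$1
    for n in "$@"; do
        if [ "$n" -lt "$min" ]; then
            min=$n
        fi
    done
    printf '%s\n' "$min"
}

# Print the fraction of reads to keep so that a file holding $1 reads
# is reduced to roughly $2 reads.
keep_fraction() {
    awk -v total="$1" -v target="$2" 'BEGIN { printf "%.4f\n", target / total }'
}
```

With three replicates holding 300, 100 and 200 reads, `min_count 300 100 200` gives the common target, and `keep_fraction 200 100` gives the fraction to retain from the third file. The replicate that already holds the minimum is kept whole (fraction 1).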
374+
375+
The script "sample_IDR_Script.sh" included within this package
376+
shows how to call the following two scripts (both are included within the folder
377+
"IDR_Codes"):
378+
379+
1) IDRMain.sh
380+
381+
2) IDR_SubSampleBAM_Main.sh
382+
383+
The first script, IDRMain.sh, performs IDR between two or more
384+
input peak files (we have used peaks estimated from MACS2). The parameters
385+
corresponding to this script are as follows:
386+
387+
-I InpFile
388+
A list of input peak files (obtained from MACS2 - in .narrowPeak or .narrowPeak.gz format).
389+
At least two peak files are required.
390+
391+
-P PathIDRCode
392+
Path of the IDRCode package (Kundaje et al., after its installation).
393+
Please check the "Required packages" section for the details.
394+
395+
-d OutDir
396+
Output directory (absolute path preferred) which will store the IDR results.
397+
398+
-n PREFIX
399+
Prefix of output files. Default 'IDR_ChIP'.
400+
401+
A sample execution of this script is as follows:
402+
403+
./IDRMain.sh -I peak1.narrowPeak -I peak2.narrowPeak -I peak3.narrowPeak -P /home/sourya/packages/idrCode/ -d /home/sourya/OutDir_IDR -n 'IDR_test'
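
The command above repeats the -I option once per input file. One common way for a shell script to accept such repeated options is sketched below with `getopts`. This is not the actual code of IDRMain.sh; the function name `parse_idr_args` and the variable names are assumptions made for illustration, and only the option letters and the default prefix are taken from the parameter list above.

```shell
# Hypothetical sketch (not the actual IDRMain.sh code): collect repeated -I
# options and apply the documented default prefix.
parse_idr_args() {
    INPUTS=""            # space-separated list; paths containing spaces are not handled
    INPUT_COUNT=0
    IDR_PATH=""
    OUT_DIR=""
    PREFIX="IDR_ChIP"    # default documented for -n
    OPTIND=1             # restart option parsing on every call
    while getopts "I:P:d:n:" opt; do
        case "$opt" in
            I) INPUTS="${INPUTS:+$INPUTS }$OPTARG"
               INPUT_COUNT=$((INPUT_COUNT + 1)) ;;
            P) IDR_PATH=$OPTARG ;;
            d) OUT_DIR=$OPTARG ;;
            n) PREFIX=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    # At least two input files are required for IDR analysis.
    if [ "$INPUT_COUNT" -lt 2 ]; then
        echo "at least two -I inputs are required" >&2
        return 1
    fi
}
```

Calling `parse_idr_args "$@"` at the top of a script would leave the input list in `INPUTS` and fail early when fewer than two files are given.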
404+
405+
406+
407+
The second script, IDR_SubSampleBAM_Main.sh, takes two or more BAM files as input,
408+
estimates peaks from these BAM files, and then performs IDR analysis. The parameters
409+
corresponding to this script are as follows:
410+
411+
-I InpFile
412+
A list of input BAM files. At least two BAM files are required.
413+
414+
-P PathIDRCode
415+
Path of the IDRCode package (Kundaje et al., after its installation).
416+
Please check the "Required packages" section for the details.
417+
418+
-d OutDir
419+
Output directory (absolute path preferred) which will store the IDR results.
420+
421+
-n PREFIX
422+
Prefix of output files. Default 'IDR_ChIP'.
423+
424+
-c CountPeak
425+
Number of peaks in both replicates that will be compared for IDR analysis.
426+
Default 25000.
427+
428+
-T Tagmentation
429+
Binary variable. If 1, the input is ChIPmentation data,
430+
where the TAG Align files are created by
431+
shifting the strands a bit. Default 0.
432+
Tag align files are used for estimating peaks using MACS2.
433+
434+
-C CONTROLBAM
435+
Control file (either a .BAM file or a tagalign file in .gz format)
436+
used to estimate the peaks from MACS2. User may leave this field
437+
blank if no control file is available.
438+
439+
A sample execution of this script is as follows:
440+
441+
./IDR_SubSampleBAM_Main.sh -I inpfile1.bam -I inpfile2.bam -P /home/sourya/packages/idrCode/ -d /home/sourya/OutDir_IDR -n 'IDR_test' -c 25000 -T 1 -C control.bam
442+
443+
444+
Describing output of IDR analysis
445+
----------------------------------
446+
447+
The output directory "OutDir" specified in the IDR script contains the following
448+
files (f) and folders (F):
449+
450+
F1: Folders named $i$_and_$j$ where 0 <= i < j <= N-1, and N is
451+
the number of replicates analyzed. Individual folders contain results for
452+
pairwise IDR analysis. For example, folder 0_and_1 contains the IDR analysis
453+
for sample 0 (first replicate) and sample 1 (second replicate).
454+
455+
f1 : "Replicate_Names.txt" : names of the replicate samples used for IDR analysis.
456+
457+
f2: Input_Peak_Statistics.txt: number of peaks and the peak containing replicates.
458+
459+
f3: IDR_Batch_Plot-plot.pdf: final IDR plot. Here individual pairs (whose results
460+
are stored in the above mentioned folders) are numbered 1, 2, ...
461+
Considering N = 3, the number of possible pairs is also 3. Here,
462+
the number 1 denotes the folder (pair) 0_and_1,
463+
2 denotes the folder (pair) 0_and_2, and 3 denotes the
464+
folder (pair) 1_and_2.
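
The numbering in the example above can be reproduced with a small loop. The sketch below assumes that pairs are always numbered in the same order as in the N = 3 example (first by the lower replicate index, then by the higher one); the ordering for larger N is an assumption, and the function name `list_pairs` is hypothetical.

```shell
# Hypothetical helper (not part of this package): print "<plot number> <folder>"
# for every replicate pair i < j among N replicates indexed from 0.
list_pairs() {
    n=$1
    k=1
    i=0
    while [ "$i" -lt "$n" ]; do
        j=$((i + 1))
        while [ "$j" -lt "$n" ]; do
            printf '%s %s_and_%s\n' "$k" "$i" "$j"
            k=$((k + 1))
            j=$((j + 1))
        done
        i=$((i + 1))
    done
}
```

Running `list_pairs 3` prints the three lines `1 0_and_1`, `2 0_and_2` and `3 1_and_2`, matching the mapping described above.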
465+
466+
467+
468+
469+
470+
334471
Contact
335472
-----------
336473
