Practice: submit the ChIP_Streamline.pbs


1 Practice: submit the ChIP_Streamline.pbs
1. Replace xxxx@ufl.edu with your email address.
2. Make sure the .fastq files are in your GMS6014 directory.
3. Upload the edited file to your GMS6014 directory on HPC.
4. Log in to HPC, cd to GMS6014/, and submit the job with: "$> qsub ChIP_Streamline.pbs"

2 Practice: Annotate the identified binding sites
1. Inspect and modify the annotate.sh file in your text editor.
2. Upload it to HPC: /scratch/lfs/xxx/GMS6014.
3. Run “$> bash annotate.sh”

3 Recording sequence reads from the machine – FASTQ
FASTA:
    >My_sequence
    AATTACGCGCGATACGAT
FASTQ:
    @My_sequence
    AATTACGCGCGATACGAT
    +My_sequence quality
    efcfffffcfeeYBBsdf
Recording the quality assessment allows filtering based on sequence quality.
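The four-line FASTQ layout shown on this slide can be read with a few lines of Python. This is a minimal sketch, not a replacement for an established parser: the function name `read_fastq` is hypothetical, and it assumes every record occupies exactly four lines with no blank lines in between.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input (assumes no blank lines between records)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# The record from the slide, fed in as an in-memory file.
example = StringIO(
    "@My_sequence\n"
    "AATTACGCGCGATACGAT\n"
    "+My_sequence\n"
    "efcfffffcfeeYBBsdf\n"
)
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, len(seq), len(qual))
```

The length check reflects the one-quality-character-per-base rule of the format; a real file would be opened with `open(path)` instead of `StringIO`.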

4 Recording sequence and quality information
FASTQ format = FASTA + Quality
    @HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
    TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNNTAGTTTCTT
    +HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQabdefghadfda
Two identification lines (@, +) for each sequence.
The identification line format depends on the specific sequencing platform.
The quality line uses characters that represent integer values.
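How quality characters map to integers can be sketched as follows. The sketch assumes the Phred+33 encoding (ASCII code minus 33), which is common but not universal: some older Illumina pipelines used an offset of 64, so the offset must be checked for each data set. The function names and the threshold of 20 are illustrative choices, not part of the course material.

```python
def decode_quality(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores.

    offset=33 is an assumption (Phred+33); use 64 for older Illumina data.
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_filter(quality_line, min_q=20, offset=33):
    """Keep a read only if every base reaches the minimum quality."""
    return min(decode_quality(quality_line, offset)) >= min_q


print(decode_quality('!"#'))
print(error_probability(20))
print(passes_filter("IIII"), passes_filter("II!I"))
```

Filtering on the minimum score is only one possible rule; filtering on the mean score or trimming low-quality ends are equally common alternatives.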

5 Paint the sequence reads to the genome
HTS reads:
    @reads_1
    AATTACGCGCGATACGAT
    +
    efcfffffcfeeYBBsdf
    @reads_2
    ACCGAGGCGCGTATGTCT
    +
    efcfffffcfeeYBBsea
    ….
    @reads_1,000,001
Corresponding location on the genome: ELAND (Illumina), Bowtie, etc.
ChIP-Seq; RNA-Seq; de novo assembly of genomes, chromatin conformation, genomic abnormality, etc.

6 HTS data – map to genome
• “bwa” and “bowtie” are the two most popular programs; both implement a similar strategy (Burrows-Wheeler Transform).
• Both can benefit from multiple processors.
Map the reads to hg19:
    bowtie2 -x hg19 -U SRR1186251.fastq -p 2 -S Input.sam
    bowtie2 -x hg19 -U SRR1186252.fastq -p 2 -S P53ChIP.sam
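To give a feel for the Burrows-Wheeler Transform mentioned on this slide, here is a toy version that sorts all rotations of a string and reads off the last column. It is a teaching sketch only: real aligners build compressed indexes over whole genomes and do not sort rotations this way. The function names and the "$" terminator are assumptions made for the example.

```python
def bwt(text, terminator="$"):
    """Return the Burrows-Wheeler Transform of text (naive rotation sort)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, terminator="$"):
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    for row in table:
        if row.endswith(terminator):
            return row[:-1]
    raise ValueError("terminator not found in input")


read = "AATTACGCGCGATACGAT"  # the example read from the FASTQ slide
transformed = bwt(read)
print(transformed)
print(inverse_bwt(transformed) == read)
```

The transform is reversible and tends to group identical characters together, which is what makes compact, searchable genome indexes possible.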

7 ChIP-Seq – identifying TF binding sites
• MACS – Model-based Analysis of ChIP-Seq
Practice: Identifying peaks
    macs2 callpeak -t P53ChIP.sam -c Input.sam -f SAM -g hs -n P53_GM00011 -B

8 Representation of (HTS) data – BED (Browser Extensible Data) file
    Chrom.  Start     End       Name  Score  Strand
    chr2    10000192  10000217  U0    0      +
    chr2    10000227  10000252  U1    0      -
    chr2    10000310  10000335  U2    0      +
    chr3    10000496  10000521  U1    0      -
    chr2    10000556  10000581  U2    0      +
With the completion of the genome, there is no need to record the base pair identity (if it is the same as the reference genome).
Detailed description of genomic data formats: http://genome.ucsc.edu/FAQ/FAQformat.html
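A six-column BED line like those on this slide can be turned into a typed record as sketched below. The class and function names are hypothetical. The sketch relies on the BED convention that the start coordinate is 0-based and the end coordinate is exclusive, so the feature length is simply end minus start.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: int
    strand: str


def parse_bed_line(line):
    """Parse one whitespace-separated, six-column BED line."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected 6 BED columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields[:6]
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


record = parse_bed_line("chr2\t10000192\t10000217\tU0\t0\t+")
read_length = record.end - record.start
print(record)
print(read_length)
```

Because only coordinates are stored, the read sequence itself would be recovered from the reference genome when needed.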

9 How to gain knowledge from HTS data
• Visualization of HTS data.
• Discovering genomic patterns.
• Identifying novel mechanisms – hypothesis generation.

10 Visualization of HTS data.
Simple visualization – distribution of tags (or normalized values). Barski et al. (2007) Cell
BedGraph file (Wig):
    Chr.  ChrStart  ChrEnd  Value
    chr4  0         200     0
    chr4  200       400     2
    chr4  400       600     13
    chr4  600       800     35
    chr4  800       1000    27
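A tag distribution of this kind can be produced by counting read start positions in fixed-width windows. The sketch below assumes 200-bp bins (matching the intervals on the slide) and uses made-up tag positions; the function name `tags_to_bedgraph` is hypothetical, and no normalization is applied.

```python
from collections import Counter


def tags_to_bedgraph(chrom, tag_starts, bin_size=200):
    """Count tags per fixed-size bin and format the counts as bedGraph lines."""
    counts = Counter(start // bin_size for start in tag_starts)
    if not counts:
        return []
    lines = []
    for b in range(max(counts) + 1):
        # Bins without any tag are reported with a value of 0.
        lines.append(f"{chrom}\t{b * bin_size}\t{(b + 1) * bin_size}\t{counts[b]}")
    return lines


# Hypothetical tag start positions on chr4 (not real data).
toy_tags = [210, 350, 410]
bedgraph = tags_to_bedgraph("chr4", toy_tags)
print("\n".join(bedgraph))
```

The resulting lines can be saved to a file and loaded into a genome browser as a custom track, usually after adding a track definition line.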

11 Visualizing Deep Seq data with UCSC genome browser
Practice & Observe I:
1. Load the track file into the browser as a custom track, either by copying and pasting the URL link or by uploading the file.
2. View the ‘dense’ and then the ‘full’ presentation of the track.

12 Visualization of HTS data.
Advanced visualization – depending on the purpose of the comparison. Berger et al. (2011) Nature
Example: a Circos plot depicts genomic location and chromosomal copy number (red, copy gain; blue, copy loss), along with inter-chromosomal translocations (purple) and intra-chromosomal rearrangements (green) observed in primary prostate cancers.

13 Identifying histone modification pattern and TF binding sites
Peak-calling programs:
• MACS (Model-based Analysis of ChIP-Seq): http://liulab.dfci.harvard.edu/MACS/
• ChIP-seq processing pipeline: http://compbio.med.harvard.edu/Supplements/ChIP-seq/

14 HTS data validation
Brain teaser: For TF binding sites identified by two biological ChIP-Seq replicates, what level of overlap is considered “acceptable”?
A. 99%
B. 90%
C. 50%
(Figure labels: Replicate_1, Replicate_2)
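Before judging whether an overlap is acceptable, the overlap has to be measured. One simple way, sketched here with hypothetical peak lists, is to compute the fraction of peaks in one replicate that overlap at least one peak in the other. Peaks are assumed to be (chrom, start, end) tuples with BED-style half-open coordinates, and the pairwise comparison is only practical for small lists.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap any peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)


# Made-up peaks for illustration only.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
replicate_2 = [("chr1", 150, 250), ("chr2", 300, 400)]

print(fraction_overlapping(replicate_1, replicate_2))
print(fraction_overlapping(replicate_2, replicate_1))
```

The measure is not symmetric, so it is worth reporting it in both directions, as the two calls above do.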

15 Discovering genomic patterns Usually requires some programming (scripting). As a biologist, you need to clearly define your question, and the logic to obtain the data summary. Barski et al. (2007) Cell

16 Discovering genomic patterns
Q: Is H3K4me3 associated with the TSS? Is such an association related to gene expression status?
Logic:
1. Group genes based on expression levels obtained with a microarray study (Su et al., 2004).
2. For each gene, obtain the normalized H3K4me3 ChIP-Seq counts within [-2k, +2k] of the TSS.
3. For each expression group, plot the average value along the [-2k, +2k] interval.
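The three steps above can be sketched in a few lines once the inputs exist. The sketch assumes two hypothetical dictionaries: one mapping each gene to its expression group, and one mapping each gene to its normalized counts in equal-width bins across the [-2k, +2k] window. The gene names and numbers are invented, and the plotting step is left out.

```python
from collections import defaultdict


def average_profiles(expression_group, tss_counts):
    """Average the per-bin counts around the TSS within each expression group."""
    sums = {}
    n_genes = defaultdict(int)
    for gene, profile in tss_counts.items():
        group = expression_group.get(gene)
        if group is None:
            continue  # gene has no expression measurement
        if group not in sums:
            sums[group] = [0.0] * len(profile)
        elif len(sums[group]) != len(profile):
            raise ValueError(f"profile length mismatch for {gene}")
        for i, value in enumerate(profile):
            sums[group][i] += value
        n_genes[group] += 1
    return {g: [total / n_genes[g] for total in sums[g]] for g in sums}


# Invented toy data: 5 bins spanning the window around each TSS.
groups = {"geneA": "high", "geneB": "high", "geneC": "low"}
counts = {
    "geneA": [1, 5, 9, 5, 1],
    "geneB": [3, 7, 11, 7, 3],
    "geneC": [0, 1, 2, 1, 0],
}
profiles = average_profiles(groups, counts)
print(profiles)
```

Each list in the result is one curve for the final plot, with the bin index on the x-axis standing in for position relative to the TSS.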

17 HTS data interpretation
Brain teaser: Since Gene A is expressed at a higher level than Gene B, there must be more PolII binding at Gene A’s TSS.
A. True
B. False

18 Functional Analysis of HTS data
• Gene Ontology – http://www.geneontology.org/
• Regulatory pathways.
• Modeling & Systems Biology.

19 Gene Ontology – hierarchical framework of terms / concepts

20 Gene Ontology
Goal – “produce a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing” – GO consortium
Ontology: “The branch of metaphysics that deals with the nature of being” – The American Heritage Dictionary

21 Implications of Gene Ontology (I)
Monitoring biological processes or molecular functions beyond individual genes.
Example: 1) Which biological process (or molecular function) is activated/suppressed following a treatment?

22 Gene Expression Profile Differences between the two lung cancer cell lines A549 and H23

23 Implications of Gene Ontology (II) Basis for cross genome comparison and integrating knowledge from different model systems.

24 Tools associated with GO
A comprehensive list is available at the GO web site.
Tools for browsing: AmiGO, QuickGO at EBI, etc.
Tools for analyzing array data, such as FuncAssociate, etc.

25 Using GO to gain comprehensive understanding of cellular differences Practice: Load a probe set list to FuncAssociate to identify over-represented GO
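Over-representation of a GO term in a gene list is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below illustrates that general idea only and is not a description of how FuncAssociate computes its statistics. The function name and all numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, annotated, drawn).

    population: number of genes on the array
    annotated:  genes in the population carrying the GO term
    drawn:      size of the submitted gene list
    observed:   genes in the list carrying the GO term
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Invented example: 40 of 10,000 genes carry the term; 8 of them
# appear in a list of 200 genes.
p_value = hypergeom_pvalue(10_000, 40, 200, 8)
print(p_value)
```

A small p-value suggests the term appears in the list more often than expected by chance; because many terms are tested at once, a correction for multiple testing is needed before drawing conclusions.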

