

Practice: Submit ChIP_Streamline.pbs
1. Replace with your
2. Make sure the .fastq files are in your GMS6014 directory.
3. Upload the edited file to your GMS6014 directory on the HPC.
4. Log in to the HPC, cd to GMS6014/, and submit the job with: "$> qsub ChIP_Streamline.pbs"

Practice: Annotate the identified binding sites
1. Inspect and modify the annotate.sh file in your text editor.
2. Upload it to HPC /scratch/lfs/xxx/GMS
3. Run “$> bash annotate.sh”

Recording sequence reads from the machine – FASTQ
FASTA:
>My_sequence
AATTACGCGCGATACGAT
FASTQ:
@My_sequence
AATTACGCGCGATACGAT
+My_sequence
efcfffffcfeeYBBsdf (quality)
Recording the quality assessment allows filtering based on sequence quality.

Recording sequence and quality information
FASTQ format = FASTA + quality
@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
TTAATTGGTAAATAAATCTCCTAATAGCTTAGATNTTACCTTNNNNNNNNNNTAGTTTCTT
+HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1
!"#$%&'()*+,-./ :;
Two identification lines (@, +) for each sequence. The identification line format depends on the specific sequencing platform. The quality line uses characters that represent integer values.
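As a rough illustration of how the quality characters map to integers, the sketch below reads four-line FASTQ records and decodes each quality character by subtracting an ASCII offset. The offset of 33 (Phred+33) is an assumption; older Illumina pipelines used 64. The names `FastqRecord`, `parse_fastq`, and `mean_quality` are hypothetical helpers, not part of the course scripts.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: List[int]  # one decoded integer score per base


def parse_fastq(lines: List[str], offset: int = 33) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry.

    offset is the ASCII offset of the quality encoding (assumed 33 here).
    """
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at entry line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


def mean_quality(record: FastqRecord) -> float:
    """Average decoded quality of a read, usable as a simple filter criterion."""
    return sum(record.quality) / len(record.quality)


example = ["@read1\n", "ACGT\n", "+\n", "!!II\n"]
records = list(parse_fastq(example))
```

With Phred+33, `!` decodes to 0 and `I` to 40, so a filter such as `mean_quality(r) >= 20` keeps or discards whole reads based on their average score.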

Paint the sequence reads to the genome
HTS reads (e.g. AATTACGCGCGATACGAT, ACCGAGGCGCGTATGTCT, with quality strings such as efcfffffcfeeYBBsea) are assigned to their corresponding location on the genome by ELAND (Illumina), Bowtie, etc.
Uses: ChIP-Seq; RNA-Seq; de novo assembly of genomes, chromatin conformation, genomic abnormality, etc.

HTS data – map to genome
• “bwa” and “bowtie” are the two most popular programs; both implement a similar strategy (Burrows-Wheeler Transform).
• Mapping can benefit from multiple processors (the -p option of bowtie2 sets the number of threads).
Map the reads to hg19:
bowtie2 -x hg19 -U SRR fastq -p 2 -S Input.sam
bowtie2 -x hg19 -U SRR fastq -p 2 -S P53ChIP.sam
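After mapping, a quick sanity check is to count how many alignment records in the SAM output are mapped. The sketch below relies only on the SAM convention that header lines start with `@` and that bit 0x4 of the FLAG field (column 2) marks an unmapped read. The function name `count_mapped` and the example lines are made up for illustration.

```python
from typing import Iterable, Tuple


def count_mapped(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, unmapped) counts of alignment records in SAM text."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # column 2 is the bitwise FLAG
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr2\t200\t30\t4M\t*\t0\t0\tACGT\tIIII",
]
mapped, unmapped = count_mapped(example_sam)
```

For a real file, the same function can be called on an open file handle, e.g. `count_mapped(open("Input.sam"))`.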

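ChIP-Seq – identifying TF binding sites
• MACS: Model-based Analysis of ChIP-Seq
Practice: Identifying peaks
macs2 callpeak -t P53ChIP.sam -c Input.sam -f SAM -g hs -n P53_GM B
The -g option sets the effective genome size; hs is the human preset, matching reads mapped to hg19 (dm is the Drosophila preset).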

Representation of (HTS) data – BED (Browser Extensible Data) file
Columns: Chrom, Start, End, Name, Score, Strand
With a completed reference genome, there is no need to record the base pair identity of a read (if it is the same as the reference genome). Detailed description of genomic data formats:
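A minimal sketch of reading one six-column BED line into named fields is given below. BED coordinates are 0-based with an exclusive end, so the interval length is simply end minus start. `BedRecord`, `parse_bed_line`, and the example line are hypothetical and only illustrate the column layout listed above.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    name: str
    score: float
    strand: str


def parse_bed_line(line: str) -> BedRecord:
    """Parse the first six tab-separated columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("expected at least 6 tab-separated columns")
    chrom, start, end, name, score, strand = fields[:6]
    return BedRecord(chrom, int(start), int(end), name, float(score), strand)


# Made-up example row, not taken from the slides.
record = parse_bed_line("chr1\t100\t125\tU0\t0\t+\n")
length = record.end - record.start
```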

How to gain knowledge from HTS data
• Visualization of HTS data.
• Discovering genomic patterns.
• Identifying novel mechanisms – hypothesis generation.

Visualization of HTS data
Simple visualization: distribution of tags (or normalized values). Barski et al. (2007) Cell
BedGraph file (Wig) columns: Chr, ChrStart, ChrEnd, Value
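One simple way to turn mapped tags into such a track is to count tags in fixed-width windows and write one bedGraph line per non-empty window. The sketch below assumes tags are given as (chromosome, position) pairs and uses a 200 bp window. The window size and the function name `tags_to_bedgraph` are illustrative choices, not values from the slides.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tags_to_bedgraph(tags: Iterable[Tuple[str, int]], bin_size: int = 200) -> List[str]:
    """Count tags per fixed-width bin and format them as bedGraph lines.

    Each output line is: chrom <tab> start <tab> end <tab> tag count.
    The last bin on a chromosome may extend past the chromosome end;
    a real pipeline would clip it to the chromosome length.
    """
    counts = Counter((chrom, pos // bin_size) for chrom, pos in tags)
    lines = []
    for (chrom, bin_index), n in sorted(counts.items()):
        start = bin_index * bin_size
        lines.append(f"{chrom}\t{start}\t{start + bin_size}\t{n}")
    return lines


example_tags = [("chr1", 10), ("chr1", 150), ("chr1", 250), ("chr2", 5)]
bedgraph_lines = tags_to_bedgraph(example_tags)
```

Dividing each count by the total number of mapped tags (for example, per million) would give the normalized values mentioned on the slide.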

Visualizing Deep Seq data with the UCSC Genome Browser
Practice & Observe I:
1. Load the track file as a custom track in the browser by copying and pasting the URL link or by uploading the file.
2. View the ‘dense’ and then the ‘full’ presentation of the track.

Visualization of HTS data
Advanced visualization: depends on the purpose of the comparison.
Example: a Circos plot depicting genomic location and chromosomal copy number (red, copy gain; blue, copy loss), together with interchromosomal translocations (purple) and intrachromosomal rearrangements (green) observed in primary prostate cancers. Berger et al. (2011) Nature

Identifying histone modification patterns and TF binding sites
Peak-calling programs:
• MACS (Model-based Analysis of ChIP-Seq):
• ChIP-seq processing pipeline:

HTS data validation
Brain teaser: For TF binding sites identified by two biological ChIP-Seq replicates, which level of overlap is considered “acceptable”?
A. 99%
B. 90%
C. 50%
(Figure: Replicate_1 vs. Replicate_2)
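To put a number on the overlap between two replicates, one option is to compute the fraction of peaks in one replicate that overlap at least one peak in the other. The sketch below assumes peaks are (chromosome, start, end) tuples with BED-style half-open coordinates. The function names and example peaks are hypothetical, and the all-against-all comparison is only suitable for small peak lists.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), end exclusive


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a: List[Peak], peaks_b: List[Peak]) -> float:
    """Fraction of peaks in peaks_a that overlap any peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))
    return hits / len(peaks_a)


# Made-up peaks for illustration only.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr2", 900, 1000)]
replicate_2 = [("chr1", 150, 250), ("chr2", 50, 120), ("chr1", 600, 700)]
frac_1_in_2 = fraction_overlapping(replicate_1, replicate_2)
frac_2_in_1 = fraction_overlapping(replicate_2, replicate_1)
```

The two fractions generally differ because the replicates contain different numbers of peaks, so it is worth reporting both directions.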

Discovering genomic patterns
Usually requires some programming (scripting). As a biologist, you need to clearly define your question and the logic for obtaining the data summary. Barski et al. (2007) Cell

Discovering genomic patterns
Q: Is H3K4me3 associated with the TSS? Is such an association related to gene expression status?
Logic:
1. Group genes based on expression levels obtained from a microarray study (Su et al., 2004).
2. For each gene, obtain the normalized H3K4me3 ChIP-Seq counts within [-2 kb, +2 kb] of the TSS.
3. For each expression group, plot the average value along the [-2 kb, +2 kb] interval.
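The three-step logic above can be sketched as follows. This is only one possible implementation under stated assumptions: the signal is stored as a dictionary keyed by (chromosome, bin index) with bin index = position // bin_size, genes are (chromosome, TSS, strand, expression group) tuples, and minus-strand genes are flipped so that negative offsets are always upstream. The name `average_profiles` and all data values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Gene = Tuple[str, int, str, str]  # (chrom, tss, strand, expression group)


def average_profiles(
    genes: List[Gene],
    signal: Dict[Tuple[str, int], float],
    flank: int = 2000,
    bin_size: int = 200,
) -> Dict[str, List[float]]:
    """Average binned signal around the TSS for each expression group.

    Each profile runs from -flank to +flank relative to the TSS, in steps
    of bin_size. The value for a window is looked up at the window midpoint,
    which is an approximation when the TSS is not aligned to a bin boundary.
    """
    midpoints = [o + bin_size // 2 for o in range(-flank, flank, bin_size)]
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(midpoints))
    counts: Dict[str, int] = defaultdict(int)
    for chrom, tss, strand, group in genes:
        counts[group] += 1
        for i, offset in enumerate(midpoints):
            # Flip minus-strand genes so upstream is always negative.
            pos = tss + offset if strand == "+" else tss - offset
            sums[group][i] += signal.get((chrom, pos // bin_size), 0.0)
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Made-up data: two genes in one group, small window for readability.
genes = [("chr1", 1000, "+", "high"), ("chr1", 5000, "-", "high")]
signal = {("chr1", 10): 4.0, ("chr1", 49): 2.0}
profiles = average_profiles(genes, signal, flank=200, bin_size=100)
```

Each list in `profiles` can then be plotted against the offsets to compare expression groups, as in step 3.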

HTS data interpretation
Brain teaser: Since Gene A is expressed at a higher level than Gene B, there must be more Pol II binding at Gene A’s TSS.
A. True
B. False

Functional Analysis of HTS data
• Gene Ontology –
• Regulatory pathways.
• Modeling & Systems Biology.

Gene Ontology – hierarchical framework of terms / concepts

Gene Ontology
Goal: “produce a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing” – GO consortium
Ontology: “The branch of metaphysics that deals with the nature of being” – The American Heritage Dictionary

Implications of Gene Ontology (I)
Monitoring biological processes or molecular functions beyond individual genes.
Example: 1) Which biological process (or molecular function) is activated/suppressed following a treatment?

Gene Expression Profile Differences between the two lung cancer cell lines A549 and H23

Implications of Gene Ontology (II)
Basis for cross-genome comparison and for integrating knowledge from different model systems.

Tools associated with GO
A comprehensive list is available at the GO web site.
Tools for browsing: AmiGO, QuickGO at EBI, etc.
Tools for analyzing array data: FuncAssociate, etc.

Using GO to gain comprehensive understanding of cellular differences Practice: Load a probe set list to FuncAssociate to identify over-represented GO
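Over-representation of a GO category in a gene list is commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation in general terms and is not a description of how FuncAssociate is implemented. The function name `hypergeom_pvalue` and the example numbers are made up.

```python
from math import comb


def hypergeom_pvalue(population: int, annotated: int, drawn: int, observed: int) -> float:
    """Probability of seeing at least `observed` annotated genes by chance.

    population: all genes on the array
    annotated:  genes in the population carrying the GO annotation
    drawn:      genes in the submitted list
    observed:   genes in the list that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Toy example: 20 genes on the array, 5 annotated, list of 4 genes, 3 annotated.
p_value = hypergeom_pvalue(population=20, annotated=5, drawn=4, observed=3)
```

Because many GO categories are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.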