[Bejerano Fall10/11] 1.

1 http://cs273a.stanford.edu [Bejerano Fall10/11] 1

2 Lecture 13: Cis-Regulation cont’d; GREAT

3 Gene Regulation: gene (how to), control region (when & where), DNA, DNA binding proteins; RNA genes and protein-coding genes.

4 Pol II Transcription. Key components: proteins, DNA sequence, DNA epigenetics. Protein components: general transcription factors, activators, co-activators.

5 Enhancers

6 Vertebrate Gene Regulation: gene (how to), control region (when & where), DNA binding proteins. Proximal control regions lie within 10^3 letters of the gene; distal ones within 10^6 letters.

7 Gene Expression Domains: Independent

8 Distal Transcription Regulatory Elements

9 Repressors / Silencers

10 What are Enhancers? What do enhancers encode? Surely a cluster of TF binding sites [but TFBS prediction is hard, fraught with false positives]. What else? DNA structure-related properties? So how do we recognize enhancers? Sequence conservation across multiple species [weak but generic]. Verifying repressors is trickier [loss vs. gain of function]. How do you tell an enhancer from a repressor? Duh...
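
To see why TFBS prediction is fraught with false positives, here is a minimal sketch of a position weight matrix (PWM) scan, one common way to predict binding sites. The 6-bp motif, its base probabilities, and the score threshold are hypothetical, chosen only for illustration, and only the forward strand is scanned. Even 100 kb of uniformly random DNA yields hundreds of predicted sites, none of which is a real binding site.

```python
import math
import random

BASES = "ACGT"

# Hypothetical 6-bp motif (consensus "ACGTCA"): 0.7 for the consensus base, 0.1 otherwise.
CONSENSUS = "ACGTCA"
MOTIF = [{b: (0.7 if b == c else 0.1) for b in BASES} for c in CONSENSUS]
BACKGROUND = {b: 0.25 for b in BASES}  # uniform background base composition


def log_odds_matrix(motif, background):
    """Convert per-position base probabilities into log2 odds vs. background."""
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in motif]


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring >= threshold (forward strand only)."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


pwm = log_odds_matrix(MOTIF, BACKGROUND)
random.seed(0)
random_dna = "".join(random.choice(BASES) for _ in range(100_000))
hits = scan(random_dna, pwm, threshold=6.0)
print(f"{len(hits)} predicted sites in 100 kb of random sequence")
```

With this threshold the consensus and every single-mismatch variant pass, so about 19 of every 4,096 random 6-mers score as sites; real genomes are far longer than 100 kb.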

11 Insulators

12 Cis-Regulatory Components. Low level (“atoms”): promoter motifs (TATA box, etc.), transcription factor binding sites (TFBS). Mid level: promoters, enhancers, repressors/silencers, insulators/boundary elements, cis-regulatory modules (CRM), locus control regions (LCR). High level: epigenetic domains / signatures, gene expression domains, gene regulatory networks (GRN).

13 Disease Implications: Genes. [Figure: genome → gene → protein → limb malformation.] Over 300 genes are already implicated in limb malformations.

14 Disease Implications: Cis-Reg. [Figure: genome → gene → NO protein made → limb malformation.] A growing number of cases (limb, deafness, etc.).

15 Transcription Regulation & Human Disease [Wang et al., 2000]

16 Critical regulatory sequences [Lettice et al., HMG 2003, 12:1725-35]: single base changes; knock out.

17 Other Positional Effects [de Kok et al., 1996]

18 Genome-wide association studies point to non-coding DNA

19 WGA Disease

20 9p21 Cis Effects. Follow-up study:

21 Cis-Regulatory Evolution, e.g., Mobile Elements. [Yass is a small town in New South Wales, Australia.] What settings make these “co-option” events happen?

22 Britten & Davidson Hypothesis: Repeat to Rewire! [Britten & Davidson, 1971] [Davidson & Erwin, 2006]

23 Modular: Most Likely to Evolve? [Figure: Chimp vs. Human.]

24 Human Accelerated Regions: human-specific substitutions in conserved sequences (Human vs. Chimp) [Pollard, K. et al., Nature, 2006] [Prabhakar, S. et al., Science, 2008] [Beniaminov, A. et al., RNA, 2008]

25 GREAT (http://GREAT.stanford.edu): Generating Functional Hypotheses from Genome-Wide Measurements of Mammalian Cis-Regulation. Gill Bejerano, Dept. of Developmental Biology & Dept. of Computer Science, Stanford University. http://bejerano.stanford.edu

26 Human Gene Regulation. All these cells have the same genome. 20,000 genes encode how to make proteins. 1,000,000 genomic “switches” determine which proteins to make, and how much. There are 10^13 cells in an adult human, of hundreds of different cell types.

27 Most Non-Coding Elements likely work in cis… [Figure: 9 Mb region.] “IRX1 is a member of the Iroquois homeobox gene family. Members of this family appear to play multiple roles during pattern formation of vertebrate embryos.” Gene deserts are regulatory jungles. Every orange tick mark is roughly 100–1,000 bp long, evolves under purifying selection, and does not code for protein.

28 Many non-coding elements tested are cis-regulatory

29 Combinatorial Regulatory Code. 2,000 different proteins can bind specific DNA sequences. A regulatory region encodes 3–10 such protein binding sites. When all are bound by proteins, the regulatory region turns “on”, and the nearby gene is activated to produce protein.
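
The "all sites bound → on" rule above behaves like an AND gate. The toy model below writes that rule down literally; it is a deliberate simplification (real regulatory regions can respond to partial occupancy, repressors, and site affinity), and the region and factor names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulatoryRegion:
    name: str
    required_tfs: frozenset  # factors whose binding sites the region encodes (3-10 per the slide)

    def is_on(self, bound_tfs):
        """AND gate: on only if every encoded site is bound; extra bound factors are ignored."""
        return self.required_tfs <= set(bound_tfs)


region = RegulatoryRegion("hypothetical_enhancer", frozenset({"TF_A", "TF_B", "TF_C"}))
print(region.is_on({"TF_A", "TF_B"}))          # False: TF_C site unbound
print(region.is_on({"TF_A", "TF_B", "TF_C"}))  # True: all sites bound
```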

30 ChIP-Seq: first glimpses of the regulatory genome in action. [Figure: cis-regulatory peaks; peak calling.]
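
Peak calling turns aligned ChIP-seq reads into candidate binding regions. The sketch below is a deliberately simple, hypothetical caller, not the method used in the studies cited here: it counts read start positions in fixed windows and keeps windows whose count is improbably high under a uniform Poisson background. The window size and significance cutoff are arbitrary assumptions; real peak callers also model fragment shifts, strands, local background, and control (input) libraries.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the lower tail term by term."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def call_peaks(read_starts, genome_length, window=200, alpha=1e-5):
    """Return (start, end, read_count, p_value) for windows enriched over a uniform background."""
    counts = Counter(pos // window for pos in read_starts)
    n_windows = math.ceil(genome_length / window)
    lam = len(read_starts) / n_windows  # expected reads per window if reads were uniform
    peaks = []
    for w, c in sorted(counts.items()):
        p = poisson_sf(c, lam)
        if p < alpha:
            peaks.append((w * window, (w + 1) * window, c, p))
    return peaks


# Usage: sparse background reads plus a dense pile-up near position 50,000.
background = list(range(0, 100_000, 500))
pile_up = [50_000 + i for i in range(60)]
peaks = call_peaks(background + pile_up, genome_length=100_000)
print([(s, e, c) for s, e, c, _ in peaks])  # [(50000, 50200, 61)]
```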

31 What is the transcription factor I just assayed doing? [Figure: cis-regulatory peaks near gene transcription start sites.] Collect known literature of the form: Function A: Gene1, Gene2, Gene3, ...; Function B: Gene1, Gene2, Gene3, ...; Function C: .... Ask whether the binding sites you discovered preferentially bind (regulate) the genes of one or more of the functions listed above. Form a hypothesis and perform further experiments.

32 Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile. ChIP-seq identified 2,429 SRF binding peaks in human Jurkat cells [1]. SRF is known as a “master regulator of the actin cytoskeleton”, so among the ChIP-seq peaks we expect to find binding sites regulating genes involved in actin cytoskeleton formation. [1] Valouev A. et al., Nat. Methods, 2008

33 Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile. Existing, gene-based method to analyze enrichment: ignore distal binding events, count affected genes, and rank by hypergeometric enrichment p-value. Toy example for an ontology term (e.g. ‘actin cytoskeleton’): N = 8 genes in the genome, K = 3 genes annotated with the term, n = 2 genes selected by proximal peaks, k = 1 selected gene annotated with the term; P = Pr(k ≥ 1 | n = 2, K = 3, N = 8).
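
The hypergeometric p-value in this toy example can be checked directly. Of the C(8,2) = 28 ways to pick 2 genes, C(3,1)·C(5,1) + C(3,2)·C(5,0) = 15 + 3 = 18 include at least one annotated gene, so P = 18/28 ≈ 0.64. A minimal sketch using only the standard library (the function name is illustrative):

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): X = annotated genes among n drawn without replacement from N genes, K annotated."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


p = hypergeom_sf(1, N=8, K=3, n=2)
print(round(p, 4))  # 0.6429
```

Only genes enter this test, so a peak 500 kb from any TSS contributes nothing unless it is first converted into a gene.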

34 We have (reduced ChIP-Seq into) a gene list! What is the gene list enriched for? [Figure: microarray tool; microarray data vs. deep sequencing data.] Pro: there are many tools for the analysis of gene lists. Con: these tools are built for microarray analysis. Does it matter??

35 SRF gene-based enrichment results. The original authors can only state that “basic cellular processes, particularly those related to gene expression” are enriched [1]. In other words: SRF acts on genes, both in the nucleus and the cytoplasm, that are involved in transcription and various types of binding. Where’s the signal? The top “actin” term is ranked #28 in the list. [1] Valouev A. et al., Nat. Methods, 2008

36 Associating only proximal peaks loses a lot of information. [Figure: relationship of binding peaks to nearest genes for eight human (H) and mouse (M) ChIP-seq datasets.] Restricting to proximal peaks often leads to complete loss of key enrichments.
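
A peak-to-gene relationship like the one in this figure can be tabulated by measuring each peak's distance to the nearest transcription start site (TSS). A minimal sketch, assuming a single chromosome, peak midpoints as positions, and distance bins chosen for illustration (the bins used in the figure are not given here):

```python
import bisect
from collections import Counter

BIN_EDGES = (5_000, 50_000, 500_000)  # illustrative bin boundaries in bp
BIN_LABELS = ("<5 kb", "5-50 kb", "50-500 kb", ">500 kb")


def distance_to_nearest_tss(pos, tss_sorted):
    """Absolute distance from pos to the closest TSS in a sorted, non-empty list."""
    i = bisect.bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])
    return min(candidates)


def distance_histogram(peak_positions, tss_positions):
    """Count peaks per distance bin; bisect_right puts a distance equal to an edge in the upper bin."""
    tss_sorted = sorted(tss_positions)
    counts = Counter(
        BIN_LABELS[bisect.bisect_right(BIN_EDGES, distance_to_nearest_tss(p, tss_sorted))]
        for p in peak_positions
    )
    return {label: counts.get(label, 0) for label in BIN_LABELS}


hist = distance_histogram([10_500, 30_000, 150_000, 900_000], [10_000, 200_000])
print(hist)  # {'<5 kb': 1, '5-50 kb': 1, '50-500 kb': 1, '>500 kb': 1}
```

A proximal-only analysis keeps just the first bin; everything to its right is what gets discarded.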

37 Bad Solution: associating distal peaks brings in many false enrichments. The SRF ChIP-seq set has 2,000+ binding events. Throw a random set of 2,000 regions at the genome. What do you get from a gene list analysis?
  Term: Bonferroni-corrected p-value
  nervous system development: 5x10^-9
  system development: 8x10^-9
  anatomical structure development: 7x10^-8
  multicellular organismal development: 1x10^-7
  developmental process: 2x10^-6
Why bad? Regulatory jungles are often next to key developmental genes. 14% of human genes are tagged ‘multicellular organismal development’, but 33% of base pairs have such a gene as their nearest upstream or downstream gene.
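
The gap between 14% of genes and 33% of base pairs can be reproduced in a toy genome. In this hypothetical example, one of seven genes carries the term but sits in a 900 kb regulatory jungle, while the other six own 300 kb each. Uniformly random regions land next to the annotated gene about a third of the time, although only a seventh of the genes carry the term, which is exactly the bias a gene-count null model misses.

```python
import bisect
import itertools
import random

# Hypothetical toy genome laid out end to end: (gene, regulatory domain length in bp, has_term).
genes = [("devGene", 900_000, True)] + [(f"gene{i}", 300_000, False) for i in range(1, 7)]

gene_fraction = sum(has for _, _, has in genes) / len(genes)
genome_bp = sum(length for _, length, _ in genes)
bp_fraction = sum(length for _, length, has in genes if has) / genome_bp

# Throw uniformly random single-base "regions" at the genome; each is assigned to
# the gene whose domain contains it (domains are contiguous and non-overlapping here).
domain_ends = list(itertools.accumulate(length for _, length, _ in genes))
random.seed(1)
n_regions = 2_000
hits = sum(
    genes[bisect.bisect_right(domain_ends, random.randrange(genome_bp))][2]
    for _ in range(n_regions)
)

print(f"genes with term: {gene_fraction:.0%}, base pairs near term: {bp_fraction:.0%}")
print(f"random regions hitting term: {hits / n_regions:.0%}")
```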

38 Real Solution: do not convert to a gene list; analyze the set of genomic regions. Toy example for an ontology term (‘actin cytoskeleton’): a fraction p = 0.33 of the genome lies in the regulatory domains of genes annotated with the term; n = 6 genomic regions (ChIP-seq peaks); k = 5 genomic regions hit the annotation. P = Pr_binom(k ≥ 5 | n = 6, p = 0.33). Since 33% of base pairs are near a ‘multicellular organismal development’ gene, we now expect 33% of genomic regions to hit this term by chance. => Toss 2,000 random regions at the genome, get NO (false) enrichments. GREAT = Genomic Regions Enrichment of Annotations Tool.
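
Worked through for the toy numbers (n = 6 regions, k = 5 hits, p = 0.33), the binomial tail is 6·0.33^5·0.67 + 0.33^6 ≈ 0.017. A minimal sketch with exact summation, which is fine for small n:

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); exact summation, suitable for small n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy example: p = 0.33 of the genome annotated, n = 6 regions, k = 5 hit the annotation.
p_value = binom_sf(5, n=6, p=0.33)
print(f"{p_value:.4f}")  # 0.0170
```

Because p is a fraction of base pairs rather than of genes, a random region set is expected to hit a term in proportion to its genomic footprint, so large regulatory jungles no longer inflate the result.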

39 How does GREAT know how to assign distal binding peaks to genes? Future: high-throughput assays based on chromosome conformation capture (3C) methods will elucidate complex regulation mechanisms. Currently: flexible computational definitions allow assignment of peaks to the nearest gene, the nearest two genes, etc. Default: each gene has a “basal regulatory domain” of 5 kb upstream and 1 kb downstream of its transcription start site, extended up to the basal domains of the nearest genes, within 1 Mb. Though some associations may be missed or incorrect, associating distal peaks generally improves signal richness and robustness greatly.
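
The default association rule can be sketched as follows. This is an illustration of the rule as stated on the slide, not GREAT's code, and it rests on several assumptions: a single chromosome, neighbours taken in TSS order, the 1 Mb limit measured from the TSS, domains clamped to the chromosome ends, and no special handling of overlapping basal domains. The names `Gene`, `regulatory_domains`, and `genes_for_region` are hypothetical.

```python
from dataclasses import dataclass

UPSTREAM = 5_000  # basal domain: 5 kb upstream of the TSS
DOWNSTREAM = 1_000  # ... and 1 kb downstream
MAX_EXTENSION = 1_000_000  # extension limit, measured from the TSS (assumption)


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int
    strand: str  # "+" or "-"


def basal_domain(gene):
    """Strand-aware basal domain as a half-open [start, end) interval."""
    if gene.strand == "+":
        return gene.tss - UPSTREAM, gene.tss + DOWNSTREAM
    return gene.tss - DOWNSTREAM, gene.tss + UPSTREAM


def regulatory_domains(genes, chrom_len):
    """Extend each basal domain to its TSS-ordered neighbours' basal domains, capped at MAX_EXTENSION."""
    genes = sorted(genes, key=lambda g: g.tss)
    basal = [basal_domain(g) for g in genes]
    domains = {}
    for i, gene in enumerate(genes):
        start, end = basal[i]
        left_limit = basal[i - 1][1] if i > 0 else 0
        right_limit = basal[i + 1][0] if i + 1 < len(genes) else chrom_len
        start = min(start, max(left_limit, gene.tss - MAX_EXTENSION))
        end = max(end, min(right_limit, gene.tss + MAX_EXTENSION))
        domains[gene.name] = (max(0, start), min(chrom_len, end))
    return domains


def genes_for_region(pos, domains):
    """All genes whose regulatory domain contains pos (a peak may map to several genes)."""
    return [name for name, (start, end) in domains.items() if start <= pos < end]


domains = regulatory_domains(
    [Gene("A", 100_000, "+"), Gene("B", 400_000, "-"), Gene("C", 3_000_000, "+")],
    chrom_len=5_000_000,
)
print(domains)
print(genes_for_region(250_000, domains))  # ['A', 'B']: between two genes, both claim the peak
```

Gene C's domain stops 1 Mb from its TSS on each side, so a peak deep inside the gap between B and C is assigned to no gene at all.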

40 GREAT infers many specific functions of SRF from its binding profile. Top GREAT enrichments of SRF (ontology: term, # genes, binomial p-value; experimental support):
  Gene Ontology: actin cytoskeleton (30 genes, p = 7x10^-9); actin binding (31 genes, p = 5x10^-5). Support: Miano et al. 2007*
  Pathway Commons: TRAIL signaling (32 genes, p = 5x10^-7); Class I PI3K signaling (26 genes, p = 2x10^-6). Support: Bertolotto et al. 2000; Poser et al. 2000
  TreeFam: FOS gene family (5 genes, p = 1x10^-8). Support: Chai & Tarnawski 2002
  TF Targets: targets of SRF (84 genes, p = 5x10^-76); targets of GABP (28 genes, p = 4x10^-9); targets of YY1 (44 genes, p = 1x10^-6); targets of EGR1 (23 genes, p = 2x10^-4). Support: positive control; ChIP-Seq support; Natesan & Gilman 1995
* Known from literature: the function is known and SOME of the genes are known, but the binding sites highlighted are NOT.
By contrast, among the top gene-based enrichments of SRF, the top actin-related term is 28th in the list. Similar results for GABP, NRSF, Stat3, p300 ChIP-Seq [McLean et al., Nat Biotechnol., 2010].

41 GREAT data integrated (Michael Hiller). Twenty ontologies spanning broad categories of biology; 44,832 total ontology terms tested in each GREAT run. [Figure: per-ontology term counts, ranging from 9 to 8,272 terms.]
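
Testing tens of thousands of terms in every run makes multiple-testing correction essential; the false-enrichment slide reports Bonferroni-corrected p-values, the simplest such correction. A minimal sketch, assuming every one of the 44,832 terms counts as one test (how GREAT itself counts tests is not stated here):

```python
N_TERMS_TESTED = 44_832  # terms tested per GREAT run, from the slide


def bonferroni(p_values, n_tests=N_TERMS_TESTED):
    """Bonferroni correction: multiply each raw p-value by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


raw = [1e-9, 1e-6, 1e-3]
print(bonferroni(raw))
# Largest raw p-value that stays below 0.05 after correction (about 1.1e-6):
print(0.05 / N_TERMS_TESTED)
```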

42 GREAT implementation (Cory McLean). Can handle datasets of hundreds of thousands of genomic regions. Testing a single ontology term takes ~1 ms, enabling real-time calculation of enrichment results for all ontologies.
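
The slide gives the timing, not the implementation. One plausible way to make a single term test cheap, assumed here rather than taken from GREAT's code, is to sort the input regions once, merge each term's regulatory domains into a union of intervals, then take the covered fraction p from the interval lengths and the hit count k from two binary searches per interval. The sketch assumes one chromosome and treats each region as a single position; the binomial tail is computed in log space so large n does not overflow.

```python
import bisect
from math import exp, lgamma, log, log1p


def merge_intervals(intervals):
    """Union of half-open [start, end) intervals, returned sorted and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_hits(sorted_positions, merged):
    """Number of positions inside any merged interval, via binary search."""
    return sum(
        bisect.bisect_left(sorted_positions, end) - bisect.bisect_left(sorted_positions, start)
        for start, end in merged
    )


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    logs = [
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log(p) + (n - i) * log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(logs)
    return min(1.0, exp(top) * sum(exp(x - top) for x in logs))


def term_enrichment(term_domains, sorted_positions, genome_length):
    """Binomial test of one term: k regions inside its domains, p = genome fraction covered."""
    merged = merge_intervals(term_domains)
    p = sum(end - start for start, end in merged) / genome_length
    k = count_hits(sorted_positions, merged)
    return k, p, binom_sf(k, len(sorted_positions), p)


positions = sorted([10, 150, 199, 200, 550, 900])
k, p, pval = term_enrichment([(0, 100), (50, 200), (500, 600)], positions, genome_length=1_000)
print(k, p, round(pval, 5))  # 4 0.3 0.07047
```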

43 GREAT web app: input page (Dave Bristor). Pick a genome assembly; input BED regions of interest. http://great.stanford.edu

44 GREAT web app: output summary. Ontology-specific enrichments; additional ontologies, term statistics, multiple hypothesis corrections, etc.
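
The input page takes regions in BED format. A minimal reader for the first four BED columns (chromosome, 0-based start, exclusive end, optional name), skipping track, browser, and comment lines; the validation rules and whitespace splitting are simplifications, and `parse_bed`/`BedRegion` are hypothetical names, not part of GREAT.

```python
from typing import Iterable, List, NamedTuple, Optional


class BedRegion(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # exclusive
    name: Optional[str] = None


def parse_bed(lines: Iterable[str]) -> List[BedRegion]:
    regions = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # blank lines and header lines carry no regions
        fields = line.split()  # assumes no whitespace inside fields
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 fields")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {lineno}: invalid interval {start}-{end}")
        regions.append(BedRegion(chrom, start, end, fields[3] if len(fields) > 3 else None))
    return regions


example = ["track name=srf_peaks", "chr1\t1000\t1200\tpeak1", "chr2 500 650", ""]
for region in parse_bed(example):
    print(region)
```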

45 GREAT web app: term details page. A frame holds the http://www.geneontology.org definition of “actin binding”, followed by the genes annotated as “actin binding” with their associated genomic regions, and the genomic regions annotated with “actin binding”. Drill down to explore how a particular peak regulates Plectin and its role in actin binding.

46 You can also submit any track straight from the UCSC Table Browser. A simple, well-documented programmatic interface allows any tool to submit directly to GREAT. See our Help. Inquiries welcome!

47 GREAT web app: export data. HTML output displays all user-selected rows and columns; tab-separated values are also available for additional postprocessing.

48 External Web Stats: Catching On (last 500 entries only)

49 Summary. Current technologies identify cis-regulatory sequences. GREAT accurately assesses functional enrichments of cis-regulatory sequences using a genomic region-based approach [McLean et al., Nat Biotechnol., 2010]. Online tool available (version 1.5 coming soon, in QA): http://great.stanford.edu. GREAT is immediately applicable to all sets with a significant cis-regulatory content: regulatory chromatin markers (e.g., H3K4me1), genome-wide association studies (GWAS), and comparative genomics sets (e.g., ultraconserved elements).

50 Acknowledgments. GREAT developers: Cory McLean, Dave Bristor, Michael Hiller, Shoa Clarke, Craig Lowe, Aaron Wenger, Gill Bejerano. Other help: Fah Sathira, Marina Sirota, Bruce Schaar, Terry Capellini, Christopher Meyer, Jennifer Hardee.

