Published by Brody Douglas. Modified over 2 years ago.

1
Prediction of Gene Expression in Yeast using Conserved Sequence Templates
Derek Chiang & Alan Moses
Molecular & Cell Biology, UCB
Eisen Lab

2
>Saccharomyces cerevisiae chr V
CGTCTCCTCCAAGCCCTGTTGTCTCTTACCC
GGATGTTCAACCAAAAGCTACTTACTACCTT
TATTTTATGTTTACTTTTTATAGATTGTCTT
TTTATCCTACTCTTTCCCACTTGTCTCTCGC
TACTGCCGTGCAACAAACACTAAATCAAAAC
AGTGAAATACTACTACATCAAAACGCATATT
CCCTAGAAAAAAAAATTTCTTACAATATACT
ATACTACACAATACATAATCACTGACTTTCG
TAACAACAATTTCCTTCACTCTCCAACTTCT
CTGCTCGAATCTCTACATAGTAATATTATAT
CAAATCTACCGTCTGGAACATCATCGCTATC
CAGCTCTTTGTGAACCGCTACCATCAGCATG
TACAGTGGTACCTTCGTGTTATCTGCAGCGA
GAACTTCAACGTTTGCCAAATCAAGCCAATG
TGGTAACAACCACACCTCCGAAATCTGCTCC
AAAAGATACTCCAGTTTCTGCCGAAATGTTT
PROBLEM: What information controlling gene expression is encoded in genome sequences?
Features?

3
Gene Expression: Experiment

4
Gene Expression: Mechanism
Promoter Structure

5
Conserved Sequence Rules
Hidden Variables
Sites: $S_1, S_2, \ldots$
Positions: $P_{11}, P_{12}, \ldots$; $P_{21}, P_{22}, \ldots$

6
Conserved Sequence Rules
Hidden Variables
Sites: $S_1, S_2, \ldots$
Positions: $P_{11}, P_{12}, \ldots$; $P_{21}, P_{22}, \ldots$
PRIORS from sequence conservation

7
Conserved Sequence Rules
Hidden Variables
Sites: $S_1, S_2, \ldots$
Positions: $P_{11}, P_{12}, \ldots$; $P_{21}, P_{22}, \ldots$
Rule: $\{\, S_1 > 0;\ S_2 > 0;\ \min_j |P_{1j} - P_{2j}| < 30 \,\}$
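A minimal sketch of how such a rule might be evaluated for one promoter. The function name `passes_rule` and the reading of the minimum as "closest pair of occurrences of the two sites" are assumptions for illustration; the slide itself only gives the set notation.

```python
def passes_rule(site1_positions, site2_positions, max_distance=30):
    """Return True if both sites occur and their closest occurrences
    are less than max_distance apart.

    site1_positions, site2_positions: lists of integer positions of
    conserved occurrences of site 1 and site 2 in one promoter
    (hypothetical input format).
    """
    # S_1 > 0 and S_2 > 0: each site must occur at least once.
    if not site1_positions or not site2_positions:
        return False
    # Minimum distance over all pairs of occurrences (an assumed reading
    # of the min over j in the rule).
    closest = min(abs(p - q) for p in site1_positions for q in site2_positions)
    return closest < max_distance


# Hypothetical promoters: positions are illustrative, not taken from the slides.
print(passes_rule([100, 400], [120]))
print(passes_rule([100], [130]))
```

The strict inequality follows the rule as written, so a pair exactly 30 apart does not pass.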

8
Conserved Word Pairs
Finding Conserved Words
Evaluate Word Pairs
1) Joint Conservation
2) Close Spacing
Validate with Gene Expression

9
Finding Conserved Words
> MET28_YAP5
Scer AACCTAAAACCAAAAAAAA-A-AAATAAGTCACGTGCACT
Spar AATAAAAAATAGACTAACA-A-ATTGCGGTCACGTGCACT
Smik AATCCCAGGCCAAAAACCAGA-AATTGAGTCACGTGCAGT
Sbay GTCACGTGCCCGACGGCCCCACAACTGTGGCATCCATCTT
3860 CLUSTALW alignments from MIT
Conserved word $C_w$: found in the same position in 3+ genomes, within 600 bp of gene start
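The conserved-word definition can be sketched on the alignment shown on this slide. This is an illustration under assumptions, not the authors' code: the function name `conserved_words` is hypothetical, windows containing a gap character are simply skipped, and the 600 bp distance filter is omitted because gene-start coordinates are not on the slide.

```python
from collections import Counter

# Alignment rows copied from the slide (MET28_YAP5 intergenic region).
ALIGNMENT = {
    "Scer": "AACCTAAAACCAAAAAAAA-A-AAATAAGTCACGTGCACT",
    "Spar": "AATAAAAAATAGACTAACA-A-ATTGCGGTCACGTGCACT",
    "Smik": "AATCCCAGGCCAAAAACCAGA-AATTGAGTCACGTGCAGT",
    "Sbay": "GTCACGTGCCCGACGGCCCCACAACTGTGGCATCCATCTT",
}


def conserved_words(alignment, word_length=6, min_genomes=3):
    """List (column, word, genome_count) for words that appear at the
    same alignment column in at least min_genomes sequences."""
    n_columns = min(len(seq) for seq in alignment.values())
    hits = []
    for col in range(n_columns - word_length + 1):
        counts = Counter()
        for seq in alignment.values():
            word = seq[col:col + word_length]
            if "-" not in word:  # assumption: ignore windows spanning gaps
                counts[word] += 1
        for word, n in counts.items():
            if n >= min_genomes:
                hits.append((col, word, n))
    return hits


for col, word, n in conserved_words(ALIGNMENT):
    print(col, word, n)
```

On this alignment the hits include CACGTG, which the EXAMPLES slide lists as one half of the Cbf1-Met31 word pair.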

10
Conserved Word Pairs
Finding Conserved Words
Evaluate Word Pairs
1) Joint Conservation
2) Close Spacing
Validate with Gene Expression

11
TEST: Joint Word Conservation
2×2 table: Word 1 Conserved (Y / N) × Word 2 Conserved (Y / N)
Sites: $S_1$, $S_2$

12
TEST: Joint Word Conservation
2×2 table: Word 1 Conserved (Y / N) × Word 2 Conserved (Y / N)
Cell counts: 32162 134 3226
Chi-square Test for Independence (Yates adjustment)
Sites: $S_1$, $S_2$
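For reference, a small sketch of the Yates-adjusted chi-square statistic for a 2×2 table, using the standard textbook formula. The function name and the example counts are hypothetical; the counts are not the values from the slide.

```python
def yates_chi_square(a, b, c, d):
    """Yates-adjusted chi-square statistic for the 2x2 table

                        word 2 conserved   word 2 not conserved
    word 1 conserved           a                    b
    word 1 not conserved       c                    d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / denominator


# Hypothetical counts of genes, for illustration only.
print(yates_chi_square(20, 5, 5, 20))
```

A large statistic suggests the two words are conserved together more (or less) often than independence would predict.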

13
TEST: Joint Word Conservation
2090 words (Length 6: Word-Rev complement)
$2.06 \times 10^6$ Word PAIRS (Exclude overlap)
HISTOGRAM: Yates adjusted $\chi^2$ vs. $-\log_{10}(\text{Frequency})$
Empirical $\chi^2_{0.005} = 30$
Bonferroni $\chi^2_{0.05} = 31$
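A rough check of where a Bonferroni cutoff of this size could come from, assuming a family-wise level of 0.05 divided by the number of word pairs and a chi-square distribution with 1 degree of freedom (both assumptions; the slide does not spell them out). The helper name is hypothetical. With 1 degree of freedom the critical value is the square of a two-sided normal quantile, so only the standard library is needed.

```python
from statistics import NormalDist


def chi2_critical_1df(alpha):
    """Upper-tail critical value of chi-square with 1 degree of freedom.

    Uses the identity chi2(1) = Z**2: P(Z**2 > c) = alpha exactly when
    sqrt(c) is the upper alpha/2 quantile of the standard normal.
    """
    z = -NormalDist().inv_cdf(alpha / 2)
    return z * z


n_pairs = 2.06e6                      # number of word pairs from the slide
per_test_alpha = 0.05 / n_pairs       # Bonferroni-adjusted level (assumed)
print(chi2_critical_1df(per_test_alpha))
```

Under these assumptions the printed value should land close to the Bonferroni cutoff quoted on the slide.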

14
TEST: Close Spacing
Positions: $P_1$, $P_2$
For conserved genes $g = 1, \ldots, N$: $J_g = (P_{1g}, P_{2g})$, sampled jointly from the position distributions
Test statistic: $X$ (average minimum distance between the two words)
NULL: $X$ obtained from random sampling
ALT: $X$ smaller than expected from sampling

15
TEST: Close Spacing
Nonparametric Bootstrap
[Figure: word positions paired by gene, gene indices 1 to 8]
Avg. Min. Distance X = 32 bp
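One way the average minimum distance could be computed, as a sketch. The input format (one pair of position lists per gene) and the function name are assumptions; the positions below are made up and do not reproduce the 32 bp figure.

```python
def average_min_distance(genes):
    """Average, over genes, of the smallest distance between any
    occurrence of word 1 and any occurrence of word 2.

    genes: list of (word1_positions, word2_positions) tuples, one per
    gene in which both words are conserved (hypothetical format).
    """
    per_gene = [
        min(abs(p - q) for p in word1 for q in word2)
        for word1, word2 in genes
    ]
    return sum(per_gene) / len(per_gene)


# Illustrative positions in bp, not data from the talk.
example_genes = [
    ([10], [40]),          # closest pair is 30 bp apart
    ([100, 300], [310]),   # closest pair is 10 bp apart
]
print(average_min_distance(example_genes))
```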

16
TEST: Close Spacing
Nonparametric Bootstrap
[Figure: gene indices 1 to 8 resampled separately for each word; word 1: 7 8 2 4 1 6 3 5, word 2: 2 3 6 8 1 4 7 5]
Bootstrap Distance $X^{*b}$ = 65 bp

17
TEST: Close Spacing
Nonparametric Bootstrap
Position distributions: distribution 1 = $\{P_{11}, \ldots, P_{1v}\}$; distribution 2 = $\{P_{21}, \ldots, P_{2w}\}$
Data: $J_1 = (P_{11}, P_{21}), \ldots, J_n = (P_{1n}, P_{2n})$
Bootstrap: $J_1^{*b} = (P_{11}^{*b}, P_{21}^{*b}), \ldots, J_n^{*b} = (P_{1n}^{*b}, P_{2n}^{*b})$, resampled with replacement from distributions 1 and 2
Record the quantile of $X$ in 100000 samples of $X^{*b}$ (empirical null distribution)
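The bootstrap can be sketched as follows. Simplifying assumptions: each gene contributes one position per word, the two position pools are built from the paired data itself (the slide allows pools of different sizes v and w), and far fewer resamples than 100000 are drawn to keep the example quick. Function names and positions are hypothetical.

```python
import random


def mean_distance(pairs):
    """Mean absolute distance between paired positions."""
    return sum(abs(p1 - p2) for p1, p2 in pairs) / len(pairs)


def bootstrap_quantile(pairs, n_boot=2000, seed=0):
    """Return (X, quantile of X among the bootstrap distances X*b).

    Each bootstrap sample draws the word 1 and word 2 positions
    independently, with replacement, which breaks the within-gene
    pairing and gives an empirical null for the distance.
    """
    rng = random.Random(seed)
    pool1 = [p1 for p1, _ in pairs]
    pool2 = [p2 for _, p2 in pairs]
    observed = mean_distance(pairs)
    at_or_below = 0
    for _ in range(n_boot):
        resampled = [(rng.choice(pool1), rng.choice(pool2)) for _ in pairs]
        if mean_distance(resampled) <= observed:
            at_or_below += 1
    return observed, at_or_below / n_boot


# Illustrative positions: the two words sit close together in every gene.
pairs = [(10, 12), (100, 103), (200, 198), (300, 305), (400, 399), (500, 502)]
x, quantile = bootstrap_quantile(pairs)
print(x, quantile)
```

A small quantile means the observed spacing is tighter than almost all resampled spacings, which corresponds to the ALT hypothesis of the close-spacing test.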

18
TEST: Close Spacing
HISTOGRAM of Bootstrap Quantiles for Word Pairs

19
Conserved Word Pairs
Finding Conserved Words
Evaluate Word Pairs
1) Joint Conservation
2) Close Spacing
Validate with Gene Expression

20
Validating Expression Subsets
Conserved Sequence (SUBSETTING) Rules: $\{\, S_1 > 0;\ S_2 > 0;\ \min_j |P_{1j} - P_{2j}| < d \,\}$
Genome (6000 genes) → SUBSET (N genes) → Assess gene expression

21
Validating Expression Subsets
Some Nonparametric Tests
1) Subset mean (Mann-Whitney)
2) Subset distribution (Kolmogorov-Smirnov)
3) Subset weighted correlation ( ? )
4) Kernel density classification (Mixture of normals)
Find optimal word pair distance d* for each test …
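As an illustration of the first test in the list, a sketch of the Mann-Whitney U statistic comparing a gene subset with the remaining genes. The function names are hypothetical, the p-value uses the large-sample normal approximation without a tie correction (a simplification), and the expression values are made up.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(subset, rest):
    """U statistic: number of (subset, rest) pairs in which the subset
    value is larger, counting ties as one half."""
    u = 0.0
    for x in subset:
        for y in rest:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def mann_whitney_p(subset, rest):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(subset), len(rest)
    u = mann_whitney_u(subset, rest)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical log2 expression ratios, for illustration only.
subset = [1.8, 2.1, 1.5, 2.4, 1.9]
rest = [0.1, -0.3, 0.4, 0.0, -0.2, 0.3, 0.2, -0.1]
print(mann_whitney_u(subset, rest), mann_whitney_p(subset, rest))
```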

22
Validating Expression Subsets
Kolmogorov-Smirnov
[Histogram: $\log_2$(Gene expression) vs. Frequency]
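The two-sample Kolmogorov-Smirnov statistic is the largest gap between the empirical distribution functions of the subset and of the comparison genes. A minimal sketch with a hypothetical function name and made-up values:

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the maximum absolute
    difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical log2 expression values, for illustration only.
subset = [1.8, 2.1, 1.5, 2.4]
genome = [0.1, -0.3, 0.4, 0.0, 1.6, -0.2]
print(ks_statistic(subset, genome))
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap); converting D to a probability needs the K-S null distribution, which is left out here.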

23
Conserved Word Pairs – EXAMPLES
AACTGT-CACGTG (Cbf1-Met31)
N: 29
Pair $\chi^2$: 55
Spacing p: 0.01
Spacing: 189 bp (YHR112C_YHR113), 63 bp (MET28_YAP5)
Expression conditions: aa / N starv, Cadmium, Heat shock
$p < 10^{-8}$, $p < 10^{-7}$

24
Conserved Word Pairs – EXAMPLES
AAACGA-CCGATA (Upc2-Hap1)
N: 32
Pair $\chi^2$: 39.5
Spacing p: 0
Spacing: 34 bp, 101 bp
Expression conditions: MMS, Other DNA damage
$p < 10^{-5}$

25
Pipeline Summary
# Genes > 10 Spacing < 0.05 K-S 10 Known TF

26
Future Directions
Better gene expression subset tests (Timecourse)
More flexible sequence models (IUPAC, Self-dimer)
Automate distance cutoff (Distance d)
Parameter optimization: 8 threshold values! (Conserved: # aligned genomes & # upstream bp, Joint conservation $\chi^2$, Bootstrap quantile, K-S probs, Min gene #, Distance d)

27
Acknowledgements
Michael Eisen
Eisen Lab
Justin Fay
Audrey Gasch
Hunter Fraser
Venky Nandagopal
Dan Pollard
Ben Lewis

28
Validating Expression Subsets
Kolmogorov-Smirnov

29
Comparing Expression Samples
