Prediction of Gene Expression in Yeast using Conserved Sequence Templates Derek Chiang & Alan Moses Molecular & Cell Biology, UCB Eisen Lab.


1 Prediction of Gene Expression in Yeast using Conserved Sequence Templates Derek Chiang & Alan Moses Molecular & Cell Biology, UCB Eisen Lab

2 PROBLEM: What information controlling gene expression is encoded in genome sequences? Features?
>Saccharomyces cerevisiae chr V
CGTCTCCTCCAAGCCCTGTTGTCTCTTACCC
GGATGTTCAACCAAAAGCTACTTACTACCTT
TATTTTATGTTTACTTTTTATAGATTGTCTT
TTTATCCTACTCTTTCCCACTTGTCTCTCGC
TACTGCCGTGCAACAAACACTAAATCAAAAC
AGTGAAATACTACTACATCAAAACGCATATT
CCCTAGAAAAAAAAATTTCTTACAATATACT
ATACTACACAATACATAATCACTGACTTTCG
TAACAACAATTTCCTTCACTCTCCAACTTCT
CTGCTCGAATCTCTACATAGTAATATTATAT
CAAATCTACCGTCTGGAACATCATCGCTATC
CAGCTCTTTGTGAACCGCTACCATCAGCATG
TACAGTGGTACCTTCGTGTTATCTGCAGCGA
GAACTTCAACGTTTGCCAAATCAAGCCAATG
TGGTAACAACCACACCTCCGAAATCTGCTCC
AAAAGATACTCCAGTTTCTGCCGAAATGTTT

3 Gene Expression: Experiment

4 Gene Expression: Mechanism Promoter Structure

5 Conserved Sequence Rules. Hidden variables: Sites S_1, S_2, …; Positions P_11, P_12, …, P_21, P_22, …

6 Conserved Sequence Rules. Hidden variables: Sites S_1, S_2, …; Positions P_11, P_12, …, P_21, P_22, …; PRIORS from sequence conservation

7 Conserved Sequence Rules. Hidden variables: Sites S_1, S_2, …; Positions P_11, P_12, …, P_21, P_22, …; Rule: { S_1 > 0; S_2 > 0; min_j |P_1j – P_2j| < 30 }

8 Conserved Word Pairs: Finding Conserved Words; Evaluate Word Pairs: 1) Joint Conservation, 2) Close Spacing; Validate with Gene Expression

9 Finding Conserved Words
> MET28_YAP5
Scer AACCTAAAACCAAAAAAAA-A-AAATAAGTCACGTGCACT
Spar AATAAAAAATAGACTAACA-A-ATTGCGGTCACGTGCACT
Smik AATCCCAGGCCAAAAACCAGA-AATTGAGTCACGTGCAGT
Sbay GTCACGTGCCCGACGGCCCCACAACTGTGGCATCCATCTT
3860 CLUSTALW alignments from MIT. Conserved word C_w: found in the same position in 3+ genomes, within 600 bp of gene start.
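The conserved-word definition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `conserved_words` and its parameters are hypothetical, windows containing alignment gaps are simply skipped, and the 600 bp upstream filter is left out because the alignment snippet carries no coordinates.

```python
from collections import Counter

def conserved_words(alignment, k=6, min_genomes=3):
    """Map alignment column -> k-mer found at that column in >= min_genomes rows.

    Assumption: windows that contain a gap character are ignored.
    """
    rows = list(alignment.values())
    length = len(rows[0])
    hits = {}
    for col in range(length - k + 1):
        windows = [row[col:col + k] for row in rows]
        counts = Counter(w for w in windows if "-" not in w)
        if not counts:
            continue
        word, n = counts.most_common(1)[0]
        if n >= min_genomes:
            hits[col] = word
    return hits

# Alignment rows copied from the MET28_YAP5 slide.
alignment = {
    "Scer": "AACCTAAAACCAAAAAAAA-A-AAATAAGTCACGTGCACT",
    "Spar": "AATAAAAAATAGACTAACA-A-ATTGCGGTCACGTGCACT",
    "Smik": "AATCCCAGGCCAAAAACCAGA-AATTGAGTCACGTGCAGT",
    "Sbay": "GTCACGTGCCCGACGGCCCCACAACTGTGGCATCCATCTT",
}
hits = conserved_words(alignment)
```

With this snippet, the window starting at column 30 (0-based) holds CACGTG in Scer, Spar, and Smik, so it passes the 3+ genomes criterion.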

10 Conserved Word Pairs: Finding Conserved Words; Evaluate Word Pairs: 1) Joint Conservation, 2) Close Spacing; Validate with Gene Expression

11 TEST: Joint Word Conservation. 2×2 table: Word 1 Conserved (Y / N) × Word 2 Conserved (Y / N), for sites S_1 and S_2

12 TEST: Joint Word Conservation. 2×2 table: Word 1 Conserved (Y / N) × Word 2 Conserved (Y / N), for sites S_1 and S_2. Chi-square Test for Independence (Yates adjustment)
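For reference, the Yates-adjusted chi-square statistic for a 2×2 table can be computed as below. This is a minimal sketch, not the authors' code: the function name is hypothetical and the example counts are made up purely to exercise the formula.

```python
def yates_chi_square(a, b, c, d):
    """Yates-adjusted chi-square for the 2x2 table [[a, b], [c, d]].

    Rows: word 1 conserved (Y, N); columns: word 2 conserved (Y, N).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence either way
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)

# Hypothetical counts of genes, for illustration only.
chi2 = yates_chi_square(10, 20, 30, 40)
```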

13 TEST: Joint Word Conservation. 2090 words (length 6; word and reverse complement counted together); 2.06 × 10^6 word PAIRS (excluding overlap). HISTOGRAM: Yates-adjusted χ² vs. -log10(Frequency). Empirical cutoff = 30; Bonferroni cutoff = 31
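A Bonferroni cutoff on the chi-square scale can be approximated with the standard library alone. The sketch below assumes a family-wise significance level of 0.05, which the slide does not state, and uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal; the function name is hypothetical.

```python
from statistics import NormalDist

def bonferroni_chi2_cutoff(n_tests, alpha=0.05):
    """Chi-square (1 d.f.) value whose upper-tail p-value equals alpha / n_tests."""
    p = alpha / n_tests
    # P(chi2 > x) = 2 * P(Z > sqrt(x)) for 1 degree of freedom.
    z = -NormalDist().inv_cdf(p / 2)
    return z * z

# 2.06e6 word pairs, as on the slide; alpha = 0.05 is an assumption.
cutoff = bonferroni_chi2_cutoff(2.06e6)
```

Under that assumption the cutoff comes out at roughly 31.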

14 TEST: Close Spacing. For conserved genes g = 1 … N: J_g = (P_1g, P_2g), sampled jointly from the position distributions of P_1 and P_2. Test statistic: X. NULL: X obtained from random sampling. ALT: X smaller than expected from sampling.

15 TEST: Close Spacing (Nonparametric Bootstrap). Avg. min. distance X = 32 bp

16 TEST: Close Spacing (Nonparametric Bootstrap). Bootstrap distance X^*b = 65 bp

17 TEST: Close Spacing (Nonparametric Bootstrap). Position distributions: word 1 = { P_11, …, P_1v }, word 2 = { P_21, …, P_2w }. Data: J_1 = (P_11, P_21), …, J_n = (P_1n, P_2n). Bootstrap: J_1^*b = (P_11^*b, P_21^*b), …, J_n^*b = (P_1n^*b, P_2n^*b), resampled with replacement from the two position distributions. Record the quantile of X in the samples of X^*b (empirical null distribution).
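The bootstrap procedure on this slide can be sketched as follows. Several simplifications are assumptions of the sketch rather than details from the talk: each gene contributes exactly one position per word (so the per-gene minimum reduces to a single distance), the two position pools are built from the observed pairs, and the function names, positions, and the default of 2000 resamples are hypothetical.

```python
import random

def avg_distance(pairs):
    """Average |P_1g - P_2g| over genes (the statistic X in this sketch)."""
    return sum(abs(p1 - p2) for p1, p2 in pairs) / len(pairs)

def bootstrap_quantile(pairs, n_boot=2000, seed=0):
    """Return (X, quantile of X among bootstrap values X*b)."""
    rng = random.Random(seed)
    pool1 = [p1 for p1, _ in pairs]  # position distribution of word 1
    pool2 = [p2 for _, p2 in pairs]  # position distribution of word 2
    x_obs = avg_distance(pairs)
    at_or_below = 0
    for _ in range(n_boot):
        # Resample each word's position independently, with replacement.
        sample = [(rng.choice(pool1), rng.choice(pool2)) for _ in pairs]
        if avg_distance(sample) <= x_obs:
            at_or_below += 1
    return x_obs, at_or_below / n_boot

# Hypothetical upstream positions (bp) of the two words in eight genes.
pairs = [(50, 60), (120, 135), (300, 290), (410, 425),
         (520, 512), (200, 215), (90, 100), (350, 340)]
x_obs, quantile = bootstrap_quantile(pairs)
```

A small quantile means the observed spacing is tighter than expected when the two words are placed independently.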

18 TEST: Close Spacing HISTOGRAM of Bootstrap Quantiles for Word Pairs

19 Conserved Word Pairs: Finding Conserved Words; Evaluate Word Pairs: 1) Joint Conservation, 2) Close Spacing; Validate with Gene Expression

20 Validating Expression Subsets. Conserved sequence (SUBSETTING) rules: { S_1 > 0; S_2 > 0; min_j |P_1j – P_2j| < d }. Genome (6000 genes) → SUBSET (N genes) → Assess gene expression
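The subsetting rule can be illustrated with a short filter. In this sketch the minimum is taken over all pairs of occurrences of the two words in a promoter, which is one reading of the rule on the slide; the gene names, positions, and function names are hypothetical.

```python
def min_spacing(pos1, pos2):
    """Smallest distance between any occurrence of word 1 and any of word 2."""
    return min(abs(a - b) for a in pos1 for b in pos2)

def select_subset(genes, d=30):
    """Keep genes where both words occur and their closest occurrences are < d apart."""
    subset = []
    for name, (pos1, pos2) in genes.items():
        if pos1 and pos2 and min_spacing(pos1, pos2) < d:
            subset.append(name)
    return subset

# Hypothetical conserved-word positions per gene: (word 1 sites, word 2 sites).
genes = {
    "geneA": ([120, 400], [135]),  # both words present, 15 bp apart
    "geneB": ([200], [450]),       # both present, but 250 bp apart
    "geneC": ([310], []),          # word 2 absent
}
subset = select_subset(genes, d=30)
```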

21 Validating Expression Subsets. Some nonparametric tests: 1) Subset mean (Mann-Whitney); 2) Subset distribution (Kolmogorov-Smirnov); 3) Subset weighted correlation (?); 4) Kernel density classification (mixture of normals). Find optimal word pair distance d* for each test …

22 Validating Expression Subsets: Kolmogorov-Smirnov. [Histogram: log2(Gene expression) vs. Frequency]
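The two-sample Kolmogorov-Smirnov statistic compares the empirical distribution of expression values in the subset with that of the remaining genes. A minimal pure-Python sketch, with a hypothetical function name and made-up log2 expression values, looks like this:

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs (two-sample K-S statistic D)."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical log2 expression values, for illustration only.
subset_expr = [1.2, 1.5, 1.9, 2.3]
other_expr = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.6]
d_stat = ks_statistic(subset_expr, other_expr)
```

Here the two samples do not overlap at all, so D reaches its maximum of 1; converting D to a p-value requires the K-S distribution, which this sketch does not cover.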

23 Conserved Word Pairs – EXAMPLES: AACTGT-CACGTG (Cbf1-Met31); N = 29; Pair χ² = 55; Spacing p … [Figure: spacing examples (YHR112C_YHR, MET28_YAP5); expression panels (aa / N starv, Cadmium, Heat shock; p < 10^-7)]

24 Conserved Word Pairs – EXAMPLES: AAACGA-CCGATA (Upc2-Hap1); N = 32; Pair χ² …; Spacing p 0. [Figure: spacing example (101 bp); expression panels (MMS, Other DNA damage)]

25 Pipeline Summary # Genes > 10 Spacing < 0.05 K-S 10 Known TF

26 Future Directions: Better gene expression subset tests (timecourse); More flexible sequence models (IUPAC, self-dimer); Automate distance cutoff (distance d); Parameter optimization: 8 threshold values! (Conserved: # aligned genomes & # upstream bp; joint conservation χ²; bootstrap quantile; K-S probs; min gene #; distance d)

27 Acknowledgements: Michael Eisen, Eisen Lab, Justin Fay, Audrey Gasch, Hunter Fraser, Venky Nandagopal, Dan Pollard, Ben Lewis

28 Validating Expression Subsets Kolmogorov-Smirnov

29 Comparing Expression Samples


