Evaluation of count scores for weight matrix motifs Project Presentation for CS598SS Hong Cheng and Qiaozhu Mei.


1 Evaluation of count scores for weight matrix motifs Project Presentation for CS598SS Hong Cheng and Qiaozhu Mei

2 Problem Background Understand the mechanism of gene regulation and predict gene regulation. This needs a quantitative measure of how strongly a TF binding site is present in a gene sequence. Such a measure can be used as an important feature in the study of gene regulation.

3 Project Background (cont.) There are no standard criteria for such a measure. But we expect a good measure can model –The quality of a Binding Site. –The occurrence of a binding site in the sequence. There can be many choices for such a measure, but which one is better…?

4 Project Overview Project Goal Possible scoring measures Evaluate a score: Constraint Analysis Experiments and results Current Status and Future Work

5 Goal of the project Three Steps: –Formalize the problem of counting score of weight matrix motifs and propose an evaluation mechanism. –Evaluate the existing scoring methods of weight matrix motifs. –Either suggest a good motif counting method or propose a new score better than existing scores.

6 Possible scoring measures Simple Counting –Match or not match Likelihood Sum –Likelihood that a site is generated by a motif, summed over all possible sites Model based scores Free Energy Normalization of existing scores

7 Simple counting Simple Counting (match or not match) –Doesn’t work for fuzzy motifs –Variation: count a motif if for a subsequence, P(s|w) is above a threshold. Likelihood sum score –A soft version of simple counting –Ad hoc, doesn’t have a sound probabilistic interpretation
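The two counting scores on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: the PWM values, the example sequence, and the function names (`site_likelihood`, `threshold_count`, `likelihood_sum`) are assumptions made for the example, and P(s|w) is taken to be the product of per-position base probabilities.

```python
from math import prod

BASES = "ACGT"

def site_likelihood(site, pwm):
    """P(s|w): product of the PWM probabilities of the bases in the site."""
    return prod(col[BASES.index(b)] for b, col in zip(site, pwm))

def windows(seq, length):
    """All contiguous subsequences of the given length."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]

def threshold_count(seq, pwm, threshold):
    """Simple counting variant: count sites whose P(s|w) reaches the threshold."""
    return sum(1 for s in windows(seq, len(pwm))
               if site_likelihood(s, pwm) >= threshold)

def likelihood_sum(seq, pwm):
    """Soft version: add up P(s|w) over every candidate site."""
    return sum(site_likelihood(s, pwm) for s in windows(seq, len(pwm)))

# Hypothetical 3-column PWM (columns ordered A, C, G, T) that prefers "TAG".
pwm = [
    [0.1, 0.1, 0.1, 0.7],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
]
seq = "CCTAGCCTACGG"

print(threshold_count(seq, pwm, 0.3))
print(likelihood_sum(seq, pwm))
```

Lowering the threshold lets near matches such as "TAC" count as sites, which is how the thresholded variant accommodates fuzzy motifs; the likelihood sum removes the threshold altogether.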

8 Model Based Scores Consider the sequence to be generated by a model involving a set of motifs HMM model: Stubb (Sinha et al 2003) –Count of a motif as an average number of times the motif is planted in the sequence Two options: –With fixed transition probabilities –Fitting transition probabilities by unsupervised learning
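The idea of scoring a motif by the average number of times it is planted can be illustrated with a much-simplified single-motif model. This sketch is not Stubb itself: it assumes one motif, a fixed probability `p` of starting a motif at each step, a uniform background, and no log-space arithmetic (so it is only suitable for short sequences). The function name `expected_motif_count` is hypothetical.

```python
from math import prod

BASES = "ACGT"

def expected_motif_count(seq, pwm, p, background=(0.25, 0.25, 0.25, 0.25)):
    """Expected number of planted motif copies under a simplified model.

    At each step the model either emits one background base (weight 1 - p)
    or plants the whole motif (weight p). Requires 0 < p < 1.
    """
    n, l = len(seq), len(pwm)
    # emit[i]: probability that the motif generates seq[i:i+l]
    emit = [prod(col[BASES.index(b)] for b, col in zip(seq[i:i + l], pwm))
            for i in range(n - l + 1)]
    bg = [background[BASES.index(b)] for b in seq]

    # fwd[i]: total weight of all parses of the prefix seq[:i]
    fwd = [0.0] * (n + 1)
    fwd[0] = 1.0
    for i in range(1, n + 1):
        fwd[i] = fwd[i - 1] * (1 - p) * bg[i - 1]
        if i >= l:
            fwd[i] += fwd[i - l] * p * emit[i - l]

    # bwd[i]: total weight of all parses of the suffix seq[i:]
    bwd = [0.0] * (n + 1)
    bwd[n] = 1.0
    for i in range(n - 1, -1, -1):
        bwd[i] = (1 - p) * bg[i] * bwd[i + 1]
        if i + l <= n:
            bwd[i] += p * emit[i] * bwd[i + l]

    # Posterior probability of a motif starting at i, summed over all i.
    total = fwd[n]
    return sum(fwd[i] * p * emit[i] * bwd[i + l]
               for i in range(n - l + 1)) / total

# Hypothetical PWM (columns ordered A, C, G, T) preferring "TAG".
pwm = [[0.1, 0.1, 0.1, 0.7], [0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.7, 0.1]]
for p in (0.001, 0.01, 0.1):
    print(p, expected_motif_count("CCTAGCCTACGG", pwm, p))
```

Here `p` plays the role of the fixed transition probability; the second option on the slide would instead fit `p` to the sequence, for example by expectation maximization, which is not shown.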

9 Other Possible Scores Free Energy: –θ: a set of motifs M and model parameters –θ_b: model parameters with backgrounds only –F(S, θ) = log(Pr(S|θ) / Pr(S|θ_b)) –Models the score of a sequence and a set of motifs; it cannot give the score of a specific motif unless the computation is run for that motif alone Normalization of Existing Scores –Estimating P(C >= x) instead of #sequences –Use well-known normalization methods to normalize the actual counts –Min-Max; Z-Score (Z = (N - E)/S)
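The normalization options can be sketched as below. The slide does not say where E and S come from, so this example assumes they are the mean and population standard deviation of a reference set of counts (for instance, counts on background sequences), and it estimates P(C >= x) empirically from the same kind of reference set. All function names and the numbers are illustrative.

```python
from statistics import mean, pstdev

def min_max(counts):
    """Rescale counts linearly to [0, 1]."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * len(counts)
    return [(c - lo) / (hi - lo) for c in counts]

def z_score(n, reference_counts):
    """Z = (N - E) / S, with E and S taken from reference counts (assumption)."""
    e = mean(reference_counts)
    s = pstdev(reference_counts)
    return (n - e) / s

def tail_probability(x, reference_counts):
    """Empirical estimate of P(C >= x) from reference counts."""
    return sum(1 for c in reference_counts if c >= x) / len(reference_counts)

# Made-up reference counts, used only to exercise the functions.
reference = [2, 4, 4, 4, 5, 5, 7, 9]
print(min_max([2, 4, 6]))
print(z_score(9, reference))
print(tail_probability(5, reference))
```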

10 Question: What makes a good score for weight matrix motif?

11 Evaluation of scores Empirical evaluation with Lab Experiments –Comparing the score with lab experiments to see the effectiveness –ChIP-chip studies –Problem: Lab experiment data is not easy to get; performance may vary over species (thus may be biased) Analytical evaluation: heuristic constraints

12 Analytical Evaluation with Heuristic Constraints There are many heuristic constraints which we expect a good score to satisfy. The effectiveness of a score can be inferred from how well it satisfies the constraints. Whether a score satisfies a constraint can be studied analytically or with experiments on random data. Combined with empirical evaluation, constraint analysis can tell us why one score is better than others, and help us define a new score.

13 Heuristic Constraints (I) Formalization –Motif PWMs: w, M –Sequences: S –Possible binding sites: s –Score of the nth run: C_n(S, w) Motif Quality Constraint –Focus on quality of sites –Contribution of a motif w with length l on sites –For one motif w, two sequences S1, S2, a site position [i, i + l - 1]. S1[i, i + l - 1] != S2[i, i + l - 1] and other positions are the same. –If I(S1[i, i + l - 1]) <= I(S2[i, i + l - 1]) –C(S1, w) >= C(S2, w)

14 Heuristic Constraints (II) Motif Length Constraint –For two motifs w1 and w2, length(w1) = length(w2) + 1. For any position i <= length(w2), the multinomial vector w1(i) = w2(i). –Compute the scores of w1 and w2 on one sequence S independently –C(S, w1) <= C(S, w2) Motif Sharpness Constraint –For two motifs w1 and w2, length(w1) = length(w2), if for any position i and j = 0, 1, 2, w1(i, j) < w2(i, j) and w1(i, 3) > w2(i, 3) –(w1 is sharper than w2) –Compute the scores of w1 and w2 on a large number of sequences independently –Expectation[C(w1)] <= Expectation[C(w2)]
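A constraint like Motif Length can be checked on random data with a small harness. The sketch below is an assumption-laden illustration: it builds w1 by appending one random column to a random w2, draws a random sequence, and tests C(S, w1) <= C(S, w2) for whatever score function is passed in. It is applied here to a likelihood-sum score, for which the constraint also follows analytically, since every window of w1 multiplies the matching window of w2 by a probability of at most 1. The names `length_constraint_holds` and `random_column` are hypothetical.

```python
import random
from math import prod

BASES = "ACGT"

def likelihood_sum(seq, pwm):
    """Sum of P(s|w) over all candidate sites s in seq."""
    l = len(pwm)
    return sum(prod(col[BASES.index(b)] for b, col in zip(seq[i:i + l], pwm))
               for i in range(len(seq) - l + 1))

def random_column(rng):
    """A random probability vector over A, C, G, T."""
    raw = [rng.random() + 1e-9 for _ in BASES]
    total = sum(raw)
    return [x / total for x in raw]

def length_constraint_holds(score, trials=200, seq_len=50, seed=0):
    """Return True if C(S, w1) <= C(S, w2) held in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        w2 = [random_column(rng) for _ in range(rng.randint(1, 8))]
        w1 = w2 + [random_column(rng)]  # w1 extends w2 by one position
        seq = "".join(rng.choice(BASES) for _ in range(seq_len))
        if score(seq, w1) > score(seq, w2) + 1e-12:
            return False
    return True

print(length_constraint_holds(likelihood_sum))
```

Passing a different score function (for example, a thresholded count or a model-based score) reuses the same harness; a `False` result gives a concrete counterexample setting to inspect.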

15 Heuristic Constraints (III) Motif Probability Constraint –For one motif w, one sequence S, if we compute the score C(S, w) two times and give higher probability to w in the second run –(e.g. transition probability or prior probability in HMM) –E.g. p_1 < p_2 –C_1(S, w) <= C_2(S, w) Motif Competition Constraint –For two motifs w1 and w2, one sequence S. First compute the score for w1 only, then compute it again considering the co-occurrences of w1 and w2. –C_1(S, w1) >= C_2(S, w1)

16 Heuristic Constraints (IV) Deterministic Constraint –One motif w, one sequence S: if we compute the score of w twice with no parameter change, –C_1(S, w) = C_2(S, w) Upper Bound Constraint –An existing set of motifs M, a sequence S: if we add a new motif w_n and compute the scores for M and w_n again, –[formula missing] –But cannot exceed an upper bound (e.g. the length of S)

17 A summary of constraints The heuristic constraints allow us to analyze the effectiveness of a score without doing experiments. If experiments show that one score is better than others, the heuristic constraints can indicate why it is better. It is difficult to find a closed set of constraints. Some constraints are closely related (maybe not orthogonal, though not redundant)

18 Experiment Design Regular (comparing distribution): –Method Stubb with learnt p Simple Count –Data Real motifs, real sequence data Real motifs, randomly generated very long sequence (say, 10k~100k) Random motifs, including long, short, fuzzy and sharp combinations, random long sequence

19 Experiment Design Stubb with Fixed Prior Probability –Vary prior prob p: 0.0001, …, 0.001, …, 0.01… –Data Real motifs, randomly generated long sequence Random motifs, including long, short, fuzzy and sharp combinations, randomly generated long sequence –See score distribution

20 Experiment Design: Constraints Motif Length: –Randomly generated motifs (uniform, varying length), randomly generated long sequence. –Randomly generated motifs (uniform, varying length), real sequences Motif sharpness: –Randomly generated motifs (varying sharpness, equal length), randomly generated long sequence (100k)
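One way to produce such random data is sketched below. The slides do not say how sharpness was varied, so this example assumes each PWM column is drawn from a symmetric Dirichlet distribution whose concentration `alpha` controls sharpness (small `alpha` gives sharp columns, large `alpha` gives near-uniform ones), and that random sequences use uniform base frequencies. The function names are illustrative.

```python
import random
from math import log2

BASES = "ACGT"

def random_column(alpha, rng):
    """Draw one PWM column from a symmetric Dirichlet(alpha) via gamma variates."""
    raw = [rng.gammavariate(alpha, 1.0) for _ in BASES]
    total = sum(raw)
    if total == 0.0:  # every draw underflowed: fall back to a one-hot column
        hot = rng.randrange(len(BASES))
        return [1.0 if j == hot else 0.0 for j in range(len(BASES))]
    return [x / total for x in raw]

def random_motif(length, alpha, rng):
    """A random PWM with the given number of columns."""
    return [random_column(alpha, rng) for _ in range(length)]

def random_sequence(length, rng):
    """A random DNA sequence with uniform base frequencies."""
    return "".join(rng.choice(BASES) for _ in range(length))

def information_content(pwm):
    """Mean bits per column relative to a uniform background (2 - entropy)."""
    def column_bits(col):
        return 2.0 + sum(p * log2(p) for p in col if p > 0)
    return sum(column_bits(col) for col in pwm) / len(pwm)

rng = random.Random(0)
# Equal length, varying sharpness, as in the sharpness experiment.
motifs = {alpha: random_motif(10, alpha, rng) for alpha in (0.05, 0.5, 5.0, 50.0)}
seq = random_sequence(100_000, rng)

for alpha, pwm in motifs.items():
    print(alpha, round(information_content(pwm), 3))
```

Mean information content gives a single number for ordering motifs from fuzzy to sharp when plotting scores against sharpness.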

21 Experiment Design: Constraints Motif Competition –Real motifs, real sequence/random sequence data –Several runs: 1st run: only motif M1; 2nd run: M1 and M2; 3rd run: M1, M2 and M3; … –Plot the distribution of M1 in several runs.

22 Experiment Design (cont.) Deterministic constraints: –Real motifs, real sequences, run it several times, plot the distributions of Motif 1 to see whether it changes a lot. Normalization: –Z-Score only;Min-Max only; P(C>=N) only; P(C>=N) + Z-Score; P(C>=N) + Min-Max

23 Experiment Result(1) Stubb on real sequences against real motifs Simple count on real sequences against real motifs Four motifs –Bicoid, length 11, medium sharp –Kruppel, length 9, medium sharp –Gt, length 12, a bit sharper –Hkb, length 7, sharpest, every row has one non-zero count and three 0s

24 Experiment Result (1)-Stubb

25 Experiment Result (1)-Simple Count

26 Experiment Result (1) – Normalization P(x>=N)

27 Result (1) – Normalization z-score on motif score

28 Experiment Result(2) Stubb on random sequences against random motifs Simple count on random sequences against random motifs Four motifs –Long_fuzzy, length 20, uniform –Long_sharp, length 20, sharp –Short_fuzzy, length 5, uniform –Short_sharp, length 5, sharp

29 Experiment Result(2)-Stubb

30 Experiment Result (2)-Simple Count

31 Experiment Result(3) Stubb with Fixed Prior Probability, varying p 0.0001 ~0.05 Four real motifs –Bicoid –Kruppel –Hkb –Gt Four random motifs –Long_fuzzy –Long_sharp –Short_sharp –Short_fuzzy

32 Experiment Result(3)-Bicoid

33 Experiment Result(3)-Hkb

34 Experiment Result(3)-Long_fuzzy

35 Experiment Result(3)-Short_sharp

36 Experiment Result(4)-Constraint Motif Length Test on this heuristic –Stubb –Simple Count Generate 10 random motifs, uniform, vary length from 1 to 10

37 Experiment Result(4)-Stubb

38 Experiment Result(4)-Simple count

39 Experiment Result(5)-Constraint Motif Sharpness Test on this heuristic –Stubb –Simple Count Generate 10 random motifs, length 10, vary sharpness

40 Experiment Result(5)-Stubb

41 Experiment Result(5)-Simple count

42 Experiment Result(6)-Motif Competition Test on this constraint –Stubb –Simple Count 1st run: using bicoid only; 2nd run: using bicoid and five other motifs; 3rd run: using bicoid and nine other motifs. Monitor the bicoid score

43 Experiment Result(6)-Stubb

44 Experiment Result(6)-Simple count

45 Summary Columns: Constraints, Stubb, Stubb_FixedP, Likelihood Sum. Probability Constraint: N/A, Yes, N/A; Motif Length: Yes; Motif Sharpness: Yes, No; Motif Competition: Not clear; Deterministic: No, Yes; Upper Bound: Yes, No; Site Quality: To be done.

46 Future Work Finish constraint tests. Evaluate more scores (e.g. Free Energy). Define and formalize more constraints. Compare with ChIP-chip experiment results, and study the effectiveness of scores and their relation to the constraints

