Lectures 2 – Oct 3, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.


1 Lecture 2 – Oct 3, 2011. CSE 527 Computational Biology, Fall 2011. Instructor: Su-In Lee. TA: Christopher Miles. Monday & Wednesday 12:00-1:20, Johnson Hall (JHN) 022. Introduction to Probabilistic Models for Computational Biology.

2 Review: Gene Regulation. A gene in the DNA is transcribed into RNA and translated into protein ("gene expression"); RNA is also subject to degradation. Genes regulate each other's expression and activity, forming a genetic regulatory network. A transcription factor binding site acts as a switch for gene regulation. [Figure: a DNA sequence containing four genes, their RNA transcripts (AUGUGGAUUGUU, AUGCGCGUC, AUGUUACGCACCUAC, AUGAUUGAU), and the corresponding proteins (MWIV, MRV, MLRTY, MID).]

3 Review: Variations in the DNA. Sequence variations, such as a single nucleotide polymorphism (SNP), perturb the genetic regulatory network. [Figure: the DNA sequence, RNA transcripts (AUGUGGAUUGUU, AUGCGCGUC, AUGUUACGCACCUAC, AUGAUUGAU), and proteins (MWIV, MRV, MLRTY, MID) of the gene regulation example, with altered positions marked by X.]

4 Outline. Probabilistic models in biology: model selection problems. Mathematical foundations: Bayesian networks (Probabilistic Graphical Models: Principles and Techniques, Koller & Friedman, The MIT Press). Learning from data: maximum likelihood estimation; expectation and maximization.

5 Example 1. How are a change in a nucleotide in DNA, blood pressure, and heart disease related? There can be several "models". [Figure: three alternative graphs over the nodes DNA alteration, Blood pressure, and Heart disease.]

6 Example 2. How do genes A, B, and C regulate each other's expression levels (mRNA levels)? There can be several models. [Figure: three alternative graphs over the nodes A, B, and C.]

7 Probabilistic graphical models: a graphical representation of statistical dependencies. Data: expression levels of Gene A, Gene B, and Gene C measured in N experiments (Exp 1, Exp 2, …, Exp N), i.e., N instances. Question: what are the statistical dependencies between the expression levels of genes A, B, and C? Candidate structures: Model I, Model II, or Model III. Model selection: argmax_x P(model x is true | Data), where P(model x is true | Data) is the probability that model x is true given the data.
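
One common way to make the model-selection criterion concrete is to expand the posterior with Bayes' rule. The sketch below assumes a finite set of candidate models written M_x and a data set written D; both symbols are notation introduced here, not taken from the slide.

```latex
\begin{align*}
P(M_x \mid \mathcal{D}) &= \frac{P(\mathcal{D} \mid M_x)\, P(M_x)}{\sum_{x'} P(\mathcal{D} \mid M_{x'})\, P(M_{x'})}, \\
x^{*} &= \arg\max_{x} P(M_x \mid \mathcal{D}) = \arg\max_{x} P(\mathcal{D} \mid M_x)\, P(M_x).
\end{align*}
```

The denominator is the same for every candidate model, so it can be dropped when only the best-scoring model is needed.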

8 Outline. Probabilistic models in biology: model selection problem. Mathematical foundations: Bayesian networks. Learning from data: maximum likelihood estimation; expectation and maximization.

9 Probability Theory Review. Assume random variables A and B with Val(A) = {a1, a2, a3} and Val(B) = {b1, b2}. Topics: conditional probability (definition, chain rule, Bayes' rule) and probabilistic independence.
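
The standard textbook forms of the identities named on this slide, written for the two variables A and B, are sketched below. The slide's own equations are not in the transcript, so the exact notation here is an assumption.

```latex
\begin{align*}
\text{Conditional probability:} \quad & P(A \mid B) = \frac{P(A, B)}{P(B)}, \quad P(B) > 0 \\
\text{Chain rule:} \quad & P(A, B) = P(A)\, P(B \mid A) \\
\text{Bayes' rule:} \quad & P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \\
\text{Independence:} \quad & P(A, B) = P(A)\, P(B)
\end{align*}
```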

10 Probabilistic Representation. Joint distribution $P$ over $\{x_1, \dots, x_n\}$. If each $x_i$ is binary, the joint distribution has $2^n - 1$ entries. If the $x$'s are independent, $P(x) = p(x_1) \cdots p(x_n)$.
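
To see how much the independence assumption saves, the following minimal sketch counts free parameters and builds the joint table of independent binary variables from their marginals. The function names and the example marginals are hypothetical, chosen only for illustration.

```python
from itertools import product


def free_parameters_full_joint(n):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def free_parameters_independent(n):
    """Free parameters when the n binary variables are mutually independent."""
    return n


def independent_joint(marginals):
    """Build the joint table from marginals[i] = P(x_i = 1).

    Returns a dict mapping each 0/1 assignment tuple to its probability,
    computed as the product p(x_1) ... p(x_n).
    """
    joint = {}
    for assignment in product([0, 1], repeat=len(marginals)):
        prob = 1.0
        for value, p_one in zip(assignment, marginals):
            prob *= p_one if value == 1 else 1.0 - p_one
        joint[assignment] = prob
    return joint


if __name__ == "__main__":
    # Hypothetical marginals for three binary variables.
    table = independent_joint([0.2, 0.5, 0.9])
    print(free_parameters_full_joint(3), free_parameters_independent(3))
    print(len(table), sum(table.values()))
```

The table still has 2^n rows, but under independence every row is determined by only n numbers.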

11 Conditional Parameterization. The Diabetes example: Genetic risk (G) and Diabetes (D), with Val(G) = {g1, g0} and Val(D) = {d1, d0}. P(G,D) = P(G) P(D|G). P(G): prior distribution. P(D|G): conditional probability distribution (CPD). [Figure: Genetic risk → Diabetes.]

12 Naïve Bayes Model - Example. Elaborating the diabetes example: Genetic risk (G), Diabetes (D), and Hypertension (H), with Val(G) = {g1, g0}, Val(D) = {d1, d0}, Val(H) = {h1, h0}. The joint distribution has 8 entries. If D and H are independent given G, then P(G,D,H) = P(G) P(D|G) P(H|G), which needs only 5 entries and is more compact than the joint. [Figure: Genetic risk → Diabetes, Genetic risk → Hypertension.]
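
A minimal sketch of the factorization P(G,D,H) = P(G) P(D|G) P(H|G) for the diabetes example follows. All probability values are made up for illustration and do not come from the lecture; value 1 stands for g1, d1, or h1 and value 0 for g0, d0, or h0.

```python
from itertools import product

# Hypothetical CPDs (illustrative numbers only).
P_G = {1: 0.1, 0: 0.9}                                # P(G)
P_D_GIVEN_G = {1: {1: 0.4, 0: 0.6},                   # P(D | G = g1)
               0: {1: 0.05, 0: 0.95}}                 # P(D | G = g0)
P_H_GIVEN_G = {1: {1: 0.5, 0: 0.5},                   # P(H | G = g1)
               0: {1: 0.2, 0: 0.8}}                   # P(H | G = g0)


def joint(g, d, h):
    """P(G=g, D=d, H=h) under the factorization P(G) P(D|G) P(H|G)."""
    return P_G[g] * P_D_GIVEN_G[g][d] * P_H_GIVEN_G[g][h]


# The full joint table has 8 rows, all derived from 5 free parameters:
# one for P(G), two for P(D|G), two for P(H|G).
JOINT_TABLE = {(g, d, h): joint(g, d, h)
               for g, d, h in product([1, 0], repeat=3)}

if __name__ == "__main__":
    for assignment, prob in JOINT_TABLE.items():
        print(assignment, prob)
    print("total:", sum(JOINT_TABLE.values()))
```

Dividing any row by P(G = g) gives P(D|G) P(H|G), which is exactly the statement that D and H are conditionally independent given G.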

13 Naïve Bayes Model. A class C where Val(C) = {c1, …, ck}. Finding variables $x_1, \dots, x_n$. Naïve Bayes assumption: the findings are conditionally independent given the individual's class. The model factorizes as: $P(C, x_1, \dots, x_n) = P(C) \prod_{i=1}^{n} P(x_i \mid C)$. In the Diabetes example, the class is Genetic risk and the findings are Diabetes and Hypertension.

14 Naïve Bayes Model - Example. Medical diagnosis system. Class C: disease. Findings X: symptoms. Computing the confidence: [equation lost in extraction]. Drawbacks: strong assumptions.
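
One way such a diagnosis system could score diseases is to compute the posterior over classes from the naïve Bayes factorization. The sketch below assumes binary findings; the function name, disease names, and probabilities are hypothetical, and this is not necessarily the confidence formula shown on the original slide.

```python
def naive_bayes_posterior(prior, likelihoods, findings):
    """Posterior P(C | x_1, ..., x_n) under the naive Bayes assumption.

    prior:       {class: P(C = class)}
    likelihoods: {class: [P(x_i = 1 | C = class) for each finding i]}
    findings:    list of observed 0/1 values, one per finding
    """
    unnormalized = {}
    for cls, p_cls in prior.items():
        score = p_cls
        for p_one, x in zip(likelihoods[cls], findings):
            score *= p_one if x == 1 else 1.0 - p_one
        unnormalized[cls] = score
    total = sum(unnormalized.values())
    return {cls: score / total for cls, score in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical two-disease example with two binary symptoms.
    prior = {"disease_a": 0.5, "disease_b": 0.5}
    likelihoods = {"disease_a": [0.8, 0.3], "disease_b": [0.2, 0.6]}
    print(naive_bayes_posterior(prior, likelihoods, [1, 0]))
```

Because each finding contributes an independent factor, correlated symptoms get counted more than once, which is one practical consequence of the strong assumptions noted on the slide.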

15 Bayesian Network. Directed acyclic graph (DAG). Node: a random variable. Edge: direct influence of one node on another. The Diabetes example revisited: Genetic risk (G), Diabetes (D), and Hypertension (H), with Val(G) = {g1, g0}, Val(D) = {d1, d0}, Val(H) = {h1, h0}. [Figure: Genetic risk → Diabetes, Genetic risk → Hypertension.]

16 Bayesian Network Semantics. A Bayesian network structure G is a directed acyclic graph whose nodes represent random variables $X_1, \dots, X_n$. $\mathrm{Pa}_{X_i}$: parents of $X_i$ in G. $\mathrm{NonDescendants}_{X_i}$: variables in G that are not descendants of $X_i$. G encodes the following set of conditional independence assumptions, called the local Markov assumptions and denoted by $I_L(G)$: for each variable $X_i$, $(X_i \perp \mathrm{NonDescendants}_{X_i} \mid \mathrm{Pa}_{X_i})$. [Figure: example DAG over the nodes $x_1, \dots, x_{11}$.]
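
The local Markov assumptions can be read off a graph mechanically. The sketch below represents a DAG as a dictionary from each node to its set of parents and lists, for every node, the non-descendants it is independent of given its parents. The function names and the dictionary representation are choices made here for illustration.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following directed edges."""
    seen = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen


def local_markov_statements(parents):
    """For each node X, return (non-descendants of X excluding its parents, parents of X).

    `parents` maps every node to the set of its parents; every parent must
    itself appear as a key. Each returned pair (A, B) reads "X is independent
    of A given B". Parents are removed from A because independence from a
    variable that is already conditioned on is trivial.
    """
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(node)
    statements = {}
    for node, node_parents in parents.items():
        non_desc = set(parents) - descendants(children, node) - {node}
        statements[node] = (non_desc - set(node_parents), set(node_parents))
    return statements


if __name__ == "__main__":
    # Diabetes network: Genetic risk -> Diabetes, Genetic risk -> Hypertension.
    diabetes_net = {"G": set(), "D": {"G"}, "H": {"G"}}
    for node, (indep_of, given) in local_markov_statements(diabetes_net).items():
        print(node, "is independent of", indep_of, "given", given)
```

For the diabetes network this recovers the naïve Bayes assumption: D is independent of H given G, and H is independent of D given G.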

17 The Genetics Example. Variables: B, blood type (a phenotype); G, genotype of the gene that encodes a person's blood type.

18 Bayesian Network Joint Distribution. Let G be a Bayesian network graph over the variables $X_1, \dots, X_n$. We say that a distribution P factorizes according to G if P can be expressed as: $P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_{X_i})$. A Bayesian network is a pair (G, P) where P factorizes over G, and where P is specified as a set of CPDs associated with G's nodes.

19 The Student Example. A more complex scenario: Course difficulty (D), quality of the recommendation letter (L), Intelligence (I), SAT (S), and Grade (G). Val(D) = {easy, hard}, Val(L) = {strong, weak}, Val(I) = {i1, i0}, Val(S) = {s1, s0}, Val(G) = {g1, g2, g3}. The joint distribution requires 47 entries ($2 \cdot 2 \cdot 2 \cdot 2 \cdot 3 - 1 = 47$).

20 The Student Bayesian Network. Joint distribution: P(I,D,G,S,L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G) (from Koller & Friedman).
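
As a sketch of how a Bayesian network's CPDs define the joint distribution, the code below assumes the student network structure from Koller & Friedman (I and D are roots, G depends on I and D, S depends on I, L depends on G), so that P(I,D,G,S,L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G). The CPD numbers are illustrative values chosen here, not taken from the slides, and the labels d0/d1 and l0/l1 stand in for the two values of D and L.

```python
from itertools import product

# Illustrative CPDs (assumed values, not from the lecture).
P_I = {"i0": 0.7, "i1": 0.3}
P_D = {"d0": 0.6, "d1": 0.4}
P_G = {  # P(G | I, D), keyed by (i, d)
    ("i0", "d0"): {"g1": 0.30, "g2": 0.40, "g3": 0.30},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.70},
    ("i1", "d0"): {"g1": 0.90, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.50, "g2": 0.30, "g3": 0.20},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05},   # P(S | I)
       "i1": {"s0": 0.20, "s1": 0.80}}
P_L = {"g1": {"l0": 0.10, "l1": 0.90},   # P(L | G)
       "g2": {"l0": 0.40, "l1": 0.60},
       "g3": {"l0": 0.99, "l1": 0.01}}


def joint(i, d, g, s, l):
    """P(I=i, D=d, G=g, S=s, L=l) as the product of the five CPD entries."""
    return P_I[i] * P_D[d] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]


def total_probability():
    """Sum of the joint over all 48 assignments; should be 1."""
    return sum(
        joint(i, d, g, s, l)
        for i, d, g, s, l in product(P_I, P_D, ["g1", "g2", "g3"],
                                     ["s0", "s1"], ["l0", "l1"])
    )


if __name__ == "__main__":
    print(joint("i0", "d1", "g1", "s0", "l0"))
    print(total_probability())
```

With this structure the CPDs hold 1 + 1 + 8 + 2 + 3 = 15 free parameters, compared with 47 for an unrestricted joint over the same variables.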

21 Parameter Estimation. Assumptions: fixed network structure; fully observed instances of the network variables, D = {d[1], …, d[M]}. Maximum likelihood estimation (MLE) of the "parameters" of the Bayesian network. For example, one instance: {i0, d1, g1, l0, s0} (from Koller & Friedman).
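
With a fixed structure, table CPDs, and fully observed data, the maximum likelihood estimate of each CPD entry is a ratio of counts. The sketch below assumes each instance is a dictionary from variable name to observed value; the function name and the toy data are hypothetical.

```python
from collections import Counter


def mle_cpd(instances, child, parents):
    """Maximum likelihood estimate of P(child | parents) from complete data.

    instances: list of dicts mapping variable name -> observed value
    child:     name of the child variable
    parents:   list of parent variable names (empty list for a root node)

    Returns {parent_values_tuple: {child_value: estimated probability}},
    where each estimate is count(parents, child) / count(parents).
    Parent configurations never seen in the data are absent from the result.
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for inst in instances:
        pa_values = tuple(inst[p] for p in parents)
        joint_counts[(pa_values, inst[child])] += 1
        parent_counts[pa_values] += 1
    cpd = {}
    for (pa_values, value), count in joint_counts.items():
        cpd.setdefault(pa_values, {})[value] = count / parent_counts[pa_values]
    return cpd


if __name__ == "__main__":
    # Toy data over Intelligence (I) and SAT (S); values are made up.
    data = [
        {"I": "i0", "S": "s0"},
        {"I": "i0", "S": "s0"},
        {"I": "i0", "S": "s1"},
        {"I": "i1", "S": "s1"},
    ]
    print(mle_cpd(data, "I", []))       # estimate of P(I)
    print(mle_cpd(data, "S", ["I"]))    # estimate of P(S | I)
```

Counting gives zero probability to any value never observed with a given parent configuration, which is one reason smoothed or Bayesian estimates are often preferred on small data sets.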

22 Outline. Probabilistic models in biology: model selection problem. Mathematical foundations: Bayesian networks. Learning from data: maximum likelihood estimation; expectation and maximization.

23 Acknowledgement. Profs. Daphne Koller & Nir Friedman, "Probabilistic Graphical Models".

