Probabilistic Modeling of Molecular Evolution Using Excel, AgentSheets, and R
Jeff Krause (Shodor)

Biological Sequence Space is Discrete
- Probability theory is crucial to understanding sequences.
- Furthermore, the metrics and algorithms for analyzing sequences are statistical.

Sequence Comparison: Similarity/Distance

CAGTTAGCTC
CATTAAGCTC

- Proportional distance (p) = (# of differences in aligned positions) / (# of aligned positions)
- 2 differences in 10 positions: p = 2/10 = 0.2
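
The proportional distance is simple to compute. Below is a minimal Python sketch. It assumes the two sequences are already aligned and contain no gaps. The function name `p_distance` is illustrative and does not come from the original materials.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


if __name__ == "__main__":
    print(p_distance("CAGTTAGCTC", "CATTAAGCTC"))  # 0.2
```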

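Character Evolution in Biological Sequences

CAGTTAGCTC
||+|+||+||
CATTCAGATC
||||+||+||
CATTAAGCTC

(| = unchanged position, + = substitution.) Comparing only the first and last sequences shows 2 differences, yet 5 substitutions occurred along the lineage. Position 5 changed twice, and position 8 changed and then reverted.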
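
The following sketch replays the three-sequence lineage from this slide. It prints the same match lines and contrasts the substitutions that happened step by step with the differences visible end to end. It assumes equal-length, ungapped sequences. `match_line` and `count_differences` are hypothetical helper names.

```python
def match_line(seq1: str, seq2: str) -> str:
    """'|' where aligned positions agree, '+' where they differ."""
    return "".join("|" if a == b else "+" for a, b in zip(seq1, seq2))


def count_differences(seq1: str, seq2: str) -> int:
    """Number of aligned positions that differ."""
    return sum(a != b for a, b in zip(seq1, seq2))


lineage = ["CAGTTAGCTC", "CATTCAGATC", "CATTAAGCTC"]

# Substitutions that actually happened, one generation at a time
steps = [count_differences(x, y) for x, y in zip(lineage, lineage[1:])]
# Differences visible when comparing only the first and last sequences
observed = count_differences(lineage[0], lineage[-1])

for x, y in zip(lineage, lineage[1:]):
    print(x)
    print(match_line(x, y))
print(lineage[-1])
print("substitutions along lineage:", sum(steps))     # 5
print("differences observed end to end:", observed)   # 2
```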

Probabilistic Modeling of Sequence Evolution
- During replication (and recombination), errors occur
  - Substitution, insertion, deletion, inversion, translocation, …
- Different types of errors have different chances of occurring, but they can all be considered to happen randomly
- To develop models, we need data indicating how often each type of error occurs
- Use observed frequencies and our understanding of the mechanisms of error to estimate the probabilities of errors
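
Below is a minimal simulation sketch of these ideas. During one round of copying, each site of a template may suffer a substitution, insertion, or deletion. Those probabilities are then estimated back from the observed relative frequencies. The values in `ERROR_PROBS` are made-up illustrative numbers, not measured rates. The sketch also ignores larger-scale errors such as inversions and translocations.

```python
import random

BASES = "ACGT"

# Hypothetical per-site error probabilities, chosen only for illustration
ERROR_PROBS = {"substitution": 0.01, "insertion": 0.002, "deletion": 0.002}


def replicate(seq, probs, rng, tally):
    """Copy seq once; each site suffers at most one random error.

    tally is a dict that accumulates how many errors of each type occurred.
    """
    copy = []
    for base in seq:
        r = rng.random()
        if r < probs["substitution"]:
            copy.append(rng.choice([b for b in BASES if b != base]))
            tally["substitution"] += 1
        elif r < probs["substitution"] + probs["insertion"]:
            copy.append(base)
            copy.append(rng.choice(BASES))  # extra base inserted after this site
            tally["insertion"] += 1
        elif r < probs["substitution"] + probs["insertion"] + probs["deletion"]:
            tally["deletion"] += 1  # site dropped from the copy
        else:
            copy.append(base)
    return "".join(copy)


def estimate_error_probs(n_sites, probs, seed=1):
    """Simulate one replication and return relative frequencies of each error type."""
    rng = random.Random(seed)
    template = "".join(rng.choice(BASES) for _ in range(n_sites))
    tally = {kind: 0 for kind in probs}
    replicate(template, probs, rng, tally)
    # Relative frequency = observed count / number of sites copied
    return {kind: count / n_sites for kind, count in tally.items()}


if __name__ == "__main__":
    for kind, estimate in estimate_error_probs(200_000, ERROR_PROBS).items():
        print(f"{kind:12s} true={ERROR_PROBS[kind]:.4f} estimated={estimate:.4f}")
```

With a large number of sites, the estimated relative frequencies should land close to the assumed probabilities. With few sites, they can be far off.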

Stochastic Modeling of Sequences … and More
- Yesterday we talked about dynamic modeling with difference equations and rates of change
- Instead of thinking in terms of a rate or proportion per unit time, we can think in terms of the probability of an event occurring within a time step
- So any of yesterday's dynamic models could also be modeled probabilistically
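
As an illustration of switching from rates to per-time-step probabilities, the sketch below compares two models. The first is a deterministic difference equation, $N_{t+1} = N_t - rN_t$. The second is a stochastic version in which each individual is removed with probability $r$ per step. This decay model and its parameters are hypothetical stand-ins for yesterday's dynamic models, which are not reproduced here.

```python
import random


def deterministic(n0, r, steps):
    """Difference equation N[t+1] = N[t] - r * N[t] (fractional amounts allowed)."""
    n = [float(n0)]
    for _ in range(steps):
        n.append(n[-1] - r * n[-1])
    return n


def stochastic(n0, r, steps, rng):
    """Each individual is removed with probability r in every time step."""
    n = [n0]
    for _ in range(steps):
        survivors = sum(1 for _ in range(n[-1]) if rng.random() >= r)
        n.append(survivors)
    return n


if __name__ == "__main__":
    rng = random.Random(42)
    det = deterministic(1000, 0.1, 10)
    runs = [stochastic(1000, 0.1, 10, rng) for _ in range(200)]
    mean_final = sum(run[-1] for run in runs) / len(runs)
    print(f"deterministic N(10) = {det[-1]:.1f}")
    print(f"mean stochastic N(10) over 200 runs = {mean_final:.1f}")
```

Individual stochastic runs scatter around the deterministic curve, and their average tracks it. The stochastic version also never produces fractional individuals.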

Probability Terms – Part I
- Trial: a single occurrence of a random process (e.g. a single coin toss or roll of a die)
- Outcome: the result of a trial (e.g. “heads” or “tails”)
- Probability: the chance of a random outcome occurring (e.g. p(heads) = 0.5 for a fair coin, p(3) = 1/6 for a fair die)
- Frequency: the number of occurrences of an outcome
- Relative frequency: the number of occurrences of an outcome divided by the total number of trials
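
A short sketch tying these terms together:
- each call to `randint` is a trial, and its value is the outcome;
- `Counter` gives the frequency of each outcome;
- dividing by the number of trials gives the relative frequency.

The fair-die setup, the number of trials, and the seed are assumptions for illustration.

```python
import random
from collections import Counter


def roll_die(n_trials, seed=0):
    """Run n_trials rolls of a fair six-sided die; each trial yields one outcome 1-6."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n_trials)]


outcomes = roll_die(600)
frequency = Counter(outcomes)  # number of occurrences of each outcome
relative_frequency = {face: frequency[face] / len(outcomes) for face in range(1, 7)}

for face in range(1, 7):
    print(face, frequency[face], round(relative_frequency[face], 3), "vs p =", round(1 / 6, 3))
```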

Probability Terms – Part II
- Event: a grouping of multiple outcomes (e.g. “roll an odd number”, “roll less than 5”)
- Independent: the probability of an outcome is not influenced by the outcomes of previous trials
- Multiplication rule: the probability that two independent events both occur is the product of their individual probabilities (e.g. “toss two heads”, “roll is odd and ≤ 4”)
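
One way to check the multiplication rule is to enumerate equally likely outcomes with exact fractions. This sketch confirms both examples from the slide. It then adds a contrasting pair, “odd” and “≤ 3”, for which the rule fails because those events are not independent. The function name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

DIE = range(1, 7)


def prob(event, sample_space):
    """Probability of an event when all outcomes in sample_space are equally likely."""
    outcomes = list(sample_space)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


# Two tosses of a fair coin are independent trials
two_tosses = list(product("HT", repeat=2))
p_two_heads = prob(lambda o: o == ("H", "H"), two_tosses)
print(p_two_heads, "=", Fraction(1, 2) * Fraction(1, 2))  # 1/4 = 1/4

# One roll of a fair die: "odd" and "<= 4" are independent events
p_odd = prob(lambda x: x % 2 == 1, DIE)
p_le4 = prob(lambda x: x <= 4, DIE)
p_both = prob(lambda x: x % 2 == 1 and x <= 4, DIE)
print(p_both, "=", p_odd * p_le4)  # 1/3 = 1/3

# Counterexample: "odd" and "<= 3" are NOT independent, so the product is wrong
p_le3 = prob(lambda x: x <= 3, DIE)
p_odd_le3 = prob(lambda x: x % 2 == 1 and x <= 3, DIE)
print(p_odd_le3, "!=", p_odd * p_le3)  # 1/3 != 1/4
```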

Coin Toss
- Create an Excel worksheet that conducts 10, 100, or 1000 trials of a coin toss
- Tally the frequency of heads for each number of trials
- Is it a fair coin? How do you know?
- Variation in composition vs. sample size
  - Degree of variation with sample size (the range, i.e. max - min, of the # of heads)
    - Probably smaller with small sample sizes, since the sample size limits the range
    - A larger sample size makes variation across a larger range possible
  - Proportional variation vs. sample size
    - Much larger for small samples, since small differences in observed frequency have a larger effect when divided by the small denominator of a small sample
- Probability of a given sequence of outcomes: multiplication rule for multiple independent events
- Permutations: there are $2^n$ different sequences of length n
- Probability of an event specifying sequence composition (e.g. “3 heads in 5 tosses”): enumerate the sequences and sum the probabilities of those matching the criterion; this is the addition rule
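
The worksheet exercise can also be mirrored in Python. This sketch assumes a fair coin and uses hypothetical helper names. It does two things:
- repeats 10-, 100-, and 1000-toss experiments to compare the range of head counts with the proportional range;
- applies the multiplication and addition rules by enumerating all $2^n$ sequences.

```python
import random
from fractions import Fraction
from itertools import product


def heads_in_tosses(n, rng):
    """Number of heads in n tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n))


def spread_of_heads(n, repeats, seed=0):
    """Repeat an n-toss experiment; return (max - min) heads and that range divided by n."""
    rng = random.Random(seed)
    counts = [heads_in_tosses(n, rng) for _ in range(repeats)]
    spread = max(counts) - min(counts)
    return spread, spread / n


def p_exact_heads(n, k):
    """Addition rule: sum the probabilities of all 2**n sequences with exactly k heads."""
    p_sequence = Fraction(1, 2) ** n  # multiplication rule for one specific sequence
    return sum(p_sequence for seq in product("HT", repeat=n) if seq.count("H") == k)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        spread, proportional = spread_of_heads(n, repeats=20)
        print(f"n={n:5d}  range of heads={spread:4d}  range/n={proportional:.3f}")
    print(p_exact_heads(4, 2))  # 3/8 (6 of the 16 sequences)
```

With 20 repeats per sample size, the absolute range typically grows with n while range/n shrinks. The exact numbers depend on the seed.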

Die Roll
- Is it a fair die? How do you know?
- Event probabilities for a single trial: p(odd), p(odd and < 5)
- Event probabilities for two-trial events: p(2,5), p(2,5 in any order), p(sum is 7) (use plop-it)
- Union and intersection: explore the addition rule and mutual exclusivity, as well as the multiplication rule and independence
- Probability of a given sequence of length n = $(1/6)^n$
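
Exact two-trial probabilities can be obtained by listing all 36 ordered pairs of rolls, each equally likely for a fair die. This sketch uses a hypothetical helper `p`. Note that “2,5 in any order” is the sum of two mutually exclusive ordered outcomes, (2,5) and (5,2).

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
TWO_ROLLS = list(product(FACES, repeat=2))  # 36 equally likely ordered pairs


def p(event, space):
    """Probability of an event when every outcome in space is equally likely."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


print(p(lambda o: o == (2, 5), TWO_ROLLS))          # 1/36: a 2, then a 5
print(p(lambda o: sorted(o) == [2, 5], TWO_ROLLS))  # 1/18: (2,5) or (5,2)
print(p(lambda o: sum(o) == 7, TWO_ROLLS))          # 1/6: 6 of the 36 pairs

# A given sequence of n rolls: multiplication rule, 1 of 6**n equally likely sequences
n = 3
print(Fraction(1, 6) ** n, Fraction(1, 6 ** n))     # 1/216 1/216
```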

Probability Terms – Part III
- Union: the event that either or both of two events occur on a trial (e.g. the union of “odd” and “> 4” is “1, 3, 5, 6”)
- Mutually exclusive: two events are mutually exclusive if they can’t occur simultaneously (e.g. “roll an even number” and “roll an odd number”)
- Addition rule: the probability of an event consisting of mutually exclusive outcomes is the sum of the probabilities of the outcomes (e.g. p(heads or tails) = p(heads) + p(tails))
- Complement: the complement of an event includes all possible outcomes not in the event (e.g. “not heads”, “not 5”); the probability of the complement is 1 minus the probability of the event
- Exhaustive: a set of outcomes is exhaustive if it includes every possible outcome; the probabilities of an exhaustive set of mutually exclusive outcomes sum to 1
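
Python sets map directly onto these terms:
- `|` is union;
- `&` is intersection;
- set difference from the sample space is the complement.

The sketch below checks the slide's examples for one roll of a fair die. It also shows the general addition rule, $p(A \cup B) = p(A) + p(B) - p(A \cap B)$. That rule is not on the slide, but it covers events that are not mutually exclusive.

```python
from fractions import Fraction

DIE = set(range(1, 7))


def p(event):
    """Probability of a set of outcomes for one roll of a fair die."""
    return Fraction(len(event & DIE), len(DIE))


odd = {1, 3, 5}
greater_than_4 = {5, 6}
even = DIE - odd  # complement of "odd"

union = odd | greater_than_4
print(sorted(union), p(union))  # [1, 3, 5, 6] 2/3

# Addition rule for mutually exclusive events ("odd" and "even" are also exhaustive)
assert not (odd & even)
print(p(odd | even), p(odd) + p(even))  # 1 1

# "odd" and "> 4" overlap in {5}, so simply adding would double count;
# subtracting the intersection (general addition rule) corrects this
print(p(odd) + p(greater_than_4) - p(odd & greater_than_4))  # 2/3

# Complement rule
print(p(DIE - {5}), 1 - p({5}))  # 5/6 5/6
```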

Molecular Evolution and Phylogenetics
- Biology basics
  - Central dogma: DNA -> RNA -> protein
  - DNA replication and processing can lead to changes in DNA composition
- Metrics of distance
  - Observed substitution frequencies: “How often do we see A replaced with C?”
  - Distance based on an evolutionary model: “How many events separate these two sequences?”
- Markov models of sequence evolution
  - Markov process: the future state depends only on the current state, not on how it got there
  - Molecular genetic mechanisms at multiple scales with distinct probabilities
    - Single-site events: sequences
    - Events at larger scales
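
Observed substitution frequencies can be tallied in a 4 × 4 count matrix. Below is a minimal sketch. It assumes two aligned, ungapped sequences and treats the first as the ancestor; the direction of change is an assumption made here for illustration. The sequences reuse the earlier comparison example, and `substitution_counts` is a hypothetical name.

```python
BASES = "ACGT"


def substitution_counts(ancestor, descendant):
    """Count how often each base in ancestor is replaced by each base in descendant.

    counts[x][y] is the number of aligned positions with x in the ancestor
    and y in the descendant (x == y means no change at that position).
    """
    counts = {x: {y: 0 for y in BASES} for x in BASES}
    for x, y in zip(ancestor, descendant):
        counts[x][y] += 1
    return counts


counts = substitution_counts("CAGTTAGCTC", "CATTAAGCTC")
print("    " + "  ".join(BASES))
for x in BASES:
    print(x, "  " + "  ".join(str(counts[x][y]) for y in BASES))

# Observed frequency of "A replaced with C", relative to all A sites in the ancestor
n_a = sum(counts["A"].values())
print("A->C:", counts["A"]["C"] / n_a if n_a else 0.0)
```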

Nucleotide substitution: Jukes-Cantor model
- Substitution rates are equal
- Markov process
- Nucleotides are in equal abundance
- Rate matrix: $M = \{m_{ij}\}$, with $m_{ij} = \alpha$ for $i \neq j$ and $m_{ii} = -3\alpha$

Simulating Jukes-Cantor sequence evolution
- Each sequence position holds exactly one nucleotide
  - Simulating change as a finite difference using the rate equation would give fractional abundances at each position (as for a population)
- We need to convert the matrix of rates into transition probabilities: $P(t) = \{p_{ij}(t)\} = e^{Mt}$
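
For a 4 × 4 rate matrix, $e^{Mt}$ can be computed numerically and compared with the closed-form Jukes-Cantor probabilities. This dependency-free sketch uses scaling and squaring with a truncated Taylor series. It assumes the parameterization in which every off-diagonal rate equals α and each diagonal entry is -3α. In practice, a library routine such as `scipy.linalg.expm` would normally be used instead.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=20):
    """e^M via scaling and squaring with a truncated Taylor series (small matrices only)."""
    n = len(m)
    norm = max(sum(abs(v) for v in row) for row in m)
    # choose s so that M / 2**s has norm at most 1/2, where the series converges fast
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2 ** s for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes (M / 2**s)**k / k!
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[x + y for x, y in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # e^M = (e^(M / 2**s)) ** (2**s)
    return result


def jukes_cantor_rate_matrix(alpha):
    """Assumed parameterization: rate alpha for each specific change; rows sum to 0."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


alpha, t = 0.1, 2.0
P = mat_exp([[v * t for v in row] for row in jukes_cantor_rate_matrix(alpha)])
p0 = (1 + 3 * math.exp(-4 * alpha * t)) / 4
p1 = (1 - math.exp(-4 * alpha * t)) / 4
print([round(v, 6) for v in P[0]])  # numerical first row of e^{Mt}
print(round(p0, 6), round(p1, 6))   # closed-form p0(t), p1(t)
```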

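Simulating Jukes-Cantor sequence evolution

$$
P(t) = \begin{pmatrix}
p_0(t) & p_1(t) & p_1(t) & p_1(t) \\
p_1(t) & p_0(t) & p_1(t) & p_1(t) \\
p_1(t) & p_1(t) & p_0(t) & p_1(t) \\
p_1(t) & p_1(t) & p_1(t) & p_0(t)
\end{pmatrix}
\quad \text{with} \quad
p_0(t) = \frac{1 + 3e^{-4\alpha t}}{4}, \quad
p_1(t) = \frac{1 - e^{-4\alpha t}}{4}
$$

where $\alpha$ is the rate of each specific substitution. Since each row sums to one ($p_0(t) + 3p_1(t) = 1$), only one expression is needed.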
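
Because only $p_0(t)$ is needed, simulating one time interval reduces to a single random draw per site. The site keeps its base with probability $p_0(t)$. Otherwise it switches to one of the other three bases, chosen uniformly, so each specific change has probability $p_1(t)$. Below is a minimal sketch with illustrative parameter values. As t grows, the observed proportion of changed sites should approach 0.75.

```python
import math
import random

BASES = "ACGT"


def jc_probs(alpha, t):
    """Closed-form Jukes-Cantor probabilities: p0 = stay the same, p1 = change to one specific other base."""
    e = math.exp(-4 * alpha * t)
    return (1 + 3 * e) / 4, (1 - e) / 4


def evolve(seq, alpha, t, rng):
    """Evolve each site independently over time t (one nucleotide per position)."""
    p0, _ = jc_probs(alpha, t)
    out = []
    for base in seq:
        if rng.random() < p0:
            out.append(base)
        else:
            # the remaining 1 - p0 = 3 * p1 is split evenly over the 3 other bases
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(7)
    ancestor = "".join(rng.choice(BASES) for _ in range(10_000))
    alpha = 0.05
    for t in (1, 5, 10, 50):
        descendant = evolve(ancestor, alpha, t, rng)
        observed = sum(a != b for a, b in zip(ancestor, descendant)) / len(ancestor)
        expected = 3 * jc_probs(alpha, t)[1]
        print(f"t={t:3d}  observed p={observed:.3f}  expected 3*p1={expected:.3f}")
```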

Jukes-Cantor models
- AgentSheets: cell lineage tree
- Excel
  - Cell lineage tree
  - Two-sequence distance
  - Probability vs. time
- R: cell lineage tree vs. phylogenetic reconstruction
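
Two of the Excel exercises (two-sequence distance and probability vs. time) can be sketched in Python as well. Below, two cells descend from a common ancestor, and each evolves for time t. Their observed p-distance is converted with the standard Jukes-Cantor correction, $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, which does not appear on the slides. The result is compared with the expected number of substitutions per site along both branches, $3\alpha \cdot 2t$. The parameter values and helper names are arbitrary.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, alpha, t, rng):
    """Jukes-Cantor change over time t: keep a base with probability p0, else pick one of the other 3."""
    p0 = (1 + 3 * math.exp(-4 * alpha * t)) / 4
    return "".join(
        b if rng.random() < p0 else rng.choice([x for x in BASES if x != b]) for b in seq
    )


def jc_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site) from an observed p-distance."""
    if p >= 0.75:
        return math.inf  # saturation: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    rng = random.Random(3)
    alpha, t = 0.05, 4.0
    root = "".join(rng.choice(BASES) for _ in range(20_000))
    cell_a = evolve(root, alpha, t, rng)  # two cells descended from the same ancestor
    cell_b = evolve(root, alpha, t, rng)
    p = sum(x != y for x, y in zip(cell_a, cell_b)) / len(root)
    print(f"observed p = {p:.3f}")
    print(f"JC distance = {jc_distance(p):.3f}  (expected 3*alpha*2t = {6 * alpha * t:.3f})")
    for time in (0, 1, 2, 5, 10, 20):  # probability vs. time table: t, p0(t), p1(t)
        e = math.exp(-4 * alpha * time)
        print(time, round((1 + 3 * e) / 4, 4), round((1 - e) / 4, 4))
```

The raw p-distance underestimates the number of substitutions because of multiple hits at the same site. The corrected distance should land near the expected value, up to sampling noise.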
