Measuring genetic change Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Section 5.2.



Types of substitution
Single substitution: 1 change, 1 difference
Multiple substitution: 2 changes, 1 difference
Coincidental substitution: 2 changes, 1 difference
Parallel substitution: 2 changes, no difference
Convergent substitution: 3 changes, no difference
Back substitution: 2 changes, no difference

Types of substitution (continued)
Multiple substitutions can greatly obscure the actual evolutionary history, particularly where there have been many mutations, i.e. over long evolutionary time scales.
The final three cases (parallel, convergent and back substitution) leave no observable difference and have serious implications for the inference of evolutionary history:
Similarity inherited from an ancestor is called homology.
Independently acquired similarity is called homoplasy.
All tree-building methods rely on sufficient levels of homology.

Types of substitution (continued)
Substitutions that exchange a purine for another purine (A ↔ G) or a pyrimidine for another pyrimidine (C ↔ T) are called transitions.
Substitutions that exchange a purine for a pyrimidine or vice versa are called transversions.
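The transition/transversion distinction can be sketched as a small classifier. This is an illustrative example only; the function name `classify_substitution` is hypothetical and not part of the lecture, and it assumes DNA bases restricted to A, C, G and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ancestral, derived):
    """Classify a single-base change as a transition or a transversion."""
    ancestral, derived = ancestral.upper(), derived.upper()
    valid = PURINES | PYRIMIDINES
    if ancestral not in valid or derived not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ancestral == derived:
        return "identical"
    # Same chemical class (purine-purine or pyrimidine-pyrimidine): transition.
    same_class = (ancestral in PURINES) == (derived in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
```

Of the twelve possible ordered changes between four bases, four are transitions and eight are transversions.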

Measuring evolutionary change
The simplest measure is to count the number of sites that differ between two sequences.
This is a poor measure:
Some sites may undergo repeated substitutions.
As sequences diverge, the measure becomes less accurate.
Saturation occurs: most sites that change have already changed before.
[Figure: base pair differences plotted against time since divergence (Myr)]
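The simple count of differing sites can be sketched as follows. This is a minimal illustration, assuming two aligned sequences of equal length with no gaps; the function name `p_distance` is a label chosen here, not one used in the lecture.

```python
def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)


# One of the four sites differs, so the proportion is 0.25.
print(p_distance("ACGT", "ACGA"))
```

Because the count only sees the end states, a site that changed and then changed back contributes nothing, which is why the measure underestimates the true amount of change.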

Correction of observed sequence differences
[Figure: sequence difference plotted against time, showing the observed difference, the expected difference and the 'correction' between them]

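A general framework of sequence evolution models

$$
P_t = \begin{pmatrix}
p_{AA} & p_{AC} & p_{AG} & p_{AT} \\
p_{CA} & p_{CC} & p_{CG} & p_{CT} \\
p_{GA} & p_{GC} & p_{GG} & p_{GT} \\
p_{TA} & p_{TC} & p_{TG} & p_{TT}
\end{pmatrix},
\qquad
p_{ii} = 1 - \sum_{j \neq i} p_{ij}
$$

$$
f = [\, f_A \;\; f_C \;\; f_G \;\; f_T \,]
$$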
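The rule that each diagonal entry equals one minus the other entries in its row can be sketched in a few lines. This is an illustrative example under stated assumptions: the matrix is stored as a dictionary keyed by (from base, to base), the helper name `complete_matrix` is hypothetical, and the off-diagonal value 0.1 is an arbitrary number chosen for the demonstration.

```python
BASES = "ACGT"


def complete_matrix(off_diagonal):
    """Fill in the diagonal of a 4 x 4 substitution probability matrix.

    off_diagonal maps (i, j) with i != j to p_ij. Each diagonal entry
    p_ii is set to 1 minus the sum of the other entries in row i.
    """
    matrix = {}
    for i in BASES:
        row_sum = sum(off_diagonal[(i, j)] for j in BASES if j != i)
        if row_sum > 1:
            raise ValueError("off-diagonal entries in a row must sum to at most 1")
        for j in BASES:
            matrix[(i, j)] = 1 - row_sum if i == j else off_diagonal[(i, j)]
    return matrix


# Arbitrary illustrative values: every change has probability 0.1.
example = {(i, j): 0.1 for i in BASES for j in BASES if i != j}
P = complete_matrix(example)
print(P[("A", "A")])
```

Every row of the completed matrix sums to 1, since a base either stays the same or changes to one of the other three.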

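The Jukes-Cantor (JC) model
Assumes that all four bases have equal frequencies and that all substitutions are equally likely.

$$
P_t = \begin{pmatrix}
- & \alpha & \alpha & \alpha \\
\alpha & - & \alpha & \alpha \\
\alpha & \alpha & - & \alpha \\
\alpha & \alpha & \alpha & -
\end{pmatrix},
\qquad
f = [\, \tfrac14 \;\; \tfrac14 \;\; \tfrac14 \;\; \tfrac14 \,]
$$

Rows and columns are in the order A, C, G, T. Diagonal entries, shown as −, are fixed by the requirement that each row sums to 1.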
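Under the JC assumptions, an observed proportion of differing sites p can be converted into an estimated number of substitutions per site with the standard Jukes-Cantor correction, d = −(3/4) ln(1 − 4p/3). The slide itself does not state this formula; the sketch below is an illustration of it, and the function name `jc_distance` is chosen here for convenience.

```python
import math


def jc_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences are as different as random ones
        # and the correction is undefined.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


for observed in (0.05, 0.30, 0.60):
    print(observed, round(jc_distance(observed), 4))
```

The corrected value is always at least as large as the observed proportion, and the gap widens as the sequences approach saturation.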

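Kimura’s 2 parameter model (K2P)
Takes into account different frequencies of transitions vs. transversions.
Transitions occur at rate α and transversions at rate β.

$$
P_t = \begin{pmatrix}
- & \beta & \alpha & \beta \\
\beta & - & \beta & \alpha \\
\alpha & \beta & - & \beta \\
\beta & \alpha & \beta & -
\end{pmatrix},
\qquad
f = [\, \tfrac14 \;\; \tfrac14 \;\; \tfrac14 \;\; \tfrac14 \,]
$$

Rows and columns are in the order A, C, G, T.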
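A distance corrected under K2P treats the two kinds of difference separately. With P the observed proportion of transitions and Q the observed proportion of transversions, the standard Kimura two-parameter estimate is d = −(1/2) ln(1 − 2P − Q) − (1/4) ln(1 − 2Q). The formula is not given on the slide; the sketch below illustrates it, assuming gap-free aligned sequences containing only A, C, G and T, with `k2p_distance` as a hypothetical function name.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = 0
    transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        both_purines = x in PURINES and y in PURINES
        both_pyrimidines = x in PYRIMIDINES and y in PYRIMIDINES
        if both_purines or both_pyrimidines:
            transitions += 1
        else:
            # Assumes only A, C, G, T are present in the input.
            transversions += 1
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```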

Felsenstein (1981) (F81)
Takes into account differences in base composition: percentage (G + C) can range from 25% to 75%.
The F81 model allows the frequencies of the four nucleotides to be different.
Does not allow for variation between genes/species.

$$
P_t = \begin{pmatrix}
- & \pi_C \alpha & \pi_G \alpha & \pi_T \alpha \\
\pi_A \alpha & - & \pi_G \alpha & \pi_T \alpha \\
\pi_A \alpha & \pi_C \alpha & - & \pi_T \alpha \\
\pi_A \alpha & \pi_C \alpha & \pi_G \alpha & -
\end{pmatrix},
\qquad
f = [\, \pi_A \;\; \pi_C \;\; \pi_G \;\; \pi_T \,]
$$

Rows and columns are in the order A, C, G, T.

Hasegawa, Kishino and Yano (1985) (HKY85)
Essentially merges the K2P and F81 models: transitions and transversions can occur at different rates, and base frequencies can vary.

$$
P_t = \begin{pmatrix}
- & \pi_C \beta & \pi_G \alpha & \pi_T \beta \\
\pi_A \beta & - & \pi_G \beta & \pi_T \alpha \\
\pi_A \alpha & \pi_C \beta & - & \pi_T \beta \\
\pi_A \beta & \pi_C \alpha & \pi_G \beta & -
\end{pmatrix},
\qquad
f = [\, \pi_A \;\; \pi_C \;\; \pi_G \;\; \pi_T \,]
$$

Rows and columns are in the order A, C, G, T.

General reversible model (REV)
The most general model: each substitution has its own probability.

$$
P_t = \begin{pmatrix}
- & \pi_C a & \pi_G b & \pi_T c \\
\pi_A a & - & \pi_G d & \pi_T e \\
\pi_A b & \pi_C d & - & \pi_T f \\
\pi_A c & \pi_C e & \pi_G f & -
\end{pmatrix},
\qquad
f = [\, \pi_A \;\; \pi_C \;\; \pi_G \;\; \pi_T \,]
$$

Rows and columns are in the order A, C, G, T.
By constraining a-f it is possible to generate all the other models.
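How constraining a-f recovers the simpler models can be sketched in code. This is an illustration under stated assumptions, not the lecture's own implementation: entries are treated as per-step probabilities with the diagonal set so that each row sums to 1, the parameters map to base pairs as a = A/C, b = A/G, c = A/T, d = C/G, e = C/T, f = G/T, and all function names are hypothetical.

```python
BASES = "ACGT"
EQUAL_FREQUENCIES = {base: 0.25 for base in BASES}


def rev_matrix(pi, a, b, c, d, e, f):
    """General reversible matrix: entry (i, j) is pi[j] times the pair parameter."""
    pair_parameter = {
        frozenset("AC"): a, frozenset("AG"): b, frozenset("AT"): c,
        frozenset("CG"): d, frozenset("CT"): e, frozenset("GT"): f,
    }
    matrix = {}
    for i in BASES:
        row = {j: pi[j] * pair_parameter[frozenset((i, j))]
               for j in BASES if j != i}
        row[i] = 1 - sum(row.values())  # diagonal: each row sums to 1
        matrix[i] = row
    return matrix


def hky85(pi, alpha, beta):
    # Transitions (A/G and C/T) get alpha, transversions get beta.
    return rev_matrix(pi, a=beta, b=alpha, c=beta, d=beta, e=alpha, f=beta)


def f81(pi, alpha):
    # All six pair parameters equal; base frequencies free to differ.
    return rev_matrix(pi, alpha, alpha, alpha, alpha, alpha, alpha)


def k2p(alpha, beta):
    # HKY85 with equal base frequencies.
    return hky85(EQUAL_FREQUENCIES, alpha, beta)


def jc(alpha):
    # F81 with equal base frequencies.
    return f81(EQUAL_FREQUENCIES, alpha)


pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}  # arbitrary illustrative values
model = hky85(pi, alpha=0.2, beta=0.1)
print(model["A"])
```

Under this parameterisation the JC and K2P entries come out as α/4 and β/4 rather than α and β, which is only a constant rescaling. The matrix also satisfies the reversibility condition: the frequency of base i times the probability of i changing to j equals the frequency of j times the probability of j changing to i.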

Comparing the models
JC: π_A = π_C = π_G = π_T, α = β
K2P: π_A = π_C = π_G = π_T, α ≠ β (relative to JC, allows transition/transversion bias)
F81: π_A ≠ π_C ≠ π_G ≠ π_T, α = β (relative to JC, allows base frequencies to vary)
HKY85: π_A ≠ π_C ≠ π_G ≠ π_T, α ≠ β (allows both transition/transversion bias and varying base frequencies)
REV: π_A ≠ π_C ≠ π_G ≠ π_T, with separate parameters a, b, c, d, e, f

Comparing the models (continued)
[Figure: four panels labelled Observed, JC, K2P and HKY85, each laid out over the bases A, C, G and T]

Assumptions: independence
Assumes that change at one site has no effect on other sites.
RNA stem-loop structures are a good example of where this assumption fails:
A substitution may result in mismatched bases and decreased stem stability.
A compensatory change may then occur to restore Watson-Crick base pairing.
[Figure: an RNA stem-loop shown intact, after a substitution that creates a mismatch, and after a compensatory change]

Assumptions: base composition
Assumes that base composition is at equilibrium and that it is similar across all taxa studied.
In the example in the figure, trees inferred using models that do not allow for differences in base composition will not group Thermus and Deinococcus.
[Figure: % G + C for Aquifex, Thermotoga, Thermus, Deinococcus and others]

Assumptions: variation in substitution rate across sites
Not all sites are equally likely to undergo a substitution.
Functional constraints:
Pseudogenes have lost all function and can evolve freely.
Substitutions at fourfold degenerate sites do not change the amino acid sequence of the protein.
Non-degenerate sites are highly constrained.
[Figure: substitutions per site per 10^9 years for the 5’ flanking region, 5’ untranslated region, non-degenerate sites, twofold degenerate sites, fourfold degenerate sites, introns, 3’ untranslated region, 3’ flanking region and pseudogenes]

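Assumptions: variation in substitution rate across sites (continued)
The more rapidly evolving sequence shows the most divergence initially but soon saturates.
At longer divergence times sequence A, which evolves more slowly but has fewer constrained sites, shows more divergence and so appears to be the more rapidly evolving sequence.
[Figure: DNA divergence plotted against divergence time (Myr) for sequence A (0.5% / Myr + 20% constraint) and sequence B (2% / Myr + 50% constraint)]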
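The crossover between the two sequences can be reproduced with a rough calculation. The sketch below is an illustration under assumptions that are not stated on the slide: constrained sites never change, the remaining sites follow a Jukes-Cantor process, and the quoted percentage is read as the rate of divergence per free site per Myr. The function name `expected_divergence` is hypothetical.

```python
import math


def expected_divergence(time_myr, rate_per_myr, constrained_fraction):
    """Expected observed divergence when only unconstrained sites can change.

    Free sites saturate at 75% difference under the Jukes-Cantor
    assumption, so the plateau is 0.75 times the free fraction.
    """
    free_fraction = 1.0 - constrained_fraction
    saturation = 1.0 - math.exp(-4.0 * rate_per_myr * time_myr / 3.0)
    return free_fraction * 0.75 * saturation


SEQUENCE_A = {"rate_per_myr": 0.005, "constrained_fraction": 0.2}
SEQUENCE_B = {"rate_per_myr": 0.02, "constrained_fraction": 0.5}

for t in (10, 50, 100, 500):
    a = expected_divergence(t, **SEQUENCE_A)
    b = expected_divergence(t, **SEQUENCE_B)
    print(t, round(a, 3), round(b, 3))
```

Under these assumptions sequence B is ahead at short divergence times but levels off near 0.375, while sequence A keeps climbing towards 0.6 and eventually overtakes it.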