TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.

Presentation transcript:

Comparative Genomics
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT CCCTGTTTCCAGGTTTGTTGTCCCAAAATAGTGACCATTTCATATGTATA

Overview
I. Comparing genome sequences
  - Concepts and terminology
  - Methods
    - Whole-genome alignments
    - Quantifying evolutionary conservation (PhastCons, PhyloP)
    - Identifying conserved elements
  - Available datasets at UCSC
II. Comparative analyses of function
  - Evolutionary dynamics of gene regulation
  - Case studies
  - Insights into regulatory variation within and across species

Distribution of evolutionary constraint in the human genome
Lindblad-Toh et al. Nature 478:476 (2011)
- 4.2% of genome is putatively constrained
- ~1 million putative regulatory elements

Goals of comparative genomics
- Infer the course of past evolution using statistical models of sequence evolution
- Identify sequence elements evolving more slowly or more rapidly than neutral
- Evaluate the precise degree of constraint on specific positions
- Predict the functional effects of nucleotide or amino acid mutations in constrained sequences

Vertebrate genomes available for comparative studies
(Figure labels: Primates, Mammals, Tetrapods, Vertebrates)

Commonly used (and misused) terms
Mutation vs. Substitution
- Mutations occur in individuals, segregate in populations
- Substitutions are mutations that have become fixed
- Mutations = within species; substitutions = between species
Conservation vs. Constraint
- Conservation = an observation of sequence similarity
- Constraint = a hypothesis about the effect of purifying selection
Homology, Orthology and Paralogy
- Homologous sequences = derived from a common ancestor
- Orthologous sequences = homologous sequences separated by a speciation event (e.g., human HOXA and mouse Hoxa)
- Paralogous sequences = homologous sequences separated by gene duplication (e.g., human HOXA and human HOXB)

Basic premises in comparative sequence analysis
- Most mutations that affect function are eliminated by purifying selection
  - Constrained elements have lower substitution rates than expected from the neutral rate
  - Contingent on the effect of the mutation and degree of constraint on the function
  - Manifests as sequence conservation, even among distant species
- Beneficial mutations may be driven to fixation by positive selection
  - May be detected as "faster-than-neutral" substitution rate
  - Expected to be rare
- Most sequence differences among genomes are neutral
  - Involve substitutions with minimal or no functional impact
  - Fixed by random genetic drift
  - Fixation rate is equal to mutation rate
- Genomes become more dissimilar with greater phylogenetic distance
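
The premise that the neutral fixation rate equals the mutation rate follows from a standard population-genetics argument. A sketch, assuming a diploid population of N individuals and a neutral mutation rate of \mu per site per generation (these symbols are illustrative and do not appear in the slides): each generation introduces 2N\mu new neutral mutations, and each one drifts to fixation with probability 1/(2N), so the population size cancels.

```latex
k \;=\; \underbrace{2N\mu}_{\text{new neutral mutations per generation}}
   \;\times\;
   \underbrace{\frac{1}{2N}}_{\text{fixation probability of each}}
   \;=\; \mu
```

Under these assumptions the substitution rate k does not depend on population size, which is why divergence at neutral sites can serve as the baseline against which constrained sites are compared.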

Phylogenies
Phylogenetic trees show two things:
- Evolutionary relationships among species or sequences: branching order
- Evolutionary distance (e.g., degree of similarity or divergence): branch length
(Figure labels: Internal node, Terminal node, Branch)

Phylogenies
Phylogenetic trees show two things:
- Evolutionary relationships among species or sequences: branching order
- Evolutionary distance (e.g., degree of similarity or divergence): branch length
(Figure labels: Species tree, Gene tree)
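
The two properties of a tree (branching order and branch length) can be sketched in a few lines of Python. The topology, the node names and the branch lengths below are invented for illustration and are not taken from the slides; the distance between two tips is computed as the sum of branch lengths along the path joining them.

```python
# Toy rooted tree: each node maps to (parent, branch length to parent).
# Topology and lengths are made up for illustration only.
TREE = {
    "human": ("primates", 0.01),
    "chimp": ("primates", 0.01),
    "primates": ("root", 0.08),
    "mouse": ("root", 0.30),
    "root": (None, 0.0),
}


def path_to_root(tree, node):
    """Return {ancestor: distance from node} for node and all its ancestors."""
    dist = 0.0
    path = {node: 0.0}
    while tree[node][0] is not None:
        parent, length = tree[node]
        dist += length
        path[parent] = dist
        node = parent
    return path


def patristic_distance(tree, a, b):
    """Sum of branch lengths along the path between nodes a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # With non-negative branch lengths, the most recent common ancestor
    # is the shared ancestor that minimises the combined distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)
```

In this toy tree the branching order places human and chimp together, and the branch lengths make the human-mouse distance much larger than the human-chimp distance.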

Orthologs and paralogs in gene trees Capra et al HMGCS1 HMGCS2

Orthologs and paralogs in gene trees Capra et al Orthologs Paralogs Duplication

Orthologs and paralogs in gene trees Capra et al :1 Orthologs Human HMGCS1 Human HMGCS2 1:2

Ortholog assignments at Ensembl

Steps in sequence comparisons
- Sequence alignment
  - Global vs. local
  - Whole-genome vs. genome segments (e.g., genes)
  - Identify sites that are homologous (not necessarily identical)
- Measure similarity and divergence of sequences
  - Sequence similarity: level of conservation
  - Rates of change among sequences: divergence
- Infer degree of evolutionary constraint
  - Are the sequences more conserved than expected from neutral evolution?
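
The "measure similarity and divergence" step can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned (equal length, gaps written as "-") and simply counts matching columns; the function name and the toy sequences are hypothetical, and real analyses would correct the raw divergence with a substitution model.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Return (identity, divergence) over the ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip indel columns; only homologous bases are compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    identity = matches / compared
    return identity, 1.0 - identity


# Toy alignment: 8 ungapped columns, 7 of them identical.
identity, divergence = compare_aligned("ACGTACGTAC", "ACGT--GTTC")
```

Whether the resulting identity indicates constraint depends on what neutral evolution would produce over the same evolutionary distance, which is the question the last bullet of the slide raises.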

Rates of sequence change are estimated using models of the substitution process
Transition probabilities: (equation symbols not preserved in the transcript)
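
The model and symbols used on this slide did not survive transcription. As a stand-in, the sketch below uses the Jukes-Cantor (JC69) model, the simplest standard substitution model, in which all substitutions are equally likely. The choice of JC69, the function names and the parameter names are assumptions made for illustration, not the presenter's own formulation.

```python
import math


def jc69_transition_probs(rate, t):
    """Return (P_same, P_diff) under the Jukes-Cantor model.

    rate: expected substitutions per site per unit time
    t: elapsed time
    P_diff is the probability of ending at one specific different base.
    """
    decay = math.exp(-4.0 * rate * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


def jc69_distance(p_observed):
    """Estimate substitutions per site from the observed fraction of differing sites."""
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)
```

The distance correction is larger than the raw fraction of differing sites because some sites have changed more than once; for example, `jc69_distance(0.125)` exceeds 0.125.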

Substitution rates are calculated for each lineage in a sequence phylogeny
(Figure: phylogeny; branch labels not preserved in the transcript)

Conserved sequences identified by local reductions in substitution rate
(Plot: substitution rate along aligned position, local rate vs. neutral rate; symbols not preserved in the transcript)
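
The idea of a local reduction in rate can be sketched with a sliding window. This is a deliberately crude simplification, not how PhastCons or PhyloP work: it assumes a precomputed list of 0/1 flags marking which aligned positions differ between sequences, and reports windows whose local difference rate falls below a chosen fraction of an assumed neutral rate. The function name, the `ratio` parameter and the example data are hypothetical.

```python
def low_rate_windows(mismatch, window, neutral_rate, ratio=0.5):
    """Return start indices of windows with a reduced local difference rate.

    mismatch: list of 0/1 flags per aligned position (1 = sequences differ)
    window: window length in aligned positions
    neutral_rate: assumed difference rate under neutral evolution
    ratio: a window is reported if its rate is below ratio * neutral_rate
    """
    starts = []
    for start in range(len(mismatch) - window + 1):
        local_rate = sum(mismatch[start:start + window]) / window
        if local_rate < ratio * neutral_rate:
            starts.append(start)
    return starts


# Toy data: a run of unchanged positions in the middle of the alignment.
flags = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
conserved = low_rate_windows(flags, window=4, neutral_rate=0.5)
```

Here only the windows starting at positions 4 and 5 fall below the threshold, corresponding to the run of unchanged positions.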

Tools for quantifying evolutionary conservation across genomes
Alignment: Multiz
- Generates multiple species alignment relative to a base genome
- Constructed from pairwise alignment of individual genomes to reference
- 46-way and 100-way alignment to hg19; 30-way to mm9; 60-way to mm10

100-way Multiz alignment in hg19 Green = level of sequence similarity at each site

Conservation of synteny: “net” alignments Conservation of genome segments Order and orientation of genes and regulatory sequences

Conservation of synteny: “net” alignments Synteny is frequently conserved on megabase scales

Tools for quantifying evolutionary conservation across genomes
PhastCons
- Estimates the probability that a nucleotide belongs to a conserved element
- Sensitive to 'runs' of conserved sites; effective for identifying conserved blocks
- For hg19, elements are calculated at three phylogenetic scopes (Vertebrate, Placental Mammal, Primate)
PhyloP
- Measures conservation independently at individual positions
- Provides per-base conservation scores (-log p value under hypothesis of neutrality)
- Positive scores suggest constraint; negative scores suggest accelerated evolution
Alignment: Multiz
- Generates multiple species alignment relative to a base genome
- Constructed from pairwise alignment of individual genomes to reference
- 46-way and 100-way alignment to hg19; 30-way to mm9; 60-way to mm10

Identifying conserved elements: PhastCons
(Figure tracks: PhastCons scores; PhastCons elements)
- lod score: log probability under conserved model - log probability under neutral model
- Score: normalized lod score on scale
- Use scores to rank elements by estimated constraint
Example element: lod: 882, Score: 694
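
The lod definition above can be written out as a short Python sketch. It assumes the per-column probabilities of the alignment under each model have already been computed elsewhere, uses natural logarithms (the slide does not state the base), and uses made-up probabilities and element names; it does not reproduce the normalization step that yields the final Score.

```python
import math


def lod_score(p_conserved, p_neutral):
    """Log probability under the conserved model minus that under the neutral model.

    p_conserved, p_neutral: per-column probabilities of the observed alignment
    columns under each model (same length, all values > 0).
    """
    if len(p_conserved) != len(p_neutral):
        raise ValueError("both models need one probability per column")
    return sum(math.log(c) - math.log(n) for c, n in zip(p_conserved, p_neutral))


# Hypothetical elements with invented per-column probabilities.
elements = {
    "elem_a": ([0.5, 0.5], [0.25, 0.25]),
    "elem_b": ([0.3], [0.2]),
}
# Rank elements from most to least supported by the conserved model.
ranked = sorted(elements, key=lambda name: lod_score(*elements[name]), reverse=True)
```

A positive lod means the conserved model explains the columns better than the neutral model; longer elements accumulate larger scores because the per-column terms add up.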

PhastCons elements estimated at 3 phylogenetic scopes
(Figure tracks: Primate, Placental, Vertebrate)

Level of conservation decays with increasing evolutionary distance

PhyloP: measuring basewise conservation
(Figure track: PhyloP scores)
- Scores are calculated independently for each base
- Scores are -log P values under hypothesis of neutral evolution
- Positive scores = constraint
- Negative scores = acceleration
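
The sign convention can be made concrete with a small Python sketch. It assumes a p-value for the neutral hypothesis and the direction of the departure are already known for a site, and it assumes base-10 logarithms (the slide says only "-log P"). The function name is hypothetical and this is not the PhyloP implementation itself.

```python
import math


def phylop_style_score(p_value, conserved):
    """Signed -log10(p): positive for slower-than-neutral, negative for faster."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    return magnitude if conserved else -magnitude
```

For example, a site that rejects neutrality with p = 0.001 in the direction of constraint scores about +3, and the same p-value in the direction of acceleration scores about -3.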

Per-site phyloP conservation scores
- Use PhastCons to identify conserved elements
- Use phyloP to evaluate individual sites within elements
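
The two-step use of the scores can be sketched as follows. The sketch assumes conserved elements are given as 0-based half-open (start, end) intervals and that per-base phyloP scores are held in a list indexed by position; the function name, the threshold of 2.0 and the example scores are all invented for illustration.

```python
def constrained_sites(elements, phylop, threshold=2.0):
    """Return {element: [positions scoring >= threshold]} for each element.

    elements: list of (start, end) half-open intervals, 0-based
    phylop: per-base scores indexed by position
    """
    return {
        (start, end): [pos for pos in range(start, end) if phylop[pos] >= threshold]
        for start, end in elements
    }


# Toy data: two elements over an 8-base region.
scores = [0.1, 2.5, 3.0, -1.2, 0.4, 2.1, 0.0, 4.2]
hits = constrained_sites([(1, 4), (5, 8)], scores)
```

The elements narrow the search to conserved blocks, and the per-base scores then single out which positions inside each block carry the strongest signal.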

Accessing conservation data

Multiple genome alignments and conservation metrics are calculated independently for each reference genome
Orthologous region in mouse: 30-way Multiz alignment

Conservation identifies critical binding sites in regulatory elements
(Figure tracks: Regulatory info (ENCODE); Conservation)
Important binding sites and variants that affect function will be here