Evolution at the DNA level

Presentation transcript:

Evolution at the DNA level
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
Sequence edits: mutation, deletion
Rearrangements: inversion, translocation, duplication

Orthology and Paralogy
[Figure: gene tree with leaves HB (human), WB (worm), HA1 (human), HA2 (human), yeast, WA (worm)]
Orthologs: derived by speciation
Paralogs: everything else
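
The definition above can be applied mechanically once every internal node of a gene tree is labeled as a speciation or a duplication: two genes are orthologs if their last common ancestor is a speciation, and paralogs otherwise. A minimal sketch follows; the tree `toy_tree` is a hypothetical topology built from the leaf names in the figure (the real topology is only in the slide image), and all function names are illustrative.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(eq=False)
class Node:
    name: str = ""                 # leaf label such as "HA1"; empty for internal nodes
    event: str = ""                # "speciation" or "duplication" (internal nodes only)
    children: list = field(default_factory=list)


def root_paths(node, path=()):
    """Map each leaf name to the tuple of nodes on the path from the root to that leaf."""
    path = path + (node,)
    if not node.children:
        return {node.name: path}
    paths = {}
    for child in node.children:
        paths.update(root_paths(child, path))
    return paths


def last_common_ancestor(path_a, path_b):
    """Deepest node shared by two root-to-leaf paths."""
    shared = None
    for u, v in zip(path_a, path_b):
        if u is not v:
            break
        shared = u
    return shared


def classify_pairs(root):
    """Label every leaf pair: orthologs if their LCA is a speciation, otherwise paralogs."""
    paths = root_paths(root)
    labels = {}
    for a, b in combinations(sorted(paths), 2):
        event = last_common_ancestor(paths[a], paths[b]).event
        labels[(a, b)] = "ortholog" if event == "speciation" else "paralog"
    return labels


# Hypothetical gene tree (assumed topology, not taken from the slide figure).
toy_tree = Node(event="speciation", children=[
    Node(name="Yeast"),
    Node(event="duplication", children=[
        Node(event="speciation", children=[
            Node(name="WA"),
            Node(event="duplication", children=[Node(name="HA1"), Node(name="HA2")]),
        ]),
        Node(event="speciation", children=[Node(name="HB"), Node(name="WB")]),
    ]),
])
```

In this toy tree, `classify_pairs(toy_tree)` labels HA1/HA2 as paralogs (they split at a duplication) and HA1/WA as orthologs (they split at a speciation).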

Orthology, Paralogy, Inparalogs, Outparalogs

Synteny maps: comparison of human and mouse

Synteny maps

Building synteny maps
Recommended local aligners:
- BLASTZ: most accurate, especially for genes; chains local alignments
- WU-BLAST: good tradeoff of efficiency and sensitivity; best command-line options
- BLAT: fast, less sensitive; good for comparing very similar sequences and for finding a rough homology map

Index-based local alignment
- Dictionary: all words of length k (~10)
- Alignment initiated between words with alignment score ≥ T (typically T = k)
- Alignment: ungapped extensions until the score drops below a statistical threshold
- Output: all local alignments with score > statistical threshold
[Figure: query words looked up in a DB index during a query scan]
Question: using an idea from overlap detection, is there a better way to find all local alignments between two genomes?
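
The seed-and-extend loop above can be sketched in a few lines. This is a minimal illustration, assuming a +1/-1 match/mismatch score (so a word scores T = k only when it matches exactly, which is why an exact dictionary lookup suffices) and an X-drop rule as the stopping criterion for ungapped extension; the function names, `xdrop` and `threshold` values are hypothetical, not taken from any particular tool.

```python
from collections import defaultdict


def build_index(db, k):
    """Dictionary: every length-k word of db -> list of start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def _extend(pairs, match, mismatch, xdrop):
    """Ungapped extension over (a, b) character pairs.

    Stops once the running score falls more than xdrop below the best seen;
    returns (best score gain, number of characters at the best point).
    """
    best, best_len, score = 0, 0, 0
    for n, (a, b) in enumerate(pairs, start=1):
        score += match if a == b else mismatch
        if score > best:
            best, best_len = score, n
        elif best - score > xdrop:
            break
    return best, best_len


def local_alignments(query, db, k=10, threshold=20, match=1, mismatch=-1, xdrop=5):
    """Return (query_start, db_start, length, score) for hits scoring above threshold."""
    index = build_index(db, k)
    hits = set()  # a set drops identical hits reached from several seeds
    for qi in range(len(query) - k + 1):
        for di in index.get(query[qi:qi + k], ()):
            gain_r, len_r = _extend(zip(query[qi + k:], db[di + k:]), match, mismatch, xdrop)
            gain_l, len_l = _extend(zip(reversed(query[:qi]), reversed(db[:di])),
                                    match, mismatch, xdrop)
            score = k * match + gain_r + gain_l
            if score > threshold:
                hits.add((qi - len_l, di - len_l, k + len_l + len_r, score))
    return sorted(hits)
```

For example, `local_alignments("ACGTTGCAAGGCT", "TTTTACGTTGCAAGGCTTTTT", k=4, threshold=10)` finds the single 13-bp exact match starting at query position 0 and database position 4.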

Local Alignments

After chaining

Chaining local alignments
1. Find local alignments
2. Chain them: O(N log N) L.I.S. (longest increasing subsequence)
3. Restricted DP
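
Step 2 can be illustrated with the classic O(N log N) longest-increasing-subsequence algorithm. This is a simplified sketch, assuming each local alignment is reduced to one anchor point (x, y) in the two genomes and that all anchors count equally; real chainers also weigh alignment scores and handle overlaps. The function name `chain_anchors` is hypothetical.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Longest chain of anchors strictly increasing in both genomes (unweighted LIS).

    anchors: list of (x, y) positions of local alignments in genome 1 and genome 2.
    """
    # Sort by x; for equal x, put larger y first so two anchors sharing an x
    # can never both end up in the same strictly increasing chain.
    order = sorted(range(len(anchors)), key=lambda i: (anchors[i][0], -anchors[i][1]))
    tails_y = []      # tails_y[L]: smallest y that ends an increasing chain of length L + 1
    tails_idx = []    # index of the anchor achieving tails_y[L]
    prev = [None] * len(anchors)  # back-pointers for chain reconstruction
    for i in order:
        y = anchors[i][1]
        pos = bisect_left(tails_y, y)  # binary search: O(log N) per anchor
        if pos > 0:
            prev[i] = tails_idx[pos - 1]
        if pos == len(tails_y):
            tails_y.append(y)
            tails_idx.append(i)
        else:
            tails_y[pos] = y
            tails_idx[pos] = i
    # Walk the back-pointers from the end of the longest chain.
    chain, i = [], (tails_idx[-1] if tails_idx else None)
    while i is not None:
        chain.append(anchors[i])
        i = prev[i]
    return chain[::-1]
```

With anchors `[(1, 1), (2, 5), (3, 2), (4, 3), (5, 4), (6, 0)]`, the sketch keeps the collinear run `(1, 1), (3, 2), (4, 3), (5, 4)` and discards the two out-of-order anchors; step 3 would then run a restricted DP only in the gaps between chained anchors.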

Progressive Alignment
When the evolutionary tree is known:
- Align the closest sequences first, in the order given by the tree
- In each step, align two sequences x, y, or two profiles $p_x$, $p_y$, to generate a new alignment with associated profile $p_{\text{result}}$
Weighted version:
- Tree edges have weights proportional to the divergence along that edge
- The new profile is a weighted average of the two old profiles
[Figure: guide tree over sequences x, w, y, z]
Example (profile entries ordered as A, C, G, T, -):
$p_x = (0.8, 0.2, 0, 0, 0)$, $p_y = (0.6, 0, 0, 0, 0.4)$
$s(p_x, p_y) = 0.8 \cdot 0.6\, s(A, A) + 0.2 \cdot 0.6\, s(C, A) + 0.8 \cdot 0.4\, s(A, -) + 0.2 \cdot 0.4\, s(C, -)$
Result (equal weights): $p_{xy} = (0.7, 0.1, 0, 0, 0.2)$
$s(p_x, -) = 0.8 \cdot 1.0\, s(A, -) + 0.2 \cdot 1.0\, s(C, -)$
Result: $p_{x-} = (0.4, 0.1, 0, 0, 0.5)$
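
The profile arithmetic in the example can be checked with a short sketch: the column score sums $p_x(a)\, p_y(b)\, s(a, b)$ over all symbol pairs, and merging takes a weighted average. The scoring function `toy_score` (match +1, mismatch -1, residue against gap -2, gap against gap 0) is a hypothetical choice for illustration only; the slide leaves $s$ unspecified, and how edge divergences translate into the weights `wx`, `wy` is likewise left to the caller here.

```python
ALPHABET = ("A", "C", "G", "T", "-")  # same column order as the slide example


def profile_score(px, py, s):
    """Expected pairwise score of two profile columns: sum of px[a] * py[b] * s(a, b)."""
    return sum(pa * pb * s(a, b)
               for a, pa in zip(ALPHABET, px)
               for b, pb in zip(ALPHABET, py)
               if pa and pb)  # zero-frequency symbols contribute nothing


def merge_profiles(px, py, wx=0.5, wy=0.5):
    """New profile as a weighted average of two old profiles (weights should sum to 1)."""
    return tuple(wx * a + wy * b for a, b in zip(px, py))


def toy_score(a, b):
    """Hypothetical scores: match +1, mismatch -1, residue vs gap -2, gap vs gap 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1
```

With the slide's columns, `merge_profiles(px, py)` gives (0.7, 0.1, 0, 0, 0.2), merging `px` with an all-gap column `(0, 0, 0, 0, 1.0)` gives (0.4, 0.1, 0, 0, 0.5), and under `toy_score` the four terms of $s(p_x, p_y)$ add up to about -0.44.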

Threaded Blockset Aligner
[Figure: Human–Cow; HMR – CD; restricted area profile alignment]

Reconstructing the Ancestral Mammalian Genome
[Figure: tree with leaves Human: C, Baboon: C, Cat: C, Dog: G; ancestral labels C, C or G, G]
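
One standard way to infer an ancestral base from the leaf bases in the figure is Fitch's small-parsimony algorithm, shown here as a sketch of that technique rather than the method used in the lecture. It assumes the topology ((Human, Baboon), (Cat, Dog)); the figure's actual tree is not recoverable from the text. Internal nodes are written as nested pairs and the function name `fitch` is illustrative.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left_subtree, right_subtree).
    states: mapping from leaf name to its observed base.
    Returns (set of most parsimonious bases at this node, minimum mutations below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                      # children agree on some base: no extra mutation
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # disagreement costs one mutation


# Assumed topology and the leaf bases from the slide figure.
mammal_tree = (("Human", "Baboon"), ("Cat", "Dog"))
leaf_bases = {"Human": "C", "Baboon": "C", "Cat": "C", "Dog": "G"}
```

Here `fitch(mammal_tree, leaf_bases)` returns `({"C"}, 1)`: the root is inferred as C with a single mutation on the dog lineage, while the bottom-up pass alone leaves the cat–dog ancestor ambiguous ({C, G}) until a top-down pass fixes it to the root's base.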

Neutral Substitution Rates

Finding Conserved Elements (1)
Binomial method
- 25-bp window in the human genome
- Binomial distribution of k matches in N bases, given the neutral probability of substitution
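
The binomial method reduces to a tail probability. A minimal sketch, assuming the window is scored by the upper tail P(X ≥ k), where X counts matches among the N aligned bases and p is the neutral probability that a base is unchanged (one minus the neutral substitution probability); the function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k matches under neutrality."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
```

For a 25-bp window, a value of `binomial_tail(k, 25, p)` close to zero means the window has more matches than neutral evolution would readily produce, which flags it as a candidate conserved element.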

Finding Conserved Elements (2)
Parsimony method
- Count the minimum number of mutations explaining each column
- Assign a probability to this parsimony score given the neutral model
- Multiply probabilities across a 25-bp window of the human genome
[Figure: example alignment column A, C, A, A, G]
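
The last two steps can be sketched once per-column parsimony scores are available (for example from a Fitch count). Two assumptions here are not from the slide: the neutral distribution of parsimony scores is estimated empirically from columns presumed to be neutral, and each column is assigned the lower-tail probability P(S ≤ s) of a score at least this low. Working in log10 turns the product over the window into a sum; the function names are hypothetical.

```python
from collections import Counter
from math import log10


def neutral_cdf(neutral_scores):
    """Empirical P(S <= s) from parsimony scores of columns assumed to evolve neutrally."""
    counts = Counter(neutral_scores)
    total = len(neutral_scores)
    floor = 1 / (total + 1)  # avoid a zero probability (log of 0) for unseen low scores
    cdf, running = {}, 0
    for s in range(max(counts) + 1):
        running += counts[s]
        cdf[s] = max(running / total, floor)
    return cdf


def window_log10_prob(column_scores, cdf):
    """log10 of the product of per-column probabilities across one window."""
    # Scores above the largest neutral score get probability 1 (log10 = 0).
    return sum(log10(cdf.get(s, 1.0)) for s in column_scores)
```

Sliding `window_log10_prob` along 25 consecutive column scores gives one value per window; strongly negative values mark windows with fewer mutations than the neutral model predicts.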

Finding Conserved Elements

Finding Conserved Elements (3) GERP

Phylo HMMs
[Figure: an HMM combined with a phylogenetic tree model gives a phylo-HMM]
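
In a phylo-HMM, each hidden state emits whole alignment columns, with emission probabilities supplied by a phylogenetic tree model for that state. The sketch below shows only the HMM half: Viterbi decoding over two states, assuming the per-column log-likelihoods under each tree model have already been computed (for example by Felsenstein's pruning algorithm, not shown). The state meanings and all numbers in the toy usage are hypothetical.

```python
from math import log


def viterbi(col_loglik, log_trans, log_init):
    """Most likely hidden-state path for a sequence of alignment columns.

    col_loglik[t][k]: log P(column t | tree model of state k), assumed precomputed.
    log_trans[j][k]:  log transition probability from state j to state k.
    log_init[k]:      log initial probability of state k.
    """
    n_states = len(log_init)
    score = [log_init[k] + col_loglik[0][k] for k in range(n_states)]
    back = []  # back[t - 1][k]: best previous state when in state k at column t
    for t in range(1, len(col_loglik)):
        ptr, new = [], []
        for k in range(n_states):
            j = max(range(n_states), key=lambda j: score[j] + log_trans[j][k])
            ptr.append(j)
            new.append(score[j] + log_trans[j][k] + col_loglik[t][k])
        back.append(ptr)
        score = new
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):  # trace the back-pointers from the last column
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy usage with hypothetical numbers: state 0 = neutral, state 1 = conserved.
toy_init = [log(0.5), log(0.5)]
toy_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
toy_cols = ([[log(0.9), log(0.1)]] * 3
            + [[log(0.1), log(0.9)]] * 3
            + [[log(0.9), log(0.1)]] * 3)
toy_path = viterbi(toy_cols, toy_trans, toy_init)
```

The "sticky" transition matrix makes the decoder prefer runs of one state, so isolated well-conserved columns are not called conserved on their own, while a block of them is.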

Finding Conserved Elements (3)

How do the methods agree/disagree?

Statistical Power to Detect Constraint
[Figure: axes L and N]
- C: cutoff number of mutations
- D: neutral mutation rate
- Relative rate: constraint mutation rate relative to neutral
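
The trade-off on this slide can be explored with a simple count model. This sketch assumes (a modeling choice, not the slide's) that a region of length L accumulates a Poisson number of mutations with mean L·D if neutral and L·D·(relative rate) if constrained, with D taken as the total neutral rate summed over the tree of N species; a region is called constrained when it shows at most C mutations. The function names are hypothetical.

```python
from math import exp, factorial


def poisson_cdf(c, lam):
    """P(X <= c) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(c + 1))


def detection_power(L, D, rel_rate, C):
    """Return (false_positive_rate, power) of the rule 'call constrained if <= C mutations'.

    Assumed model: neutral region ~ Poisson(L * D) mutations,
    constrained region ~ Poisson(L * D * rel_rate) mutations.
    """
    false_positive_rate = poisson_cdf(C, L * D)   # neutral region passes the cutoff
    power = poisson_cdf(C, L * D * rel_rate)      # constrained region passes the cutoff
    return false_positive_rate, power
```

Raising D (more species or more divergent species) or L spreads the two distributions apart, which is why both longer elements and deeper alignments make constraint easier to detect; with a relative rate of 1, power equals the false-positive rate.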
