Orthology & Paralogy Alignment & Assembly Alastair Kerr Ph.D. [many slides borrowed from various sources]



Overview
- Orthology & Paralogy
  - Definitions and examples
  - Ways to determine an ortholog
  - Pre-calculations: resources
- Alignment & Assembly
  - Differences
  - Key programs for each
  - Jalview example

Homologs
Have common origins but may or may not have common activity.
Homologous or not? Often decided by an arbitrary threshold on the level of similarity found by alignment.
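The threshold idea can be sketched in a few lines of Python. This is only an illustration: the 30% cutoff is a hypothetical value chosen for the example, percent identity can be defined in several ways (here it is counted over ungapped columns only), and the function names are not from any particular tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical cutoff, for illustration only; real cutoffs depend on
# sequence length, sequence type and the question being asked.
THRESHOLD = 30.0


def call_homologous(aligned_a: str, aligned_b: str,
                    threshold: float = THRESHOLD) -> bool:
    """Apply the (arbitrary) similarity threshold to a pairwise alignment."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("MATWL", "MATPL"))  # 4 of 5 columns match
print(call_homologous("MATWL", "MATPL"))
```

Moving the cutoff changes which pairs are called homologous, which is why the threshold is described as arbitrary.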

Homologs
...have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary):
- orthologs: homologs produced by speciation. They tend to have similar function.
- paralogs: homologs produced by gene duplication. They tend to have differing functions.

Orthologous or paralogous homologs
[Figure: globin gene tree. An early globin gene undergoes gene duplication, giving an α-chain gene and a β-chain gene, each with mouse, cattle and human copies.]
- Orthologs (α): mouse α, cattle α, human α
- Orthologs (β): mouse β, cattle β, human β
- Paralogs (cattle): cattle α and cattle β
- Homologs: all of the above
Orthologs: diverged after speciation; tend to have similar function.
Paralogs: diverged after gene duplication; some functional divergence occurs.
Therefore, for linking similar genes between species, or performing "annotation transfer", identify orthologs.

True or False?
- A1x is the ortholog in species x of A1y?
- A1x is a paralog of A2x?
- A1x is a paralog of A2y?

Identifying Gene/Protein Relationships from Phylogenetic Trees
- orthologs: homologs produced by speciation. The gene phylogeny matches the organismal phylogeny.
- paralogs: homologs produced by gene duplication. Indicated by multiple copies of a homolog in a given species, or by phylogenetic evidence of a gene duplication, i.e. a gene tree that does not match the organismal phylogeny.
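One simple way to read these relationships off a gene tree is the "species overlap" rule: an internal node whose two subtrees share a species is treated as a duplication, otherwise as a speciation. The sketch below is not the method used by any resource named in these slides; the tree topology and the `species_gene` leaf names are hypothetical, loosely modelled on the globin example.

```python
def species_of(leaf: str) -> str:
    """Leaf names are assumed to look like 'species_gene', e.g. 'cattle_alpha'."""
    return leaf.split("_")[0]


def leaves(node):
    """Collect leaf names below a node; a node is a leaf string or a (left, right) pair."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(tree):
    """Label every pair of leaves by the event at their last common ancestor.

    A node whose two subtrees share at least one species is called a
    duplication (pairs split there are paralogs); otherwise it is called a
    speciation (pairs split there are orthologs).
    """
    result = {}

    def walk(node):
        if isinstance(node, str):
            return
        left, right = node
        l_leaves, r_leaves = leaves(left), leaves(right)
        shared = ({species_of(x) for x in l_leaves}
                  & {species_of(x) for x in r_leaves})
        label = "paralog" if shared else "ortholog"
        for a in l_leaves:
            for b in r_leaves:
                result[frozenset((a, b))] = label
        walk(left)
        walk(right)

    walk(tree)
    return result


# Hypothetical globin-like gene tree: one duplication at the root,
# followed by the same speciation pattern in each copy.
tree = (
    ("mouse_alpha", ("cattle_alpha", "human_alpha")),
    ("mouse_beta", ("cattle_beta", "human_beta")),
)
pairs = classify_pairs(tree)
print(pairs[frozenset(("mouse_alpha", "human_alpha"))])
print(pairs[frozenset(("cattle_alpha", "cattle_beta"))])
print(pairs[frozenset(("mouse_alpha", "human_beta"))])
```

Under this rule, genes from different species can still be paralogs if they trace back to different copies of an ancestral duplication.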

Gene Orthology: How to detect?
- Most commonly: identify reciprocal best BLAST hits (EGO, COGs, …)
Example problem:
- If making comparisons between human and bovine, for example, the bovine gene dataset is still quite incomplete.
- Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced.
[Figure labels: cattle, human, cattle, mouse]
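The reciprocal best hit test itself is short once the search results are in hand. The sketch below assumes the BLAST scores have already been parsed into nested dictionaries; the gene names and bit scores are invented for illustration, and ties are broken arbitrarily by taking the first maximum.

```python
def best_hits(scores):
    """scores: {query: {subject: bit_score}} -> {query: best-scoring subject}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores. The cattle set is incomplete: it has an alpha-globin
# gene (cHBA) but no beta-globin gene, so human HBB can only hit a paralog.
human_vs_cattle = {
    "hHBA": {"cHBA": 250.0},
    "hHBB": {"cHBA": 110.0},
}
cattle_vs_human = {
    "cHBA": {"hHBA": 248.0, "hHBB": 108.0},
}

print(reciprocal_best_hits(human_vs_cattle, cattle_vs_human))
```

Here the reciprocal test rejects the hHBB/cHBA pair because hHBA is also in the query set. If hHBA were missing as well, the two paralogs would be each other's best hits, which is the failure mode described above for incomplete datasets.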

2 Forms in 1 Species
[Figure: taxa marked +]
Slides from Jonathan Eisen

2 Forms in 1 Species - Gene Loss
[Figure labels: gene duplicated in common ancestor; loss]

Unusual Distribution Pattern
[Figure: taxa marked +]

Unusual Distribution - Gene Loss
[Figure labels: gene present in ancestor; gene lost here]

Unusual Distribution - Evolutionary Rate Variation
[Figure: taxa marked + or -?]
Gene too diverged to be found

Ortholog guess via synteny
  A C B
  A C ?
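The synteny guess can be sketched as a neighbour lookup: if a gene with no assigned ortholog sits next to a gene whose ortholog is known, the unassigned gene on the same side of that ortholog in the other genome is the candidate. Everything here is an illustrative assumption (gene names, the list-of-genes representation, and the fact that inversions and strand are ignored).

```python
def guess_by_synteny(order_1, order_2, known):
    """Propose orthologs for unassigned genes using conserved gene order.

    order_1, order_2: gene names in chromosomal order for each species.
    known: maps genes in order_1 to their already-assigned orthologs in order_2.
    """
    pos_2 = {gene: i for i, gene in enumerate(order_2)}
    assigned_2 = set(known.values())
    guesses = {}
    for i, gene in enumerate(order_1):
        if gene in known:
            continue
        for step in (-1, 1):  # try the left neighbour, then the right one
            j = i + step
            if not 0 <= j < len(order_1):
                continue
            anchor = known.get(order_1[j])
            if anchor is None or anchor not in pos_2:
                continue
            k = pos_2[anchor] - step  # same side of the anchor in species 2
            if 0 <= k < len(order_2) and order_2[k] not in assigned_2:
                guesses[gene] = order_2[k]
                break
    return guesses


# Gene order A C B in one species and A C ? in the other.
order_1 = ["A1", "C1", "B1"]
order_2 = ["A2", "C2", "unknown"]
known = {"A1": "A2", "C1": "C2"}
print(guess_by_synteny(order_1, order_2, known))
```

A guess of this kind is only a hypothesis; it still needs sequence evidence before any annotation is transferred.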

Syntenic blocks

Ensembl calculations demo

OMA Browser demo

Alignments and Assemblies
- Alignment
  - ALL sequences from the SAME region
  - Therefore can be useless for:
    - non-overlapping contigs
    - PCR probes/oligos
  - Good for:
    - paralogs/orthologs
    - basis for phylogeny
- Assembly
  - Good for near-identical sequences
  - Types:
    - De novo
    - Guided [reference sequence]
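The core step of de novo assembly, joining reads that overlap, can be shown with a toy example. This is a minimal sketch that requires an exact suffix/prefix match; real assemblers tolerate sequencing errors and handle very large read sets. The function names and the minimum overlap of 3 are assumptions made for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


print(overlap("ACGGTGCAG", "GCAGTTACCA"))
print(merge("ACGGTGCAG", "GCAGTTACCA"))
print(merge("ACGGTG", "TTACCA"))  # non-overlapping contigs cannot be joined
```

The last call returns `None`: sequences with no region in common give an assembler, like an aligner, nothing to work with.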

Alignment
- Implicit statement: each residue in an aligned sequence is derived from the last common ancestor [LCA], so residues in the same column are treated as sharing that origin
- Therefore it is OK to look only at conserved regions, or to mask non-conserved regions
  - Especially for phylogeny
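Masking non-conserved regions can be sketched as a column filter. The rule used here (drop any column containing a gap, keep a column only if its most common residue reaches a minimum fraction) and the 0.8 default are assumptions for illustration; dedicated trimming tools use more careful scoring.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.8):
    """Indices of gap-free columns whose most common residue reaches min_fraction."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = []
    for i, column in enumerate(zip(*alignment)):
        if "-" in column:
            continue  # treat gapped columns as non-conserved
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / n >= min_fraction:
            keep.append(i)
    return keep


def mask_alignment(alignment, min_fraction=0.8):
    """Return the alignment restricted to its conserved columns."""
    keep = conserved_columns(alignment, min_fraction)
    return ["".join(seq[i] for i in keep) for seq in alignment]


alignment = ["MATWL", "MATPL", "MA-WL"]
print(conserved_columns(alignment))
print(mask_alignment(alignment))
```

Lowering `min_fraction` keeps more variable columns, trading alignment confidence for phylogenetic signal.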

Alignment Tools
- Faster but less accurate (some better with gaps)
  - MUSCLE
  - ClustalW/X
  - MAFFT
- Slow but more accurate
  - *-Coffee
    - T: original
    - 3D: uses PDB as guide (structural)
    - M: uses multiple methods
  - ProbCons

Alignment Edit Tools
- NEVER use a word processor or Excel to edit alignments.
- Jalview (Java Alignment Viewer)
  - Good for editing
  - DAS capable

[Figure: Jalview overview diagram. Labels:]
- Sequences, Alignments
- 'Standard' Formats: FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM
- Annotation, Features: Distributed Annotation System, GFF, Jalview Features, Jalview Annotation
- Trees: Newick
- Structures: PDB
- Analysis: Consensus, Conservation & Clustering, Secondary Structure Prediction, Multiple Sequence Alignment
- Visualization
- Figure Generation: Clickable HTML, Images, Line Art

Jalview DAS Client Functionality
[Figure: Jalview querying DAS annotation servers. Labels:]
- Query matches ID to Authority
- Map to local reference frame
- Mouse over for feature name, links and scores
- Group features by source
- Type == colour
- Highlight start-end
- Select specific sources
- Filtered list
- Add user defined sources

Assemblers
- Many free options
- STADEN - staden.sf.net
  - Original assembler, all platforms
  - No longer in development
  - Useless for next-gen sequencing
- MAQ and MAQView
  - Installed on the computers in COIL