The BioQUEST Curriculum Consortium at Clark Atlanta University Atlanta, Georgia Feb. 14-16, 2003 Evolutionary Bioinformatics Education: a National Science.

Presentation transcript:

The BioQUEST Curriculum Consortium at Clark Atlanta University, Atlanta, Georgia, Feb. 14-16, 2003. Evolutionary Bioinformatics Education: a National Science Foundation Chautauqua Course

Multiple Sequence Alignment & Analysis. More data yields stronger analyses, if done carefully! Mosaic ideas and evolutionary ‘importance.’ Steven M. Thompson, Florida State University School of Computational Science and Information Technology (CSIT)

Applicability? So what; why even bother? Applications: probe, primer, and motif design; graphical illustrations; comparative ‘homology’ inference; molecular evolutionary analysis. OK, well, how do you do it?

Dynamic programming’s complexity increases exponentially with the number of sequences being compared, because the comparison needs an N-dimensional matrix: $\text{complexity} = [\text{sequence length}]^{\text{number of sequences}}$
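To get a feel for that growth, here is a minimal sketch that counts the cells of the N-dimensional matrix using the formula on this slide. The function name `dp_cells` and the 300-residue example length are illustrative assumptions, not values from the talk.

```python
def dp_cells(sequence_length, number_of_sequences):
    """Cells in the N-dimensional dynamic programming matrix,
    following complexity = [sequence length] ** (number of sequences)."""
    return sequence_length ** number_of_sequences


if __name__ == "__main__":
    # Hypothetical example: sequences of 300 residues each.
    for n in range(2, 7):
        print(f"{n} sequences: {dp_cells(300, n):,} cells")
```

Two sequences of that length need 90,000 cells; three already need 27 million, which is why full multidimensional dynamic programming is abandoned so quickly.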

‘Global’ heuristic solutions: See MSA (‘global’ within a ‘bounding box’) and PIMA (‘local’ portions only) on the multiple alignment page at the Baylor College of Medicine’s Search Launcher; but both come with severely limiting restrictions!

Multiple Sequence Dynamic Programming: Therefore, pairwise, progressive dynamic programming restricts the solution to the neighborhood of only two sequences at a time. All sequences are compared, pairwise, and then each is aligned to its most similar partner or group of partners. Each group of partners is then aligned to finish the complete multiple sequence alignment.
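The first two steps of that procedure (score every pair, then decide which partners or groups to join next) can be sketched as follows. This is only an illustration under stated assumptions: the match/mismatch/gap values, the average-score grouping rule, and the names `nw_score` and `join_order` are all hypothetical choices, and the final step of actually aligning the groups to each other is left out. It is not the algorithm of PileUp, ClustalW, or any other program named in this talk.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score of two sequences.
    Scoring values are illustrative assumptions."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def join_order(seqs):
    """Return the merges in the order a progressive method would make them:
    the most similar pair of groups is always joined first."""
    names = list(seqs)
    score = {frozenset(p): nw_score(seqs[p[0]], seqs[p[1]])
             for p in combinations(names, 2)}

    def avg(g, h):
        # Average pairwise score between two groups of sequence names.
        total = sum(score[frozenset((x, y))] for x in g for y in h)
        return total / (len(g) * len(h))

    groups = [(n,) for n in names]
    merges = []
    while len(groups) > 1:
        g, h = max(combinations(groups, 2), key=lambda gh: avg(*gh))
        merges.append((g, h))
        groups = [k for k in groups if k not in (g, h)] + [g + h]
    return merges


if __name__ == "__main__":
    toy = {"a": "GATTACA", "b": "GATTACA", "c": "GCTTACA", "d": "TTTTTTT"}
    for left, right in join_order(toy):
        print(left, "+", right)
```

Each merge only ever compares two things at a time, which is exactly the restriction described above.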

Web resources for pairwise, progressive multiple alignment: bielefeld.de/bcd/Curric/MulAli/welcome.html. However, problems with very large datasets and huge multiple alignments make doing multiple sequence alignment on the Web impractical after your dataset has reached a certain size. You’ll know it when you’re there!

Reliability and the Comparative Approach — explicit homologous correspondence; manual adjustments based on knowledge, especially structural, regulatory, and functional sites. Therefore, editors like SeqLab and the Ribosomal Database Project:

Structural & Functional correspondence in the Wisconsin Package’s SeqLab

Work with proteins if at all possible! Twenty match symbols versus four, plus similarity: way better signal to noise. It also guarantees that no indels are placed within codons. So translate, then align. Nucleotide sequences will only reliably align if they are very similar to each other, and they will require extensive hand editing and careful consideration.
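A minimal sketch of ‘translate, then align’, assuming the standard genetic code, uppercase DNA that is already in reading frame, and a protein alignment produced elsewhere by whatever program you prefer. The function names `translate` and `thread_codons` are illustrative and do not belong to any package mentioned in this talk.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate in-frame, uppercase DNA; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def thread_codons(aligned_protein, dna):
    """Map a gapped protein back onto its DNA: every protein gap becomes
    a three-base gap, so no indel can ever fall inside a codon."""
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


if __name__ == "__main__":
    dna = "ATGGCCAAA"
    print(translate(dna))              # protein to hand to an alignment program
    print(thread_codons("MA-K", dna))  # gapped protein threaded back onto the DNA
```

The gapped protein string `"MA-K"` stands in for the output of a real protein alignment.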

Beware of aligning apples and oranges [and grapefruit]! Paralogous versus orthologous; genomic versus cDNA; mature versus precursor.

Mask out uncertain areas —

Complications: Order dependence. Not that big of a deal. Substitution matrices and gap penalties. A very big deal! Regional ‘realignment’ becomes incredibly important, especially with sequences that have areas of high and low similarity (GCG’s PileUp -InSitu option).

Complications cont.: Format hassles! Specialized format conversion tools such as GCG’s From* and To* programs and PAUPSearch. Don Gilbert’s public domain ReadSeq program.

Still more complications: Indel and missing-data symbol (i.e. gap) designation discrepancy headaches: ., -, ~, ?, N, or X... Help!
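One way to tame the symbol discrepancies is to normalize everything to a single gap character before moving between programs. The sketch below assumes that `.` and `~` always mean a gap and should become `-`; it deliberately leaves `?`, `N`, and `X` alone, because those can mean missing data or an unknown residue depending on the program. The function name `normalize_gaps` is hypothetical.

```python
def normalize_gaps(seq, gap_chars=".~", gap="-"):
    """Replace every character in gap_chars with a single gap symbol.
    '?', 'N' and 'X' are left untouched on purpose (see the note above)."""
    return seq.translate(str.maketrans({c: gap for c in gap_chars}))


if __name__ == "__main__":
    print(normalize_gaps("~~MGK..EKTH?X"))
```

Check each program’s documentation before converting, since the meaning of a given symbol is program-specific.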

The consensus and motifs: Conserved regions can be visualized with a sliding window approach and appear as peaks. (Figure label: P-Loop.) Let’s concentrate on the first peak seen here to simplify matters.
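The sliding window idea can be sketched in a few lines. The per-column measure used here (fraction of sequences sharing the most common residue) and the window width are simplifying assumptions for illustration; the plot in the talk may well have been produced with a different similarity measure.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue in each column.
    Gaps ('-') never count as the shared residue."""
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if residues:
            scores.append(Counter(residues).most_common(1)[0][1] / len(column))
        else:
            scores.append(0.0)
    return scores


def sliding_window(scores, width):
    """Average the per-column scores over a window; peaks mark conserved regions."""
    return [sum(scores[i:i + width]) / width
            for i in range(len(scores) - width + 1)]


if __name__ == "__main__":
    toy = ["GKSTA", "GKSTC", "GKTTD"]  # hypothetical toy alignment
    print(sliding_window(column_conservation(toy), 2))
```

Plotting the windowed values against alignment position gives the kind of peak profile described on this slide.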

The first GTP binding domain of EF-1α/Tu: A consensus isn’t necessarily the biologically “correct” combination. A simple consensus throws much information away! Therefore, motif definition.
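A small illustration of how much a simple consensus throws away, assuming a plain majority-rule consensus (most common symbol per column). The toy alignments and the function name `consensus` are hypothetical.

```python
from collections import Counter


def consensus(alignment):
    """Majority-rule consensus: the most common symbol in each column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


if __name__ == "__main__":
    variable = ["GKS", "GKS", "GKT"]   # third column varies
    invariant = ["GKS", "GKS", "GKS"]  # third column is fixed
    # Both alignments collapse to the same string; the S/T variation is lost.
    print(consensus(variable), consensus(invariant))
```

Two quite different alignments give the same consensus string, which is the motivation for describing the site as a motif instead.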

The EF-1α/Tu P-Loop: Defined as: (A,G)x4GK(S,T). A one-dimensional ‘regular-expression’ of a conserved site. Not necessarily biologically meaningful. Motifs are limited in their ability to discriminate a residue’s ‘importance.’
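The motif (A,G)x4GK(S,T) translates directly into a regular expression: A or G, any four residues, G, K, then S or T. The sketch below uses Python’s `re` module; the test string is an illustrative fragment chosen to contain one match and should not be read as a verified database sequence.

```python
import re

# (A,G)x4GK(S,T) written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein):
    """Return (start, matched_text) for every non-overlapping P-loop match."""
    return [(m.start(), m.group()) for m in P_LOOP.finditer(protein)]


if __name__ == "__main__":
    fragment = "MGKEKTHINIVVIGHVDSGKSTTTGHL"  # illustrative fragment
    print(find_p_loops(fragment))
```

Every residue inside the `x4` stretch is treated as equally unimportant, which is the limitation this slide points out.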

Conclusions: So how do we include ‘all’ the information of a multiple sequence alignment, or of a region within an alignment, in a description that doesn’t throw anything away? Enter, for remote homology searching, the ‘profile’: profile algorithms, incl. ‘traditional’ Gribskov profiles, Expectation Maximization (MEME’s), and Hidden Markov Models (HMMer’s). FOR MORE INFO... Explore my Web Home, and contact me for specific long-distance bioinformatics assistance and collaboration.
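To show what ‘not throwing anything away’ can look like, here is a deliberately simplified position-specific profile: per-column residue frequencies with pseudocounts, turned into log-odds against a uniform background. This is an assumption-laden sketch, not the Gribskov method, MEME, or HMMer; the names `build_profile` and `score`, the pseudocount, and the uniform background are all illustrative choices.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One dict of log-odds scores per alignment column.
    Background is assumed uniform over the 20 amino acids."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum of per-position log-odds for a segment as long as the profile."""
    return sum(column[residue] for column, residue in zip(profile, segment))


if __name__ == "__main__":
    profile = build_profile(["GKS", "GKS", "GKT"])  # hypothetical toy alignment
    for candidate in ("GKS", "GKT", "AAA"):
        print(candidate, round(score(profile, candidate), 3))
```

Unlike a consensus or a regular expression, the profile still remembers that the last column is mostly S but sometimes T, so `GKT` scores between `GKS` and an unrelated segment.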

AND FOR EVEN MORE INFO... Many fine texts are starting to become available in the field. To ‘honk-my-own-horn’ a bit, check out the new Current Protocols in Bioinformatics from John Wiley & Sons, Inc.: they asked me to contribute a chapter on multiple sequence analysis using GCG software. Humana Press, Inc. also asked me to contribute; I’ve got two chapters in their Introduction to Bioinformatics: A Theoretical And Practical Approach (m/Product.pasp?txtCatalog= HumanaBooks&txtCategory= &txtProductID= X&isVariant=0). Both volumes are now available.

References:
Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers, in Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, California, U.S.A., pp. 28–36.
Bairoch, A. (1992) PROSITE: A Dictionary of Sites and Patterns in Proteins. Nucleic Acids Research 20,
Eddy, S.R. (1996) Hidden Markov models. Current Opinion in Structural Biology 6, 361–365.
Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics 14,
Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Dept. of Genetics, University of Washington, Seattle, Washington, U.S.A.
Feng, D.F. and Doolittle, R.F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25, 351–360.
Genetics Computer Group (Copyright ) Program Manual for the Wisconsin Package, Version 10.3, Accelrys, subsidiary of Pharmacopeia Inc.
Gilbert, D.G. (1993 [C release] and 1999 [Java release]) ReadSeq, public domain software distributed by the author. Bioinformatics Group, Biology Department, Indiana University, Bloomington, Indiana, U.S.A.
Gribskov, M., McLachlan, M., and Eisenberg, D. (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. U.S.A. 84,
Gupta, S.K., Kececioglu, J.D., and Schaffer, A.A. (1995) Improving the practical space and time efficiency of the shortest-paths approach to sum-of-pairs multiple sequence alignment. Journal of Computational Biology 2, 459–472.
Smith, R.F. and Smith, T.F. (1992) Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for comparative protein modelling. Protein Engineering 5, 35–41.
Swofford, D.L., PAUP (Phylogenetic Analysis Using Parsimony) ( ) Illinois Natural History Survey, (1994) personal copyright, and (1997) Smithsonian Institution, Washington D.C., U.S.A.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25, 4876–4882.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22,