Inter-species sequence conservation and intra-species sequence diversity
Apratim Mitra



Background
- Sequence conservation across species is well documented.
- Genes coding for the same or similar proteins show remarkable sequence similarity, even in evolutionarily distant organisms.

Background (Contd.)
- At the same time, proteins with similar functions can show a bewildering diversity, even within the same species.
- E.g., immunoglobulins (commonly called 'antibodies').

Aim of this project
- To demonstrate intra-species sequence diversity and inter-species sequence conservation using web-based resources and tools, e.g., NCBI, GenBank, ClustalW.
- To investigate a new way of visualizing multiple alignment results.

Cumulative alignment profile
- We obtain a pairwise alignment from an alignment program such as ClustalW or MUSCLE.
- Using BLOSUM/PAM substitution matrices and gap opening/extension penalties, we build a cumulative alignment score profile from that alignment.
- In addition to global sequence similarity, the profile carries spatial information, i.e., where along the alignment the similarity is gained or lost.
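The construction described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the simple match/mismatch scorer (a stand-in for a BLOSUM or PAM lookup), and the gap penalty values are all assumptions made for the example.

```python
from itertools import accumulate

def simple_score(x, y):
    # Hypothetical stand-in for a BLOSUM/PAM lookup: +1 match, -1 mismatch.
    return 1.0 if x == y else -1.0

def column_scores(aln_a, aln_b, score=simple_score,
                  gap_open=-2.0, gap_extend=-1.0):
    """Score each column of a gapped pairwise alignment ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    prev_gap = None  # which sequence carried a gap in the previous column
    for x, y in zip(aln_a, aln_b):
        if x == "-" and y == "-":
            scores.append(0.0)  # empty column: contributes nothing
        elif x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # The first column of a gap run pays the opening penalty,
            # later columns of the same run pay the extension penalty.
            scores.append(gap_extend if prev_gap == side else gap_open)
            prev_gap = side
        else:
            scores.append(score(x, y))
            prev_gap = None
    return scores

def cumulative_profile(aln_a, aln_b, **kwargs):
    """Running sum of column scores; the last value is the global score."""
    return list(accumulate(column_scores(aln_a, aln_b, **kwargs)))

profile = cumulative_profile("AC--EF", "ACGGEF")
print(profile)
```

The final element of the profile equals the usual global alignment score under the same scoring scheme, while the shape of the curve shows where that score was accumulated.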

Plan of Action
1. Pool sequences of the same or similar genes from different species, and of proteins (e.g., IgG) from the same species that exhibit diversity.
2. Run multiple alignment and clustering programs to obtain phylogenetic trees hinting at evolutionary relationships.
3. Transform the alignment results into a cumulative alignment profile that indicates spatial features.
4. Cluster these profiles using correlation measures and obtain phylogenetic trees.
5. Compare the two results.
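Step 4 could look roughly like the sketch below. The slides do not say which correlation measure or clustering method is used, so Pearson correlation and average-linkage (UPGMA-style) clustering are assumptions here, as are all function names. The sketch also assumes the profiles have already been brought to a common length.

```python
from math import sqrt

def pearson(p, q):
    """Pearson correlation between two equal-length profiles."""
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    vp = sum((a - mp) ** 2 for a in p)
    vq = sum((b - mq) ** 2 for b in q)
    if vp == 0 or vq == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(vp * vq)

def correlation_distance(p, q):
    # 0 for perfectly correlated profiles, 2 for perfectly anti-correlated.
    return 1.0 - pearson(p, q)

def average_linkage(profiles):
    """Cluster named profiles; returns a nested tuple describing the tree."""
    names = list(profiles)
    dist = {frozenset((a, b)): correlation_distance(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average distance over all cross-cluster member pairs.
                d = sum(dist[frozenset((a, b))] for a in mi for b in mj)
                d /= len(mi) * len(mj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Toy profiles, invented for illustration only.
toy = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8.5], "z": [4, 3, 2, 1]}
tree = average_linkage(toy)
print(tree)
```

The resulting tree can then be compared with the tree from step 2, which is the comparison step 5 asks for.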

Schematic
1. Collect sequences from online libraries.
2. Align using ClustalW, MUSCLE, etc.
3. Convert alignment scores into a 'profile' that indicates spatial information about the alignment.
4. Cluster these profiles and compare with the phylogenetic tree obtained at the earlier step.

Why do it?
- A global pairwise alignment score summarizes the entire alignment in a single number.
- An alignment profile that retains some spatial information would be a way of 'improving' interpretation.
- Sequences with a high degree of similarity can be differentiated on the basis of 'patterns' of dissimilarity in their profiles.
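A toy example may make the argument concrete. The per-column scores below are invented for illustration and do not come from any real alignment: two alignments reach the same global score, yet their cumulative profiles differ because the mismatches sit at opposite ends.

```python
from itertools import accumulate

# Hypothetical per-column scores for two alignments of equal length.
early_mismatch = [-1, -1, 1, 1, 1, 1]   # dissimilar region at the start
late_mismatch = [1, 1, 1, 1, -1, -1]    # dissimilar region at the end

profile_early = list(accumulate(early_mismatch))
profile_late = list(accumulate(late_mismatch))

# Identical global scores, so a single number cannot tell them apart ...
print(sum(early_mismatch), sum(late_mismatch))
# ... but the profiles show where each alignment loses score.
print(profile_early)
print(profile_late)
```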

Further Uses/Extensions
- This method could be useful when trying to find:
  - Differences between closely related species
  - Similarities between distant species
- It can be easily extended to multiple alignments, although the results might be hard to interpret.
- The cumulative profiles can also be analyzed with time-frequency methods such as Fourier transforms or wavelet analysis for feature extraction.
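As a rough sketch of the Fourier idea, the snippet below computes the magnitude spectrum of a score sequence with a naive discrete Fourier transform. The function name and the input are assumptions made for illustration. In practice a library FFT would be used, and applying the transform to per-column scores rather than to the trending cumulative sum is a choice made here, not something the slides specify.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitudes of the DFT for frequencies 0 .. N//2 (naive O(N^2))."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags

# Invented column scores that alternate match/mismatch with period 2.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
spectrum = dft_magnitudes(alternating)

# A strictly periodic pattern concentrates its energy in one frequency bin.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, spectrum[peak_bin])
```

Peaks in such a spectrum could serve as features when comparing profiles, alongside the correlation measures mentioned in the plan.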

Thank You