Speeding the Database Searches and Sequence Alignments with Multi-Motif PHI-BLAST Nitin Bhardwaj, Dept. of Chemical Engineering, IIT Bombay.


National Centre for Biological Sciences (NCBS), Bangalore. A unit of the Tata Institute of Fundamental Research (TIFR).

What is Sequence Alignment?
The process of lining up two or more sequences to achieve maximal levels of similarity.

Why make Sequence Alignments?
To detect:
- Structural and functional relationships
- Evolutionary relationships

Some Basic Terms
- Global alignment: covers the entire length of both sequences
- Local alignment: restricted to regions of identity and strong similarity
- Query sequence: the sequence of interest
- Subject sequence: the sequence the query is compared against
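The difference between global and local alignment can be sketched with the two classic dynamic-programming recurrences (Needleman-Wunsch and Smith-Waterman). This is a minimal illustration, not the code used in the talk: the scoring values (match +1, mismatch -1, linear gap -1) are assumptions, and real searches use a substitution matrix instead.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score aligning ALL of a against ALL of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere, which is what
            # restricts the result to the best-matching region.
            cur.append(max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    q, s = "XXXACGTYYY", "ZZACGTWW"
    print("global:", global_score(q, s))
    print("local: ", local_score(q, s))
```

The local score rewards only the shared region, while the global score is pulled down by the unrelated flanks.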

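And...
- Scoring matrices: used to score a match or mismatch
- True positive, true negative, false positive, false negative: the four possible outcomes when a reported hit is checked against the known relationships
- Motif: a short conserved region of a sequence
- Hits: sequences picked up from the database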
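The four outcome terms can be made concrete with a small sketch. The function name `confusion_counts` and the toy identifiers are hypothetical; the only assumption is that the set of true relatives of the query is known in advance.

```python
def confusion_counts(reported, true_relatives, database):
    """Count (TP, FP, FN, TN) for one search.

    reported       -- identifiers the search returned as hits
    true_relatives -- identifiers known to be related to the query
    database       -- all identifiers that were searched
    """
    reported, true_relatives, database = set(reported), set(true_relatives), set(database)
    tp = len(reported & true_relatives)             # related and reported
    fp = len(reported - true_relatives)             # reported but unrelated
    fn = len(true_relatives - reported)             # related but missed
    tn = len(database - reported - true_relatives)  # unrelated and not reported
    return tp, fp, fn, tn


if __name__ == "__main__":
    db = {"a", "b", "c", "d", "e"}
    print(confusion_counts(reported={"a", "c"}, true_relatives={"a", "b"}, database=db))
```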

What after alignment?
- Calculate the score of the alignment
- Sort the aligned sequences in order of decreasing score
- Go ahead with your analysis to find the relationships and similarities

Pattern-Hit Initiated Basic Local Alignment Search Tool (PHI-BLAST)
- Takes a query sequence, a motif, and a database to search
- Aligns the query sequence with all the database sequences that contain the motif
- Produces a score for each sequence
- Reports all the sequences whose score is above a particular threshold value, sorted by score
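The first step, selecting database sequences that contain the motif, can be sketched as a pattern match. The sketch below is an illustration only and not PHI-BLAST's own implementation: it translates a small subset of PROSITE-style syntax (bracketed residue sets, `x` wildcards, `(n)` and `(m,n)` repeats, optional `-` separators) into a Python regular expression. The helper names and the toy sequences are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a small subset of PROSITE-style syntax into a Python regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "-":                      # element separator, carries no meaning
            i += 1
        elif ch in "xX":                   # any residue
            out.append(".")
            i += 1
        elif ch == "[":                    # set of allowed residues
            j = pattern.index("]", i)
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == "(":                    # repeat count: (n) or (m,n)
            j = pattern.index(")", i)
            out.append("{" + pattern[i + 1:j] + "}")
            i = j + 1
        elif ch.isalpha():                 # a single fixed residue
            out.append(ch)
            i += 1
        else:
            raise ValueError("unsupported pattern character: %r" % ch)
    return "".join(out)


def motif_positions(seq, pattern):
    """1-based start positions of every (possibly overlapping) pattern match."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(seq)]


def candidates(database, pattern):
    """Names of database sequences that contain the pattern at least once."""
    return [name for name, seq in database.items() if motif_positions(seq, pattern)]


if __name__ == "__main__":
    pattern = "[RA][C][ACDEFGHIKLMNPQRSTVWY][C]"
    print(motif_positions("MSRCKCAA", pattern))
    print(candidates({"hitA": "GGACTCGG", "missB": "GGGGGGGG"}, pattern))
```

Only the sequences returned by `candidates` would go on to the alignment and scoring steps.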

A Typical PHI-BLAST Output

1 occurrence(s) of pattern in query
pattern [RA][C][ACDEFGHIKLMNPQRSTVWY][C] at position 3 of query sequence

Significant matches for pattern occurrence 1 at position 3
                                                                   Score (bits)   E Value
pdb|1ILP|A Chain A, Cxcr-1 N-Terminal Peptide Bound To Interleuk   e-37
pdb|1QE6|D Chain D, Interleukin-8 With An Added Disulfide Betwee   e-35
pdb|1ICW|A Chain A, Interleukin-8, Mutant With Glu 38 Replaced B   e-35
pdb|1ROD|A Chain A, Chimeric Protein Of Interleukin 8 And Human    e-28
pdb|1TVX|B Chain B, Neutrophil Activating Peptide-2 Variant Form   e-14
pdb|1NAP|A Chain A, Mol_id: 1; Molecule: Neutrophil Activating P   e-14
pdb|1MSG|A Chain A, Human Melanoma Growth Stimulatory Activity (   e-13
pdb|1MGS|A Chain A, Human Melanoma Growth Stimulating Activity (   e-13
pdb|1QNK|A Chain A, Truncated Human Grob[5-73], Nmr, 20 Structur   e-13
pdb|1MI2|A Chain A, Solution Structure Of Murine Macrophage Infl   e-12

Strategy behind PHI-BLAST
- Locate the motif in the two sequences
- Extend in both directions from the motif with local alignment
- Calculate the score for the alignment
[Figure: Motif (Query) placed against Motif (Subject), with extension in both directions]
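The seed-and-extend idea can be sketched as follows. This is a deliberately simplified illustration: the extension here is ungapped and uses assumed scores (match +1, mismatch -1), whereas a real implementation would use a substitution matrix and gapped extension. The function names are hypothetical.

```python
def extend(query, subject, qi, si, step, match=1, mismatch=-1):
    """Best cumulative score walking from (qi, si) in direction step (+1 or -1)."""
    best = score = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        best = max(best, score)   # keep the best stopping point seen so far
        qi += step
        si += step
    return best


def seeded_score(query, subject, q_start, s_start, motif_len, match=1, mismatch=-1):
    """Score an alignment anchored on a motif at 0-based q_start / s_start."""
    # Score the motif region itself, residue by residue.
    core = sum(
        match if q == s else mismatch
        for q, s in zip(query[q_start:q_start + motif_len],
                        subject[s_start:s_start + motif_len])
    )
    # Extend away from the motif on both sides.
    left = extend(query, subject, q_start - 1, s_start - 1, -1, match, mismatch)
    right = extend(query, subject, q_start + motif_len, s_start + motif_len, +1,
                   match, mismatch)
    return left + core + right


if __name__ == "__main__":
    # Both toy sequences carry a four-residue motif starting at index 2.
    print(seeded_score("MSRCKCAA", "TSRCTCAG", q_start=2, s_start=2, motif_len=4))
```

Anchoring on the motif means the dynamic-programming work is only spent on sequences, and positions, where the pattern already matches.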

Problems with PHI-BLAST
- Only one motif is accepted as input, so several motifs require several runs, which increases the time taken
- Consequently, there is no way to attach a weight to any motif
- No parallel comparison is possible
- No control over the specificity of the program

The Solution(s)!
- Multi-Motif PHI-BLAST (MMPB)
- Ranked Motif PHI-BLAST (RMPB)

Multi-Motif PHI-BLAST
- Takes a query sequence, any number of motifs, and a database to search
- Aligns the query sequence with all the database sequences that contain a minimum number of the motifs
- Produces a score for each sequence
- Reports all the sequences whose score is above a particular threshold value, sorted by score
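The selection rule, keeping only sequences that contain a minimum number of the supplied motifs, can be sketched in a few lines. This is an assumed reading of the slide, with motifs written directly as Python regular expressions for brevity; the function names and toy sequences are hypothetical.

```python
import re


def motifs_present(seq, motifs):
    """Return the motifs (regex strings) that occur at least once in seq."""
    return [m for m in motifs if re.search(m, seq)]


def multi_motif_filter(database, motifs, min_motifs):
    """Keep sequences containing at least min_motifs of the given motifs."""
    return {
        name: seq
        for name, seq in database.items()
        if len(motifs_present(seq, motifs)) >= min_motifs
    }


if __name__ == "__main__":
    motifs = ["[RA]C.C", "G[ST]P"]
    db = {"both": "MARCKCLLGSPK", "one": "MMACTCMM", "none": "MMMMMMMM"}
    print(sorted(multi_motif_filter(db, motifs, min_motifs=2)))
    print(sorted(multi_motif_filter(db, motifs, min_motifs=1)))
```

Raising `min_motifs` makes the search more specific, and lowering it makes the search more sensitive, all within a single pass over the database.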

Strategy behind MMPB
- Locate the motifs in the two sequences
- Extend outward in both directions with local alignment, and align the part in between the motifs with global alignment
- Calculate the score for the alignment
[Figure: Query and Subject, each with Motif 1 and Motif 2; outer flanks labelled (Local), region between the motifs labelled (Global)]
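For two motifs, the combined scheme can be sketched as: local-style extension on the outer flanks, residue-by-residue scoring of the motif regions, and a global alignment of the stretch between the motifs. This is an illustrative reading of the slide under assumed scores (match +1, mismatch -1, gap -1) with ungapped flank extension, and all function names are hypothetical.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for aligning all of a with all of b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def ungapped(a, b, match=1, mismatch=-1):
    """Position-by-position score of two equal-length motif occurrences."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def best_prefix_score(a, b, match=1, mismatch=-1):
    """Best score of any ungapped prefix pairing; 0 means no extension."""
    best = score = 0
    for x, y in zip(a, b):
        score += match if x == y else mismatch
        best = max(best, score)
    return best


def mmpb_score(query, subject, q_motifs, s_motifs):
    """q_motifs and s_motifs are ((start1, end1), (start2, end2)), half-open."""
    (q1s, q1e), (q2s, q2e) = q_motifs
    (s1s, s1e), (s2s, s2e) = s_motifs
    # Left flank: walk away from motif 1, so reverse both flanks first.
    left = best_prefix_score(query[:q1s][::-1], subject[:s1s][::-1])
    m1 = ungapped(query[q1s:q1e], subject[s1s:s1e])
    # The region between the motifs is aligned end to end (global).
    middle = global_score(query[q1e:q2s], subject[s1e:s2s])
    m2 = ungapped(query[q2s:q2e], subject[s2s:s2e])
    right = best_prefix_score(query[q2e:], subject[s2e:])
    return left + m1 + middle + m2 + right


if __name__ == "__main__":
    print(mmpb_score("MSRCKCLLGSPKA", "TSACTCLGTPKG",
                     q_motifs=((2, 6), (8, 11)), s_motifs=((2, 6), (7, 10))))
```

The global step is what ties the two motif anchors together: both ends of the middle region are fixed by the motifs, so every residue between them has to be accounted for.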

Comparison of Results: il8, Macrophage Inflammatory Protein 1beta (the middle columns correspond to PHI-BLAST (e=1) and the last one corresponds to MMPB)

il8 (1ikl) Interleukin-8

4helud (1bbh) Cytochrome c′

4helud (256b) Cytochrome b562

Flav (1ord) Ornithine Decarboxylase

Flav (1cus) Cutinase

Ranked Motif PHI-BLAST
- Takes a query sequence, a number of motifs in order of their rank, and a database to search
- Aligns the query sequence with all the database sequences that contain the minimum number of highest-ranked motifs
- Produces a score for each sequence
- Reports all the sequences whose score is above a particular threshold value, sorted by score
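One way to read the ranked rule is that a database sequence passes only if it contains all of the top `min_motifs` motifs in the ranked list. That interpretation is an assumption, as are the function name and the toy data. Motifs are written as Python regular expressions for brevity.

```python
import re


def ranked_motif_filter(database, ranked_motifs, min_motifs):
    """Keep sequences that contain every one of the top-ranked min_motifs motifs.

    ranked_motifs is ordered from highest rank to lowest.
    """
    required = ranked_motifs[:min_motifs]
    return {
        name: seq
        for name, seq in database.items()
        if all(re.search(m, seq) for m in required)
    }


if __name__ == "__main__":
    ranked = ["[RA]C.C", "G[ST]P", "KK"]          # highest rank first
    db = {
        "both": "MARCKCLLGSPK",   # has the two top-ranked motifs
        "one": "MMACTCMM",        # has only the top-ranked motif
        "low": "MMGSPKKMM",       # has two motifs, but not the top-ranked one
    }
    print(sorted(ranked_motif_filter(db, ranked, min_motifs=2)))
    print(sorted(ranked_motif_filter(db, ranked, min_motifs=1)))
```

Under this reading, the sequence `low` is rejected even though it contains two motifs, because rank decides which motifs are compulsory. That is how the ranking acts as a weight.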

Comparison of Results: il8 (1hum), Macrophage Inflammatory Protein 1beta (the unmarked columns correspond to RMPB with at least 3 and at least 2 motifs)

il8 (1ikl) Interleukin-8

The problems are solved!
- Multiple motifs can be given as input
- Weights can be attached to the motifs via their ranks
- Only one run is required for any number of motifs, so less time is needed
- A deeper analysis is possible

That's all, and thanks to all of you.