Sequence Analysis

Determining how similar two (or more) gene or protein sequences are to each other is a “staple” function in bioinformatics. This information is used to:
1) Identify genes and proteins
2) Infer gene/protein function
3) Measure genetic distance
The entire exercise relies on the comparison between two (or more) sequences, and is independent of any functional content within the sequence(s).

In “pairwise” analysis and “multiple sequence alignments”, two (or more) sequences are compared to each other and a similarity measurement is derived. This process is completely computational and requires no database query. From this process we can:
1) Identify common regions of sequence identity (infer function).
2) Rank-order multiple sequences to identify those that are most similar (measure genetic distance).

In “sequence identification”, we compare our sequence(s) of interest to an entire database of (known) sequences, and identify the database sequences that are most similar to our sequence of interest.

Theoretical Basis of Pairwise Sequence Analysis

Needleman-Wunsch Algorithm: Global Alignment (the entire sequence contributes to the alignment)

Fundamental Principle: calculate the alignment score across two sequences. All possible pairs of positions (one from each sequence) are represented by the cells of a two-dimensional array, and all possible comparisons are represented by pathways through the array.

This is an instance of dynamic programming: solving a series of smaller subproblems, and reusing their stored results, to solve the entire problem (loosely, “divide and conquer”).
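The array-and-pathway idea above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, a flat -1 per gap position) are assumptions chosen to keep the example small.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill step: each cell reuses three previously solved subproblems.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Traceback: start at the far corner and follow one optimal pathway back.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

The fill step visits each of the len(a) x len(b) cells once, which is where the efficiency gain over enumerating every pathway comes from. When several pathways tie, this sketch returns only one of them.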

DYNAMIC PROGRAMMING and SEQUENCE ALIGNMENTS

'Dynamic programming' is an efficient programming technique for solving certain combinatorial problems. It is particularly important in bioinformatics as it is the basis of sequence alignment algorithms for comparing protein and DNA sequences. In the bioinformatics application, dynamic programming gives a spectacular efficiency gain over a purely recursive algorithm.

Don't expect much enlightenment from the etymology of the term 'dynamic programming,' though. Dynamic programming was formalized in the early 1950s by mathematician Richard Bellman, who was working at RAND Corporation on optimal decision processes. He wanted to concoct an impressive name that would shield his work from US Secretary of Defense Charles Wilson, a man known to be hostile to mathematics research. His work involved time series and planning, thus 'dynamic' and 'programming' (note, nothing particularly to do with computer programming). Bellman especially liked 'dynamic' because "it's impossible to use the word dynamic in a pejorative sense"; he figured dynamic programming was "something not even a Congressman could object to."

Alignment of 2 “Sequences” (words for demo purposes): OFFICEUNIVERSITY and COFFEEICEVARSITY

“Ungapped Alignment”

    OFFICEUNIVERSITY
      |  |   | |||||
    COFFEEICEVARSITY

    -OFFICEUNIVERSITY
     |||
    COFFEEICEVARSITY
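An ungapped comparison amounts to sliding one word along the other and counting identical positions at each offset. The helper below is a small sketch of that idea; the name `ungapped_matches` and the `shift` convention are illustrative assumptions.

```python
def ungapped_matches(a, b, shift=0):
    """Count identical positions when a is shifted right by `shift` relative to b.

    With this (assumed) convention, a[i] is compared with b[i + shift];
    a negative shift moves a to the left instead.
    """
    count = 0
    for i, ch in enumerate(a):
        j = i + shift
        if 0 <= j < len(b) and b[j] == ch:
            count += 1
    return count


top, bottom = "OFFICEUNIVERSITY", "COFFEEICEVARSITY"
print(ungapped_matches(top, bottom, 0))  # 8 identical positions, no shift
print(ungapped_matches(top, bottom, 1))  # 3 identical positions (the leading OFF)
```

Neither offset captures OFF, ICE and the shared ending at the same time, which is what motivates allowing gaps.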

Alignment of 2 “Sequences” (words for demo purposes): OFFICEUNIVERSITY and COFFEEICEVARSITY

“Gapped Alignment”

    -OFF--ICEUNIVERSITY
     |||  |||   | |||||
    COFFEEICE---VARSITY

If gaps at any position (and of any length) are allowed, the process becomes computationally expensive, and in many cases the alignment does not provide meaningful information. Hence gaps must be limited to a useful and manageable number.

Dynamic Programming (Initialization Step)

[Figure: empty alignment matrix with OFFICEUNIVERSITY along one axis and COFFEEICEVARSITY along the other]

Gap Penalties:
1) Reduce the number of gaps in the alignment
2) Ensure a more meaningful alignment
3) Opening a gap is costly
4) Extending a gap is cheap

Gap opening penalty: should be 2 to 3 times the magnitude of the most negative value in the substitution matrix being used.
Gap extension penalty: should be 0.1 to 0.3 times the value of the gap opening penalty.
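The "opening is costly, extending is cheap" rule is usually called an affine gap penalty. The sketch below scores an already-gapped alignment under that rule. The numbers are illustrative assumptions (match +1, mismatch -1, gap opening 3, gap extension 0.3, i.e. 0.1 times the opening penalty), and so is the convention that the first position of a gap pays only the opening penalty; other tools define this differently.

```python
def affine_gap_cost(length, gap_open=3.0, gap_extend=0.3):
    """Cost of one gap of the given length: open once, then extend.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


def score_alignment(top, bottom, match=1.0, mismatch=-1.0,
                    gap_open=3.0, gap_extend=0.3):
    """Score two equal-length aligned strings that use '-' for gaps."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0.0
    prev_gap = None  # which row ("top"/"bottom") the previous column's gap was in
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            # Continuing a gap in the same row is cheap; starting one is costly.
            score -= gap_extend if prev_gap == row else gap_open
            prev_gap = row
        else:
            score += match if x == y else mismatch
            prev_gap = None
    return score


print(score_alignment("OFFICEUNIVERSITY", "COFFEEICEVARSITY"))
print(score_alignment("-OFF--ICEUNIVERSITY", "COFFEEICE---VARSITY"))
```

Under these assumed values the gapped alignment (12 matches, 1 mismatch, three gaps of lengths 1, 2 and 3) scores about 1.1, against 0 for the unshifted ungapped alignment, so the three gaps pay for themselves here.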

[Figure: successive fill steps of the dynamic programming matrix for OFFICEUNIVERSITY and COFFEEICEVARSITY]

Full-length alignment, followed by its two well-matched segments:

    -OFF--ICEUNIVERSITY
     |||  |||   | |||||
    COFFEEICE---VARSITY

    -OFF--ICE
     |||  |||
    COFFEEICE

    VERSITY
    | |||||
    VARSITY

Theoretical Basis of Pairwise Sequence Analysis

Smith-Waterman Algorithm: Local Alignment

Fundamental Principle: based on Needleman-Wunsch, but compares segments of all possible lengths and chooses whichever segments optimize the similarity measure. This allows the user to search for conserved or functional domains within sequences.

Functionally, a global alignment is traced back from the far corner of the alignment matrix, so it spans both sequences end to end, whereas a local alignment reports only the region that aligns well (its traceback starts at the highest-scoring cell and stops where the score falls to zero).
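A minimal sketch of the local variant follows. It differs from a global alignment in two places: cell scores are never allowed to drop below zero, and the traceback starts at the best cell rather than the far corner. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, flat gap -2) are assumptions for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b; returns (score, segment_a, segment_b).

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]   # first row/column stay at 0
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            # The extra 0 lets an alignment restart anywhere in the matrix.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback from the highest-scoring cell until the score reaches zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXXGATTACAYYY", "ZZGATTACAWW"))  # (14, 'GATTACA', 'GATTACA')
```

Only the shared GATTACA core is reported; the unrelated flanking letters contribute nothing because the score is reset to zero around them.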

Pairwise Alignment
  Process: compares 2 sequences
  Objective / Application: find common sequence motifs

Multiple Alignments
  Process: compares 3 or more sequences
  Objective / Application: find common sequence motifs; rank based on alignment scores

Sequence Searching
  Process: compares 1 sequence against thousands
  Objective / Application: sequence identification; comparative genomics

BLAST (Basic Local Alignment Search Tool)

Why is BLAST so fast? Because every possible 11-letter word is pre-indexed against the database records that contain it.

EXAMPLE: “AGTGTCGATCG”

Steps:
1) Find all the 11-letter words in your query sequence, plus a few variations.
2) Look these up in the 11-letter-word index.
3) Retrieve all sequences containing those words.
4) Use a rigorous algorithm (e.g. Smith-Waterman) to extend the match in both directions.
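Steps 1 to 3 can be sketched with an ordinary dictionary acting as the word index. This is a toy illustration of the indexing idea only, not BLAST's actual data structures: the function names, the record IDs and the tiny in-memory "database" are all hypothetical. It looks up exact words only, leaving out the "few variations" from step 1 and the extension in step 4.

```python
from collections import defaultdict


def build_word_index(database, k=11):
    """Map every k-letter word to the (record_id, position) pairs where it occurs.

    `database` is a hypothetical dict of record_id -> sequence string.
    """
    index = defaultdict(list)
    for rec_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((rec_id, pos))
    return index


def find_seeds(query, index, k=11):
    """Return (record_id, query_pos, db_pos) for each exact k-letter word hit."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        word = query[qpos:qpos + k]
        for rec_id, dpos in index.get(word, []):
            seeds.append((rec_id, qpos, dpos))
    return seeds


# Hypothetical two-record database; built once, reused for every query.
database = {
    "rec1": "TTTAGTGTCGATCGAAA",
    "rec2": "CCCCCCCCCCCCCCC",
}
index = build_word_index(database)
print(find_seeds("AGTGTCGATCG", index))  # [('rec1', 0, 3)]
```

The expensive pass over the database happens once, when the index is built. Each query then costs only a handful of dictionary lookups, and the rigorous alignment in step 4 is run solely around the seed positions returned.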