Bioinformatics Tutorial I: BLAST and Sequence Alignment

What is BLAST?
- An online tool from the National Center for Biotechnology Information (NCBI)
- A “Google” for protein and nucleotide sequences

What can you use BLAST for?
- Identify an unknown sequence
- Characterize the gene/protein of interest:
  – Function/activity (gene and protein)
  – Structure or shape (new protein)
  – Location or preferred location (protein)
  – Stability (gene/transcript or protein)
- Origin of a gene or protein

Sequence alignment approaches
1. Global alignment – Needleman and Wunsch, 1970
2. Local alignment (used in BLAST) – Smith and Waterman, 1981
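Both classic methods fill a dynamic-programming table. The minimal Python sketch below is an illustration, not either paper's exact formulation: it assumes a simple linear gap penalty (every gap position costs `gap`) and the +1/-2 nucleotide scores used later in this tutorial, and the function name `dp_alignment_score` is hypothetical. With `local=False` it follows the Needleman–Wunsch recurrence; with `local=True` it clamps cells at zero and keeps the best cell, as Smith–Waterman does. It returns only the score (no traceback).

```python
def dp_alignment_score(a, b, match=1, mismatch=-2, gap=-2, local=False):
    """Optimal alignment score of a and b with a linear gap penalty.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score for aligning a[:i] with b[:j]
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized along the first column/row
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       H[i - 1][j] + gap,    # a[i-1] against a gap
                       H[i][j - 1] + gap)    # b[j-1] against a gap
            if local:
                cell = max(cell, 0)          # a local alignment may start anywhere
                best = max(best, cell)
            H[i][j] = cell
    return best if local else H[rows - 1][cols - 1]


print(dp_alignment_score("TTACGTTT", "GGACGTGG"))              # -4 (global)
print(dp_alignment_score("TTACGTTT", "GGACGTGG", local=True))  # 4 (local: ACGT)
```

The global score is dragged down by the unrelated ends, while the local score reports only the shared `ACGT` segment. The table has len(a) × len(b) cells, so aligning a query exhaustively against every database sequence costs time proportional to the query length times the total database length.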

Global alignment
- One approach to searching with a query sequence is to align the entire sequence against every sequence in the database
- This approach is very slow and hence impractical for large databases

BLAST (Basic Local Alignment Search Tool)
- A much faster approach
- Divides your search query into short subsequences (“words”) and initially looks for word matches (exact matches for nucleotides; high-scoring word pairs for proteins)
- Once found, these word hits are extended to build local alignments
- Altschul, S.F. et al. Basic local alignment search tool. J Mol Biol. 215(3):403-10 (1990).
- Altschul, S.F. et al.
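The word step can be sketched in a few lines of Python. This is a toy illustration assuming exact word matching (as in nucleotide searches) against a single subject sequence; the helper names `words` and `exact_word_hits` are hypothetical, and real BLAST builds far more efficient lookup structures over whole databases.

```python
from collections import defaultdict


def words(seq, w):
    """All overlapping words (substrings) of length w, left to right."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def exact_word_hits(query, subject, w):
    """(query_pos, subject_pos) pairs where a length-w query word occurs exactly in subject."""
    index = defaultdict(list)          # word -> start positions in subject
    for j, word in enumerate(words(subject, w)):
        index[word].append(j)
    hits = []
    for i, word in enumerate(words(query, w)):
        for j in index.get(word, []):  # .get avoids adding empty entries
            hits.append((i, j))
    return hits


print(words("ACGTAC", 3))                       # ['ACG', 'CGT', 'GTA', 'TAC']
print(exact_word_hits("ACGTAC", "TTACGTT", 3))  # [(0, 2), (1, 3), (3, 1)]
```

Indexing the subject once and then looking up each query word is what makes this step fast compared with filling a full alignment table.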

BLAST algorithm
- The query sequence is split into short words
- Each word is then searched for in the database
- Word hits are extended in both directions to generate alignments with a score above the threshold score
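The extension step can be illustrated with a simplified, ungapped extension of one word hit. This is a sketch under stated assumptions: +1/-2 nucleotide scores, and a hypothetical `x_drop` rule that stops extending in a direction once the running score falls more than `x_drop` below the best score seen so far. All function names are illustrative, and gapped extension (used by later BLAST versions) is not shown.

```python
def _pair(a, b, match, mismatch):
    return match if a == b else mismatch


def _extend(query, subject, qi, si, step, match, mismatch, x_drop):
    """Walk from (qi, si) in direction step (+1 right, -1 left) without gaps.

    Returns (best_gain, length_at_best). Stops when a sequence ends or the
    running score drops more than x_drop below the best seen so far.
    """
    best, run, best_len, k = 0, 0, 0, 0
    while 0 <= qi + k * step < len(query) and 0 <= si + k * step < len(subject):
        run += _pair(query[qi + k * step], subject[si + k * step], match, mismatch)
        k += 1
        if run > best:
            best, best_len = run, k
        elif best - run > x_drop:
            break
    return best, best_len


def extend_hit(query, subject, qpos, spos, w, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of the hit query[qpos:qpos+w] / subject[spos:spos+w].

    Returns (score, q_start, q_end, s_start, s_end); ends are exclusive.
    """
    seed = sum(_pair(query[qpos + k], subject[spos + k], match, mismatch) for k in range(w))
    right, r_len = _extend(query, subject, qpos + w, spos + w, +1, match, mismatch, x_drop)
    left, l_len = _extend(query, subject, qpos - 1, spos - 1, -1, match, mismatch, x_drop)
    return (seed + left + right,
            qpos - l_len, qpos + w + r_len,
            spos - l_len, spos + w + r_len)


hsp = extend_hit("GGGACGTACGGG", "TTTACGTACTTT", qpos=3, spos=3, w=3)
print(hsp)  # (6, 3, 9, 3, 9): the segment ACGTAC in both sequences
```

An extended hit would then be kept only if its score reaches the chosen threshold, e.g. `if hsp[0] >= S` for some cutoff `S`.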

BLAST
“The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T.” – Altschul et al., 1990
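The threshold T means a query word can seed a hit without matching exactly: every word scoring at least T against it counts (its “neighborhood”). Protein BLAST scores word pairs with a substitution matrix; to keep this sketch small it assumes instead the toy +1/-2 nucleotide scores over the alphabet ACGT, so it shows how T controls neighborhood size, not how any BLAST program actually seeds. The function names are hypothetical.

```python
from itertools import product


def word_score(u, v, match=1, mismatch=-2):
    """Ungapped score of two equal-length words under a toy match/mismatch scheme."""
    return sum(match if a == b else mismatch for a, b in zip(u, v))


def neighborhood(word, T, alphabet="ACGT"):
    """All words of the same length that score at least T against `word`."""
    candidates = ("".join(p) for p in product(alphabet, repeat=len(word)))
    return [c for c in candidates if word_score(word, c) >= T]


print(len(neighborhood("ACG", T=3)))  # 1: only the exact word
print(len(neighborhood("ACG", T=0)))  # 10: exact word + 9 one-mismatch variants
```

Lowering T enlarges the neighborhood, which finds more distant similarities at the cost of more word hits to extend.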

How does BLAST work?

Step 1: Get your sequence
- From a database such as NCBI, UCSC, etc.
- From a sequencing facility (unknown gene)

Step 2: Choose BLAST program

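The different BLAST programs
- blastn (nucleotide BLAST): nucleotide query vs. nucleotide database
- blastp (protein BLAST): protein query vs. protein database
- blastx (translated BLAST): translated nucleotide query vs. protein database
- tblastn (translated BLAST): protein query vs. translated nucleotide database
- tblastx (translated BLAST): translated nucleotide query vs. translated nucleotide database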
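The choice among these programs depends only on whether the query and the database are nucleotide or protein, plus, for DNA against DNA, whether to compare the translations instead of the raw sequence. A minimal Python sketch of that decision; `choose_blast_program` is a hypothetical helper, not part of BLAST itself:

```python
def choose_blast_program(query_type, db_type, translate_both=False):
    """Pick a BLAST program from the molecule types.

    query_type, db_type: "nucleotide" or "protein".
    translate_both: for nucleotide vs. nucleotide, compare six-frame
    translations of both sides (tblastx) instead of the DNA itself (blastn).
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide":  # translated nucleotide query, protein database
        return "blastx"
    return "tblastn"                # protein query, translated nucleotide database


print(choose_blast_program("nucleotide", "protein"))  # blastx
```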

Simplified visualization

Why translate in 6 reading frames?
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
- Top strand frames (read 5’ to 3’): 5’ CAT CAA ..., 5’ ATC AAC ..., 5’ TCA ACT ...
- Bottom strand frames (read 5’ to 3’): 5’ GTG GGT ..., 5’ TGG GTA ..., 5’ GGG TAG ...
- A DNA sequence can code for six different proteins
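The six frames can be generated directly: three offsets on the top strand and three on its reverse complement. A minimal Python sketch using the standard genetic code (stop codons shown as `*`); the function names are illustrative and it assumes an uppercase ACGT-only sequence:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Bottom strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X', leftover bases are ignored."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map frame labels (+1..+3 top strand, -1..-3 bottom strand) to protein strings."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


seq = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
for name, protein in six_frames(seq).items():
    print(name, protein)
```

Frame -3 begins `G*` because its second codon is TAG, a stop, which is one reason most of the six frames of a real gene do not yield a long protein.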

Step 3: Search parameters
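The same search parameters can also be set on the command line with the NCBI BLAST+ tools, for example the expectation threshold (`-evalue`), the word size (`-word_size`) and the output format (`-outfmt`). The sketch below only assembles the argument list; the file and database names are hypothetical, and the 1e-5 cutoff is an arbitrary illustrative choice (BLAST+'s own default expectation threshold is 10).

```python
import shlex
from typing import List, Optional


def blast_command(program: str, query_fasta: str, database: str,
                  evalue: float = 1e-5, word_size: Optional[int] = None,
                  tabular: bool = True) -> List[str]:
    """Build (but do not run) a BLAST+ command line as an argument list."""
    cmd = [program, "-query", query_fasta, "-db", database, "-evalue", str(evalue)]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if tabular:
        cmd += ["-outfmt", "6"]  # tab-separated output, one line per alignment
    return cmd


# Hypothetical file and database names
cmd = blast_command("blastn", "my_query.fasta", "my_db", word_size=11)
print(shlex.join(cmd))
```

On a machine with BLAST+ installed and a formatted database, the list could be passed to `subprocess.run(cmd, check=True)`.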

Step 4: Search results

Important: Tabular output
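Tabular output (BLAST+ `-outfmt 6`) writes one tab-separated line per alignment, by default with 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal Python parser sketch; the `Hit` type and `parse_outfmt6` name are hypothetical, and the sample line is made up for illustration:

```python
import csv
import io
from typing import NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(handle):
    """Yield one Hit per line of default 12-column tabular output (skips blank/comment lines)."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield Hit(row[0], row[1], float(row[2]), *map(int, row[3:10]),
                  float(row[10]), float(row[11]))


# Made-up example line, not real BLAST output
example = "query1\tsubjectA\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t365\n"
hits = list(parse_outfmt6(io.StringIO(example)))
best = min(hits, key=lambda h: h.evalue)
print(best.sseqid, best.evalue)
```

Because every hit is a single line with fixed columns, tabular output is much easier to filter and sort programmatically than the pairwise text report.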

Score
- The sequence similarity score is calculated from the quality of the pairwise alignment
- The alignment score is the sum of the scores at each aligned position

Score
- Nucleotides: +1 for each match, -2 for each mismatch
- Peptides: each amino acid substitution is given a score from a substitution matrix (e.g. BLOSUM62)

Example
AACGTTTCCAGTCCAAATAGCTAGGC
===--===   =-===-==-======
AACCGTTC---TACAATTACCTAGGC
Hits (+1): 18
Misses (-2): 5
Gaps (existence -2, extension -1): 1, of length 3
Score = 18 * (+1) + 5 * (-2) - 2 - 2 = 4 (the length-3 gap costs -2 to open plus -1 for each of its 2 further positions)
David Fristrom, Introduction to BLAST
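The hand calculation can be checked with a short scoring function. This Python sketch assumes the gap convention implied by the example's arithmetic (the first gap position costs the existence penalty, each further position of the same gap costs the extension penalty). NCBI BLAST itself charges existence plus one extension for every gap position (a gap of length k costs a + bk), under which this gap would cost 5 and the total would be 3. `score_alignment` is a hypothetical helper.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, gap_open=-2, gap_extend=-1):
    """Score a gapped alignment given as two equal-length rows ('-' marks a gap).

    The first position of each gap costs gap_open; each further position
    of the same gap costs gap_extend.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score, in_gap = 0, False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


top = "AACGTTTCCAGTCCAAATAGCTAGGC"
bottom = "AACCGTTC---TACAATTACCTAGGC"
print(score_alignment(top, bottom))  # 4 = 18 - 10 - 2 - 2
```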

E-value
- E-value (expectation value): the number of different alignments with a score equal to or better than the observed one that would be expected to occur by chance alone when searching the database
- A low E-value suggests the sequences may be homologous
- Statistical significance depends on:
  – The length of the query sequence
  – The size of the sequence database
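Both dependencies appear in the Karlin–Altschul formula E = K · m · n · e^(−λS), where m is the query length, n the database length, S the raw score, and λ and K are constants determined by the scoring system. The Python sketch below uses placeholder values for λ and K chosen only for illustration; real BLAST derives them from the scoring scheme and also corrects m and n for edge effects.

```python
import math


def expected_hits(score, query_len, db_len, lam, K):
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return K * query_len * db_len * math.exp(-lam * score)


# Placeholder constants for illustration only; not real BLAST parameters
LAM, K = 0.3, 0.1

e1 = expected_hits(50, query_len=300, db_len=10**9, lam=LAM, K=K)
e2 = expected_hits(50, query_len=300, db_len=2 * 10**9, lam=LAM, K=K)  # database twice as big
e3 = expected_hits(60, query_len=300, db_len=10**9, lam=LAM, K=K)      # higher score
print(f"{e1:.3g} {e2:.3g} {e3:.3g}")
```

Doubling the database doubles E, while raising the score lowers E exponentially, so the same alignment score is less significant in a larger search space.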

Graphical output

Taxonomy Results

Graphical output

References
Figures and text adapted from the following sources:
– David Fristrom, Introduction to BLAST
– Jonathan Pevsner, BLAST: Basic local alignment search tool
– Joanne Fox, BLAST: Finding function by sequence similarity