NCBI Review Concepts 20040715 Chuong Huynh

Presentation transcript:

NCBI Review Concepts Chuong Huynh

Pairwise Sequence Alignments
Purpose:
  identification of sequences with significant similarity to (a) sequence(s) in a sequence repository
  identification of all homologous sequences in the repository
  identification of domains with sequence similarity
Terminology:
  Global alignment
  Local alignment

Terminology: Global Alignment
  Finds the optimal alignment over the entire length of the two compared sequences
  Unlikely to detect genes that have evolved by recombination (e.g. domain shuffling) or insertion/deletion of DNA
  Suitable for sequences of homologous molecules
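A global alignment score can be computed with Needleman-Wunsch style dynamic programming. The sketch below is only an illustration: the function name and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions and do not come from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best end-to-end alignment of a and b (linear gap cost)."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    # Both sequences must be consumed completely, so the answer is the last cell.
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("AAAA", "AA"))
```

Because every residue of both sequences has to be placed, unmatched flanking regions are always penalised, which is why this approach suits sequences that are related along their whole length.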

Terminology: Local Alignment
  Finds short regions of similarity between a pair of sequences
  Compared sequences can receive high local similarity scores without needing high levels of similarity over their entire length
  Useful when looking for domains within proteins or for regions of genomic DNA that contain coding exons
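Local alignment can be sketched with Smith-Waterman style dynamic programming, where a cell is never allowed to drop below zero so an alignment may start and stop anywhere. The function name and scoring values (match +2, mismatch -1, linear gap -2) are illustrative assumptions, not values from the slides.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best score over all pairs of substrings of a and b (linear gap cost)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a new alignment start here instead of carrying a bad prefix.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Two sequences that share only the segment GATTACA.
print(local_alignment_score("CCCCGATTACACCCC", "TTGATTACATT"))
```

The dissimilar flanks cost nothing here, which is the property that makes local alignment suitable for finding a shared domain inside otherwise unrelated sequences.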

An alignment that BLAST can't find
  1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    || | || || || |  || || ||   ||  |  ||| |||||| | | || | ||| |
  1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG
 61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
    | || ||     ||  ||| ||  | |||||| || | ||||||  |||||  |     |
 61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT
121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
    |||| || ||||| ||  ||    |  | ||||  || |||
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC
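The slide does not state why BLAST misses this alignment. A common explanation is seeding: a nucleotide BLAST search starts from an exact word match, and the sketch below assumes a word size of 11 (a typical blastn default, not a value given in the slides). It measures the longest run of identical aligned bases in the two sequences shown; the helper name is hypothetical.

```python
QUERY = (
    "GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG"
    "GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT"
    "GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC"
)
SUBJECT = (
    "GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG"
    "GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT"
    "GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC"
)
WORD_SIZE = 11  # assumed seed length; adjust to the settings actually used


def longest_identical_run(a, b):
    """Longest stretch of consecutive identical positions in a gapless alignment."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


run = longest_identical_run(QUERY, SUBJECT)
identities = sum(x == y for x, y in zip(QUERY, SUBJECT))
print(f"longest exact run: {run}, identities: {identities}/{len(QUERY)}")
print("seed possible" if run >= WORD_SIZE else "no exact word long enough to seed")
```

Under that assumption the alignment never offers a word long enough to seed an extension, even though the sequences are clearly similar overall.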

BLAST Selection Matrix

Choosing the Right BLAST Flavor for Proteins
What you want to do | The right BLAST flavor
Find out something about the function of the protein | Use blastp to compare your protein with other proteins contained in the databases.
Discover new genes encoding similar proteins | Use tblastn to compare your protein with DNA sequences translated into their 6 possible reading frames.
Claverie & Notredame 2003

Choosing the Right BLAST Flavor for DNA
Question | Answer
Am I interested in non-coding DNA? | Yes: use blastn. Remember: blastn is only for closely related DNA sequences (more than 70% identical).
Do I want to discover new proteins? | Yes: use tblastx.
Do I want to discover proteins encoded in my query DNA sequences? | Yes: use blastx.
Am I unsure of the quality of my DNA? | Yes: use blastx, especially if you suspect your DNA sequence codes for a protein but may contain sequencing errors.
Claverie & Notredame 2003

Choosing the Right BLAST Flavor for DNA Sequences
Usage | Query | Database | Program
Find very similar DNA sequence | DNA | DNA | blastn
Protein discovery and ESTs | Translated DNA | Translated DNA | tblastx
Analysis of query DNA sequence | Translated DNA | Protein | blastx
Claverie & Notredame 2003
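The selection rules in the tables above amount to a lookup on query type and database type. The sketch below encodes them; the function name, the `"dna"`/`"protein"` labels and the `translate_database` flag are hypothetical conveniences, while the five program names are the ones named in the slides.

```python
def choose_blast_program(query_type, database_type, translate_database=False):
    """Pick a BLAST program from the query and database sequence types.

    translate_database only matters for DNA vs DNA: when True, both sides are
    compared as translated proteins (tblastx) instead of as nucleotides (blastn).
    """
    key = (query_type.lower(), database_type.lower())
    if key == ("protein", "protein"):
        return "blastp"
    if key == ("protein", "dna"):
        return "tblastn"   # database is translated in six frames
    if key == ("dna", "protein"):
        return "blastx"    # query is translated in six frames
    if key == ("dna", "dna"):
        return "tblastx" if translate_database else "blastn"
    raise ValueError(f"unknown sequence types: {key}")


print(choose_blast_program("dna", "protein"))
print(choose_blast_program("dna", "dna", translate_database=True))
```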

BLAST Tips
  It is faster and more accurate to BLAST proteins (blastp) rather than nucleotides. If in doubt, use blastp.
  When possible, restrict the search to the subset of the database you are interested in.
  Look around for the database you need, or create your own custom BLAST database. BUT HOW???
  When is the best time to use the BLAST server?
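One possible answer to the "BUT HOW???" question, assuming the NCBI BLAST+ command-line suite is installed locally (the slides do not say which tool was demonstrated): its `makeblastdb` program formats a FASTA file as a searchable database. The sketch below only assembles and, on request, runs that command; the file and database names are hypothetical.

```python
import shutil
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="prot"):
    """Build the argument list for BLAST+ makeblastdb (dbtype: 'prot' or 'nucl')."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def build_database(fasta_path, db_name, dbtype="prot"):
    """Run makeblastdb, failing clearly if BLAST+ is not on the PATH."""
    cmd = makeblastdb_command(fasta_path, db_name, dbtype)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("makeblastdb not found; is BLAST+ installed?")
    subprocess.run(cmd, check=True)


# Hypothetical file names, shown without executing anything.
print(" ".join(makeblastdb_command("my_proteins.fasta", "my_proteins_db")))
```

Calling `build_database("my_proteins.fasta", "my_proteins_db")` would create the database files, which can then be passed to a local search with the `-db` option of the BLAST+ programs.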

Asking Biological Problems with BLAST
Finding genes in a genome
  General (but more complicated) computational method: Run gene prediction software or an ORF Finder (for bacteria).
  Using BLAST: Cut your genome sequence into little (2-5 kb) overlapping sequences. Use blastx to BLAST each piece of genome against NR (nonredundant protein db). Works better for sequences with no introns (bacteria).
Predicting protein function
  General (but more complicated) computational method: Domain analysis or wet-lab experimentation.
  Using BLAST: Use blastp to BLAST your protein sequence against SWISS-PROT (future = UniProt). If you get a good hit (more than 25% identity) over the complete length of the protein, then your protein has the same function as the SWISS-PROT protein.
Predicting protein 3-D structure
  General (but more complicated) computational method: Homology modeling, X-ray, NMR analysis of protein of interest.
  Using BLAST: Use blastp to BLAST your protein against PDB (protein structure DB). If you get a hit with >25% identity, then your protein and the good hit(s) have a similar 3-D structure.
Finding protein family members
  General (but more complicated) computational method: Clone new family members using PCR techniques.
  Using BLAST: Use blastp (or better, PSI-BLAST) and run against NR (nonredundant protein db). After you have all members of the family, you can make a multiple sequence alignment and then a phylogenetic tree.
Claverie & Notredame 2003
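The "cut your genome into little overlapping sequences" step for gene finding can be sketched as follows. The 5 kb piece size sits inside the 2-5 kb range from the slide; the 500-base overlap and the function name are assumptions chosen only for illustration.

```python
def overlapping_chunks(sequence, size=5000, overlap=500):
    """Yield (start, piece) pairs covering sequence with overlapping windows."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not sequence:
        return
    step = size - overlap
    # Stop once the previous window already reached the end of the sequence.
    for start in range(0, max(len(sequence) - overlap, 1), step):
        yield start, sequence[start:start + size]


genome = "ACGT" * 3000  # toy 12 kb sequence standing in for a real genome
for start, piece in overlapping_chunks(genome):
    print(start, len(piece))
```

Each piece could then be written out as its own FASTA record and submitted to blastx. The overlap keeps a gene that straddles a cut point intact in at least one piece, provided the overlap is long enough.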

BLAST and PSI-BLAST Servers on the Internet
Country | Program | URL
USA | BLAST/PSI-BLAST |
USA | BLAST | http://genome.wustl.edu/gsc/BLAST
Europe | BLAST | http:// BLAST.html
Europe | BLAST | http://
Japan | BLAST/PSI-BLAST | homology.html

Common Mistake
Seq1 has domains A and B; Seq2 has domain A and Seq3 has domain B.
  Sequence 1: AAAAAABBBBBB
  Sequence 2: AAAAAA
  Sequence 3: BBBBBB
Use Seq1 as the query sequence. What happens?
  Both hits may be highly significant (very low E-values) if domains A and B are long and well conserved.
  Seq1 is homologous to Seq2 and Seq3, but remember that Seq1 is not homologous to either over its entire length.
  Just don't depend on the E-value.
  "BLAST hits are not transitive, unless the alignments are overlapping"
  Most proteins have more than one domain, so be careful when looking at BLAST results: not all reported hits belong to the same big family.
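The overlap condition in the quoted rule can be checked from the query coordinates of two hits. The sketch below uses 1-based inclusive coordinates and the toy sequences from this slide; the function name and the `min_overlap` parameter are hypothetical.

```python
def hits_overlap_on_query(hit_a, hit_b, min_overlap=1):
    """True if two hits cover a shared stretch of the query.

    Each hit is a (query_start, query_end) pair, 1-based and inclusive.
    """
    start = max(hit_a[0], hit_b[0])
    end = min(hit_a[1], hit_b[1])
    return end - start + 1 >= min_overlap


# Query Seq1 = AAAAAABBBBBB: Seq2 matches domain A, Seq3 matches domain B.
seq2_hit = (1, 6)
seq3_hit = (7, 12)
print(hits_overlap_on_query(seq2_hit, seq3_hit))
```

Here the two hits do not overlap on the query, so nothing can be inferred about a relationship between Seq2 and Seq3 from these results alone.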

Alternative Methods for Homology Searches
  Smith-Waterman (ssearch): slower but more accurate
  FASTA: slower than BLAST, but more accurate when making DNA comparisons
  BLAT: for locating cDNA in a genome or finding close proteins in a genome

Common Questions
Q: When I run a BLAST job with the same query sequence using WU-BLAST and NCBI BLAST, why do I get different results?
A: Both are based on the same algorithm but are different implementations. The difference is usually due to slight variation in the database version, although differences in BLAST program version also play a minor role. The results usually do not change dramatically, but they do change a bit.

Basic Gene Prediction Flow Chart
Obtain new genomic DNA sequence
  1. Translate in all six reading frames and compare to protein sequence databases
  2. Perform database similarity search of expressed sequence tag (EST) database of same organism, or cDNA sequences if available
Use gene prediction program to locate genes
Analyze regulatory sequences in the gene
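Step 1 of the flow chart, translation in all six reading frames, can be sketched in a few lines. This assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names are illustrative and codons containing any other character are written as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


for frame, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(frame, protein)
```

Each of the six translations would then be compared against a protein database, which is what blastx does internally for a DNA query.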

The Annotation Process
[Diagram labels: DNA sequence, analysis software, useful information, annotator]

Annotation Process
[Diagram labels]
  Input: DNA sequence
  DNA-level tools: RepeatMasker, Blastn, Halfwise, Blastx, gene finders, tRNA scan
  Features: repeats, promoters, pseudo-genes, rRNA, genes, tRNA
  Protein-level tools: Fasta, BlastP, Pfam, Prosite, Psort, SignalP, TMHMM

How do I do large-scale genome analysis? Read Koonin's book on NCBI Bookshelf

TaxPlot is a tool for three-way comparisons of genomes on the basis of the protein sequences they encode. Demo: TaxPlot

Demo: VecScreen