NCBI Review Concepts 20040715 Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)


1 NCBI Review Concepts 20040715 Chuong Huynh

2 NCBI Pairwise Sequence Alignments
Purpose:
identification of sequences with significant similarity to (a) sequence(s) in a sequence repository
identification of all homologous sequences in the repository
identification of domains with sequence similarity
Terminology:
Global alignment
Local alignment

3 NCBI Terminology: Global Alignment
Finds the optimal alignment over the entire length of the two compared sequences.
Unlikely to detect genes that have evolved by recombination (e.g. domain shuffling) or by insertion/deletion of DNA.
Suitable for sequences of homologous molecules.

4 NCBI Terminology: Local Alignment
Finds short regions of similarity between a pair of sequences.
The compared sequences can receive high local similarity scores without needing high levels of similarity over their entire length.
Useful when looking for domains within proteins or for regions of genomic DNA that contain coding exons.
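The difference between the two alignment types can be sketched with a small dynamic-programming example. This is a minimal illustration, not the method any NCBI tool uses: the function name `alignment_scores` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for readability. The global score follows the Needleman-Wunsch recurrence and the local score follows the Smith-Waterman recurrence.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for two sequences.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # Global (Needleman-Wunsch): gaps at the sequence ends are penalised,
    # so the alignment must span both sequences completely.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    # Local (Smith-Waterman): a cell never drops below zero, so an
    # alignment can start and end anywhere inside the sequences.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    return glob[-1][-1], best_local
```

With these assumed scores, `alignment_scores("GGGGGGACGTAC", "ACGTAC")` returns `(-6, 6)`: the shared six-base region scores well locally, while the global score is dragged down by the six positions that have to be aligned against gaps.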

5 NCBI An alignment that BLAST can’t find

  1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    || | || || || |  || || ||   ||  |  ||| |||||| | | || | ||| |
  1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

 61 GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
    | || ||     ||  ||| ||  | |||||| || | ||||||  |||||  |     |
 61 GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
    |||| || ||||| ||  ||    |  | ||||  || |||
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC
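One way to see why a seed-based search misses this alignment is to count identical positions and measure the longest run of consecutive matches. The sketch below does that for the two sequences on the slide. The helper name `ungapped_stats` is hypothetical, and the seed length of 11 is an assumption (a commonly cited blastn word size); the slide itself does not state which word size it has in mind.

```python
# The two 161-base sequences from the slide, aligned without gaps.
QUERY = (
    "GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG"
    "GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT"
    "GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC"
)
SUBJECT = (
    "GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG"
    "GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT"
    "GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC"
)

ASSUMED_WORD_SIZE = 11  # assumption: seed length for a nucleotide search


def ungapped_stats(a, b):
    """Return (identical positions, longest run of consecutive matches)."""
    if len(a) != len(b):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    matches = longest = run = 0
    for x, y in zip(a, b):
        if x == y:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return matches, longest


matches, longest = ungapped_stats(QUERY, SUBJECT)
identity = 100.0 * matches / len(QUERY)
seed_possible = longest >= ASSUMED_WORD_SIZE
```

For these sequences the count is 98 identical positions out of 161 (about 61% identity) with a longest exact run of 6 bases, so `seed_possible` is `False` under the assumed word size: there is no exact word long enough to start an extension.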

6 NCBI BLAST Selection Matrix

7 NCBI Choosing The Right BLAST Flavor for Proteins
What you want to do: Find out something about the function of the protein.
The right BLAST flavor: Use blastp to compare your protein with other proteins contained in the databases.

What you want to do: Discover new genes encoding similar proteins.
The right BLAST flavor: Use tblastn to compare your protein with DNA sequences translated into their 6 possible reading frames.

Claverie & Notredame 2003

8 NCBI Choosing the Right BLAST Flavor for DNA
Question: Am I interested in non-coding DNA?
Answer: Yes, use blastn. Remember: blastn is only for closely related DNA sequences (more than 70% identical).

Question: Do I want to discover new proteins?
Answer: Yes, use tblastx.

Question: Do I want to discover proteins encoded in my query DNA sequences?
Answer: Yes, use blastx.

Question: Am I unsure of the quality of my DNA?
Answer: Yes, use blastx, especially if you suspect your DNA sequence codes for a protein but may contain sequencing errors.

Claverie & Notredame 2003

9 NCBI Choosing The Right BLAST Flavor for DNA Sequences
Usage                            Query            Database         Program
Find very similar DNA sequence   DNA              DNA              blastn
Protein discovery and ESTs       Translated DNA   Translated DNA   tblastx
Analysis of query DNA sequence   Translated DNA   Protein          blastx
Claverie & Notredame 2003
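The flavor tables on the last three slides amount to a lookup on the type of the query and the type of the database. A minimal sketch of that lookup follows; the function name `choose_blast` and the `translate_both` flag are hypothetical, introduced only for illustration, while the five program names are the ones listed on the slides.

```python
# (query type, database type) -> program, as listed on the slides.
BLAST_FLAVORS = {
    ("protein", "protein"): "blastp",   # protein vs protein databases
    ("protein", "dna"): "tblastn",      # protein vs translated DNA database
    ("dna", "dna"): "blastn",           # closely related DNA sequences
    ("dna", "protein"): "blastx",       # translated DNA query vs proteins
}


def choose_blast(query_type, database_type, translate_both=False):
    """Suggest a BLAST program for a query/database combination.

    translate_both=True selects tblastx, where a DNA query and a DNA
    database are both compared at the protein level.
    """
    key = (query_type.lower(), database_type.lower())
    if translate_both:
        if key != ("dna", "dna"):
            raise ValueError("tblastx needs a DNA query and a DNA database")
        return "tblastx"
    try:
        return BLAST_FLAVORS[key]
    except KeyError:
        raise ValueError(
            f"no BLAST flavor for {key[0]!r} query vs {key[1]!r} database"
        ) from None
```

For example, `choose_blast("dna", "protein")` returns `"blastx"`, matching the slide's advice for DNA that may contain sequencing errors.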

10 NCBI BLAST Tips
It is faster and more accurate to BLAST proteins (blastp) than nucleotides. If in doubt, use blastp.
When possible, restrict the search to the subset of the database you are interested in.
Look around for the database you need, or create your own custom BLAST database. But how?
When is the best time to use the BLAST server?

11 NCBI Asking Biological Problems with BLAST
What you want to do: Finding genes in a genome.
General (but more complicated) computational method: Run gene prediction software or an ORF Finder (for bacteria).
Using BLAST: Cut your genome sequence into small (2-5 kb) overlapping sequences. Use blastx to BLAST each piece of genome against NR (nonredundant protein database). Works better for sequences with no introns (bacteria).

What you want to do: Predicting protein function.
General (but more complicated) computational method: Domain analysis or wet-lab experimentation.
Using BLAST: Use blastp to BLAST your protein sequence against SWISS-PROT (future = UniProt). If you get a good hit (more than 25% identity) over the complete length of the protein, then your protein likely has the same function as the SWISS-PROT protein.

What you want to do: Predicting protein 3-D structure.
General (but more complicated) computational method: Homology modeling, or X-ray or NMR analysis of the protein of interest.
Using BLAST: Use blastp to BLAST your protein against PDB (protein structure database). If you get a hit with more than 25% identity, then your protein and the good hit(s) likely have a similar 3-D structure.

What you want to do: Finding protein family members.
General (but more complicated) computational method: Clone new family members using PCR techniques.
Using BLAST: Use blastp (or better, PSI-BLAST) and run against NR (nonredundant protein database). After you have all members of the family, you can make a multiple sequence alignment and then a phylogenetic tree.

Claverie & Notredame 2003
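The first recipe above, cutting a genome into small overlapping pieces before running blastx on each one, can be sketched as follows. The function name `overlapping_chunks` and the default overlap of 500 bases are assumptions; the slide gives only the 2-5 kb piece size and says the pieces should overlap.

```python
def overlapping_chunks(sequence, size=5000, overlap=500):
    """Split a sequence into overlapping pieces.

    Returns a list of (start_offset, piece) tuples with 0-based offsets,
    so that hits on a piece can be mapped back to genome coordinates.
    The overlap value is an illustrative assumption.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not sequence:
        return []
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, sequence[start:start + size]))
        if start + size >= len(sequence):
            break  # this piece already reaches the end of the sequence
        start += step
    return chunks
```

Keeping the start offset with each piece is a design choice for later bookkeeping: a hit at position p within a piece corresponds to position `start + p` in the original sequence. The overlap is there so that a gene spanning a cut point still appears intact in at least one piece, provided it is shorter than the overlap plus whatever flanks it.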

12 NCBI BLAST and PSI-BLAST Servers on the Internet
Country   Program           URL
USA       BLAST/PSI-BLAST   http://www.ncbi.nlm.nih.gov/BLAST
USA       BLAST             http://genome.wustl.edu/gsc/BLAST
Europe    BLAST             http://www.ch.embnet.org/software/bBLAST.html
Europe    BLAST             http://www.ebi.ac.uk/blast2/
Japan     BLAST/PSI-BLAST   http://www.ddbj.nig.ac.jp/E-mail/homology.html

13 NCBI Common Mistake
Seq1 has domains A and B; Seq2 has domain A and Seq3 has domain B.
Sequence 1: AAAAAABBBBBB
Sequence 2: AAAAAA
Sequence 3: BBBBBB
Use Seq1 as the query sequence. What happens?
The E-values of both hits may be highly significant (very low) if domains A and B are long and well conserved.
Seq1 is homologous to Seq2 and Seq3, but remember that Seq1 is not homologous to either of them over its entire length, and Seq2 and Seq3 share no domain with each other.
Just don’t depend on the E-value alone.
“BLAST hits are not transitive, unless the alignments are overlapping.”
Most proteins have more than one domain, so be careful when looking at BLAST results: not all reported hits belong to the same big family.
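The transitivity rule quoted above can be checked mechanically by comparing where two hits align on the query. The sketch below assumes each hit is reduced to a hypothetical `(query_start, query_end)` pair with 1-based inclusive coordinates; the function name and the `min_overlap` threshold are illustrative, not taken from any BLAST output parser.

```python
def hits_overlap_on_query(hit_a, hit_b, min_overlap=1):
    """Return True if two hits cover a shared stretch of the query.

    Each hit is a (query_start, query_end) tuple, 1-based and inclusive.
    Only overlapping hits give any reason to suspect that the two
    subject sequences are related to each other.
    """
    start = max(hit_a[0], hit_b[0])
    end = min(hit_a[1], hit_b[1])
    return end - start + 1 >= min_overlap


# The example from the slide: Seq1 = AAAAAABBBBBB is the query.
seq2_hit = (1, 6)    # Seq2 aligns to domain A of the query
seq3_hit = (7, 12)   # Seq3 aligns to domain B of the query
same_family_candidate = hits_overlap_on_query(seq2_hit, seq3_hit)
```

Here `same_family_candidate` is `False`: both hits may be significant, yet they cover different parts of the query, so nothing can be concluded about Seq2 versus Seq3.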

14 NCBI Alternative Method for Homology Searches
Smith-Waterman (ssearch): slower but more accurate.
FASTA: slower than BLAST, but more accurate when making DNA comparisons.
BLAT: for locating cDNA in a genome or finding close proteins in a genome.

15 NCBI Common Questions
Question: When I run a BLAST job with WU-BLAST and with NCBI BLAST using the same query sequence, why do I get different results?
Answer: Both are based on the same algorithm but are different implementations. The difference is usually due to slight variation in the database version; differences in the BLAST program version play a minor role. The results usually do not change dramatically, but they do change a bit.

16 NCBI Basic Gene Prediction Flow Chart
Obtain new genomic DNA sequence.
1. Translate in all six reading frames and compare to protein sequence databases.
2. Perform database similarity search of expressed sequence tag (EST) database of same organism, or cDNA sequences if available.
Use gene prediction program to locate genes.
Analyze regulatory sequences in the gene.
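Step 1 of the flow chart, translating in all six reading frames, can be sketched as below. This is an illustrative implementation that assumes the standard genetic code and an input made of A, C, G and T only; the function names are hypothetical, and codons containing any other character are reported as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For the six-base input `"ATGGCC"`, frame `+1` translates to `"MA"` and frame `-1` (the reverse complement `GGCCAT`) to `"GH"`. Each of the six peptides would then be compared against a protein database, which is what blastx does internally for a DNA query.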

17 NCBI The Annotation Process
DNA SEQUENCE; ANALYSIS SOFTWARE; Useful Information; Annotator

18 NCBI Annotation Process
Diagram labels: DNA sequence; RepeatMasker; Blastn; Halfwise; Blastx; Gene finders; tRNA scan; Repeats; Promoters; Pseudo-Genes; rRNA; Genes; tRNA; Fasta; BlastP; Pfam; Prosite; Psort; SignalP; TMHMM

19 NCBI How do I do large scale genome analysis? Read Koonin’s book on NCBI Bookshelf

20 NCBI TaxPlot is a tool for three-way comparisons of genomes on the basis of the protein sequences they encode. Demo TaxPlot http://www.ncbi.nlm.nih.gov/sutils/taxik2.cgi

21 NCBI Demo - VecScreen http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html

