Database search. Overview: 1. FastA: suitable for protein sequence searching. 2. BLAST: suitable for DNA, RNA, and protein sequence searching.


1 database search

2 Overview: 1. FastA: suitable for protein sequence searching. 2. BLAST: suitable for DNA, RNA, and protein sequence searching.

3 FastA History: FastA was developed by Lipman and Pearson in 1985 and was one of the first database search programs. EBI provides a FastA service at http://www.ebi.ac.uk/Tools/fasta/ Idea: identify short substrings of the query that match the target sequence.
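The "short substring" idea can be sketched in a few lines of Python. This is only an illustration of word matching, not the FastA implementation: the function names, the word length k=2, and the "busiest diagonal" summary are assumptions made for this example.

```python
from collections import defaultdict

def shared_words(query, target, k=2):
    """Return (query_pos, target_pos) pairs where a k-letter word is identical."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits

def best_diagonal(hits):
    """Group hits by diagonal (target_pos - query_pos); return (diagonal, hit count)."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])

# The example protein from the slides, compared with itself.
query = "EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP"
print(best_diagonal(shared_words(query, query, k=2)))
```

Many word hits on one diagonal suggest a region worth aligning in detail; a real search tool then rescores and extends such regions.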

4 Other commonly used software: http://www.ebi.ac.uk/Tools/sss/

5 Example: protein sequence: EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP (screenshot labels: parameters; input sequence; select database)

6

7 Results (screenshot labels): 100% identity; 17/28 = 60.7% identity; 28 aa overlap

8 BLAST Basic Local Alignment Search Tool (BLAST). BLAST was developed by NCBI. BLAST finds regions of similarity between biological sequences.

9 Basic BLAST
Program | Query sequence | Database | Program description
blastn | nucleotide | nucleotide | Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
blastp | protein | protein | Search a protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
blastx | nucleotide | protein | Search a protein database using a translated nucleotide query
tblastn | protein | nucleotide | Search a translated nucleotide database using a protein query
tblastx | nucleotide | nucleotide | Search a translated nucleotide database using a translated nucleotide query
t: translation, n: nucleotide, p: protein, x: cross
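The program table can be read as a lookup from query type and database type to a program name. A minimal Python sketch follows; the function name and the `translated` flag are hypothetical helpers for this example, not part of BLAST.

```python
def choose_blast_program(query_type, db_type, translated=False):
    """Pick a BLAST program name from the query and database sequence types.

    query_type and db_type are "nucleotide" or "protein". For a nucleotide
    query against a nucleotide database, translated=True selects tblastx.
    """
    key = (query_type, db_type)
    if key == ("nucleotide", "nucleotide"):
        return "tblastx" if translated else "blastn"
    table = {
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # query is translated
        ("protein", "nucleotide"): "tblastn",  # database is translated
    }
    if key not in table:
        raise ValueError(f"unsupported combination: {key}")
    return table[key]

print(choose_blast_program("protein", "protein"))
print(choose_blast_program("nucleotide", "protein"))
```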

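10 BLASTALL
Amino acid query sequence: BLASTp searches a protein database; TBLASTn searches a translated nucleotide database.
DNA query sequence: BLASTn searches a nucleotide database; BLASTx searches a protein database with the translated query; TBLASTx searches a translated nucleotide database with the translated query.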
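"Translated" here means the nucleotide sequence is conceptually translated in all six reading frames before comparison. A small Python sketch of six-frame translation, assuming the standard genetic code; the helper names are made up for this example, and real BLAST handles ambiguity codes and alternative genetic codes that this sketch ignores.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def translate(dna):
    """Translate complete codons from the first base; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return the translations of frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

for name, protein in six_frames("ATGGCC").items():
    print(name, protein)
```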

11 BLAST sources: 1. NCBI: http://blast.ncbi.nlm.nih.gov/Blast.cgi/ (online version), ftp://ftp.ncbi.nih.gov/blast/ (standalone) 2. Other websites: http://life.zsu.edu.cn/blast/ http://www.fruitfly.org/blast/ http://www.mcgb.uestc.edu.cn/blast/blast.html …

12

13 BLAST 1. online : from website 2. stand alone : download the software

14 Comparison between them. Web server advantages: 1. Easy to use. 2. Kept up to date. 3. No need to download databases. Disadvantages: 1. Not suitable for large data sets. 2. You cannot define your own database.

15 Web BLAST provided by NCBI: Blastn for nucleotide, Blastp for protein. http://blast.ncbi.nlm.nih.gov/Blast.cgi

16 An example : 1. cctggcgataaccgtcttgtcggcggttgcgctgacgttgcgtcgtgatatcatcagggcAgaccggttacatccccctaa 2. gatcgaaaaacgcttgtgttaaaaatttgctaaattttgccaatttggtaaaacagttgcAtcacaacaggagatagcaat

17 the first sequence

18 The second sequence (screenshot labels: sequence range; software; similarity from high to low; results shown in new window)

19 Results of pairwise alignment: No significant similarity found (screenshot labels: information of the two sequences; parameters selected)

20 Blast (standalone version) Why do we need the standalone version of BLAST? 1. Specific database 2. Privacy 3. Batch processing

21 How to download BLAST ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release blast-2.2.23-ia32-win32.exe

22

23 After unzipping, we get three folders: bin: all the exe files; data: data for BLAST; doc: readme

24 Blast (standalone version) We need to format the database for BLAST. First, save your database in FASTA format. Second, use formatdb, provided in the BLAST package, to format the database. DOS command: formatdb -i sequence.fa -p T/F -o T/F -n db_name
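For the first step, a FASTA file is just a ">" header line followed by sequence lines. A minimal Python sketch of writing records in that layout; the function name and the 60-character line width are choices made for this example, and the record is the top hit and query sequence shown elsewhere in the slides.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)

records = [("P83301|CXO_CONVE", "EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP")]
print(to_fasta(records), end="")
```

The resulting text can be saved to a file and passed to formatdb with -i.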

25 An example 1. The file “Delta.txt” contains 13 proteins and serves as the database. 2. One protein is selected as the query sequence and stored in the file “seq.txt”.

26 1. Format Delta.txt: formatdb -i Delta.txt -p T Parameters: 1. -i: database file 2. -p: T for protein, F for nucleotide

27 2. Search Delta.txt using BLAST: blastall -p blastp -d Delta.txt -i seq.txt -o out.txt Parameters: 1. -p: program name (blastp, blastn, blastx, tblastn, tblastx) 2. -d: database name 3. -i: query sequences 4. -o: output file
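For batch processing, the two commands can be assembled from a script. The Python sketch below only builds and prints the command lines, using the flags listed on the slides (-i, -p, -d, -o); the function names and the extra query file names are hypothetical, and nothing here checks that the legacy BLAST package is installed.

```python
def formatdb_command(fasta_path, is_protein=True):
    """Build the formatdb argument list (-i input file, -p T/F for protein/nucleotide)."""
    return ["formatdb", "-i", fasta_path, "-p", "T" if is_protein else "F"]

def blastall_command(program, database, query, output):
    """Build the blastall argument list (-p program, -d database, -i query, -o output)."""
    return ["blastall", "-p", program, "-d", database, "-i", query, "-o", output]

commands = [formatdb_command("Delta.txt")]
# Hypothetical batch: one search per query file, each with its own output file.
for query_file in ["seq.txt", "seq2.txt"]:
    commands.append(blastall_command("blastp", "Delta.txt", query_file, query_file + ".out"))

for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) on a machine where the executables are on the PATH.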

28 3. To see the other parameters, just type blastall

29 4. Results:
Sequences producing significant alignments: Score (bits), E Value
P83301|CXO_CONVE 69 1e-017
P69749|CXD6A_CONBU 20 0.009
P69750|CXD6A_CONCN 18 0.036
P24159|CXDB_CONTE P18511|CXDA_CONTE 18 0.042
P60179|CXD66_CONAA 17 0.066
P60513|CXD6A_CONER 17 0.11
P69751|CXD6E_CONCT P69748|CXD6A_CONAI 16 0.19
P69754|CXD6B_CONMA P69753|CXD6A_CONMA 14 0.56
P69752|CXD6B_CONER P58913|CXD6A_CONPU 14 0.62
P69756|CXD6D_CONMA P69755|CXD6C_CONMA 13 0.89
Q9XZK5|CXSO6_CONST P69757|CXD6A_CONSE 12 2.6
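Hit lines of the form "identifier score E-value" are easy to filter in a script. A minimal Python sketch using the first three hits from the slide; the function name and the 0.01 E-value cutoff are assumptions for this example, and lines that do not have exactly three fields are skipped rather than guessed at.

```python
def parse_hit_line(line):
    """Parse 'ID score evalue' into (id, score, evalue); return None if malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    seq_id, score, evalue = parts
    try:
        return seq_id, float(score), float(evalue)
    except ValueError:
        return None

lines = [
    "P83301|CXO_CONVE 69 1e-017",
    "P69749|CXD6A_CONBU 20 0.009",
    "P69750|CXD6A_CONCN 18 0.036",
]
hits = [hit for hit in map(parse_hit_line, lines) if hit is not None]
significant = [hit for hit in hits if hit[2] <= 0.01]  # assumed cutoff
for seq_id, score, evalue in significant:
    print(seq_id, score, evalue)
```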

30 >P83301|CXO_CONVE Length = 33
Score = 69.3 bits (168), Expect = 1e-017, Method: Compositional matrix adjust.
Identities = 33/33 (100%)
Query: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
         EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
Sbjct: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33

>P69749|CXD6A_CONBU Length = 27
Score = 20.0 bits (40), Expect = 0.009, Method: Compositional matrix adjust.
Identities = 13/30 (43%), Gaps = 6/30 (20%)
Query: 1 EDCIAVGQLCVFWNIGRP--CCSGLCVFAC 28
           C A G  C      RP  CCS  C FAC
Sbjct: 1 DECSAPGAFCLI----RPGLCCSEFCFFAC 26
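The "Identities" and "Gaps" figures are column counts over the alignment. A short Python sketch that recomputes them for the second hit; the gap positions in the two rows are reconstructed here from the reported counts and coordinates, which is an assumption, and the function name is made up for this example.

```python
def alignment_stats(query_row, subject_row, gap="-"):
    """Return (identities, gaps, columns) for two equal-length aligned rows."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, subject_row))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gaps = sum(1 for q, s in pairs if q == gap or s == gap)
    return identities, gaps, len(pairs)

# Rows for P69749|CXD6A_CONBU with gap characters written as "-" (assumed placement).
query_row = "EDCIAVGQLCVFWNIGRP--CCSGLCVFAC"
subject_row = "DECSAPGAFCLI----RPGLCCSEFCFFAC"

identities, gaps, columns = alignment_stats(query_row, subject_row)
print(f"Identities = {identities}/{columns} ({round(100 * identities / columns)}%)")
print(f"Gaps = {gaps}/{columns} ({round(100 * gaps / columns)}%)")
```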

31 5. Pairwise alignment: bl2seq -p blastp -i seq.txt -j 1.txt -o out.txt Parameters: 1. -p: program name (blastp, blastn, …) 2. -i: first sequence 3. -j: second sequence 4. -o: output file. To see the other parameters, just type bl2seq

32 6. Databases can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/db/ Scoring matrices can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/matrices/

33 PSI-BLAST Position-specific iterative BLAST (PSI-BLAST). Reference: Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402. Target: proteins only.

34 PSI-BLAST Position-specific iterative BLAST (PSI-BLAST) is a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search. It is used to detect distant evolutionary relationships.
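To make "position-specific scoring matrix" concrete, here is a simplified Python sketch that turns a small gap-free alignment into per-column log-odds scores. It is not the PSI-BLAST algorithm: the uniform background frequencies, the fixed pseudocount, the toy sequences, and the function names are all assumptions for illustration, and real PSI-BLAST also weights sequences and repeats the search with the updated matrix.

```python
import math
from collections import Counter

def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build one dict of log2-odds scores per alignment column (uniform background)."""
    alphabet = sorted(set("".join(aligned_seqs)) - {"-"})
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*aligned_seqs):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm

def score_sequence(pssm, seq):
    """Sum the column scores for a sequence; unseen letters score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq))

pssm = build_pssm(["ACA", "ACG", "ACA"])  # toy alignment, not real data
print(score_sequence(pssm, "ACA"), score_sequence(pssm, "GGG"))
```

A sequence that follows the column preferences scores higher than one that does not, which is what lets a profile pick up distant relatives that a single fixed matrix can miss.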

35 Online sources:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_psiblast.html
http://blast.ncbi.nlm.nih.gov/Blast.cgi
http://www.ebi.ac.uk/Tools/blastpgp/

