BLAST Tutorial 3 What is BLAST? Basic Local Alignment Search Tool Is a set of similarity search programs designed to explore sequence databases. What are.


1 BLAST Tutorial 3
What is BLAST? The Basic Local Alignment Search Tool (BLAST) is a set of similarity search programs designed to explore sequence databases.
What are similarity searches good for? One sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning its relatives and function.
[Diagram labels: Query, BLAST program, Database]

2 BLAST programs and databases
Name      Query type            Database
blastn    Genomic               Genomic
blastp    Protein               Protein
blastx    Translated genomic    Protein
tblastn   Protein               Translated genomic
tblastx   Translated genomic    Translated genomic
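The program table can be read as a lookup from (query type, database type) to a program name. The sketch below is only an illustration of that reading: the function name `choose_program` is hypothetical, and it assumes that the single entries for blastn, blastp and tblastx apply to both the query and the database.

```python
# Lookup table built from the slide: (query type, database type) -> program.
# Assumption: where the slide lists one type, it applies to both columns.
PROGRAMS = {
    ("genomic", "genomic"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated genomic", "protein"): "blastx",
    ("protein", "translated genomic"): "tblastn",
    ("translated genomic", "translated genomic"): "tblastx",
}


def choose_program(query_type, database_type):
    """Return the BLAST program name for a query/database type pair."""
    key = (query_type.strip().lower(), database_type.strip().lower())
    if key not in PROGRAMS:
        raise ValueError(f"no BLAST program for combination {key}")
    return PROGRAMS[key]


print(choose_program("Protein", "Translated genomic"))
```

A combination that is not in the table, such as a genomic query against a protein database without translation, raises `ValueError`.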

3 http://www.ncbi.nlm.nih.gov/BLAST/

4 Place the query. Choose the database.

5 BLASTN Databases
Gene collection: GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
Genomic + Transcript: Complete human and mouse genome + transcriptome
EST: Expressed sequence tags
mito: Mitochondrial sequences
vector: Vector subset of GenBank
month: GenBank, EMBL, DDBJ, PDB entries from the last 30 days
Envi: Environmental samples
http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases

6 Place the query and choose the database, then set the search options:
Optimize the similarity level of the search
Threshold for the significance of results
Limit on output size
Primary word match (16-64 nt)
Reward and penalty for matching and mismatching bases
Cost to create and extend a gap
Removal of low-information-content regions
Limit the search to a specific organism
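The slide annotates the NCBI web form. As a rough equivalent, the sketch below assembles a command line for the standalone BLAST+ `blastn` program, which is a different interface from the one shown on the slide. The function name `build_blastn_command` and all default values are illustrative assumptions, not recommended settings. BLAST+ accepts only certain reward/penalty/gap-cost combinations, so check the values against your installed version. The organism limit is not covered here.

```python
def build_blastn_command(query_fasta, database, evalue=1e-6, max_target_seqs=50,
                         word_size=16, reward=2, penalty=-3,
                         gapopen=5, gapextend=2, dust=True):
    """Build (but do not run) an argument list for the BLAST+ blastn program."""
    return [
        "blastn",
        "-query", query_fasta,                     # place query
        "-db", database,                           # choose database
        "-evalue", str(evalue),                    # significance threshold
        "-max_target_seqs", str(max_target_seqs),  # limit output size
        "-word_size", str(word_size),              # primary word match
        "-reward", str(reward),                    # score for a matching base
        "-penalty", str(penalty),                  # score for a mismatching base
        "-gapopen", str(gapopen),                  # cost to create a gap
        "-gapextend", str(gapextend),              # cost to extend a gap
        "-dust", "yes" if dust else "no",          # low-complexity filter
    ]


# "query.fa" and "nt" are placeholder names for a query file and a database.
print(" ".join(build_blastn_command("query.fa", "nt")))
```

The list can be passed to `subprocess.run` on a machine where BLAST+ and the database are installed.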

7 Search for homologs of the chick "olfactory receptor 6" gene

8 [Figure labels: Query sequence; Matched areas of database sequences; Global alignments; Local alignments]

9 Results table columns: Sequence identifier, Sequence description, Score (bits), Coverage, Identity, E value

10 Alignment details: Score and E value; Identities and gaps; Strand

11 Multiple hits on the same subject

12 Design of the BLAST survey
Consider your research question:
Are you looking for a particular gene in a particular species? BLAST against the genome of that species.
Are you looking for additional members of a gene family across all species? BLAST against the gene collection database.
Are you looking for exact motif matches? Increase the gap penalty or use megablast.

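13 Score and E-value
Score (S): $S = \sum(\text{identities} + \text{mismatches}) - \sum \text{gaps}$
E-value: $E = K m n e^{-\lambda S}$
Depends on the search space: query length $m$ (bp) and database length $n$ (bp)
Depends on the scoring system: score $S$, with scoring-system parameters $\lambda$ and $K$
Bit score (S'): $S' = (\lambda S - \ln K) / \ln 2$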
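The raw score sums the scores of aligned positions (identities and mismatches) and subtracts the gap penalties. A minimal sketch follows, assuming a hypothetical scoring scheme of +1 per identity, -1 per mismatch and a flat penalty of 1 per gap position. Real BLAST scoring uses separate gap-opening and gap-extension costs, and the function name `raw_score` is illustrative.

```python
def raw_score(query_row, subject_row, match=1, mismatch=-1, gap=1):
    """Score two equal-length alignment rows; '-' marks a gap position."""
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    score = 0
    for q, s in zip(query_row, subject_row):
        if q == "-" or s == "-":
            score -= gap        # gap penalty is subtracted
        elif q == s:
            score += match      # identity
        else:
            score += mismatch   # mismatch (a negative contribution here)
    return score


# 4 identities, 1 mismatch, 1 gap position: 4 - 1 - 1 = 2
print(raw_score("ACGT-A", "ACCTTA"))
```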

14 Score and E-value
The score is a measure of the similarity of the query to the sequence shown.
The E-value is a measure of the reliability of the score. It is defined as the number of alignments with a score equal to or greater than the given score S that are expected to occur by chance in a search of this size.

15 Score and E-value
The size of the E-value
The typical threshold for a good E-value from a BLAST search is $E = 10^{-6}$ (1e-6) or lower. The reason for such low values is the size of the database: a chance probability of 0.001 per entry in a million-entry database would still leave about 1000 entries due to chance, whereas a chance probability of $10^{-6}$ per entry would leave only about one.
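The arithmetic behind this threshold can be checked directly. The sketch below treats each value as a chance probability per database entry and multiplies it by the number of entries. That reading, and the function name `expected_chance_hits`, are assumptions made for illustration.

```python
def expected_chance_hits(p_per_entry, n_entries):
    """Expected number of entries matched purely by chance."""
    return p_per_entry * n_entries


database_size = 1_000_000  # the million-entry database from the slide
for p in (1e-3, 1e-6):
    hits = expected_chance_hits(p, database_size)
    print(f"p = {p:g}: about {hits:g} chance entries")
```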

16

17 Exercise
Calculate S, S' and E for the following BLAST hit:
ACGTCGATCGAGCT
|||||||| |||||
AGGTCGTC-GAGGT
Given the following parameters:
Query length: 150
$\lambda = 1.37$
$K = 0.711$
Average sequence length in database: 270
Number of sequences in database: 4,554,026
$S = \sum(\text{Id} + \text{MM}) - \sum \text{GP}$
$S = 13 - 1 = 12$
$S' = (1.37 \times 12 - \ln 0.711) / \ln 2$
$S' = (16.44 + 0.341) / 0.693$
$S' \approx 24.2$
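The bit-score step of the exercise can be reproduced in a few lines. This sketch takes the raw score S = 12 and the parameters from the slide as given; the function name `bit_score` is illustrative.

```python
import math


def bit_score(raw, lam, k):
    """Convert a raw score to a bit score: (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


s_bits = bit_score(raw=12, lam=1.37, k=0.711)
print(round(s_bits, 1))  # 24.2, matching the slide
```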

18 Exercise
Calculate S, S' and E for the following BLAST hit:
ACGTCGATCGAGCT
|||||||| |||||
AGGTCGTC-GAGGT
Given the following parameters:
Query length: 150
$\lambda = 1.37$
$K = 0.711$
Average sequence length in database: 270
Number of sequences in database: 4,554,026
$E = 0.711 \times 150 \times 270 \times 4{,}554{,}026 \times e^{-1.37 \times 12}$
$E = 131135455683 \times 7.2477 \times 10^{-8}$
$E \approx 9504.27$
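The E-value step can be checked the same way. The sketch below takes the database length as the average sequence length times the number of sequences, as the slide does; the function name `e_value` is illustrative.

```python
import math


def e_value(k, query_len, db_len, lam, raw):
    """E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw)


db_len = 270 * 4_554_026  # average sequence length x number of sequences
e = e_value(k=0.711, query_len=150, db_len=db_len, lam=1.37, raw=12)
print(round(e, 2))
```

An E-value in the thousands means that thousands of alignments scoring this well are expected by chance alone, so the hit is not significant.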

19 Exercise
What is the minimal score needed to achieve a significant E-value ($10^{-6}$, i.e. 1e-6)?
$131135455683 \, e^{-1.37 S} = 10^{-6}$
$\ln(131135455683 \, e^{-1.37 S}) = \ln(10^{-6})$
$\ln(131135455683) + \ln(e^{-1.37 S}) = -13.81$
$25.6 - 1.37 S = -13.81$
$S = (-13.81 - 25.6) / (-1.37)$
$S \approx 28.77$
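Solving the E-value formula for S gives S = (ln(K m n) - ln E) / lambda, which can be evaluated directly. This sketch reuses the exercise parameters; the function name `min_raw_score` is illustrative.

```python
import math


def min_raw_score(k, query_len, db_len, lam, e_threshold):
    """Smallest raw score S for which K*m*n*exp(-lambda*S) <= e_threshold."""
    return (math.log(k * query_len * db_len) - math.log(e_threshold)) / lam


db_len = 270 * 4_554_026  # average sequence length x number of sequences
s_min = min_raw_score(k=0.711, query_len=150, db_len=db_len,
                      lam=1.37, e_threshold=1e-6)
print(round(s_min, 2))
print(math.ceil(s_min))  # smallest whole-number score that meets the threshold
```

If raw scores are whole numbers, the result is rounded up to the next integer.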

20 1. Search for sequences homologous to the CFTR gene in human

21 2. Additional family members of the CFTR gene found in other organisms

22 3. Search for additional genes that are members of the ABC transporter family

