1
BLAST Tutorial 3
What is BLAST?
The Basic Local Alignment Search Tool (BLAST) is a set of similarity search programs designed to explore sequence databases.
What are similarity searches good for?
One sequence by itself is not informative; it must be analyzed by comparative methods against existing sequence databases to develop hypotheses concerning its relatives and function.
Diagram labels: Query, BLAST program, Database
2
BLAST Databases
Name      Query type           Database
blastn    Genomic              Genomic
blastp    Protein              Protein
blastx    Translated genomic   Protein
tblastn   Protein              Translated genomic
tblastx   Translated genomic   Translated genomic
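The program table can also be read as a lookup from the kind of query and database to a program name. The following is a minimal sketch only: the dictionary `PROGRAMS` and the helper `choose_program` are hypothetical names, "nucleotide" stands in for what the slide calls "Genomic", and the blastn row assumes a nucleotide query against a nucleotide database.

```python
# Hypothetical lookup: (query type, database type) -> BLAST program name.
# "translated nucleotide" corresponds to the slide's "Translated genomic".
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated nucleotide", "protein"): "blastx",
    ("protein", "translated nucleotide"): "tblastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",
}


def choose_program(query_type, database_type):
    """Return the program for a query/database pairing, or raise ValueError."""
    try:
        return PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type!r} vs {database_type!r}"
        )


# A protein query searched against a translated nucleotide database.
print(choose_program("protein", "translated nucleotide"))
```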
3
http://www.ncbi.nlm.nih.gov/BLAST/
4
Place Query
Choose Database
5
BLASTN Databases
Gene collection: GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
Genomic + Transcript: complete human and mouse genome + transcriptome
EST: expressed sequence tags
mito: mitochondrial sequences
vector: vector subset of GenBank
month: GenBank, EMBL, DDBJ, PDB from the last 30 days
Envi: environmental samples
http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases
6
Place Query
Choose Database
Optimize similarity level of the search
Threshold for results significance
Limit output size
Primary word match (16-64 nt)
Reward and penalty for matching and mismatching bases
Cost to create and extend a gap
Remove low information content
Limit search to specific organism
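The slide shows these settings on the NCBI web form. As a rough illustration of how the same settings might look in a scripted search, the sketch below builds an argument list for the BLAST+ command-line program `blastn` instead; this is a swapped-in interface, not what the slide demonstrates. The helper name, the file and database names, and every numeric value are placeholder assumptions. The command is only assembled, not executed, and the organism limit is left out.

```python
def build_blastn_command(query_path, database, evalue=1e-6, max_target_seqs=50,
                         word_size=16, reward=2, penalty=-3,
                         gapopen=5, gapextend=2, dust=True):
    """Assemble (but do not run) a blastn argument list.

    All default values are illustrative placeholders, not recommendations.
    """
    return [
        "blastn",
        "-task", "blastn",
        "-query", query_path,                      # place query
        "-db", database,                           # choose database
        "-evalue", str(evalue),                    # threshold for significance
        "-max_target_seqs", str(max_target_seqs),  # limit output size
        "-word_size", str(word_size),              # primary word match
        "-reward", str(reward),                    # score for a matching base
        "-penalty", str(penalty),                  # score for a mismatching base
        "-gapopen", str(gapopen),                  # cost to create a gap
        "-gapextend", str(gapextend),              # cost to extend a gap
        "-dust", "yes" if dust else "no",          # low-complexity filter
    ]


# "query.fasta" and "nt" are hypothetical file and database names.
command = build_blastn_command("query.fasta", "nt")
print(" ".join(command))
```

The list form can be handed to `subprocess.run` on a machine where BLAST+ and the database are installed.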
7
Search for homologs of the chick “olfactory receptor 6” gene
8
Figure labels: query sequence; matched areas of database sequences; global alignments; local alignments
9
Result table columns: sequence identifier, sequence description, score (bits), coverage, identity, E value
10
Alignment header fields: score and E value; identities and gaps; strand
11
Multiple hits on the same subject
12
Design of the BLAST survey
Consider your research question:
Are you looking for a particular gene in a particular species? BLAST against the genome of that species.
Are you looking for additional members of a gene family across all species? BLAST against the gene collection database.
Are you looking for exact motif matches? Increase the gap penalty or use megablast.
13
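Score and E-value
Score (S): $S = (\text{identities} + \text{mismatches}) - \text{gaps}$
E-value: $E = K \cdot m \cdot n \cdot e^{-\lambda S}$
Depends on the search space: query length $m$ (bp) and database length $n$ (bp)
Depends on the scoring system: $K$, $\lambda$ and the score $S$
Bit score (S'): $S' = \dfrac{\lambda S - \ln K}{\ln 2}$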
14
Score and E-value
The score is a measure of the similarity of the query to the sequence shown.
The E-value is a measure of the reliability of the score.
The E-value is defined as the number of alignments with a score of S or greater that are expected to occur by chance in a search of this size.
15
Score and E-value
The size of the E-value
The typical threshold for a good E-value from a BLAST search is $E = 10^{-6}$ (written 1e-6) or lower.
The reason for such low values is that a chance-match probability of 0.001 per entry in a million-entry database would still leave 1000 entries due to chance. A probability of 1e-6 per entry would leave only one entry due to chance.
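The arithmetic behind this threshold is a single multiplication. The sketch below assumes, as the slide's reasoning does, that the value is treated as a chance probability per database entry; the function name `expected_chance_hits` is hypothetical.

```python
def expected_chance_hits(per_entry_probability, database_entries):
    """Expected number of entries matched by chance alone."""
    return per_entry_probability * database_entries


# Compare the two thresholds from the slide on a million-entry database.
for probability in (1e-3, 1e-6):
    hits = expected_chance_hits(probability, 1_000_000)
    print(f"threshold {probability:g}: about {hits:g} chance entries")
```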
17
Exercise
Given the following parameters:
Query length: 150
$\lambda = 1.37$
$K = 0.711$
Average sequence length in database: 270
Number of sequences in database: 4,554,026
Calculate S, S' and E for the following BLAST hit:
ACGTCGATCGAGCT
|||||||| |||||
AGGTCGTC-GAGGT
S: (Id + MM) - GP
$S = 13 - 1 = 12$
$S' = (1.37 \times 12 - \ln 0.711) / \ln 2$
$S' = (16.44 + 0.341) / 0.693$
$S' = 24.2$
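The S and S' steps of this exercise can be checked with a few lines of Python. This is a sketch of the slide's simplified scoring only (aligned columns minus gap columns), not of the reward and penalty scheme a real BLAST run applies; the function names are hypothetical.

```python
import math


def raw_score(query_row, subject_row):
    """Slide's simplified score: (identities + mismatches) - gaps."""
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    gaps = sum(1 for q, s in zip(query_row, subject_row) if "-" in (q, s))
    aligned = len(query_row) - gaps  # identities plus mismatches
    return aligned - gaps


def bit_score(raw, lam, k):
    """Bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


s = raw_score("ACGTCGATCGAGCT", "AGGTCGTC-GAGGT")
s_bits = bit_score(s, lam=1.37, k=0.711)
print(s, round(s_bits, 1))
```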
18
Exercise
Given the following parameters:
Query length: 150
$\lambda = 1.37$
$K = 0.711$
Average sequence length in database: 270
Number of sequences in database: 4,554,026
Calculate S, S' and E for the following BLAST hit:
ACGTCGATCGAGCT
|||||||| |||||
AGGTCGTC-GAGGT
$E = 0.711 \times 150 \times 270 \times 4{,}554{,}026 \times e^{-1.37 \times 12}$
$E \approx 131135455683 \times 7.2477 \times 10^{-8}$
$E \approx 9504.27$
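The E-value step can be reproduced numerically. The sketch below assumes, as the exercise does, that the database length is the average sequence length times the number of sequences; the function name `e_value` is hypothetical.

```python
import math


def e_value(k, query_length, database_length, lam, raw_score):
    """E = K * m * n * exp(-lambda * S)."""
    return k * query_length * database_length * math.exp(-lam * raw_score)


# Database length: average sequence length times number of sequences.
database_length = 270 * 4_554_026
e = e_value(k=0.711, query_length=150, database_length=database_length,
            lam=1.37, raw_score=12)
print(f"E is about {e:.0f}")
```

An E-value in the thousands means a hit with this score is expected by chance many times over in a search of this size, so it carries no evidence of homology.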
19
Exercise
What is the minimal score needed to achieve a significant E-value ($10^{-6}$, written 1e-6)?
$131135455683 \cdot e^{-1.37 S} = 10^{-6}$
$\ln(131135455683 \cdot e^{-1.37 S}) = \ln(10^{-6})$
$\ln(131135455683) + \ln(e^{-1.37 S}) = -13.82$
$25.6 - 1.37 S = -13.82$
$S = (-13.82 - 25.6) / (-1.37)$
$S \approx 28.77$
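The same rearrangement can be written as a small function that solves the E-value equation for S. This is a sketch using the exercise's parameters; the function name `minimal_raw_score` is hypothetical.

```python
import math


def minimal_raw_score(k, query_length, database_length, lam, target_e):
    """Solve K * m * n * exp(-lambda * S) = target_e for S."""
    return math.log(k * query_length * database_length / target_e) / lam


s_min = minimal_raw_score(k=0.711, query_length=150,
                          database_length=270 * 4_554_026,
                          lam=1.37, target_e=1e-6)
print(round(s_min, 2))

# The slide's raw score is a whole number, so round up to the next integer.
print(math.ceil(s_min))
```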
20
1. Search for sequences homologous to the CFTR gene in human
21
2. Additional family members of the CFTR gene found in other organisms
22
3. Search for additional genes that are members of the ABC transporters family