
1 BLAST & FASTA II SECTION L1

2 Bioinformatics: A Bioengineering Perspective Search Exercises L1
Two search exercises:
(i) Basic BLAST (Basic Local Alignment Search Tool) search to identify an unknown sequence
(ii) Sequence analysis using computer software
Exercise I: Worked-out Example, Basic BLAST Search
Problem statements:
(a) To access the BLAST server via the NCBI (National Center for Biotechnology Information) at the NIH (National Institutes of Health)
(b) To select the correct BLAST search tool to compare an unknown sequence with the sequences in the databases at NCBI
(c) To identify the probable sequence from the sequence data available

3 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L2
Preliminaries / Background
NCBI (at NLM / NIH) maintains many DNA and protein sequence databases that are available to the public over the Internet.
Each sequence record carries an accession number, information relevant to the sequence, and the sequence itself.
Computer program used for searching the databases: BLAST
BLAST programs align either DNA sequences or protein sequences:
blastn => searches the databases for DNA sequences similar to the DNA sequence of interest (the query sequence)
blastp => searches the databases for amino acid sequences similar to the amino acid query sequence
blastx => compares a nucleotide query sequence, translated in all six reading frames (3 in the forward direction and 3 in the reverse direction), against a protein sequence database
Discussion of BLAST Programs: Website
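
The six-frame translation that blastx applies to a nucleotide query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, marks incomplete or ambiguous codons as "X", and the function names (`translate`, `six_frames`) are made up for this example rather than taken from BLAST itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; '*' is a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGAAATAG").items():
        print(name, protein)
```

Each of the six translations is then what gets compared against the protein database.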

4 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L3
Search Procedure with blastn
Given the following sequenced fragment of DNA as the query sequence, the problem is to search the databases with blastn for DNA sequences similar to it.
BLAST Search Query DNA Sequence
atgaagttgt ttctgctgct ttcagccttt gggttctgct gggcccagta tgccccacaa
acccagtctg gacgaacgtc tattgtccat ctgtttgaat ggcgctgggt tgacattgct
cttgaatgtg agcggtattt gggcccaaag ggatttggag gggtacaggt ctcccccccc
aatgaaaata tagtagtcac taacccttca agaccttggt gggagagata ccaaccagtg
agttacaagt tatgtaccag atcaggaaat gaaaatgaat tcagagacat ggtgactaga
tgtaacaacg ttggcgtccg tatatatgtg gacgctgtca ttaatcatat gtgtggaagt
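
Because the query is printed in blocks of ten letters, it helps to strip the spacing before pasting it into a search box. A minimal sketch, assuming the sequence has been typed into a string; the helper name `to_fasta` and the header text are hypothetical:

```python
def to_fasta(raw, header="query", width=60):
    """Strip spaces, digits and line breaks, then emit a FASTA record."""
    seq = "".join(ch for ch in raw if ch.isalpha()).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{header}\n" + "\n".join(lines)


if __name__ == "__main__":
    # First two blocks of the slide's query, used only as a short demo.
    print(to_fasta("atgaagttgt ttctgctgct"))
```

The validation step catches typing slips (for example a stray "u" or "x") before the search is submitted.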

5 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L4
Search steps (with blastn):
1. Type the query sequence supplied
2. Visit the NCBI website at:
   Select BLAST from the menu (at the top of the screen)
   Select “standard nucleotide-nucleotide BLAST [blastn]” under the “Nucleotide BLAST” option => A new screen will appear
3. Cut and paste the typed query sequence into the “search” box
   Default settings:
   “nr” in the “Choose database” box
   √ (check) the “low complexity” box in the “Choosing filters” part of the “Options for advanced blasting” section

6 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L5
4. Click on the “BLAST!” button: the search commences => A new screen appears and states “Your request has been successfully submitted and put into the BLAST Queue”, together with the request ID (xxx-xxx)
5. Click on the “format!” button => The “BLAST Search Results” screen appears
   Once the search is completed, the results will be displayed (the search may take a few seconds to a few minutes)
6. On the results screen, scroll down to “Distribution of # Blast Hits on the Query Sequence”
   This provides a graphical view of all the sequences that are similar to the query sequence
   The bars are color-coded: Red: very similar; Black: less similar

7 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L6
7. Place the mouse pointer on a colored bar: the name of the sequence appears in the box above the bars
   The name includes:
   (i) Information about the source of the DNA (what organism, what tissue)
   (ii) Identity of the sequence (name of the gene)
8. Click on a colored bar
   This takes you to the alignment of that sequence with the query sequence
   The alignment is presented further down the page
9. Scroll back up to the top of the page (or click on “Back”)

8 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L7
10. Scroll past the box with colored bars to reach the list of “Sequences producing significant alignments”
    The left side contains links to the sequence file in the database; each link shows:
    (a) Name of the database (gb: GenBank; emb: EMBL, the European Molecular Biology Laboratory; etc.)
    (b) Accession number of the sequence
    The sequence file presents:
    (i) Name of the gene
    (ii) Source of the DNA
    (iii) Titles of the relevant journal articles
    (iv) Translated protein sequence
    (v) DNA sequence + other information
    On the right-hand side of the list, there is a “Score” and an “E-value” for each sequence
    Score: a measure of the degree of similarity between the database sequence and the query sequence
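
The database prefix and accession number in a hit link can be separated mechanically. The sketch below assumes the pipe-delimited identifier style ("database|accession|...") used in older BLAST reports; the example string is illustrative rather than copied from a real results page, and `parse_hit_id` is a hypothetical helper name.

```python
# Prefixes commonly seen in pipe-delimited sequence identifiers.
DATABASES = {
    "gb": "GenBank",
    "emb": "EMBL",
    "dbj": "DDBJ",
    "ref": "RefSeq",
    "sp": "Swiss-Prot",
}


def parse_hit_id(hit_id):
    """Split 'db|accession|...' into (database name, accession)."""
    fields = hit_id.strip().split("|")
    if len(fields) < 2 or not fields[1]:
        raise ValueError(f"not a pipe-delimited identifier: {hit_id!r}")
    prefix, accession = fields[0], fields[1]
    # Unknown prefixes are returned unchanged rather than guessed.
    return DATABASES.get(prefix, prefix), accession


if __name__ == "__main__":
    print(parse_hit_id("emb|X03236|"))
```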

9 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L8
=> The higher the score, the better the match
E-value: the number of matches with at least this score expected by chance; the lower the E-value, the less likely it is that the observed match is due to random chance
e.g., good matches yield E-values expressed with a negative exponent (e.g., e-134)
=> The probability that such a sequence match is just due to chance is very low
E ≈ 0 denotes identity or near identity of the database sequence with the query sequence
[Shorter runs of identical matches yield higher E-values, because a short query fragment could have matched a database sequence by chance]
11. There could be sequences found without color bars
12. Below the list of similar sequences, the alignment of each database sequence against the query sequence is presented. Below the name of each sequence, the similarity score and E-value are shown along with the percentage of identity observed
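
The link between score and E-value can be made concrete with the Karlin-Altschul relation E = m · n · 2^(-S'), where S' is the bit score, m the query length and n the database length. The sketch below is a simplification: real BLAST uses edge-corrected "effective" lengths, and the lengths and scores in the demo are made-up values chosen only to show the trend.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E-value: chance hits expected at or above this bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    db_len = 10 ** 9  # hypothetical database size in letters
    for bits in (30, 60, 120, 480):
        print(bits, expected_hits(bits, query_len=360, db_len=db_len))
```

Each additional bit of score halves the E-value, which is why long, near-identical alignments report values such as e-134 while a short identical stretch, with its low score, does not.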

10 Bioinformatics: A Bioengineering Perspective Search Exercises (Continued) L9
Questions
(a) What is the name of the gene (or of the protein encoded by the gene) of the database sequence that is most similar to the query sequence? (i.e., identify the database sequence that most closely matches the query sequence)
(b) What are the source details (type of organism and tissue) of the database sequence that is most similar to the query sequence?
(c) Using the mouse pointer, click on the appropriate color bars for the next three most similar sequences. Determine in each case:
    (i) Gene identification
    (ii) Organism / tissue of the sequence (source details)
    (iii) Percentage of nucleotide identity between the database and query sequences
(d) Identify the gene of the query sequence that was subjected to the search procedure
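
The percentage of identity asked for in (c)(iii) is the number of identical columns divided by the alignment length, gap columns included, which is how BLAST reports "Identities". A minimal sketch, assuming the two aligned strings are copied from the report with "-" for gaps (the function name is hypothetical):

```python
def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        a == b and a != "-"
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGTTA"))
```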

11 Bioinformatics: A Bioengineering Perspective BLAST Search L10
Homework: EXERCISE (A)
Supplied below is a query amino acid sequence
BLAST Search Query Amino Acid Sequence
Problem statement
Repeating the steps of the worked-out exercise, perform a search to find amino acid sequences in the database that are similar to the query amino acid sequence supplied
[Hint: Select “Standard protein-protein BLAST [blastp]” under the “Protein BLAST” option]

12 Bioinformatics: A Bioengineering Perspective BLAST Search (Continued) L11
Questions
Continue the search procedure to reach the page that contains the results. Using the results, answer the following:
(a) Why does an amino acid query sequence lead to a higher similarity than a nucleotide query sequence? (Hint: genetic code)
(b) Identify 5 database sequences that are very similar to the query sequence. For each, list the following:
    (i) Name of the protein
    (ii) Source (organism / tissue) of the protein
    (iii) The amino acid identity
    (iv) Similarity score
    (v) E-value
    (vi) Percentage of identity
    (vii) Percentage of positives (that is, similar amino acids)
(c) Find the animals (apart from the five listed earlier) that have this (query) protein with a similarity score in excess of 220 bits
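
The hint in (a) refers to the redundancy of the genetic code: several codons encode the same amino acid, so two genes can differ at the nucleotide level and still yield the same protein. The sketch below illustrates this with two made-up 12-nucleotide sequences (not taken from any database) and the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1 using the standard code."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(a, b):
    """Percentage of identical positions in two equal-length strings."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    dna_a = "ATGCTGAGCGGC"  # made-up example
    dna_b = "ATGTTATCTGGA"  # same protein, different synonymous codons
    print("nucleotide identity:", identity(dna_a, dna_b))
    print("protein identity:", identity(translate(dna_a), translate(dna_b)))
```

In this contrived pair only half of the nucleotides agree, yet both sequences translate to the same four amino acids.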

13 Bioinformatics: A Bioengineering Perspective BLAST Search (Continued) L12
Homework: EXERCISE (B)
Given below is a query DNA sequence. Search for amino acid sequences that are similar to the translations of all six reading frames of the query DNA sequence supplied

14 Bioinformatics: A Bioengineering Perspective BLAST Search (Continued) L13
Solution Hint:
(i) Select “Nucleotide query - protein db [blastx]” under the “Translated BLAST searches” option
(ii) Search the “nr” database and remove the “√” (check) from the “low complexity” filter box
Question
To what extent does the present search yield results comparable to those obtained in the worked-out exercise and in Homework Exercise (A)?

15 Bioinformatics: A Bioengineering Perspective Sequence Analysis Using Computer Software L14
Purpose of the Exercise
(a) To obtain a DNA sequence from GenBank using the “search” function of Entrez at NCBI
(b) To learn to use computer software that translates the DNA sequence in at least 3 reading frames
(c) To identify the open reading frame (ORF) that most likely corresponds to the gene
(d) To learn to use BLAST to verify the correct identification of the ORF
(e) To learn to use computer software to construct a restriction map of the gene

16 Bioinformatics: A Bioengineering Perspective Sequence Analysis Using Computer Software (Continued) L15
Background Hints
DNA sequences are available in public databases
DNA/Protein analysis software: two websites provide programs for DNA analysis; setting up an account gives 100 units of computer time on the site for free
1) Bionavigator at:
2) Jellyfish at:
Analysis exercise
To analyze the α-amylase gene of B. licheniformis

17 Bioinformatics: A Bioengineering Perspective Sequence Analysis Using Computer Software (Continued) L16
Analysis procedure
Step 1: To obtain a DNA sequence
   Visit
   Click on “Nucleotide” to access the nucleotide database
Step 2: Search the nucleotide database for the Bacillus licheniformis α-amylase gene (Accession # x03236)
   Type the accession number in the box and press the “Go” button
   Clicking on the number opens the sequence file x03236

18 Bioinformatics: A Bioengineering Perspective Sequence Analysis Using Computer Software (Continued) L17
Step 3: Ascertain the following:
(a) Number of nucleotides present in the sequence with accession # x03236
(b) In the “Features” section of the file, the region of the sequence that encodes the α-amylase precursor is indicated by “CDS”. Determine the numbers of the starting and ending nucleotides of the sequence that encodes the α-amylase precursor.
(c) Part of the precursor protein is a signal peptide that is cleaved off as the protein is secreted from the cell. After the signal peptide is removed, the protein is known as the mature protein. In this context:
    (i) Identify the part of the gene that encodes the signal peptide (sig_peptide) using the “Features” section of the file
    (ii) Scrolling to the bottom of the screen, the entire nucleotide sequence can be seen (this can be copied and kept on the clipboard for future use)
    (iii) Identify the part of the gene that encodes the mature peptide (mat_peptide) using the “Features” section of the file
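
Feature locations in a GenBank file are written as "start..end", counted from 1 with both ends included. The sketch below shows how such a location maps onto the sequence; it handles only simple ranges (no "join" or "complement"), and the coordinates in the demo are placeholders rather than the actual values for x03236, which are to be read from the file.

```python
import re

# Simple 'start..end' location; '<' and '>' mark partial ends in GenBank.
LOCATION = re.compile(r"^<?(\d+)\.\.>?(\d+)$")


def parse_location(location):
    """Return (start, end) as 1-based inclusive integers."""
    match = LOCATION.match(location.strip())
    if not match:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2))


def extract(sequence, location):
    """Slice the feature out of the sequence (1-based, end included)."""
    start, end = parse_location(location)
    return sequence[start - 1:end]


if __name__ == "__main__":
    toy = "AAATGCCCTAAGG"  # placeholder sequence
    start, end = parse_location("3..11")  # placeholder coordinates
    print(end - start + 1, extract(toy, "3..11"))
```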

19 Bioinformatics: Sequence Analysis Using Computer Software (Continued) L18
Step 4: Sequence Analysis
Open the software program and open a new DNA file
Placing the cursor in the sequence area, copy and paste the α-amylase gene sequence into the space provided
Save the sequence file with a suitable name, such as BSAMYT
Using the appropriate program in the software package, translate the DNA sequence in at least the 3 forward reading frames (depending on the software used, either a single frame or all frames can be viewed on the same screen)
   The ORFs in each translated reading frame will be identified by the program
   Note the length of each ORF in each of the 3 forward reading frames; the reading frame that gives the longest ORF can be taken to represent the α-amylase gene under analysis
   Check:
   (i) Whether all three forward reading frames of the DNA sequence contain ORFs (list all the ORFs longer than 50 amino acids)
   (ii) Whether the longest ORF corresponds to the coding region of the α-amylase gene (determine the length of this ORF)
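
What the ORF-finding program does can be approximated as follows. This sketch makes simplifying assumptions that real packages may not share: an ORF starts at the first ATG after a stop and ends at the next in-frame stop codon, only the three forward frames are scanned, and alternative bacterial start codons such as GTG or TTG are ignored. The function name and the tuple layout are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(dna, min_aa=50):
    """Return (frame, start, end, length_aa) tuples, longest ORF first.

    Positions are 1-based; 'end' is the last base of the stop codon and
    'length_aa' counts the codons from ATG up to, not including, the stop.
    """
    dna = "".join(dna.split()).upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa >= min_aa:
                    orfs.append((frame + 1, start + 1, i + 3, length_aa))
                start = None
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "G"  # toy sequence
    print(forward_orfs(toy, min_aa=3))
```

With the default `min_aa=50`, the result corresponds to the list asked for in check (i); the first tuple is the candidate for check (ii).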

20 Bioinformatics: A Bioengineering Perspective Sequence Analysis Using Computer Software (Continued) L19
Step 5: ORF Identification
Select the amino acids that belong to the longest ORF and copy this amino acid sequence to the clipboard
   Following the instructions given for EXERCISE (A), search with this amino acid sequence using the blastp program
   If the procedure is followed correctly, the BLAST search will identify the protein as α-amylase from B. licheniformis
   Check whether the BLAST search indicates that this ORF corresponds to the gene that was originally chosen for this analysis
   Check whether other genes also correspond to this sequence
   Verify how similar these other genes are

21 Bioinformatics: A Bioengineering Perspective Sequence Analysis Using Computer Software (Continued) L20
Step 6: Locating and Mapping the Restriction Enzyme Cleavage Sites
   Returning to the DNA sequence for α-amylase in the DNA analysis program, use the appropriate program in the software package to identify the locations of the cleavage sites for the following restriction enzymes: Cla I, Hinc II, Kpn I, Pst I, and Sal I
   List the locations of these cleavage sites in the α-amylase gene
   Using the positions of the cleavage sites for these enzymes, construct a simple restriction map of the α-amylase gene (the program can construct the map, which can then be printed out)
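
Locating the sites amounts to a pattern search for each enzyme's recognition sequence. The sketch below uses the standard recognition sequences for the five enzymes (Hinc II recognizes the degenerate site GTYRAC, where Y is C or T and R is A or G) and reports where each site begins, not the exact cut position within it. Since all five sites are palindromic, scanning one strand is sufficient. The function name and the toy sequence are made up for the example.

```python
import re

# IUPAC codes needed for the recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

RECOGNITION_SITES = {
    "ClaI": "ATCGAT",
    "HincII": "GTYRAC",
    "KpnI": "GGTACC",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
}


def find_sites(dna):
    """Map each enzyme to the 1-based start positions of its sites."""
    dna = "".join(dna.split()).upper()
    found = {}
    for enzyme, site in RECOGNITION_SITES.items():
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
        found[enzyme] = [m.start() + 1 for m in re.finditer(pattern, dna)]
    return found


if __name__ == "__main__":
    toy = "AAGTCGACTTCTGCAGAT"  # toy sequence, not the α-amylase gene
    for enzyme, positions in find_sites(toy).items():
        print(f"{enzyme:7s} {positions}")
```

Every Sal I site (GTCGAC) also matches the Hinc II pattern, so the two enzymes share positions in the output; sorting all reported positions along the gene gives the order of sites for the restriction map.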

