How to use the web for bioinformatics Molecular Technologies February 11, 2005 Ethan Strauss 274-4330 X 1373


1 How to use the web for bioinformatics Molecular Technologies February 11, 2005 Ethan Strauss ethan.strauss@promega.com 274-4330 X 1373 http://www.q7.com/~ethan

2 Objectives
At the end of this session you should be able to do all of the following, using freely available tools on the World Wide Web:
- Use GenBank or a similar database to find nucleic acid sequences of interest.
- Understand the parts of a GenBank entry.
- Use a BLAST server to find related sequences.
- Perform an alignment of several nucleic acid sequences.
- Obtain the protein sequence that corresponds to a specific nucleic acid sequence.

3 How to find all those dang URLs! http://q7.com/~ethan/molbio/

4 Outline
- Sequence Databases
  - What does a GenBank entry look like?
- BLAST
- Multiple Sequence Alignment
- PCR Primer Design
- Translation and other Utilities

5 Sequence Databases
- NCBI databases: nucleic acids, proteins, literature, genomes, taxonomy, SNPs and more!
- EMBL: nucleic acid, protein, structure, microarray data and more.
- DDBJ: nucleic acid, protein.
- SwissProt: very well annotated protein database.
- Many other general and specialized databases exist.

6 Sequence Databases: NCBI/GenBank
- National Center for Biotechnology Information (NCBI)
- Sponsored and run by the US government.
- Contains many different databases and huge amounts of information.
- Most or all data is freely downloadable.
- This one site is probably sufficient for all your nucleic acid and protein database needs!

7 Sequence Databases: Entrez
Entrez allows searching of, and access to, the NCBI databases.
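The slides use the Entrez web form. For reference, the same searches can also be scripted through NCBI's E-utilities. The sketch below is not part of the original course material: it assumes the standard `esearch.fcgi` and `efetch.fcgi` endpoints and their `db`, `term`, `retmax`, `id`, `rettype` and `retmode` parameters, and the helper names `esearch_url` and `efetch_url` are hypothetical. It only builds the URLs; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "nucleotide", retmax: int = 20) -> str:
    """URL that asks ESearch for the IDs of records matching an Entrez query."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def efetch_url(record_id: str, db: str = "nucleotide") -> str:
    """URL that asks EFetch for one record as a plain-text GenBank flat file."""
    params = {"db": db, "id": record_id, "rettype": "gb", "retmode": "text"}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Entrez field tags such as [Organism] can be used here as in the web form.
    print(esearch_url("insulin AND human[Organism]"))
    # 58533118 is the record ID used in the example link on slide 8.
    print(efetch_url("58533118"))
```

Opening either URL in a browser returns the result. ESearch replies in XML by default, and the IDs it lists can then be passed to EFetch.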

8 Sequence Databases: Sequence Records
- LOCUS: number, size, type, topology, division, date
- DEFINITION: name of the sequence
- ACCESSION: unique ID number
- VERSION: other numbers associated with the record
- KEYWORDS
- SOURCE: what it was isolated from
  - ORGANISM: more taxonomic detail
- REFERENCE: paper or papers about the sequence
  - AUTHORS
  - TITLE
  - JOURNAL
- FEATURES: a complete list of all of the features of a sequence. Can be very extensive and useful!
- ORIGIN: the actual sequence!
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=58533118
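The record layout above is regular enough to split apart with a few lines of code: top-level keywords start in the first column, sub-keywords such as ORGANISM and AUTHORS are indented, and "//" ends the record. The following is a minimal sketch under exactly those assumptions, not a full GenBank parser (it does not interpret individual FEATURES). The function name `parse_genbank` and the toy record are hypothetical.

```python
import re


def parse_genbank(text: str) -> dict[str, str]:
    """Split a single GenBank flat-file record into its top-level sections.

    Top-level keywords (LOCUS, DEFINITION, ACCESSION, ...) start in column 0.
    Indented lines (ORGANISM, AUTHORS, feature lines, continuations) are folded
    into the section above them. ORIGIN is returned as a bare "SEQUENCE".
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("//"):  # "//" ends the record
            break
        if line and not line[0].isspace():
            current = line.split()[0]
            # A repeated keyword (e.g. a second REFERENCE) extends the same entry.
            sections.setdefault(current, []).append(line[len(current):].strip())
        elif current is not None:
            sections[current].append(line.strip())
    record = {key: "\n".join(lines).strip()
              for key, lines in sections.items() if key != "ORIGIN"}
    # Sequence lines look like "        1 atgaaacccg ...": keep only the letters.
    origin = "".join(sections.get("ORIGIN", []))
    record["SEQUENCE"] = re.sub(r"[^A-Za-z]", "", origin).upper()
    return record


if __name__ == "__main__":
    toy = (  # hypothetical, heavily truncated record for illustration only
        "LOCUS       TOY0001       20 bp    DNA     linear   SYN 01-JAN-2005\n"
        "DEFINITION  Toy record.\n"
        "ACCESSION   TOY0001\n"
        "ORIGIN\n"
        "        1 atgaaacccg ggtttaatag\n"
        "//\n"
    )
    rec = parse_genbank(toy)
    print(rec["ACCESSION"], rec["SEQUENCE"])
```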

9 Hands on
Find a gene of interest using the Entrez interface. We will be working with this sequence throughout class, so you may want to open a word processing program and save the sequence (only) there for future reference.

10 General Utilities
http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html
- Translation
- Restriction digestion
- Reformatting (alternatively, FASTA Formatter)
- Complement/reverse
- Etc.
http://www.promega.com/biomath/calc11.htm
- Melting temperature of an oligo
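Several of these utilities are simple enough to reproduce offline. The sketch below covers reverse complement, restriction-site search and a melting-temperature estimate. The Tm formula here is the Wallace rule, 2 °C per A/T plus 4 °C per G/C, which is only a rough estimate for short oligos; the Promega calculator may well use a more elaborate method, so its numbers can differ. The function names are hypothetical, and EcoRI (GAATTC) is used only as an example site.

```python
# Complement table; N (any base) pairs with N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, giving the other strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """1-based start positions of every match of `site` on the given strand."""
    seq, site = seq.upper(), site.upper()
    return [i + 1 for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def wallace_tm(oligo: str) -> int:
    """Wallace-rule estimate, in deg C: 2*(A+T) + 4*(G+C). Short oligos only."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


if __name__ == "__main__":
    print(reverse_complement("ATGCGAATTC"))           # GAATTCGCAT
    print(find_sites("AAGAATTCTTGAATTC", "GAATTC"))  # [3, 11]
    print(wallace_tm("ATGCGAATTC"))                  # 28
```

GAATTC reads the same on both strands, so one pass finds every EcoRI site. For a non-palindromic recognition sequence you would also search the reverse complement.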

11 Hands on Translate your sequence in all 6 reading frames.
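Translating "in all 6 reading frames" means reading codons starting at the first, second and third base of the sequence, and doing the same on its reverse complement. Here is a minimal sketch using the standard genetic code (NCBI translation table 1). It is an illustration, not the algorithm of any particular web tool. The `+1..+3` / `-1..-3` labels are an assumption, since tools number the reverse-strand frames in slightly different ways, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(seq: str) -> str:
    """Translate from the first base; '*' = stop, 'X' = codon with unknown bases."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    dna = seq.upper().replace("U", "T")
    rc = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frames("ATGAAATAG").items():
        print(frame, protein)
```

A long stretch without '*' in one frame is a candidate open reading frame.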

12 BLAST
Basic Local Alignment Search Tool. Compares a query sequence against all sequences in a database. Very powerful for finding biologically significant relationships, and for finding full gene sequences in the database when you have only a fragment, etc.
Different types:
- Nucleic acid vs. nucleic acid
- Protein vs. protein
- Nucleic acid translation vs. protein
- Protein vs. nucleic acid translation
- Translation vs. translation
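On NCBI's BLAST pages, each of these five search types corresponds to a named program: blastn, blastp, blastx, tblastn and tblastx. The small lookup below records that mapping. The helper name `blast_program` and the labels "nucleotide", "protein" and "translated" are hypothetical conventions chosen for this illustration.

```python
# Which NCBI BLAST program compares which kinds of sequence.
# "translated" = a nucleotide sequence translated in all six reading frames.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated", "protein"): "blastx",      # nucleotide query vs protein database
    ("protein", "translated"): "tblastn",     # protein query vs nucleotide database
    ("translated", "translated"): "tblastx",  # nucleotide vs nucleotide, both translated
}


def blast_program(query: str, database: str) -> str:
    """Pick the BLAST program for a (query, database) pair of sequence kinds."""
    try:
        return BLAST_PROGRAMS[(query, database)]
    except KeyError:
        raise ValueError(f"no BLAST program compares {query!r} against {database!r}") from None


if __name__ == "__main__":
    print(blast_program("nucleotide", "nucleotide"))  # the hands-on exercise
```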

13 BLAST

14

15 Hands on Use ~120 bases (2 lines) from your sequence to find at least two other sequences related to it. Note that if we all hit NCBI BLAST at once, it will be slow. We may not have time to wait. Get all 3 sequences (your original and two others) into FASTA format using READSEQ.
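FASTA format itself is simple: one header line starting with '>', followed by the sequence, usually wrapped to a fixed line width. READSEQ handles many formats. The sketch below covers only the FASTA output case. It is a hypothetical helper (`to_fasta`), and the 60-character default width is a common choice, not a requirement.

```python
import re


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """One FASTA record: a '>' header line, then the sequence in lines of `width`.

    Anything that is not a letter, '*' (stop) or '-' (gap) is discarded, so a
    sequence pasted from a GenBank ORIGIN section (with position numbers and
    spaces) comes out clean.
    """
    clean = re.sub(r"[^A-Za-z*-]", "", seq).upper()
    lines = [clean[i:i + width] for i in range(0, len(clean), width)]
    return "\n".join([">" + name] + lines) + "\n"


if __name__ == "__main__":
    pasted = "        1 atgaaacccg ggtttaatag\n       21 ccc"
    # Records are simply concatenated to make a multi-sequence FASTA file.
    print(to_fasta("my_seq", pasted, width=10) + to_fasta("other_seq", "ATGCC"), end="")
```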

16 Multiple Sequence Alignment
Many programs can align multiple sequences with each other to find the best fit for all. This is generally more biologically meaningful for protein sequences, since they are more highly conserved. Clustal is the most common.
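Clustal builds its multiple alignment progressively, starting from pairwise alignments of the input sequences. The sketch below shows only that pairwise building block: a Needleman-Wunsch global alignment with toy scores (+1 match, -1 mismatch, -1 per gap). It is not Clustal itself, which uses substitution matrices, affine gap penalties and a guide tree. The function name `global_align` is hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a simple linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("GATTACA", "GCATGCA"))
```

When several alignments tie for the best score, the traceback order decides which one is returned.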

17 Multiple Sequence Alignment
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDSX  ETIKALA
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDS...ETIKALA
MEA..YLNAII.VLV.TIIAVIS..L.RTEPC.IkITGESITV.ACklDa.....I..L.
MEAgaYLNAIIfVLVaTIIAVISrgLtRTEPCtIrITGESITVhAChiDsx  etIkaLa

LK PLSLERLFQ
LK.PLSLERLFQ
......L.....
lk plsLerlfq
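Consensus rows like the ones above summarize each alignment column. The sketch below uses one simple convention that is similar in spirit but not necessarily identical to the tool that produced the slide. Upper case means every sequence agrees, lower case means a strict majority agrees, and '.' means no residue reaches a majority. The function name `consensus` and the majority threshold are assumptions for illustration.

```python
from collections import Counter


def consensus(aligned: list[str]) -> str:
    """Consensus line for equal-length aligned sequences ('-' marks a gap).

    Upper case: every sequence has the same residue in that column.
    Lower case: a strict majority share the residue.
    '.': no residue reaches a majority, or gaps are the most common symbol.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    line = []
    for column in zip(*aligned):
        residue, count = Counter(c.upper() for c in column).most_common(1)[0]
        if residue == "-" or count * 2 <= len(column):
            line.append(".")
        elif count == len(column):
            line.append(residue)
        else:
            line.append(residue.lower())
    return "".join(line)


if __name__ == "__main__":
    print(consensus(["MEAGAYLNAI", "MEAGAYLNAI", "MEA-SYLKAI"]))  # MEAgaYLnAI
```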

18 Hands on
Use your FASTA-formatted sequences to perform a multiple sequence alignment. Transfer the alignment to a word processing program and see if you can make it look decent:
- Change the font to Courier or Courier New.
- Reduce the font size.
- Change to landscape view.

19 PCR Primer Design
There are many PCR primer design programs, both online and offline. I recommend Primer3. It is complex but powerful, and you can ignore most parameters.
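Primer3 weighs many factors at once, including melting temperature, self-complementarity, primer-dimers and product size. The sketch below is only a hypothetical pre-filter to show the flavor of such checks. It uses two common rules of thumb: GC content between 40% and 60%, and a G or C as the 3'-most base (a "GC clamp"). The thresholds, the 20-base default length and the function names are assumptions, not Primer3's defaults or algorithm.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def candidate_primers(template: str, length: int = 20,
                      gc_min: float = 0.40, gc_max: float = 0.60) -> list[tuple[int, str]]:
    """Windows of `template` that pass two rule-of-thumb primer checks:
    GC content between gc_min and gc_max, and a G or C as the 3'-most base.
    Returns (1-based start, primer) pairs read 5'->3' on the given strand.
    """
    t = template.upper()
    hits = []
    for start in range(len(t) - length + 1):
        primer = t[start:start + length]
        if gc_min <= gc_fraction(primer) <= gc_max and primer[-1] in "GC":
            hits.append((start + 1, primer))
    return hits


if __name__ == "__main__":
    print(candidate_primers("AAGCGATTACGCTAGC", length=8))
```

Forward primers come from the template as written. Reverse primers would be picked the same way from the reverse complement of the region you want the product to end in.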

20 Hands on Design primers for the sequence you have been working with.

21 Homework (assignments due next session)
1. Find a DNA or RNA sequence of interest to you which codes for a single protein product. It is fine if your sequence has untranslated regions, as long as part of it codes for a protein.
2. Use the first 300 bases of this sequence and an online BLAST server (http://www.ncbi.nlm.nih.gov/blast/) to find 4 related sequences (for a grand total of 5 sequences).
3. Perform a multiple sequence alignment of the first 300 bases of all 5 sequences.
4. Use this alignment to design PCR primers which will amplify all 5 sequences. If this does not seem to be possible, please explain why.
5. Translate your initial sequence into its corresponding protein sequence.
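For step 4, one way to start is to look for stretches of the alignment that are identical in all 5 sequences and contain no gaps, since a primer placed there matches every sequence exactly. The sketch below finds such windows. Requiring perfect identity is a deliberately strict, hypothetical criterion: primers with a few mismatches away from the 3' end often still work. The function name `conserved_windows` is also hypothetical.

```python
def conserved_windows(aligned: list[str], length: int = 20) -> list[tuple[int, str]]:
    """Find stretches of `length` alignment columns that are identical in every
    sequence and contain no gaps, i.e. places where one primer would match all
    of the aligned sequences exactly. Returns (1-based column, sequence) pairs.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # True for columns where every sequence carries the same (non-gap) base.
    ok = [len(set(col)) == 1 and col[0] != "-"
          for col in zip(*(s.upper() for s in aligned))]
    hits = []
    for start in range(len(ok) - length + 1):
        if all(ok[start:start + length]):
            hits.append((start + 1, aligned[0][start:start + length].upper()))
    return hits


if __name__ == "__main__":
    alignment = ["ACGTACGTAA", "ACGTACGTTA", "ACGTACG-AA"]  # toy example
    print(conserved_windows(alignment, length=4))
    # [(1, 'ACGT'), (2, 'CGTA'), (3, 'GTAC'), (4, 'TACG')]
```

A forward primer can be taken directly from a conserved window near the start of the alignment. A reverse primer is the reverse complement of a conserved window near the end.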

22 Homework
Please turn in a report which includes the following:
1) Information about your initial sequence, including:
   - GenBank accession number
   - Species
   - Description
   - Location of the ORF and any other important features
   - Protein sequence of the translation
2) Information about the 4 other sequences, including:
   - GenBank accession number
   - Species
   - Description
   - Location of the ORF and any other important features
   - BLAST score as compared to your first sequence
3) The sequences of the PCR primers you chose, or a short explanation of why you could not find primers to amplify all of these genes.
4) The multiple sequence alignment with the locations of the primers clearly marked.
Note that the multiple sequence alignment is best viewed in Courier font with the screen in landscape view. This actually matters a lot! If you have done this and it is still not lining up correctly, reduce the font size.

