How to use the web for bioinformatics Molecular Technologies Ethan Strauss 274-4330 X 1171



Objectives
At the end of this session you should be able to do all of the following using freely available tools on the World Wide Web:
Use GenBank or a similar database to find nucleic acid sequences of interest.
Understand the parts of a GenBank entry.
Use some of the databases at NCBI to find more information about a sequence.
Perform an alignment of several nucleic acid sequences.
Find an arbitrary tool or database on the web.

How to find all those dang URLs!

Outline
What is Bioinformatics?
Sequence Databases
 –What does a GenBank entry look like?
Other NCBI databases
Multiple Sequence Alignment
New tools & Databases

What is Bioinformatics? Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data (Wikipedia).

What is Bioinformatics? (my working definition) Anything done on a computer in which knowledge of biology is helpful. or Anything done in biology in which knowledge of computers is helpful.

What sort of questions can Bioinformatics answer?
Sequence analysis
 –Where are restriction sites?
 –How does an RNA molecule fold?
 –What changes can be made to a DNA sequence to get a new protein with specific functional changes?
Computational evolutionary biology
 –How are two sequences related?
Analysis of gene expression
 –Is this gene highly expressed in cancer cells?

What sort of work is done in Bioinformatics?
Measuring biodiversity
 –How diverse are individuals of a species?
 –Is it one species or two?
Analysis of regulation
 –What does this drug do to expression of a gene?
Analysis of mutations in cancer
 –What is different about these cancer cells as compared to non-cancer cells?
High-throughput image analysis
 –How can we analyze the effects of 1000 different compounds on the location of a specific protein?
And more!

Sequence Databases
NCBI databases – Nucleic acids, proteins, literature, genomes, taxonomy, SNPs and more!
EMBL – Nucleic acid, protein, structure, microarray data and more.
DDBJ – Nucleic acid, protein.
Swiss-Prot – Very well annotated protein database.
Many other general and specialized databases exist.

Sequence Databases: NCBI/GenBank
National Center for Biotechnology Information (NCBI)
Sponsored and run by the US government.
Contains many different databases and huge amounts of information.
Most or all data is freely downloadable.
This one site is probably sufficient for all your nucleic acid and protein database needs!

Sequence Databases: Entrez
Allows searching and access to NCBI databases.
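Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds the request URLs and does not contact the network; the helper names (`esearch_url`, `efetch_url`) and the example search term are hypothetical, chosen for illustration, and the slides themselves only describe the web interface.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that looks up record IDs matching `term` in `db`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_url(db, ids, rettype="gb", retmode="text"):
    """Build an EFetch URL that retrieves the records for the given IDs.

    rettype="gb" with retmode="text" asks for GenBank flat-file output.
    """
    query = urlencode(
        {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


if __name__ == "__main__":
    # Hypothetical search term, used only to show the URL shape.
    print(esearch_url("nucleotide", "hereditary fructose intolerance"))
```

Opening the printed URL in a browser returns an XML list of matching IDs, which could then be passed to `efetch_url`.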

Sequence Databases: Sequence Records
LOCUS - Number, Size, Type, Topology, Division, Date
DEFINITION - Name of the sequence
ACCESSION - Unique ID number
VERSION - Other numbers which are associated
KEYWORDS
SOURCE - What it was isolated from
ORGANISM - More taxonomic detail
REFERENCE - Paper or papers about the sequence
 –AUTHORS
 –TITLE
 –JOURNAL
FEATURES - A complete list of all of the features of a sequence. Can be very extensive and useful!
ORIGIN - The actual sequence!
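To make the record layout concrete, here is a minimal sketch that splits a GenBank-style flat-file record into its top-level sections. It assumes top-level keywords start in column 1 and continuation lines are indented; the sample record is invented for illustration and is not a real GenBank entry. Real records have more structure (for example, nested feature qualifiers) than this handles.

```python
# A made-up record in GenBank flat-file layout (not a real database entry).
SAMPLE_RECORD = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence for illustration.
ACCESSION   EXAMPLE01
VERSION     EXAMPLE01.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
FEATURES             Location/Qualifiers
     source          1..24
ORIGIN
        1 atggccaagc ttgaattcgg atcc
//
"""


def parse_genbank(text):
    """Split one flat-file record into {KEYWORD: text} plus a clean SEQUENCE."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line and not line[0].isspace():
            # A new top-level keyword such as LOCUS or FEATURES.
            keyword, _, rest = line.partition(" ")
            current = keyword
            sections[current] = [rest.strip()]
        elif current is not None:
            # Indented lines (including ORGANISM) belong to the current section.
            sections[current].append(line.strip())
    record = {key: "\n".join(parts).strip() for key, parts in sections.items()}
    # ORIGIN holds the sequence mixed with position numbers and spaces.
    origin = record.get("ORIGIN", "")
    record["SEQUENCE"] = "".join(ch for ch in origin if ch.isalpha()).upper()
    return record


if __name__ == "__main__":
    rec = parse_genbank(SAMPLE_RECORD)
    print(rec["ACCESSION"], len(rec["SEQUENCE"]), rec["SEQUENCE"])
```

Because ORGANISM is indented under SOURCE in the flat file, this sketch keeps it inside the SOURCE section rather than treating it as a separate field.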

Other NCBI databases Online Mendelian Inheritance in Man (OMIM) A catalog of human genes and genetic disorders with links to other NCBI databases, including sequence databases. This is a good starting point if you want to get sequences for a specific disorder. D=search&DB=omim&term=HFI

Other NCBI databases Gene Database Gathers information about a single gene. Exactly one entry per Gene. A good place to dig deeper into a single gene or to reduce redundancy about a single gene.

Other NCBI databases
HomoloGene - Gathers homologs from various species
3D Domains - Protein structure collection
Taxonomy - Species information
GEO (Gene Expression Omnibus) - A gene expression/molecular abundance repository

General Utilities
http://searchlauncher.bcm.tmc.edu/seq-util/seq-util.html
 –Translation
 –Restriction Digestion
 –Reformatting (alternately FASTA Formatter)
 –Complement/Reverse
 –Melting Temperature of an oligo
 –Etc.
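Several of these utilities are simple enough to sketch in a few lines. The functions below are illustrative stand-ins, not the code behind the web tools: they assume an uppercase DNA sequence containing only A, C, G, and T, use the standard genetic code, and estimate melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short oligos.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate frame 1 of a DNA sequence; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def wallace_tm(oligo):
    """Rough melting temperature in degrees C (Wallace rule, short oligos only)."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def find_sites(dna, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


if __name__ == "__main__":
    seq = "ATGGCCAAGCTTGAATTCGGATCCTAA"
    print(translate(seq))
    print(reverse_complement(seq))
    print(find_sites(seq, "GAATTC"))  # EcoRI recognition site
    print(wallace_tm("ATGCATGC"))
```

Searching only the given strand is enough for palindromic sites such as GAATTC; for non-palindromic sites one would also search the reverse complement.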

Database search by sequence similarity Basic Local Alignment Search Tool (BLAST)

Multiple Sequence Alignment
Many programs can align multiple sequences with each other to find the best fit for all. This is generally more biologically meaningful for protein sequences since they are more highly conserved. Clustal is the most common.

Multiple Sequence Alignment
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDSX ETIKALA
MEAGAYLNAIIFVLVATIIAVISRGLTRTEPCTIRITGESITVHACHIDS...ETIKALA
MEA..YLNAII.VLV.TIIAVIS..L.RTEPC.IkITGESITV.ACklDa.....I..L.
MEAgaYLNAIIfVLVaTIIAVISrgLtRTEPCtIrITGESITVhAChiDsx etIkaLa

LK PLSLERLFQ
LK.PLSLERLFQ
......L.....
lk plsLerlfq

New Tools
Development of new tools and databases is ongoing. Your needs will probably change over time. You can find new tools using:
Google
Lists
Nucleic Acids Research Annual Database issue

Homework Assignments due next session
1. Find an entry of interest in OMIM ( )
2. Find a Gene associated with that entry
 1. Click on the “links” link on the right and choose “Gene”

Homework
3. The Gene page has gathered scads of information about this one gene. Find homologs in other species. From this page again choose “links” and go to HomoloGene.

Homework
4. Gather the protein sequences for each homologous gene (or 5 of them if there are more than that).
 1. Click “DownLoad” in the HomoloGene listing
 2. Download everything with the default settings.

Homework
You will get a text file in FASTA format. Save it somewhere convenient.
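A FASTA file is plain text: each record starts with a header line beginning with ">" followed by one or more lines of sequence. As a minimal sketch of how such a file can be read, the parser below works on a string; the two sample records and their names are invented for illustration and are not real database entries.

```python
# Two made-up protein records in FASTA layout (hypothetical names).
SAMPLE_FASTA = """\
>example_protein_1 hypothetical species A
MEAGAYLNAIIFVLVATIIAVIS
RGLTRTEPCTIRITGES
>example_protein_2 hypothetical species B
MEAGSYLNAIVFVLVATIIAVIS
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    for name, seq in parse_fasta(SAMPLE_FASTA):
        print(name, len(seq))
```

To check a downloaded file, read its contents into a string and pass that to `parse_fasta`; the number of records should match the number of homologs you downloaded.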

Homework
Go to the Clustal server at http://searchlauncher.bcm.tmc.edu/multi-align/multi-align.html
Paste your complete FASTA file contents into the input box and click submit. This takes a while, so be patient. You will get output that looks something like this.

Homework
At the bottom of the alignment file are the same results in FASTA format. Copy the complete FASTA results and paste them into the input box at a BoxShade server (

Homework Depending on the parameters chosen for BoxShade, you will see something like this. Regions which are the same in all species are likely involved in function in some way.
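The idea of looking for positions that are identical in every species can be sketched directly. The code below is not what BoxShade does internally; it is a simplified illustration that assumes the sequences are already aligned to equal length, with "-" or "." marking gaps, and the three short example sequences are made up.

```python
def conserved_columns(aligned):
    """Return 0-based indices of columns identical in every aligned sequence.

    Columns containing only a gap character are not counted as conserved.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return [
        i
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] not in "-."
    ]


def consensus_line(aligned):
    """Show conserved residues and print '.' at every variable position."""
    conserved = set(conserved_columns(aligned))
    first = aligned[0]
    return "".join(first[i] if i in conserved else "." for i in range(len(first)))


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    alignment = [
        "MEAGAYLNAII",
        "MEA-AYLNAII",
        "MEAGSYLNAIV",
    ]
    for seq in alignment:
        print(seq)
    print(consensus_line(alignment))
```

Runs of conserved positions in the last line are the candidate functional regions the slide refers to.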

Homework
After all that work, your boss comes to you and says that sequence comparison is obsolete! He wants you to do structural alignments of these proteins. Figure out what a structural alignment is, find two different tools to find conserved 3D structures and choose which one you would use for this. Describe why this tool is preferable to the other. NOTE: You do not need to actually do any structural alignments. Just find out how you would go about doing one if you had to.