Biological Databases Notes adapted from lecture notes of Dr. Larry Hunter at the University of Colorado.


Tour of Major Biological Databases
- There is a tremendous amount of information about biomolecules in publicly available databases.
- Today, we will look at a few of the main databases and what kind of information they contain.
- Lab project will give you a little practice at browsing databases.

What can be discovered about a gene by a database search?
- A little or a lot, depending on the gene
- Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
- Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
- Structural information: associated protein structures, fold types, structural domains
- Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
- Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases

Using a database
- How to get information out of a database:
  - Browsing: no targeted information to retrieve
  - Search: looking for particular information
- Searching a database:
  - Must have a key that identifies the element(s) of the database that are of interest.
    - Name of gene
    - Sequence of gene
    - Other information
  - Helps to have particular informational goals

Searching for information about genes and their products
- Gene and gene product databases are often organized by sequence
  - Genomic sequence encodes the heritable traits of an organism.
  - Gene products are uniquely described by their sequences.
  - Similar sequences among biomolecules suggest both similar function and an evolutionary relationship.
- Macromolecular sequences provide biologically meaningful keys for searching databases

Searching sequence databases
- Start from sequence, find information about it
- Many kinds of input sequences
  - Could be amino acid or nucleotide sequence
  - Genomic or mRNA/cDNA or protein sequence
  - Complete or fragmentary sequences
- Exact matches are rare (even uninteresting in many cases), so often the goal is to retrieve a set of similar sequences.
  - Both small (mutations) and large (required for function) differences within “similar” can be interesting.
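To make "retrieve a set of similar sequences" concrete, here is a minimal Python sketch that ranks toy database entries by percent identity to a query. It assumes the sequences are already aligned and of equal length, and the names `percent_identity`, `rank_by_similarity`, and `toy_db` are hypothetical. Real search tools such as BLAST use local alignment with gaps and substitution matrices, so treat this only as an illustration of the idea.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def rank_by_similarity(query: str, database: dict) -> list:
    """Return (name, identity) pairs, most similar first."""
    scored = [(name, percent_identity(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
toy_db = {"seq_a": "ACGTACGT", "seq_b": "ACGTTCGT", "seq_c": "TTTTTTTT"}
hits = rank_by_similarity("ACGTACGT", toy_db)
for name, identity in hits:
    print(f"{name}: {identity:.1f}% identical")
```

The exact match comes first, but the near match (one substitution) is often the more interesting hit, which is the point the slide makes.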

What might we want to know about a sequence?
- Is this sequence similar to any known genes? How close is the best match? Significance?
- What do we know about that gene?
  - Genomic (chromosomal location, allelic information, regulatory regions, etc.)
  - Structural (known structure? structural domains? etc.)
  - Functional (molecular, cellular & disease)
- Evolutionary information:
  - Is this gene found in other organisms?
  - What is its taxonomic tree?

NCBI and Entrez

- One of the most useful and comprehensive sources of databases is the NCBI, part of the National Library of Medicine.
- NCBI provides interesting summaries, browsers for genome data, and search tools
- Entrez is their database search interface
  - Can search on gene names, sequences, chromosomal location, diseases, keywords,...
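Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL and does not contact the network. The helper name `build_esearch_url` is hypothetical, and the example query term (including its field tags) is an assumption for illustration, so check the E-utilities documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Base address of the ESearch utility.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL for an Entrez database and query term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Assumed example: search the Gene database by gene name and organism.
term = "ADH1A[Gene Name] AND human[Organism]"
url = build_esearch_url("gene", term)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return the matching record identifiers. NCBI asks automated clients to limit their request rate.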

BLAST: Searching with a sequence
- Goal is to find other sequences that are more similar to the query than would be expected by chance (and therefore are likely homologous).
- Can start with nucleotide or amino acid sequence, and search for either (or both)
- Many options
  - E.g. ignore low information (repetitive) sequence, set significance critical value
  - Defaults are not always appropriate: READ THE NCBI EDUCATION PAGES!
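"More similar than would be expected by chance" is usually quantified with an expect value (E-value). A minimal sketch of the Karlin-Altschul relation E = K·m·n·e^(−λS) follows, where S is the raw alignment score, m and n are the query and database lengths, and K and λ depend on the scoring system. The function names are hypothetical, and the numeric values of λ and K below are illustrative assumptions, not values taken from the lecture.

```python
import math


def e_value(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score, comparable across scoring systems."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters (assumed); real values depend on matrix and gap costs.
LAMBDA, K = 0.267, 0.041
query_len, db_len = 375, 50_000_000

for score in (40, 80, 160):
    e = e_value(score, query_len, db_len, LAMBDA, K)
    print(f"raw score {score}: bits={bit_score(score, LAMBDA, K):.1f}, E={e:.3g}")
```

The significance critical value mentioned above corresponds to an E-value threshold: hits with E above the threshold are not reported. Because E grows with database size, the same alignment is less significant in a larger database.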

- Major choices:
  - Translation
  - Database
  - Filters
  - Restrictions
  - Matrix
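The "Translation" choice amounts to picking a BLAST program that matches the query type and the database type. The lookup below is a small sketch of that decision. The five program names are the standard BLAST variants, while the function `choose_blast_program` and its argument names are hypothetical.

```python
# (query type, database type, compare both as translated protein?) -> program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",   # DNA query vs DNA database
    ("protein", "protein", False): "blastp",         # protein query vs protein database
    ("nucleotide", "protein", False): "blastx",      # translated DNA query vs protein database
    ("protein", "nucleotide", False): "tblastn",     # protein query vs translated DNA database
    ("nucleotide", "nucleotide", True): "tblastx",   # translated query vs translated database
}


def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, db_type, translate_both)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type} query vs {db_type} database"
        ) from None


print(choose_blast_program("nucleotide", "protein"))  # blastx
```

Translated searches compare sequences at the protein level, which tends to detect more distant relationships than nucleotide comparison, at a higher computational cost.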

Close hit: Rat ADH alpha

Distant hit: Human sorbitol dehydrogenase

Parameters (at bottom!)

Click on:

Taxonomy report (link from “Results of BLAST” page)

What did we just do?
- Identify loci (genes) associated with the sequence. Input was Alcohol Dehydrogenase
- For each particular “hit”, we can look at that sequence and its alignment in more detail.
- See similar sequences, and the organisms in which they are found.
- But there’s much more that can be found on these genes, even just inside NCBI…

More from Entrez Gene

And more…

PubMed

Gene Expression

Detailed expression information

NCBI is not all there is...
- Links to non-NCBI databases
  - Reactome & KEGG for pathways
  - HGNC for nomenclature
  - UCSC Human Genome Browser
- Other important gene/protein resources not linked to:
  - UniProt (most carefully annotated)
  - PDB (main macromolecular structure repository)
- Other key biological data sources
  - Gene Ontology/Open Biological Ontologies
  - Enzyme
  - Scientific society: iscb.org
  - Journals, Conferences…

Gene Names: Harder than you think…

Take home messages
- There are a lot of molecular biology databases, containing a lot of valuable information
- Not even the best databases have everything (or the best of everything)
- These databases are moderately well cross-linked, and there are “linker” databases
- Sequence is a good identifier, maybe even better than gene name!
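One way to see why sequence can be a better identifier than a gene name: records carrying different names but the same sequence can be grouped under a single key derived from the sequence itself. The sketch below uses a SHA-256 digest of the normalized sequence as that key. This is an assumption chosen for illustration, not a scheme described in the lecture, and the record names and sequences are invented.

```python
import hashlib


def sequence_key(sequence: str) -> str:
    """Stable key for a sequence, ignoring whitespace and letter case."""
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha256(normalized.encode("ascii")).hexdigest()


def group_by_sequence(records):
    """Map each sequence key to the list of names that share that sequence."""
    groups = {}
    for name, seq in records:
        groups.setdefault(sequence_key(seq), []).append(name)
    return groups


# Hypothetical records: two names for one sequence, plus a different sequence.
records = [
    ("name_a", "MSTAGKVIKC"),
    ("name_b", "mstag kvikc\n"),
    ("name_c", "MSTAGKVIRC"),
]
groups = group_by_sequence(records)
for key, names in groups.items():
    print(key[:12], names)
```

An exact digest only collapses identical sequences. Finding sequences that are merely similar still requires a similarity search such as BLAST.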