Keeping Current: Genetics Resources

This workshop provides an overview of NCBI resources for finding:
- Background information and journal articles
- Nucleotide sequences
- Protein sequences
- Three-dimensional structures
- Complete genomes and maps

Information Pyramid
Start at the bottom of the pyramid for background information.

Finding Background Information: Genetics Home Reference
- Consumer information about genetic conditions
- Includes condition summaries, gene summaries, chromosome summaries, and a glossary

Finding Background Information: OMIM (Online Mendelian Inheritance in Man)
- Catalog of human genes and genetic disorders
- Links to MEDLINE and sequence information
- ov/sites/entrez?db=omim

Finding Background Information: NCBI Bookshelf
- Collection of biomedical books
- Searchable by topic
- v/sites/entrez?db=books

Finding Journal Articles: PubMed/MEDLINE
- Coverage from the early 1950s to the present, with over 17 million articles
- Remote access to licensed full-text articles available via Get VCU
- Helpful limits and links to other databases, including OMIM, NCBI Books, and Entrez databases
- Import citations into RefWorks
- Tutorials available at ed/pubmed.html

Using Entrez to Find Molecular Biology Information
The Data
- Many different types of data domains (databases)
- Sources are generally comprehensive databases
- A domain can include both primary (archival) and curated data
Search Features
- Text term searches
- Related records
- Links across databases

Searching all databases at once

Searching Individual Entrez Databases: Four Ways to Search
- Basic
- Advanced Method 1: Do a separate search for each term or phrase and combine searches using History
- Advanced Method 2: Stack your query one step at a time (iterative searching) using Preview/Index
- Complex Boolean Query
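The "Complex Boolean Query" option can be illustrated with a short sketch that assembles a query string from field-tagged terms. The helper names (`tag`, `combine`, `esearch_url`) are hypothetical, and the use of the E-utilities ESearch endpoint is an assumption added for illustration; the workshop itself only describes the Entrez web interface. The code builds the URL but does not send a request.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint, not mentioned in the slides.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tag(term, field=None):
    """Quote multi-word phrases and append an optional [Field] tag."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def combine(operator, *parts):
    """Join query parts with AND, OR, or NOT and wrap them in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT")
    return "(" + f" {operator} ".join(parts) + ")"


def esearch_url(db, query, retmax=20):
    """Build (but do not send) an ESearch URL for the given database and query."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


# Example: either of two gene names, restricted to human records.
genes = combine("OR", tag("BRCA1", "Gene Name"), tag("BRCA2", "Gene Name"))
query = combine("AND", genes, tag("Homo sapiens", "Organism"))
print(query)
print(esearch_url("nucleotide", query))
```

The same query string can be pasted into the search box of an individual Entrez database; building it from small pieces mirrors the "one step at a time" approach of Advanced Method 2.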

Understanding Search Results: Mixed Types of Data
In Entrez Nucleotide, for example, search results can include:
- Archival data from GenBank/EMBL/DDBJ
- Complete and characterized sequences
- Sequence fragments
- Long genomic regions that are finished or in progress
- Sequences of different molecule types (e.g., DNA, mRNA)
- Curated sequence records (RefSeq)
- Sequence records drawn from PDB structure records

Entrez Nucleotide
- A collection of sequences from several sources: GenBank, RefSeq, and PDB
- Can be limited by field (such as organism), molecule, gene location, source, and date

GenBank
- Archival database of nucleotide sequences from >160,000 organisms
- Records annotated with coding region (CDS) features also include amino acid translations
- Each record represents the work of a single lab
- Redundant; there can be many sequence records for a single gene

RefSeq
- Database of reference sequences
- Curated
- Non-redundant
- A representative GenBank record is used as the source for a RefSeq record
- Value-added information is added by experts
- Each record offers an encapsulation of the current understanding of a gene or protein, similar to a review article
- Variety of accession number prefixes (NM, NP, etc.) and status codes (provisional, reviewed, etc.)
- Includes genomic DNA, mRNA, and protein sequences
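As a rough illustration of the accession number prefixes mentioned above, the following sketch maps a few common RefSeq prefixes to the kind of molecule they denote. The slide names only NM and NP; the other entries and the function name `describe_refseq` are additions for illustration, and the table is not exhaustive.

```python
# A partial lookup table of RefSeq accession prefixes (illustrative, not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA (model)",
    "XR": "predicted non-coding RNA (model)",
    "XP": "predicted protein (model)",
}


def describe_refseq(accession):
    """Return the molecule type for a RefSeq-style accession, or None if unrecognized."""
    prefix, separator, _ = accession.partition("_")
    if not separator:
        # RefSeq accessions have an underscore after the prefix; GenBank ones do not.
        return None
    return REFSEQ_PREFIXES.get(prefix.upper())


print(describe_refseq("NM_000546.6"))
print(describe_refseq("U49845"))
```

The underscore is a quick way to tell a curated RefSeq record from an archival GenBank record when scanning mixed Entrez Nucleotide results.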

Entrez Gene
- Genes that have been annotated on the complete genomes in Entrez Genomes (i.e., genes annotated on RefSeq NC_* records)
- Over 5086 organisms are represented in Entrez Gene
- Includes organisms with genomes that were completely sequenced or are in progress
- Each record represents a single gene from a given organism
- Types and quantity of information present in an individual record depend upon what is available for a particular organism or gene
- A unique identifier, or GeneID, assigned by NCBI
- A preferred symbol
- Any one or more of:
  - sequence information
  - map information
  - official nomenclature from an authority list
  - alternate gene symbols
  - summary of gene/protein function
  - published references that provide additional information on function
  - expression
  - homology data

Representative Rather than Comprehensive Information for a Gene
- Reference list provides a selective rather than exhaustive list of articles
- Sequence records listed in an Entrez Gene record are representative rather than exhaustive
- Additional information can be retrieved by following links to the various Entrez databases, then displaying additional "related records" within a database

Entrez Genomes
- Over 3000 completely sequenced organisms in GenBank, including archaea, bacteria, eukaryotes, viruses, viroids, and plasmids
- Includes organisms whose genomes are completed, as well as data from some genome sequencing projects that are currently in progress
- Provides graphical overviews of complete genomes/chromosomes, and the ability to explore regions of interest in progressively greater detail

Sample Eukaryotic Genome: Homo sapiens

BLAST
- Compares sequence similarities
- Use when you have a SNP sequence in FASTA format
- Run nucleotide and protein queries

Entering a BLAST Query
- Enter the sequence in FASTA format
- A FASTA record starts with a definition line ("def line") that begins with a >, followed by a single-line description
- Sequence lines follow, with up to 80 nucleotide bases or amino acids per line
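The FASTA layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the function names are hypothetical, and it assumes plain text input with one or more records, each introduced by a ">" definition line.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (def_line, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before a '>' definition line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(header, sequence, width=80):
    """Write one record with a def line and sequence lines of at most `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


sample = ">demo test sequence\nACGT\nACGT\n"
print(parse_fasta(sample))
print(format_fasta("demo test sequence", "ACGT" * 30), end="")
```

Wrapping at 80 characters matches the limit given on the slide; many tools also accept shorter lines, such as the 60-character lines used in the Jurassic Park example.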

What does BLAST tell you?
- Gives the putative identity and function of the sequence
- Helps to direct experimental design to prove the function
- Finds similar sequences in model organisms (e.g., yeast, C. elegans, mouse), which can be used to further study the gene
- Compares complete genomes against each other to identify similarities and differences among organisms

BLAST Programs: Which One to Use?
Depends on:
- what type of query sequence you have (nucleotide or protein)
- what type of database you will search against (nucleotide or protein)
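The choice described above amounts to a small lookup on the query type and database type. The sketch below uses the standard BLAST program names; the function name `choose_blast` and the `translate_both` flag are hypothetical conveniences for this example.

```python
# Standard BLAST programs keyed by (query type, database type).
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast(query_type, database_type, translate_both=False):
    """Pick a BLAST program from the query and database sequence types."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if translate_both:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"  # both query and database are translated
    return BLAST_PROGRAMS[key]


print(choose_blast("nucleotide", "protein"))
print(choose_blast("protein", "protein"))
```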

Jurassic Park DNA Sequence
>DinoDNA "Dinosaur DNA" from Crichton JURASSIC PARK p. 103 nt
GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC
GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG
TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC
TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG
CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA
AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG
ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT
CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT
GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG
CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA
CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG
CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA
CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA
GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG
CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG
ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA
ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC
GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG
CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG
CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT

Search Results: Understanding the Output
- Reminders about your specific query
  - RID
  - Query sequence reminder (contains the information from your FASTA def line)
  - What database you searched against
- Graphical summary
  - Shows where the hits aligned to your query
  - Colors indicate score range
  - Mouse over a colored bar to see info about that hit
- Text summary (GI numbers and def lines)
  - GI links to the complete record in Entrez
  - Score links to the pairwise alignment between your query sequence and the hit
- Pairwise alignments
- BLAST statistics for your search

Protein-Protein BLAST (blastp)
- Usually better to use than nucleotide-nucleotide BLAST
- Because the genetic code is degenerate, blastn can often give less specific results than blastp
Translated BLAST (blastx)
- One way to do a protein BLAST search if you have a nucleotide query sequence
- The BLAST program does the translating for you, in all 6 reading frames
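To make "all 6 reading frames" concrete, here is a minimal sketch of six-frame translation using the standard genetic code. It illustrates the idea only and is not how BLAST itself is implemented; the function names are hypothetical, and codons containing characters other than A, C, G, or T are reported as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of the string; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate three frames on the given strand (+) and three on its reverse complement (-)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)

# Degeneracy: different codons can encode the same amino acid.
print(translate("CTT"), translate("CTG"))
```

The last line shows why protein comparisons can be more informative: two nucleotide sequences that differ at the third codon position may still encode the same amino acid.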

Structures: Molecular Modeling Database (MMDB)
- Drawn from PDB
- Includes only experimentally resolved structures (i.e., by NMR and X-ray crystallography)
- Includes value-added information, such as:
  - Explicit chemical graph showing the bonding between atoms
  - Atomic substructure classification
  - Direct 3-D structure comparisons using the VAST (Vector Alignment Search Tool) algorithm to identify structural neighbors
  - Links to associated records in other Entrez databases

Fewer Structures than Protein Sequences
- Most protein sequences in the Entrez Protein database do not yet have a resolved structure
- The chances of finding a resolved structure for a given protein sequence are low
- Therefore, if the traditional methods for searching an Entrez database do not find a structure for your protein of interest, you might need to use some creative "back doors" for retrieving structures that might have some relationship to your protein of interest

"Front Doors" and "Back Doors" for Retrieving Structures
If a protein sequence already HAS a resolved structure:
- Front Door #1: Search the Entrez Structure database directly
- Front Door #2: Search the Entrez Protein database and then "Display" the associated Structure records
If a protein sequence DOES NOT HAVE a resolved structure:
- Back Door #1: Find similar protein sequences and see if any of them have a resolved 3-D structure
- Back Door #2: Identify the conserved domain(s) in the protein of interest and view the 3-D structure of the domain(s)

Cn3D ("See in 3-D") Viewing Software
Before we can view a 3-D structure from the structure summary page, we need to install a structure viewing program. A variety of public domain programs exist, such as:
- Cn3D ("See in 3-D")
- Rasmol

Structure Record Summary

PubChem
- Small organic molecules
- Structures and biological activities
- Search by descriptive terms, chemical properties, and structural similarity
- Links to other NCBI databases

PubChem Databases
PubChem Substance
- Substance information includes chemical structures, synonyms, registration IDs, descriptions, and related URLs
- Database cross-reference links to PubMed and protein 3D structures
PubChem Compound
- There is one PubChem Compound record for each unique substance, and for each unique substance component
- Multiple PubChem Substance records can be associated with a single Compound record
- Includes all standardized structures, mixture components, and precalculated structure neighboring links
- Compound information includes structure, compound property information (molecular weight, formula, xLogP, count of rotatable bonds, H bond donors, H bond acceptors, etc.), and structure description (SMILES, IUPAC name, InChI)
PubChem BioAssay
- The assay database consists of deposited bioactivity data and descriptions of the bioactivity assays used for screening the chemical substances contained in PubChem Substance, including descriptions of the conditions and the readouts (bioactivity levels) specific to the screening procedure
- The assay database includes DTP/NCI's 710 million lines of in vitro and in vivo data, covering fields from cancer and HIV to many others

Education Resources
- Introduction to Molecular Biology Resources: ex.html
- NCBI Education Resources