NCBI FieldGuide September 29, 2004 ICGEB NCBI Molecular Biology Resources A Field Guide part 1.


Types of Databases
Primary Databases
– Original submissions by experimentalists
– Database staff review and may organize the data, but we don’t add or modify additional information
– Records are “owned” and updated by their authors
– Examples: GenBank, SNP, GEO
Derivative Databases
– Human-curated (compilation and correction of data). Examples: Gene (LocusLink), Structure, and Literature databases
– Computationally derived. Example: UniGene
– Combination. Examples: RefSeq, Genome Assembly, Domain databases

NCBI’s Derivative Sequence Database
[Diagram labels: genomes, transcripts, proteins; GenBank]

– Forming the “best representative” sequence
– Standardizing nomenclature and record structure
– Adding annotation (references, sequence features)
Release 6 is now available on the FTP site!

RefSeq Curation Processes
– Curated genomic DNA (NC, NT, NW)
– Curated mRNA (NM) and non-coding RNA (NR)
– Model mRNA (XM) and model non-coding RNA (XR)
– Protein (NP)
– Model protein (XP)
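Because each RefSeq series has its own prefix (the “distinct accession series” counted among RefSeq’s benefits), the record type can be read straight off an accession. A minimal Python sketch, assuming the prefix table below; the descriptions are our own summary following the slide’s curated vs. model grouping, not an official NCBI list, and `classify_refseq` is a hypothetical helper:

```python
from typing import Dict, Tuple

# Hypothetical lookup table: RefSeq accession prefix -> (molecule type, description).
# Descriptions follow the slide's curated vs. model grouping.
REFSEQ_PREFIXES: Dict[str, Tuple[str, str]] = {
    "NC": ("genomic", "complete genomic molecule"),
    "NT": ("genomic", "genomic contig"),
    "NW": ("genomic", "genomic contig"),
    "NM": ("mRNA", "curated mRNA"),
    "NR": ("RNA", "curated non-coding RNA"),
    "XM": ("mRNA", "model (predicted) mRNA"),
    "XR": ("RNA", "model (predicted) non-coding RNA"),
    "NP": ("protein", "curated protein"),
    "XP": ("protein", "model (predicted) protein"),
}


def classify_refseq(accession: str) -> Tuple[str, str]:
    """Return (molecule type, description) for an accession like 'NM_123.4'."""
    prefix, sep, _ = accession.partition("_")  # RefSeq prefixes end in an underscore
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognised RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


print(classify_refseq("XM_12345.1"))  # hypothetical accession number
# ('mRNA', 'model (predicted) mRNA')
```

The version suffix after the dot is ignored, since only the prefix determines the series.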

RefSeq Chromosomes: NC_
LOCUS       NC_ bp    DNA     circular BCT 30-JUL-2003
DEFINITION  Escherichia coli K12, complete genome.
ACCESSION   NC_
VERSION     NC_  GI:
KEYWORDS    .
SOURCE      Escherichia coli K12.
  ORGANISM  Escherichia coli K12
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1  (bases 1 to )
  AUTHORS   Blattner,F.R., Plunkett,G. III, Bloch,C.A., Perna,N.T., Burland,V.,
            Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F.,
            Gregor,J., Davis,N.W., Kirkpatrick,H.A., Goeden,M.A., Rose,D.J.,
            Mau,R. and Shao,Y.
  TITLE     The complete genome sequence of Escherichia coli K12
  JOURNAL   Science 277 (5331), (1997)
   MEDLINE
   PUBMED
REFERENCE   2  (bases 1 to )
  AUTHORS   Blattner,F.R.
  TITLE     Direct submission
  JOURNAL   Submitted (16-JAN-1997) Guy Plunkett III, Laboratory of Genetics,
            University of Wisconsin, 445 Henry Mall, Madison, WI 53706, USA.
            Phone: Fax:
     gene            /gene="mutL"
                     /locus_tag="b4170"
                     /note="synonym: mut-25"
     CDS             /gene="mutL"
                     /locus_tag="b4170"
                     /function="methyl-directed mismatch repair"
                     /codon_start=1
                     /transl_table=11
                     /product="MutL"
                     /protein_id="NP_"
                     /db_xref="GI:"
                     /translation="MPIQVLPPQLANQIAAGEVVERPASVVKELVENSLDAGATRIDI
                     DIERGGAKLIRIRDNGCGIKKDELALALARHATSKIASLDDLEAIISLGFRGEALASI
                     SSVSRLTLTSRTAEQQEAWQAYAEGRDMNVTVKPAAHPVGTTLEVLDLFYNTPARRKF
                     LRTEKTEFNHIDEIIRRIALARFDVTINLSHNGKIVRQYRAVPEGGQKERRLGAICGT
                     AFLEQALAIEWQHGDLTLRGWVADPNHTTPALAEIQYCYVNGRMMRDRLINHAIRQAC
                     EDKLGADQQPAFVLYLEIDPHQVDVNVHPAKHEVRFHQSRLVHDFIYQGVLSVLQQQL
                     ETPLPLDDEPQPAPRSIPENRVAAGRNHFAEPAAREPVAPRYTPAPASGSRPAAPWPN
                     AQPGYQKQQGEVYRQLLQTPAPMQKLKAPEPQEPALAANSQSFGRVLTIVHSDCALLE
                     RDGNISLLSLPVAERWLRQAQLTPGEAPVCAQPLLIPLRLKVSAEEKSALEKAQSALA
                     ELGIDFQSDAQHVTIRAVPLPLRQQNLQILIPELIGYLAKQSVFEPGNIAQWIARNLM
                     SEHAQWSMAQAITLLADVERLCPQLVKTPPGGLLQSVDLHPAIKALKDE"
[Callout: annotation of gene, CDS, and other features]
BASE COUNT  a  c  g  t
ORIGIN
        1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct
       61 gcccctgacg tgccagctgc acgccgcgtt cgaactcgtt cagcgcctct tccagcggca
      121 ggtcgccact ttccagacgg gttacaatct gttccagctc gctcagcgcc ttttcaaagc
      181 tggcgggcgc ctcatttttc ttcggcataa tgaatgtctg actctcaata tttttcgccc
      241 cgtcatggta acggactcag ggcaaatagc aaataacgcg caatggtaag gtgatgtgca
      301 cagcaaagcg atgttagtgg tatacttccg cgcctggatg cagccgcagg tgtgggctgc
      361 tgtatttttc cctatacaag tcgcttaagg cttgccaacg aaccattgcc gccatgaagt
      421 ttatcattaa attgttcccg gaaatcacca tcaaaagcca atctgtgcgc ttgcgcttta
      481 taaaaatcct taccgggaac attcgtaacg ttttaaagca ctatgatgag acgctcgctg
      541 tcgtccgcca ctgggataac atcgaagttc gcgcaaaaga tgaaaaccag cgtctggcta
      601 ttcgcgacgc tctgacccgt attccgggta tccaccatat tctcgaagtc gaagacgtgc
      661 cgtttaccga catgcacgat attttcgaga aagcgttggt tcagtatcgc gatcagctgg
      721 aaggcaaaac cttctgcgta cgcgtgaagc gccgtggcaa acatgatttt agctcgattg
      781 atgtggaacg ttacgtcggc ggcggtttaa atcagcatat tgaatccgcg cgcgtgaagc
      841 tgaccaatcc ggatgtgact gtccatctgg aagtggaaga cgatcgtctc ctgctgatta
      901 aaggccgcta cgaaggtatt ggcggtttcc cgatcggcac ccaggaagat gtgctgtcgc
      961 tcatttccgg tggtttcgac tccggtgttt ccagttatat gttgatgcgt cgcggctgcc
[Callout: genome sequence]
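In a flat file like this one, the ORIGIN section prints the sequence in numbered rows of 60 bases split into blocks of 10, and BASE COUNT tallies each base (the counts are missing from this copy of the slide). A minimal Python sketch, assuming only that layout; `parse_origin` and `base_count` are hypothetical helpers, and for real work a maintained parser such as Biopython’s `SeqIO` is the safer choice:

```python
from collections import Counter
from typing import Dict


def parse_origin(flatfile_text: str) -> str:
    """Collect the sequence from the ORIGIN section of a GenBank-style flat file."""
    parts = []
    in_origin = False
    for line in flatfile_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if not in_origin:
            continue
        if line.startswith("//"):  # record terminator
            break
        # Each row is "<position> <10-base block> <10-base block> ...":
        # keep the letters and drop the position number and spaces.
        parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts).lower()


def base_count(seq: str) -> Dict[str, int]:
    """Tally a, c, g and t, as reported on the BASE COUNT line."""
    counts = Counter(seq)
    return {base: counts[base] for base in "acgt"}


record = """ORIGIN
        1 cgtcttcatt gtcagacagc agaatttgta cgcgctgttc ggcttgttgt aatttggcct
//
"""
seq = parse_origin(record)
print(len(seq), base_count(seq))
```

Run over a whole record, the four values from `base_count` should sum to the length given on the LOCUS line.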

RefSeq: NCBI’s Derivative Sequence Database
RefSeq Benefits
– Non-redundant
– Explicitly linked nucleotide and protein sequences
– Updated to reflect current sequence data and biology
– Validated by hand
– Format consistency
– Distinct accession series
– Stewardship by NCBI staff and collaborators
ftp://ftp.ncbi.nih.gov/refseq/release

Announcing! Genes: The Gene Summary Database
Summary pages of curated information about genetic loci for organisms in the RefSeq project:
►Graphics
►Gene information
►Bibliography (PubMed links)
►General gene information
►NCBI Reference Sequences
►Related sequences
►Additional Links

Entrez Gene


NM/NP Records in Entrez Gene

UniGene: Clustering Expressed Sequences
– Records are clusters of mRNAs and ESTs that ideally represent single genes
– Records are created automatically by a modified BLAST algorithm
– UniGene provides a means to identify an EST or unannotated mRNA
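UniGene groups ESTs and mRNAs by sequence similarity found with a modified BLAST. To illustrate only the grouping step, the Python sketch below swaps BLAST for a much cruder test: two sequences are linked when they share any exact k-mer (k = 8 is an arbitrary assumption), and linked sequences are merged into clusters with a union-find, i.e. single linkage. `cluster_sequences` is a hypothetical helper; it ignores reverse complements, vector contamination and repeats, which a real EST pipeline has to handle.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Single-linkage clustering: sequences sharing any exact k-mer end up together."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen: Dict[str, str] = {}  # k-mer -> first sequence containing it
    for name, seq in seqs.items():
        for kmer in kmers(seq.lower(), k):
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


ests = {
    "est1": "aaaaccccgggg",
    "est2": "ccccggggtttt",  # overlaps est1 and est3
    "est3": "ggggttttacac",
    "est4": "atatatatatat",  # unrelated
}
print(cluster_sequences(ests))
# [['est1', 'est2', 'est3'], ['est4']]
```

Note that est1 and est3 share no k-mer yet land in the same cluster through est2; this chaining is typical of single linkage and is one reason a cluster only “ideally” represents a single gene.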

A Cluster of ESTs: Arabidopsis serine protease
[Figure labels: query; 5’ EST hits; 3’ EST hits; Sequence & Expression]

UniGene Collections (as of July 2004)
Embryophyta
– Cycadopsida: Pinus taeda (loblolly pine)
– Bryopsida: Physcomitrella patens
– Eudicotyledons: Arabidopsis thaliana (thale cress), Glycine max (soybean), Helianthus annuus (sunflower), Lactuca sativa (lettuce), Lotus corniculatus (lotus flower), Lycopersicon esculentum (tomato), Malus x domestica (apple), Medicago truncatula (barrel medic), Populus tremula/tremuloides (poplar), Solanum tuberosum (potato), Vitis vinifera (wine grape)
– Liliopsida: Hordeum vulgare (barley), Oryza sativa (rice), Saccharum officinarum (noble cane), Sorghum bicolor (sorghum), Triticum aestivum (bread wheat), Zea mays (corn)
Chordata
– Mammalia: Bos taurus (cow), Canis familiaris (dog), Homo sapiens (human), Mus musculus (mouse), Ovis aries (sheep), Rattus norvegicus (rat), Sus scrofa (pig)
– Aves: Gallus gallus (chicken)
– Amphibia: Xenopus laevis (African clawed frog), Xenopus tropicalis (western clawed frog)
– Actinopterygii: Danio rerio (zebrafish), Oncorhynchus mykiss (rainbow trout), Oryzias latipes (Japanese rice fish), Salmo salar (salmon)
– Ascidiacea: Ciona intestinalis (sea squirt)
Arthropoda
– Insecta: Anopheles gambiae (malaria mosquito), Apis mellifera (honeybee), Drosophila melanogaster (fruit fly), Bombyx mori (silkworm)
Mycetozoa
– Dictyosteliida: Dictyostelium discoideum (slime mold)
Echinodermata
– Echinoidea: Strongylocentrotus purpuratus
Nematoda
– Chromadorea: Caenorhabditis elegans
Platyhelminthes
– Trematoda: Schistosoma mansoni
Chlorophyta
– Chlorophyceae: Chlamydomonas reinhardtii
Apicomplexa
– Coccidia: Toxoplasma gondii

Finding UniGene Clusters
– By link
– By Entrez search
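An Entrez search can also be scripted through NCBI’s E-utilities, whose ESearch service takes a database name and an Entrez query. The UniGene Entrez database has since been retired, so this Python sketch queries the Gene database instead. The query `PRNP[sym] AND human[orgn]` and the helper `esearch_url` are illustrative assumptions, and the function only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for an Entrez query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)  # spaces -> '+', brackets percent-encoded


url = esearch_url("gene", "PRNP[sym] AND human[orgn]")
print(url)
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=PRNP%5Bsym%5D+AND+human%5Borgn%5D&retmax=20
```

Fetching this URL (for example with `urllib.request.urlopen`) returns an XML list of matching record IDs. Scripts that send many requests should follow NCBI’s published E-utilities usage guidelines.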

UniGene Cluster for PRNP

Complete Genomes (as of June 2004)
Organelles:
– Mitochondria (558)
– Plastids (40)
– Plasmids (626)
– Nucleomorphs (3)
Viruses (1923)
Archaebacteria (44)
Eubacteria (176)
Eukaryotes (61)

Simple Genomes
– Full chromosomal sequences are provided
– Genes are annotated
– The annotation can be shown graphically and linked to sequence records


[Figure: graphical annotation view showing the mutL gene]

Complex Genomes
– Sequences are provided complete, or we help assemble them
– Heavy annotation: genes, transcript regions and ORFs, sequence variations and markers, clones, ESTs, etc.
– The annotation can be shown graphically and linked to other databases using the Map Viewer

NCBI Map Viewer: Viewing Complex Genomes
Map Viewer Home Page
– Shows all supported organisms
– Provides links to genomic BLAST
Genome Overview Page
– Provides links to individual chromosomes
– Shows hits on a genome graphically
Chromosome Viewing Page
– Allows interactive views of annotation details
– Provides numerous maps unique to each genome

Map Viewer Home Page

Genome Overview Page
[Screenshot callouts: Genomic BLAST; species-specific help; search the maps]

Search for Human PRNP

Human PRNP on Genome View

Chromosome Viewing Page
[Screenshot callouts: master map with exploded content; Genes, UniGene, and Clone maps; add or remove maps; zooming controls; map summary]

Zooming in (left-click)

Map Viewer Analysis Tools
– Link to OMIM
– Link to Protein
– Evidence Viewer
– HomoloGene
– Sequence Viewer
– Download Sequence
– ModelMaker

HomoloGene

Homology Comparisons on Map Viewer

Intermission