Put this inside databases, and lecture 5 at the end

Presentation transcript:

National Center for Biotechnology Information (NCBI) www.ncbi.nlm.nih.gov (Note: put this inside databases, and lecture 5 at the end.) Page 24

Fig. 2.5 Page 25 www.ncbi.nlm.nih.gov

PubMed is… the National Library of Medicine's search service; 16 million citations in MEDLINE; links to participating online journals; PubMed tutorial (via “Education” on the sidebar). Page 24

Entrez integrates… the scientific literature; DNA and protein sequence databases; 3D protein structure data; population study data sets; assemblies of complete genomes Page 24

Entrez is a search and retrieval system that integrates NCBI databases Page 24

BLAST is… the Basic Local Alignment Search Tool; NCBI's sequence similarity search tool; supports analysis of DNA and protein databases; 100,000 searches per day. Page 25

OMIM is… Online Mendelian Inheritance in Man; a catalog of human genes and genetic disorders; edited by Dr. Victor McKusick and others at JHU. Page 25

Cancer Chromosomes: Contains cytogenetic, clinical, and reference information integrated from the NCI Mitelman Database of Chromosome Aberrations in Cancer, the NCI Recurrent Aberrations in Cancer database, and the NCI/NCBI SKY/M-FISH & CGH Database.

CDD: Conserved Domain Database, a collection of sequence alignments and profiles representing protein domains conserved in molecular evolution. Select 'Domains' from the Entrez pull-down menu.

CoreNucleotide: Contains all nucleotide sequences not included in the EST or GSS subsets.
3D Domains: Contains protein domains from the Entrez Structure database.
EST: A Nucleotide database subset that contains only Expressed Sequence Tag records.
Gene: Genes and associated information for human and a number of other organisms.

Genome: Genomes of over 1,200 organisms can be found in this database, representing both completely sequenced organisms and those for which sequencing is in progress.
Genome Project: A searchable collection of complete and incomplete (in-progress) large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms.
dbGaP: Associated genotype and phenotype data.
GENSAT: Gene expression atlas of the mouse central nervous system.

GEO Datasets: Curated gene expression and molecular abundance DataSets from NCBI's Gene Expression Omnibus, a gene expression and hybridization array repository.
GEO Profiles: Individual gene expression and molecular abundance profiles assembled from the GEO repository.
Source: http://www.ncbi.nlm.nih.gov/About/tools/restable_mol.html

Books is… a searchable resource of online books. Page 26

TaxBrowser is… a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses); taxonomy information such as genetic codes; molecular data on extinct organisms. Page 26

Structure site includes… the Molecular Modelling Database (MMDB); biopolymer structures obtained from the Protein Data Bank (PDB); Cn3D (a 3D-structure viewer); the Vector Alignment Search Tool (VAST). Page 26

Accessing information on molecular sequences Page 26

Accession numbers are labels for sequences. NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You may want to acquire information beginning with a query such as the name of a protein of interest, or the raw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequence or other record relevant to molecular data. Page 26

What is an accession number? An accession number is a label that is used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence. Examples (all for retinol-binding protein, RBP4):
X02775       GenBank genomic DNA sequence
NT_030059    Genomic contig
rs7079946    dbSNP (single nucleotide polymorphism)
N91759.1     An expressed sequence tag (1 of 170)
NM_006744    RefSeq DNA sequence (from a transcript)
NP_006735    RefSeq protein
AAC02945     GenBank protein
Q28369       SwissProt protein
1KT7         Protein Data Bank structure record
(Slide labels: DNA, RNA, protein.) Page 27
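Some of the identifiers above carry a version suffix after a dot (for example N91759.1). As a minimal sketch, assuming the common "accession.version" convention, the following Python splits an identifier into its base accession and an optional version number. The function name `split_accession` is hypothetical and not part of any NCBI tool.

```python
def split_accession(identifier):
    """Split 'N91759.1' into ('N91759', 1); version is None when absent."""
    cleaned = identifier.strip()
    base, dot, version = cleaned.partition(".")
    # Only treat the suffix as a version when it is purely numeric.
    if dot and version.isdigit():
        return base, int(version)
    return cleaned, None


versioned = split_accession("N91759.1")
unversioned = split_accession("X02775")
```

Keeping the base accession separate from the version makes it easier to compare records that differ only by revision.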

Four ways to access DNA and protein sequences: [1] Entrez Gene with RefSeq; [2] UniGene; [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI); [4] ExPASy Sequence Retrieval System (separate from NCBI). Note: LocusLink at NCBI was recently retired. The third printing of the book has updated these sections (pages 27-31). Page 27

Four ways to access protein and DNA sequences: [1] Entrez Gene with RefSeq. Entrez Gene is a great starting point: it collects key information on each gene/protein from major databases, and it covers all major organisms. RefSeq provides a curated, optimal accession number for each DNA (NM_006744) or protein (NP_006735). Page 27

From the NCBI home page, type “rbp4” and hit “Go” (revised Fig. 2.7)

revised Fig. 2.7 Page 29

By applying limits, there are now just two entries

Entrez Gene (top of page). Note that links to many other RBP4 database entries are available. Revised Fig. 2.8 Page 30

Entrez Gene (middle of page)

Entrez Gene (bottom of page)

Fig. 2.9 Page 32

FASTA format Fig. 2.10 Page 32
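The slide only names the FASTA format, so here is a minimal sketch of how such a file can be read, assuming the usual layout: a header line starting with ">" followed by one or more sequence lines. The function name `parse_fasta` and the two example records are hypothetical and are not real database entries.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records for illustration only.
example = """>example_1 hypothetical record
MKWVWAL
LLLAA
>example_2 another hypothetical record
ACGT
"""
records = parse_fasta(example)
```

Sequence lines are joined without separators, so a sequence wrapped over several lines comes back as one string.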

NCBI’s important RefSeq project: best representative sequences. RefSeq (accessible via the main page of NCBI) provides an expertly curated accession number that corresponds to the most stable, agreed-upon “reference” version of a sequence. RefSeq identifiers include the following formats:
Complete genome      NC_######
Complete chromosome  NC_######
Genomic contig       NT_######
mRNA (DNA format)    NM_######  e.g. NM_006744
Protein              NP_######  e.g. NP_006735
Pages 29-30

NCBI’s RefSeq project: accession for genomic, mRNA, protein sequences
Accession        Molecule  Method           Note
AC_123456        Genomic   Mixed            Alternate complete genomic
AP_123456        Protein   Mixed            Protein products; alternate
NC_123456        Genomic   Mixed            Complete genomic molecules
NG_123456        Genomic   Mixed            Incomplete genomic regions
NM_123456        mRNA      Mixed            Transcript products; mRNA
NM_123456789     mRNA      Mixed            Transcript products; 9-digit
NP_123456        Protein   Mixed            Protein products
NP_123456789     Protein   Curation         Protein products; 9-digit
NR_123456        RNA       Mixed            Non-coding transcripts
NT_123456        Genomic   Automated        Genomic assemblies
NW_123456        Genomic   Automated        Genomic assemblies
NZ_ABCD12345678  Genomic   Automated        Whole genome shotgun data
XM_123456        mRNA      Automated        Transcript products
XP_123456        Protein   Automated        Protein products
XR_123456        RNA       Automated        Transcript products
YP_123456        Protein   Auto. & Curated  Protein products
ZP_12345678      Protein   Automated        Protein products
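The prefix table above lends itself to a simple lookup. The following is a minimal sketch that maps the two-letter RefSeq prefix to the molecule type listed on the slide. The names `REFSEQ_MOLECULE` and `refseq_molecule` are hypothetical, and the mapping reflects only this slide, not the current RefSeq documentation.

```python
# Prefix -> molecule type, copied from the slide's table.
REFSEQ_MOLECULE = {
    "AC": "Genomic", "NC": "Genomic", "NG": "Genomic",
    "NT": "Genomic", "NW": "Genomic", "NZ": "Genomic",
    "NM": "mRNA", "XM": "mRNA",
    "NR": "RNA", "XR": "RNA",
    "AP": "Protein", "NP": "Protein", "XP": "Protein",
    "YP": "Protein", "ZP": "Protein",
}


def refseq_molecule(accession):
    """Return the molecule type for a RefSeq-style accession, else None."""
    prefix, underscore, _ = accession.strip().partition("_")
    if not underscore:
        return None  # no underscore, so not a RefSeq-style accession
    return REFSEQ_MOLECULE.get(prefix.upper())


rbp4_transcript = refseq_molecule("NM_006744")
rbp4_protein = refseq_molecule("NP_006735")
```

A GenBank accession such as X02775 has no underscore and therefore returns None, which is a quick way to tell the two naming schemes apart.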

Four ways to access DNA and protein sequences: [1] Entrez Gene with RefSeq; [2] UniGene; [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI); [4] ExPASy Sequence Retrieval System (separate from NCBI). Page 31

[Figure labels: DNA, RNA, protein; complementary DNA (cDNA); UniGene] Fig. 2.3 Page 23

UniGene: unique genes via ESTs
• Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many expressed sequence tags (ESTs), which are DNA sequences (typically 500 base pairs in length) corresponding to the mRNA from an expressed gene. ESTs are sequenced from a complementary DNA (cDNA) library.
• UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution.
Pages 20-21

Cluster sizes in UniGene: This is a gene with 1 EST associated; the cluster size is 1. Fig. 2.3 Page 23

Cluster sizes in UniGene: This is a gene with 10 ESTs associated; the cluster size is 10.

Cluster sizes in UniGene (human)
Cluster size (ESTs)  Number of clusters
1                    42,800
2                    6,500
3-4                  6,500
5-8                  5,400
9-16                 4,100
17-32                3,300
500-1000             2,128
2000-4000            233
8000-16,000          21
16,000-30,000        8
UniGene build 194, 8/06

UniGene: unique genes via ESTs. Conclusion: UniGene is a useful tool to look up information about expressed genes. UniGene displays information about the abundance of a transcript (expressed gene), as well as its regional distribution of expression (e.g. brain vs. liver). We will discuss UniGene further later (gene expression). Page 31

Four ways to access DNA and protein sequences: [1] Entrez Gene with RefSeq; [2] UniGene; [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI); [4] ExPASy Sequence Retrieval System (separate from NCBI). Page 31

Ensembl to access protein and DNA sequences: Try Ensembl at www.ensembl.org for a premier human genome web browser. We will encounter Ensembl as we study the human genome, BLAST, and other topics.

click human

enter RBP4

Four ways to access DNA and protein sequences: [1] Entrez Gene with RefSeq; [2] UniGene; [3] European Bioinformatics Institute (EBI) and Ensembl (separate from NCBI); [4] ExPASy Sequence Retrieval System (separate from NCBI). Page 33

ExPASy to access protein and DNA sequences: ExPASy sequence retrieval system (ExPASy = Expert Protein Analysis System). Visit http://www.expasy.ch/ Page 33

Fig. 2.11 Page 33

Example of how to access sequence data: HIV-1 pol. There are many possible approaches. Begin at the main page of NCBI, and type an Entrez query: hiv-1 pol. Page 34

Searching for HIV-1 pol: Following the “genome” link yields a manageable three results. Page 34

Example of how to access sequence data: HIV-1 pol. For the Entrez query “hiv-1 pol” there are about 40,000 nucleotide or protein records (and >100,000 records for a search for “hiv-1”), but these can easily be reduced in two easy steps:
-- specify the organism, e.g. hiv-1[organism]
-- limit the output to RefSeq
Page 34
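The two narrowing steps can also be written as a single query string. This is a minimal sketch, with two assumptions that do not come from the slides: RefSeq records are selected with the filter term `srcdb_refseq[prop]`, and the search is sent to the NCBI E-utilities `esearch.fcgi` endpoint. The function names are hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not mentioned on the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(organism, refseq_only=True):
    """Combine an organism restriction with an optional RefSeq limit."""
    terms = [f"{organism}[organism]"]
    if refseq_only:
        # Assumed filter term for restricting results to RefSeq records.
        terms.append("srcdb_refseq[prop]")
    return " AND ".join(terms)


def esearch_url(db, term):
    """Build (but do not fetch) an ESearch URL for a database and query."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


term = build_query("hiv-1")
url = esearch_url("nucleotide", term)
```

Building the query from separate terms keeps each restriction visible, so either one can be dropped to see how much it narrows the result set.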

Over 100,000 nucleotide entries for HIV-1; only 1 RefSeq

Examples of how to access sequence data: histone
Query for “histone”      Number of results
protein records          21847
RefSeq entries           7544
RefSeq (limit to human)  1108
NOT deacetylase          697
At this point, select a reasonable candidate (e.g. histone 2, H4) and follow its link to Entrez Gene. There, you can confirm you have the right gene/protein. (8-12-06)

Access to Biomedical Literature Page 35

PubMed at NCBI to find literature information

PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries. It has >14 million records dating back to 1966. Page 35

MeSH is the acronym for "Medical Subject Headings." MeSH is the list of the vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency to the indexing of biomedical literature. Page 35

PubMed search strategies
• Try the tutorial (“education” on the left sidebar)
• Use boolean queries (capitalize AND, OR, NOT), e.g. lipocalin AND disease
• Try using “limits”
• Try “Links” to find Entrez information and external resources
• Obtain articles on-line via Welch Medical Library (and download pdf files): http://www.welch.jhu.edu/
Page 35

1 AND 2: lipocalin AND disease (60 results)
1 OR 2: lipocalin OR disease (1,650,000 results)
1 NOT 2: lipocalin NOT disease (530 results)
Fig. 2.12 Page 34 (8/04)
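The three boolean operators behave like set operations on the articles matched by each term. This is a minimal sketch using Python sets. The article identifiers are made up for illustration and do not correspond to real PubMed records.

```python
# Hypothetical article IDs matched by each search term.
lipocalin = {"a1", "a2", "a3"}
disease = {"a2", "a3", "a4", "a5"}

both = lipocalin & disease        # lipocalin AND disease (intersection)
either = lipocalin | disease      # lipocalin OR disease (union)
only_first = lipocalin - disease  # lipocalin NOT disease (difference)
```

AND can never return more articles than either term alone, while OR returns at least as many as the larger term, which matches the pattern of result counts on the slide.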

Search result versus article contents (8/06)
                                      Article contents: “globin” is present  Article contents: “globin” is absent
Search result: “globin” is found      true positive                          false positive (article does not discuss globins)
Search result: “globin” is not found  false negative (article discusses globins)  true negative
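The four outcomes in this table follow from two yes/no questions: did the search return the article, and does the article actually discuss globins? This is a minimal sketch of that decision. The function name `classify` is hypothetical.

```python
def classify(found_by_search, discusses_globins):
    """Name the outcome for one article given the search result and its contents."""
    if found_by_search:
        return "true positive" if discusses_globins else "false positive"
    return "false negative" if discusses_globins else "true negative"


# Enumerate all four combinations of (found, discusses).
outcomes = {
    (found, discusses): classify(found, discusses)
    for found in (True, False)
    for discusses in (True, False)
}
```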

WelchWeb is available at http://www.welch.jhu.edu

Brian Brown (bbrown20@jhmi.edu) and Carrie Iwema (iwema@jhmi.edu) are the Welch Medical Library liaisons to the basic sciences. http://www.welch.jhu.edu