E-utilities: Short course. The Entrez Query System at NCBI.

1 E-utilities: Short course

2 The Entrez Query System at NCBI

3 Entrez Functions
– Search one or all of 31 databases.
– Generate brief “document summaries” for a list of records.
– Link from one list of records to another.
– Perform Boolean operations on lists of records.
– Format records for display and download.

4 Entrez Transactions
Each record in an Entrez database is assigned an integer called a UID, or “unique identifier”. Entrez transactions are performed on lists of UIDs. They include Boolean operations and the tracking of links within and between database records.

5 Entrez Database Queries
– Entrez supports text searches with field restrictions, Boolean operators (sometimes implicit), and term grouping.
– Field restrictions vary among the databases.
– Unfielded terms are automatically term-mapped, as the translations of “cold” and “mouse” on the next slides show.
– Explicitly fielded searches are not term-mapped.
– Quoted phrases are searched as a unit.
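The query rules above can be made concrete with a small string builder. This is a minimal sketch. The helper names `fielded`, `any_of` and `all_of` are hypothetical and do not belong to any NCBI library. The code only assembles the query text. Entrez itself still parses the query and applies term mapping to any fragment left unfielded.

```python
from typing import Optional


def fielded(term: str, field: Optional[str] = None) -> str:
    """Return one query fragment (hypothetical helper).

    Multi-word terms are quoted so Entrez searches them as a phrase.
    Adding a [field] tag restricts the search to that field, which also
    means Entrez will not term-map this fragment.
    """
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def any_of(*parts: str) -> str:
    """Join fragments with OR and group them in parentheses."""
    return "(" + " OR ".join(parts) + ")"


def all_of(*parts: str) -> str:
    """Join fragments with an explicit AND (the operator Entrez implies between bare terms)."""
    return " AND ".join(parts)


query = all_of(
    any_of(fielded("common cold", "MeSH Terms"), fielded("cold", "Text Word")),
    fielded("Mus musculus", "Organism"),
)
print(query)
# prints: ("common cold"[MeSH Terms] OR cold[Text Word]) AND "Mus musculus"[Organism]
```

The field tags in this example come from the translations shown on the following slides. Which tags are valid depends on the database.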

6 Term: cold
PubMed: "chronic obstructive pulmonary disease"[Text Word] OR "pulmonary disease, chronic obstructive"[MeSH Terms] OR ("common cold"[TIAB] NOT Medline[SB]) OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word]
PMC: "pulmonary disease, chronic obstructive"[MeSH Terms] OR "common cold"[MeSH Terms] OR "cold"[MeSH Terms] OR cold[Text Word]
Nucleotide: cold[All Fields]
Taxonomy: cold[All Names]

7 Term: mouse
PubMed: ("mice"[TIAB] NOT Medline[SB]) OR "mice"[MeSH Terms] OR mouse[Text Word]
PMC: "mice"[MeSH Terms] OR mouse[Text Word]
Nucleotide: "Mus musculus"[Organism] OR mouse[All Fields]
Taxonomy: mouse[All Names]
Genome: "Mus musculus"[Organism] OR mouse[All Fields]

10 Viewing Indexed Terms on the Web Preview-Index Tab

11 Patterns are Recognized
miller baker: miller[All Fields] AND baker[All Fields]
miller j baker m: miller j[Author] AND baker m[Author]
AF123456, P12243, 555: direct retrieval of the record
Databases: PubMed, PMC, Nucleotide, Protein, Structure and others; All Databases

12 Search History
Each database keeps its own separate search history. You can recall and combine previous searches by using a query key together with a cookie called a “WebEnv”. On the Web, this is available under the History tab.

13 DocSums
Frontend servers generate brief summaries of database records (DocSums) quickly. Full records are retrieved from backend machines.

14 E-utilities
The E-utilities are a set of eight server-side programs that share a uniform URL syntax. They translate a standard set of URL-encoded input parameters for the array of programs that make up the Entrez system.

15 Entrez Functions and EUtils
Searches: esearch.fcgi
DocSums: esummary.fcgi
Links: elink.fcgi
Uploads: epost.fcgi
Downloads: efetch.fcgi
Global Query: egquery.fcgi
Spelling: espell.fcgi
Information: einfo.fcgi

16 The Base URL
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
followed by one of: esearch.fcgi?, egquery.fcgi?, esummary.fcgi?, efetch.fcgi?, einfo.fcgi?, elink.fcgi?, epost.fcgi? (in general, eutil.fcgi?)

17 URL Parameters
BASE/esearch.fcgi?db=nucleotide&term=mouse[orgn]
Parameters are separated by & symbols: db = nucleotide, term = mouse[orgn].
We need to know the following:
1. What parameters are available
2. What values they accept
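You can assemble the BASE + script + parameters pattern with the Python standard library. This is a sketch, not an official client. The name `eutils_url` is hypothetical. The code uses the HTTPS form of the base URL, because NCBI now serves the E-utilities over HTTPS.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# The eight E-utility scripts listed on the "Entrez Functions and EUtils" slide.
SCRIPTS = {"esearch", "esummary", "elink", "epost", "efetch", "egquery", "espell", "einfo"}


def eutils_url(script: str, **params) -> str:
    """Build BASE/<script>.fcgi?<params> (hypothetical helper).

    urlencode joins the parameters with '&' and percent-encodes each value,
    so the brackets in a field tag such as [orgn] become %5B and %5D.
    """
    if script not in SCRIPTS:
        raise ValueError(f"unknown E-utility: {script!r}")
    return f"{BASE}{script}.fcgi?{urlencode(params)}"


url = eutils_url("esearch", db="nucleotide", term="mouse[orgn]")
print(url)
# prints: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=mouse%5Borgn%5D
```

Percent-encoding the values means that spaces, quotes and brackets in an Entrez query cannot break the URL.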

18 A Docsum via esummary.fcgi and via the Web
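An esummary.fcgi response is XML made of `<DocSum>` elements. Each holds an `<Id>` and a series of `<Item Name="..." Type="...">` fields. The sketch below turns those fields into a dictionary for each UID. The sample input was written by hand for illustration and is not real NCBI output. The function name `parse_docsums` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_docsums(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Map each UID in an ESummary response to its {Item Name: text} fields."""
    summaries: Dict[str, Dict[str, str]] = {}
    for docsum in ET.fromstring(xml_text).iter("DocSum"):
        uid = docsum.findtext("Id", default="")
        # Only direct <Item> children are read; a List-type Item keeps just its
        # own (usually empty) text, and its nested Items are not examined.
        summaries[uid] = {
            item.get("Name", ""): (item.text or "") for item in docsum.findall("Item")
        }
    return summaries


# Hand-written sample; the Caption and Title values are placeholders.
SAMPLE = """<eSummaryResult>
<DocSum>
<Id>49619226</Id>
<Item Name="Caption" Type="String">PLACEHOLDER1</Item>
<Item Name="Title" Type="String">Placeholder title one</Item>
</DocSum>
</eSummaryResult>"""

print(parse_docsums(SAMPLE)["49619226"]["Title"])
```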

19 A Simple Eutilities Pipeline

22 An ESearch Followed by Multiple Rounds of EFetch
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=gene&term=mammalia[orgn]
Elapsed time: 0 seconds
0%, 0 records of 161815 retrieved.
Tue Jan 25 20:46:32 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=0&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 40 seconds
0.3%, 500 records of 161815 retrieved.
Tue Jan 25 20:47:09 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 79 seconds
0.61%, 1000 records of 161815 retrieved.
Tue Jan 25 20:47:48 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 118 seconds
0.92%, 1500 records of 161815 retrieved.
Tue Jan 25 20:48:27 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=1500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 158 seconds
1.23%, 2000 records of 161815 retrieved.
Tue Jan 25 20:49:07 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2000&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
Elapsed time: 204 seconds
1.54%, 2500 records of 161815 retrieved.
Tue Jan 25 20:49:53 EST 2005
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?tool=datahog&db=gene&retmax=500&retstart=2500&rettype=native&WebEnv=0ImHxGDH2zI93UMwxrTRTh-nretlso4ZBZO_FNE7AFZXOOxNd9Rz@Qfb2dYIOFhUAABd9ei4&query_key=1&retmode=xml
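The log above follows a simple loop. It runs one ESearch with usehistory=y, then issues EFetch calls that advance retstart by retmax while reusing the same WebEnv and query_key. The sketch below only generates those URLs and does not send them. The function name `efetch_batches` and the placeholder WebEnv value are assumptions. A real client would also pause between requests, because NCBI's usage guidelines limit request rates.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_batches(db, webenv, query_key, count, batch_size=500, **extra):
    """Yield one EFetch URL per batch of a result set stored on the History Server.

    Each URL asks for records retstart .. retstart + batch_size - 1 of the
    stored set, so no UID list has to travel with the request.
    """
    for retstart in range(0, count, batch_size):
        params = {
            "db": db,
            "WebEnv": webenv,
            "query_key": query_key,
            "retstart": retstart,
            "retmax": batch_size,
            **extra,  # e.g. retmode, rettype, tool, email
        }
        yield f"{BASE}efetch.fcgi?{urlencode(params)}"


# "WEBENV_FROM_ESEARCH" stands in for the WebEnv returned by the ESearch call.
urls = list(efetch_batches("gene", "WEBENV_FROM_ESEARCH", 1, 161815, retmode="xml"))
print(len(urls))  # 324 batches of up to 500 records
```

With 161815 records and batches of 500, the last request starts at retstart=161500.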

23 A Download of 161825 Mammalian Entrez Gene Records
(Chart: time in seconds for each successive EFetch call)

24 EFetch
Retrieves formatted data records matching a set of UIDs.
Input: db, the Entrez database to retrieve from; id, a set of UIDs
Output: formatted data records (the format varies)
Example: BASE/efetch.fcgi?db=nucleotide&id=49619226,49615287
Why use it? To download data records.

25 Databases that Support EFetch
Literature: PubMed, Journals, PubMed Central, OMIM
Sequences: Nucleotide, Protein, Genome, Popset, SNP
Other: Gene, Taxonomy, PC Substance, PC Compound
Unique queuing interface!

26 EFetch Formatting Parameters
rettype: determines the type of data record returned (flat file, FASTA, EST, accession, etc.)
retmode: determines the format (mode) of the data record returned (text, HTML, XML)
Be warned!
– These settings are very dependent on the database.
– These settings interact with one another.
– Not all possible combinations are supported.
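Below is a sketch of an EFetch URL builder, under the assumption that the caller already knows a valid rettype/retmode pair for the target database. The name `efetch_url` is hypothetical. The helper does not validate the pair, because valid combinations depend on the database. The example uses rettype=fasta with retmode=text for nucleotide, which is a combination that works.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, ids: Iterable[int],
               rettype: Optional[str] = None, retmode: Optional[str] = None) -> str:
    """Build an EFetch URL for a comma-separated UID list (hypothetical helper)."""
    params = {"db": db, "id": ",".join(str(uid) for uid in ids)}
    # Send only the formatting parameters that were asked for; the server
    # applies its own per-database defaults to anything left out.
    if rettype is not None:
        params["rettype"] = rettype
    if retmode is not None:
        params["retmode"] = retmode
    # safe="," keeps the UID separator readable instead of encoding it as %2C.
    return f"{BASE}efetch.fcgi?{urlencode(params, safe=',')}"


print(efetch_url("nucleotide", [49619226, 49615287], rettype="fasta", retmode="text"))
# prints: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=49619226,49615287&rettype=fasta&retmode=text
```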

27 The Entrez History Server
The Entrez History Server stores UID lists produced by ESearch, EPost and ELink. It identifies the location of each stored UID set with two parameters:
– WebEnv: a string specifying a cookie assigned by the History Server
– query_key: an integer equivalent to the History number on the Web

28 EPost
Stores a list of UIDs on the History Server.
Input: db, the Entrez database containing the UIDs; id, the list of UIDs; WebEnv (optional), a pre-existing WebEnv to use
Output: XML containing a WebEnv and a query_key
Example: BASE/epost.fcgi?db=nucleotide&id=49619226,49615287
Why use it? To upload a large file or set of UIDs.
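A large UID set is usually uploaded in the body of an HTTP POST rather than in the URL, which avoids URL-length limits. The sketch below builds that request with `urllib.request` but does not send it. The name `epost_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def epost_request(db, uids, webenv=None):
    """Build (but do not send) a POST request that uploads UIDs to EPost."""
    params = {"db": db, "id": ",".join(str(uid) for uid in uids)}
    if webenv is not None:
        # Reuse an existing History Server session instead of starting a new one.
        params["WebEnv"] = webenv
    body = urlencode(params, safe=",").encode("ascii")
    return Request(f"{BASE}epost.fcgi", data=body, method="POST")


req = epost_request("nucleotide", [49619226, 49615287])
print(req.get_method(), req.full_url)
```

To actually send the request, call `urllib.request.urlopen(req)` and read the XML reply, which contains the WebEnv and query_key for the uploaded set.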

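29 Using ESearch to Post Results
BASE/esearch.fcgi?db=nucleotide&term=mouse[orgn]&usehistory=y
With usehistory=y, ESearch stores its results on the History Server and returns a WebEnv and a query_key.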
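When usehistory=y is set, the ESearch XML reply has top-level `<Count>`, `<QueryKey>` and `<WebEnv>` elements. The sketch below extracts them so later calls can refer to the stored set. The `HistoryHandle` type and the `parse_esearch` function are hypothetical names. The sample reply was written by hand and is heavily abbreviated.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class HistoryHandle(NamedTuple):
    count: int       # total number of UIDs matched by the search
    query_key: str   # History number of the stored set
    webenv: str      # History Server cookie


def parse_esearch(xml_text: str) -> HistoryHandle:
    """Pull the History Server handle out of an ESearch reply."""
    root = ET.fromstring(xml_text)
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    if webenv is None or query_key is None:
        raise ValueError("no History data in reply; was usehistory=y set?")
    return HistoryHandle(int(root.findtext("Count", default="0")), query_key, webenv)


# Hand-written, abbreviated sample; real replies carry more elements.
SAMPLE = """<eSearchResult>
<Count>42</Count><RetMax>20</RetMax><RetStart>0</RetStart>
<QueryKey>1</QueryKey><WebEnv>EXAMPLE_WEBENV</WebEnv>
<IdList><Id>1</Id></IdList>
</eSearchResult>"""

print(parse_esearch(SAMPLE))
```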

30 Accessing the History
Writing to the Entrez History Server: EPost; ESearch with usehistory=y; ELink with cmd=neighbor_history.
Reading from it, using WebEnv and query_key: ESearch, ESummary, ELink, EFetch.

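31 The Big Picture
ESearch turns an Entrez query into a UID list, and EPost uploads a UID list. ESummary, EFetch and ELink take a UID list as input. Instead of passing a UID list directly, you can store it on the Entrez History Server (ESearch with usehistory=y, ELink with cmd=neighbor_history) and refer to it by WebEnv and query_key.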
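The slide's pipeline can be sketched as ESearch with usehistory=y, followed by ESummary on the stored set. No UID list is passed between the two calls. To keep the sketch testable offline, it takes a `fetch` callable that turns a URL into the response body. That callable and the name `search_then_summarize` are assumptions, not part of the E-utilities.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

Fetch = Callable[[str], str]  # takes a URL, returns the response body as text


def search_then_summarize(fetch: Fetch, db: str, term: str) -> List[str]:
    """Run ESearch with usehistory=y, then ESummary on the stored result set.

    Returns the UIDs of the DocSums in the ESummary reply.
    """
    search_url = f"{BASE}esearch.fcgi?" + urlencode({"db": db, "term": term, "usehistory": "y"})
    root = ET.fromstring(fetch(search_url))
    history = {"WebEnv": root.findtext("WebEnv"), "query_key": root.findtext("QueryKey")}
    if history["WebEnv"] is None or history["query_key"] is None:
        raise ValueError("ESearch reply has no WebEnv/QueryKey")
    # ESummary reads the stored set from the History Server; no id= list is sent.
    summary_url = f"{BASE}esummary.fcgi?" + urlencode({"db": db, **history})
    return [d.findtext("Id") for d in ET.fromstring(fetch(summary_url)).iter("DocSum")]
```

For real use, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode()`. Passing the callable in also leaves room to add throttling or retries without changing the pipeline.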

32 ELink
Retrieves UIDs in database B that are linked to a set of UIDs in database A.
Input: db, the Entrez database(s) to link to (this can be a list!); dbfrom, the Entrez database to link from; id, a list of UIDs; cmd, the ELink command mode (default = neighbor)
Output: XML containing the set(s) of linked UIDs
Example: BASE/elink.fcgi?dbfrom=nucleotide&db=protein&id=49619226
Why use it? To find related data in another database, or to find neighbors within a database.
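In an ELink XML reply, each input UID set becomes a `<LinkSet>`. Inside it, every target link appears as a `<LinkSetDb>` with a `<LinkName>` and a list of `<Link><Id>` elements. The sketch below collects the linked UIDs by link name. The name `parse_elink` is hypothetical, and the protein UIDs 111 and 222 in the hand-written sample are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def parse_elink(xml_text: str) -> List[Tuple[List[str], Dict[str, List[str]]]]:
    """Return (source UIDs, {link name: linked UIDs}) for each LinkSet."""
    results = []
    for linkset in ET.fromstring(xml_text).iter("LinkSet"):
        source = [e.text for e in linkset.findall("IdList/Id")]
        links = {
            lsdb.findtext("LinkName"): [e.text for e in lsdb.findall("Link/Id")]
            for lsdb in linkset.findall("LinkSetDb")
        }
        results.append((source, links))
    return results


# Hand-written sample; 111 and 222 are placeholder protein UIDs.
SAMPLE = """<eLinkResult><LinkSet>
<DbFrom>nucleotide</DbFrom>
<IdList><Id>49619226</Id></IdList>
<LinkSetDb><DbTo>protein</DbTo><LinkName>nucleotide_protein</LinkName>
<Link><Id>111</Id></Link><Link><Id>222</Id></Link>
</LinkSetDb>
</LinkSet></eLinkResult>"""

print(parse_elink(SAMPLE))
```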

33 Computational Neighbors in ELink
Retrieves UIDs linked to other UIDs in the same database (db = dbfrom).
Example: BASE/elink.fcgi?dbfrom=protein&db=protein&id=15718680
term: an Entrez query that ELink uses to limit the set of neighbors
Supported databases: pubmed, cdd, nucleotide, geo, protein, gds, domains

34 Link Names
EInfo gives all possible link names for a database. The ELink XML output gives the link names for a given call.
gene_protein: links from gene to protein
protein_gene: links from protein to gene
gene_snp: links from gene to SNP
gene_snp_genegenotype: links from gene to SNPs that have genotype data
genome_nucleotide_comp_mrna: links from a chromosome to all mRNAs transcribed by genes on that chromosome

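35 Passing One UID Set to ELink
(Diagram: gene UIDs G1, G2, G3 linked to protein UIDs P1 through P6)
dbfrom=gene&db=protein&id=G1,G2,G3
A comma-separated list is treated as one set, so ELink returns a single set of protein UIDs linked to G1, G2 and G3 together.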

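36 Passing Multiple UID Sets to ELink
(Diagram: gene UIDs G1, G2, G3 linked to protein UIDs P1 through P6)
dbfrom=gene&db=protein&id=G1&id=G2&id=G3
Each separate id parameter is treated as its own set, so ELink returns one set of linked protein UIDs per gene.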

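37 Passing Multiple UID Sets to ELink
(Diagram: gene UIDs G1, G2, G3 linked to protein UIDs P1 through P6)
dbfrom=gene&db=protein&id=G1,G2&id=G3
Here G1 and G2 form one set and G3 forms another, so ELink returns two sets of linked protein UIDs.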
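The three id layouts on these slides differ only in how the UIDs are grouped. In the sketch below, each inner list becomes one comma-joined id= parameter. The name `elink_query` is hypothetical. G1, G2 and G3 are the slides' placeholders, and real gene UIDs would be integers.

```python
from typing import Sequence
from urllib.parse import urlencode


def elink_query(dbfrom: str, db: str, id_sets: Sequence[Sequence[str]]) -> str:
    """Encode ELink parameters with one id= parameter per UID set.

    UIDs inside a set are comma-joined and linked as a group; separate
    sets become repeated id= parameters and are linked one set at a time.
    """
    pairs = [("dbfrom", dbfrom), ("db", db)]
    pairs += [("id", ",".join(uid_set)) for uid_set in id_sets]
    return urlencode(pairs, safe=",")


print(elink_query("gene", "protein", [["G1", "G2", "G3"]]))      # one set
print(elink_query("gene", "protein", [["G1"], ["G2"], ["G3"]]))  # three sets
print(elink_query("gene", "protein", [["G1", "G2"], ["G3"]]))    # two sets
```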

38 Finally, Now for Your UID!
Please include both of these parameters in your URLs, in case there are problems:
– tool: a unique name for your software package
– email: your email address, so we can contact you…
Example: &tool=mr.gene&email=funwithgenes@big.genomics.com
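One way to make sure every request carries tool and email is to add them in a single place before encoding. The sketch below does that. The helper name `with_identity` is hypothetical, and the tool and email values are the ones from the slide's example.

```python
from urllib.parse import urlencode


def with_identity(params: dict, tool: str, email: str) -> dict:
    """Return a copy of params with the tool and email parameters added."""
    if not tool or not email:
        raise ValueError("both tool and email should be set")
    return {**params, "tool": tool, "email": email}


query = urlencode(with_identity({"db": "gene", "term": "mammalia[orgn]"},
                                tool="mr.gene", email="funwithgenes@big.genomics.com"))
print(query)
# prints: db=gene&term=mammalia%5Borgn%5D&tool=mr.gene&email=funwithgenes%40big.genomics.com
```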

39 Accessing Entrez links
– Hard links between databases
– Computational links within a database
– Filtering according to the existence of links

40 Entrez Links for GI 4680720
– Microarray datasets for M17755
– Gene annotation based on M17755
– DNA/RNA sequences similar to M17755
– Human phenotypes involving TPO
– Protein translation of M17755
– Literature abstracts about M17755
– Sequence polymorphisms in M17755
– Source organism of M17755
– STS markers in the TPO gene
– TPO links beyond NCBI
– Full text online articles about M17755
– All polymorphisms in the TPO gene
– Graphical view of TPO gene annotation
