PubChem—Substance, Compound, BioAssay Part 1: Essentials Principles of May 24, 2007.

1 PubChem—Substance, Compound, BioAssay Part 1: Essentials Principles of May 24, 2007

2 What is PubChem?
– A public repository of electronic representations of small molecules and associated bioactivity assay data
– A component of the NIH Molecular Libraries Roadmap
– Part of the NCBI Entrez search and linking system
– A system of four components:
  ◦ PubChem Substance
  ◦ PubChem Compound
  ◦ PubChem BioAssay
  ◦ PubChem Structure Search

3 http://nihroadmap.nih.gov/

4 The Molecular Libraries Roadmap: An Integrated Initiative
Diagram labels: Chemical Diversity; Technology Development; Screening Instrumentation; Assay Development; Predictive ADMET; Compound Repository (MLSMR); Informatics; Cheminformatics Research Centers; Molecular Libraries Screening Centers Network (MLSCN)

5 The National Center for Biotechnology Information (Bethesda)
Created in 1988 as a part of the National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information

6 What does NCBI do?
– Accepts submissions of primary data.
– Develops tools to analyze these data.
– Uses these tools to create derivative databases based on the primary data.
– Provides free search, linking, and retrieval of data, mainly through the Entrez system.

7 http://www.ncbi.nlm.nih.gov
Tool: search type
– BLAST: Sequence
– VAST: Protein Structure
– Entrez: Text
– PubChem Structure Search: Small Molecule Structure

8 http://pubchem.ncbi.nlm.nih.gov
http://pubchem.ncbi.nlm.nih.gov/search/
Data Analysis Tools: Differential display of data via structure clustering, structure-activity heat maps and customizable result retrieval tables.
http://pubchem.ncbi.nlm.nih.gov/assay/assaycluster.cgi

9 Types of Databases
Primary Databases
– Original submissions by experimentalists
– Content controlled by the submitter
Examples: GenBank, SNP, GEO, PubChem Substance and BioAssay
Derivative Databases
– Built from primary data
– Content controlled by third party (NCBI)
Examples: RefSeq, RefSNP, GDS, PubChem Compound

10 PubChem Databases
PubChem BioAssay
– Composed of experimental data with Background, Protocols and Results for bioactivity screens of chemical substances described in PubChem Substance.
– Submitters add “Hard” links to PubChem Substance records and outside sources.
PubChem Substance
– Composed of substances which may be of known or unknown composition and may contain a discrete compound or mixtures of compounds.
– Submitters add “Hard” links to PubChem BioAssay records and outside sources.
PubChem Compound
– Composed of discrete compounds with known chemical structure.
– Summary reports about the known chemical compounds described in PubChem Substance.
– Addition of automated “Soft” links which can be replicated on PubChem Substance and BioAssay records.
Primary Databases (Substance, BioAssay): information is provided, updated and “owned” by Submitters.
Derivative Database (Compound): information is built from the primary data and controlled by NCBI.

11 How does data get into PubChem?

12 Top PubChem Depositors

Substance depositors (records deposited; 56 current depositors)
  DiscoveryGate     4608994
  ZINC              3813892
  ChemDB            3564938
  Thomson Pharma    2303628
  ChemBridge         433971
  ChemBank           413586
  ChemIDplus         383789
  Asinex             362469
  DTP/NCI            268696
  Specs              204658

BioAssay depositors (assays deposited; 22 current depositors)
  DTP/NCI                                      173
  NIH Chemical Genomics Center                  60
  Structural Genomics Consortium - Oxford       43
  Scripps Research Institute                    37
  University of Pittsburgh MLSC                 33
  Southern Research MLSC                        29
  San Diego Center for Chemical Genomics        22
  BindingDB                                     20
  Penn Center for Molecular Discovery           19
  Emory MLSC; Vanderbilt MLSC                   15

13 PC Substance Record
Callouts on the record page:
– Substance ID
– Compound ID
– Link to depositor
– Synonyms supplied by the depositor
– Identical substances

14 Redundancy in PC Substance
13 completely identical records for (-)epinephrine!
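The relationship behind this redundancy, many Substance records (SIDs) pointing at one standardized Compound record (CID), can be sketched as a simple grouping. This is only an illustration: the SIDs, CIDs and depositor names below are invented placeholders, and the function name `group_by_compound` is hypothetical.

```python
from collections import defaultdict

# (SID, depositor, CID) triples. All values are invented placeholders.
substances = [
    (101, "Depositor A", 9001),
    (102, "Depositor B", 9001),
    (103, "Depositor C", 9002),
    (104, "Depositor A", None),  # no CID, e.g. the structure failed to standardize
]


def group_by_compound(records):
    """Map each CID to the list of SIDs that resolve to it."""
    groups = defaultdict(list)
    for sid, _depositor, cid in records:
        if cid is not None:
            groups[cid].append(sid)
    return dict(groups)


print(group_by_compound(substances))  # {9001: [101, 102], 9002: [103]}
```

Substances without a CID are simply left out of the grouping, which mirrors the point made later in the deck that structures failing standardization have no record in PC Compound.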

15 Non-uniformity in PC Substance

16 The Bizarre in PC Substance
– Grapefruit extract
– Chamomile tea
– Blood hydrolysate

17 PubChem Compound
What we do: Standardize Structures
Verify Chemical Data
◦ Atom description (label, element)
◦ Functional group clean-up
◦ Atom valence verification to prevent nonsense structures
“Normalize” and “Standardize”
◦ Valence-bond canonicalization (for tautomer invariance)
◦ Aromaticity detection and self-consistency
◦ Stereochemistry detection
◦ Explicit hydrogen assignment
Structural Representations
◦ 2D coordinate generation
◦ Images created
Structures that fail to standardize…
◦ Have no records in PC Compound
◦ Cannot be searched by structure

18 Compound; Substance

19 Known stereochemistry; Unknown stereo; Unknown E/Z isomers; Compound; Substance

20 Stereoisomers in PC Compound
– No stereochemical assignment
– (+)epinephrine
– (-)epinephrine
No stereochemistry is a stereochemical assignment in PubChem!

21 PubChem Compound
What we do: Calculate Properties and Links
Nomenclature
◦ IUPAC
◦ SMILES & SMARTS
◦ InChI
Structural Information
◦ Calculate & store “Fingerprints”
◦ Calculate & link to similar structures (90% level)
Physical Properties
◦ Molecular Formula
◦ Molecular Weight
◦ Number of H-bond donor/acceptor sites
◦ XLogP value
◦ Lipinski value (bioavailability)
◦ Number of rotatable bonds
Links to NCBI Database Records
◦ Structures (MMDB records)
◦ Protein sequences (from Structure links)
◦ Genes (from Protein links)
Links to MeSH Terms through IUPAC name
(MeSH is NLM’s controlled vocabulary used for indexing articles for MEDLINE/PubMed.)
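Two of the calculated items above lend themselves to a short sketch: a rule-of-five check over the listed physical properties, and a fingerprint similarity score. The slides do not say how PubChem computes either one, so this assumes the commonly cited Lipinski thresholds and the Tanimoto coefficient. The class, the function names and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mol_weight: float
    xlogp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(p: Properties) -> int:
    """Count violations of the commonly cited rule-of-five thresholds."""
    checks = [
        p.mol_weight > 500,
        p.xlogp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)


def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = bits_a | bits_b
    if not union:
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Invented property values, not taken from any real record.
candidate = Properties(mol_weight=320.0, xlogp=2.5, h_donors=2, h_acceptors=6)
print(lipinski_violations(candidate))  # 0

# Fingerprints modelled as sets of on-bit positions (assumed representation).
print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

Under these assumptions, two structures would count as neighbors at the "90% level" when `tanimoto(...)` is at least 0.9.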

22 PC Compound Record

23 MeSH Links

24 Calculated Properties; links for downloading or viewing the full record

25 Handling Mixtures
Example: Asmatane mist
– CID for the mixture
– Each standardized component has its own CID

26 PubChem Databases (same content as slide 10)

27 PC BioAssay Record

28 BioAssay Protocol
– Description of the BioAssay methods
– Listing of the data fields provided in the BioAssay

29 PubChem integration in Entrez
Diagram labels: Protein Sequences; Literature; VAST Structure Similarity; Bioactivity Assay Results; Small Molecule Structures; 3D Structures; Term Frequency Statistics; Chemical Structure Similarity; Activity Profile Similarity

30 What is Entrez?
– System of 31 linked databases
– Text search engine
– Tool for finding biologically linked data
– Data retrieval engine
– Virtual workspace for manipulating large datasets
– Free public access

31 The Entrez Databases

32 Text Queries in Entrez
term1[limit] OP term2[limit] OP …
where
  term1, term2 = search terms
  limit = Entrez indexing field (organism, author, …)
  OP = Boolean operator = AND, OR, NOT
Complex queries: ((A[limit1] OR B[limit2]) AND C[limit3]) NOT D[limit4]
Ranges: 1:200[MW]
Phrases in quotes: "malic acid"[synonym]
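The query grammar on this slide can be sketched as a few string helpers. The names `field_term`, `value_range` and `combine` are hypothetical, introduced only for illustration. They do nothing more than assemble the text one would type into an Entrez search box.

```python
def field_term(term: str, field: str) -> str:
    """Return term[field], quoting the term if it contains a space."""
    if " " in term and not term.startswith('"'):
        term = f'"{term}"'
    return f"{term}[{field}]"


def value_range(low, high, field: str) -> str:
    """Return a range expression such as 1:200[MW]."""
    return f"{low}:{high}[{field}]"


def combine(op: str, *parts: str) -> str:
    """Join sub-queries with a Boolean operator and wrap them in parentheses."""
    if op not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {op}")
    return "(" + f" {op} ".join(parts) + ")"


query = combine(
    "AND",
    value_range(1, 200, "MW"),
    field_term("malic acid", "synonym"),
)
print(query)  # (1:200[MW] AND "malic acid"[synonym])
```

Wrapping every combination in parentheses keeps nested expressions unambiguous, at the cost of a few redundant brackets.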

33 Sample Entrez Queries
epinephrine[synonym]
  Find records that have synonyms containing “epinephrine”
epinephrine[CompleteSynonym]
  Find records that have a synonym that is exactly “epinephrine”
ca[Element] AND chemidplus[SourceName]
  Find records deposited by ChemIDplus that contain calcium
300:500[MW] AND "pcsubstance structure"[Filter]
  Find substances with molecular weights of 300-500 that are ligands in 3D protein structures
"lipinski"[Filter] AND "antineoplastic agents"[PharmAction]
  Find antineoplastic agents that obey the Lipinski rule of 5
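One way to run such a query from a script is the E-Utilities ESearch call. The sketch below only builds the request URL and sends nothing. The endpoint is NCBI's current public E-Utilities address rather than anything shown on the slides, and the choice of `pccompound` as the target database for this query is an assumption. `esearch_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Current public E-Utilities endpoint (not taken from the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL; urlencode takes care of quotes, spaces and brackets."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


term = '"lipinski"[Filter] AND "antineoplastic agents"[PharmAction]'
url = esearch_url("pccompound", term)
print(url)

# Round trip: decoding the query string gives back the original term.
print(parse_qs(urlsplit(url).query)["term"][0] == term)  # True
```

Letting `urlencode` handle the escaping avoids hand-editing the quotes and square brackets that Entrez field tags rely on.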

34 Entrez Limits

35 Details: Entrez query equivalent to your selected Limits

36 Preview/Index: CompleteSynonym, epinephrine

37 Entrez History
Entrez History keeps track of all of your searches.
– Your history is deleted only after 8 hours of inactivity.
– You can purposefully store searches for later use.
– You can “concatenate” searches with ANDs, ORs, NOTs by using the query keys:
  #2 NOT 1:1000[mw]
  #6 AND #4
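What a query key stands for can be shown with a toy model: each stored search gets a number, and `#N` in a later expression is shorthand for that search. This is a local illustration only, not how Entrez implements History. The class name `QueryHistory` and the stored queries are assumptions.

```python
import re


class QueryHistory:
    """Toy model of Entrez History: each stored search gets a numbered key."""

    def __init__(self):
        self._queries = []

    def add(self, query: str) -> int:
        """Store a query and return its 1-based key (#1, #2, ...)."""
        self._queries.append(query)
        return len(self._queries)

    def expand(self, expression: str) -> str:
        """Replace every #N in the expression with stored query N in parentheses."""
        def lookup(match):
            key = int(match.group(1))
            if not 1 <= key <= len(self._queries):
                raise KeyError(f"no query with key #{key}")
            return f"({self._queries[key - 1]})"
        return re.sub(r"#(\d+)", lookup, expression)


history = QueryHistory()
history.add("epinephrine[synonym]")   # becomes #1
history.add('"lipinski"[Filter]')     # becomes #2
print(history.expand("#2 NOT 1:1000[mw]"))
# ("lipinski"[Filter]) NOT 1:1000[mw]
```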

38 Downloading Reports

39 Downloading Bulk Data
Output is a temporary FTP file

40 Linking in Entrez
Follow links to related data: Links
Hard Links: Curated links based on biology
– nucleotide → taxonomy (based on organism identifier)
– protein → domain relatives (based on domain assignment)
– domains → pubmed (based on supporting literature)
– pcsubstance → structures/mmdb (based on source information)
Soft Links: Pre-computed analyses
– nucleotide → related sequences (BLAST neighbors)
– protein → conserved domains (CDD/RPS-BLAST search)
– pccompound → pccompound (structure-based neighboring)

41 PubChem Links

42 Linking in Bulk
Will return the corresponding Compounds for all of these Substances
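The same Substance-to-Compound lookup can be requested from a script through the E-Utilities ELink call. The sketch below only constructs the URL. The endpoint is NCBI's current public address (an assumption relative to the slides), the SIDs are invented placeholders, and `elink_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Current public E-Utilities endpoint (not taken from the slides).
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(sids, dbfrom="pcsubstance", db="pccompound"):
    """Build an ELink URL asking for the `db` records linked to the given IDs."""
    params = {
        "dbfrom": dbfrom,
        "db": db,
        "id": ",".join(str(sid) for sid in sids),
    }
    return f"{ELINK}?{urlencode(params)}"


# Placeholder SIDs, for illustration only.
print(elink_url([101, 102, 103]))
```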

43 The PubChem FTP Site

44 Programming Tools
NCBI Toolbox: In-house source code that developers can use to incorporate NCBI-like functionality into their own programs.
– Three main parts: Data Model, Data Encoding and Programming Libraries.
– Examples: BLAST, Cn3D, Sequin, data format conversion scripts
– http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi
E-Utilities: Guidelines for Entrez “URL calls” used to access data. Designed for use in scripts.
– Examples: ESearch, EPost, ESummary, EFetch and ELink
– http://www.ncbi.nih.gov/entrez/query/static/eutils_help.html
– Caution: Overuse may result in blocked IPs!
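Because overuse can get an IP blocked, scripts that call the E-Utilities usually space out their requests. The following is a minimal sketch of such a throttle. The 0.4 second interval is an assumed placeholder (NCBI's current usage policy should be checked for the real limit), and the class name `Throttle` is hypothetical. No network request is made here.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self):
        """Sleep just long enough to respect min_interval; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


throttle = Throttle(min_interval=0.4)  # assumed value, not an official limit
for db in ("pcsubstance", "pccompound", "pcassay"):
    throttle.wait()
    # A real script would issue its E-Utilities request for `db` here.
    print("ready to query", db)
```

The clock and sleep functions are injectable so the pacing logic can be checked without actually waiting.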

45 PubChem Help

46 PubChem: Bird’s Eye View
Diagram labels: Depositors; PubChem BioAssays; PubChem Compound; PubChem Substance; Chemical Structure Similarity

