PubChem—Substance, Compound, BioAssay Part 3: Essentials.

Presentation on theme: "PubChem—Substance, Compound, BioAssay Part 3: Essentials."— Presentation transcript:

1 PubChem—Substance, Compound, BioAssay Part 3: Essentials

2 PubChem—Substance, Compound, BioAssay Global Entrez Search Page All[Filter]

3 PubChem—Substance, Compound, BioAssay Overall Goal: An on-line resource providing comprehensive information on the biological activities of small molecules

4 PubChem—Substance, Compound, BioAssay Why Are Small Molecules Important?
- Constituents of all macromolecules (DNA, RNA, protein, carbohydrates, etc.)
- Serve as cofactors and signaling molecules for thousands of proteins
- The chemistry part of “biochemistry”
- Most drug entities and drug types are small molecules
- Most biomarkers used in clinical chemistry are small molecules

5 PubChem—Substance, Compound, BioAssay PubChem Databases and Tools: http://pubchem.ncbi.nlm.nih.gov/

6 PubChem—Substance, Compound, BioAssay The Molecular Libraries Roadmap: An Integrated Initiative [Diagram labels: Chemical Diversity; Technology Development; Screening Instrumentation; Assay Development; Predictive ADMET; Compound Repository (MLSMR); Informatics; Cheminformatics Research Centers; Molecular Libraries Screening Centers Network (MLSCN)]

7 PubChem—Substance, Compound, BioAssay PubChem =
- Repository for small molecules and bioactivity assay data
- Part of the Entrez search and linking system
- Links to other NCBI databases, e.g., PubMed, MeSH, protein structures (MMDB), protein/nucleotide sequences (GenPept/GenBank)
- Contains complete chemical structures
- Standardized for uniformity
- Small set of computed properties
- Structure similarity searching

8 PubChem—Substance, Compound, BioAssay Other Depositors to PubChem and more…

9 PubChem—Substance, Compound, BioAssay PubChem: Bird’s Eye View [Diagram labels: Depositors; PubChem BioAssays; PubChem Compound; PubChem Substance; Chemical Structure Similarity]

10 PubChem—Substance, Compound, BioAssay How does data get into PubChem?

11 PubChem—Substance, Compound, BioAssay PubChem integration in Entrez [Diagram labels: Protein Sequences; Literature; VAST Structure Similarity; Bioactivity Assay Results; Small Molecule Structures; 3D Structures; Term Frequency Statistics; Chemical Structure Similarity; Activity Profile Similarity]

13 Primary Database

14 PubChem—Substance, Compound, BioAssay Depositor Data
- No “global” rules or standards
  - Based on organizational needs
  - Lots of data overlap
  - Often based on individual scientists' preferences
- PubChem accepts data from many organizations
  - Previously unseen data representations
  - Combinatorial explosion of ways to draw the same structure

15 PubChem—Substance, Compound, BioAssay Redundancy, mixtures Mixture

16 PubChem—Substance, Compound, BioAssay Derivative Database

17 PubChem—Substance, Compound, BioAssay Chemical Structures may be represented in many different ways

18 PubChem—Substance, Compound, BioAssay Chemical Structures may be represented in many different ways

19 PubChem—Substance, Compound, BioAssay Compound Substance

20 PubChem—Substance, Compound, BioAssay Known stereochemistry Unknown stereo Unknown E/Z isomers Compound Substance

21 PubChem—Substance, Compound, BioAssay Substances (heterogens) from Protein 3D structures (PDB)
- PDB lacks chemical detail
  - no bond order information
  - no hydrogens
- Deposited structure receives
  - bond information
  - hydrogens
  - stereochemistry (where possible)
- Result: most molecules come out right, even complex ones (examples shown: dopamine, vancomycin)
- Sometimes there is a need to fix problems, e.g. bond orders (example shown: heme bond orders need fixing)

22 PubChem—Substance, Compound, BioAssay PubChem Compound Processing
- Chemical Data Verification
  - Atom description (label, element?)
  - Functional group clean-up
  - Atom valence verification to prevent nonsense
- “Normalize” and “Standardize”
  - Valence-bond canonicalization (for tautomer invariance)
  - Aromaticity detection and self-consistency
  - Stereochemistry detection
  - Explicit hydrogen assignment
- Calculation
  - 2-D coordinate generation
  - Image depictions
  - Fingerprints
  - IUPAC name
  - SMILES, InChI, hash codes
  - XLogP, TPSA, HBD, HBA, MW, MF

23 PubChem—Substance, Compound, BioAssay Chemical Structure “Sanitization”
- Chemical structures that fail sanitization
  - Are not part of the aggregated PubChem Compound database
  - Are still “searchable” via the PubChem Substance database
- Keeps the PubChem Compound database “clean” for cheminformatic analysis
- Collapses structures represented in various ways into a uniform, identical representation
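The relationship between the two databases described on this slide can be illustrated with a toy sketch. It is not PubChem's actual pipeline: the function name `build_compounds`, the idea of a precomputed "standard key" per deposited structure, and all identifiers and keys in the sample data are hypothetical and made up for illustration. A key of `None` stands for a structure that failed sanitization.

```python
from collections import defaultdict


def build_compounds(substances):
    """Group deposited substances by their standardized structure key.

    substances: iterable of (sid, standard_key) pairs, where standard_key is
    None when the structure failed sanitization (hypothetical convention).
    Returns (compounds, substance_only): a dict mapping each key to the list
    of SIDs that collapse onto it, and the list of SIDs left out of Compound.
    """
    compounds = defaultdict(list)
    substance_only = []
    for sid, key in substances:
        if key is None:
            # Failed sanitization: stays searchable as a Substance only.
            substance_only.append(sid)
        else:
            # Different depositor drawings with the same key collapse together.
            compounds[key].append(sid)
    return dict(compounds), substance_only


# Made-up sample data: three depositors, two drawings of the same structure.
deposits = [
    (101, "key-A"),
    (102, "key-A"),
    (103, None),
    (104, "key-B"),
]
compounds, substance_only = build_compounds(deposits)
print(compounds)
print(substance_only)
```

In this sketch every deposit remains a Substance record, while only the sanitized ones contribute to the aggregated Compound view.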

24 PubChem—Substance, Compound, BioAssay Compound for mixture Component compounds

25 PubChem—Substance, Compound, BioAssay Components of a mixture

26 PubChem—Substance, Compound, BioAssay Substance vs. Compound Substance summary Compound summary

27 PubChem—Substance, Compound, BioAssay Substance vs. Compound

28 PubChem—Substance, Compound, BioAssay Examples of queries
- "InChI=1/Ca.3H2O/h;3*1H2/q+2;;;/p-3/fCa.3HO/h;3*1h/qm;3*-1"[InChI]
- 200[MW]
- 300:500[MW]
- "dopamine"[CompleteSynonym]
- "pcsubstance structure"[Filter]
- "ca"[Element] AND 300:500[MW] AND "chemidplus"[SourceName]
- "lipinski"[Filter] AND "antineoplastic agents"[PharmAction]
Lipinski rule of 5 -- a molecule is more likely to be orally active as a drug if it has:
- not more than 5 hydrogen bond donors (OH and NH groups)
- not more than 10 hydrogen bond acceptors (N or O atoms)
- a molecular weight under 500
- a LogP not greater than 5
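The rule of 5 on this slide can be expressed as a short check. This is a minimal sketch, assuming the commonly cited thresholds (at most 5 donors, at most 10 acceptors, molecular weight under 500, LogP at most 5). The class `Properties`, the function names, and the example values are hypothetical and not taken from PubChem.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Computed properties for one compound (values supplied by the caller)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # in daltons
    log_p: float


def lipinski_violations(p):
    """Return the names of the rule-of-5 criteria that p fails."""
    checks = [
        ("h_bond_donors", p.h_bond_donors <= 5),
        ("h_bond_acceptors", p.h_bond_acceptors <= 10),
        ("molecular_weight", p.molecular_weight < 500),
        ("log_p", p.log_p <= 5),
    ]
    return [name for name, ok in checks if not ok]


def passes_lipinski(p):
    """True when no criterion is violated (strict reading of the rule)."""
    return not lipinski_violations(p)


# Made-up property values for illustration only.
small = Properties(h_bond_donors=2, h_bond_acceptors=4,
                   molecular_weight=350.0, log_p=2.5)
large = Properties(h_bond_donors=6, h_bond_acceptors=12,
                   molecular_weight=650.0, log_p=6.1)
print(passes_lipinski(small), lipinski_violations(large))
```

Returning the list of failed criteria, rather than only a boolean, makes it easy to apply a looser reading of the rule (for example, tolerating one violation) if that is preferred.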

29 PubChem—Substance, Compound, BioAssay Examples of PubChem Index Fields …
- All [ALL] -- All of the following fields are searched; default search field.
- Uid [UID] -- The integer represents the SID for the PCSubstance database. By default, an integer without a field alias is recognized as a UID. Same as [SID].
- Filter [Filter] -- Limits the records to various indexed filters.
- ActiveAid [AA] -- Active BioAssay identifier, integer.
- ActiveAidCount [AC, ACNT] -- Number of bioassays where tested active.
- AtomChiralCount [ACC, ACCNT] -- Total count of chiral atoms in a given compound.
- BioAssayID [BAID, AID] -- BioAssay identifier.
- BondChiralCount [BCC, BCCNT] -- Number of chiral bonds.
- Comment [CMT] -- Substance or bioassay comment.
- CompleteSynonym [CSYN, CSYNO] -- Exactly matching name for substance/compound.
- CompoundID [CID] -- Compound identifier, integer.
- DepositDate [DDAT, DEPDAT] -- Deposition timestamp for a substance.
- Element [ELMT, EL] -- Chemical element in a substance/compound.
- ExactMass [EMAS, EXMASS] -- The calculated mass of an ion or molecule containing the most likely isotopic composition for a single random molecule, corresponding to the mass of the most intense ion/molecule peak in a mass spectrum. A real number.
- HeavyAtomCount [HAC, HACNT] -- Atom count in a compound, excluding hydrogen, integer.
- HydrogenBondAcceptorCount [HBAC, HBACNT] -- Hydrogen bond acceptors for a compound, integer.
- HydrogenBondDonorCount [HBDC, HBDCNT] -- Hydrogen bond donors for a compound, integer.
- InChI [inchi] -- IUPAC International Chemical Identifier.
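Field-tagged terms like the ones listed here can be assembled programmatically. The sketch below builds one of the example queries from the deck and wraps it in an NCBI E-utilities ESearch URL using the `db` and `term` parameters. The helper names are hypothetical, and the code only constructs the URL; it does not send a request or assume anything about the response.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value, field):
    """Quote a text value and attach an Entrez field tag, e.g. "ca"[Element]."""
    return f'"{value}"[{field}]'


def range_term(low, high, field):
    """Build a numeric range term, e.g. 300:500[MW]."""
    return f"{low}:{high}[{field}]"


def and_query(*terms):
    """Join terms with the Entrez boolean operator AND."""
    return " AND ".join(terms)


def esearch_url(db, term):
    """Return an ESearch URL with the query percent-encoded."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


query = and_query(
    field_term("ca", "Element"),
    range_term(300, 500, "MW"),
    field_term("chemidplus", "SourceName"),
)
url = esearch_url("pcsubstance", query)
print(query)
print(url)
```

The database name `pcsubstance` targets PubChem Substance; `pccompound` and `pcassay` would target the other two PubChem databases in Entrez.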

30 PubChem—Substance, Compound, BioAssay Examples of PubChem Index Fields, contd.
- IUPACName [UPAC, IUPAC] -- Standard IUPAC name for compound.
- MeSHDescription [MHD]
- MeSHTerm [MSHT, MESHT] -- Medical Subject Heading term.
- MeSHTreeNode [MSHN, MESHTN] -- Medical Subject Heading tree node (tree structures).
- MolecularWeight [MW, MWT, MOLWT] -- Mass of a molecule calculated using the average mass of each element weighted for its natural isotopic abundance. E.g., carbon has two natural isotopes, 12 and 13, with relative abundances of 98.9% and 1.1%, yielding an average mass of 12.011 g/mol. A real number.
- MonoisotopicMass [MMAS, MIMASS] -- Mass of a molecule calculated using the mass of the most abundant isotope of each element. E.g., carbon has a monoisotopic mass of 12.000 g/mol. A real number.
- PharmAction [PHMA, PHARMA] -- MeSH pharmacological actions heading.
- RotatableBondCount [RBC, RBCNT] -- Number of rotatable bonds.
- SourceCategory [SRCC, SRCCAT, SRCCATG] -- Depositor categories.
- SourceID [SRID, SRCID] -- Depositor's external ID.
- SourceName [SRC, SRCNAM, SRCNAME] -- Official depositor name.
- SubstanceID [SID] -- Substance ID. Same as [UID].
- Synonym [SYNO] -- Synonyms for substance.
- TautomerCount [TC, TCNT, TTMC] -- Possible tautomer count for each given structure, ≤ 200.
- TotalFormalCharge [TFC, CHG, CHRG] -- Total formal charge.
- TPSA [TPSA] -- Topological Polar Surface Area.
- XLogP [XLGP, LOGP]
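The difference between the MolecularWeight and MonoisotopicMass fields can be checked with the carbon example from this slide. The sketch below is illustrative only: the abundances (98.9% and 1.1%) are the ones quoted on the slide, the carbon-13 mass of 13.003355 is the standard tabulated value and is not from the slide, and the function names are hypothetical.

```python
# (isotope mass, relative abundance) pairs for carbon.
# Abundances are the slide's figures; the carbon-13 mass is a standard value.
CARBON_ISOTOPES = [
    (12.000000, 0.989),  # carbon-12
    (13.003355, 0.011),  # carbon-13
]


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses (used for MolecularWeight)."""
    return sum(mass * abundance for mass, abundance in isotopes)


def monoisotopic_mass(isotopes):
    """Mass of the single most abundant isotope (used for MonoisotopicMass)."""
    mass, _abundance = max(isotopes, key=lambda pair: pair[1])
    return mass


print(round(average_mass(CARBON_ISOTOPES), 3))  # 12.011
print(monoisotopic_mass(CARBON_ISOTOPES))       # 12.0
```

For a whole molecule, the same two calculations would be summed over every atom in the molecular formula.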

31 PubChem—Substance, Compound, BioAssay Preview/Index Tab

32 PubChem—Substance, Compound, BioAssay History Tab Substances of MW 300-500Da having antineoplastic properties and obeying Lipinski rule of 5

33 PubChem—Substance, Compound, BioAssay Links For the whole set or only selected records

34 PubChem—Substance, Compound, BioAssay Property Report

35 PubChem—Substance, Compound, BioAssay SDF format

39 Medical Subject Headings (MeSH)
- MeSH is the National Library of Medicine's controlled vocabulary thesaurus.
- Consists of sets of terms naming descriptors in a hierarchical and alphabetic structure, e.g.: “Mental Disorders”, “Pharmacological action”, “Catecholamine hormones”, etc.
- Permits searching at various levels of specificity
- The MeSH thesaurus is used for indexing articles for the MEDLINE/PubMed database
- MeSH is continually updated
- PubChem assigns MeSH headings to Compound records

40 PubChem—Substance, Compound, BioAssay Primary Database
- Contains bioactivity screens of chemical substances described in PubChem Substance
- Provides searchable descriptions of each bioassay, including descriptions of the conditions and readouts specific to a screening protocol
- Depositor decides on data definitions and interpretation
- Data can be plotted as graphs of statistical histograms
- Cross-indexed to other Entrez databases

47 Click to view structure

49 NCBI FTP >> PubChem Folder

50 PubChem—Substance, Compound, BioAssay Entrez PubChem: Help and Tabs

51 PubChem—Substance, Compound, BioAssay Brief Summary
- PubChem is part of the NIH Molecular Libraries Roadmap for Medicine Initiative
- PubChem consists of 3 databases (Substance, Compound and BioAssay) and a powerful Structure Search engine
- Substance = samples; Compound = calculated structures, properties
- PubChem is integrated into NCBI’s Entrez Search and Linking system of databases
- Records are indexed using a number of terms
- Records are linked to each other and to other databases at NCBI

52 PubChem—Substance, Compound, BioAssay For More Information…

53 PubChem—Substance, Compound, BioAssay For More Information…
- E-mail addresses
  - General Help: info@ncbi.nlm.nih.gov
  - BLAST: blast-help@ncbi.nlm.nih.gov
- Telephone
  - Voice: +1 (301) 496-2475
  - Fax: +1 (301) 480-9241
- The (free!) NCBI Newsletter: http://www.ncbi.nih.gov/About/newsletter.html
- The NCBI Handbook: follow the link from the NCBI Home Page
- The NCBI Education Page: http://www.ncbi.nih.gov/Education/index.html

