The NIH Roadmap and PubChem Gary Wiggins I533 Spring 2006.

1 The NIH Roadmap and PubChem Gary Wiggins I533 Spring 2006

2 NIH Roadmap
Series of initiatives designed to pursue major opportunities in biomedical research and gaps in current knowledge that cannot be addressed by any single NIH Institute or Center
Goal: enable rapid transformation of new scientific knowledge into tangible benefits for public health

3 NIH Molecular Libraries and Imaging Initiative
Part of the New Pathways to Discovery area
Goal: augment the toolbox for understanding the functionally interconnected molecular events that maintain health and lead to disease
Build on high-throughput, highly specific, mechanism-based biological assays
Aims to develop and discover small molecules that hold promise as research tools to probe cellular physiology and pathophysiology

4 NIH Molecular Imaging Roadmap
High specificity/high sensitivity molecular imaging probes
Molecular imaging and contrast database
Imaging probe development center

5 NIH Roadmap Molecular Libraries Initiative (MLI)
A series of integrated research programs with the goal of making small molecule screening and screening data more widely available to the research community
s/index.asp

6 MLI Aims
Go beyond the identification of compounds with potential therapeutic properties
Will result in the identification of compounds to use as probes to study cellular processes in health and disease
Biological screening data, assay protocols, and chemical structures for compounds to be publicly available in PubChem

7 NIH MLI Components
Molecular Libraries Screening Center Network (MLSCN)
Cheminformatics (centered around PubChem)
Technology development

8 NIH MLI Technology Development Areas
Chemical diversity
–Pilot-scale libraries for investigation of novel chemical diversity space
–Novel methods for natural product chemistry
Development of assays
Novel instrumentation and detection technologies for high throughput screening
Datasets and algorithms for better prediction of absorption, distribution, metabolism, excretion, and toxicity properties of small molecules

9 Assay Guidance Manual
Originally written as a guide for therapeutic project teams within Eli Lilly; covers:
–Identifying potential assay formats compatible with High Throughput Screen (HTS) and Structure Activity Relationship (SAR)
–Developing optimal assay reagents
–Optimizing assay protocol with respect to sensitivity, dynamic range, signal intensity and stability
–Adaptation of the assay to the microtiter plate formats
–Validation of the assay performance
–Orthogonal follow-up assays for chemical probe validation and refinement

10 NIH Molecular Libraries Small Molecule Repository
Run under contract by Discovery Partners International
Collects samples for high throughput biological screening and distributes them to the NIH Molecular Libraries Screening Center Network
R_HomePage/

11 Roadmap MLI Funded Areas
Molecular Libraries Screening Centers (MLSCN)
–Ten of them at academic institutions
–NIH Chemical Genomics Center
s/fundedresearch.asp

12 Roadmap MLI Funded Areas
Submitting assays for HTS in the MLSCN
–28 different submissions
Pilot-scale libraries for HTS (8)
New methodologies for natural product chemistry (6)
Assay development for HT molecular screening (39)
Molecular libraries screening instrumentation (4)

13 Roadmap MLI Funded Areas
Novel preclinical tools for predictive ADME-Toxicology (5)
Innovation in molecular imaging probes (11)
Development of high-resolution probes for cellular imaging (9)

14 Roadmap MLI Funded Areas
Exploratory Centers for Cheminformatics Research at:
–Indiana University
–University of Michigan
–Rensselaer Polytechnic Institute
–MIT
–North Carolina State University, Raleigh
–University of North Carolina, Chapel Hill

15 IU Projects Underway
Innovative cross-screen analysis of NIH Developmental Therapeutics Program Human Tumor Cell Line data
Development of cheminformatics web services and use cases in Taverna
Development of a novel interface for the analysis of PubChem HTS data
A structure storage and searching system for Distributed Drug Discovery
Quantum chemical computer simulations database
Training modules for cheminformatics instruction on the Web
Web guide for essential cheminformatics resources
Design of a grid-based distributed data architecture for chemistry

16 NIH NCI Developmental Therapeutics Program
The NCI has been collecting and testing compounds for 50 years. For about 30 years this has been managed by the Developmental Therapeutics Program (DTP). From 1955 to 1985 the primary test looked for an increase in survival of mice bearing transplantable tumors. In 1990, the primary screen switched to looking for inhibition of growth of 60 human tumor cell lines in culture. DTP also ran the anti-HIV screen for about 10 years and managed the yeast anti-cancer screen, in which compounds were tested for their ability to inhibit the growth of yeast strains with defined mutations in cell cycle genes. These assays provide the bulk of the data DTP makes publicly available.

17 NIH NCI DTP
DTP's correlation analyses allow one to associate a list of genes with a given compound or vice versa
Want to get workflows running that integrate chemical structure data with the gene expression and sequence data in the bioinformatics world
Need help in the practical details of creating web services that will work in the myGrid/Taverna (or equivalent) framework

18 NIH DTP Data
                      Compounds    Data Points
Chemical Structures   ~265,
Cell Assay            ~43,000      ~12,000,000
Anti-HIV Assay        ~45,000      ~90,000
Yeast Assay           ~110,000     ~600,000
in vivo Antitumor     ~120,000     ~1,100,000

19 NCI Panel of 60 Human Cancer Cell Lines
Protein levels
RNA measurements
Mutation status
Enzyme activity levels

20 NIH DTP's COMPARE Program
The pattern of activity across all 60 cell lines that a compound exhibits is related to the mechanism of action
–Can be used to discover the mechanism of a compound's action by looking at which compounds of known activity are correlated with the unknown
–Has been used to discover novel compounds with a given activity by testing the compounds that correlate most highly with a compound having the activity of interest
–Used to prioritize compounds that seem to have a novel mechanism
–Calculates a correlation coefficient between two vectors in 60-dimensional space
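
The correlation step on the COMPARE slide can be sketched in a few lines of Python. This is a minimal illustration, not DTP's implementation: it assumes the Pearson coefficient (the slide does not say which coefficient is used), the function names `pearson` and `rank_by_correlation` are hypothetical, and the toy vectors cover 5 cell lines rather than 60.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return numerator / denominator

def rank_by_correlation(seed, library):
    """Return (name, r) pairs, most highly correlated with the seed first."""
    scored = [(name, pearson(seed, vector)) for name, vector in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy activity patterns (invented values, one number per cell line).
seed = [5.1, 6.3, 4.8, 7.0, 5.5]
library = {
    "compound_A": [5.0, 6.1, 4.9, 6.8, 5.6],  # pattern similar to the seed
    "compound_B": [7.0, 5.5, 6.9, 4.8, 6.2],  # pattern roughly opposite
}
ranking = rank_by_correlation(seed, library)
for name, r in ranking:
    print(name, round(r, 3))
```

Ranking a library against a seed compound in this way mirrors the uses listed on the slide: the top of the list suggests compounds that may share a mechanism with the seed.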

21 NIH DTP
Given a compound tested in the 60 cell assay, one can look for the genes whose expression most highly correlates with the ability of the compound to inhibit cell growth. Conversely, given a gene, one can look for compounds whose ability to inhibit cell growth is most highly correlated with the expression of that gene.

22 NIH DTP Needs
Grid Web services
Visualization: may use VOTables
Tools to squish a set of points in a large dimensional space down into 2D or 3D while attempting to preserve the relative distances
–Looking at the nearest neighbors of the point of interest with such a map could reveal relations that would be missed in just a table listed by distance
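
One way to "squish" high-dimensional points into 2D while trying to preserve distances is multidimensional scaling. The sketch below is only an illustration of that idea, not a tool named on the slide: it minimizes the squared difference between original and 2D distances by plain gradient descent, and the function names, step count, learning rate, and toy data are all assumptions.

```python
import math
import random

def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def stress(target, coords):
    """Sum of squared errors between target distances and 2D distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )

def embed_2d(points, steps=2000, rate=0.01, seed=0):
    """Gradient descent on the stress; returns one [x, y] pair per input point."""
    rng = random.Random(seed)
    target = pairwise_distances(points)
    n = len(points)
    coords = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
                scale = 2.0 * (d - target[i][j]) / d
                grads[i][0] += scale * (coords[i][0] - coords[j][0])
                grads[i][1] += scale * (coords[i][1] - coords[j][1])
        for i in range(n):
            coords[i][0] -= rate * grads[i][0]
            coords[i][1] -= rate * grads[i][1]
    return coords

# Toy data: four points in 6 dimensions (a real run would use 60).
points = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [3, 3, 3, 3, 3, 3],
]
target = pairwise_distances(points)
start = embed_2d(points, steps=0)   # the random starting layout
final = embed_2d(points)            # the layout after optimization
print("stress before:", stress(target, start))
print("stress after: ", stress(target, final))
```

Plotting `final` would give the kind of map the slide describes, where the neighbors of a point of interest can be inspected visually instead of being read from a table sorted by distance.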

23 NIH DTP Main Search Page

24 High-Throughput Screening (HTS)
The integration of biological, chemical and clinical data
Automated & standardized statistical analysis of large and complex data volumes
Biological and chemical profiling by use of statistical analyses on combined data from screening, pharmacological profiling, and structural properties

25 Other Potential Partners
Center for Chemical Genomics at the University of Michigan
Milos Novotny (IUB Chemistry): $3.5 million National Center for Research Resources (NIH) grant to conduct research in the analysis of glycoproteins
David Flockhart (IUB School of Medicine): Cytochrome P450 database

26 PubChem
5,298,729 compounds as of 1/16/2006
The place to go for biological and related data
The central depository of all information related to the NIH Roadmap project
Expected that the actual data will reside there, and only some things may be held elsewhere, with PubChem acting as a pointer
–May even have the images from screens and assays
Chemical structures from Elsevier's xPharm database

27 PubChem Data (as of 10/25/2005)
Bioassays deposited       177
Bioassay test results     3,158,669
Substances deposited      7,848,390
Unique Substances         5,269,228

28 PubChem Technical Details
Entrez database system
–For all textual information in the database
NCBI Toolkit - an open-source infrastructure toolkit
OpenEye OEChem toolkit and associated software
–For most structure standardization tasks, plus some structure identifier computations like SMILES and IUPAC name generation
NIST InChI library
–For computing the InChI identifier
CACTVS Chemoinformatics Toolkit
–For structure depictions, structure database system, structure query execution, structure deduplication, some property calculations and the WWW structure and image editors
Various general low-level support libraries, e.g.,
–zlib, png, gd and freetype libraries
In-house code
–For the queuing system, deposition system, display CGIs, structure standardization set-up, update scripts, etc.

29 PubChem Database Display and Query Subsystems - 1
A special Entrez version
–stores textual and numerical data
–hosted on a MS SQL Server relational database cluster
–holds precomputed structure images for display, ASN.1 structure data blobs for download, and extensive crosslinking functions for linking to other NCBI databases

30 PubChem Display and Query Subsystems - 2
Structure search component
–based on the CACTVS structure search system
–pseudo-relational in nature (the underlying storage manager is the Sleepycat BDB database manager)
–hosted on a Linux server cluster
–structure search file is not stored in the SQL database, but there is an automatic synchronization and update mechanism
–some data, such as Lipinski filter criteria, are stored in both databases
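
The Lipinski filter criteria mentioned above are usually stated as the "rule of five". As a rough sketch of what such a filter checks, assuming the commonly cited thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors, at most 10 acceptors, one violation tolerated), and with a hypothetical `Descriptors` record standing in for whatever PubChem actually stores:

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    molecular_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(d):
    """Return a description of each rule-of-five criterion the compound violates."""
    checks = {
        "molecular weight > 500": d.molecular_weight > 500,
        "logP > 5": d.logp > 5,
        "H-bond donors > 5": d.h_bond_donors > 5,
        "H-bond acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]

def passes_rule_of_five(d, max_violations=1):
    """Commonly used reading of the rule: at most one violation is tolerated."""
    return len(lipinski_violations(d)) <= max_violations

# Illustrative descriptor values, not taken from the slides.
small_drug_like = Descriptors(molecular_weight=180.2, logp=1.2,
                              h_bond_donors=1, h_bond_acceptors=4)
large_lipophilic = Descriptors(molecular_weight=812.0, logp=7.3,
                               h_bond_donors=2, h_bond_acceptors=12)

print(passes_rule_of_five(small_drug_like))
print(lipinski_violations(large_lipophilic))
```

Because the criteria depend only on a handful of precomputed numbers, it is plausible to keep them in both the text and structure databases, as the slide notes, so either side can filter without consulting the other.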

31 PubChem Programming Utilities
Entrez Programming Utilities
–_help.html
CACTVS chemoinformatics toolkit
–a full ASN.1 parser for CACTVS understands the full data spec for structures and assay data
–modules for talking to the Entrez database for accessing structure blobs and some other NCBI systems
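
As a small illustration of how the Entrez Programming Utilities are called, the sketch below only builds an ESearch request URL and does not contact the server. The base URL and the `pccompound` database name reflect the current public E-utilities service and are assumptions relative to the 2006 slide, whose own link is truncated; `esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed base address of the public E-utilities service (not from the slide).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(term, db="pccompound", retmax=20):
    """Build an ESearch URL that would return record IDs matching `term`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"

url = esearch_url("aspirin")
print(url)

# Round-trip check: the query string decodes back to the parameters we set.
params = parse_qs(urlsplit(url).query)
print(params["db"], params["term"])
```

Fetching the URL (for example with `urllib.request.urlopen`) returns an XML list of matching identifiers, which can then be passed to other utilities such as ESummary or EFetch.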

32 PubChem Data Deposition
PubChem Deposition Gateway

33 PubChem Sketcher
No need to worry about the type of structure definition displayed in the top line
Uses a hidden internal representation to transfer the information

34 InChI, The IUPAC International Chemical Identifier
Official site:
Unofficial InChI FAQ:
WSDL InChI server at sphere

35 Searching InChIs
Sample search: InChI=1/C17H14O4S/c1-22(19,20) ( ) (18)16(15) /h2-10H,11H2,1H3
Must include the quotation marks
No carriage return or line feed in the string
InChI code for C60 fullerene:
–InChI=1/C60/c (1) (1)9-11-7(2) (5) (6)22-18(8)28-20(12) (10)15(9) (11)27(17) (21)33(23) (24)32(22)42- 38(28)48-40(30)46-36(26)35(25)45-39(29)47(37)55- 49(41)51(43)57-52(44)50(42)56(48)59- 54(46)53(45)58(55)60(57)59
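
The two search requirements on this slide (quotation marks around the string, no carriage return or line feed inside it) are easy to get wrong when an InChI is pasted from wrapped text. A minimal helper, with the hypothetical name `inchi_query`, could normalize the string before it is submitted. It relies on the fact that InChI strings contain no whitespace, so any whitespace found can be treated as a wrapping artifact; the methane InChI in the example is an illustrative input, not one from the slides.

```python
def inchi_query(raw):
    """Return the InChI as a single quoted line suitable for a text search box."""
    # Drop surrounding quotes, if any, so they are not doubled below.
    text = raw.strip().strip('"')
    # InChI strings contain no whitespace, so remove every space, tab, CR and LF.
    joined = "".join(text.split())
    if not joined.startswith("InChI="):
        raise ValueError("expected a string beginning with 'InChI='")
    return f'"{joined}"'

# An InChI that was broken across two lines when copied.
wrapped = "InChI=1/CH4/\r\nh1H4"
print(inchi_query(wrapped))
```

The same helper applied to a long identifier such as the fullerene string would remove the breaks introduced by line wrapping, though it cannot restore characters that were lost.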

36 ACD Labs and InChIs
Transferring structures from PubChem to ACD/ChemSketch
es/90/draw_db/pubchem.pdf

37 InChI Support in BKChem
BKChem - a free chemical drawing program
Successfully reads most InChIs

38 InChI
PubChem sketcher also supports generation of InChI strings
–change the format selector to "InChI"

39 Protein Data Bank (PDB) Data Dictionaries
Develop software and data definitions to support the structural genomics efforts
Enable high-throughput data deposition
Data dictionaries define items at the level of detail of the materials and methods section of a journal
Uses macromolecular Crystallographic Information File (mmCIF) data dictionaries

40 Translate WSDL to Human Readable Form
