1 Bridging Bioinformatics and Chem(o)informatics
Gary Wiggins, School of Informatics, Indiana University
Yan He (SLIS MLS Student)
Meredith Saba (SLIS MLS Student)

2 Provocative Thought
"While much bioscience is published with the knowledge that machines will be expected to understand at least part of it, almost all chemistry is published purely for humans to read."
-- Murray-Rust et al. Org. Biomol. Chem. 2004, 2, 3201.

3 Overview of the Talk
- Review of ACS CINF 2004 Papers
- Review of Relevant Articles
- Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University

4 The Bigger Picture: Linking Bioinformatics to Cheminformatics
- American Chemical Society Division of Chemical Information (CINF) Symposium, Anaheim, Spring 2004
- All-day session with 16 papers
- 7nm/227cinfabstracts.htm

5 Problems from ACS CINF 2004
- Both technical and people factors hinder knowledge exchange between biology and chemistry. (Lipinski)
- People problems per Chris Lipinski:
  - Metadata capture is complicated by people issues, particularly those between chemists and biologists.
  - Discipline-based disconnects occur distressingly often and are frequently overlooked as a cause of lost productivity.

6 Interdisciplinary Collaborations: Biology and Chemistry
"[What's]... important for these collaborations is, not only do you have to accept the other guy's paradigm or at least live with it; you have to be willing to accept the other guy's foibles or your perception of the other guy's foibles (and recognize the opposite of this). We each have our own approaches to how we do science, and it's just different cultures."
-- Thom Kauffman, interview in ACS LiveWire, March 2005

7 Some Questions from the ACS CINF 2004 Symposium
- "Find all proteins related to protein A (i.e., within a given path length of A) in a protein interaction graph, and retrieve related assay results and compound structures."
- "Find all pathways where compound X inhibits or slows a reaction, and retrieve Gene Ontology classifications for all proteins involved in the reaction."
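
The first question above amounts to a bounded breadth-first search over the interaction graph, followed by a lookup of assay data for each protein found. The sketch below is only an illustration of that idea: the graph, the assay table, and the names `proteins_within`, `interactions`, and `assays` are all hypothetical and do not come from the symposium papers.

```python
from collections import deque


def proteins_within(graph, start, max_hops):
    """Return {protein: hops} for proteins reachable from start in at most max_hops edges."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not expand beyond the allowed path length
        for neighbour in graph.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    del distances[start]  # the query protein is not "related to itself"
    return distances


# Hypothetical toy interaction graph (undirected, stored as adjacency sets).
interactions = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

# Hypothetical assay table: protein -> compounds tested against it.
assays = {"B": ["compound-1"], "D": ["compound-2", "compound-3"], "E": ["compound-4"]}

related = proteins_within(interactions, "A", max_hops=2)
hits = {protein: assays.get(protein, []) for protein in sorted(related)}
```

In a real setting the graph and assay table would live in separate biological and chemical databases, which is exactly the integration problem the symposium papers describe.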

8 Problems from ACS CINF 2004
- Commercial vs. public data
- Batch mode data processing possible in biology, but primitive in chemistry
- Primary HTS data has a very high noise factor
- Data format standardization problem
- Chemoinformatics and bioinformatics use completely different data formats and analysis tools
- Chemical and protein sequence information has been largely analyzed separately

9 Solutions from ACS CINF 2004
- Linking biological and chemical information in computational approaches to predict biological activity, ADME profiles, and adverse drug reactions (ADR)
- Energetics of binding for more accurate and sensitive chemical representation of DNA-protein interactions
- A discovery informatics platform that facilitates archival, sharing, integration, and exploration of synthetic methods and biological activity data

10 Solutions from ACS CINF 2004
- Data pipelining approach makes it possible to apply bioinformatics and chemoinformatics data and analyses together.
- Visualizations are the best way for people to understand data.

11 Solutions from ACS CINF 2004
- Cabinet (Chemical And Biological Information NETwork, formerly Fedora) servers include:
  - Metabolic pathway network chart (Empath)
  - Protein-Ligand Association Network (Planet)
  - Enzyme Commission Codebook (EC Book)
  - Traditional Chinese Medicines (TCM)
  - World Drug Index (WDI), and others
- Built on the Daylight HTTP toolkit

12 Overview of the Talk
- Review of ACS CINF 2004 Papers
- Review of Relevant Articles
- Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University

13 What is Chemoinformatics? (Brown)
- "…the essence of chemoinformatics is integration and focus rather than its components, which are independent disciplines."
- Supporting disciplines:
  - Chemical information
  - Computational chemistry
  - Chemometrics

14 Chemoinformatics and Disease

15 Toolkits as Integrators (Brown)
- Companies such as Daylight, Advanced Visual Systems, OpenEye, and SciTegic provide integration systems for:
  - Statistical methods
  - Text mining
  - Computational chemistry
  - Visualization

16 GeneGo's MetaDrug Product
- Toxicogenomics platform for the prediction of human drug metabolism and toxicity of novel compounds
- Enables the visualization of pre-clinical and clinical high-throughput data in the context of the complete biological system
- Integrates chemical, biological, and protein function data

17 BioWisdom
- Examination of vast amounts of available information using its Sofia KnowledgeScan methodology
- SRS data integration platform

18 Lessons from Hip Hop (Salamone)
- Mashup technique
  - Bring together disparate informatics, biological, chemical, and imaging information when conducting research
- Example of an integration tool: iSpecies.org
  - A search for a species returns a page with NCBI genomics information, Yahoo images of the species, and articles culled from Google Scholar

19 iSpecies.org Search
- For Mus musculus

20 Chemogenomics and Chemoproteomics (Gagna)
- Chemogenomics (def.): The description of all potential drugs that can be used against all possible target sites, OR the actions of target-specific chemical ligands and how they are used to globally examine genes
- Chemoproteomics (def.): Uses chemistry to characterize protein structure and functions
- "They are... a form of chemical biology brought up to date in the area of genome and proteome analysis."

21 New Interdisciplinary Journals
- ACS Chemical Biology (ACS)
- ChemBioChem: A European Journal of Chemical Biology (Wiley/VCH)
- Chemical Biology and Drug Design (Blackwell)
- JBIC: Journal of Biological Inorganic Chemistry (Springer)
- Journal of Biochemical and Molecular Toxicology (Wiley)
- Molecular BioSystems (RSC)
- Nature Chemical Biology (Nature Publishing)
- Organic & Biomolecular Chemistry (RSC)

22 Open Source Software (Geldenhuys)
- Log P calculator from Interactive Analysis
- University of Utah's Computational Science and Engineering Online
  - Can submit jobs for molecular mechanics, quantum chemical calculations, and biomolecular interfaces for viewing PDB files
- Virtual Computational Chemistry Laboratory

23 The Blue Obelisk (Guha)
- Several open chemistry and chemoinformatics projects that have pooled forces to enhance interoperability
- Maintain:
  - Chemoinformatics Algorithms Dictionary
  - Data Repository for standardized data for chemical properties and other facts (e.g., mass)

24 BlueObelisk.org
- Working collaboratively on projects such as:
  - Chemistry Development Kit (CDK)
  - JChemPaint
  - Jmol
  - JUMBO
  - NMRShiftDB
  - Octet
  - Open Babel
  - QSAR
  - World Wide Molecular Matrix (WWMM)

25 Barriers to the Use of Open Source Software
- Unix command line
- Problem: Lack of known standards and datasets of compounds for validation, e.g., in docking programs

26 Lessons from the Human Genome Project (Austin)
- Keys to success in the HGP were:
  - Comprehensiveness
  - Commitment to open access to the sequence as a research tool without encumbrance
- Proposed tools for a genome functionation toolbox:
  - Whole-genome transcriptome and proteome characterization
  - Development of small inhibitory RNAs (siRNAs) and knockout mice for every gene
  - Small molecules and the druggable genome

27 ChemDB

28 ChEBI, Chemical Entities of Biological Interest
- Dictionary of molecular entities focused on small chemical compounds
- Features an ontological classification, showing the relationships between molecular entities or classes of entities and their parents and/or children

29 Vioxx Entry in ChEBI

30 The IUPAC International Chemical Identifier (InChI)
- Open source, non-proprietary, public-domain identifier for chemicals
- String of characters that uniquely represents a molecular substance
- Independent of the way the chemical structure is drawn
- Enables reliable structure recognition and easy linking of diverse data compilations
- Accepts as input MOLfiles (or SDfiles) and CML files
- Download the program to your computer at:
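
Because an InChI is a plain character string, it can serve directly as a join key between otherwise unrelated data compilations. The sketch below illustrates this with the standard InChI for ethanol; the helper names `split_inchi` and `link_by_inchi` and the two toy data sets are assumptions made for illustration, and generating an InChI from a structure still requires the InChI software itself.

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("expected a string starting with 'InChI='")
    version, *rest = inchi[len(prefix):].split("/")
    formula = None
    # The formula layer, when present, is the only layer that does not
    # begin with a lowercase one-letter prefix.
    if rest and rest[0][:1].isupper():
        formula = rest.pop(0)
    layers = [(layer[0], layer[1:]) for layer in rest if layer]
    return {"version": version, "formula": formula, "layers": layers}


def link_by_inchi(*compilations):
    """Merge records from several {inchi: record} mappings on the InChI key."""
    merged = {}
    for compilation in compilations:
        for inchi, record in compilation.items():
            merged.setdefault(inchi, {}).update(record)
    return merged


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# Two hypothetical data compilations that share nothing but the identifier.
assay_data = {ETHANOL: {"assay": "toy-assay-1"}}
property_data = {ETHANOL: {"name": "ethanol"}}

parsed = split_inchi(ETHANOL)
linked = link_by_inchi(assay_data, property_data)
```

Matching on the exact string works only if every source generated its identifiers with compatible InChI versions and options.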

31 Generation of InChI for Vioxx with wInChI

32 Vioxx Entry in PubChem Compounds Found with InChI

33 Vioxx Bioassay Data in PubChem

34 Vioxx PubChem Link to External Sources of Information

35 The Elsevier MDL/NIH Link via PubChem and DiscoveryGate
- Cross-indexes PubChem to the Compound Index hosted on Elsevier MDL's DiscoveryGate platform
- MDL added 5 million structures from PubChem to their index, resulting in over 14 million unique chemical structures
- Links go both ways
- Can move from biological data in PubChem to bioactivity, chemical sourcing, synthetic methodology, and EHS data in DiscoveryGate sources

36 Elsevier MDL's xPharm
- Comprehensive set of records linking:
  - Agents (compounds) (2300)
  - Targets (600)
  - Disorders (450)
  - Principles that govern their interactions (180)
- Answers questions such as:
  - What targets are associated with control of blood pressure?
  - What adverse effects are associated with monoamine oxidase inhibitors?

37 Text Datamining (Banville)
- "In the pharmaceutical field, it is ideally the marriage of biological and chemical information that needs to be the ultimate focus of text data mining applications."
- Problems:
  - Lack of universal publication standards for identifying each unique chemical entity
  - Selective indexing policies of A&I services
  - Need to understand how chemical structures link to biological processes

38 Chemical Datamining Software
- SureChem
- CLiDE
  - Recognizes structures, reactions, and text
- OSCAR
  - OSCAR1 to check experimental data
  - http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
- CSR (Chemical Structure Reconstruction)
- MDL DocSearch: combines MDL's Isentris platform and EMC's Documentum

39 Overview of the Talk
- Review of ACS CINF 2004 Papers
- Review of Relevant Articles
- Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University

40 Themes from SwissProt's 20th Anniversary Conference, "In silico Analysis of Proteins"
- Knowledgebases, databases and other information resources for proteins
- Sequence searches and alignments
- Protein sequence analysis
- Protein structure prediction, analysis and visualization
- Proteomics data analysis

41 Chemoinformatics Databases (Jónsdóttir)
- Lists databases relevant to drug discovery and development, including:
  - General databases
  - DBs for screening compounds
  - DBs for medicinal agents
  - DBs with ADMET properties
  - DBs with physico-chemical properties
- Curiously does not mention Chemical Abstracts

42 Databases with Protein and Ligand Information (Jónsdóttir)
- Protein Data Bank
- Target Registration Database
- Relibase: uses structural info to analyze protein-ligand interactions; Relibase+ for protein-protein interaction searching
- Cambridge Structural Database
- KEGG LIGAND DB for enzyme reactions

43 Other Databases with Protein and Ligand Information
- SitesBase: a database of known ligand binding sites within the PDB
- Binding MOAD
- sc-PDB (Kellenberger)
  - strasbg.fr:8080/scPDB/index.jsp

44 sc-PDB

45 Isatin Search on sc-PDB

46 Other Databases with Protein-Protein Interaction Data (Jónsdóttir)
- YPD, Yeast Proteome Database (for proteins from S. cerevisiae)
- Human Protein Reference Database
- BIND, Biomolecular Interaction Network Database (ceased as of 11/16/2005?)

47 International Molecular Exchange (IMEx) Consortium
- BIND (http://www.blueprint.org): The Blueprint Initiative Asia Pte. Ltd, Singapore, and The Blueprint Initiative North America, Toronto, Canada
- DIP (http://dip.doe-mbi.ucla.edu): UCLA-DOE Institute for Genomics & Proteomics
- IntAct (http://www.ebi.ac.uk/intact): EMBL-European Bioinformatics Institute, Hinxton, UK
- MINT (http://mint.bio.uniroma2.it/mint/): University of Rome Tor Vergata, Rome, Italy
- MPact (http://mips.gsf.de/genre/proj/mpact): MIPS / Institute for Bioinformatics, Munich, Germany

48 Protein Sites from IU I533 Students and others
- Ligand Depot: integrated source for small molecules
- PSIPRED Protein Structure Prediction Server
- DSSP: a database of secondary structure assignments (and much more) for all protein entries in the PDB
- Dr. Predrag Radivojac's I690 class on Structural Bioinformatics
  - springi690.htm

49 Protein Secondary Structure Prediction
- Methods:
  - Neural Network
  - Rule Based
  - Other Machine Learning
  - Homology Based

50 Protein Secondary Structure Prediction Software
- PredictProtein
- Chou-Fasman
- NN Predict

51 Structure-Based Docking Methods
- Method:
  - Scans many small molecules and docks them to a site of interest on a protein structure
  - Predicts free energy of binding
  - Filters thousands of compounds relatively quickly
  - Top hits can be used for more rigorous computational/experimental characterization and optimization

52 Structure-Based Docking Methods
- DOCK
- Accelrys's Insight (built on DOCK)
- FlexX
- Glide
  - hp?mID=6&sID=6
- GOLD
  - /gold/

53 Useful Structure Databases
- ModBase
  - new/search_form.cgi
- Dali Database (fold classification; based on PDB)
- Protein Structure Analysis, Comparison, &/or Classification [Guide]

54 SCOP, Structural Classification of Proteins
- Curated database of structural and evolutionary relationships
- All known protein folds (v. 1.69, July 2005)
  - 70,859 domains organized into 2,845 families, 1,539 superfamilies, and 945 folds
- Detailed information about close relatives
- Links to coordinates, images of structures, interactive viewers, and literature references

55 SCOP Search Options
- Homology search yields a list of structures with significant levels of sequence similarity
- Keyword search matches words in SCOP and PDB

56 CATH Protein Structure Classification
- Like SCOP, structured hierarchically by:
  - Class (determined by secondary structure)
  - Architecture (overall shape, e.g., barrel, sandwich, roll, etc.); no equivalent in SCOP
  - Topology (grouped into fold families based on overall shape and connectivity of secondary structures)
  - Homologous Superfamily (domains thought to share a common ancestor)
- As of January 2005, had 43,229 domains classified into 1,467 superfamilies and 5,107 sequence families; a protein family database (CATH-PFDB) contained a total of 616,470 domain sequences classified into 23,876 sequence families
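
The four-level hierarchy above is what a dotted CATH code encodes: one number each for Class, Architecture, Topology, and Homologous superfamily. The sketch below parses such codes and reports the deepest level two domains share. The names `CathCode`, `parse_cath_code`, and `shared_level` are hypothetical, and the example codes are used only to show the format.

```python
from typing import NamedTuple, Optional


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


LEVEL_NAMES = ("class", "architecture", "topology", "homologous superfamily")


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted four-part code such as '3.40.50.720'."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(part) for part in parts))


def shared_level(a: CathCode, b: CathCode) -> Optional[str]:
    """Return the deepest hierarchy level at which two codes still agree."""
    deepest = None
    for name, x, y in zip(LEVEL_NAMES, a, b):
        if x != y:
            break
        deepest = name
    return deepest


first = parse_cath_code("3.40.50.720")
second = parse_cath_code("3.40.50.300")
third = parse_cath_code("1.10.8.10")
```

Here `shared_level(first, second)` reports agreement down to the topology level, while `shared_level(first, third)` returns `None` because the two codes already differ at the class level.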

57 CATH Search Options
- Can browse or search the classification by CATH code
- CATH codes can be used to search other databases, e.g., DHS, Gene3D, and Impala

58 Gasteiger's Biochemical Pathways Database
- Database of biochemical pathways that represents chemical structures and reactions on the atomic level
- Gives access to each atom and bond of the substrates of enzyme reactions
- Allows the study of transition state hypotheses of enzyme reactions
- Analysis of the physicochemical effects operating at the reaction site allows a classification of enzyme reactions that goes beyond the traditional EC code for enzymes.
- 1533 biochemical molecules and 2175 reactions
- erlangen.de/services/biopath/index.html

59 A Gene Expression Database for NCI60 (Scherf)
- Published in Nature Genetics, 2000
- First study to integrate gene expression with molecular pharmacology databases
- Gene expression profiles for NCI60 assessed using microarray technology
- Gene-drug relationships investigated by how the gene transcription levels vary with respect to drug activities
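
One simple way to quantify how transcription levels vary with drug activity is to correlate each gene's expression profile with each drug's activity profile across the same cell lines. The sketch below assumes Pearson correlation as the measure and uses invented toy numbers; the gene names, drug name, and values are hypothetical and are not data from the NCI60 study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy profiles over four hypothetical cell lines (same order in every list).
expression = {"gene1": [1.0, 2.0, 3.0, 4.0], "gene2": [4.0, 3.0, 2.0, 1.0]}
activity = {"drugA": [2.0, 4.0, 6.0, 8.0]}

# Gene-by-drug correlation matrix stored as {(gene, drug): r}.
correlation = {
    (gene, drug): pearson(profile, response)
    for gene, profile in expression.items()
    for drug, response in activity.items()
}
```

In this toy matrix `gene1` tracks `drugA` activity perfectly and `gene2` runs opposite to it, giving the two extremes of the scale.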

60 Correlation Matrix Between Drug Activity and Gene Expression

61 Other Relevant Databases/Servers
- Each year Nucleic Acids Research publishes a Database Issue in January and a Web Server Issue in July (see refs in Bibliography section). Examples from the most recent issues:
  - Databases: KEGG, PDB, PINT, MutDB, GLIDA, DrugBank
  - Servers: BASys, BRIDGEP, SCRATCH, Glyprot, I2I-SiteEng, PatchDock, SPACE, SymmDock, DeNovoID

62 Overview of the Talk
- Review of ACS CINF 2004 Papers
- Review of Relevant Articles
- Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University

63 Web Services Overview
- What are Web Services?
- A distributed invocation system built on Grid computing
  - Independent of platform and programming language
  - Built on existing Web standards
- A service-oriented architecture with:
  - Interfaces based on Internet protocols
  - Messages in XML (except for binary data attachments)
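
To make "messages in XML" concrete, the sketch below builds and then reads back a SOAP 1.1-style request envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:chem`, the operation `findSimilar`, and its parameters are hypothetical and do not describe any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem"  # hypothetical service namespace


def build_request(operation, params):
    """Serialize an operation call as an XML envelope string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text):
    """Recover the operation name and parameters from an envelope string."""
    envelope = ET.fromstring(xml_text)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]
    operation = call.tag.split("}", 1)[1]  # strip the '{namespace}' prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("findSimilar", {"smiles": "CCO", "threshold": 0.8})
operation, params = read_request(message)
```

Because the message is plain text over an Internet protocol, the sender and receiver can be written in different languages on different platforms, which is the independence the slide refers to. All values arrive as strings, so the receiver must convert types itself.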

64 Service-Oriented Architecture
From Curcin et al. DDT, 2005, 10(12), 867

65 Web Services for Chemistry: Problems
- Performance and scalability
- Proprietary data
- Competition from high-performance desktop applications
-- Geoff Hutchison, "it's a puzzle" blog
ALSO:
- Lack of a substantial body of trustworthy Open Access databases
- Non-standard chemical data formats (over 40 in regular use and requiring normalization to one another)

66 Overview of the Talk
- Review of ACS CINF 2004 Papers
- Review of Relevant Articles
- Public Chemistry Databases and Data Repositories with Bioinformatics Info/Links
- Overview of Web Services
- NIH-funded Projects Underway or Planned at Indiana University

67 Indiana University
- Planned Projects:
  - Design of a Grid-based distributed data architecture
  - Development of tools for HTS data analysis and virtual screening
  - Database for quantum mechanical simulation data
  - Chemical prototype projects
    - Novel routes to enzymatic reaction mechanisms
    - Mechanism-based drug design
    - Data-inquiry-based development of new methods in natural product synthesis

68 Web Services for Chemistry at IU
- Interaction Layer
  - Purpose: Interactive software for creative access and exploitation of information by humans
  - Technologies: Microsoft .NET Smart Clients, portlets, Java applets, and browser clients, visualization technologies
- Aggregation Layer
  - Purpose: Workflows and data schemas customized for particular domains, applications and users
  - Technologies: BPEL, Taverna and other workflow modeling tools, aggregate web services
- Web Service Layer
  - Purpose: Comprehensive data and computation provision including storage, calculation, semantics and meta-data exposed as web services
  - Technologies: Apache web services, SOAP wrappers, WSDL, UDDI, XML, Microsoft .NET

69 NCI Developmental Therapeutics Program (DTP)
Downloadable data:
In vitro 60 cell line results
In vitro anti-HIV results
Yeast assay
200,000+ chemical structures
Molecular targets
Microarray data
Or search the database at:

70 IU Database of NIH DTP Data
Contains over 200,000 chemical structures tested in 60 cellular assays from different human tumor cell lines
Also includes microarray assay profiles for the untreated cell lines (~14,000 datapoints)
A local PostgreSQL database containing the data that is exposed as a web service
Using workflows and complex SQL queries, we can do advanced data mining that exploits the chemical, biological and genomic information for particular audiences (chemists, biologists, etc.)
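
The kind of SQL query this slide alludes to might look like the sketch below, which joins compounds, cell lines, and assay results to rank compounds by mean activity within one tissue type. The three-table schema, the column names, and the toy rows are all assumptions made for illustration; the actual IU schema is not described in the slides. The slide mentions PostgreSQL, while this sketch uses Python's built-in `sqlite3` so that it runs without a database server.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical DTP-like schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE compounds (compound_id INTEGER PRIMARY KEY, smiles TEXT);
        CREATE TABLE cell_lines (cell_line_id INTEGER PRIMARY KEY, name TEXT, tissue TEXT);
        CREATE TABLE assay_results (compound_id INTEGER, cell_line_id INTEGER, activity REAL);
        """
    )
    # Toy rows, invented for the example (not real DTP data).
    conn.executemany("INSERT INTO compounds VALUES (?, ?)",
                     [(1, "CCO"), (2, "c1ccccc1")])
    conn.executemany("INSERT INTO cell_lines VALUES (?, ?, ?)",
                     [(10, "line_A", "lung"), (11, "line_B", "lung"), (12, "line_C", "colon")])
    conn.executemany("INSERT INTO assay_results VALUES (?, ?, ?)",
                     [(1, 10, -6.0), (1, 11, -7.0), (1, 12, -4.5),
                      (2, 10, -4.0), (2, 11, -4.2), (2, 12, -4.1)])
    return conn


def mean_activity_by_tissue(conn: sqlite3.Connection, tissue: str) -> list:
    """Rank compounds by mean activity over all cell lines of one tissue."""
    cursor = conn.execute(
        """
        SELECT c.compound_id, c.smiles, AVG(r.activity) AS mean_activity
        FROM assay_results r
        JOIN compounds c ON c.compound_id = r.compound_id
        JOIN cell_lines l ON l.cell_line_id = r.cell_line_id
        WHERE l.tissue = ?
        GROUP BY c.compound_id, c.smiles
        ORDER BY mean_activity ASC
        """,
        (tissue,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in mean_activity_by_tissue(build_demo_db(), "lung"):
        print(row)
```

Sorting in ascending order assumes that a lower activity value means a more potent compound, as with log-scale inhibition concentrations; flip the order for measures where higher is better.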

71 Mining the NIH DTP database
~200,000 compounds
60 cell lines
~14,000 gene expression values
Cell lines can be clustered based on gene expression similarity
Compounds can be clustered based on similarity of profile across cell lines, or by chemical structure fingerprint similarity
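
Clustering by fingerprint similarity can be sketched as follows. Fingerprints are represented here as sets of "on" bit positions, similarity is the Tanimoto coefficient, and clusters are formed by linking any two compounds whose similarity meets a threshold (single-linkage grouping). Both the fingerprint representation and the clustering rule are assumptions chosen for brevity; the slides do not say which fingerprints or clustering method were used at IU.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention for two empty fingerprints
    return len(a & b) / len(a | b)


def cluster_by_similarity(fingerprints: dict, threshold: float) -> list:
    """Group names whose fingerprints are linked by similarity >= threshold."""
    names = sorted(fingerprints)
    parent = {name: name for name in names}  # union-find forest

    def find(name):
        # Walk up to the root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if tanimoto(fingerprints[x], fingerprints[y]) >= threshold:
                parent[find(y)] = find(x)  # merge the two groups

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy fingerprints, invented for the example.
    toy = {
        "cmpd_a": frozenset({1, 2, 3, 4}),
        "cmpd_b": frozenset({1, 2, 3, 5}),
        "cmpd_c": frozenset({10, 11}),
    }
    print(cluster_by_similarity(toy, threshold=0.5))
```

The same grouping function could be reused for activity profiles across cell lines by swapping `tanimoto` for a correlation-based similarity; the all-pairs loop is quadratic, so a collection of ~200,000 compounds would need a more scalable method.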

72 Use of Taverna at IU
A protein implicated in tumor growth is supplied to the docking program (in this case HSP90 taken from the PDB 1Y4 complex).
A 2D structure is supplied for input into the similarity search (in this case, the extracted bound ligand from the PDB 1Y4 complex).
The workflow employs our local NIH DTP database service to search 200,000 compounds tested in human tumor cellular assays for structures similar to the ligand.
Client portlets are used to browse these structures.
Similar structures are filtered for druggability and are automatically passed to the OpenEye FRED docking program for docking into the target protein.
Once docking is complete, the user visualizes the high-scoring docked structures in a portlet using the Jmol applet.
Correlation of docking results and biological fingerprints across the human tumor cell lines can help identify potential mechanisms of action of DTP compounds.
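
The data flow on this slide (similarity search, druggability filter, docking, then ranking for visualization) can be sketched as a plain function pipeline. This is not Taverna code: the three callables are hypothetical stand-ins for the DTP database service, the druggability filter, and the FRED docking service, and the toy stubs in the demo exist only to make the sketch runnable.

```python
from typing import Callable, Iterable, List, Tuple


def run_screening_workflow(
    query_ligand: str,
    target_protein: str,
    similarity_search: Callable[[str], Iterable[str]],
    is_druglike: Callable[[str], bool],
    dock: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Chain the workflow steps and return the top-scoring docked candidates."""
    # Step 1: find database compounds similar to the bound ligand.
    candidates = similarity_search(query_ligand)
    # Step 2: keep only candidates that pass the druggability filter.
    filtered = [c for c in candidates if is_druglike(c)]
    # Step 3: dock each remaining candidate into the target protein.
    scored = [(c, dock(target_protein, c)) for c in filtered]
    # Step 4: rank for visualization. Assumes a higher score is better;
    # reverse the sort for scoring functions where lower is better.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy stubs, invented for the example; real steps would call web services.
    hits = run_screening_workflow(
        query_ligand="ligand",
        target_protein="protein",
        similarity_search=lambda q: ["a", "bb", "ccc", "dddd"],
        is_druglike=lambda c: len(c) < 4,
        dock=lambda protein, c: float(len(c)),
        top_n=2,
    )
    print(hits)
```

Passing the steps in as callables mirrors how a workflow tool wires independent services together: each step can be replaced without touching the others.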

73 Taverna Workflow
Visual depiction of workflow
Workflow definition
Available web services (WSDL)

74 Taverna in Action

75 Overall Workflow

76 Pre-Closing Quote
There is not going to be a voila moment at the computer terminal. Instead, there is systematic use of wide-ranging computational tools to facilitate and enhance the drug discovery process.
Jorgensen. Science, March 19, 2004, 303, 1814.

77 Closing quote
The future of chemistry depends on the automated analysis of chemical knowledge, combining disparate data sources in a single resource, such as the World-Wide Molecular Matrix, which can be analysed using computational techniques to assess and build on these data.
Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

78 Post-closing quote: zzzzzCAS
In an industry first, Chemical Abstracts Service (CAS) has unveiled a revolutionary new literature searching tool which will permit scientists to search and retrieve the world's chemical literature, including patents and obscure technical reports, in their sleep.
--Author unknown

79 Acknowledgements
Randy Arnold
Xiao Dong
Sean Mooney
Peter Murray-Rust
David J. Wild
I533 Chemical Informatics Seminar Students
Elsevier Science

80 Bibliography: Articles, Books, and Conference Papers
The Bigger Picture: Linking Bioinformatics to Cheminformatics [CINF Symposium] Abstracts [1-16], 227th ACS National Meeting, Anaheim, CA, March 28-April 1, 2004.
Austin, C.P. The completed human genome: implications for chemical biology. Current Opinion in Chemical Biology 2003, 7,
Bajorath, Jürgen, ed. Chemoinformatics: concepts, methods, and tools for drug discovery. Totowa, N.J.: Humana Press, c2004. (Methods in molecular biology; v. 275)
Banville, Debra L. Mining chemical structural information from the drug literature. Drug Discovery Today January 2006, 11(1/2),
Brown F. Editorial opinion: chemoinformatics - a ten year update. Current Opinion in Drug Discovery and Development 2005 May; 8(3):

81 Bibliography: Articles (contd)
Coles, Simon J.; Day, Nick E.; Murray-Rust, Peter; Rzepa, Henry S.; Zhang, Yong. Enhancement of the chemical semantic web through InChIfication. Organic & Biomolecular Chemistry 2005, 3,
Curcin, Vera; Ghanem, Moustafa; Guo, Yike. "Web services in the life sciences." Drug Discovery Today 2005, 10(12),
Gagna CE, Winokur D, Clark Lambert W. Cell biology, chemogenomics and chemoproteomics. Cell Biol Int. 2004; 28(11):
Geldenhuys, W.J.; Gaasch, K.E.; Watson, M.; Allen, D.D.; Van Der Schyf, C.J. Optimizing the use of open-source software applications in drug discovery. Drug Discovery Today February 2006, 11(3/4),
Guha, R.; Howard, M.T.; Hutchison, G.R.; Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner, J.; Willighagen, E.L. The Blue Obelisk: Interoperability in chemical informatics. Journal of Chemical Information and Modeling 2006 Web Release Date: 22-Feb-2006; DOI: /ci050400b

82 Bibliography: Articles (contd)
Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S. Prediction methods and databases within chemoinformatics: emphasis on drugs and drug candidates. Bioinformatics 2005 May 15; 21(10):
Jorgensen, William L. The many roles of computation in drug discovery. Science March 19, 2004, 303,
Kauffman, Thom. Profile. [interview] LiveWire, March 2005, 7.3;
Murray-Rust, Peter S.; Mitchell, John B.O.; Rzepa, Henry S. Communication and re-use of chemical information in bioscience. BMC Bioinformatics 2005, 6, 180.
Murray-Rust, Peter; Mitchell, John B.O.; Rzepa, Henry S. Chemistry in bioinformatics. BMC Bioinformatics 2005, 6,
Povolna, Vera; Dixon, Scott; Weininger, David. Cabinet: Chemical and Biological Informatics NETwork. in: Oprea, Tudor I., ed. Chemoinformatics in Drug Discovery. Weinheim: Wiley-VCH, 2004,

83 Bibliography: Articles (contd)
Salamone, Salvatore. Hip Hop offers lessons on life sciences data integration. Bio-IT World February 2006, 36.
Scherf Uwe, Ross Douglas T., Waltham Mark, Smith Lawrence H., Lee Jae K., Tanabe Lorraine, Kohn Kurt W., Reinhold William C., Myers Timothy G., Andrews Darren T., Scudiero Dominic A., Eisen Michael B., Sausville Edward A., Pommier Yves, Botstein David, Brown Patrick O., Weinstein John N. A gene expression database for the molecular pharmacology of cancer. Nature Genetics 2000, 24,
Souchelnytskyi, S. "Bridging proteomics and systems biology: What are the roads to be traveled?" Proteomics 2005 (November), 5(16),
Tetko, Igor V. Computing chemistry on the web. Drug Discovery Today November 2005, 10(22),

84 Bibliography: Articles (contd)
Zimmermann, Marc; Thi, Le Thuy Bui; Hofmann, Martin. Combating illiteracy in chemistry: Towards computer-based chemical structure reconstruction. ERCIM News January 2005, 60, ERCIM05_04.pdf
Zimmermann, Marc; Fluck, Juliane; Thi, Le Thuy Bui; Kolarik, Corinna; Kumpf, Kai; Hofmann, Martin. Information extraction in the life sciences: Perspectives for medicinal chemistry, pharmacology and toxicology. Current Topics in Medicinal Chemistry 2005, 5(8),

85 Bibliography: Databases
Andreeva, A.; Howorth, D.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Research 2004, 32 Database issue D226-D229 doi: /nar/gkh039
Chen J, Swamidass SJ, Dou Y, Bruand J, Baldi P. ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics Nov 15; 21(22):
Dunkel, M.; Fullbeck, M.; Neumann, S.; Preissner, R. SuperNatural: a searchable database of available natural compounds. Nucleic Acids Research 2006, 34, Database issue D678-D683 doi: /nar/gkj132
Gold, Nicola D.; Jackson, Richard M. A searchable database for comparing protein-ligand binding sites for the analysis of structure-function relationships. Journal of Chemical Information and Modeling 2006, 46(2),

86 Bibliography: Databases (contd)
Kanehisa, M.; Goto, S.; Hattori, M.; Aoki-Kinoshita, F.; Itoh, M.; Kawashima, S.; Katayama, T.; Araki, M.; Hirakawa, M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research 2006, 34, Database issue D354-D357. doi: 10.1093/nar/gkj102.
Kellenberger, Esther; Muller, Pascal; Schalon, Claire; Bret, Guillaume; Foata, Nicolas; Rognan, Didier. sc-PDB: An annotated database of druggable binding sites from the Protein Data Bank. Journal of Chemical Information and Modeling 2006, 46(2),
Irwin, J.J.; Shoichet, B.K. ZINC: A free database of commercially available compounds for virtual screening. Journal of Chemical Information and Modeling 2005, 45,
Kouranov, A.; Xie, L.; de la Cruz, J.; Chen, L.; Westbrook, J.; Bourne, P.E.; Berman, H.M. The RCSB PDB information portal for structural genomics. Nucleic Acids Research 2006, 34, Database issue D302-D305 doi: 10.1093/nar/gkj120
Kumar, M.D.S.; Gromiha, M.M. PINT: Protein-protein interactions thermodynamic database. Nucleic Acids Research 2006, 34 Database issue D195-D198 doi: /nar/gkj017

87 Bibliography: Databases (contd)
Lo Conte, L.; Brenner, S.E.; Hubbard, T.J.P.; Chothia, C.; Murzin, A.G. SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Research 2002, 30(1):
Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. SCOP: A structural classification of proteins database for the investigation of sequences and structures. Journal of Molecular Biology 1995, 247,
Okuno, Y.; Yang, J.; Taneishi, K.; Yabuuchi, H.; Tsujimoto, G. GLIDA: GPCR-ligand database for chemical genomic drug discovery. Nucleic Acids Research 2006, 34, Database issue D673-D677 doi: /nar/gkj028.
Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Research. 2005, 33 Database Issue D247-D251.

88 Bibliography: Databases (contd)
Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 2006, 34 Database Issue D173-D180 doi: /nar/gkj158
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey, Jennifer. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res Jan 1;34(Database issue): D

89 Biotech Validation Suite for Protein Structures
Send the server a PDB file
Server provides a comprehensive check of the protein, including:
Atomic volume analysis
Full geometric analysis
NMR restraint data

90 Knowledge-Driven Bioinformatics Enhanced with Chemistry

91 ToxTree
An in silico toxicology prediction suite
Based on the CDK toolkit
Built on CML
Released as open source under the GPL
Standalone PC software
User Manual: TREE/toxTree_user_manual.pdf

92 Tools for Genomic and Proteomic Scientists vis-à-vis Cell Biology (Gagna et al.)
Tools to fully exploit the techniques in cellular biology
Light microscopy for high resolution images
Fractionation of cells into basic components via ultracentrifugation
Analysis of individual cells through flow cytometry
LCM, normal and diseased TMAs (tissue microarrays), quantitative computer image analysis, cell micromanipulation, and high-throughput microscopy

93 InChI Generation on the Web
The following websites provide the facility to generate InChIs:
ACD/Labs' freely available structure-drawing program ChemSketch includes the facility to generate InChIs from drawn structures.
pubchem.ncbi.nlm.nih.gov/edit/ PubChem Server Side Structure Editor v1.8 includes a facility for generating InChIs as you draw the structure.

94 Advances in Macromolecular Crystallography by CCG
More protein structures available now
Use of 3D info in bioinformatics makes functional inferences more dependable
CCG Structural Family Database distributed with MOE
Includes fold detection methodology to ID structurally similar proteins
Simultaneous sequence and structural alignment of large collections of proteins
3D structural family analysis for insight into conserved geometry, water molecules, salt bridges, hydrogen bonds, hydrophobic contacts, and disulfide bonds

95 CCG's Cheminformatics Offerings
MOE Molecular Database
Molecular Descriptors calculated and used for classification, clustering, filtering, and predictive model construction
QSAR/QSPR Predictive Modeling
Diversity and Similarity Searching
High Throughput Conformational Search
3D Pharmacophore Search

96 Components of the Semantic Web for Chemistry
XML – eXtensible Markup Language
RDF – Resource Description Framework
RSS – Rich Site Summary
Dublin Core – allows metadata-based newsfeeds
OWL – for ontologies
BPEL4WS – for workflow and web services
Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
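
As a small illustration of how XML, RDF, RSS and Dublin Core fit together in a metadata-based newsfeed, the sketch below builds a single RSS 1.0 `item` carrying Dublin Core `creator` and `date` elements. The namespace URIs are the standard ones; the URL, title, author, and date in the demo are invented, and a complete feed would also need the enclosing `rdf:RDF` and `channel` elements, which are omitted here.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_feed_item(url: str, title: str, creator: str, date: str) -> str:
    """Return one RSS 1.0 item (a fragment, not a full feed) with Dublin Core metadata."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rss", RSS_NS)
    ET.register_namespace("dc", DC_NS)
    # rdf:about identifies the resource the item describes.
    item = ET.Element(f"{{{RSS_NS}}}item", {f"{{{RDF_NS}}}about": url})
    ET.SubElement(item, f"{{{RSS_NS}}}title").text = title
    ET.SubElement(item, f"{{{RSS_NS}}}link").text = url
    # Dublin Core elements carry the machine-readable metadata.
    ET.SubElement(item, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(item, f"{{{DC_NS}}}date").text = date
    return ET.tostring(item, encoding="unicode")


if __name__ == "__main__":
    # Invented values, for illustration only.
    print(build_feed_item("http://example.org/mol/1", "Example molecule",
                          "A. Chemist", "2006-03-01"))
```

A chemistry feed could extend the same item with elements from a chemical namespace such as CML, which is what makes the entry useful to programs as well as to human readers.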

97 Web Services Integration Projects: Biosciences
myGrid
BIOPIPE
BioMOBY

98 BIOT 2006
Major themes, areas and suggested topics include
- Bio-molecular and Phylogenetic Databases
- Molecular Evolution and Phylogenetic analysis
- Drug Delivery Systems
- Bio-Ontology and Data Mining
- Sequence Search and Alignment
- Microarray Analysis
- System Biology
- Pathway analysis
- Identification and Classification of Genes
- Protein Structure Prediction and Molecular Simulation
- Functional Genomics
- Proteomics
- Tertiary structure prediction
- Drug Docking
- Gene Expression Analysis
- Biomedical Imaging

99 Proteomics: What is it?
Proteomics is the study of protein expression, regulation, modification, and function in living systems, with the aim of understanding how living systems use proteins.
Using a variety of techniques, proteomics can be used to study how proteins interact within a system, or how proteins change due to applied stresses.
It requires advanced measurement techniques, especially separations and mass spectrometry.

100 Proteomics Needs Informatics for:
Locating peaks in 2 or more dimensions
MS/MS spectra interpretation
Protein/Peptide quantification
Peptide detectability
Experimental data
Biological information: enzyme or pathway regulation, disease susceptibility, drug efficacy
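
Locating peaks in two dimensions can be sketched as a search for local maxima in an intensity grid, for example retention time by mass-to-charge. The rule used below (a cell is a peak if it reaches a minimum intensity and is strictly greater than all of its neighbours) is a deliberately simple assumption; production peak pickers also handle noise, baselines, and overlapping peaks.

```python
def find_peaks_2d(grid: list, min_intensity: float) -> list:
    """Return (row, column, value) for each strict local maximum >= min_intensity."""
    peaks = []
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    for r in range(n_rows):
        for c in range(n_cols):
            value = grid[r][c]
            if value < min_intensity:
                continue  # too weak to count as a peak
            # Collect up to eight neighbours, clipping at the grid edges.
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


if __name__ == "__main__":
    # Toy intensity grid, invented for the example.
    toy_grid = [
        [0, 1, 0, 0],
        [1, 9, 1, 0],
        [0, 1, 0, 5],
        [0, 0, 0, 0],
    ]
    print(find_peaks_2d(toy_grid, min_intensity=3))
```

The same neighbourhood test extends to three or more dimensions by adding further index ranges, at the cost of a larger neighbourhood per cell.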

