Copyright GeneGo 2000-2003 CONFIDENTIAL Systems Biology for Drug Discovery Building and using protein interaction networks: industry perspective Andrej.


2 Systems Biology for Drug Discovery. Building and using protein interaction networks: industry perspective. Andrej Bugrim, GeneGo, Inc.

3 Topics
- Annotation process and collecting network content for industrial-type applications
- Biological and disease ontologies: how to improve and use them in functional analysis
- Tools: utilizing network data in pharmaceutical R&D

4 Multi-level understanding of human biology
- Level of phenotype
- Level of cell process/network
- Level of protein
Links between levels: causative relations, mechanistic relations

5 Disease-centered knowledge base in MetaMiner (Oncology example)
Diagram elements: General BC schema; BC-perturbed cell processes; Causative BC models; Other cancers chosen by Consortium (Compare)
GG annotation team:
- Disease group: causative disease associations at DNA, RNA, protein levels
- Network group: protein-protein, protein-DNA, protein-RNA interactions
- Chemistry group: ligand-receptor interactions (drugs, leads, hits)
- Specialty group: biomarkers

6 Content

7 Three interaction domains in MetaCore
Signal flow: ligands (metabolites, peptides, xenobiotics); membrane receptors; signal transduction (G proteins, secondary messengers, kinases, phosphatases); transcription factors; core effect: metabolic pathways; metabolites
- 1,600 drugs w/targets
- 4,100 endogenous metabolites
- >21,000 ligand-receptor interactions
- 850 GPCRs and other membrane receptors
- 110 nuclear hormone receptors
- 172K manually curated physical signaling interactions
- 538 canonical maps
- 42, step canonical signal transduction pathways
- 924 human transcription factors
- 6,000 target genes
- 11,300 metabolic reactions
- 116 fine metabolic maps

8 MetaBase Content Overview
Database:
- Chemical compounds: 580,000
- Drugs: 8,590
- Chemical reactions: 35,600
- Metabolic networks: 251
Network:
- Proteins + genes: 13,402
- Transcription factors: 924
- Chemical compounds: 26,000
- Drugs: 2,740
- Endogenous compounds: 4,100
- Proteins linked to drugs: 2,711
- Reactions: 5,330
- Small molecule ligands for human receptors: 3,510
- Blockers for ion channels: 629
- PubMed journals: 3,100
- PubMed articles: 81,400
- Total number of interactions: 177,000
Content:
- GeneGo regulatory networks: 120
- GeneGo disease networks: 88
- Maps: 538
- Regulatory maps: 325
- Metabolic maps: 116
- Traditional metabolic maps (EC): 97
- Diseases: 4,920

9 MetaBase content by type (Database)
- Genes (human: 38,700), total: 137,500
- Chemical compounds: 580,000
- Human proteins: 14,570
- Metabolic reactions: 35,600

10 Network interactions
All interactions are taken from articles indexed in PubMed (3,100 journals; 81,400 articles).
Manually curated interactions (172,787):
- Signalling interactions: 137,297 (79%)
- Metabolic reactions: 35,490 (21%)
By type:
- Y2H "Interactome": 2,370 (1%)
- Logical relations: 1,934 (1%)
- Protein-protein: 87,675 (51%)
- Small molecule-protein: 42,383 (26%)
- With microRNA: 1,620 (1%)
- With virus proteins: 335 (0%)
- ChIP-chip: 980 (1%)

11 Types of interactions in network
Effects: activation, inhibition, unspecified
Interaction classes: direct interaction, indirect interaction
Mechanisms: phosphorylation, dephosphorylation, other type of covalent modification, binding, transport, cleavage, transcription regulation, transformation, catalysis, competition, influence on expression, unspecified
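
The attributes listed on this slide map naturally onto a small record type. The following is only a sketch of how one interaction could be represented; the class, the field names and the example identifiers (`kinase_X`, `tf_Y`) are assumptions for illustration, not the MetaBase schema.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"
    UNSPECIFIED = "unspecified"


@dataclass(frozen=True)
class Interaction:
    source: str             # regulating object (hypothetical identifier)
    target: str             # regulated object (hypothetical identifier)
    effect: Effect          # one of the three effects named on the slide
    mechanism: str          # e.g. "phosphorylation", "binding"
    direct: bool            # True for direct, False for indirect interactions
    pubmed_ids: tuple = ()  # supporting articles; this field name is an assumption


def sign(interaction: Interaction) -> int:
    """Map an effect to +1 / -1 / 0 so cascades can be followed numerically."""
    mapping = {Effect.ACTIVATION: 1, Effect.INHIBITION: -1, Effect.UNSPECIFIED: 0}
    return mapping[interaction.effect]


# Hypothetical example record.
example = Interaction("kinase_X", "tf_Y", Effect.INHIBITION, "phosphorylation", direct=True)
```

Keeping the effect as an enumeration rather than free text makes it straightforward to convert edges into signs when tracking activation/inhibition cascades.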

12 Distribution of interactions by mechanism

13 Network objects. Total number of nodes: 40,229

14 Proteins: distribution by tissue & localization

15 Molecular functions in Database

16 Endogenous compounds (4,100 total)
- 3,070 endogenous compounds involved in metabolic reactions: 6,819 reactions with endogenous compounds only
- 751 endogenous ligands for 498 receptors, with 2,455 interactions
- 4,000 (98%) of endogenous compounds in network
- 15,962 network interactions with endogenous metabolites
- 3,600 compounds with structures and molecular (brutto) formulas; the other 700 are “generic” (contain acyl-, alkyl- and other variable groups)

17 Network and pathway statistics in GeneGo
- >40,000 nodes; ~177,000 edges
- Average node degree: 3.77
- 241 million shortest pathways
- Average shortest pathway length:
- 42, step canonical signal transduction pathways
- 200 canonical metabolic pathways: major metabolic fluxes like glycolysis or TCA
- 72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)
Diagram labels: Enzyme1, Enzyme2, reaction1, reaction2, metabolite
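
Statistics of this kind (average degree, number and mean length of shortest paths) can be computed with a breadth-first search from every node. The sketch below does this on a four-node toy graph; the graph is invented, and the slide does not state whether its degree figure counts each directed edge once or counts both endpoints, so the convention used here (edges divided by nodes) is an assumption.

```python
from collections import deque


def average_degree(edges, nodes):
    """Average number of edges per node, counting each directed edge once."""
    return len(edges) / len(nodes)


def shortest_path_lengths(adj, start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(edges, nodes):
    """Return (mean length, number) of shortest paths over connected ordered pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    total = count = 0
    for s in nodes:
        for t, d in shortest_path_lengths(adj, s).items():
            if t != s:
                total += d
                count += 1
    return total / count, count


# Toy directed graph, for illustration only.
toy_nodes = ["A", "B", "C", "D"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
mean_length, n_paths = average_shortest_path(toy_edges, toy_nodes)
```

On a network of tens of thousands of nodes the same all-pairs search is feasible but slow in pure Python; a graph library would normally be used instead.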

18 Pathways in regulatory network
Start: TMR (transmembrane receptor); TF (transcription factor); End: target genes

19 Ontologies

20 Mixed ontologies
Knowledge base (ontologies):
By genre: Drama, Action, Romance, Horror, Foreign
By director: Lynch, Tarantino, Leone, Stone, Antonioni
By actor: Pitt, Nicholson, Depp, Redford, Damon
By year:
How do you compare “action” movies vs. Tarantino movies vs movies? These are incomparable, as they belong to different categories.
Biological categories: molecular pathway, cellular process, disease, metabolic process

21 Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology

22 GeneGo metabolic maps vs. KEGG human maps. Human genes in GG and KEGG

23 Enrichment in GO and GeneGo processes
Data: 4 samples from 4 patients; disease/norm from same patients; Affy U133A arrays
GO processes:
- Resolution: list of proteins
- No connections between proteins
- No signaling/effect within process
GeneGo process networks:
- Resolution: interactions between proteins
- Connections between all proteins in folder
- Clear signaling path, effect within process
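
The slide does not say how enrichment is scored. A common choice for this kind of analysis is the hypergeometric test, and the sketch below assumes it: given a background of genes, a process containing some of them, and a list of differentially expressed genes, it returns the probability of seeing at least the observed overlap by chance. The function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(n_background, n_in_process, n_in_list, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null.

    n_background : genes in the background set
    n_in_process : background genes annotated to the process
    n_in_list    : genes in the experimental list
    n_overlap    : list genes that are annotated to the process
    """
    upper = min(n_in_process, n_in_list)
    total = comb(n_background, n_in_list)
    tail = sum(
        comb(n_in_process, i) * comb(n_background - n_in_process, n_in_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 in the process, a list of 5, all 5 overlapping.
p = enrichment_p_value(10, 5, 5, 5)
```

The same calculation applies whether the category is a GO process (a gene list) or a process network (a gene list plus interactions); only the membership set changes.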

24 Inflammation
- Genes from GO process “Inflammatory response”: 231 (in networks: 152; not in networks: 79)
- Genes from GO process “Immune response”: 446 (in networks: 247; not in networks: 199)
- Genes from GO processes “Inflammatory response” and “Immune response”: 613 (in networks: 345; not in networks: 268)
- Genes in 15 process networks: 1,642 (genes added to networks: 1,297)

25 Diseases
- 4,881 diseases, based on MeSH: 1,630 linked to genes; 3,251 with no gene links
- 38,709 human genes total: 6,318 linked to diseases; 32,391 not linked to diseases
- 6,318 genes are linked to 1,630 diseases
- 21,264 unique articles, indexed in PubMed

26 Disease tree – Neoplasms by Site

27 Drug toxicity tree
- Folders from MeSH
- Folders created at GeneGo based on reviews
- 38 drug-induced pathological processes

28 Gene-disease connections in public domain and GeneGo
Public domain does not have structured information about disease connectivity (by clinical classification) and causative relations with genes and proteins:
- GENE: only citation with disease name; low trust
- MeSH: only hierarchical structure disease tree
- OMIM: only genetic info (mutations, SNPs); no expression; no protein activity, localization
GeneGo:
- Hierarchical structure disease classification: 4,888 diseases
- Genes associated with diseases: 6,429
- Cited articles: 33,792

29 Content. Cancer maps and networks. Breast Cancer: general scheme

30 Angiogenesis in tumor growth

31 Fine metabolic differences between rodents, human
Unique genes: human vs. mouse, rat
- 141 mouse genes, 74 rat genes
- 9 mouse genes, 2 rat genes
- 1 mouse gene, 1 rat gene
Cases shown:
- Unique genes and orthologs catalyze one reaction
- Unique genes catalyze unique reactions (there are no human orthologs for Protein A)
- Orthologs catalyze different reactions

32 Tools

33 Data analysis workflow in MetaDiscovery suite
Input data:
- Molecular bio data; HTS, HCS; metabolites
- Structures: sdf, MOL; ISIS DB
- Custom interactions data: Y2H, pull-down, co-expression, annotation
- Custom maps, networks, pathways
Customization tools: PathwayEditor, MetaLink, MapEditor
MetaCore/MetaDrug platform:
- Signature networks: diseases, drug response
- P-value scoring
- Ontologies: GO processes, GeneGo processes, canonical pathways, metabolic networks, diseases, toxicities
- Cross-experiment comparison: time series, multi-patient cohorts, multiple logical operations, complete report
- Network alignment: multiple algorithms, sub-network queries
Results:
- Med. chemistry: indications, toxicities, off-site effects
- Biology: biomarkers, pathway-based targets
- Modeling software: CellDesigner, Virtual Cell, SBML, BioPAX

34 MetaCore™ Platform
- Network building tools
- Visualization tools
- Oracle-based database: curated interactions from the literature
- Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions
- Logical operations module
- Pathway editor
- Statistics for pathways, processes, networks

35 Pathways Integration
Networks of protein interactions
– Dynamic; built “on-the-fly”
– Exploratory tool
– Build new pathways for genes of interest
Interactive, static maps
– 550 maps
– Signaling, regulation, metabolism, diseases
– Backbone of formalized “state of art” in the field

36 Choose direction and checkpoints within network building page
From: histamine; through: histamine H1 receptor; to: Actin
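
The slide does not describe the network building algorithm itself. One simple way to honor a "from, through, to" request is to join two shortest-path searches at the checkpoint, as sketched below. The toy graph is invented: `intermediate_1`, `intermediate_2` and `other_receptor` are placeholders, not curated interactions.

```python
from collections import deque


def bfs_path(adj, start, goal):
    """Shortest directed path from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def path_via(adj, start, through, goal):
    """Shortest start-to-checkpoint path followed by shortest checkpoint-to-goal path."""
    first = bfs_path(adj, start, through)
    second = bfs_path(adj, through, goal)
    if first is None or second is None:
        return None
    return first + second[1:]


# Hypothetical toy network; only the three named endpoints come from the slide.
toy = {
    "histamine": ["histamine H1 receptor", "other_receptor"],
    "histamine H1 receptor": ["intermediate_1"],
    "other_receptor": ["Actin"],
    "intermediate_1": ["intermediate_2"],
    "intermediate_2": ["Actin"],
}
route = path_via(toy, "histamine", "histamine H1 receptor", "Actin")
```

Here the unconstrained shortest route would bypass the H1 receptor, while `path_via` forces the route through it. Note that joining two shortest paths can revisit a node in general graphs; a production tool would need to handle that case.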

37 False discovery rate filter
Threshold: 0.01 (Apply). Non-significant bars become semi-transparent.
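
The slide names a false discovery rate filter but not the procedure behind it. The sketch below assumes the widely used Benjamini-Hochberg adjustment: each p-value is converted to a q-value, and a category counts as significant when its q-value is at or below the threshold. The example p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return (significant flags, q-values), both in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return [q <= alpha for q in q_values], q_values


# Made-up p-values for five categories.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
flags_strict, q = benjamini_hochberg(toy_p, alpha=0.01)
flags_loose, _ = benjamini_hochberg(toy_p, alpha=0.05)
```

In a display like the one on the slide, categories whose flag is False would be the bars drawn semi-transparent.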

38 DNA level data
Input data: Illumina, Affy SNP arrays; epigenomics data; ChIP/Chip, CGH, Taq assays; resequencing
Analysis (MetaDiscovery tools): direct data analysis; correlation with molecular data; correlation with clinical data
Applications: pharmacogenomics; disease biomarkers

39 New customization modules
MapEditor: custom maps synchronized with MC/MD database
– Draw pathway maps from scratch
– Transform gene lists into networks, then into pathway maps
– Edit MetaCore’s canonical maps
– View and score your maps within the context of canonical maps
– Map experimental data on custom maps
MetaLink: overlaying custom interactions
– Import custom interactions (Y2H, co-expression, pull-down, etc.)
– Visualize using GeneGo network building algorithms
– Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions
PathwayEditor: annotation technology transfer, at the database level
– Custom annotation of interactions, compounds, diseases, metabolism in the framework of internal annotation system at GeneGo
– Use the annotation forms, workflows and QC system developed at GeneGo
– Novel objects are imported and integrated with pre-existing data in MetaCore

40 Adding localizations. Additional localizations can be added.

41 Your NEW map is now an interactive part of MetaCore. Users can visualize their experimental data on the new map.

42 Mapping interaction sets on networks
Resulting Direct Interactions network:
- Pink interactions are from the uploaded links file
- Blue interactions are in both the links file and the MetaCore database
- Mouse over an interaction to see the uploaded weight value
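
The pink/blue distinction amounts to a set comparison between the uploaded links and the database links. The sketch below shows that comparison; the links file format is not given in the slides, so the dictionary of (source, target) pairs with weights and the protein names `P1` to `P4` are assumptions.

```python
def classify_links(uploaded, database):
    """Split uploaded links into those also in the database and those that are new.

    uploaded : dict mapping (source, target) to the uploaded weight
    database : iterable of (source, target) pairs already curated
    """
    db = set(database)
    shared = {pair: w for pair, w in uploaded.items() if pair in db}
    uploaded_only = {pair: w for pair, w in uploaded.items() if pair not in db}
    return shared, uploaded_only


# Hypothetical data: two uploaded links, one of which is already curated.
uploaded_links = {("P1", "P2"): 0.9, ("P2", "P3"): 0.4}
database_links = [("P1", "P2"), ("P3", "P4")]
shared, uploaded_only = classify_links(uploaded_links, database_links)
```

In this sketch `shared` corresponds to the blue interactions and `uploaded_only` to the pink ones. Pairs are treated as ordered; undirected data such as Y2H would need both orientations checked.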

43 MetaDrug: a NEW paradigm!
Novel compounds; IND
Structure-based modeling (dry lab predictions): pharmacophores; QSAR models of activity, metabolism, toxicity = 1 protein at a time. Many, many products.
BIOLOGICAL effects (indication, toxicity, off-target effects) = GROUPS OF PROTEINS!
Present a structure as a GROUP of proteins.
Query KNOWLEDGE base: diseases, toxicities, pathways, processes, networks
Knowledge base content:
- 3,400 disease related genes
- 500+ diseases w/genes
- 600 known drug targets
- 3,000 toxicity related genes
- 700 cellular processes
- 200,000 protein interactions
Score based on analysis: indication, toxicity; select best structures = systems level analysis
Report BY and FOR end user = chemist = systems level effects

44 Integration with MDL DiscoveryGate
Search compound of interest or its metabolites in MDL databases*
* Requires access to DiscoveryGate

45 Merge networks

46 Algorithms

47 Old and new ways to analyze data
Current way of analysis: all significance calculations are done before mapping onto network
- Full data tables
- Statistical procedures, thresholds of fold change, p-value, either in MC or 3rd party tools
- Sets of genes
- Connect them on network one way or another: too many choices, no clear way to choose
New way of analysis: significance calculations follow the mapping onto network
- Full data tables
- Statistical procedures in MC based on concurrent analysis of expression profiles and connectivity
- Sets of network modules
- Apply to global network

48 Samples are analyzed in pathway’s expression space
[Table: genes (rows) by Sample 1, Sample 2, Sample 3, Sample 4 (columns)]

49 Network signatures for compound effects: Mestranol, Tamoxifen, Phenobarbital

50 Finding topologically significant nodes
Example network with nodes A, B, C. Legend: differentially expressed genes; non-differentially expressed genes.
- B is topologically significant: 4 out of 6 nodes regulated by B are differentially expressed, more than a random share
- C is not topologically significant: only 1 out of 6 nodes regulated by C is differentially expressed, which could be due to a random event
In reality, the algorithm also considers nodes beyond first-degree neighbors.
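
The contrast between B and C can be made concrete with a simple tail probability. The sketch below is not the algorithm from the talk, which also looks beyond first-degree neighbors; it only asks how likely it is to see at least k differentially expressed targets among n if each target were differentially expressed at a background rate. The 10% background rate is a hypothetical value chosen for illustration.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


background_rate = 0.10  # hypothetical share of differentially expressed genes

p_b = binomial_tail(6, 4, background_rate)  # node B: 4 of 6 targets changed
p_c = binomial_tail(6, 1, background_rate)  # node C: 1 of 6 targets changed
```

Under this assumed background rate, the probability for B is roughly 0.001 while the probability for C is close to 0.47, which matches the slide's reading that B stands out and C does not.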

51 Why is JAK1 significant in this dataset?
- Regulation via JAK1: JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1
- Topological significance helps to find important links in pathways that do not come up in HT screens
- Feedback loops

52 Regulation of lipid metabolism
Legend: differentially expressed genes identified by microarray and confirmed by proteomic screen; topologically significant nodes revealed by the new algorithm

53 Putting it all together: network activity inference
– Identifying causal relation between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades
Diagram labels: experimental data (start cascade); experimental data (terminate cascade); inferred activity; experimental data; predicted input; predicted target; scoring intermediary nodes
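
Tracking a perturbation through an activation/inhibition cascade can be illustrated by multiplying edge signs along the way. The sketch below is a deliberately minimal version of that idea, not the inference method from the talk: it assumes each edge is labeled +1 (activation) or -1 (inhibition), and it keeps the first state that reaches a node, ignoring conflicting routes. The node names are placeholders.

```python
from collections import deque


def propagate_activity(edges, start, start_state=1):
    """Propagate an activity state from start along signed edges.

    edges : list of (source, target, sign), sign +1 for activation, -1 for inhibition
    Returns a dict mapping each reached node to +1 (up) or -1 (down).
    """
    adj = {}
    for s, t, sign in edges:
        adj.setdefault(s, []).append((t, sign))
    state = {start: start_state}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, sign in adj.get(u, ()):
            if v not in state:  # first route wins; conflicts are not resolved here
                state[v] = state[u] * sign
                queue.append(v)
    return state


# Placeholder cascade: receptor activates kinase, kinase inhibits TF, TF activates gene.
cascade = [("receptor", "kinase", 1), ("kinase", "tf", -1), ("tf", "gene", 1)]
inferred = propagate_activity(cascade, "receptor")
```

With the receptor switched on, the target gene is inferred to go down. Comparing such inferred states with measured expression at the cascade's end points is one way to score intermediary nodes.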

54 Work in progress
Finding patterns of significance (based on one experiment):
– Significant neighborhoods
– Significant receptors (by underlying cascade)
– Significant transcription factors (by upstream cascade)
– Significant interaction types (by distribution of expression at terminals)
Finding common and different pathway modules (based on multiple samples):
– Looking for “differential pathways”: modules that distinguish one group of samples from another
– Finding common motifs in a group of pathway modules
Inferring patterns of network activity:
– Identifying causal relation between putative input and output signals
– Tracking effects of molecular perturbation through activation/inhibition cascades
Looking into mutual gene-process information and Bayesian inference of significance:
– If gene G occurs only in process P, its up-/down-regulation is significant evidence with respect to inferring P’s status
– If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence with respect to inferring P’s status
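
The last point, that a gene found in few processes is stronger evidence than a gene found in many, can be sketched with a simple specificity weight. The version below assumes that a changed gene implicates each of its processes with equal probability, so the weight of gene G for process P is 1 divided by the number of processes containing G. This is an illustrative simplification, not the Bayesian scheme under development in the talk; the process and gene names are placeholders.

```python
def specificity_weights(process_genes):
    """Weight of each (process, gene) pair as 1 / number of processes containing the gene.

    process_genes : dict mapping process name to a list of its genes
    """
    counts = {}
    for genes in process_genes.values():
        for g in set(genes):
            counts[g] = counts.get(g, 0) + 1
    return {
        (p, g): 1 / counts[g]
        for p, genes in process_genes.items()
        for g in set(genes)
    }


# Placeholder annotation: G1 occurs only in P, G2 occurs in P, Q and R.
toy_processes = {"P": ["G1", "G2"], "Q": ["G2"], "R": ["G2"]}
weights = specificity_weights(toy_processes)
```

A change in G1 counts fully toward P, while a change in G2 contributes only a third, since it could equally reflect Q or R.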

55 Future products

56 MetaMiner Consortiums for 2007
- Oncology (breast cancer, 4 other cancers)
- Metabolic diseases (diabetes II, obesity, metabolic syndrome)
- CNS and neurodegenerative diseases
- Immunological and autoimmune diseases

57 MetaMiner consortiums: analytical platform for disease areas
Cancer consortium labs: HTS, HCS
MetaMiner (Oncology) platform:
- Cancer relevant annotations, databases
- Active cpds analysis, screening
- Maps for disease, processes, drug action
- Custom maps for projects
- Experimental data depository
- Data parsing, normalization
- Data analysis
Compounds scoring: indications; toxicities; off-site effects
Drug targets: divergence hubs on networks; “druggability” testing; pathways connectivity
Biomarkers: combination of different types (expression, secreted proteins, metabolites); convergence hubs (core effectors)

58 MetaTox consortium. Functional descriptors
- Descriptors: pathway maps; toxicity, process maps; sub-networks, modules, nodes
- Steps: enrichment by category; mapping on descriptors; indexing & scoring by tox. category; predictive models
