Building and using protein interaction networks: industry perspective

1 Building and using protein interaction networks: industry perspective
Systems Biology for Drug Discovery. Andrej Bugrim, GeneGo, Inc.

2 Topics
Annotation process and collecting network content for industrial-type applications; biological and disease ontologies: how to improve and use them in functional analysis; tools: utilizing network data in pharmaceutical R&D

3 Multi-level understanding of human biology
Diagram labels: level of phenotype; level of cell process/network; level of protein. Relations between levels: causative, mechanistic.

4 Disease-centered knowledge base in MetaMiner (Oncology example)
GG annotation team: Disease group (causative disease associations at DNA, RNA, and protein levels); Network group (protein-protein, protein-DNA, and protein-RNA interactions); Specialty group (biomarkers); Chemistry group (ligand-receptor interactions: drugs, leads, hits). Results: comparison of BC-perturbed cell processes; causative BC models; general BC schema; other cancers chosen by the Consortium.

5 Content

6 Three interaction domains in MetaCore
Ligand-receptor level: 1,600 drugs w/targets; 4,100 endogenous metabolites; >21,000 ligand-receptor interactions; 850 GPCRs and other membrane receptors; 110 nuclear hormone receptors. Ligands: metabolites, peptides, xenobiotics.
Signal transduction level: membrane receptors; G proteins; secondary messengers; kinases; phosphatases; 172K manually curated physical signaling interactions; 538 canonical maps; 42, step canonical signal transduction pathways. Transcription factors: 924 human transcription factors; 6,000 target genes.
Core effect level (metabolic pathways): 11,300 metabolic reactions; 116 fine metabolic maps; 4,100 endogenous metabolites.
Only MC has all 3 levels needed for functional reconstruction: ligand-receptor (primary signaling) level; signal transduction level; effector level (core metabolism).

7 MetaBase Content Overview
Database: chemical compounds 580,000; drugs 8,590; chemical reactions 35,600; metabolic networks 251.
Network: proteins + genes 13,402; transcription factors 924; chemical compounds 26,000; drugs 2,740; endogenous compounds 4,100; proteins linked to drugs 2,711; reactions 5,330; small-molecule ligands for human receptors 3,510; blockers for ion channels 629; PubMed journals 3,100; PubMed articles 81,400; total number of interactions 177,000.
Content: GeneGo regulatory networks 120; GeneGo disease networks.
Maps: regulatory maps 325; metabolic maps 116; traditional metabolic maps (EC) 97.
Diseases: 4,920.

8 MetaBase content by type
Database: genes, total 137,500 (human: 38,700); chemical compounds 580,000; human proteins 14,570; metabolic reactions 35,600.

9 Network interactions Manually curated interactions (172,787)
Signalling interactions: 137,297 (79%); metabolic reactions: 35,490 (21%). Protein-protein: 87,675 (51%); small molecule-protein: 42,383 (26%); Y2H "interactome": 2,370 (1%); logical relations: 1,934 (1%); with microRNA: 1,620 (1%); ChIP-chip: 980 (1%); with virus proteins: 335 (0%). Only collapsed interactions are counted; uncollapsed interactions number over 650,000. All interactions are taken from articles indexed in PubMed: 3,100 journals, 81,400 articles.

10 Types of interactions in the network
Effects: activation, inhibition, unspecified. Interaction types: direct, indirect. Mechanisms: phosphorylation, influence on expression, dephosphorylation, unspecified, other type of covalent modification, binding, transport, cleavage, transcription regulation, transformation, catalysis, competition.
1. We split all protein-protein and protein-compound (chemical) interactions into two main types: direct and indirect (functional).
2. Within both types of interactions, we separate distinct mechanisms, which are defined by the experimental evidence for the interaction. There are 10 mechanisms for direct interactions and two mechanisms for indirect interactions.
3. All interactions, direct and indirect, are split into directed and undirected ones and are marked with “effects”.
3.1 Directed interactions are characterized by causative relationships between network objects (proteins, genes, compounds, complexes, etc.).
3.2 Undirected interactions are marked with effect-mechanism.
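
The classification above can be pictured as a small record type. The following is only a sketch under stated assumptions: the class and field names (`Interaction`, `direct`, `directed`) and the example record are hypothetical and are not taken from MetaCore; only the effect and mechanism names come from the slide.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"
    UNSPECIFIED = "unspecified"


# Mechanism names copied from the slide. The slide does not say which ten
# belong to direct interactions and which two to indirect ones, so this
# sketch keeps a single flat set.
MECHANISMS = frozenset({
    "phosphorylation", "influence on expression", "dephosphorylation",
    "unspecified", "other type of covalent modification", "binding",
    "transport", "cleavage", "transcription regulation",
    "transformation", "catalysis", "competition",
})


@dataclass(frozen=True)
class Interaction:
    source: str            # network object: protein, gene, compound, complex
    target: str
    direct: bool           # True = direct, False = indirect (functional)
    directed: bool         # True = causative relationship source -> target
    mechanism: str
    effect: Effect = Effect.UNSPECIFIED

    def __post_init__(self):
        # Reject mechanism names that are not on the slide's list.
        if self.mechanism not in MECHANISMS:
            raise ValueError(f"unknown mechanism: {self.mechanism!r}")


# Hypothetical record, for illustration only.
example = Interaction("Kinase X", "Protein Y", direct=True, directed=True,
                      mechanism="phosphorylation", effect=Effect.ACTIVATION)
```

Keeping the effect separate from the mechanism mirrors the slide, where the same mechanism (for example binding) can carry an activating, inhibiting, or unspecified effect.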

11 Distribution of interactions by mechanism

12 Network objects
Total number of nodes: 40,229

13 Proteins: distribution by tissue & localization

14 Molecular functions in Database

15 Endogenous compounds (4,100 total)
3,070 endogenous compounds involved in metabolic reactions; 6,819 reactions with endogenous compounds only; 751 endogenous ligands for 498 receptors, with 2,455 interactions; 4,000 (98%) of endogenous compounds are in the network; 15,962 network interactions with endogenous metabolites; 3,600 compounds with structures and molecular (gross) formulas (the other 700 are “generic”: they contain acyl-, alkyl-, and other variable groups). Note: 4,000 endogenous metabolic compounds is 4x more than Biocrates, the leader in human metabolomics, has.

16 Network and pathway statistics in GeneGo
>40,000 nodes; ~177,000 edges; average node degree: 3.77; 241 million shortest pathways; average shortest pathway length: ; 42, step canonical signal transduction pathways; 200 canonical metabolic pathways (major metabolic fluxes such as glycolysis or TCA); 72,000 pathways on metabolic maps, analogous to KEGG pathways (KEGG has 42,500). Diagram labels: Enzyme1, Enzyme2, reaction1, reaction2, metabolite.

17 Pathways in regulatory network
Start: TMR (transmembrane receptor); intermediate: TF (transcription factor); end: target genes.

18 Ontologies

19 Knowledge base (ontologies)
By year: 2007, 2006, 2005, 2004, 2003. By genre: Drama, Action, Romance, Horror, Foreign. By director: Lynch, Tarantino, Leone, Stone, Antonioni. By actor: Pitt, Nicholson, Depp, Redford, Damon. How do you compare “action” movies vs. Tarantino movies vs. movies? They are incomparable because they belong to different categories. Mixed ontologies: molecular pathway, cellular process, disease, metabolic process.

20 Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology

21 GeneGo metabolic maps vs. KEGG human maps
Human genes in GG and KEGG

22 Enrichment in GO and GeneGo processes
GeneGo process networks: resolution at the level of interactions between proteins; connections between all proteins in a folder; clear signaling path and effect within a process. GO processes: resolution at the level of a list of proteins; no connections between proteins; no signaling/effect within a process. Data: 4 samples from 4 patients; disease/norm from the same patients; Affy U133A arrays.

23 Inflammation
Genes from GO process “Inflammatory response”: 231 (in networks 152, not in networks 79). Genes from GO process “Immune response”: 446 (in networks 247, not in networks 199). Genes from GO processes “Inflammatory response” and “Immune response” combined: 613 (in networks 345, not in networks 268). Genes in 15 process networks: 1,642; genes added to networks: 1,297.
Legend: blue blocks are the initial statistical data; green blocks are GO-process objects that have been linked to the networks created; red blocks are GO-process objects that have not yet been linked to the networks created; yellow blocks are objects added, according to literature data, because they fit the processes represented in the networks.

24 21,264 unique articles, indexed in PubMed
Diseases: 4,881, based on MeSH. Human genes total: 38,709. Human genes linked to diseases: 6,318; diseases linked to genes: 1,630; human genes not linked to diseases: 32,391; diseases with no gene links: 3,251. 6,318 genes are linked to 1,630 diseases.

25 Disease tree – Neoplasms by Site

26 Folders created at GeneGo based on reviews
Drug toxicity tree: 38 drug-induced pathological processes. Legend: folders from MeSH; folders created at GeneGo based on reviews. We created a hierarchical structure of pathologies, the toxicity tree. This work was done mainly by analyzing articles. Initially, the MeSH disease tree contained the folder Drug toxicity and the subfolder Drug Eruptions with its subfolders (with the exception of Sweet's Syndrome). There is no single review that lists all these types of toxicity; they appear in various articles and reviews. To date we have collected 38 types of organ-specific pathologies caused by drugs.

27 Gene-Disease connections in public domain and GeneGo
Comparison of public databases and GeneGo: gene-disease links. OMIM: only genetic info (mutations, SNPs); no expression; no protein activity or localization. GENE, MeSH: only citations with disease names (low trust); only a hierarchical disease tree. The public domain does not have structured information about disease connectivity (by clinical classification) or about causative relations with genes and proteins. GeneGo: hierarchical structure of disease classification, ,888 diseases; genes associated with diseases ,429; cited articles , 792

28 Content. Cancer maps and networks. Breast Cancer: general scheme
Each of the following slides examines in detail the main processes and signaling cascades involved in the development of BC. BC markers described in the literature are mapped onto all of the networks below (red blobs), so that the involvement of each participant of the studied process in BC can be traced in detail.

29 Angiogenesis in tumor growth
For a tumor to grow successfully, timely vascularization is essential: in a fast-growing tumor the central regions lack oxygen, and tissue necrosis may set in. Under these conditions the factor mainly activated is HIF-A, which is responsible for the response to hypoxia and activates angiogenesis through VEGF. If the tumor grows faster than its vessels, necrosis and inflammation are observed, acting through IL-1, 6, 18, which also leads, via NF-kB, to the synthesis of VEGFs. A key angiogenesis activator is VEGF-A, which can act alone or via its basic receptors VEGFR-1 and VEGFR-2, with neuropilins as coreceptors. Other members of the VEGF family are also involved in the process. One of the transcriptional activators of VEGF-A is HIF-1, which is responsible for the induction of angiogenesis under hypoxia. Moreover, some other growth factors (FGFs, EGF, PDGF, PlGF, TGF-beta) and MMPs are involved in the regulation of angiogenesis. Angiopoietins 1 and 2 act at later stages of angiogenesis via TIE receptors. Integrins are required for the intercellular adhesion that occurs during angiogenesis.

30 Fine metabolic differences between rodents and human
Species compared: human; mouse, rat. Unique genes and orthologs catalyse one reaction: 141 mouse genes, 74 rat genes (there are no human orthologs for Protein A). Unique genes catalyse unique reactions: 9 mouse genes, 2 rat genes. Orthologs catalyse different reactions: 1 mouse gene, 1 rat gene.

31 Tools

32 Data analysis workflow in MetaDiscovery suite
Diagram labels: PathwayEditor, MetaLink, MapEditor; custom interactions data (Y2H, pull-down, co-expression); annotation; custom maps, networks, pathways; ISIS DB; molecular biology data; structures (SDF, MOL); metabolites; HTS, HCS; MetaCore/MetaDrug platform; signature networks (diseases, drug response); P-value scoring; ontologies (GO processes, GeneGo processes, canonical pathways, metabolic networks, toxicities); cross-experiment comparison (time series, multi-patient cohorts, multiple logical operations); complete report; network alignment (multiple algorithms, sub-network queries); medicinal chemistry (indications, toxicities, off-site effects); modeling software (CellDesigner, Virtual Cell; SBML, BioPAX); biology (biomarkers, pathway-based targets).
This is the general schema for network analysis. High-throughput data can be mapped directly onto MetaCore interaction tables, and a network unique to the data file will be built. Note that the network provides the highest possible resolution for mapping the data: the level of individual proteins/genes, single-step physical protein-protein interactions, and one-step metabolic transformations. Networks can also be built using third-party tools and compared within MetaCore. Once generated, networks can be interpreted in the context of canonical pathways (in this case, GeneGo maps) and different process ontologies (for example, GO categories), and prioritized based on p-value and data saturation. Based on this analysis, new hypotheses can be generated for drug targets, siRNA perturbations, and biomarkers.

33 curated interactions from the literature
MetaCore™ Platform: network building tools; pathway editor; statistics for pathways, processes, networks; visualization tools; data (microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions); logical operations module; curated interactions from the literature; Oracle-based database. Note: self-explanatory; composition of MetaCore.

34 Signaling, regulation, metabolism, diseases
Pathways integration. Interactive, static maps: 550 maps; signaling, regulation, metabolism, diseases; backbone of the formalized “state of the art” in the field. Networks of protein interactions: dynamic, built “on the fly”; exploratory tool; build new pathways for genes of interest.

35 Choose direction and checkpoints within network building page
Choose the signal direction: from histamine, through the histamine H1 receptor, to Actin. This yields the shortest paths; however, Rac1 and RhoA do not appear in them.
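
A shortest-path search with a checkpoint can be sketched as two breadth-first searches joined at the checkpoint. This is a minimal illustration, not MetaCore's network-building algorithm. The toy graph is an assumption: "node A", "node B", and "node C" and every edge are hypothetical, chosen only to show how a node such as RhoA can lie on a longer route and so be left out of the shortest one.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def path_via(graph, start, checkpoint, goal):
    """Shortest route start -> checkpoint joined with checkpoint -> goal."""
    first = shortest_path(graph, start, checkpoint)
    second = shortest_path(graph, checkpoint, goal)
    if first is None or second is None:
        return None
    return first + second[1:]            # do not repeat the checkpoint


# Hypothetical directed toy graph; the edges are NOT curated biology.
TOY = {
    "histamine": ["histamine H1 receptor"],
    "histamine H1 receptor": ["node A", "node B"],
    "node A": ["Actin"],
    "node B": ["RhoA"],
    "RhoA": ["node C"],
    "node C": ["Actin"],
}

route = path_via(TOY, "histamine", "histamine H1 receptor", "Actin")
```

In this toy graph the route runs through "node A", so RhoA, which sits on a longer branch, is not part of the result.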

36 Non-significant bars become semi-transparent
False discovery rate filter: threshold 0.01. After the filter is applied, non-significant bars become semi-transparent.
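
The slide does not say which false discovery rate procedure the filter uses. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg procedure. The function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.01):
    """Return a list of booleans: True where the p-value passes FDR level q.

    Benjamini-Hochberg: sort the m p-values, find the largest rank k with
    p_(k) <= k * q / m, and call the k smallest p-values significant.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank           # keep the largest passing rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant
```

For example, `benjamini_hochberg([0.2, 0.0005, 0.03, 0.004], q=0.01)` flags the second and fourth entries. In the interface described on the slide, the bars for the other entries would be drawn semi-transparent.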

37 DNA level data MetaDiscovery tools
Input data: Illumina, Affy SNP arrays; epigenomics data; ChIP/Chip, CGH, resequencing; Taq assays. Analysis (MetaDiscovery tools): correlation with clinical data; direct data analysis; correlation with molecular data. Applications: pharmacogenomics; disease biomarkers.

38 New customization modules
MapEditor: custom maps synchronized with the MC/MD database. Draw pathway maps from scratch; transform gene lists into networks and then into pathway maps; edit MetaCore’s canonical maps; view and score your maps within the context of canonical maps; map experimental data on custom maps.
MetaLink: overlaying custom interactions. Import custom interactions (Y2H, co-expression, pull-down, etc.); visualize using GeneGo network building algorithms; score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions.
PathwayEditor: annotation technology transfer, at the database level. Custom annotation of interactions, compounds, diseases, and metabolism in the framework of the internal annotation system at GeneGo; use the annotation forms, workflows, and QC system developed at GeneGo; novel objects are imported and integrated with pre-existing data in MetaCore.

39 Additional Localizations can be added
Adding Localizations Additional Localizations can be added

40 Your NEW map is now an interactive part of MetaCore
Users can visualize their experimental data on the new map

41 Mapping interaction sets on networks
Resulting Direct Interactions network. Pink interactions are from the uploaded links file. Mouse over an interaction to see the uploaded weight value. Blue interactions are in both the links file and the MetaCore database.

42 Systems level analysis
A NEW paradigm! Many, many products. MetaDrug. Novel compounds. Structure-based modeling: QSAR models (activity); dry-lab predictions; pharmacophores; QSAR models (metabolism, toxicity). Present a structure as a GROUP of proteins. Query KNOWLEDGE base: diseases, toxicities, pathways, processes, networks. IND BIOLOGICAL effects: indication, toxicity, off-target effects = GROUPS OF PROTEINS! Score based on analysis: indication, toxicity. Select best structures. Knowledge base content: 3,400 disease-related genes; 500+ diseases w/genes; 600 known drug targets; 3,000 toxicity-related genes; 700 cellular processes; 200,000 protein interactions. Report BY and FOR end user = chemist. Remaining diagram labels: = 1 protein at a time; = Systems level effects; = Systems level analysis.

43 Integration with MDL DiscoveryGate
Search for a compound of interest or its metabolites in MDL databases (requires access to DiscoveryGate). Copyright GeneGo

44 Merge networks

45 Algorithms

46 Old and new ways to analyze data
Current way of analysis: all significance calculations are done before mapping onto the network. Start from full data tables; apply statistical procedures and thresholds of fold change and p-value, either in MC or in 3rd-party tools; obtain sets of genes; then connect them on the network one way or another (too many choices, no clear way to choose).
New way of analysis: significance calculations follow the mapping onto the network. Start from full data tables; apply them to the global network; use statistical procedures in MC based on concurrent analysis of expression profiles and connectivity; obtain sets of network modules.

47 Samples are analyzed in pathway’s expression space
Calculating the distance between samples in the pathway’s gene expression space. A hypothetical pathway is shown, consisting of three proteins A, B, and C. The proteins are the products of the corresponding genes. (A) Samples are represented as points in the 3-dimensional space of gene expression. Note the grouping of samples. (B) Representation of the pathway: the corresponding gene expression (fold change compared to control) is represented by arrows. [Table: expression values for Gene 1 to Gene 4 across Sample 1 to Sample 4; cell values not recoverable.]
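
The slide does not state which distance is used. The sketch below assumes plain Euclidean distance over the pathway's genes, and all expression values are hypothetical, chosen only to illustrate the grouping of samples in a three-gene pathway space.

```python
import math


def pathway_distance(sample_a, sample_b, pathway_genes):
    """Euclidean distance between two samples, restricted to the pathway's genes."""
    return math.sqrt(sum((sample_a[g] - sample_b[g]) ** 2 for g in pathway_genes))


# Hypothetical pathway of three genes and hypothetical fold changes vs. control.
PATHWAY = ("A", "B", "C")
SAMPLES = {
    "sample 1": {"A": 2.0, "B": 1.0, "C": 0.0},
    "sample 2": {"A": 2.0, "B": 1.0, "C": 1.0},
    "sample 3": {"A": -1.0, "B": -3.0, "C": 0.0},
}

near = pathway_distance(SAMPLES["sample 1"], SAMPLES["sample 2"], PATHWAY)
far = pathway_distance(SAMPLES["sample 1"], SAMPLES["sample 3"], PATHWAY)
```

Here samples 1 and 2 lie close together in the pathway's expression space, while sample 3 lies far from both. This is the kind of grouping the slide points to.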

48 Network signatures for compounds effects
Panel labels: Mestranol, Phenobarbital; Tamoxifen, Phenobarbital. Pathways respond differentially to drug treatments. (A) Direct interaction network assembled from the genes extracted from the pathways that distinguish between mestranol and phenobarbital treatments. Relative changes in expression between treated and untreated rats are mapped (log-ratios, averaged over 5 repeats). Blue circles mark down-regulated genes; red circles mark up-regulated genes. (B) The network assembled from the genes extracted from the pathways that differ between phenobarbital and mestranol. Note that both comparisons contain significant numbers of negatively correlated genes.

49 Finding topologically significant nodes
Node B: 4 out of 6 nodes regulated by B are differentially expressed, which is more than a random share, so B is significant. Node C: only 1 out of 6 nodes regulated by C is differentially expressed, which could be due to a random event, so C is not topologically significant. In reality, the algorithm also considers nodes beyond first-degree neighbors. Legend: differentially expressed genes; non-differentially expressed genes.
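
The "more than a random share" argument can be made concrete with a hypergeometric tail probability. This is a simplified stand-in that looks at first-degree neighbors only, not the GeneGo algorithm. The background figures (100 genes, of which 10 are differentially expressed) are hypothetical.

```python
from math import comb


def enrichment_p_value(population, de_total, neighbors, de_neighbors):
    """P(at least de_neighbors of the regulated nodes are differentially
    expressed) if the regulated nodes were drawn at random from the population.
    """
    total = comb(population, neighbors)
    upper = min(neighbors, de_total)
    # Sum exact integer counts first, divide once at the end.
    favorable = sum(
        comb(de_total, i) * comb(population - de_total, neighbors - i)
        for i in range(de_neighbors, upper + 1)
    )
    return favorable / total


# Hypothetical background: 100 genes on the network, 10 differentially expressed.
p_b = enrichment_p_value(100, 10, neighbors=6, de_neighbors=4)   # node B: 4 of 6
p_c = enrichment_p_value(100, 10, neighbors=6, de_neighbors=1)   # node C: 1 of 6
```

Under these assumed background numbers, the probability for node B is very small, while for node C it is large. That matches the slide's reading that 4 of 6 is significant and 1 of 6 could be chance.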

50 Why is JAK1 significant in this dataset?
Dataset: uterus in labor vs. non-labor (Wayne State). Diagram labels: regulation via JAK1; feedback loops. JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1. Topological significance helps to find important links in pathways that do not come up in HT screens.

51 Regulation of lipid Metabolism
Legend: topologically significant nodes revealed by the new algorithm; differentially expressed genes identified by microarray and confirmed by proteomic screen.

52 Putting it all together: network activity inference
Identifying causal relations between putative input and output signals. Tracking the effects of molecular perturbation through activation/inhibition cascades. Diagram labels: predicted input; experimental data: start cascade; scoring intermediary nodes; inferred activity; experimental data; experimental data: terminate cascade; predicted target.
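
Tracking a perturbation through activation/inhibition cascades can be sketched as sign propagation over a directed, signed graph. This is a deliberately simplified illustration, not the algorithm used in MetaCore: each node takes the sign implied by the first (shortest) path that reaches it, and conflicting paths are ignored. All node names and edges are hypothetical.

```python
from collections import deque


def propagate(edges, start, start_state=1):
    """Propagate a perturbation through signed edges.

    edges: dict mapping node -> list of (target, sign), where sign is
    +1 for activation and -1 for inhibition.
    Returns dict node -> inferred state (+1 up, -1 down).
    """
    state = {start: start_state}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target, sign in edges.get(node, ()):
            if target not in state:      # first-reached path decides the sign
                state[target] = state[node] * sign
                queue.append(target)
    return state


# Hypothetical cascade, for illustration only.
TOY_EDGES = {
    "input": [("kinase", +1)],
    "kinase": [("repressor", +1), ("TF", +1)],
    "repressor": [("target 2", -1)],
    "TF": [("target 1", +1)],
}

inferred = propagate(TOY_EDGES, "input", +1)
```

With the input switched on, "target 1" is inferred to go up and "target 2" to go down. Comparing such inferred states with measured expression at the end of the cascade is one simple way to score the intermediary nodes.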

53 Work in progress
Finding patterns of significance (based on one experiment): significant neighborhoods; significant receptors (by underlying cascade); significant transcription factors (by upstream cascade); significant interaction types (by distribution of expression at terminals).
Finding common and different pathway modules (based on multiple samples): looking for “differential pathways”, that is, modules that distinguish one group of samples from another; finding common motifs in a group of pathway modules.
Inferring patterns of network activity: identifying causal relations between putative input and output signals; tracking the effects of molecular perturbation through activation/inhibition cascades.
Looking into mutual gene-process information and Bayesian inference of significance: if gene G occurs only in process P, its up-/down-regulation is significant evidence for inferring P’s status; if gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence for inferring P’s status.
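
The last point, about how much evidence a gene carries for a process, can be illustrated with a deliberately crude Bayesian toy. It assumes a change in gene G is attributable to exactly one of the processes containing G, with a uniform prior over them. The function name and the process memberships are hypothetical, and this is not the inference scheme under development at GeneGo.

```python
def process_posterior(gene, process, memberships):
    """P(process | gene changed), assuming exactly one process containing the
    gene is responsible and all such processes are equally likely a priori.
    """
    containing = [p for p, genes in memberships.items() if gene in genes]
    if process not in containing:
        return 0.0
    return 1.0 / len(containing)


# Hypothetical memberships: G1 occurs only in P, G2 occurs in P, Q, and R.
MEMBERSHIPS = {
    "P": {"G1", "G2"},
    "Q": {"G2"},
    "R": {"G2"},
}

specific = process_posterior("G1", "P", MEMBERSHIPS)
shared = process_posterior("G2", "P", MEMBERSHIPS)
```

A change in G1 points to P with certainty in this toy model, whereas a change in G2 is split evenly across three processes and says much less about P.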

54 Future products

55 MetaMiner Consortiums for 2007
Oncology (breast cancer, 4 other cancers); metabolic diseases (type II diabetes, obesity, metabolic syndrome); CNS and neurodegenerative diseases; immunological and autoimmune diseases.

56 MetaMiner consortiums: Analytical platform for disease areas
Cancer-relevant annotations, databases; active compounds analysis; screening; cancer consortium labs (HTS, HCS); MetaMiner (Oncology) platform. Compounds scoring: indications, toxicities, off-site effects. Drug targets: divergence hubs on networks; “druggability” testing; pathways connectivity. Biomarkers: combination of different types (expression, secreted proteins, metabolites); convergence hubs (core effectors). Data parsing, normalization; experimental data depository; maps for disease, processes, drug action; custom maps for projects; data analysis.

57 MetaTox consortium. Functional descriptors
Mapping on descriptors; enrichment by category; pathway maps; toxicity and process maps; sub-networks, modules, nodes; predictive models; indexing and scoring by toxicity category.
