Presentation on theme: "Building and using protein interaction networks: industry perspective"— Presentation transcript:
1 Building and using protein interaction networks: industry perspective
Systems Biology for Drug Discovery
Andrej Bugrim, GeneGo, Inc.
2 Topics
Annotation process and collecting network content for industrial-type applications
Biological and disease ontologies: how to improve and use them in functional analysis
Tools: utilizing network data in pharmaceutical R&D
3 Multi-level understanding of human biology
Diagram labels: causative relations; mechanistic; level of phenotype; level of cell process/network; level of protein
4 Disease-centered knowledge base in MetaMiner (Oncology example)
GG annotation team:
  Disease group: causative disease associations at the DNA, RNA and protein levels
  Network group: protein-protein, protein-DNA and protein-RNA interactions
  Specialty group: biomarkers
  Chemistry group: ligand-receptor interactions (drugs, leads, hits)
Diagram labels: compare; BC-perturbed cell processes; causative BC models; general BC schema; other cancers chosen by Consortium
6 Three interaction domains in MetaCore
Ligands (metabolites, peptides, xenobiotics): 1,600 drugs with targets; 4,100 endogenous metabolites; >21,000 ligand-receptor interactions
Membrane receptors: 850 GPCRs and other membrane receptors; 110 nuclear hormone receptors
Signal transduction (G proteins, secondary messengers, kinases, phosphatases): 172K manually curated physical signaling interactions; 538 canonical maps; 42, step canonical signal transduction pathways
Transcription factors: 924 human transcription factors; 6,000 target genes
Core effect, metabolic pathways: 11,300 metabolic reactions; 116 fine metabolic maps
Metabolites: 4,100 endogenous metabolites
Only MC has all 3 levels needed for functional reconstruction: ligand-receptor (primary signaling) level; signal transduction level; effector level (core metabolism).
7 MetaBase Content Overview
Database: chemical compounds 580,000; drugs 8,590; chemical reactions 35,600; metabolic networks 251
Network: proteins + genes 13,402; transcription factors 924; chemical compounds 26,000; drugs 2,740; endogenous compounds 4,100; proteins linked to drugs 2,711; reactions 5,330; small molecule ligands for human receptors 3,510; blockers for ion channels 629; PubMed journals 3,100; PubMed articles 81,400; total amount of interactions 177,000
Content: GeneGo regulatory networks 120; GeneGo disease networks
Maps: regulatory maps 325; metabolic maps 116; traditional metabolic maps (EC) 97
Diseases: 4,920
Note: will work on it, provide histograms
8 MetaBase content by type
Database: genes, total 137,500 (human: 38,700); chemical compounds 580,000; human proteins 14,570; metabolic reactions 35,600
9 Network interactions
Manually curated interactions (172,787):
  Signalling interactions: 137,297 (79%)
  Metabolic reactions: 35,490 (21%)
By type:
  Protein-protein: 87,675 (51%)
  Small molecule-protein: 42,383 (26%)
  Y2H "Interactome": 2,370 (1%)
  Logical relations: 1,934 (1%)
  With microRNA: 1,620 (1%)
  ChIP-chip: 980 (1%)
  With virus proteins: 335 (0%)
Only collapsed interactions are counted; uncollapsed interactions number over 650,000.
All interactions are taken from articles indexed in PubMed: 3,100 journals, 81,400 articles.
10 Types of interactions in the network
Effects: activation; inhibition; unspecified
Interaction types: direct interaction; indirect interaction
Mechanisms: phosphorylation; influence on expression; dephosphorylation; unspecified; other type of covalent modification; binding; transport; cleavage; transcription regulation; transformation; catalysis; competition
1. We split all protein-protein and protein-compound (chemical) interactions into two main types: direct and indirect (functional).
2. Within both types of interactions, we separate distinct mechanisms, which are defined by the experimental evidence for the interaction. There are 10 mechanisms for direct interactions and two mechanisms for indirect interactions.
3. All interactions, direct and indirect, are split into directed and undirected ones and are marked with “effects”.
3.1 Directed interactions are characterized by causative relationships between network objects (proteins, genes, compounds, complexes, etc.).
3.2 Undirected interactions are marked with effect-mechanism
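The annotation scheme on this slide can be pictured as a small record type. The sketch below is only an illustration: the class and field names are assumptions, not GeneGo's actual database schema, and because the slide does not say which two of the twelve mechanism labels are the indirect ones, no such split is encoded.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"
    UNSPECIFIED = "unspecified"


# Mechanism labels copied from the slide (12 in total).
MECHANISMS = frozenset({
    "phosphorylation", "influence on expression", "dephosphorylation",
    "unspecified", "other type of covalent modification", "binding",
    "transport", "cleavage", "transcription regulation",
    "transformation", "catalysis", "competition",
})


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one curated network interaction."""
    source: str                 # network object: protein, gene, compound, complex
    target: str
    direct: bool                # True = direct, False = indirect (functional)
    mechanism: str              # one of MECHANISMS
    effect: Effect = Effect.UNSPECIFIED
    directed: bool = True       # directed edges carry a causative relationship

    def __post_init__(self):
        # Reject labels that are not on the slide's mechanism list.
        if self.mechanism not in MECHANISMS:
            raise ValueError(f"unknown mechanism: {self.mechanism!r}")


# Placeholder object names, for illustration only.
edge = Interaction("kinase_x", "substrate_y", direct=True,
                   mechanism="phosphorylation", effect=Effect.ACTIVATION)
```

Keeping effect, mechanism and direct/indirect as separate fields mirrors the slide's point that they are independent axes of annotation.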
15 Endogenous compounds (4,100 total)
3,070 endogenous compounds are involved in metabolic reactions; 6,819 reactions involve endogenous compounds only
751 endogenous ligands for 498 receptors, with 2,455 interactions
4,000 (98%) of endogenous compounds are in the network
15,962 network interactions with endogenous metabolites
3,600 compounds with structures and brutto-formulas (molecular formulas); the other 700 are “generic”: they contain acyl-, alkyl- and other variable groups
Note: 4,000 endogenous metabolic compounds is 4x more than Biocrates, the leader in human metabolomics, has.
16 Network and pathway statistics in GeneGo
>40,000 nodes
~177,000 edges
Average node degree: 3.77
241 million shortest paths
Average shortest path length:
42, step canonical signal transduction pathways
200 canonical metabolic pathways: major metabolic fluxes such as glycolysis or TCA
72,000 pathways on metabolic maps: pathways analogous to KEGG (KEGG has 42,500)
Diagram labels: Enzyme1; Enzyme2; reaction1; reaction2; metabolite
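Statistics such as average node degree and average shortest path length can be computed with a breadth-first search. The sketch below uses a four-node toy graph and assumes an undirected, unweighted network; the slide does not say how GeneGo counts degree on its directed interactions, so the function names and the counting convention are illustrative assumptions.

```python
from collections import deque


def average_degree(adjacency):
    """Mean number of neighbors per node in an undirected graph."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def bfs_distances(adjacency, source):
    """Number of edges on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of connected nodes."""
    total = 0
    pairs = 0
    for source in adjacency:
        for target, d in bfs_distances(adjacency, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


# Toy chain A - B - C - D (not GeneGo data).
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
degree = average_degree(toy)
path_length = average_shortest_path_length(toy)
```

Running one BFS per node costs on the order of nodes times edges, which is why all-pairs statistics on a network of this size are usually precomputed.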
19 Knowledge base (ontologies)
By year: 2007; 2006; 2005; 2004; 2003
By genre: Drama; Action; Romance; Horror; Foreign
By director: Lynch; Tarantino; Leone; Stone; Antonioni
By actor: Pitt; Nicholson; Depp; Redford; Damon
How do you compare “action” movies vs. Tarantino movies vs. movies? These are incomparable, as they belong to different categories.
Mixed ontologies: molecular pathway; cellular process; disease; metabolic process
20 Multiple ontologies in MetaDiscovery Platform: multi-dimensional knowledge base on human biology
21 GeneGo metabolic maps vs. KEGG human maps
Human genes in GG and KEGG
22 Enrichment in GO and GeneGo processes
GeneGo process networks: resolution is interactions between proteins; connections between all proteins in a folder; clear signaling path and effect within the process
GO processes: resolution is a list of proteins; no connections between proteins; no signaling/effect within the process
Data: 4 samples from 4 patients; disease/norm from the same patients; Affy U133A arrays
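Enrichment of a gene list in a process is commonly scored with a hypergeometric tail probability. The slide does not state which test MetaCore applies, so the sketch below is an assumption: it shows the standard calculation with illustrative parameter names.

```python
from math import comb


def enrichment_p_value(universe, category, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, of which `category` belong to the process."""
    upper = min(category, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(category, k) * comb(universe - category, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes in total, 4 in the process, 3 selected, 2 of them in the process.
p = enrichment_p_value(universe=10, category=4, selected=3, overlap=2)
```

A small p-value means the overlap between the list and the process is larger than expected from random selection.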
23 Inflammation
Genes from GO process “Inflammatory response”: 231 (152 in networks, 79 not in networks)
Genes from GO processes “Inflammatory response” and “Immune response”: 613 (345 in networks, 268 not in networks)
Genes from GO process “Immune response”: 446 (247 in networks, 199 not in networks)
Genes in 15 process networks: 1642
Genes added to networks: 1297
Legend:
  Blue blocks: the original statistics
  Green blocks: GO-process objects that have been linked to the created networks
  Red blocks: GO-process objects that have not yet been linked to the created networks
  Yellow blocks: objects added, according to the literature, because they fit the processes represented in the networks
24 Diseases
4,881 diseases, based on MeSH
38,709 human genes in total
Human genes linked to diseases: 6,318
Diseases linked to genes: 1,630
Human genes not linked to diseases: 32,391
Diseases with no gene links: 3,251
6,318 genes are linked to 1,630 diseases
21,264 unique articles, indexed in PubMed
26 Drug toxicity tree
38 drug-induced pathological processes
Legend: folders from MeSH; folders created at GeneGo based on reviews
We created a hierarchical structure of pathologies, a toxicity tree. This work was done mainly by analyzing articles. Initially, the MeSH disease tree contained the folder Drug toxicity and the subfolder Drug Eruptions with its subfolders (with the exception of Sweet's Syndrome). There is no single review that lists all these types of toxicity; they appear in various articles and reviews. To date we have collected 38 types of organ-specific pathologies caused by drugs.
27 Gene-disease connections in the public domain and GeneGo
Comparison of public databases and GeneGo: links between genes and diseases
OMIM: only genetic info (mutations, SNPs); no expression; no protein activity, loc
GENE: only citations with disease names; low trust
MeSH: only a hierarchical disease tree
The public domain does not have structured information about disease connectivity (by clinical classification) and causative relations with genes and proteins
GeneGo: hierarchical structure, disease classification; ,888 diseases; genes associated with diseases ,429; cited articles , 792
28 Content. Cancer maps and networks. Breast Cancer: general scheme
Each of the following slides examines in detail the main processes and signaling cascades involved in BC development. BC markers described in the literature (red blobs) are mapped onto all of the following networks, so that the involvement of each process's participants in BC can be traced in detail.
29 Angiogenesis in tumor growth
Successful tumor growth critically depends on timely vascularization: in a fast-growing tumor the central regions are short of oxygen, and tissue necrosis can set in. Under these conditions the main factor activated is HIF-A, which is responsible for the response to hypoxia and activates angiogenesis via VEGF. If the tumor grows faster than its vessels, necrosis and inflammation are observed, mediated by IL-1, IL-6 and IL-18, which also leads, via NF-kB, to the synthesis of VEGFs.
A key angiogenesis activator is VEGF-A, which can act alone or via its basic receptors VEGFR-1 and VEGFR-2, with neuropilins as coreceptors. Other members of the VEGF family are also involved in the process. One of the transcriptional activators of VEGF-A is HIF-1, which is responsible for the induction of angiogenesis under hypoxia. Moreover, some other growth factors (FGFs, EGF, PDGF, PlGF, TGF-beta) and MMPs are involved in the regulation of angiogenesis. Angiopoietins 1 and 2 act at later stages of angiogenesis via TIE receptors. Integrins are required for the intercellular adhesion that occurs during angiogenesis.
30 Fine metabolic differences between rodents and human
Diagram labels: unique genes; human; mouse, rat; there are no human orthologs for Protein A
Unique genes and orthologs catalyze one reaction: 141 mouse genes; 74 rat genes
Unique genes catalyze unique reactions: 9 mouse genes; 2 rat genes
Orthologs catalyze different reactions: 1 mouse gene; 1 rat gene
32 Data analysis workflow in MetaDiscovery suite
Tools: PathwayEditor; MetaLink; MapEditor
Custom interactions data: Y2H; pull-down; co-expression; annotation
Custom maps, networks, pathways
Inputs: ISIS DB; molecular bio data; structures (sdf, MOL); metabolites; HTS, HCS
MetaCore/MetaDrug platform
Signature networks: diseases; drug response
P-value scoring
Ontologies: GO processes; GeneGo processes; canonical pathways; metabolic networks; toxicities
Cross-experiment comparison: time series; multi-patient cohorts; multiple logical operations; complete report
Network alignment: multiple algorithms; sub-network queries
Med. chemistry: indications; toxicities; off-site effects
Modeling software: CellDesigner; Virtual Cell; SBML, BioPAX
Biology: biomarkers; pathway-based targets
This is the general schema for network analysis. High-throughput data can be mapped directly onto MetaCore interaction tables, and a network unique to the data file is built. Note that the network provides the highest possible resolution for mapping the data: the level of individual proteins/genes, single-step physical protein-protein interactions and one-step metabolic transformations. Networks can also be built using third-party tools and compared within MetaCore. Once generated, networks can be interpreted in the context of canonical pathways (in this case, GeneGo maps) and different process ontologies (for example, GO categories) and prioritized based on p-value and data saturation. Based on this analysis, new hypotheses can be generated for drug targets, siRNA perturbations and biomarkers.
33 MetaCore™ Platform
Composition of MetaCore (self-explanatory):
  Network building tools
  Pathway editor
  Statistics for pathways, processes, networks
  Visualization tools
  Data: microarrays, SAGE, proteomics, siRNA, metabolites, custom interactions
  Logical operations module
  Curated interactions from the literature
  Oracle-based database
34 Pathways Integration
Interactive, static maps: 550 maps; signaling, regulation, metabolism, diseases; backbone of the formalized “state of the art” in the field
Networks of protein interactions: dynamic, built “on-the-fly”; exploratory tool; build new pathways for genes of interest
35 Choose direction and checkpoints within network building page
From: histamine; through: histamine H1 receptor; to: Actin
We choose the signal direction (from histamine, through the histamine H1 receptor, to actin) and obtain the shortest paths. However, Rac1 and RhoA are not included in them.
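A "from, through, to" query can be sketched as two breadth-first searches joined at the checkpoint. The toy graph below is entirely hypothetical: `relay_1` and `relay_2` are placeholder nodes and none of the edges are curated MetaCore interactions. It only illustrates why nodes that sit on a longer route, such as Rac1 and RhoA on the slide, are absent from a shortest-path result.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest directed path from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def path_via(graph, start, checkpoint, goal):
    """Shortest start -> checkpoint path followed by shortest checkpoint -> goal path."""
    first = shortest_path(graph, start, checkpoint)
    second = shortest_path(graph, checkpoint, goal)
    if first is None or second is None:
        return None
    return first + second[1:]


# Hypothetical topology: the route through RhoA or Rac1 is one step longer.
toy = {
    "histamine": ["histamine H1 receptor"],
    "histamine H1 receptor": ["relay_1", "relay_2"],
    "relay_1": ["Actin"],
    "relay_2": ["RhoA", "Rac1"],
    "RhoA": ["Actin"],
    "Rac1": ["Actin"],
}
route = path_via(toy, "histamine", "histamine H1 receptor", "Actin")
```

To bring such nodes into the result, one would add them as further checkpoints or relax the search to paths longer than the shortest one.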
36 False discovery rate filter
Threshold: 0.01; Apply
Non-significant bars become semi-transparent
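The slide does not name the procedure behind the false discovery rate filter. A common choice is the Benjamini-Hochberg step-up procedure, sketched here as an assumption with the slide's 0.01 threshold as the default.

```python
def benjamini_hochberg(p_values, threshold=0.01):
    """Return a list of booleans: True where a test passes the FDR filter."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its step-up bound.
        if p_values[idx] <= rank / m * threshold:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Made-up p-values for four bars of a histogram.
flags = benjamini_hochberg([0.001, 0.2, 0.004, 0.03])
```

Bars whose flag is False would be the ones drawn semi-transparent.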
37 DNA level data
Input data: Illumina, Affy SNP arrays; epigenomics data; ChIP/Chip, CGH; resequencing; Taq assays
Analysis (MetaDiscovery tools): correlation with clinical data; direct data analysis; correlation with molecular data
Applications: pharmacogenomics; disease biomarkers
38 New customization modules
MapEditor: custom maps synchronized with MC/MD database
  Draw pathway maps from scratch
  Transform gene lists into networks into pathway maps
  Edit MetaCore’s canonical maps
  View and score your maps within the context of canonical maps
  Map experimental data on custom maps
MetaLink: overlaying custom interactions
  Import custom interactions (Y2H, co-expression, pull-down, etc.)
  Visualize using GeneGo network building algorithms
  Score “unknown” proteins (high IP potential) based on relevance to “benchmark” networks built from MetaCore interactions
PathwayEditor: annotation technology transfer, at the database level
  Custom annotation of interactions, compounds, diseases, metabolism in the framework of the internal annotation system at GeneGo
  Use the annotation forms, workflows and QC system developed at GeneGo
  Novel objects are imported and integrated with pre-existing data in MetaCore
39 Adding Localizations
Additional Localizations can be added
40 Your NEW map is now an interactive part of MetaCore Users can visualize their experimental data on the new map
41 Mapping interaction sets on networks
Resulting Direct Interactions network
Pink interactions are from the uploaded links file
Mouse over an interaction to see the uploaded weight value
Blue interactions are in both the links file and the MetaCore database
42 Systems level analysis
A NEW paradigm!
Many, many products: structure-based modeling; QSAR models (activity); dry lab predictions; pharmacophores; QSAR models (metabolism, toxicity)
MetaDrug: novel compounds
Present a structure as a GROUP of proteins
Query KNOWLEDGE base: diseases; toxicities; pathways; processes; networks
Find BIOLOGICAL effects: indication; toxicity; off-target effects = GROUPS OF PROTEINS!
Score based on analysis: indication; toxicity
Select best structures
Knowledge base: 3,400 disease-related genes; 500+ diseases with genes; 600 known drug targets; 3,000 toxicity-related genes; 700 cellular processes; 200,000 protein interactions
Report BY and FOR end user = chemist
= 1 protein at a time
= Systems level effects
= Systems level analysis
43 Integration with MDL DiscoveryGate
Search for the compound of interest or its metabolites in MDL databases*
*(requires access to DiscoveryGate)
Copyright GeneGo
46 Old and new ways to analyze data
Current way of analysis: all significance calculations are done before mapping onto the network
  Full data tables
  Statistical procedures, thresholds on fold change and p-value, either in MC or in 3rd-party tools
  Sets of genes
  Connect them on the network one way or another: too many choices, no clear way to choose
New way of analysis: significance calculations follow the mapping onto the network
  Full data tables
  Apply to the global network
  Statistical procedures in MC based on concurrent analysis of expression profiles and connectivity
  Sets of network modules
47 Samples are analyzed in pathway’s expression space
Calculating the distance between samples in the pathway’s gene expression space. A hypothetical pathway is shown, consisting of three proteins A, B and C, which are the products of the corresponding genes. (A) Samples are represented as points in the 3-dimensional space of gene expression. Note the grouping of samples. (B) Representation of the pathway: the corresponding gene expression (fold change compared to control) is represented by arrows.
Table residue: columns Sample 1, Sample 2, Sample 3, Sample 4; row values Gene 1: 1, 4, 3, 2; Gene 2: 7, 6; Gene 3: 9, 8; Gene 4: 5
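The idea of measuring distance in a pathway's expression space can be sketched as follows. The slide does not state the metric, so Euclidean distance is an assumption, and the expression values are made up. Gene X stands for any measured gene outside the pathway; it is ignored because only the pathway's genes define the space.

```python
from math import sqrt


def pathway_distance(sample_a, sample_b, pathway_genes):
    """Euclidean distance between two samples, restricted to the pathway's genes."""
    return sqrt(sum((sample_a[g] - sample_b[g]) ** 2 for g in pathway_genes))


# Three-gene pathway A, B, C, as in the slide's hypothetical example.
pathway = ["A", "B", "C"]

# Made-up fold changes per sample.
samples = {
    "sample_1": {"A": 1.0, "B": 2.0, "C": 2.0, "X": 9.0},
    "sample_2": {"A": 1.0, "B": 2.0, "C": 3.0, "X": -4.0},
    "sample_3": {"A": 4.0, "B": 6.0, "C": 2.0, "X": 9.0},
}

near = pathway_distance(samples["sample_1"], samples["sample_2"], pathway)
far = pathway_distance(samples["sample_1"], samples["sample_3"], pathway)
```

Samples with small pairwise distances form the groups that the slide asks the reader to note.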
48 Network signatures for compound effects
Panel labels: Mestranol, Phenobarbital; Tamoxifen, Phenobarbital
Differential responses to drug treatments on the pathways. (A) Direct interaction network assembled from the genes extracted from the pathways distinguishing between mestranol and phenobarbital treatments. Relative changes in expression between treated and untreated rats are mapped (log-ratios, averaged over 5 repeats). Blue circles mark down-regulated genes; red circles mark up-regulated genes. (B) The network assembled from the genes extracted from the pathways that differ between phenobarbital and mestranol. Note that both comparisons contain significant numbers of negatively correlated genes.
49 Finding topologically significant nodes
Node B: 4 out of 6 nodes regulated by B are differentially expressed: more than a random share = topologically significant
Node C: only 1 out of 6 nodes regulated by C is differentially expressed: could be due to a random event = not topologically significant
In reality, the algorithm also considers nodes beyond first-degree neighbors
Legend: differentially expressed genes; non-differentially expressed genes
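The "more than a random share" argument can be made concrete with a binomial tail probability. This is a simplified first-neighbor sketch, not GeneGo's algorithm, which according to the slide also looks beyond first-degree neighbors. The background rate of 10% differentially expressed genes is a hypothetical number chosen for illustration.

```python
from math import comb


def binomial_tail(n_neighbors, n_hits, background_rate):
    """P(X >= n_hits) when each of n_neighbors is differentially expressed
    independently with probability background_rate."""
    return sum(
        comb(n_neighbors, k)
        * background_rate ** k
        * (1 - background_rate) ** (n_neighbors - k)
        for k in range(n_hits, n_neighbors + 1)
    )


BACKGROUND = 0.10  # hypothetical fraction of differentially expressed genes

p_node_b = binomial_tail(6, 4, BACKGROUND)  # 4 of 6 targets are hits
p_node_c = binomial_tail(6, 1, BACKGROUND)  # 1 of 6 targets is a hit
```

Under this assumed background, four hits out of six are very unlikely by chance, while one hit out of six happens close to half the time, which matches the slide's verdicts for nodes B and C.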
50 Why is JAK1 significant in this dataset?
Dataset: uterus in labor vs. non-labor (Wayne State)
Diagram labels: regulation via JAK1; feedback loops
JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1
Topological significance helps to find important links in pathways that do not come up in HT screens
51 Regulation of lipid metabolism
Topologically significant nodes revealed by the new algorithm
Differentially expressed genes identified by microarray and confirmed by proteomic screen
52 Putting it all together: network activity inference
Identifying causal relations between putative input and output signals
Tracking effects of molecular perturbation through activation/inhibition cascades
Diagram labels: predicted input; scoring intermediary nodes; experimental data; experimental data: terminate cascade; predicted target; experimental data: start cascade; inferred activity
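Tracking a perturbation through activation/inhibition cascades can be sketched as sign propagation over signed edges: activation keeps the upstream state and inhibition flips it. The code below is a minimal illustration under that assumption, with placeholder node names; it does not reproduce the scoring of intermediary nodes shown on the slide.

```python
from collections import deque


def infer_activity(edges, start, start_state):
    """Propagate a state (+1 up, -1 down) from start along signed edges.

    edges maps a node to a list of (target, sign) pairs, where sign is
    +1 for activation and -1 for inhibition. Returns the inferred states
    and the set of nodes that received contradictory predictions.
    """
    state = {start: start_state}
    conflicts = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target, sign in edges.get(node, ()):
            predicted = state[node] * sign
            if target not in state:
                state[target] = predicted
                queue.append(target)
            elif state[target] != predicted:
                conflicts.add(target)
    return state, conflicts


# Hypothetical cascade: input activates a kinase, which activates a
# repressor, which inhibits the target gene.
cascade = {
    "input": [("kinase", +1)],
    "kinase": [("repressor", +1)],
    "repressor": [("target", -1)],
}
states, conflicts = infer_activity(cascade, "input", +1)
```

Nodes reported in `conflicts` are reached by paths that disagree on the sign, which is where experimental data would be needed to decide.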
53 Work in progress
Finding patterns of significance (based on one experiment):
  Significant neighborhoods
  Significant receptors (by underlying cascade)
  Significant transcription factors (by upstream cascade)
  Significant interaction types (by distribution of expression at terminals)
Finding common and different pathway modules (based on multiple samples):
  Looking for “differential pathways”: modules that distinguish one group of samples from another
  Finding common motifs in a group of pathway modules
Inferring patterns of network activity:
  Identifying causal relations between putative input and output signals
  Tracking effects of molecular perturbation through activation/inhibition cascades
Looking into mutual gene-process information and Bayesian inference of significance:
  If gene G occurs only in process P, its up-/down-regulation is significant evidence with respect to inferring P’s status
  If gene G occurs in many other processes in addition to P, its up-/down-regulation is not significant evidence with respect to inferring P’s status
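The last point, that a gene found in only one process is stronger evidence than a gene found in many, can be illustrated with a simple specificity weight. The slide does not specify the Bayesian procedure, so the sketch below swaps in a plain information-style weight, log2(number of processes / number of processes containing the gene). All process and gene names are placeholders.

```python
from math import log2


def specificity_weights(process_to_genes):
    """Weight each gene by how few processes contain it."""
    n_processes = len(process_to_genes)
    counts = {}
    for genes in process_to_genes.values():
        for gene in set(genes):
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: log2(n_processes / c) for gene, c in counts.items()}


def process_score(process, process_to_genes, changed_genes, weights):
    """Sum the weights of the process's genes that changed expression."""
    return sum(weights[g] for g in process_to_genes[process] if g in changed_genes)


# Placeholder ontology: G_specific occurs only in P, G_common in all four.
ontology = {
    "P": ["G_specific", "G_common"],
    "Q": ["G_common"],
    "R": ["G_common"],
    "S": ["G_common"],
}
weights = specificity_weights(ontology)
score_p = process_score("P", ontology, {"G_specific", "G_common"}, weights)
score_q = process_score("Q", ontology, {"G_specific", "G_common"}, weights)
```

A gene shared by every process gets weight zero and so contributes nothing to any process's score, while the process-specific gene carries all of the evidence for P.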
55 MetaMiner Consortiums for 2007
Oncology (breast cancer, 4 other cancers)
Metabolic diseases (diabetes II, obesity, metabolic syndrome)
CNS and neurodegenerative diseases
Immunological and autoimmune diseases
56 MetaMiner consortiums: Analytical platform for disease areas
Inputs: cancer-relevant annotations, databases; active compounds analysis, screening; cancer consortium labs; HTS, HCS
MetaMiner (Oncology) platform: data parsing, normalization; experimental data depository; maps for disease, processes, drug action; custom maps for projects; data analysis
Compounds scoring: indications; toxicities; off-site effects
Drug targets: divergence hubs on networks; “druggability” testing; pathways connectivity
Biomarkers: combination of different types (expression, secreted proteins, metabolites); convergence hubs (core effectors)
57 MetaTox consortium. Functional descriptors
Mapping on descriptors
Enrichment by category
Pathways maps
Toxicity, process maps
Sub-networks, modules, nodes
Predictive models
Indexing & scoring by tox. category