Microarray Evaluation for Gene Regulation Analysis. Dr. Martin Seifert, Genomatix Software GmbH, Landsberger Strasse 6, D München. © 2005 by Genomatix Software GmbH
The general goal in microarray analysis: biological functionality is not directly evident from microarrays. Diagram: Cell -> Microarray experiment (microarrays today) -> ? -> classification / diagnostics, metabolic pathways, regulatory networks, disease mechanisms.
How to reach the general goal in microarray analysis? Methods for microarray data analysis, centered on cellular processes: statistical analysis, literature analysis, and sequence analysis (genome annotation and promoter analysis). Together they form the Genomatix knowledge transfer approach.
A real-life example: evaluation of the role of PDGF in fibroblasts. Microarray experiment: PDGF stimulation of fibroblasts (Demoulin et al., JBC 279, No. 34, 2004; 35392–35402). Statistical analysis and clustering turn the chip data into clusters; the Genomatix evaluation of the chip clusters then asks: what is the biological functionality behind the chip data?
Technology: linking genomic sequence analysis and literature mining. Automatic evaluation of gene relationships; promoter source for functional promoter analysis; analysis of promoter sequences and database scans.
Analysis strategy (workflow of the project): 1. Find statistical clusters. 2. Project statistical clusters onto biology and categorize the results by z-scoring (BiblioSphere). 3. Analyze functional groups for co-regulation (ElDorado & GEMS) and find additional potentially co-regulated genes (ModelInspector). 4. Carry out additional statistical analysis. 5. Merge results into biological context.
Step 1: Statistical analysis. Methods for microarray data analysis: statistical analysis, literature analysis, sequence analysis, all aimed at cellular processes.
Statistically analyzed microarray data: cluster analysis with Significance Analysis of Microarrays (SAM; FDR: 4.3%). 105 of 9928 gene spots are significantly up-regulated (chip: Hver1.2.1). Plot axis: hours of PDGF induction.
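The SAM figures on this slide can be put into perspective with a little arithmetic. The sketch below uses only the three numbers quoted on the slide and assumes the usual reading of an FDR, namely the expected share of false positives among the called spots; the variable names are illustrative.

```python
# Figures taken from the slide: SAM result on chip Hver1.2.1.
n_spots = 9928        # gene spots on the chip
n_significant = 105   # spots called significantly up-regulated
fdr = 0.043           # false discovery rate reported by SAM (4.3%)

# Share of all spots that passed the significance call.
fraction_called = n_significant / n_spots

# Under the assumed FDR interpretation: expected false calls among the 105.
expected_false_positives = fdr * n_significant

print(f"fraction of spots called: {fraction_called:.2%}")
print(f"expected false positives among the calls: {expected_false_positives:.1f}")
```

In other words, roughly one spot in a hundred is called, and only a handful of those calls are expected to be false.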
Step 2: Project statistical clusters onto biology and categorize the results by z-scoring (BiblioSphere).
Characterization of the experimental cluster with BiblioSphere (large cluster query): the gene cluster contains 107 genes, too many for biologically meaningful co-regulation. Strategy: knowledge-driven sub-clustering to find functional correlations. Functional correlations are retrieved by categorization.
Knowledge-driven sub-clustering: ontology-based functional ranking by Genomatix z-scoring. The category with the highest z-score is highlighted.
Knowledge-driven sub-clustering: ontology-based functional ranking by Genomatix z-scoring. Retrieval of the genes overrepresented in the GO category sterol biosynthesis.
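The slides do not state how the Genomatix z-score is computed. As a rough illustration of what an overrepresentation z-score expresses, the sketch below uses the common hypergeometric form: observed count minus expected count, divided by the standard deviation of the count under random sampling. This is an assumption, not the BiblioSphere formula. The cluster size (107) and the six sterol biosynthesis genes come from the slides; the background size and category size are hypothetical.

```python
from math import sqrt

def overrepresentation_z(n_background, n_category, n_cluster, n_observed):
    """Hypergeometric z-score for a category within a gene cluster.

    Assumed form, not taken from the slides:
    z = (observed - expected) / standard deviation under random sampling.
    """
    p = n_category / n_background
    expected = n_cluster * p
    # Variance of a hypergeometric count (sampling without replacement).
    variance = (n_cluster * p * (1 - p)
                * (n_background - n_cluster) / (n_background - 1))
    return (n_observed - expected) / sqrt(variance)

# 107 cluster genes and 6 category members are from the slides;
# 10000 background genes and 20 annotated genes are hypothetical.
z = overrepresentation_z(n_background=10000, n_category=20,
                         n_cluster=107, n_observed=6)
print(f"z-score: {z:.1f}")
```

A category whose observed count equals its expected count scores zero, so ranking by z-score puts the most surprising categories first.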
BiblioSphere subgroup analysis (connecting TFs): re-enter the six overrepresented genes into BiblioSphere for gene group analysis.
Towards regulatory networks: connecting TFs (knowledge-driven sub-clustering). BiblioSphere at sentence level, with at least 4 co-citations with input genes, shows co-citation of HMGCS1, HMGCR, SC4MOL, and DHCR7 with SREBF1. ElDorado predicts SREBF1 (EBOX) binding sites in the promoters of HMGCS1, HMGCR, and DHCR7.
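The co-citation filter can be pictured as a simple counting step. The sketch below assumes that "at least 4 co-citations with input genes" means a transcription factor must be co-cited, at sentence level, with at least four distinct input genes; the slide does not spell this out. The function name and the record format are hypothetical, and `TF_X` is a made-up placeholder. Only the SREBF1 pairs are taken from the slide.

```python
def connecting_factors(sentence_cocitations, input_genes, min_genes=4):
    """Return factors co-cited with at least `min_genes` distinct input genes.

    sentence_cocitations: iterable of (factor, gene) pairs, one pair per
    sentence in which both names occur (hypothetical record format).
    """
    wanted = set(input_genes)
    partners = {}
    for factor, gene in sentence_cocitations:
        if gene in wanted:
            partners.setdefault(factor, set()).add(gene)
    return {factor: sorted(genes)
            for factor, genes in partners.items()
            if len(genes) >= min_genes}

input_genes = ["DHCR24", "DHCR7", "EBP", "HMGCR", "HMGCS1", "SC4MOL"]
cocitations = [
    ("SREBF1", "HMGCS1"), ("SREBF1", "HMGCR"),
    ("SREBF1", "SC4MOL"), ("SREBF1", "DHCR7"),
    ("TF_X", "EBP"),  # hypothetical factor with a single co-citation
]
connecting = connecting_factors(cocitations, input_genes)
print(connecting)
```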
Experimental verification: SREBP1 (= SREBF1) expression is experimentally confirmed.
Step 3: Analyze functional groups for co-regulation (Gene2Promoter & GEMS) and find additional potentially co-regulated genes (ModelInspector).
Sequence analysis: results from literature analysis are used to guide sequence analysis. Promoter analysis is done by GEMS, based on ElDorado data (ElDorado + Gene2Promoter).
Sequence analysis strategies, inter-genomic and intra-genomic: comparative genomics of promoters (human, mouse, rat) -> phylogenetic conservation; comparative analysis of promoters within one species -> co-regulation. Of the 107 genes, 6 sterol synthesis genes are analyzed: DHCR24, DHCR7, EBP, HMGCR, HMGCS1, SC4MOL.
Intra-genomic approach: comparative promoter analysis (intra-genomic co-regulation). Extraction of the promoters of DHCR24, DHCR7, EBP, HMGCR, HMGCS1, and SC4MOL with ElDorado + Gene2Promoter, followed by analysis of these promoters with FrameWorker (GEMS). Frameworks underlie the functional conservation of promoters.
Regulatory genome annotation, promoter resource ElDorado / Gene2Promoter: ElDorado covers regulatory regions, promoters, promoter modules, alternative promoters/transcripts, and regulatory SNPs. Interconnected to: BiblioSphere, GEMS.
Regulatory genome annotation: promoter retrieval with ElDorado / Gene2Promoter.
Analysis of promoter organization: promoter analysis with FrameWorker.
Analysis of promoter organization: frameworks are conserved in order and distance of TFBSs. Framework: EBOX-ECAT-ZBPF. Genes sharing the framework: DHCR7, EBP, HMGCS1. EBOX (SREBF1) frameworks are found in a subset of the genes.
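To make "conserved in order and distance" concrete, the following sketch checks whether a promoter carries a given series of binding-site families in the right order and within allowed distance ranges. It is a simplified illustration, not the FrameWorker or ModelInspector algorithm: strand orientation and match scores are ignored, the function name is hypothetical, and all positions and distance ranges are invented. The `tolerance` parameter shows the effect of loosening the distance constraints.

```python
def find_framework(sites, families, distances, tolerance=0):
    """Return the positions of the first ordered match, or None.

    sites:     list of (family, position) predicted binding sites
    families:  ordered family names, e.g. ["EBOX", "ECAT", "ZBPF"]
    distances: list of (min_bp, max_bp) between consecutive elements
    tolerance: extra bp allowed on both ends of every distance range
    """
    sites = sorted(sites, key=lambda site: site[1])

    def extend(chain, index):
        if index == len(families):
            return chain
        low, high = distances[index - 1]
        for family, position in sites:
            gap = position - chain[-1]
            in_range = low - tolerance <= gap <= high + tolerance
            if family == families[index] and gap > 0 and in_range:
                found = extend(chain + [position], index + 1)
                if found is not None:
                    return found
        return None

    for family, position in sites:
        if family == families[0]:
            found = extend([position], 1)
            if found is not None:
                return found
    return None

# Hypothetical promoter: positions and ranges are invented for illustration.
promoter_sites = [("EBOX", 100), ("ECAT", 130), ("ZBPF", 190)]
framework = ["EBOX", "ECAT", "ZBPF"]
ranges = [(20, 40), (40, 50)]

strict = find_framework(promoter_sites, framework, ranges)
relaxed = find_framework(promoter_sites, framework, ranges, tolerance=10)
print(strict, relaxed)
```

With the strict ranges the second gap (60 bp) falls outside the allowed 40 to 50 bp, so nothing matches; allowing 10 bp of extra variability lets the same promoter match. A tighter model is more selective, a looser one finds more promoters.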
Analysis of promoter organization: EBOX (SREBF1) frameworks (EBOX-ECAT-ZBPF) are found in a subset of the genes.
Beyond the microarray: ModelInspector search with the EBOX-ECAT-ZBPF framework against the Genomatix human promoter database (GPD).
ModelInspector results (database search): result table for framework EBOX-ECAT-ZBPF with columns: number of hits in human promoters, steroid biosynthesis, z-score. The model is highly selective: no additional genes for steroid metabolism have been found so far. The selectivity is reduced by modifying the model, that is, by increasing the distance variability (application of FastM).
Model modification: modification of the model with FastM; the distance variability is increased to bp
Beyond the microarray: additional ModelInspector search with the EBOX-ECAT-ZBPF framework, now with modified distance variability, against the Genomatix human promoter database (GPD).
ModelInspector results (database search): result table for framework EBOX-ECAT-ZBPF with columns: number of hits in human promoters, four categories related to “steroid metabolism”, z-score. Additional genes related to steroid metabolism were found: LSS, MVK, SC5DL, SREBF2. LSS and MVK are present on the chip and up-regulated, but not statistically significant; SC5DL is not present on the microarray. This opens the possibility to re-evaluate the statistical results.
Additional framework analysis (re-analysis of promoter organization): all sterol-metabolism-related genes identified by microarray analysis and ModelInspector are included: HMGCS1, MVK, SC5DL, DHCR7, EBP, SREBF2, LSS, HMGCR, SC4MOL, DHCR24. An additional framework consisting of three TFBSs is found: ECAT-EGRF-ZBPF. It matches 8 of the 10 input genes: HMGCS1, DHCR7, HMGCR, EBP, LSS, MVK, SC5DL, SREBF2.
Beyond the microarray: is the framework also part of other human promoters? The second framework (ECAT-EGRF-ZBPF) is searched in human promoters (Genomatix human promoter database, GPD) by ModelInspector. Several frameworks may be important for sterol-related pathways/networks. Matches may overlap with the first framework but are basically distinct.
ModelInspector results (second database search): result table for framework EBOX-ECAT-ZBPF with columns: number of hits in human promoters, four categories related to “steroid metabolism”, z-score. Genes found: CYP46A1, FDPS, HMGCR, HSD17B8, OPRS1, SREBF1 (!), STARD5. SREBF1/2 are potential regulators of the previous framework! SREBF1/2 may be mediators between the two frameworks identified so far.
Step 4: Carry out additional statistical analysis.
Relaxed statistical approach: the expression cluster is extended by Pavlidis Template Matching (PTM). The cluster of 105 significantly regulated genes is taken as template (initial profile), so genes are clustered by the profile of the initially selected 105 genes. The threshold p-value is 0.1. The cluster is extended to 798 genes, including all 105 initial genes (profile cluster). Relaxed statistics require cross-validation by second evidence.
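The idea behind template matching is to score every gene by how well its expression profile correlates with a template profile and to keep the genes whose correlation is significant at the chosen threshold (0.1 on this slide). The sketch below is a simplified stand-in for PTM, not the implementation used in the project: instead of the parametric test on the Pearson correlation that PTM tools typically apply, it computes an exact one-sided permutation p-value, which is feasible for a handful of time points. The template, gene names, and expression values are hypothetical.

```python
from itertools import permutations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p(template, profile):
    """Exact one-sided p-value: share of time-point permutations of the
    profile that correlate with the template at least as well as observed."""
    observed = pearson(template, profile)
    correlations = [pearson(template, p) for p in permutations(profile)]
    return sum(r >= observed - 1e-12 for r in correlations) / len(correlations)

def extend_cluster(template, profiles, threshold=0.1):
    """Names of genes whose profile matches the template at the threshold."""
    return sorted(gene for gene, profile in profiles.items()
                  if permutation_p(template, profile) <= threshold)

# Hypothetical template (e.g. mean profile of the initial cluster) and
# hypothetical gene profiles over five time points.
template = [0.0, 1.0, 2.0, 3.0, 4.0]
profiles = {
    "GENE_A": [0.1, 1.2, 1.9, 3.2, 4.1],  # follows the template
    "GENE_B": [2.0, 0.5, 3.0, 1.0, 2.2],  # no clear trend
}
extended = extend_cluster(template, profiles)
print(extended)
```

A threshold as loose as 0.1 admits many profiles by chance, which is why the slide asks for cross-validation by a second line of evidence.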
Step 5: Merge results into biological context.
Merging profile and database searches: comparison of the ModelInspector results with the profile cluster. 52 genes share a common framework and are co-expressed. 8 genes belong to the GO category "steroid biosynthesis": DHCR24, DHCR7, EBP, HMGCR, HMGCS1, LSS, MVK, SC4MOL. These eight genes associated with steroid metabolism are supported by three lines of evidence: 1. common up-regulation; 2. common framework; 3. common functional class (GO annotation).
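Merging the three lines of evidence amounts to intersecting three gene sets. The sketch below is illustrative only: the eight gene names are the ones listed on the slide, SC5DL is included as a framework hit that is absent from the microarray (as noted earlier in the talk), and `GENE_X`, `GENE_Y`, and `GENE_Z` are hypothetical placeholders standing in for the remaining members of each set.

```python
# Genes named on the slide as supported by all three lines of evidence.
steroid_genes = {"DHCR24", "DHCR7", "EBP", "HMGCR",
                 "HMGCS1", "LSS", "MVK", "SC4MOL"}

# Illustrative subsets; placeholder names are hypothetical.
framework_hits = steroid_genes | {"SC5DL", "GENE_X"}   # ModelInspector matches
profile_cluster = steroid_genes | {"GENE_X", "GENE_Y"}  # co-expressed (PTM)
go_steroid_biosynthesis = steroid_genes | {"GENE_Z"}    # GO annotation

# Evidence 1 and 2: common up-regulation and common framework.
coexpressed_with_framework = framework_hits & profile_cluster

# Evidence 3: common functional class.
supported = coexpressed_with_framework & go_steroid_biosynthesis

print(sorted(supported))
```

SC5DL drops out at the first intersection because it cannot appear in the expression cluster, which is exactly the kind of gene the database search adds beyond the chip.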
Sterol biosynthesis and regulatory networks: frameworks ECAT-EGRF-ZBPF and EBOX-ECAT-ZBPF.
Confirmation of results by GNF tissue profiles. Example: profile of HMGCS1; find correlates with cut-off 0.6.
Sterol biosynthesis and regulatory networks: frameworks ECAT-EGRF-ZBPF and EBOX-ECAT-ZBPF, together with the GNF profile.
Additional gene group: tubulins. Framework: CDEF-EGRF-MAZF.
Sterol biosynthesis / cell structure proteins and regulatory networks: frameworks ECAT-EGRF-ZBPF and EBOX-ECAT-ZBPF.
Conclusions (evaluation of microarray data, PDGF): Genomatix technology elucidates the biology behind the chip data! No individual method can reveal networks and pathway mechanisms; an alternating combinatorial approach can achieve this. Several independent functional groups may be derived from one chip. However, the final focus is usually on a few genes (usually 30 or fewer). All of this is possible based on available tools.
Let’s have a break…