Characterizing Gene Functional Expression Profiles Zoran Obradovic Slobodan Vucetic Hongbo Xie, Hao Sun, Pooja Hedge Information Science and Technology.


1 Characterizing Gene Functional Expression Profiles Zoran Obradovic Slobodan Vucetic Hongbo Xie, Hao Sun, Pooja Hedge Information Science and Technology Center, Temple University

2 Outline 1. Microarray Data Analysis Process 2. Functional Expression Profile Analysis: Functional Expression Profile Ranking; Functional Expression Profile Clustering 3. Functional Characterization of Plasmodium falciparum, Saccharomyces cerevisiae, Mus musculus and Homo sapiens

3 What is a DNA Microarray? DNA microarray technology measures the expression of tens of thousands of genes at a time. (Source: Analysis of Replicated Experiments, Gordon Smyth, Walter and Eliza Hall Institute)

4 Scanning/Signal Detection (figure: Cy3 channel and Cy5 channel; spots show equal expression, higher expression in Cy3, or higher expression in Cy5)

5 Microarray Data Analysis Process 1. Designing gene expression experiments 2. Image processing and analysis 3. Preprocessing raw intensity data 4. Discovering differentially expressed genes 5. Advanced analysis: finding relevant pathways; discovering gene expression patterns; understanding gene functions. More information: www.ist.temple.edu/research/biocore.html

6 Designing Gene Expression Experiments (figure: comparing experiment designs: saturated design, reference design, loop design) Source: http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp

7 Image Processing and Analysis (figure is obtained using Imagene software)

8 Preprocessing Raw Intensity Data: normalization (Source: Analysis of Replicated Experiments, Gordon Smyth, Walter and Eliza Hall Institute)

9 Discovering Differentially Expressed Genes. Fold change (log ratio). Statistical methods: 1) t-test 2) ANOVA 3) Non-parametric analysis (Wilcoxon rank-sum test)
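
As a rough illustration of the first two criteria on this slide, the sketch below computes a log2 fold change and a Welch t statistic for a single gene. The replicate intensities and the function names are hypothetical, and the slide does not say which t-test variant was used, so Welch's form is an assumption.

```python
import math
from statistics import mean, variance

def log2_fold_change(treated, control):
    """Log2 ratio of mean intensities (positive means higher in treated)."""
    return math.log2(mean(treated) / mean(control))

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical replicate intensities for one gene (not from the presentation).
control = [100.0, 110.0, 90.0, 100.0]
treated = [400.0, 420.0, 380.0, 400.0]

lfc = log2_fold_change(treated, control)
t_stat = welch_t(treated, control)
print(f"log2 fold change = {lfc:.2f}, Welch t = {t_stat:.2f}")
```

A gene would typically be flagged only if both the fold change and the test statistic pass chosen thresholds; the thresholds themselves are not given in the slides.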

10 Advanced Analysis: Finding Relevant Pathways (figure is obtained using Ingenuity software)

11 Advanced Analysis: Discovering Gene Expression Patterns. Plasmodium falciparum intraerythrocytic developmental cycle: genes are sorted by the time of their expression peak (Bozdech Z et al., PLoS Biol. 2003 Oct;1(1))
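
Ordering genes by the time of their expression peak can be sketched as follows. The gene names and values are made up, and taking the raw maximum of each profile is a simplifying assumption; the cited study may have estimated peak timing differently.

```python
def peak_time(profile):
    """Index of the time point with the highest expression value."""
    return max(range(len(profile)), key=lambda t: profile[t])

# Hypothetical expression profiles over four time points.
profiles = {
    "gene_a": [0.1, 0.3, 1.2, 0.4],
    "gene_b": [1.5, 0.2, 0.1, 0.0],
    "gene_c": [0.0, 0.9, 0.3, 0.2],
}

# Genes ordered from earliest to latest peak.
ordered = sorted(profiles, key=lambda gene: peak_time(profiles[gene]))
print(ordered)
```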

12 Advanced Analysis: Identifying Unknown Gene Functions Based on Expression Profiles. Standard practice: Gene 1 has an expression profile with function A and Gene 2 has an expression profile with function B; an unknown sequence tag whose profile is highly correlated with the Gene 1 expression profile is assigned function A. Is this assignment reliable? Basic assumption: expression profiles of functionally related genes are correlated. Objectives: confirm a specific biological hypothesis; predict functional properties of less characterized genes; or uncover new or unexpected biological knowledge. Methodology: cluster genes based on the similarity of their expression profiles, then perform functional analysis of the obtained clusters.

13 Problems with Old Approaches. Genes with the same function do not necessarily have the same expression profile. Clustering on the expression profiles of all genes could be unreliable.

14 Our Approach: Analyzing Microarray Functional Expression Profiles (FEPs). Advanced Analysis: Identifying Unknown Gene Functions Based on Expression Profiles. FEP: computed as the average profile of all genes associated with a given highly correlated GO term. Examples: GO:0016311: dephosphorylation; GO:0004721: phosphoprotein phosphatase activity
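
The FEP definition on this slide (average profile of all genes annotated with a GO term) can be sketched as below. The function name, the annotation mapping, and the expression values are hypothetical; only the GO identifier comes from the slide.

```python
def functional_expression_profile(go_term, annotations, expression):
    """Average, time point by time point, the profiles of all genes annotated with go_term."""
    genes = [g for g in annotations.get(go_term, []) if g in expression]
    if not genes:
        raise ValueError(f"no expression data for {go_term}")
    n_points = len(expression[genes[0]])
    return [sum(expression[g][t] for g in genes) / len(genes) for t in range(n_points)]

# Hypothetical annotation table and expression matrix.
annotations = {"GO:0016311": ["g1", "g2", "g3"]}
expression = {
    "g1": [1.0, 2.0, 3.0],
    "g2": [3.0, 2.0, 1.0],
    "g3": [2.0, 5.0, 2.0],
}

fep = functional_expression_profile("GO:0016311", annotations, expression)
print(fep)
```

In this toy input g1 and g2 have opposite trends, so the averaged profile hides both of them, which is why the slide restricts FEPs to GO terms whose genes are highly correlated.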

15 Questions that we address: How to perform functional analysis in an objective manner? How to estimate the biological significance of discoveries?

16 Tools and Applications. Developed tools to: (1) explore which functions have conserved expression profiles (Tool 1: functional expression profile ranking package); (2) explore which functions have similar expression profiles and test their functional similarity (Tool 2: functional expression profile clustering package). Applications: functional characterization of gene expression in the intraerythrocytic developmental cycle of Plasmodium falciparum and in Saccharomyces cerevisiae, Mus musculus and Homo sapiens

17 Tools Architecture (diagram components: microarray raw data; data pre-processing; gene function annotation database; functional expression profile ranking; list of significantly correlated GO terms; functional expression profile clustering; gene function semantic distance mapping space; clusters of functional expression profiles; report)

18 Tool 1: Functional Expression Profile (FEP) Ranking Package. Objective: identify genes with the same function that have correlated expression profiles. Task: evaluate gene expression correlation within each FEP. Methodology: Step 1: calculate the average pairwise correlation coefficient S among the n gene expression profiles for a given function term. Step 2: randomly select n genes from the whole dataset and compute their average pairwise correlation coefficient S'. Step 3: repeat Step 2 m times (m > 10,000) and compare the distribution of S' to the original S to obtain a p-value.
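
The three steps above can be sketched as a small permutation test. This is an illustration under stated assumptions, not the authors' package: Pearson correlation is assumed as the correlation coefficient, the p-value uses a (hits + 1) / (m + 1) estimate, the data are synthetic, and m is kept at 1,000 here only to keep the example fast (the slide uses m > 10,000).

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def avg_pairwise_corr(profiles):
    """Step 1: average correlation over all pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def fep_p_value(term_genes, expression, m=1000, seed=0):
    """Steps 2 and 3: compare S with S' from m random gene sets of the same size."""
    rng = random.Random(seed)
    s = avg_pairwise_corr([expression[g] for g in term_genes])
    all_genes = list(expression)
    hits = 0
    for _ in range(m):
        sample = rng.sample(all_genes, len(term_genes))
        if avg_pairwise_corr([expression[g] for g in sample]) >= s:
            hits += 1
    return s, (hits + 1) / (m + 1)

# Synthetic data: 40 unrelated background genes plus 4 genes sharing one pattern.
data_rng = random.Random(1)
timepoints = range(12)
expression = {f"bg{i}": [data_rng.gauss(0, 1) for _ in timepoints] for i in range(40)}
for i in range(4):
    expression[f"fn{i}"] = [math.sin(t / 2) + data_rng.gauss(0, 0.1) for t in timepoints]

s, p = fep_p_value([f"fn{i}" for i in range(4)], expression)
print(f"S = {s:.3f}, p = {p:.4f}")
```

Sampling from the whole dataset keeps the null distribution specific to the experiment, so a GO term is judged against the background correlation of that particular array series.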

19 Dataset 1: Plasmodium falciparum Intraerythrocytic Developmental Cycle (Bozdech Z et al., (2003) PLoS Biol. Oct; 1(1)). Objective: identification of P. falciparum genes whose RNA levels vary periodically within the asexual intraerythrocytic developmental cycle (IDC) transcriptome. Materials: 5080 ORFs, 3532 unique genes, 46 assays (sampled in time) using cDNAs. Methods: permutation test with Fast Fourier Transform algorithm and correlations. Found: 60% of genes are transcriptionally active, and most genes are active only once during the IDC. Figure: major morphological stages during the IDC and transcriptional profiles of 2712 genes

20 Dataset 2: Saccharomyces cerevisiae Cell Cycle (Spellman et al., (1998) Molecular Biology of the Cell 9, 3273-3297). Objective: identification of yeast genes whose RNA levels vary periodically within the cell cycle. Materials: 6178 ORFs, 4450 unique genes, 77 assays (sampled in time) using cDNAs. Methods: periodicity and correlation algorithm. Found: 800 genes that meet an objective minimum criterion for cell cycle regulation. Figure: the M/G1 clusters

21 Dataset 3: Homo sapiens Cell Cycle (R.Cho, et al (2001) Nature, 27). Objective: identification of human genes whose RNA levels vary periodically within the cell cycle. Materials: 6800 ORFs, 5795 unique genes, 14 assays (sampled in time) using Affymetrix arrays. Methods: fold change. Found: 700 genes that display transcriptional fluctuation with a periodicity consistent with that of the cell cycle. Figure: clustering analysis of cell-cycle-regulated transcripts

22 Dataset 4: Mus musculus Cell Cycle (Ishida, S et al (2001) Mol. Cell. Biol. 21, 4684-4699). Objective: analysis of gene regulation during the mammalian cell cycle. Materials: 6347 unique genes, 14 assays. Methods: clustering. Found: 7 distinct clusters of genes that exhibit unique patterns of expression. Figure: patterns of gene expression following growth stimulation and during the mammalian cell cycle

23 Applying FEP Ranking Package: Cumulative Distributions of GO Term p-Values of Human, Yeast, Mouse and P. falciparum

24 Applying FEP Ranking Package: GO Terms with the Most Conserved FEP Among Multiple Organisms

25 Applying FEP Ranking Package: Selection of GO Terms with Significantly Correlated Expression Patterns in the Plasmodium falciparum Developmental Cycle Data. Figures: cumulative distribution of p-values for GO terms; cumulative distribution of p-values for GO terms associated with at least two genes. Selected: GO:0016311: dephosphorylation; GO:0007028: cytoplasm organization and biosynthesis. 46% of all molecular function GO terms are significantly correlated; 52% of all biological process GO terms are significantly correlated.

26 Plasmodium falciparum: Processes and Functions with the Highest/Lowest Correlation
Highest correlation, functions: acid phosphatase activity; calmodulin-dependent protein kinase I activity; triose-phosphate isomerase activity; guanylate kinase activity; glutamate-cysteine ligase activity
Highest correlation, biological processes: zinc ion transport; terpene metabol; protein processing; DNA replication, synthesis of RNA primer; cell invasion
Lowest correlation, functions: translation regulator activity; cell surface antigen activity, host-interacting; peptide binding; 5,10-methylenetetrahydrofolate-dependent methyltransferase activity; cyclophilin-type peptidyl-prolyl cis-trans isomerase activity
Lowest correlation, biological processes: terpene biosynthesis; pigment biosynthesis; tetrahydrobiopterin metabolism; coenzyme and prosthetic group biosynthesis; purine ribonucleoside biosynthesis

27 Plasmodium falciparum: Findings by FEP Ranking Package. Of the 12 FEPs referenced by Bozdech et al., two have a p-value larger than 0.05. E.g., the average correlation coefficient among genes associated with the Ribonucleotide Synthesis function is only 0.258 (p-value = 0.11), which weakens the claim that it is related to the Ring stage of the IDC. No linear relationship was found between the number of genes associated with a given GO term and the average correlation coefficient among these genes. Ranking GO terms by p-value could be useful for rapid identification of functions that are closely related to a specific developmental stage (of Plasmodium falciparum).

28 All Datasets: Findings by FEP Ranking Package. To some extent, genes with identical functions have similar expression profiles. However, a large fraction of functions do not follow this underlying hypothesis! Higher-level organisms seem to have a lower fraction of significantly correlated expression profiles for identical functions. Fractions of correlated FEPs: Saccharomyces cerevisiae: 59% (643/1,083)*; Plasmodium falciparum: 48.4% (428/884); Homo sapiens: 16.4% (249/1,514); Mus musculus: 13.3% (182/1,366). *Fractions are for both processes and functions.

29 Tool 2: FEP Clustering Package. Objective: identify genes with similar functions and similar expression profiles. Tasks: cluster the FEPs selected by the FEP ranking package; evaluate the found clusters for biological relevance by identifying similar functions based on the GO term hierarchy tree structure and evaluating inter-cluster GO term distance. Methodology: randomly generate k sets, each containing the same number of GO terms as the corresponding cluster; calculate the total GO term distance within each generated set and sum the total distances of all sets to get the overall score S'; repeat the procedure 1000 times and compare the distribution of S' to the overall distance obtained through clustering.
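
The random-set procedure above might look like the sketch below. Two points are assumptions because the slide does not specify them: the random sets are drawn by reshuffling the clustered terms themselves, and the pairwise distance is passed in as a function. The integer "terms" and the absolute-difference distance are hypothetical stand-ins for GO terms and a GO tree distance.

```python
import random
from itertools import combinations

def total_distance(terms, dist):
    """Sum of pairwise distances within one set of terms."""
    return sum(dist(a, b) for a, b in combinations(terms, 2))

def overall_score(sets, dist):
    """Overall score: total within-set distance summed over all sets."""
    return sum(total_distance(s, dist) for s in sets)

def random_overall_scores(clusters, dist, repeats=1000, seed=0):
    """Scores S' for random sets with the same sizes as the found clusters."""
    rng = random.Random(seed)
    pool = [term for cluster in clusters for term in cluster]
    scores = []
    for _ in range(repeats):
        shuffled = pool[:]
        rng.shuffle(shuffled)
        sets, start = [], 0
        for cluster in clusters:
            sets.append(shuffled[start:start + len(cluster)])
            start += len(cluster)
        scores.append(overall_score(sets, dist))
    return scores

# Hypothetical stand-in: integers as "terms", absolute difference as distance.
toy_dist = lambda a, b: abs(a - b)
found_clusters = [[1, 2, 3], [10, 11, 12]]

found_score = overall_score(found_clusters, toy_dist)
random_scores = random_overall_scores(found_clusters, toy_dist)
print(found_score, min(random_scores), sum(random_scores) / len(random_scores))
```

A found score well below the bulk of the random scores suggests that the clusters group functionally close terms more tightly than chance would.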

30 Structure of GO Term Tree (Example). Level 1: GO:0008150: biological process. Level 2: GO:0007275: development; GO:0007582: physiological process. Level 3: GO:0007389: pattern specification; GO:0000003: reproduction; GO:0008152: metabolism. Level 4: GO:0009798: axis specification. Level 5: GO:0009948: anterior/posterior axis specification. Measuring Distance of GO Terms (formula on slide image): based on the length of the minimal chain between terms X and Y in the GO tree and the length of the maximal chain from the top to the bottom of the tree.
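
The "length of the minimal chain" between two terms can be sketched on the example tree as below. The parent links are an assumption read off the slide's levels (reproduction is left out because its parent is not clear from the transcript), and the sketch returns only the raw chain length, without whatever normalization the slide's formula applied.

```python
# Assumed child -> parent links for the example tree on this slide.
PARENT = {
    "GO:0007275": "GO:0008150",  # development -> biological process
    "GO:0007582": "GO:0008150",  # physiological process -> biological process
    "GO:0007389": "GO:0007275",  # pattern specification -> development
    "GO:0008152": "GO:0007582",  # metabolism -> physiological process
    "GO:0009798": "GO:0007389",  # axis specification -> pattern specification
    "GO:0009948": "GO:0009798",  # anterior/posterior axis spec. -> axis specification
}

def path_to_root(term):
    """Chain of terms from the given term up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def chain_length(x, y):
    """Number of edges on the shortest chain between x and y in the tree."""
    steps_from_x = {term: i for i, term in enumerate(path_to_root(x))}
    for j, term in enumerate(path_to_root(y)):
        if term in steps_from_x:  # first shared ancestor seen from y
            return steps_from_x[term] + j
    raise ValueError("terms are not in the same tree")

d_far = chain_length("GO:0009798", "GO:0008152")   # axis specification vs metabolism
d_near = chain_length("GO:0009948", "GO:0007389")  # A/P axis spec. vs pattern specification
print(d_far, d_near)
```

Treating GO as a tree follows the slide; the real ontology allows a term to have several parents, in which case a shortest-path search over the full graph would be needed.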

31 Determination of Number of Clusters. Measured by z-score (formula on slide image): a larger z-score indicates a better grouping of functions within clusters.
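
The z-score formula itself is not in the transcript. One plausible form, assumed here, measures how many standard deviations the found clustering's total distance lies below the mean of the random scores, so that a larger value means tighter functional grouping. The numbers are invented for illustration.

```python
from statistics import mean, stdev

def z_score(found_score, random_scores):
    """Assumed form: (mean of random scores - found score) / std of random scores."""
    return (mean(random_scores) - found_score) / stdev(random_scores)

# Hypothetical overall distances: one found clustering and five random groupings.
z = z_score(8.0, [20.0, 24.0, 22.0, 26.0, 18.0])
print(f"z = {z:.2f}")
```

Computing this value for each candidate number of clusters k and keeping the k with the largest z-score would match the selection rule stated on the slide.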

32 Number of Clusters vs Z-score: Results for Plasmodium falciparum (figures: number of clusters vs z-score for biological processes; number of clusters vs z-score for molecular functions)

33 Applying FEP Clustering Package: Results on Plasmodium falciparum Processes. Cluster vs stage of IDC (figure, panels 1 to 4: k-means clustering profiles of FEPs for 238 identified processes)
Cluster index   Number of FEPs   Corresponding stage
1               78               Trophozoite
2               80               Schizont
3               50               Ring
4               20               Schizont-Early Ring

34 Applying FEP Clustering Package: Results on Plasmodium falciparum Functions. Cluster vs stage of IDC (figure, panels 1 to 4: k-means clustering profiles of FEPs for 199 identified molecular functions)
Cluster   Number of FEPs   Corresponding stage
1         48               Trophozoite
2         63               Schizont
3         53               Ring
4         35               Schizont-Early Ring
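
For readers unfamiliar with k-means, here is a minimal sketch of the algorithm applied to profile vectors. It is a generic Lloyd-style implementation with squared Euclidean distance and random initial centers, which are assumptions; the slides do not describe the package's distance measure or initialization. The four short profiles are made up.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Point-by-point mean of a list of profiles."""
    return [sum(column) / len(points) for column in zip(*points)]

def kmeans(profiles, k, max_iters=100, seed=0):
    """Return a cluster label for each profile."""
    rng = random.Random(seed)
    centers = rng.sample(profiles, k)  # k distinct profiles as initial centers
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in profiles
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels

# Hypothetical FEPs: two peak early, two peak late.
feps = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels = kmeans(feps, k=2)
print(labels)
```

Each resulting cluster can then be matched to a developmental stage by looking at where its centroid peaks, as in the table on this slide.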

35 GO Trees of Functions: 4 Clusters of Plasmodium falciparum

36 Statistical Evaluation: Found vs. Random Clusters for P. falciparum (figures: molecular functions; biological processes; found clusters marked). The distance from the found clusters to the random clusters is larger for biological processes. Random clusters for biological processes have smaller variance.

37 Statistical Evaluation: Clustering All GO Terms for P. falciparum. Clustering all GO terms leads to a smaller z-score, which means worse-quality clusters. The right figure shows the P. falciparum functional clustering result (found clusters marked): the z-score is 8.5, compared to 12 for clustering correlated GO terms only.

38 Statistical Evaluation: Found vs. Random Clusters for S. cerevisiae and Homo sapiens (figure panels: yeast processes; yeast functions; human processes; human functions; found clusters marked)

39 Remarks. Statistical significance of identified clusters (separation between clusters and random groupings) is increased by: normalizing data (Plasmodium falciparum); eliminating noise through singular value decomposition (SVD); reducing data through Principal Component Analysis (10 5).
                        Function clusters distance   Process clusters distance
Normalized              1.316                        1.089
Without normalization   1.368                        1.213

40 Conclusions. The proposed microarray tools help identify genes with the same function and correlated expression profiles, and genes with similar functions and similar expression profiles. Measuring GO tree based distance was useful for evaluating the biological relevance of clusters; however: many GO terms have only 1 associated gene; many genes do not even have a GO term; parenthood and sibling relationships in GO trees should be differentiated, with a smaller penalty for a sibling relationship than for parenthood. More robust clustering methods could be used.

41 Thank You ! More information: www.ist.temple.edu/research/biocore.html Contact: Zoran Obradovic, director IST Center, Temple University 215 204-6265 zoran@ist.temple.edu

