Gene expression network dynamics: from microarray data to gene-gene connectivity reconstruction. Reconstruction of the c-MYC proto-oncogene regulated genetic network. G. C. Castellani, D. Remondini, N. Intrator, B. O'Connell, J. M. Sedivy. Centro L. Galvani Biofisica Bioinformatica e Biocomplessità, Università Bologna and Physics Department, Bologna; Institute for Brain and Neural Systems, Brown University, Providence, RI.
Gene significance. Temporal structure. Gene clustering. Model validation.
Complex Network Theory and its application to cellular networks. Complex network theory is a rapidly growing field of contemporary interdisciplinary research, with applications ranging from mathematics to physics to biology. The classical mathematical theory, the random graph, was developed (1957-1960) by Erdős and Rényi. Physical problems related to this approach include percolation, Bose-Einstein condensation and the Simon problem. Recent applications to biology focus on neural networks, immune networks, protein folding, proteomics and genomics, mainly on the large-scale organization of biological networks.
One of the most recent theories shown to have promising applications in the biological sciences is the so-called theory of complex networks, which has been applied to protein-protein interaction networks and to metabolic networks (Jeong and Barabási).
Classical Random Graphs. A random graph is a set of nodes and of edges connecting them. The edges and the nodes they attach to are chosen at random: each possible edge is present with a certain probability p. It has been demonstrated that there exists a critical probability $p_c \sim N^{-1}$, where N is the number of nodes, for the appearance of a giant cluster (a phase transition). Another Erdős-Rényi result is that the degree distribution (the distribution of the number of edges of each node) follows Poisson statistics.
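The giant-cluster transition described on this slide can be illustrated with a small simulation. This is a minimal sketch, assuming the G(N, p) construction in which every pair of nodes is linked independently with probability p; the function names `erdos_renyi` and `largest_component_size`, the network size and the random seed are illustrative choices, not taken from the presentation.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """Return adjacency sets of a G(n, p) random graph (each pair linked with probability p)."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable
n = 1000
for c in (0.5, 1.0, 2.0):  # c = p * (n - 1) is the expected mean degree
    p = c / (n - 1)
    adj = erdos_renyi(n, p, rng)
    mean_degree = sum(len(a) for a in adj) / n
    fraction = largest_component_size(adj) / n
    print(f"c={c}: mean degree {mean_degree:.2f}, largest component fraction {fraction:.2f}")
```

Below c = 1 (that is, p below roughly 1/N) the largest component stays a small fraction of the network; above it a single component spanning most nodes appears.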
Extension to Random Graph Theory. In recent years considerable effort has gone into further analyzing the statistics of random graphs. The major results are summarized by the so-called small-world and scale-free graphs. Small-world graphs interpolate between regular lattices and random graphs. Scale-free networks are created by two simple rules: network growth and preferential attachment (the most connected nodes are the most probable sites of attachment). Both models give a non-Poisson degree distribution, and the scale-free model gives a power law. Moreover, this type of distribution has been observed in real networks such as the Internet, the C. elegans brain and metabolic networks, with exponent $2 < \gamma < 3$ and various values for the exponential cutoff $k_c$ and for $k_0$.
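The two rules named on this slide, growth and preferential attachment, can be sketched as follows. This is an illustrative implementation under stated assumptions: the network starts from a small complete core, each new node brings m links, and the names `preferential_attachment` and `degrees` as well as the values of n, m and the seed are hypothetical.

```python
import random

def preferential_attachment(n, m, rng):
    """Grow a graph to n nodes; each new node links to m distinct existing
    nodes chosen with probability proportional to their current degree."""
    # Growth starts from a complete core of m + 1 nodes (assumption).
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every node appears in `stubs` once per edge end, so a uniform draw
    # from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def degrees(edges, n):
    """Degree of every node, given an edge list over nodes 0..n-1."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

edges = preferential_attachment(2000, 2, random.Random(0))
deg = degrees(edges, 2000)
print("mean degree:", sum(deg) / len(deg))
print("maximum degree:", max(deg))
```

The maximum degree ends up far above the mean, which is the signature of a heavy-tailed degree distribution rather than a Poisson one.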
Inadequacy of complete connectivity. Complete connectivity, as well as random connectivity, is not biologically plausible. Connectivity changes that arise as a consequence of developmental changes (i.e. learning, ageing) appear more appropriate. Comparison between experimental and theoretical results on the number of virgin cells during the lifespan: the number of stable states (which we identify with memory capacity and with memory cells) increases as a function of age. We found similar results (an increase in the number of stable states through connectivity changes) for the BCM model as well, but the biological interpretation is less clear.
The John Sedivy Lab at Brown University has designed a new generation of microarrays that cover approximately one half of the whole rat genome (roughly 9000 genes). The array construction aims at precise targeting of the proto-oncogene c-MYC. This gene encodes a transcriptional regulator that is correlated with a wide array of human malignancies, with cellular growth and with cell cycle progression. The database consists of 81 arrays obtained by hybridisation with a cell line of rat fibroblasts. The gene expression measurements were performed in triplicate for better statistical significance. The complete data set is divided into three separate experiments, each of which addresses a specific problem.
Experiment 1: Comparison of different cell lines in which c-MYC is expressed at various levels (null, moderate, over-expressed). This experiment can reveal the total number of genes that respond to a sustained loss of c-MYC as well as those genes that respond to c-MYC over-expression.
Experiment 2: Analysis of those cell lines that over-express c-MYC following stimulation with Tamoxifen (a drug that has been used to treat both advanced and early-stage breast cancer). These data were collected during a 16 hour time course. This experiment reveals the kinetics of the response to MYC activation and may lead to the identification of the early-responding genes.
Experiment 3: Analysis of the time course of induction with Tamoxifen performed in the presence of Cycloheximide (a protein synthesis inhibitor). This experiment reveals a subset of direct transcriptional targets of c-MYC.
Our approach to determining the c-MYC regulated network can be summarized in 3 points:
1) A list of genes based on significance analysis across time points, between MYC and control and within each time point (between groups and within groups (time)).
2) A time translation matrix calculated from the raw data of microarrays treated with Tamoxifen (T) and not treated (NT). The resulting time translation matrix is used to reconstruct the connectivity matrix between genes.
3) Model validation for determination of the error model.
Significance Analysis. $s_0$ is an appropriate regularizing factor. Interesting genes are chosen as the union of the genes selected with the above methods. With this significance analysis (SA) we obtain 776 significant genes (p < 0.05) if we require significance at 1 time point.
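The formula that the regularizing factor belongs to did not survive in the transcript. As a hedged illustration only, the sketch below assumes a SAM-style statistic in which a constant `s0` is added to the standard error in the denominator of a two-sample t-like score; the function name `regularized_t` and the sample values are hypothetical, and the authors' actual statistic may differ.

```python
from math import sqrt
from statistics import mean, variance

def regularized_t(treated, control, s0):
    """Difference of group means divided by (pooled standard error + s0).

    s0 > 0 damps the score of genes whose replicate variance is tiny,
    so that small, possibly spurious differences do not rank highly.
    """
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / (std_err + s0)

# Triplicate measurements for one gene at one time point (made-up values).
treated = [1.02, 1.03, 1.04]
control = [1.00, 1.01, 1.02]
print("without regularization:", regularized_t(treated, control, 0.0))
print("with s0 = 0.1:", regularized_t(treated, control, 0.1))
```

With `s0 = 0` a tiny but very consistent difference gives a large score; a positive `s0` pulls it down, which is the usual motivation for such a factor with only three replicates per group.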
Step 2: Linear Markov Model. The selected genes are used for step 2 of our analysis, a linear Markov model in which the gene expression vector at one time point is mapped to the next one by a matrix A. The x(t) are the gene expressions at time t and A is the unknown matrix that we estimate from the time course (0, 2, 4, 8, 16 h) of microarray data (T and NT separately, giving $A_n$ and $A_t$). This is a so-called inverse problem, because the matrix is recovered from time-dependent data. By appropriate thresholding of the A matrices we can recover the connectivity matrix between the genes.
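A minimal sketch of this inverse problem, assuming the model has the form x(t_{k+1}) = A x(t_k) over successive sampled time points and that A is estimated with a least-squares pseudo-inverse. Neither the estimator nor the thresholding rule is given on the slide; the names `estimate_transition_matrix`, `connectivity_from` and `A_true`, the 3-gene toy system and the threshold 0.1 are illustrative. With thousands of genes and only 5 time points the problem is underdetermined and the pseudo-inverse returns the minimum-norm solution; the toy example uses 3 genes so that A can be recovered exactly.

```python
import numpy as np

def estimate_transition_matrix(X):
    """X has shape (genes, time points).

    Returns the least-squares (minimum-norm) matrix A such that
    X[:, k + 1] is approximately A @ X[:, k] for every k.
    """
    X_prev, X_next = X[:, :-1], X[:, 1:]
    return X_next @ np.linalg.pinv(X_prev)

def connectivity_from(A, threshold):
    """Binary connectivity: gene j acts on gene i if |A[i, j]| > threshold.
    Self-interactions on the diagonal are ignored."""
    C = (np.abs(A) > threshold).astype(int)
    np.fill_diagonal(C, 0)
    return C

# Toy system with a known interaction matrix (hypothetical values).
A_true = np.array([[0.9, 0.5, 0.0],
                   [0.0, 0.8, 0.3],
                   [0.2, 0.0, 0.7]])
X = np.empty((3, 5))              # 3 genes, 5 time points
X[:, 0] = [1.0, 0.5, 2.0]
for k in range(4):
    X[:, k + 1] = A_true @ X[:, k]

A_est = estimate_transition_matrix(X)
print(np.round(A_est, 3))
print(connectivity_from(A_est, 0.1))
```

The threshold decides which entries of A count as links, so the resulting network depends on it; in practice it would have to be chosen against the noise level of the data.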
Model validation. The different models (data preprocessing, modeling of gene dynamics, clustering techniques) have been validated mathematically by means of residual analysis (errors). The residuals are small. The Markov matrix we use is not the original one (computed over 5 time steps) but the validated one: we compute the matrix on 4 time steps and validate it on the subsequent step by comparison with the real data.
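The hold-out validation described on this slide can be sketched as below: the matrix is fitted on the first 4 time points, used to predict the 5th, and the prediction is compared with the data. This is an illustration under assumptions: the pseudo-inverse estimator, the relative-norm residual and the names `fit_transition` and `holdout_residual` are choices made here, and the noiseless 3-gene data set is synthetic.

```python
import numpy as np

def fit_transition(X):
    """Least-squares matrix A with X[:, k + 1] approximately A @ X[:, k]."""
    return X[:, 1:] @ np.linalg.pinv(X[:, :-1])

def holdout_residual(X):
    """Fit A on all but the last time point, predict the last time point
    from the one before it, and return the relative residual norm."""
    A = fit_transition(X[:, :-1])
    predicted = A @ X[:, -2]
    return np.linalg.norm(predicted - X[:, -1]) / np.linalg.norm(X[:, -1])

# Synthetic noiseless time course: 3 genes, 5 time points (hypothetical values).
A_true = np.array([[0.9, 0.5, 0.0],
                   [0.0, 0.8, 0.3],
                   [0.2, 0.0, 0.7]])
X = np.empty((3, 5))
X[:, 0] = [1.0, 0.5, 2.0]
for k in range(4):
    X[:, k + 1] = A_true @ X[:, k]

print("relative residual on the held-out time point:", holdout_residual(X))
```

On real arrays the residual would not vanish; its size relative to the measurement noise is what indicates whether the linear model is adequate.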
Changing databases. In order to better understand the results, in terms of both network topology and connectivity distribution, we generated 2 databases: 1) a small database with those genes that were without any doubt affected by Tamoxifen (50 genes); 2) a larger database with all the genes that give 2 P calls out of 3 experiments, i.e. those genes for which we have good measurements (3444 genes).
Results. For each of the 50 genes, we computed the connectivity and the clustering coefficient, which expresses how densely the neighbours of a gene are connected to one another. Treatment with Tamoxifen causes a decrease in clustering in the network, so it seems that the network becomes less scale free. This is confirmed by the network clustering coefficient. N: overall graph clustering coefficient 0.840. T: overall graph clustering coefficient 0.241. The individual connectivity and clustering changes are summarized in a table.
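The two per-gene quantities on this slide can be computed from a connectivity matrix as sketched below. This assumes an undirected network stored as a dictionary of neighbour sets, the standard local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked), and that the overall graph coefficient is the mean of the local values; the slide does not state which definition was used, and the gene names are placeholders.

```python
def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to test
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Toy network: a triangle g1-g2-g3 with one extra gene g4 attached to g3.
adj = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3"},
}
for gene in adj:
    print(gene, "connectivity:", len(adj[gene]),
          "clustering:", round(local_clustering(adj, gene), 3))
print("overall:", round(average_clustering(adj), 3))
```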
The 3444 genes database. This large database is used in order to have better statistics and possibly a distribution fit. [Figure: connectivity distributions, panels N and T.] Clearly these distributions are not Poisson and seem to be power law with an exponential tail.
Fitting the distributions. We fitted the distributions with a generalized power law. [Figure: fits for N and T.]
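The fitted formula is not preserved in the transcript. A commonly used generalized power law with an exponential cutoff, given here only as an assumed form, has an exponent $\gamma$, an offset $k_0$ and a cutoff scale $k_c$:

```latex
P(k) \propto (k + k_0)^{-\gamma} \exp\!\left(-\frac{k + k_0}{k_c}\right)
```

For $k \ll k_c$ this behaves as a power law with exponent $\gamma$, while for $k \gg k_c$ the exponential factor dominates and produces the tail.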
Network Structure (3444 genes). N: overall graph clustering coefficient 0.902. T: overall graph clustering coefficient 0.893. From these results and from the fit parameters it seems that the N network is less scale free, but these results are strongly affected by noise. We have looked at the individual connectivity and clustering coefficients, and at their variation between N and T. The results are encouraging: among the genes that have changed their connectivity significantly there are c-MYC targets.
Network Structure (3444 genes). As an example we report some connectivity changes in c-MYC target genes:
No. | Probe ID | Gene product | Connectivity (two values)
2379 | rc_AI178135_at | complement component 1, q subcomponent binding protein | 3, 272
2796 | U09256_at | transketolase | 13, 39
2772 | U02553cds_s_at | protein tyrosine phosphatase, non-receptor type 16 | 133, 146
390 | D10853_at | phosphoribosyl pyrophosphate amidotransferase | 0, 7
933 | M58040_at | transferrin receptor | 1, 27
Conclusions. We have tested the hypothesis that treatment with Tamoxifen, which in these engineered cells leads to c-MYC activation, can be related to connectivity changes between genes. Connectivity is a very important parameter for both physical and biological systems. Connectivity (coupling) changes are the basis for phase transitions and developmental changes (ageing, learning and response to external stimuli). Our results show that, within the framework of scale-free networks, there are changes in gene-gene connectivity. The connectivity distributions of N and T are far from Poisson, with parameters similar to those found for other systems that show a scale-free distribution with an exponential tail.
Conclusions. If we look at individual gene connectivity, or at the smaller database, we observe significant changes induced by the treatment. For example, the clustering coefficient changes, and some c-MYC targets show changes in connectivity and clustering coefficient. One clear result is that the global gene degree distribution follows a power law both without and with Tamoxifen. This result seems to indicate that this type of behaviour is very general. These results need to be confirmed and further analyzed but, to our knowledge, this is the first attempt to monitor the network connectivity changes induced by c-MYC activation in comparison with a basal level.
Conclusions. Some points need further analysis, such as the correlation between connectivity changes and c-MYC targets. Our method is not a significance test; it can only help to view gene activity as the result of interactions between genes at the previous time step. The Markov approach to gene-gene connectivity reconstruction is not new (Maritan 2001), but we have introduced matrix validation and rigorous data discretization and normalization, which can improve the model robustness. Finally, we will further improve the model robustness by time reshuffling and will test its predictive performance.