Presentation on theme: "MSc GBE Course: Genes: from sequence to function Brief Introduction to Systems Biology Sven Bergmann Department of Medical Genetics University of Lausanne."— Presentation transcript:
1MSc GBE Course: Genes: from sequence to function Brief Introduction to Systems Biology Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerland work: cell:
2Course Overview Basics: What is Systems Biology? Standard analysis tools for large datasetsAdvanced analysis toolsSystems approach to “small” networks
3What is Systems Biology? To understand biology at the system level, we must examine the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts of a cell or organism. Properties of systems, such as robustness, emerge as central issues, and understanding these properties may have an impact on the future of medicine.Hiroaki Kitano
4What is Systems Biology? To me, systems biology seeks to explain biological phenomenon not on a gene by gene basis, but through the interaction of all the cellular and biochemical components in a cell or an organism. Since, biologists have always sought to understand the mechanisms sustaining living systems, solutions arising from systems biology have always been the goal in biology. Previously, however, we did not have the knowledge or the tools.Edison T LiuGenome Institute of Singapore
5What is Systems Biology? addresses the analysis of entire biological systemsinterdisciplinary approach to the investigation of all the components and networks contributing to a biological system[involves] new dynamic computer modeling programs which ultimately might allow us to simulate entire organisms based on their individual cellular componentsStrategy of Systems Biology is dependent on interactive cycles of predictions and experimentation.Allow[s Biology] to move from the ranks of a descriptive science to an exact science.(Quotes from SystemsX.ch website)
7What is Systems Biology? identify elements (genes, molecules, cells, …)ascertain their relationships (co-expressed, interacting, …)integrate information to obtain view of system as a wholeLarge (genomic) systemsmany uncharacterized elementsrelationships unknowncomputational analysis should:improve annotationreveal relationsreduce complexitySmall systemselements well-knownmany relationships establishedquantitative modeling of systems properties like:DynamicsRobustnessLogicsA systems approach to a biological system might be likened to what is required to understand our medical healthcare system:As a first step one has to identify the elements and their groups: patients, physicians, nurses, hospitals, insurance companies, government insurance, etc. To understand the whole healthcare system, the relationships of the elements (e.g. individual patients, physicians, nurses, etc.) within a group must be ascertained with respect to one another and the elements in other groups.This information must be integrated together to obtain a view of how the system works.So it is with systems biology—the types of biological information (DNA, RNA, protein, protein interactions, biomodules, cells, tissues, etc.) also have their individual elements (e.g. specific genes or proteins) and the relationships of these with respect to one another and the elements of other types of biological information must be determined and, once again, all of this information integrated to obtain a view (model) of the system as a whole.
8Part 1: Basics Motivation: What is a “systems biology approach”? Why to take such an approach?How can one study systems properties?Practical Part:First look at a set of genomic expression dataHow to have a global look at such datasets?Distributions, mean-values, standard deviations, z-scoresT-tests and other statistical testsCorrelations and similarity measuresSimple Clustering
10DNA microarray experiments monitor expression levels of thousands of genes simultaneously: testcontrolallows for studying the genome-wide transcriptional response of a cell to interior and exterior changesprovide us with a first step towards understanding gene function and regulation on a global scale
21Student’s T-testt-statistic: difference between means in units of average error Significance can be translated into p-value (probability) assuming normal distributions
22History: W. S. Gossett [1876-1937] The t-test was developed by W. S. Gossett, a statistician employed at the Guinness brewery. However, because the brewery did not allow employees to publish their research, Gossett's work on the t-test appears under the name "Student" (and the t-test is sometimes referred to as "Student's t-test.") Gossett was a chemist and was responsible for developing procedures for ensuring the similarity of batches of Guinness. The t-test was developed as a way of measuring how closely the yeast content of a particular batch of beer corresponded to the brewery's standard.
23Comparing profiles X and Y (not distributions!): Pearson correlations (Graphic)Each dot isa pair (xi,yi)Y = (yi)X = (xi)Comparing profiles X and Y (not distributions!):What is the tendency that high/low values in X match high/low values in Y?
24Pearson correlations: Formulae (complicated version)(simple version using z-scores)
25Similarity according to all conditions (“Democratic vote”) Pearson correlations: IntuitionSimilarity according to all conditions (“Democratic vote”)conditions12345r12 ~ 1geneClustering-coefficient12345r12 ~ 0gene12345r12 ~ -1gene
26Pearson correlations: Caution! High correlation does not necessarily mean co-linearity!
27(Hierarchical Agglomerative) Clustering Join most correlated samples and replace correlations to remaining samples by average, then iterate …
30K-means Clustering “guess” k=3 (# of clusters) 1. Start with random positions of centroids ( )2. Assign each data point to closest centroid
31Hierachical Clustering Plus:Shows (re-orderd) dataGives hierarchyMinus:Does not work well for many genes (usually apply cut-off on fold-change)Similarity over all genes/conditionsClusters do not overlap
32Overview of “modular” analysis tools Cheng Y and Church GM. Biclustering of expression data. (Proc Int Conf Intell Syst Mol Biol. 2000;8:93-103)Getz G, Levine E, Domany E. Coupled two-way clustering analysis of gene microarray data. (Proc Natl Acad Sci U S A Oct 24;97(22): )Tanay A, Sharan R, Kupiec M, Shamir R. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. (Proc Natl Acad Sci U S A Mar 2;101(9):2981-6)Sheng Q, Moreau Y, De Moor B. Biclustering microarray data by Gibbs sampling. (Bioinformatics Oct;19 Suppl 2:ii )Gasch AP and Eisen MB. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. (Genome Biol Oct 10;3(11):RESEARCH0059)Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan WC, Botstein D, Brown P. 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns. (Genome Biol. 2000;1(2):RESEARCH0003.)… and many more!
36The (Iterative) Signature Algorithm: One example in more detail:The (Iterative) Signature Algorithm:No need for correlations!decomposes data into “transcription modules”integrates external informationallows for interspecies comparative analysisJ Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai Nature Genetics (2002)
38customers with similar choice How to find related items?customers with similar choiceitems10re- commended items2030405060your choice7080901005101520253035404550customers
39similarly expressed genes How to find related genes?relevant conditionsgenes10similarly expressed genes2030405060your guess7080901005101520253035404550conditionsJ Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai Nature Genetics (2002)
40Signature Algorithm: Score definitions The signature algorithm uses a set of genes as INPUT.These genes could be a putative group of related genes,Or just a random selection of genes.The algorithm uses the expression data to generate its OUTPUT.Genes that are not correlated with the bulk of the input-genes are rejected.And other co-regulated genes that were not part of the INPUT are added.
41How to find related genes? Scores and thresholds! condition scoresthresholding:initial guesses(genes)
42How to find related genes? Scores and thresholds! condition scoresgene scoresthresholding:
43How to find related genes? Scores and thresholds! condition scoresthresholding:gene scores
44Iterative Signature Algorithm INPUTOUTPUT = INPUT“Transcription Module”An extension of this algorithm is the “Iterative Signature Algorithm”.Here the OUPUT is re-used as INPUT, such that further iterations can bring in more co-regulated genes.We repeat this procedure until the OUTPUT equals to the INPUT.We refer to the final OUTPUT as a “transcription module”.As before it contains both the set of co-regulated genes and the conditions that induce their co-regulation.By definition, all genes outside the module are less co-regulated than the module genes under these conditions.OUTPUTSB, J Ihmels & N Barkai Physical Review E (2003)
45Transcription modules Identification of transcription modules using many random “seeds”random “seeds”Transcription modulesIndependent identification: Modules may overlap!In order to identify the transcription modules encode in the expression data we proceed as follows:We assemble a large number of random INPUT seeds.Applying the iterative signature algorithm to each seed converges into a specific transcription module.Since each transcription module is identified independently, it may overlap with other modules.
47Gene enrichment analysis The hypergeometric distribution f(M,A,K,T) gives the probabilitythat K out of A genes with a particular annotation match with amodule having M genes if there are T genes in total.
48Decomposing expression data into annotated transcriptional modules identified >100 transcriptional modules in yeast:high functional consistency!many functional links “waiting” to be verified experimentallyJ Ihmels, SB & N Barkai Bioinformatics 2005
49Higher-order structure correlatedCLet me go one step further and compare higher order correlations between organisms.Since each module comes with an associated set of conditions, we investigate correlations between module according to these conditions.For example, in yeast the regulation pattern of the ribosomal genes is very similar to that of the genes involved in ribosomal RNA processing.A positive correlation is indicated in red and these modules are placed close to each other.In contrast the heat-shock module is anti-correlated with these two modules, which is indicated here in blue.anti- correlated