Introduction to Systems Biology: Systems Medicine Charu G. Kumar, Ph.D. Research Assistant Professor Department of Bioengineering.



At the end of this talk, you will be able to:
Understand some aspects of systems biology (15 min)
Do statistical analyses for high-throughput data (20 min)
  – Gene set enrichment analysis
Infer gene networks (50-60 min)
  – Inference methods
  – Tools for developing and visualizing gene networks
Understand basic biochemical reaction modeling (20 min)

Leaders in the field
Karl Ludwig von Bertalanffy (1950) laid out the General Systems Theory.
Denis Noble
Mihajlo Mesarovic (1968) formally introduced the term 'Systems Biology'.
Jacob and Monod (1970) described the feedback regulatory mechanism at the molecular level.

What is Systems Biology?
The study of the interactions between the components of a system, and of how the functions and behavior of that system emerge from them.
This behavior is emergent (more than the sum of its processes), not reductionist (completely defined as the sum or difference of component interactions).
It is the science of modeling and discovering the broader dynamic and complex relationships between molecules in cell types or model organisms.

Systems biology captures the dynamic nature of biological processes by focusing on the interactions and their reconstruction.
Metabolites turn over within a minute. (Figure from Vogel, 2011; Schwanhäusser et al)
Time-scales: metabolic networks (a few seconds) << protein networks (seconds to minutes) < GRNs (minutes to hours)

Top-down reconstructions (e.g. gene regulatory networks, GRNs) from high-throughput molecular ‘omics’ data (microarray, proteomics, RNA-seq) using inference methods (non-stoichiometric and coarse-grained)

Protein and gene regulatory (GR) networks
[Figure panels: protein mechanisms; gene regulatory patterns. Figures taken from Sauro’s book on control theory for biologists]

Bottom-up Reconstructions using direct methods for generating metabolic models (stoichiometric networks)

Tailoring to tissues; drug response phenotypes; adaptive evolution; disease progression; synthetic biology; metabolic engineering
Systems Biology Paradigm: components → networks → computational models → phenotypes
(Figure adapted from Nathan Price’s course slides)

Processing of high-throughput ‘omics’ data
Noise filtering and background correction (adjust the data for the background intensity surrounding each feature, i.e. non-specific hybridization)
Normalization (adjust values for spatial heterogeneity, different dye absorption, etc. across samples)
Feature summarization
GenePattern can be used for microarray analysis.

Gene Set Enrichment Analysis (GSEA)
High-throughput analysis results in gene lists that we evaluate using our favorite statistical test (hypergeometric, t-test, Z-test, etc.), which gives a p-value = P(data at least as extreme as this sample | H0 is true).
For multiple comparisons, the p-value is adjusted for false discovery (q-value).
GSEA is an alternative tool developed by Subramanian et al (2005). It asks: is some known gene set (a published gene list representing a pathway or GO category, or cytogenetic bands) over-represented in your gene list?
It needs these files:
  – Entire microarray data (specific defined formats)
  – Sample phenotype
  – Known gene set
  – Microarray chip annotation
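The classical over-representation test mentioned on this slide can be sketched with the hypergeometric distribution. This is a minimal illustration, not the GSEA tool itself; the function name and the numbers in the example (chip size, pathway size, list size, overlap) are made up for illustration.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric p-value.

    Probability of seeing `overlap` or more gene-set members when
    `list_size` genes are drawn without replacement from a universe of
    `universe_size` genes that contains `set_size` gene-set members.
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 genes on the chip, a 100-gene pathway,
# 200 differentially expressed genes, 8 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
print(f"over-representation p-value: {p_value:.3g}")
```

With one pathway gene expected by chance in the list, an overlap of 8 gives a small p-value. When many gene sets are tested, these p-values still need the false-discovery adjustment mentioned above.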

GSEA input files (*.GCT)

GSEA input: Phenotype.cls file

GenePattern can convert RNA-seq (and other format files) into .gct

GSEA params

[Figure labels: GO category; gene clusters]
Genes in the expression matrix are sorted based on their correlation to the phenotype classes.
The enrichment score ES(S) of a gene set S is calculated based on both the correlations and the positions of its genes in the ranked matrix.
Compute ES for each permutation.
Compare the distribution of these ES with the ES for the actual data.
Mootha et al. Nature Genetics, 2003
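The ranking, running-sum and permutation steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the `weight` parameter and the toy data are hypothetical, and the null distribution is built by drawing random gene sets of the same size rather than by permuting phenotype labels.

```python
import random

def enrichment_score(ranked_genes, correlations, gene_set, weight=1.0):
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes: genes sorted by decreasing correlation with the phenotype.
    correlations: the correlation of each gene, in the same order.
    Returns the running-sum value with the largest deviation from zero.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [abs(r) ** weight if hit else 0.0
                   for r, hit in zip(correlations, in_set)]
    hit_total = sum(hit_weights)
    n_miss = in_set.count(False)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering it")
    running = best = 0.0
    for w, hit in zip(hit_weights, in_set):
        # Step up at a gene-set member (weighted by correlation), down otherwise.
        running += w / hit_total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_p_value(ranked_genes, correlations, gene_set, n_perm=200, seed=0):
    """Empirical p-value from random gene sets of the same size (a simplification)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, correlations, gene_set)
    size = sum(gene in gene_set for gene in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, correlations, random_set)) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

# Toy data: ten genes already sorted by correlation with the phenotype.
genes = [f"g{i}" for i in range(1, 11)]
corr = [0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9]
es_top = enrichment_score(genes, corr, {"g1", "g2", "g3"})
es_bottom = enrichment_score(genes, corr, {"g8", "g9", "g10"})
p_top = permutation_p_value(genes, corr, {"g1", "g2", "g3"})
print(es_top, es_bottom, p_top)
```

A gene set concentrated at the top of the ranking gives a score near +1, and one concentrated at the bottom gives a score near -1.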

GSEA: Leading Edge Analysis

Top-down Reconstructions (e.g. GRNs) from high-throughput molecular ‘omics’ data (microarray, proteomics, rna-seq) using Inference Methods GSEA

Networks are mathematical graphs consisting of nodes, and edges joining those nodes.
The degree or connectivity ‘d’ of a node is the number of edges from that node.
Power-law degree distribution: $P(d) \propto d^{-\gamma}$, where $\gamma$ is a constant that is characteristic of the network.
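As a small illustration of these definitions, the sketch below computes node degrees and the empirical degree distribution P(d) from an undirected edge list. The function name and the toy edge list are hypothetical; nodes with no edges do not appear in an edge list and are therefore not counted.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree d: fraction of nodes with degree d} for an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())
    return {d: c / n_nodes for d, c in sorted(counts.items())}

# Toy network: one hub (TF1) connected to four genes, plus one extra edge.
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "g4"), ("g4", "g5")]
dist = degree_distribution(edges)
print(dist)
```

Plotting log P(d) against log d for a real network and checking for a roughly straight line with slope -γ is a common first look at whether a power law is plausible.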

Gene Regulatory Networks
Describe the interactions between TFs (and/or miRNAs) and genes.
GRNs are information processing networks that help determine the rate of protein production.
[Figure: X_j → Y; rate of production of Y = f(X*). Inouye and Kaneko, PLoS Comp. Biol]

Kumar et al BMC Genomics 11:161
Agglomerative hierarchical clustering with average correlation > 0.75
Used Match to search the TRANSFAC db; each TFBS in a cluster was tested for significant enrichment.
ANN-Spec for motif prediction; Tomtom

CARMAweb (

Unsupervised methods for inference of GRNs
The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE)
  – Margolin et al BMC Bioinformatics 7: S7
Context Likelihood of Relatedness (CLR)
  – Faith et al PLOS Biol.
The above two methods are based on Mutual Information (MI) for identifying co-expression networks. MI measures the dependency between two random variables, i.e. to what extent knowing one variable reduces the uncertainty about the other.
Weighted Gene Co-expression Network Analysis (WGCNA)
  – WGCNA is based on Pearson correlation
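To make the definition of MI concrete, here is a minimal sketch that estimates it from two expression profiles after equal-width binning. The binning step and the function names are assumptions made for illustration; they are a simple stand-in and not the estimator used by ARACNE or CLR.

```python
from collections import Counter
from math import log2

def discretize(values, n_bins=3):
    """Equal-width binning of a list of expression values into n_bins bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant profiles
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x_bins, y_bins):
    """MI in bits between two discretized profiles of equal length."""
    n = len(x_bins)
    px = Counter(x_bins)
    py = Counter(y_bins)
    pxy = Counter(zip(x_bins, y_bins))
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# Toy profiles: gene_y is fully determined by gene_x.
gene_x = [1, 2, 3, 4, 5, 6]
gene_y = [2 * v for v in gene_x]
mi_dependent = mutual_information(discretize(gene_x), discretize(gene_y))

# Independent binned profiles: every combination occurs equally often.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_dependent, mi_independent)
```

Unlike Pearson correlation, MI also picks up non-linear dependencies, at the cost of needing more samples for a stable estimate.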

ARACNE (Basso et al)
Works with more than 100 microarray samples
Data Processing Inequality (DPI): $I(g_1, g_3) \le \min[I(g_1, g_2), I(g_2, g_3)]$
Finds the weakest link of a triplet and removes that edge.
Infers the most likely path of information flow.
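The DPI pruning step can be sketched on a small table of pairwise MI values. This is an illustration of the idea only, not the ARACNE implementation; the function name, the data structure and the `tolerance` parameter are assumptions.

```python
from itertools import combinations

def apply_dpi(mi, tolerance=0.0):
    """Remove the weakest edge of every fully connected gene triplet.

    mi: dict mapping frozenset({gene_a, gene_b}) -> mutual information.
    All triplets are checked against the original values, and the marked
    edges are removed at the end, so the result does not depend on order.
    """
    genes = sorted({g for pair in mi for g in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(edge in mi for edge in triangle):
            continue  # the inequality is only applied to closed triplets
        weakest = min(triangle, key=lambda edge: mi[edge])
        strongest_two = [mi[edge] for edge in triangle if edge != weakest]
        if mi[weakest] < min(strongest_two) * (1.0 - tolerance):
            to_remove.add(weakest)
    return {edge: value for edge, value in mi.items() if edge not in to_remove}

# Toy triplet: g1-g3 is the weakest link and is treated as indirect.
mi_table = {
    frozenset(("g1", "g2")): 0.8,
    frozenset(("g2", "g3")): 0.7,
    frozenset(("g1", "g3")): 0.2,
}
pruned = apply_dpi(mi_table)
print(sorted(tuple(sorted(edge)) for edge in pruned))
```

In the toy example the remaining path g1, g2, g3 is read as the most likely route of information flow.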

ARACNE download and setup
aracne -i /data/input.exp -k 0.15 -t 0.05 -r 1
aracne2 -i /data/input.exp -k 0.15 -t 0.05 -r 1 -p 1e-7
Outputs an adjacency matrix that consists of the inferred interactions.
To view the adjacency matrix as a network, geWorkbench can be installed.
The Nature Protocols paper has a tutorial, manual and technical report: Margolin et al. Nature Protocols 1, (2006)

Aracne command line

JAVA GUI geWorkbench

Network visualization (Cytoscape component) geWorkbench can be installed from

Simple Interaction Format (SIF) for Cytoscape
nodeA <relationship type> nodeB
nodeC <relationship type> nodeA
nodeD <relationship type> nodeE
...
nodeY <relationship type> nodeZ
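A minimal sketch of producing SIF text from inferred interactions is shown below. The input is a plain Python dictionary rather than an ARACNE output file, and the function name and the relationship label "mi" are hypothetical choices for illustration.

```python
def adjacency_to_sif(adjacency, relation="mi"):
    """Convert {gene: {neighbor: weight}} into tab-separated SIF text.

    Each undirected edge is written once as: source <tab> relation <tab> target.
    Edge weights are not part of SIF and are dropped here.
    """
    lines = []
    seen = set()
    for source in sorted(adjacency):
        for target in sorted(adjacency[source]):
            edge = frozenset((source, target))
            if edge in seen:
                continue  # skip the mirrored copy of an undirected edge
            seen.add(edge)
            lines.append(f"{source}\t{relation}\t{target}")
    return "".join(line + "\n" for line in lines)

# Toy symmetric adjacency with hypothetical gene names and MI weights.
adjacency = {
    "g1": {"g2": 0.8, "g3": 0.5},
    "g2": {"g1": 0.8},
    "g3": {"g1": 0.5},
}
sif_text = adjacency_to_sif(adjacency)
print(sif_text, end="")
```

The resulting text can be saved with a .sif extension and imported into Cytoscape as a network.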

Metabolic models are stoichiometric representations of all possible biochemical reactions in the cell.
1. They provide a mapping between the genotype and the phenotype.
2. They identify key features of metabolism such as growth yield, network robustness, and gene essentiality.
3. Models of yeast have been used to investigate the production of therapeutic proteins, as yeast models allow modeling of post-translational modifications (PTMs).
4. Models of pathogens allow for the development of novel drugs to combat infection with minimal side effects to the host.
5. Metabolic models of mammals have been employed to study various diseases.
6. Models of microbes are used for their biotechnological applications, such as fermentation, biofuel production, etc.

Kim TY et al. 2012

PathoLogic, the Model SEED
(Figure adapted from Nathan Price’s course slides)

Step 2: Refinement of reconstruction
Verify reactions for enzyme and substrate specificity, Gene-Protein-Reaction formulation, stoichiometry, directionality, and location.
(Figure adapted from Nathan Price’s course slides)

Figure from Nathan Price’s course slides

Step 3: Converting the reconstruction into a computable form
Mathematically represent the reconstruction as a matrix.
Define system boundaries: extracellular, intracellular, and exchange reactions (e.g. transport), which are represented w.r.t. the extracellular environment (secretion is +ve flux, uptake is -ve flux).
Add constraints:
  – Mass balance
  – Steady state
  – Thermodynamics (e.g., reaction directionality)
  – Environmental constraints (e.g. presence or absence of nutrients)
  – Regulatory (e.g., on/off gene expression)

= S matrix

A metabolic model consists of three components:
The reaction network, which is encoded as a stoichiometric matrix [parsed using the COBRA toolbox].
A list of rules called gene-protein-reaction (GPR) associations that describe how gene activity is linked to reaction activity.
A biomass function, which is a list of small molecules, cofactors, nucleotides, amino acids, lipids, and cell wall components needed to support growth and division.
Assumption used for modeling: metabolism is in steady state, i.e. uptake and secretion have reached a plateau; d[A]/dt ≈ 0
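How a GPR association links gene activity to reaction activity can be sketched with Boolean logic. The representation below (a list of enzyme complexes, any one of which is sufficient) and all gene and reaction names are assumptions for illustration; this is not the COBRA toolbox data structure.

```python
def reaction_active(gpr_complexes, expressed_genes):
    """Evaluate a GPR rule written as a list of gene sets.

    Each set is one enzyme complex (all of its genes are required: AND).
    Alternative complexes are isozymes (any one is enough: OR).
    A reaction without any rule is treated as always active.
    """
    if not gpr_complexes:
        return True
    return any(complex_genes <= expressed_genes for complex_genes in gpr_complexes)

# Hypothetical rules: rxn1 needs a two-subunit complex, rxn2 has two isozymes.
gpr = {
    "rxn1": [{"geneA", "geneB"}],
    "rxn2": [{"geneC"}, {"geneD"}],
    "rxn3": [],  # e.g. a spontaneous reaction with no gene assigned
}

# Simulate a knockout of geneB and geneC.
expressed = {"geneA", "geneD"}
active = {rxn: reaction_active(rule, expressed) for rxn, rule in gpr.items()}
print(active)
```

In a constraint-based model, a reaction found inactive in this way would typically have its flux bounds set to zero before solving.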

Flux Balance Analysis
Mathematically, the S matrix is a linear transformation of the unsolved flux vector $v = (v_1, v_2, \dots, v_n)$ to a vector of time derivatives of the concentration vector $x = (x_1, x_2, \dots, x_m)$:
$$\frac{dx}{dt} = S \cdot v$$
Example reactions: A ↔ B with flux $v_1$; 2B ↔ C with flux $v_2$. With rows A, B, C and columns $v_1$, $v_2$:
$$S = \begin{pmatrix} -1 & 0 \\ 1 & -2 \\ 0 & 1 \end{pmatrix}$$
Mass balance for B: $\frac{d[B]}{dt} = v_1 - 2 v_2$
Steady state: the change in concentration as a function of time is zero; hence $\frac{dx}{dt} = S \cdot v = 0$.
Solve for the possible set of flux vectors.
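The mass balance can be checked numerically for the two-reaction example on this slide. The sketch below multiplies S by a flux vector; the function name and the two extra exchange reactions (for A and C) are hypothetical additions so that a non-zero steady state exists, with secretion counted as positive flux and uptake as negative flux.

```python
def concentration_derivatives(S, v):
    """Return dx/dt = S . v for a stoichiometric matrix S (rows: metabolites)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

# Rows: A, B, C. Columns: v1 (A <-> B), v2 (2B <-> C), as on the slide.
S_closed = [
    [-1, 0],
    [1, -2],
    [0, 1],
]
# With v1 = 2 and v2 = 1, B is balanced but A is drained and C accumulates.
dxdt_closed = concentration_derivatives(S_closed, [2, 1])

# Hypothetical extension: exchange reactions for A and for C, each written
# as "metabolite <->" (coefficient -1), so uptake of A is a negative flux.
S_open = [
    [-1, 0, -1, 0],
    [1, -2, 0, 0],
    [0, 1, 0, -1],
]
dxdt_open = concentration_derivatives(S_open, [2, 1, -2, 1])
print(dxdt_closed, dxdt_open)
```

Without exchange reactions the only steady state of this small closed system is v = 0, which is why system boundaries are defined before solving.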

Constraints and Biomass Objective Function
The set of possible flux vectors is further constrained by defining $v_{i(lb)} \le v_i \le v_{i(ub)}$ for reaction i.
Assume the objective of organisms is to grow, divide and proliferate. This needs biomass-generating metabolic precursors (e.g. amino acids, nucleotides, phospholipids, vitamins, cofactors, energy requirements).
The Biomass Objective Function requires the dry cell weight composition and the macromolecular breakdown.
For 1 gDW of E. coli (adapted from Nogales et al. BMC Sys Biol), Z is a weighted sum of the fluxes $v_{ATP}, v_{NADH}, v_{NADPH}, v_{G6P}, v_{F6P}, v_{R5P}, v_{E4P}, v_{T3P}, v_{3PG}, v_{PEP}, v_{PYR}, v_{AcCoA}, v_{OAA}, v_{AKG}$.
Using steady-state fluxes, solve using linear programming to optimize Z.
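Putting the steady-state condition, the flux bounds and the objective together, the optimization can be written as a linear program. The weight vector $c$ is notation introduced here for illustration: it holds the coefficient of each flux in Z and is zero for fluxes that do not contribute to biomass.

```latex
\begin{aligned}
\max_{v} \quad & Z = c^{\mathsf{T}} v = \sum_{i=1}^{n} c_i v_i \\
\text{subject to} \quad & S \cdot v = 0, \\
& v_{i(lb)} \le v_i \le v_{i(ub)}, \qquad i = 1, \dots, n
\end{aligned}
```

Because the objective and all constraints are linear in $v$, any standard linear programming solver can be used, and the optimal value of Z is unique even when several flux vectors achieve it.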

Figure adapted from Nathan Price’s course slides

Step 4: Evaluation of network content
Evaluate content pathway by pathway; this eases identification of missing genes and reactions.
Draw metabolic maps to ease detection of missing reactions.
  – Gap analysis, e.g. H. pylori has 2 of 4 enzymes missing for Ile and Val synthesis. Gap? No. It turns out Ile and Val are needed in the medium for growth.
Analysis of dead-end metabolites (only consumed or only produced)
Network evaluation: can it generate biomass components and precursors to metabolites, is it mass- and charge-balanced, etc.
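Dead-end metabolites can be found directly from the stoichiometric matrix. The sketch below is a simplified check, assuming that a reversible reaction can both produce and consume its metabolites and that a metabolite appearing in fewer than two reactions cannot carry steady-state flux; the function name and the toy network are hypothetical.

```python
def dead_end_metabolites(S, metabolites, reversible):
    """Return metabolites that lack a producing or a consuming reaction.

    S: stoichiometric matrix as a list of rows (one row per metabolite).
    reversible: one boolean per reaction (column of S).
    """
    dead_ends = []
    for name, row in zip(metabolites, S):
        produced = consumed = False
        n_reactions = 0
        for coeff, is_reversible in zip(row, reversible):
            if coeff == 0:
                continue
            n_reactions += 1
            if is_reversible:
                produced = consumed = True
            elif coeff > 0:
                produced = True
            else:
                consumed = True
        if not (produced and consumed) or n_reactions < 2:
            dead_ends.append(name)
    return dead_ends

# Toy network: A -> B, 2B -> C, plus a reversible exchange reaction for A.
# Nothing consumes or exports C, so C is a dead end.
S = [
    [-1, 0, -1],
    [1, -2, 0],
    [0, 1, 0],
]
dead = dead_end_metabolites(S, ["A", "B", "C"], [False, False, True])
print(dead)
```

Each flagged metabolite points either to a missing reaction (a true gap) or to a compound that should be given an exchange or demand reaction.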

Conclusions
Top-down reconstruction of networks using high-throughput data requires reliable statistical predictions.
  – Gene Set Enrichment Analysis is an alternative to looking for over-representation in your gene list, by looking for enrichment of genes in defined gene sets.
  – Gene regulatory network inference using ARACNE
Bottom-up reconstructions result in a more precise, mathematically d(r)efined model.

References
1: Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A Oct 25;102(43): Epub 2005 Sep 30.
2: Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet Jul;34(3):
3: Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol Jan;5(1):e8.
4: Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. Reverse engineering of regulatory networks in human B cells. Nat Genet Apr;37(4): Epub 2005 Mar 20.
5: Kumar CG, Everts RE, Loor JJ, Lewin HA. Functional annotation of novel lineage-specific genes using co-expression and promoter analysis. BMC Genomics Mar 9;11:161.
6: Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature May 19;473(7347): Erratum in: Nature Mar 7;495(7439):
7: Thiele I, Palsson BØ. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc Jan;5(1): Epub 2010 Jan 7.

Thank you!

Tools for enrichment analysis
DAVID
BiNGO (Cytoscape app)
GSEA
GoMiner
GOstat