6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution
Lecture 13: Network Inference, Analysis, and Applications
Soheil Feizi
Module III: Regulation, Epigenomics, Networks

Computational foundations: machine learning, dealing with noisy data
- L10: Clustering/classification; supervised/unsupervised learning
- L11: Burrows-Wheeler transform; peak calling; multivariate HMMs
- L12: Expectation maximization (EM), Gibbs sampling, information theory
- L13: Network algorithms, probabilistic interpretation, linear algebra
- L14: Normalization, reproducibility, false discovery rate, integration

Biological frontiers: gene regulation, regulatory systems genomics
- L10: Gene expression analysis
- L11: Epigenomics and chromatin state
- L12: Regulatory motif discovery
- L13: Biological network inference and analysis
- L14: Integrative genomics and the ENCODE project
Goals for today: Network analysis

1. Introduction to networks
   - Network types: regulatory, metabolic, signaling, interaction, functional
   - Structural, probabilistic, and linear algebra views
2. Predictive networks (probabilistic graphical models)
   - Conditional independence for directed and undirected graphical models
   - Expression prediction: edge potentials, regression (linear, trees)
   - Function prediction: guilt by association, hierarchical classification
3. Network inference (inferring network structure)
   - Structure learning: likelihood approach, correlation
   - Integrative network inference: additive, probabilistic
4. Structural properties of regulatory networks
   - Global properties: degree distribution, scale-free
   - Local properties: network motifs
5. Matrix operations on networks
   - Linear algebra view of networks
   - Spectral clustering and modular networks
The multi-layered organization of information in living systems

[Figure: layered view of the cell. Genome: genes and cis-regulatory DNA elements. Epigenome: DNA, chromatin, histones. Transcriptome: mRNA, ncRNA, miRNA, piRNA. Proteome: transcription factors, signaling proteins, metabolic enzymes.]
Biological networks at all cellular levels

[Figure: from the genome through transcription (RNA), translation (proteins), modification, and dynamics, each level carries its own networks: transcriptional gene regulation, post-transcriptional regulation, protein and signaling networks, and metabolic networks.]
Five major types of biological networks

- Regulatory network: transcription factors (TFs) and their target genes (directed, signed, weighted)
- Metabolic network: enzymes and metabolites (undirected, weighted)
- Signaling network: receptors and signaling proteins (directed, unweighted)
- PPI, protein-protein interaction network: protein complexes (undirected, unweighted)
- Functional network (co-expression) (undirected, weighted)
Information exchange across networks

[Figure: the signaling network (receptors, signaling proteins) activates TFs; TFs form complexes in the regulatory network; the regulatory network transcribes enzymes for the metabolic network and proteins for the protein interaction network.]
Network definitions: structural, probabilistic

Two types of binary graphs: directed and undirected networks.
- Graph theory view: nodes, edges, weights, paths
- Probabilistic view: Bayesian networks, a model representing dependencies among variables; unconnected nodes are conditionally independent
- Linear algebra view: matrices, powers, decompositions
Network applications and challenges

1. Element identification: regulators, regulatory motifs, target genes (motif-finding lecture)
2. Using networks to predict cellular activity: predict expression levels and gene ontology (GO) functional annotation terms
3. Inferring networks from functional data: recover structure and function, e.g. X = f(A, B) and Y = g(B), from activity patterns
4. Network structure analysis: hubs (degree distribution), network motifs, functional modules
Regulatory network: input / output

[Figure: expression matrix over experimental conditions, with rows G1..G8 split into TFs (regulators) and non-TFs (targets).]

Gene expression prediction: it is intractable to compute the joint distribution over all genes, so we focus on marginal distributions.
Very large number of regulators / targets

A regulatory network limits the number of possible hypotheses: only directly related elements are connected, and other pairs of nodes are assumed to be conditionally independent.
Graphs represent variable dependencies

Consider a chain X1 - X2 - X3. X1 and X2 are dependent, and X2 and X3 are dependent. X1 and X3 are conditionally independent given X2: if we know the value of X2, they are independent. But if the value of X2 is not known, observing (or estimating) the value of X1 can influence our estimate of X2, which in turn can influence our estimate of X3. Some information does flow from X1 to X3 through X2: they are dependent. Formally, X1 and X3 are independent given X2: P(X1, X3 | X2) = P(X1 | X2) P(X3 | X2).
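The information-flow argument above can be checked numerically. A minimal sketch with simulated Gaussian variables (the chain coefficients and sample size are made up for illustration): X1 and X3 are clearly correlated marginally, but the correlation vanishes once X2 is regressed out.

```python
import numpy as np

# Toy chain X1 -> X2 -> X3: X1 and X3 are marginally dependent,
# but conditionally independent given X2.
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # X2 depends on X1
x3 = 0.8 * x2 + rng.normal(size=n)   # X3 depends only on X2

# Marginal correlation: information flows X1 -> X2 -> X3
r13 = np.corrcoef(x1, x3)[0, 1]

# Condition on X2 by regressing it out: the residual correlation vanishes
res1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
res3 = x3 - np.polyval(np.polyfit(x2, x3, 1), x2)
r13_given_2 = np.corrcoef(res1, res3)[0, 1]

print(round(r13, 2), round(r13_given_2, 2))
```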
Probability tables vs. graphical models

[Figure: a full joint probability table over X1..X5 across conditions vs. a sparse graph over the same variables.]
Network structure determines sets of independent variables

Variables S1 and S3 are conditionally independent given S2 if removing S2 disconnects them. Graphical models represent the structure of the joint probability distribution: we can reason about the graph instead of reasoning about probability tables.
Directed graphs: asymmetry of conditional independence

[Figure: three orientations of a three-node graph, X1 -> X2 -> X3 vs. X1 <- X2 -> X3 vs. X1 -> X2 <- X3, which encode different independence statements.] Given its parents, a node is independent of the rest of the network.
Joint distribution from node/edge potentials (Markov random field)

By Bayes' rule, the joint over a chain X1 - X2 - X3 factorizes so that conditionally independent variables appear in separate terms. In general, for an undirected graph over X1, ..., Xn:

P(X1, ..., Xn) = (1/Z) prod_i phi(Xi) prod_{(i,j) in E} psi(Xi, Xj)

with a node potential phi for every node, an edge potential psi for every network edge, and a partition function Z (which typically cancels out). What about the form of the potential functions?
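A minimal sketch of this factorization on a binary chain X1 - X2 - X3. The potential tables are arbitrary made-up positive numbers; the code builds the joint from node and edge potentials, computes the partition function Z by brute force, and checks the conditional independence X1 and X3 given X2.

```python
import itertools
import numpy as np

# Node potentials phi_i(x) and edge potentials psi(x_i, x_j) for a
# binary chain X1 - X2 - X3 (hypothetical values, just to illustrate).
phi = np.array([[1.0, 2.0],   # phi_1
                [1.0, 1.0],   # phi_2
                [3.0, 1.0]])  # phi_3
psi12 = np.array([[4.0, 1.0], [1.0, 4.0]])  # psi(x1, x2): favors agreement
psi23 = np.array([[4.0, 1.0], [1.0, 4.0]])  # psi(x2, x3)

def unnorm(x1, x2, x3):
    return (phi[0, x1] * phi[1, x2] * phi[2, x3]
            * psi12[x1, x2] * psi23[x2, x3])

# Partition function Z normalizes the product of potentials
states = list(itertools.product([0, 1], repeat=3))
Z = sum(unnorm(*s) for s in states)
joint = {s: unnorm(*s) / Z for s in states}

def p(cond):
    # marginal probability of all states matching a partial assignment
    return sum(v for s, v in joint.items()
               if all(s[i] == x for i, x in cond.items()))

# Check X1 indep. X3 given X2: P(x1,x3|x2) = P(x1|x2) * P(x3|x2)
x2 = 0
lhs = p({0: 1, 1: x2, 2: 1}) / p({1: x2})
rhs = (p({0: 1, 1: x2}) / p({1: x2})) * (p({1: x2, 2: 1}) / p({1: x2}))
print(abs(lhs - rhs) < 1e-12)
```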
Types of potential functions

1. General (non-parametric) potentials
2. Parametric potentials: exponential functions (e.g. Rayleigh, Gaussian)
3. Gaussian potentials: general covariance; unit variance with only correlations ρ; or no covariance (independent)
Gaussian edge potential functions (Gaussian graphical models)

If X1, X2, and X3 are jointly Gaussian with μ = 0 and σ = 1, the edge potential functions simplify to correlations ρi,j: for the chain X1 - X2 - X3, the potentials Ψ(X1, X2) and Ψ(X2, X3) are determined by ρ1,2 and ρ2,3.
Prediction problem: calculate marginals

For the chain X1 - X2 - X3, predictions require only ratios of marginals, so the normalization term Z cancels out.
Assume a linear function from regulators to target (linear regression)

Assume the expression of a target is Gaussian, with mean a linear combination of the expression levels of its regulators. Use maximum likelihood over the data D to fit the model M: take derivatives to find the optimal model parameters. Over-fitting is a problem, so add regularization.
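A minimal sketch of this regularized maximum-likelihood fit as ridge regression in closed form. All data are simulated and the regularization strength is a made-up choice; in practice, rows would be experimental conditions and columns regulator expression levels.

```python
import numpy as np

# Simulate: a target gene driven linearly by two of three candidate
# regulators, plus Gaussian noise (all numbers are hypothetical).
rng = np.random.default_rng(1)
n_conditions, n_regulators = 200, 3
X = rng.normal(size=(n_conditions, n_regulators))      # regulators
true_w = np.array([1.5, -2.0, 0.0])                    # X3 is not a real regulator
y = X @ true_w + 0.1 * rng.normal(size=n_conditions)   # target expression

lam = 1.0  # regularization strength (hypothetical)
# Ridge solution: w = (X^T X + lam I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(n_regulators), X.T @ y)
print(np.round(w, 1))
```

The L2 penalty shrinks the estimated weights slightly toward zero, which is what guards against over-fitting when the number of regulators is large relative to the number of conditions.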
Predicting expression using regression trees

[Figure: a decision tree over regulator levels, e.g. test X1 > e1; if no, activating regulation; if yes, test X2 > e2, leading to activating or repressing regulation. Each path captures a mode of regulation.]

- Expression of the target is modeled with a Gaussian at each leaf node
- Assumes variables are continuous; arranges regulators in a tree
- Expression prediction follows a set of decision rules
- Can model combinatorial regulation and non-linear dependencies between regulators and target
- Targets can share regulatory programs
Predicting functions of un-annotated genes

Goal: predict the function of unannotated genes by "guilt by association". Different types of association can be used: co-expression, protein interactions, co-regulation. Most approaches, however, work with functional networks. (Deng et al. 2003; Sharan et al. 2007)
Iterative classification algorithm

Labels of unknown genes influence each other, so they need to be inferred jointly. Start with an initial assignment of labels, then repeat iteratively: update the relational attributes, and re-infer the labels. (Neville 2003; Getoor 2005)
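A minimal sketch of joint label inference by guilt-by-association propagation, a simplified stand-in for iterative classification. The graph, the known labels, and the iteration count are all made up: nodes 0 and 1 are known to have the function, node 5 is known not to, and the labels of nodes 2-4 are inferred from their neighbors.

```python
import numpy as np

# Adjacency matrix of a small functional network (hypothetical)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
known = {0: 1.0, 1: 1.0, 5: 0.0}  # annotated genes

scores = np.full(6, 0.5)           # unknown genes start undecided
for node, label in known.items():
    scores[node] = label

for _ in range(50):                # iterate until scores stabilize
    # each gene takes the mean score of its network neighbors
    new = A @ scores / A.sum(axis=1)
    for node, label in known.items():  # clamp the known genes
        new[node] = label
    scores = new

labels = (scores > 0.5).astype(int)
print(labels)
```

Genes near the annotated positives end up above the 0.5 threshold, illustrating how labels are inferred jointly rather than one gene at a time.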
Likelihood approach to infer network structure

Assign a likelihood score to each candidate structure, then pick the best one.
Structure learning needs search

Search over candidate graphs and keep the best one, i.e. the graph with maximum likelihood.
Likelihood approach to infer network structure: challenges

Problems: there are exponentially many structures, and the likelihood is unable to discriminate between direct and indirect links (indistinguishable structures).
Solution 1: correlation-based inference methods

Only consider structures whose observed edge weights are high, then perform the maximum-likelihood test among far fewer structures.
Issues with correlation-based inference methods

Problems: many false positive and false negative edges; observed edge weights may differ from true edge weights; indirect effects produce transitive edges.
Indirect information flows cause transitive edges

Transitive edges are due to information flowing over indirect paths. The ARACNE solution: in each triplet, exclude the edge with the lowest mutual information, using the information inequality. For a chain X1 - X2 - X3, if I(X2, X3) < min(I(X1, X2), I(X1, X3)), the edge X2 - X3 is likely indirect and is removed. An alternative approach: network deconvolution.
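A minimal sketch of this triplet-pruning rule. The mutual-information matrix is made up, with the weak entry I(X1, X3) = 0.5 playing the role of the transitive edge; the code removes the weakest edge of every triangle, as ARACNE's data-processing-inequality step does.

```python
import itertools

import numpy as np

# Hypothetical pairwise mutual-information matrix over X1, X2, X3
# (not computed from real expression data)
mi = np.array([[0.0, 0.9, 0.5],
               [0.9, 0.0, 0.8],
               [0.5, 0.8, 0.0]])  # I(X1,X3)=0.5 is the transitive edge

edges = {(i, j) for i in range(3) for j in range(i + 1, 3)}
to_remove = set()
for i, j, k in itertools.combinations(range(3), 3):
    # the weakest edge in a triangle violates the information inequality
    triple = [((i, j), mi[i, j]), ((i, k), mi[i, k]), ((j, k), mi[j, k])]
    weakest = min(triple, key=lambda e: e[1])
    to_remove.add(weakest[0])

pruned = sorted(edges - to_remove)
print(pruned)  # the direct edges that survive pruning
```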
Integrative network inference: diverse data types (ChIP, motifs, chromatin)

ChIP experiments: cross-link proteins to DNA and cut it into pieces, use an antibody to filter for a specific protein, then sequence the bound pieces and map them back to the genome.
Integrated approach to infer regulatory networks (Solution 3 = Solution 1 + Solution 2)

Combine inferred regulatory networks from many data types.
Network integration: problem setup

Given networks N1 = (V1, E1), N2 = (V2, E2), N3 = (V3, E3) inferred from different data types, can we simply add their edge weights? Yes, under two assumptions: the input networks are independent, and the weights represent log-likelihoods.
Likelihood approach to integrate weighted networks

By Bayes' rule, and under the independence assumption, the likelihood of an edge given all data types factorizes into a product over data types, so the per-network log-likelihood edge weights simply add.
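A minimal sketch of this additive integration. The three weight matrices and their values are made up; the only point is that, with independent data types and log-likelihood weights, integration reduces to a matrix sum.

```python
import numpy as np

# Hypothetical log-likelihood edge weights from three data types
# for a two-gene network
w_expr  = np.array([[0.0, 2.0], [2.0, 0.0]])    # co-expression evidence
w_chip  = np.array([[0.0, 1.5], [1.5, 0.0]])    # ChIP binding evidence
w_motif = np.array([[0.0, -0.5], [-0.5, 0.0]])  # motif evidence (against)

# Independence assumption: log-likelihoods add across data types
w_combined = w_expr + w_chip + w_motif
print(w_combined[0, 1])  # integrated weight for edge (0, 1)
```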
Structural properties of regulatory networks

Regulatory networks have a scale-free degree distribution. "Scale-free" means the graph is self-similar at all scales: the degree distribution follows a power law, P(d) ~ d^(-γ). This implies the presence of hubs, and hub perturbations are often lethal. [Figure: in-degree distribution of the E. coli regulatory network; adapted from Albert 2005.]
Why scale-free distributions matter

The presence of hubs makes the network robust to random perturbations and preserves overall connectivity, but perturbing a hub is often lethal for the organism.
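Hub formation can be illustrated by growing a network with preferential attachment (in the style of the Barabasi-Albert model, which is one standard generator of scale-free graphs; the size and seed here are arbitrary). Each new node links to an existing node with probability proportional to its degree, so early, well-connected nodes become hubs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
degree = np.zeros(n, dtype=int)
targets = [0, 1]            # each node appears once per unit of degree
degree[0] = degree[1] = 1   # start from a single edge 0-1

for new in range(2, n):
    # picking uniformly from `targets` is a degree-proportional choice
    old = targets[rng.integers(len(targets))]
    degree[new] += 1
    degree[old] += 1
    targets.extend([new, old])

# Hubs: the maximum degree far exceeds the mean degree (about 2)
print(degree.mean(), degree.max())
```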
Structural network motifs

Common motifs in regulatory networks: auto-regulation, multi-component loops, feed-forward loops, single-input modules, multi-input modules, and regulatory chains. Feed-forward loops are involved in speeding up the response of target genes. (Lee et al. 2002; Mangan & Alon 2003)
Modularity of regulatory networks

A modular graph has densely connected subgraphs. Genes within a module tend to be involved in similar functions and to be co-regulated. Modules can be identified using graph partitioning algorithms: the Markov clustering algorithm, the Girvan-Newman algorithm, or spectral partitioning. (Newman, PNAS 2007)
An algebraic view of networks

A network has a matrix representation: an unweighted network gives a binary adjacency matrix, a weighted network a real-valued matrix. The adjacency matrix A has Aij = 1 if nodes i and j are connected. The Laplacian matrix is L = K - A, where K is the diagonal matrix of node degrees k1, ..., kn: diagonal entries Lii = ki, off-diagonal entries Lij = -Aij.
The Laplacian matrix counts the edges between groups

Encode a two-way partition as a vector s with si = 1 if node i is in group 1 and si = -1 if node i is in group 2. The i-th component of L s is degree(i) - (# neighbors of i in group 1) + (# neighbors of i in group 2), combining the diagonal and off-diagonal terms of the Laplacian. Summing si (L s)i over all nodes gives sᵀ L s = 4 × (# edges going between the groups).
The Laplacian matrix counts the edges between groups (continued)

With si = ±1 as above, the number of edges between groups equals (1/4) sᵀ L s, so we choose the vector s to minimize this error term. A trivial solution exists: if s = (1, 1, ..., 1), the error is zero, but every node lands in one group. A useful non-trivial solution takes s parallel to the second eigenvector of L (why? The all-ones vector is the first eigenvector, with eigenvalue zero, and the second eigenvector minimizes sᵀ L s among vectors orthogonal to it).
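The identity (1/4) sᵀ L s = (# edges between groups) can be checked directly. The graph below is a made-up example: two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3), so the natural partition cuts exactly one edge.

```python
import numpy as np

# Adjacency matrix of two triangles joined by edge (2, 3)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A   # Laplacian: degrees minus adjacency

s = np.array([1, 1, 1, -1, -1, -1])  # group 1 = {0,1,2}, group 2 = {3,4,5}
cut = s @ L @ s / 4
print(cut)  # number of edges crossing the partition
```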
Eigendecomposition principle: introduction

Suppose L is a square symmetric matrix. Its eigendecomposition is L = U Λ Uᵀ, where U contains the eigenvectors as columns and Λ is a diagonal matrix of eigenvalues. For symmetric matrices, the eigenvalues are real.
Network modularization via decomposition of the Laplacian matrix

Use the eigendecomposition principle: project s onto the eigenvectors of L, s = Σi ai ui. Challenges in finding the optimal ai: without additional conditions a trivial solution exists (the all-ones vector), so the second eigenvector characterizes the partitioning; and since s should be integer-valued (±1), the real-valued projection must be rounded.
Network modularization: example

[Figure: an 8-node example network with its adjacency matrix A and Laplacian matrix L.]
Eigendecomposition: example

[Figure: eigenvector matrix U for the 8-node example.] The first eigenvalue of the Laplacian matrix is always zero (its eigenvector is the all-ones vector). The second eigenvector of the Laplacian characterizes a network partition: the signs of its entries split the nodes into two groups.
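The whole spectral partitioning pipeline fits in a few lines. The graph is a made-up example (two triangles joined by one edge, standing in for the 8-node slide example): the first Laplacian eigenvalue comes out zero, and the sign pattern of the second eigenvector (the Fiedler vector) recovers the two modules.

```python
import numpy as np

# Adjacency matrix: two triangles {0,1,2} and {3,4,5} joined by edge (2,3)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
fiedler = eigvecs[:, 1]               # second eigenvector of L

groups = np.where(fiedler > 0, 1, 2)  # round to an integer labeling
print(groups)
```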
Conclusions

Regulatory networks are central to gaining a systems-level understanding of living systems. Both structural and functional aspects of these networks are largely unknown. Probabilistic models provide a mathematical framework for representing and learning regulatory networks.
Open issues

- Validation: how do we know the network structure is right? How do we know the network function is right?
- Measuring and modeling protein expression
- Understanding the evolution of regulatory networks
Further reading

- Probabilistic graphical models
- Network structure analysis
- Function prediction
Predicting expression

Goal: learn a parametric relationship between regulators and a target gene. Use the "regulation function" of every target gene as a predictive model. Predicting the expression of multiple genes is then essentially equivalent to solving a set of regression problems, one per target.
Modeling the regulatory functions

- Conditional Gaussian models: linear regression
- Regression trees: non-linear regression
Rules for conditional independence
Hierarchy of more complex models

[Figure: from Lei Zhang, RPI.]
Spectral partitioning: problem setup

Goal: split the nodes into group 1 and group 2 so as to minimize the number of edges between the groups, where (# edges between groups) = (total # of edges) - (# edges within groups). Encode: nodes i and j are connected => Aij = 1; node i in group 1 => si = 1; node i in group 2 => si = -1. Then nodes i and j in the same group => (si sj + 1)/2 = 1, and nodes i and j in different groups => (si sj + 1)/2 = 0.
The Laplacian matrix plays a major role in network modularization

With si = ±1 as above, (# edges between groups) = (total # of edges) - (# edges within groups) can be written in terms of the Laplacian matrix L = K - A, whose diagonal holds the degrees k1, ..., kn and whose off-diagonal entries are -Aij.