6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution
Lecture 13: Network Inference, Analysis, and Applications
Soheil Feizi
Module III: Regulation, Epigenomics, Networks

Computational foundations: machine learning, dealing with noisy data
- L10: Clustering/classification; supervised/unsupervised learning
- L11: Burrows-Wheeler transform; peak calling; multivariate HMMs
- L12: Expectation maximization (EM), Gibbs sampling, information theory
- L13: Network algorithms, probabilistic interpretation, linear algebra
- L14: Normalization, reproducibility, false discovery rate, integration

Biological frontiers: gene regulation, regulatory systems genomics
- L10: Gene expression analysis
- L11: Epigenomics and chromatin state
- L12: Regulatory motif discovery
- L13: Biological network inference and analysis
- L14: Integrative genomics and the ENCODE project
Goals for today: Network analysis

1. Introduction to networks
   - Network types: regulatory, metabolic, signaling, interaction, functional
   - Structural, probabilistic, and linear algebra views
2. Predictive networks (probabilistic graphical models)
   - Conditional independence for directed and undirected graphical models
   - Expression prediction: edge potentials, regression (linear, trees)
   - Function prediction: guilt by association, hierarchical classification
3. Network inference (inferring network structure)
   - Structure learning: likelihood approach, correlation
   - Integrative network inference: additive, probabilistic
4. Structural properties of regulatory networks
   - Global properties: degree distribution, scale-free
   - Local properties: network motifs
5. Matrix operations on networks
   - Linear algebra view of networks
   - Spectral clustering and modular networks
The multi-layered organization of information in living systems

[Figure: layered view of the cell. Genome: genes and cis-regulatory DNA elements. Epigenome: DNA, chromatin, histones. Transcriptome: mRNA, ncRNA, miRNA, piRNA. Proteome: transcription factors, signaling proteins, metabolic enzymes.]
Biological networks at all cellular levels

[Figure: from the genome through transcription (RNA), translation (proteins), modification, and dynamics, each level carries its own networks: transcriptional gene regulation, post-transcriptional regulation, protein and signaling networks, and metabolic networks.]
Five major types of biological networks

- Regulatory network: transcription factors (TFs) and their target genes (directed, signed, weighted)
- Metabolic network: enzymes and metabolites (undirected, weighted)
- Signaling network: receptors and signaling proteins (directed, unweighted)
- PPI, protein-protein interaction network: protein complexes (undirected, unweighted)
- Functional network (co-expression) (undirected, weighted)
Information exchange across networks

[Figure: the signaling network (receptors, signaling proteins) activates TFs; TFs form complexes in the regulatory network; the regulatory network transcribes enzymes for the metabolic network and proteins for the protein interaction network.]
Network definitions: structural, probabilistic

Two types of binary graphs: directed and undirected networks.
- Graph theory view: nodes, edges, weights, paths
- Probabilistic view: Bayesian networks, a model representing dependencies among variables; unconnected nodes are conditionally independent
- Linear algebra view: matrices, powers, decompositions
Network applications and challenges

1. Element identification: regulators, regulatory motifs, target genes (motif-finding lecture)
2. Using networks to predict cellular activity: predict expression levels and gene ontology (GO) functional annotation terms
3. Inferring networks from functional data: recover structure and function, e.g. X = f(A, B) and Y = g(B), from activity patterns
4. Network structure analysis: hubs (degree distribution), network motifs, functional modules
Regulatory network: input / output

[Figure: expression matrix over experimental conditions, with rows G1..G8 split into TFs (regulators) and non-TFs (targets).]

Gene expression prediction: it is intractable to compute the joint distribution over all genes, so we focus on marginal distributions.
Very large number of regulators / targets

A regulatory network limits the number of possible hypotheses: only directly related elements are connected, and other pairs of nodes are assumed to be conditionally independent.
Graphs represent variable dependencies

Consider a chain X1 - X2 - X3. X1 and X2 are dependent, and X2 and X3 are dependent. X1 and X3 are conditionally independent given X2: if we know the value of X2, they are independent. But if the value of X2 is not known, observing (or estimating) the value of X1 can influence our estimate of X2, which in turn can influence our estimate of X3. Some information does flow from X1 to X3 through X2: they are dependent. Formally, X1 and X3 are independent given X2: P(X1, X3 | X2) = P(X1 | X2) P(X3 | X2).
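The information-flow argument above can be checked numerically. A minimal sketch with simulated Gaussian variables (the chain coefficients and sample size are made up for illustration): X1 and X3 are clearly correlated marginally, but the correlation vanishes once X2 is regressed out.

```python
import numpy as np

# Toy chain X1 -> X2 -> X3: X1 and X3 are marginally dependent,
# but conditionally independent given X2.
rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # X2 depends on X1
x3 = 0.8 * x2 + rng.normal(size=n)   # X3 depends only on X2

# Marginal correlation: information flows X1 -> X2 -> X3
r13 = np.corrcoef(x1, x3)[0, 1]

# Condition on X2 by regressing it out: the residual correlation vanishes
res1 = x1 - np.polyval(np.polyfit(x2, x1, 1), x2)
res3 = x3 - np.polyval(np.polyfit(x2, x3, 1), x2)
r13_given_2 = np.corrcoef(res1, res3)[0, 1]

print(round(r13, 2), round(r13_given_2, 2))
```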
Probability tables vs. graphical models

[Figure: a full joint probability table over X1..X5 across conditions vs. a sparse graph over the same variables.]
Network structure determines sets of independent variables

Variables S1 and S3 are conditionally independent given S2 if removing S2 disconnects them. Graphical models represent the structure of the joint probability distribution: we can reason about the graph instead of reasoning about probability tables.
Directed graphs: asymmetry of conditional independence

[Figure: three orientations of a three-node graph, X1 -> X2 -> X3 vs. X1 <- X2 -> X3 vs. X1 -> X2 <- X3, which encode different independence statements.] Given its parents, a node is independent of the rest of the network.
Joint distribution from node/edge potentials (Markov random field)

By Bayes' rule, the joint over a chain X1 - X2 - X3 factorizes so that conditionally independent variables appear in separate terms. In general, for an undirected graph over X1, ..., Xn:

P(X1, ..., Xn) = (1/Z) prod_i phi(Xi) prod_{(i,j) in E} psi(Xi, Xj)

with a node potential phi for every node, an edge potential psi for every network edge, and a partition function Z (which typically cancels out). What about the form of the potential functions?
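A minimal sketch of this factorization on a binary chain X1 - X2 - X3. The potential tables are arbitrary made-up positive numbers; the code builds the joint from node and edge potentials, computes the partition function Z by brute force, and checks the conditional independence X1 and X3 given X2.

```python
import itertools
import numpy as np

# Node potentials phi_i(x) and edge potentials psi(x_i, x_j) for a
# binary chain X1 - X2 - X3 (hypothetical values, just to illustrate).
phi = np.array([[1.0, 2.0],   # phi_1
                [1.0, 1.0],   # phi_2
                [3.0, 1.0]])  # phi_3
psi12 = np.array([[4.0, 1.0], [1.0, 4.0]])  # psi(x1, x2): favors agreement
psi23 = np.array([[4.0, 1.0], [1.0, 4.0]])  # psi(x2, x3)

def unnorm(x1, x2, x3):
    return (phi[0, x1] * phi[1, x2] * phi[2, x3]
            * psi12[x1, x2] * psi23[x2, x3])

# Partition function Z normalizes the product of potentials
states = list(itertools.product([0, 1], repeat=3))
Z = sum(unnorm(*s) for s in states)
joint = {s: unnorm(*s) / Z for s in states}

def p(cond):
    # marginal probability of all states matching a partial assignment
    return sum(v for s, v in joint.items()
               if all(s[i] == x for i, x in cond.items()))

# Check X1 indep. X3 given X2: P(x1,x3|x2) = P(x1|x2) * P(x3|x2)
x2 = 0
lhs = p({0: 1, 1: x2, 2: 1}) / p({1: x2})
rhs = (p({0: 1, 1: x2}) / p({1: x2})) * (p({1: x2, 2: 1}) / p({1: x2}))
print(abs(lhs - rhs) < 1e-12)
```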
Types of potential functions

1. General (non-parametric) potentials
2. Parametric potentials: exponential functions (e.g. Rayleigh, Gaussian)
3. Gaussian potentials: general covariance; unit variance with only correlations ρ; or no covariance (independent)
Gaussian edge potential functions (Gaussian graphical models)

If X1, X2, and X3 are jointly Gaussian with μ = 0 and σ = 1, the edge potential functions simplify to correlations ρi,j: for the chain X1 - X2 - X3, the potentials Ψ(X1, X2) and Ψ(X2, X3) are determined by ρ1,2 and ρ2,3.
Prediction problem: calculate marginals

For the chain X1 - X2 - X3, predictions require only ratios of marginals, so the normalization term Z cancels out.
Assume a linear function from regulators to target (linear regression)

Assume the expression of a target is Gaussian, with mean a linear combination of the expression levels of its regulators. Use maximum likelihood over the data D to fit the model M: take derivatives to find the optimal model parameters. Over-fitting is a problem, so add regularization.
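A minimal sketch of this regularized maximum-likelihood fit as ridge regression in closed form. All data are simulated and the regularization strength is a made-up choice; in practice, rows would be experimental conditions and columns regulator expression levels.

```python
import numpy as np

# Simulate: a target gene driven linearly by two of three candidate
# regulators, plus Gaussian noise (all numbers are hypothetical).
rng = np.random.default_rng(1)
n_conditions, n_regulators = 200, 3
X = rng.normal(size=(n_conditions, n_regulators))      # regulators
true_w = np.array([1.5, -2.0, 0.0])                    # X3 is not a real regulator
y = X @ true_w + 0.1 * rng.normal(size=n_conditions)   # target expression

lam = 1.0  # regularization strength (hypothetical)
# Ridge solution: w = (X^T X + lam I)^{-1} X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(n_regulators), X.T @ y)
print(np.round(w, 1))
```

The L2 penalty shrinks the estimated weights slightly toward zero, which is what guards against over-fitting when the number of regulators is large relative to the number of conditions.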
Predicting expression using regression trees

[Figure: a decision tree over regulator levels, e.g. test X1 > e1; if no, activating regulation; if yes, test X2 > e2, leading to activating or repressing regulation. Each path captures a mode of regulation.]

- Expression of the target is modeled with a Gaussian at each leaf node
- Assumes variables are continuous; arranges regulators in a tree
- Expression prediction follows a set of decision rules
- Can model combinatorial regulation and non-linear dependencies between regulators and target
- Targets can share regulatory programs
Predicting functions of un-annotated genes

Goal: predict the function of unannotated genes by "guilt by association". Different types of association can be used: co-expression, protein interactions, co-regulation. Most approaches, however, work with functional networks. (Deng et al. 2003; Sharan et al. 2007)
Iterative classification algorithm

Labels of unknown genes influence each other, so they need to be inferred jointly. Start with an initial assignment of labels, then repeat iteratively: update the relational attributes, and re-infer the labels. (Neville 2003; Getoor 2005)
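A minimal sketch of joint label inference by guilt-by-association propagation, a simplified stand-in for iterative classification. The graph, the known labels, and the iteration count are all made up: nodes 0 and 1 are known to have the function, node 5 is known not to, and the labels of nodes 2-4 are inferred from their neighbors.

```python
import numpy as np

# Adjacency matrix of a small functional network (hypothetical)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
known = {0: 1.0, 1: 1.0, 5: 0.0}  # annotated genes

scores = np.full(6, 0.5)           # unknown genes start undecided
for node, label in known.items():
    scores[node] = label

for _ in range(50):                # iterate until scores stabilize
    # each gene takes the mean score of its network neighbors
    new = A @ scores / A.sum(axis=1)
    for node, label in known.items():  # clamp the known genes
        new[node] = label
    scores = new

labels = (scores > 0.5).astype(int)
print(labels)
```

Genes near the annotated positives end up above the 0.5 threshold, illustrating how labels are inferred jointly rather than one gene at a time.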
Likelihood approach to infer network structure

Assign a likelihood score to each candidate structure, then pick the best one.
Structure learning needs search

Search over candidate graphs and keep the best one, i.e. the graph with maximum likelihood.
Likelihood approach to infer network structure: challenges

Problems: there are exponentially many structures, and the likelihood is unable to discriminate between direct and indirect links (indistinguishable structures).
Solution 1: correlation-based inference methods

Only consider structures whose observed edge weights are high, then perform the maximum-likelihood test among far fewer structures.
Issues with correlation-based inference methods

Problems: many false positive and false negative edges; observed edge weights may differ from true edge weights; indirect effects produce transitive edges.
Indirect information flows cause transitive edges

Transitive edges are due to information flowing over indirect paths. The ARACNE solution: in each triplet, exclude the edge with the lowest mutual information, using the information inequality. For a chain X1 - X2 - X3, if I(X2, X3) < min(I(X1, X2), I(X1, X3)), the edge X2 - X3 is likely indirect and is removed. An alternative approach: network deconvolution.
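A minimal sketch of this triplet-pruning rule. The mutual-information matrix is made up, with the weak entry I(X1, X3) = 0.5 playing the role of the transitive edge; the code removes the weakest edge of every triangle, as ARACNE's data-processing-inequality step does.

```python
import itertools

import numpy as np

# Hypothetical pairwise mutual-information matrix over X1, X2, X3
# (not computed from real expression data)
mi = np.array([[0.0, 0.9, 0.5],
               [0.9, 0.0, 0.8],
               [0.5, 0.8, 0.0]])  # I(X1,X3)=0.5 is the transitive edge

edges = {(i, j) for i in range(3) for j in range(i + 1, 3)}
to_remove = set()
for i, j, k in itertools.combinations(range(3), 3):
    # the weakest edge in a triangle violates the information inequality
    triple = [((i, j), mi[i, j]), ((i, k), mi[i, k]), ((j, k), mi[j, k])]
    weakest = min(triple, key=lambda e: e[1])
    to_remove.add(weakest[0])

pruned = sorted(edges - to_remove)
print(pruned)  # the direct edges that survive pruning
```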
Integrative network inference: diverse data types (ChIP, motifs, chromatin)

ChIP experiments: cross-link proteins to DNA and cut it into pieces, use an antibody to filter for a specific protein, then sequence the bound pieces and map them back to the genome.
Integrated approach to infer regulatory networks (Solution 3 = Solution 1 + Solution 2)

Combine inferred regulatory networks from many data types.
Network integration: problem setup

Given networks N1 = (V1, E1), N2 = (V2, E2), N3 = (V3, E3) inferred from different data types, can we simply add their edge weights? Yes, under two assumptions: the input networks are independent, and the weights represent log-likelihoods.
Likelihood approach to integrate weighted networks

By Bayes' rule, and under the independence assumption, the likelihood of an edge given all data types factorizes into a product over data types, so the per-network log-likelihood edge weights simply add.
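A minimal sketch of this additive integration. The three weight matrices and their values are made up; the only point is that, with independent data types and log-likelihood weights, integration reduces to a matrix sum.

```python
import numpy as np

# Hypothetical log-likelihood edge weights from three data types
# for a two-gene network
w_expr  = np.array([[0.0, 2.0], [2.0, 0.0]])    # co-expression evidence
w_chip  = np.array([[0.0, 1.5], [1.5, 0.0]])    # ChIP binding evidence
w_motif = np.array([[0.0, -0.5], [-0.5, 0.0]])  # motif evidence (against)

# Independence assumption: log-likelihoods add across data types
w_combined = w_expr + w_chip + w_motif
print(w_combined[0, 1])  # integrated weight for edge (0, 1)
```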
Structural properties of regulatory networks

Regulatory networks have a scale-free degree distribution. "Scale-free" means the graph is self-similar at all scales: the degree distribution follows a power law, P(d) ~ d^(-γ). This implies the presence of hubs, and hub perturbations are often lethal. [Figure: in-degree distribution of the E. coli regulatory network; adapted from Albert 2005.]
Why scale-free distributions matter

The presence of hubs makes the network robust to random perturbations and preserves overall connectivity, but perturbing a hub is often lethal for the organism.
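Hub formation can be illustrated by growing a network with preferential attachment (in the style of the Barabasi-Albert model, which is one standard generator of scale-free graphs; the size and seed here are arbitrary). Each new node links to an existing node with probability proportional to its degree, so early, well-connected nodes become hubs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
degree = np.zeros(n, dtype=int)
targets = [0, 1]            # each node appears once per unit of degree
degree[0] = degree[1] = 1   # start from a single edge 0-1

for new in range(2, n):
    # picking uniformly from `targets` is a degree-proportional choice
    old = targets[rng.integers(len(targets))]
    degree[new] += 1
    degree[old] += 1
    targets.extend([new, old])

# Hubs: the maximum degree far exceeds the mean degree (about 2)
print(degree.mean(), degree.max())
```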
Structural network motifs

Common motifs in regulatory networks: auto-regulation, multi-component loops, feed-forward loops, single-input modules, multi-input modules, and regulatory chains. Feed-forward loops are involved in speeding up the response of target genes. (Lee et al. 2002; Mangan & Alon 2003)
Modularity of regulatory networks

A modular graph has densely connected subgraphs. Genes within a module tend to be involved in similar functions and to be co-regulated. Modules can be identified using graph partitioning algorithms: the Markov clustering algorithm, the Girvan-Newman algorithm, or spectral partitioning. (Newman, PNAS 2007)
An algebraic view of networks

A network has a matrix representation: an unweighted network gives a binary adjacency matrix, a weighted network a real-valued matrix. The adjacency matrix A has Aij = 1 if nodes i and j are connected. The Laplacian matrix is L = K - A, where K is the diagonal matrix of node degrees k1, ..., kn: diagonal entries Lii = ki, off-diagonal entries Lij = -Aij.
The Laplacian matrix counts the edges between groups

Encode a two-way partition as a vector s with si = 1 if node i is in group 1 and si = -1 if node i is in group 2. The i-th component of L s is degree(i) - (# neighbors of i in group 1) + (# neighbors of i in group 2), combining the diagonal and off-diagonal terms of the Laplacian. Summing si (L s)i over all nodes gives sᵀ L s = 4 × (# edges going between the groups).
The Laplacian matrix counts the edges between groups (continued)

With si = ±1 as above, the number of edges between groups equals (1/4) sᵀ L s, so we choose the vector s to minimize this error term. A trivial solution exists: if s = (1, 1, ..., 1), the error is zero, but every node lands in one group. A useful non-trivial solution takes s parallel to the second eigenvector of L (why? The all-ones vector is the first eigenvector, with eigenvalue zero, and the second eigenvector minimizes sᵀ L s among vectors orthogonal to it).
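The identity (1/4) sᵀ L s = (# edges between groups) can be checked directly. The graph below is a made-up example: two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3), so the natural partition cuts exactly one edge.

```python
import numpy as np

# Adjacency matrix of two triangles joined by edge (2, 3)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A   # Laplacian: degrees minus adjacency

s = np.array([1, 1, 1, -1, -1, -1])  # group 1 = {0,1,2}, group 2 = {3,4,5}
cut = s @ L @ s / 4
print(cut)  # number of edges crossing the partition
```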
Eigendecomposition principle: introduction

Suppose L is a square symmetric matrix. Its eigendecomposition is L = U Λ Uᵀ, where U contains the eigenvectors as columns and Λ is a diagonal matrix of eigenvalues. For symmetric matrices, the eigenvalues are real.
Network modularization via decomposition of the Laplacian matrix

Use the eigendecomposition principle: project s onto the eigenvectors of L, s = Σi ai ui. Challenges in finding the optimal ai: without additional conditions a trivial solution exists (the all-ones vector), so the second eigenvector characterizes the partitioning; and since s should be integer-valued (±1), the real-valued projection must be rounded.
Network modularization: example

[Figure: an 8-node example network with its adjacency matrix A and Laplacian matrix L.]
Eigendecomposition: example

[Figure: eigenvector matrix U for the 8-node example.] The first eigenvalue of the Laplacian matrix is always zero (its eigenvector is the all-ones vector). The second eigenvector of the Laplacian characterizes a network partition: the signs of its entries split the nodes into two groups.
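The whole spectral partitioning pipeline fits in a few lines. The graph is a made-up example (two triangles joined by one edge, standing in for the 8-node slide example): the first Laplacian eigenvalue comes out zero, and the sign pattern of the second eigenvector (the Fiedler vector) recovers the two modules.

```python
import numpy as np

# Adjacency matrix: two triangles {0,1,2} and {3,4,5} joined by edge (2,3)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
fiedler = eigvecs[:, 1]               # second eigenvector of L

groups = np.where(fiedler > 0, 1, 2)  # round to an integer labeling
print(groups)
```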
Conclusions

Regulatory networks are central to gaining a systems-level understanding of living systems. Both structural and functional aspects of these networks are largely unknown. Probabilistic models provide a mathematical framework for representing and learning regulatory networks.
Open issues

- Validation: how do we know the network structure is right? How do we know the network function is right?
- Measuring and modeling protein expression
- Understanding the evolution of regulatory networks
Further reading

- Probabilistic graphical models
- Network structure analysis
- Function prediction
Predicting expression

Goal: learn a parametric relationship between regulators and a target gene. Use the "regulation function" of every target gene as a predictive model. Predicting the expression of multiple genes is then essentially equivalent to solving a set of regression problems, one per target.
Modeling the regulatory functions

- Conditional Gaussian models: linear regression
- Regression trees: non-linear regression
Rules for conditional independence
Hierarchy of more complex models

[Figure: from Lei Zhang, RPI.]
Spectral partitioning: problem setup

Goal: split the nodes into group 1 and group 2 so as to minimize the number of edges between the groups, where (# edges between groups) = (total # of edges) - (# edges within groups). Encode: nodes i and j are connected => Aij = 1; node i in group 1 => si = 1; node i in group 2 => si = -1. Then nodes i and j in the same group => (si sj + 1)/2 = 1, and nodes i and j in different groups => (si sj + 1)/2 = 0.
The Laplacian matrix plays a major role in network modularization

With si = ±1 as above, (# edges between groups) = (total # of edges) - (# edges within groups) can be written in terms of the Laplacian matrix L = K - A, whose diagonal holds the degrees k1, ..., kn and whose off-diagonal entries are -Aij.