A Sparsification Approach for Temporal Graphical Model Decomposition
Ning Ruan, Kent State University
Joint work with Ruoming Jin (KSU), Victor Lee (KSU), and Kun Huang (OSU)

Motivation: Financial Markets

Motivation: Biological Systems
[Figures: microarray time-series expression profiles (fluorescence counts over time) and a protein-protein interaction network]

Vector Autoregression
Univariate autoregression is self-regression for a time series; VAR is the multivariate extension of autoregression.
[Timeline figure: t = 0, 1, 2, 3, 4, ..., T]
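For reference, the standard order-d VAR model, written consistently with the Φ(u) notation used on later slides:

    X_t = \sum_{u=1}^{d} \Phi(u)\, X_{t-u} + \varepsilon(t)

where X_t is the N-vector collecting all series at time t, each Φ(u) is an N×N lag-u coefficient matrix, and ε(t) is a noise term.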

Granger Causality
Goal: reveal the causal relationship between two univariate time series.
– Y is Granger-causal for X at time t if X_{t-1} and Y_{t-1} together are a better predictor of X_t than X_{t-1} alone.
– i.e., compare the magnitudes of the prediction errors ε(t) vs. ε′(t) of the two regressions.
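As a concrete illustration, a minimal sketch of this two-regression comparison using statsmodels' Granger test (the series x and y below are synthetic stand-ins, not data from the talk):

    # Minimal sketch: does Y Granger-cause X? statsmodels fits both regressions
    # (past of X alone vs. past of X and Y) and compares them with an F-test.
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(0)
    y = rng.standard_normal(200)
    x = np.roll(y, 1) + 0.1 * rng.standard_normal(200)   # x follows y with lag 1

    # Column order matters: the test asks whether the SECOND column helps
    # predict the first beyond the first column's own past.
    results = grangercausalitytests(np.column_stack([x, y]), maxlag=2)
    f_stat, p_value = results[1][0]["ssr_ftest"][:2]
    print(f_stat, p_value)   # a small p-value supports "Y Granger-causes X"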

Temporal Graphical Modeling
Recover the causal structure among a group of relevant time series.
[Figure: eight time series X_1, ..., X_8 mapped to a temporal graphical model with weighted edges such as Φ_12]

The Problem
Given a temporal graphical model, can we decompose it to get a simpler global view of the interactions among the relevant time series? And how should we interpret these causal relationships?

Extra Benefit
Consider time series clustering from a new perspective: rather than clustering based on similarity, group series according to the decomposed causal structure.
[Figure: the eight series X_1, ..., X_8 partitioned by similarity-based clustering vs. by graphical-model decomposition]

Clustered Regression Coefficient Matrix
Vector autoregression model: each Φ(u) is an N×N coefficient matrix.
The coefficient matrix is clustered if:
1) if Φ(u)_{ij} ≠ 0, then time series i and j are in the same cluster;
2) if time series i and j are not in the same cluster, then Φ(u)_{ij} = 0.
[Figure: under this condition, each cluster corresponds to a diagonal submatrix of Φ(u)]
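A small numpy sketch of this property (hypothetical values and cluster assignment, for illustration only):

    # A clustered coefficient matrix for N = 4 series in two clusters {1,2}, {3,4}:
    # nonzero entries occur only between series in the same cluster.
    import numpy as np

    Phi = np.array([[0.5, 0.2, 0.0, 0.0],
                    [0.1, 0.4, 0.0, 0.0],
                    [0.0, 0.0, 0.3, 0.2],
                    [0.0, 0.0, 0.1, 0.6]])
    cluster = np.array([0, 0, 1, 1])          # cluster label of each series

    i, j = np.nonzero(Phi)                    # condition 1): Phi_ij != 0 ...
    assert np.all(cluster[i] == cluster[j])   # ... implies same cluster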

Temporal Graphical Model Decomposition Cost
Goal: preserve prediction accuracy while reducing representation cost.
Given a temporal graphical model, the cost for model decomposition is the sum of a prediction-error term and an L2 penalty.
Problem: this cost tends to group all time series into one cluster.
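The formula itself was a slide image; a plausible form consistent with the labels above (an assumption, not necessarily the paper's exact objective) is

    \mathrm{cost}(\Phi) = \sum_t \Big\| X_t - \sum_{u=1}^{d} \Phi(u) X_{t-u} \Big\|_2^2 + \lambda \sum_{u=1}^{d} \|\Phi(u)\|_F^2

where the first term is the prediction error and the second is the L2 penalty on the coefficients.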

Refined Cost for Decomposition
Balance the sizes of the clusters; C is an N×K membership matrix.
The overall cost is the sum of three parts: prediction error, L2 penalty, and a cluster-size constraint.
Optimal Decomposition Problem: find a cluster membership matrix C and its regression coefficient matrix Φ such that the cost for decomposition is minimal.

Hardness of the Decomposition Problem
Combined integer (membership matrix) and numerical (regression coefficient matrix) optimization problem.
Large number of unknown variables:
– N×K variables in the membership matrix
– N×N variables in the regression coefficient matrix

Basic Idea of the Iterative Optimization Algorithm
– Relax the binary membership matrix C to a probabilistic membership matrix P.
– Optimize the membership matrix while fixing the regression coefficient matrix.
– Optimize the regression coefficient matrix while fixing the membership matrix.
– Alternate the two optimization steps iteratively to reach a locally optimal solution (see the sketch below).
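A control-flow sketch of this loop (init_membership, optimize_membership, and the other helpers are hypothetical placeholders for the steps detailed on the following slides):

    # Alternating (block-coordinate) optimization: each step decreases the cost,
    # so the loop converges to a locally optimal (P, Phi).
    def decompose(X, K, max_iter=100, tol=1e-6):
        P = init_membership(X, K)        # relaxed N x K probabilistic memberships
        Phi = init_coefficients(X)       # N x N regression coefficient matrix
        prev_cost = float("inf")
        for _ in range(max_iter):
            P = optimize_membership(X, Phi, P)     # Step 1: fix Phi, update P
            Phi = optimize_coefficients(X, P)      # Step 2: fix P, update Phi
            cur_cost = cost(X, P, Phi)
            if prev_cost - cur_cost < tol:
                break
            prev_cost = cur_cost
        return P, Phi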

Overview of the Iterative Optimization Algorithm
[Flowchart: time series data → temporal graphical model → Step 1: optimize the cluster membership matrix (quasi-Newton method) ⇄ Step 2: optimize the regression coefficient matrix (generalized ridge regression)]

Step 1: Optimize the Membership Matrix
Apply the Lagrange multiplier method to handle the membership constraints, then solve with a quasi-Newton method, which approximates the Hessian matrix by iterative updating.
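To give the flavor of the quasi-Newton step, a generic sketch with scipy's BFGS solver (the quadratic objective below is a hypothetical stand-in for the Lagrangian over the flattened membership matrix):

    # BFGS builds up an approximate Hessian from successive gradient differences,
    # exactly the "approximate Hessian by iterative updating" idea on this slide.
    import numpy as np
    from scipy.optimize import minimize

    def lagrangian(p_flat):              # hypothetical placeholder objective
        return np.sum((p_flat - 0.5) ** 2)

    p0 = np.full(8, 0.1)                 # e.g. N*K = 8 relaxed membership entries
    res = minimize(lagrangian, p0, method="BFGS")
    print(res.x)                         # converges to the minimizer (all 0.5)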

Step 2: Optimize the Regression Coefficient Matrix
Decompose the cost function into N subfunctions, one per time series, each solved by generalized ridge regression:
– y_k is a vector derived from P and X (length L);
– X_k is a matrix derived from P and X (size L×N).
With a constant penalty weight, each subproblem reduces to traditional ridge regression.
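Each subproblem then has a closed-form solution; a minimal sketch (D is a hypothetical diagonal penalty matrix; with D = λI this is traditional ridge regression):

    # Generalized ridge regression for one subproblem:
    # minimize ||y_k - X_k b||^2 + b^T D b  =>  b = (X_k^T X_k + D)^{-1} X_k^T y_k
    import numpy as np

    def generalized_ridge(Xk, yk, d_weights):
        D = np.diag(d_weights)                       # per-coefficient penalties
        return np.linalg.solve(Xk.T @ Xk + D, Xk.T @ yk)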

Complexity Analysis
Step 1 is the computational bottleneck of the entire algorithm.
[Slide diagram: the quasi-Newton step works over N×K + N variables (memberships plus multipliers), so updating the approximate Hessian dominates; computing the coefficient matrix involves N×N and N×K matrices]

Basic Idea of the Scalable Approach
– Utilize the variable-dependence relationships to optimize each variable (or a small number of variables) independently, assuming the other relationships are fixed.
– Convert the problem to a Maximum Weight Independent Set (MWIS) problem (a greedy sketch follows below).
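A greedy heuristic conveys the MWIS subproblem (a sketch on a hypothetical 4-variable dependence graph; the paper's exact reduction and solver may differ):

    # Greedy MWIS: repeatedly take the heaviest remaining node and drop its
    # neighbors, yielding a set of mutually independent variables to update.
    def greedy_mwis(weights, adj):
        chosen, remaining = [], set(weights)
        while remaining:
            v = max(remaining, key=lambda n: weights[n])
            chosen.append(v)
            remaining -= {v} | adj[v]    # dependents of v cannot join the set
        return chosen

    w = {1: 3.0, 2: 2.0, 3: 2.5, 4: 1.0}               # hypothetical weights
    g = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}         # dependence graph
    print(greedy_mwis(w, g))                            # -> [1, 3]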

Experiments: Synthetic Data
Synthetic data generator:
– Generate a community-based graph as the underlying temporal graphical model [Girvan and Newman 05].
– Assign random weights to the graphical model and generate time series data by recursive matrix multiplication [Arnold et al. 07].
Decomposition accuracy:
– Find a matching between the clustering results and the ground-truth clusters such that the number of intersected variables is maximal (computed as sketched below).
– Decomposition accuracy is the number of intersected variables divided by the total number of variables.
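The maximal matching in this accuracy measure can be computed with the Hungarian algorithm; a sketch assuming scipy (the toy labels are illustrative):

    # Decomposition accuracy: best one-to-one matching between predicted and
    # ground-truth clusters, maximizing the number of agreeing variables.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def decomposition_accuracy(pred, truth, K):
        overlap = np.zeros((K, K), dtype=int)
        for p, t in zip(pred, truth):
            overlap[p, t] += 1
        rows, cols = linear_sum_assignment(-overlap)   # maximize total overlap
        return overlap[rows, cols].sum() / len(pred)

    print(decomposition_accuracy([0, 0, 1, 1], [1, 1, 0, 0], K=2))   # -> 1.0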

Experiments: Synthetic Data (cont.)
Applied algorithms:
– Iterative optimization algorithm based on the quasi-Newton method (newton).
– Iterative optimization algorithm based on the MWIS method (mwis).
– Benchmark 1: Pearson correlation test to generate the temporal graphical model, then Ncut [Shi00] for clustering (Cor_Ncut).
– Benchmark 2: directed spectral clustering [Zhou05] on the ground-truth temporal graphical model (Dcut).

Experimental Results: Synthetic
On average, newton outperforms Cor_Ncut and Dcut by 27% and 32%, respectively.
On average, mwis outperforms Cor_Ncut and Dcut by 24% and 29%, respectively.

Experimental Results: Synthetic (cont.)
mwis outperforms Cor_Ncut by an average of 30% and Dcut by an average of 52%.

Experiment: Real Data
Data:
– Annual GDP growth rate (downloaded from …) for 192 countries.
– 4 time periods.
– Hierarchically bipartition into 6 or 7 clusters.

Experimental Result: Real Data

Summary
– We formulate a novel objective function for the decomposition problem in temporal graphical modeling.
– We introduce an iterative optimization approach utilizing the quasi-Newton method and generalized ridge regression.
– We employ a maximum weight independent set (MWIS) based approach to speed up the quasi-Newton method.
– The experimental results demonstrate the effectiveness and efficiency of our approaches.

Thank you