1 Sparsity Control for Robustness and Social Data Analysis
Gonzalo Mateos, ECE Department, University of Minnesota
Acknowledgments: Profs. Georgios B. Giannakis, M. Kaveh, G. Sapiro, N. Sidiropoulos, and N. Waller; MURI (AFOSR FA9550-10-1-0567) grant
Minneapolis, MN, December 9, 2011

2 Learning from "Big Data"
"Data are widely available, what is scarce is the ability to extract wisdom from them." Hal Varian, Google's chief economist
[Figure labels: big, fast, productive, revealing, ubiquitous, smart, messy]
K. Cukier, ``Harnessing the data deluge,'' Nov. 2011.

3 Social-Computational Systems (SoCS)
Complex systems of people and computers
The vision: preference measurement (PM), analysis, management
- Understand and engineer SoCS
The means: leverage the dual role of sparsity
- Complexity control through variable selection
- Robustness to outliers

4 Conjoint analysis
Marketing, healthcare, psychology [Green-Srinivasan '78]
Goal: learn the consumer's utility function from preference data
- Strategy: describe products by a set of attributes, "parts"
- Linear utilities: "How much is each part worth?"
Optimal design and positioning of new products
Success story [Wind et al '89]
- Attributes: room size, TV options, restaurant, transportation

5 Modeling preliminaries
Respondents (e.g., consumers)
- Rate profiles
- Each profile comprises attributes
Linear utility: estimate the vector of partworths
Conjoint data collection formats
- (M1) Metric ratings
- (M2) Choice-based conjoint data
Online SoCS-based preference data grows exponentially
- Inconsistent/corrupted/irrelevant data: outliers
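
The formulas for the two data formats did not survive on this slide. One common way to write them is sketched below, under assumed (hypothetical) notation: $\mathbf{x}_i$ is the attribute vector of profile $i$, $\mathbf{w}$ the vector of partworths, $y_i$ the recorded response, and $\varepsilon_i$ nominal noise. In choice-based data, $\mathbf{x}_i$ would typically describe the comparison being made, for instance the difference between two profiles.

```latex
\begin{align}
\text{(M1)}\quad & y_i = \mathbf{x}_i^\top \mathbf{w} + \varepsilon_i, && i = 1,\dots,N,\\
\text{(M2)}\quad & y_i = \operatorname{sign}\!\left(\mathbf{x}_i^\top \mathbf{w} + \varepsilon_i\right), && i = 1,\dots,N.
\end{align}
```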

6 Robustifying PM
Least-trimmed squares (LTS) [Rousseeuw '87]
- The cost sums only the smallest squared residuals, ranked through their order statistics
- The largest residuals are discarded
Q: How should we go about minimizing the nonconvex (LTS)?
A: Try all subsets of the retained size, solve least squares on each, and pick the best
- Simple but intractable beyond small problems
- Near-optimal solvers [Rousseeuw '06], RANSAC [Fischler-Bolles '81]
G. Mateos, V. Kekatos, and G. B. Giannakis, ``Exploiting sparsity in model residuals for robust conjoint analysis,'' Marketing Sci., Dec. 2011 (submitted).
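
The "try all subsets" answer can be sketched in a few lines, which also makes the combinatorial cost visible. This is a minimal illustration, not the talk's implementation: the function name `lts_brute_force`, the toy data, and the symbol `s` for the number of retained residuals are assumptions.

```python
from itertools import combinations

import numpy as np


def lts_brute_force(X, y, s):
    """Exhaustive LTS: least squares on every size-s subset, keep the best fit."""
    n = len(y)
    best_cost, best_w = np.inf, None
    for subset in combinations(range(n), s):
        rows = list(subset)
        w, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
        # LTS cost: sum of the s smallest squared residuals over all n ratings.
        cost = np.sort((y - X @ w) ** 2)[:s].sum()
        if cost < best_cost:
            best_cost, best_w = cost, w
    return best_w, best_cost


# Toy data (assumed): 10 noise-free ratings from w = [1, 2], two of them corrupted.
x = np.linspace(0.0, 1.0, 10)
X = np.column_stack([np.ones_like(x), x])
y = X @ np.array([1.0, 2.0])
y[[2, 7]] += 5.0

w_lts, cost_lts = lts_brute_force(X, y, s=8)  # visits C(10, 8) = 45 subsets
```

The number of subsets grows combinatorially with the number of ratings, which is why the slide calls the approach intractable beyond small problems.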

7 Modeling outliers
Outlier variables, one per rating: nonzero if the rating is an outlier, zero otherwise
- Nominal ratings obey (M1); outliers something else
- ε-contamination [Fuchs '99], Bayesian model [Jin-Rao '10]
- Both the partworths and the outlier vector are unknown; the outlier vector is typically sparse!
Natural (but intractable) nonconvex estimator

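8 LTS as sparse regression
Lagrangian form (P0)
- Tuning parameter controls sparsity in the number of outliers
Proposition 1: If the partworth and outlier estimates solve (P0), with the tuning parameter chosen s.t. the number of nonzero outlier estimates equals the number of residuals discarded by (LTS), then the partworth estimate also solves (LTS).
- Formally justifies the preference model and its estimator (P0)
- Ties sparse regression with robust estimation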
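
The equations behind (LTS), the outlier-aware model, and (P0) were lost in extraction. A reconstruction consistent with the surrounding text is sketched below. The notation is assumed: $\mathbf{w}$ for the partworths, $\mathbf{o}$ for the outlier vector, $\mathbf{X}$ for the matrix whose rows are the profiles $\mathbf{x}_i^\top$, $N$ for the number of ratings, $s$ for the number of residuals retained by (LTS), and $r^2_{[i]}$ for the $i$-th order statistic among the squared residuals.

```latex
\begin{align}
\text{(LTS)}\quad & \min_{\mathbf{w}} \ \sum_{i=1}^{s} r^2_{[i]}(\mathbf{w}),
\qquad r_i(\mathbf{w}) := y_i - \mathbf{x}_i^\top \mathbf{w},\\
\text{Model:}\quad & y_i = \mathbf{x}_i^\top \mathbf{w} + o_i + \varepsilon_i,
\qquad o_i \neq 0 \ \text{if rating } i \text{ is an outlier}, \quad o_i = 0 \ \text{otherwise},\\
\text{(P0)}\quad & \min_{\mathbf{w},\,\mathbf{o}} \ \|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_2^2 + \lambda_0 \|\mathbf{o}\|_0 .
\end{align}
```

The link claimed in Proposition 1 can be seen as follows: for a fixed support of $\mathbf{o}$, the best choice sets $o_i = r_i(\mathbf{w})$ on the support, which removes those residuals from the quadratic term. With $N - s$ nonzero entries allowed, the remaining cost is the sum of the $s$ smallest squared residuals, which is the (LTS) cost.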

9 Just relax!
(P0) is NP-hard; relax the ℓ0-(pseudo)norm to the ℓ1-norm, e.g., [Tropp '06]
(P1)
- (P1) is convex, and thus efficiently solved
- Role of the sparsity-controlling tuning parameter is central
Q: Does (P1) yield robust estimates?
A: Yes! The Huber estimator is a special case
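
The Huber connection can be checked by eliminating the outlier variables from (P1) in closed form. The derivation below is a sketch under assumed notation ($\mathbf{w}$ partworths, $\mathbf{o}$ outlier vector, $\mathbf{X}$ profile matrix with rows $\mathbf{x}_i^\top$, $r_i := y_i - \mathbf{x}_i^\top \mathbf{w}$, and $\lambda_1$ the tuning parameter), since the slide's own formulas were not recovered.

```latex
\begin{align}
\text{(P1)}\quad & \min_{\mathbf{w},\,\mathbf{o}} \ \|\mathbf{y} - \mathbf{X}\mathbf{w} - \mathbf{o}\|_2^2 + \lambda_1 \|\mathbf{o}\|_1,\\
& \hat{o}_i = \operatorname{sign}(r_i)\,\max\!\left(|r_i| - \lambda_1/2,\ 0\right),\\
& \min_{o_i} \ (r_i - o_i)^2 + \lambda_1 |o_i| \;=\; \rho(r_i) :=
\begin{cases}
r_i^2, & |r_i| \le \lambda_1/2,\\[2pt]
\lambda_1 |r_i| - \lambda_1^2/4, & |r_i| > \lambda_1/2 .
\end{cases}
\end{align}
```

Hence (P1) is equivalent to $\min_{\mathbf{w}} \sum_{i} \rho(y_i - \mathbf{x}_i^\top \mathbf{w})$, where $\rho$ is quadratic for small residuals and linear for large ones, i.e., a Huber-type M-estimator with threshold $\lambda_1/2$.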

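10 Lassoing outliers
Suffices to solve a Lasso [Tibshirani '94]
Proposition 2: Minimizers of (P1) are obtained from a Lasso estimate of the outlier vector, with the partworths given by least squares on the outlier-compensated ratings
- Data-driven methods to select the tuning parameter
- Lasso solvers return the entire robustification path (RP)
[Figure: robustification path; outlier coefficients versus decreasing tuning parameter]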
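
A minimal sketch of the Lasso route to (P1) follows. It assumes the cost is written as ||y - Xw - o||^2 + lam * ||o||_1, projects out the partworths to obtain a Lasso in the outlier vector alone, and solves that Lasso with plain iterative soft-thresholding (ISTA) as a stand-in for a dedicated Lasso solver. All names (`robust_fit_via_lasso`, `lam`, the toy data) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Entry-wise soft-thresholding operator."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def robust_fit_via_lasso(X, y, lam, n_iter=500):
    """Solve min ||y - Xw - o||^2 + lam*||o||_1 through a Lasso in o."""
    n = len(y)
    X_pinv = np.linalg.pinv(X)
    P_perp = np.eye(n) - X @ X_pinv  # projector onto the complement of range(X)
    z = P_perp @ y
    o = np.zeros(n)
    for _ in range(n_iter):
        # ISTA step on ||z - P_perp o||^2 + lam*||o||_1 with step size 1/2
        # (the gradient of the quadratic term is 2-Lipschitz).
        o = soft_threshold(o + P_perp @ (z - P_perp @ o), lam / 2.0)
    w = X_pinv @ (y - o)  # least squares on outlier-compensated ratings
    return w, o


# Toy data (assumed): 50 ratings from w = [1, 2]; the first five are outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
w_true = np.array([1.0, 2.0])
y = X @ w_true + 0.01 * rng.standard_normal(50)
y[:5] += 10.0

w_ls = np.linalg.pinv(X) @ y                      # nonrobust least squares
w_rob, o_hat = robust_fit_via_lasso(X, y, lam=0.5)
flagged = np.flatnonzero(o_hat)                   # ratings declared outliers
```

Sweeping `lam` from large to small values and recording `o_hat` would trace a robustification path of the kind mentioned on the slide.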

11 Nonconvex regularization
Nonconvex penalty terms approximate the ℓ0-(pseudo)norm in (P0) better than the ℓ1-norm
- Options: SCAD [Fan-Li '01], or sum-of-logs [Candes et al '08]
Iterative linearization-minimization of the penalty around the current outlier estimate
- Initialize with the solution of (P1)
- Bias reduction (cf. adaptive Lasso [Zou '06])
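
The linearization-minimization idea can be sketched for the sum-of-logs penalty: linearizing sum_i log(|o_i| + delta) around the current estimate gives a weighted ℓ1 problem with weights 1 / (|o_i| + delta). The code below is an illustration under assumptions, not the talk's algorithm: the function names, the inner alternating solver, `delta`, and the toy data are all hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def weighted_l1_fit(X, y, lam, weights, n_iter=500):
    """Minimize ||y - Xw - o||^2 + lam * sum_i weights[i]*|o_i| by alternating updates."""
    X_pinv = np.linalg.pinv(X)
    o = np.zeros(len(y))
    for _ in range(n_iter):
        w = X_pinv @ (y - o)                                 # partworth update
        o = soft_threshold(y - X @ w, lam * weights / 2.0)   # outlier update
    w = X_pinv @ (y - o)
    return w, o


def refine_sum_of_logs(X, y, lam, delta=1e-3, n_outer=3):
    """Reweighted l1 iterations, started from the unweighted (Lasso-type) fit."""
    w, o = weighted_l1_fit(X, y, lam, np.ones(len(y)))
    for _ in range(n_outer):
        weights = 1.0 / (np.abs(o) + delta)  # linearized log penalty
        w, o = weighted_l1_fit(X, y, lam, weights)
    return w, o


# Toy data (assumed): 50 ratings from w = [1, 2]; the first five are outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
w_true = np.array([1.0, 2.0])
y = X @ w_true + 0.01 * rng.standard_normal(50)
y[:5] += 10.0

w_l1, _ = weighted_l1_fit(X, y, 0.5, np.ones(50))
w_ref, o_ref = refine_sum_of_logs(X, y, lam=0.5)
```

Large outlier estimates receive small weights and are shrunk less in the next pass, which is the bias-reduction effect the slide compares to the adaptive Lasso.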

12 Comparison with RANSAC
[Figure: comparison with RANSAC on i.i.d. data, nominal versus outliers]

13 Nonparametric regression
Interactions among attributes?
- Not captured by the linear utility
- Driven by complex mechanisms that are hard to model
If one trusts data more than any parametric model
- Go nonparametric: the utility function lives in a space of "smooth" functions
Ill-posed problem
- Workaround: regularization [Tikhonov '77], [Wahba '90]
- RKHS with its kernel and norm
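
As a reference point for the regularization workaround, here is a minimal sketch of (nonrobust) regularized regression in an RKHS, i.e., kernel ridge regression with cost sum_i (y_i - f(x_i))^2 + mu * ||f||^2. The Gaussian kernel, the bandwidth, `mu`, and the toy data are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kernel_ridge_fit(X, y, mu, bandwidth):
    """Coefficients alpha of f(x) = sum_i alpha_i k(x, x_i) (representer theorem)."""
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + mu * np.eye(len(y)), y)


def kernel_ridge_predict(X_train, alpha, X_new, bandwidth):
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha


# Toy data (assumed): noise-free samples of a smooth function.
X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])
mu, bandwidth = 1e-3, 0.5

alpha = kernel_ridge_fit(X, y, mu, bandwidth)
y_fit = kernel_ridge_predict(X, alpha, X, bandwidth)
rmse = np.sqrt(np.mean((y - y_fit) ** 2))
```

A robust variant along the lines of the talk would add an outlier vector with a sparsity-promoting penalty to this cost; that extension is not shown here.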

14 Function approximation
[Figure panels: true function; nonrobust predictions; robust predictions; refined predictions]
- Effectiveness in rejecting outliers is apparent
G. Mateos and G. B. Giannakis, ``Robust nonparametric regression via sparsity control with application to load curve data cleansing,'' IEEE Trans. Signal Process., 2012

15 Load curve data cleansing
Load curve: electric power consumption recorded periodically
- Reliable data: key to realize smart grid vision [Hauser '09]
- Faulty meters, communication errors
- Unscheduled maintenance, strikes, sport events
B-splines for load curve prediction and denoising [Chen et al '10]
[Figure: Uruguay's power consumption (MW)]

16 NorthWrite data
Energy consumption of a government building ('05-'10)
- Robust smoothing spline estimator, hours
- Outliers: "Building operational transition shoulder periods"
- No manual labeling of outliers [Chen et al '10]
Data: courtesy of NorthWrite Energy Group, provided by Prof. V. Cherkassky

17 Principal Component Analysis
Motivation: (statistical) learning from high-dimensional data
- Examples: DNA microarray, traffic surveillance
Principal component analysis (PCA) [Pearson 1901]
- Extraction of low-dimensional data structure
- Data compression and reconstruction
- PCA is non-robust to outliers [Jolliffe '86]
Our goal: robustify PCA by controlling outlier sparsity

18 Our work in context
Robust PCA
- Robust covariance matrix estimators [Campbell '80], [Huber '81]
- Computer vision [Xu-Yuille '95], [De la Torre-Black '03]
- Low-rank matrix recovery from sparse errors, e.g., [Wright et al '09]
Contemporary applications tied to SoCS
- Anomaly detection in IP networks [Huang et al '07], [Kim et al '09]
- Video surveillance, e.g., [Oliver et al '99]
- Matrix completion for collaborative filtering, e.g., [Candes et al '09]

19 PCA formulations
Training data
Minimum reconstruction error
- Compression operator
- Reconstruction operator
Maximum variance
Component analysis model
Solution:
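
The slide's formulas were not recovered, but the standard result they point to can be sketched numerically: the rank-q PCA obtained from the SVD of the centered data minimizes reconstruction error, and its subspace coincides with the top-q eigenvectors of the sample covariance (maximum variance). Function names and toy data below are assumptions.

```python
import numpy as np


def pca(Y, q):
    """Return the sample mean, a p-by-q orthonormal basis, and all singular values."""
    m = Y.mean(axis=0)
    _, sv, Vt = np.linalg.svd(Y - m, full_matrices=False)
    return m, Vt[:q].T, sv


def compress(Y, m, U):
    """Compression operator: project centered data onto the q-dimensional subspace."""
    return (Y - m) @ U


def reconstruct(S, m, U):
    """Reconstruction operator: map the compressed data back to the data space."""
    return m + S @ U.T


# Toy data (assumed): 100 samples in 5 dimensions with unequal variances.
rng = np.random.default_rng(0)
Y = rng.standard_normal((100, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
q = 2

m, U, sv = pca(Y, q)
Y_hat = reconstruct(compress(Y, m, U), m, U)
recon_error = np.sum((Y - Y_hat) ** 2)

# Maximum-variance view: top-q eigenvectors of the sample covariance.
cov = (Y - m).T @ (Y - m) / len(Y)
_, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
E = evecs[:, -q:]
```

The reconstruction error equals the energy in the discarded singular values, and `U @ U.T` matches `E @ E.T` because both describe the same subspace.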

20 Robustifying PCA
Outlier-aware model
- Interpret: blind preference model with latent profiles
(P2)
- ℓ0-norm counterpart tied to (LTS PCA)
- (P2) subsumes optimal (vector) Huber
- ℓ1-norm regularization for entry-wise outliers
G. Mateos and G. B. Giannakis, ``Robust PCA as bilinear decomposition with outlier sparsity regularization,'' IEEE Trans. Signal Process., Nov. 2011 (submitted).

21 Alternating minimization
(P2)
- Subspace update: SVD of outlier-compensated data
- Outlier update: row-wise vector soft-thresholding
Proposition 3: Alg. 1's iterates converge to a stationary point of (P2).
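
The two updates named on this slide can be sketched as follows. The cost is assumed to be ||Y - 1 m^T - S U^T - O||_F^2 + lam * sum_n ||o_n||_2, with one outlier row o_n per datum; this is a plausible reading of (P2), not a verified copy of it, and the sketch is not the talk's Alg. 1. Names, `lam`, and the toy data are hypothetical.

```python
import numpy as np


def row_soft_threshold(R, t):
    """Shrink each row of R by t in l2-norm; rows with norm <= t become zero."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * R


def robust_pca(Y, q, lam, n_iter=100):
    """Alternate between a rank-q PCA fit and row-wise outlier shrinkage."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        Z = Y - O                                  # outlier-compensated data
        m = Z.mean(axis=0)
        Uz, sv, Vt = np.linalg.svd(Z - m, full_matrices=False)
        L = (Uz[:, :q] * sv[:q]) @ Vt[:q]          # best rank-q fit of centered Z
        O = row_soft_threshold(Y - m - L, lam / 2.0)
    return m, Vt[:q].T, O


# Toy data (assumed): 40 points near a line in 3-D, plus two outlying rows.
rng = np.random.default_rng(0)
s = np.linspace(-5.0, 5.0, 40)
inliers = np.outer(s, [1.0, 0.0, 0.0]) + 0.01 * rng.standard_normal((40, 3))
Y = np.vstack([inliers, [[0.0, 8.0, 0.0], [0.0, 0.0, -8.0]]])

m_hat, U_hat, O_hat = robust_pca(Y, q=1, lam=2.0)
outlier_rows = np.flatnonzero(np.linalg.norm(O_hat, axis=1) > 0).tolist()
```

Rows of `O_hat` that remain nonzero are the data flagged as outliers; the threshold `lam / 2` follows from the assumed scaling of the cost.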

22 Video surveillance
[Figure panels: original; PCA; robust PCA; "outliers"]
Data: http://www.cs.cmu.edu/~ftorre/

23 Big Five personality factors
Five dimensions of personality traits [Goldberg '93], [Costa-McCrae '92]
- Discovered through factor analysis
- WEIRD subjects
Big Five Inventory (BFI)
- Measure the Big Five
- Short questionnaire (44 items)
- Rate 1-5, e.g., "I see myself as someone who ... is talkative," "... is full of energy"
Handbook of personality: Theory and research, O. P. John, R. W. Robins, and L. A. Pervin, Eds. New York, NY: Guilford Press, 2008.

24 BFI data
Eugene-Springfield community sample [Goldberg '08]
- Subjects, item responses, factors
Robust PCA identifies 8 outlying subjects
- Validated via "inconsistency" scores, e.g., VRIN [Tellegen '88]
Data: courtesy of Prof. L. Goldberg, provided by Prof. N. Waller

25 Online robust PCA
Motivation: real-time data and memory limitations
Exponentially-weighted robust PCA
- At each time instant, past estimates are not re-computed

26 Online PCA in action
[Figure: online PCA in action, nominal data versus outliers]

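27 Robust kernel PCA
Kernel (K)PCA [Scholkopf '97]
- Maps data from the input space to a feature space
- Challenge: dimensionality of the feature space
- Kernel trick: inner products in feature space are evaluated through a kernel
Related to spectral clustering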
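
The kernel trick for PCA can be sketched with the Gram matrix alone: center it, take its leading eigenvectors, and scale them to obtain the principal-component scores without ever forming feature vectors. This shows plain kernel PCA only (the robust variant of the talk is not reproduced), and the function name and toy data are assumptions. With a linear kernel the scores should match ordinary PCA up to sign, which gives a convenient sanity check.

```python
import numpy as np


def kernel_pca(K, q):
    """Top-q kernel PCA scores computed from an n-by-n Gram matrix K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # Gram matrix of centered features
    evals, evecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:q]
    lam, alpha = evals[idx], evecs[:, idx]
    return alpha * np.sqrt(np.maximum(lam, 0.0))


# Toy data (assumed): 30 samples in 4 dimensions with unequal variances.
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

# Linear kernel: kernel PCA reduces to ordinary PCA.
scores_kpca = kernel_pca(Y @ Y.T, q=2)

Yc = Y - Y.mean(axis=0)
U, sv, _ = np.linalg.svd(Yc, full_matrices=False)
scores_pca = U[:, :2] * sv[:2]
```

Replacing `Y @ Y.T` with any positive semidefinite kernel matrix yields nonlinear components while the computation stays n-by-n.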

28 Unveiling communities
Network: NCAA football teams (nodes), Fall '00 games (edges)
- Teams, kernel
- Identified exactly: Big 10, Big 12, ACC, SEC, Big East
- Outliers: independent teams
- ARI = 0.8967
Data: http://www-personal.umich.edu/~mejn/netdata/

29 Spectrum cartography
Idea: collaborate to form a spatial map of the spectrum
Goal: find a map whose value at each position is the spectrum at that position
Approach: basis expansion model for the map, nonparametric basis pursuit
[Figure: spectrum map, original versus estimated]
J. A. Bazerque, G. Mateos, and G. B. Giannakis, ``Group-Lasso on splines for spectrum cartography,'' IEEE Trans. Signal Process., Oct. 2011.

30 Distributed adaptive algorithms
Improved learning through cooperation
Issues and Significance:
- Fast varying (non-)stationary processes
- Unavailability of statistical information
- Online incorporation of sensor data
- Noisy communication links
Technical Approaches:
- Consensus-based in-network operation in ad hoc WSNs
- Distributed optimization using alternating-direction methods
- Online learning of statistics using stochastic approximation
- Performance analysis via stochastic averaging
[Figure label: wireless sensor]
G. Mateos, I. D. Schizas, and G. B. Giannakis, ``Distributed recursive least-squares for consensus-based in-network adaptive estimation,'' IEEE Trans. Signal Process., Nov. 2009.

31 Unveiling network anomalies
Approach: flag anomalies across flows and time via sparsity and low rank
Payoff: ensure high performance, QoS, and security in IP networks
[Figure: anomalies across flows and time; enhanced detection capabilities]
M. Mardani, G. Mateos, and G. B. Giannakis, ``Unveiling network anomalies across flows and time via sparsity and low rank,'' IEEE Trans. Inf. Theory, Dec. 2011 (submitted).

32 Concluding summary
Control sparsity in model residuals for robust learning
[Diagram labels: outlier-resilient estimation; signal processing; Lasso]
Research issues addressed
- Sparsity control for robust metric and choice-based PM
- Kernel-based nonparametric utility estimation
- Robust (kernel) principal component analysis
- Scalable distributed real-time implementations
Application domains
- Preference measurement and conjoint analysis
- Psychometrics, personality assessment
- Video surveillance
- Social and power networks
Experimental validation with GPIPP personality ratings (~6M)
- Gosling-Potter Internet Personality Project (GPIPP) - http://www.outofservice.com

