Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining
Petros Drineas, Rensselaer Polytechnic Institute
(joint work with R. Kannan and M. Mahoney)
DIMACS Workshop on Privacy Preserving Data Mining

2 Motivation (Data Mining)
- In many applications large matrices appear (too large to store in RAM).
- We can make a few "passes" (sequential reads) through the matrices.
- We can create and store a small "sketch" of the matrices in RAM. Computing the "sketch" should be a very fast process.
- Discard the original matrix and work with the "sketch".

3 Motivation (Privacy Preserving)
- In many applications, instead of revealing a large matrix, we only reveal its "sketch".
- Intuition: the "sketch" is an approximation to the original matrix. Instead of viewing the approximation as a "necessary evil", we might be able to use it to achieve privacy preservation (similar ideas appear in Feigenbaum et al., ICALP 2001).
- Goal: formulate a technical definition of privacy that might be achievable by such "sketching" algorithms and that provides meaningful and quantifiable protection.
- Achieving this goal is an open problem!

4 Our approach & our results
1. A "sketch" consisting of a few rows/columns of the matrix is adequate for efficient approximations [see D & Kannan '03, and D, Kannan & Mahoney '04].
2. We draw the rows/columns randomly, using adaptive sampling; e.g., rows/columns are picked with probability proportional to their lengths.
Create an approximation to the original matrix which can be stored in much less space.

5 Overview
- A Data Mining setup
- Approximating a large matrix
  - Algorithm
  - Error bounds
  - Tightness of the results
- An alternative approach (Achlioptas and McSherry '01 and '03)
- Conclusions

6 Applications: Data Mining
We are given m (> 10^6) objects and n (> 10^5) features describing the objects.
Database: an m-by-n matrix A (A_ij shows the "importance" of feature j for object i). Every row of A represents an object.
Queries: given a new object x, find similar objects in the database (nearest neighbors).

7 Applications (cont'd)
Two objects are "close" if the angle between their corresponding vectors is small. So, assuming that the vectors are normalized, x^T · d = cos(x, d) is high when the two objects are close, and A · x computes all the angles and answers the query.
Key observation: the exact value of x^T · d might not be necessary, because
1. the feature values in the vectors are set by coarse heuristics, and
2. it is in general enough to check whether x^T · d exceeds a threshold.

8 Using an approximation to A Assume that A’ = CUR is an approximation to A, such that A’ is stored efficiently (e.g. in RAM). Given a query vector x, instead of computing A · x, compute A’ · x to identify its nearest neighbors. The CUR algorithm guarantees a bound on the worst case choice of x.
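To make the query step concrete, here is a minimal numpy sketch (illustrative only, not code from the talk): the rows of the database matrix and the query are normalized so that a single matrix-vector product yields all cosines, and the threshold value is an arbitrary placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database: each row is a (normalized) object vector.
A = rng.standard_normal((10_000, 300))
A /= np.linalg.norm(A, axis=1, keepdims=True)

# Normalized query vector x.
x = rng.standard_normal(300)
x /= np.linalg.norm(x)

# One matrix-vector product gives cos(x, d) for every object d in the database.
threshold = 0.1  # placeholder value
neighbors = np.flatnonzero(A @ x > threshold)

# With a sketch A' = C @ U @ R, one would compute C @ (U @ (R @ x)) instead,
# never materializing A' explicitly.
```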

9 Approximating A efficiently
Given a large m-by-n matrix A (stored on disk), compute an approximation A' to A such that:
1. A' can be stored in O(m+n) space, after making two passes through the entire matrix A, and using O(m+n) additional space and time.
2. A' satisfies, with high probability, ||A - A'||_2^2 < ε ||A||_F^2 (and a similar bound with respect to the Frobenius norm).

10 Describing A' = C · U · R
C consists of c = Θ(1/ε^2) columns of A and R consists of r = Θ(1/ε^2) rows of A (so the "description length" of A' is O(m+n)). C and R are created using adaptive sampling.

11 Creating C and R
Create C (R) by performing c (r) i.i.d. trials. In each trial, pick a column (row) of A with probability proportional to its squared Euclidean length, i.e. p_i = |A^(i)|^2 / ||A||_F^2, and include A^(i) (A_(i)) as a column of C (row of R). [A^(i) denotes the i-th column of A and A_(i) the i-th row.]
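A minimal numpy sketch of this sampling step (an illustration that assumes each picked column/row is rescaled by 1/sqrt(c · p_i), a common convention; the talk's exact scaling may differ). Rows are sampled by applying the same routine to A^T.

```python
import numpy as np

def sample_columns(A, c, rng):
    # Squared-length (adaptive) sampling probabilities for the columns of A.
    p = (A ** 2).sum(axis=0) / (A ** 2).sum()
    idx = rng.choice(A.shape[1], size=c, p=p)   # c i.i.d. trials, with replacement
    # Rescale each picked column by 1/sqrt(c * p_idx) (a common convention).
    C = A[:, idx] / np.sqrt(c * p[idx])
    return C, idx, p

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 300))

C, col_idx, col_p = sample_columns(A, c=50, rng=rng)      # C is m x c
R_T, row_idx, row_p = sample_columns(A.T, c=50, rng=rng)  # rows of A = columns of A.T
R = R_T.T                                                 # R is r x n
```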

12 Singular Value Decomposition (SVD)
A = U Σ V^T, where U (V) is an orthogonal matrix containing the left (right) singular vectors of A and Σ is a diagonal matrix containing the singular values of A.
1. Exact computation of the SVD takes O(min(mn^2, m^2 n)) time.
2. The top few singular vectors/values can be approximated faster (Lanczos/Arnoldi methods).
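For reference, both options look like this in numpy/scipy (illustrative, not part of the talk): a dense exact SVD, and an iterative Lanczos-type routine that returns only the top k singular triplets.

```python
import numpy as np
from scipy.sparse.linalg import svds  # ARPACK/Lanczos-based partial SVD

A = np.random.default_rng(0).standard_normal((500, 200))

# Exact dense SVD: O(min(m n^2, m^2 n)) time.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Only the top k singular vectors/values, computed iteratively.
k = 10
Uk, Sk, Vkt = svds(A, k=k)  # note: svds returns the singular values in ascending order
```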

13 Rank-k approximations (A_k)
A_k = U_k Σ_k V_k^T, where U_k (V_k) is the orthogonal matrix containing the top k left (right) singular vectors of A and Σ_k is the diagonal matrix containing the top k singular values of A.
A_k is the matrix of rank k such that ||A - A_k||_{2,F} is minimized over all rank-k matrices!
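A short worked example (illustrative only): build A_k by truncating the SVD and check that its Frobenius error equals the norm of the discarded tail of singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 60))
k = 5

U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]   # best rank-k approximation

# ||A - A_k||_F^2 equals the sum of the squared discarded singular values.
err = np.linalg.norm(A - A_k, "fro")
assert np.isclose(err, np.sqrt((S[k:] ** 2).sum()))
```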

14 The CUR algorithm
Input:
1. The matrix A in "sparse unordered representation" (e.g., the non-zero entries of A are presented as triples (i, j, A_ij) in any order).
2. Positive integers c < n and r < m (the number of columns/rows that we pick).
3. Positive integer k (the rank of A' = CUR).
Note: since A' is of rank k, ||A - A'||_{2,F} ≥ ||A - A_k||_{2,F}. We choose k such that ||A - A_k||_{2,F} is small. As k grows, for the Frobenius norm approximation, c and r grow as well.
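To make the input model concrete, here is a hedged sketch of what the two passes over the stream of (i, j, A_ij) triples might look like: the first pass accumulates the squared row/column lengths needed for the sampling probabilities in O(m+n) space, and the second keeps only the entries that land in the sampled rows/columns. The function names are illustrative, and rescaling and duplicate sampled indices are ignored for brevity; this is not the paper's exact procedure.

```python
import numpy as np

def pass_one(triples, m, n):
    # Accumulate squared row/column lengths using O(m + n) extra space.
    row_sq = np.zeros(m)
    col_sq = np.zeros(n)
    for i, j, a in triples:
        row_sq[i] += a * a
        col_sq[j] += a * a
    total = row_sq.sum()                    # = ||A||_F^2
    return row_sq / total, col_sq / total   # row / column sampling probabilities

def pass_two(triples, row_idx, col_idx, m, n):
    # Keep only entries that fall in the sampled rows/columns (assumes the
    # sampled indices are distinct; rescaling is omitted for brevity).
    rpos = {i: t for t, i in enumerate(row_idx)}
    cpos = {j: t for t, j in enumerate(col_idx)}
    R = np.zeros((len(row_idx), n))
    C = np.zeros((m, len(col_idx)))
    for i, j, a in triples:
        if i in rpos:
            R[rpos[i], j] = a
        if j in cpos:
            C[i, cpos[j]] = a
    return C, R
```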

15 Computing U
Intuition: the CUR algorithm essentially expresses every row of the matrix A as a linear combination of a small subset of the rows of A. This small subset consists of the rows in R. Given a row of A, say A_(i), the algorithm computes the "best fit" for the row A_(i) using the rows in R as the basis, e.g. by finding coefficients u such that the combination Σ_j u_j R_(j) is as close as possible to A_(i). Notice that only c = O(1) elements of the i-th row are given as input; however, a vector of coefficients u can still be computed.

16 Creating U
Running time: computing the elements of U amounts to a pseudo-inverse computation. It can be done in O(c^2 m + c^3 + r^3) time; thus, since c and r are constants, U can be computed in O(m) time.
Note on the rank of U and CUR: the rank of U is k by construction; thus, the rank of A' = CUR is at most k.
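A hedged numpy sketch of this step (an inferred construction, not necessarily the paper's exact one): given the sampled columns C and the indices of the sampled rows, form the r-by-c intersection W = C[row_idx, :] and take the pseudo-inverse of its rank-k truncation as U. Then C · U · R fits each row of A from its c observed entries, as described above; the exact rescaling used in the paper may differ.

```python
import numpy as np

def build_U(C, row_idx, k):
    # W holds the c observed entries of each sampled row, i.e. the intersection
    # of the sampled rows and sampled columns (r x c).
    W = C[row_idx, :]
    Uw, Sw, Vwt = np.linalg.svd(W, full_matrices=False)
    # Rank-k pseudo-inverse of W: invert only the top k (nonzero) singular values.
    Sw_inv = np.zeros_like(Sw)
    keep = Sw[:k] > 1e-12
    Sw_inv[:k][keep] = 1.0 / Sw[:k][keep]
    return Vwt.T @ np.diag(Sw_inv) @ Uw.T   # c x r, rank at most k

# Usage (with C, R, row_idx from the sampling sketch above):
#   U = build_U(C, row_idx, k=10)
#   A_prime = C @ U @ R    # never needs the full matrix A again
```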

17 Error bounds (Frobenius norm)
Assume A_k is the "best" rank-k approximation to A (through the SVD). Then, with high probability,
||A - CUR||_F ≤ ||A - A_k||_F + ε ||A||_F.
We need to pick O(k/ε^2) rows and O(k/ε^2) columns.

18 Error bounds (2-norm)
Assume A_k is the "best" rank-k approximation to A (through the SVD). Then, with high probability,
||A - CUR||_2^2 ≤ ||A - A_k||_2^2 + ε ||A||_F^2 ≤ (1/(k+1) + ε) ||A||_F^2,
since ||A - A_k||_2^2 ≤ ||A||_F^2 / (k+1). We need to pick O(1/ε^2) rows and O(1/ε^2) columns.
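The inequality ||A - A_k||_2^2 ≤ ||A||_F^2 / (k+1) holds because each of the k+1 largest squared singular values is at least σ_{k+1}^2 and together they sum to at most ||A||_F^2. The self-contained check below (illustrative only) verifies it numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 100))
k = 10

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
lhs = s[k] ** 2                          # ||A - A_k||_2^2 = sigma_{k+1}^2
rhs = (s ** 2).sum() / (k + 1)           # ||A||_F^2 / (k + 1)
assert lhs <= rhs
```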

19 Can we do better?
Lower bound
Lemma: for any α < 1, there is a set of Ω(α^(-n)) n-by-n matrices such that, for any two distinct matrices A and B in the set, ||A - B||_2^2 > (α/20) ||A||_F^2.
Theorem: any algorithm which approximates these matrices must output a different "sketch" for each one; thus it must output at least Ω(n log(1/α)) bits.
Tighter lower bounds, matching almost exactly with our upper bounds, have been obtained by Ziv Bar-Yossef, STOC '03.

20 A different technique (D. Achlioptas and F. McSherry, '01 and '03)
The algorithm in two lines:
1. To approximate a matrix A, keep a few elements of the matrix (instead of rows or columns) and zero out the remaining elements (weighted sampling is used: heavier elements are kept with higher probabilities).
2. Compute a rank-k approximation to this sparse matrix (using Lanczos methods).
Comparing the two techniques: the error bound w.r.t. the 2-norm is better, while the error bound w.r.t. the Frobenius norm is the same. Running times are the same.
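A minimal sketch of this element-wise approach (one common variant, with keep-probabilities proportional to squared magnitudes and unbiased rescaling; Achlioptas and McSherry analyze several weightings, so this is illustrative rather than their exact scheme):

```python
import numpy as np
from scipy.sparse.linalg import svds

def sparsify(A, keep_fraction, rng):
    # Keep entry (i, j) with probability proportional to A_ij^2 (capped at 1),
    # and rescale kept entries by 1/p_ij so the sparse matrix estimates A.
    p = np.minimum(1.0, keep_fraction * A.size * (A ** 2) / (A ** 2).sum())
    mask = rng.random(A.shape) < p
    return np.where(mask, A / np.maximum(p, 1e-12), 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((400, 300))
S = sparsify(A, keep_fraction=0.1, rng=rng)

# Rank-k approximation of the sparsified matrix via a Lanczos-type solver.
k = 10
Uk, Sk, Vkt = svds(S, k=k)
A_k_approx = Uk @ np.diag(Sk) @ Vkt
```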

21 Conclusions
Given the small "sketch" of a matrix A, a "friendly user" can:
- reconstruct a (provably accurate) approximation A' to the original matrix A and run on A' any algorithms that they would use to process the original matrix A;
- use the Frobenius and spectral norm bounds for A - A' to argue about the approximation error of their algorithms.
Open questions: how do we ensure privacy for the object vectors (rows) of A that are revealed as part of R? Do such sketches offer some privacy-preserving guarantees, under some (relaxed) definition of privacy?