
1 Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining. Petros Drineas, Rensselaer Polytechnic Institute, drinep@cs.rpi.edu (joint work with R. Kannan and M. Mahoney). DIMACS Workshop on Privacy Preserving Data Mining.

2 Motivation (Data Mining) In many applications, large matrices appear (too large to store in RAM). We can make a few “passes” (sequential reads) through the matrices. We can create and store a small “sketch” of the matrices in RAM. Computing the “sketch” should be a very fast process. Discard the original matrix and work with the “sketch”.

3 Motivation (Privacy Preserving) In many applications, instead of revealing a large matrix, we only reveal its “sketch”. Intuition: the “sketch” is an approximation to the original matrix. Instead of viewing the approximation as a “necessary evil”, we might be able to use it to achieve privacy preservation (similar ideas in Feigenbaum et al., ICALP 2001). Goal: formulate a technical definition of privacy that might be achievable by such “sketching” algorithms and that provides meaningful and quantifiable protection. Achieving this goal is an open problem!

4 Our approach & our results 1. A “sketch” consisting of a few rows/columns of the matrix is adequate for efficient approximations [see D & Kannan ’03, and D, Kannan & Mahoney ’04]. 2. We draw the rows/columns randomly, using adaptive sampling; e.g., rows/columns are picked with probability proportional to their lengths. 3. Create an approximation to the original matrix which can be stored in much less space.
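As a back-of-the-envelope illustration of the space savings (the sizes m, n, c, r below are made up and only meant to be indicative), storing the three sketch factors takes far fewer entries than storing A itself:

```python
# Back-of-the-envelope storage comparison for hypothetical (made-up) sizes:
# a dense m-by-n matrix versus a sketch built from c columns and r rows of it.
m, n = 10**6, 10**5
c = r = 100                              # sketch size; Theta(1/eps^2) in the talk

full_entries   = m * n                   # entries needed to store A
sketch_entries = m * c + c * r + r * n   # entries needed to store C, U and R

print(full_entries, sketch_entries)      # 10^11 versus roughly 1.1 * 10^8
```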

5 Overview A data mining setup; approximating a large matrix (algorithm, error bounds, tightness of the results); an alternative approach (Achlioptas and McSherry ’01 and ’03); conclusions.

6 Applications: Data Mining We are given m (> 10^6) objects and n (> 10^5) features describing the objects. Database: an m-by-n matrix A (A_ij shows the “importance” of feature j for object i). Every row of A represents an object. Queries: given a new object x, find similar objects in the database (nearest neighbors).
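A minimal sketch of this setup in NumPy, assuming a small random matrix stands in for the (much larger, disk-resident) object-feature matrix; all sizes and data below are made up:

```python
import numpy as np

# Toy object-by-feature matrix: each row is one object, each column one feature.
# In the setting of the talk, m and n would be far larger (m > 10^6, n > 10^5)
# and A would live on disk rather than in RAM.
rng = np.random.default_rng(0)
m, n = 1000, 50
A = rng.random((m, n))

# A new object, described by the same n features.
x = rng.random(n)

# One matrix-vector product scores every object in the database against x;
# the objects with the largest scores are the nearest-neighbor candidates.
scores = A @ x
top10 = np.argsort(scores)[-10:][::-1]
print("indices of the 10 most similar objects:", top10)
```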

7 Applications (cont’d) Two objects are “close” if the angle between their corresponding vectors is small. So, assuming that the vectors are normalized, x^T·d = cos(x,d) is high when the two objects are close. A·x computes all the angles and answers the query. Key observation: the exact value x^T·d might not be necessary. 1. The feature values in the vectors are set by coarse heuristics. 2. It is in general enough to see whether x^T·d > Threshold.
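A minimal sketch of the thresholded cosine query, assuming unit-normalized object and query vectors; the threshold value below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((1000, 50))   # toy object-by-feature matrix
x = rng.random(50)           # query object

# Normalize every object-vector and the query to unit length, so that the
# inner product x^T . d is exactly cos(x, d).
A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
x_unit = x / np.linalg.norm(x)

# A . x computes all the cosines at once; per the key observation, the query
# only needs to know which of them clear a coarse threshold.
cosines = A_unit @ x_unit
threshold = 0.9                                 # illustrative value only
neighbors = np.nonzero(cosines > threshold)[0]
```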

8 Using an approximation to A Assume that A’ = CUR is an approximation to A, such that A’ is stored efficiently (e.g. in RAM). Given a query vector x, instead of computing A · x, compute A’ · x to identify its nearest neighbors. The CUR algorithm guarantees a bound on the worst-case choice of x.
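Because A’ is kept in factored form, the approximate query can be answered without ever materializing A’; a sketch, assuming C, U, R are NumPy arrays of the shapes noted in the comment:

```python
import numpy as np

def approx_query(C, U, R, x):
    """Score a query x against the sketch A' = C U R without ever forming A'.

    C is m-by-c, U is c-by-r, R is r-by-n, so the cost per query is
    O((m + n) * small) instead of the O(m n) needed for the exact A @ x.
    """
    return C @ (U @ (R @ x))
```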

9 Approximating A efficiently Given a large m-by-n matrix A (stored on disk), compute an approximation A’ to A such that: 1. A’ can be stored in O(m+n) space, after making two passes through the entire matrix A, and using O(m+n) additional space and time. 2. A’ satisfies (with high probability) ||A-A’||_2^2 < ε ||A||_F^2 (and a similar bound with respect to the Frobenius norm).

10 Describing A’ = C · U · R C consists of c = Θ(1/ε^2) columns of A and R consists of r = Θ(1/ε^2) rows of A (the “description length” of A’ is O(m+n)). C and R are created using adaptive sampling.

11 Creating C and R Create C (R) by performing c (r) i.i.d. trials. In each trial, pick a column (row) of A, say the i-th, with probability p_i = |A^(i)|^2 / ||A||_F^2 (p_i = |A_(i)|^2 / ||A||_F^2), i.e. proportional to its squared length. Include A^(i) (A_(i)) as a column of C (row of R). [A^(i) (A_(i)) denotes the i-th column (row) of A.]
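A minimal NumPy sketch of this sampling step. The squared-length probabilities follow the Drineas–Kannan–Mahoney papers; the rescaling of each picked column/row by 1/sqrt(c·p_i) used in their analysis is noted in a comment but omitted from the code, since the slide does not show it:

```python
import numpy as np

def sample_columns(A, c, rng):
    """Pick c columns of A in i.i.d. trials, choosing the i-th column with
    probability p_i = |A^(i)|^2 / ||A||_F^2 (proportional to its squared
    length).  The DKM analysis additionally rescales each picked column by
    1/sqrt(c * p_i); that rescaling is omitted here, as on the slide."""
    p = np.sum(A * A, axis=0)          # squared column lengths
    p = p / p.sum()                    # sampling probabilities
    idx = rng.choice(A.shape[1], size=c, p=p)
    return A[:, idx], idx

def sample_rows(A, r, rng):
    """The same construction applied to the rows of A, giving R."""
    Rt, idx = sample_columns(A.T, r, rng)
    return Rt.T, idx
```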

12 Singular Value Decomposition (SVD) A = U Σ V^T. U (V): orthogonal matrix containing the left (right) singular vectors of A. Σ: diagonal matrix containing the singular values of A. 1. Exact computation of the SVD takes O(min(mn^2, m^2n)) time. 2. The top few singular vectors/values can be approximated faster (Lanczos/Arnoldi methods).
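Both computations in one short NumPy/SciPy sketch (the matrix sizes are arbitrary); scipy.sparse.linalg.svds is an iterative, Lanczos-type solver for just the top singular triplets:

```python
import numpy as np
from scipy.sparse.linalg import svds   # Lanczos-type (ARPACK) solver

rng = np.random.default_rng(0)
A = rng.random((500, 200))

# Full SVD, A = U diag(sigma) V^T; cost O(min(m n^2, m^2 n)).
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Only the top k singular values/vectors, approximated iteratively.
k = 10
Uk, sk, Vkt = svds(A, k=k)   # note: svds returns singular values in ascending order
```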

13 Rank k approximations (A_k) A_k = U_k Σ_k V_k^T is a matrix of rank k such that ||A-A_k||_{2,F} is minimized over all rank k matrices! U_k (V_k): orthogonal matrix containing the top k left (right) singular vectors of A. Σ_k: diagonal matrix containing the top k singular values of A.
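A one-function sketch of the truncation, assuming A fits in memory:

```python
import numpy as np

def best_rank_k(A, k):
    """Return A_k = U_k Sigma_k V_k^T, which minimizes ||A - B|| over all
    matrices B of rank at most k, in both the spectral and Frobenius norms."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]
```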

14 The CUR algorithm Input: 1. The matrix A in “sparse unordered representation” (e.g. non-zero entries of A are presented as triples (i, j, A_ij) in any order). 2. Positive integers c < n and r < m (number of columns/rows that we pick). 3. Positive integer k (the rank of A’ = CUR). Note: since A’ is of rank k, ||A-A’||_{2,F} ≥ ||A-A_k||_{2,F}. We choose a k such that ||A-A_k||_{2,F} is small. As k grows, for the Frobenius norm approximation, c and r grow as well.
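One way the first of the two passes over this representation could look (a sketch, not necessarily the exact bookkeeping of the talk): accumulate the squared row and column lengths that determine the sampling probabilities, using only O(m+n) space; a second pass then extracts the sampled rows and columns.

```python
import numpy as np

def first_pass(triples, m, n):
    """One sequential pass over the non-zero entries of A, presented as
    unordered (i, j, A_ij) triples, accumulating the squared row and column
    lengths from which the sampling probabilities are formed.  Needs only
    O(m + n) additional space."""
    row_sq = np.zeros(m)
    col_sq = np.zeros(n)
    for i, j, a in triples:
        row_sq[i] += a * a
        col_sq[j] += a * a
    return row_sq, col_sq
```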

15 Computing U Intuition: the CUR algorithm essentially expresses every row of the matrix A as a linear combination of a small subset of the rows of A. This small subset consists of the rows in R. Given a row of A – say A_(i) – the algorithm computes the “best fit” for the row A_(i) using the rows in R as the basis, e.g. A_(i) ≈ Σ_j u_j R_(j) for a vector of coefficients u. Notice that only c = O(1) elements of the i-th row (its entries in the c sampled columns) are given as input. However, a vector of coefficients u can still be computed.

16 Creating U Running time: computing the elements of U amounts to a pseudo-inverse computation. It can be done in O(c^2 m + c^3 + r^3) time. Thus, for constant c and r, U can be computed in O(m) time. Note on the rank of U and CUR: the rank of U (by construction) is k. Thus, the rank of A’ = CUR is at most k.
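The slide does not spell out the exact construction of U; the sketch below uses one standard choice from the CUR literature (the pseudo-inverse of the rank-k truncation of the intersection block of the sampled rows and columns), shown only as an illustration of the pseudo-inverse computation, not as the talk's own formula:

```python
import numpy as np

def middle_matrix(A, row_idx, col_idx, k):
    """Illustrative choice of the c-by-r matrix U: take the r-by-c intersection
    block W = A[row_idx, col_idx] (only sampled entries of A are touched) and
    return the pseudo-inverse of its best rank-k approximation, so that
    A' = C U R has rank at most k."""
    W = A[np.ix_(row_idx, col_idx)]                        # r-by-c
    Uw, s, Vtw = np.linalg.svd(W, full_matrices=False)
    # Invert only the top-k singular values, guarding against zeros.
    inv_s = np.where(s[:k] > 1e-12, 1.0 / np.maximum(s[:k], 1e-12), 0.0)
    return Vtw[:k, :].T @ (inv_s[:, None] * Uw[:, :k].T)   # c-by-r
```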

17 Error bounds (Frobenius norm) Assume A_k is the “best” rank k approximation to A (through SVD). Then ||A-A’||_F^2 ≤ ||A-A_k||_F^2 + ε ||A||_F^2. We need to pick O(k/ε^2) rows and O(k/ε^2) columns.

18 Error bounds (2-norm) Assume A_k is the “best” rank k approximation to A (through SVD). Then ||A-A’||_2^2 ≤ ||A-A_k||_2^2 + ε ||A||_F^2, since ||A-A_k||_2^2 ≤ ||A||_F^2/(k+1). We need to pick O(1/ε^2) rows and O(1/ε^2) columns.
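The inequality ||A-A_k||_2^2 ≤ ||A||_F^2/(k+1) used above follows from a short averaging argument over the singular values of A:

```latex
\[
\|A-A_k\|_2^2 \;=\; \sigma_{k+1}^2
\;\le\; \frac{1}{k+1}\sum_{t=1}^{k+1}\sigma_t^2
\;\le\; \frac{1}{k+1}\sum_{t}\sigma_t^2
\;=\; \frac{\|A\|_F^2}{k+1}.
\]
% sigma_{k+1} is the smallest of the k+1 largest singular values of A,
% so its square is at most their average.
```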

19 Can we do better? Lemma: for any ε < 1, there is a set of Ω(ε^{-n}) n-by-n matrices, such that for any two distinct matrices A, B in the set, ||A-B||_2^2 > (ε/20) ||A||_F^2. Lower bound theorem: any algorithm which approximates these matrices must output a different “sketch” for each one, thus it must output at least Ω(n log(1/ε)) bits. Tighter lower bounds, matching our upper bounds almost exactly, have been obtained by Ziv Bar-Yossef, STOC ’03.

20 A different technique (D. Achlioptas and F. McSherry, ’01 and ’03) The algorithm in 2 lines: to approximate a matrix A, keep a few elements of the matrix (instead of rows or columns) and zero out the remaining elements (weighted sampling is used: heavier elements are kept with higher probability). Compute a rank k approximation to this sparse matrix (using Lanczos methods). Comparing the two techniques: the error bound w.r.t. the 2-norm is better, while the error bound w.r.t. the Frobenius norm is the same. Running times are the same.
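A simplified sketch of this element-wise sampling, assuming keep-probabilities proportional to the squared entries and rescaling for unbiasedness; the published Achlioptas–McSherry algorithms treat very small entries more carefully than this:

```python
import numpy as np
from scipy.sparse.linalg import svds

def sparsify(A, keep_fraction, rng):
    """Keep entry A_ij independently with probability p_ij proportional to
    A_ij^2 (heavier elements are kept with higher probability), rescale the
    kept entries by 1/p_ij so the sparse matrix equals A in expectation, and
    zero out the rest."""
    p = A * A
    p = np.minimum(1.0, keep_fraction * A.size * p / p.sum())
    keep = rng.random(A.shape) < p
    S = np.zeros(A.shape)
    S[keep] = A[keep] / p[keep]
    return S

# A rank-k approximation of the sparsified matrix can then be computed with a
# Lanczos-type solver, e.g.:
#   S = sparsify(A, 0.1, np.random.default_rng(0))
#   Uk, sk, Vkt = svds(S, k=10)
```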

21 Conclusions Given the small “sketch” of a matrix A, a “friendly user” can reconstruct a (provably accurate) approximation A’ to the original matrix A, run on A’ any algorithm that he would have used to process the original matrix A, and use the Frobenius and spectral norm bounds on A-A’ to argue about the approximation error of his algorithms. How do we ensure privacy for the object-vectors (rows) of A that are revealed as part of R? Do such sketches offer some privacy preserving guarantee, under some (relaxed) definition of privacy?

