Mauro Mezzini ANSWERING SUM-QUERIES : A SECURE AND EFFICIENT APPROACH University of Rome “La Sapienza” Computer Science Department.


1 Mauro Mezzini ANSWERING SUM-QUERIES : A SECURE AND EFFICIENT APPROACH University of Rome “La Sapienza” Computer Science Department

2 Introduction
Statistical database: users are allowed to ask for statistical information such as sum, count, average, max and min queries on a numerical attribute.
Table Retail:
  PRODUCT     SALES(€)
  storage     90000
  router      30000
  server      30000
  mainframe   25000
Query: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';
Result: r = 120.000

3 Introduction
Definition: The target K of a query q is the set of categories selected by q.
Query: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';
Target: K = {storage, router}

4 The efficiency issue
To speed up the answering of a sum-query, the query system is endowed with a set of pre-computed sum-queries called the set of materialised views.
  $q_1$: select sum(SALES) from Retail;    $r_1$ = 175.000
  $q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';    $r_2$ = 120.000
  $q$: select sum(SALES) from Retail where PRODUCT = 'server' or PRODUCT = 'mainframe';    $r = r_1 - r_2$ = 55.000
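The use of materialised views above can be tried out with a small sketch. It assumes an in-memory SQLite copy of the Retail table from the Introduction; the helper name `sum_sales` is hypothetical and not part of the presentation.

```python
import sqlite3

# Hypothetical in-memory copy of the Retail table from the Introduction slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Retail (PRODUCT TEXT, SALES INTEGER)")
conn.executemany(
    "INSERT INTO Retail VALUES (?, ?)",
    [("storage", 90000), ("router", 30000), ("server", 30000), ("mainframe", 25000)],
)

def sum_sales(where=None, params=()):
    """Run a sum-query on SALES, optionally restricted by a WHERE clause."""
    sql = "SELECT SUM(SALES) FROM Retail"
    if where:
        sql += " WHERE " + where
    return conn.execute(sql, params).fetchone()[0]

# Materialised views q1 (whole table) and q2 (storage or router).
r1 = sum_sales()
r2 = sum_sales("PRODUCT = ? OR PRODUCT = ?", ("storage", "router"))

# Answer q (server or mainframe) from the views alone, then compare
# with a direct evaluation against the table.
r_from_views = r1 - r2
r_direct = sum_sales("PRODUCT = ? OR PRODUCT = ?", ("server", "mainframe"))
print(r1, r2, r_from_views, r_direct)
```

The point of the comparison is that `r_from_views` never touches the table: it only combines stored answers.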

5 Protection issue
To protect the confidentiality of the numerical attribute, the query system is endowed with the list of all sensitive categories.
  $q_1$: select sum(SALES) from Retail where PRODUCT = 'storage';
  $q_2$: select sum(SALES) from Retail where PRODUCT = 'router';
Table Retail:
  PRODUCT     SALES(€)
  storage     90000
  router      30000
  server      30000
  mainframe   25000

6 Protection issue
  $q_1$: select sum(SALES) from Retail where PRODUCT = 'router' or PRODUCT = 'server';    $r_1$ = 60.000
  $q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'server';    $r_2$ = 120.000
  $q_3$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';    $r_3$ = 120.000
  $x_1 + x_2 = r_1$
  $x_2 + x_3 = r_2$
  $x_1 + x_3 = r_3$
The values of all confidential information can be inferred from the answers to the non-confidential queries $\{q_1, q_2, q_3\}$.
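To make the inference concrete, the following sketch solves the three equations exactly. The right-hand sides are recomputed here from the Retail table of the Introduction (an assumption of this example), and the solver `solve_linear` is a generic Gauss-Jordan routine, not something taken from the presentation.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve a square nonsingular system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Confidential values (known only to the database) and the three query targets.
sales = {"storage": 90000, "router": 30000, "server": 30000}
targets = [("router", "server"), ("storage", "server"), ("storage", "router")]
unknowns = ["router", "server", "storage"]          # x1, x2, x3

A = [[1 if u in t else 0 for u in unknowns] for t in targets]
r = [sum(sales[c] for c in t) for t in targets]     # answers released to the user

x = solve_linear(A, r)                               # what the user can infer
print(dict(zip(unknowns, x)))
```

Because the coefficient matrix is nonsingular, the three released sums determine every individual value.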

7 The inference model
Efficiency: given a set of sum-queries $V = \{q_1,\dots,q_n\}$, determine whether the result of a sum-query q can be inferred from V.
Protection: given a set of sum-queries $V = \{q_1,\dots,q_n\}$, determine for every inferable sum-query q whether the result of q is sensitive information.

8 The inference model
Let $V = \{q_1, q_2, \dots, q_n\}$, and let $K_i$ and $r_i$ be the target and the result of $q_i$, respectively.
Let $\mathcal{C} = \{C_1, C_2, \dots, C_m\}$ be the coarsest partition of $\bigcup_{i=1,\dots,n} K_i$ such that each $K_i$ can be obtained as the union of one or more elements of $\mathcal{C}$.
The inference model is based on the following system of linear constraints:
  $\sum_{j=1,\dots,m} a_{i,j}\, x_j = r_i \quad (i = 1,\dots,n), \qquad x \in F^m$    (1)
where $a_{i,j} = 1$ if $C_j \subseteq K_i$ and $a_{i,j} = 0$ otherwise, and F is the domain of the numerical attribute.

9 The inference model. An example
  $q_1$: select sum(SALES) from Retail where PRODUCT = 'router' or PRODUCT = 'server';    $r_1$ = 60.000
  $q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'server';    $r_2$ = 120.000
  $q_3$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';    $r_3$ = 120.000
Targets: $K_1$ = {router, server}, $K_2$ = {storage, server}, $K_3$ = {storage, router}
Partition: $C_1$ = {router}, $C_2$ = {server}, $C_3$ = {storage}
F is the set of non-negative reals.
System (1):
  $x_1 + x_2 = r_1$
  $x_2 + x_3 = r_2$
  $x_1 + x_3 = r_3$
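The construction of the partition and of the coefficients of system (1) can be sketched as follows. The approach, grouping categories by the set of targets they belong to, is one straightforward way to obtain the coarsest partition; the function names are hypothetical.

```python
def coarsest_partition(targets):
    """Group categories that belong to exactly the same targets K_i."""
    universe = sorted(set().union(*targets))
    classes = {}
    for cat in universe:
        # Two categories end up in the same class iff no target separates them.
        signature = tuple(cat in K for K in targets)
        classes.setdefault(signature, []).append(cat)
    return [frozenset(cls) for cls in classes.values()]

def constraint_matrix(targets, partition):
    """a[i][j] = 1 if class C_j is contained in target K_i, else 0."""
    return [[1 if C <= K else 0 for C in partition] for K in targets]

K1 = {"router", "server"}
K2 = {"storage", "server"}
K3 = {"storage", "router"}
targets = [K1, K2, K3]

partition = coarsest_partition(targets)
A = constraint_matrix(targets, partition)
print(partition)
print(A)
```

With these three targets every category is separated from the others, so each class is a singleton and the matrix matches the three equations of the example.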

10 The inference model
Definition: Given a subset S of $\{1, 2, \dots, m\}$, the sum-expression $\sum_{j \in S} x_j$ is an F-invariant if it takes on the same value for every solution x of (1).
An F-invariant sum is the result of the sum-query with target $\bigcup_{j \in S} C_j$.

11 The inference model
Definitions: Given the partition $\mathcal{C} = \{C_1,\dots,C_m\}$ and a query q with target K, consider the two sets:
  $S = \{ j : C_j \subseteq K \}$, the support of q
  $\bar{S} = \{ j : C_j \cap K \neq \emptyset \text{ and } C_j - K \neq \emptyset \}$, the cosupport of q
The sum $\sum_{j \in S \cup \bar{S}} x_j$ is called the sum-expression associated to q.

12 The inference model. An example
  $q$: select sum(SALES) from Retail where PRODUCT = 'storage';    K = {storage}
The support of q is {3}, the cosupport is empty, and the sum-expression associated to q is trivially $x_3$.
The set V is the same as in the previous example:
  $q_1$: select sum(SALES) from Retail where PRODUCT = 'router' or PRODUCT = 'server';    $r_1$ = 60.000
  $q_2$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'server';    $r_2$ = 120.000
  $q_3$: select sum(SALES) from Retail where PRODUCT = 'storage' or PRODUCT = 'router';    $r_3$ = 120.000
Targets: $K_1$ = {router, server}, $K_2$ = {storage, server}, $K_3$ = {storage, router}
Partition: $C_1$ = {router}, $C_2$ = {server}, $C_3$ = {storage}
System (1):
  $x_1 + x_2 = r_1$
  $x_2 + x_3 = r_2$
  $x_1 + x_3 = r_3$
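The support and cosupport of a query can be computed directly from their definitions. The sketch below uses 1-based class indices as on the slides; the function name and the second example partition are illustrative assumptions.

```python
def support_and_cosupport(partition, K):
    """Return (support, cosupport) of a query with target K, using 1-based class indices."""
    support, cosupport = set(), set()
    for j, C in enumerate(partition, start=1):
        if C <= K:
            support.add(j)          # C_j is contained in K
        elif C & K:
            cosupport.add(j)        # C_j meets K but is not contained in it
    return support, cosupport

partition = [{"router"}, {"server"}, {"storage"}]
print(support_and_cosupport(partition, {"storage"}))

# A coarser, made-up partition where the target cuts through a class.
coarse = [{"router", "server"}, {"storage"}]
print(support_and_cosupport(coarse, {"router", "storage"}))
```

In the second call the class {router, server} is only partly covered by the target, so its index lands in the cosupport.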

13 Problem definitions
1) Given a sum-expression $\sum_{j \in S} x_j$, decide whether it is an F-invariant.
2) Given a sum-expression $\sum_{j \in S} x_j$ that is not an F-invariant, find a nonempty subset $S'$ of S such that $\sum_{j \in S'} x_j$ is an F-invariant.

14 Problem (2)
Let S be a subset of $\{1,\dots,m\}$ and let s be the characteristic vector of S. Then, for $i = 1,\dots,m$:
  $s(i) = 1$ if $i \in S$
  $s(i) = 0$ if $i \notin S$

15 Problem (2)
We can rewrite system (1) as: $A x = r$, $x \in F^m$.
An m-dimensional vector f is a linear combination of rows of A if
  $f = \sum_{i=1,\dots,n} \lambda_i a_i, \qquad \lambda_i \in \mathbb{R}$
where $a_i$ is the i-th row of A, $i = 1,\dots,n$.

16 Problem (2)
Definition: A subset S of $\{1, 2, \dots, m\}$ is said to be algebraic if its characteristic vector can be expressed as a linear combination of the rows of the matrix A.
If F is $\mathbb{R}$ or $\mathbb{Z}$, then $\sum_{j \in S} x_j$ is F-invariant if and only if S is algebraic.
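Whether a subset is algebraic can be checked with elementary linear algebra: the characteristic vector lies in the row space of A exactly when appending it to A does not increase the rank. The sketch below is one simple way to do this over the rationals; the function names and the two test matrices are assumptions of the example, not the method of the presentation.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            factor = M[r][col] / M[rk][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[rk])]
        rk += 1
    return rk

def is_algebraic(A, S):
    """True if the characteristic vector of S (1-based indices) lies in the row space of A."""
    m = len(A[0])
    s = [1 if j in S else 0 for j in range(1, m + 1)]
    return rank(A + [s]) == rank(A)

A3 = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # all three queries of the running example
A2 = [[1, 1, 0], [0, 1, 1]]              # only the first two queries

print(is_algebraic(A3, {3}))     # x3 alone, with three queries
print(is_algebraic(A2, {3}))     # x3 alone, with two queries
print(is_algebraic(A2, {1, 2}))  # x1 + x2, with two queries
```

With all three queries every subset is algebraic, since the matrix has full rank; with only two queries the single unknown $x_3$ is no longer determined.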

17 The NAS Problem
Problem definition: given a sum-expression $\sum_{j \in S} x_j$ that is not R-invariant, find a non-empty algebraic subset of S (NAS Problem).
NAS Problem: find a non-empty subset F of S such that the characteristic vector of F is expressible as a linear combination of rows of A.

18 The NAS Problem
The subset sum problem (SSP): given a set $S = \{1,\dots,p\}$ and a mapping $a : S \to \mathbb{Z}$ such that $a(i) > 0$ for $i = 1,\dots,p-1$ and $a(i) < 0$ for $i = p$, find a nonempty subset F of S such that $\sum_{i \in F} a(i) = 0$.

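19 The NAS Problem
Let c be a q-dimensional vector, with $q \ge p$, such that $c(1) = a(1)$, $c(2) = a(2)$, ..., $c(p) = a(p)$ and $c(j) \in \mathbb{R}$ for $p < j \le q$.
Let $M = (I, c)$ be the $q \times (q+1)$ matrix obtained by appending c as the last column to the $q \times q$ identity matrix I.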

20 The NAS Problem
Example: let $S = \{1, 2, 3, 4\}$ and $a(1) = 1$, $a(2) = 2$, $a(3) = 5$, $a(4) = -7$.
The subset $F = \{2, 3, 4\}$ of S is a solution of the SSP since $a(2) + a(3) + a(4) = 2 + 5 - 7 = 0$.
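The example can be checked with a brute-force search. This is only an exponential-time illustration of what an SSP solution is, not an algorithm from the presentation; `solve_ssp` is a hypothetical name.

```python
from itertools import combinations

def solve_ssp(a):
    """Return a nonempty subset F of the keys of `a` whose values sum to 0, or None."""
    keys = sorted(a)
    # Try subsets by increasing size, so a smallest solution is returned first.
    for size in range(1, len(keys) + 1):
        for F in combinations(keys, size):
            if sum(a[i] for i in F) == 0:
                return set(F)
    return None

a = {1: 1, 2: 2, 3: 5, 4: -7}
print(solve_ssp(a))
```

For this instance the search finds the subset named on the slide; an instance such as a(1) = 1, a(2) = 2, a(3) = -7 has no solution and the function returns `None`.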

21 If we choose q = 5, the vector c is $(1, 2, 5, -7, \gamma)$ and the matrix M is
  1 0 0 0 0  1
  0 1 0 0 0  2
  0 0 1 0 0  5
  0 0 0 1 0 -7
  0 0 0 0 1  γ

22 The NAS Problem
The vector $\bar{c} = (-c, 1)$ is a solution of the equation $M y = 0$:
  $y_1 + 1\,y_6 = 0$
  $y_2 + 2\,y_6 = 0$
  $y_3 + 5\,y_6 = 0$
  $y_4 - 7\,y_6 = 0$
  $y_5 + \gamma\,y_6 = 0$
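A quick numerical check of this claim is sketched below. The free fifth entry of c is given an arbitrary rational value (`free_entry`, an assumption made only so that the arithmetic is exact), and the helper names are hypothetical.

```python
from fractions import Fraction

def build_M(c):
    """Return M = (I, c): the q x q identity with c appended as the last column."""
    q = len(c)
    return [[1 if j == i else 0 for j in range(q)] + [c[i]] for i in range(q)]

def mat_vec(M, y):
    """Matrix-vector product M y."""
    return [sum(m * v for m, v in zip(row, y)) for row in M]

free_entry = Fraction(3, 7)            # arbitrary stand-in for the unconstrained entry of c
c = [1, 2, 5, -7, free_entry]
M = build_M(c)

c_bar = [-v for v in c] + [1]          # the vector (-c, 1)
print(mat_vec(M, c_bar))               # every component should be 0
```

Row i of M contributes $-c(i) + c(i) \cdot 1$, which vanishes whatever value the free entry takes.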

23 The NAS Problem
Theorem: Given the matrix M and the set $S = \{1,\dots,p\}$, the SSP has a solution if and only if there exists a nonempty algebraic subset of S.
Proof: The (q+1)-dimensional vector $\bar{c} = (-c, 1)$ spans the null space of M, that is, the solution set of $M y = 0$, and the null space of M has dimension equal to one.

24 The NAS Problem
If $F \subseteq S$ is an algebraic set, then its characteristic vector f is expressible as a linear combination of rows of M. Since every row of M is orthogonal to $\bar{c}$, so is f; hence $\sum_{i=1,\dots,q+1} f(i)\,\bar{c}(i) = 0$, that is, $0 = \sum_{i \in F} \bar{c}(i) = -\sum_{i \in F} a(i)$. qed.
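The equivalence in the theorem can be observed on the earlier four-element example by enumerating all nonempty subsets and comparing the two conditions. The sketch assumes q = p (so c has no extra entries) and tests row-space membership through a rank computation; both choices are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            factor = M[r][col] / M[rk][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk

a = {1: 1, 2: 2, 3: 5, 4: -7}
p = len(a)
c = [a[i] for i in range(1, p + 1)]                      # q = p: c coincides with a
M = [[1 if j == i else 0 for j in range(p)] + [c[i]] for i in range(p)]

def is_algebraic(F):
    """Characteristic vector of F (padded with a final 0) lies in the row space of M."""
    f = [1 if j in F else 0 for j in range(1, p + 1)] + [0]
    return rank(M + [f]) == rank(M)

subsets = [F for k in range(1, p + 1) for F in combinations(range(1, p + 1), k)]
algebraic = [set(F) for F in subsets if is_algebraic(F)]
zero_sum = [set(F) for F in subsets if sum(a[i] for i in F) == 0]
print(algebraic, zero_sum)
```

Both lists contain the same subsets, which is what the theorem predicts for this instance.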

25 The NAS Problem
Example: let $S = \{1, 2, 3\}$ and $a(1) = 2$, $a(2) = 2$, $a(3) = -4$; then
  $c_0 = (2, 2, -4)$
  $c_1 = (-1, -1, 1)$
  $c_2 = (-1, -1, 1)$
  $c_3 = (2, 2, -2, 1, 1)$
Let $c = (c_0, c_1, c_2, c_3)$.

26 The NAS Problem
Then M would be (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 0 0 0 0 0 0 0 0 0 0 0  2
  0 1 0 0 0 0 0 0 0 0 0 0 0 0  2
  0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
  0 0 0 1 0 0 0 0 0 0 0 0 0 0 -1
  0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1
  0 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 1 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

27 The NAS Problem
Step (1) (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 1 1 0 0 0 0 0 0 0 0 0  0
  0 1 0 0 0 0 0 0 0 0 0 0 0 0  2
  0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
  0 0 0 1 0 0 0 0 0 0 0 0 0 0 -1
  0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1
  0 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 1 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

28 The NAS Problem
Step (3) (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 1 1 0 0 0 0 0 0 0 0 0  0
  0 1 0 0 0 0 0 0 0 0 0 0 0 0  2
  0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
  0 0 0 1 0 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 1 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1
  0 0 0 0 0 0 0 1 0 0 0 0 0 0 -1
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 1 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

29 The NAS Problem
Step (4) (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 1 1 0 0 0 0 0 0 0 0 0  0
  0 1 0 0 0 0 1 1 0 0 0 0 0 0  0
  0 0 1 0 0 0 0 0 0 0 0 0 0 0 -4
  0 0 0 1 0 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 1 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 1 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 1 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

30 The NAS Problem
Step (5) (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 1 1 0 0 0 0 0 0 0 0 0  0
  0 1 0 0 0 0 1 1 0 0 0 0 0 0  0
  0 0 1 0 0 0 0 0 0 1 1 0 0 0  0
  0 0 0 1 0 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 1 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 1 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 1 0 0 0  2
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

31 The NAS Problem
Step (6) (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 1 1 0 0 0 0 0 0 0 0 0  0
  0 1 0 0 0 0 1 1 0 0 0 0 0 0  0
  0 0 1 0 0 0 0 0 0 1 1 0 0 0  0
  0 0 0 1 0 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 1 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 1 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 1 0 0  0
  0 0 0 0 0 0 0 0 0 0 1 1 0 0  0
  0 0 0 0 0 0 0 0 0 0 0 1 0 0 -2
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

32 The NAS Problem
Final step (row blocks, top to bottom: $c_0$, $c_1$, $c_2$, $c_3$):
  1 0 0 1 1 0 0 0 0 0 0 0 0 0  0
  0 1 0 0 0 0 1 1 0 0 0 0 0 0  0
  0 0 1 0 0 0 0 0 0 1 1 0 0 0  0
  0 0 0 1 0 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 1 1 0 0 0 0 0 0 0 0  0
  0 0 0 0 0 1 0 0 0 0 0 0 0 0  1
  0 0 0 0 0 0 1 0 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 1 1 0 0 0 0 0  0
  0 0 0 0 0 0 0 0 1 0 0 0 0 0  1
  0 0 0 0 0 0 0 0 0 1 0 0 1 0  0
  0 0 0 0 0 0 0 0 0 0 1 0 1 0  0
  0 0 0 0 0 0 0 0 0 0 0 1 1 1  0
  0 0 0 0 0 0 0 0 0 0 0 0 1 0  1
  0 0 0 0 0 0 0 0 0 0 0 0 0 1  1

33 For $a(i) > 1$, $i = 1,\dots,p-1$: $c_i$ = ( ), with $k_i = \lfloor \log_2 a(i) \rfloor$

34 The NAS Problem
  $a(i) = 7$: $k_i = \lfloor \log_2 7 \rfloor = 2$, $c_i = (-3, -3, 3, -1, -1, 1)$
  $a(i) = 8$: $k_i = \lfloor \log_2 8 \rfloor = 3$, $c_i = (-4, -4, 4, -2, -2, 2, -1, -1, 1)$
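The two examples suggest a repeated-halving pattern: for each $j = 1,\dots,k_i$ the block contains the triple $(-h, -h, h)$ with $h = \lfloor a(i) / 2^j \rfloor$. The sketch below implements that pattern. It is inferred only from the two examples above and should be read as an assumption, not as the general formula of the presentation; `block` is a hypothetical name.

```python
def block(a_i):
    """Return (k_i, c_i) following the halving pattern inferred from the examples."""
    k = a_i.bit_length() - 1          # floor(log2(a_i)) for an integer a_i >= 1
    entries = []
    for j in range(1, k + 1):
        h = a_i >> j                  # floor(a_i / 2**j)
        entries += [-h, -h, h]        # inferred triple for halving level j
    return k, entries

print(block(7))
print(block(8))
```

Under this reading each block has $3 k_i$ entries, which is in line with the factor 3 appearing in the size bound for M.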

35 The NAS Problem
Let $B = \max\{ |a(i)| : i = 1,\dots,p \}$.
The SSP has input dimension equal to $O(p \times \log_2 B)$.
Since $k_i \le \log_2 B$, the dimension of the matrix M is $q \times (q+1)$ where $q \le (p+1) \times 3 \log_2 B = O(p \times \log_2 B)$.

36 Solving problem (1)
Given $A x = r$, $x \in F^m$: is $\sum_{j \in S} x_j$ an F-invariant?
Here A is the vertex-edge incidence matrix of a graph, F is the set of reals and S is a singleton.
[Figure: a graph with edges labelled $x_1,\dots,x_8$ and vertices labelled $r_1,\dots,r_6$]

37 Solving problem (1)
Consider the homogeneous system associated to system (1):
  $A y = 0, \qquad y \in \mathbb{R}^m$    (2)
We call a solution y of system (2) a circulation.
[Figure: a circulation on the example graph; some edges carry positive or negative values, the others carry 0]

38 Solving problem (1)
Definition: given a circulation y, its support is the set $C = \{ e : y_e \neq 0 \}$.
[Figure: a circulation on the example graph and its support]
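A small example may help with these two definitions. The graph below (a 4-cycle with one pendant edge), the value of `alpha` and the helper names are all assumptions chosen for illustration; they are not the graph drawn on the slides.

```python
def incidence_matrix(n_vertices, edges):
    """Vertex-edge incidence matrix: entry (v, e) is 1 if v is an endpoint of e."""
    return [[1 if v in e else 0 for e in edges] for v in range(n_vertices)]

def mat_vec(A, y):
    """Matrix-vector product A y."""
    return [sum(a * v for a, v in zip(row, y)) for row in A]

# Hypothetical graph: the 4-cycle 0-1-2-3-0 plus the pendant edge (0, 4).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]
A = incidence_matrix(5, edges)

# Alternating +alpha / -alpha around the even cycle, 0 on the pendant edge.
alpha = 3
y = [alpha, -alpha, alpha, -alpha, 0]

residual = mat_vec(A, y)                                   # A y, one entry per vertex
support = {e for e, value in zip(edges, y) if value != 0}  # edges with nonzero value
print(residual, support)
```

Every vertex of the cycle sees one +alpha and one -alpha, so y solves system (2), and its support is exactly the edge set of the even cycle.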

39 Solving problem (1)
Theorem 1: The unknown $x_e$ is an R-invariant if and only if $e \notin C$ for every circulation y with support C.
Proof: Let $x^*$ be a particular solution of (1). Then every solution x of (1) can be written as $x = x^* + y$ for some circulation y. So if $y_e = 0$ for every circulation y, then $x_e = x^*_e$ for every solution x of (1). Conversely, if $x_e$ is invariant, then $y_e = x_e - x^*_e = 0$ for every solution x of (1); therefore $y_e = 0$ for every circulation y.

40 Solving problem (1)
Definition: A circulation y with support C is minimal if there is no circulation with support $C'$ such that $C' \subset C$.
[Figure: examples of circulations on the graph]

41 Solving problem (1)
The supports of minimal circulations are called circuits; they are the even cycles and the L-oddsets of the graph.
[Figure: examples of circuits with their circulation values]

42 Solving problem (1)
Every circulation y can be written as
  $y = \sum_{i=1,\dots,p} \lambda_i y_i, \qquad \lambda_i \in \mathbb{R}$
where $B = \{y_1,\dots,y_p\}$ is a basis of N and the support of each $y_i$ is a circuit of G.

43 Solving problem (1)
[Figure: example of a circulation on the graph expressed through circulations on circuits]

44 Solving problem (1)
Theorem 2: The unknown $x_e$ is an R-invariant if and only if $e \notin C$ for every circuit $y_i$ with support C.
Proof: $y_e = \sum_{i=1,\dots,p} \lambda_i\, y_{i,e} = 0$.

45 Solving problem (1)
An odd edge is an edge of G belonging to every odd cycle of G.
A bridge is an edge of G whose removal disconnects G.

46 Solving problem (1)
Theorem 3: The unknown $x_e$ is an R-invariant if and only if e is an odd edge or e is a bridge that disconnects a bipartite component of G.
Proof:
1) If e belongs to all odd cycles of G, then G cannot contain an L-oddset.
2) If e is a bridge, then it cannot belong to an even cycle.

47 Solving problem (1)
The case when e is an odd edge.
Suppose, for contradiction, that D is an even cycle containing e, and let C be an odd cycle of G (so C also contains e).
$D \,\Delta\, C$ is a set of edge-disjoint cycles not containing e.
$|D \,\Delta\, C| = |D| + |C| - 2\,|D \cap C|$
$|D \,\Delta\, C|$ is odd, so $D \,\Delta\, C$ must contain at least one odd cycle (contradiction).

48 Solving problem (1)
The case when e is a bridge disconnecting a bipartite component.
[Figure: a bridge e joining a non-bipartite component and a bipartite component]
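Theorem 3 can be checked on small graphs by brute-force linear algebra. By the characterisation of R-invariance through algebraic subsets, $x_e$ is R-invariant exactly when the unit vector of e lies in the row space of the incidence matrix A. The sketch below applies this test to two hypothetical graphs chosen for illustration: a triangle with a pendant edge, and two triangles sharing one vertex.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(M[0]) if M else 0):
        pivot = next((r for r in range(rk, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for r in range(rk + 1, len(M)):
            factor = M[r][col] / M[rk][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[rk])]
        rk += 1
    return rk

def invariant_edges(n_vertices, edges):
    """For each edge e, test whether the unit vector of e lies in the row space of A."""
    A = [[1 if v in e else 0 for e in edges] for v in range(n_vertices)]
    base = rank(A)
    result = {}
    for k, e in enumerate(edges):
        unit = [1 if j == k else 0 for j in range(len(edges))]
        result[e] = rank(A + [unit]) == base
    return result

# Triangle 0-1-2 with pendant edge (2, 3): the triangle edges lie on the only
# odd cycle, and (2, 3) is a bridge cutting off a single (bipartite) vertex.
triangle_with_pendant = [(0, 1), (1, 2), (2, 0), (2, 3)]

# Two triangles sharing vertex 2: no bridge, and no edge lies on both odd cycles.
bowtie = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]

print(invariant_edges(4, triangle_with_pendant))
print(invariant_edges(5, bowtie))
```

In the first graph every unknown is invariant and in the second none is, in agreement with the theorem.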

49 Solving problem (1)
  $E(H) = \{ e : e \text{ is a bridge of } G \}$
  $V(H) = \{ v : v \text{ is a connected component of } G - E(H) \}$
[Figure: a graph G and the corresponding graph H]

50 Solving problem (1) Step 1

51 Solving problem (1) Step 2

52 Solving problem (1) Step 3

53 Solving problem (1) Step 4

54 Solving problem (1) Step 5

55 Solving problem (1) Step 6

56 Solving problem (1) Step 7

57 Solving problem (1) Step 8

58 Solving problem (1)
A DFS traversal of a graph G gives a partition of the edges of G into tree edges and back edges.
Each back edge e generates a cycle C(e).
The cycle C(e) is called a fundamental cycle with respect to the tree T.

59 Solving problem (1)
Proposition: every cycle of G can be obtained as the symmetric difference of one or more fundamental cycles.
If e is an odd edge then
1) it must belong to every fundamental odd cycle of G;
2) no fundamental even cycle of G contains e.

60 Solving problem (1)
A back edge e belongs to every fundamental odd cycle of G if and only if C(e) is the only fundamental odd cycle.
For every tree edge e we count the number of odd and even fundamental cycles containing e.
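The counting step can be sketched as follows. This is a direct, unoptimised illustration under several assumptions: the graph is simple and connected, vertices are numbered from 0, and each fundamental cycle is walked explicitly along parent pointers. The function name and the example graphs are hypothetical. The final filter applies only the two necessary conditions for an odd edge stated on the previous slide.

```python
from collections import defaultdict

def fundamental_cycle_counts(edges):
    """DFS from vertex 0; return ({edge index: [odd, even]}, number of odd fundamental cycles)."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    parent, parent_edge, depth = {0: None}, {0: None}, {0: 0}
    back_edges, used = [], set()
    stack = [(0, iter(adj[0]))]
    while stack:
        u, it = stack[-1]
        for v, idx in it:
            if idx in used:
                continue
            used.add(idx)
            if v in depth:
                back_edges.append(idx)            # v is an ancestor of u
            else:
                parent[v], parent_edge[v], depth[v] = u, idx, depth[u] + 1
                stack.append((v, iter(adj[v])))   # tree edge: descend
            break
        else:
            stack.pop()                           # all edges of u examined
    counts = {idx: [0, 0] for idx in range(len(edges))}
    n_odd = 0
    for idx in back_edges:
        u, v = edges[idx]
        if depth[u] < depth[v]:
            u, v = v, u                           # make u the deeper endpoint
        cycle = [idx]
        while u != v:                             # climb the tree path up to v
            cycle.append(parent_edge[u])
            u = parent[u]
        is_odd = len(cycle) % 2 == 1
        n_odd += is_odd
        for e in cycle:
            counts[e][0 if is_odd else 1] += 1
    return counts, n_odd

triangle_with_pendant = [(0, 1), (1, 2), (2, 0), (2, 3)]
counts, n_odd = fundamental_cycle_counts(triangle_with_pendant)
# Necessary conditions: on every odd fundamental cycle, on no even one.
candidates = [e for e, (odd, even) in counts.items() if odd == n_odd and even == 0]
print(counts, n_odd, candidates)
```

For the triangle with a pendant edge the three triangle edges pass the filter; for two triangles sharing a vertex there are two odd fundamental cycles and no edge lies on both, so the candidate list is empty.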

