Data Placement Problems in Database Applications


1 Data Placement Problems in Database Applications
An Zhu Stanford University

2 Data Placement
Data objects
Multiple disks
Assignment of objects to disks
Optimize performance
Optimize I/O
Handle dynamic situations
4/21/2019 AZ

3 Outline
Multimedia Systems [GKKTZ 00]: maximize the total clients served
Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves


5 Multimedia Storage Systems
Movie objects
Clients/subscribers
Parallel disks
Limited storage: number of movies per disk, N_j
Limited bandwidth: number of clients per disk, C_j
Homogeneous system: N_j = k, C_j = L, for all j
Uniform ratio: C_j/N_j = r, for all j

6 An Example
[Figure: three disks, each at 000/600 clients; twelve movies, ten with 100 clients and two with 400 clients.]
Total Storage: 12, Total Capacity: 1800

7 An Example
[Figure: first disk at 400/600; the other two disks at 000/600.]

8 An Example
[Figure: first two disks at 400/600; third disk at 000/600.]

9 Not All Clients Can be Satisfied
[Figure: disks at 400/600, 400/600 and 600/600; one movie with 400 clients is left over.]
Total Satisfied Clients: 1400/1800 = 7/9

10 Sliding Window Algorithm
Consider one disk at a time
Maintain an ordered list of movies
Find the first window of k (or fewer) consecutive movies with at least L clients combined
Assign the first L clients in the window to the disk and reconsider the leftover clients
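
The steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. It assumes the movie list is kept sorted by client count with the fewest first. It also assumes that when no window reaches L clients, the last k movies in the list go to the disk. The names `pick_window` and `sliding_window` are hypothetical. The input is the instance from the earlier example: ten movies with 100 clients, two with 400, three disks, k=4, L=600.

```python
import bisect


def pick_window(movies, k, L):
    """Return (start, end) of the first run of at most k consecutive movies
    whose combined clients reach L; fall back to the last k movies."""
    n = len(movies)
    for start in range(max(1, n - k + 1)):
        total = 0
        for end in range(start, min(start + k, n)):
            total += movies[end]
            if total >= L:
                return start, end + 1
    return max(0, n - k), n  # assumed fallback: no window reaches L


def sliding_window(clients, num_disks, k, L):
    """Return (load per disk, number of unsatisfied clients)."""
    movies = sorted(clients)  # assumed order: fewest clients first
    loads = []
    for _ in range(num_disks):
        if not movies:
            loads.append(0)
            continue
        start, end = pick_window(movies, k, L)
        window = movies[start:end]
        del movies[start:end]
        load = 0
        for c in window:
            served = min(c, L - load)  # serve only up to the disk's L clients
            load += served
            if c > served:
                # leftover clients of a split movie go back into the list
                bisect.insort(movies, c - served)
        loads.append(load)
    return loads, sum(movies)


loads, unsatisfied = sliding_window([100] * 10 + [400] * 2, num_disks=3, k=4, L=600)
print(loads, unsatisfied)  # [600, 600, 400] 200
```

The window search here is a plain nested loop for readability; it does not attempt to match the running time quoted in the recap.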

11 An Example
[Figure, animated over slides 11-18: max window size k=4, disks with capacity 600. The window slides along the movie list. Its combined clients grow from 100 to 200 to 400, still below 600, and reach 700 once a 400-client movie enters the window. The first disk is filled to 600/600 and the leftover 100 clients go back to the list. The second disk is then filled to 600/600 in the same way.]

19 An Example
[Figure: disks at 600/600, 600/600 and 400/600.]
Total Satisfied Clients: 1600/1800 = 8/9

20 Theoretical Bounds
Satisfies at least a 1 − 1/(1+√k)² fraction of total clients
In the worst case, no algorithm can satisfy more clients
Translates to a (1+√k)²/((1+√k)² − 1)-approximation
PTAS: (1+ε)-approximation, ε > 0


22 Proof Sketch
Load-saturated vs. storage-saturated disks: M_L, M_S
Load of the least loaded disk: cL
M_L + M_S = M, 0 < c < 1
All remaining movies each have no more than cL/k clients
Initial instance is feasible (w.l.o.g.)

23 An Example
[Figure: disks at 600/600, 600/600 and 400/600; two movies with 100 clients each remain.]
M_L = 2, M_S = 1, c = 400/600, cL/k = 100
Total Satisfied Clients: 1600/1800 = 8/9

24 Proof Outline
If there is a load-saturated disk with fewer than k movies: all clients are satisfied
Otherwise: at most M_L movies are left, and at least a 1 − 1/(1+√k)² fraction of the clients is satisfied

25 Lemma
If any load-saturated disk has fewer than k movies, then any k−1 remaining movies in the list have L clients or more combined

26 Lemma
The remaining disks are all load saturated, so all clients are satisfied
[Figure: remaining windows, each labeled "At least L".]

27 Otherwise…
Each disk has exactly k movies
Initial movies: N ≤ M·k
Total assigned movies: M·k
“New” movies generated: ≤ M_L
Number of movies left: ≤ M_L
Clients per remaining movie: ≤ cL/k
Total number of remaining clients: ≤ cL·M_L/k

28 Otherwise…
Total clients: ≤ M·L
Assigned clients: ≥ M_L·L + M_S·cL
Total number of remaining clients: ≤ M_S·(1−c)L
Final bound: at least a 1 − 1/(1+√k)² fraction of the clients is satisfied
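
The two limits on the remaining clients from slides 27 and 28 can be combined as sketched below. This is a reconstruction from the inequalities in the transcript, not the slide's own formula. The names $R$ (remaining clients) and $x = M_L/M$ are introduced here for illustration.

```latex
\begin{align*}
R &\le \min\!\left(\frac{cL\,M_L}{k},\; M_S\,(1-c)\,L\right)
   = M L \cdot \min\!\left(\frac{c\,x}{k},\; (1-x)(1-c)\right),
   \qquad x = \frac{M_L}{M}. \\
\intertext{The minimum is largest when both terms are equal and $x = c$:}
\frac{x^2}{k} &= (1-x)^2
   \;\Longrightarrow\; x = c = \frac{\sqrt{k}}{1+\sqrt{k}}
   \;\Longrightarrow\; R \le \frac{M L}{(1+\sqrt{k})^2}.
\end{align*}
```

For k = 4 this gives R ≤ M·L/9. In the running example the total number of clients equals M·L = 1800, M_L/M = 2/3 and c = 400/600 = 2/3, so the 200 unsatisfied clients (1/9 of the total) meet this limit exactly.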

29 Simulation Results
M = 5, L = 100, N = M·k
Zipf distribution with parameter 0.0 (∝ i^-1)

30 Recap
The problem is NP-complete
PTAS: best possible approximation bound
1 − 1/(1+√k)²: best possible absolute bound
Sliding window algorithm: practical with O((M+N) log(M+N)) running time

31 Outline
Multimedia Systems [GKKTZ 00]: maximize the total clients served
Relational Database Layout [AFMPZ 03]: minimize the total I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves

32 Relational Databases
Objects: indexes, tables, views
Multiple disks
Minimize the total I/O access time

33 Past Work
Full striping: split each object uniformly across all available disks
Utilize I/O parallelism
Transfer rate: 0.05 s/MB
[Figure: a 200MB object stored on a single disk; transfer time T_t = 200·0.05 = 10 s.]

34 Past Work
Full striping: split each object uniformly across all available disks
[Figure: the same 200MB object striped as 50MB on each of four disks; T_t = 50·0.05 = 2.5 s.]

35 Past Work
Co-accessed objects with random I/O
Seek time per block: 0.01 s per 0.1MB block
Seek rate: 0.1 s/MB
Smaller object dominates
[Figure: object A striped as 50MB on each of four disks, object B as 100MB on each; seek time T_s = 50·0.1·2 = 10 s.]

36 Past Work
Combined access time
Transfer time: T_t = (50+100)·0.05 = 7.5 s
Seek time: T_s = min(50,100)·0.1·2 = 10 s
Combined time: T_t + T_s = 17.5 s

37 Past Work
Full striping is no longer optimal [Agrawal Chaudhuri Das Narasayya 03]
[Figure: four disks holding 200MB, 200MB, 100MB and 100MB, so that no disk holds both objects.]
Combined time: 200·0.05 = 10 s
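
The arithmetic on slides 35-37 can be reproduced with a small cost function. This is a sketch under stated assumptions, not the cost model of the cited work. It assumes two co-accessed objects A and B, disks that operate in parallel so the slowest disk determines the access time, and a per-disk seek cost of twice the smaller co-located piece times the seek rate. The function name `combined_time` and the layout encoding are hypothetical.

```python
def combined_time(layout, transfer_rate=0.05, seek_rate=0.1):
    """Estimated access time in seconds for two co-accessed objects.

    layout: one (a_mb, b_mb) pair per disk, the MB of A and of B stored there.
    Rates are in s/MB, taken from the slides' example.
    """
    worst = 0.0
    for a_mb, b_mb in layout:
        transfer = (a_mb + b_mb) * transfer_rate
        # Seeks only occur when both objects share the disk;
        # the smaller piece dominates the seek cost.
        seek = 2 * min(a_mb, b_mb) * seek_rate
        worst = max(worst, transfer + seek)
    return worst


full_striping = [(50, 100)] * 4                        # A and B on every disk
separated = [(100, 0), (100, 0), (0, 200), (0, 200)]   # A and B on disjoint disks

print(combined_time(full_striping))  # 17.5
print(combined_time(separated))      # 10.0
```

Under these assumptions, keeping the two objects on disjoint disks gives up some transfer parallelism but removes all seeks, which is why it comes out ahead of full striping.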

38 Data Layout Problem
Workload (SQL DML): a set of queries and/or updates
A set of co-accessed objects (pairwise)
Access statistics (pairwise)
Minimize the estimated I/O access time

39 Theoretical Questions
Approximation and its hardness
Transfer time: in P
Seek time: very hard
Combined time: hard
Minimizing transfer time alone is a “good” approximation

40 Transfer Time
Heterogeneous disks: different transfer rate per disk j, storage constraint c_j
Objects: different size s_i, access frequency per co-accessed pair (i,i')
Solvable using Linear Programming (LP)

41 LP
Variables: amount of object i assigned to disk j
Each object must be completely assigned
Each disk's storage limit is kept
Transfer time for (i,i') on disk j
Overall transfer time for (i,i')
Minimize the total transfer time
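
One way to write out the program this slide describes is given below. The slide's own formulas did not survive in the transcript, so this is a hedged reconstruction. The sizes $s_i$ and storage limits $c_j$ come from slide 40. The symbols $x_{ij}$ (amount of object $i$ on disk $j$), $r_j$ (transfer rate of disk $j$), $f_{ii'}$ (access frequency of the pair) and $t_{ii'}$ (overall transfer time of the pair) are assumed names. The program also assumes that a pair's overall transfer time is set by its slowest disk.

```latex
\begin{align*}
\text{minimize}\quad & \sum_{(i,i')} f_{ii'}\, t_{ii'} \\
\text{subject to}\quad
 & \sum_{j} x_{ij} = s_i && \text{for every object } i, \\
 & \sum_{i} x_{ij} \le c_j && \text{for every disk } j, \\
 & t_{ii'} \ge r_j\,(x_{ij} + x_{i'j}) && \text{for every pair } (i,i') \text{ and disk } j, \\
 & x_{ij} \ge 0 && \text{for every object } i \text{ and disk } j.
\end{align*}
```

Every constraint is linear in $x_{ij}$ and $t_{ii'}$, which fits the claim that the transfer-time problem is solvable by linear programming.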

42 Seek Time
Hard even on disks with no storage constraint
Integral assignment: each object is assigned to one disk only
Conversion from a fractional assignment with no loss

43 Conversion
[Figure: objects A, B and C spread over disks in pieces of 100MB, 150MB, 200MB, 200MB, 100MB and 100MB. Pairwise access frequencies f: two of the three pairs have f = 1, the third has f = 0.]
Total seek cost: 100·2 + 100·2 (in units of the seek rate)
Want: each file is spread uniformly across a subset of disks

44 Conversion
[Figure: piece sizes 125MB, 125MB, 200MB, 200MB, 100MB and 100MB.]
New cost: 100·2 + 125·2

45 Conversion
[Figure: piece sizes shown include 250MB, 125MB, 200MB and 100MB.]
New cost: 100·2

46 Conversion
[Figure: piece sizes shown include 400MB, 250MB, 200MB and 100MB.]
Total seek cost: 0
Each file resides on only one disk

47 Implications
A polynomial time algorithm
Equivalent to Minimum Edge Deletion k-Partition
NP-hard to approximate: O(n^2)
Forces combined time to be hard to approximate

48 Combined Time
Let ρ be the ratio of seek rate to transfer rate
Hard to approximate within ε·ρ, 1 > ε > 0
Optimizing transfer time alone gives a (1+ρ)-approximation

49 Outline
Multimedia Systems [GKKTZ 00]: maximize the total clients served
Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves

50 Load Rebalancing
Access pattern changes
Initial layout no longer balanced
[Figure: objects 1-11 placed on disks, with the most loaded disk marked MAX LOAD.]

51 Load Rebalancing
Relocate objects
Minimize the max load with ≤ k moves
[Figure: objects 1-11 after relocation.]

52 Simple Algorithm (O(n log n))
Step 1: Repeat k times
Remove the largest object from the most loaded disk
The resulting max load: L(1)
Step 2: Relocate the removed k objects
Assign each object to the least loaded disk
The resulting max load: L(2)
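
The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Disks are given as lists of object sizes, and the removed objects are placed largest first, an order the slide does not specify. The function name `greedy_rebalance` and the sample data are hypothetical. For brevity the sketch recomputes disk loads with `sum`, so it does not reach the O(n log n) running time from the slide title.

```python
def greedy_rebalance(disks, k):
    """Rebalance with at most k moves; returns the new per-disk object lists."""
    disks = [sorted(d) for d in disks]  # copy, each disk sorted ascending
    removed = []

    # Step 1: k times, remove the largest object from the most loaded disk.
    for _ in range(k):
        src = max(range(len(disks)), key=lambda j: sum(disks[j]))
        if not disks[src]:
            break  # nothing left to remove
        removed.append(disks[src].pop())

    # Step 2: put each removed object on the currently least loaded disk
    # (largest first is an assumed order).
    for size in sorted(removed, reverse=True):
        dst = min(range(len(disks)), key=lambda j: sum(disks[j]))
        disks[dst].append(size)
    return disks


before = [[5, 4, 3], [2], [1]]           # loads 12, 2, 1
after = greedy_rebalance(before, k=2)
print(after)                             # [[3], [2, 4], [1, 5]]
print(max(sum(d) for d in after))        # 6
```

Each removed object counts as one move even if step 2 happens to place it back on the disk it came from.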

53 Example (k=3)
Step 1: L(1) ≤ OPT
[Figure: objects 9, 1 and 6 are removed from the most loaded disks; the resulting max load is L(1).]

54 Example (k=3)
Step 2: L(2) ≤ OPT + S ≤ 2·OPT
Overall: max(L(1), L(2)) ≤ 2·OPT
[Figure: the removed objects 9, 1 and 6 are placed one by one on the least loaded disk; the resulting max load is L(2).]

55 Can We Do Better?
Blindly removing the largest object is not wise
[Figure: objects 1-11 on disks, with the most loaded disk marked MAX LOAD.]

56 How Can We Do Better?
Take care of large objects
Large objects: size > 1/2 OPT
Small objects: size ≤ 1/2 OPT
[Figure: objects 1-11 on disks, compared against the OPT level.]

57 Revising The Plan
Step 1: Repeat k times
Remove the largest object from the most loaded disk
The resulting max load: L(1) ≤ OPT
Step 2: Relocate the removed k objects
Assign each object to the least loaded disk
The resulting max load: L(2) ≤ OPT + S ≤ 2·OPT

58 Revised Plan
Step 1: With no more than k moves
Shuffle large objects and remove small objects
The resulting max load: L(1) ≤ 3/2 OPT
Step 2: Relocate the removed objects
Assign each object to the least loaded disk (they are all small)
The resulting max load: L(2) ≤ OPT + S ≤ 3/2 OPT

59 Example
Step 1
[Figure: large objects are shuffled and small objects removed; the resulting max load is at most 3/2 OPT.]

60 Example
Step 2
[Figure: the removed small objects are placed on the least loaded disks; the resulting max load is at most OPT + S.]

61 Recap
Fast 1.5-approximation (O(n log n))
NP-complete
PTAS: generalized cost

62 Summary
Multimedia Systems [GKKTZ 00]: maximize the total clients served
Relational Database Layout [AFMPZ 03]: minimize the combined I/O access time
Load Rebalancing Problem [AMZ 03]: minimize the makespan within allowed moves

63 Other Research Interests
Algorithms for mobile and sensor networks, and privacy-preserving databases
Online algorithms: queue management, packet switching, web caching, scheduling
Approximation algorithms: network design, multi-product pricing
Streaming algorithms

