Hopkins Storage Systems Lab, Department of Computer Science Automated Physical Design in Database Caches T. Malik, X. Wang, R. Burns Johns Hopkins University.


1 Hopkins Storage Systems Lab, Department of Computer Science
Automated Physical Design in Database Caches
T. Malik, X. Wang, R. Burns (Johns Hopkins University)
D. Dash, A. Ailamaki (Carnegie Mellon University)

2 Outline
Target application: proxy caches for SkyQuery
Physical design in proxy caches
  – Need for vertical partitioning
  – Workload evolution
Online vertical partitioning
  – Simple scenario: two configurations
  – General scenario: N configurations
Experiments

3 SkyQuery
Publicly accessible federation of sky surveys (a virtual telescope with terabyte data sets)
Autonomous, heterogeneous, and geographically distributed sites
Data-intensive, read-only workload
Scaling through proxy caching
  – Minimize network traffic
  – Offload query processing

4 Bypass Caching (Malik et al., ICDE’05)
Proxy database cache for SkyQuery
  – Brings columns closer to users
  – Economic model for bypassing queries

5 The Need for Vertical Partitioning
Poor I/O performance in the cache
  – Mirrors the backend DB design
      Largest relation groups 446 columns
  – Index-free environment
      Auxiliary data structures (indices/views) pollute the cache
Offsets response time benefits from network savings
  – 6x benefit with partitioning alone
      Performance without redundant data

6 Is Partitioning Feasible?

7 Why Not Existing Solutions? (e.g., DB tuning advisor, AutoPart)
Require representative workloads
  – Not readily available
  – Astronomy workloads exhibit evolution
Offline in nature
  – Invoked periodically
  – Costly to run
  – Ignore the cost of partitioning
Static design
  – Output a single configuration for the input workload
  – Ignore incremental changes within the workload

8 Workload Evolution

9 Workload Evolution

10 Online Vertical Partitioning Problem

11 Two-Configuration Scenario
Algorithm: given the current configuration C and an alternative C’, transition if remaining in C incurs substantial overhead
Capturing overhead
  – Penalty: [formula not captured in transcript]
  – Cumulative penalty: [formula not captured in transcript]
  – Max cumulative penalty: [formula not captured in transcript]
Transition if: [condition not captured in transcript]
3-competitive
  – After k transitions, 2Conf incurs (3k/2)(d(C,C’)+d(C’,C))
  – OPT incurs at least (k/2)(d(C,C’)+d(C’,C))
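The penalty formulas on this slide are not available in the transcript, so the following is only a rough sketch of a two-configuration online rule, not the authors' exact 2Conf definition. It assumes that the penalty of a query is its cost in the current configuration minus its cost in the alternative, that the cumulative penalty is clamped at zero, and that a transition fires once the cumulative penalty exceeds the round-trip cost d(C,C’)+d(C’,C). The function name `two_conf` and the callables `cost` and `d` are hypothetical.

```python
from typing import Callable, Iterable, List


def two_conf(
    queries: Iterable[str],
    cost: Callable[[str, str], float],  # hypothetical: cost(query, config)
    d: Callable[[str, str], float],     # hypothetical: d(src, dst) transition cost
    start: str,
    other: str,
) -> List[str]:
    """Return the configuration that served each query, in order."""
    current, alternative = start, other
    cumulative = 0.0
    history: List[str] = []
    for q in queries:
        history.append(current)
        # Assumed penalty: extra cost of serving q in the current
        # configuration rather than in the alternative.
        penalty = cost(q, current) - cost(q, alternative)
        # Assumed: the cumulative penalty never drops below zero.
        cumulative = max(0.0, cumulative + penalty)
        # Assumed trigger: overhead exceeds the round-trip transition cost.
        if cumulative > d(current, alternative) + d(alternative, current):
            current, alternative = alternative, current
            cumulative = 0.0
    return history
```

Under these assumptions the algorithm pays for a transition only after it has already lost at least a round trip's worth of query cost, which is the kind of accounting that leads to a constant competitive ratio such as the factor of 3 stated on the slide.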

12 NConf: Extending to N Configurations
Let C_y be the current configuration and C_x the previous configuration; transition to the first C_z (C_z ≠ C_y) satisfying: [condition not captured in transcript]
Number of configurations is exponential (51 trillion ways to partition 20 attributes)
Pruning heuristics
  – Neighboring configurations
  – Attribute groups
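The "51 trillion" figure matches the number of set partitions of 20 elements, i.e. the Bell number B(20). A small sketch that computes it with the Bell triangle (the function name `bell_number` is illustrative):

```python
def bell_number(n: int) -> int:
    """Number of ways to partition a set of n elements (Bell triangle)."""
    row = [1]  # first row of the Bell triangle; B(0) = 1
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the previous-row entry to its left neighbor.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


print(bell_number(20))  # 51724158235372, roughly 51.7 trillion
```

Exhaustively evaluating that many vertical partitionings for every incoming query is clearly infeasible, which motivates the pruning heuristics listed on the slide.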

13 Neighboring Configurations
[Figure: current configuration C_y over attributes A1–A4 and the neighbors of C_y]
Small, incremental partitions
Lower threshold to overcome

14 Attribute Groups
[Figure: current configuration C_y over attributes A1–A4 and candidate configurations built from attribute groups]
Query q_k: {A1, A3, A4}
Attribute groups: {A1}, {A3}, {A4}, {A1,A3}, {A1,A4}, {A3,A4}, {A1,A3,A4}
weight += q_k(C_y)
Candidate configuration if n.weight > d(C_x, C_y) + d(C_y, n)
Candidates with high weight benefit from repartitioning
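The attribute groups of a query are the non-empty subsets of the attributes it accesses. The sketch below enumerates them and applies the slide's weight update and candidate test. Keying weights by attribute group, and the helper names `attribute_groups`, `accumulate_weights`, and `is_candidate`, are assumptions made for illustration; the query cost and the transition costs are passed in as plain numbers.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, FrozenSet, Iterable, List


def attribute_groups(query_attrs: Iterable[str]) -> List[FrozenSet[str]]:
    """All non-empty subsets of the attributes a query accesses."""
    attrs = sorted(query_attrs)
    return [
        frozenset(combo)
        for size in range(1, len(attrs) + 1)
        for combo in combinations(attrs, size)
    ]


def accumulate_weights(
    weights: DefaultDict[FrozenSet[str], float],
    query_attrs: Iterable[str],
    query_cost: float,  # stands in for q_k(C_y), the query's cost in C_y
) -> None:
    """weight += q_k(C_y) for every attribute group of the query."""
    for group in attribute_groups(query_attrs):
        weights[group] += query_cost


def is_candidate(weight: float, d_prev_to_cur: float, d_cur_to_n: float) -> bool:
    """Slide's test: n.weight > d(C_x, C_y) + d(C_y, n)."""
    return weight > d_prev_to_cur + d_cur_to_n


weights: DefaultDict[FrozenSet[str], float] = defaultdict(float)
accumulate_weights(weights, {"A1", "A3", "A4"}, query_cost=10.0)
print(len(weights))  # 7 groups, matching the slide's list
```

Because a query touching m attributes produces 2^m − 1 groups, this enumeration is only practical when queries reference a modest number of columns.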

15 Experiments
TPC benchmark in SQL Server 2000
Partition the orders relation using select queries
Two 10k workloads
  – Wkld_Sky: evolving access pattern that approximates SkyQuery
  – Wkld_Const: access pattern remains unchanged
AutoPart: an offline partitioning tool (Papadomanolakis et al.)

16 Query Performance

17 Estimated I/O and Transitions

18 Future Work
Impact of cache replacement policy
  – Database state changes periodically
  – Reuse work to find new partitions
Scaling to SkyQuery with thousands of attributes
Fast techniques for cost estimation
Integration of index selection in caches

19 Conclusion
Proxy caches present a dynamic environment
Vertical partitioning improves performance without adding redundant data
Online vertical partitioning
  – Balances query execution performance with cost of transitioning
Experiments show 17% improvement by partitioning a single table

20 Questions?

