Presentation is loading. Please wait.

Presentation is loading. Please wait.

Stochastic Database Cracking : Towards Robust Adaptive Indexing in Main-Memory Column-Stores VLDB, August 2012 (to appear)

Similar presentations


Presentation on theme: "Stochastic Database Cracking : Towards Robust Adaptive Indexing in Main-Memory Column-Stores VLDB, August 2012 (to appear)"— Presentation transcript:

1 Stochastic Database Cracking : Towards Robust Adaptive Indexing in Main-Memory Column-Stores VLDB, August 2012 (to appear)

2 Background  Current cracking schemes is not effective  The feature of current cracking is blindly and not adaptive

3 Stochastic cracking  It is a significantly more resilient approach to adaptive indexing.  It is not only rely on each query to reorganize data but also others

4 OUTLINE  Related work  Problem of original cracking  Stochastic cracking  Experiment  Conclusion

5 Related work  Offline Analysis  It does not support dynamic workloads effective  Online Analysis  It significantly penalizes individual queries The drawbacks of offline and online analysis motivate adaptive indexing

6 Related work  Database Cracking  It pioneered the notion of continuously and incrementally building and refining indexes as part of query processing  It enables efficient adaptive indexing  The more often a key range is queried, the more its representation is optimized

7 Related work  Selection Cracking  The main innovation is that the physical data store is continuously changing with each incoming query Q, using Q as a hint on how data should be stored.

8 Related work  requirements for adaptive indexing  lightweight initialization  as fast as possible convergence to the desired performance Initialization cost is measured against that of a full scan, while desired performance is measured against that of a full index.

9 Problem of original cracking  Cracking Features  self-organize automatically  at low cost  Cracking Cost.  the cost of reorganizing data

10 Problem of original cracking  Ideal cracking costs Comes when analyzing fewer tuples, such disposition is workload-dependent.it depends not only on the nature of queries posed but also on order in which they are posed. Crack achieves the best-performance if each query cracks a piece of the column in two half pieces

11 Problem of original cracking  a Non-ideal workload  When the workload is a sequential workload, Namely every query requests follows the preceding one in a sequence. For example the first query requests A<1 and then A<2,and so on, leading to much performance of reorganizing the data.so original crack don’t perform sequential workload well.

12 Problem of original cracking

13 Stochastic cracking  The reorganization actions be partially driven by what queries want,and partially arbitrary in character.  Query-driven and arbitrary reorganization (make random reorganization)steps performed with each query.

14 Stochastic cracking  Stochastic Cracking Algorithms  Algorithm DDC(Data Driven Center) Algorithm DDC(C, a, b) Crack array C on bounds a, b. 1. positionLow = ddc crack(C, a) 2. positionHigh = ddc crack(C, b) 3. result = createView(C,positionLow, positionHigh) function ddc crack(C, v) 4. Find the piece Piece that contains value v 5. pLow = Piece.firstPosition() 6. pHgh = Piece.lastPosition() 7. while (pHgh - pLow > CRACK SIZE) 8. pMiddle = (pLow+pHgh) / 2; 9. Introduce crack at pMiddle 10. if (v <C[pMiddle]) pHgh = pMiddle 11. else pLow = pMiddle 12. position=crack(C[pLow, pHgh],v) 13. result=position

15 Stochastic cracking  Algorithm DDR(Data Driven Random)  DDR differs from DDC in that it relaxes the requirement that a piece be split exactly in half. Instead, it uses random cracks, selecting random pivots until the target value v fits in a piece smaller than the threshold set for the maximum piece size.

16 Stochastic cracking  Algorithms DD1C and DD1R They are two variants of DDC and DDR,which eschew the recursive physical reorganization, these variants perform at most one auxiliary physical reorganization, reducing high overhead of auxiliary operations.

17 Stochastic cracking  Algorithm MDD1R(materialization DD1R) Algorithm MDD1R(C, a, b) Crack array C on bounds a, b. 1. Find the piece P1 that contains value a 2. Find the piece P2 that contains value b 3. if (P1 == P2) 4. result = split and materialize(P1,a,b) 5. else 6. res1 = split and materialize(P1,a,b) 7. res2 = split and materialize(P2,a,b) 8. view = createView(C, P1.lastPos+1, P2.f irstPos-1) 9. result = concat(res1, view, res2) function split and materialize(Piece,a,b) 10. L=Piece.firstPosition 11. R=Piece.lastPosition 12. result=newArray() 13. X = C[L + rand()%(R-L+1)] 14. while (L <= R) 15. while (L <= R and C[L] < X) 16. if (a <= C[L] && C[L] < b) result.Add(C[L]) 17. L = L + 1 18. while (L = X) 19. if (a <= C[R] && C[R] < b) result.Add(C[R]) 20. R = R - 1 21. if (L < R) swap(C[L],C[R]) 22. Add crack on X at position L

18 Stochastic cracking  Progressive Stochastic Cracking  Progressive MDD1R (PMDD1R) is an even more incremental variant of MDD1R which further reduces the initialization costs, it control the number of swaps performed to change the position of tuples.

19 Stochastic cracking  Selective Stochastic Cracking  To further reduce the overhead of stochastic actions,we can selectively eschew stochastic cracking for some queries; such queries are answered using original cracking.  FiftyFifty  FlipCoin

20 Stochastic cracking

21 Experiment  Environment  Cpu ; 8-core hyper-threaded machine (2 Intel E5620 @2.4GHz)  Memory:24GB RAM

22 Experiment  Sequential Workload.

23 Experiment  Random Workload

24 Experiment  Varying Selectivity.

25 Experiment  Adaptive Indexing Hybrids In recent work, cracking was extended with a partition/merge logic. Therewith, a column is split into multiple pieces and each piece is cracked independently. Then, the relevant data for a query is merged out of all pieces.

26 Experiment  Stochastic Cracking on Real Workloads

27 Conclusion  Stochastic Cracking works adaptively and incrementally, with minimal impact on query processing  Stochastic Cracking solves the workload robustness deficiency inherent in Database

28


Download ppt "Stochastic Database Cracking : Towards Robust Adaptive Indexing in Main-Memory Column-Stores VLDB, August 2012 (to appear)"

Similar presentations


Ads by Google