Presentation is loading. Please wait.

Presentation is loading. Please wait.

Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham.

Similar presentations


Presentation on theme: "Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham."— Presentation transcript:

1 Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham

2 Materialized Views Complex results materialized in order to speed up queries that depend on these results Increasingly being supported by commercial database systems (e.g. Oracle8i) Crucial in data warehousing environments

3 Materialized View Maintenance As underlying data changes, the materialized views need to be refreshed Efficient view maintenance crucial! Need to provide up-to-date query responses growing Amount of data added to data warehouses increasing Maintenance time window shrinking

4 Focus Efficient techniques for maintenance of a set of materialized views (MVs) by Transient materialization of common subexpressions (CSEs) Selection of additional MVs Computation of the best maintenance policy and plan for each MV

5 Transient Materialization of Common Subexpressions CSEs materialized to reduce maintenance cost by sharing computation, disposed after use Motivated by Blakeley et al. [SIGMOD86], Ross et al. [SIGMOD96] –Huge search space; considered impractical Earlier work by Sellis [TODS88] Efficient heuristic algorithms proposed by Roy et al. [SIGMOD00]

6 Selection of Additional MVs Additional views materialized permanently to reduce the overall maintenance cost Motivated by Ross et al. [SIGMOD96] –restricted to incremental maintenance only –do not consider transient materialization MV selection in general addressed in Roussopolous [TODS82], Agrawal et al. [VLDB00]

7 Best Maintenance Policy and Plan Computation For each MV, Determine the best maintenance policy (incremental or recomputation) Find the corresponding best plan Earlier work by Vista [EDBT98] –Does not take into account transient materialization of CSEs or presence of other MVs Current systems need manual specification of the maintenance policy

8 Contribution A framework that consolidates the choice of CSEs to be transiently materialized Additional MVs Best maintenance plan (incremental/recomputation) Integrated with a state of the art query optimizer (Volcano [ICDE93])

9 Example dABCDdE BC DE ABC CDEBCDE merge merge incremental refresh recomputation recomputation permanentpermanentpermanent permanent transient initial set

10 Approach 1. Setting up the search space of maintenance plans 2. Best maintenance plan computation 3. Transient/Permanent materialized view selection

11 Approach Setting up the search space of maintenance plans Best maintenance plan computation Transient/Permanent materialized view selection

12 Setting Up the Maintenance Plan Space The Query DAG representation for recomputation plans Incorporating incremental plans

13 Representation of the Recomputation Plan Space Equivalence Class (OR node) Operation (AND node) AND/OR Query DAG BC ABC BCD CD AB C D B Best Plan A Additionally incorporates subsumption derivations Details in Roy et al. [SIGMOD00 ]

14 Incremental Plans: Propagation Based Differential Generation Differentials propagated one at a time For each differential dR –Start at dR and compute node differentials bottom- up along the “best plan” in a topological order –Differential of a node computed as a function of its inputs and their differentials e.g. d(E 1 E 2 ) = E 1 dE 2 U E 2 dE 1 U dE 1 dE 2 where dE i = differential of E i wrt dR –Refresh the relation R and the affected MVs wrt dR by merging with the differentials computed as above Ross et al. [SIGMOD96]

15 Incorporating Incremental Plans: Propagation Based Differential Generation Equivalence Class (OR node) Operation (AND node) Propagation of dA BC BCdA BdA C BdA Best Plan

16 Incorporating Incremental Plans: Propagation Based Differential Generation Equivalence Class (OR node) Operation (AND node) Propagation of dB CdB ACdB CDdB CD AdB C D dBA Best Plan

17 Incorporating Incremental Plans: Propagation Based Differential Generation Equivalence Class (OR node) Operation (AND node) Propagation of dC BdC ABdC BDdC DdC AB dC D BA Best Plan

18 Incorporating Incremental Plans: Propagation Based Differential Generation Equivalence Class (OR node) Operation (AND node) Propagation of dD BC BCdD CdD C dD B Best Plan

19 Incorporating Incremental Plans Logical representation AB A BdA B AdB dB dA For each equiv node and each base differential affecting it – –Introduce a new equiv node representing its differential – –Populate with the differential plans Maintain statistics for the full expression after successive merges Large space overhead! recomputation plan incremental plan Merge operator

20 Incorporating Incremental Plans AB A BdA B AdB dB dA Reuse the same structure for successive propagation cycles separate best plan pointers for each cycle separate statistics for the full expression after successive merges Also incorporates sort-orders, indices, etc. Roy et al. [SIGMOD00] Actual space-efficient representation

21 Approach Setting up the search space of maintenance plans Best maintenance plan computation Transient/Permanent materialized view selection

22 Maintenance Plan Computation Given Set of nodes M t materialized transiently –can include full results as well as differentials Set of nodes M p materialized permanently –includes full results but not differentials compute the best consolidated maintenance plan for M p

23 Maintenance Plan Computation Best plan computed using a query optimizer extended as follows: Plan accessing a materialized view (trans/perm) does not include its computation, only its use Cost of a maintenance plan totalcost(M p, M t ) =  e  Mp maintcost(e | M p, M t ) +  e  Mt trmatcost(e | M p, M t ) where maintcost(M p, M t ) : cost of cheapest maintenance plan for e (recomputation/incremental) trmatcost(M p, M t ) : cost of computing and materializing e

24 Approach Setting up the search space of maintenance plans Best maintenance plan computation Transient/Permanent materialized view selection

25 Transient/Permanent Materialized View Selection Given set of MVs M already materialized, determine Set of nodes M t to materialize transiently Set of nodes M p (  M) to materialize permanently such that totalcost(M p, M t ) is minimized Exhaustive approach too expensive. Need heuristics!

26 Transient/Permanent Materialized View Selection A Greedy Heuristic Input: Initial MVs M Output: M p (  M), M t, corresp. best plan Begin M p = M; M t = {} S = set of equivalence nodes in the DAG for M While ( S  {} ) Pick z  S which maximizes Benefit(z | M p, M t ) If ( Benefit(z | M p, M t )  0 ) break If ( z is a full result and maintcost(z | M p, M t ) < trmatcost(z | M p, M t ) ) M p = M p U {z} else M t = M t U {z} S = S – {z} Return (M p, M t ) End How to compute Benefit(z | M p, M t )?

27 Transient/Permanent Materialized View Selection Benefit Computation Benefit(z | M p, M t ) = gain(z | M p, M t ) - investment(z | M p, M t ) where gain(z | M p, M t ) =  e  Mp (maintcost(e | M p, M t ) - maintcost(e | M p, M t U {z})) +  e  Mt (trmatcost(e | M p, M t ) - trmatcost(e | M p, M t U {z})) and investment(z | M p, M t ) = min(maintcost(z | M p, M t ), trmatcost(z | M p, M t )) if z is a full result trmatcost(z | M p, M t ) if z is a differential Benefit computation expensive. Need efficient techniques!

28 Transient/Permanent Materialized View Selection Improving Efficiency of the Greedy Heuristic Cost-propagation based incremental techniques to efficiently compute Benefit Monotonicity assumption –Reduces the number of Benefit computations Techniques to determine if a node can be shared across a given maintenance plan –Reduces the number of nodes considered for transient materialization Adapted from Roy et al. [SIGMOD00]. See paper for details.

29 Benchmark Single Views –Same views as above, refreshed separately Set of Views –10 views (5 with aggregates, 5 without) on 8 distinct relations, refreshed together

30 Effect of Transient and Permanent Materialization Single Views Set of Views

31 Effect of Adaptive Maintenance Policy Selection Single Views Set of Views

32 Scalability Analysis Optimization Memory Requirements Optimization Time Negligible one-time costs

33 Conclusion Presented techniques Automate sharing of computation Automate view selection Automate maintenance policy selection and plan computation Do the above in an integrated manner –leading to benefits greater than could be achieved by considering each dimension individually Are efficient and scalable –the overall benefits greatly outweigh the one-time cost Integrate with state-of-the-art optimizers (e.g. MS SQL-Server)

34 Future Work Extend presented techniques To handle limited space To speed up a workload of queries in addition to maintenance of a set of materialized views To work in dynamic query result caching environments

35 Questions


Download ppt "Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham."

Similar presentations


Ads by Google