Presentation: "Materialization and Cubing Algorithms" (transcript)
Cube Materialization
Each cell of the data cube is a view consisting of an aggregation of interest. The values of many cells depend on the values of other cells: for example, a total per year can be computed from the totals per (country, year). Materializing (precomputing) some or all of these cells is a common and powerful query optimization technique.
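A minimal Python sketch of why materialized cells help, assuming a SUM measure: a coarser group-by is computed from an already materialized finer group-by instead of from the raw data. The view contents and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# A hypothetical materialized view: SUM(sales) grouped by (product, city, year).
view_pcy = {
    ("tv", "oslo", 2020): 5,
    ("tv", "oslo", 2021): 7,
    ("tv", "bergen", 2020): 2,
    ("radio", "oslo", 2020): 4,
    ("radio", "bergen", 2021): 1,
}


def roll_up(view, keep):
    """Aggregate a finer group-by into a coarser one.

    `view` maps full key tuples to a measure; `keep` lists the key positions
    to retain. Partial results are combined with addition, so this covers
    SUM and COUNT; MIN or MAX would need a different combine function.
    """
    out = defaultdict(int)
    for key, value in view.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


# The (product) cells depend on the (product, city, year) cells:
print(roll_up(view_pcy, keep=[0]))  # {('tv',): 14, ('radio',): 5}
```

Passing `keep=[]` rolls everything up to the single apex cell.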
Materialization (contd.)
The size of the data warehouse and the complexity of queries can make queries take very long to complete. Materializing (precomputing) the results of frequently asked queries is a common technique for improving performance.
Issues in View Materialization
–What views should we materialize, and what indexes should we build on the precomputed results?
–Given a query and a set of materialized views, can we use the materialized views to answer the query?
–How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)
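The second and third questions can be sketched in Python, assuming SUM or COUNT measures and a hypothetical catalog that records each materialized view's grouping dimensions and row count. `pick_view` and `refresh_sum` are illustrative names, not part of any particular system, and choosing which views to materialize in the first place is not addressed here.

```python
# Hypothetical catalog: grouping dimensions of each materialized view -> row count.
views = {
    frozenset({"product", "city", "year"}): 1000,
    frozenset({"product", "year"}): 120,
    frozenset({"city"}): 10,
}


def pick_view(query_dims, views):
    """Return the smallest materialized view that can answer a group-by query.

    With SUM/COUNT, a view grouped by a superset of the query's dimensions
    can answer it by rolling up the extra dimensions. Returns None when no
    view qualifies and the base tables must be scanned instead.
    """
    candidates = [v for v in views if set(query_dims) <= v]
    return min(candidates, key=views.get, default=None)


def refresh_sum(view, delta_rows):
    """Incrementally maintain a SUM view with newly inserted base rows.

    `view` maps group keys to sums; `delta_rows` holds (group_key, value)
    pairs already projected onto the view's grouping dimensions. Only the
    affected groups are touched, instead of recomputing the whole view.
    """
    for key, value in delta_rows:
        view[key] = view.get(key, 0) + value
    return view


print(pick_view({"region"}, views))  # None
```

Deletions could be applied the same way with negative values, although a real system would also track counts so that it can drop groups that become empty.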
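Bottom-Up Cubing (BUC)
BUC is a cube construction algorithm that proceeds from the apex cuboid (the most general) toward the base cuboid (the most specific). The name "bottom-up" reflects the original BUC paper, which draws the cuboid lattice with the apex at the bottom. This processing order lets BUC use the Apriori pruning property to compute iceberg cubes: if a cell does not meet the minimum support threshold (e.g., its count is too low), no more specific cell derived from it can meet it either, so those cells need not be computed. This will become clearer on the next slide.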
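BUC Algorithm
BUC is a recursive algorithm: it partitions the tuples on the values of one dimension at a time and recurses into each partition on the remaining dimensions, which makes iceberg pruning straightforward (a partition that is too small is simply not explored). BUC does not perform simultaneous aggregation (computing several cuboids from one another in a single pass); its main advantage is that partitioning costs are shared across cuboids.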
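A compact Python sketch of BUC-style recursion for an iceberg cube, assuming COUNT as the measure and a minimum support threshold `min_sup`. It follows the partition-and-recurse idea described above, but a real implementation would partition more efficiently and write cells to disk; this sketch groups rows with a dictionary and keeps the results in memory. The name `buc` and the `*` marker for All are illustrative choices.

```python
from collections import defaultdict

ALL = "*"  # marker for a dimension that has been aggregated away


def buc(rows, num_dims, min_sup):
    """Compute an iceberg cube with COUNT as the measure.

    rows: list of tuples, one value per dimension.
    Returns {cell: count} for every cell with count >= min_sup, where a
    cell is a tuple holding ALL in its rolled-up positions.
    """
    result = {}

    def recurse(part, start_dim, cell):
        # The current partition is exactly the set of tuples in this cell.
        result[tuple(cell)] = len(part)
        for d in range(start_dim, num_dims):
            # Partition the current tuples on dimension d.
            groups = defaultdict(list)
            for row in part:
                groups[row[d]].append(row)
            for value, sub in groups.items():
                # Apriori pruning: a partition below min_sup cannot contain
                # any more specific cell that reaches min_sup, so skip it.
                if len(sub) >= min_sup:
                    cell[d] = value
                    recurse(sub, d + 1, cell)
            cell[d] = ALL  # restore before moving to the next dimension

    if len(rows) >= min_sup:
        recurse(rows, 0, [ALL] * num_dims)
    return result


rows = [("a", "x"), ("a", "y"), ("b", "x")]
print(buc(rows, num_dims=2, min_sup=2))
# {('*', '*'): 3, ('a', '*'): 2, ('*', 'x'): 2}
```

Each cell's count comes from its own partition rather than from another cuboid, which is why BUC shares partitioning work but not aggregation.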
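Bottom-Up Data Cube Computation: Example
Cell values: numbers of loan applications

Base cuboid (country, year):
            1985   1986   1987   1988
Norway        10     30     20     24
…             23     45     14     32
USA           14     32     42     11

Cuboid (year), with country = All:
            1985   1986   1987   1988
All           47    107     76     67

Cuboid (country), with year = All:
             All
Norway        84
…            114
USA           99

Apex cuboid (All, All): 297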
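The aggregate cuboids in the example can be checked with a short Python sketch that rolls the base cells up by summation. The slide elides one country's name, so the placeholder key `"elided"` stands in for it; this is a plain aggregation to verify the numbers, not an implementation of BUC itself.

```python
from collections import defaultdict

# Base cuboid from the slide: loan applications by (country, year).
# "elided" is a placeholder for the country name the slide omits.
base = {
    ("Norway", 1985): 10, ("Norway", 1986): 30,
    ("Norway", 1987): 20, ("Norway", 1988): 24,
    ("elided", 1985): 23, ("elided", 1986): 45,
    ("elided", 1987): 14, ("elided", 1988): 32,
    ("USA", 1985): 14, ("USA", 1986): 32,
    ("USA", 1987): 42, ("USA", 1988): 11,
}

by_year = defaultdict(int)     # cuboid (year), country = All
by_country = defaultdict(int)  # cuboid (country), year = All
for (country, year), n in base.items():
    by_year[year] += n
    by_country[country] += n
apex = sum(by_year.values())   # apex cuboid (All, All)

print(dict(by_year))     # {1985: 47, 1986: 107, 1987: 76, 1988: 67}
print(dict(by_country))  # {'Norway': 84, 'elided': 114, 'USA': 99}
print(apex)              # 297
```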
Introduction to MOLAP Cube
Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications. Although the array-based cubing algorithm described in the following slides is designed for multidimensional OLAP (MOLAP) systems, it can also be used for relational OLAP (ROLAP) systems: the table data is converted to an array, cubed as if in a MOLAP system, and then converted back to a table.
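A minimal sketch of the ROLAP round trip in Python, assuming SUM as the measure, a dense in-memory array stored as a flat row-major list, and hypothetical helper names (`table_to_array`, `collapse_last`, `array_to_table`). A real system would use chunked, possibly compressed storage rather than one dense list.

```python
from itertools import product
from math import prod


def offset_of(idx, shape):
    """Row-major offset of the index tuple `idx` in an array of `shape`."""
    off = 0
    for i, size in zip(idx, shape):
        off = off * size + i
    return off


def table_to_array(rows, num_dims):
    """Convert (dim values..., measure) tuples into a dense flat array.

    Each dimension's distinct values are mapped to indices 0..size-1;
    empty cells hold None. Returns (array, shape, index_maps).
    """
    index_maps = []
    for d in range(num_dims):
        values = sorted({row[d] for row in rows})
        index_maps.append({v: i for i, v in enumerate(values)})
    shape = [len(m) for m in index_maps]
    array = [None] * prod(shape)
    for row in rows:
        off = offset_of([index_maps[d][row[d]] for d in range(num_dims)], shape)
        array[off] = (array[off] or 0) + row[num_dims]  # SUM duplicate rows
    return array, shape, index_maps


def collapse_last(array, shape):
    """Cube step: SUM away the last dimension, which is contiguous in row-major order."""
    inner = shape[-1]
    out = []
    for start in range(0, len(array), inner):
        cells = [v for v in array[start:start + inner] if v is not None]
        out.append(sum(cells) if cells else None)
    return out, shape[:-1]


def array_to_table(array, shape, index_maps):
    """Convert a dense array back into (dim values..., measure) rows, skipping empty cells."""
    labels = [sorted(m, key=m.get) for m in index_maps]
    rows = []
    for idx in product(*(range(s) for s in shape)):
        value = array[offset_of(idx, shape)]
        if value is not None:
            rows.append(tuple(labels[d][i] for d, i in enumerate(idx)) + (value,))
    return rows


# Three base cells taken from the loan-application example.
rows = [("Norway", 1985, 10), ("Norway", 1986, 30), ("USA", 1985, 14)]
arr, shape, maps = table_to_array(rows, 2)
summed, sub_shape = collapse_last(arr, shape)
print(array_to_table(summed, sub_shape, maps[:-1]))  # [('Norway', 40), ('USA', 14)]
```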
Array Storage
There are three major issues relating to the storage of the array that must be resolved:
–In a multidimensional application, the array is likely to be too large to fit in memory.
–Many of the cells in the array are likely to be empty, because there is no data for that combination of coordinates.
–In many cases, the array must be loaded from data that is not in array format (e.g., from a relational table or an external load file).
Resolving Storage Issues
–A large n-dimensional array that cannot fit in memory is divided into small n-dimensional chunks (each sized to correspond to the disk block size), and each chunk is stored as one object on disk.
–Sparse chunks (with data density below 40%) use “chunk-offset compression”: for each valid array entry, a pair (offsetInChunk, data) is stored, so empty cells take no space.
–To load data from formats other than arrays, a partition-based loading algorithm is used. It takes as input the table, the size of each dimension, and a predefined chunk size, and returns a (possibly compressed) chunked array.
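A minimal Python sketch of chunking with chunk-offset compression, restricted to a 2-D array for brevity and assuming the array dimensions are multiples of the chunk dimensions. The 40% threshold comes from the slide; the function names and the `("dense", cells)` / `("sparse", pairs)` tagging are illustrative assumptions, and partition-based loading is not shown.

```python
DENSITY_THRESHOLD = 0.4  # chunks less than 40% full are stored compressed


def chunk_2d(array, rows, cols, chunk_rows, chunk_cols):
    """Split a row-major 2-D array (flat list, None = empty cell) into chunks.

    Returns {(chunk_row, chunk_col): chunk}, where a chunk is either
    ("dense", cells) or ("sparse", [(offset_in_chunk, value), ...]).
    """
    if rows % chunk_rows or cols % chunk_cols:
        raise ValueError("array dimensions must be multiples of the chunk dimensions")
    chunks = {}
    for r0 in range(0, rows, chunk_rows):
        for c0 in range(0, cols, chunk_cols):
            # Gather this chunk's cells in row-major order within the chunk.
            cells = [array[r * cols + c]
                     for r in range(r0, r0 + chunk_rows)
                     for c in range(c0, c0 + chunk_cols)]
            valid = [(off, v) for off, v in enumerate(cells) if v is not None]
            key = (r0 // chunk_rows, c0 // chunk_cols)
            if len(valid) < DENSITY_THRESHOLD * len(cells):
                chunks[key] = ("sparse", valid)  # chunk-offset compression
            else:
                chunks[key] = ("dense", cells)
    return chunks


def decompress(chunk, chunk_size):
    """Rebuild a chunk's full cell list from either representation."""
    kind, payload = chunk
    if kind == "dense":
        return list(payload)
    cells = [None] * chunk_size
    for off, v in payload:
        cells[off] = v
    return cells


N = None
array = [1, N, N, N,
         N, N, 5, 6,
         N, N, 7, 8,
         N, N, N, N]
chunks = chunk_2d(array, 4, 4, 2, 2)
print(chunks[(0, 0)])  # ('sparse', [(0, 1)])
print(chunks[(0, 1)])  # ('dense', [None, None, 5, 6])
```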
Basic Array Cubing Algorithm
1. Construct the minimum size spanning tree for the group-bys of the cube.
2. Compute each group-by $D_{i_1} D_{i_2} \cdots D_{i_k}$ of the cube from its “parent” $D_{i_1} D_{i_2} \cdots D_{i_{k+1}}$, the group-by with one additional dimension that has the minimum size.
3. Read in each chunk of $D_{i_1} D_{i_2} \cdots D_{i_{k+1}}$ along the dimension $D_{i_{k+1}}$ and aggregate each chunk into a chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$.
4. Once a chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$ is complete, output it to disk and reuse its memory for the next chunk of $D_{i_1} D_{i_2} \cdots D_{i_k}$, so that only one chunk is kept in memory at a time.
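A Python sketch of the four steps under stated assumptions: a group-by's size is estimated as the product of its dimension cardinalities (the slide does not say how size is measured), chunks are dense, and steps 3 and 4 are shown only for a 2-D parent reduced to one dimension. All function names are hypothetical.

```python
from itertools import combinations
from math import prod


def min_size_spanning_tree(dim_sizes):
    """Steps 1-2: map every group-by to the parent it is computed from.

    Group-bys are frozensets of dimension indices. A parent has exactly one
    extra dimension; its size is estimated as the product of its dimension
    sizes (the dense array size). The full group-by is the root.
    """
    n = len(dim_sizes)

    def size(group_by):
        return prod(dim_sizes[d] for d in group_by)

    parent = {}
    for k in range(n):  # every group-by with fewer than n dimensions
        for dims in combinations(range(n), k):
            child = frozenset(dims)
            candidates = [child | {d} for d in range(n) if d not in child]
            parent[child] = min(candidates, key=size)
    return parent


def aggregate_along_last(parent_chunks, n_chunks_a, n_chunks_b, chunk_a, chunk_b):
    """Steps 3-4 for a 2-D parent (A, B): compute group-by A chunk by chunk.

    parent_chunks[(i, j)] is a dense row-major chunk of shape (chunk_a, chunk_b).
    For each chunk of A, the parent chunks are read along dimension B and
    summed into one output buffer, which is yielded ("written to disk") once
    complete, so only one output chunk is held at a time.
    """
    for i in range(n_chunks_a):
        out_chunk = [0] * chunk_a
        for j in range(n_chunks_b):  # read chunks along dimension B
            chunk = parent_chunks[(i, j)]
            for a in range(chunk_a):
                out_chunk[a] += sum(chunk[a * chunk_b:(a + 1) * chunk_b])
        yield i, out_chunk


tree = min_size_spanning_tree([10, 100, 1000])
print(sorted(tree[frozenset({2})]))  # [0, 2]: (D0, D2) is the smallest parent of (D2)

chunks = {(0, 0): [1, 2, 5, 6], (0, 1): [3, 4, 7, 8],
          (1, 0): [1, 1, 0, 0], (1, 1): [1, 1, 0, 2]}
print(list(aggregate_along_last(chunks, 2, 2, 2, 2)))  # [(0, [10, 26]), (1, [4, 2])]
```

Because each group-by picks its smallest parent independently, the per-node choices together form the minimum size spanning tree of step 1.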