1 ROLAP, DATA FLOWS, SCHEMA, DATA MINING, FORENSICS, HOLAP, NETWORK SECURITY, ONLINE ANALYSIS, MOLAP, STREAMING DATA, MULTI-DIMENSIONAL HIERARCHIES, CUBOID, BINARY TREE FOREST, SQL GROUP BY, DATA CUBE, RELATIONAL ALGEBRA, OLTP, OLAP, LATTICE NOTATION. Stephen A. Broeker
3 Conclusion: DATA VALUE HYPERCUBES exceed the performance of existing hypercubes by enabling OLAP to drill down to individual data values. Therefore, DATA VALUE HYPERCUBES extend OLAP's ability to deliver valuable information and insight.
4 Vision: Analyze streaming data. Improve network security. Gain freedom from figuring out how to answer routine questions, so that we can think about what extraordinary questions could be asked.
5 Mutually Exclusive Approaches: OLAP takes broad views of data and finds patterns obscured by detail. OLTP takes narrow views of data and finds detail obscured by patterns.
6 Distinct Purposes
OLAP (Online Analytical Processing): Seeks detailed answers to complex questions based on large data sets, discovering information hiding in data. The priority is depth and breadth of understanding; speed is secondary. Example: find the purchase patterns of men for all dental hygiene products in all stores.
OLTP (Online Transaction Processing): Operate and control, using snapshots of operational status. The priority is speed and detail. Example: John Smith used a debit card to buy toothpaste at a gas station.
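The two examples above can be contrasted in a small sketch. The transactions are hypothetical toy data with illustrative field names, and amounts are in cents. The OLTP-style function fetches one detailed record. The OLAP-style function aggregates many records into a pattern.

```python
from collections import defaultdict

# Hypothetical toy transactions (amounts in cents); field names are illustrative only.
TRANSACTIONS = [
    {"id": 1, "customer": "John Smith", "gender": "M", "product": "toothpaste",
     "category": "dental hygiene", "store": "gas station", "payment": "debit", "cents": 349},
    {"id": 2, "customer": "Ann Lee", "gender": "F", "product": "floss",
     "category": "dental hygiene", "store": "pharmacy", "payment": "credit", "cents": 299},
    {"id": 3, "customer": "Bob Ray", "gender": "M", "product": "mouthwash",
     "category": "dental hygiene", "store": "pharmacy", "payment": "cash", "cents": 599},
    {"id": 4, "customer": "Bob Ray", "gender": "M", "product": "bread",
     "category": "bakery", "store": "grocery", "payment": "cash", "cents": 250},
]


def oltp_lookup(txn_id):
    """OLTP style: return one specific, detailed record (narrow view)."""
    return next((t for t in TRANSACTIONS if t["id"] == txn_id), None)


def olap_pattern(gender, category):
    """OLAP style: aggregate many records into a pattern (broad view): spend per store."""
    totals = defaultdict(int)
    for t in TRANSACTIONS:
        if t["gender"] == gender and t["category"] == category:
            totals[t["store"]] += t["cents"]
    return dict(totals)


print(oltp_lookup(1)["product"])            # toothpaste
print(olap_pattern("M", "dental hygiene"))  # {'gas station': 349, 'pharmacy': 599}
```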
7 Opportunity: Eliminate the mutually exclusive tradeoff between OLTP and OLAP, so that we can have the best of both worlds. (Figure: OLTP vs. OLAP, today and tomorrow.)
8 Capability of the Data Value Hypercube: Enables the composition of totals from aggregates or from data values. Detects outliers, anomalies, exceptions, and data errors. Detects trends and tendencies among measures, attributes, or parameters. Finds a "needle in a haystack" by drilling down to specific details. Spots data clusters, relationships, and magnitudes of size, disparity, or distribution.
9 Data Mining: Data mining uses OLAP. Example: associations. People who buy bread "also" buy 'X', where "also" is presented as a percentage.
Purchased with bread (concurrence): butter 90%; grape jelly 20%; cinnamon spice 10%.
Sample size vs. confidence: 100 transactions, 10%; 1,000,000 transactions, 90%.
Building a data warehouse: $1M. Building a DBMS team: $2M. Having confidence in your results: priceless.
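The "concurrence" percentages above appear to correspond to what association-rule mining usually calls the confidence of a rule such as bread → butter. That is the share of baskets containing bread that also contain the other item. The sketch below assumes that reading. The baskets are hypothetical and were chosen only so that the output matches the slide's figures.

```python
def concurrence(baskets, antecedent, item):
    """Share of baskets containing `antecedent` that also contain `item`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(item in b for b in with_antecedent) / len(with_antecedent)


# Hypothetical baskets: 10 contain bread, 9 of those contain butter, 2 grape jelly, 1 cinnamon spice.
BASKETS = (
    [{"bread", "butter"}] * 7
    + [{"bread", "butter", "grape jelly"}] * 2
    + [{"bread", "cinnamon spice"}]
    + [{"milk"}]
)

for item in ("butter", "grape jelly", "cinnamon spice"):
    print(f"{item}: {concurrence(BASKETS, 'bread', item):.0%}")
```

Ten bread baskets support a 90% figure only weakly, which is the slide's point about sample size and confidence in the result.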
10 Compare Roles: A relational database holds data values, supports OLTP, and serves operations. A data warehouse holds data aggregates, supports OLAP, and serves business intelligence.
11 Compare OLTP to OLAP
Response time to queries: OLAP slow; OLTP fast.
Space requirement: OLAP large; OLTP relatively small.
Data source: OLAP data cubes; OLTP relational database.
Granularity: OLAP generalized; OLTP detailed.
Output: OLAP data aggregates; OLTP specific data values.
Scope: OLAP analyze, decide, plan; OLTP operate and control.
Flux: OLAP batch operations; OLTP frequent updates.
Find: OLAP trends; OLTP anomalies.
12 Hypercube: Hypercubes are constructed so that each cell corresponds to a unique combination of database attribute values.
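A minimal way to see "one cell per unique combination of attribute values" is a dictionary keyed by tuples. The dimension names below come from the Customer, Part, and Supplier example in these slides. The `qty` measure and the records are hypothetical.

```python
from collections import defaultdict

DIMS = ("customer", "part", "supplier")


def build_hypercube(records, dims=DIMS, measure="qty"):
    """Map each unique combination of dimension values to one cell holding the summed measure."""
    cube = defaultdict(int)
    for r in records:
        cube[tuple(r[d] for d in dims)] += r[measure]
    return dict(cube)


# Hypothetical records; the first two share a combination, so they land in the same cell.
RECORDS = [
    {"customer": "A", "part": "Part 1", "supplier": "Supplier 1", "qty": 5},
    {"customer": "A", "part": "Part 1", "supplier": "Supplier 1", "qty": 2},
    {"customer": "B", "part": "Part 3", "supplier": "Supplier 2", "qty": 4},
]

cube = build_hypercube(RECORDS)
print(cube)
```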
13 Dependencies: OLAP engines are implemented as multi-dimensional data cubes. Data cubes with many dimensions are called hypercubes.
14 Disambiguation: In geometry, the tesseract is the four-dimensional analog of the cube; the tesseract is to the cube as the cube is to the square. A generalization of the cube to dimensions greater than three is called a "hypercube". (Figure: a 3D projection of a tesseract (8-cell) performing a single rotation about a plane that bisects the figure from front to back and top to bottom. Created by Jason Hise with Maya and Macromedia Fireworks; released into the public domain.)
15 Disambiguation: In this context, hypercubes are data structures. This picture is merely an abstract visual representation of a hypercube.
16 Concepts versus Implementation: Concepts first, implementation later. A balanced B-tree forest is the data structure used to represent a hypercube. Each dimension in the hypercube is represented by a separate B-tree.
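The slides do not show how the B-tree forest is implemented, so what follows is only a hedged sketch of the general idea of one ordered index per dimension.

- **Stand-in for a B-tree.** Python's standard library has no B-tree. Each balanced B-tree is replaced by a sorted list maintained with `bisect`, which supports the same ordered lookups and range scans.
- **Hypothetical names.** The class names (`DimensionIndex`, `Forest`) and the record-id postings are illustrative. They are not the author's design.
- **How a query works.** A multi-dimensional query intersects the record ids returned by each dimension's index.

```python
import bisect


class DimensionIndex:
    """Ordered index for one dimension; a sorted list + bisect stands in for a balanced B-tree."""

    def __init__(self):
        self.keys = []      # sorted distinct attribute values
        self.postings = {}  # attribute value -> set of record ids

    def insert(self, value, rid):
        if value not in self.postings:
            bisect.insort(self.keys, value)
            self.postings[value] = set()
        self.postings[value].add(rid)

    def range(self, lo, hi):
        """Record ids whose value v satisfies lo <= v <= hi."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        out = set()
        for k in self.keys[i:j]:
            out |= self.postings[k]
        return out


class Forest:
    """One DimensionIndex per dimension (hypothetical 'forest' of ordered indexes)."""

    def __init__(self, dims):
        self.trees = {d: DimensionIndex() for d in dims}
        self.records = []

    def add(self, record):
        rid = len(self.records)
        self.records.append(record)
        for d, tree in self.trees.items():
            tree.insert(record[d], rid)

    def query(self, **ranges):
        """Intersect per-dimension range results; `ranges` maps dimension -> (lo, hi)."""
        result = set(range(len(self.records)))
        for d, (lo, hi) in ranges.items():
            result &= self.trees[d].range(lo, hi)
        return sorted(result)


forest = Forest(["customer", "part", "supplier"])
for rec in [("A", "Part 1", "Supplier 1"), ("B", "Part 3", "Supplier 2"),
            ("C", "Part 2", "Supplier 1"), ("A", "Part 2", "Supplier 4")]:
    forest.add(dict(zip(["customer", "part", "supplier"], rec)))

print(forest.query(customer=("A", "B"), part=("Part 1", "Part 2")))  # [0, 3]
```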
17 Hypercubes Have Dimensions: Customer, Part, Supplier.
18 Attributes Ordered into Hierarchies (Figure: a cube with a Part axis (Part 1 to Part 4), a Supplier axis (Supplier 1 to Supplier 4), and a Customer axis (Customer A to Customer D).)
19 Multiple Attributes in a Single Dimension (Figure: the time dimension, labeled with the months January to December and the weekdays Monday to Sunday.) Dimensions are organized as hierarchies of attributes; for example, the time dimension has year, month, and day. Drill-down is viewing data at progressively finer detail, e.g., sales per year, then per month, then per day. Roll-up is viewing data in progressively less detail, e.g., sales per day, then per month, then per year.
20 Attribute Complexity: Attribute complexity increases in the presence of hierarchies. For example, consider queries that group on time. The queries (day), (month), and (year) each represent a different granularity of the time dimension. If we have total sales grouped by month, we can use those results to compute total sales grouped by year. Writing Q1 ≤ Q2 when Q1 can be answered from the results of Q2, this gives (year) ≤ (month) ≤ (day).
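Roll-up, drill-down, and the dependency (year) ≤ (month) ≤ (day) can be sketched with plain dictionaries. The daily sales figures and the helper name `roll_up` are hypothetical. The key point is that yearly totals computed from the monthly totals equal yearly totals computed directly from the daily data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales figures.
DAILY = {
    date(2023, 1, 5): 100,
    date(2023, 1, 20): 50,
    date(2023, 2, 3): 70,
    date(2024, 1, 9): 30,
}


def roll_up(sales, to_coarser):
    """Re-group totals under a coarser key (e.g. day -> month)."""
    out = defaultdict(int)
    for key, amount in sales.items():
        out[to_coarser(key)] += amount
    return dict(out)


by_month = roll_up(DAILY, lambda d: (d.year, d.month))  # roll up: day -> month
by_year = roll_up(by_month, lambda ym: ym[0])           # month -> year, reusing monthly totals

# Drill down: return to finer detail inside one coarse cell (January 2023).
jan_2023 = {d: v for d, v in DAILY.items() if (d.year, d.month) == (2023, 1)}

print(by_month)  # {(2023, 1): 150, (2023, 2): 70, (2024, 1): 30}
print(by_year)   # {2023: 220, 2024: 30}
```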
21 Query Dependencies: Hierarchies introduce query dependencies that we must account for when deciding which queries to materialize. Often, hierarchies are not total orders but partial orders on the attributes that make up a dimension. Example: months and years cannot be divided evenly into weeks, so if we group by week, we cannot determine the grouping by month or by year. Thus (month) ≰ (week) and (week) ≰ (month), and similarly for week and year.
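The partial order above can be modeled as a small directed graph. An edge runs from each granularity to the coarser groupings that can be computed from it. Q1 ≤ Q2 then holds when Q1 is reachable from Q2. The table `ROLLS_UP_TO` and the function name are hypothetical, and the graph encodes only the day/week/month/year example from the slide.

```python
# Hypothetical time hierarchy: each granularity maps to the coarser groupings
# computable directly from it. Weeks do not divide evenly into months or years.
ROLLS_UP_TO = {
    "day": {"week", "month"},
    "week": set(),
    "month": {"year"},
    "year": set(),
}


def answerable_from(target, source):
    """True if target <= source, i.e. the target grouping can be computed from source's results."""
    stack, seen = [source], set()
    while stack:
        g = stack.pop()
        if g == target:
            return True
        if g in seen:
            continue
        seen.add(g)
        stack.extend(ROLLS_UP_TO[g])
    return False


print(answerable_from("year", "day"))    # True:  (year) <= (month) <= (day)
print(answerable_from("month", "week"))  # False: (month) is not <= (week)
```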
22 Limit of the Visual Analogy: Although mathematicians can project geometric shapes with 10 or more dimensions onto a flat surface, beyond 3 dimensions the visual analogy of a hypercube as a data structure stops working, even though the logic of the analogy remains perfectly valid. Drawing a 4-dimensional hypercube on a flat surface makes it obvious why we don't go beyond 3D representations.
23 Cuboids: This is a single data cell.
24 Cuboids: Any subset of a hypercube is a cuboid.
25 Cuboids: Slice.
26 Cuboids: Dice.
27 Cuboids: Also a dice.
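The slide images for the single cell, slice, and dice are lost. Under the common OLAP definitions:

- A **single cell** fixes every dimension.
- A **slice** fixes one dimension to a single value.
- A **dice** keeps a subset of values on one or more dimensions.

The sketch below assumes those definitions. The cube contents and function names are hypothetical.

```python
DIMS = ("customer", "part", "supplier")

# Hypothetical cube: (customer, part, supplier) -> quantity.
CUBE = {
    ("A", "Part 1", "Supplier 1"): 5,
    ("A", "Part 2", "Supplier 2"): 3,
    ("B", "Part 1", "Supplier 2"): 4,
    ("C", "Part 3", "Supplier 1"): 2,
}


def cell(cube, *coords):
    """A single data cell: every dimension fixed (0 if the combination never occurs)."""
    return cube.get(tuple(coords), 0)


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep only the listed values on each named dimension."""
    keep = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return {k: v for k, v in cube.items() if all(k[i] in vals for i, vals in keep.items())}


print(slice_cube(CUBE, "part", "Part 1"))
print(dice_cube(CUBE, customer=["A", "B"], supplier=["Supplier 2"]))
```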
28 Drilling down.
29 Rolling up (Figure: drilling down gives more detail and narrower summaries; rolling up gives less detail and broader summaries.)
30 Challenges: Large databases. Data in a rapid and constant state of flux, i.e., streaming data. Constraints: time, RAM, and computing power. Data cube materialization is problematic.
31 Predicted % Change to Data Warehouse Feeds (Figure: predicted change, on a scale from -80% to +80%, in data warehouse feeds from structured and unstructured data, OLTP and OLAP. Sources shown: multi-dimensional databases, XML, EDI, spreadsheets, web pages, RSS, web logs, voice recognition, instant messaging, wikis, content management, document management, taxonomies and ontologies, multimedia, legacy databases, relational databases, and mainframe databases.)
32 Existing Methods
ROLAP (Relational OLAP): data source is a relational database; SQL: yes.
MOLAP (Multi-Dimensional OLAP): data source is a hypercube; SQL: no.
HOLAP (Hybrid OLAP): data source is ROLAP for data values and MOLAP for data aggregates; SQL: yes.
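In a ROLAP setting, a data cube can be computed with ordinary SQL `GROUP BY` over the relational tables. SQLite, which ships with Python's `sqlite3` module, has no `GROUP BY CUBE`, so this sketch emulates one with a `UNION ALL` of one `GROUP BY` per subset of the dimensions. Dimensions left out of a grouping are reported with the literal `'ALL'`.

Some assumptions to note:

- The table, column names, and rows are hypothetical.
- `cube_sql` is an illustrative helper, not a library function.
- The approach assumes no real attribute value is the string `'ALL'`.

```python
import itertools
import sqlite3

DIMS = ("customer", "part", "supplier")


def cube_sql(dims=DIMS, table="sales", measure="qty"):
    """Build one UNION ALL query with a GROUP BY for every subset of `dims` (2**len(dims) groupings).
    Dimensions not in a grouping are reported as the literal 'ALL'."""
    selects = []
    for r in range(len(dims), -1, -1):
        for subset in itertools.combinations(dims, r):
            cols = ", ".join(d if d in subset else f"'ALL' AS {d}" for d in dims)
            group_by = f" GROUP BY {', '.join(subset)}" if subset else ""
            selects.append(f"SELECT {cols}, SUM({measure}) AS total FROM {table}{group_by}")
    return "\nUNION ALL\n".join(selects)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, part TEXT, supplier TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("A", "Part 1", "Supplier 1", 5), ("A", "Part 2", "Supplier 2", 3), ("B", "Part 1", "Supplier 2", 4)],
)
rows = conn.execute(cube_sql()).fetchall()
for row in rows:
    print(row)
```

The number of groupings doubles with every added dimension, which previews the materialization bottleneck discussed next.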
33 Disambiguation: π versus Π (both named pi).
Lowercase π: used most often in classical geometry; function: a mathematical constant.
Capital Π: used in arithmetic and set theory; function: product (e.g., the Cartesian product), similar to summation, which is indicated by the capital letter sigma, Σ.
34 Bottleneck: Number of Cells in a Data Cube. Given a database with $l$ attributes, where attribute $i$ has $a_i$ values, the number of cells in the corresponding fully populated data cube is $\prod_{1 \le i \le l} (a_i + 1)$. The extra 1 for each attribute accommodates the value "all".
35 Cuboid Example: Consider a network hypercube with 3 dimensions:
1. Content
2. Source IP
3. Time stamp
Limit the hypercube to one million streams. Typically there are 201 unique content types, 100 unique source IPs, and 275 unique time stamps, so the number of possible cells is 202 × 101 × 276 = 5,630,952 ≈ 5.6 million. In actuality, the network data stream has 16 dimensions.
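The cell-count formula $\prod_{1 \le i \le l} (a_i + 1)$ is a one-liner. Applied to the network example, it reproduces 202 × 101 × 276. The function name is hypothetical. `math.prod` requires Python 3.8 or later.

```python
from math import prod


def num_cells(values_per_attribute):
    """Cells in a fully populated data cube: product of (a_i + 1); the +1 is the value 'all'."""
    return prod(a + 1 for a in values_per_attribute)


# Content types, source IPs, time stamps (from the example above).
network = num_cells([201, 100, 275])
print(f"{network:,}")  # 5,630,952
```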
36 Data Expansion (Figure: number of cells, on a log scale, versus unique data values per dimension, for hypercubes with 5, 10, 15, and 20 dimensions.)
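The chart's exact axis values did not survive. This sketch therefore assumes that every dimension has the same number of unique values $v$, so the cell count becomes $(v + 1)^d$ for $d$ dimensions. It prints $\log_{10}$ of the cell count for a few illustrative values of $v$, mirroring the log scale of the figure. The chosen values of $v$ are arbitrary, not taken from the slide.

```python
import math


def cells(values_per_dim, dims):
    """Cells in a fully populated cube when every dimension has `values_per_dim` values (+1 for 'all')."""
    return (values_per_dim + 1) ** dims


for dims in (5, 10, 15, 20):
    row = " ".join(f"{math.log10(cells(v, dims)):6.1f}" for v in (10, 100, 1000))
    print(f"{dims:2d} dimensions, log10(cells) for v = 10 / 100 / 1000: {row}")
```

Because $d$ sits in the exponent, each added dimension multiplies the cell count by $v + 1$. This is why the curves separate so sharply on a log scale.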