1 Tough Choices Materialize nothing. Compute every cell on demand. Worst query response time. No space requirements. Materialize part of the data cube.


1 Tough Choices
Materialize nothing: compute every cell on demand. Worst query response time, but no space requirements.
Materialize part of the data cube: many cells are computable from other cells, but which cells should be materialized? More materialized cells means better query performance.
Materialize the entire data cube: best query response time, but excessive space requirements.

2 Data Value Hypercube versus ordinary data cubes
Data value hypercubes store data-record indices, whereas existing data cubes can only store data aggregates.
Data value hypercubes are generated as quickly as existing data cubes.

3 Remember this? Now it doesn't matter.
[Figure: OLTP and OLAP data sources grouped under "Unstructured Data" and "Structured Data", annotated +80% and -80%. Sources shown: Email, Multi-Dimensional Databases, XML, EDI, Spreadsheets, Web Pages, RSS, Web Log, Voice Recognition, Instant Messaging, Wikis, Content Management, Document Management, Taxonomies, Ontologies, Multimedia, Legacy Databases, Relational Databases, Main Frame Databases.]

4 Hypercube
Hypercubes are constructed so that each cell corresponds to a unique combination of database attribute values. 3 attributes require at least 8 cells, because there are 2^3 = 8 ways to include or leave out each attribute.
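The count of 8 can be checked by listing every subset of the three attributes. This is a minimal sketch, not taken from the slides; the function name `cuboids` is an assumption made for illustration.

```python
from itertools import combinations

def cuboids(attributes):
    """Return every subset of the attributes, from the empty set ("None") up."""
    result = []
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            result.append(combo)
    return result

cells = cuboids(["Customer", "Part", "Supplier"])
for cell in cells:
    # The empty combination is the cell the slides label "None".
    print("".join(cell) or "None")
print(len(cells))  # 2**3 cells for 3 attributes
```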


6 [Figure: a hypercube whose cells are labeled CustomerPartSupplier, CustomerPart, PartSupplier, Customer, Part, Supplier, and None.]

7 [Figure: the hypercube cells and the attribute values each one holds.]
CustomerPartSupplier: {Boeing, Lockheed} x {Cockpit, Jet Engine, Wing} x {Delta, FedEx}
CustomerSupplier: {Boeing, Lockheed} x {Delta, FedEx}
PartSupplier: {Boeing, Lockheed} x {Cockpit, Jet Engine, Wing}
CustomerPart: {Cockpit, Jet Engine, Wing} x {Delta, FedEx}
Supplier: Boeing, Lockheed
Customer: Delta, FedEx
Part: Cockpit, Jet Engine, Wing
None: no attributes

8 [Figure: the same hypercube cells as on the previous slide, numbered 1 through 8.]
3 attributes require at least 8 cells.

9 [Slide: total sales for every hypercube cell.] This is entirely fictional data.

CustomerPartSupplier
  Supplier  Part        Customer  Sales
  Boeing    Cockpit     Delta     $10
  Boeing    Cockpit     FedEx     $20
  Lockheed  Cockpit     Delta     $30
  Lockheed  Cockpit     FedEx     $40
  Boeing    Jet Engine  Delta     $50
  Boeing    Jet Engine  FedEx     $60
  Lockheed  Jet Engine  Delta     $70
  Lockheed  Jet Engine  FedEx     $80
  Boeing    Wing        Delta     $90
  Boeing    Wing        FedEx     $100
  Lockheed  Wing        Delta     $110
  Lockheed  Wing        FedEx     $120

PartSupplier
  Supplier  Part        Sales
  Boeing    Cockpit     $30
  Boeing    Jet Engine  $110
  Boeing    Wing        $190
  Lockheed  Cockpit     $70
  Lockheed  Jet Engine  $150
  Lockheed  Wing        $230

CustomerPart
  Part        Customer  Sales
  Cockpit     Delta     $40
  Cockpit     FedEx     $60
  Jet Engine  Delta     $120
  Jet Engine  FedEx     $140
  Wing        Delta     $200
  Wing        FedEx     $220

CustomerSupplier
  Supplier  Customer  Sales
  Boeing    Delta     $150
  Boeing    FedEx     $180
  Lockheed  Delta     $210
  Lockheed  FedEx     $240

Part
  Cockpit     $100
  Jet Engine  $260
  Wing        $420

Supplier
  Boeing    $330
  Lockheed  $450

Customer
  Delta  $360
  FedEx  $420

All
  $780
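The totals on this slide can be reproduced by materializing every cell from the twelve base records. The following is a minimal sketch under stated assumptions: the `materialize` function and the dictionary layout are illustrative and are not the author's implementation.

```python
from collections import defaultdict
from itertools import combinations, product

SUPPLIERS = ["Boeing", "Lockheed"]
PARTS = ["Cockpit", "Jet Engine", "Wing"]
CUSTOMERS = ["Delta", "FedEx"]

# Rebuild the 12 fictional records in slide order: $10, $20, ..., $120.
facts = []
amount = 10
for part, supplier, customer in product(PARTS, SUPPLIERS, CUSTOMERS):
    facts.append({"Supplier": supplier, "Part": part,
                  "Customer": customer, "Sales": amount})
    amount += 10

def materialize(facts, dims):
    """Compute the sales total of every cell for every subset of dims."""
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            totals = defaultdict(int)
            for row in facts:
                key = tuple(row[d] for d in group)
                totals[key] += row["Sales"]
            cube[group] = dict(totals)
    return cube

cube = materialize(facts, ("Supplier", "Part", "Customer"))
print(cube[()][()])          # the "All" cell
print(cube[("Supplier",)])   # the Supplier cell
print(cube[("Part", "Customer")][("Wing", "FedEx")])
```

Materializing all eight group-bys like this is the "entire data cube" choice from the first slide: lookups become dictionary reads, at the cost of storing every total.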

10 Lattice Notation
A lattice is denoted as (L, <=), where L is the set of elements (queries) and <= is the dependence relation.
ancestor(a) = {b | a <= b}.
descendant(a) = {b | b <= a}.
Every element is its own descendant and ancestor.
next(a) = the immediate proper ancestors of a: next(a) = {b | a < b, and there is no c such that a < c and c < b}.
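These definitions can be tried out on the three-attribute example. The sketch below assumes that each query is the set of attributes it groups by and that <= is the subset relation, which matches the (part) <= (part, customer) example in this deck; the function names are illustrative.

```python
from itertools import combinations

DIMS = ("Customer", "Part", "Supplier")

# L: every group-by query, represented as a frozenset of attribute names.
L = [frozenset(c) for n in range(len(DIMS) + 1) for c in combinations(DIMS, n)]

def leq(a, b):
    """a <= b: a groups by a subset of b's attributes (assumed dependence relation)."""
    return a <= b

def ancestor(a):
    return {b for b in L if leq(a, b)}

def descendant(a):
    return {b for b in L if leq(b, a)}

def next_(a):
    """Immediate proper ancestors: b > a with no c strictly between a and b."""
    proper = {b for b in L if a < b}
    return {b for b in proper if not any(a < c < b for c in L)}

part = frozenset({"Part"})
print(sorted(sorted(b) for b in next_(part)))
```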

11 Lattice Diagrams
Lattice diagrams are graphs whose nodes are the lattice elements.
There is an edge from a to b iff b is in next(a).
There is a path downward from y to x iff x <= y.

12 Hypercube Algebra
Simple data warehouse example: parts are purchased from suppliers and then sold to customers.
Three dimensions: Part, Supplier, and Customer. The measure of interest is total sales.
For each cell (p, s, c), store the total sales of part p that was bought from supplier s and sold to customer c.
Users are interested in consolidated sales. Example: what is the total sales of a given part p to a given customer c? This query is answered by looking up the value in cube cell (p, ALL, c).
Many cells are computable from other cells (dependent cells). Example: cell (p, ALL, c) is the sum of cells (p, s1, c), ..., (p, sn, c).

CustomerPart
  Part        Customer  Sales
  Cockpit     Delta     $40
  Cockpit     FedEx     $60
  Jet Engine  Delta     $120
  Jet Engine  FedEx     $140
  Wing        Delta     $200
  Wing        FedEx     $220
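The (p, ALL, c) lookup can be sketched as a sum over the base cells that match the non-ALL coordinates. This is an on-demand computation for illustration only; the `cell` function and the `ALL` marker string are assumptions, and the data is the fictional sales data from this deck.

```python
from itertools import product

ALL = "ALL"  # wildcard marker, chosen for this sketch

PARTS = ["Cockpit", "Jet Engine", "Wing"]
SUPPLIERS = ["Boeing", "Lockheed"]
CUSTOMERS = ["Delta", "FedEx"]

# Base cells (p, s, c) -> sales, in slide order: $10, $20, ..., $120.
base = {}
for i, (p, s, c) in enumerate(product(PARTS, SUPPLIERS, CUSTOMERS)):
    base[(p, s, c)] = 10 * (i + 1)

def cell(p, s, c):
    """Value of cube cell (p, s, c); ALL in a position sums over that dimension."""
    return sum(
        sales
        for (bp, bs, bc), sales in base.items()
        if p in (ALL, bp) and s in (ALL, bs) and c in (ALL, bc)
    )

# Total Cockpit sales to Delta across all suppliers: $10 + $30.
print(cell("Cockpit", ALL, "Delta"))
```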

13 The Dependence Relation on Queries
Consider two queries Q1 and Q2. Q1 <= Q2 iff Q1 can be answered using only the results of Q2. Q1 is then said to be dependent on Q2.
For example, the query (part) can be answered using only the query (part, customer), so (part) <= (part, customer).
Some queries are not comparable with each other using the <= operator. For example, (part) !<= (customer) and (customer) !<= (part).
[Figure: the CustomerPart sales table. Cockpit: Delta $40, FedEx $60. Jet Engine: Delta $120, FedEx $140. Wing: Delta $200, FedEx $220.]
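As a hedged illustration of the dependence relation, the sketch below answers the query (part) using only the stored result of (part, customer), and refuses queries that are not dependent on the stored one. The helper names `depends_on` and `roll_up` are assumptions, as is modeling <= as attribute-set inclusion.

```python
# Stored result of the query (part, customer), from the CustomerPart table.
part_customer = {
    ("Cockpit", "Delta"): 40, ("Cockpit", "FedEx"): 60,
    ("Jet Engine", "Delta"): 120, ("Jet Engine", "FedEx"): 140,
    ("Wing", "Delta"): 200, ("Wing", "FedEx"): 220,
}

def depends_on(q1, q2):
    """Q1 <= Q2, modeled here as: Q1's attributes are a subset of Q2's."""
    return set(q1) <= set(q2)

def roll_up(table, table_dims, target_dims):
    """Answer the query target_dims using only the stored result for table_dims."""
    if not depends_on(target_dims, table_dims):
        raise ValueError(f"{target_dims} is not dependent on {table_dims}")
    keep = [table_dims.index(d) for d in target_dims]
    out = {}
    for key, sales in table.items():
        reduced = tuple(key[i] for i in keep)
        out[reduced] = out.get(reduced, 0) + sales
    return out

by_part = roll_up(part_customer, ("part", "customer"), ("part",))
print(by_part)
```

The same stored table also answers (customer) and the empty query, but it cannot answer (supplier), which is why (supplier) and (part, customer) are not comparable.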

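14 B-Tree Logic: Easier Than It Looks
[Figure: two trees drawn level by level, one keyed by letters and one keyed by each letter's position in the alphabet.]
Letter tree levels: A C E G I K M O Q S U W Y Z / B F J N R V X / D L T / H P
Number tree levels: 1 3 5 7 9 11 13 15 17 19 21 23 25 26 / 2 6 10 14 18 22 24 / 4 12 20 / 8 16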

15 B-Tree Logic: B Is for Balanced
[Figure: a 3rd-order B-tree holding the numbers 100, 20, 50, 80, 99, redrawn after INSERT 9, INSERT 49, and INSERT 51.]
Insert any number < 20 and it becomes the root.
Insert any number > 50 and it becomes the root.
Insert any number > 20 and < 50 and it becomes the root.

16 B-Tree Forest
Construction time for the tree forest is O([operator missing] 1 <= i <= d (log n_i)), where d is the number of query dimensions and n_i is the number of attributes in the database at level i.

17 B-Tree Forest
A Balanced B-Tree Forest is the data structure that is used to represent a Hypercube.
Each dimension in the Hypercube is represented by a separate B-Tree.
B-Trees are great for storing sparse data and have fast insertion and search characteristics: O(log n) per operation, so O(n log n) to insert n keys.

18 B-Tree Forest
A binary tree forest consists of multiple levels of binary trees. Each level represents a cube dimension.
A binary tree consists of nodes, which are either stems or leaves. Stem nodes point to left and right binary trees. Leaf nodes point to a linked list of fact table IDs.
A linked list of fact table IDs points to fact table entries with identical attribute values.
A depth-first search on a binary tree forest produces the result of a GROUP BY clause.
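The structure described above can be sketched as follows. This is a simplified illustration, not the author's implementation: it uses an unbalanced binary search tree where the slides call for balanced trees, lets every node carry a key rather than separating stems from leaves, and uses Python lists in place of linked lists. The names `Node`, `insert`, `build_forest`, and `group_by` are assumptions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.subtree = None   # root of the next dimension's tree (inner levels)
        self.fact_ids = []    # fact table IDs (last level only)

def insert(root, key):
    """Insert key into an unbalanced binary tree; return (root, node holding key)."""
    if root is None:
        node = Node(key)
        return node, node
    cur = root
    while True:
        if key == cur.key:
            return root, cur
        side = "left" if key < cur.key else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            node = Node(key)
            setattr(cur, side, node)
            return root, node
        cur = nxt

def build_forest(rows):
    """rows[i] is the tuple of dimension IDs for fact i; one tree level per dimension."""
    root = None
    for fact_id, row in enumerate(rows):
        root, node = insert(root, row[0])
        for value in row[1:]:
            node.subtree, node = insert(node.subtree, value)
        node.fact_ids.append(fact_id)
    return root

def group_by(root, prefix=()):
    """In-order depth-first walk; yields (key tuple, fact IDs) like GROUP BY output."""
    if root is None:
        return
    yield from group_by(root.left, prefix)
    key = prefix + (root.key,)
    if root.subtree is not None:
        yield from group_by(root.subtree, key)
    else:
        yield key, root.fact_ids
    yield from group_by(root.right, prefix)

# Fact table from this deck: IDs 0..11, sales $10..$120, keyed by (part, supplier).
sales = [10 * (i + 1) for i in range(12)]
rows = [(p, s) for p in range(3) for s in range(2) for _c in range(2)]
forest = build_forest(rows)
result = [(key, sum(sales[i] for i in ids)) for key, ids in group_by(forest)]
print(result)
```

Because the last level keeps the fact IDs rather than only the sum, each group can be traced back to its individual records.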

19 B-Tree Forest in Reverse: A Primer
[Figure: the sales tables for every hypercube cell, from CustomerPartSupplier ($10 to $120) down to All ($780), shown alongside three trees.]
Supplier Tree: Boeing, Lockheed
Parts Tree: Cockpit, Jet Engine, Wing
Customer Tree: Delta, FedEx

20 Extensive B-Trees Are Common
Suppliers: Boeing, General Dynamics, Lockheed Martin, Honeywell Int'l, Northrop Grumman, United Technologies
Parts: Avionics, Elevator, Jet Engine, Aileron, Flight Controls, Stabilizer, Cockpit, Fin, Fuselage, Rudder, Wing, Landing Gear
Customers: Southwest, DHL, Delta, Virgin, FedEx
But let's keep it simple for now.

21 Incoming Data Stream
[Figure: the twelve CustomerPartSupplier sales records arrive as a data flow over 2 intervals, split into two chunks of six records each ($10 to $60 and $70 to $120, labeled Chunk 1 and Chunk 2), and feed the sales tables for the other hypercube cells.]

22 Setting up Fact & Dimension Tables
[Figure: the incoming CustomerPartSupplier sales records (Chunk 1 and Chunk 2) are converted into the tables below.]

Global String Table (unsorted)
  String      ID
  Boeing      0
  Lockheed    1
  Cockpit     2
  Jet Engine  3
  Wing        4
  Delta       5
  FedEx       6

Supplier Dimension Table (sorted)
  String    String ID  ID
  Boeing    0          0
  Lockheed  1          1

Part Dimension Table (sorted)
  String      String ID  ID
  Cockpit     2          0
  Jet Engine  3          1
  Wing        4          2

Customer Dimension Table (sorted)
  String  String ID  ID
  Delta   5          0
  FedEx   6          1

Fact Table
  ID  Supplier  Part  Customer  Sales
  0   0         0     0         $10
  1   0         0     1         $20
  2   1         0     0         $30
  3   1         0     1         $40
  4   0         1     0         $50
  5   0         1     1         $60
  6   1         1     0         $70
  7   1         1     1         $80
  8   0         2     0         $90
  9   0         2     1         $100
  10  1         2     0         $110
  11  1         2     1         $120
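The tables on this slide amount to dictionary encoding: strings are replaced by small integer IDs before the fact table is built. The sketch below is one way to do it and is not taken from the slides. In particular, interning the strings column by column is an assumption made so that the global IDs come out as shown above, and all variable names are illustrative.

```python
from itertools import product

PARTS = ["Cockpit", "Jet Engine", "Wing"]
SUPPLIERS = ["Boeing", "Lockheed"]
CUSTOMERS = ["Delta", "FedEx"]

# Incoming records as (supplier, part, customer, sales), in slide order.
records = [
    (s, p, c, 10 * (i + 1))
    for i, (p, s, c) in enumerate(product(PARTS, SUPPLIERS, CUSTOMERS))
]

# Global string table: IDs are handed out in order of first appearance (unsorted).
# Assumption: strings are interned one column at a time.
global_table = {}
for col in range(3):
    for rec in records:
        global_table.setdefault(rec[col], len(global_table))

# One dimension table per column: strings sorted, then numbered from 0.
dims = []
for col in range(3):
    values = sorted({rec[col] for rec in records})
    dims.append({value: dim_id for dim_id, value in enumerate(values)})

# Fact table rows: (ID, supplier ID, part ID, customer ID, sales).
fact_table = [
    (fact_id, dims[0][s], dims[1][p], dims[2][c], sales)
    for fact_id, (s, p, c, sales) in enumerate(records)
]

print(global_table)
print(fact_table[6])
```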

23 Let's just say 'Parts' is the most significant data of interest.

Fact Table
  ID  Part  Supplier  Customer  Sales
  0   0     0         0         $10
  1   0     0         1         $20
  2   0     1         0         $30
  3   0     1         1         $40
  4   1     0         0         $50
  5   1     0         1         $60
  6   1     1         0         $70
  7   1     1         1         $80
  8   2     0         0         $90
  9   2     0         1         $100
  10  2     1         0         $110
  11  2     1         1         $120

24 Understanding Nested B-Trees
[Figure: ten successive frames of the same fact table (IDs 0 to 11, with Supplier, Part, Customer, and Sales columns, $10 to $120).]

25 Understanding Nested B-Trees
[Figure: the fact table (IDs 0 to 11, $10 to $120) shown by row and by column, next to the Supplier, Part, and Customer dimension tables and a tree diagram labeled Cockpit, Jet Engine, and Wing, with supplier entries B B B L L L and customer entries D D D D D D F F F F F F.]

26 Making a B-Tree Forest
Drilling down the Hypercube to a Single Data Value
[Figure: the fact table (IDs 0 to 11, $10 to $120) next to a forest of Part (Cockpit, Jet Engine, Wing), Supplier (Boeing, Lockheed), and Customer (Delta, FedEx) trees.]

27 Data Structure & Concept Side by Side
Do you see the Data Value Hypercube to the left?
[Figure: the B-tree forest of Part (Cockpit, Jet Engine, Wing), Supplier (Boeing, Lockheed), and Customer (Delta, FedEx) trees, shown beside the hypercube cells CustomerPartSupplier, CustomerSupplier, PartSupplier, CustomerPart, Supplier, Customer, Part, and None.]

28 Network Data Stream
[Figure: a network data stream table with the columns Protocol, Content, ID, Destination IP, Source IP, and Time Stamp, holding 15 sample records with time stamps from 1166832030 to 1166832050.]
Only showing 2 out of 16 network data stream dimensions.

Global String Table
  String        ID
  SMB           0
  LDAP          1
  SSH           2
  AOL           3
  JPEG          4
  ENGLISH       5
  ZIP           6
  COMPRESS      7
  GIFF          8
  POP           9
  SMTP          10
  IMAP          11
  FTP           12
  TELNET        13
  SKYPE         14
  CMS           15
  FRENCH        16
  RUSSIAN       17
  BMP           18
  BASIC SOURCE  19
  C SOURCE      20
  DISCOVER      21

CONTENT Dimension Table
  String        String Table ID  ID
  BASIC SOURCE  19               0
  BMP           18               1
  C SOURCE      20               2
  CMS           15               3
  COMPRESS      7                4
  DISCOVER      21               5
  ENGLISH       5                6
  FRENCH        16               7
  GIFF          8                8
  JPEG          4                9
  RUSSIAN       17               10
  ZIP           6                11

PROTOCOL Dimension Table
  String  String Table ID  ID
  AOL     3                0
  FTP     12               1
  IMAP    11               2
  LDAP    1                3
  POP     9                4
  SKYPE   14               5
  SMB     0                6
  SMTP    10               7
  SSH     2                8
  TELNET  13               9

29 B-Tree Notation
Example node: FTP B (1,3)
FTP is the attribute name, B is the node, 1 is the level, and 3 is the record number.

30 Network Data Stream: "Protocols" B-Tree
Nodes: POP B (1,9); AOL B (1,7); IMAP B (1,8); SKYPE B (1,4); FTP B (1,3); LDAP B (1,1); TELNET B (1,6); SMTP B (1,5); SSH B (1,2); SMB B (1,0)

31 Notation
Example node: BMP 4 B (7,9) (7,9) (7,9) (7,9)
BMP is the attribute name, 4 is the record count, and each (7,9) pair is a (chunk, record number) reference.
Tree nodes not only contain data aggregates but a linked list of data record indices.
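A node in this notation can be modeled as an attribute name plus a list of (chunk, record number) references, with the record count derived from the list. This is a minimal sketch under stated assumptions: the class name `ValueNode` is hypothetical, and a Python list stands in for the linked list mentioned on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class ValueNode:
    name: str                                     # attribute name, e.g. "BMP"
    records: list = field(default_factory=list)   # (chunk, record number) pairs

    def add(self, chunk, record_number):
        """Append a reference to one data record."""
        self.records.append((chunk, record_number))

    @property
    def count(self):
        """The aggregate (record count) kept alongside the record indices."""
        return len(self.records)

    def __str__(self):
        refs = " ".join(f"({chunk},{rec})" for chunk, rec in self.records)
        return f"{self.name} {self.count} {refs}"

# A node from the "Content" B-trees: BASIC SOURCE spans two chunks.
node = ValueNode("BASIC SOURCE")
node.add(1, 15)
node.add(2, 0)
node.add(2, 1)
print(node)
```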

32 "Content" B-Trees
B (1,8) SSH: ZIP 3 (2,10) (2,11) (2,12); C SOURCE 4 (2,3) (2,4) (2,5) (2,6); BMP 1 (2,2); BASIC SOURCE 3 (1,15) (2,0) (2,1); RUSSIAN 3 (2,7) (2,8) (2,9)
B (1,0) AOL: C SOURCE 1 (1,4); BMP 1 (1,3); BASIC SOURCE 3 (1,0) (1,1) (1,2)
B (1,1) FTP: CMS 1 (1,5)
B (1,2) IMAP: COMPRESS 1 (1,6)
B (1,3) LDAP: DISCOVER 2 (1,7) (1,8)
B (1,4) POP: FRENCH 1 (1,9)
B (1,5) SKYPE: GIFF 1 (1,10)
B (1,6) SMB: JPEG 2 (1,11) (1,12)
B (1,7) AOL: RUSSIAN 1 (1,14)

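33 B-Tree Forest
[Figure: the "Protocols" B-tree with a pointer to the "Content" B-tree B (1,0) AOL. In a pointer such as B (1,0), the first number is the level and the second is the index of the tree at the same level.]
"Protocols" nodes: POP B (1,9); AOL B (1,7); IMAP B (1,8); SKYPE B (1,4); FTP B (1,3); LDAP B (1,1); TELNET B (1,6); SMTP B (1,5); SSH B (1,2); SMB B (1,0)
B (1,0) AOL: C SOURCE 1 (1,4); BMP 1 (1,3); BASIC SOURCE 3 (1,0) (1,1) (1,2)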

34 [Figure: the complete forest, combining the "Protocols" B-tree (POP, AOL, IMAP, SKYPE, FTP, LDAP, TELNET, SMTP, SSH, SMB) with the "Content" B-trees B (1,0) through B (1,8).]

35 Conclusion
B-tree forests are limited to data aggregates. Data aggregates only identify the existence of a dimensional combination. They do not provide access to complete data records.
With current OLAP implementations, examining data records requires issuing additional database queries, which is inefficient.
We solve this problem by extending a balanced B-tree forest to include references to data records. We call this new type of hypercube the data value cube.
Thus for our data cube, tree nodes not only contain data aggregates but a linked list of data record indices.

36 The Q&A
Stephen A. Broeker

