Presented by: Victor Gonzalez-Castro, Lachlan MacKinnon. A survey “Off the Record” – Using Alternative Data Models to Increase Data Density in Data Warehouse Environments.

Presentation transcript:

1 Presented by: Victor Gonzalez-Castro, Lachlan MacKinnon. A survey “Off the Record” – Using Alternative Data Models to Increase Data Density in Data Warehouse Environments.

2 Agenda
- Introduction
- Data Sparsity
- State of the art
- Relational Model
- The Triple Store
- The Binary Model
- The Associative Model
- The Transrelational Model
- Our proposal
- Questions

3 Introduction
In Data Warehouse environments, data sparsity is a common issue that remains unresolved.
Alternative data models that abandon the traditional record storage/manipulation structure have been researched.
We are investigating the use of these alternative data models to increase data density and thereby decrease data sparsity.

4 Origin of Data Sparsity
Data sparsity originates from the aim of answering all possible user queries from the information stored in a Data Warehouse that contains Nulls.
Fig. 1. A three-level dimension and Nulls (Time dimension: Year, Month, Day). After [6].

5 Origin of Data Sparsity (Cont…)
Data sparsity is the result of the Cartesian product of all dimensions and all aggregation levels.
Fig. 2. Data sparsity (sparse) and data density (dense). From [6].
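
To make the Cartesian-product argument concrete, here is a minimal sketch in Python. The dimension sizes and the number of populated cells are made-up illustration values, not figures from the survey, and `density` is a hypothetical helper name.

```python
from math import prod

def density(populated_cells, dimension_sizes):
    """Fraction of the cube's cells that actually hold a fact."""
    total_cells = prod(dimension_sizes)  # Cartesian product of all dimensions
    return populated_cells / total_cells

# Hypothetical cube: 365 days x 1,000 products x 100 stores (assumed sizes).
sizes = [365, 1000, 100]
facts = 730_000  # assumed number of non-null cells
d = density(facts, sizes)
print(f"density = {d:.2%}, sparsity = {1 - d:.2%}")
```

With these assumed numbers only 2% of the 36.5 million possible cells are populated; adding aggregation levels to a dimension enlarges the product further while the number of base facts stays the same.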

6 State of the art. (Relational)
The Relational Model [7] uses the traditional record storage/manipulation structure.
Example record: 1234 | Nut | Red | London
It is the base model against which the other models will be compared.
All RDBMSs manage sparsity (missing information) poorly.
Codd [7] suggested a fundamental change in the Relational Model Version 2: the use of a four-valued logic. No one has implemented this fundamental change.

7 State of the art. (Relational)
Major players in the relational market: … / SQL Server

8 State of the art. (TripleStore)
The Triple Store [1],[2] uses a structure called the Name Store to keep all the names.

Identifier  Name
1           Nut
2           Red
3           London
…           …

To construct the processing structure, it uses Triples:

…  …  …
l  m  n
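
The relationship between the Name Store and the triples can be sketched as follows. This is an illustrative Python sketch, not the TriStarp implementation: the class `TripleStore`, its methods `intern`, `add` and `match`, and the predicate names "has colour" and "is located in" are all assumptions introduced here.

```python
class TripleStore:
    def __init__(self):
        self.names = {}        # Name Store: identifier -> name
        self.ids = {}          # reverse index: name -> identifier
        self.triples = set()   # each triple holds three identifiers

    def intern(self, name):
        """Return the identifier for a name, registering it on first use."""
        if name not in self.ids:
            identifier = len(self.names) + 1
            self.names[identifier] = name
            self.ids[name] = identifier
        return self.ids[name]

    def add(self, subject, predicate, obj):
        self.triples.add(
            (self.intern(subject), self.intern(predicate), self.intern(obj))
        )

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples as names; None acts as a wildcard."""
        wanted = []
        for name in (subject, predicate, obj):
            if name is None:
                wanted.append(None)
            elif name in self.ids:
                wanted.append(self.ids[name])
            else:
                return []  # a name that was never stored cannot match
        hits = [t for t in self.triples
                if all(w is None or w == v for w, v in zip(wanted, t))]
        return sorted(tuple(self.names[i] for i in t) for t in hits)

store = TripleStore()
for name in ("Nut", "Red", "London"):   # identifiers 1, 2, 3 as on the slide
    store.intern(name)
store.add("Nut", "has colour", "Red")          # hypothetical predicate
store.add("Nut", "is located in", "London")    # hypothetical predicate
print(store.match(subject="Nut"))
```

Each name is stored once in the Name Store, and the triples carry only identifiers, so a repeated value costs one identifier per occurrence rather than a full copy.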

9 State of the art. (TripleStore)
The major project in Triple Store is TriStarp.
TriStarp was established in … and is led by Peter King, with support from IBM Hursley labs.
Dr. Sharman from IBM Hursley [1] is visiting the TriStarp team.
Current directions:
- Further development of the persistent Triple Store repository.
- Continuing research on the graph-based model.
- Extending the technology to manage partially structured data.

10 State of the art. (Binary)
The Binary Model [4] considers that all tables are binary tables.

Sur  Pname  Color  City
s1   Nut    Red    London
s2   Bolt   Green  Paris
s3   Screw  Blue   Oslo

The table above is stored as three binary tables:

Sur  Pname      Sur  Color      Sur  City
s1   Nut        s1   Red        s1   London
s2   Bolt       s2   Green      s2   Paris
s3   Screw      s3   Blue       s3   Oslo
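
The decomposition into binary tables, and the joins needed to get the records back, can be sketched in Python. The function names `decompose` and `recompose` are hypothetical, and the sketch holds everything in memory; it only illustrates the idea and says nothing about how a real binary-model DBMS stores its columns.

```python
parts = [
    {"Sur": "s1", "Pname": "Nut",   "Color": "Red",   "City": "London"},
    {"Sur": "s2", "Pname": "Bolt",  "Color": "Green", "City": "Paris"},
    {"Sur": "s3", "Pname": "Screw", "Color": "Blue",  "City": "Oslo"},
]

def decompose(rows, key):
    """Split n-ary rows into one (surrogate, value) table per attribute."""
    tables = {}
    for row in rows:
        for attr, value in row.items():
            if attr != key:
                tables.setdefault(attr, []).append((row[key], value))
    return tables

def recompose(tables, key):
    """Join the binary tables back together on the surrogate."""
    merged = {}
    for attr, pairs in tables.items():
        for surrogate, value in pairs:
            merged.setdefault(surrogate, {key: surrogate})[attr] = value
    return list(merged.values())

binary_tables = decompose(parts, "Sur")
print(binary_tables["City"])
print(recompose(binary_tables, "Sur") == parts)
```

A query that touches only City reads one binary table instead of every full record, and a missing value simply has no pair in its table.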

11 State of the art. (Binary)
A major project in the Binary Model [4] is MonetDB.
It is a DBMS designed to provide high performance on complex queries against real-world-sized databases.
It achieves this goal using innovations at all layers of a DBMS:
- a storage model based on vertical fragmentation,
- processing speed through self-tuning relational operators,
- algorithms designed to exploit modern hardware,
- self-managing indexing structures,
- a modular and extensible software architecture, etc.
It is developed at the Institute for Mathematics and Computer Science Research of The Netherlands.

12 State of the art. (Associative)
The Associative Model [3] comprises two types of data structures: Items and Links.

Identifier  Name
77          Nut
08          Red
32          London
12          That is
67          Is located in

It differs from the Binary Model and the Triple Store in one fundamental way: associations themselves may be either the source or the target of other associations.
It uses quadruplets:

Identifier  Source  Verb  Target
…           …       …     …
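
A small Python sketch shows what it means for an association to be the source of another association. The item identifiers come from the slide; the link identifiers "L1" and "L2", the two links themselves, and the helper `render` are assumptions made for illustration, since the slide's link rows did not survive.

```python
# Items: identifier -> name (taken from the slide).
items = {
    "77": "Nut",
    "08": "Red",
    "32": "London",
    "12": "That is",
    "67": "Is located in",
}

# Links: identifier -> (source, verb, target). Hypothetical rows.
links = {
    "L1": ("77", "12", "08"),   # Nut -- That is --> Red
    "L2": ("L1", "67", "32"),   # (link L1) -- Is located in --> London
}

def render(identifier):
    """Spell out an item or, recursively, a link."""
    if identifier in items:
        return items[identifier]
    source, verb, target = links[identifier]
    return f"({render(source)} {items[verb]} {render(target)})"

print(render("L2"))
```

Link L2 uses link L1 as its source, which a plain triple store or binary table cannot express without introducing an extra identifier for the first association.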

13 State of the art. (Associative)
The major product in the Associative Model is SentencesDB.
Instead of using a separate, unique table for every different type of data, it uses a single, generic structure to contain all types of data.
Information about the logical structure of the data and the rules that govern it is stored alongside the data in the database.
Programs are therefore reusable and no longer need to be amended when the data structures change.

14 State of the art. (Transrelational)
The TransRelational Model™ [5] keeps the Relational Model itself but abandons the record storage structure.
It uses two structures:
- The Record Reconstruction Table.
- The Field Values Table.
Since there is currently no instantiation of the Transrelational Model available, we will build an implementation of the essential algorithms.

P#  PNAME  COLOR  CITY
P1  Bolt   Blue   London
P2  Cam    Blue   London
P3  Cog    Green  London
P4  Nut    Red    Oslo
P5  Screw  Red    Paris
P6  Screw  Red    Paris

P#  PNAME  COLOR  CITY
…   …      …      …

15 Transrelational. Algorithms

1. A file for the relation:

P#  PNAME  COLOR  CITY
P1  Nut    Red    London
P2  Bolt   Green  Paris
P3  Screw  Blue   Oslo
P4  Screw  Red    London
P5  Cam    Blue   Paris
P6  Cog    Red    London

2. Sort each column in ascending order. The result is the Field Values Table (FVT):

P#  PNAME  COLOR  CITY
P1  Bolt   Blue   London
P2  Cam    Blue   London
P3  Cog    Green  London
P4  Nut    Red    Oslo
P5  Screw  Red    Paris
P6  Screw  Red    Paris

Record Reconstruction Table (RRT):

P#  PNAME  COLOR  CITY
…   …      …      …

Reconstructing a record (cells are written [row, column]):
1. Go to cell [1,1] of the FVT and fetch the value stored (P1).
2. Go to the same cell [1,1] in the RRT and fetch the value (4). It means that the next field value (PNAME) is in the 4th row of the FVT. Go to that cell and fetch the value (Nut).
3. Go to the corresponding RRT cell [4,2] and fetch the row number (4). The next field value (3rd column, COLOR) is in the 4th row of the FVT (Red).
4. Go to the corresponding RRT cell [4,3] and fetch the value (1). The next field value (4th column, CITY) is in the 1st row of the FVT (London).
5. Go to the corresponding RRT cell [1,4] and fetch the value (1). A 5th column does not exist, so the walk wraps around to the 1st column, and the value points to the 1st row of the FVT, where the walk started.

Reconstructed record:

P#  PNAME  COLOR  CITY
P1  Nut    Red    London
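
The construction of the two tables and the record walk can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' planned implementation: `build_fvt_rrt` and `reconstruct` are hypothetical names, indices are 0-based (the slide counts from 1), duplicate values are ordered by a stable sort on the original record order, and the FVT columns are not condensed to remove duplicates. With that tie-breaking choice, the RRT agrees with the cell values quoted in the steps on this slide.

```python
relation = [
    ["P1", "Nut",   "Red",   "London"],
    ["P2", "Bolt",  "Green", "Paris"],
    ["P3", "Screw", "Blue",  "Oslo"],
    ["P4", "Screw", "Red",   "London"],
    ["P5", "Cam",   "Blue",  "Paris"],
    ["P6", "Cog",   "Red",   "London"],
]

def build_fvt_rrt(rows):
    n_rows, n_cols = len(rows), len(rows[0])
    # perms[c][i]: which original record supplies sorted row i of column c.
    perms = [sorted(range(n_rows), key=lambda r, c=c: rows[r][c])
             for c in range(n_cols)]
    # pos[c][r]: sorted row in column c where record r's value ended up.
    pos = []
    for perm in perms:
        where = [0] * n_rows
        for sorted_row, record in enumerate(perm):
            where[record] = sorted_row
        pos.append(where)
    fvt = [[rows[perms[c][i]][c] for c in range(n_cols)]
           for i in range(n_rows)]
    # RRT cell [i, c] holds the FVT row of the same record in the next
    # column, wrapping from the last column back to the first.
    rrt = [[pos[(c + 1) % n_cols][perms[c][i]] for c in range(n_cols)]
           for i in range(n_rows)]
    return fvt, rrt

def reconstruct(fvt, rrt, start_row):
    """Walk one record, starting from a row of the first FVT column."""
    record, row = [], start_row
    for c in range(len(fvt[0])):
        record.append(fvt[row][c])
        row = rrt[row][c]
    return record

fvt, rrt = build_fvt_rrt(relation)
print(reconstruct(fvt, rrt, 0))
```

Starting the walk from each row of the first column in turn recovers every record of the relation, so the original file no longer needs to be kept.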

16 Alternative Data Models Comparison

Model            Storage Structure   Linkage Structure
Relational       Table (Relation)    By position
Triple Store     Name Store          Triple Store
Binary           Binary Table        Joins
Associative      Items               Links
Transrelational  Field Values Table  Record Reconstruction Table

17 Our proposal (Our aims)
- To carry out an impartial survey of alternative data models.
- To determine whether the use of alternative data models can improve data density in Data Warehouse environments.
- To observe the effect that such an increase in data density has on data sparsity.

18 Our proposal (How…)
- We intend to use an implementation of each data model, TransRelational™ included.
- We will use the TPC-H data set to load each database.
- We will run a set of benchmark metrics where available; where none exist, we will develop our own metrics to determine relative performance, and then consider relative data density and sparsity.

19 Just Remember…
Instead of storing data horizontally, do it vertically and eliminate duplicate values.

Bolt, Screw, Nut, Nail
Black, Blue, White
Paris, London

Here are the savings.
We are abandoning the traditional record structure; we are going “off the record”.

20 Questions?

21 Thanks !!

22 References
1. G C H Sharman and N Winterbottom. The Universal Triple Machine: a Reduced Instruction Set Repository Manager. Proceedings of BNCOD 6, pp. …
2. TriStarp Web Site: … Updated November, …
3. Simon Williams. The Associative Model of Data, Second Edition. Lazy Software Ltd. ISBN: …
4. MonetDB. © … by CWI. http://monetdb.cwi.nl
5. Date, C.J. An Introduction to Database Systems, Eighth Edition. Appendix A: The Transrelational Model. Addison Wesley USA. ISBN: …
6. Pendse, Nigel. Database explosion. Updated Aug, …
7. Codd, E.F. The Relational Model for Database Management Version 2. Addison-Wesley. ISBN …

23 Just Remember…
Instead of storing data horizontally, do it vertically and eliminate duplicate values.

P1  Bolt   Blue   London
P2  Cam    Blue   London
P3  Cog    Green  London
P4  Nut    Red    Oslo
P5  Screw  Red    Paris
P6  Screw  Red    Paris

24 Just Remember…
Instead of storing data horizontally, do it vertically and eliminate duplicate values.

P1, P2, P3, P4, P5, P6
Bolt, Cam, Cog, Nut, Screw
Blue, Green, Red
London, Oslo, Paris

Here are the savings.
We are abandoning the traditional record structure; we are going “off the record”.
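
The savings can be counted with a few lines of Python, using the six records from slide 23. This is only a rough sketch: it counts stored field values and deliberately ignores the space taken by whatever linkage structure a given model needs to rebuild the records, which is an assumption that flatters the vertical layout.

```python
rows = [
    ("P1", "Bolt",  "Blue",  "London"),
    ("P2", "Cam",   "Blue",  "London"),
    ("P3", "Cog",   "Green", "London"),
    ("P4", "Nut",   "Red",   "Oslo"),
    ("P5", "Screw", "Red",   "Paris"),
    ("P6", "Screw", "Red",   "Paris"),
]

columns = list(zip(*rows))                         # turn records into columns
horizontal = sum(len(col) for col in columns)      # every field of every record
vertical = sum(len(set(col)) for col in columns)   # distinct values per column
print(horizontal, vertical, horizontal - vertical)
```

For this relation the horizontal layout stores 24 field values while the vertical layout with duplicates removed stores 17, the same values listed on slide 24.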