RSCTC 2008 Rough Sets in Data Warehousing Infobright Community Edition (ICE)

2 Data Warehousing


5 Technology Layout

6 Two-Level Computing Large Data (10TB) and Mixed Workloads

7 Rough Sets Classes of records with the same values on a subset of the attributes, illustrated with the condition Sport? = Yes
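The idea of grouping records by their values on a subset of attributes can be sketched in a few lines. This is a minimal illustration, not code from the talk: the attribute names `Outlook` and `Wind`, the five sample records, and the helper names are all assumptions made up for the example. It builds the classes of indiscernible records and then the standard rough-set lower and upper approximations of the set of records where `Sport` is `Yes`.

```python
from collections import defaultdict

def indiscernibility_classes(records, attributes):
    """Group record indices that share the same values on `attributes`."""
    classes = defaultdict(set)
    for idx, rec in enumerate(records):
        classes[tuple(rec[a] for a in attributes)].add(idx)
    return list(classes.values())

def approximations(records, attributes, target):
    """Return (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(records, attributes):
        if cls <= target:   # class lies fully inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper

# Hypothetical data; only the "Sport? = Yes" condition comes from the slide.
records = [
    {"Outlook": "Sunny", "Wind": "Weak", "Sport": "Yes"},
    {"Outlook": "Sunny", "Wind": "Weak", "Sport": "No"},
    {"Outlook": "Rain", "Wind": "Weak", "Sport": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Sport": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Sport": "Yes"},
]
target = {i for i, r in enumerate(records) if r["Sport"] == "Yes"}
lower, upper = approximations(records, ["Outlook", "Wind"], target)
boundary = upper - lower
```

Records 0 and 1 agree on both attributes but differ on `Sport`, so their class ends up in the boundary: it overlaps the target set without being contained in it.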

8 Information Systems Data-based knowledge models, classifiers... Database indices, data partitioning, data sorting... Difficulty with fast updates of structures...

9 Rough Sets in Infobright SELECT COUNT(*) FROM Employees WHERE Salary > $ Packs store the values of records for the column Salary. We can imagine the set of all records relevant to the given query, that is, the records satisfying its SQL filter Salary > $. Using the Knowledge Grid, we verify which packs are irrelevant (disjoint with the set), relevant (fully inside the set), and suspect (overlapping). We do not need irrelevant packs. We do not need to decompress relevant ones: we store their local COUNT(*) in the corresponding Data Pack Nodes.
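The pack classification described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the `DataPackNode` layout (minimum, maximum, non-null count), the function names, the sample salaries, and the threshold of 50000 are all invented here, since the amount in the slide's query did not survive in the transcript. A plain Python list stands in for a compressed pack, and reading it stands in for decompression.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DataPackNode:
    """Per-pack statistics kept outside the pack itself (hypothetical layout)."""
    min_value: int
    max_value: int
    non_null_count: int

def build_node(values: List[Optional[int]]) -> DataPackNode:
    # Assumes the pack holds at least one non-null value.
    present = [v for v in values if v is not None]
    return DataPackNode(min(present), max(present), len(present))

def classify(node: DataPackNode, threshold: int) -> str:
    """Classify a pack against the filter `value > threshold`."""
    if node.max_value <= threshold:
        return "irrelevant"   # no row can match
    if node.min_value > threshold:
        return "relevant"     # every non-null row matches
    return "suspect"          # some rows may match

def rough_count(packs, nodes, threshold) -> Tuple[int, int]:
    """Return (COUNT(*) result, number of packs that had to be read)."""
    total, opened = 0, 0
    for values, node in zip(packs, nodes):
        kind = classify(node, threshold)
        if kind == "relevant":
            total += node.non_null_count   # answered from the node alone
        elif kind == "suspect":
            opened += 1                    # stands in for decompression
            total += sum(1 for v in values if v is not None and v > threshold)
    return total, opened

# Hypothetical Salary packs and threshold.
packs = [[30000, 42000, 45000], [48000, 55000, 61000], [70000, None, 90000]]
nodes = [build_node(p) for p in packs]
result = rough_count(packs, nodes, 50000)
```

In this sample only the middle pack is suspect, so one pack out of three is read; the first is skipped and the last is counted from its node.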

10 Information Systems in Infobright [Figure labels: Query, min, OUT, max, Nulls, sum, match, ???, pattern]

11 SELECT MAX(A) FROM T WHERE B>15; [Figure panels: STEP 1, STEP 2, STEP 3, DATA]
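The transcript keeps only the step labels for this query, so the following is one plausible reading of the three steps rather than the presenters' exact procedure. It assumes each pack of rows has stored statistics for the maximum of A and the minimum and maximum of B; the `PackStats` type, the function names, and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (A, B)

@dataclass
class PackStats:
    max_a: int
    min_b: int
    max_b: int

def stats(pack: List[Row]) -> PackStats:
    return PackStats(max(a for a, _ in pack),
                     min(b for _, b in pack),
                     max(b for _, b in pack))

def rough_max_a(packs: List[List[Row]], b_threshold: int) -> Tuple[Optional[int], List[int]]:
    """Evaluate MAX(A) WHERE B > b_threshold; return (answer, packs read)."""
    nodes = [stats(p) for p in packs]
    # Step 1 (assumed): classify packs on B using only min/max.
    relevant = [i for i, n in enumerate(nodes) if n.min_b > b_threshold]
    suspect = [i for i, n in enumerate(nodes) if n.min_b <= b_threshold < n.max_b]
    # Step 2 (assumed): relevant packs give a lower bound on the answer
    # straight from their stored max of A, without being read.
    best = max((nodes[i].max_a for i in relevant), default=None)
    # Step 3 (assumed): read suspect packs only while they could beat the bound.
    opened = []
    for i in sorted(suspect, key=lambda k: nodes[k].max_a, reverse=True):
        if best is not None and nodes[i].max_a <= best:
            break  # remaining suspects have an even smaller max of A
        opened.append(i)
        hits = [a for a, b in packs[i] if b > b_threshold]
        if hits:
            best = max(hits) if best is None else max(best, max(hits))
    return best, opened

# Hypothetical packs of (A, B) rows.
packs = [
    [(5, 3), (9, 10), (7, 14)],      # B <= 15 everywhere: irrelevant
    [(12, 20), (18, 30), (11, 16)],  # B > 15 everywhere: relevant
    [(17, 10), (14, 22), (16, 40)],  # suspect, but max A cannot beat 18
    [(25, 12), (21, 19), (8, 50)],   # suspect and must be read
]
answer, opened = rough_max_a(packs, 15)
```

With this data only the last pack is read: the relevant pack already guarantees a result of at least 18, which rules out the third pack before any of its rows are touched.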

12 Advanced Knowledge Nodes
Order Detail Table (assume many more rows). Columns: Order Number, Order Date, Part ID, Quantity, $Amt
Supplier/Part Table (assume many more rows). Columns: Supplier ID, Effective Date, Expiry Date, Part ID, Description
Supplier/Part rows as they survive in the transcript (Supplier ID, Expiry Date, Part ID, Description):
A, Null, 234, Pre-measured coffee packets – gold blend
A, Null, 235, Pre-measured coffee packets – silver blend
A, Null, 334, 4-cup Cone coffee filters; quantity 50
Pack-to-pack grid:
        Pack 1  Pack 2
Pack 1  0       1
Pack 2  1       0
Pack 3  0       0
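The 0/1 grid of packs on this slide can be read as a pack-to-pack structure over the join column Part ID. The sketch below assumes that a 1 means the two packs share at least one Part ID value, so only those pack pairs need to be examined when joining the two tables; that reading, the function names, the choice of which table supplies the rows, and the sample Part ID values are assumptions for illustration.

```python
from typing import List, Tuple

def pack_to_pack(left_packs: List[List[int]], right_packs: List[List[int]]) -> List[List[int]]:
    """grid[i][j] is 1 when left pack i and right pack j share a join value."""
    right_sets = [set(p) for p in right_packs]
    return [[1 if set(left) & right else 0 for right in right_sets]
            for left in left_packs]

def join_candidates(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Pack pairs that still have to be examined by the join."""
    return [(i, j) for i, row in enumerate(grid)
            for j, flag in enumerate(row) if flag]

# Hypothetical Part ID packs: three for Order Detail, two for Supplier/Part.
order_part_ids = [[334, 335], [234, 235], [500, 501]]
supplier_part_ids = [[234, 235], [334]]

grid = pack_to_pack(order_part_ids, supplier_part_ids)
pairs = join_candidates(grid)
```

The sample values were chosen so that the computed grid has the same shape and entries as the one on the slide: two of the six pack pairs remain as join candidates, and the third Order Detail pack can be skipped entirely.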

13 Community Inspirations
• Count Distinct
• Count(*) on Self-Joins
• Decision Trees
• Contingencies
• New Objectives
• New Schemas
• New Volumes
• New Queries
• New KNs
• New Data Types
• SQL Extensions
• Feature Extraction
• Data Compression

14 Conclusion
• Technology based on the interaction between rough and precise operations, open to adding new structures
• Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all inclusive" compression
• The core technology is based on more data mining, rough sets, computing with rough values, et cetera
• Infobright Community Edition (ICE) is available for free use and study, and is open to contributions

THANK YOU!!! RSCTC 2008