RSCTC 2008 Rough Sets in Data Warehousing Infobright Community Edition (ICE)
2 Data Warehousing
5 Technology Layout
6 Two-Level Computing Large Data (10TB) and Mixed Workloads
7 Rough Sets: Sport? = Yes. Classes of records with the same values of a subset of the attributes.
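The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the toy records, the attribute names Outlook and Wind, and the helper names are all assumptions. It groups records into classes with identical values on a chosen attribute subset, then computes the standard rough-set lower and upper approximations of the concept Sport = Yes.

```python
from collections import defaultdict


def indiscernibility_classes(records, attributes):
    """Group record ids that share the same values on `attributes`."""
    classes = defaultdict(set)
    for rid, rec in records.items():
        key = tuple(rec[a] for a in attributes)
        classes[key].add(rid)
    return list(classes.values())


def approximations(records, attributes, target):
    """Return (lower, upper) approximations of the id set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(records, attributes):
        if cls <= target:   # class lies fully inside the concept
            lower |= cls
        if cls & target:    # class overlaps the concept
            upper |= cls
    return lower, upper


# Hypothetical toy data, invented for illustration only.
records = {
    1: {"Outlook": "Sunny", "Wind": "Weak", "Sport": "Yes"},
    2: {"Outlook": "Sunny", "Wind": "Weak", "Sport": "No"},
    3: {"Outlook": "Rain", "Wind": "Weak", "Sport": "Yes"},
    4: {"Outlook": "Rain", "Wind": "Strong", "Sport": "No"},
}
target = {rid for rid, rec in records.items() if rec["Sport"] == "Yes"}
lower, upper = approximations(records, ["Outlook", "Wind"], target)
# lower == {3}: records that certainly belong to Sport = Yes
# upper == {1, 2, 3}: records that possibly belong to Sport = Yes
```

Records 1 and 2 are indiscernible on the chosen attributes but differ on Sport, so their class falls only into the upper approximation.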
8 Information Systems Data-based knowledge models, classifiers... Database indices, data partitioning, data sorting... Difficulty with fast updates of structures...
9 Rough Sets in Infobright. Packs store the values of records for column Salary. We can imagine the set of all records relevant to the given query, that is, satisfying its SQL filter: SELECT COUNT(*) FROM Employees WHERE Salary > $ . Using the Knowledge Grid, we verify which packs are irrelevant (disjoint with the set), relevant (fully inside the set), and suspect (overlapping). We do not need irrelevant packs. We do not need to decompress relevant ones: we store their local COUNT(*) in the corresponding Data Pack Nodes.
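The three-way pack classification described here can be sketched as follows. This is an illustrative toy, not Infobright's implementation: the PackNode class, the function names, the salary values, and the threshold 50000 are assumptions (the threshold on the slide did not survive), and NULLs are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class PackNode:
    """Hypothetical per-pack statistics kept alongside the compressed data."""
    min_value: float
    max_value: float
    row_count: int


def build_node(values):
    return PackNode(min(values), max(values), len(values))


def classify(node, threshold):
    """Classify a pack against the filter `value > threshold`."""
    if node.max_value <= threshold:
        return "irrelevant"   # no row can satisfy the filter
    if node.min_value > threshold:
        return "relevant"     # every row satisfies the filter
    return "suspect"          # some rows may satisfy it


def count_greater_than(packs, nodes, threshold):
    """COUNT(*) WHERE value > threshold, opening only suspect packs."""
    total, opened = 0, 0
    for values, node in zip(packs, nodes):
        kind = classify(node, threshold)
        if kind == "relevant":
            total += node.row_count          # answered from the node alone
        elif kind == "suspect":
            opened += 1                      # stands in for decompression
            total += sum(1 for v in values if v > threshold)
    return total, opened


# Hypothetical Salary packs (real packs hold far more rows).
packs = [
    [30000, 42000, 45000],
    [48000, 55000, 61000],
    [70000, 82000, 90000],
]
nodes = [build_node(p) for p in packs]
total, opened = count_greater_than(packs, nodes, 50000)
# total == 5, opened == 1: only the middle pack had to be read
```

In this toy run the first pack is skipped, the last is answered from its stored row count, and only the middle pack is scanned.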
10 Information Systems in Infobright [Figure: a query matched against pack-level statistics; labels: min, max, Nulls, sum, pattern, OUT, match, ???]
11 SELECT MAX(A) FROM T WHERE B>15; [Figure: evaluation shown in stages: STEP 1, STEP 2, STEP 3, DATA]
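The contents of the three steps did not survive in this transcript, so the following is only one plausible reading, sketched under assumptions: step 1 classifies packs by the filter on B, step 2 uses the stored maxima of A to rule out packs that cannot improve the answer, and step 3 opens the remaining suspect packs. The function name, the data layout, and the toy values are hypothetical, and NULLs are ignored.

```python
def rough_max(packs, threshold=15):
    """Evaluate MAX(A) WHERE B > threshold over a list of toy packs.

    Each pack is a dict with equally long lists under "A" and "B".
    Returns (answer, indices of packs that had to be opened).
    """
    stats = [
        {"a_max": max(p["A"]), "b_min": min(p["B"]), "b_max": max(p["B"])}
        for p in packs
    ]

    # Step 1: classify each pack against the filter B > threshold.
    status = []
    for s in stats:
        if s["b_max"] <= threshold:
            status.append("irrelevant")
        elif s["b_min"] > threshold:
            status.append("relevant")
        else:
            status.append("suspect")

    # Step 2: every row of a relevant pack passes the filter, so its
    # stored maximum of A is a guaranteed lower bound on the answer.
    best = max(
        (s["a_max"] for s, st in zip(stats, status) if st == "relevant"),
        default=None,
    )

    # Step 3: open suspect packs, most promising first, and stop as
    # soon as no remaining pack can beat the current best value.
    suspects = [i for i, st in enumerate(status) if st == "suspect"]
    suspects.sort(key=lambda i: stats[i]["a_max"], reverse=True)
    opened = []
    for i in suspects:
        if best is not None and stats[i]["a_max"] <= best:
            break
        opened.append(i)
        for a, b in zip(packs[i]["A"], packs[i]["B"]):
            if b > threshold and (best is None or a > best):
                best = a
    return best, opened


# Hypothetical data for table T, split into four small packs.
packs = [
    {"A": [1, 5, 9], "B": [1, 7, 12]},      # irrelevant: all B <= 15
    {"A": [3, 18, 11], "B": [16, 20, 30]},  # relevant: all B > 15
    {"A": [10, 25, 7], "B": [10, 14, 22]},  # suspect, max A = 25
    {"A": [2, 15, 12], "B": [5, 18, 40]},   # suspect, max A = 15
]
answer, opened = rough_max(packs)
# answer == 18, opened == [2]
```

Here the last pack is never opened: its largest A value (15) cannot exceed the bound of 18 already guaranteed by the relevant pack.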
12 Advanced Knowledge Nodes
Order Detail Table (assume many more rows). Columns: Order Number, Order Date, Part ID, Quantity, $ Amt
Supplier/Part Table (assume many more rows). Columns: Supplier ID, Effective Date, Expiry Date, Part ID, Description
A, Null, 234, Pre-measured coffee packets – gold blend
A, Null, 235, Pre-measured coffee packets – silver blend
A, Null, 334, 4-cup Cone coffee filters; quantity 50
Packs: Pack 1, Pack 2, Pack 101, Pack 210, Pack 300
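One way to picture a knowledge node that spans two tables is a binary matrix over pairs of packs, marking which pairs can contain matching Part ID values. The sketch below is an assumption-laden illustration rather than the structure shown on the slide: the function names and the toy Part ID packs are hypothetical, and the matrix is built by exact set intersection for simplicity.

```python
def build_pack_to_pack(left_packs, right_packs):
    """matrix[i][j] is True when left pack i and right pack j share a key."""
    right_sets = [set(p) for p in right_packs]
    return [[bool(set(left) & right) for right in right_sets]
            for left in left_packs]


def candidate_pairs(matrix):
    """Pack pairs a join on the key column would still have to examine."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, hit in enumerate(row)
            if hit]


# Hypothetical Part ID packs from the two tables.
order_detail_packs = [[234, 235, 234], [334, 334, 235], [500, 501]]
supplier_part_packs = [[234, 235], [334, 400]]

matrix = build_pack_to_pack(order_detail_packs, supplier_part_packs)
pairs = candidate_pairs(matrix)
# pairs == [(0, 0), (1, 0), (1, 1)]: 3 of the 6 pack pairs remain
```

A join on Part ID would then only need to read the pack pairs marked in the matrix and could skip the rest.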
13 Community Inspirations: Count Distinct; Count(*) on Self-Joins; Decision Trees; Contingencies; New Objectives; New Schemas; New Volumes; New Queries; New KNs; New Data Types; SQL Extensions; Feature Extraction; Data Compression
14 Conclusion. Technology based on interaction between rough and precise operations, open for adding new structures. Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all inclusive" compression. Core technology based on more data mining, rough sets, computing with rough values, et cetera. Infobright Community Edition (ICE) is ready for free use and study, and open to contributions.
15 References
D. Ślęzak, J. Wróblewski, V. Eastwood, P. Synak: Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries. PVLDB 1(2): (2008).
M. Wojnarski, C. Apanowicz, V. Eastwood, D. Ślęzak, P. Synak, A. Wojna, J. Wróblewski: Method and System for Data Compression in a Relational Database. US Patent Application, 2008/ A1.
J. Wróblewski, C. Apanowicz, V. Eastwood, D. Ślęzak, P. Synak, A. Wojna, M. Wojnarski: Method and System for Storing, Organizing and Processing Data in a Relational Database. US Patent Application, 2008/ A1.