Horizontal data sets: Number of attributes is of the same order to several orders of magnitude higher than the number of records. Example: genetic data.

Presentation transcript:

1 Horizontal data sets: The number of attributes ranges from the same order as the number of records to several orders of magnitude higher. Example: genetic data sets can have 10,000 attributes and 100 records. With 10,000 attributes there are up to 100 million combinations of two attributes and up to 1 trillion three-attribute sets!
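The "up to" figures on this slide correspond to the bounds 10,000^2 and 10,000^3. As a rough check, the short sketch below counts the unordered attribute sets exactly with Python's standard `math.comb`; the variable names are illustrative only and do not come from the presentation.

```python
import math

# Number of attributes in the slide's genetic-data example.
n_attributes = 10_000

# Exact counts of unordered 2-attribute and 3-attribute sets.
pairs = math.comb(n_attributes, 2)
triples = math.comb(n_attributes, 3)

print(f"{pairs:,}")    # 49,995,000
print(f"{triples:,}")  # 166,616,670,000

# Both stay below the slide's upper bounds of 10^8 and 10^12.
assert pairs <= n_attributes ** 2
assert triples <= n_attributes ** 3
```

Either way, the candidate space is far too large to enumerate attribute set by attribute set when only about 100 records are available.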

2 Data Driven Algorithm. Constructing the max-conf kernel for small data sets. Input: i) a database DB; ii) a fixed consequent C. Output: a set R of rules such that for any rule of the form X->C there exists a rule X'->C in R, where X' is a superset of X and X'->C has a higher confidence than X->C.

3 Algorithm:
// DB(C) is the set of records that satisfy the consequent C.
// RS is a working set that maintains the current subset of records satisfying the consequent.
// COMMON is the set of descriptors common to all records in RS.
MaxConfKernelSet(DB, C, DB(C), RS, COMMON) {
    i = size(RS) + 1;
    if (i == 1) { COMMON = descriptors in the ith record in DB(C); }
    RS = RS \union {ith record in DB(C)};
    while (i <= size(DB(C))) do {
        Delete from COMMON the descriptors not shared by the ith record;
        Compute the support of the records satisfying {COMMON-C};
        Compute the confidence of COMMON-C->C;
        if ((COMMON-C) != null) {
            if sufficient support and not duplicate
                output "COMMON-C->C [support, conf]";
            MaxConfKernelSet(DB, C, DB(C), RS, COMMON);
        }
        RS = RS - {ith record in DB(C)};
        i++;
        RS = RS \union {ith record in DB(C)};
    }
}
Invoke: MaxConfKernelSet(DB, C, DB(C), null, null); // RS and COMMON are empty initially
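The idea behind the pseudocode above (intersect the descriptors of growing subsets of DB(C), then score each intersection as an antecedent) can be sketched in Python. This is a minimal illustration and not the presenter's implementation. It assumes each record is a `frozenset` of descriptor strings, it takes "support" to be the number of records satisfying the antecedent (as the slide's wording suggests), and it replaces the RS bookkeeping with a start index passed through the recursion. The names `max_conf_kernel` and `min_support` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Record = FrozenSet[str]


def max_conf_kernel(
    db: List[Record], consequent: str, min_support: int = 1
) -> Dict[Record, Tuple[int, float]]:
    """Map each antecedent (descriptors shared by some subset of the
    records satisfying the consequent) to (support, confidence)."""
    db_c = [r for r in db if consequent in r]  # DB(C) in the slide
    rules: Dict[Record, Tuple[int, float]] = {}
    seen: Set[Record] = set()  # antecedents already scored (duplicate check)

    def extend(start: int, common: Optional[Record]) -> None:
        for i in range(start, len(db_c)):
            candidate = db_c[i] - {consequent}
            # Keep only the descriptors shared with the ith record.
            shared = candidate if common is None else common & candidate
            if not shared:
                continue  # empty antecedent: adding records cannot help
            if shared not in seen:
                seen.add(shared)
                covered = [r for r in db if shared <= r]
                support = len(covered)  # at least 1: db_c[i] is covered
                confidence = sum(consequent in r for r in covered) / support
                if support >= min_support:
                    rules[shared] = (support, confidence)
            extend(i + 1, shared)  # grow the record subset further

    extend(0, None)
    return rules


# Tiny example: two records satisfy the consequent "C".
db = [
    frozenset({"a", "b", "C"}),
    frozenset({"a", "c", "C"}),
    frozenset({"a", "b"}),
    frozenset({"c"}),
]
for antecedent, (support, conf) in max_conf_kernel(db, "C").items():
    print(sorted(antecedent), support, round(conf, 3))
```

The recursion visits every subset of DB(C) whose intersection is non-empty, so the cost grows exponentially with the number of records satisfying the consequent. That is consistent with the slide's restriction to small data sets.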

4 OLAP and Statistical databases. Statistical databases (from the early 80s): multidimensional data sets concerned with summarization over the dimensions of the data sets; 2-D representations of census data, socioeconomic data, etc. OLAP (online analytical processing): mid 90s.

5 Multi-dimensional Statistical Table

6 2-D representation of statistical data

7 A graph model for statistical data

8 A scheme for stat data

9 More schemes

10

11 Relational representation of statistical object

12 Automatic aggregation concept

13 Terms in SDB and OLAP

14 SDB and OLAP operators

15 Completeness of statistical algebra

16 Overlapping and time-varying categories

17 Physical organization

18 Encoding column category values

19 Array linearization

20 Header compression

21 Lattice of materialization

22 Partitioning of a data cube into subcubes

23 Cube operator

24 Data Cube – shortcomings of SQL

25 Sales Roll Up by Model by Year and by color

26 Using ALL value

27 3 dimensional rollup in SQL

28 Cross-tabulation in SQL

29 Cross Tabulation

30 CUBE operator

31 Support of histograms

32 A 3D data cube

33 ALL value and decoration field

34 Decorations

35 ROLLUP operator

36 Percentage of total as an aggregate function

37 Indices

38 STAR scheme

39

