6/25/2015 Acc 522 Fall 2001 (Jagdish S. Gangolly) 1 Data Mining I Jagdish Gangolly State University of New York at Albany.


1 Data Mining I
Jagdish Gangolly
State University of New York at Albany

2 Data Mining
What is data mining?
Data mining primitives
–Task-relevant data
–Kinds of knowledge to be mined
–Background knowledge
–Interestingness measures
–Visualisation of discovered patterns
Query language

3 Data Mining
Concept Description (Descriptive Data Mining)
–Data generalisation
  Data cube (OLAP) approach (offline pre-computation)
  Attribute-oriented induction approach (online aggregation)
Presentation of generalisation
Descriptive Statistical Measures and Displays

4 What is Data Mining?
Discovery of knowledge from databases
–A set of data mining primitives to facilitate such discovery (what data, what kinds of knowledge, measures to be evaluated, how the knowledge is to be visualised)
–A query language for the user to interactively visualise the knowledge mined

5 Data mining primitives I
Task-relevant data: attributes relevant for the study of the problem at hand
Kinds of knowledge to be mined: characterisation, discrimination, association, classification, clustering, evolution, …
Background knowledge: knowledge about the domain of the problem (concept hierarchies, beliefs about the relationships, expected patterns of data, …)

6 Data mining primitives II
Interestingness measures: support measures (prevalence of the rule pattern) and confidence measures (strength of the implication of the rule)
Visualisation of discovered patterns: rules, tables, charts, graphs, decision trees, cubes, …

7 Task-relevant Data
Steps:
Derivation of the initial relation through database queries (data retrieval operations), i.e. obtaining a minable view
Data cleaning and transformation of the initial relation to facilitate mining
Data mining
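The three steps can be sketched in a few lines of Python. This is only an illustration: the `sales` table, its columns, the sample rows and the cleaning rules are all hypothetical and do not come from the slides, and the "mining" step is reduced to a simple frequency count.

```python
import sqlite3
from collections import Counter

# Hypothetical source database (table and rows invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, age INTEGER, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("c1", 34, "computer", 1200.0),
        ("c2", None, "printer", 150.0),    # missing age
        ("c3", 51, " Computer ", 999.0),   # inconsistent spelling and spacing
        ("c4", 27, "software", -5.0),      # impossible amount
    ],
)

# Step 1: derive the initial relation with a database query.
initial_relation = conn.execute(
    "SELECT age, item, amount FROM sales ORDER BY customer"
).fetchall()
conn.close()

# Step 2: clean and transform the initial relation (assumed rules:
# drop rows with a missing age or a non-positive amount, normalise item names).
def clean(rows):
    cleaned = []
    for age, item, amount in rows:
        if age is None or amount <= 0:
            continue
        cleaned.append((age, item.strip().lower(), amount))
    return cleaned

minable_view = clean(initial_relation)

# Step 3: mine the cleaned view (here just a count of items bought).
item_counts = Counter(item for _, item, _ in minable_view)
```

Keeping retrieval, cleaning and mining as separate stages makes it easy to inspect the minable view before any pattern is extracted from it.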

8 Kinds of knowledge to be mined
Kinds of knowledge & templates (meta-patterns, meta-rules, meta-queries)
–Association
  An example: age(X:customer, W) ∧ income(X, Y) ⇒ buys(X, Z)
–Classification
–Discrimination
–Clustering
–Evolution analysis

9 Background knowledge
Knowledge from the problem domain
–usually in the form of concept hierarchies (rolling up or drilling down)
  schema hierarchies (lattices)
  set-grouping hierarchies (successive sub-grouping of attributes)
  rule-based hierarchies
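As a rough illustration of rolling up and drilling down along a concept hierarchy, the sketch below maps a low-level concept (city) to a higher-level one (country). The hierarchy, the sales figures and the function names are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Albany": "USA",
    "Chicago": "USA",
    "Toronto": "Canada",
    "Vancouver": "Canada",
}

# Hypothetical measure recorded at the low (city) level.
sales_by_city = {"Albany": 120, "Chicago": 300, "Toronto": 80, "Vancouver": 50}

def roll_up(measures, hierarchy):
    """Aggregate a measure from the low level to the next level up."""
    totals = Counter()
    for low_level, value in measures.items():
        totals[hierarchy[low_level]] += value
    return dict(totals)

def drill_down(measures, hierarchy, high_level):
    """Return the low-level entries that sit under one high-level concept."""
    return {low: value for low, value in measures.items()
            if hierarchy[low] == high_level}

sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
canada_detail = drill_down(sales_by_city, CITY_TO_COUNTRY, "Canada")
```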

10 Interestingness measures I
Simplicity: the more complex the structure, the more difficult it is to interpret, and so the less interesting it is likely to be (rule length, …)
Certainty: validity, trustworthiness
confidence(A ⇒ B) = (# tuples containing both A and B) / (# tuples containing A)
Sometimes called the “certainty factor”

11 Interestingness measures II
Utility: support is the percentage of task-relevant data tuples for which the pattern is true
support(A ⇒ B) = (# tuples containing both A and B) / (total # tuples)
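The support and confidence ratios from slides 10 and 11 can be computed directly from a list of tuples. The sketch below assumes each tuple is represented as a Python set of items; the sample transactions and the function names are invented for illustration.

```python
# Hypothetical task-relevant tuples: each set lists the items in one tuple.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
]

def support(a, b, rows):
    """Fraction of all tuples that contain both A and B."""
    both = sum(1 for row in rows if a <= row and b <= row)
    return both / len(rows)

def confidence(a, b, rows):
    """Fraction of the tuples containing A that also contain B."""
    with_a = [row for row in rows if a <= row]
    if not with_a:
        raise ValueError("no tuple contains A, so confidence is undefined")
    both = sum(1 for row in with_a if b <= row)
    return both / len(with_a)

# Rule: computer => software
A, B = {"computer"}, {"software"}
rule_support = support(A, B, transactions)        # 2 of 4 tuples
rule_confidence = confidence(A, B, transactions)  # 2 of the 3 tuples with A
```

The two measures share a numerator and differ only in the denominator: support divides by all tuples, confidence only by those that contain A.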

12 Visualisation of discovered patterns
Hierarchies
Tables
Pie/bar charts
Dot/box plots
…

13 Descriptive Data Mining (Concept Description & Characterisation)
Concept description: description of data generalised at multiple levels of abstraction
Concept characterisation: concise and succinct summarisation of a given collection of data
Concept comparison: discrimination

14 Data Generalisation
Abstraction of task-relevant data at a high conceptual level from a database containing data at a relatively low conceptual level
–Data cube (OLAP) approach (offline pre-computation) (Figs 2.1 & 2.2, pages 46 & 47)
–Attribute-oriented induction approach (online aggregation)
Presentation of generalisation (Tables 5.3 & 5.4 on p. 191, and Figs 5.2, 5.3 & 5.4 on pages 192 & 193)
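A minimal sketch of the attribute-oriented induction idea follows, under simplifying assumptions: each attribute is either removed (when it has many distinct values and no hierarchy) or replaced by a higher-level concept, and identical generalised tuples are then merged with a count. The student records, the major-to-field hierarchy and the GPA bands are hypothetical.

```python
from collections import Counter

# Hypothetical low-level tuples.
students = [
    {"name": "Ann",  "major": "Physics",    "gpa": 3.8},
    {"name": "Bob",  "major": "Chemistry",  "gpa": 3.6},
    {"name": "Cara", "major": "Accounting", "gpa": 3.2},
    {"name": "Dev",  "major": "Finance",    "gpa": 3.1},
    {"name": "Eve",  "major": "Physics",    "gpa": 2.9},
]

# Hypothetical concept hierarchy for the "major" attribute.
MAJOR_TO_FIELD = {
    "Physics": "Science",
    "Chemistry": "Science",
    "Accounting": "Business",
    "Finance": "Business",
}

def gpa_band(gpa):
    """Generalise a numeric GPA into a coarse band (assumed cut-offs)."""
    if gpa >= 3.5:
        return "excellent"
    if gpa >= 3.0:
        return "good"
    return "fair"

def generalise(row):
    # "name" is dropped: it has many distinct values and no hierarchy.
    return (MAJOR_TO_FIELD[row["major"]], gpa_band(row["gpa"]))

# Merge identical generalised tuples and keep a count for each.
generalised_relation = Counter(generalise(row) for row in students)
```

The resulting counts are the kind of summary that can then be presented as a generalised table or chart.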

15 Descriptive Statistical Measures and Displays I
Measures of central tendency
–Mean, weighted mean (weights signifying importance or occurrence frequency)
–Median
–Mode
Measures of dispersion
–Quartiles, outliers, boxplots
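The measures listed above can be computed with Python's standard `statistics` module, as in the sketch below. The data values and weights are invented, and the 1.5 × IQR rule used to flag outliers is a common convention assumed here, not something stated on the slide.

```python
import statistics

# Hypothetical sample values, already sorted for readability.
values = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)  # every value tied for most frequent

def weighted_mean(xs, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)

# Hypothetical weights, e.g. occurrence frequencies.
wmean = weighted_mean([10, 20, 30], [1, 1, 2])

# Quartiles; the exact cut points depend on the interpolation method used.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumed convention: a value is an outlier if it lies more than
# 1.5 * IQR below the first quartile or above the third quartile.
outliers = [x for x in values if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
```

A boxplot draws exactly these quantities: the box spans the first to the third quartile, the line inside marks the median, and outliers are plotted individually.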

16 Descriptive Statistical Measures and Displays II
Displays
–Histograms (Fig 5.6, page 214)
–Bar charts
–Quantile plot (Fig 5.7, page 215)
–Quantile-quantile plot (Fig 5.8, page 216)
–Scatter plot (Fig 5.9, page 216)
–Loess curve (Fig 5.10, page 217)
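As one example of how such a display is built, the sketch below computes the coordinates of a quantile plot without drawing it. It assumes the common convention of pairing the i-th smallest of n values with the fraction f = (i − 0.5) / n; the function name and the sample data are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value with the fraction of data at or below it.

    Assumes the convention f_i = (i - 0.5) / n for the i-th smallest value.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

# Hypothetical data; each point is (f-value, data value).
points = quantile_plot_points([40, 10, 30, 20])
```

Plotting the f-values on the horizontal axis against the data values on the vertical axis gives the quantile plot; plotting the quantiles of two samples against each other gives a quantile-quantile plot.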

