
1 Theoretic Frameworks for Data Mining Reporter: Qi Liu

2 What should a framework for data mining be able to do? – Encompass all or most typical data mining tasks – Have a probabilistic nature – Be able to talk about inductive generalizations – Deal with different types of data – Recognize the data mining process as iterative and interactive – Account for background knowledge in deciding what is an interesting discovery

3 Statistics Framework Statistics viewpoint: – Volume of data – Computational feasibility – Database integration – Simplicity of use – Understandability of results

4 Machine Learning Framework Data mining is applied machine learning. Machine learning focuses on prediction, based on known properties learned from the training data. Data mining focuses on the discovery of (previously) unknown properties in the data. Data mining often cannot use supervised methods because labeled training data is unavailable.

5 Probabilistic Framework To find the underlying joint distribution (e.g., a Bayesian network) of the variables in the data. Advantages: – Solid theoretical background – Clustering/classification fit easily into this framework Drawback: – Cannot take the iterative and interactive nature of the data mining process into account
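
A minimal sketch of this viewpoint, assuming a tiny categorical dataset (the variable names and records below are invented for illustration): the joint distribution is estimated directly from co-occurrence counts, and the marginals and conditionals that clustering and classification rely on fall out of it.

```python
from collections import Counter

# Hypothetical categorical records over two variables (weather, play).
data = [
    ("sunny", "yes"), ("sunny", "no"), ("rain", "no"),
    ("rain", "no"), ("sunny", "yes"), ("overcast", "yes"),
]

# Empirical joint distribution P(weather, play) from co-occurrence counts.
counts = Counter(data)
total = sum(counts.values())
joint = {pair: c / total for pair, c in counts.items()}

# Marginals and conditionals are derived from the joint distribution,
# which is what classification/clustering reuse in this framework.
p_rain_no = joint.get(("rain", "no"), 0.0)            # P(weather=rain, play=no)
p_rain = sum(p for (w, _), p in joint.items() if w == "rain")  # P(weather=rain)
print(joint)
print("P(play=no | weather=rain) =", p_rain_no / p_rain)
```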

6 Data Compression Framework Goal: to compress the data set by finding some structure in it and then encoding the data using fewer bits. Based on the minimum description length (MDL) principle. Example structures: association rules, a decision tree, a clustering.
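
A rough numeric sketch of the two-part MDL idea, with invented data: the total description length is the bits needed to describe the model plus the bits needed to describe the data given the model, and the preferred structure is the one with the smaller total. The 8-bit cost for transmitting the fitted parameter is an arbitrary assumption.

```python
import math

def code_length_bernoulli(bits, p):
    """Bits needed to encode a 0/1 sequence under a Bernoulli(p) model."""
    return sum(-math.log2(p if b else 1.0 - p) for b in bits)

# Hypothetical data: a mostly-ones attribute column (40 ones, 10 zeros).
data = [1] * 40 + [0] * 10

# Model 1: no structure, p = 0.5, i.e. a raw encoding at 1 bit per item.
L_raw = code_length_bernoulli(data, 0.5)

# Model 2: fit p from the data, but pay to transmit the parameter itself
# (a crude fixed cost of 8 bits is assumed here).
p_hat = sum(data) / len(data)
L_model = 8 + code_length_bernoulli(data, p_hat)

# MDL prefers the structure with the smaller total description length.
print(f"raw: {L_raw:.1f} bits, model: {L_model:.1f} bits")
```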

7 Microeconomic Framework Goal: to find actionable patterns that increase utility. The utility function is defined from the customers' perspective.
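
A toy sketch of how such a utility function might drive pattern selection; the candidate patterns, uplift values, and reach numbers below are all invented for illustration.

```python
# Hypothetical candidate patterns with estimated effect on customer utility:
# (pattern description, expected uplift per customer, customers reached).
candidates = [
    ("bundle A with B",        0.40, 1200),
    ("discount C on weekends", 0.15, 5000),
    ("recommend D after A",    0.90,  300),
]

def utility(pattern):
    """Utility of acting on a pattern: per-customer uplift times reach."""
    _, uplift, reach = pattern
    return uplift * reach

# Actionable patterns are those with the highest utility, not merely
# the ones that are statistically most significant.
for desc, *_ in sorted(candidates, key=utility, reverse=True):
    print(desc)
```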

8 Inductive Database Framework Store both data and patterns. An inductive database I(D, P) consists of a data component D and a pattern component P. We assume that both the data component D and the pattern component P are sets of sets; this assumption is motivated by an analogy with traditional relational databases. Compare: a deductive database stores data together with (partial) rules.
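
A minimal sketch of the I(D, P) structure, using Python frozensets so that sets can be nested; the transactions, the stored patterns, and the support query are invented for illustration.

```python
# Data component D: a set of transactions, each itself a set of items.
D = {
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"beer", "chips"}),
}

# Pattern component P: patterns stored alongside the data,
# e.g. frequent itemsets discovered so far.
P = {
    frozenset({"bread", "milk"}),
}

# Queries may address the data, the patterns, or both, e.g.
# "which stored patterns occur in at least two transactions?"
supported = {p for p in P if sum(p <= t for t in D) >= 2}
print(supported)
```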

9 Information-Theoretic Framework Data mining is a process of information transmission from an algorithm to the data miner. Model the data miner's state of mind as a probability distribution, called the background distribution, which represents the miner's uncertainty and misconceptions. In the data mining process, properties of the data (referred to as patterns) are revealed.

10 (figure-only slide; no recoverable text)

11 Attention! Focus on the data miner as much as on the data. An interesting pattern should be defined subjectively, rather than objectively. The primary concern is understanding the data itself, rather than the stochastic source that generated it.

12 Bird's-eye view of the IT framework – A data miner is able to formalize her beliefs as a background distribution, denoted P* – Kraft's inequality is taken to hold with equality – Code length of x under a distribution P: -log(P(x)) – The entropy of P* could be small due to the data miner being overly confident – Revealing a pattern updates P* to a new background distribution P*' – Measure the reduction in code length: -log(P*(x)) - (-log(P*'(x))) = log(P*'(x)/P*(x))
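
A small numeric sketch of these quantities, with an invented four-outcome domain and invented distributions: code lengths are -log2 P(x), and revealing a pattern updates the background distribution P* to P*', shortening the code for the observed data.

```python
import math

def code_length(p, x):
    """Code length of outcome x under distribution p, in bits: -log2 p(x)."""
    return -math.log2(p[x])

# Hypothetical background distribution P* over a four-outcome domain.
p_star = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

# After a pattern is revealed ("the data is almost always 'a'"),
# the miner updates to a new background distribution P*'.
p_star_new = {"a": 0.85, "b": 0.05, "c": 0.05, "d": 0.05}

x = "a"  # the observed data
# Information gain = -log2 P*(x) + log2 P*'(x) = log2(P*'(x) / P*(x))
gain = code_length(p_star, x) - code_length(p_star_new, x)
print(f"information gain: {gain:.2f} bits")
```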

13 Trade-off Good data mining algorithms are those able to pinpoint the patterns that lead to a large information gain. A pattern's interestingness to the data miner should be defined by a trade-off between the information gain obtained by revealing the pattern in the data and the description length of the pattern.
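
One plausible way to operationalize this trade-off is to rank candidate patterns by information gain per bit of description length; the sketch below does exactly that, with invented pattern names and numbers.

```python
# Hypothetical candidate patterns with their information gain (bits saved
# when encoding the data) and their own description length (bits needed
# to communicate the pattern to the data miner).
patterns = [
    ("cluster structure",   12.0, 30.0),
    ("single frequent set",  3.5,  4.0),
    ("PCA direction",        9.0, 16.0),
]

def interestingness(p):
    """One simple trade-off: information gain per bit of description."""
    _, gain, length = p
    return gain / length

# Rank candidates from most to least interesting under this trade-off.
for name, *_ in sorted(patterns, key=interestingness, reverse=True):
    print(name)
```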

14 How to determine P* and P*’?

15 Patterns

16 More issues about the framework The cost of a pattern should be specified in advance by the data miner. Joint patterns. Cases: – Clustering and alternative clustering – Dimensionality reduction (PCA) – Frequent pattern mining – Community detection – Subgroup discovery and supervised learning

