1 Data Mining By Fu-Chun (Tracy) Juang

2 What is Data Mining?
► The process of analyzing LARGE databases to find useful patterns.
► It attempts to discover rules and patterns from data.
► It is similar to knowledge discovery (in artificial intelligence) and to statistical analysis.
► Hence the alternative name: knowledge discovery in databases.

3 Types of Knowledge Discovered
► Classification
► Association Rules
► Clustering
► Others
-- Sequential Patterns
-- Patterns within Time Series

4 Classification
► Deals with prediction.
► Works from an existing set of events to create a hierarchy of classes. This classification hierarchy is then used to predict which "class" a new item belongs to.

5 Classification (cont.)
► Example:
A credit-card company classifies the population into four ranges of creditworthiness (bad, average, good and excellent) based on the payment history of its existing customers.
The company finds rules relating creditworthiness to other information about the customers, such as their educational history, age and salary.
It then uses these classification rules to determine (predict) the creditworthiness of a new applicant.

6 Classification : Rules
► Some of the rules look like:
∀ person P, P.degree = masters and P.income > 75,000 => P.credit = excellent
∀ person P, P.degree = bachelors or (P.income ≥ 25,000 and P.income ≤ 75,000) => P.credit = good
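
The two rules above can be read as an ordered rule list. A minimal sketch in Python, assuming the rules are tried top to bottom; the function name `classify_credit` and the `"unknown"` fallback are assumptions, since the slide gives no rule for the remaining cases:

```python
def classify_credit(degree: str, income: float) -> str:
    """Apply the two example rules from the slide, in the order given."""
    # Rule 1: masters degree and income above 75,000 => excellent
    if degree == "masters" and income > 75_000:
        return "excellent"
    # Rule 2: bachelors degree, or income between 25,000 and 75,000 => good
    if degree == "bachelors" or 25_000 <= income <= 75_000:
        return "good"
    # Assumption: no rule on the slide covers this case.
    return "unknown"


print(classify_credit("masters", 80_000))
print(classify_credit("bachelors", 10_000))
```

Note that an applicant with a masters degree and an income of 50,000 falls through to the second rule, because of its income condition.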

7 Classification : Decision-Tree
► A popular technique for classification.
► Each leaf node of the tree represents a class (e.g. good credit or bad credit).
► Each internal node has a function associated with it that determines which child to go to for a new item (e.g. marital status or salary range).
► To place a new item in a class, we traverse the decision tree from the root until we reach a leaf node.
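
The traversal described on this slide can be sketched as follows. This is an illustration only: the tree shape, the salary thresholds and the names `Leaf`, `Node`, `salary_at_least` and `classify` are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    label: str  # the class this leaf represents


@dataclass
class Node:
    test: Callable[[dict], str]                 # picks the branch for an item
    children: Dict[str, Union["Node", Leaf]]    # branch name -> subtree


def salary_at_least(threshold: int) -> Callable[[dict], str]:
    """Build a node function that splits items on a salary threshold."""
    return lambda item: "high" if item["salary"] >= threshold else "low"


def classify(tree: Union[Node, Leaf], item: dict) -> str:
    """Walk from the root to a leaf, applying each internal node's function."""
    node = tree
    while isinstance(node, Node):
        node = node.children[node.test(item)]
    return node.label


# Hypothetical tree: split on marital status, then on salary range.
tree = Node(
    test=lambda item: "married" if item["married"] else "single",
    children={
        "married": Node(salary_at_least(50_000),
                        {"high": Leaf("good credit"), "low": Leaf("bad credit")}),
        "single": Node(salary_at_least(75_000),
                       {"high": Leaf("good credit"), "low": Leaf("bad credit")}),
    },
)

print(classify(tree, {"married": True, "salary": 60_000}))
```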

8 Decision-Tree

9 Classification : Regression
► A special application of classification rules.
► Regression deals with the prediction of a value, rather than a class.
► e.g. Given a series of test results for a patient, use a regression rule to predict that patient's probability of survival.
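
As a sketch of "predicting a value rather than a class", the simplest case is a straight line fitted by ordinary least squares. The slide does not name a regression method, so this choice is an assumption, as are the function names and the toy data. Predicting a probability, as in the patient example, would normally call for a different model such as logistic regression.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a numeric value for a new input x."""
    return intercept + slope * x


# Hypothetical data lying exactly on y = 1 + 2x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(slope, intercept, 5))
```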

10 Association Rules
► Retail shops are often interested in associations between the different items that people buy.
► X => Y: if a customer buys X, he or she is likely to buy Y as well.
► e.g. A female retail shopper who buys a handbag is likely to buy shoes.
association rule: Handbag => Shoes
► e.g. A person who bought the book Database System Concepts is likely to buy Operating System Concepts.
association rule: DBS Concepts => OS Concepts

11 Association Rules : Support & Confidence
► Every association rule has an associated degree of support and confidence.
► Data miners use the support and confidence of an association rule to determine whether that rule is significant.

12 Association Rule: Support
► Support is a measure of what fraction of the population satisfies both the LHS and the RHS of the rule.
► In other words, it is how frequently the itemset (LHS + RHS) occurs in the database.
► If only 0.001% of all purchases in a store include both milk and screwdrivers, then the support of the rule milk => screwdriver is low.
► If 50% of purchases include both milk and juice, the support of the rule milk => juice is high.

13 Association Rule: Confidence
► Confidence is a measure of how often the RHS (consequent) is true when the LHS (antecedent) is true.
► e.g. The rule bread => milk has a confidence of 80% if 80% of the purchases that include bread also include milk.
► A rule with low confidence is not meaningful.
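
Both measures can be computed directly from a list of purchases. The sketch below follows the definitions on the two slides above; the function names and the six sample purchases are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0  # the rule never applies, so treat its confidence as zero
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Hypothetical purchases: bread appears in 5 of 6, bread with milk in 4 of 6.
purchases = [
    {"bread", "milk"},
    {"bread", "milk", "juice"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "juice"},
    {"milk", "juice"},
]

print(support(purchases, {"bread", "milk"}))       # 4 of 6 purchases
print(confidence(purchases, {"bread"}, {"milk"}))  # 4 of the 5 bread purchases
```

In this sample the rule bread => milk has a confidence of about 80%, matching the slide's example, while its support is about 67%.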

14 Clustering
► Clustering groups similar points together into a single set.
► In business: groups of customers who have similar buying patterns.
► In medicine: groups of patients who show similar reactions to prescribed drugs.
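
The slide does not name a clustering algorithm. As one possible illustration, the sketch below uses k-means on one-dimensional points, a common technique chosen here as an assumption. The function name, the starting centers and the spending figures are hypothetical.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around `centers` using plain k-means.

    Returns the final centers and the clusters from the last assignment step.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend of six customers, with two starting centers.
spend = [10, 12, 11, 50, 52, 49]
centers, clusters = k_means_1d(spend, centers=[10, 50])
print(clusters)
```

With this data the low spenders and the high spenders end up in separate groups, which is the "customers with similar buying patterns" case from the slide.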

15 References
► A. Silberschatz, H.F. Korth, S. Sudarshan: Database System Concepts, 5th ed., McGraw-Hill, 2006
► R. Elmasri, S.B. Navathe: Fundamentals of Database Systems, 4th ed., Addison Wesley, 2003

