Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.

1 Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006

2 Overview: Explanation of Data Mining; Benefits of Data Mining; Data Mining Background; Data Mining Models; Data Warehousing; Problems and Issues of Data Mining; Potential Applications of Data Mining

3 What Is Data Mining? Data mining is the automated extraction of hidden predictive information from databases. It is an extension of statistics with a few artificial intelligence and machine learning twists.

4 What Is Data Mining? (cont.) Now the term data mining is stretched beyond its limits and applied to any form of data analysis. It encompasses a number of different technical approaches, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies.

5 Why Data Mining? Data mining software allows users to analyze large databases to solve business decision problems. For example, data mining software can use the historical record of previous interactions between a business and its customers to build a model of customer behavior for predicting customer responses to new products.

9 Data Mining Background Data mining research has drawn on a number of other fields: machine learning, statistics, and inductive learning.

12 Inductive Learning Strategies Inductive learning, in which the system infers knowledge itself by observing its environment, has two main strategies: supervised learning and unsupervised learning.
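
The difference between the two strategies can be sketched in a few lines of Python. This is only an illustration, not something from the slides: the spend figures, the labels "low" and "high", and the function names are all hypothetical. The supervised function learns from examples that already carry a label, while the unsupervised one is given bare values and has to find the two groups on its own (a one-dimensional k-means with k = 2, assuming at least two distinct values).

```python
from statistics import mean

# Hypothetical labeled examples: (monthly spend, label supplied by a person).
labeled = [(10.0, "low"), (12.0, "low"), (48.0, "high"), (52.0, "high")]


def fit_centroids(examples):
    """Supervised: labels are given, so learn one mean value per label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Assign the label whose learned mean is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(values, iterations=10):
    """Unsupervised: no labels; split the values into two groups.

    Assumes the input holds at least two distinct values.
    """
    low, high = min(values), max(values)  # initial group centers
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)


centroids = fit_centroids(labeled)
label_for_45 = predict(centroids, 45.0)
groups = two_means([value for value, _ in labeled])
```

In the supervised case the result is a named class for a new value; in the unsupervised case the result is only a grouping, and it is left to the analyst to decide what each group means.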

15 Data Mining Models IBM has identified two types of models, or modes of operation, that may be used to reveal information of interest to users: the verification model and the discovery model.

16 Data Warehousing Data mining potential can be enhanced if the appropriate data has been collected and stored in a data warehouse. The data warehousing market consists of tools, technologies, and methodologies that allow for the construction, usage, management, and maintenance of the hardware and software used for a data warehouse, as well as the actual data itself.

18 Data Warehouse Bill Inmon coined the term Data Warehouse in 1990 and defined it as follows: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."

20 Data Warehouse (cont.) Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

22 Data Warehouse (cont.) Time-Variant: All data in the data warehouse is identified with a particular time period. Non-Volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.
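
The four properties in Inmon's definition can be pictured with a toy in-memory store. This is a minimal sketch under stated assumptions, not a description of any real warehouse product: the `Fact` and `Warehouse` names, the field layout, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str  # subject-oriented: what the row is about, e.g. "sales"
    period: str   # time-variant: every row is tied to a time period
    source: str   # integrated: rows arrive from several source systems
    key: str
    value: float


class Warehouse:
    """Append-only store: rows are added but never changed or removed."""

    def __init__(self):
        self._facts = []

    def load(self, facts):
        # Non-volatile: loading only appends to what is already stored.
        self._facts.extend(facts)

    def total(self, subject, period):
        # Merge rows from all sources for one subject and one period.
        return sum(
            f.value
            for f in self._facts
            if f.subject == subject and f.period == period
        )


warehouse = Warehouse()
warehouse.load([
    Fact("sales", "2006-03", "store_system", "order-1", 10.0),
    Fact("sales", "2006-03", "web_system", "order-2", 5.0),
])
warehouse.load([Fact("sales", "2006-04", "store_system", "order-3", 7.0)])
march_total = warehouse.total("sales", "2006-03")
```

Because later loads never alter earlier rows, a query for a past period keeps returning the same answer, which is the "consistent picture" the slide refers to.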

23 Problems and Issues of Data Mining Data mining systems rely on databases to supply the raw data for input. Problems arise because databases tend to be dynamic, incomplete, noisy, and large. Other problems relate to the adequacy and relevance of the information stored.

28 Problems and Issues: Limited information; Uncertainty; Size, update, and irrelevant fields; Noise and missing values

34 Ways to Treat Missing Data by Discovery Systems: Simply disregard missing values. Omit the corresponding records. Infer missing values from known values. Treat missing data as a special value to be included additionally in the attribute domain. Average over the missing values using Bayesian techniques.
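
Three of these treatments are easy to sketch in plain Python. The records, field names, and the "UNKNOWN" marker below are hypothetical, and filling a gap with the mean of the known values is just one simple way to "infer missing values from known values"; the slides do not say which inference method a given system uses. The Bayesian averaging option is not shown.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 38000},
]


def omit_records(rows):
    """Omit every record that has at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def infer_with_mean(rows, field):
    """Infer a missing value from known ones (here: the mean of the field)."""
    fill = mean(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def as_special_value(rows, field, marker="UNKNOWN"):
    """Treat 'missing' as one more value in the attribute's domain."""
    return [{**r, field: marker if r[field] is None else r[field]} for r in rows]


complete_only = omit_records(records)
ages_filled = infer_with_mean(records, "age")
income_marked = as_special_value(records, "income")
```

Omitting records is the simplest choice but shrinks the data set (here from four records to two), while the other two keep every record at the cost of introducing estimated or placeholder values.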

37 Potential Applications of Data Mining: Retail and Marketing. Identify buying patterns from customers; find associations among customer demographic characteristics; predict response to mailing campaigns; analyze market baskets.
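
Market basket analysis, the last item above, usually starts by counting how often items are bought together. The following is a minimal sketch with made-up baskets and hypothetical function names; it computes the support of each item pair (the share of baskets containing both items) and the confidence of a rule such as "bread implies milk" (the share of bread baskets that also contain milk).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]


def pair_support(transactions):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(consequent in b for b in with_antecedent)
    return hits / len(with_antecedent)


support = pair_support(baskets)
bread_to_milk = confidence(baskets, "bread", "milk")
```

Counting every pair directly is fine for a handful of baskets; real retail data sets are large enough that practical tools prune rare items before counting combinations.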

39 Potential Applications of Data Mining: Banking. Detect patterns of fraudulent credit card use; identify “loyal” customers; predict customers likely to change their credit card affiliation; determine credit card spending by customer groups; find hidden correlations between different financial indicators; identify stock trading rules from historical market data.

41 Potential Applications of Data Mining: Insurance and Health Care. Claim analysis, i.e., which medical procedures are claimed together; predict which customers will buy new policies; identify behavior patterns of risky customers; identify fraudulent behavior.

43 Potential Applications of Data Mining: Transportation. Determine the distribution schedules among outlets; analyze loading patterns.

45 Potential Applications of Data Mining: Medicine. Characterize patient behavior to predict office visits; identify successful medical therapies for different illnesses.

46 References
Dilly, R. (n.d.). Retrieved March 30, 2006, from Data Mining Web site: http://www.ppc.qub.ac.uk/tec/courses/determining/stu_notes/dm_book_1.html
Reed, M. (n.d.). A definition of data warehousing. Retrieved March 30, 2006, from Intranet Journal Web site: http://www.intranetjournal.com/features/datawarehousing.html
Thearling, K. (n.d.). Retrieved March 30, 2006, from Information about data mining and analytic technologies Web site: http://www.thearling.com/

