Academic Year 2014 Spring




1 Academic Year 2014 Spring

2 MODULE CC3005NI: Advanced Database Systems “Distributed Database (DDB) and Data Mining (DM)” (PART – 2)

3 Data Mining:
• Data Mining is a knowledge discovery process: the automated extraction of hidden predictive information or patterns from data in large databases.
• Key Points
  o Knowledge discovery in databases (KDD)
  o Automated process
  o Extraction of, or search for, interesting or useful information, patterns or trends
  o From large databases

4 Motivation:
• Problem: Data Explosion
  o Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
  o We are drowning in data, but starving for knowledge!

5 Motivation:
• Solution: Data Warehousing and Data Mining
  o Data Warehousing and On-Line Analytical Processing (OLAP)
  o Extraction of interesting knowledge (rules, regularities, patterns, future trends, predictions) from large databases.

6 Data Mining: From Where?
Data Mining aims to find information from data in:
• Relational Databases
• Data Warehouses
• Advanced databases and information repositories
  o Object-Oriented and Object-Relational Databases
  o Spatial Databases
  o Time-Series Data and Temporal Data
  o Text Databases and Multimedia Databases
  o Heterogeneous and Legacy Databases
  o WWW

7 Data Mining: Objectives
Data Mining is aimed at providing these capabilities:
• Automated discovery of previously unknown patterns
  o Data Mining tools sift through large amounts of data to discover meaningful new correlations and hidden patterns.
• Automated prediction of trends and behaviours
  o Data Mining tools predict future trends and behaviours, allowing businesses to make proactive, knowledge-driven decisions.
• The results are used to help businesses make better decisions and gain a competitive advantage.

8 Data Mining: Example
Database Analysis and Decision Support
• Market Analysis and Management
  o Target Marketing, Customer Relationship Management, Market Basket Analysis, Market Segmentation
• Risk Analysis and Management
  o Forecasting, Customer Retention, Quality Control, Competitive Analysis
• Fraud Detection and Management
  o Identify unusual spending patterns and irregularities
• Other Applications
  o Text Mining (newsgroups, email, documents) and Web Analysis
  o Intelligent Query Answering

9 Data Mining: Interdisciplinary Subject
[Diagram: Data Mining linked to Database Technology, Statistics, Visualization, Artificial Intelligence, Machine Learning and Other Disciplines]

10 Data Mining Process:
The process of information / knowledge extraction is carried out repetitively, adaptively and progressively. Its stages are:
• Comprehension of the application domain
• Preparation of data sets
• Discovery of patterns
• Evaluation of patterns and implications
Comprehension of Application Domain
• Develop a good understanding of the application domain.

11 Data Mining Process:
Preparation of Data Sets
• Identify a subset of the data in the database / Data Warehouse on which to carry out Data Mining.
• Encode and clean the data to make it suitable input for Data Mining algorithms.
Discovery of Patterns
• Apply Data Mining techniques to the data set extracted earlier in order to discover repetitive patterns in the data.
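The data preparation stage (select a subset, clean it, encode it) could look roughly like the sketch below. It is only an illustration: the record fields `age`, `income_group` and `region`, the cleaning rule (drop rows with missing values, tidy up text) and the one-hot encoding are all assumptions made for the example, not something prescribed by the module.

```python
# Hypothetical records selected from a larger database (assumed data).
raw_records = [
    {"age": 34, "income_group": "medium", "region": "north"},
    {"age": None, "income_group": "high", "region": "south"},    # missing age
    {"age": 51, "income_group": " HIGH ", "region": "north"},    # untidy text
    {"age": 27, "income_group": "low", "region": "east"},
]


def clean(records):
    """Drop records with missing values and normalise text fields."""
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # skip incomplete rows
        cleaned.append({
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in rec.items()
        })
    return cleaned


def one_hot(records, field):
    """Encode a categorical field as 0/1 columns, one per distinct value."""
    categories = sorted({rec[field] for rec in records})
    rows = [[1 if rec[field] == cat else 0 for cat in categories]
            for rec in records]
    return rows, categories


prepared = clean(raw_records)
encoded, columns = one_hot(prepared, "income_group")
print(columns)   # column order of the encoded rows
print(encoded)   # one 0/1 row per cleaned record
```

The encoded rows are the kind of input that the pattern discovery stage would then work on.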

12 Data Mining Process:
Evaluation of Patterns and Implications
• Draw implications from the discovered patterns.
• Evaluate which experiments to carry out next, which hypotheses to formulate, or which consequences to draw in the process of knowledge discovery.

13 Data Mining: Some Techniques
There are various techniques / algorithms used for Data Mining, including:
• Association
• Classification
• Sequential Patterns
• Patterns with Time Series
• Categorisation and Segmentation

14 Association Rules
• Association rules discover regular patterns within large data sets, such as the presence of two items within a group of tuples.
• These rules discover situations in which the presence of one item in a transaction is linked, with high probability, to the presence of another item.

15 Association Rules
• The quality of an association rule can be measured precisely by defining the properties SUPPORT and CONFIDENCE.
• SUPPORT is the percentage of transactions (or baskets) that contain both items A and B (A and B could each be a single item or a group of items).
• CONFIDENCE is the percentage of baskets containing both items A and B among those containing A.
• A rule is normally accepted only when its support and confidence reach chosen minimum thresholds.
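As a rough illustration of the two measures, the sketch below computes support and confidence over a handful of baskets. The basket contents and the function names `support` and `confidence` are assumptions chosen for the example.

```python
# Illustrative baskets (assumed data), each basket is a set of items.
baskets = [
    {"milk", "bread", "cereal"},
    {"milk", "bread", "sugar", "eggs"},
    {"milk", "bread", "butter"},
    {"sugar", "eggs"},
]


def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction that also
    contain `consequent`."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(baskets, set(antecedent) | set(consequent)) / base


print(support(baskets, {"milk", "bread"}))        # 0.75
print(confidence(baskets, {"milk"}, {"bread"}))   # 1.0
print(confidence(baskets, {"sugar"}, {"milk"}))   # 0.5
```

Here the rule "milk implies bread" has support 75% and confidence 100%, so it would pass, for example, a minimum support of 50% and a minimum confidence of 80%.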

16 Association – Example: Shopping Habits
Marketing Analyst: "Hmmm... which items are frequently purchased together by my customers?"

Shopping Baskets (Customer 1, Customer 2, Customer 3, ... Customer n):
• milk + bread + cereal
• milk + bread + sugar + eggs
• milk + bread + butter
• milk + bread + butter

Boolean Representation

             milk  bread  sugar  butter  cereal  egg
Basket – 1     1     1      0      0       1      0
Basket – 2     1     1      1      0       0      1
Basket – 3     1     1      0      1       0      0
Basket – n     0     0      1      0       0      1
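One way to answer the analyst's question from the Boolean representation is to decode each row back into items and count how often every pair of items occurs together. The sketch below does this with the four Boolean rows from the slide; the helper names `decode` and `pair_counts` are made up for the example.

```python
from itertools import combinations

# Column order of the Boolean representation.
ITEMS = ["milk", "bread", "sugar", "butter", "cereal", "egg"]

# Boolean rows as given on the slide.
rows = {
    "Basket 1": [1, 1, 0, 0, 1, 0],
    "Basket 2": [1, 1, 1, 0, 0, 1],
    "Basket 3": [1, 1, 0, 1, 0, 0],
    "Basket n": [0, 0, 1, 0, 0, 1],
}


def decode(row):
    """Turn a 0/1 row back into the list of items it contains."""
    return [item for item, flag in zip(ITEMS, row) if flag]


def pair_counts(rows):
    """Count in how many baskets each pair of items appears together."""
    counts = {}
    for row in rows.values():
        for pair in combinations(decode(row), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


counts = pair_counts(rows)
most_common = max(counts, key=counts.get)
print(most_common, counts[most_common])  # ('milk', 'bread') 3
```

On these rows milk and bread appear together in three of the four baskets, which is the kind of finding the strategies on the following slides build on.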

17 Association – Example

18 How Data Mining (DM) Improves Business?
• Strategy 1: Placing milk and bread in close proximity may further encourage customers to purchase these items together within a single visit to the store!

19 How Data Mining (DM) Improves Business?
• Strategy 2: Placing milk and bread at opposite ends of the store may entice customers who purchase these items to pick up other items along the way!

20 • Strategy 3: Put these two items into a package at a reduced price!

21 Classification
• Classify a phenomenon into a predefined class.
• A classifier is an algorithm that carries out classification.
• A classifier is typically presented as a decision tree, in which the nodes are labelled with conditions that allow decisions to be made.
• Examples:
  o Motor Insurance
  o Health Insurance
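A decision-tree classifier for the motor insurance example might be sketched as below. The attributes (`age`, `accidents_last_3_years`, `car_type`), the thresholds and the class names are entirely hypothetical; in practice a tree like this would be learned from historical policy data rather than written by hand.

```python
def classify_motor_policy(age, accidents_last_3_years, car_type):
    """Walk a small hand-built decision tree and return a risk class.

    Every `if` is one node of the tree, labelled by its condition.
    All thresholds are hypothetical values chosen for illustration.
    """
    if accidents_last_3_years > 1:       # root node: accident history
        return "high risk"
    if age < 25:                         # second node: young driver?
        if car_type == "sports":         # leaf decision for young drivers
            return "high risk"
        return "medium risk"
    if car_type == "sports":             # leaf decision for older drivers
        return "medium risk"
    return "low risk"


print(classify_motor_policy(22, 0, "sports"))   # high risk
print(classify_motor_policy(40, 0, "sedan"))    # low risk
```

Each applicant follows exactly one path from the root to a leaf, so every case ends up in one of the predefined classes.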

22 Classification – Example

23

24 Sequential Patterns
• Discover patterns between events such that the presence of one set of items/objects is followed by another set of items/objects in a database of events over a period of time.
• Detecting sequential patterns is equivalent to detecting associations among events with certain temporal relationships (the time dimension).
• Examples
  o Understand and analyse long-term customer buying behaviour
  o Medical diagnosis
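A very simple form of sequential pattern detection counts, for each ordered pair of items, how many customers bought the first item on an earlier visit than the second. The purchase histories and the function name `sequential_support` below are assumptions made for the example; real sequential pattern mining algorithms handle longer sequences and time windows.

```python
from collections import Counter

# Hypothetical purchase histories: one time-ordered list of visits per
# customer, each visit being the set of items bought (assumed data).
histories = {
    "customer_1": [{"laptop"}, {"mouse"}, {"printer"}],
    "customer_2": [{"laptop"}, {"printer"}],
    "customer_3": [{"mouse"}, {"laptop"}, {"printer", "paper"}],
}


def sequential_support(histories):
    """Count customers for whom item a was bought on an earlier visit
    than item b, for every ordered pair (a, b)."""
    counts = Counter()
    for visits in histories.values():
        seen_pairs = set()
        for i, earlier in enumerate(visits):
            for later in visits[i + 1:]:
                for a in earlier:
                    for b in later:
                        if a != b:
                            seen_pairs.add((a, b))
        counts.update(seen_pairs)  # each customer counts once per pair
    return counts


counts = sequential_support(histories)
print(counts[("laptop", "printer")])  # 3
print(counts[("mouse", "printer")])   # 2
```

Unlike a plain association, the order matters here: "laptop then mouse" and "mouse then laptop" are counted separately.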

25 Patterns with Time Series
• Discover links between two time-dependent sets of data, based on the degree of similarity between the patterns that both time series demonstrate.
• Similarities can be detected within positions of the time series.
• Examples
  o Stock market movement (compare market performance of Oct 2001 with Oct 2007)
  o New home owners’ buying patterns within two months of purchase
  o Product selling patterns in different seasons
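The "degree of similarity" between two time series can be measured in many ways. The sketch below uses the Pearson correlation coefficient as one possible choice; that choice, the function name `pearson` and the sample values are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long series (1 = same shape,
    -1 = opposite shape, near 0 = no linear relationship)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (spread_x * spread_y)


# Hypothetical daily index values for two different periods (assumed data).
period_a = [100, 102, 101, 105, 107]
period_b = [200, 204, 202, 210, 214]   # same movements at twice the scale
period_c = [107, 105, 101, 102, 100]   # period_a in reverse order

print(round(pearson(period_a, period_b), 6))  # close to 1: similar pattern
print(pearson(period_a, period_c) < 0)        # True: opposite trend
```

Because the correlation ignores scale, two periods with very different price levels can still be recognised as following the same pattern.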

26 Categorisation & Segmentation
• Categorisation is the process of partitioning a given collection of events or items into a set of segments/clusters that share some common properties.
• Segments/clusters may be predefined, or may be determined during the process of categorisation.

27 Categorisation & Segmentation
• Examples
  o Classification of customer profile: by frequency of visits, types of financing used, amount of purchase, etc.
  o Demographic information: age, income group, place of residence, buying habits, etc.
  o Planning store promotions and advertisements, planning seasonal marketing strategies, planning additional stores.
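Both cases mentioned for categorisation, predefined segments and segments determined during the process, can be sketched briefly. The thresholds in `predefined_segment`, the purchase amounts and the starting centroids are hypothetical, and the one-dimensional k-means routine is just one common clustering technique picked for illustration.

```python
def predefined_segment(visits_per_month):
    """Assign a customer to a segment fixed in advance (hypothetical cut-offs)."""
    if visits_per_month >= 8:
        return "frequent"
    if visits_per_month >= 3:
        return "regular"
    return "occasional"


def kmeans_1d(values, centroids, iterations=10):
    """Discover clusters of numbers: repeatedly assign each value to its
    nearest centroid, then move each centroid to the mean of its cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


print(predefined_segment(10))   # frequent

# Hypothetical purchase amounts; the segments are not known beforehand.
amounts = [12, 15, 14, 80, 85, 90, 300, 310]
centroids, clusters = kmeans_1d(amounts, centroids=[0, 100, 400])
print(clusters)   # [[12, 15, 14], [80, 85, 90], [300, 310]]
```

In the first function the analyst decides the segments; in the second the groups of small, medium and large purchases emerge from the data itself.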

28 Other Data Mining Approaches

29 Typical Application Areas of DM
• Finance and Banking
• Retail and Sales
• Credit Card Operations
• Medical Diagnosis and Healthcare
• Insurance
• Others

30 Integrated DM Environment
• To maximise their potential and performance, Data Mining tools must be fully integrated with the Data Warehouse environment as well as with flexible, interactive business analysis tools.
• OLAP (On-Line Analytical Processing) enables a more sophisticated end-user business model to be applied when navigating the Data Warehouse.
• A Data Mining server can be integrated with the Data Warehouse and the OLAP server.
• This integration enables operational decisions to be directly implemented and tracked.
• As the warehouse expands with new decisions and results, the organisation can continually mine best practices and apply them to future decisions.

31 Integrated DM Environment

32 Thank you!!! Questions are WELCOME

