
CRISP-DM Tommy Wei Cory Hutchinson ISDS 4180. Overview What is CRISP-DM (CRoss Industry Standard Process for Data Mining) Blueprint Phases and Tasks Summary.




1 CRISP-DM
Tommy Wei, Cory Hutchinson
ISDS 4180

2 Overview
What is CRISP-DM (CRoss Industry Standard Process for Data Mining)?
Blueprint
Phases and Tasks
Summary

3 CRISP-DM
A guide, or blueprint, for conducting a data mining project. It breaks the life cycle of a data mining project down into 6 phases. It was developed to give data mining projects a standardized approach, with the aim of getting better results from data mining, faster.

4 Why a Standard Process?
There was a clear need for data mining, but no sense of direction for how organizations should launch their own data mining projects; before CRISP-DM, data mining practice was very scattered. A standard process encourages good habits and best practices, makes projects reliable and repeatable even for people with little data mining experience, and makes monitoring and maintenance easier.

5 CRISP-DM Creation
CRISP-DM was created by four organizations with data mining experience: DaimlerChrysler, ISL, NCR, and OHRA. A special interest group (SIG) was formed to develop a standard data mining process. This association of data mining enthusiasts drew large input from a wide range of people, including data miners, data warehousing vendors, and management consultants. The group then refined and improved the model and ran live trials on data mining projects.

6 CRISP-DM: Process Flow
A data mining methodology for all businesses: a complete outline of the project life cycle in 6 phases.

7 CRISP-DM: 6 Phases
Business Understanding: Understand the business objectives and goals, and how data mining can help achieve them.
Data Understanding: Start with a data set, become familiar with it, gain some insight, and identify any data quality issues.
Data Preparation: All the activities needed to build the final data set that the different modeling techniques will use.
Modeling: Choose a modeling technique, design the model, and test it.
Evaluation: Thoroughly evaluate the model and its results to see whether they meet the business objectives, and review the process.
Deployment: Put the results to use, ranging from using the model to create a dashboard or report to rolling the data mining process out across the entire organization.

8 Phases and Tasks
Business Understanding: Determine Business Objectives; Assess the Situation; Determine the Data Mining Goals; Produce a Project Plan
Data Understanding: Collect the Initial Data; Describe the Data; Explore the Data; Verify Data Quality
Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data
Modeling: Select the Modeling Technique; Generate Test Design; Build the Model; Assess the Model
Evaluation: Evaluate Results; Review Process; Determine Next Steps
Deployment: Plan Deployment; Plan Monitoring and Maintenance; Produce Final Report
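One way to keep a project honest about which tasks it has covered is to hold the phase and task list in a simple ordered structure and print it as a checklist. The sketch below assumes Python 3.9+ (insertion-ordered dicts, built-in generic annotations); the names CRISP_DM_TASKS and checklist are hypothetical, chosen for illustration.

```python
# A minimal sketch: the six CRISP-DM phases and their tasks, kept in life-cycle order
# (dicts preserve insertion order in Python 3.7+). Names here are hypothetical.
CRISP_DM_TASKS: dict[str, list[str]] = {
    "Business Understanding": [
        "Determine Business Objectives",
        "Assess the Situation",
        "Determine the Data Mining Goals",
        "Produce a Project Plan",
    ],
    "Data Understanding": [
        "Collect the Initial Data",
        "Describe the Data",
        "Explore the Data",
        "Verify Data Quality",
    ],
    "Data Preparation": [
        "Select Data",
        "Clean Data",
        "Construct Data",
        "Integrate Data",
        "Format Data",
    ],
    "Modeling": [
        "Select the Modeling Technique",
        "Generate Test Design",
        "Build the Model",
        "Assess the Model",
    ],
    "Evaluation": ["Evaluate Results", "Review Process", "Determine Next Steps"],
    "Deployment": [
        "Plan Deployment",
        "Plan Monitoring and Maintenance",
        "Produce Final Report",
    ],
}


def checklist() -> list[str]:
    """Flatten the phases into numbered items such as '1.1 Business Understanding: Determine Business Objectives'."""
    items = []
    for p, (phase, tasks) in enumerate(CRISP_DM_TASKS.items(), start=1):
        for t, task in enumerate(tasks, start=1):
            items.append(f"{p}.{t} {phase}: {task}")
    return items


if __name__ == "__main__":
    for line in checklist():
        print(line)
```

Numbering each task by phase makes it easy to refer to, for example, "3.2 Clean Data" in status reports.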

9 Phase 1: Business Understanding
Summary: This phase focuses on the project objectives and requirements from a business perspective. That knowledge is then converted into problems that data mining can solve, along with a rough outline of what to do to achieve the objectives.

10 Phase 1: Business Understanding
Determine the business objectives: Gain a deep understanding, from a business perspective, of what the client REALLY wants to accomplish, along with any related business questions.
Assess the situation: Develop a more detailed understanding of the resources you need, as well as any constraints, potential obstacles, and assumptions you may need to make. More specific details are worked out here.
Determine the data mining goals: Define the data mining objectives that must be completed in order to achieve the business goal.
Ex. Business goal: Increase our overall restaurant sales in the northeast and southeast regions of the US. Data mining goal: Predict how well people in those regions will embrace the flavor of our food, given data from several franchises over the past 3 years, demographic information, item prices, and intangible factors such as culture, brand recognition, and reputation.

11 Phase 1: Business Understanding
Produce a project plan: Lay out what the data miners must achieve, and how, in order to move closer to the business goals.
Ex. Business goal: Reduce the churn rate for our internet provider company. Data mining goals: Identify the characteristics of high-value customers based on the most recent 5 years of data; identify which customers left after 1 year of service; build a mathematical model (logistic regression) to determine which customers are most likely to leave within 3 years of service.
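To make the last data mining goal concrete, here is a minimal logistic regression sketch in plain Python, fitted by batch gradient descent. It is an illustration under assumptions, not the presenters' model: the two features (years of service, support calls in the last year), the records, and the learning settings are all hypothetical, and a real project would use the actual customer data and a statistics library.

```python
import math

# Hypothetical toy records, invented for illustration only (not real customer data):
# each row is [years_of_service, support_calls_last_year]; label 1 = customer left, 0 = stayed.
X = [[0.5, 6], [1.0, 5], [0.8, 7], [1.5, 4], [4.0, 1], [5.0, 0], [3.5, 2], [6.0, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the average log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error for this record: predicted probability minus actual label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def churn_probability(w, b, x) -> float:
    """Estimated probability that a customer with features x leaves."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    w, b = fit_logistic(X, y)
    for customer in ([0.7, 6], [5.5, 1]):
        print(customer, round(churn_probability(w, b, customer), 3))
```

Ranking customers by churn_probability gives the "most likely to leave" list the goal asks for; the threshold used to act on it is a business decision, not part of the model.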

12 Phase 2: Data Understanding
Summary: This phase starts with data that has already been collected and proceeds with activities to become more familiar with the data set: identifying data quality problems, discovering first insights into the data, and detecting interesting subsets that may point to hidden information.

13 Phase 2: Data Understanding
Collect the initial data: Acquire the data needed to complete the data mining goals and the entire project. This includes loading the data, and possibly integrating it if you are taking data from multiple data sources.
Describe the data: Examine the properties of the acquired data. Do you have everything you need? Ex. data format, quantity of data, number of records, fields within each table, and the data type of each field.
Explore the data: Start tackling the data mining questions using querying, visualization, and reporting, looking at aggregations, relationships between data, and subsets of data.
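The "describe the data" task can start with a quick profile of record count, fields, and data types. The sketch below uses only the Python standard library; the CSV extract, its column names, and the simple int/float/str type guessing are assumptions made for illustration.

```python
import csv
import io

# Hypothetical franchise extract, invented for illustration (note the missing sales value).
SAMPLE = """franchise_id,region,monthly_sales,opened
101,northeast,52000.50,2019-04-01
102,southeast,48750,2020-01-15
103,northeast,,2021-06-30
"""


def infer_type(values):
    """Guess a field's data type from its non-empty string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # at least one value did not parse; try the next, looser type
    return "str"


def describe(rows):
    """Report the number of records, the field names, and a guessed data type per field."""
    fields = list(rows[0].keys()) if rows else []
    return {
        "records": len(rows),
        "fields": fields,
        "types": {f: infer_type([r[f] for r in rows]) for f in fields},
    }


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(describe(rows))
```

Dates come back as "str" here on purpose; recognizing and converting them is left to the Construct data task in Phase 3.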

14 Phase 2: Data Understanding
Verify data quality: Examine the quality of the data. Is everything you need there? Are there any gaps or missing values? Does the data make sense? Are there spelling errors? Any ambiguity?
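The questions on this slide can be partly automated. As a minimal sketch (the records, field names, and the set of allowed region spellings are hypothetical), a quality report might count missing required values, values outside an agreed list, and exact duplicate records:

```python
from collections import Counter

# Hypothetical records invented for illustration: one misspelled region,
# one missing sales value, and one exact duplicate.
ROWS = [
    {"id": "1", "region": "northeast", "sales": "100"},
    {"id": "2", "region": "Nort east", "sales": ""},
    {"id": "1", "region": "northeast", "sales": "100"},
]


def quality_report(rows, required, allowed=None):
    """Count missing required values, values outside an allowed set, and exact duplicate records."""
    allowed = allowed or {}
    missing, invalid = Counter(), Counter()
    for r in rows:
        for f in required:
            if (r.get(f) or "").strip() == "":
                missing[f] += 1
        for f, ok_values in allowed.items():
            v = (r.get(f) or "").strip()
            if v and v not in ok_values:
                invalid[f] += 1  # e.g. a misspelling such as "Nort east"
    # Every copy of a record beyond the first counts as one duplicate.
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(c - 1 for c in counts.values())
    return {"missing": dict(missing), "invalid": dict(invalid), "duplicates": duplicates}


if __name__ == "__main__":
    print(quality_report(ROWS, required=["id", "region", "sales"],
                         allowed={"region": {"northeast", "southeast"}}))
```

The report only flags problems; deciding how to fix them belongs to the Clean data task in Phase 3.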

15 Phase 3: Data Preparation
Summary: Typically 50% to 70% of a project's time is spent in this phase. It covers all the activities used to construct the final data set from the original raw data, and many steps are taken to prepare the data: selecting certain tables, records, and attributes; doing conversions and transformations; and cleaning the data.

16 Phase 3: Data Preparation
Select data: Decide on the data to be used for analysis, i.e., which attributes, records, and tables are selected, based on the data types and data volume you want and their relevance to the data mining goals.
Clean data: Bring data quality to a high level by removing corrupt, inaccurate, or duplicate data from a table, record, or database.
Construct data: This is where you start preparing the final data set: create derived attributes and new records, and transform and format data (dates, for example).
Integrate data: Combine information from multiple tables into one and create new records or values. This may mean joining multiple data sources, performing mathematical calculations on the data, and grouping it a certain way.
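The clean, construct, and integrate tasks can be sketched as three small functions applied in order. This is an illustration under stated assumptions: the two tables, their fields, and the choice of an inner join with a per-region sales total are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw tables, invented for illustration.
SALES_RAW = [
    {"franchise_id": "101", "date": "2023-01-15", "sales": "1200"},
    {"franchise_id": "101", "date": "2023-01-15", "sales": "1200"},  # exact duplicate
    {"franchise_id": "102", "date": "2023-02-03", "sales": ""},      # missing sales value
    {"franchise_id": "103", "date": "2023-02-10", "sales": "800.5"},
]
FRANCHISES = [
    {"franchise_id": "101", "region": "northeast"},
    {"franchise_id": "103", "region": "southeast"},
]


def clean(rows):
    """Clean data: drop exact duplicate records and records with no sales value."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen or r.get("sales", "") == "":
            continue
        seen.add(key)
        out.append(r)
    return out


def construct(rows):
    """Construct data: convert sales to numbers and derive a 'month' attribute from the date."""
    return [
        {**r, "sales": float(r["sales"]),
         "month": datetime.strptime(r["date"], "%Y-%m-%d").month}
        for r in rows
    ]


def integrate(sales, franchises):
    """Integrate data: join sales to franchises on franchise_id, then total sales per region."""
    region_of = {f["franchise_id"]: f["region"] for f in franchises}
    totals = {}
    for r in sales:
        region = region_of.get(r["franchise_id"])
        if region is None:  # inner join: skip sales with no matching franchise
            continue
        totals[region] = totals.get(region, 0.0) + r["sales"]
    return totals


if __name__ == "__main__":
    prepared = construct(clean(SALES_RAW))
    print(integrate(prepared, FRANCHISES))
```

Keeping each task in its own function makes the preparation steps easy to rerun when new raw data arrives, which supports the repeatability CRISP-DM aims for.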

17 Phase 3: Data Preparation
Format data: Apply any extra formatting required for the data set to be accepted by the modeling tool, such as changing the layout of the data or removing illegal characters.
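What counts as an "illegal character" depends on the modeling tool. Assuming, purely for illustration, a tool that accepts only letters, digits, and underscores in field names, rejects names starting with a digit, and rejects control characters, commas, and semicolons in values, the formatting step might look like this:

```python
import re


def format_field_name(name: str) -> str:
    """Make a column name safe for a tool assumed to accept only letters, digits and '_'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()
    # Assumed rule: names may not start with a digit, so prefix them.
    if cleaned and cleaned[0].isdigit():
        cleaned = "f_" + cleaned
    return cleaned


def format_value(value: str) -> str:
    """Strip control characters and the delimiters ',' and ';' that the tool is assumed to reject."""
    return re.sub(r"[\x00-\x1f,;]", "", value).strip()


if __name__ == "__main__":
    header = ["Monthly Sales ($)", "3yr churn%", "Region"]
    print([format_field_name(h) for h in header])
    print(format_value("  Boston, MA\t"))
```

The real rules should come from the modeling tool's documentation; the point is that formatting changes only how data is written, not what it means.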

18 Phase 4: Modeling
Summary: Select a modeling technique for the finalized data set based on the data mining goals and objectives. Tune the parameter settings to optimize results, and compare the results if you used several modeling techniques.

19 Phase 4: Modeling
Select the modeling technique: Choose the actual modeling technique you will use on your data set. Examples include decision trees, sequential patterns, linear/logistic regression, clustering, categorical analysis, and segmentation.
Generate test design: Make sure you have a way to test the model's quality and validity. Build the model on a training data set, then run it on a separate test data set to measure its accuracy. Ex. For categorical analysis, run the model on a test data set and compare its results to the real results. Did it categorize everything correctly? What was the error rate?
Build the model: Run the modeling tool on the prepared data set to create the model, and look at the results.
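A basic test design holds out part of the data, builds the model only on the rest, and reports the error rate on the held-out part. The sketch below assumes a 70/30 split with a fixed random seed, hypothetical one-feature records, and a deliberately simple stand-in model (a cut-off halfway between the two class means); any real modeling technique would take that model's place.

```python
import random

# Hypothetical labeled records (feature value, class label), invented for illustration.
RECORDS = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
           (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1)]


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold out test_fraction of them as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Hypothetical stand-in model: a cut-off halfway between the two class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def error_rate(predicted, actual):
    """Fraction of records the model categorized incorrectly."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


if __name__ == "__main__":
    train, test = train_test_split(RECORDS)
    cutoff = fit_threshold(train)  # build the model on the training set only
    predicted = [1 if x > cutoff else 0 for x, _ in test]
    actual = [label for _, label in test]
    print(f"cut-off={cutoff:.2f}, test error rate={error_rate(predicted, actual):.2f}")
```

Measuring error only on records the model never saw is what lets the Assess the model task judge how the model is likely to behave on new data.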

20 Phase 4: Modeling
Assess the model: Judge the success of the data mining model based on the results, the data mining success criteria, and the test design. Consult business analysts and domain experts to discuss the results in a business context and see whether they make sense. Consider whether the model is good enough to be given to others in the organization.

21 Phase 5: Evaluation
Summary: Thoroughly evaluate the model. Review the steps executed to construct it to make sure it properly aligns with the business objectives, and make sure all important business issues have been considered. At the end, decide whether to keep this data mining model and its results.

22 Phase 5: Evaluation
Evaluate results: Assess whether the model and its results meet the business requirements. Is there any reason this data mining model is deficient? Did it give you everything you wanted? Test the model multiple times in the real world, and document any challenges, useful tips, information, and hints for future reference.
Review process: Did we build the model correctly? Is there any important factor or task that we left out or overlooked?
Determine next steps: Decide where to proceed next: move to deployment, run the model a few more times with new data sets, or set up new data mining projects. This includes analyzing the remaining resources and budget.

23 Phase 6: Deployment
Summary: In this phase, you determine how the results will be used: who will use them, and how often? The model and the knowledge gained must be delivered in a way that clients can understand and that lets other people run the model throughout the organization. Deployment can be as simple as producing a report or as involved as implementing a repeatable data mining process across the enterprise.

24 Phase 6: Deployment
Plan deployment: Take the results and develop a strategy for distributing them throughout the organization.
Plan monitoring and maintenance: If the data mining model becomes part of day-to-day business, teach people how to operate and maintain it independently and how to use its results correctly.
Produce final report: The project leader and team write up a final report. It can be a summary of the project and its experiences, or a comprehensive presentation of the data mining results.
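A monitoring plan is easier to hand over when its rule is written down precisely. As one hypothetical example (the class name, the baseline, and the 0.05 tolerance are assumptions, not part of CRISP-DM), the people maintaining the model could periodically compare its error on newly labeled data with the error measured during Evaluation:

```python
from dataclasses import dataclass


@dataclass
class MonitoringPlan:
    """Hypothetical monitoring rule agreed during deployment planning."""
    baseline_error: float    # error rate measured on the test set during Evaluation
    tolerance: float = 0.05  # allowed degradation before someone must act (an assumption)

    def check(self, predicted, actual) -> str:
        """Compare the model's error on newly labeled data against the baseline."""
        if len(predicted) != len(actual) or not actual:
            raise ValueError("predicted and actual must be non-empty and the same length")
        current = sum(p != a for p, a in zip(predicted, actual)) / len(actual)
        if current > self.baseline_error + self.tolerance:
            return (f"ALERT: error {current:.2f} exceeds baseline "
                    f"{self.baseline_error:.2f} + {self.tolerance:.2f}; review or retrain the model")
        return f"OK: error {current:.2f} is within tolerance"


if __name__ == "__main__":
    plan = MonitoringPlan(baseline_error=0.10)
    print(plan.check([1, 0, 0, 1], [1, 0, 0, 1]))
    print(plan.check([1, 1, 1, 1], [1, 0, 0, 1]))
```

An alert here is a trigger for people to review the model, which loops back to the Evaluation and Modeling phases rather than retraining automatically.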

25 Summary
CRISP-DM provides a way to carry out data mining projects that is reliable and repeatable, even by people with little data mining skill. It provides a uniform framework while remaining flexible enough to account for differences in data, business problems, and objectives.

