Data Mining NATE BUTLER, BRENT DAVIS, BROCK NOLAN, AND NICK THORNHILL.


1 Data Mining NATE BUTLER, BRENT DAVIS, BROCK NOLAN, AND NICK THORNHILL

2 Outline ● Data Mining Concept ● Brief History, Basic Understanding, Relationships, Capabilities ● Data Mining and OLAP ● OLAP Cubes, Visualization, Simple Processes ● Data Mining Process ● The can and cannots, problem definition, preparation, and deployment. ● Data Mining with SAP ● Overview of SAP, models, ABC Classification, and Decision Trees. ● Open Source Data Mining Tools ● Various tool examples, models, data mining misconceptions, and your life, their data. ● Data Mining with SQL ● Features involved with SQL, querying, model testing, etc.

3 What is Data Mining? ● Exploration and Analysis of massive amounts of data ● Summarizes large data into useful information ● Motivated to find useful patterns for company use ● Establish Relationships and locate Trends ● Knowledge Discovery in Data (KDD)

4 Brief History ● Began when business data started to be stored on computers. ● Developed rapidly alongside advancements in computer technology. ● 1960s: Collection and storage of data on computers, tapes, and disks. ● 1980s: Introduction of relational databases using SQL. ● 1990s: Data warehousing is introduced. ● 1990s: The term "data mining" is introduced. ● Present day: Continues to be driven by businesses wanting useful data.

5 Basic Understanding

6 Basic Data Mining Process

7 Data Mining Relationships ● Classes: stored data is used to locate data in predetermined groups. ● Restaurants could track customer data to find when customers visit and what they usually order. ● Clusters: data items are grouped according to logical relationships or consumer preferences. ● Data can be mined to identify market segments or consumer affinities. ● Associations: mined data can link certain processes or habits together. ● A grocery chain found that men who bought diapers on Thursdays and Saturdays also tended to buy beer for the upcoming weekend. ● Sequential Patterns: data is mined to anticipate behavior patterns and trends. ● An outdoor supplier could predict that if sleeping bags and hiking shoes are purchased, a backpack is likely to be purchased as well.
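
The diapers-and-beer association above is usually quantified with two measures: support (how often the items appear together) and confidence (how often the second item appears given the first). The following is a minimal sketch in plain Python; the transactions are made-up illustration data, not figures from the grocery chain.

```python
# Hypothetical market-basket data; each set is one customer transaction.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base else 0.0

# Rule under test: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule is normally kept only if both measures clear thresholds chosen by the analyst; those thresholds are a business decision, not something the data supplies.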

8 Business Perspective of Data Mining ● Strong Consumer Focus ● Retail, Financial, Communication, and Marketing Organizations ● Companies look for relationships between internal and external factors ● Internal: ● Price ● Product Positioning ● Staff Skills ● External: ● Economic Indicators ● Competition ● Customer Demographics ● Impact on: ● Sales ● Customer Satisfaction ● Corporate Profits

9 Data Mining and OLAP ● On-Line Analytical Processing ● Fast analysis of shared multidimensional data ● Supports data summarization, cost allocation, time series analysis, and what-if analysis ● Complementary Activities ● OLAP provides a multidimensional view of data, which data mining usually cannot. ● The two work in tandem ● Data mining can select dimensions for a cube, create new values for a dimension, or create new values for a cube. ● OLAP can analyze data mining results at various levels of detail
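
To make the multidimensional view concrete, here is a small sketch of a roll-up, the basic OLAP summarization step: a measure is summed over every dimension the analyst chooses not to keep. The fact table, dimension names, and `roll_up` helper are hypothetical and written in plain Python; a real OLAP server would precompute and index these aggregates.

```python
from collections import defaultdict

# Hypothetical fact table: (region, product, quarter, sales).
facts = [
    ("East", "Tents", "Q1", 100),
    ("East", "Tents", "Q2", 150),
    ("East", "Boots", "Q1", 80),
    ("West", "Tents", "Q1", 120),
    ("West", "Boots", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(facts, keep):
    """Sum the sales measure over every dimension not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last column is the sales measure
    return dict(totals)

by_region = roll_up(facts, ["region"])
by_region_product = roll_up(facts, ["region", "product"])
grand_total = roll_up(facts, [])
print(by_region, grand_total)
```

Keeping more dimensions drills down into finer detail; keeping fewer rolls up toward the grand total.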

10 Data Mining and OLAP Cubes

12 Data Mining Visualization

13 What data mining can and can’t do Can: ● Find patterns and relationships in your data ● Discover hidden information in your data Can’t: ● Eliminate the need to know your business or your data ● Tell you the value of the information to your organization https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#CHDEFGIE

14 Data Mining Process https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#CHDEFGIE

15 Problem Definition 1. Focuses on understanding the project objectives and requirements. 2. Converts this knowledge into a data mining problem definition. 3. Develops a preliminary implementation plan. https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#CHDEFGIE

16 Data Gathering and Preparation 1. Collect and explore the data 2. Determine how well the data addresses the problem 3. Identify data quality problems 4. Scan for patterns in the data https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#CHDEFGIE
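
Identifying data quality problems often starts with two simple checks: counting missing values per field and finding duplicate rows. The sketch below uses plain Python; the records and field names are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Dallas"},
    {"id": 3, "age": 51, "city": None},
    {"id": 2, "age": None, "city": "Dallas"},  # exact copy of the second record
]

def missing_counts(records):
    """Count, for each field, how many records have no value."""
    counts = {}
    for rec in records:
        for field, value in rec.items():
            counts[field] = counts.get(field, 0) + (1 if value is None else 0)
    return counts

def duplicate_rows(records):
    """Return every record that repeats an earlier record exactly."""
    seen, dupes = set(), []
    for rec in records:
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key in seen:
            dupes.append(rec)
        seen.add(key)
    return dupes

print(missing_counts(records))
print(duplicate_rows(records))
```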

17 Model Building and Evaluation 1. Select and apply various modeling techniques 2. Calibrate parameters to optimal values 3. Transform the data when the chosen algorithm requires it https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#CHDEFGIE

18 Knowledge Deployment 1. Using data mining within a target environment 2. Insight and actionable information can be derived from data 3. Integration of data mining models within applications https://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm#CHDEFGIE

19 Data Mining Tools: SAP 1. SAP: Systems, Applications, and Products in Data Processing 2. Fourth largest software company in the world 3. Business software package designed to integrate all areas of a business 4. Provides end-to-end solutions for financials, manufacturing, logistics, and distribution 5. Shares common business information with every employee http://www.dvwsolutions.com/blog/entry/sap-data-mining.html

20 Models of Data Mining in SAP: Clustering 1. Identifies clusters of data objects in transactions. 2. A cluster is a collection of data objects that are similar to one another. 3. A good clustering method produces high-quality clusters in which inter-cluster similarity is low and intra-cluster similarity is high. http://www.dvwsolutions.com/blog/entry/sap-data-mining.html
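
One common way to form such clusters is k-means, sketched below in plain Python (3.8+ for `math.dist`). This is an illustration of the general idea, not SAP's implementation; the points and the starting centroids are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        centroids = new_centroids
    # clusters holds the assignment made in the final pass
    return centroids, clusters

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

Points within a cluster end up close to their own centroid (high intra-cluster similarity) and far from the other centroid (low inter-cluster similarity).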

21 ABC Classification This method involves classifying your products into three categories to decide which ones should be focused on. A = Outstanding performance. B = Average importance. C = Relatively unimportant. http://www.dvwsolutions.com/blog/entry/sap-data-mining.html
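
A minimal sketch of ABC classification, assuming products are ranked by revenue and classed by cumulative revenue share. The 80% and 95% cutoffs are a common convention used here as an assumption, not thresholds from the slides or from SAP, and the product figures are made up.

```python
def abc_classify(revenue_by_product, a_cutoff=0.80, b_cutoff=0.95):
    """Rank products by revenue and label each A, B, or C by cumulative share.
    The cutoffs are assumed conventions; adjust them to the business."""
    total = sum(revenue_by_product.values())
    ranked = sorted(revenue_by_product.items(), key=lambda kv: kv[1], reverse=True)
    classes = {}
    running = 0
    for product, revenue in ranked:
        running += revenue
        share = running / total
        if share <= a_cutoff:
            classes[product] = "A"
        elif share <= b_cutoff:
            classes[product] = "B"
        else:
            classes[product] = "C"
    return classes

# Hypothetical annual revenue per product.
revenue = {"tents": 500, "boots": 250, "stoves": 120, "maps": 70, "whistles": 60}
classes = abc_classify(revenue)
print(classes)
```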

22 Decision Trees 1. The most popular predictive modeling technique, since it provides rules and logic that enable intelligent decision making. 2. Following the rules of a decision tree gives you a clear picture of how data flows. 3. The best use of a decision tree is classifying existing customer records into customer segments that behave in a particular manner. http://www.dvwsolutions.com/blog/entry/sap-data-mining.html
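
The sketch below shows how a record flows through a decision tree's rules until it reaches a customer segment. The tree is written by hand for illustration; in practice it would be learned from data. The feature names, thresholds, and segment labels are all hypothetical.

```python
# Hypothetical tree: internal nodes test one feature against a threshold,
# leaves name a customer segment.
tree = {
    "feature": "annual_spend", "threshold": 1000,
    "below": {
        "feature": "visits_per_month", "threshold": 2,
        "below": {"segment": "occasional"},
        "at_or_above": {"segment": "regular"},
    },
    "at_or_above": {"segment": "high_value"},
}

def classify(record, node):
    """Walk from the root to a leaf; return the segment and the rules followed."""
    path = []
    while "segment" not in node:
        below = record[node["feature"]] < node["threshold"]
        op = "<" if below else ">="
        path.append(f"{node['feature']} {op} {node['threshold']}")
        node = node["below"] if below else node["at_or_above"]
    return node["segment"], path

segment, path = classify({"annual_spend": 400, "visits_per_month": 5}, tree)
print(segment, path)
```

The returned path is the readable rule chain that makes decision trees easy to explain to business users.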

23 Example of a Decision Tree http://www.dvwsolutions.com/blog/entry/sap-data-mining.html

24 Open Source Data Mining Tools 1. Orange - Python-based data mining software built for both novices and experts 2. Weka - Java-based data mining software that can use SQL databases through Java Database Connectivity (JDBC) 3. Rattle GUI - a data mining GUI that uses the R statistical programming language to manipulate and display data trends 4. Apache Mahout - a collection of machine learning algorithms that use the Apache Hadoop platform 5. RapidMiner - an integrated environment for machine learning, data mining, text mining, predictive analytics, and application development. www.predictiveanalyticstoday.com/top-free-data-mining-software/

25 Data Mining Tools (Cont.) Orange Weka

26 Data Mining Misconceptions ● Data mining has recently become a buzzword, and as a result people have developed misconceptions about what it really is. ● Data mining is often used to refer to the entire range of big data analytics, including collection, extraction, analysis, and statistics. ● That definition is too broad. Essentially, data mining finds unknown patterns, unusual records, and dependencies without a hypothesis about the analytical outcomes. ● The most important objective of any data miner should be to find useful, easily understood information in large data sets.

27 Your Life, Their Data Companies are using data more and more as we become a connected society, and a few of these companies are using your data daily. ● Fitbit has started using its activity trackers as a measure of public health and selling its findings to local governments. ● Facebook has been mining user data since it launched its advertising strategy, selling advertisement space with slogans like “Long term relationships with faceless customers”. ● Almost every disclaimer or user agreement you accept online has a data mining clause that companies use.

28 Features of Data Mining with SQL ● Multiple data sources: You can use any tabular data source, including spreadsheets and text files. ● Integrated data cleansing: Makes it easier to prepare data for modeling and to retrain and update models. ● Multiple customizable algorithms: Includes clustering, neural networks, decision trees, and even your own custom plug-in algorithms.

29 Features of Data Mining with SQL Cont. ● Model Testing Infrastructure: Test your data models using cross-validation, classification matrices, lift charts, and scatter plots. ● Querying and drillthrough: SQL Server Data Mining provides the DMX language for integrating prediction queries into applications. You can also retrieve detailed statistics and patterns from the models, and then drill through to the case data.
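
Two of the testing tools named above, classification matrices and cross-validation, can be illustrated without SQL Server. The sketch below is plain Python, not DMX or any SQL Server API; the labels and helper names are hypothetical.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(matrix):
    """Share of cases where the predicted label matches the actual label."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())

def k_fold_indices(n_rows, k):
    """Yield (train, test) row-index lists; fold i tests on rows i, i+k, i+2k, ..."""
    for fold in range(k):
        test = list(range(fold, n_rows, k))
        train = [i for i in range(n_rows) if i % k != fold]
        yield train, test

# Hypothetical actual outcomes and model predictions.
actual = ["buy", "buy", "skip", "skip", "buy", "skip"]
predicted = ["buy", "skip", "skip", "skip", "buy", "buy"]
matrix = classification_matrix(actual, predicted)
folds = list(k_fold_indices(len(actual), k=3))
print(dict(matrix), accuracy(matrix))
```

In cross-validation the model is retrained on each fold's training rows and scored on that fold's test rows, so every row is used for testing exactly once.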

30 Features of Data Mining with SQL Cont. ● Client tools: In addition to the development and design studios provided by SQL Server, you can use Add-ins for Excel to create, query, and browse models. Or, create custom clients, including Web services. ● Security and deployment: Provides role-based security through Analysis Services, including separate permissions for drillthrough to model and structure data. Easy deployment of models to other servers, so that users can access the patterns or perform predictions.

31 Video Explaining Data Mining https://youtu.be/R-sGvh6tI04

