Waqas Haider Bangyal. Source Materials: “Data Mining: Concepts and Techniques” by Jiawei Han & Micheline Kamber, Second Edition, Morgan Kaufmann, 2006.

1 Waqas Haider Bangyal

2 Source Materials: “Data Mining: Concepts and Techniques” by Jiawei Han & Micheline Kamber, Second Edition, Morgan Kaufmann, 2006; “Data Mining: Introductory and Advanced Topics” by Margaret H. Dunham, Prentice Hall, 2003.

3 What Is Data Mining? Data mining is the process of sorting through large amounts of data and picking out relevant information. The extraction of knowledge from data is called data mining. Data mining can also be defined as the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules. The ultimate goal of data mining is to discover knowledge.

4 Data Rich, Information Poor

5 Motivation Lots of data is being collected and warehoused: Web data and e-commerce, purchases at department and grocery stores, and bank and credit card transactions. Computers have become cheaper and more powerful. Data is collected and stored at enormous speeds (GB/hour), for example by remote sensors on satellites and by telescopes scanning the skies.

6 Motivation Traditional techniques are infeasible for raw data. Human analysts may take weeks to discover useful information. We are drowning in data, but starving for knowledge! Data mining may help scientists classify and segment data.

7 Motivation To which class does this star belong? Such an analysis can no longer be conducted manually, because huge amounts of data are collected automatically.

8 Why is data mining important? The rapid computerization of businesses produces huge amounts of data. How can we make the best use of this data? A growing realization: knowledge discovered from data can be used for competitive advantage.

9 Evolution of Database Technology
1960s: Data collection, database creation, IMS and network DBMS.
1970s: Relational data model, relational DBMS implementation.
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).
1990s–2000s: Data mining and data warehousing, multimedia databases, and Web databases.

10 Evolution of Database Technology

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM | Static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Dynamic data delivery at record level |
| Data Warehousing (1990s) | "What were unit sales in New England last March? Drill down to Boston." | Multidimensional databases, data warehouses | Oracle, Pilot | Dynamic data delivery at multiple levels |
| Data Mining (emerging today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups (nascent industry) | Prospective, proactive information delivery |
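The step from "Data Access" to "Data Warehousing" in the table is the move from a single aggregate to a drill-down. The sketch below illustrates that idea using a handful of hypothetical sales records. The record layout and the `units_by` helper are assumptions made for this example and are not taken from the slides. The same filter is first grouped by region and then by city.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, month, units). Not from the slides.
SALES = [
    ("New England", "Boston", "2024-03", 120),
    ("New England", "Hartford", "2024-03", 45),
    ("New England", "Boston", "2024-02", 100),
    ("Mid-Atlantic", "New York", "2024-03", 300),
]
FIELDS = ("region", "city", "month", "units")


def units_by(records, key, **filters):
    """Sum units grouped by `key`, keeping only records that match every filter."""
    totals = defaultdict(int)
    for rec in records:
        row = dict(zip(FIELDS, rec))
        if all(row[field] == value for field, value in filters.items()):
            totals[row[key]] += row["units"]
    return dict(totals)


# Data access: unit sales in New England last March, at region level.
print(units_by(SALES, "region", region="New England", month="2024-03"))
# Drill down: the same slice, broken out by city.
print(units_by(SALES, "city", region="New England", month="2024-03"))
```

A warehouse would precompute such aggregates at several levels. This loop recomputes them on demand only to keep the example small.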

12 Data Warehouse Example Data warehousing is defined as a process of centralized data management and retrieval. A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
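The sketch below illustrates "multiple sources, stored under a unified schema" using Python's built-in `sqlite3`. Two hypothetical source systems name the same facts differently (`sku`/`qty` versus `product_id`/`quantity`). Both are mapped onto one `sales` table before loading. The source layouts, table name, and column names are assumptions for illustration. An in-memory database stands in for the single-site warehouse.

```python
import sqlite3

# Two hypothetical source systems with different field names (assumption).
store_sales = [{"sku": "A1", "qty": 3, "store": "Boston"}]
web_orders = [{"product_id": "A1", "quantity": 2, "channel": "web"}]

conn = sqlite3.connect(":memory:")  # stands in for the single-site warehouse
conn.execute("CREATE TABLE sales (product TEXT, units INTEGER, source TEXT)")

# Map each source onto the unified schema before loading.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(r["sku"], r["qty"], "store:" + r["store"]) for r in store_sales],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(r["product_id"], r["quantity"], r["channel"]) for r in web_orders],
)
conn.commit()

# A single query now spans both sources.
total = conn.execute(
    "SELECT SUM(units) FROM sales WHERE product = ?", ("A1",)
).fetchone()[0]
print(total)  # 5
```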

13 The Process of Data Mining There are three main steps in the data mining process: Preparation: data is selected from the warehouse and cleansed. Processing: algorithms are applied to the data; this step uses modeling to make predictions. Analysis: the output is evaluated.
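The three steps can be sketched as three small functions. This is a minimal illustration with made-up data, not a prescribed method. Preparation drops rows with missing values and exact duplicates. Processing fits an ordinary least-squares line as a simple stand-in for "modeling". Analysis reports the mean absolute error on one held-out row. The dataset, the choice of a linear model, and the hold-out split are all assumptions.

```python
# Hypothetical raw rows: (advertising spend, sales); None marks a missing value.
RAW = [(1.0, 2.1), (2.0, 3.9), (2.0, 3.9), (3.0, None),
       (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]


def prepare(rows):
    """Preparation: drop rows with missing values and exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        clean.append(row)
    return clean


def fit_line(rows):
    """Processing: ordinary least-squares fit of y = a + b*x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def evaluate(model, rows):
    """Analysis: mean absolute error of the model's predictions."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in rows) / len(rows)


data = prepare(RAW)
train, test = data[:-1], data[-1:]  # hold out the last row for evaluation
model = fit_line(train)
print(len(data), model, evaluate(model, test))
```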

14 Reasons for Growing Popularity Growing data volume: an enormous amount of existing and newly arriving data requires processing. Limitations of human analysis: humans lack objectivity when analyzing data. Low cost of machine learning: the data mining process costs less than hiring highly trained professionals to analyze the data.

15 Applications of Data Mining Data mining is applied in the following areas: Stock market prediction: predicting future trends. Bankruptcy prediction: predictions based on computer-generated rules, using models. Foreign exchange market: data mining is used to identify trading rules. Fraud detection: constructing algorithms and models that help recognize a variety of fraud patterns.
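As one very simple example of a fraud-detection model, the sketch below flags card transactions that lie far from a customer's usual spending. It uses a z-score against that customer's history. The technique was chosen here only for illustration. The history values and the 3.0 threshold are hypothetical, and the slides do not specify any particular method. Real fraud models typically combine many more features.

```python
import statistics

# Hypothetical card history for one customer (amounts in dollars).
HISTORY = [42.0, 38.5, 51.0, 45.25, 39.99, 47.0, 44.0, 40.5]


def flag_unusual(history, new_amounts, threshold=3.0):
    """Flag amounts whose z-score against the customer's history exceeds threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    return [amt for amt in new_amounts if abs(amt - mean) / stdev > threshold]


print(flag_unusual(HISTORY, [43.0, 950.0, 55.0]))  # [950.0]
```

The statistics are computed from past transactions only. If the new transactions were included, a single large amount would inflate the standard deviation and could mask itself.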

16 Results of Data Mining Include: Forecasting what may happen in the future Classifying people or things into groups by recognizing patterns Clustering people or things into groups based on their attributes Associating what events are likely to occur together Sequencing what events are likely to lead to later events
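"Associating what events are likely to occur together" is usually quantified with support and confidence. Support is the fraction of transactions containing an itemset. Confidence is an estimate of how often the right-hand side appears given the left-hand side. The sketch below uses hypothetical market baskets that are not taken from the slides.

```python
# Hypothetical market baskets (assumption, not data from the slides).
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "diapers"},
    {"bread", "butter"},
    {"milk", "diapers", "beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Estimated P(rhs in basket | lhs in basket) for the rule lhs -> rhs."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


print(support({"bread", "butter"}, BASKETS))       # 0.4
print(confidence({"butter"}, {"bread"}, BASKETS))  # 1.0
```

Algorithms such as Apriori search for all itemsets above a minimum support. This sketch only scores a single given rule.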

17 Data Mining Functions Two types of model: predictive models predict unknown values based on known data; descriptive models identify patterns in data. Each type has several sub-categories, each of which has many algorithms. We won't have time to look at ALL of them in detail.
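The sketch below contrasts the two model types using one small algorithm from each. A 1-nearest-neighbour classifier is predictive: it assigns a class to a new value from labelled examples. One-dimensional k-means is descriptive: it finds groups in unlabelled values. Both algorithms and all data points are illustrative choices, not ones named on the slides.

```python
# Hypothetical labelled points for the predictive model: (feature, class).
LABELLED = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]


def predict_1nn(x, labelled):
    """Predictive: return the label of the nearest known example."""
    return min(labelled, key=lambda point: abs(point[0] - x))[1]


def kmeans_1d(values, centers, iters=10):
    """Descriptive: group unlabelled values around k centers (plain k-means)."""
    groups = [[] for _ in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        # Assignment step: attach each value to its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


print(predict_1nn(7.2, LABELLED))                   # high
print(kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0]))
```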

18 Data Mining Functions

19 Thanks