Welcome! Knowledge Discovery and Data Mining


1 Welcome! Knowledge Discovery and Data Mining
Qiang Yang
Hong Kong University of Science and Technology
Lecture 1
2019/5/12 Course Introduction

2 Data Mining: An Example
You are a marketing manager for a brokerage company.
Problem: churn (also known as attrition) is too high.
  Turnover after the six-month introductory period ends is 40%.
Customers receive incentives (average cost: $160) when an account is opened.
Giving new incentives to everyone who might leave is very expensive (as well as wasteful).
Bringing back a customer after they leave is both difficult and costly.

3 … A Solution
One month before the introductory period ends, predict which customers will leave.
If you want to keep a customer who is predicted to churn, offer them something based on their predicted value.
  The ones that are not predicted to churn need no attention.
If you don’t want to keep the customer, do nothing.
How can you predict future behavior?
  Build models
  Test models
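The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Customer` fields, the churn threshold, the minimum value worth retaining, and the offer fraction are all hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # hypothetical model output in [0, 1]
    predicted_value: float    # hypothetical expected future value, in dollars


def retention_offer(customer, churn_threshold=0.5, min_value=200.0,
                    offer_fraction=0.25):
    """Return the dollar value of the offer; 0.0 means "do nothing".

    All three parameters are illustrative assumptions, not course values.
    """
    if customer.churn_probability < churn_threshold:
        return 0.0  # not predicted to churn: needs no attention
    if customer.predicted_value < min_value:
        return 0.0  # predicted to churn, but not worth keeping
    # Predicted to churn and worth keeping: scale the offer to predicted value.
    return round(customer.predicted_value * offer_fraction, 2)


customers = [
    Customer("A", churn_probability=0.8, predicted_value=1000.0),
    Customer("B", churn_probability=0.2, predicted_value=1000.0),
    Customer("C", churn_probability=0.9, predicted_value=50.0),
]
offers = {c.name: retention_offer(c) for c in customers}
print(offers)
```

Only customer A receives an offer here: B is not predicted to leave, and C is predicted to leave but falls below the assumed minimum value.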

4 Convergence of Three Technologies

5 Why Now? 1. Increasing Computing Power
Moore’s law doubles computing power every 18 months.
Powerful workstations became common.
Cost-effective servers (SMPs) provide parallel processing to the mass market.
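To get a feel for what an 18-month doubling period implies, the compounding can be worked out directly. The sketch below takes the slide's 18-month figure at face value; the function name is made up for illustration.

```python
def growth_factor(months, doubling_period_months=18.0):
    """Multiplicative growth after `months`, assuming steady doubling."""
    return 2.0 ** (months / doubling_period_months)


for years in (1.5, 3, 10):
    print(f"{years} years: x{growth_factor(years * 12):.1f}")
```

Under this assumption, computing power grows about a hundredfold in ten years, since 2^(120/18) is roughly 101.6.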

6 2. Improved Data Collection
Data Collection → Access → Navigation → Mining
The more data the better (usually)

7 3. Improved Algorithms (AI + Data Base)
Techniques have often been waiting for computing technology to catch up.
Statisticians were already doing “manual data mining”.
Good machine learning = intelligent application of statistical processes.
A lot of data mining research has focused on tweaking existing techniques to get small percentage gains.

8 Definition: Predictive Model
A “black box” that makes predictions about the future based on information from the past and present.
A large number of inputs is usually available.

9 How are Models Built and Used?
View from 20,000 feet: [diagram not preserved in this transcript]
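As a rough illustration of the build-then-use cycle, the sketch below fits a very simple model on historical records and then checks it on records it has not seen. Everything here is assumed for the example: the synthetic data, the single feature `months_inactive`, and the one-threshold model are not from the lecture.

```python
def make_records(n):
    """Synthetic (months_inactive, churned) pairs with some label noise."""
    records = []
    for i in range(n):
        months_inactive = i % 13
        # Customers inactive for 6+ months churn; every 10th label is flipped.
        churned = (months_inactive >= 6) != (i % 10 == 0)
        records.append((months_inactive, churned))
    return records


def build_model(rows):
    """Pick the inactivity threshold that classifies the most rows correctly."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in rows}):
        correct = sum((x >= threshold) == label for x, label in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def evaluate_model(threshold, rows):
    """Fraction of rows where the thresholded prediction matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


records = make_records(200)
train_rows, test_rows = records[:150], records[150:]  # hold out 50 rows

threshold = build_model(train_rows)
train_accuracy = evaluate_model(threshold, train_rows)
test_accuracy = evaluate_model(threshold, test_rows)
print(threshold, train_accuracy, test_accuracy)
```

Holding out the last 50 records keeps the evaluation honest: the model is scored on data it was not built from.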

10 The Data Mining Process

11 What the Real World Looks Like

12 Predictive Models are…
Decision Trees
Nearest Neighbor Classification
Neural Networks
Rule Induction
K-means Clustering
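Of the techniques listed, nearest neighbor classification is the easiest to show in a few lines: a new point takes the label of the closest labeled example. The sketch below is a minimal 1-nearest-neighbor version using Euclidean distance; the two features and the "stays"/"churns" labels are invented for illustration.

```python
import math


def nearest_neighbor(labeled_points, query):
    """Return the label of the training point closest to `query`."""
    _, label = min(labeled_points,
                   key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical features: (trades per month, support calls per month).
labeled_points = [
    ((1.0, 1.0), "stays"),
    ((1.5, 2.0), "stays"),
    ((6.0, 5.0), "churns"),
    ((7.0, 7.0), "churns"),
]

print(nearest_neighbor(labeled_points, (2.0, 1.5)))
print(nearest_neighbor(labeled_points, (6.5, 6.0)))
```

There is no separate training step in this method: the "model" is the stored set of labeled examples, and all the work happens at prediction time.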

13 Data Mining is Not ...
Data warehousing
SQL / Ad Hoc Queries / Reporting
Software Agents
Online Analytical Processing (OLAP)
Data Visualization

14 Common Uses of Data Mining
Marketing
  Direct mail marketing
  Web site personalization
Fraud Detection
  Credit card fraud detection
Science
  Bioinformatics
  Gene analysis
Web & Text analysis
  Google

15 Course Description
Data Mining and Knowledge Discovery
Focus 1: Theoretical foundations in Pattern Recognition and Machine Learning
  Algorithms: how do they differ, and where do they apply?
Focus 2: Broad survey of recent research
Focus 3: Hands-on: apply algorithms to KDD data sets

16 Topic 1: Foundations
Classification algorithms
Clustering algorithms
Association algorithms
Sequential Data Mining
Novel Applications
  Web
  Customer Relationship Management
  Biological Data

17 Topic 2: Hands On
Apply learned algorithms to selected data sets
Get familiar with existing software packages and libraries
Final Project will involve working with some datasets

18 Prerequisites
Statistics and Probability would help, but not necessary
Pattern Recognition would help
Databases
  Knowledge of SQL and relational algebra, but not necessary
One programming language
  One of Java, C++, Perl, Matlab, etc.
  Will need to read Java Library

19 Grading
Grade Distribution:
Assignments: 30%
Midterm Exam: 30%
Paper Presentation and Presentation: 40%

