Welcome! Knowledge Discovery and Data Mining

Welcome! Knowledge Discovery and Data Mining
Qiang Yang
Hong Kong University of Science and Technology
qyang@cs.ust.hk
http://www.cs.ust.hk
Lecture 1, 2019/5/12, Course Introduction

Data Mining: An Example
You are a marketing manager for a brokerage company.
- Problem: churn (also known as attrition) is too high.
  - Turnover after the six-month introductory period ends is 40%.
- Customers receive incentives (average cost: $160) when an account is opened.
- Giving new incentives to everyone who might leave is very expensive (as well as wasteful).
- Bringing back a customer after they leave is both difficult and costly.

… A Solution
- One month before the introductory period ends, predict which customers will leave.
- If you want to keep a customer who is predicted to churn, offer them something based on their predicted value.
- Customers who are not predicted to churn need no attention.
- If you don't want to keep the customer, do nothing.
How can you predict future behavior?
- Build models
- Test models
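The decision logic on this slide can be sketched in a few lines of Python. This is a minimal sketch under assumptions the slides do not state: the churn model outputs a probability, a probability of 0.5 or more counts as "predicted to churn", and the offer is a fixed fraction (20%) of the customer's predicted value. The names `Customer` and `retention_offer`, the threshold, and the 20% policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # assumed output of some churn model, in [0, 1]
    predicted_value: float    # hypothetical estimate of future value, in dollars
    want_to_keep: bool        # business decision made outside the model


def retention_offer(c: Customer, churn_threshold: float = 0.5,
                    value_fraction: float = 0.2) -> float:
    """Return the dollar amount to offer; 0.0 means take no action.

    churn_threshold and value_fraction are illustrative assumptions.
    """
    if not c.want_to_keep:
        return 0.0  # "If you don't want to keep the customer, do nothing"
    if c.churn_probability < churn_threshold:
        return 0.0  # not predicted to churn: needs no attention
    # Predicted churner we want to keep: offer scales with predicted value.
    return round(value_fraction * c.predicted_value, 2)


customers = [
    Customer("a", 0.8, 1000.0, True),   # likely churner worth keeping
    Customer("b", 0.1, 5000.0, True),   # not predicted to churn
    Customer("c", 0.9, 300.0, False),   # churner we choose not to keep
]
for c in customers:
    print(c.customer_id, retention_offer(c))
```

Only customer "a" receives an offer (200.0); the other two fall into the "no attention" and "do nothing" branches, which is where the savings over a blanket incentive come from.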

Convergence of Three Technologies

Why Now?
1. Increasing Computing Power
- Moore's law: computing power roughly doubles every 18 months (a common restatement; Moore's original observation concerned transistor counts).
- Powerful workstations became common.
- Cost-effective servers (symmetric multiprocessors, SMPs) provide parallel processing to the mass market.
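To make the 18-month doubling rate concrete: after y years there have been 12y/18 doublings, so computing power has grown by a factor of 2^(12y/18). The small helper below (its name, `growth_factor`, is hypothetical) evaluates that formula, taking the slide's 18-month figure as the default.

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Return how many times computing power multiplies over `years`,
    assuming one doubling every `doubling_months` months."""
    doublings = (12.0 * years) / doubling_months
    return 2.0 ** doublings


for years in (3, 6, 15):
    print(years, growth_factor(years))
# 3 4.0
# 6 16.0
# 15 1024.0
```

Fifteen years at this rate means roughly a thousandfold increase, which is why techniques that were impractical on older hardware became usable.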

2. Improved Data Collection
- Data Collection → Access → Navigation → Mining
- The more data the better (usually)

3. Improved Algorithms (AI + Databases)
- Techniques have often been waiting for computing technology to catch up.
- Statisticians were already doing “manual data mining”.
- Good machine learning = intelligent application of statistical processes.
- A lot of data mining research has focused on tweaking existing techniques to get small percentage gains.

Definition: Predictive Model
- A “black box” that makes predictions about the future based on information from the past and present.
- A large number of inputs is usually available.
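One way to picture the "black box" is as an object with two operations: learn from past records, then predict for new ones, with accuracy measured on records held back from training (the "Build models, Test models" steps of the churn example). The sketch below uses a one-level decision tree (a decision stump) on a single feature purely for brevity. The class name, the feature (hypothetically, average trades per month during the introductory period), the rule that low activity means churn, and all data values are made up for illustration.

```python
from typing import List


class ThresholdModel:
    """A deliberately tiny "black box": a one-feature decision stump.

    Assumed rule: predict churn (1) when the feature is <= the learned threshold.
    """

    def __init__(self) -> None:
        self.threshold = 0.0

    def fit(self, xs: List[float], ys: List[int]) -> "ThresholdModel":
        # Try every observed value as the cut-off; keep the one with
        # the fewest training errors (first one wins on ties).
        best_errors = len(ys) + 1
        for t in sorted(set(xs)):
            errors = sum(int(x <= t) != y for x, y in zip(xs, ys))
            if errors < best_errors:
                best_errors, self.threshold = errors, t
        return self

    def predict(self, xs: List[float]) -> List[int]:
        return [int(x <= self.threshold) for x in xs]


def accuracy(pred: List[int], truth: List[int]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical "past" records used to build the model (1 = churned).
past_x = [0.5, 1.0, 1.5, 2.0, 6.0, 8.0, 9.5, 12.0]
past_y = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical held-out records used to test it.
new_x = [1.2, 7.0, 3.0, 0.8]
new_y = [1, 0, 1, 1]

model = ThresholdModel().fit(past_x, past_y)
predictions = model.predict(new_x)
print(model.threshold, predictions, accuracy(predictions, new_y))
# 1.5 [1, 0, 0, 1] 0.75
```

The model fits its training data perfectly but misses one held-out churner, which is exactly why models are tested on data they were not built from.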

How are Models Built and Used?
View from 20,000 feet

The Data Mining Process

What the Real World Looks Like

Predictive Models are…
- Decision Trees
- Nearest Neighbor Classification
- Neural Networks
- Rule Induction
- K-means Clustering
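As an example of one technique from this list, here is a minimal k-nearest-neighbor classifier in Python. It assumes numeric feature vectors, Euclidean distance, and a majority vote among the k closest training records; the two features and the "churn"/"stay" training data are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Classify `query` by majority vote among its k nearest training
    records, using Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers described by two numeric features.
train = [
    ((1.0, 1.0), "churn"), ((1.5, 2.0), "churn"), ((2.0, 1.0), "churn"),
    ((6.0, 7.0), "stay"), ((7.0, 8.0), "stay"), ((8.0, 6.5), "stay"),
]
print(knn_predict(train, (1.8, 1.5)))  # churn
print(knn_predict(train, (6.5, 7.0)))  # stay
```

There is no separate training step: the "model" is the stored data, and all work happens at prediction time. An odd k avoids tied votes when there are two classes.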

Data Mining is Not ...
- Data warehousing
- SQL / Ad Hoc Queries / Reporting
- Software Agents
- Online Analytical Processing (OLAP)
- Data Visualization

Common Uses of Data Mining
- Marketing
  - Direct mail marketing
  - Web site personalization
- Fraud Detection
  - Credit card fraud detection
- Science
  - Bioinformatics
  - Gene analysis
- Web & Text Analysis
  - Google

Course Description
Data Mining and Knowledge Discovery
- Focus 1: Theoretical foundations in pattern recognition and machine learning algorithms: how do they differ, and where do they apply?
- Focus 2: Broad survey of recent research
- Focus 3: Hands-on application of algorithms to KDD data sets

Topic 1: Foundations
- Classification algorithms
- Clustering algorithms
- Association algorithms
- Sequential data mining
- Novel applications
  - Web
  - Customer Relationship Management
  - Biological data
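Association algorithms such as Apriori look for rules of the form X → Y and rank them by two standard measures: support (the fraction of transactions containing X and Y together) and confidence (the fraction of transactions containing X that also contain Y). The sketch below only computes these two measures for one given rule, not the rule search itself; the function names and the shopping-basket transactions are illustrative.

```python
from typing import List, Set


def count(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Estimated P(rhs | lhs) for the rule lhs -> rhs."""
    return count(transactions, lhs | rhs) / count(transactions, lhs)


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
lhs, rhs = {"milk", "diapers"}, {"beer"}
print(support(baskets, lhs | rhs))              # 0.4
print(round(confidence(baskets, lhs, rhs), 3))  # 0.667
```

Two of the five baskets contain all three items (support 0.4), and two of the three baskets with milk and diapers also contain beer (confidence 2/3).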

Topic 2: Hands-On
- Apply learned algorithms to selected data sets
- Get familiar with existing software packages and libraries
- The final project will involve working with some data sets

Prerequisites
- Statistics and probability would help, but are not necessary
- Pattern recognition would help
- Databases: knowledge of SQL and relational algebra would help, but is not necessary
- One programming language: one of Java, C++, Perl, Matlab, etc.
  - You will need to read a Java library

Grading
Grade distribution:
- Assignments: 30%
- Midterm exam: 30%
- Paper presentation and final project: 40%