
Special Topics in Data Mining

Direct Objectives
- To learn data mining techniques
- To see their use in real-world and research applications
- To get an understanding of the methodological principles behind data mining
- To be able to read about data mining in the popular press with a critical eye
- To implement and use data mining models using DM software

Grade Structure
- Review Paper & Presentation: 30%
- Final Project Implementation & Presentation: 40%
- Final Project Paper: 30%

Data Mining in a Specific Field (topics for the Review Paper)
- Data Mining in Security
- Data Mining in Telecommunications and Control
- Text and Web Mining
- Data Mining in Biomedicine and Science
- Data Mining for Insurance
- Data Mining in Banking and Commercial
- Data Mining in Sales, Marketing and Finance
- Data Mining in Business

What is Data Mining?
Not well defined. Because data mining is a confluence of multiple disciplines, no one can agree on what it is. In fact, the experts give very different descriptions: different fields have different views of what data mining is (and different terminology).

What is Data Mining?
Data mining is a confluence of multiple disciplines:
- Database Technology
- Statistics
- Machine Learning
- Visualization
- Information Science
- Other Disciplines

What is Data Mining?
- “finding interesting structure (patterns, statistical models, relationships) in data bases” (Fayyad, Chaudhuri et al.)
- “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.” (Fayyad)

What is Data Mining?
- “a knowledge discovery process of extracting previously unknown, actionable information from very large data bases” (Zorne)
- “a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.” (Edelstein)

What is Data Mining?
- Data mining is the process of extracting hidden patterns from data.
- Data mining is the process of discovering new patterns from large data sets, involving methods from statistics and artificial intelligence but also database management.
- “data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner” (Hand, Mannila, Smyth)

What is Data Mining? Knowledge Discovery in Databases (KDD)
Data Mining, also popularly known as Knowledge Discovery in Databases (KDD)... The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the following steps (from Zaiane):
- Data cleaning: ...
- Data integration: ...
- Data selection: ...
- Data transformation: ...
- Data mining: the crucial step, in which clever techniques are applied to extract potentially useful patterns.
- Pattern evaluation: ...
- Knowledge representation: ...

Software
You can use any software you like, but you must know how to input, manipulate, graph, and analyze data.
Examples: SAS, Weka, SPSS, Systat, Enterprise Miner, JMP, Minitab, Matlab, SQL Server

Data Data Data
It’s all about the data. Where does it come from?
- WWW
- Gene data
- Business processes/transactions
- Telecommunications and networking
- Medical imagery
- Government, demographics (data.gov!)
- Sensor networks
- Sports

What is Data?
A collection of objects and their attributes.
- An attribute is a property or characteristic of an object.
  - Examples: eye color of a person, temperature, etc.
  - An attribute is also known as a variable, field, characteristic, or feature.
- A collection of attributes describes an object.
  - An object is also known as a record, point, case, sample, entity, or instance.
- Attribute values are numbers or symbols assigned to an attribute.

Record Data Data that consists of a collection of records, each of which consists of a fixed set of attributes
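
As a rough illustration of record data, the sketch below stores each object as a dictionary over one fixed set of attributes. The attribute names and values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical attribute names, used only to illustrate a fixed attribute set.
ATTRIBUTES = ("name", "eye_color", "temperature_c")

# Each record (object) carries a value for every attribute.
records = [
    {"name": "A", "eye_color": "brown", "temperature_c": 36.6},
    {"name": "B", "eye_color": "blue", "temperature_c": 37.1},
]


def is_record_data(rows, attributes):
    """Return True if every row has exactly the given fixed set of attributes."""
    expected = set(attributes)
    return all(set(row) == expected for row in rows)


def column(rows, attribute):
    """Return the values of one attribute across all records."""
    return [row[attribute] for row in rows]
```

Reading one attribute across all records, as in `column(records, "eye_color")`, is the usual way such a table is analyzed column by column.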

Document Data
Each document becomes a 'term' vector:
- each term is a component (attribute) of the vector,
- the value of each component is the number of times the corresponding term occurs in the document.
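
A minimal sketch of the term-vector idea, assuming whitespace tokenization and lowercasing (both simplifications, and the two sample documents are made up for illustration):

```python
from collections import Counter


def term_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


# Illustrative documents, not taken from the slides.
docs = ["data mining finds patterns in data", "mining text data"]

# The vocabulary fixes the order of the vector components.
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [term_vector(doc, vocabulary) for doc in docs]
```

Every document maps to a vector of the same length, so documents can then be treated like ordinary record data.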

Transaction Data
A special type of record data, where each record (transaction) involves a set of items.
- For example, consider a grocery store. The set of products purchased by a customer during one shopping trip constitutes a transaction, while the individual products that were purchased are the items.
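
The grocery-store example can be sketched as a list of item sets. The baskets below are invented for illustration; counting single items and item pairs is only a first step toward market basket analysis, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Each transaction is the set of items bought in one shopping trip (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def item_counts(baskets):
    """Number of transactions that contain each item."""
    counts = Counter()
    for basket in baskets:
        counts.update(basket)
    return counts


def pair_counts(baskets):
    """Number of transactions that contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts
```

Sorting each basket before forming pairs keeps `("beer", "diapers")` and `("diapers", "beer")` from being counted separately.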

Transaction Data
Weblogs, phone calls…
, -, 3/22/00, 10:35:11, W3SVC, SRVR1, , 781, 363, 875, 200, 0, GET, /top.html, -,
, -, 3/22/00, 10:35:16, W3SVC, SRVR1, , 5288, 524, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 10:35:17, W3SVC, SRVR1, , 30, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 16:18:50, W3SVC, SRVR1, , 60, 425, 72, 304, 0, GET, /top.html, -,
, -, 3/22/00, 16:18:58, W3SVC, SRVR1, , 8322, 527, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 16:18:59, W3SVC, SRVR1, , 0, 280, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:54:37, W3SVC, SRVR1, , 140, 199, 875, 200, 0, GET, /top.html, -,
, -, 3/22/00, 20:54:55, W3SVC, SRVR1, , 17766, 365, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 20:54:55, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:55:07, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:55:36, W3SVC, SRVR1, , 1061, 382, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 20:55:36, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:55:39, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:56:03, W3SVC, SRVR1, , 1081, 382, 414, 200, 0, POST, /spt/main.html, -,
, -, 3/22/00, 20:56:04, W3SVC, SRVR1, , 0, 258, 111, 404, 3, GET, /spt/images/bk1.jpg, -,
, -, 3/22/00, 20:56:33, W3SVC, SRVR1, , 0, 262, 72, 304, 0, GET, /top.html, -,
, -, 3/22/00, 20:56:52, W3SVC, SRVR1, , 19598, 382, 414, 200, 0, POST, /spt/main.html, -,

Graph Data Examples: Generic graph and HTML Links

Ordered Data Genomic sequence data

Time Series Data

Spatio-Temporal Data Average Monthly Temperature of land and ocean

Relational Data
Sample rows (some fields are missing in the transcript):
, Doe, John, 12 Main St, , Madison, NJ,
,Trank, Jill, 11 Elm St, , Chester, NJ, …
07911, Chester, NJ, 07954, 34000,, 40.65, , Madison, NJ, 56000, , …
- Most large data sets are stored in relational databases
- Special data query language: SQL
- Commercial vendors: Oracle, Microsoft, IBM
- Good open source versions: MySQL, PostgreSQL
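
To give a feel for querying relational data with SQL, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `person` and its column names are assumptions made for this example; only the name, street, city and state values are taken from the sample rows on the slide.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: table and column names are illustrative assumptions.
conn.execute(
    "CREATE TABLE person ("
    "last_name TEXT, first_name TEXT, street TEXT, city TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        ("Doe", "John", "12 Main St", "Madison", "NJ"),
        ("Trank", "Jill", "11 Elm St", "Chester", "NJ"),
    ],
)

# Select the people living in one city (parameterized query).
madison = conn.execute(
    "SELECT first_name, last_name FROM person WHERE city = ? ORDER BY last_name",
    ("Madison",),
).fetchall()

# Aggregate: number of people per state.
per_state = conn.execute(
    "SELECT state, COUNT(*) FROM person GROUP BY state ORDER BY state"
).fetchall()
```

The same `SELECT` statements would run with little or no change on MySQL or PostgreSQL; only the connection code differs.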

Data Quality
- What kinds of data quality problems are there?
- How can we detect problems with the data?
- What can we do about these problems?
Examples of data quality problems:
- Noise and outliers
- Missing values
- Duplicate data

Noise
Noise refers to modification of original values.
- Examples: distortion of a person’s voice when talking on a poor phone connection, and “snow” on a television screen
Figure panels: Two Sine Waves; Two Sine Waves + Noise
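
The two-panel figure can be approximated with a short sketch: a signal built from two sine waves, and the same signal with random noise added. The frequencies, the Gaussian noise model, and the noise level are all assumptions for illustration; the slide does not state them.

```python
import math
import random


def two_sine_waves(t, f1=1.0, f2=3.0):
    """Sum of two sine waves; the frequencies f1 and f2 are assumed values."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def add_noise(values, sigma, seed=0):
    """Return a copy of values with zero-mean Gaussian noise added to each one."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    return [v + rng.gauss(0.0, sigma) for v in values]


times = [i / 100 for i in range(100)]
clean = [two_sine_waves(t) for t in times]
noisy = add_noise(clean, sigma=0.3)
```

With `sigma=0.0` the values are unchanged, which makes it easy to see that the noise is a modification layered on top of the original values.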

Outliers Outliers are data objects with characteristics that are considerably different than most of the other data objects in the data set
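
One common heuristic for flagging outliers in a single numeric attribute is the interquartile range (IQR) rule. It is shown here only as an example of detection; the slides do not prescribe a method, and the multiplier `k=1.5` is a conventional default rather than anything from the source.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up measurements with one value far from the rest.
measurements = [10, 12, 11, 13, 12, 11, 10, 12, 95]
flagged = iqr_outliers(measurements)
```

Whether a flagged value is an error or a genuinely interesting observation still has to be decided by looking at the data.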

Missing Values
Reasons for missing values:
- Information is not collected (e.g., people decline to give their age and weight)
- Attributes may not be applicable to all cases (e.g., annual income is not applicable to children)
Handling missing values:
- Eliminate data objects
- Estimate missing values
- Ignore the missing value during analysis
- Replace with all possible values (weighted by their probabilities)
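
The first three handling strategies can be sketched on a tiny table in which `None` marks a missing value. The records are invented, and estimating with the attribute mean is just one simple choice of estimator, assumed here for illustration.

```python
import statistics

# Made-up records; None marks a missing value.
people = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 48000},
    {"age": 40, "income": None},
    {"age": 50, "income": 61000},
]


def eliminate(rows):
    """Strategy 1: drop every object that has any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def estimate_with_mean(rows, attribute):
    """Strategy 2: fill missing values of one attribute with its observed mean."""
    observed = [r[attribute] for r in rows if r[attribute] is not None]
    mean = statistics.mean(observed)
    return [
        {**r, attribute: mean if r[attribute] is None else r[attribute]}
        for r in rows
    ]


def mean_ignoring_missing(rows, attribute):
    """Strategy 3: compute a statistic over the observed values only."""
    return statistics.mean(r[attribute] for r in rows if r[attribute] is not None)
```

Elimination shrinks the data set (here from four objects to two), while the other two strategies keep every object at the cost of relying on the observed values.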

Duplicate Data
A data set may include data objects that are duplicates, or almost duplicates, of one another.
- A major issue when merging data from heterogeneous sources
Examples:
- Same person with multiple addresses
Data cleaning:
- The process of dealing with duplicate data issues
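
A minimal sketch of merging near-duplicate records for the "same person with multiple addresses" case. It assumes that two records refer to the same person whenever their names match after lowercasing and whitespace cleanup, which is a deliberately naive rule; the sample records and field names are hypothetical.

```python
def normalize(name):
    """Lowercase a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def merge_duplicates(rows):
    """Merge records with the same normalized name, keeping all distinct addresses."""
    merged = {}
    for row in rows:
        key = normalize(row["name"])
        entry = merged.setdefault(key, {"name": row["name"], "addresses": []})
        if row["address"] not in entry["addresses"]:
            entry["addresses"].append(row["address"])
    return list(merged.values())


# Hypothetical records from two sources.
customers = [
    {"name": "John Doe", "address": "12 Main St"},
    {"name": "john  doe", "address": "5 Oak Ave"},
    {"name": "Jill Trank", "address": "11 Elm St"},
]
people_merged = merge_duplicates(customers)
```

Real data cleaning usually needs more careful matching (for example on several attributes at once), since different people can share a name.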

Examples of Data Mining Successes
- Market Basket (WalMart)
- Recommender Systems (Amazon.com)
- Fraud Detection in Telecommunications (AT&T)
- Target Marketing / CRM
- Financial Markets
- DNA Microarray analysis
- Web Traffic / Blog analysis