An Introduction to Data Mining. Hosein Rostani, Alireza Zohdi. Report 1 for the “Advanced Database” course. Supervisor: Dr. Masoud Rahgozar. December 2007.

Presentation transcript:

1 An Introduction to Data Mining
Hosein Rostani, Alireza Zohdi
Report 1 for the “Advanced Database” course
Supervisor: Dr. Masoud Rahgozar
December 2007

2 Outline
- Why data mining?
- Data mining applications
- Data mining functionalities
  - Concept description
  - Association analysis
  - Outlier analysis
  - Evolution analysis
  - Classification
  - Clustering

3 Why data mining?
- Motivation:
  - Wide availability of huge amounts of data
  - Need for turning data into useful information and knowledge
- Data mining:
  - Extracting or “mining” knowledge from large amounts of data
  - Knowledge: useful patterns
  - A semiautomatic process
  - Focus on automatic aspects

4 Data mining applications
- Prediction. Examples:
  - Credit risk
  - Customers switching to competitors
  - Fraudulent phone calling card usage
- Associations. Examples:
  - Related books to buy
  - Related accessories to suggest, e.g. for a camera
  - Causation discovery, e.g. in medicine
- Clusters. Example:
  - Clusters of disease

5 Data mining functionalities
- Concept description
  - Characterization & discrimination
- Association analysis
- Outlier analysis
- Evolution analysis
- Classification and prediction
- Clustering

6 Concept description
- Description of concepts in summarized, concise & precise terms
- Ways:
  - Data characterization: summarizing the data of the target class in general terms
  - Data discrimination: comparing the target class with the contrasting class(es)
- Examples of output forms: pie charts, bar charts, curves & multidimensional tables
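
Characterization and discrimination can be sketched as simple frequency summaries. The snippet below is only an illustration: the function names, the `age` attribute, and the two small customer groups are assumptions made up for this example, not data from the report.

```python
from collections import Counter

def characterize(rows, attribute):
    """Summarize one class: relative frequency of each value of `attribute`."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def discriminate(target, contrast, attribute):
    """Compare the target class with a contrasting class, value by value.

    Returns {value: (share in target, share in contrast)}.
    """
    t = characterize(target, attribute)
    c = characterize(contrast, attribute)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}

# Hypothetical data: the target class and one contrasting class.
big_spenders = [{"age": "40-49"}, {"age": "40-49"}, {"age": "30-39"}, {"age": "40-49"}]
budget_buyers = [{"age": "20-29"}, {"age": "30-39"}, {"age": "20-29"}, {"age": "20-29"}]

summary = characterize(big_spenders, "age")
comparison = discriminate(big_spenders, budget_buyers, "age")
```

The resulting dictionaries are the kind of table that could then be drawn as a pie or bar chart.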

7 Association analysis
- Mining frequent patterns
  - For discovery of interesting associations within data
- Kinds of frequent patterns:
  - Frequent itemset: a set of items that frequently appear together, e.g. milk and bread
  - Frequent subsequence: e.g. a pattern of customers’ purchases: first a PC, then a digital camera & then a memory card
  - Frequent substructure: structural forms such as graphs, trees, or lattices
- Support and confidence
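
Support and confidence, the two measures named above, can be computed directly from a list of transactions. This is a minimal sketch under the usual definitions (support as the fraction of transactions containing an itemset, confidence of A → B as support(A ∪ B) / support(A)); the four transactions are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical market-basket data.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
    frozenset({"milk"}),
]

s = support(transactions, {"milk", "bread"})       # 2 of 4 baskets
c = confidence(transactions, {"milk"}, {"bread"})  # 2 of the 3 milk baskets
```

A rule such as milk → bread is usually reported only if both values exceed user-chosen minimum thresholds.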

8 Outlier Analysis
- Outliers: data objects that do not comply with the general behavior of the data
- Approaches to outliers:
  - Discard as noise or exceptions
  - Keep for applications such as fraud detection
    - Example: detecting fraudulent usage of credit cards
- Ways:
  - Using statistical tests
  - Using distance measures
  - Using deviation-based methods
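
One simple statistical test for outliers is the z-score: flag values that lie more than a chosen number of standard deviations from the mean. The sketch below assumes a threshold of 2.0 and a made-up list of card charges; both are illustrative choices, and real fraud detection would use more robust methods.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical credit card charges; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = zscore_outliers(amounts)
```

Note that a single extreme value inflates the standard deviation itself, which is why small samples need a fairly low threshold with this test.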

9 Evolution Analysis
- Description and modeling of trends
  - For objects whose behavior changes over time
- Ways:
  - Applying other data mining tasks to time-related data
    - Association analysis, classification, prediction, clustering, …
  - Distinct ways:
    - Time-series data analysis
    - Sequence or periodicity pattern matching
    - Similarity-based data analysis
- Example: stock market: predicting future trends in prices
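
As a small illustration of time-series trend analysis, a moving average smooths short-term fluctuation so the underlying trend becomes visible. The technique is a common choice rather than one the slides name, and the price series and window size are assumptions for this sketch.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical daily closing prices.
prices = [10, 12, 11, 13, 15, 14, 16]
trend = moving_average(prices, 3)

# The trend is rising if every smoothed value exceeds the previous one.
rising = all(later > earlier for earlier, later in zip(trend, trend[1:]))
```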

10 Classification and Prediction
- Classification: the process of finding a model that distinguishes data classes
  - Purpose: using the model to predict the class of new objects
- Deriving the model:
  - Based on the analysis of a set of training data
  - Training data: data objects with known class labels
- Example: in a credit card company
  - Classification of customers based on their payment history
  - Prediction of a new customer’s creditworthiness

11 Classification
- A two-step process for classification:
  - First: learning or training step
    - Building the classifier by analyzing or learning from training data
  - Second: classifying step
    - Using the classifier for classification
- Accuracy of a classifier (on a given test set):
  - Percentage of test set tuples correctly classified by the classifier
- Classification methods:
  - Decision tree, naïve Bayesian classification, neural network, k-nearest neighbor classification, …
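
The two steps and the accuracy measure can be sketched with k-nearest neighbor classification, one of the methods listed above. For k-NN the "training" step amounts to storing the labeled tuples; the feature values, labels, and k = 3 below are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(training, point, k=3):
    """Classify `point` by majority vote among its k nearest training tuples."""
    neighbours = sorted(training, key=lambda pair: dist(pair[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(training, test, k=3):
    """Fraction of test tuples whose predicted class matches the known class."""
    correct = sum(
        1 for features, label in test
        if knn_predict(training, features, k) == label
    )
    return correct / len(test)

# Step 1 (learning): hypothetical labeled training tuples.
training = [
    ((1.0, 1.0), "good"), ((1.2, 0.8), "good"), ((0.9, 1.1), "good"),
    ((5.0, 5.0), "bad"), ((5.2, 4.8), "bad"), ((4.9, 5.1), "bad"),
]

# Step 2 (classifying): evaluate on a held-out test set with known labels.
test = [((1.1, 1.0), "good"), ((5.1, 5.0), "bad"), ((1.0, 0.9), "bad")]
acc = accuracy(training, test)
```

The third test tuple is deliberately labeled against its neighborhood, so the classifier gets two of the three test tuples right.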

12 Decision tree
- Decision tree induction:
  - Learning of decision trees from class-labeled training tuples
- Decision tree: a flowchart-like tree structure
  - Internal nodes: tests on attributes
  - Branches: outcomes of the tests
  - Leaves: class labels
- Usage in classification:
  - Prediction by tracing a path from the root to a leaf node
  - Testing the attribute values of a new tuple against the decision tree
- A decision tree is easily converted to classification rules

13 Decision tree example: Does a customer buy a computer?
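
Tracing a path from the root to a leaf, and converting the tree to rules, can be sketched with nested dictionaries. The tree below is a hypothetical one for the buys-a-computer question: its attributes (`age`, `student`, `credit_rating`) and branch values are assumptions for illustration and are not taken from the slide.

```python
# Internal nodes are dicts (attribute test + branches); leaves are class labels.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {
            "attribute": "student",
            "branches": {"yes": "buys", "no": "does not buy"},
        },
        "middle_aged": "buys",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"fair": "buys", "excellent": "does not buy"},
        },
    },
}

def classify(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node

def to_rules(node, conditions=()):
    """Emit one IF-THEN rule per root-to-leaf path."""
    if not isinstance(node, dict):
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules.extend(to_rules(child, conditions + ((node["attribute"], value),)))
    return rules

prediction = classify(tree, {"age": "youth", "student": "yes"})
rules = to_rules(tree)
```

Each leaf yields exactly one rule, so the rule set has as many entries as the tree has leaves.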

14 Bayesian Classification
- Bayesian classification
  - Predicting the probability that a new tuple belongs to a particular class
  - High accuracy and speed in large databases
- Based on Bayes’ theorem
  - Conditional probability
- Naïve Bayesian classifier
  - Assumption: class conditional independence
  - Good for simplifying computations
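
Under class conditional independence, the posterior for a class is proportional to the class prior times the product of per-attribute conditional probabilities. The sketch below estimates these from counts for categorical attributes; it omits smoothing for zero counts, and the five training tuples (attributes `student` and `credit_rating`) are invented for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def posterior(model, row):
    """P(class | row), assuming attributes are independent given the class."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            score *= value_counts[(label, i)][value] / count  # P(value | class)
        scores[label] = score
    norm = sum(scores.values())
    return {label: (s / norm if norm else 0.0) for label, s in scores.items()}

# Hypothetical training tuples: (student, credit_rating).
rows = [("yes", "fair"), ("yes", "excellent"), ("no", "fair"),
        ("no", "excellent"), ("no", "fair")]
labels = ["buys", "buys", "buys", "does_not_buy", "does_not_buy"]

model = train_naive_bayes(rows, labels)
probs = posterior(model, ("no", "fair"))
```

For the query tuple, the unnormalized scores are 3/5 · 1/3 · 2/3 for `buys` and 2/5 · 2/2 · 1/2 for `does_not_buy`, which normalize to 0.4 and 0.6.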

15 Clustering
- The process of grouping a set of physical or abstract objects into classes of similar objects
  - Generating class labels for objects that currently have no label
- Clustering is based on this principle:
  - Maximizing the intraclass similarity and minimizing the interclass similarity
- Clustering also facilitates taxonomy formation
  - Hierarchical organization of observations
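
The slides do not name a specific clustering algorithm; as one common illustration of the principle, k-means repeatedly assigns each object to its nearest centroid and then moves each centroid to the mean of its cluster. The points and starting centroids below are assumptions picked so the run is deterministic.

```python
from math import dist

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centroids, max_iterations=20):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid in place
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical unlabeled objects forming two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

The index of the cluster an object ends up in serves as its generated class label.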

16 An example: clustering customers in a restaurant
[Figure: a restaurant database is preprocessed into an object view for clustering; clustering yields a set of similar object clusters; summarization labels them, e.g. “Young at midnight”, “White collar for dinner”, “Retired for lunch”]

17 Steps of database clustering
1. Define object-view
2. Select relevant attributes
3. Generate suitable input format for the clustering tool
4. Define similarity measure
5. Select parameter settings for the chosen clustering algorithm
6. Run clustering algorithm
7. Characterize the computed clusters

18 Challenge: database clustering
- Data collections are in many different formats:
  - Flat files
  - Relational databases
  - Object-oriented databases
- Flat file format:
  - The simplest and most frequently used format in the traditional data analysis area
- Databases are more complex than flat files

19 Challenge: database clustering (cont.)
- Challenge: changing clustering algorithms to become more directly applicable to real-world databases
- Issues related to databases:
  - Different types of objects in the DB
  - Relationships between objects: 1:1, 1:n & n:m
  - Complexity in the definition of object similarity
    - Due to the presence of bags of values for an object
  - Difficulty in selecting an appropriate similarity measure
    - Due to the presence of different types for attributes of objects
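
To make the two similarity issues concrete, the sketch below combines a numeric attribute, a categorical attribute, and a bag-valued attribute (as arises from a 1:n relationship) into one score. The multiset Jaccard measure, the equal weighting, the attribute names, and the age range of 50 are all assumptions for illustration, not the measure proposed in the cited methodology.

```python
from collections import Counter

def bag_similarity(a, b):
    """Multiset Jaccard: shared occurrences divided by total distinct occurrences."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0

def numeric_similarity(x, y, value_range):
    """1.0 for equal values, falling linearly to 0.0 across `value_range`."""
    return max(0.0, 1.0 - abs(x - y) / value_range)

def object_similarity(o1, o2, age_range=50.0):
    """Unweighted mean of per-attribute similarities for mixed attribute types."""
    parts = [
        numeric_similarity(o1["age"], o2["age"], age_range),  # numeric
        1.0 if o1["job"] == o2["job"] else 0.0,               # categorical
        bag_similarity(o1["orders"], o2["orders"]),           # bag of values
    ]
    return sum(parts) / len(parts)

# Hypothetical customer objects; `orders` comes from a 1:n relationship.
alice = {"age": 30, "job": "engineer", "orders": ["pizza", "pizza", "salad"]}
bob = {"age": 40, "job": "engineer", "orders": ["pizza", "soup"]}

orders_sim = bag_similarity(alice["orders"], bob["orders"])
overall = object_similarity(alice, bob)
```

Changing the weights or the per-type measures changes which objects count as similar, which is why this choice is listed as a difficulty.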

20 References
- Han, J., Kamber, M., Data Mining: Concepts and Techniques, Second Edition, Elsevier Inc., 2006, 770 p., ISBN
- Silberschatz, A., Korth, H. F., Sudarshan, S., Database System Concepts, Fifth Edition, McGraw-Hill, 2005, ISBN
- Ryu, T., Eick, C., A Database Clustering Methodology and Tool, Information Sciences 171(1-3): (2005).