Data Mining Lecture 1.

Instructor Info
  Name: Ertan Karakurt
  Contact: ertankarakurt@akillisistemler.com.tr
  10+ years of experience in Data Mining and Intelligent Applications Development:
    General Purpose Data Mart Development for Financial Modeling
    Behavioral Clustering of Retail Customers in the Banking Sector
    Propensity Modeling for Cross Selling
    Attrition/Retention Modeling
    Modeling Algorithms Library Development for Defense
    ...

Instructor Info
  Ertan Karakurt, founder of İzmir-based Akıllı Sistemler
  Fuzzy/exact searching/matching engine for databases: search space analyzing, learning algorithm space analyzing, learning parallelization architecture

Course Objective
  Stimulate cooperation between university and industry
  Create an opportunity to work with real-life applications and problems in Data Mining:
    Case studies on data dictionaries
    Case studies on physically built data mining models
  Find and use the balance point between theory and application in Data Mining

Course Syllabus
  Course topics:
  Introduction (Week 1 - Week 2)
    What is Data Mining?
    Data Collection and Data Management Fundamentals
    The Essentials of Learning
    The Emerging Needs for Different Data Analysis Perspectives
  Data Management and Data Collection Techniques for Data Mining Applications (Week 3 - Week 4)
    Data Warehouses: Gathering Raw Data from Relational Databases and Transforming It into Information
    Information Extraction and Data Processing Techniques
    Data Marts: The Need for Building Highly Specialized Data Storage for Data Mining Applications

Course Syllabus
  Case Study 1: Working with and exploring the properties of the Retail Banking Data Mart (Week 4, Assignment 1)
  Data Analysis Techniques (Week 5)
    Statistical Background
    Trends / Outliers / Normalizations
    Principal Component Analysis
    Discretization Techniques
  Case Study 2: Working with and exploring the properties of the discretization infrastructure of the Retail Banking Data Mart (Week 5, Assignment 2)
  Lecture Talk: In-class discussion

Course Syllabus
  Clustering Techniques (Week 6)
    K-Means Clustering
    Condorcet Clustering
    Other Clustering Techniques
  Case Study 3: Working with and exploring the properties of the clustering infrastructure for the Retail Banking Data Mart (Week 6, Assignment 3)
  Lecture Talk: In-class discussion

Course Syllabus
  Classification Techniques (Week 7 - Week 9)
    Inductive Learning
    Decision Tree Learning
    Association Rules
    Regression
    Probabilistic Reasoning
    Bayesian Learning
  Case Study 4: Working with and exploring the properties of the classification infrastructure of the Propensity Score Card System for Retail Banking (Week 9, Assignment 4)

Course Syllabus
  Prediction Techniques (Week 10 - Week 11)
    Neural Networks
    Radial Basis Networks
    Reinforcement Learning
  Case Study 5: Working with and exploring the properties of the prediction infrastructure of the Propensity Score Card System for Retail Banking (Week 11, Assignment 5)
  Other Classification and Prediction Techniques (Week 12 - Week 13)
    Text Mining and Web Mining
    Explanation Based Learning
    Rule Based Learning
    Genetic Algorithms
    Recurrent Networks
  Case Study 6: Working with and exploring the properties of the Genetic Algorithms infrastructure for Neural Network Topology Estimation (Week 13, Assignment 6)

Course Syllabus
  Assessment:
    One midterm examination (35%)
    One final examination (55%)
    In-class reviewed, case-study-based assignments (10%)
  There will be six assignments, one for each reviewed case study. Students are encouraged to complete the assignments in groups of two or three.

Course Syllabus
  Text Book:
    Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.
  Supplementary Books:
    T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.
    P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2006. ISBN: 0-321-32136-7
    Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
    C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
    R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2001.

Week 1 - What Is Data Mining?
  "Drowning in Data yet Starving for Knowledge" ???
  "Computers have promised us a fountain of wisdom but delivered a flood of data"
    William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus

Week 1 - What Is Data Mining?
  Data flood
  The information society produces vast amounts of data. Data are generated by:
    Bank, telecom, and other business transactions ...
    Scientific data: astronomy, biology, etc.
    Web, text, image, and e-commerce

Week 1 - What Is Data Mining?
  AT&T handles billions of calls per day
    As of 2003, according to the Winter Corp. survey, AT&T has a 26 TB decision-support database.
  Web
    1998: 26 million pages
    2003: Google searches 4+ billion pages, many hundreds of TB
    2005: Google searches 8+ billion pages
    2008: 1+ trillion (1,000,000,000,000) pages

Week 1 - What Is Data Mining?
  UC Berkeley 2003 estimate: 5 exabytes (5 million terabytes) of new data were created in 2002.
  Twice as much information was created in 2002 as in 1999 (growth rate: about 30% a year).
  Other growth rate estimates are even higher.
  Very little of this data will ever be looked at by a human.
  Tools are needed to make sense of the data and put it to use.
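
The two figures on this slide can be checked with a few lines of arithmetic. The sketch below is only a sanity check, assuming compound annual growth and decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes); the variable names are illustrative and not part of the lecture.

```python
# Sanity check of the slide's figures (compound growth and decimal units are assumptions).
annual_growth = 0.30            # "about 30% a year", as quoted on the slide
years = 2002 - 1999             # three years between the two reference points
factor = (1 + annual_growth) ** years
print(f"Growth factor over {years} years: {factor:.2f}")

# Unit conversion: exabytes to terabytes
exabytes = 5
terabytes = exabytes * 10**18 // 10**12
print(f"{exabytes} EB = {terabytes:,} TB")
```

A factor a little above 2 is consistent with "twice as much information", and 5 EB does come out to 5 million TB.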

Week 1 - What Is Data Mining?
  Data: raw, atomic
  Information: processed, re-organized, grouped
  Knowledge: patterns, models, findings 'behind' Information
  Wisdom: perfect orchestration of Knowledge
  [Figure labels: Data (Operation), Information (Analytic), Data, Knowledge, Wisdom]
  "Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?"
    T. S. Eliot

Week 1 - What Is Data Mining?
  Hypothesis: current databases contain a lot of potentially important knowledge that can be used for wise decision-making.
  Mission of DM: find it!

Week 1 - What Is Data Mining?
  Data Mining (alternative name: Knowledge Discovery in Databases, KDD) definitions:
    Mining knowledge from data
    The process of extracting interesting (non-trivial, implicit, previously unknown, and potentially useful) knowledge or patterns from data in large databases
    Discovering knowledge that characterizes general properties of data
    Discovering patterns in previous and current data in order to make predictions on future data

Week 1 - What Is Not Data Mining?
  "Torturing data until it confesses ... and if you torture it enough, it will confess to anything"
    Jeff Jonas, IBM
  "An Unethical Econometric practice of massaging and manipulating the data to obtain the desired results"
    W.S. Brown, "Introducing Econometrics"
  "A buzz word for what used to be known as DBMS reports"
    An Anonymous Data Mining Skeptic

Week 1 - What Is Data Mining?
  Data Mining - an interdisciplinary field
    Databases
    Statistics
    High Performance Computing
    Machine Learning
    Visualization
    Mathematics

Week 1 - What Is Data Mining?
  Data Mining - an interdisciplinary field
    Large data sets in Data Mining
      Efficiency of algorithms is important
      Scalability of algorithms is important
    Real-world data
      Lots of missing values
      Pre-existing data, not synthetic
      Data not static, prone to updates
      Domain knowledge available in the form of integrity constraints
    Exploratory data analysis

Week 1 - Data Mining Application Examples
  Credit Assessment
  Stock Market Prediction
  Fault Diagnosis in Production Systems
  Medical Discovery
  Fraud Detection
  Hazard Forecasting
  Buying Trends Analysis
  Organizational Restructuring
  Target Mailing
  ...

Week 1 - Data Mining Application Examples
  Can I develop a general characterization/profile of different investor types? (characterization)
  What characteristics distinguish between Online and Broker investors? (classification)
  Can I develop a model which will predict the average trades/month for a new investor? (regression)
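
The regression question above can be illustrated with a minimal least-squares sketch. Everything here is assumed for illustration: the single predictor (years of investing experience), the toy numbers, and the helper name fit_line are hypothetical and do not come from the lecture or from any real investor data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical toy data: years of investing experience -> average trades per month
experience_years = [1, 2, 3, 4, 5]
trades_per_month = [2, 4, 6, 8, 10]

slope, intercept = fit_line(experience_years, trades_per_month)

# Predict the average trades/month for a new investor with 6 years of experience
new_investor_experience = 6
predicted = slope * new_investor_experience + intercept
print(f"Predicted trades/month: {predicted:.1f}")
```

Real data would be noisy and would have many more attributes; the point is only that regression learns a numeric relationship from past investors and applies it to a new one.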

Week 1 - Data Mining Application Examples
  The natural question is to predict the diagnosis from the symptoms (medical diagnosis prediction).

Week 1 - Data Mining Application Examples
  Assessing Credit Risk
    Situation: A person applies for a loan
    Task: Should a bank approve the loan?
    Need to predict the credit risk of the person: people with bad credit are not likely to repay.
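
One very simple way to frame the credit-risk task is to estimate repayment rates from past loans and approve only applicants whose group rate clears a threshold. The sketch below assumes a single categorical attribute and a handful of made-up records; the names past_loans, repayment_rates, and approve, and the 0.5 threshold, are hypothetical and not a method described in the lecture.

```python
from collections import defaultdict

# Hypothetical past loans: (credit_history, repaid)
past_loans = [
    ("good", True), ("good", True), ("good", True), ("good", False),
    ("bad", False), ("bad", False), ("bad", True), ("bad", False),
]

def repayment_rates(loans):
    """Estimate the fraction of repaid loans for each credit-history value."""
    counts = defaultdict(lambda: [0, 0])  # history -> [repaid, total]
    for history, repaid in loans:
        counts[history][0] += int(repaid)
        counts[history][1] += 1
    return {history: r / t for history, (r, t) in counts.items()}

def approve(history, rates, threshold=0.5):
    """Approve when the estimated repayment rate meets the threshold.

    An unseen credit-history value is treated as rate 0.0, so it is not approved.
    """
    return rates.get(history, 0.0) >= threshold

rates = repayment_rates(past_loans)
print(rates)
print(approve("good", rates), approve("bad", rates))
```

A practical model would combine many attributes (income, existing debt, and so on), which is where the classification techniques later in the course come in.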

Week 1 - Data Mining Application Examples
  A person buys a book (product) at amazon.com.
  Task: Recommend other books (products) this person is likely to buy.
  Amazon does clustering based on books bought:
    Customers who bought "Advances in Knowledge Discovery and Data Mining" also bought "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations"
  The recommendation program is quite successful.
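
The "customers who bought X also bought Y" idea can be sketched with plain co-purchase counting. This is a simplified stand-in and not Amazon's actual method (the slide describes it only as clustering on books bought); the baskets, the shortened titles, and the function name also_bought are hypothetical.

```python
from collections import Counter

# Hypothetical purchase baskets, one set of (shortened) titles per customer
baskets = [
    {"Advances in KDD", "Practical ML Tools"},
    {"Advances in KDD", "Practical ML Tools", "Pattern Classification"},
    {"Advances in KDD", "Machine Learning"},
    {"Machine Learning", "Pattern Classification"},
]

def also_bought(book, baskets, top_n=2):
    """Return the titles most often bought together with `book`."""
    counts = Counter()
    for basket in baskets:
        if book in basket:
            counts.update(basket - {book})  # count every other title in the basket
    return [title for title, _ in counts.most_common(top_n)]

print(also_bought("Advances in KDD", baskets, top_n=1))
```

Counting co-occurrences is the simplest possible version of the idea; association rules, covered later in the syllabus, refine it with measures such as support and confidence.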

Week 1 - End
  Read Chapter 1 of the course textbook.