Tallahassee, Florida, 2016 CIS4930 Introduction to Data Mining Introduction Peixiang Zhao.



Welcome to CIS4930
Course Website:
– Everything about the course can be found here: syllabus, announcements, policies, schedules, slides, assignments, resources…
– Make sure you check the course website periodically
Please read the class syllabus, policies, and lecture schedule; ask now if you have questions

Teaching Staff
Instructor: Peixiang Zhao
– Research interest
  Generally, data and information science, including database systems and data mining
  Specifically, graph data, information network analysis, large-scale data-intensive computation and analytics
– Brief history
  Illinois (Ph.D. from UIUC)
  Florida (Assistant professor at FSU since Aug. 2012)
TA:
– Yongjiang Liang
– Office hours: Tuesday 10am – 11am

Prerequisite
Must know how to program, and have a background in data structures and algorithms
– COP3330: Object-oriented Programming
– COP4530: Data structures and algorithms
– Knowledge of probability theory, statistics, and linear algebra

Textbook
Data Mining: Concepts and Techniques. 3rd edition
– Jiawei Han, Micheline Kamber, Jian Pei
References
– Introduction to Data Mining
– Data Mining: The Textbook
– The Elements of Statistical Learning
– Pattern Recognition and Machine Learning

Course Format
Two 75-min lectures/week
– Lecture slides are used to complement the lectures, not to substitute for the textbook
Four homework assignments (40%)
– Written assignments and machine problems
  Datasets or software might be provided
– Individual work
– Due right before the class starts on the due date
– No late homework will be accepted
One midterm (15%) and one final (40%)
– Check dates and make sure there is no conflict!
Quizzes (5%)

You Tell Me -- Why Are You Taking this Course?
– Data mining tops LinkedIn’s list of the “hottest skills of 2014”
– Data scientist: the sexiest job of the 21st century (Harvard Business Review)
– Data scientist: 2015’s hottest profession (Mashable)

Why Data Mining?
Big Data
However, we are drowning in data, but starving for knowledge!
– There is often information “hidden” in the data that is not readily evident
– Human analysts may take weeks to discover useful information
– Much of the data is never analyzed at all

What is Data Mining
Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
– a.k.a. KDD (knowledge discovery in databases)
– Data to be mined
  Relational databases, data warehouses; data streams and sensor data; time-series data, temporal data, sequence data; graphs, social networks and multi-linked data; spatial data and spatiotemporal data; multimedia data; text data; WWW data
– Knowledge to be obtained
  Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis

The Goal: Decision Support
Typical procedure
– Data --> Knowledge --> Action/Decision --> Goal
Examples
– Netflix collects user ratings of movies --> What types of movies you will like --> Recommend new movies to you --> Users stay with Netflix
– Gene sequences of cancer patients --> Which genes lead to cancer? --> Appropriate treatment --> Save life
– Road traffic --> Which road is likely to be congested? --> Suggest better routes to drivers --> Save time and energy

Example: Association Rule Mining
Data
– A set of transactions, each of which consists of a set of items
Association rules
– A set of rules that characterize associations between items
[Figure: table of market-basket transactions]
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
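The strength of a rule such as {Diaper, Milk} --> {Beer} is commonly measured by support (the fraction of transactions containing all of the rule's items) and confidence (the fraction of transactions containing the left-hand side that also contain the right-hand side). The sketch below is only illustrative: the slide's transaction table did not survive, so the five baskets are an assumed textbook-style example, and the helper names `support` and `confidence` are hypothetical.

```python
from fractions import Fraction

# Assumed market-basket data (the original table is not available).
transactions = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets):
    """Support of (antecedent and consequent) divided by support of antecedent.

    Assumes the antecedent occurs in at least one basket.
    """
    both = support(set(antecedent) | set(consequent), baskets)
    return both / support(antecedent, baskets)


milk_coke = confidence({"Milk"}, {"Coke"}, transactions)
diaper_milk_beer = confidence({"Diaper", "Milk"}, {"Beer"}, transactions)
```

On this assumed data, {Milk} --> {Coke} holds in half of the baskets that contain milk, and {Diaper, Milk} --> {Beer} holds in two of the three baskets that contain both diapers and milk. Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.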

Example: Classification
Process
– Construct models (functions) based on training data with known class labels
– Describe and distinguish classes or concepts for future prediction
– Predict testing data with unknown class labels
Applications
– Spam identification
– Treatment prediction
– Document categorization
– …
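As one possible illustration of the train-then-predict process, here is a minimal k-nearest-neighbor sketch. The choice of kNN, the two made-up numeric features per message, and the function name `knn_predict` are all assumptions for this example; the slides do not prescribe a specific classifier here.

```python
import math
from collections import Counter

# Assumed training data: (features, class label).
# The two features are hypothetical, e.g. (number of links, number of "!").
training = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((5, 6), "spam"),
    ((6, 5), "spam"),
    ((7, 7), "spam"),
]


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k closest training points."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Testing data with unknown class labels.
prediction_a = knn_predict(training, (6, 6))
prediction_b = knn_predict(training, (1, 1))
```

Here the "model" is simply the stored training set; other classifiers (decision trees, Naive Bayes, SVM) build a more compact function from the same kind of labeled data.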

Ads Targeting
[Figure: training and testing data, with columns for features and class labels]
A classifier: f(x) = y, mapping features --> class labels

Fraud Detection
[Figure labels: categorical, continuous, and class columns; Training Set, Learn Classifier, Model, Test Set]

Example: Clustering
Goal
– Finding groups of objects such that the objects in a group will be similar to one another and different from the objects in other groups
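One common way to pursue this goal is k-means, which alternates between assigning each object to its nearest centroid and moving each centroid to the mean of its group. The following is a rough sketch under stated assumptions: the 2-D points are made up, the initial centroids are passed in explicitly to keep the run deterministic, and the function name `kmeans` is hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style k-means sketch; assumes max_iter >= 1.

    Returns the final centroids and the clusters from the last assignment step.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, clusters


# Assumed data: two visually separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, groups = kmeans(points, centroids=[points[0], points[3]])
```

With these inputs the first three points end up in one group and the last three in the other. In practice the result depends on the initial centroids and on the chosen number of clusters.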

Example: Outlier Detection
Outliers (Anomalies)
– Global: observations inconsistent with the rest of the dataset
– Local: observations inconsistent with their neighborhoods
  A local instability or discontinuity
Applications
– Fraud/intrusion detection
– Customized marketing
– Weather prediction
One person’s noise could be another person’s signal. - Edward Ng
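The difference between global and local outliers can be sketched with two simple rules. Both rules are assumptions chosen for illustration, not the method used in the course: the global rule flags values far from the overall mean (a z-score test), and the local rule flags values far from the median of their immediate neighbors in a sequence. The thresholds, data, and function names are hypothetical.

```python
import statistics


def global_outliers(values, threshold=2.0):
    """Values whose distance from the mean exceeds `threshold` std deviations."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


def local_outliers(series, radius=2, tolerance=15.0):
    """Indices whose value differs from the median of nearby values by > tolerance."""
    flagged = []
    for i, value in enumerate(series):
        neighbors = series[max(0, i - radius):i] + series[i + 1:i + 1 + radius]
        if neighbors and abs(value - statistics.median(neighbors)) > tolerance:
            flagged.append(i)
    return flagged


# Assumed data: 50 is inconsistent with the whole dataset.
readings = [10, 11, 9, 10, 12, 10, 11, 50]

# Assumed data: a steady upward trend with one disturbed point at index 4.
# The value 60 is ordinary for the dataset as a whole, but not for its neighborhood.
trend = list(range(0, 100, 5))
trend[4] = 60

global_hits = global_outliers(readings)
local_hits = local_outliers(trend)
missed_globally = global_outliers(trend)
```

The disturbed point in `trend` is found by the local rule but not by the global one, which is the distinction the slide draws.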

Data Mining Tasks
Prediction Methods: Use some variables to predict unknown or future values of other variables
– Classification
– Regression
– Outlier detection
Description Methods: Find human-interpretable patterns that describe the data
– Clustering
– Association rule mining
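Regression is the prediction task with a numeric target. As a small sketch of "using some variables to predict values of other variables", the code below fits a straight line by ordinary least squares. The data and the function name `fit_line` are made up for illustration, and a single input variable is assumed.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumed observations that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict the unknown value for a new input.
predicted = slope * 5 + intercept
```

Classification works the same way in spirit, except that the predicted value is a class label rather than a number.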

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, surrounded by the following disciplines]
– Machine Learning
– Statistics
– Applications
– Algorithm
– Pattern Recognition
– High-Performance Computing
– Visualization
– Database Technology

The Top 10 Data Mining Algorithms
1. C4.5: classification
2. K-Means: clustering
3. SVM: classification
4. Apriori: association analysis
5. EM: statistical learning
6. PageRank: link mining
7. AdaBoost: bagging and boosting
8. kNN: classification
9. Naive Bayes: classification
10. CART: classification
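To give a feel for one entry on this list, here is a compact sketch of PageRank computed by repeated rank propagation (power iteration). The tiny link graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions for illustration only.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank for a graph given as {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        updated = {node: (1.0 - damping) / n + damping * dangling / n
                   for node in nodes}
        # Every page passes its rank on in equal shares along its outgoing links.
        for source, targets in links.items():
            for target in targets:
                updated[target] += damping * rank[source] / len(targets)
        rank = updated
    return rank


# Assumed link graph: page C receives links from A, B, and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
```

In this graph C ends up with the highest score because it collects rank from three pages, while D, which nothing links to, ends up with the lowest.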

Questions
Any questions? Please feel free to raise your hands.