Introduction.  Instructor: Cengiz Örencik   Course materials:  myweb.sabanciuniv.edu/cengizo/courses.


1 Introduction

2  Instructor: Cengiz Örencik  E-mail: cengizorencik@beykent.edu.tr  Course materials:  myweb.sabanciuniv.edu/cengizo/courses

3  Reference Books ◦ Veri Madenciliği: Kavram ve Algoritmaları, Doç. Dr. Gökhan Silahtaroğlu, 2013 ◦ Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2010

4  1 midterm: 30%  2 in-class quizzes: 20%  1 final: 50%  Homework: ?

5  Fundamental data mining tools and concepts  Classification, clustering, and association and correlation algorithms  Real-life examples and implementations

6  Data preprocessing  Data warehouses ◦ Data from different sources and with different structures is integrated under a unified schema and resides at a single site ◦ Periodic data is summarized  Associations and correlations ◦ Market basket analysis, etc.  Classification and prediction ◦ E.g., is an applicant creditworthy?

7  Cluster analysis ◦ E.g., people with similar spending patterns  Text and Web mining  Privacy-preserving data mining ◦ Protecting personal information

8  “Necessity is the mother of invention” Plato

9  Petabytes of new data are produced continuously ◦ 90% of the world's data was generated over the last two years ◦ Twitter, Facebook, online shopping, MOBESE (city surveillance) cameras, etc.  Data is easy to access and store  e.g. customer voice recordings  Web crawlers  e.g. tweets that contain the terms “election” and “party”  The hard part is extracting knowledge from the data

10  Data mining is the extraction of non-trivial (previously unknown) and valid knowledge from large amounts of data, for use in decision making  Non-trivial ◦ Mining is costly, so the result should not be information that was predictable anyway ◦ The goal is not to prove something you already know  Diaper – beer correlation  Large data ◦ Validity  Decision making
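The diaper – beer correlation mentioned above is usually quantified with support, confidence, and lift. The sketch below computes these three measures on a handful of made-up baskets; the data and the function names are assumptions for illustration, not course material.

```python
# Made-up shopping baskets, for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) over the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's support; above 1 suggests a positive association."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"diapers", "beer"}, baskets))                 # 0.4
print(round(confidence({"diapers"}, {"beer"}, baskets), 2))  # 0.67
print(round(lift({"diapers"}, {"beer"}, baskets), 2))        # 1.11
```

With only five baskets these numbers mean little; on real data the rule is interesting only if it also holds on a large, valid sample.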

11 Databases vs. Data Mining
Databases:  Query ◦ Suitable  SQL – relational DB  Data ◦ Dynamic  Output ◦ Known ◦ Subset of data
Data Mining:  Query ◦ Not suitable ◦ No common language  Data ◦ Static  Output ◦ Not known ◦ Not subset of data

12  Database queries ◦ List the people who have a boat at Kalamış Marina and are named “Ahmet” ◦ Credit card holders under 30 who spend more than 5000 TL per month  Data mining queries ◦ Credit applications with low risk (classification) ◦ Card holders with similar buying patterns (clustering) ◦ Products purchased together with PS4 games (association rules)
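To make the contrast concrete, the second database query above can be written directly in SQL, and its answer is simply a subset of the stored rows. The sketch below uses Python's built-in sqlite3 module; the table name, column names, and rows are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical card_holders table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_holders (name TEXT, age INTEGER, monthly_spending_tl REAL)"
)
conn.executemany(
    "INSERT INTO card_holders VALUES (?, ?, ?)",
    [
        ("Elif", 27, 6200.0),
        ("Ahmet", 45, 8000.0),
        ("Deniz", 24, 1500.0),
        ("Can", 29, 5400.0),
    ],
)

# A database query: the condition is stated exactly, and the output is a subset of the data.
rows = conn.execute(
    "SELECT name FROM card_holders "
    "WHERE age < 30 AND monthly_spending_tl > 5000 "
    "ORDER BY name"
).fetchall()
young_big_spenders = [name for (name,) in rows]
print(young_big_spenders)  # ['Can', 'Elif']
conn.close()
```

A data mining query such as "card holders with similar buying patterns" has no equivalent WHERE clause: the groups are not stored anywhere and have to be discovered from the data.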

13 [Figure: the knowledge discovery process. Stages: Databases, Data Warehouse, Data Mining, Patterns, Knowledge. Steps between stages: Cleaning, Selection, Transformation, Evaluation, Presentation.]

14 [Figure: layered pyramid; potential to support business decisions increases toward the top.] Layers, top to bottom:  Decision Making  Data Presentation: Visualization Techniques  Data Mining: Information Discovery  Data Exploration: Statistical Summary, Querying, and Reporting  Data Preprocessing/Integration, Data Warehouses  Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems. Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA

15  Market analysis ◦ Target audience, customer relations  Risk analysis ◦ Resource management, monitoring competitors  Fraud detection ◦ Insurance, banking ◦ Modeling using historical data  Document similarity ◦ Plagiarism detection

16  The goal is to fit the data to a model  Predictive mining ◦ Classify people who may default on mortgage payments ◦ Predict which people will leave your company for another ◦ Predict the stock market (borsa)  Descriptive mining ◦ Reveals hidden information ◦ Identifies your best customers ◦ Which products sell together ◦ Which customers have similar shopping trends

17  Classification [Predictive]  Clustering [Descriptive]  Association Rules [Descriptive]
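The difference between predictive and descriptive techniques can be seen on a toy example. The sketch below uses a 1-nearest-neighbor classifier (predictive: it needs labeled history) and a basic two-cluster k-means on one-dimensional data (descriptive: no labels are given). Both algorithms are chosen here only for brevity and the data is made up; the slides do not name specific algorithms.

```python
def nearest_neighbor_label(point, labeled_points):
    """Classification: return the label of the closest labeled example (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labeled_points, key=lambda pair: sq_dist(pair[0], point))
    return label

def two_means_1d(values, iterations=10):
    """Clustering: split 1-D values into two groups with a basic k-means (k = 2)."""
    centers = [min(values), max(values)]  # simple deterministic initialization
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Predictive: past applicants as (monthly income, monthly debt) in thousand TL, with known labels.
history = [
    ((10, 1), "low risk"),
    ((12, 2), "low risk"),
    ((4, 5), "high risk"),
    ((3, 4), "high risk"),
]
prediction = nearest_neighbor_label((11, 1), history)
print(prediction)  # low risk

# Descriptive: monthly spending amounts with no labels; the groups are discovered.
spending = [200, 250, 220, 900, 950, 1000]
low_spenders, high_spenders = two_means_1d(spending)
print(low_spenders, high_spenders)  # [200, 250, 220] [900, 950, 1000]
```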

