Advanced Data Mining: Introduction


Material Covered
Chapter 1 from Ullman’s book.
Many slides are from the “Data Mining: Concepts and Techniques” book.

Why Data Mining?
The explosive growth of data: from terabytes to petabytes
– Data collection and data availability: automated data collection tools, database systems, Web, computerized society
– Major sources of abundant data
    Business: Web, e-commerce, transactions, stocks, …
    Science: remote sensing, bioinformatics, scientific simulation, …
    Society and everyone: news, digital cameras, YouTube
We are drowning in data, but starving for knowledge!
“Necessity is the mother of invention”: data mining, the automated analysis of massive data sets

Evolution of Sciences
Before 1600: empirical science
1600-1950s: theoretical science
– Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
1950s-1990s: computational science
– Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
– Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
1990-now: data science
– The flood of data from new scientific instruments and simulations
– The ability to economically store and manage petabytes of data online
– The Internet and computing Grid that make all these archives universally accessible
– Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002
(from “Data Mining: Concepts and Techniques”)

What Is Data Mining?
Data mining (knowledge discovery from data)
– Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
– Data mining: a misnomer?
Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, business intelligence, etc.
Watch out: is everything “data mining”? Negative examples:
– Simple search and query processing
– (Deductive) expert systems
(from “Data Mining: Concepts and Techniques”)

Knowledge Discovery (KDD) Process
This is a view from the typical database systems and data warehousing communities.
Data mining plays an essential role in the knowledge discovery process.
[Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
(from “Data Mining: Concepts and Techniques”)
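The stages named on this slide can be sketched as a chain of small functions. This is only a toy illustration under assumed data: the record fields (`customer`, `item`, `region`), the two source lists, and every function name are hypothetical and do not come from the slides. The "mining" step here is a plain frequency count standing in for a real algorithm.

```python
from collections import Counter

def integrate(*sources):
    """Data integration: merge several sources into one list of records."""
    return [record for source in sources for record in source]

def clean(records):
    """Data cleaning: drop records with a missing customer or item."""
    return [r for r in records if r.get("customer") and r.get("item")]

def select(records, region):
    """Selection: keep only the task-relevant records."""
    return [r for r in records if r.get("region") == region]

def mine(records):
    """Data mining (toy stand-in): count how often each item is bought."""
    return Counter(r["item"] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {item: n for item, n in patterns.items() if n >= min_count}

# Hypothetical source databases.
store_a = [
    {"customer": "c1", "item": "milk", "region": "north"},
    {"customer": "c2", "item": "bread", "region": "north"},
    {"customer": None, "item": "milk", "region": "north"},  # dirty record
]
store_b = [
    {"customer": "c3", "item": "milk", "region": "north"},
    {"customer": "c4", "item": "beer", "region": "south"},
]

warehouse = clean(integrate(store_a, store_b))
task_relevant = select(warehouse, region="north")
patterns = evaluate(mine(task_relevant), min_count=2)
print(patterns)  # {'milk': 2}
```

Keeping each stage as a separate function mirrors the slide's point that mining is one step among several, and that most of the pipeline is preparation and evaluation.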

Data Mining in Business Intelligence
[Figure: layered pyramid; potential to support business decisions increases toward the top]
Layers, top to bottom:
– Decision Making
– Data Presentation: Visualization Techniques
– Data Mining: Information Discovery
– Data Exploration: Statistical Summary, Querying, and Reporting
– Data Preprocessing/Integration, Data Warehouses
– Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA
(from “Data Mining: Concepts and Techniques”)

Directions in modeling
Pattern extraction → Model Discovery
Statistical modeling
– E.g., decide that the data comes from a Gaussian distribution, estimate μ, σ parameters.
Machine learning
– Train an algorithm, then apply to new data.
Results of Complex Queries (computational approaches)
– E.g., summarization of the importance of a webpage in the form of a “pagerank” value.
– E.g., prominent feature extraction, such as frequent itemsets and similar items.
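As a small illustration of the statistical-modeling direction, the sketch below assumes the data is Gaussian and estimates μ and σ with the maximum-likelihood formulas (sample mean, and the square root of the mean squared deviation). The function name `fit_gaussian` and the synthetic data are assumptions made for this example only.

```python
import math
import random

def fit_gaussian(samples):
    """Maximum-likelihood estimates of mu and sigma for a Gaussian model."""
    n = len(samples)
    mu = sum(samples) / n
    variance = sum((x - mu) ** 2 for x in samples) / n  # divides by n, not n - 1
    return mu, math.sqrt(variance)

# Synthetic data drawn from a Gaussian with known parameters (mu=5, sigma=2).
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]

mu_hat, sigma_hat = fit_gaussian(data)
print(f"estimated mu = {mu_hat:.2f}, estimated sigma = {sigma_hat:.2f}")
```

With enough samples the estimates should land close to the true parameters; the "model" of the data is then just the pair (μ, σ).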

Multi-Dimensional View of Data Mining
Knowledge to be mined (or: data mining functions)
– Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
– Descriptive vs. predictive data mining
– Multiple/integrated functions and mining at multiple levels
Data to be mined
– Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multimedia, graphs, and social and information networks
Techniques utilized
– Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.
Applications adapted
– Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
(from “Data Mining: Concepts and Techniques”)

Meaningfulness of patterns
A big data-mining risk is that you will “discover” patterns that are meaningless.
Bonferroni’s principle: (roughly) if you look in more places for interesting patterns than your amount of data will support, you are bound to find meaningless patterns.
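A rough way to see the principle at work is to count how many "patterns" pure chance is expected to produce. The sketch below uses a hypothetical setup that is not from the slides: 20 records with a binary label, and 10,000 candidate binary features that are really just fair coin flips. A feature is called "interesting" if it agrees with the label on at least 17 of the 20 records.

```python
from math import comb

def prob_match_at_least(n, k):
    """P(a fair-coin feature agrees with the label on at least k of n records)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

n_records = 20       # amount of data (hypothetical)
n_features = 10_000  # number of places we look for a pattern (hypothetical)
threshold = 17       # agreements needed to call a feature "interesting"

p = prob_match_at_least(n_records, threshold)
expected_spurious = n_features * p
print(f"P(one random feature looks interesting) = {p:.5f}")
print(f"Expected meaningless 'discoveries'      = {expected_spurious:.1f}")
```

Under these assumptions roughly a dozen features are expected to look interesting by chance alone, so finding a handful of such features in real data would be no evidence of a genuine pattern.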

Rhine Paradox
Joseph Rhine was a parapsychologist in the 1950s who hypothesized that some people had Extra-Sensory Perception (ESP).
He devised an experiment where subjects were asked to guess 10 hidden cards, each red or blue.
He discovered that almost 1 in 1000 had ESP: they were able to get all 10 right!
He told these people they had ESP and called them in for another test of the same type.
Alas, he discovered that almost all of them had lost their ESP.
What did he conclude? He concluded that you shouldn’t tell people they have ESP; it causes them to lose it!
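The arithmetic behind the paradox is short enough to check directly. Assuming each guess is an independent fair coin flip, the chance of getting all 10 cards right is 1/1024, which matches the "almost 1 in 1000" on the slide. The group size of 1000 subjects below is an illustrative assumption, not a figure from the slides.

```python
from fractions import Fraction

n_cards = 10
n_subjects = 1000  # illustrative group size (assumption)

# Probability that one subject guesses all 10 red/blue cards by pure luck.
p_all_right = Fraction(1, 2) ** n_cards

# Expected number of lucky subjects, and the chance of at least one.
expected_lucky = n_subjects * p_all_right
p_at_least_one = 1 - float(1 - p_all_right) ** n_subjects

# Probability that a lucky subject repeats the feat on the second test.
p_repeat = p_all_right

print(f"P(all 10 right by chance)  = {p_all_right}")
print(f"Expected lucky subjects    = {float(expected_lucky):.3f}")
print(f"P(at least one lucky)      = {p_at_least_one:.3f}")
print(f"P(lucky subject repeats)   = {p_repeat}")
```

So a perfect score from about one subject per thousand is exactly what chance predicts, and the "loss" of ESP on the retest is simply the same 1/1024 odds applied again.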

Major Challenges in Data Mining
– Efficiency and scalability of data mining algorithms
– Parallel, distributed, stream, and incremental mining methods
– Handling high dimensionality
– Handling noise, uncertainty, and incompleteness of data
– Incorporation of constraints, expert knowledge, and background knowledge in data mining
– Pattern evaluation and knowledge integration
– Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering, information networks
– Application-oriented and domain-specific data mining
– Invisible data mining (embedded in other functional modules)
– Protection of security, integrity, and privacy in data mining
(from “Data Mining: Concepts and Techniques”)

KDnuggets polls
[Three slides of poll charts; figures not reproduced]

Things Useful to Know
– Probability
– Linear algebra basics
– Hash functions
– Indices
– Secondary storage
– Power laws

Big Data Sizes:
– Tiny → 0s
– Small → 1000s, fitting in memory
– Medium → (may not fit in memory)
– Large →
– Huge →
From Graefe’s “New algorithms for join and grouping operations”