CS 5310 Data Mining Hong Lin.

Chapter 1 - Introducing Machine Learning
- AI – wars between machines and their makers?
- AI algorithms are still application specific
- Fundamental concepts about machine learning
- The origins and practical applications of ML
- How computers turn data into knowledge and action
- How to match a machine learning algorithm to your data

Origins of ML
- Data everywhere
- Recorded data
- Explosion of recorded data – electronic sensors
  - Governments
  - Businesses
  - Individuals
- Era of Big Data

Machine Learning
- ML: Development of computer algorithms to transform data into intelligent action
- 3 elements: available data, statistical methods, computing power
- Data mining vs Machine learning
  - ML: teaching computers how to use data to solve a problem
  - DM: teaching computers to identify patterns that humans then use to solve a problem
  - DM involves ML but not vice versa

Uses & Abuses of ML
- The power of ML – Deep Blue, Watson
- Machines are still intellectual horsepower without direction
- Machines are good at answering questions but not asking them

ML successes

Limits of machine learning
- Not a substitute for the human brain
- Limited ability to make simple common-sense inferences without a lifetime of experience
- Translating language – 1994 episode of the television show
- Improvements made by Google, Apple, Microsoft – still limited ability to understand context

Machine Learning Ethics
- Ethical implications should not be ignored
- Legal issues and social norms
  - Laws
  - Terms of service
  - Trust
  - Privacy
  - Racial, ethnic, religious, etc.
- Simple exclusion of some sensitive data may not be sufficient
- Inappropriate use of data may hurt users

How Machines Learn
- Human brains are capable of learning from birth
- Conditions necessary for computers to learn must be made explicit
- Basic learning process components:
  - Data storage
  - Abstraction
  - Generalization
  - Evaluation
- Entire learning process inextricably linked

Data Storage
- Human – electrochemical signals in a network of biological cells
- Computer – RAM and CPU
- Ability to store/retrieve data alone is not sufficient for learning
- Sustainable strategy
  - Memorizing a small set of representative ideas
  - Developing strategies on how the ideas relate
  - Large ideas can be understood without memorization by rote

Abstraction
- Assigning meaning to stored data
- Knowledge representation – formation of logical structures that assist in turning raw sensory information into a meaningful insight
- Model – explicit description of the patterns within the data
- Types of models:
  - Mathematical equations
  - Relational diagrams such as trees and graphs
  - Logical if/else rules
  - Groupings of data known as clusters

Training
- Process of fitting a model to a dataset
- A learned model does not provide new data, but it does result in new knowledge
- Observations -> Data -> Model
- The model results in the discovery of previously unseen relationships among data

Generalization
- The learning process must provide actionable insight
- Generalization – process of turning abstracted knowledge into a form that can be utilized for future action
- Limiting the patterns to those most relevant to future tasks
- Heuristics – educated guesses about where to find the most useful inferences
- Cons of heuristics
  - Humans – heuristics guided by emotions
  - Machines – heuristics may result in bias: conclusions that are systematically erroneous, or wrong in a predictable manner

Biases Biased towards Biased against

Evaluation
- Bias is necessary to drive action in the face of limitless possibility
- Evaluation – measure the learner’s success in spite of its biases and use this information to inform additional training if needed
- No Free Lunch theorem
- Model evaluated on a new test dataset
- Noise – unexplained or unexplainable variations in data
- Causes of noise
  - Measurement error
  - Issues with human subjects
  - Data quality problems
  - Complex phenomena that impact the data unsystematically
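
Evaluating on a new test dataset can be sketched as follows. This is a minimal illustration, not the course's method: the data, the `holdout_split` and `fit_threshold` helpers, and the 25% test fraction are all assumptions made for the example. In practice the examples would normally be shuffled before splitting.

```python
def holdout_split(examples, test_fraction=0.25):
    """Hold out the last test_fraction of the examples as a test dataset."""
    n_test = max(1, int(len(examples) * test_fraction))
    return examples[:-n_test], examples[-n_test:]

def fit_threshold(train):
    """'Train' a one-feature model: the midpoint between the two class means."""
    hot = [x for x, label in train if label == "hot"]
    cold = [x for x, label in train if label == "cold"]
    return (sum(hot) / len(hot) + sum(cold) / len(cold)) / 2

def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == label for x, label in examples) / len(examples)

# Hypothetical labelled data: (temperature, label)
data = [(30, "hot"), (5, "cold"), (25, "hot"), (10, "cold"),
        (28, "hot"), (2, "cold"), (22, "hot"), (12, "cold")]

train, test = holdout_split(data)
threshold = fit_threshold(train)          # learned from training data only
predict = lambda x: "hot" if x > threshold else "cold"
test_accuracy = accuracy(predict, test)   # measured on unseen examples
```

The point of the split is that `test_accuracy` is computed on examples the learner never saw during training, so it estimates how well the model generalizes.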

Overfitting
- Effect of trying to model noise
- Attempting to explain noise results in erroneous conclusions
- More complex models that miss the true pattern
- An overfitted model does not generalize well to the test dataset
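
A small sketch of the effect, under assumed data: the training set below contains one deliberately mislabelled (noisy) example. The `memorizer` (a 1-nearest-neighbour lookup) and `threshold_rule` learners are illustrative choices, not taken from the slides.

```python
def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

def memorizer(train):
    """Complex model: 1-nearest-neighbour, reproduces every training label, noise included."""
    def predict(x):
        _, label = min(train, key=lambda ex: abs(ex[0] - x))
        return label
    return predict

def threshold_rule(train):
    """Simple model: a single cut at the midpoint between the class means."""
    lows = [x for x, y in train if y == "low"]
    highs = [x for x, y in train if y == "high"]
    cut = (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2
    return lambda x: "low" if x < cut else "high"

# Hypothetical data; (3, "high") is noise in an otherwise clean pattern.
train = [(1, "low"), (2, "low"), (3, "high"), (4, "low"),
         (6, "high"), (7, "high"), (8, "high")]
test = [(0.5, "low"), (2.8, "low"), (3.2, "low"), (5.5, "high"), (9, "high")]

complex_model = memorizer(train)
simple_model = threshold_rule(train)

for name, model in [("memorizer", complex_model), ("threshold", simple_model)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer scores perfectly on the training data because it has modelled the noisy point, and it then misclassifies the test examples near that point. The simpler rule gets the noisy training example wrong but does better on the test dataset.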

Machine learning in practice
- Data collection
- Data exploration and preparation
- Model training
- Model evaluation
- Model improvement
- Successes and failures of the deployed model might provide additional data to train next generation learner

Types of input data
- Unit of observation – smallest entity with measured properties of interest for a study, e.g., persons, objects, transactions, time points
- Units of observation can be combined
- Unit of analysis – smallest unit from which inferences are made

Datasets
- Stored units of observation and their properties
- Examples – instances of unit of observation
- Features – recorded properties or attributes of examples
- Matrix format
  - Row – example
  - Column – feature
- Forms of features
  - Numeric
  - Categorical/nominal
    - Ordinal
    - Non-ordinal
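
The matrix format can be sketched with plain Python lists. The feature names and values below are invented for illustration; `column` and `size_order` are hypothetical helpers, not part of any library.

```python
# Each row is one example; each column is one feature.
feature_names = ["age", "color", "size"]   # numeric, nominal, ordinal
examples = [
    [34, "red",  "small"],
    [51, "blue", "large"],
    [27, "red",  "medium"],
]

def column(rows, names, name):
    """Return all values of one feature (one column of the matrix)."""
    index = names.index(name)
    return [row[index] for row in rows]

ages = column(examples, feature_names, "age")      # numeric feature
colors = column(examples, feature_names, "color")  # categorical, non-ordinal

# An ordinal feature has categories with a natural order, so it can be ranked.
size_order = ["small", "medium", "large"]
size_ranks = [size_order.index(s) for s in column(examples, feature_names, "size")]
```

Ranking makes sense for `size` because its categories are ordered; applying the same encoding to `color` would impose an order that the data does not have.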

Types of machine learning algorithms
- Predictive model
  - Prediction of one value using other values in the dataset
  - Target feature – the feature being predicted
  - Supervised learning – target values provide a way for the learner to know how well it has learned the desired task
- Classification – predicting which category an example belongs to
  - Class – the target feature to be predicted, which is a categorical feature
  - Levels – categories the class is divided into; may or may not be ordinal
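
As a rough sketch of supervised classification, the learner below maps each value of a single categorical feature to the class level most often seen with it. The weather data and the `train_one_rule` name are assumptions for illustration; this is one very simple learner, not the method the slides prescribe.

```python
from collections import Counter, defaultdict

def train_one_rule(examples):
    """Map each feature value to the most common class level seen with it."""
    counts = defaultdict(Counter)
    for value, label in examples:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

# Hypothetical training data: (outlook, play). "play" is the class,
# with levels "yes" and "no".
training = [("sunny", "yes"), ("sunny", "yes"), ("sunny", "no"),
            ("rainy", "no"), ("rainy", "no"), ("overcast", "yes")]

rule = train_one_rule(training)

def classify(outlook):
    """Predict the class level for a new example."""
    return rule[outlook]
```

The known target values in `training` are what make this supervised: they tell the learner which level each feature value should map to.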

Numeric prediction
- Linear regression – a common form
- Boundaries between classification models and numeric prediction models are not necessarily firm
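
A minimal sketch of numeric prediction with simple (one-feature) linear regression, fitted by ordinary least squares. The data, the function names, and the pass mark of 55 are illustrative assumptions. The last function shows one way the boundary with classification can blur: a numeric prediction is thresholded into a category.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: hours studied vs. exam score
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]
slope, intercept = fit_line(hours, scores)

def predict_score(h):
    """Numeric prediction: estimate a score from hours studied."""
    return slope * h + intercept

def predict_pass(h, pass_mark=55.0):
    """Thresholding the numeric prediction turns it into a classification."""
    return "pass" if predict_score(h) >= pass_mark else "fail"
```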

Descriptive model
- Summarizing data in new and interesting ways
- No single feature is more important than any other
- Unsupervised learning – the process of training a descriptive model
- E.g., pattern discovery – identify useful associations within data, e.g., market basket analysis
- Clustering – dividing a dataset into homogeneous groups
- Segmentation analysis – identify groups of individuals with similar behavior or demographic information
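
Clustering can be sketched with a small one-dimensional k-means. The slides do not name an algorithm, so k-means is an assumed choice here, and the spending data and the deterministic initialization are illustrative. The function requires k of at least 2.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters by repeatedly assigning each value to
    its nearest center and then moving each center to its cluster mean."""
    # Deterministic start (assumes k >= 2): spread centers from min to max.
    lo, hi = min(values), max(values)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer; no target feature is given.
spend = [10, 12, 11, 50, 52, 49]
centers, clusters = k_means_1d(spend, k=2)
```

No labels are supplied: the two groups emerge only from how similar the values are to each other, which is what makes this a descriptive rather than a predictive model.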

Meta-learners
- Not tied to a specific learning task
- Focus on learning how to learn more effectively
- Use the results of some learning to inform additional learning

ML Algorithms

Matching input data to algorithms
- Determine which of the 4 learning tasks your project represents
  - Classification
  - Numeric prediction
  - Pattern detection
  - Clustering
- Choose among algorithms
  - Distinctions among algorithms
  - Strengths and weaknesses
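
The first step, deciding which of the four learning tasks a project represents, can be written as a small decision function. This is only a sketch of the distinctions drawn in the earlier slides; the function and parameter names are hypothetical.

```python
def learning_task(has_target, target_is_categorical=False, want_groups=False):
    """Pick one of the four learning tasks from properties of the problem."""
    if has_target:
        # Supervised: a target feature is being predicted.
        return "classification" if target_is_categorical else "numeric prediction"
    # Unsupervised: no target feature; describe the data instead.
    return "clustering" if want_groups else "pattern detection"

# Example: predicting a yes/no outcome from other features
task = learning_task(has_target=True, target_is_categorical=True)
```

Once the task is known, the remaining choice is among the algorithms for that task, weighing their distinctions, strengths, and weaknesses.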

End of Chapter 1