Pattern Analysis Prof. Bennett Math Model of Learning and Discovery 2/14/05 Based on Chapter 1 of Shawe-Taylor and Cristianini.

Outline: What is pattern analysis? Illustrate issues via example. Pattern definitions. Examples of practical tasks. Pattern algorithms. Summary.

Pattern Analysis: The automatic detection of patterns in data from the same source. Make predictions about new data coming from the same source. Data may take many forms: images, text, records of commercial transactions, genome sequences, family trees.

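Data Driven Analysis: Kepler analyzed Brahe's planetary motion data for Mercury, Venus, Earth, Mars, Jupiter, and Saturn, where P = period and D = average distance from the Sun. [Table of D, P, and their powers for each planet; only the Earth entry, 1.00, is recoverable.]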

Found "Regularities": Observed $P^2 = D^3$ (P in years, D in astronomical units). Developed three laws of planetary motion. Compressible: the data can be represented by one column. Predictable: discovering hidden relations allows us to predict the other columns. Third Law is exact.

Data Representation I: Nonlinear model of D and P: $P^2 = D^3$. Linear model of $\log D$ and $\log P$: $2 \log P = 3 \log D$.
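
The log recoding can be checked numerically. With P the period in years and D the average distance in astronomical units, Kepler's third law reads $P^2 = D^3$, so a straight-line fit of $\log P$ against $\log D$ should have slope 3/2 and intercept 0. The sketch below is only an illustration: the planetary values are approximate modern figures supplied as an assumption (they are not the table from the slides), and `fit_line` is a hypothetical helper.

```python
import math

# Approximate modern values (an assumption, not the slide's table):
# (planet, average distance D in AU, period P in years)
PLANETS = [
    ("Mercury", 0.387, 0.241),
    ("Venus", 0.723, 0.615),
    ("Earth", 1.000, 1.000),
    ("Mars", 1.524, 1.881),
    ("Jupiter", 5.203, 11.862),
    ("Saturn", 9.537, 29.457),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Recode the data: the nonlinear relation becomes linear in log space.
log_d = [math.log(d) for _, d, _ in PLANETS]
log_p = [math.log(p) for _, _, p in PLANETS]
slope, intercept = fit_line(log_d, log_p)

# Kepler's third law predicts slope 3/2 and intercept 0.
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A generic linear method recovers the law here only because the data were recoded first, which is the point of the slide.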

Data Representation II: Assume we know the plane of the orbit, so we can represent positions as (x, y) pairs. We also know the orbit is an ellipse.

Data Representation: The pattern is a nonlinear function of x and y, but a linear function of recoded features such as $x^2$, $xy$, and $y^2$. Linear relationships are easier to find.
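
As a rough illustration of why the recoded problem is easier, the sketch below assumes the simplest case: an ellipse centered at the origin with axes along x and y, so that $w_1 x^2 + w_2 y^2 = 1$. That equation is nonlinear in (x, y) but linear in the features $(x^2, y^2)$, so ordinary least squares finds the weights. The function name, the semi-axes 3 and 2, and the sample points are all hypothetical.

```python
import math


def fit_axis_aligned_ellipse(points):
    """Least squares for w1*x^2 + w2*y^2 = 1, linear in the features (x^2, y^2)."""
    # Accumulate the 2x2 normal equations A^T A w = A^T 1 for rows (x^2, y^2).
    s11 = s12 = s22 = t1 = t2 = 0.0
    for x, y in points:
        u, v = x * x, y * y
        s11 += u * u
        s12 += u * v
        s22 += v * v
        t1 += u
        t2 += v
    # Solve the 2x2 system with Cramer's rule.
    det = s11 * s22 - s12 * s12
    w1 = (t1 * s22 - t2 * s12) / det
    w2 = (s11 * t2 - s12 * t1) / det
    return w1, w2


# Hypothetical orbit with semi-axes a = 3 and b = 2, sampled at 12 angles.
a_true, b_true = 3.0, 2.0
points = [
    (a_true * math.cos(2 * math.pi * k / 12), b_true * math.sin(2 * math.pi * k / 12))
    for k in range(12)
]

w1, w2 = fit_axis_aligned_ellipse(points)
a_est, b_est = 1 / math.sqrt(w1), 1 / math.sqrt(w2)
print(f"a = {a_est:.3f}, b = {b_est:.3f}")
```

A general ellipse would need the additional features xy, x, and y, but the fit would still be linear in the weights.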

Set of Hypotheses: Hypothesis: ellipse; compute its parameters from the data. Hypothesis: circle; compute its parameters from the data. The circle UNDERFITS.

Set of Hypotheses: Hypothesis: any continuous function. OVERFITS! Whether a fit underfits or overfits depends on the size of the hypothesis class. Use domain knowledge to limit the hypotheses.
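
The effect of hypothesis class size can be sketched on a toy problem. Everything below is assumed for illustration: the source follows y = 2x, the training labels carry alternating +1/-1 noise, and the three hypothesis classes are a constant (too small), a straight line (matched to the source), and a function that memorizes the training set (too large).

```python
def mse(predict, data):
    """Mean squared error of a predictor on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# Hypothetical source y = 2x; training labels have alternating +1/-1 noise.
train = [(float(i), 2.0 * i + (1.0 if i % 2 == 0 else -1.0)) for i in range(10)]
# Test points lie between the training points and follow the noise-free rule.
test = [(i + 0.5, 2.0 * i + 1.0) for i in range(9)]

mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)


def constant(x):
    """Too small a class: always predict the mean training label."""
    return mean_y


def memorizer(x):
    """Too large a class: return the label of the nearest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


# Class limited by domain knowledge: least squares straight line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def line(x):
    return slope * x + intercept


for name, h in [("constant", constant), ("line", line), ("memorizer", memorizer)]:
    print(f"{name:9s} train MSE = {mse(h, train):7.3f}  test MSE = {mse(h, test):7.3f}")
```

The memorizer reaches zero training error yet predicts new points worse than the line, while the constant is poor on both sets.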

Approximate Pattern: Noisy data.

Typical Pattern Analysis: Approximate, not exact. Data has errors and omissions. For example, graduate school performance cannot be predicted from GRE scores and grades alone. The best representation/model is unknown. Make approximate predictions; need to address how accurate the estimates are.

Definition: Exact Pattern: A general exact pattern, f, for data source S satisfies $f(x) = 0$ for all data x from source S.

Approximate Pattern: A general approximate pattern, f, for data source S satisfies $f(x) \approx 0$ for all data x from source S.

Statistical Pattern: A general statistical pattern, f, for a data source S generated i.i.d. according to distribution D satisfies $\mathbb{E}_{x \sim D}[f(x)] \approx 0$, with the expectation taken over data x from source S.
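
The three definitions can be compared on a small synthetic source. The sketch assumes the forms used in Chapter 1 of Shawe-Taylor and Cristianini: an exact pattern has f(x) = 0 on every point, an approximate pattern has f(x) close to 0 on every point, and a statistical pattern has expected value of f close to 0. The unit-circle source, the tolerances, and the helper names are hypothetical, and the expectation is replaced by a sample mean of |f(x)| so that errors of opposite sign cannot cancel.

```python
import math
import random


def f(point):
    """Candidate pattern function: zero exactly on the unit circle."""
    x, y = point
    return x * x + y * y - 1.0


def is_exact_pattern(f, data, tol=1e-12):
    """f(x) = 0 on every point (up to floating-point rounding)."""
    return all(abs(f(x)) <= tol for x in data)


def is_approximate_pattern(f, data, eps):
    """f(x) is within eps of 0 on every point."""
    return all(abs(f(x)) <= eps for x in data)


def is_statistical_pattern(f, data, eps):
    """Sample mean of |f(x)| is within eps of 0."""
    return sum(abs(f(x)) for x in data) / len(data) <= eps


rng = random.Random(0)
angles = [2 * math.pi * k / 200 for k in range(200)]

# Clean source: points exactly on the unit circle.
clean = [(math.cos(t), math.sin(t)) for t in angles]
# Noisy source: radius perturbed by at most 1 percent.
noisy = [((1 + rng.uniform(-0.01, 0.01)) * math.cos(t),
          (1 + rng.uniform(-0.01, 0.01)) * math.sin(t)) for t in angles]
# Noisy source plus one gross outlier.
with_outlier = noisy + [(1.5, 0.0)]

print(is_exact_pattern(f, clean), is_exact_pattern(f, noisy))
print(is_approximate_pattern(f, noisy, 0.05), is_approximate_pattern(f, with_outlier, 0.05))
print(is_statistical_pattern(f, with_outlier, 0.05))
```

A single outlier breaks the approximate pattern, which must hold on every point, but not the statistical one, which only has to hold on average.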

Two-Class and Multiclass Classification: Example: character recognition. Two-class: is it an A or not? Multiclass: what letter is it?

Regression: Example: determine drug bioavailability through the intestine. Estimate apparent permeability as assayed via an intestinal cell line.

Density Estimation: Estimate the probability that a particular event occurs, p(x). Use it to detect improbable events like fraud.
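
One minimal way to turn a density estimate into a detector is sketched below. It assumes a single Gaussian is an adequate model, which the slides do not claim; the transaction amounts and the three-sigma cutoff are hypothetical.

```python
import math
import statistics


def fit_gaussian(samples):
    """Estimate the mean and standard deviation of a 1-D Gaussian model."""
    return statistics.mean(samples), statistics.stdev(samples)


def density(x, mu, sigma):
    """Gaussian probability density p(x) with the fitted parameters."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical history of ordinary transaction amounts.
history = [42.0, 38.5, 45.1, 40.2, 39.9, 43.3, 41.7, 37.8, 44.0, 40.6]
mu, sigma = fit_gaussian(history)

# Assumed cutoff: flag anything less likely than a 3-sigma event.
threshold = density(mu + 3 * sigma, mu, sigma)


def is_improbable(x):
    """True when the estimated density at x falls below the cutoff."""
    return density(x, mu, sigma) < threshold


print(is_improbable(41.0), is_improbable(500.0))
```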

Principal Component Analysis: Find a projection of the data that captures the major variance in the data. Eigenfaces: capture essential qualities of faces to help ID and reduce storage needs.
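
The idea of a variance-capturing projection can be sketched in two dimensions. The example below finds the first principal direction by power iteration on the sample covariance matrix, which is one of several ways to compute it and is used here only because it needs no libraries. The data (points near the line y = 2x) and the function names are hypothetical.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (mean, unit direction of largest variance) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = (1.0, 1.0)
    for _ in range(iterations):
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return (mx, my), v


def project(point, mean, direction):
    """1-D coordinate of a point along the principal direction."""
    return (point[0] - mean[0]) * direction[0] + (point[1] - mean[1]) * direction[1]


# Hypothetical data scattered tightly around the line y = 2x.
points = [(float(t), 2.0 * t + 0.1 * (-1) ** t) for t in range(-5, 6)]
mean, direction = first_principal_component(points)
coords = [project(p, mean, direction) for p in points]
print(f"direction = ({direction[0]:.3f}, {direction[1]:.3f})")
```

Storing one coordinate per point instead of two is the same kind of saving that eigenfaces obtain on images.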

Pattern Analysis Algorithm: A pattern analysis algorithm takes as input a finite set of data from source S (a.k.a. the training set) and outputs either a detector function f or "no patterns detected".

Pattern Algorithm Issues: Efficiency and scalability: memory and CPU requirements, large data sets. Robustness: find approximate patterns in noisy data. Stability: discover genuine patterns; find the same patterns on different views of the dataset.

Stability: Generalization: find a pattern that holds on future data. A pattern may exist by chance in a finite sample. Provide a statistical guarantee that the pattern truly exists, with the caveat that with small probability the algorithm may have been misled.

Example: Observe that, for a state agency, all 20 babies adopted in the last 10 years from country X are girls. Pattern: only girls are available for adoption from that country. With probability $p = (0.5)^{20} \approx 9.5 \times 10^{-7}$ we could observe this data even if girls and boys were equally likely. So with chance p, we were misled.
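
The chance of seeing 20 girls in a row when girls and boys are equally likely is a one-line computation. The variable names are illustrative, and the adoptions are assumed to be independent.

```python
n_babies = 20   # adoptions observed, all girls
p_girl = 0.5    # assumed chance of a girl if both sexes are equally available

# Probability that every one of the independent adoptions is a girl.
p_all_girls = p_girl ** n_babies
print(f"P(all {n_babies} are girls by chance) = {p_all_girls:.3e}")  # 9.537e-07
```

That is roughly one chance in a million, which is the small probability with which the conclusion "only girls are available" would be a mistake.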

Statistical Learning Theory: Produce a pattern based on a finite sample. Provide bounds guaranteeing that, with high probability, the pattern approximately represents a true pattern: Probably Approximately Correct (PAC).
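
A Probably Approximately Correct guarantee can be made concrete with the standard textbook bound for a finite hypothesis class, which is not stated on the slides and is used here as an assumption. A hypothesis with true error above epsilon fits m independent samples with probability at most (1 - epsilon)^m, so by the union bound the chance that any of |H| such hypotheses looks perfect is at most |H| (1 - epsilon)^m. The function names are hypothetical.

```python
import math


def pac_failure_bound(num_hypotheses, epsilon, m):
    """Union bound on P(some hypothesis with error > epsilon fits all m samples)."""
    return num_hypotheses * (1.0 - epsilon) ** m


def pac_sample_size(num_hypotheses, epsilon, delta):
    """Smallest m with |H| * exp(-epsilon * m) <= delta, which implies the bound above."""
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)


# One hypothesis ("only girls") that is wrong half the time, checked on 20 samples:
single = pac_failure_bound(1, 0.5, 20)

# Samples needed so that, among 1000 hypotheses, a perfect fit has error <= 10%
# with probability at least 95%.
m = pac_sample_size(1000, 0.1, 0.05)
print(single, m, pac_failure_bound(1000, 0.1, m))
```

With a single hypothesis and epsilon = 0.5 the bound reduces to the (0.5)^20 calculation for 20 observations that all agree with the hypothesis.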

Recoding Strategy: With a proper representation, the problem can become easier (a linear model works). Develop general-purpose linear learning methods. Change the recoding using "kernel functions".
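
How a kernel function stands in for an explicit recoding can be sketched with the degree-2 polynomial kernel on 2-D inputs, chosen here as an assumed example. The kernel k(x, z) = (x . z)^2 equals the inner product of the recoded vectors phi(x) and phi(z), where phi maps (x1, x2) to (x1^2, sqrt(2) x1 x2, x2^2), so a linear method can work in the recoded space without ever building it. The function names are illustrative.

```python
import math


def phi(x):
    """Explicit recoding of a 2-D point into degree-2 monomial features."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without forming phi(x) or phi(z)."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # inner product in the recoded space
implicit = poly_kernel(x, z)     # same value from the original coordinates
print(explicit, implicit)
```

Swapping in a different kernel changes the recoding while the linear learning algorithm stays the same.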

Key Ideas: Patterns are regularities in data from a specified source. An algorithm takes a finite sample and computes a pattern. Efficiency, robustness, and stability. Representation: kernels. Strategy = generic algorithms + recoding. Many learning tasks fit in this framework.