Support Vector Machines and Predictive Data Modeling. Vladimir Cherkassky, Electrical and Computer Engineering, University of Minnesota

Presentation transcript:

1 Support Vector Machines and Predictive Data Modeling. Vladimir Cherkassky, Electrical and Computer Engineering, University of Minnesota. Presented at Tech Tune Ups, ECE Dept, June 1, 2011

2 Acknowledgements Research on Predictive Learning supported by: NSF grant ECCS; the A. Richard Newton Breakthrough Research Award from Microsoft Research. Joint work with grad students F. Cai & S. Dhar. Parts of this presentation are from the books Introduction to Predictive Learning, by Cherkassky and Ma, Springer 2011, and Learning from Data, by Cherkassky and Mulier, Wiley 2007

3 OUTLINE Introduction + Motivation. 4 parts of this course: (1) Philosophy, induction and predictive data modeling; (2) Support vector machines (SVM); (3) SVM practical issues and applications; (4) Advanced SVM-based learning technologies

4 Motivation 1 Two critical points: (1) Humans cannot reason about uncertainty in a rational way (examples); (2) Humans and animals have excellent biological capabilities to cope with uncertainty and risk (examples)

5 Motivation 2 Growth of data in digital age Is it possible to extract knowledge from this data? – philosophical and cultural implications How to extract knowledge from data? – business and technological aspects Is this a natural domain of statistics?

6 Motivation 3: biological learning Rosenblatt’s Perceptron (early 1960’s) - an early attempt to simulate biological learning (simple learning algorithm for a linear classifier) Young scientists in Moscow tried to understand generalization properties of such ‘machines’ and developed new statistical learning theory

7 Motivation 4: why SVM? Support Vector Machines - developed in the USSR in mid-1960’s - later introduced in the West in mid-1990’s - currently the most widely used method for modeling high-dimensional data - based on new mathematical theory different from classical statistics. VC-theory also provides philosophical framework for ‘learning from data’. This new predictive modeling methodology is still poorly understood

8 PART 1: Philosophy, induction and predictive data modeling Understanding uncertainty and risk Induction and knowledge discovery Philosophy and statistical learning Predictive learning approach Introduction to VC-theory

9 Understanding Uncertainty Humans tend to avoid uncertainty, and try to explain unpredictable events Aristotle: All men by nature desire knowledge Learning ~ discovering regularities from data Ancient cultures, e.g. Ancient Greeks, had no formal concepts related to randomness: Unpredictable events (wars, natural disasters etc.) were thought to be controlled by Gods or Fate. In modern society, religion has been replaced by science and pseudo-science

10 Gods, Prophets and Shamans

11 Science and Uncertainty Math, Logic and Science are about certainty ~ deterministic rules Probability and empirical data: involves uncertainty ~ inferior knowledge This view dominates modern science, i.e. True Scientific knowledge consists of deterministic Laws of Nature There is a (true, causal) model explaining a given natural phenomenon (e.g., a disease)

12 Causal Determinism in Science Popular view of science - deterministic rules (laws of Nature) - reflects objective reality (single truth) - knowledge inferred from (observed) data Digital technology enables growth of data → Can expect rapid growth of knowledge by applying (statistical, data mining etc.) algorithms to this data Reality is more sobering (as usual)

13 Popular Hype: the data deluge makes scientific method obsolete Wired Magazine, 16/07: We can stop looking for (scientific) models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. Early Detection of Cancer (or other diseases): Massive data analysis of cancer samples in order to identify unique proteins for tens of thousands of types of cancer. The goal is that (in the future) we can all be screened for these proteins as early warning signals for cancer.

14 REALITY Many studies have questionable value - statistical correlation vs causation Some border on stupidity/pseudoscience - US scientists at SUNY discovered Adultery Gene !!! (based on a sample of 181 volunteers interviewed about sexual life) Usual conclusion - more research is needed …

15 Some Views on Science Karl Popper: Science starts from problems, and not from observations Werner Heisenberg: What we observe is not nature itself, but nature exposed to our method of questioning Albert Einstein: Reality is merely an illusion, albeit a very persistent one.

16 Scientific Discovery Always involves ideas (models) and facts (data) Classical first-principle knowledge: hypothesis → data → scientific theory Note: deterministic, simple models Modern data-driven discovery: Computer program + DATA → knowledge Note: statistical, complex systems Two philosophies, poorly understood

17 COMPLEX SYSTEMS A. Einstein: When the number of factors coming into play in a phenomenological complex is too large, scientific method in most cases fails us. Example: weather prediction Does digital technology make Einstein’s claim obsolete?

18 Examples of Complex Systems Life Sciences Healthcare Climate modeling Social Systems (e.g. financial markets) Attempts to understand and model such systems using a deterministic approach usually fail

19 Problem of Induction in Philosophy Francis Bacon: advocated empirical knowledge (inductive) vs scholastic David Hume: What right do we have to assume that the future will be like the past? Philosophy of Science tries to resolve this dilemma/contradiction between deterministic logic and uncertain nature of empirical data. Digital Age: growth of empirical data, and this dilemma becomes important in practice.

20 What is ‘a good model’? All models are mental constructs that (hopefully) relate to real world Two goals of data-driven modeling: - explain available data - predict future data All good (scientific) models make non-trivial predictions → Good data-driven models can predict well, so the goal is to estimate predictive models

21 Three Types of Knowledge Growing role of empirical knowledge Classical philosophy of science differentiates only between (first-principle) science and beliefs (demarcation problem) Importance of demarcation between empirical knowledge and beliefs in applications

22 Examples of Nonscientific Beliefs Aristotle’s science - everything is a mix of 4 basic elements: earth, water, air and fire Geocentric system of the world Origin of life (spontaneous generation) - disproved by L. Pasteur in the 19th century Modern belief: every medical condition can be traced to genetic variations - is it a popular belief or scientific theory?

23 Popper’s Demarcation Principle Karl Popper: Every true (inductive) theory prohibits certain events or occurrences, i.e. it should be falsifiable First-principle scientific theories vs beliefs or metaphysical theories Risky prediction, testability, falsifiability

24 Popper’s conditions for scientific hypothesis - Should be testable - Should be falsifiable Example 1: Efficient Market Hypothesis (EMH): The prices of securities reflect all known information that impacts their value Example 2: We do not see our noses, because they all live on the Moon

25 Predictive Learning: Formalization Given: data samples ~ training data (x,y) Estimate: a model, or function, f(x) that - explains this data and - can predict future data Classification problem: → Learning ~ function estimation

26 Application Example: predicting gender of face images Training data: labeled face images (figure: example images for the Male and Female classes)

27 Predicting Gender of Face Images Input ~ 32x32 pixel image Model ~ indicator function f(x) separating 1024-dimensional pixel space in two halves Model should predict well new images Difficult machine learning problem, but easy for human recognition
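
The input representation on this slide can be sketched as follows, assuming a linear form for the indicator function (the slide does not say which form is used). The names `flatten` and `indicator`, the random stand-in image, and the random weights are hypothetical; nothing here is learned from data.

```python
import random

SIDE = 32               # image is SIDE x SIDE pixels
DIM = SIDE * SIDE       # 1024 input dimensions

def flatten(image):
    """Turn a 32x32 grid of pixel intensities into one 1024-dimensional vector."""
    return [pixel for row in image for pixel in row]

def indicator(x, w, b):
    """Linear indicator function: +1 on one side of the hyperplane, -1 on the other."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

rng = random.Random(0)
# Synthetic stand-in for a face image (assumption: intensities in [0, 1))
image = [[rng.random() for _ in range(SIDE)] for _ in range(SIDE)]
# Hypothetical weights; a real model would estimate these from labeled images
w = [rng.uniform(-1, 1) for _ in range(DIM)]

x = flatten(image)
label = indicator(x, w, b=0.0)
print(len(x), label)
```

Any choice of `w` and `b` splits the 1024-dimensional pixel space into two halves; learning amounts to choosing the split that predicts new images well.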

28 Learning ~ Reliable Induction Induction ~ function estimation from data: Deduction ~ prediction for new (test) inputs:

29 Common Learning Problems Classification Regression Note: explanation does not ensure prediction

30 Common Learning Problems Unsupervised learning (e.g., clustering) Note: many other types of problems exist. All such problems ~ inductive learning setting

31 Generalization and Complexity Control Consider regression estimation Ten training samples Fitting linear and second-order polynomial (figure: both fits to the training data)

32 Complexity Control (cont’d) The same data set: Using k-nn regression with k=1 and k=4 → Generalization depends on model complexity
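
A minimal sketch of the k-nn comparison, under assumptions that are not from the slides: a sine target function, Gaussian noise with standard deviation 0.3, and a fixed random seed. With k=1 every training point is its own nearest neighbour, so the training error is exactly zero, which says nothing about prediction on new data.

```python
import math
import random

def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points nearest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs_eval, ys_eval, xs_train, ys_train, k):
    """Mean squared error of k-nn predictions on (xs_eval, ys_eval)."""
    errs = [(knn_predict(x, xs_train, ys_train, k) - y) ** 2
            for x, y in zip(xs_eval, ys_eval)]
    return sum(errs) / len(errs)

def target(x):
    """Assumed target function (not from the slides)."""
    return math.sin(2 * math.pi * x)

rng = random.Random(1)

def sample(n):
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

x_train, y_train = sample(10)    # ten training samples, as on the slide
x_test, y_test = sample(200)     # fresh data to approximate prediction error

for k in (1, 4):
    train_err = mse(x_train, y_train, x_train, y_train, k)
    test_err = mse(x_test, y_test, x_train, y_train, k)
    print(f"k={k}: training MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

Here k plays the role of the complexity index: smaller k gives a more flexible model.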

33 Complexity Control: issues Theoretical + conceptual - how to define model complexity Practical 1 - high-dimensional data Practical 2 - true model is not known → resampling for choosing optimal complexity Model selection ~ choosing optimal model complexity

34 Resampling Split available data into 2 sets: Training + Validation (1) Use training set for model estimation (via data fitting) (2) Use validation data to estimate the ‘prediction’ error of the model Change model complexity index and repeat (1) and (2) Select the final model providing lowest (estimated) prediction error BUT results are sensitive to data splitting
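
The resampling procedure above can be sketched with k-nn regression as the model family, so that k is the complexity index. The data model, the 50/50 split, and the helper name `holdout_select` are assumptions made for illustration. Repeating the selection under several random splits shows how the chosen complexity can depend on the split.

```python
import math
import random

def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points nearest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def holdout_select(xs, ys, candidates, seed):
    """Run steps (1)-(2) for each complexity value on one random 50/50 split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, valid = idx[:half], idx[half:]
    x_tr = [xs[i] for i in train]
    y_tr = [ys[i] for i in train]
    errors = {}
    for k in candidates:
        # (1) model is defined by the training part; (2) error is measured on validation
        sq = [(knn_predict(xs[i], x_tr, y_tr, k) - ys[i]) ** 2 for i in valid]
        errors[k] = sum(sq) / len(sq)
    best = min(errors, key=errors.get)   # lowest estimated prediction error
    return best, errors

rng = random.Random(7)
xs = [rng.random() for _ in range(40)]
# Assumed data model: sine target plus Gaussian noise
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]

candidates = (1, 2, 4, 8)
choices = [holdout_select(xs, ys, candidates, seed)[0] for seed in range(5)]
print("k chosen under five different splits:", choices)
```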

35 K-fold cross-validation 1. Divide the training data Z into k (randomly selected) disjoint subsets {Z_1, Z_2, …, Z_k} of size n/k 2. For each ‘left-out’ validation set Z_i: - use remaining data to estimate the model - estimate prediction error on Z_i 3. Estimate average prediction risk as the mean of the k validation errors
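
The two estimates in steps 2 and 3 can be written out as below. This is a reconstruction under stated assumptions, not the slide's own notation: f_i denotes the model estimated with Z_i left out, L is a generic loss (squared error is one common choice), and the symbols r_i and R_cv are introduced here for illustration.

```latex
% f_i : model estimated from all data except Z_i (notation assumed)
% L   : loss function, e.g. squared error L(f(x), y) = (y - f(x))^2
\[
  r_i = \frac{k}{n} \sum_{(x, y) \in Z_i} L\bigl(f_i(x), y\bigr),
  \qquad i = 1, \dots, k
\]
\[
  R_{cv} = \frac{1}{k} \sum_{i=1}^{k} r_i
\]
```

The factor k/n is one over the size of each validation subset, so r_i is the average loss on Z_i.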

36 Example of model selection 25 samples are generated from a target function plus noise, with x uniformly sampled in [0,1], and noise ~ N(0,1) Regression estimated using polynomials of degree m=1,2,…,10 Polynomial degree m = 5 is chosen via 5-fold cross-validation. The curve shows the polynomial model, along with training (*) and validation (*) data points, for one partitioning. Table: degree m vs. estimated R via cross-validation
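
A hedged sketch of this kind of experiment in plain Python. Several details are assumptions rather than the slide's setup: the target function sin²(2πx), the random seed, and a degree range capped at 6 (instead of 10) so that the simple normal-equation solver stays numerically well behaved. The selected degree therefore need not match the slide's m = 5.

```python
import math
import random

def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def polyfit(xs, ys, m):
    """Least-squares coefficients w[0..m] of a degree-m polynomial (normal equations)."""
    A = [[sum(x ** (i + j) for x in xs) for j in range(m + 1)] for i in range(m + 1)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m + 1)]
    return solve(A, b)

def polyval(w, x):
    """Evaluate the polynomial with coefficients w at x."""
    return sum(wi * x ** i for i, wi in enumerate(w))

def cv_risk(xs, ys, m, k, seed=0):
    """k-fold cross-validation estimate of squared-error risk for degree m."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k disjoint subsets of size about n/k
    fold_errors = []
    for fold in folds:
        held = set(fold)
        x_tr = [xs[i] for i in idx if i not in held]
        y_tr = [ys[i] for i in idx if i not in held]
        w = polyfit(x_tr, y_tr, m)             # estimate model without the fold
        err = sum((polyval(w, xs[i]) - ys[i]) ** 2 for i in fold) / len(fold)
        fold_errors.append(err)
    return sum(fold_errors) / k

rng = random.Random(42)
n = 25
xs = [rng.random() for _ in range(n)]
# Assumed target function; the noise follows the slide's N(0, 1)
ys = [math.sin(2 * math.pi * x) ** 2 + rng.gauss(0, 1) for x in xs]

degrees = range(1, 7)                          # capped at 6, see the note above
risks = {m: cv_risk(xs, ys, m, k=5) for m in degrees}
best_m = min(risks, key=risks.get)
for m in degrees:
    print(f"m={m}: estimated risk {risks[m]:.3f}")
print("selected degree:", best_m)
```

The same partition is reused for every degree, so the comparison between degrees is not affected by differences in data splitting.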

37 Statistical vs Predictive Approach Binary classification problem: estimate decision boundary from training data (x,y), where y ~ binary class label (-1/+1) Assuming distribution P(x,y) is known: (figure: decision boundary in (x1,x2) space)

38 Classical Statistical Approach (1) parametric form of unknown distribution P(x,y) is known (2) estimate parameters of P(x,y) from the training data (3) Construct decision boundary using estimated distribution and given misclassification costs (figure: estimated boundary) Modeling assumption: Distribution P(x,y) can be accurately estimated from available data
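
Steps (1) to (3) can be sketched for a one-dimensional input, assuming Gaussian class densities, equal priors, and equal misclassification costs. All of these are illustrative assumptions, as are the synthetic data and the helper names.

```python
import math
import random

def fit_gaussian(values):
    """Step (2): maximum-likelihood mean and variance of one class density."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var

def log_density(x, mu, var):
    """Log of the Gaussian density N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

rng = random.Random(0)
# Step (1): assume each class is Gaussian; synthetic data stand in for real samples
neg = [rng.gauss(-1.0, 1.0) for _ in range(100)]
pos = [rng.gauss(+1.0, 1.0) for _ in range(100)]

params = {-1: fit_gaussian(neg), +1: fit_gaussian(pos)}
prior = {-1: 0.5, +1: 0.5}      # assumed equal priors

def classify(x):
    """Step (3): pick the class with the larger prior * density (equal costs assumed)."""
    scores = {c: math.log(prior[c]) + log_density(x, *params[c]) for c in params}
    return max(scores, key=scores.get)

print(classify(-2.0), classify(2.0))
```

The decision rule is only as good as the density estimates, which is the modeling assumption the slide highlights.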

39 Predictive Approach (1) parametric form of decision boundary f(x,w) is given (2) Explain available data via fitting f(x,w), or minimization of some loss function (e.g., squared error) (3) A function f(x,w*) providing smallest fitting error is then used for prediction (figure: estimated boundary) Modeling assumptions - Need to specify f(x,w) and loss function a priori. - No need to estimate P(x,y)
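
For contrast, a minimal sketch of the predictive approach on one-dimensional data: a linear f(x, w) = w0 + w1·x is fitted directly to the ±1 labels by least squares, and no class density is estimated. The data, the linear form, and the squared-error loss are assumptions chosen for illustration.

```python
import random

def fit_line(xs, ys):
    """Step (2): closed-form least squares for f(x, w) = w0 + w1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx
    return w0, w1

rng = random.Random(0)
# Synthetic training data: 100 samples per class (assumed, for illustration)
xs = ([rng.gauss(-1.0, 1.0) for _ in range(100)]
      + [rng.gauss(+1.0, 1.0) for _ in range(100)])
ys = [-1] * 100 + [+1] * 100

w0, w1 = fit_line(xs, ys)

def predict(x):
    """Step (3): classify by the sign of the fitted function."""
    return 1 if w0 + w1 * x >= 0 else -1

boundary = -w0 / w1             # point where the fitted function changes sign
print(f"decision boundary at x = {boundary:.3f}")
```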

40 Two Different Methodologies System Identification (~ classical statistics) - estimate probabilistic model (class densities) from available data - use this model to make predictions System Imitation (~ biological learning) - need only predict well, i.e. imitate specific aspect of unknown system; - multiplicity of good models; - can they be interpreted and/or trusted? Which approach works for high-dim. data?

41 Classification with High-Dimensional Data Digit recognition 5 vs 8: each example ~ 32 x 32 pixel image → 1,024-dimensional vector x Medical analogy - Each pixel ~ genetic marker - Each patient (sample) described by 1024 genetic markers - Two classes ~ presence/absence of a disease Estimation of P(x,y) with finite data is not possible Accurate estimation of decision boundary in 1024-dim. space is possible, using just a few hundred samples

42 Statistical vs Predictive: Discussion Classical statistics has modeling goals: - interpretable model explaining the data - few important input variables (risk factors) - prediction performance is not verified but (usually) assumed – Why? Predictive modeling has different goals: - prediction (generalization) is the main goal - prediction accuracy is measured/reported - model interpretation is not important, as it cannot be objectively evaluated

43 PART 1: Philosophy, induction and predictive data modeling Understanding uncertainty and risk Induction and knowledge discovery Philosophy and statistical learning Predictive learning approach Introduction to VC-theory

44 Empirical Risk Minimization ERM principle for learning – Model parameterization: f(x, w) – Loss function: L(f(x, w), y) – Estimate risk from data: empirical risk R_emp – Choose w* that minimizes R_emp → model f(x, w*) explains past data ERM principle ~ biological approach Statistical Learning Theory (aka VC-theory): under what conditions will the ERM-style models generalize (predict) well?
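
In the usual notation, the empirical risk and the ERM solution can be written as below. This assumes n training samples (x_i, y_i); the index notation is introduced here for illustration and is not taken from the slide.

```latex
% n training samples (x_i, y_i), i = 1, ..., n (notation assumed)
\[
  R_{emp}(w) = \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i, w), y_i\bigr),
  \qquad
  w^{*} = \arg\min_{w} R_{emp}(w)
\]
```

R_emp measures only the fit to past data; whether f(x, w*) also predicts well is the question VC-theory addresses.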

45 Inductive Learning Setting The learning machine observes samples (x,y), and returns an estimated response Recall ‘first-principles’ vs ‘empirical’ knowledge → Two types of inference: identification vs imitation Risk

46 VC-theory basics - 1 Goals of Predictive Learning - explain (or fit) available training data - predict well future (yet unobserved) data Similar to biological learning Example: given 1, 3, 7, … predict the rest of the sequence. Rule 1: (formula on slide) Rule 2: randomly chosen odd numbers Rule 3: (formula on slide) BUT for sequence 1, 3, 7, 15, 31, 63, …, Rule 1 seems very reliable (why?)
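
The point of the example can be illustrated with two hypothetical rules. Both are assumptions chosen because they reproduce 1, 3, 7 and should not be read as the slide's Rule 1 or Rule 3. Both rules explain the first three terms, but only one of them survives the longer sequence.

```python
observed = [1, 3, 7]
longer = [1, 3, 7, 15, 31, 63]

# Hypothetical candidate rules, indexed from k = 1 (not the slide's own formulas)
rules = {
    "x_k = 2**k - 1": lambda k: 2 ** k - 1,
    "x_k = k*k - k + 1": lambda k: k * k - k + 1,
}

def explains(rule, seq):
    """True if the rule reproduces every term of the sequence."""
    return all(rule(k) == x for k, x in enumerate(seq, start=1))

for name, rule in rules.items():
    print(f"{name}: fits 1,3,7 -> {explains(rule, observed)}, "
          f"fits longer sequence -> {explains(rule, longer)}")
```

Each additional observation is a chance to falsify a rule; a rule that keeps passing such tests earns more trust.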

47 VC-theory basics - 2 Main Practical Result of VC-theory: If a model explains well past data AND is simple, then it can predict well This explains why Rule 1 is a good model for sequence 1, 3, 7, 15, 31, 63, … Measure of model complexity ~ VC-dimension ~ Ability to explain past data 1, 3, 7, 15, 31, 63 BUT cannot explain all other possible sequences → Low VC-dimension (~ large falsifiability) For linear models, VC-dim = DoF (as in statistics) But for nonlinear models they are different

48 VC-theory basics - 3 Strategy for modeling high-dimensional data: Find a model f(x) that explains past data AND has low VC-dimension, even when dim. is large SVM approach Large margin = Low VC-dimension ~ easy to falsify
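
To make "large margin" concrete, the sketch below computes the geometric margin of a linear separator, that is, the smallest signed distance from any training point to the hyperplane. The four points and the two candidate separators are made up for illustration, and the function name `margin` is hypothetical.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over the training points."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))

# Toy two-class data in the plane (assumed, for illustration)
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [-1, -1, +1, +1]

wide = margin((1.0, 0.0), -1.5, points, labels)     # vertical line x1 = 1.5
narrow = margin((1.0, 1.0), -1.5, points, labels)   # tilted line x1 + x2 = 1.5

print(f"margin of vertical separator: {wide:.3f}")
print(f"margin of tilted separator:   {narrow:.3f}")
```

Both separators classify the four points correctly (positive margin), but the vertical one leaves a wider band free of data, which is the property SVM training looks for.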

49 SUMMARY & DISCUSSION Predictive data modeling: - training data similar to future (test) data - performance index/loss function - predictive methodology is different from classical statistics - may not be a single true model - ‘conventional’ model interpretation is hard Understanding of uncertainty and risk: - changing due to technological advances - cultural and ethical issues