Kansas State University Department of Computing and Information Sciences CIS 732: Machine Learning and Pattern Recognition Wednesday, 27 February 2008.


CIS 732: Machine Learning and Pattern Recognition
Lecture 16 of 42: Regression and Prediction
Wednesday, 27 February 2008
William H. Hsu, Department of Computing and Information Sciences, KSU
Readings: Section 6.11, Han & Kamber 2e; Chapter 1, Sections , Goldberg; Sections , Mitchell

Lecture Outline
Readings
–Section 6.11, Han & Kamber 2e
–Suggested: Chapter 1, Sections , Goldberg
Paper Review: "Genetic Algorithms and Classifier Systems", Booker et al
Evolutionary Computation
–Biological motivation: process of natural selection
–Framework for search, optimization, and learning
Prototypical (Simple) Genetic Algorithm
–Components: selection, crossover, mutation
–Representing hypotheses as individuals in GAs
An Example: GA-Based Inductive Learning (GABIL)
GA Building Blocks (aka Schemas)
Taking Stock (Course Review): Where We Are, Where We're Going

–Working with relationships between two variables
–Example: Size of Teaching Tip & Stats Test Score
© 2005 Sinn, J. Winthrop University

Correlation & Regression
Univariate & Bivariate Statistics
–Univariate: frequency distribution, mean, mode, range, standard deviation
–Bivariate: correlation – two variables
Correlation
–linear pattern of relationship between one variable (x) and another variable (y) – an association between two variables
–relative position on one variable corresponds with relative position on the other variable
–graphical representation of the relationship between two variables
Warning:
–No proof of causality
–Cannot assume x causes y

Scatterplots
No Correlation
–Random or circular assortment of dots
Positive Correlation
–ellipse leaning to the right
–e.g., GPA and SAT; Smoking and Lung Damage
Negative Correlation
–ellipse leaning to the left
–e.g., Depression & Self-esteem; Studying & test errors

Pearson's Correlation Coefficient
"r" indicates…
–strength of relationship (strong, weak, or none)
–direction of relationship
positive (direct) – variables move in same direction
negative (inverse) – variables move in opposite directions
r ranges in value from –1.0 (strong negative) through 0 (no relationship) to +1.0 (strong positive)
Go to website! – playing with scatterplots
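As a concrete illustration, here is a minimal NumPy sketch of the defining formula; the six (x, y) pairs are made-up values, not data from the lecture.

import numpy as np

# Hypothetical paired observations (x, y) -- illustrative values only.
x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([1.5, 3.0, 4.5, 5.0, 7.5, 9.0])

# Pearson's r: covariance of x and y divided by the product of their
# standard deviations; the result always falls in [-1.0, +1.0].
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
print(f"r = {r:.3f}")  # close to +1.0: a strong positive relationship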

Practice with Scatterplots
[Scatterplot figures: estimate r = .__ for each plot]


Correlation Guestimation
[Scatterplot figures for estimating r]


Samples vs. Populations
Sample statistics estimate population parameters
–M tries to estimate μ
–r tries to estimate ρ ("rho" – the Greek letter, not the Latin "p")
r = correlation for a sample, based on the limited observations we have
ρ = actual correlation in the population – the true correlation
Beware sampling error!!
–even if ρ = 0 (there's no actual correlation), you might get r = .08 or r = –.26 just by chance
–We look at r, but we want to know about ρ

Hypothesis Testing with Correlations
Two possibilities
–H0: ρ = 0 (no actual correlation; the null hypothesis)
–Ha: ρ ≠ 0 (there is some correlation; the alternative hypothesis)
Case #1 (see correlation worksheet)
–Correlation between distance and points, r =
–Sample is small (n = 6), but r is very large
–We guess ρ < 0 (we guess there is some correlation in the population)
Case #2
–Correlation between aiming and points, r = .628
–Sample is small (n = 6), and r is only moderate in size
–We guess ρ = 0 (we guess there is no correlation in the population)
Bottom line
–We can only guess about ρ
–We can be wrong in two ways
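The same test can be sketched in code with SciPy's pearsonr, which returns the sample r and the two-tailed p-value for H0: ρ = 0. The six (distance, points) pairs below are hypothetical stand-ins for the worksheet data.

import numpy as np
from scipy import stats

# Hypothetical worksheet-style data: n = 6 throws.
distance = np.array([5, 10, 15, 20, 25, 30])
points = np.array([95, 88, 70, 65, 48, 30])

# pearsonr returns the sample r and the two-tailed p-value for H0: rho = 0.
r, p = stats.pearsonr(distance, points)
print(f"r = {r:.3f}, p = {p:.4f}")

# Reject H0 (conclude rho != 0) only if p <= .05.
if p <= 0.05:
    print("Reject H0: evidence of a correlation in the population")
else:
    print("Fail to reject H0: no evidence of a population correlation")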

Reading a Correlation Matrix
–r = the correlation coefficient
–p = probability of getting a correlation this size by sheer chance; reject H0 if p ≤ .05
–the number in parentheses reflects sample size (df = n − 2), e.g., r(4) = −.904, p < .05

Predictive Potential
Coefficient of Determination
–r²
–Amount of variance in y accounted for by x
–Percentage increase in accuracy you gain by using the regression line to make predictions
–Without correlation, you can only guess the mean of y
–[Used with regression]
[Figure: r² scale from 0% to 100%]
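A short sketch (same hypothetical data as above) showing the two equivalent views of r²: the squared correlation, and the proportional reduction in squared error from predicting with the regression line instead of the mean of y.

import numpy as np

x = np.array([5, 10, 15, 20, 25, 30], dtype=float)
y = np.array([95, 88, 70, 65, 48, 30], dtype=float)

r = np.corrcoef(x, y)[0, 1]
print(f"r^2 = {r**2:.3f}")  # proportion of variance in y accounted for by x

# Equivalent view: proportional reduction in squared prediction error.
b, a = np.polyfit(x, y, 1)                # least-squares slope and intercept
ss_mean = np.sum((y - y.mean()) ** 2)     # error from guessing the mean of y
ss_line = np.sum((y - (b * x + a)) ** 2)  # error from using the line
print(f"1 - SSres/SStot = {1 - ss_line / ss_mean:.3f}")  # same value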

Limitations of Correlation
linearity:
–can't describe non-linear relationships
–e.g., relation between anxiety & performance
truncation of range:
–underestimates strength of relationship if you can't see the full range of x values
no proof of causation:
–third-variable problem: a 3rd variable could be causing change in both variables
–directionality: can't be sure which way causality "flows"

Regression
Regression: Correlation + Prediction
–predicting y based on x
–e.g., predicting throwing points (y) based on distance from target (x)
Regression equation
–formula that specifies a line
–y′ = bx + a
–plug in an x value (distance from target) and predict y (points)
–note: y = actual value of a score; y′ = predicted value
Go to website! – Regression Playground
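A minimal least-squares sketch, again with hypothetical worksheet-style numbers; the predict helper is illustrative, not part of the lecture materials.

import numpy as np

# Hypothetical data: distance from target (x) and points scored (y).
x = np.array([5, 10, 15, 20, 25, 30], dtype=float)
y = np.array([95, 88, 70, 65, 48, 30], dtype=float)

# Fit y' = bx + a by least squares.
b, a = np.polyfit(x, y, 1)
print(f"y' = {b:.2f}x + {a:.2f}")

def predict(distance):
    """Plug an x value into the regression equation to get y'."""
    return b * distance + a

print(predict(20.0))  # predicted points for a throw from 20 feet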

Regression Graphic – Regression Line
–if x = 18, then y′ = 47
–if x = 24, then y′ = 20
See correlation & regression worksheet

Regression Equation
y′ = bx + a
–y′ = predicted value of y
–b = slope of the line
–x = value of x that you plug in
–a = y-intercept (where the line crosses the y axis)
In this case, plug the fitted b and a into y′ = b(x) + a
So if the distance is 20 feet: y′ = b(20) + a
See correlation & regression worksheet

SPSS Regression Set-up
–"Criterion": the y-axis variable, what you're trying to predict
–"Predictor": the x-axis variable, what you're basing the prediction on
Note: Never refer to the IV or DV when doing regression

Getting Regression Info from SPSS
–the slope b and the intercept a are read from the SPSS coefficients output
–y′ = b(x) + a; for a distance of 20, y′ = b(20) + a
See correlation & regression worksheet

Predictive Ability
Mantra!!
–As variability decreases, prediction accuracy increases
–if we can account for variance, we can make better predictions
As r increases:
–r² increases: "variance accounted for" increases, so prediction accuracy increases
–prediction error decreases (distance between y′ and y)
–Sy′ decreases: the standard error of the estimate measures the overall amount of prediction error
We like big r's!!!
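For reference, the standard textbook formula for the quantity the slide calls Sy′ (not shown on the slide itself) is

S_{y'} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}

so it shrinks exactly when the residuals y − y′ shrink, which is why it falls as r grows.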

Drawing a Regression Line by Hand
Three steps
1. Plug zero in for x to get a y′ value, then plot this point
–Note: it will be the y-intercept
2. Pick a large value of x (just so it falls on the right end of the graph), plug it in, then plot the resulting point
3. Connect the two points with a straight line!

Time Series Prediction: Forecasting the Future and Understanding the Past
Santa Fe Institute Proceedings on the Studies in the Sciences of Complexity, edited by Andreas Weigend and Neil Gershenfeld
Perspectives on Standard Benchmark Data in Quantifying Complex Systems
Vincent Stanford, NIST Complex Systems Program, Complex Systems Test Bed project, August 31, 2007

Chaos in Nature, Theory, and Technology
–Rings of Saturn
–Lorenz attractor
–Aircraft dynamics at high angles of attack

Time Series Prediction
A Santa Fe Institute competition using standard data sets
–Santa Fe Institute (SFI) founded in 1984 to "… focus the tools of traditional scientific disciplines and emerging computer resources on … the multidisciplinary study of complex systems…"
–"This book is the result of an unsuccessful joke. … Out of frustration with the fragmented and anecdotal literature, we made what we thought was a humorous suggestion: run a competition. …no one laughed."
Time series from physics, biology, economics, … beg the same questions:
–What happens next?
–What kind of system produced this time series?
–How much can we learn about the producing system?
Quantitative answers can permit direct comparisons
Make some standard data sets in consultation with subject matter experts in a variety of areas – very NISTy, and we are in a much better position to do this in the age of Google and the Internet

Selecting Benchmark Data Sets for Inclusion in the Book
Subject matter expert advisor group:
–Biology
–Economics
–Astrophysics
–Numerical Analysis
–Statistics
–Dynamical Systems
–Experimental Physics

The Data Sets
A. Far-infrared laser excitation
B. Sleep apnea
C. Currency exchange rates
D. Particle driven in nonlinear multiple-well potentials
E. Variable star data
F. J. S. Bach fugue notes

J. S. Bach Benchmark
–Dynamic, yes. But is it an iterative map?
–Is it amenable to time delay embedding?

Competition Tasks
Predict the withheld continuations of the data sets provided for training and measure errors
Characterize the systems as to:
–Degrees of freedom
–Predictability
–Noise characteristics
–Nonlinearity of the system
Infer a model for the governing equations
Describe the algorithms employed

Complex Time Series Benchmark Taxonomy
–Natural vs. Synthetic
–Stationary vs. Nonstationary
–Low dimensional vs. Stochastic
–Clean vs. Noisy
–Short vs. Long
–Documented vs. Blind
–Linear vs. Nonlinear
–Scalar vs. Vector
–One trial vs. Many trials
–Continuous vs. Discontinuous (Switching, Catastrophes, Episodes)

Time-Honored Linear Models
–Auto-Regressive Moving Average (ARMA)
–Many linear estimation techniques based on Least Squares or Least Mean Squares
–Power spectra and autocorrelation characterize such linear systems
–Randomness comes only from the forcing function x(t)
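In the standard notation (a textbook definition, not from the slide), an ARMA(p, q) model is y_t = Σ_{i=1..p} φ_i y_{t−i} + Σ_{j=1..q} θ_j ε_{t−j} + ε_t, with randomness entering only through the noise term ε. A minimal sketch simulating the AR(2) special case with illustrative coefficients:

import numpy as np

rng = np.random.default_rng(0)

# AR(2) special case of ARMA: y_t = phi1*y_{t-1} + phi2*y_{t-2} + eps_t.
phi1, phi2 = 0.6, -0.3    # illustrative, chosen to keep the process stationary
n = 500
eps = rng.normal(size=n)  # the forcing function: the only source of randomness
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

# The lag-1 autocorrelation -- one of the linear statistics that,
# with the power spectrum, characterizes a process like this.
r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
print(f"lag-1 autocorrelation = {r1:.3f}")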

Simple Nonlinear Systems Can Exhibit Chaotic Behavior
–Spectrum and autocorrelation characterize linear systems, not these
–Deterministic chaos looks random to linear analysis methods
–The logistic map is an early example (Ulam 1957)
[Figure: logistic map bifurcation diagram, 2.9 < r < 3.99]
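A sketch of the map itself; the two parameter values below are standard period-2 and chaotic examples, not taken from the slide's figure.

import numpy as np

def logistic_orbit(r, x0=0.2, n=100):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1 - x[i - 1])
    return x

# r = 3.2: settles into a period-2 cycle; r = 3.9: deterministic chaos.
print(logistic_orbit(3.2)[-4:])  # alternates between two values
print(logistic_orbit(3.9)[-4:])  # irregular; looks random to linear methods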

Understanding and Learning: Comments from SFI
–Weak to strong models – many parameters to few
–Data poor to data rich
–Theory poor to theory rich
Weak models progress to strong, e.g., planetary motion:
–Tycho Brahe: observes and records raw data
–Kepler: equal areas swept in equal time
–Newton: universal gravitation, mechanics, and calculus
–Poincaré: fails to solve the three-body problem
–Sussman and Wisdom: chaos ensues with computational solution! Is that a simplification?

Discovering Properties of Data and Inferring (Complex) Models
–Can't decompose an output into the product of input and transfer function, Y(z) = H(z)X(z), by doing a Z, Laplace, or Fourier transform
–Linear perceptrons were shown to have severe limitations by Minsky and Papert
–Perceptrons with nonlinear threshold logic can solve XOR and many classifications not available to the linear version
–But according to SFI: "Learning XOR is as interesting as memorizing the phone book. More interesting – and more realistic – are real-world problems, such as prediction of financial data."
–Many approaches are investigated
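To make the XOR claim concrete, here is a hand-wired two-layer threshold network that computes it; the OR/NAND decomposition is one standard construction, not one given in the lecture.

def step(z):
    # Heaviside threshold unit: fires (1) when its net input is positive.
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two threshold units computing OR and NAND of the inputs.
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(1.5 - x1 - x2)
    # Output unit: AND of the two hidden units yields XOR.
    return step(h_or + h_nand - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(f"XOR({a}, {b}) = {xor_net(a, b)}")  # 0, 1, 1, 0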

Time Delay Embedding
Differs from traditional experimental measurements:
–Provides detailed information about degrees of freedom beyond the scalar measured
–Rests on probabilistic assumptions – though not guaranteed to be valid for any particular system
–Reconstructed dynamics are seen through an unknown "smooth transformation"
–Therefore allows precise questions only about invariants under "smooth transformations"
–It can still be used for forecasting a time series and "characterizing essential features of the dynamics that produced it"

Time Delay Embedding Theorems
"The most important Phase Space Reconstruction technique is the method of delays"
–Assume the dynamics f(X) on a V-dimensional manifold has a strange attractor A with box-counting dimension dA
–s(X) is a twice-differentiable scalar measurement giving the sequence {s_n} = {s(X_n)}
–The time delay vectors are S_n = (s_n, s_{n−τ}, …, s_{n−(M−1)τ})
–M is called the embedding dimension; τ is generally referred to as the delay, or lag
–Embedding theorems: if {s_n} consists of scalar measurements of the state of a dynamical system then, under suitable hypotheses, the time delay embedding {S_n} is a one-to-one transformed image of the {X_n}, provided M > 2dA (e.g., Takens 1981, Lecture Notes in Mathematics, Springer-Verlag; or Sauer and Yorke, J. of Statistical Physics, 1991)
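A small helper sketching the method of delays; the sine signal and the choices of M and τ are arbitrary illustrations.

import numpy as np

def delay_embed(s, M, tau):
    """Build delay vectors S_n = (s_n, s_{n-tau}, ..., s_{n-(M-1)tau})
    from a scalar series s; M is the embedding dimension, tau the lag."""
    start = (M - 1) * tau
    return np.column_stack([s[start - k * tau : len(s) - k * tau]
                            for k in range(M)])

# Example: embed a scalar measurement of a simple oscillator.
t = np.linspace(0, 20 * np.pi, 2000)
s = np.sin(t)
S = delay_embed(s, M=3, tau=25)
print(S.shape)  # (1950, 3): one 3-dimensional delay vector per usable sample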

Time Series Prediction
Many different techniques thrown at the data to "see if anything sticks." Examples:
–Delay coordinate embedding – short-term prediction by filtered delay coordinates and reconstruction with local linear models of the attractor (T. Sauer)
–Neural networks with internal delay lines – performed well on data set A (E. Wan), (M. Mozer)
–Simple architectures for fast machines – "Know the data and your modeling technique" (X. Zhang and J. Hutchinson)
–Forecasting pdfs using HMMs with mixed states – capturing "embedology" (A. Fraser and A. Dimitriadis)
–More…

Time Series Characterization
Many different techniques thrown at the data to "see if anything sticks." Examples:
–Stochastic and deterministic modeling – local linear approximation to attractors (M. Casdagli and A. Weigend)
–Estimating dimension and choosing time delays – box counting (F. Pineda and J. Sommerer)
–Quantifying chaos using information-theoretic functionals – mutual information and nonlinearity testing (M. Palus)
–Statistics for detecting deterministic dynamics – coarse-grained flow averages (D. Kaplan)
–More…

What to Make of This?
A handbook for the corpus-driven study of nonlinear dynamics. Very NISTy:
–Convene a panel of leading researchers
–Identify areas of interest where improved characterization and predictive measurements can be of assistance to the community
–Identify standard reference data sets: development corpora, test sets
–Develop metrics for prediction and characterization
–Evaluate participants
–Is there a sponsor?
–Are there areas of special importance to communities we know? For example: predicting catastrophic failures of machines from sensors

Ideas?

Terminology
Evolutionary Computation (EC): models based on natural selection
Genetic Algorithm (GA) Concepts
–Individual: single entity of model (corresponds to hypothesis)
–Population: collection of entities in competition for survival
–Generation: single application of selection and crossover operations
–Schema aka building block: descriptor of GA population (e.g., 10**0*)
–Schema theorem: a schema's representation grows in proportion to its relative fitness
Simple Genetic Algorithm (SGA) Steps
–Selection
Proportionate reproduction (aka roulette wheel): P(individual) ∝ f(individual)
Tournament: let individuals compete in pairs or tuples; eliminate unfit ones
–Crossover
Single-point: cut both parents at one position and swap the tails
Two-point: swap the segment lying between two cut positions
Uniform: swap each bit independently with some fixed probability
–Mutation: single-point ("bit flip"), multi-point
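A compact sketch of the SGA loop with these operators, applied to the toy OneMax fitness (maximize the number of 1-bits); all parameter choices are illustrative, not from the lecture.

import random

BITS = 20  # individuals are bit strings; fitness = number of 1-bits (OneMax)

def fitness(ind):
    return sum(ind)

def select(pop):
    # Roulette-wheel (fitness-proportionate) selection.
    total = sum(fitness(i) for i in pop)
    weights = [fitness(i) / total for i in pop]
    return random.choices(pop, weights=weights, k=1)[0]

def crossover(p1, p2):
    # Single-point crossover: swap tails after a random cut point.
    cut = random.randrange(1, BITS)
    return p1[:cut] + p2[cut:]

def mutate(ind, rate=0.01):
    # Single-point "bit flip" mutation applied per position.
    return [b ^ 1 if random.random() < rate else b for b in ind]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(30)]
for gen in range(50):  # one generation = selection + crossover + mutation
    pop = [mutate(crossover(select(pop), select(pop))) for _ in pop]
print(max(fitness(i) for i in pop))  # best fitness found

Selection pressure plus crossover drives the population toward fitter bit patterns, which is the schema theorem's claim in miniature.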

Summary Points
Evolutionary Computation
–Motivation: process of natural selection
Limited population; individuals compete for membership
Method for parallelized, stochastic search
–Framework for problem solving: search, optimization, learning
Prototypical (Simple) Genetic Algorithm (GA)
–Steps
Selection: reproduce individuals probabilistically, in proportion to fitness
Crossover: generate new individuals probabilistically, from pairs of "parents"
Mutation: modify structure of individual randomly
–How to represent hypotheses as individuals in GAs
An Example: GA-Based Inductive Learning (GABIL)
Schema Theorem: Propagation of Building Blocks
Next Lecture: Genetic Programming, The Movie