Empirical Modeling Dongsup Kim Department of Biosystems, KAIST Fall, 2004.


Empirical modeling
• Moore’s law: Gordon Moore made his famous observation in 1965, just four years after the first planar integrated circuit was invented. The press called it "Moore's Law" and the name has stuck. In his original paper, Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue.

Covariance and correlation
• Consider n pairs of measurements on the variables x and y.
• A measure of the linear association between the measurements of x and y is the “sample covariance”
$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
– If s_xy > 0: positively correlated
– If s_xy < 0: negatively correlated
– If s_xy = 0: uncorrelated
• Sample linear correlation coefficient (“Pearson’s product moment correlation coefficient”)
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$
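
As a concrete illustration, here is a minimal Python sketch (not from the original slides; function names and data are made up) that computes the sample covariance and Pearson correlation exactly as defined above.

```python
import numpy as np

def sample_covariance(x, y):
    # s_xy = (1/(n-1)) * sum_i (x_i - x_bar)(y_i - y_bar)
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

def pearson_r(x, y):
    # r_xy = s_xy / (s_x * s_y), using sample standard deviations (ddof=1)
    return sample_covariance(x, y) / (np.std(x, ddof=1) * np.std(y, ddof=1))

years = [1, 2, 3, 5, 8]          # X (years of experience), made-up numbers
salary = [30, 35, 42, 55, 70]    # Y (salary in $1000s), made-up numbers
print(sample_covariance(years, salary), pearson_r(years, salary))
```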

Correlation
• Example (figure): scatter plot of Y (salary in $1000s) versus X (years of experience), showing a strong relationship.

Covariance & correlation matrix
• Given n measurements on p variables x_1, …, x_p, the sample covariance between the i th and j th variables is
$s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$
and the covariance matrix is the p × p symmetric matrix $S = \{s_{ij}\}$.
• The sample correlation coefficient for the i th and j th variables is
$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}}$
and the correlation matrix is $R = \{r_{ij}\}$, with ones on the diagonal.
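
A short Python sketch of these definitions (illustrative, not from the slides): it builds the p × p covariance and correlation matrices from an n × p data matrix.

```python
import numpy as np

def covariance_matrix(X):
    # X: n x p data matrix (rows = measurements, columns = variables)
    Xc = X - X.mean(axis=0)               # center each variable
    return Xc.T @ Xc / (X.shape[0] - 1)   # S = {s_ij}

def correlation_matrix(X):
    S = covariance_matrix(X)
    d = np.sqrt(np.diag(S))               # standard deviations sqrt(s_ii)
    return S / np.outer(d, d)             # r_ij = s_ij / (sqrt(s_ii) sqrt(s_jj))
```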

In two dimensions

Fitting a line to data
• When the correlation coefficient is large, it indicates a dependence of one variable on the other. The simplest relationship is the straight line: $y = \beta_0 + \beta_1 x$
• Criterion for the best-fit line: least squares.
• The resulting equation is called the “regression equation”, and its graph is called the “regression line”.
• The sum of squares of the errors:
$SS = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2$
• Least-squares equations (setting the partial derivatives of SS with respect to β₀ and β₁ to zero):
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
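
A minimal Python sketch of these least-squares equations (illustrative, not from the slides):

```python
import numpy as np

def fit_line(x, y):
    # Closed-form least-squares estimates of the intercept and slope.
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1
```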

A measure of fit
• Suppose we have data points (x_i, y_i) and modeled (or predicted) points (x_i, ŷ_i) from the model ŷ = f(x).
• The data {y_i} show two types of variation: (i) variation explained by the model and (ii) variation not explained by the model.
• Residual sum of squares (variation not explained by the model): $SS_{res} = \sum_{i}(y_i - \hat{y}_i)^2$
• Regression sum of squares (variation explained by the model): $SS_{reg} = \sum_{i}(\hat{y}_i - \bar{y})^2$
• Total variation in y = variation explained by the model + unexplained variation (error): $SS_{tot} = \sum_{i}(y_i - \bar{y})^2 = SS_{reg} + SS_{res}$
• The coefficient of determination: $R^2 = \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}$
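
Continuing the sketch, R² can be computed directly from the two sums of squares (again illustrative code, not part of the original lecture):

```python
import numpy as np

def r_squared(y, y_hat):
    # R^2 = 1 - SS_res / SS_tot
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)       # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
    return 1.0 - ss_res / ss_tot
```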

Principal Component Analysis (PCA)
• PCA selects a new set of axes for the data by moving and rotating the coordinate system in such a way that the dependency between the variables is removed in the new, transformed coordinate system.
• The first principal axis points in the direction of the maximum variation in the data.
• The second principal axis is orthogonal to the first one and lies in the direction of the maximum variation among the remaining allowable directions, and so on.
• PCA can be used to:
– Reduce the number of dimensions in data.
– Find patterns in high-dimensional data.
– Visualize data of high dimensionality.

PCA, II
• Assume X is an n × p data matrix that is “centered” (each column has zero mean).
• Let a be the p × 1 column vector of projection weights (unknown at this point) that result in the largest variance when the data X are projected along a.
• The projected values of all data vectors in X onto a can be written as Xa.
• Now define the variance along a as
$\sigma_a^2 = \frac{1}{n-1}(Xa)^T(Xa) = a^T S a$, where S is the sample covariance matrix.
• We wish to maximize this variance under the constraint $a^T a = 1$: optimization with constraints → method of Lagrange multipliers, which leads to the eigenvalue problem $Sa = \lambda a$.
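
A minimal Python sketch of this result (illustrative, not from the slides): the directions that maximize the projected variance are the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue.

```python
import numpy as np

def pca(X):
    # X: n x p data matrix, assumed centered (zero column means).
    S = X.T @ X / (X.shape[0] - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    return eigvals[order], eigvecs[:, order]
```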

Example, 2D
• (Figure: a 2-D covariance matrix, its eigen-decomposition, and the resulting PCA axes.) From CMU Computer Vision by Tai Sing Lee

PCA, III
• If S = {s_ik} is the p × p sample covariance matrix with eigenvalue–eigenvector pairs (λ₁, v₁), (λ₂, v₂), …, (λ_p, v_p), the i th principal component is given by
$y_i = v_i^T x = v_{i1}x_1 + v_{i2}x_2 + \cdots + v_{ip}x_p$
where λ₁ ≥ λ₂ ≥ … ≥ λ_p ≥ 0 and x is the p-dimensional vector formed by the random variables x_1, x_2, …, x_p.
• Also, Var(y_i) = λ_i, and the total sample variance equals the sum of the eigenvalues: $\sum_{i=1}^{p} s_{ii} = \lambda_1 + \lambda_2 + \cdots + \lambda_p$
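
A quick numerical check of that last identity (illustrative Python with made-up toy data, not from the slides):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 3))   # toy data with p = 3
S = np.cov(X, rowvar=False)                          # sample covariance matrix
eigvals = np.linalg.eigvalsh(S)
# Total sample variance (sum of the s_ii) equals the sum of the eigenvalues.
print(np.isclose(np.trace(S), eigvals.sum()))        # True
```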

Applications
• Dimensional reduction
• Image compression
• Pattern recognition
• Gene expression data analysis
• Molecular dynamics simulation
• …

Dimensional reduction
• We can throw v₃ away and keep W = [v₁ v₂], and we can still represent the information almost equally well.
• v₁ and v₂ also provide good dimensions in which different objects/textures form nice clusters in this 2D space. From CMU Computer Vision by Tai Sing Lee
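
As a sketch of this idea (illustrative Python, not from the slides), the data can be projected onto the leading k principal directions to obtain a lower-dimensional representation:

```python
import numpy as np

def project_top_k(X, k):
    # Keep only the k leading principal directions W = [v_1 ... v_k].
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # p x k basis
    return Xc @ W                                   # n x k reduced data
```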

Image compression, I
• A set of N images, I₁, I₂, …, I_N, each of which has n pixels.
– Dataset of N dimensions and n observations
– Corresponding pixels form vectors of intensities
• Expand each image as a series, $I_j \approx \sum_{i=1}^{k} a_{ji} v_i$, where the optimal set of basis vectors v_i is chosen to minimize the reconstruction error $\varepsilon = \sum_{j=1}^{N} \left\| I_j - \sum_{i=1}^{k} a_{ji} v_i \right\|^2$.
• The principal components of the set form the optimal basis.
– PCA produces N eigenvectors and eigenvalues.
– Compress: choose a limited number (k < N) of components.
– Some information is lost when recreating the original data.

Image compression, II
• Given a large set of 8×8 image patches, convert each patch into a vector by stacking its columns into one column vector.
• Compute the covariance matrix S of these vectors.
• Transform into a set of new bases by PCA.
• Since the eigenvalues of S drop off rapidly, we can represent each patch more efficiently in the new coordinate system using the eigenvectors (principal components) v₁, …, v_k with k << 64 as bases (k ≈ 10).
• Then I ≈ a₁v₁ + a₂v₂ + … + a_kv_k.
• The idea is that you now store only about 10 code words, each an 8×8 image basis; then you can transmit each patch with only 10 numbers instead of 64. From CMU Computer Vision by Tai Sing Lee
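
A hedged Python sketch of this patch-coding scheme (illustrative, not from the original lecture; names are made up):

```python
import numpy as np

def compress_patches(patches, k=10):
    # patches: N x 64 matrix, each row an 8x8 patch stacked into a vector.
    mean = patches.mean(axis=0)
    Xc = patches - mean
    S = np.cov(Xc, rowvar=False)                    # 64 x 64 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k "eigen-patches"
    codes = Xc @ V                                  # k numbers per patch
    reconstructed = codes @ V.T + mean              # approximate patches
    return codes, reconstructed
```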

Applications
• Representation
– An N × N pixel image becomes a vector X = (x₁, …, x_{N²}), where x_i is an intensity value.
• PCA for pattern identification
– Perform PCA on a matrix of M images.
– Given a new image: which original image is it most similar to?
– Traditionally: difference between the original images and the new image.
– With PCA: difference between the PCA representations of the data and of the new image.
– Advantage: the PCA data reflect the similarities and differences in the image data.
– Even with omitted dimensions, performance remains good.
• PCA for image compression
– M images, each containing N² pixels
– Dataset of M dimensions and N² observations
– Corresponding pixels form vectors of intensities
– PCA produces M eigenvectors and eigenvalues
– Compress: choose a limited number of components
– Some information is lost when recreating the original data
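
A minimal sketch of PCA-based matching, as in the pattern-identification bullets above (illustrative Python; the SVD route to the principal directions and all names are assumptions, not from the slides):

```python
import numpy as np

def most_similar(train_images, new_image, k=20):
    # train_images: M x D matrix, each row an image flattened to D = N*N pixels.
    mean = train_images.mean(axis=0)
    Xc = train_images - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # principal directions
    W = Vt[:k].T                                        # D x k projection basis
    train_codes = Xc @ W                                # project training images
    new_code = (new_image - mean) @ W                   # project the new image
    dists = np.linalg.norm(train_codes - new_code, axis=1)
    return int(np.argmin(dists))                        # index of closest image
```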

Interpolation & extrapolation
• Numerical Recipes, Chapter 3.
• Consider n pairs of data for variables x and y, where we don’t know an analytic expression for y = f(x).
• The task is to estimate f(x) for arbitrary x by drawing a smooth curve through the x_i’s.
– Interpolation: if x is in between the largest and smallest of the x_i’s.
– Extrapolation: if x is outside of that range (more dangerous; example: the stock market).
• Methods
– Polynomials, rational functions
– Trigonometric interpolation: Fourier methods
– Spline fits
• Order: the number of points (minus one) used in an interpolation.
– Increasing the order does not necessarily increase the accuracy.

Polynomial interpolation, I
• Straight-line interpolation
– Given two points (x₁, y₁) and (x₂, y₂), use the straight line joining them to find all the missing values in between:
$P_1(x) = y_1 + \frac{y_2 - y_1}{x_2 - x_1}(x - x_1)$
• Lagrange interpolation
– First order:
$P_1(x) = \frac{x - x_2}{x_1 - x_2}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2$
– Second-order polynomial:
$P_2(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3$

Polynomial interpolation, II
• In general, the interpolating polynomial of degree N−1 through the N points y₁ = f(x₁), y₂ = f(x₂), …, y_N = f(x_N) is
$P(x) = \sum_{i=1}^{N} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}$
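
A direct, minimal Python implementation of this formula (illustrative, not from the slides; the example data are made up):

```python
def lagrange_interpolate(xs, ys, x):
    # P(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j)
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Example: quadratic through three made-up points, evaluated at x = 1.5
print(lagrange_interpolate([0.0, 1.0, 2.0], [1.0, 3.0, 2.0], 1.5))
```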

Example, I
• Fit a second-order Lagrange polynomial through three data points at x = 1.1, 1.7, and 3.0:
$P(x) = c_1(x - 1.7)(x - 3.0) + c_2(x - 1.1)(x - 3.0) + c_3(x - 1.1)(x - 1.7)$, where each coefficient is $c_i = y_i / \prod_{j \ne i}(x_i - x_j)$.
• Evaluating at x = 2.3:
$P(2.3) = c_1(0.6)(-0.7) + c_2(1.2)(-0.7) + c_3(1.2)(0.6)$

Example, II
• What happens if we increase the number of data points?
• With the additional points the coefficients change and the fit becomes a P₄(x) polynomial; the original P₂(x) is given for comparison between the two curves.
• The problem: adding additional points can create “bulges” in the graph, so increasing the order does not necessarily improve the accuracy.

Rational Function Interpolation

Cubic spline interpolation
• Cubic spline interpolation uses low-order (cubic) polynomials between the data points, chosen to maintain the desired smoothness of the function, and the result is piecewise continuous.
• Given a function f defined on [a, b] and a set of nodes a = x₀ < x₁ < … < x_n = b, a cubic spline interpolant S for f satisfies:
– S(x) is a cubic polynomial, denoted S_j(x), on the subinterval [x_j, x_{j+1}] for each j = 0, 1, …, n−1;
– S(x_j) = f(x_j) for j = 0, 1, …, n;
– S_{j+1}(x_{j+1}) = S_j(x_{j+1}) for j = 0, 1, …, n−2;
– S′_{j+1}(x_{j+1}) = S′_j(x_{j+1}) for j = 0, 1, …, n−2;
– S″_{j+1}(x_{j+1}) = S″_j(x_{j+1}) for j = 0, 1, …, n−2;
– Boundary conditions (natural spline): S″(a) = S″(b) = 0.
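
A short sketch using SciPy's cubic-spline routine with the natural boundary condition above (the node values here are made up for illustration):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x_nodes = np.array([0.0, 1.0, 2.0, 3.0])      # a = x_0 < x_1 < ... < x_n = b
y_nodes = np.array([1.0, 2.7, 5.8, 6.6])      # f(x_j); made-up values
spline = CubicSpline(x_nodes, y_nodes, bc_type='natural')   # S''(a) = S''(b) = 0
print(spline(1.5))                            # interpolate between the nodes
```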