
1 Empirical Modeling Dongsup Kim Department of Biosystems, KAIST Fall, 2004

2 Empirical modeling
• Moore's law: Gordon Moore made his famous observation in 1965, just four years after the first planar integrated circuit was discovered. The press called it "Moore's Law" and the name has stuck. In his original paper, Moore observed an exponential growth in the number of transistors per integrated circuit and predicted that this trend would continue.
From http://www.intel.com/research/silicon/mooreslaw.htm

3 Covariance and correlation
• Consider n pairs of measurements on each of the variables x and y.
• A measure of linear association between the measurements of x and y is the sample covariance $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
– If s_xy > 0: positively correlated
– If s_xy < 0: negatively correlated
– If s_xy = 0: uncorrelated
• Sample linear correlation coefficient ("Pearson's product moment correlation coefficient"): $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$

4 Correlation
Y (Salary in $1000s)   X (Years Experience)
20                      1
83                     16
90                     21
59                     11
43                      6
36                      3
72                     13
64                      9
57                      8
30                      3
Strong relationship
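As an illustration of the covariance and correlation definitions on the previous slide, here is a minimal NumPy sketch (not part of the original deck) that computes them for the salary/experience table above; the array values are the reconstructed table entries.

```python
import numpy as np

# Reconstructed salary/experience data from the table above
x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)        # X: years of experience
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)  # Y: salary in $1000s

# Sample covariance (n-1 in the denominator) and Pearson correlation coefficient
s_xy = np.cov(x, y, ddof=1)[0, 1]
r_xy = np.corrcoef(x, y)[0, 1]

print(f"sample covariance s_xy = {s_xy:.2f}")
print(f"correlation coefficient r_xy = {r_xy:.3f}")  # close to 1: a strong positive relationship
```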

5 Covariance & Correlation matrix
• Given n measurements on p variables, the sample covariance between the i-th and j-th variables is $s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)$
• and the covariance matrix is the p × p matrix $S = \{s_{ij}\}$.
• The sample correlation coefficient for the i-th and j-th variables is $r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}}$
• and the correlation matrix is the p × p matrix $R = \{r_{ij}\}$.
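A small sketch, using NumPy on synthetic data, of the p × p covariance and correlation matrices described above; the random data and the choice p = 3 are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # n = 100 measurements on p = 3 variables

S = np.cov(X, rowvar=False, ddof=1)    # p x p sample covariance matrix
R = np.corrcoef(X, rowvar=False)       # p x p correlation matrix

print(S.shape, R.shape)                # (3, 3) (3, 3)
print(np.diag(R))                      # the diagonal of R is all ones
```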

6 In two dimensions

7 Fitting a line to data
• When the correlation coefficient is large, it indicates a dependence of one variable on the other. The simplest relationship is the straight line $y = \beta_0 + \beta_1 x$.
• Criterion for a best-fit line: least squares.
• The resulting equation is called the "regression equation", and its graph is called the "regression line".
• The sum of squares of the errors: $SS = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$
• Least-squares equations: $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
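A sketch of the least-squares equations in NumPy, reusing the salary/experience data reconstructed from slide 4; the closed-form estimates are cross-checked against np.polyfit. None of this code is from the original deck.

```python
import numpy as np

x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)

# Closed-form least-squares estimates for y = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check with NumPy's degree-1 polynomial fit
b1_np, b0_np = np.polyfit(x, y, deg=1)

print(f"closed form: b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"np.polyfit:  b0 = {b0_np:.2f}, b1 = {b1_np:.2f}")
```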

8 A measure of fit
• Suppose we have data points (x_i, y_i) and modeled (or predicted) points (x_i, ŷ_i) from the model ŷ = f(x).
• The data {y_i} show two types of variation: (i) variation explained by the model and (ii) variation not explained by the model.
• Residual sum of squares (variation not explained by the model): $SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
• Regression sum of squares (variation explained by the model): $SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$
• The coefficient of determination: $R^2 = \frac{SS_{reg}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{tot} = \sum_{i=1}^{n}(y_i - \bar{y})^2$ is the total variation.
• Total variation in y = variation explained by the model + unexplained variation (error)
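A short illustration of the coefficient of determination on the same reconstructed salary data; ss_res and ss_tot are names introduced here for the residual and total sums of squares.

```python
import numpy as np

x = np.array([1, 16, 21, 11, 6, 3, 13, 9, 8, 3], dtype=float)
y = np.array([20, 83, 90, 59, 43, 36, 72, 64, 57, 30], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x                      # predicted values from the regression line

ss_res = np.sum((y - y_hat) ** 2)        # variation not explained by the model
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation in y
r2 = 1.0 - ss_res / ss_tot

print(f"R^2 = {r2:.3f}")  # for a simple linear fit this equals the squared correlation coefficient
```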

9 Principal Component Analysis (PCA)
• PCA selects a new set of axes for the data by moving and rotating the coordinate system in such a way that the dependency between the variables is removed in the transformed coordinate system.
• The first principal axis points in the direction of the maximum variation in the data.
• The second principal axis is orthogonal to the first and points in the direction of the maximum variation among the remaining allowable directions, and so on.
• It can be used to:
– Reduce the number of dimensions in the data.
– Find patterns in high-dimensional data.
– Visualize data of high dimensionality.

10 PCA, II
• Assume X is an n × p matrix that is centered (zero column means).
• Let a be the p × 1 column vector of projection weights (unknown at this point) that results in the largest variance when the data X are projected along a.
• We can express the projected values of all data vectors in X onto a as Xa.
• Now define the variance along a as $\sigma_a^2 = \frac{1}{n-1}(Xa)^T(Xa) = a^T S a$, where S is the sample covariance matrix.
• We wish to maximize this variance under the constraint that $a^T a = 1$: optimization with constraints → method of Lagrange multipliers. The solution a is the eigenvector of S with the largest eigenvalue.
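A minimal PCA sketch along the lines of this slide, using NumPy on synthetic correlated data: center the data, form the sample covariance matrix, take its eigendecomposition, and check that the variance of the projection onto the first principal axis equals the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data: n = 200 observations of p = 3 variables
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])
Xc = X - X.mean(axis=0)                    # center the data (zero mean), as the slide assumes

S = np.cov(Xc, rowvar=False, ddof=1)       # p x p sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)       # symmetric matrix: eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]          # sort principal axes by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a = eigvecs[:, 0]                          # first principal axis (unit vector, a^T a = 1)
scores = Xc @ a                            # projection Xa of the data onto a
print(np.var(scores, ddof=1), eigvals[0])  # projected variance equals the largest eigenvalue
```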

11 Example, 2D
• Covariance matrix
• Decomposition
• PCA
From CMU 15-385 Computer Vision by Tai Sing Lee

12 PCA, III
• If $S = \{s_{ik}\}$ is the p × p sample covariance matrix with eigenvalue/eigenvector pairs $(\lambda_1, v_1), (\lambda_2, v_2), \ldots, (\lambda_p, v_p)$, the i-th principal component is given by $y_i = v_i^T x = v_{i1} x_1 + v_{i2} x_2 + \cdots + v_{ip} x_p$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and x is the p-dimensional vector formed by the random variables x_1, x_2, …, x_p.
• Also, $\mathrm{Var}(y_i) = v_i^T S v_i = \lambda_i$ and $\mathrm{Cov}(y_i, y_k) = v_i^T S v_k = 0$ for i ≠ k.

13 Applications
• Dimensional reduction
• Image compression
• Pattern recognition
• Gene expression data analysis
• Molecular dynamics simulation
• …

14 Dimensional reduction
• We can throw v_3 away and keep w = [v_1 v_2], and still represent the information almost equally well.
• v_1 and v_2 also provide good dimensions in which different objects/textures form nice clusters in this 2D space.
From CMU 15-385 Computer Vision by Tai Sing Lee
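An illustrative sketch (synthetic data, not the CMU example) of throwing v_3 away and keeping the first two principal axes: project onto w = [v_1 v_2], reconstruct, and check how much variance the 2D representation retains.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 3-D data that mostly lives in a 2-D subspace, plus a little noise
Z = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 3))
X = Z + 0.05 * rng.normal(size=Z.shape)
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False, ddof=1)
eigvals, V = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

w = V[:, :2]                              # keep v1 and v2, throw v3 away
Y = Xc @ w                                # 2-D representation of each data point
X_approx = Y @ w.T + X.mean(axis=0)       # reconstruction from the 2-D representation

print(f"variance kept by v1, v2: {eigvals[:2].sum() / eigvals.sum():.4f}")
print(f"relative reconstruction error: {np.linalg.norm(X - X_approx) / np.linalg.norm(X):.4f}")
```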

15 Image compression, I
• A set of N images, I_1, I_2, …, I_N, each of which has n pixels.
– Dataset of N dimensions and n observations
– Corresponding pixels form vectors of intensities
• Expand each image as a series $I = a_1 v_1 + a_2 v_2 + \cdots + a_N v_N$, where the set of basis vectors v_k is chosen to minimize the reconstruction error when the series is truncated.
• The principal components of the set form the optimal basis.
– PCA produces N eigenvectors and eigenvalues.
– Compress: choose a limited number (k < N) of components.
– Some information is lost when recreating the original data.

16 Image compression, II
• Given a large set of 8x8 image patches, convert each image patch into a vector by stacking its columns together into one column vector.
• Compute the covariance matrix S of these vectors.
• Transform into a set of new basis vectors by PCA.
• Since the eigenvalues of S drop rapidly, we can represent the image more efficiently in the new coordinate system with the eigenvectors (principal components) v_1, …, v_k, where k << 64, as bases (k ≈ 10).
• Then I = a_1 v_1 + a_2 v_2 + … + a_k v_k.
• The idea is that you now store only 10 code words, each of which is an 8x8 image basis; then you can transmit the image with only 10 numbers instead of 64.
From CMU 15-385 Computer Vision by Tai Sing Lee
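A rough sketch of the 8x8 patch compression idea. The patches here are synthetic stand-ins (the original image patches are not available), so only the mechanics are illustrated: stack each patch into a 64-vector, take the eigenvectors of the covariance matrix, and keep only k ≈ 10 coefficients per patch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for a large set of 8x8 image patches (lightly smoothed noise)
n_patches = 5000
base = rng.normal(size=(n_patches, 8, 8))
patches = (base + np.roll(base, 1, axis=1) + np.roll(base, 1, axis=2)) / 3.0

X = patches.reshape(n_patches, 64)        # stack each 8x8 patch into a 64-vector
mean = X.mean(axis=0)
Xc = X - mean

S = np.cov(Xc, rowvar=False, ddof=1)      # 64 x 64 covariance matrix
eigvals, V = np.linalg.eigh(S)
V = V[:, np.argsort(eigvals)[::-1]]       # principal components, largest variance first

k = 10                                    # keep only k << 64 components
Vk = V[:, :k]
codes = Xc @ Vk                           # k numbers per patch instead of 64
X_rec = codes @ Vk.T + mean               # approximate reconstruction I = a1 v1 + ... + ak vk

err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
print(f"relative reconstruction error with k = {k}: {err:.3f}")
```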

17 Applications
• Representation
– An N × N pixel image is represented as a vector X = (x_1, …, x_{N²}), where x_i is an intensity value.
• PCA for pattern identification
– Perform PCA on a matrix of M images.
– Given a new image, which original image is it most similar to?
– Traditionally: compare the new image with each original image directly.
– With PCA: compare the new image with the PCA-transformed data.
– Advantage: the PCA data reflect the similarities and differences in the image data.
– Even with omitted dimensions, performance remains good.
• PCA for image compression
– M images, each containing N² pixels.
– Dataset of M dimensions and N² observations; corresponding pixels form vectors of intensities.
– PCA produces M eigenvectors and eigenvalues.
– Compress: choose a limited number of components.
– Some information is lost when recreating the original data.

18 Interpolation & Extrapolation
• Numerical Recipes, Chapter 3.
• Consider n pairs of data for variables x and y, where we do not know an analytic expression for y = f(x).
• The task is to estimate f(x) for an arbitrary x by drawing a smooth curve through the x_i's.
– Interpolation: x lies between the smallest and largest of the x_i's.
– Extrapolation: x lies outside that range (more dangerous; example: the stock market).
• Methods
– Polynomials, rational functions
– Trigonometric interpolation: Fourier methods
– Spline fits
• Order: the number of points (minus one) used in an interpolation.
– Increasing the order does not necessarily increase the accuracy.

19 Polynomial interpolation, I
• Straight-line interpolation
– Given two points (x_1, y_1) and (x_2, y_2), use the straight line joining them to find all the missing values in between.
• Lagrange interpolation
– First-order polynomial: $P_1(x) = \frac{x - x_2}{x_1 - x_2}\,y_1 + \frac{x - x_1}{x_2 - x_1}\,y_2$
– Second-order polynomial: $P_2(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}\,y_1 + \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}\,y_2 + \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}\,y_3$

20 Polynomial interpolation, II
• In general, the interpolating polynomial of degree N−1 through the N points y_1 = f(x_1), y_2 = f(x_2), …, y_N = f(x_N) is given by Lagrange's formula: $P(x) = \sum_{i=1}^{N} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}$
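A small, self-contained implementation of Lagrange's formula as written above; lagrange_interpolate is a name introduced here, not something from the slides or Numerical Recipes.

```python
import numpy as np

def lagrange_interpolate(x_nodes, y_nodes, x):
    """Evaluate P(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j)."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    y_nodes = np.asarray(y_nodes, dtype=float)
    total = 0.0
    for i in range(len(x_nodes)):
        basis = 1.0
        for j in range(len(x_nodes)):
            if j != i:
                basis *= (x - x_nodes[j]) / (x_nodes[i] - x_nodes[j])
        total += y_nodes[i] * basis
    return total

# Sanity check on data lying on y = x^2: the quadratic interpolant reproduces it exactly
xs, ys = [1.0, 2.0, 4.0], [1.0, 4.0, 16.0]
print([lagrange_interpolate(xs, ys, x) for x in xs])   # [1.0, 4.0, 16.0]
print(lagrange_interpolate(xs, ys, 3.0))               # 9.0
```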

21 Example, I
• Evaluating the second-order Lagrange polynomial through the tabulated values at x_1 = 1.1, x_2 = 1.7, x_3 = 3.0 gives
P(x) = 9.2983*(x-1.7)(x-3.0) - 19.4872*(x-1.1)(x-3.0) + 8.2186*(x-1.1)(x-1.7)
P(2.3) = 9.2983*(2.3-1.7)(2.3-3.0) - 19.4872*(2.3-1.1)(2.3-3.0) + 8.2186*(2.3-1.1)(2.3-1.7) = 18.3813
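A quick numeric check of the slide's arithmetic. The node values f(1.1) ≈ 10.6, f(1.7) ≈ 15.2, f(3.0) ≈ 20.3 printed at the end are back-calculated from the coefficients above; they are an inference, not data stated on the slide.

```python
# Evaluate the second-order Lagrange polynomial from the slide
def P(x):
    return (9.2983 * (x - 1.7) * (x - 3.0)
            - 19.4872 * (x - 1.1) * (x - 3.0)
            + 8.2186 * (x - 1.1) * (x - 1.7))

print(P(2.3))                 # approximately 18.3813, as on the slide
for x in (1.1, 1.7, 3.0):     # at the nodes, P recovers the tabulated values
    print(x, round(P(x), 4))  # approximately 10.6, 15.2, 20.3 (inferred)
```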

22 Example, II
• What happens if we increase the number of data points?
• The new coefficients define a P4(x) polynomial, which can be compared with the original P2(x) curve.
• The problem: adding additional points creates "bulges" (oscillations) in the graph.
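To make the "bulges" concrete, here is an illustration with the classic Runge function (this is not the slide's original data): as equally spaced interpolation nodes are added, the maximum error of the interpolating polynomial grows instead of shrinking.

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x ** 2)   # Runge's function on [-1, 1]
x_dense = np.linspace(-1.0, 1.0, 1001)

for n_nodes in (5, 9, 11):
    nodes = np.linspace(-1.0, 1.0, n_nodes)
    coeffs = np.polyfit(nodes, f(nodes), deg=n_nodes - 1)   # interpolating polynomial
    p = np.polyval(coeffs, x_dense)
    max_err = np.max(np.abs(p - f(x_dense)))
    print(f"{n_nodes} nodes (degree {n_nodes - 1}): max |error| = {max_err:.3f}")
```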

23 Rational Function Interpolation

24 Cubic Spline Interpolation
• Cubic spline interpolation uses only the data points, maintains the desired smoothness of the function, and is piecewise continuous.
• Given a function f defined on [a, b] and a set of nodes a = x_0 < x_1 < … < x_n = b, a cubic spline interpolant S for f satisfies:
– S(x) is a cubic polynomial, denoted S_j(x), on the subinterval [x_j, x_{j+1}] for each j = 0, 1, …, n-1;
– S(x_j) = f(x_j) for j = 0, 1, …, n;
– S_{j+1}(x_{j+1}) = S_j(x_{j+1}) for j = 0, 1, …, n-2;
– S'_{j+1}(x_{j+1}) = S'_j(x_{j+1}) for j = 0, 1, …, n-2;
– S''_{j+1}(x_{j+1}) = S''_j(x_{j+1}) for j = 0, 1, …, n-2;
– Boundary conditions (natural spline): S''(a) = S''(b) = 0.
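A brief sketch of cubic spline interpolation with SciPy's CubicSpline; the nodes below are made up for illustration, and bc_type='natural' imposes the boundary conditions S''(a) = S''(b) = 0 listed above.

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # illustrative nodes
y = np.sin(x)                                  # f evaluated at the nodes

spline = CubicSpline(x, y, bc_type='natural')  # natural spline: S''(a) = S''(b) = 0

print(spline(np.linspace(0.0, 4.0, 9)))        # interpolated values on a finer grid
print(spline(x) - y)                           # ~0 at the nodes: S(x_j) = f(x_j)
print(spline(x[0], 2), spline(x[-1], 2))       # second derivatives at the endpoints are 0
```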

