Basis Expansions and Regularization Part II

Outline: Review of Splines; Wavelet Smoothing; Reproducing Kernel Hilbert Spaces

Smoothing Splines Among all functions with two continuous derivatives, find the f that minimizes the penalized residual sum of squares RSS(f, λ) = Σ_i (y_i − f(x_i))² + λ ∫ (f''(t))² dt. Equivalently, find an f in the Sobolev space of functions with square-integrable second derivatives. The optimal solution is a natural cubic spline with knots at the unique values of the input data points (Exercise 5.7; Theorem 2.3 in Green and Silverman, 1994).
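As an illustration, the sketch below fits a smoothing spline with SciPy. It assumes a recent SciPy version that provides scipy.interpolate.make_smoothing_spline (the function name and its lam argument are an assumption about the installed library, not part of these slides); lam plays the role of λ above.

```python
# Minimal sketch: fit the penalized-RSS smoothing spline to noisy samples
# of a smooth curve (assumes SciPy >= 1.10 provides make_smoothing_spline).
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * np.pi * x) + 0.3 * rng.standard_normal(100)

spl = make_smoothing_spline(x, y, lam=1e-4)   # larger lam -> smoother fit
y_hat = spl(x)                                # fitted spline values at the data points
```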

Optimality of Natural Splines Green and Silverman, Nonparametric Regression and Generalized Linear Models, pp. 16-17, 1994.

Optimality of Natural Splines, Continued… Green and Silverman, Nonparametric Regression and Generalized Linear Models, pp. 16-17, 1994.

Multidimensional Splines
Tensor products of one-dimensional basis functions (see the sketch after this slide):
- Consider all possible products of the basis elements, giving M1*M2*...*Mk basis functions
- Fit the coefficients by least squares
- The dimension grows exponentially, so a subset of the basis functions must be selected (MARS)
- Provides flexibility, but introduces more spurious structure
Thin-plate splines for two dimensions:
- A generalization of smoothing splines in one dimension
- The penalty is an integrated quadratic form in the Hessian
- The natural extension to two dimensions leads to a solution built from radial basis functions
- High computational complexity
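As a sketch of the tensor-product construction, the following code builds all pairwise products of two one-dimensional bases and fits the coefficients by least squares. The truncated-power basis and the knot locations are illustrative choices, not prescribed by the slides.

```python
# Sketch: tensor-product basis in two dimensions, fit by ordinary least squares.
import numpy as np

def truncated_power_basis(x, knots):
    """1-D cubic truncated-power basis: 1, x, x^2, x^3, and (x - k)_+^3 per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None)**3 for k in knots]
    return np.column_stack(cols)

def tensor_product_basis(B1, B2):
    """All pairwise products of columns of B1 (n x M1) and B2 (n x M2) -> n x (M1*M2)."""
    n, M1 = B1.shape
    _, M2 = B2.shape
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, M1 * M2)

rng = np.random.default_rng(1)
x1, x2 = rng.uniform(size=(2, 200))
y = np.sin(3 * x1) * np.cos(3 * x2) + 0.1 * rng.standard_normal(200)

B = tensor_product_basis(truncated_power_basis(x1, [0.25, 0.5, 0.75]),
                         truncated_power_basis(x2, [0.25, 0.5, 0.75]))
coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # fit all M1*M2 coefficients at once
```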

Tensor Product

Additive vs. Tensor Product (the tensor-product fit is more flexible)

Thin-Plate Splines Minimize RSS + λ J(f). This leads to thin-plate splines if J(f) = ∫∫ [ (∂²f/∂x1²)² + 2 (∂²f/∂x1∂x2)² + (∂²f/∂x2²)² ] dx1 dx2.
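A minimal way to try a regularized thin-plate fit is SciPy's RBFInterpolator with the thin_plate_spline kernel; the availability of this class and kernel name (SciPy 1.7+) is an assumption, and its smoothing argument plays the role of the penalty weight.

```python
# Sketch: smoothed thin-plate spline surface in two dimensions
# (assumes scipy >= 1.7, where RBFInterpolator provides 'thin_plate_spline').
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 2))                     # two-dimensional inputs
z = np.sin(2 * X[:, 0]) + X[:, 1]**2 + 0.1 * rng.standard_normal(100)

tps = RBFInterpolator(X, z, kernel='thin_plate_spline', smoothing=1.0)
z_hat = tps(X)                                     # fitted surface at the data points
```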

Thin-Plate Splines: Contour Plots for the Heart Disease Data
Response: Systolic BP; Inputs: Age, Obesity
- Data points
- 64 lattice points used as knots
- Knots inside the convex hull of the data (red) should be used carefully
- Knots outside the convex hull of the data (green) can be ignored

Back to Splines Let N(x) be the natural spline basis with a knot at each unique input value. The minimization problem is written as min_θ (y − Nθ)ᵀ(y − Nθ) + λ θᵀ Ω_N θ, where {Ω_N}_{jk} = ∫ N_j''(t) N_k''(t) dt. By solving it, we get θ̂ = (NᵀN + λ Ω_N)⁻¹ Nᵀ y and the fitted smoothing spline f̂(x) = Σ_j N_j(x) θ̂_j.
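A small sketch of the generalized ridge solution θ̂ = (NᵀN + λΩ)⁻¹Nᵀy: for brevity it substitutes a squared second-difference penalty for the exact natural-spline Ω, which is an illustrative simplification rather than the formula above.

```python
# Sketch: solve min_theta ||y - N theta||^2 + lam * theta^T Omega theta
# with a squared second-difference surrogate for the penalty matrix Omega.
import numpy as np

def penalized_spline_fit(N, y, lam):
    p = N.shape[1]
    D = np.diff(np.eye(p), n=2, axis=0)      # second-difference operator on coefficients
    Omega = D.T @ D                          # surrogate penalty matrix (not the exact Omega_N)
    theta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
    return theta, Omega

# The smoother matrix S_lambda = N (N^T N + lam*Omega)^{-1} N^T is linear in y.
```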

Properties of S S  can be written in the Reinsch form S     while K is the penalty matrix. It is equivalent to say S y  is the solution of  can be represented as the eigenvectors and eigenvalues of  :

Properties of S   i =1/(1+ d i ) is shrunk towards zero, which leads to S*S  S. For comparison, the eigenvaules of a projection matrix in regression are 1 or 0, since H*H = H The first two eigenvalues of S  are always one, since d 1 =d 2 =0, corresponding to linear terms. The sequence of u i, ordered by decreasing  i, appear to increase in complexity.

Reproducing Kernel Hilbert Spaces An RKHS H_K is a function space generated by a positive definite kernel K(x, y) = Σ_i γ_i φ_i(x) φ_i(y), with γ_i ≥ 0 and Σ_i γ_i² < ∞. Elements of H_K have an expansion in terms of the eigenfunctions, f(x) = Σ_i c_i φ_i(x), with the constraint that ||f||²_{H_K} = Σ_i c_i² / γ_i < ∞.

Examples of Reproducing Kernels Polynomial kernel in R²: K(x, y) = (1 + ⟨x, y⟩)², which corresponds to a feature space of M = 6 polynomial basis functions. Another example is the Gaussian radial basis function kernel K(x, y) = exp(−ν ||x − y||²).

Regularization in RKHS Solve min_{f ∈ H_K} Σ_{i=1}^N L(y_i, f(x_i)) + λ ||f||²_{H_K}. Representer theorem: the optimizer lies in a finite-dimensional space, f(x) = Σ_{i=1}^N α_i K(x, x_i), so the problem reduces to min_α L(y, Kα) + λ αᵀ K α, where K is the N×N Gram matrix with {K}_{ij} = K(x_i, x_j).
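For squared-error loss, the finite-dimensional problem has the closed form α̂ = (K + λI)⁻¹y (kernel ridge regression). The sketch below implements this with a Gaussian kernel; the kernel choice and the helper names are illustrative.

```python
# Sketch: representer-theorem solution for squared-error loss in an RKHS,
# i.e. kernel ridge regression with alpha_hat = (K + lam*I)^{-1} y.
import numpy as np

def gaussian_kernel(X, Z, nu=1.0):
    """K(x, z) = exp(-nu * ||x - z||^2) for all pairs of rows of X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :])**2).sum(-1)
    return np.exp(-nu * d2)

def kernel_ridge_fit(X, y, lam, nu=1.0):
    K = gaussian_kernel(X, X, nu)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)   # alpha_hat

def kernel_ridge_predict(X_train, alpha, X_new, nu=1.0):
    # f(x) = sum_i alpha_i K(x, x_i)
    return gaussian_kernel(X_new, X_train, nu) @ alpha
```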

Support Vector Machines The SVM for a two-class classification problem has the form f(x) = α_0 + Σ_i α_i K(x, x_i), where the parameters α are chosen to minimize Σ_i [1 − y_i f(x_i)]_+ + (λ/2) ||f||²_{H_K}. Most of the α_i are zero in the solution, and the observations with nonzero α_i are called support vectors.
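A quick way to see the sparsity of the α's is to fit a kernel SVM with scikit-learn (assumed available) and count the support vectors; the data-generating setup here is made up for illustration.

```python
# Sketch: kernel SVM with scikit-learn; the points with nonzero dual
# coefficients alpha_i are the support vectors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)     # a nonlinear two-class problem

svm = SVC(kernel='rbf', C=1.0).fit(X, y)
print(svm.support_.size, "support vectors out of", len(y))
```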

Choosing λ [figure: true function vs. fitted function]

Nuclear Magnetic Resonance Signal The spline basis is still too smooth to capture the local spikes/bumps.

Haar Wavelet Basis Father wavelet φ(x) and mother wavelet ψ(x) [figure: Haar wavelets]

Haar Father Wavelet
Let φ(x) = I(x ∈ [0, 1]) and define φ_{j,k}(x) = 2^{j/2} φ(2^j x − k), so that φ_{0,k}(x) = φ(x − k).
Let V_j = {φ_{j,k}(x) ; k = ..., −1, 0, 1, ...}.
Then ... ⊃ V_1 ⊃ V_0 ⊃ V_{−1} ⊃ ...

Haar Mother Wavelet
Let W_j be the orthogonal complement of V_j in V_{j+1}: V_{j+1} = V_j ⊕ W_j.
Let ψ(x) = φ(2x) − φ(2x − 1); then ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k) form a basis for W_j.
We have V_{j+1} = V_j ⊕ W_j = V_{j−1} ⊕ W_{j−1} ⊕ W_j. Thus V_J = V_0 ⊕ W_0 ⊕ W_1 ⊕ ... ⊕ W_{J−1}.

Daubechies Symmlet-p Wavelet Father wavelet φ(x) and mother wavelet ψ(x) [figure: symmlet wavelets]

Wavelet Transform Suppose N = 2^J in one dimension. Let W be the N × N orthonormal wavelet basis matrix; then y* = Wᵀy is called the wavelet transform of y. In practice, the wavelet transform is NOT performed by the matrix multiplication y* = Wᵀy. Using clever pyramidal schemes, y* can be obtained in O(N) computations, even faster than the fast Fourier transform (FFT).
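A minimal sketch of the O(N) pyramidal scheme for the orthonormal Haar transform (Haar only; production code would use a wavelet library):

```python
# Sketch: O(N) pyramidal Haar wavelet transform for a signal of length N = 2^J.
# At each level the signal splits into averages (passed down) and details (kept).
import numpy as np

def haar_dwt(y):
    y = np.asarray(y, dtype=float)
    assert y.size and (y.size & (y.size - 1)) == 0, "length must be a power of 2"
    coeffs = []
    while y.size > 1:
        avg = (y[0::2] + y[1::2]) / np.sqrt(2)     # coarse approximation
        det = (y[0::2] - y[1::2]) / np.sqrt(2)     # detail (wavelet) coefficients
        coeffs.append(det)
        y = avg
    coeffs.append(y)                               # final scaling coefficient
    return coeffs                                  # same total length as the input
```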

Wavelet Smoothing Stein Unbiased Risk Estimation (SURE) shrinkage: min_θ ||y − Wθ||² + 2λ ||θ||_1. Because W is orthonormal, this leads to the simple solution θ̂_j = sign(y*_j)(|y*_j| − λ)_+, where y* = Wᵀy. The fitted function is given by the inverse wavelet transform f̂ = Wθ̂.

Soft Thresholding vs. Hard Thresholding
Soft thresholding: θ̂_j = sign(y*_j)(|y*_j| − λ)_+, analogous to the lasso.
Hard thresholding: θ̂_j = y*_j · I(|y*_j| > λ), analogous to best-subset selection.

Choice of λ λ can be fit adaptively; a simple choice (Donoho and Johnstone, 1994) is λ = σ sqrt(2 log N), with σ an estimate of the standard deviation of the noise. Motivation: for white noise Z_1, ..., Z_N, the expected maximum of |Z_j| is approximately σ sqrt(2 log N), so this threshold removes coefficients that are indistinguishable from noise.
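The two thresholding rules and the universal threshold are simple to state in code; the MAD-based estimate of σ from the finest-scale coefficients is a commonly used choice, not one mandated by the slides.

```python
# Sketch: soft vs. hard thresholding of wavelet coefficients, with the universal
# threshold lam = sigma * sqrt(2 * log N) (Donoho & Johnstone, 1994).
import numpy as np

def soft_threshold(w, lam):
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)   # lasso-like shrinkage

def hard_threshold(w, lam):
    return w * (np.abs(w) > lam)                            # subset-selection-like

def universal_threshold(w_finest, n):
    # Estimate sigma from the finest-scale detail coefficients via the
    # median absolute deviation (a common robust choice).
    sigma = np.median(np.abs(w_finest)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(n))
```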

Wavelet Coefficients of the NMR Signal [figure: original signal, wavelet decomposition, and WaveShrunk signal, with coefficients shown at levels W_9 through W_4 and V_4]

Nuclear Magnetic Resonance Signal The wavelet-shrinkage fitted line is shown in green.

Wavelet Image Denoising JPEG2000 uses the wavelet transform. [figure: original image, image with noise added, and denoised image]

Summary of Wavelet Smoothing
- The wavelet basis adapts to both smooth curves and local bumps
- The Discrete Wavelet Transform (DWT) and its inverse can be computed in O(N)
- Data denoising
- Data compression: sparse representation
- Many other applications ...