Measure Independence in Kernel Space Presented by: Qiang Lou.


References I made these slides based on the following papers: F. Bach and M. Jordan. Kernel Independent Component Analysis. Journal of Machine Learning Research, 2002. Arthur Gretton, Ralf Herbrich, Alexander Smola, Olivier Bousquet, Bernhard Schölkopf. Kernel Methods for Measuring Independence. Journal of Machine Learning Research, 2005.

Outline: Introduction -- Canonical Correlation -- Kernel Canonical Correlation -- Application Example

Introduction What is independence? Intuitively, two variables y1 and y2 are said to be independent if information on the value of one variable does not give any information on the value of the other variable. Technically, y1 and y2 are independent if and only if the joint pdf is factorizable in the following way: p(y1, y2) = p1(y1) p2(y2)

Introduction How do we measure independence? -- Can we use correlation? -- Do uncorrelated variables mean independent variables? Remark: y1 and y2 are uncorrelated means: E[y1 y2] - E[y1] E[y2] = 0

Introduction The answer is "No". Fact: independence implies uncorrelatedness, but the converse is not true. Which means: p(y1, y2) = p1(y1) p2(y2) implies E[y1 y2] - E[y1] E[y2] = 0, but E[y1 y2] - E[y1] E[y2] = 0 does NOT imply p(y1, y2) = p1(y1) p2(y2). This is easy to prove: for example, if y1 is uniform on [-1, 1] and y2 = y1^2, then E[y1 y2] - E[y1] E[y2] = E[y1^3] = 0, yet y2 is completely determined by y1.
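
A minimal NumPy sketch of this counterexample (illustration only, not part of the original slides):

import numpy as np

rng = np.random.default_rng(0)
y1 = rng.uniform(-1.0, 1.0, size=100_000)   # symmetric around 0
y2 = y1 ** 2                                # a deterministic function of y1

# The empirical covariance E[y1*y2] - E[y1]E[y2] is close to zero (up to
# sampling noise), even though y2 is completely determined by y1.
print(np.cov(y1, y2)[0, 1])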

Introduction Now comes the question: How to measure independence?

Canonical Correlation Canonical Correlation Analysis (CCA) is concerned with finding a pair of linear transformations such that one component within each set of transformed variables is correlated with a single component in the other set. We focus on the first canonical correlation, which is defined as the maximum possible correlation between the two projections w1ᵀx1 and w2ᵀx2 of x1 and x2:

ρ(x1, x2) = max_{w1, w2} corr(w1ᵀx1, w2ᵀx2) = max_{w1, w2} (w1ᵀ C12 w2) / sqrt( (w1ᵀ C11 w1)(w2ᵀ C22 w2) )

where C = [C11 C12; C21 C22] is the covariance matrix of (x1, x2).

Canonical Correlation Taking derivatives of the corresponding Lagrangian with respect to w1 and w2 and setting them to zero, we obtain:

C12 w2 = ρ C11 w1,  C21 w1 = ρ C22 w2

i.e. the generalized eigenvalue problem [0 C12; C21 0] (w1; w2) = ρ [C11 0; 0 C22] (w1; w2).

Canonical Correlation Equivalently, adding the block-diagonal part to both sides gives

C v = λ D v,  with C = [C11 C12; C21 C22], D = [C11 0; 0 C22], v = (w1; w2)

whose generalized eigenvalues come in pairs λ = 1 ± ρ, so the first canonical correlation corresponds to the smallest generalized eigenvalue λ_min = 1 - ρ.

So it can be extended to more than two sets of variables: take C to be the covariance matrix of (x1, ..., xm) and D its block-diagonal part, solve C v = λ D v, and find the smallest eigenvalue λ_min. It equals 1 exactly when all the variables are pairwise uncorrelated, and is smaller than 1 otherwise.
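
The following NumPy/SciPy sketch illustrates the eigenvalue formulation above for the two-set case; the toy data and variable names are invented for illustration and are not from the slides. The smallest generalized eigenvalue of (C, D) equals 1 minus the first canonical correlation.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
N = 2000
z = rng.normal(size=N)                                   # shared latent signal
x1 = np.column_stack([z + 0.5 * rng.normal(size=N), rng.normal(size=N)])
x2 = np.column_stack([z + 0.5 * rng.normal(size=N), rng.normal(size=N)])

X = np.hstack([x1, x2])                                  # rows = observations
C = np.cov(X, rowvar=False)                              # joint covariance of (x1, x2)
d1 = x1.shape[1]
D = np.zeros_like(C)                                     # block-diagonal part of C
D[:d1, :d1] = C[:d1, :d1]
D[d1:, d1:] = C[d1:, d1:]

lam = eigh(C, D, eigvals_only=True)                      # ascending generalized eigenvalues
rho_max = 1.0 - lam[0]                                   # first canonical correlation
print(rho_max)                                           # close to 0.8 for this toy data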

Kernel Canonical Correlation Kernel trick: define a map Φ from X to a feature space F such that we can find a kernel k satisfying k(x, y) = ⟨Φ(x), Φ(y)⟩, so inner products in F can be computed without evaluating Φ explicitly.
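
As a concrete example, the Gaussian (RBF) kernel used later in the slides satisfies this; here is a minimal NumPy sketch of its Gram matrix (gaussian_gram is a name chosen here for illustration):

import numpy as np

def gaussian_gram(x, sigma=1.0):
    # K[i, j] = exp(-|x_i - x_j|^2 / (2 sigma^2)) = <Phi(x_i), Phi(x_j)>,
    # computed without ever evaluating Phi explicitly.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

K = gaussian_gram(np.random.default_rng(2).normal(size=50))
print(K.shape)    # (50, 50); symmetric and positive semidefinite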

Kernel Canonical Correlation F-correlation -- the canonical correlation between Φ1(x1) and Φ2(x2):

ρ_F = max_{f1, f2 in F} corr(f1(x1), f2(x2)) = max_{f1, f2 in F} cov(f1(x1), f2(x2)) / ( var(f1(x1))^(1/2) var(f2(x2))^(1/2) )

Kernel Canonical Correlation Notes: if x1 and x2 are independent, then ρ_F = 0. Is the converse true? -- If F is "large" enough, it is true. -- In particular, if F is the space corresponding to a Gaussian kernel, which is a positive definite kernel on X = R, then ρ_F = 0 implies that x1 and x2 are independent.

Kernel Canonical Correlation Estimation of the F-correlation -- the kernelized version of canonical correlation. We will show that the empirical F-correlation depends only on the Gram matrices K1 and K2 of the observations; we write ρ_F(K1, K2) to denote this canonical correlation. Suppose the data are centered in feature space (i.e., Σ_i Φ1(x1^i) = 0 and Σ_i Φ2(x2^i) = 0).
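
If the data are not already centered, the centering can be applied directly to the Gram matrices. A small sketch (assuming NumPy; center_gram is an illustrative name, not from the papers): replacing K by H K H with H = I - (1/N) ones(N, N) is equivalent to subtracting the feature-space mean from every Φ(x^i).

import numpy as np

def center_gram(K):
    # Gram matrix of the feature vectors after subtracting their mean in feature space.
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K @ H

# usage with the earlier gaussian_gram sketch: K1 = center_gram(gaussian_gram(x1))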

Kernel Canonical Correlation We want to express the empirical F-correlation ρ_F = max_{f1, f2} corr(⟨Φ1(x1), f1⟩, ⟨Φ2(x2), f2⟩) in terms of K1 and K2. Which means we want to know three things: the empirical covariance cov(f1(x1), f2(x2)) and the empirical variances var(f1(x1)) and var(f2(x2)).

Kernel Canonical Correlation For fixed f1 and f2, written as linear combinations of the mapped data points, f1 = Σ_k α1k Φ1(x1^k) and f2 = Σ_k α2k Φ2(x2^k), the empirical covariance of the projections in feature space can be written:

cov(f1(x1), f2(x2)) = (1/N) Σ_i ⟨Φ1(x1^i), f1⟩ ⟨Φ2(x2^i), f2⟩ = (1/N) α1ᵀ K1 K2 α2

Kernel Canonical Correlation Similarly, we can get the following for the empirical variances:

var(f1(x1)) = (1/N) α1ᵀ K1² α1,  var(f2(x2)) = (1/N) α2ᵀ K2² α2

Kernel Canonical Correlation Putting the three expressions together, we get:

ρ_F(K1, K2) = max_{α1, α2} (α1ᵀ K1 K2 α2) / ( (α1ᵀ K1² α1)^(1/2) (α2ᵀ K2² α2)^(1/2) )

As with the problem we discussed before, this is equivalent to the following generalized eigenvalue problem:

[0 K1K2; K2K1 0] (α1; α2) = ρ [K1² 0; 0 K2²] (α1; α2)

Kernel Canonical Correlation Problem: suppose that the Gram matrices K1 and K2 have full rank; then the empirical canonical correlation will always be 1, whatever K1 and K2 are. Let V1 and V2 denote the subspaces of R^N generated by the columns of K1 and K2; then we can rewrite:

ρ_F(K1, K2) = max_{v1 in V1, v2 in V2} (v1ᵀ v2) / (||v1|| ||v2||)

which is the cosine of the smallest angle between V1 and V2. If K1 and K2 have full rank, V1 and V2 are both equal to R^N, so the maximum is 1 regardless of the data.
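
A small numerical illustration of this problem (assuming NumPy; the Gram-matrix helper mirrors the earlier gaussian_gram sketch):

import numpy as np

rng = np.random.default_rng(3)
N = 15
x1 = rng.normal(size=N)                  # x1 and x2 are sampled independently
x2 = rng.normal(size=N)

def gaussian_gram(x, sigma=0.3):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

K1, K2 = gaussian_gram(x1), gaussian_gram(x2)    # both full rank here

# Because K1 is invertible, any v2 = K2 @ alpha2 in V2 can be matched exactly
# by some v1 = K1 @ alpha1 in V1, so the cosine between them is 1 even though
# x1 and x2 are independent.
alpha2 = rng.normal(size=N)
v2 = K2 @ alpha2
alpha1 = np.linalg.solve(K1, v2)
v1 = K1 @ alpha1
print(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))   # ~1.0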

Kernel Canonical Correlation Solution: regularization, by penalizing the norms of f1 and f2, which gives the regularized F-correlation:

ρ_F^κ = max_{f1, f2} cov(f1(x1), f2(x2)) / ( (var(f1(x1)) + κ||f1||²)^(1/2) (var(f2(x2)) + κ||f2||²)^(1/2) )

where κ is a small positive constant. We expand the regularized variance:

var(f1(x1)) + κ||f1||² = (1/N) α1ᵀ K1² α1 + κ α1ᵀ K1 α1 ≈ (1/N) α1ᵀ (K1 + (Nκ/2) I)² α1

Kernel Canonical Correlation Now we can get the regularized kernel canonical correlation:

ρ_F^κ(K1, K2) = max_{α1, α2} (α1ᵀ K1 K2 α2) / ( (α1ᵀ (K1 + (Nκ/2) I)² α1)^(1/2) (α2ᵀ (K2 + (Nκ/2) I)² α2)^(1/2) )
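
A sketch of how this quantity could be computed (assuming NumPy/SciPy; an illustration of the formula above, not code from the papers):

import numpy as np
from scipy.linalg import eigh

def regularized_kcc(K1, K2, kappa=1e-2):
    # First regularized kernel canonical correlation from the (centered) Gram
    # matrices K1, K2, via the generalized eigenvalue problem
    #   [ 0      K1 K2 ] [a1]         [ R1^2  0    ] [a1]
    #   [ K2 K1  0     ] [a2] = rho * [ 0     R2^2 ] [a2],  Ri = Ki + (N*kappa/2) I
    N = K1.shape[0]
    R1 = K1 + (N * kappa / 2.0) * np.eye(N)
    R2 = K2 + (N * kappa / 2.0) * np.eye(N)
    Z = np.zeros((N, N))
    A = np.block([[Z, K1 @ K2], [K2 @ K1, Z]])
    B = np.block([[R1 @ R1, Z], [Z, R2 @ R2]])
    return eigh(A, B, eigvals_only=True)[-1]     # largest generalized eigenvalue = rho

# usage with the earlier (illustrative) helpers:
#   rho = regularized_kcc(center_gram(gaussian_gram(x1)), center_gram(gaussian_gram(x2)))
# rho stays close to 0 when x1 and x2 are independent and grows when they are dependent.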

Kernel Canonical Correlation Generalizing to more than two sets of variables, it is equivalent to the generalized eigenvalue problem K_κ α = λ D_κ α, where α = (α1; ...; αm), K_κ is the mN × mN matrix whose off-diagonal (i, j) blocks are Ki Kj and whose diagonal blocks are (Ki + (Nκ/2) I)², and D_κ is the block-diagonal matrix with blocks (Ki + (Nκ/2) I)². The smallest generalized eigenvalue λ_min measures dependence: λ_min = 1 exactly when all the regularized kernel canonical correlations vanish.
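
A corresponding sketch for m sets of variables (again assuming NumPy/SciPy; kcc_m_sets is a hypothetical helper name):

import numpy as np
from scipy.linalg import eigh

def kcc_m_sets(grams, kappa=1e-2):
    # grams: list of m centered N x N Gram matrices K_1, ..., K_m.
    # Builds the block matrices described above and returns the smallest
    # generalized eigenvalue lambda_min (equal to 1 - rho when m = 2).
    N = grams[0].shape[0]
    m = len(grams)
    R = [K + (N * kappa / 2.0) * np.eye(N) for K in grams]
    A = np.empty((m * N, m * N))
    D = np.zeros((m * N, m * N))
    for i in range(m):
        for j in range(m):
            block = R[i] @ R[i] if i == j else grams[i] @ grams[j]
            A[i * N:(i + 1) * N, j * N:(j + 1) * N] = block
        D[i * N:(i + 1) * N, i * N:(i + 1) * N] = R[i] @ R[i]
    return eigh(A, D, eigvals_only=True)[0]      # lambda_min = 1 means "independent"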

Example Application Applications: -- ICA (Independent Component Analysis) -- Feature Selection. See the demo for the application to ICA…

Thank you!!! Questions?