A Generalization of PCA to the Exponential Family Collins, Dasgupta and Schapire Presented by Guy Lebanon.

Two Viewpoints of PCA. Algebraic: given data points x_1, …, x_n in R^d, find the linear transformation onto a k-dimensional subspace that minimizes the sum of squared distances from the points to their projections (over all such linear transformations). Statistical: given data x_1, …, x_n, assume each point is a random variable x_i ~ N(θ_i, I). Find the maximum likelihood estimates of the θ_i under the constraint that the θ_i lie in a k-dimensional subspace and are linearly related to the data.
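
As an illustration of the two viewpoints (not part of the original slides), a minimal numpy sketch of classical PCA: the algebraic answer is the SVD projection, and under the Gaussian model x_i ~ N(θ_i, I) that same projection is the MLE of the constrained θ_i. Function and variable names are chosen here for illustration.

```python
# Minimal sketch: classical PCA via SVD on centered data. Under a Gaussian
# model x_i ~ N(theta_i, I) with the theta_i constrained to a k-dimensional
# subspace, the MLE of the theta_i is exactly this rank-k reconstruction.
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal subspace."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                  # top-k principal directions
    return mu + Xc @ V @ V.T                      # rank-k reconstruction (the MLE theta_i)

# Example: 100 points in R^5 reconstructed in a 2-dimensional subspace
X = np.random.randn(100, 5)
Theta_hat = pca_project(X, k=2)
```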

The Gaussian assumption may be inappropriate, for example when the data are binary-valued or non-negative. Suggestion: replace the Gaussian distribution by any exponential family distribution. Given data x_1, …, x_n such that each point comes from an exponential family distribution, find the MLE for the natural parameters θ_i under the assumption that they lie in a low-dimensional subspace.
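
A sketch of the resulting objective, with notation chosen here (Θ the matrix of natural parameters, G the log-partition function) rather than copied from the slides:

```latex
% Exponential family likelihood with natural parameter \theta and log-partition G:
\[
  \log p(x \mid \theta) \;=\; \log p_0(x) + x\,\theta - G(\theta).
\]
% Constrain the matrix of natural parameters to rank k,
\[
  \Theta = (\theta_{ij}) = A V^{\top}, \qquad
  A \in \mathbb{R}^{n \times k},\; V \in \mathbb{R}^{d \times k},
\]
% and minimize the negative log-likelihood
\[
  \widehat{A},\widehat{V} \;=\; \arg\min_{A,V}\;
  \sum_{i=1}^{n}\sum_{j=1}^{d}
  \Bigl( G(a_i^{\top} v_j) \;-\; x_{ij}\, a_i^{\top} v_j \Bigr).
\]
% Gaussian G(\theta) = \theta^2/2 recovers ordinary PCA (squared-error loss);
% Bernoulli G(\theta) = \log(1 + e^{\theta}) gives a logistic-loss PCA for binary data.
```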

The new algorithm finds a linear transformation in the parameter space, which corresponds to a nonlinear subspace in the original coordinates. The loss functions may be cast in terms of Bregman distances. The loss function is not convex in the general case. The authors use the alternating minimization algorithm (Csiszár and Tusnády) to compute the transformation.
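
A minimal illustration of the alternating idea for the Bernoulli (binary-data) case, assuming plain gradient steps on each block rather than the exact per-block minimization used in the paper; with V fixed the loss is convex in A, and with A fixed it is convex in V, so the two blocks are updated in turn. Function and variable names are chosen here for illustration.

```python
# Sketch of alternating block updates for exponential family PCA on binary data.
# Bernoulli log-partition G(theta) = log(1 + exp(theta)), so dG/dtheta = sigmoid(theta)
# and the gradient of the loss with respect to Theta is sigmoid(Theta) - X.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def exp_family_pca_bernoulli(X, k, n_iters=200, lr=0.05):
    """Fit a rank-k natural-parameter matrix Theta = A @ V.T for binary X (n x d)."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    A = 0.01 * rng.standard_normal((n, k))
    V = 0.01 * rng.standard_normal((d, k))
    for _ in range(n_iters):
        R = sigmoid(A @ V.T) - X        # gradient of the Bernoulli loss w.r.t. Theta
        A -= lr * (R @ V)               # block update: A with V held fixed
        R = sigmoid(A @ V.T) - X
        V -= lr * (R.T @ A)             # block update: V with A held fixed
    return A, V

# Example on synthetic binary data
X = (np.random.rand(50, 20) > 0.5).astype(float)
A, V = exp_family_pca_bernoulli(X, k=3)
```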