Julian Center on Regression for Proportion Data July 10, 2007 (68)


1 Julian Center on Regression for Proportion Data July 10, 2007 (68)

2 MaxEnt2007 Regression For Proportion Data Julian Center Creative Research Corp. Andover, MA, USA

3 Overview Introduction: What is proportion data? What do we mean by regression? Examples. Why should you care? Coordinate Transformation to Facilitate Regression. Measurement Models: Multinomial; Laplace Approximation to Multinomial; Log-Normal. Regression Models: Kernel Regression (Nadaraya-Watson Model); Gaussian Process Regression, With Log-Normal Measurements and With Multinomial Measurements – Expectation Propagation. Conclusion.

4 What is Proportion Data?

5 What is Regression? Regression = Smoothing + Calibration + Interpolation. Relates data gathered under one set of conditions to data gathered under similar, but different conditions. Accounts for measurement “noise”. Determines p(r|x).

6 Examples Geostatistics: Composition of rock samples at different locations. Medicine: Response to different levels of treatment. Political Science: Opinion polls across different demographic groups. Climate Research: Infer climate history from fossil pollen samples. Calibrate the model using present-day samples from known climates. Typically, examine 400 pollen grains and sort them into 14 categories.

7 Why Should You Care? Either you have proportion data to analyze, or you want to do pattern classification, or you want to use a similar approach to your problem: Transform constrained variables so that a Laplace approximation makes sense. Two different regression techniques. Expectation Propagation for improving model fit.

8 Coordinate Transformation Well-known regression methods can’t deal with the pesky constraints of the simplex. We need a one-to-one mapping between the d-simplex and d-dimensional real vectors. Then we can model probability distributions on real vectors and relate them to distributions on the simplex.

9 Coordinate Transformation Symmetric Softmax Activation Function. Centered Log Ratio Linkage Function. The rows of T span the orthogonal complement of the all-ones vector 1_(d+1). We can always find T by the Gram-Schmidt process.
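The slide's equations did not survive in this transcript. As a rough sketch of what such a mapping could look like, assume the linkage is f = T ln(y) and the activation is y = softmax(Tᵀ f), where T has d orthonormal rows orthogonal to the all-ones vector. The function names `make_T`, `clr_link`, and `softmax_activation` are hypothetical, and the Gram-Schmidt starting vectors are just one convenient choice.

```python
import math

def make_T(n):
    """Return d = n - 1 orthonormal rows, each orthogonal to the all-ones vector.

    Gram-Schmidt is run on the ones vector followed by standard basis vectors;
    the ones direction is then dropped. Any T with these properties would do.
    """
    basis = [[1.0 / math.sqrt(n)] * n]
    for k in range(n - 1):
        v = [1.0 if i == k else 0.0 for i in range(n)]
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]

def clr_link(y, T):
    """Assumed linkage: map a simplex point y to f = T ln(y)."""
    log_y = [math.log(yi) for yi in y]
    return [sum(t * l for t, l in zip(row, log_y)) for row in T]

def softmax_activation(f, T):
    """Assumed activation: map f back to the simplex via y = softmax(T^T f)."""
    n = len(T[0])
    z = [sum(T[j][i] * f[j] for j in range(len(T))) for i in range(n)]
    m = max(z)                       # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

y = [0.2, 0.3, 0.5]                  # a point on the 2-simplex
T = make_T(len(y))
f = clr_link(y, T)                   # two unconstrained real coordinates
y_back = softmax_activation(f, T)    # recovers y up to rounding
```

Under these assumptions the round trip works because Tᵀ T projects out the all-ones direction, and softmax ignores any constant added to all of its inputs.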

10 Coordinate Transformation [Figure: the simplex in the (y1, y2) plane and its image under ln in the (ln(y1), ln(y2)) plane, with the line ln(y1) = -ln(y2) labeled f and the annotation “Softmax is insensitive to this direction.”]

11 Measurement Models: Multinomial; Log-Normal

12 Measurement Model - Multinomial -

13 Multinomial Measurement Model R1= S=400
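The multinomial measurement model can be illustrated with a small log-likelihood function. This is only a sketch: the counts below are hypothetical (the slide's own values for R1 are not preserved), chosen so that the sample size is S = 400 as on the slide, and `multinomial_log_likelihood` is an illustrative name.

```python
import math

def multinomial_log_likelihood(counts, y):
    """ln p(r | y) for counts r drawn from a multinomial with proportions y."""
    total = sum(counts)
    # Log of the multinomial coefficient S! / (r_1! ... r_n!).
    log_coeff = math.lgamma(total + 1) - sum(math.lgamma(r + 1) for r in counts)
    # Zero counts contribute nothing, so they are skipped.
    return log_coeff + sum(r * math.log(p) for r, p in zip(counts, y) if r > 0)

counts = [120, 200, 80]                      # hypothetical counts, S = 400
S = sum(counts)
peak = [r / S for r in counts]               # observed proportions
uniform = [1.0 / len(counts)] * len(counts)  # a competing guess
```

The likelihood is largest when y equals the observed proportions, so `multinomial_log_likelihood(counts, peak)` exceeds the value at any other y, such as `uniform`.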

14 Measurement Model - Laplace Approximation - Some regression methods assume a Gaussian measurement model. Therefore, we are tempted to approximate each Multinomial measurement with a Gaussian measurement. Let’s try a Laplace approximation to each measurement. Laplace Approximation: Find the peak of the log-likelihood function. Pick a Gaussian centered at the peak with covariance matrix that matches the negative second derivative of the log-likelihood function at the peak. Pick an amplitude factor to match the height of the peak.
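The first two steps of this recipe can be sketched for a multinomial likelihood in transformed coordinates, assuming the mapping y = softmax(Tᵀ f) with the rows of T orthonormal and orthogonal to the all-ones vector. Under that assumption the peak sits at f = T ln(r/S) and the negative second derivative is S · T (diag(y) − y yᵀ) Tᵀ evaluated at y = r/S. The function name `laplace_multinomial` is hypothetical, all counts are assumed positive, and the amplitude factor is omitted.

```python
import math

def laplace_multinomial(counts, T):
    """Peak and precision (inverse covariance) of a Gaussian fit in f-coordinates.

    Assumes y = softmax(T^T f) and that every count is positive.
    """
    S = sum(counts)
    n = len(counts)
    d = len(T)
    y = [r / S for r in counts]      # the likelihood peaks at the observed proportions
    f_hat = [sum(t * math.log(p) for t, p in zip(row, y)) for row in T]
    # Negative Hessian with respect to z = T^T f is S * (diag(y) - y y^T).
    H = [[S * ((y[i] if i == j else 0.0) - y[i] * y[j]) for j in range(n)]
         for i in range(n)]
    # Change of variables to f: precision = T H T^T.
    precision = [[sum(T[a][i] * H[i][j] * T[b][j]
                      for i in range(n) for j in range(n))
                  for b in range(d)] for a in range(d)]
    return f_hat, precision

# Two categories (d = 1); this T is one valid choice for n = 2.
T = [[1 / math.sqrt(2), -1 / math.sqrt(2)]]
f_hat, precision = laplace_multinomial([100, 300], T)
variance = 1.0 / precision[0][0]     # variance of the Gaussian approximation
```

For two categories the precision reduces to 2 · S · y1 · y2, so larger samples give a narrower Gaussian, as one would expect.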

15 Measurement Model - Laplace Approximation -

16 Laplace Approximation to Multinomial

17 Laplace Approximation to Multinomial

18 Laplace Approximation to Multinomial

19 Laplace Approximation to Multinomial

20 Laplace Approximation to Multinomial

21 Laplace Approximation to Multinomial

22 Measurement Model - Log-Normal - e.g. Over-dispersion or under-dispersion

23 Regression Models Way of relating data taken under different conditions. Intuition: Similar conditions should produce similar data. The best method to use depends on the problem. Two methods considered here: Nadaraya-Watson model. Gaussian Process model.

24 Nadaraya-Watson Model Based on applying Parzen density estimation to the joint distribution of f and x
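When Gaussian kernels are placed on each training pair (x_i, f_i), the conditional mean of f given x becomes a kernel-weighted average of the training responses. The sketch below shows only that conditional mean, for scalar x and a single component of f. The Gaussian kernel, the `bandwidth` parameter, and the function name `nadaraya_watson` are illustrative assumptions rather than the presentation's own choices.

```python
import math

def nadaraya_watson(x_new, xs, fs, bandwidth):
    """Kernel-weighted average of training responses fs at the input x_new."""
    # Gaussian kernel weight of each training input relative to x_new.
    weights = [math.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)   # can underflow to zero if x_new is far from all data
    return sum(w * f for w, f in zip(weights, fs)) / total

xs = [0.0, 1.0, 2.0]       # hypothetical conditions
fs = [0.0, 1.0, 2.0]       # hypothetical responses
estimate = nadaraya_watson(1.0, xs, fs, bandwidth=0.5)
```

For a vector-valued f, the same weights would be applied to each component in turn.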

25 [Figure: All Data Points; axes x and f]

26 [Figure: Nadaraya-Watson Model; axes x and f]

27 Nadaraya-Watson Model

28 Nadaraya-Watson Model

29 Nadaraya-Watson Model Problem: We must compare a new point to every training point. Solution: Choose a sparse set of “knots”, and center density components only on knots. Adjust weights and covariances by “diagnostic training”. Mixture model training tools apply.

30 [Figure: Sparse Nadaraya-Watson Model; axes x and f]

31 Gaussian Process Model Probability distribution on functions. Specified by mean function m(x) and covariance kernel k(x1, x2). For any finite collection of points, the corresponding function values are jointly Gaussian.

32 [Figure: Gaussian Process Model; axes x and f]

33 Applying Gaussian Process Regression to Proportion Data Prior – Model each component of f(x) as a zero-mean Gaussian process with covariance kernel k(x1, x2). Assume that the components of f are independent of each other. Posterior – Use the Laplace approximations to the measurements and apply Kalman filter methods. Use Expectation Propagation to improve fit.
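Once each measurement has been replaced by a Gaussian, the posterior for one component of f follows from standard Gaussian conditioning. The sketch below uses the batch formula mean(x) = k(x, X) (K + R)⁻¹ f rather than the Kalman filter recursion named on the slide. It treats a single component with a scalar noise variance per measurement, so it ignores any coupling between components. The squared-exponential kernel, its parameters, and all function names are assumptions made for illustration.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance kernel (an assumed choice of k)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v

def gp_posterior_mean(x_new, xs, f_obs, noise_vars, kernel=rbf_kernel):
    """Posterior mean of a zero-mean GP at x_new given noisy values f_obs at xs."""
    # Prior covariance of the function values plus measurement noise on the diagonal.
    K = [[kernel(a, b) + (noise_vars[i] if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, f_obs)
    return sum(kernel(x_new, x) * a for x, a in zip(xs, alpha))

# One noisy measurement: the prior (variance 1) and the noise (variance 1)
# are equally trusted, so the estimate is pulled halfway toward zero.
shrunk = gp_posterior_mean(0.0, [0.0], [2.0], [1.0])
```

In the proportion-data setting, `f_obs` and `noise_vars` would come from the peak and variance of each measurement's Gaussian approximation.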

34 Sparse Gaussian Process Model

35 Sparse Gaussian Process Model

36 Sparse Gaussian Process Model

37 Sparse Gaussian Process Model

38 GP – Log-Normal Model

39 GP – Log-Normal Model

40 GP – Log-Normal Model

41 GP Multinomial Model

42 Expectation Propagation Method

43 Expectation Propagation Method

44 Expectation Propagation Method

45 Expectation Propagation Method

46 Expectation Propagation Method

47 Expectation Propagation Method

48 Choosing the Regression Model If you have two samplings taken under the same conditions, do you want to treat them as coming from a bimodal distribution (NW Model) or combine them into one big sampling (GP Model)?
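A toy calculation can make this contrast concrete. Assume a Parzen estimate with product Gaussian kernels, and two samplings at the same x whose transformed values disagree. The kernel widths and the function names are hypothetical. The plain average stands in for the GP behavior: with equal measurement noise, a GP centers a single Gaussian near a shrunken version of that average.

```python
import math

def normal_pdf(v, mean, std):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def nw_conditional_density(f, x, xs, fs, h_x, h_f):
    """p(f | x) from a Parzen estimate of the joint density of (x, f)."""
    weights = [normal_pdf(x, xi, h_x) for xi in xs]
    total = sum(weights)
    return sum(w * normal_pdf(f, fi, h_f) for w, fi in zip(weights, fs)) / total

xs = [0.0, 0.0]          # two samplings taken under the same condition
fs = [-1.0, 1.0]         # whose transformed values disagree
h_x, h_f = 0.5, 0.3      # hypothetical kernel widths

at_mode = nw_conditional_density(-1.0, 0.0, xs, fs, h_x, h_f)
between = nw_conditional_density(0.0, 0.0, xs, fs, h_x, h_f)
pooled = sum(fs) / len(fs)   # what combining the samplings would center on
```

Here the Nadaraya-Watson density is far higher at each observed value than at the midpoint between them, while pooling puts its single peak exactly at that midpoint.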

49 Conclusion A coordinate transformation makes it possible to analyze proportion data with known regression methods. The Multinomial distribution can be well approximated by a Gaussian on the transformed variable. The choice of regression model depends on the effect that you want – multimodal vs unimodal fit.

