Principal Components Analysis


1 Principal Components Analysis
Get gifts for the audience. Eric Vaagen, FCAS Assistant Actuary September 5, 2008

2 Agenda
Motivation. What is PCA? Background. Simple example. Is PCA right for you?

3 Motivation
Forecast average premium by coverage. Explanatory variables: vehicle use, territory, driving record. Breakdown of change in average premium. Multicollinearity exists.
Forecast of average premium by coverage is done as a part of our ratemaking exercise.

4 Average Premium We needed to understand what is driving average premium and come up with an accurate forecast for 2007. A $1 difference in premium corresponds to $3M which is x% on our rate indication. A model exercise is undertaken -> next slide.

5 Modeling Procedure
Flowchart boxes: Explanatory Variables; Variable Selection; Chosen Variables; Response; Model.
This is a generic model flowchart -> taking into account average premium and vehicle use, territory, and driving experience, the flowchart looks like…

6 Modeling Procedure
Flowchart boxes: Vehicle Use, Territory, Drv. Record; Variable Selection; Chosen Variables; Average Premium; Multiple Regression.

7 Variable Selection Methods
Stepwise regression: forward, backward. PCA: unsupervised. Partial least squares: supervised. GLM.

8 Background
First described in 1901 by Karl Pearson: find the best lines and planes to fit a set of points.
What else did he discover? Pearson’s χ². Linear regression. Classification of distributions (exponential family). Regression of sons’ heights upon those of their fathers.

9 PCA Example
Explanatory variables. Vehicle use: Pleasure, Commute, Business. Territory: Rural, Suburban, Urban.

10 Vehicle Use 2002-2006 These are the explanatory variables.
Example: in 2002, there was 30% Pleasure, …

11 Territory Explanatory variable proportions, along with the low, medium, and high rates, produce the following average premium graph…

12 Example – Average Premium
Response Variable Does this look familiar? It is the same as the 2007 rate filing.

13 Modeling Procedure
Flowchart boxes: Vehicle Use, Territory; PCA; Chosen PCs; Average Premium; Multiple Regression.
Here is the modeling flow chart with PCA as the variable selection method. We will now focus on the PCA method itself.

14 PCA Procedure
PCs: no multicollinearity; the 1st PC has the most variance. Output: weights to create the PCs; variability of each PC.
The 1st PC has the most variance of any linear combination of the explanatory variables whose weights have unit length. The 2nd PC has the most of the remaining variance among combinations uncorrelated with the 1st. Technical: the PCs are an orthogonal linear transformation of the original variables to a new coordinate system. An eigenvalue decomposition is done on the covariance matrix of the explanatory variables -> the eigenvectors are the weights for the PCs and the eigenvalues are the variability of each PC. The variability of each PC is used to choose which PCs to keep in the model.
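The eigenvalue decomposition described on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the 2002 row uses the percentages quoted on slide 18, while the 2003-2006 rows and all variable names (`X`, `weights`, `variances`, `scores`, `share`) are made up for the example and are not the presenter's data or code.

```python
import numpy as np

# Columns: Pleasure, Commute, Business, Rural, Suburban, Urban.
# Only the 2002 row comes from the slides; 2003-2006 are hypothetical placeholders.
X = np.array([
    [0.30, 0.50, 0.20, 0.20, 0.30, 0.50],  # 2002 (from slide 18)
    [0.32, 0.49, 0.19, 0.19, 0.32, 0.49],  # 2003 (hypothetical)
    [0.33, 0.47, 0.20, 0.21, 0.31, 0.48],  # 2004 (hypothetical)
    [0.35, 0.46, 0.19, 0.18, 0.34, 0.48],  # 2005 (hypothetical)
    [0.36, 0.44, 0.20, 0.20, 0.33, 0.47],  # 2006 (hypothetical)
])

# 6 x 6 covariance matrix of the explanatory variables (columns are variables).
cov = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# Reorder so that the 1st PC is the one with the most variance.
order = np.argsort(eigvals)[::-1]
variances = eigvals[order]      # variability of each PC
weights = eigvecs[:, order]     # column j holds the weights for PC j+1

# PC values per year: weights applied to the raw proportions, as on slide 18.
scores = X @ weights            # 5 years x 6 PCs

# Share of total variance per PC, the quantity a scree plot displays.
share = variances / variances.sum()
```

Because the weight vectors are orthogonal, the covariance matrix of `scores` is diagonal, which is the sense in which the PCs have no multicollinearity.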

15 Modeling Procedure
Flowchart labels, in order: Vehicle Use; Territory; 5 years x 6 variables; Weights; PCA; 5 years x 6 variables; Variability; Chosen PCs.

16 Example – Scree Plot
Scree: broken rock that appears at the bottom of crags and cliff faces.
Two selection rules: keep PCs with more variance than average (16.67%, i.e. 1/6 of the total for 6 variables), or keep a certain % of total variance (90% or 95%).
Now that we have decided to keep the first 3 PCs, let’s see how each one is calculated.
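The two selection rules on this slide can be written as small helper functions. The sketch below is an assumption-laden illustration: the function names and the variance shares in `example_share` are hypothetical, since the actual scree plot values are not in the transcript.

```python
import numpy as np

def keep_above_average(variances):
    """Count the PCs whose variance exceeds the average variance."""
    v = np.asarray(variances, dtype=float)
    return int(np.sum(v > v.mean()))

def keep_cumulative(variances, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share reaches threshold."""
    v = np.sort(np.asarray(variances, dtype=float))[::-1]
    cumulative = np.cumsum(v) / v.sum()
    # Index of the first cumulative share that is >= threshold, capped at len(v).
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(v))

# Hypothetical variance shares for 6 PCs (not the values from the presentation).
example_share = [0.50, 0.25, 0.17, 0.05, 0.02, 0.01]

n_average_rule = keep_above_average(example_share)    # shares above 1/6
n_90_rule = keep_cumulative(example_share, 0.90)
n_95_rule = keep_cumulative(example_share, 0.95)
```

With these made-up shares, the average rule and the 90% rule both keep 3 PCs, while the 95% rule keeps 4, which shows that the rules need not agree.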

17 PC Calculation (Chosen Variables)
        Pleasure  Commute  Business  Rural  Suburban  Urban
PC #1     -0.19     0.54     -0.40    0.56    -0.45   -0.03
PC #2     -0.54     0.14      0.48   -0.20    -0.31    0.58
PC #3 (only five of the six weights survive in the transcript, column alignment unknown): -0.55 0.36 0.23 -0.02 0.47

18 PC Calculation
P = Pleasure, C = Commute, B = Business, R = Rural, S = Suburban, U = Urban
PC1 = -0.19P + 0.54C - 0.40B + 0.56R - 0.45S - 0.03U
PC1(2002) = -0.19(30%) + 0.54(50%) - 0.40(20%) + 0.56(20%) - 0.45(30%) - 0.03(50%)
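The 2002 calculation can be checked with a short script. The weights and percentages are the ones quoted on this slide; the dictionary names are just labels chosen for the example.

```python
# PC #1 weights and the 2002 mix, as quoted on the slide.
weights_pc1 = {
    "Pleasure": -0.19, "Commute": 0.54, "Business": -0.40,
    "Rural": 0.56, "Suburban": -0.45, "Urban": -0.03,
}
mix_2002 = {
    "Pleasure": 0.30, "Commute": 0.50, "Business": 0.20,
    "Rural": 0.20, "Suburban": 0.30, "Urban": 0.50,
}

# Weighted sum of the six proportions gives the value of PC1 for 2002.
pc1_2002 = sum(weights_pc1[name] * mix_2002[name] for name in weights_pc1)

print(round(pc1_2002, 3))  # 0.095
```

The same weighted sum, repeated for each year and each chosen PC, produces the inputs to the regression.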

19 Example - Modeling Procedure
Flowchart boxes: Vehicle Use, Territory; PCA; Chosen PCs; Average Premium; Multiple Regression.
I have explained how to get from the explanatory variables to the chosen PCs. Now, let’s look at the modeling results. The modeling part is outside the scope of this presentation, but I am including it so that we can see results.
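For readers who want to see the last step of the flowchart, here is a minimal sketch of regressing average premium on the first 3 PCs with ordinary least squares. It is not the presenter's model: the 2003-2006 rows of `X` and every value in `avg_premium` are hypothetical, and centering the scores is a choice made here so that the intercept equals the mean premium.

```python
import numpy as np

# Columns: Pleasure, Commute, Business, Rural, Suburban, Urban.
# Only the 2002 row comes from the slides; the rest is hypothetical.
X = np.array([
    [0.30, 0.50, 0.20, 0.20, 0.30, 0.50],  # 2002 (from slide 18)
    [0.32, 0.49, 0.19, 0.19, 0.32, 0.49],  # 2003 (hypothetical)
    [0.33, 0.47, 0.20, 0.21, 0.31, 0.48],  # 2004 (hypothetical)
    [0.35, 0.46, 0.19, 0.18, 0.34, 0.48],  # 2005 (hypothetical)
    [0.36, 0.44, 0.20, 0.20, 0.33, 0.47],  # 2006 (hypothetical)
])
avg_premium = np.array([800.0, 805.0, 812.0, 820.0, 826.0])  # hypothetical response

# PCA step: eigenvectors of the covariance matrix, largest variance first.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
weights = eigvecs[:, np.argsort(eigvals)[::-1]][:, :3]   # keep the first 3 PCs

# Centered PC values for each year (5 years x 3 PCs).
scores = (X - X.mean(axis=0)) @ weights

# Multiple regression of average premium on an intercept plus the chosen PCs.
design = np.column_stack([np.ones(len(scores)), scores])
coef, *_ = np.linalg.lstsq(design, avg_premium, rcond=None)

fitted = design @ coef
residuals = avg_premium - fitted
```

With only 5 years and 4 fitted coefficients there is a single residual degree of freedom, so a fit like this says little on its own; the point is only to show where the chosen PCs enter the regression.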

20 Example – Results
Multiple Regression
Stepwise is worse because it does not capture changes, during the forecast period, in the variables it did not select. Outside the scope: economic variables are included in the forecasts of the explanatory variables.

21 ICBC Personal TPB

22 Advantages
Eliminates multicollinearity. Most of the original variance is captured in a few principal components. More refined selection method.
Also, you can determine the impact of each rating variable on the overall change in average premium because all of the explanatory variables are included.

23 Disadvantages
Can be hard to interpret the PCs. PC weights may not be stable from year to year. Difficult to explain.

24 Is PCA Right For You? Concerned about multicollinearity?
Confident in the set of explanatory variables? Want to reduce dimensionality, without throwing away variables?

25 For More Information
2008 Discussion Paper: PCA and Partial Least Squares: Two Dimension Reduction Techniques for Regression.
Predictive modeling seminar, Oct 6-7, 2008 in San Diego, CA: PCA and Partial Least Squares.

