Published by Braulio Hipps. Modified about 1 year ago.

1
R In Actuarial Pricing Teams
Chibisi Chima-Okereke, Mango Solutions
E-mail: cchima-okereke@mango-solutions.com

2
Agenda
Current software in actuarial analysis
What is R?
R as a functional language
Basic Examples
Actuarial pricing
GLM Example
Challenges and opportunities

3
Actuarial Survey: Geographical Area
UK Actuaries & CAS (Casualty Actuarial Society)
Source: Palisade (@Risk): http://www.palisade.com/downloads/pdf/Pryor.pdf

4
Main Areas Of Work
UK Actuaries & CAS (Casualty Actuarial Society)
Source: Palisade 2006 (@Risk): http://www.palisade.com/downloads/pdf/Pryor.pdf

5
Main area of work in which software is used
UK Actuaries & CAS (Casualty Actuarial Society)
Source: Palisade (@Risk): http://www.palisade.com/downloads/pdf/Pryor.pdf

6
Percentage of respondents using each package
UK Actuaries & CAS (Casualty Actuarial Society)
Source: Palisade (@Risk): http://www.palisade.com/downloads/pdf/Pryor.pdf

7
Use of Statistical Packages
Percentage of statistical package users using individual packages
UK Actuaries & CAS (Casualty Actuarial Society)
Source: Palisade (@Risk): http://www.palisade.com/downloads/pdf/Pryor.pdf

8
R is the programming language of statistics
Why should it not be the programming language of actuaries?
The current incumbents are inadequate:
VBA: huge versioning issues, and inadequate data manipulation and statistical function capabilities
Excel: inappropriate for analysis
Proprietary actuarial software: no granular access to processing outputs
R offers so much in terms of data manipulation and statistical models
Spreadsheets are unstructured computer programs. See "The Risks Of Using Spreadsheets for Statistical Analysis" (IBM White Paper): http://public.dhe.ibm.com/common/ssi/ecm/en/imw14297usen/IMW14297USEN.PDF

9
Excel
Very labour intensive
Excel spreadsheets are unstructured computer programs
Calculations are hard to check, and errors can be silent and go unnoticed
Do your spreadsheets start to grind to a halt with rather moderate sets of data?
Versioning: each version of an Excel file can be over 50 MB, whereas each version of a script is a few KB. Imagine this across your network and the waste of space it encourages
Linking spreadsheets: stability issues etc.
VBA: versioning problems; inadequate for data analysis and most useful purposes (harsh but true?)

10
What is R?
People have described R as:
A big calculator?
A programming language?
A rapid prototyping tool?
A free SAS?
Statistical analysis tool?

11
Useful R Features
Open source, object-oriented and functional programming language based on S, designed for manipulating data/objects and carrying out statistical analysis
Easy connections to external programs and databases, e.g. RODBC: very stable, dynamic SQL queries etc.
Massive library of tools: >>3400 packages
GUIs can be created in a straightforward way, e.g. with the gWidgets package (GTK+, RGtk2)
Easy output formats: all picture files, data formats, even Excel!

12
Current Actuarial R Packages
actuar (loss distributions)
ChainLadder
lifecontingencies
LifeTables
http://cran.r-project.org/web/packages/

13
Functional Programming
apply(data, index, function)
lapply(list, function)
aggregate(data, by, FUN)
mapply(function(arg1, arg2), vector(arg1), vector(arg2), ...)
by(data, indices, function)
The more "advanced/powerful" {plyr} package (Hadley Wickham) extends the apply functionality
Reference: http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/
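The functions listed on this slide can be tried out on a small table. The sketch below uses base R only; the `claims` data frame, its column names and the loading factors are made up for illustration and are not taken from the presentation.

```r
# Hypothetical toy data: claim amounts by year and region
claims <- data.frame(
  Year   = c(2001, 2001, 2002, 2002, 2002),
  Region = c("North", "South", "North", "South", "South"),
  Amount = c(100, 250, 80, 120, 300)
)

# apply(): run a function over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
colTotals <- apply(m, 2, sum)

# lapply(): run a function over each element of a list, returning a list
byYear <- split(claims$Amount, claims$Year)
meanByYear <- lapply(byYear, mean)

# aggregate(): summarise a data frame by one or more grouping variables
totals <- aggregate(Amount ~ Year + Region, data = claims, FUN = sum)

# mapply(): run a function over several vectors in parallel
loaded <- mapply(function(amount, loading) amount * (1 + loading),
                 claims$Amount, c(0.1, 0.1, 0.2, 0.2, 0.2))

# by(): run a function on each subset of a data frame
maxByRegion <- by(claims, claims$Region, function(d) max(d$Amount))
```

Each call takes a function as an argument and applies it to pieces of the data, which is what replaces explicit loops in this style of R code.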

14
{plyr}
Author: Hadley Wickham
http://www.jstatsoft.org/v40/i01/paper
Input \ Output   Array   Data Frame   List    Discarded
Array            aaply   adply        alply   a_ply
Data Frame       daply   ddply        dlply   d_ply
List             laply   ldply        llply   l_ply
a*ply(.data, .margins, .fun, ...)
d*ply(.data, .variables, .fun, ...)
l*ply(.data, .fun, ...)

15
Example Data
Data source (simulated): Modern Actuarial Risk Theory Using R, by Kaas, Goovaerts, Dhaene, and Denuit.

16
Dynamic SQL Query Example
    library(RODBC)
    # Fit a Poisson claim-count GLM to one year of policy data and
    # return the intercept row of the coefficient table
    doMyAnalysis <- function(myYear = 2001){
      # Build the query for the requested year
      sqlString <- paste("SELECT * FROM policyClaims WHERE Year='", myYear, "'", sep = "")
      # Open the connection, run the query, then close the connection
      channel <- odbcConnect(dsn = "InsuranceData")
      myData <- sqlQuery(channel = channel, query = sqlString)
      odbcClose(channel)
      myGlm <- glm(noclaims ~ age + bonusmalus + region + mileage, data = myData,
                   offset = log(exposure), family = poisson(link = "log"))
      myCoeffs <- summary(myGlm)$coefficients
      # Save the column names, which data.frame() would otherwise alter
      theNames <- colnames(myCoeffs)
      myCoeffs <- data.frame(myCoeffs)
      myCoeffs <- data.frame(rownames(myCoeffs), myYear, myCoeffs)
      colnames(myCoeffs) <- c("Coeff", "Year", theNames)
      print(myYear)
      return(myCoeffs[1,])
    }
    # Run the analysis for each year and stack the results into one table
    analysisOutPut <- lapply(2001:2010, doMyAnalysis)
    analysisOutPut <- do.call(rbind, analysisOutPut)
    rownames(analysisOutPut) <- 1:nrow(analysisOutPut)

17
Dynamic SQL Query Analysis Combination Example
Coeff      Year  Estimate  Std. Error  z value  Pr(>|z|)
Intercept  2001  -0.76     0.03        -24.68   0.00
Intercept  2002  -0.77     0.03        -24.92   0.00
Intercept  2003  -0.80     0.03        -25.65   0.00
Intercept  2004  -0.78     0.03        -25.17   0.00
Intercept  2005  -0.80     0.03        -25.91   0.00
Intercept  2006  -0.76     0.03        -24.92   0.00
Intercept  2007  -0.70     0.03        -23.03   0.00
Intercept  2008  -0.76     0.03        -24.67   0.00
Intercept  2009  -0.79     0.03        -25.30   0.00
Intercept  2010  -0.75     0.03        -24.46   0.00

18
Plotting Analysis
    # Draw one histogram of gross incurred claims (GIC) for a single
    # year / bonus-malus group
    myFun <- function(x){
      hist(x$GrossIncurred, col = "blue", xlab = "GIC",
           main = paste("Histogram of GIC for bonus malus \n group ", x$BonusMalus[1],
                        " and year ", x$Year[1], sep = ""))
    }
    # myFolder and policyTable are assumed to be defined earlier;
    # myFolder must end with a path separator
    pdf(file = paste(myFolder, "myPlots.pdf", sep = ""), width = 7, height = 7)
    # One page per combination of Year and BonusMalus
    by(policyTable, list("Year" = policyTable$Year, "BonusMalus" = policyTable$BonusMalus), FUN = myFun)
    dev.off()
Output file: C:\Users\cchima-okereke\Documents\R\RScripts\ActuarialPricing\tmp\myPlots.pdf

19
Plotting Analysis

20
GUI In R (claimsExploreR)

21

22

23
Pricing GLM Software
Very limited choice of off-the-shelf pricing tools, basically one: EMBLEM (Towers Watson)
Current tools do not allow enough access to underlying functionality and tend to be very labour intensive
Other statistical software, e.g. SAS, is extremely expensive to extend across organisations or does not have the range of statistical functionality required

24
Actuarial Pricing Teams
Typical team: 4-7 actuarial analysts + manager + off-the-shelf solutions => inefficiencies + repetitive tasks
Team using R: 2-4 actuarial analysts + manager + R => efficient automated system + creative analysis

25
GLM Models in Pricing
Poisson: frequency
Gamma: severity
Negative binomial for frequency {MASS}
Tweedie combines frequency and severity {statmod}
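As a rough illustration of the first two model types on this slide, the sketch below fits a Poisson frequency model and a Gamma severity model with base R's `glm()`. The portfolio is simulated here, and every name and parameter value (`AgeBand`, `Exposure`, the claim rates, the Gamma shape) is an assumption made for the example, not the presentation's data.

```r
set.seed(1)
n <- 2000

# Hypothetical simulated portfolio
policies <- data.frame(
  AgeBand  = factor(sample(c("Young", "Mid", "Old"), n, replace = TRUE)),
  Exposure = runif(n, 0.5, 1)
)

# Assumed true claim rate per unit of exposure for each age band
trueRate <- c(Young = 0.30, Mid = 0.15, Old = 0.10)[as.character(policies$AgeBand)]
policies$Claims <- rpois(n, lambda = trueRate * policies$Exposure)

# Frequency: Poisson GLM with log link and log(exposure) as offset
freqModel <- glm(Claims ~ AgeBand + offset(log(Exposure)),
                 data = policies, family = poisson(link = "log"))

# Severity: Gamma GLM with log link on policies with at least one claim,
# weighted by claim count (average cost is simulated with mean 1000)
claimants <- policies[policies$Claims > 0, ]
claimants$AvgCost <- rgamma(nrow(claimants), shape = 2, rate = 2 / 1000)
sevModel <- glm(AvgCost ~ AgeBand, data = claimants,
                family = Gamma(link = "log"), weights = Claims)

# Multiplicative relativities against the base level ("Mid")
freqRelativities <- exp(coef(freqModel))
```

The other two entries on the slide follow the same pattern: `MASS::glm.nb()` fits the negative binomial model, and `statmod::tweedie()` supplies a Tweedie family object for `glm()`.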

26
Variable Selection Criteria
What metrics shall we use to include/exclude variables?
Information criteria: AIC, BIC (multiple flavours)
Significance of variable: chi-squared/F-test
Consistency measures
Other measures
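The first two criteria on this slide are available in base R. The sketch below compares nested Poisson models by AIC and BIC and runs a chi-squared (likelihood-ratio) test with `anova()`. The data are simulated, and the variable names `Age` and `Colour` are hypothetical; `Colour` is generated to have no effect on claims.

```r
set.seed(2)
n <- 1000

# Hypothetical simulated data: Age drives claims, Colour does not
dat <- data.frame(
  Age    = factor(sample(c("Young", "Old"), n, replace = TRUE)),
  Colour = factor(sample(c("Red", "Blue"), n, replace = TRUE))
)
dat$Claims <- rpois(n, lambda = ifelse(dat$Age == "Young", 0.4, 0.2))

# Three nested candidate models
nullModel <- glm(Claims ~ 1,            data = dat, family = poisson)
ageModel  <- glm(Claims ~ Age,          data = dat, family = poisson)
bothModel <- glm(Claims ~ Age + Colour, data = dat, family = poisson)

# Information criteria: lower values indicate a better trade-off
# between fit and number of parameters
ic <- data.frame(
  Model = c("Null", "Age", "Age + Colour"),
  AIC   = c(AIC(nullModel), AIC(ageModel), AIC(bothModel)),
  BIC   = c(BIC(nullModel), BIC(ageModel), BIC(bothModel))
)

# Significance of each added variable: sequential chi-squared test
lrTest <- anova(nullModel, ageModel, bothModel, test = "Chisq")
```

For models with an estimated dispersion, such as the Gamma severity model, `test = "F"` gives the F-test instead.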

27
Automation Algorithms
What mechanics will we use to select/exclude variables?
Forward algorithm
Backward algorithm
Some other bespoke method
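One readily available way to run the forward and backward algorithms is base R's `step()`, which adds or drops terms according to an information criterion. This is a sketch on simulated data with hypothetical variable names; it is not necessarily the bespoke procedure used in the presentation.

```r
set.seed(3)
n <- 1000

# Hypothetical simulated data: Age and Region affect claims, Noise does not
dat <- data.frame(
  Age    = factor(sample(c("Young", "Old"), n, replace = TRUE)),
  Region = factor(sample(c("North", "South"), n, replace = TRUE)),
  Noise  = rnorm(n)
)
dat$Claims <- rpois(n, lambda = ifelse(dat$Age == "Young", 0.4, 0.2) *
                               ifelse(dat$Region == "North", 1.5, 1))

nullModel <- glm(Claims ~ 1, data = dat, family = poisson)
fullModel <- glm(Claims ~ Age + Region + Noise, data = dat, family = poisson)
searchScope <- list(lower = ~ 1, upper = ~ Age + Region + Noise)

# Forward algorithm: start from the null model and add terms while AIC improves
forwardFit <- step(nullModel, scope = searchScope, direction = "forward", trace = 0)

# Backward algorithm: start from the full model and drop terms while AIC improves
backwardFit <- step(fullModel, direction = "backward", trace = 0)

# Same forward search with the BIC penalty, k = log(n)
forwardBIC <- step(nullModel, scope = searchScope, direction = "forward",
                   k = log(n), trace = 0)

selectedTerms <- attr(terms(forwardFit), "term.labels")
```

A bespoke method would typically wrap `add1()` and `drop1()` in a loop so that extra checks, such as consistency across years, can be applied before a variable is accepted.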

28
Actuarial Pricing in R
Any statistical or data analysis process can be implemented in R, but we will think specifically about GLMs
Example:
    glm(Claims ~ Location + CarType + Age + ..., data = myData, family = poisson(link = "log"), offset = log(Exposure))
But actuarial pricing is also the whole decision-making process around the GLM...

29
Automated Pricing Process Structure in R
Claim counts analysis
Load data from database
Carry out pre-specified step algorithm with variable aggregation
Variable selection criteria
Check variable consistency
Decide to reject/accept variable
Severity analysis
Obtain final models
Throughout: continuously write desired outputs (PDF, log files, documentation, model plots, coefficients etc.)

30
Automated Actuarial Pricing
We need to define the consolidation structure for categorical variables, e.g.
Location 1 Location 2 Location 3 Location 4
North N.East North N.West North S.West South S.East South
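A consolidation structure of this kind can be written down in R as a named list that maps each broad level to the detailed levels it absorbs. The sketch below is a guess at the idea only: the particular grouping of `N.East`, `N.West`, `S.West` and `S.East` into `North` and `South` is an assumption, since the slide's table layout did not survive.

```r
# Hypothetical detailed location codes for six policies
location <- factor(c("N.East", "N.West", "S.West", "S.East", "N.East", "S.East"))

# Consolidation structure: new level = vector of old levels it replaces
consolidation <- list(
  North = c("N.East", "N.West"),
  South = c("S.West", "S.East")
)

# Assigning a named list to levels() merges the old levels into the new ones
locationGrouped <- location
levels(locationGrouped) <- consolidation

# Cross-tabulate old against new levels to check the mapping
mappingCheck <- table(location, locationGrouped)
```

Keeping the mapping in a list means an automated step algorithm can try the detailed and the consolidated version of a variable without any manual recoding.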

31
Outputting Results
R has perhaps the most extensive choice of outputs for analysis:
Link to Excel
Text files, e.g. CSV etc.
Charting output as picture files: jpeg, tiff, png, pdf, etc.
Report generation: PDF (Sweave - LaTeX), Word
PowerPoint direct output
Printing log reports of the process
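Three of these output routes need nothing beyond base R. The sketch below writes a CSV file, a PDF chart and a log entry to a temporary folder; the file names and the small `results` table are illustrative assumptions.

```r
# Assumption: write everything to R's temporary folder
outDir <- tempdir()

# Small illustrative results table
results <- data.frame(Year = 2001:2003, Estimate = c(-0.76, -0.77, -0.80))

# Text output: CSV file
csvFile <- file.path(outDir, "coefficients.csv")
write.csv(results, csvFile, row.names = FALSE)

# Chart output: PDF device; png(), jpeg() and tiff() are used the same way
pdfFile <- file.path(outDir, "estimates.pdf")
pdf(pdfFile, width = 7, height = 7)
plot(results$Year, results$Estimate, type = "b",
     xlab = "Year", ylab = "Intercept estimate")
invisible(dev.off())

# Log output: append a time-stamped line to a log file
logFile <- file.path(outDir, "process.log")
cat(format(Sys.time()), "wrote", nrow(results), "rows to", basename(csvFile), "\n",
    file = logFile, append = TRUE)
```

Excel, Word and PowerPoint output rely on add-on packages, so they are left out of this sketch.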

32
Example Process

33

34

35
Effects package
effects package from John Fox: http://www.jstatsoft.org/v08/i15/paper

36
Example Process

37
Example Process: Final Model

38
Final Charts

39
Final Model

40
Potential Scheme for Analytical Process
Data residing in some database
Connect to R: RODBC, RPostgreSQL, RODM etc.
Carry out analysis in R
Write results to PDF or any picture format, push to LaTeX, Excel, CSV, etc.

41
My R Actuarial Experiences
GLM for risk ranking of policies
Geo-mapping applications & reporting
Ad-hoc simple Bayesian analysis of claims probability based on previous experience
Extreme value analysis to obtain large claims thresholds for various classes
Location-based risk analysis
General reporting tool
But there is so much more potential! I was barely scratching the surface
R is much more stable than common analytical tools, e.g. for database connections and other functionality

42
Advantages of R for GLM Analysis
Standard actuarial GLM techniques are available, e.g. splines, interaction terms etc.
The best plotting functions of any statistical package
More advanced techniques are available: GAM, GMM, GNM, GHMM, MCMC methods (too many packages to list here!)
Bespoke methods and new actuarial techniques can be readily implemented in R while they are unavailable in standard actuarial software
Easy to integrate and fully customisable in any analytical environment
Complete array of statistical/analysis tools: clustering, neural nets, GRM, tree models, bootstrapping, Bayesian techniques, ODE/PDE, HMMs, contingency tables, survival analysis, copulas, extreme value analysis, geospatial analysis and visualisation
R offers a complete statistical, data processing, and analysis environment

43
Challenges & Opportunities
If you are new to R, do something small to begin with to test R out
IT support for R
There is great need for training and generation of material to enable actuarial analysts to use R
For mere mortals (like me) the learning curve is tough and the documentation appears ambiguous
R & Hadoop and R & Oracle
See me later for live R demos
