
1 Data Cleaning in Finance and Insurance Risk Modeling
Presented by Ulrich Müller
Conference "Data in Complex Systems" (GIACS), Palermo, Italy, April 7-9, 2008

2 Overview
- Finance and insurance industry
- Risk management, solvency tests, valuation
- New requirements for quantitative models and data
- Accounting data not always directly applicable to fair-market valuation and risk assessment
- Data sources, data errors
- Analysis, elimination, substitution of input data
- Case study 1: insurance reserving risk
- Case study 2: economic scenario generator
- Case study 3: high-frequency, real-time data

3 Quantitative risk modeling
- Finance and insurance industry: quantitative risk management becomes more important, with increased requirements
- Risk management, solvency tests, valuation:
  - Regulation of banks: Basel II
  - (Re-)insurance: Solvency II; Swiss Solvency Test (SST)
  - Rating agencies: Enterprise Risk Management (ERM)
  - Analysts (but profits may matter more than quantitative risks)
- Quantitative risk modeling:
  - Based on historical data (directly or indirectly through calibration)
  - High demand for data quality
  - Risk is about extreme events → statistical measures with high weight of extreme observations
  - "Outliers" matter; methods of robust statistics are hardly applicable

4 The problem: Reliable risk estimates from data with possible errors
[Diagram: Data source 1 and Data source 2 feed a basic historical data set (with data errors), from which reliable risk estimates based on reliable data must be derived]
- Different data sources for quantitative risk management: accounting figures, industry data from databases, publicly available market data
- Most data sources have errors; every data source needs validation
- What is an "error"? Data may be correct from a certain point of view and yet not be an accurate basis for quantitative risk assessment.

5 Why can correct accounting data lead to wrong risk measurement?
- Accounting figures:
  - Well-defined standards; audited processes (Sarbanes-Oxley)
  - Correctness in the legal sense (not economic)
  - Future risks are often excluded, "off balance sheet"
  - Book values often deviate from market values
  - False bookings may not be reversed before the next quarter
- Market values (fair values), relevant for risk assessment:
  - Preferably derived from known transaction prices, then mark-to-market, then reality-based "mark-to-model"
  - All factors included: price of risk (market, credit, liquidity risk)
- The two worlds occasionally meet: valuation for impairment tests (market value ≥ book value); new accounting standards

6 Case study 1: Data in the risk analysis of insurance loss reserves
- Loss reserves of a (re)insurance company:
  - Amount of reserves = expected size of all claims to be paid in the future, given all the existing "earned" (≈ old) contracts
  - Reserves are best estimates; estimates may need correction based on new claim information
  - Upward correction of reserves → loss, balance sheet hit
  - Reserve risk = risk of upward correction of loss reserves
- Reserve risk is a dominant risk type, often exceeding the risks due to new business (e.g. future catastrophes) and invested assets
- Reserve risks can be assessed quantitatively
- For assessing reserve risks, we use historical claim data

7 Triangle analysis of cumulative insurance claims
[Triangle: rows = underwriting year (when contracts were written), columns = development year (years since the underwriting of contracts)]
This triangle is the basis of further analysis. Here: cumulative reported claims. There are other types (claims paid, …).
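
Below is a minimal sketch of how such a triangle can be held in code; the numbers are purely hypothetical and only illustrate the layout (rows = underwriting years, columns = development years, unobserved future cells as NaN).

```python
import numpy as np

# Hypothetical cumulative reported claims (amounts are made up for illustration).
# Rows: underwriting years (oldest first); columns: development years.
# Cells that lie in the future are not yet observed and stay NaN.
nan = np.nan
triangle = np.array([
    [1000., 1600., 1850., 1900.],   # fully developed underwriting year
    [1100., 1700., 2000.,  nan ],
    [ 900., 1500.,  nan ,  nan ],
    [1200.,  nan ,  nan ,  nan ],   # most recent underwriting year
])
```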

8 Measuring the stochastic behavior of historical claim reserves: Mack's method
- Chain-ladder method: computing the average development of claims over the years
- Result: typical year-to-year development factors for claims (→ patterns)
- Method by Mack (1993): computing local deviations from these average development factors
- Variance of those local deviations → estimate of reserve risk
- Very sensitive to local data errors → overestimation of risk
- Like most quantitative risk estimates, this cannot be robust statistics
- Correctness of data is very important; data cleaning needed
- Many triangles for different lines of business → thousands of data points to be analyzed in the search for data errors

9 Chain ladder and Mack's method: some formulas
- Development factor: factor between values of subsequent development years in the triangle, f_{i,j} = L_{i,j+1} / L_{i,j}
- Chain ladder factor: the mean development factor f_j over all underwriting years i
- Local deviation from the chain ladder factor: f_{i,j} - f_j
- This local deviation can be large in case of data errors.
- Method by Mack (1993): estimate of reserve risk = weighted variance σ² of the local deviations, over the whole triangle (a minimal sketch follows below):
  σ² = Σ L_{i,j} (f_{i,j} - f_j)² = Σ L_{i,j} (L_{i,j+1} / L_{i,j} - f_j)²
- Bad values of f_{i,j} - f_j enter this calculation in squared form → very sensitive to data errors
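
A minimal sketch of these formulas, applied to the hypothetical `triangle` array from the earlier sketch. It follows the simplified notation on this slide (volume-weighted chain-ladder factors f_j and a weighted variance σ² summed over the whole triangle); the full Mack (1993) estimator with per-development-year normalisation is not reproduced here.

```python
import numpy as np

def chain_ladder_factors(tri):
    """Volume-weighted chain-ladder factor f_j for each development step j."""
    f = np.full(tri.shape[1] - 1, np.nan)
    for j in range(tri.shape[1] - 1):
        obs = ~np.isnan(tri[:, j]) & ~np.isnan(tri[:, j + 1])
        f[j] = tri[obs, j + 1].sum() / tri[obs, j].sum()
    return f

def mack_weighted_variance(tri):
    """sigma^2 = sum over observed cells of L_ij * (L_i,j+1 / L_ij - f_j)^2,
    as written on the slide."""
    f = chain_ladder_factors(tri)
    sigma2 = 0.0
    for j in range(tri.shape[1] - 1):
        obs = ~np.isnan(tri[:, j]) & ~np.isnan(tri[:, j + 1])
        f_ij = tri[obs, j + 1] / tri[obs, j]
        sigma2 += np.sum(tri[obs, j] * (f_ij - f[j]) ** 2)
    return sigma2
```

For the hypothetical triangle above, `chain_ladder_factors(triangle)` gives the typical development pattern and `mack_weighted_variance(triangle)` the risk proxy that a single bad booking can inflate.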

10 Data errors = "outliers"?
- Simple error detection: identifying the largest local deviations from typical development factors (see the ranking sketch below)
- The largest deviations are sometimes aberrant: "outliers"
- These outliers may reflect reality. Do not automatically filter them away!
- Outlier search is just good enough for finding suspicious cases
- Further confirmation needed to identify a suspicious case as an error
- Further error identification through:
  - human effort, e.g. investigation of historical booking errors
  - sophisticated mathematical criteria → automatic data cleaning
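
One possible implementation of the simple detection step is to rank cells by the size of their local deviation |f_{i,j} - f_j| and hand the top of the list to human experts; the sketch below reuses `chain_ladder_factors` from the previous sketch and deliberately filters nothing automatically.

```python
import numpy as np

def suspicious_cells(tri, top_n=5):
    """Rank observed cells by |f_ij - f_j|; the largest deviations are
    candidates for human review, not for automatic removal."""
    f = chain_ladder_factors(tri)  # helper from the previous sketch
    flagged = []
    for i in range(tri.shape[0]):
        for j in range(tri.shape[1] - 1):
            if not (np.isnan(tri[i, j]) or np.isnan(tri[i, j + 1])):
                dev = tri[i, j + 1] / tri[i, j] - f[j]
                flagged.append((abs(dev), i, j))
    return sorted(flagged, reverse=True)[:top_n]
```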

11 Development of cumulative reported claims for one underwriting year of one line of business
[Chart: cumulative reported claims by development year; a false booking in development year 11 is corrected in the subsequent year 12. All claim reports are cumulative (since underwriting of contracts).]

12 Data error criterion: booking errors immediately corrected in the subsequent period?
- A too-high reserve booking for a contract cannot be easily corrected once a quarterly or annual report has been finalized and audited
- Such errors will typically be corrected in the subsequent period
- → Data filtering criterion: a suspicious claim booking was most probably wrong if it was reversed in the subsequent period
- This has been confirmed in cases investigated by experts
- → An automated data cleaning procedure based on this criterion (see the sketch below)
- Development factors affected by this error are eliminated from Mack's reserving risk analysis → data gap
- Mack's method works with data gaps. In other cases: replace bad observations by best estimates of the correct ones.
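
A minimal sketch of the reversal criterion: if cumulative claims jump sharply in one development year and fall back in the next, the two affected development factors are excluded and left as a data gap. The `jump` and `reversal` thresholds are illustrative assumptions, not the values used in the actual procedure.

```python
import numpy as np

def mask_reversed_bookings(tri, jump=1.5, reversal=0.75):
    """Boolean mask of development factors to exclude from Mack's analysis.
    Illustrative rule: cumulative claims grow by more than `jump` from one
    development year to the next, then drop back below `reversal` times the
    inflated level in the following year -> treat both factors as a data gap."""
    n_uy, n_dy = tri.shape
    exclude = np.zeros((n_uy, n_dy - 1), dtype=bool)
    for i in range(n_uy):
        for j in range(n_dy - 2):
            a, b, c = tri[i, j], tri[i, j + 1], tri[i, j + 2]
            if np.isnan(a) or np.isnan(b) or np.isnan(c):
                continue
            if b / a > jump and c / b < reversal:
                exclude[i, j] = True       # factor into the suspicious value
                exclude[i, j + 1] = True   # factor out of it (the reversal)
    return exclude
```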

13 Case study 2: Economic Scenario Generator (ESG)
- Generating economic scenarios for risk management and other purposes
- Non-parametric ESG:
  - Direct algorithm for simulation of future economic developments
  - Directly using historical data (not just in prior calibration)
  - Method used here: bootstrapping (or resampling)
  - SCOR's ESG based on bootstrapping has some secondary elements with parameters: "semi-parametric"
- This scenario generator heavily relies on historical economic data
- Completeness of historical data is vital
- Correctness of historical data is vital, especially for risk analysis, which is strongly affected by extreme values in the data

14 Economic Scenario Generator (ESG): Motivation, purpose, features
Consistent scenarios for the future of the economy, needed for:
- Modeling assets and liabilities affected by the economy
- Expected returns, risks, full distributions
- Business decisions (incl. asset allocation, hedging of risks)

- Many economic variables: yield curves, asset classes, inflation, …
- 6 currency zones (flexible)
- Correlations, dependencies between all economic variables
- Heavy tails of distributions
- Realistic behavior of autoregressive volatility clusters
- Realistic, arbitrage-free yield-curve behavior
- Short-term and long-term scenarios (month/quarter … 40 years)

15 Influence of the economy on an insurance company
[Diagram: economic drivers and the items they affect]
- Interest rates: value of bond investments (government bonds, corporate bonds); (re)insurance business (life business, credit (re)insurance, …)
- Investment indices: value of equity investments, hedge fund investments, real estate investments, …
- Inflation: severity of (re)insurance losses (prices of houses and goods, prices of services); value of stabilisation (index) clauses in reinsurance treaties, …
- Credit cycle: severity of the credit and surety business; value of corporate bonds (defaults and credit spreads); defaults of reinsurers or retrocessionaires

16 Using economic scenarios to measure risks and determine an asset management strategy
[Diagram with elements: Economy; Economic Indicator (EI); GDP; FX; Equity indices; Yield curves; Investments; Assets; Liabilities; lines of business LoB1-LoB11; Cash flow; Accounting]

17 ESG based on bootstrapping
- Our implementation: Economic Scenario Generator (ESG) based on bootstrapping
- Bootstrapping the behavior of historical data for simulating the future (see the sketch below)
- Bootstrapping is a method that automatically fulfills many requirements, e.g. realistic dependencies between variables
- Some variables need additional modeling ("filtered bootstrap"):
  - Tail correction for modeling heavy tails (beyond the quantiles of historical data)
  - GARCH models for autoregressive clustering of volatility
  - Yield curve preprocessing (using forward interest rates) in order to obtain arbitrage-free, realistic behavior
  - Weak mean reversion of some variables (interest rates, inflation, …) in order to obtain realistic long-term behavior
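
The core resampling step can be sketched as follows: historical quarterly change vectors ("innovations") are drawn jointly across all variables, which preserves their cross-dependencies, and accumulated onto the last known state. This is plain, unfiltered bootstrapping; the tail correction, GARCH volatility, yield-curve preprocessing and mean reversion listed above are omitted, and log-changes are assumed, which only makes sense for positive-valued series.

```python
import numpy as np

def bootstrap_scenarios(hist, n_scenarios=1000, horizon=8, seed=None):
    """Plain bootstrap of quarterly innovation vectors.
    hist: array of shape (T, k) with historical levels of k economic variables.
    Returns simulated levels of shape (n_scenarios, horizon, k)."""
    rng = np.random.default_rng(seed)
    innovations = np.diff(np.log(hist), axis=0)   # (T-1, k) joint change vectors
    last = np.log(hist[-1])                       # last known data vector
    out = np.empty((n_scenarios, horizon, hist.shape[1]))
    for s in range(n_scenarios):
        draws = rng.integers(0, len(innovations), size=horizon)
        # resample whole vectors to keep cross-variable dependencies intact
        path = last + np.cumsum(innovations[draws], axis=0)
        out[s] = np.exp(path)
    return out
```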

18 Economic variables: historical data vectors as a basis
[Table: example data vectors for the quarter-ends 31.12.2003 and 31.03.2004, one value per economic variable and currency, e.g. USD Equity 3337.41 / 3384.48, USD CPI 120.9 / 123.0, USD IR 3m 0.90% / 0.92%, … (see the layout sketch below)]
- Currencies: AUD, CHF, EUR, GBP, JPY, USD
- Economic variables:
  - Equity index (MSCI)
  - Foreign exchange rate
  - Inflation (CPI)
  - Gross domestic product
  - Risk-free yield curves
  - Hedge fund indices (for USD)
  - Real estate (for CHF, USD)
  - MBS and bond indices (derived from yield curves)
  - Credit cycle index (derived from yield curves incl. corporate)
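
A minimal sketch of how such data vectors might be stored, assuming pandas: one row per quarter-end, one column per (currency, variable) pair. The three USD values are taken from the slide's example; everything else is omitted.

```python
import pandas as pd

# Quarterly data vectors: rows = quarter-ends, columns = (currency, variable).
# Only a tiny subset of the slide's example values is shown here.
vectors = pd.DataFrame(
    {
        ("USD", "Equity"): [3337.41, 3384.48],
        ("USD", "CPI"):    [120.9, 123.0],
        ("USD", "IR 3m"):  [0.0090, 0.0092],
    },
    index=pd.to_datetime(["2003-12-31", "2004-03-31"]),
)
```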

19 The bootstrapping method: data sample, innovations, simulation
[Diagram: historic data vectors of the economic variables (e.g. USD equity, EUR FX rate, GBP 5 year IR) over time yield innovation vectors; starting from the last known vector, resampled innovations produce future simulated data vectors for many scenarios]

20 Economic Scenario Generator application: Functionality
[Architecture diagram: data sources (Bloomberg, FED, non-Bloomberg time series, manual input) deliver economic raw data to the ALM Information Backbone; analysis, inter- and extrapolation and statistical tests produce enhanced time series; ESG simulation and post-processing produce economic scenarios, exported via the Igloo™ interface for import and reporting]
Data cleaning is one module in the architecture of the application.

21 ESG Application: Choice of time series

22 ESG Application: Many time series values, analyzing completeness and correctness

23 Input data validation and completion in the ESG application
- ~50 quarterly observations for hundreds of time series → thousands of historical values to be validated, with regular addition of new data
- Simple detection of suspicious values, identifying large deviations from expectation. Criterion: deviation > 4 times the standard deviation (see the sketch below)
- Suspicious values have to be evaluated by human experts here
- Eliminated bad values leave a data gap
- Other data gaps originate from data sources (e.g. delayed reporting of inflation figures)
- The method requires completeness of input data → gap filling
- Need data interpolation and extrapolation:
  - univariate gap filling algorithms, using one time series only
  - multivariate gap filling, e.g. estimating missing yield values of one maturity based on yield values of other maturities
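
A minimal sketch of the two steps, assuming pandas time series: the 4-standard-deviation criterion flags suspicious values for human review (with the sample mean standing in for the "expectation", which is a simplification), and a univariate gap filler interpolates inside the sample and carries the nearest value at the ends.

```python
import pandas as pd

def flag_suspicious(series, threshold=4.0):
    """Flag values deviating from the series mean by more than `threshold`
    standard deviations; flagged values go to human experts, not the bin."""
    return (series - series.mean()).abs() > threshold * series.std()

def fill_gaps_univariate(series):
    """Univariate gap filling: linear interpolation inside the sample,
    nearest-value fill at the ends (a simple stand-in for the application's
    interpolation/extrapolation module)."""
    return series.interpolate(method="linear", limit_direction="both")
```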

24 Result example: Simulated yield curves, simulation 2007Q3 → end 2008

25 Case study 3: High-frequency ticker data
- Tick-by-tick data, irregularly spaced, produced and collected in real time by banks, trading platforms or data vendors
- Example: foreign exchange ticks, intra-day trading
- Why analyze such high-frequency data?
  - Measuring "realized volatility" with high precision (see the sketch below)
  - High-frequency, real-time risk assessment
  - Automated information services or trading algorithms
  - Research (pioneer work by Olsen & Associates in the 1990s)
- Time series with huge numbers of observations: thousands per day, millions over years
- Too much data for human validation and cleaning; automated algorithms are needed
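
One common way to compute realized volatility from irregularly spaced ticks is to resample the price series to a regular grid and sum squared log-returns. The sketch below assumes a pandas Series indexed by tick timestamps; the 5-minute grid is an illustrative choice, not a prescription from the slides.

```python
import numpy as np
import pandas as pd

def realized_volatility(ticks, freq="5min"):
    """ticks: pandas Series of prices with a DatetimeIndex (irregular spacing).
    Take the last price in each interval, forward-fill empty intervals,
    and return the square root of the sum of squared log-returns."""
    grid = ticks.resample(freq).last().ffill()
    log_returns = np.log(grid).diff().dropna()
    return float(np.sqrt((log_returns ** 2).sum()))
```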

26 Special requirements for the analysis and the filtering of high-frequency data
- Returns of market prices often have distributions with heavy tails and fluctuating volatility levels → problem: "outliers" may be correct. Need a good model of true behavior to identify wrong data
- Real-time data cleaning is hard: a decision on validity is needed before the arrival of new confirming observations
- Time matters: the first tick after a period of missing data is especially hard to validate
- Validation is easier when shifts are rapidly confirmed by many new ticks
- Introduce an adaptive filter that learns from the data while validating (see the sketch below):
  - The cleaning algorithm carries its own statistics calculator
  - Data cleaning parameters continuously adapt to the statistics
  - The algorithm survives shifts in market behavior, structural breaks
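
A minimal sketch of such an adaptive filter: it keeps an exponentially weighted estimate of the typical tick-to-tick return and widens its tolerance with the time elapsed since the last valid tick, since prices can legitimately move further over longer gaps. The class name, thresholds and square-root-of-time scaling are illustrative assumptions, not the published filter specification.

```python
from datetime import datetime

class AdaptiveTickFilter:
    """Online tick validation that carries its own statistics calculator."""

    def __init__(self, base_tol=4.0, decay=0.05, init_scale=1e-4):
        self.base_tol = base_tol   # tolerance in multiples of the return scale
        self.decay = decay         # EWMA weight for adapting the scale
        self.scale = init_scale    # running estimate of a typical |return|
        self.last_price = None
        self.last_time = None

    def accept(self, price: float, time: datetime) -> bool:
        if self.last_price is None:          # first tick: accept and initialise
            self.last_price, self.last_time = price, time
            return True
        gap = max((time - self.last_time).total_seconds(), 1.0)
        ret = abs(price / self.last_price - 1.0)
        # prices can move further over a long gap, so the tolerance grows
        # roughly with the square root of the elapsed time (floored at 1)
        tol = self.base_tol * self.scale * max(1.0, (gap / 60.0) ** 0.5)
        ok = ret <= tol
        if ok:
            # learn from accepted ticks so the parameters track regime shifts
            self.scale = (1.0 - self.decay) * self.scale + self.decay * ret
            self.last_price, self.last_time = price, time
        return ok
```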

27 Conclusion: Recommended procedure for data cleaning
[Flowchart, roughly:]
1. Data source 1, data source 2, … → basic historical data set
2. Simple statistical outlier analysis → suspicious data
3. Evaluation of outliers by human experts → error identification
4. Sophisticated statistical criteria, adaptive filtering → automatic error detection
5. Validated data set, possibly with data gaps
6. Data gap filling (interpolation, extrapolation), if required
7. Main data analysis: robust methods if possible (hardly possible in risk assessment)

28 Thank you for your attention!

