On Model Validation Techniques Alex Karagrigoriou University of Cyprus "Quality - Theory and Practice", ORT Braude College of Engineering, Karmiel, May.


On Model Validation Techniques Alex Karagrigoriou University of Cyprus "Quality - Theory and Practice", ORT Braude College of Engineering, Karmiel, May 2012

OUTLINE: Introduction; Graphical Methods; Likelihood Method; Kolmogorov Test; Chi-Squared Tests; Tests based on Measures

After fitting a distribution model to a data set when performing life data analysis, we are often interested in diagnosing the model's fit or comparing the fit of different distributions. In addition to the engineering knowledge that should always govern the choice of a distribution model, there are many statistical tools that can help in deciding whether or not a distribution model is a good choice from a statistical point of view. These tools can also be used to compare the fit of different distributions.

Reliability Terms
Mean Time To Failure (MTTF), for non-repairable systems
Mean Time Between Failures (MTBF), for repairable systems
Reliability (survival probability) R(t)
Failure probability (cumulative distribution function) F(t) = 1 - R(t)
Failure probability density f(t)
Failure rate (hazard rate) λ(t)
Mean residual life (MRL)
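The quantities in this list are linked by standard identities: F(t) = 1 - R(t), f(t) = dF(t)/dt, λ(t) = f(t)/R(t), and MTTF is the integral of R(t) over t ≥ 0. A minimal Python sketch, assuming a two-parameter Weibull model; the function names and the shape/scale parameterization are illustrative choices, not taken from the slides:

```python
import math

# Assumption: two-parameter Weibull with shape > 0 and scale > 0, evaluated at t > 0.

def weibull_reliability(t, shape, scale):
    """Survival probability R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_cdf(t, shape, scale):
    """Failure probability F(t) = 1 - R(t)."""
    return 1.0 - weibull_reliability(t, shape, scale)

def weibull_pdf(t, shape, scale):
    """Failure density f(t) = dF/dt."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_reliability(t, shape, scale)

def weibull_hazard(t, shape, scale):
    """Failure rate lambda(t) = f(t) / R(t)."""
    return weibull_pdf(t, shape, scale) / weibull_reliability(t, shape, scale)

def weibull_mttf(shape, scale):
    """Mean time to failure: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)
```

With shape = 1 the model is exponential, so the hazard is constant at 1/scale and the MTTF equals the scale.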

Time Distributions (Models) of the Failure Density
Exponential Distribution: very commonly used because it is simple, even in cases to which it does not apply. Applications: electronics, mechanical components etc.
Normal Distribution: very straightforward and widely used. Applications: electronics, mechanical components etc.
Lognormal Distribution: very powerful and can be applied to describe various failure processes. Applications: electronics, material, structure etc.
Weibull Distribution: very powerful and can be applied to describe various failure processes. Applications: electronics, mechanical components, material, structure etc.

Probability Plots – Graphical Validation Probability plotting (e.g. a Q-Q plot) is a graphical method that allows a visual assessment of the model fit. Once the model parameters have been estimated, the probability plot can be created. The next figure shows a comparison of the probability plots of the two candidate models (Weibull and exponential) for the data set.
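One common way to build a Weibull probability plot, sketched here under stated assumptions: sort the failure times, attach a plotting position to each (Benard's median-rank approximation (i - 0.3)/(n + 0.4) is assumed here; the slides do not say which is used), and plot ln(-ln(1 - F)) against ln(t). If the Weibull model holds, the points fall near a straight line whose slope is the shape parameter. The helper names are hypothetical and the sketch handles complete (uncensored) data only.

```python
import math

def median_ranks(n):
    """Benard's approximation to the median rank of the i-th ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_plot_points(times):
    """(ln t, ln(-ln(1 - F))) pairs; roughly linear if the data are Weibull."""
    ordered = sorted(times)
    ranks = median_ranks(len(ordered))
    return [(math.log(t), math.log(-math.log(1.0 - p))) for t, p in zip(ordered, ranks)]

def least_squares_line(points):
    """Ordinary least-squares slope and intercept through the plot points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

On this scale the slope estimates the shape and exp(-intercept/slope) estimates the scale; strong curvature in the points suggests the Weibull model is a poor choice.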

Problems typical with reliability & survival data
Censoring: when the observation period ends, not all units have failed (some are survivors).
Lack of failures: if there is too much censoring, even though a large number of units may be under observation, the information in the data is limited due to the lack of enough failures.
Both create practical difficulties when planning reliability assessment tests and analyzing failure data.

Type I Censoring – Right Censoring n items are observed during a fixed time period [0, T]. The number of failures r is random. n-r items (also random) will be in operation (censored) at the end of the time period. Also called "right censoring" since the failure times to the right (i.e., larger than T) are missing.

Type II Censoring We run the test until we observe exactly r failures. The time period T is random. n-r units are in operation (nonrandom). In Type II censoring we know in advance how many failure times we have - this helps when planning adequate tests. However, an open-ended random test time is generally impractical from a management point of view and this type of testing is rarely seen.
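The two censoring schemes can be illustrated by applying them to a list of (normally unobservable) true lifetimes. This is a small sketch with hypothetical helper names; each unit is returned as a pair (observed time, failed flag).

```python
def type_one_censor(lifetimes, end_time):
    """Type I: stop at a fixed time; the number of failures is random."""
    return [(min(t, end_time), t <= end_time) for t in lifetimes]

def type_two_censor(lifetimes, r):
    """Type II: stop at the r-th failure; the stopping time is random.

    Assumes distinct lifetimes, so exactly r units fail.
    """
    ordered = sorted(lifetimes)
    stop = ordered[r - 1]
    return [(min(t, stop), t <= stop) for t in ordered]
```

For lifetimes 5, 1, 9, 3, 7, Type I censoring at time 6 yields three failures and two survivors recorded at 6, while Type II censoring with r = 2 stops at time 3 with three survivors recorded at 3.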

Readout or Interval Censored Data Sometimes exact times of failure are not known; only an interval of time in which the failure occurred is recorded.

Likelihood Value Use the MLE (Maximum Likelihood Estimation) method to estimate the parameters. Then, the likelihood value can be used to assess the fit: The distribution with the largest L value is the best fit.

Table: Comparing the fit of two distributions using the log-likelihood value. The log-likelihood value for the Weibull distribution is greater than that for the exponential distribution (i.e. the Weibull distribution is statistically a better fit).
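A sketch of how such a comparison could be computed for right-censored data, assuming the usual likelihood in which a failure contributes the density and a survivor contributes the reliability. The function names, the bisection bounds and the data in the usage note are illustrative assumptions, not values from the slides.

```python
import math

def exp_mle(times, failed):
    """Exponential rate MLE: number of failures / total time on test."""
    return sum(failed) / sum(times)

def exp_loglik(times, failed, rate):
    return sum(failed) * math.log(rate) - rate * sum(times)

def weibull_mle(times, failed, lo=0.01, hi=20.0, iters=200):
    """Solve the profile score equation for the shape by bisection.

    Assumes the root lies in [lo, hi]; the score is increasing in the shape.
    """
    r = sum(failed)
    mean_log_fail = sum(math.log(t) for t, d in zip(times, failed) if d) / r

    def score(beta):
        s0 = sum(t ** beta for t in times)
        s1 = sum(t ** beta * math.log(t) for t in times)
        return s1 / s0 - 1.0 / beta - mean_log_fail

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = (sum(t ** beta for t in times) / r) ** (1.0 / beta)
    return beta, eta

def weibull_loglik(times, failed, beta, eta):
    ll = 0.0
    for t, d in zip(times, failed):
        if d:  # failures contribute log f(t)
            ll += math.log(beta) - beta * math.log(eta) + (beta - 1) * math.log(t)
        ll -= (t / eta) ** beta  # every unit contributes log R(t)
    return ll
```

Because the exponential is the Weibull with shape 1, the maximized Weibull log-likelihood can never be smaller than the exponential one, so a raw comparison always favors the larger model; this is one reason to also use penalized criteria such as AIC.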

Modified Kolmogorov-Smirnov (KS) Test The standard KS test is used for continuous distributions with known parameters. The Modified KS test is used when the parameters are unknown and need to be estimated. For N failure times, we define $S_N(t)$ to be the empirical distribution function. The Modified KS test uses the maximum of the absolute difference between $S_N(t)$ and the fitted cumulative distribution function, Q(t): $D = \max_t |S_N(t) - Q(t)|$

The distribution of the Modified KS statistic under the null hypothesis (i.e. the data set is drawn from the fitted distribution) can be calculated. The test returns the probability, under the null hypothesis, that the statistic falls below the observed maximum difference. A high probability value, close to 1, indicates that there is a significant difference between the theoretical distribution and the data set. The value for the Weibull distribution is smaller; thus the Weibull distribution is statistically a better fit.
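The KS distance itself is simple to compute: the empirical distribution function jumps at each ordered failure time, so the maximum difference is attained just before or at one of those jumps. A minimal sketch for complete data, where `cdf` stands for any fitted cumulative distribution function (a hypothetical argument name):

```python
def ks_statistic(times, cdf):
    """Maximum absolute difference between the empirical CDF and `cdf`."""
    ordered = sorted(times)
    n = len(ordered)
    d = 0.0
    for i, t in enumerate(ordered, start=1):
        q = cdf(t)
        # Compare the fitted CDF with the empirical CDF at and just before the jump.
        d = max(d, i / n - q, q - (i - 1) / n)
    return d
```

This computes only the distance. When the parameters of `cdf` were estimated from the same data, the standard KS tables no longer apply, which is why a modified test (or a simulation of the null distribution) is needed for the probability value.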

Chi-Squared Test The chi-squared test relies on the idea of grouping the data into a suitable number of intervals. Grouping involves a loss of information, and there is often considerable arbitrariness in how the intervals are chosen. The optimal number k of intervals for a sample of size N may be estimated from Sturges' Rule: $k = 1 + \log_2 N$. Let $N_i$ be the number of data points in the i-th interval and $n_i$ the expected number according to the fitted distribution. The chi-squared statistic is $\chi^2 = \sum_{i=1}^{k} (N_i - n_i)^2 / n_i$

A high probability value, close to 1, indicates that there is a significant difference between the theoretical distribution and the data set. Table: Comparing two distributions using the chi-squared test. The value for the Weibull distribution is smaller (i.e. the Weibull distribution is statistically a better fit).
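A sketch of the grouping and the statistic, assuming Sturges' Rule in the form k = 1 + log2(N) rounded up, and interior cut points supplied by the user (how to place them is exactly the arbitrariness noted above). The function names are hypothetical.

```python
import math

def sturges_bins(n):
    """Sturges' Rule: k = 1 + log2(n), rounded up to an integer."""
    return 1 + math.ceil(math.log2(n))

def chi_squared_statistic(times, cdf, cut_points):
    """Sum of (observed - expected)^2 / expected over the cells.

    Cells are (-inf, c1], (c1, c2], ..., (c_last, +inf) for increasing cut points.
    """
    n = len(times)
    bounds = [-math.inf] + list(cut_points) + [math.inf]
    stat = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for t in times if lo < t <= hi)
        upper = 1.0 if hi == math.inf else cdf(hi)
        lower = 0.0 if lo == -math.inf else cdf(lo)
        expected = n * (upper - lower)
        stat += (observed - expected) ** 2 / expected
    return stat
```

The statistic is usually referred to a chi-squared distribution with k - 1 - p degrees of freedom, where p is the number of estimated parameters; that reference distribution is only approximate when the parameters are estimated from the ungrouped data.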

Empirical model fitting – Distribution Free (Kaplan-Meier) approach
No underlying model (Weibull, lognormal etc.) is assumed.
K-M estimation is an empirical (non-parametric) procedure.
Exact times of failure are required.
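The Kaplan-Meier (product-limit) estimate multiplies, at each distinct failure time, the previous survival estimate by the fraction of at-risk units that survive that time. A minimal sketch with a hypothetical function name, taking exact times and a failed/censored flag per unit:

```python
def kaplan_meier(times, failed):
    """Return [(t, S(t))] at each distinct failure time."""
    data = sorted(zip(times, failed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all units (failed or censored) recorded at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Censored units lower the at-risk count without producing a step in the curve.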

Methods based on Measures: Kullback-Leibler; Matusita; Kagan; Csiszar; Hellinger; Cressie and Read
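For discrete distributions, several of these measures can be written in a few lines. Normalizing constants differ between authors, so the conventions below are assumptions and not necessarily those of the original slides: Kullback-Leibler as sum p ln(p/q), Matusita as the sum of squared differences of square roots, Kagan as the Pearson chi-square form sum (p - q)^2 / q, and the Cressie and Read power divergence with index lam.

```python
import math

# Assumption: p and q are probability vectors of equal length with q > 0 everywhere.

def kullback_leibler(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def matusita(p, q):
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

def pearson_chi_square(p, q):
    """Kagan-type measure in its Pearson chi-square form."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

def cressie_read(p, q, lam):
    """Power divergence; lam must differ from 0 and -1 (both are limiting cases)."""
    total = sum(pi * ((pi / qi) ** lam - 1.0) for pi, qi in zip(p, q) if pi > 0)
    return total / (lam * (lam + 1.0))
```

Under these conventions the Cressie and Read family recovers the others as special cases: lam = 1 gives half the Pearson form, lam = -1/2 gives twice the Matusita measure, and lam close to 0 approaches Kullback-Leibler. All of them belong to Csiszar's general family of divergences.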

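The BHHJ Power Divergence [Basu et al. (1998)]
$I^{\alpha}(g, f) = \int \{ f^{1+\alpha}(z) - (1 + 1/\alpha)\, g(z) f^{\alpha}(z) + (1/\alpha)\, g^{1+\alpha}(z) \}\, dz, \quad \alpha > 0$ (1.5.4)
where g is the true density and f is the model density. The BHHJ family reduces to the Kullback-Leibler divergence for α↓0 and to the square of the $L_2$ distance for α = 1.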

Discrete case (1.5.5): distance between two binomial/multinomial distributions.
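In the discrete case the integral in the density power divergence of Basu et al. becomes a sum over cells. The sketch below assumes that standard form, with g the true probabilities and f the model probabilities; the function name is hypothetical. It can be used to check the two limiting cases numerically.

```python
def bhhj_divergence(g, f, alpha):
    """Discrete density power divergence for alpha > 0:

    sum over cells of f^(1+alpha) - (1 + 1/alpha) * g * f^alpha + g^(1+alpha) / alpha
    """
    return sum(
        fi ** (1 + alpha) - (1 + 1 / alpha) * gi * fi ** alpha + gi ** (1 + alpha) / alpha
        for gi, fi in zip(g, f)
    )
```

For alpha = 1 the summand collapses to (f - g)^2, the squared L2 distance, and for alpha close to 0 the value approaches sum g ln(g/f), the Kullback-Leibler divergence.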

The AIC Model Selection Criterion For the construction of AIC, Akaike used the K-L measure. Akaike proposed evaluating the second term (the expected log-likelihood) through minus twice the mean expected log-likelihood. Finally, he provided an asymptotically unbiased estimator of this quantity: $AIC = -2 \log L(\hat{\theta}) + 2p$

The AIC Model Selection Criterion where p is the number of unknown parameters involved in the model/distribution. In our case, of the two candidates (Weibull and exponential), the Weibull model has the smaller AIC value, so the Weibull fit is better.
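As a small illustration of the criterion, AIC is minus twice the maximized log-likelihood plus twice the number of parameters, and the model with the smaller value is preferred. The log-likelihood values in the usage lines are hypothetical placeholders, not the values from the slides.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 * logL + 2 * p (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical maximized log-likelihoods, for illustration only.
candidates = {
    "weibull": aic(-40.0, n_params=2),      # shape and scale
    "exponential": aic(-45.0, n_params=1),  # rate only
}
best_model = min(candidates, key=candidates.get)
```

The penalty term matters when the log-likelihoods are close: a two-parameter model must improve the log-likelihood by more than 1 over a one-parameter model to obtain the smaller AIC.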

Other Model Selection Methods where p is the number of unknown parameters involved in the model/distribution.

The DIC Model Selection The DIC criterion is derived from the BHHJ measure.

Modified Divergence Information Criterion (MDIC) where.

Tests based on Measures

Goodness of Fit Tests

The BHHJ test is compared with the goodness of fit tests based on the Kullback measure (KL), the Kagan measure (Pearson chi-square test), the Matusita measure (Mat), and the Cressie and Read measure (CR). Three different values of the index α are used: α = 0.01, 0.05 & ... Both the power and the type I error are investigated. Simulated results: a trinomial distribution is used with n = 150 and a number of simulated samples have been generated.


Goodness of Fit Tests: POWER vs 'SIZE' of the TEST
Size: % of rejections when $H_0$: M(150, 0.2, 0.6, 0.2) holds.
Power: % of rejections when $H_1$: M(150, 0.2, 0.7, 0.1) holds.
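A Monte Carlo study of this kind can be sketched as follows. For simplicity the sketch uses only the Pearson chi-square test (the Kagan measure) rather than the BHHJ test, with the asymptotic 5% critical value for 2 degrees of freedom; the number of simulations, the seed and the function names are arbitrary choices, not those of the original study.

```python
import random

CRITICAL_5PCT_2DF = 5.991  # upper 5% point of the chi-squared distribution, 2 df

def draw_counts(rng, n, probs):
    """One multinomial sample of size n, returned as cell counts."""
    counts = [0] * len(probs)
    for _ in range(n):
        u = rng.random()
        acc = 0.0
        for j, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[j] += 1
                break
        else:  # guard against rounding in the cumulative sum
            counts[-1] += 1
    return counts

def pearson_statistic(counts, null_probs):
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, null_probs))

def rejection_rate(true_probs, null_probs, n=150, sims=2000, seed=1):
    """Fraction of simulated samples in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        counts = draw_counts(rng, n, true_probs)
        if pearson_statistic(counts, null_probs) > CRITICAL_5PCT_2DF:
            rejections += 1
    return rejections / sims

null = (0.2, 0.6, 0.2)
alternative = (0.2, 0.7, 0.1)
size = rejection_rate(null, null)          # estimated type I error
power = rejection_rate(alternative, null)  # estimated power
```

A well-behaved test should give a size close to the nominal 5% and a power well above it; replacing `pearson_statistic` with a statistic based on another measure allows the same comparison for other tests.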