Two Distribution Families for Modelling Over- and Underdispersed Binomial Frequencies. Feirer V., Hirn U., Friedl H., Bauer W. Institute for Paper, Pulp and Fiber Technology & Institute for Statistics, Graz University of Technology

Agenda: Motivation; Generalized Linear Models; Multiplicative Binomial Distribution; Double Binomial Distribution; Application of the Two Distributions; Summary

Motivation: consider the problem of successful ink transfer onto paper and explain the occurrence of unprinted regions; part of a larger, industry-funded project at the IPZ. (Number of data points in the sample: roughly 9 × 10⁶; sample size: 3 6 mm²)

Predictor Variables: topography; formation (the way fibres are arranged)

Response: [Figure: true colour image]

GENERALIZED LINEAR MODELS Basics

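Distribution of the Response: the response model for $y_i$, the number of successes out of $n_i$ trials, is here $y_i \sim \mathrm{Bin}(n_i, \pi_i)$, which is part of the Exponential Family, with $\pi_i$ the probability of successful ink transmission.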

The Generalized Linear Model*: the model for the relative frequency $y_i/n_i$ links its mean $\pi_i$ to the linear predictor $\eta_i = x_i^\top\beta$ by the logit link, $\log\{\pi_i/(1-\pi_i)\} = \eta_i$. Advances over a linear model: the distribution of the relative frequencies is a member of the Exponential Family, and the mean lies between 0 and 1. *Nelder & Wedderburn (1972). Generalized Linear Models. Journal of the Royal Statistical Society, Series A, 135, 370–384.
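A minimal sketch of fitting such a binomial GLM with the logit link by iteratively reweighted least squares (Fisher scoring), in plain Python. It assumes one hypothetical covariate `x` in place of the real predictors (topography, formation); `fit_binomial_logit` and `inv_logit` are illustrative names, not code from the original project.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_binomial_logit(x, y, n, iters=25):
    """Fit logit(pi_i) = b0 + b1 * x_i to binomial counts by IRLS.

    x: values of a single (hypothetical) covariate
    y: numbers of successes, n: numbers of trials
    Returns the estimates (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Accumulate the 2x2 weighted normal equations (X^T W X) b = X^T W z.
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = b0 + b1 * xi
            pi = inv_logit(eta)
            w = ni * pi * (1.0 - pi)        # IRLS weight for the canonical logit link
            z = eta + (yi - ni * pi) / w    # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det
        b1 = (s00 * t1 - s01 * t0) / det
    return b0, b1


# Usage: expected counts generated from logit(pi) = 0.5 - 1.0 * x are recovered.
x = [-1.0, 0.0, 1.0, 2.0]
n = [36, 36, 36, 36]
y = [ni * inv_logit(0.5 - xi) for xi, ni in zip(x, n)]
print(fit_binomial_logit(x, y, n))
```

With several predictors the 2 × 2 solve becomes a general linear solve of the weighted normal equations; very large linear predictors would also call for a numerically safer `inv_logit`.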

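Model Deviance: Deviance = −2 × (maximized log-likelihood of the considered model − maximized log-likelihood of the saturated model). Under certain regularity conditions the deviance is approximately χ²-distributed with the residual degrees of freedom, which gives a test for goodness-of-fit. If the deviance is much smaller than its degrees of freedom: underdispersion (variance of the data smaller than assumed by the model). If it is much larger: overdispersion (variance of the data larger than assumed by the model).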
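For grouped binomial data the deviance has a closed form, and dividing it by the residual degrees of freedom gives a quick screen for over- or underdispersion (a ratio well above or well below 1). A minimal sketch; the function names are hypothetical and the toy counts are illustrative, not taken from the printability data.

```python
import math


def binomial_deviance(y, n, pi):
    """Deviance D = 2 * (loglik_saturated - loglik_model) for binomial counts.

    y: successes, n: trials, pi: fitted success probabilities of the model.
    Terms with y = 0 or y = n use the limit 0 * log(0) = 0.
    """
    d = 0.0
    for yi, ni, pii in zip(y, n, pi):
        if yi > 0:
            d += yi * math.log(yi / (ni * pii))
        if yi < ni:
            d += (ni - yi) * math.log((ni - yi) / (ni * (1.0 - pii)))
    return 2.0 * d


def dispersion_ratio(deviance, n_obs, n_params):
    """Deviance per residual degree of freedom (about 1 if the binomial variance fits)."""
    return deviance / (n_obs - n_params)


# Toy example: two cells of n = 36, one fully unprinted and one fully printed,
# fitted with a common probability of 0.5 (one parameter).
D = binomial_deviance([0, 36], [36, 36], [0.5, 0.5])
print(D, dispersion_ratio(D, 2, 1))  # ratio far above 1: overdispersion
```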

Deviances of the Printability Datasets: distinct deviations from a binomial variance! [Figure: deviance values from 11 different data sets, ranging from few to many unprinted areas]

MULTIPLICATIVE BINOMIAL DISTRIBUTION A Generalization of the Binomial Distribution

Definition: introduced by Altham* as the "multiplicative generalization of the binomial distribution". Altham considers litters of rabbits: the animals within one litter are treated with the same dose of a certain drug; n … litter size, y … number of surviving animals. The outcomes of animals within one litter are not mutually independent, so Altham introduces an interaction parameter ω. *Altham (1978). Two Generalizations of the Binomial Distribution. Journal of the Royal Statistical Society, Series C (Applied Statistics), 27, 162–167.

Properties: member of the 2-parameter Exponential Family; for ω = 1 it corresponds to the Binomial Distribution; for n = 1 it reduces to the Bernoulli distribution.
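As it is usually stated, Altham's multiplicative binomial assigns $P(Y=y) = \binom{n}{y} p^y (1-p)^{n-y} \omega^{y(n-y)} / f(p,\omega,n)$ for $y = 0,\dots,n$, where $f$ is the sum of the numerators over $y$. The sketch below (hypothetical function names, $p$ denoting the probability parameter) normalises numerically and checks the two reductions listed above. It needs Python 3.8+ for `math.comb`.

```python
import math


def mult_binomial_pmf(n, p, omega):
    """Multiplicative binomial pmf (Altham, 1978), returned as a list over y = 0..n.

    Unnormalised weights C(n, y) p^y (1-p)^(n-y) omega^(y(n-y)) are divided by
    their sum, which plays the role of the normalising constant.
    Weights are formed on the log scale to avoid overflow for large n or extreme omega.
    Requires 0 < p < 1 and omega > 0.
    """
    logw = [
        math.log(math.comb(n, y))
        + y * math.log(p)
        + (n - y) * math.log(1.0 - p)
        + y * (n - y) * math.log(omega)
        for y in range(n + 1)
    ]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [wi / total for wi in w]


def binomial_pmf(n, p):
    """Classic binomial pmf as a list over y = 0..n."""
    return [math.comb(n, y) * p**y * (1.0 - p) ** (n - y) for y in range(n + 1)]


# omega = 1 reproduces the binomial distribution (difference at rounding level) ...
print(max(abs(a - b) for a, b in zip(mult_binomial_pmf(36, 0.8, 1.0), binomial_pmf(36, 0.8))))
# ... and n = 1 gives a Bernoulli(p) distribution whatever omega is, since y(n - y) = 0.
print(mult_binomial_pmf(1, 0.8, 3.0))
```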

Comparison With the Classic Binomial pdf [Figure: n = 36, probability parameter 0.8]: ω = 1 gives the classic binomial distribution.

Comparison of the Variances [Figure: n = 36]: ω = 1 gives the classic binomial distribution.
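The variance comparison can be reproduced numerically for n = 36. Because $\omega^{y(n-y)}$ is largest at y = n/2, ω > 1 concentrates probability near the centre and ω < 1 pushes it towards 0 and n; in the symmetric case p = 0.5 this always reduces, respectively increases, the variance relative to the binomial. A sketch with hypothetical function names; the ω values are arbitrary illustration points, not fitted values.

```python
import math


def mult_binomial_pmf(n, p, omega):
    """Multiplicative binomial pmf (Altham, 1978) over y = 0..n, normalised numerically."""
    logw = [math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)
            + y * (n - y) * math.log(omega) for y in range(n + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [wi / total for wi in w]


def mean_var(pmf):
    """Mean and variance of a distribution on 0..len(pmf)-1."""
    mean = sum(y * q for y, q in enumerate(pmf))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))
    return mean, var


# Variance of Y for n = 36 next to the binomial variance n p (1 - p).
n, p = 36, 0.8
for omega in (0.97, 0.99, 1.0, 1.01, 1.05):
    _, var = mean_var(mult_binomial_pmf(n, p, omega))
    print(f"omega={omega:4.2f}  var={var:6.3f}  binomial var={n * p * (1 - p):6.3f}")
```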

Integration into GLM Context: based on the log-likelihood function of the distribution, the probability parameter (between 0 and 1) is modelled with the logit link and ω (ω > 0) with the log-linear link.
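A sketch of the log-likelihood that would be maximised in this GLM setting, assuming one hypothetical covariate `x` entering both linear predictors; the coefficient names `beta` and `gamma` are also assumptions. In practice the negative of `glm_loglik` could be passed to a general-purpose optimiser such as `scipy.optimize.minimize`.

```python
import math


def log_mult_binomial(y, n, p, omega):
    """Log-probability of y under the multiplicative binomial (Altham, 1978)."""
    def log_weight(k):
        return (math.log(math.comb(n, k)) + k * math.log(p)
                + (n - k) * math.log(1.0 - p) + k * (n - k) * math.log(omega))

    logs = [log_weight(k) for k in range(n + 1)]
    m = max(logs)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logs))  # log-sum-exp
    return log_weight(y) - log_norm


def glm_loglik(beta, gamma, x, y, n):
    """Log-likelihood of a hypothetical GLM with one covariate x:
    logit(p_i)   = beta[0] + beta[1] * x_i   (logit link, 0 < p_i < 1)
    log(omega_i) = gamma[0] + gamma[1] * x_i (log-linear link, omega_i > 0)
    """
    total = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * xi)))
        omega = math.exp(gamma[0] + gamma[1] * xi)
        total += log_mult_binomial(yi, ni, p, omega)
    return total


# Toy values: with gamma = (0, 0) every omega_i = 1 and this is the binomial log-likelihood.
print(glm_loglik((1.0, -0.8), (0.0, 0.0), [0.0, 1.0, 2.0], [30, 20, 10], [36, 36, 36]))
```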

DOUBLE BINOMIAL DISTRIBUTION A Second Generalization of the Binomial Distribution

Definition: introduced by Efron* as part of the Double Exponential Family. A second parameter allows the variance to vary: the variance is smaller than binomial if the second parameter lies between 0 and 1 and larger than binomial if it exceeds 1; a value of 1 gives the classic binomial distribution. *Efron (1986). Double Exponential Families and Their Use in Generalized Linear Regression. Journal of the American Statistical Association, 81, 709–721.
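Efron (1986) builds the double binomial by combining the binomial kernel with mean μ and the same kernel evaluated at the observed proportion y/n: the unnormalised weight is $\binom{n}{y}\,[\mu^y(1-\mu)^{n-y}]^{\theta}\,[(y/n)^y(1-y/n)^{n-y}]^{1-\theta}$, with variance approximately $n\mu(1-\mu)/\theta$. The sketch below assumes that the slide's second parameter acts as a dispersion factor $\phi = 1/\theta$, which matches the behaviour stated above (variance smaller than binomial for $0<\phi<1$, larger for $\phi>1$), and it normalises by direct summation instead of Efron's approximate constant. Function names are hypothetical.

```python
import math


def xlogy(a, b):
    """a * log(b) with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def double_binomial_pmf(n, mu, phi):
    """Double binomial pmf (after Efron, 1986) over y = 0..n, normalised by summation.

    phi is treated as a dispersion factor with Efron's theta = 1 / phi (assumption),
    so phi > 1 widens and phi < 1 narrows the distribution; phi = 1 is the binomial.
    """
    theta = 1.0 / phi
    logw = []
    for y in range(n + 1):
        # theta * log of the binomial kernel with mean mu
        binom_part = theta * (y * math.log(mu) + (n - y) * math.log(1.0 - mu))
        # (1 - theta) * log of the same kernel evaluated at the observed proportion y / n
        self_part = (1.0 - theta) * (xlogy(y, y / n) + xlogy(n - y, (n - y) / n))
        logw.append(math.log(math.comb(n, y)) + binom_part + self_part)
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [wi / total for wi in w]


print(double_binomial_pmf(36, 0.8, 2.0)[25:33])
```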

Comparison With the Classic Binomial pdf [Figure: n = 36, mean parameter 0.8]: a second parameter equal to 1 gives the classic binomial distribution.

Comparison of the Variances [Figure: n = 36]: a second parameter equal to 1 gives the classic binomial distribution.
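The exact variance can be compared with Efron's approximation $\phi\, n\mu(1-\mu)$ numerically for n = 36. This sketch again assumes the second parameter is a dispersion factor $\phi = 1/\theta$; names and parameter values are illustrative only.

```python
import math


def xlogy(a, b):
    """a * log(b) with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def double_binomial_pmf(n, mu, phi):
    """Double binomial pmf over y = 0..n; phi = 1 / theta is an assumed dispersion factor."""
    theta = 1.0 / phi
    logw = [math.log(math.comb(n, y))
            + theta * (y * math.log(mu) + (n - y) * math.log(1.0 - mu))
            + (1.0 - theta) * (xlogy(y, y / n) + xlogy(n - y, (n - y) / n))
            for y in range(n + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [wi / total for wi in w]


def variance(pmf):
    """Variance of a distribution on 0..len(pmf)-1."""
    mean = sum(y * q for y, q in enumerate(pmf))
    return sum((y - mean) ** 2 * q for y, q in enumerate(pmf))


# Exact variance versus the approximation phi * n * mu * (1 - mu), for n = 36.
n, mu = 36, 0.8
for phi in (0.5, 1.0, 2.0, 4.0):
    v = variance(double_binomial_pmf(n, mu, phi))
    print(f"phi={phi:3.1f}  var={v:6.3f}  approx={phi * n * mu * (1 - mu):6.3f}")
```

In the symmetric case μ = 0.5 the direction is guaranteed: the extra factor depends on y only through |y − n/2|, increasing for φ > 1 and decreasing for φ < 1.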

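Integration into GLM Context: member of the 2-parameter exponential family; based on the log-likelihood function of the distribution, the mean parameter (between 0 and 1) is modelled with the logit link and the second parameter (> 0) with the log-linear link.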
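The corresponding log-likelihood for the double binomial, as a sketch: it assumes one hypothetical covariate `x` in both linear predictors, hypothetical coefficient names `beta` and `gamma`, and the dispersion reading $\phi = 1/\theta$ of the second parameter. The normalising constant is computed exactly by summation, so log-likelihoods stay comparable across models.

```python
import math


def xlogy(a, b):
    """a * log(b) with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def log_double_binomial(y, n, mu, phi):
    """Log-probability of y under the double binomial, phi = 1 / theta (assumed dispersion)."""
    theta = 1.0 / phi

    def log_weight(k):
        return (math.log(math.comb(n, k))
                + theta * (k * math.log(mu) + (n - k) * math.log(1.0 - mu))
                + (1.0 - theta) * (xlogy(k, k / n) + xlogy(n - k, (n - k) / n)))

    logs = [log_weight(k) for k in range(n + 1)]
    m = max(logs)
    log_norm = m + math.log(sum(math.exp(v - m) for v in logs))  # log-sum-exp
    return log_weight(y) - log_norm


def glm_loglik(beta, gamma, x, y, n):
    """Log-likelihood of a hypothetical GLM with one covariate x:
    logit(mu_i) = beta[0] + beta[1] * x_i   (logit link, 0 < mu_i < 1)
    log(phi_i)  = gamma[0] + gamma[1] * x_i (log-linear link, phi_i > 0)
    """
    total = 0.0
    for xi, yi, ni in zip(x, y, n):
        mu = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * xi)))
        phi = math.exp(gamma[0] + gamma[1] * xi)
        total += log_double_binomial(yi, ni, mu, phi)
    return total


# Toy values: with gamma = (0, 0) every phi_i = 1 and this is the binomial log-likelihood.
print(glm_loglik((1.0, -0.8), (0.0, 0.0), [0.0, 1.0, 2.0], [30, 20, 10], [36, 36, 36]))
```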

AN APPLICATION The Printability Dataset

Response and Explanatory Variables: occurrence of unprinted areas, explained by topography + formation

Comparison of Three Models Distributionclassic binomial multiplicative binomial double binomial 17071845211632 DoF24832482 661458364117 DoF24812480 AIC662058454125
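Assuming the comparison uses the standard definition AIC = −2 × maximised log-likelihood + 2 × number of parameters (smaller is better), the criterion penalises the extra parameters of the two generalized models. A minimal sketch with hypothetical helper names; the log-likelihoods being compared must include all normalising constants and be computed on the same data.

```python
def aic(loglik, n_params):
    """Akaike information criterion: -2 * maximised log-likelihood + 2 * no. of parameters."""
    return -2.0 * loglik + 2.0 * n_params


def residual_dof(n_obs, n_params):
    """Residual degrees of freedom: number of observations minus estimated parameters."""
    return n_obs - n_params


# Hypothetical numbers, not the slide's results: a model with one extra parameter
# is preferred only if it raises the maximised log-likelihood by more than 1.
print(aic(-3000.0, 3), aic(-2999.5, 4))
```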

Comparison of the Means The second parameter influences the mean, too.
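For the multiplicative binomial this is easy to see numerically: with the probability parameter held fixed, E[Y] equals np only when ω = 1. A sketch with hypothetical names; the parameter values are illustrative, not the fitted ones.

```python
import math


def mult_binomial_pmf(n, p, omega):
    """Multiplicative binomial pmf (Altham, 1978) over y = 0..n, normalised numerically."""
    logw = [math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)
            + y * (n - y) * math.log(omega) for y in range(n + 1)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [wi / total for wi in w]


def mean(pmf):
    """Expected value of a distribution on 0..len(pmf)-1."""
    return sum(y * q for y, q in enumerate(pmf))


# Holding p = 0.8 and n = 36 fixed, the mean leaves n * p = 28.8 as soon as
# omega differs from 1; here omega > 1 pulls it towards n / 2 = 18.
for omega in (0.98, 1.0, 1.02, 1.05):
    print(f"omega={omega:4.2f}  E[Y]={mean(mult_binomial_pmf(36, 0.8, omega)):6.3f}")
```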

Comparison of the Standard Deviations

Comparison of the Variances: the binomial standard deviation at n = 36 cannot be larger than 3, whereas the empirical standard deviations reach up to 11. The standard deviations of the multiplicative and double binomial models fit the empirical results much better.
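The bound of 3 follows from the binomial variance, which is largest at a success probability of 1/2 (π here simply denotes the binomial success probability):

```latex
\operatorname{Var}(Y) = n\pi(1-\pi) \le \frac{n}{4}
\quad\Longrightarrow\quad
\operatorname{SD}(Y) \le \frac{\sqrt{n}}{2} = \frac{\sqrt{36}}{2} = 3 .
```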

Summary: two generalizations of the binomial distribution may compensate for the over- or underdispersion seen with the classic binomial distribution. Multiplicative Binomial Distribution (Altham, 1978): second parameter ω; in the GLM context, the probability parameter is modelled with the logistic link and ω with the log-linear link function.

Summary (continued): Double Binomial Distribution (Efron, 1986): second parameter; in the GLM context, the mean parameter is modelled with the logistic link and the second parameter with the log-linear link function.

Thank You for Your Attention

