
1 R Programming / Binomial Models Shinichiro Suna

2 Binomial Models In a binomial model we have one binary outcome and a set of explanatory variables.

3 Contents
1 Logit model
1.1 Fake data simulations
1.2 Maximum likelihood estimation
1.3 Bayesian estimation
2 Probit model
2.1 Fake data simulations
2.2 Maximum likelihood estimation
2.3 Bayesian estimation

4 1. Logit model (Logistic Regression Analysis) The logit model (logistic regression analysis) uses the logistic function. With several explanatory variables, the probability that the outcome equals 1 is F(x) = 1 / (1 + exp(-(B0 + B1*X1 + B2*X2 + ...))).
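
A quick way to see this function at work (a sketch, not from the original slides): R's built-in plogis() implements exactly this logistic function.
curve(plogis(x), from = -6, to = 6,
      xlab = "linear predictor", ylab = "P(y = 1)")  # S-shaped logistic curve
plogis(0)                                # 0.5: a linear predictor of 0 gives even odds
all.equal(plogis(2), 1 / (1 + exp(-2)))  # TRUE: matches the formula above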

5 1.1. Fake data simulations
x <- 1 + rnorm(1000, 1)                       # explanatory variable
xbeta <- -1 + x * 1                           # linear predictor
proba <- exp(xbeta) / (1 + exp(xbeta))        # logistic transformation
y <- ifelse(runif(1000, 0, 1) < proba, 1, 0)  # binary outcome
table(y)
df <- data.frame(y, x)
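
An equivalent, slightly more idiomatic simulation (a sketch; the original slide uses the runif() comparison above) draws the outcome directly with rbinom() and uses plogis() for the logistic transformation:
x <- 1 + rnorm(1000, 1)
proba <- plogis(-1 + x)                    # same as exp(xbeta) / (1 + exp(xbeta))
y <- rbinom(1000, size = 1, prob = proba)  # Bernoulli draws
table(y)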

6 1.2. Maximum likelihood estimation The standard way to estimate a logit model is the glm() function (Fitting Generalized Linear Models) with family = binomial(link = logit).

7 1.2. Maximum likelihood estimation
# Fitting Generalized Linear Models
res <- glm(y ~ x, family = binomial(link = logit), data = df)
names(res)
summary(res)           # results
confint(res)           # confidence intervals (profile likelihood)
exp(res$coefficients)  # odds ratios
exp(confint(res))      # confidence intervals for the odds ratios
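
Once the model is fitted, predicted probabilities come from predict() with type = "response"; a short sketch, assuming the df data frame built in the simulation step:
p.hat <- predict(res, type = "response")         # fitted P(y = 1) per observation
table(observed = df$y, predicted = p.hat > 0.5)  # confusion table at a 0.5 cutoff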

8 1.2. Maximum likelihood estimation
> summary(res)  # results

Call:
glm(formula = y ~ x, family = binomial(link = logit))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1406  -1.0044   0.5417   0.8104   1.7770

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.3561     0.5691  -2.383 0.017179 *
x             1.1287     0.2938   3.841 0.000122 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 125.37  on 99  degrees of freedom
Residual deviance: 106.71  on 98  degrees of freedom
AIC: 110.71

Number of Fisher Scoring iterations: 4
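
Reading the output above: the slope 1.1287 is on the log-odds scale; a sketch of two common interpretations using the fitted object res:
exp(coef(res)["x"])                # about 3.09: each unit of x multiplies the odds of y = 1 by ~3
plogis(coef(res)[1] + coef(res)[2] * 2)  # about 0.71: predicted P(y = 1) at x = 2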

9 1.3. Bayesian estimation
# Data generating process
x <- 1 + rnorm(1000, 1)
xbeta <- -1 + x * 1
proba <- exp(xbeta) / (1 + exp(xbeta))
y <- ifelse(runif(1000, 0, 1) < proba, 1, 0)
table(y)
# Markov chain Monte Carlo for logistic regression
library(MCMCpack)
res <- MCMClogit(y ~ x)
summary(res)
plot(res)  # trace and density plots
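
MCMClogit() returns a coda mcmc object, so the standard convergence diagnostics apply directly; a hedged sketch using functions from the coda package (loaded with MCMCpack):
library(coda)
effectiveSize(res)  # effective sample size per parameter
geweke.diag(res)    # z-scores comparing early vs. late parts of the chain
HPDinterval(res)    # 95% highest posterior density intervals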

10 1.3. Bayesian estimation
> summary(res)

Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean     SD Naive SE Time-series SE
(Intercept) -2.104 0.7199 0.007199        0.02239
x            1.491 0.3652 0.003652        0.01139

2. Quantiles for each variable:

               2.5%    25%    50%    75%   97.5%
(Intercept) -3.6302 -2.570 -2.065 -1.618 -0.7416
x            0.8233  1.236  1.472  1.726  2.2805

11 [Figure: trace and density plots produced by plot(res).]

12 2. Probit model The probit model is a type of regression where the dependent variable can take only two values. The name comes from probability + unit.

13 2. Probit model The probit model uses the cumulative distribution function (CDF) of the standard normal distribution: P(y = 1) = Φ(B0 + B1*X1 + B2*X2 + ...).
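
The two links are easy to compare visually; a sketch (not from the original slides) overlaying the logistic CDF (logit) and the standard normal CDF pnorm() (probit):
curve(plogis(x), from = -4, to = 4, xlab = "linear predictor", ylab = "P(y = 1)")
curve(pnorm(x), add = TRUE, lty = 2)  # probit link: a very similar S-shape
legend("topleft", legend = c("logit", "probit"), lty = 1:2)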

14 2.1. Probit model: fake data simulation
# Generating fake data
x1 <- 1 + rnorm(1000)
x2 <- -1 + x1 + rnorm(1000)
xbeta <- -1 + x1 + x2                         # linear predictor
proba <- pnorm(xbeta)                         # standard normal CDF
y <- ifelse(runif(1000, 0, 1) < proba, 1, 0)  # binary outcome
mydat <- data.frame(y, x1, x2)
table(y)
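
A quick sanity check on the simulated data (not on the original slide): the observed share of ones should be close to the average of the true probabilities:
mean(y)      # empirical proportion of y = 1
mean(proba)  # average true P(y = 1); the two should roughly agree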

15 2.2. Maximum likelihood estimation
# Fitting Generalized Linear Models
res <- glm(y ~ x1 + x2, family = binomial(link = probit), data = mydat)
names(res)
summary(res)
confint(res)  # confidence intervals (profile likelihood)
# Note: exp(coefficients) gives odds ratios only under the logit link;
# exponentiated probit coefficients have no odds-ratio interpretation.
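
Because probit coefficients lack an odds-ratio reading, a common summary instead is the average marginal effect, dnorm(xb) * Bj averaged over the sample; a hedged sketch using the fitted object res:
xb <- predict(res, type = "link")  # fitted linear predictor
mean(dnorm(xb)) * coef(res)[-1]    # average marginal effects of x1 and x2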

16 2.2. Maximum likelihood estimation
> summary(res)

Call:
glm(formula = y ~ x1 + x2, family = binomial(link = probit), data = mydat)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.06740  -0.17208  -0.00053   0.10700   1.96541

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.2324     0.4029  -3.059  0.00222 **
x1            1.1163     0.3495   3.194  0.00140 **
x2            1.5917     0.3751   4.244  2.2e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.372  on 99  degrees of freedom
Residual deviance:  38.705  on 97  degrees of freedom
AIC: 44.705

Number of Fisher Scoring iterations: 9

17 2.2. Maximum likelihood estimation
# Probit estimation with the sampleSelection package
library("sampleSelection")
res <- probit(y ~ x1 + x2, data = mydat)
summary(res)

18 2.2. Maximum likelihood estimation
> summary(res)
--------------------------------------------
Probit binary choice model/Maximum Likelihood estimation
Newton-Raphson maximisation, 7 iterations
Return code 1: gradient close to zero
Log-Likelihood: -19.35239
Model: Y == '1' in contrary to '0'
100 observations (59 'negative' and 41 'positive') and 3 free parameters (df = 97)
Estimates:
             Estimate Std. error t value   Pr(> t)
(Intercept) -1.23237    0.41293 -2.9845  0.002841 **
x1           1.11631    0.36221  3.0819  0.002057 **
x2           1.59170    0.38771  4.1054 4.037e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Significance test: chi2(2) = 96.66693 (p=1.021042e-21)
--------------------------------------------
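
As a cross-check (a sketch, not on the original slides): probit() from sampleSelection maximises the same likelihood as glm() with a probit link, so the coefficients should agree to several decimals, as the two outputs above confirm.
coef(res)  # from sampleSelection::probit
coef(glm(y ~ x1 + x2, family = binomial(link = probit), data = mydat))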

19 2.3. Bayesian estimation
# Markov chain Monte Carlo for probit regression
library("MCMCpack")
post <- MCMCprobit(y ~ x1 + x2, data = mydat)
summary(post)
plot(post)  # trace and density plots
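
MCMCprobit() accepts the usual MCMCpack control arguments; a hedged sketch of setting the burn-in, chain length, and a vague normal prior (b0 is the prior mean, B0 the prior precision):
post <- MCMCprobit(y ~ x1 + x2, data = mydat,
                   burnin = 1000, mcmc = 10000, thin = 1,
                   b0 = 0, B0 = 0.01, seed = 123)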

20 2.3. Bayesian estimation
> summary(post)

Iterations = 1001:11000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean     SD Naive SE Time-series SE
(Intercept) -1.387 0.4304 0.004304        0.03018
x1           1.244 0.3825 0.003825        0.02649
x2           1.771 0.4310 0.004310        0.04322

2. Quantiles for each variable:

               2.5%     25%    50%    75%   97.5%
(Intercept) -2.3091 -1.6606 -1.359 -1.092 -0.5912
x1           0.5472  0.9813  1.219  1.491  2.0493
x2           1.0402  1.4645  1.728  2.027  2.7537
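
Point estimates and interval summaries can also be pulled straight from the chain; a short sketch using coda:
colMeans(as.matrix(post))       # posterior means
HPDinterval(post, prob = 0.95)  # 95% highest posterior density intervals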

