1 Quantifying Opinion about a Logistic Regression using Interactive Graphics Paul Garthwaite The Open University Joint work with Shafeeqah Al-Awadhi.

Presentation transcript:

1 Quantifying Opinion about a Logistic Regression using Interactive Graphics Paul Garthwaite The Open University Joint work with Shafeeqah Al-Awadhi

2 Introduction/Plan This work arose from a practical problem in logistic regression. The theory extends easily to elicit opinion about the link function of any glm, so I will outline the method for glms in general. The motivating problem has some additional (commonly occurring) structure that the elicitation method exploits. Interactive computing is used to elicit opinion. Prior models can be formed that aim to allow a small amount of data to correct some potential systematic biases in assessments. Results for the practical problem will be given.

3 Motivating Example The task is to model the habitat distribution of fauna in south-east Queensland - bats, birds, mammals etc. Available information: Environmental attributes on a GIS database. Sample information of presence/absence at sites. Background knowledge of ecologists. The ecologists have seen the bat (say) in various locations but this information is difficult to use in a traditional statistical analysis because it has not been obtained from any sampling scheme. Prob(presence) = f (environmental attributes)

4 Continuous variables: elevation; quarterly rainfall and temperatures; canopy cover; slope; aspect. Factors: land type; vegetation; forest structure; logging; grazing; etc. A workshop with 15 ecologists indicated (a) unimodal or monotonic relationships and (b) independence between attributes in their effect on the probability of presence.

5 Generalised Linear Model (glm) The model relates the mean response to a linear predictor through the link function g[.]. For logistic regression, g[.] is the logit function and the mean response is the probability of presence. From each predictor variable, a vector of explanatory variables is constructed such that the linear predictor is a linear function of the explanatory variables.
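For reference, the logit link of a logistic regression can be written out as below. The symbols (\mu for the probability of presence, Y for the linear predictor, \alpha for the constant, and \beta_i, x_i for the coefficient and explanatory-variable vectors of the ith predictor) are assumptions chosen for illustration, not necessarily the notation of the original slide.

```latex
g(\mu) = \log\frac{\mu}{1-\mu} = Y,
\qquad
Y = \alpha + \sum_{i} \boldsymbol{\beta}_i^{\top}\mathbf{x}_i,
\qquad
\mu = \frac{e^{Y}}{1+e^{Y}}
```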

6 Define: and then is a linear function of

7 Factors: One factor level (the best one, say) is chosen as the reference level. Each other level is given a dummy 0/1 variable that equals 1 for that level and 0 for all other levels.
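The dummy coding described for factors can be sketched as follows. This is a minimal illustration only: the factor levels, the choice of reference level, and the coefficient values are hypothetical, and the function names are not taken from the authors' software.

```python
import math


def dummy_code(level, levels, reference):
    """Return one 0/1 indicator per non-reference level of a factor."""
    return [1 if level == other else 0 for other in levels if other != reference]


def presence_probability(intercept, coefficients, indicators):
    """Inverse logit of the linear predictor built from the dummy variables."""
    y = intercept + sum(b * d for b, d in zip(coefficients, indicators))
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical vegetation factor; "rainforest" plays the role of the reference level.
levels = ["rainforest", "eucalypt", "grassland"]
reference = "rainforest"

print(dummy_code("eucalypt", levels, reference))    # [1, 0]
print(dummy_code("rainforest", levels, reference))  # [0, 0]
```

At the reference level every dummy variable is 0, so the linear predictor reduces to the intercept alone.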

8 The sampling model is Let For the prior distribution we put The values of the parameters in red must be chosen by the expert to represent his or her opinions.

9 Assessing medians and quartiles. These are fundamental assessment tasks the expert performs. Example: How far is it from Aberdeen to Southampton? [Figure: a distance scale split into four 25% intervals at 470, 525 and 600 miles.] The median (blue) is assessed first and then the lower and upper quartiles (red). Ecologists were given practice at performing these tasks in preparatory training and explanation.

10 Eliciting and and. Also, at the reference point. The expert assesses, the median of at this point. (For logistic regression is the probability of presence.) We put. The expert also assesses the lower and upper quartiles and. We put
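One common way to turn an assessed median and quartiles into prior parameters is sketched below. It assumes (this is an assumption, not a statement of the slide's formulas) that the linear predictor at the reference point is given a normal prior, so that the mean is the link-transformed median and the standard deviation comes from the link-transformed interquartile range. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Upper-quartile point of the standard normal distribution (about 0.6745).
Z_UPPER_QUARTILE = NormalDist().inv_cdf(0.75)


def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def normal_from_quartiles(lower, median, upper):
    """Mean and sd of a normal distribution on the logit scale.

    Quantiles are preserved by the monotone logit transform, so the assessed
    quartiles of the probability map directly to quartiles on the logit scale.
    """
    mean = logit(median)
    sd = (logit(upper) - logit(lower)) / (2.0 * Z_UPPER_QUARTILE)
    return mean, sd


# Illustrative assessments: median 0.5, quartiles 0.3 and 0.7.
mean, sd = normal_from_quartiles(0.3, 0.5, 0.7)
print(mean, sd)
```

If the assessed quartiles are not symmetric about the median on the logit scale, this sketch simply averages the two half-ranges; other reconciliations are possible.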

11 Eliciting and is determined from the unconditional assessments. is determined from assessments conditional on. equalling.

12 Eliciting and for factors. Put. Then enabling to be estimated. [Go to program]

13 Assessments to obtain Conditional on the first three line segments being correct, the dashed lines are quartiles of where the line might continue.

14 Conditional Assessments for Factors The circles indicate conditions. Dotted horizontal bars are previous assessments. Solid bars are current assessments and must be within the dotted bars if is positive-definite. [Go to program]

15 Calculating Iterative calculations determine. Start by estimating the lower-right scalar element of, and call it. Then estimate the lower-right of and call it, etc. If and is positive-definite, then so is provided.
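The iterative build-up of a positive-definite matrix from its lower-right corner can be illustrated with the standard Schur-complement condition: if A is positive-definite, the bordered matrix [[c, b'], [b, A]] is positive-definite exactly when c - b'A^{-1}b > 0. The sketch below is an assumption about how such a check might be coded, not the authors' program; all names are hypothetical.

```python
def cholesky(a):
    """Lower-triangular L with L L^T = a; raises ValueError if a is not positive-definite."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                lower[i][j] = s ** 0.5
            else:
                lower[i][j] = s / lower[j][j]
    return lower


def schur_complement(c, b, a):
    """Return c - b^T a^{-1} b for scalar c, vector b and positive-definite matrix a."""
    lower = cholesky(a)
    # Forward substitution solves L z = b, and then b^T a^{-1} b = z^T z.
    z = []
    for i, row in enumerate(lower):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return c - sum(v * v for v in z)


def extend(c, b, a):
    """Border a with a new first row and column, keeping positive-definiteness."""
    if schur_complement(c, b, a) <= 0.0:
        raise ValueError("extension would not be positive-definite")
    return [[c] + list(b)] + [[b[i]] + list(a[i]) for i in range(len(a))]


# Start from the lower-right scalar element and grow the matrix one step.
m = extend(2.0, [1.0], [[1.0]])
print(m)  # [[2.0, 1.0], [1.0, 1.0]]
```

In an interactive setting, the same condition could be used to bound the assessments the expert is allowed to make at each step.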

16 Alternative Prior Models Individuals can show systematic bias in their subjective assessments. The aim is to form prior models that allow a small amount of data to largely correct some potential biases. Prior 2 The marginal distribution of is diffuse, rather than. The conditional distribution of is assumed to be unchanged: This allows for error in specifying the origin of the Y-axis.

17 Prior 3 Prior 3 replaces the scale for Y with some other linear scale. is again given a diffuse distribution and the conditional distribution of is taken to be is also given a diffuse distribution. Prior 4 This is the same as Prior 3, except it allows for systematic bias in quartile assessments by putting are given diffuse distributions.

18 Cross-validation and scoring The usefulness of a prior distribution can be objectively examined by using cross-validation and a scoring rule. For the cross-validation the data for a species were divided into four sets. Each set in turn was omitted and the remaining sets used to form prediction equations. Prediction equations were applied to the omitted set and squared error loss determined: the squared difference between the probability of presence given by the prediction equation and a 0/1 dummy variable indicating absence/presence, summed over all sites in the omitted (validation) set. This defines a proper scoring rule.
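The four-fold procedure and squared-error score can be sketched as follows. The way sites are assigned to sets, the summing of the score across the four omitted sets, and the placeholder "prediction equation" (the training proportion of presences) are all assumptions made for illustration; in the actual study the prediction equations would come from the fitted logistic regression.

```python
def squared_error_score(probabilities, outcomes):
    """Sum of squared differences between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes))


def cross_validated_score(sites, fit, n_folds=4):
    """Omit each fold in turn, fit on the rest, and score predictions on the omitted fold.

    sites: list of (covariates, y) pairs with y equal to 0 (absence) or 1 (presence).
    fit:   function taking the training sites and returning predict(covariates) -> probability.
    """
    total = 0.0
    for k in range(n_folds):
        validation = sites[k::n_folds]
        training = [s for i, s in enumerate(sites) if i % n_folds != k]
        predict = fit(training)
        probabilities = [predict(x) for x, _ in validation]
        total += squared_error_score(probabilities, [y for _, y in validation])
    return total


def fit_base_rate(training):
    """Placeholder model: predicts the training proportion of presences everywhere."""
    rate = sum(y for _, y in training) / len(training)
    return lambda covariates: rate


# Eight hypothetical sites with alternating presence/absence and no covariates.
sites = [(None, 1), (None, 0)] * 4
score = cross_validated_score(sites, fit_base_rate)
print(score)
```

Lower scores are better, so competing priors could be compared by running the same folds with each prior and comparing the totals.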

19 Results for little bent-wing bat

22 Concluding Comments The elicitation method described here is able to handle large problems by: (a) using interactive graphics; (b) suggesting values to the expert that might represent his or her opinions. It is believed that the use of graphs can improve the quality of the assessed distributions. Cross-validation can demonstrate clearly the gain from using prior knowledge, when there is such gain. Additional parameters in the prior model can allow limited data to be used more effectively.