1 Elicitation of Expert Opinion as a Prior Distribution Paul Garthwaite Open University, UK
2 Why use expert opinion? Can't opinion convey bias as well as knowledge? The expert may want to quantify his/her opinions for his/her own use, e.g. an industrial chemist wanting to design experiments. Available data may not be suitable as a statistical input, e.g. sightings of a rare and endangered species. Data may be scarce or absent, so that we must extrapolate from related information, e.g. the risk of unlikely events.
3 In what form do we want expert opinion? A probability distribution is often the most useful, e.g. Normal(μ, Σ) or Beta(α, β). To estimate the parameters, the kinds of quantity that must be determined from the expert's opinions are: point estimates (means, medians and/or modes); quantiles or variances that convey the expert's confidence in her point estimates; covariances that quantify relationships between the point estimates; degrees of freedom parameters; and others, e.g. point estimates of correlations.
4 How do we quantify our opinion? There are a few strategies, or heuristics, that we typically employ to quantify our uncertainty. The heuristics work fairly well but have some well-documented problems.
5 Judgement by representativeness Mr X is meticulous, introverted, meek and solemn. What is the probability that his occupation is: Farmer, salesman, pilot, librarian, doctor? People ignore base rates. (Kahneman and Tversky, 1972). Linda is thirty-one years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Is it more probable that: (a) Linda is a bank teller; or (b) Linda is a bank teller and is active in the feminist movement?
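A hedged worked example of what ignoring base rates means: by Bayes' theorem, the probability of an occupation given the description is proportional to the base rate of that occupation times how well the description fits it. The two occupations, base rates and likelihoods below are hypothetical numbers chosen only to show the arithmetic, not data from Kahneman and Tversky, and the function name is illustrative.

```python
def posterior(base_rates, likelihoods):
    """Bayes' theorem over mutually exclusive occupations (hypothetical inputs).

    base_rates:  prior probability of each occupation (its base rate).
    likelihoods: P(description | occupation).
    Returns P(occupation | description), normalised over the occupations given.
    """
    joint = {k: base_rates[k] * likelihoods[k] for k in base_rates}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}


# Hypothetical numbers: librarians are rare but fit the description well;
# farmers are ten times as common but fit it less well.
base_rates = {"librarian": 0.01, "farmer": 0.10}
likelihoods = {"librarian": 0.40, "farmer": 0.10}
post = posterior(base_rates, likelihoods)
print({k: round(v, 2) for k, v in post.items()})
```

With these numbers the description fits a librarian four times better than a farmer, yet "farmer" comes out at about 0.71 because farmers are assumed to be ten times as common. Judging by representativeness alone would reverse that ordering.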
6 Anchoring and adjustment. People start from one value (the anchor) and adjust it to form their estimate. Adjustment is usually insufficient. Epley and Gilovich (2004, 2006) compared anchors, estimates and actual values for: the year Washington was elected president; the boiling point of water on Mount Everest (°F); the freezing point of vodka (°F); the lowest body temperature (°F); the highest body temperature (°F); and the gestation period of an elephant (months: anchor 9, estimate 7.4, actual 22).
7 Another example (Jacowitz & Kahneman, 1995). Is the population of Chicago more or less than 200,000? Estimate the population. Is the population of Chicago more or less than 5 million? Estimate the population. People asked the first question gave Chicago a smaller estimated population.
8 Almost any information seems to distort judgements. GPs were asked to draw a graph indicating the disease-free survival time of patients undergoing treatment for prostate cancer. They gave their assessments on a graph whose vertical scale extended to 40 years; 40 years was thought to be so obviously too big as to be non-informative (median: 14.9 years). They were also asked to assess the probabilities of surviving more than 3 years, 5 years and 10 years, which suggested that 10 years is quite a long time (median: 6.2 years).
9 Conservatism: when people revise their opinions after getting data, they do so insufficiently (Edwards and Phillips, 1964). Bookbags and poker-chips experiment: Bag A contains 70% blue and 30% red chips; Bag B contains 70% red and 30% blue chips. A bag is chosen by spinning a coin. A sample of 15 chips contains 10 blue and 5 red. What is the probability that we chose Bag A?
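The normative answer can be sketched with Bayes' theorem, assuming the 15 chips are drawn independently (for example, with replacement) and the coin gives each bag prior probability 0.5. The binomial coefficient is the same for both bags and cancels, so only the excess of blue over red chips matters: the posterior odds for Bag A are (0.7/0.3)^(10 - 5), about 69. The function name is illustrative.

```python
def prob_bag_a(n_blue, n_red, p_blue_a=0.7, p_blue_b=0.3, prior_a=0.5):
    """Posterior probability that Bag A was chosen, given the chip counts.

    Assumes independent draws (binomial likelihood); the binomial
    coefficient is common to both bags and cancels.
    """
    like_a = p_blue_a ** n_blue * (1 - p_blue_a) ** n_red
    like_b = p_blue_b ** n_blue * (1 - p_blue_b) ** n_red
    return prior_a * like_a / (prior_a * like_a + (1 - prior_a) * like_b)


print(round(prob_bag_a(10, 5), 3))  # 0.986
```

Bayes' theorem gives roughly 0.986; conservatism is the tendency to report a probability well below this.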
10 We have difficulty quantifying our opinion when small probabilities are involved. Alpert and Raiffa (1969): How many cars were imported into the U.S. in 1969? Make a high estimate such that you feel there is only a 1% probability the true answer would exceed it. Make a low estimate such that you feel there is only a 1% probability the true answer would be below it. 43% of assessments produced surprises (true values outside the range from the low to the high estimate), whereas a well-calibrated assessor would be surprised only 2% of the time. Training reduced the figure to 23%.
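Checking calibration is simple arithmetic: with 1% in each tail, a well-calibrated assessor's low-to-high range should miss the true value about 2% of the time. The sketch below counts surprises over a set of (low, high, true value) triples; the function name and the four assessments are hypothetical, for illustration only.

```python
def surprise_rate(assessments):
    """Fraction of (low, high, true_value) triples in which the true value
    lies outside the assessed [low, high] range."""
    surprises = sum(1 for low, high, true in assessments if not low <= true <= high)
    return surprises / len(assessments)


NOMINAL_RATE = 0.01 + 0.01  # 1% in each tail for a well-calibrated assessor

# Hypothetical (low, high, true value) assessments, for illustration only.
assessments = [(100, 500, 620), (50, 300, 120), (200, 900, 150), (10, 80, 40)]
print(f"observed {surprise_rate(assessments):.0%} vs nominal {NOMINAL_RATE:.0%}")
```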
11 Elicitation methods for a proportion Winkler (1967) gave four methods. Example: Let p be the proportion of students at the University of Chicago who wear glasses. It is common to assume the prior distribution for p is a beta distribution, say beta(α, β). Quantifying opinion about p reduces to finding values for α and β.
12 Hypothetical future sample. Elicit a point estimate of the proportion; let $p_1$ denote the assessment. Give a hypothetical sample: suppose a random sample of 50 students were taken and 20 of them wear glasses. Elicit an updated estimate of the proportion, say $p_2$. Estimate the parameters $\alpha$ and $\beta$ from $p_1$ and $p_2$. This could be done several times with different hypothetical data, forming some average of the different estimates of $\alpha$ and $\beta$.
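A minimal sketch of the estimation step, assuming $p_1$ and $p_2$ are read as the means of the prior beta(α, β) and of the posterior after a hypothetical sample of r = 20 out of m = 50. Then $p_1 = \alpha/(\alpha+\beta)$ and $p_2 = (\alpha + r)/(\alpha + \beta + m)$; writing $n_0 = \alpha + \beta$, the second equation gives $n_0 = (r - m p_2)/(p_2 - p_1)$, so $\alpha = p_1 n_0$ and $\beta = (1 - p_1) n_0$. Winkler's own formulation may differ (for example, by using medians or modes), the function name is hypothetical, and the second hypothetical sample and the simple arithmetic mean used for averaging are illustrative choices.

```python
def beta_from_hypothetical_sample(p1, p2, r, m):
    """Beta(alpha, beta) implied by a prior estimate p1 and an updated estimate p2
    after a hypothetical sample of r successes in m trials.

    Assumes both estimates are beta means:
        p1 = alpha / (alpha + beta)
        p2 = (alpha + r) / (alpha + beta + m)
    """
    if not min(p1, r / m) < p2 < max(p1, r / m):
        raise ValueError("p2 must lie strictly between p1 and r/m")
    n0 = (r - m * p2) / (p2 - p1)  # alpha + beta: the prior's weight in 'observations'
    return p1 * n0, (1 - p1) * n0


# Hypothetical assessments: p1 = 0.30 before, p2 = 0.35 after "20 of 50 wear glasses".
alpha, beta = beta_from_hypothetical_sample(0.30, 0.35, r=20, m=50)

# Over-fitting: repeat with other hypothetical samples and average the estimates.
fits = [beta_from_hypothetical_sample(0.30, p2, r, m)
        for p2, r, m in [(0.35, 20, 50), (0.25, 2, 20)]]
alpha_avg = sum(a for a, _ in fits) / len(fits)
beta_avg = sum(b for _, b in fits) / len(fits)
```

With $p_1 = 0.30$ and $p_2 = 0.35$ the fit is beta(15, 35): an updated estimate halfway between 0.30 and 20/50 = 0.40 means the hypothetical sample of 50 carried as much weight as the prior, so α + β = 50.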
13 Equivalent prior sample. Can you determine two numbers r and n such that your knowledge would be roughly equivalent to having observed exactly r students who wear glasses in a random sample of n University of Chicago students, assuming you had very little knowledge about this before seeing the sample? People typically give a value for n that is too large.
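One common way to turn an equivalent prior sample into a beta prior, assumed here, is α = r and β = n − r (some authors add 1 to each). The sketch shows why an overstated n matters: the prior standard deviation, $\sqrt{p(1-p)/(n+1)}$ with $p = r/n$, shrinks as n grows, so a value of n that is too large makes the expert appear more certain than she is. The function names and numbers are illustrative.

```python
import math


def beta_from_equivalent_sample(r, n):
    """Beta parameters for an equivalent prior sample of r successes in n,
    using the convention alpha = r, beta = n - r (an assumption)."""
    if not 0 < r < n:
        raise ValueError("need 0 < r < n")
    return r, n - r


def beta_sd(alpha, beta):
    """Standard deviation of a beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))


# Same point estimate r/n = 0.4, different equivalent sample sizes n.
for r, n in [(4, 10), (40, 100)]:
    a, b = beta_from_equivalent_sample(r, n)
    print(n, round(beta_sd(a, b), 3))
```

For the same point estimate of 0.4, the prior standard deviation is about 0.15 with n = 10 and about 0.05 with n = 100.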
14 Variable interval method. This asks the expert to specify her median estimate of p and one or more other quantiles. Which quantiles? The lower and upper quartiles are the most common; they can be assessed by the method of bisection. (Diagram: the lower quartile L, the median M and the upper quartile U divide the range into four intervals, each with probability 25%.)
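A minimal sketch, assuming the beta prior for p, of turning elicited quartiles into α and β: choose the beta distribution whose CDF at the elicited values L, M and U is closest, in least squares, to 0.25, 0.5 and 0.75. The coarse grid search and midpoint-rule CDF are simplifications that keep the example dependency-free; they are not necessarily what the cited studies did, and a real analysis would more likely use an exact beta CDF and a numerical optimiser (for example from SciPy). The function names and elicited values are hypothetical.

```python
import math


def beta_cdf(x, a, b, steps=200):
    """Approximate the beta(a, b) CDF at x with the midpoint rule.

    Adequate for a, b >= 1, where the density is bounded."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th sub-interval of [0, x]
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return total * h


def fit_beta_to_quantiles(quantiles, grid=None):
    """Least-squares fit of beta(a, b) to elicited quantiles by grid search.

    quantiles: dict mapping probability -> elicited value,
               e.g. {0.25: L, 0.5: M, 0.75: U}.
    Minimises sum_k (F(q_k; a, b) - p_k)^2 over the grid.
    """
    if grid is None:
        grid = [1 + 0.5 * i for i in range(39)]  # 1.0, 1.5, ..., 20.0
    best = None
    for a in grid:
        for b in grid:
            loss = sum((beta_cdf(q, a, b) - p) ** 2 for p, q in quantiles.items())
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]


# Hypothetical elicited quartiles for the proportion who wear glasses.
elicited = {0.25: 0.30, 0.50: 0.40, 0.75: 0.50}
alpha, beta = fit_beta_to_quantiles(elicited)
print(alpha, beta)
```

Fitting in probability space avoids inverting the beta CDF; with three assessments and two parameters the fit is over-determined, so the residual loss also gives a rough measure of how consistent the quartiles are with any beta distribution.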
15 Tertile assessments. Barclay and Peterson (1973) suggested assessing tertiles rather than quartiles. (They suggested this should be done before a median was assessed, so that the median would not act as an anchor.) (Diagram: the two tertiles divide the range into three intervals, each with probability 33%.) Garthwaite and O'Hagan (2000) found tertiles were better than quartiles, giving larger variances.
16 Fixed interval method. With this method the endpoints of the intervals are fixed and the expert specifies the probability that should be attached to each interval. (Diagram: the range from 0% to 100% is divided at 20%, 40%, 60% and 80% into five intervals, with probabilities $p_1, \dots, p_5$.) Empirical evidence is unclear as to whether the variable interval or the fixed interval method is better.
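One simple way, assumed here, to turn fixed-interval probabilities into a beta prior is to match moments: treat each elicited probability as spread uniformly over its interval, compute the mean m and variance v of the resulting histogram, and solve $m = \alpha/(\alpha+\beta)$ and $v = m(1-m)/(\alpha+\beta+1)$. Least-squares fitting to the cumulative probabilities is a common alternative. The function name and the elicited probabilities are hypothetical.

```python
def beta_from_interval_probs(edges, probs):
    """Moment-matching fit of beta(alpha, beta) to fixed-interval probabilities.

    edges: interval endpoints on [0, 1], e.g. [0, 0.2, 0.4, 0.6, 0.8, 1.0].
    probs: elicited probabilities p1..pk for the k intervals.
    Probability is treated as uniform within each interval.
    """
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("interval probabilities must sum to 1")
    mean = 0.0
    second_moment = 0.0
    for lo, hi, p in zip(edges[:-1], edges[1:], probs):
        mean += p * (lo + hi) / 2
        second_moment += p * (lo * lo + lo * hi + hi * hi) / 3  # E[X^2] for Uniform(lo, hi)
    var = second_moment - mean ** 2
    # For a beta distribution: var = mean * (1 - mean) / (alpha + beta + 1).
    total = mean * (1 - mean) / var - 1  # alpha + beta
    if total <= 0:
        raise ValueError("elicited spread is too wide for a beta distribution")
    return mean * total, (1 - mean) * total


# Hypothetical elicited probabilities for the intervals 0-20%, ..., 80-100%.
alpha, beta = beta_from_interval_probs([0, 0.2, 0.4, 0.6, 0.8, 1.0],
                                       [0.10, 0.30, 0.35, 0.20, 0.05])
print(round(alpha, 2), round(beta, 2))
```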
17 Probability density method. As well as quartiles, points on the pdf are assessed: the expert gives the point where her pdf is a maximum and the values at which the pdf is half this maximum. Experts drew their pdf using their assessments as a guide (an alternative to fitting a beta distribution).
18 Multiple Regression. An important area where expert opinion could be of great value. In the main, experts should be questioned about observable quantities rather than parameter values. The model is $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. $\sigma^2$ is given a scaled inverse chi-square distribution (a multiple of $1/\sigma^2$ has a chi-square distribution). $\beta$ or $\beta \mid \sigma$ is given a multivariate normal distribution, $\mathrm{MVN}(b, R)$ or $\mathrm{MVN}(b, R\sigma^2)$. $b$, $R$ and the parameters of the chi-square distribution are the quantities that must be elicited.
19 Kadane, Dickey, Winkler, Smith and Peters (1980) give a nice method. $n$ design points $x_1, \dots, x_n$ are specified. The expert predicts the values of $y_1, \dots, y_n$ at these points. The median assessments $y_{.50}$ yield $b = (X^\top X)^{-1} X^\top y_{.50}$. A conditional set of assessments $y^o_1, \dots, y^o_{n-1}$ is built up in stages. When the set consists of $y^o_1, \dots, y^o_i$, medians of $y_{i+j} \mid y^o_1, \dots, y^o_i$ are assessed for $j = 1, \dots, n-i$, and quartiles of $y_{i+1} \mid y^o_1, \dots, y^o_i$ are assessed. The assessments are used to estimate $R$. Unconditional quartile assessments could be used to obtain the diagonal elements of $R$.
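The median step is ordinary least squares with the assessed medians as the response. A dependency-free sketch solves the normal equations $(X^\top X) b = X^\top y_{.50}$ by Gauss-Jordan elimination; in practice a library routine such as NumPy's `numpy.linalg.lstsq` would be the natural choice. The function name, the design points (each row starting with 1 for the intercept) and the medians are hypothetical, chosen to lie exactly on a plane so the answer is easy to check.

```python
def least_squares(X, y):
    """Return b solving the normal equations (X^T X) b = X^T y.

    Uses Gauss-Jordan elimination with partial pivoting; adequate for the
    handful of coefficients in an elicitation exercise.
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical design points (leading 1 = intercept, then x1, x2) and the
# expert's median assessments of y at each point.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 2, 1], [1, 1, 3]]
y_median = [10, 12, 9, 13, 9]
b = least_squares(X, y_median)
print([round(v, 6) for v in b])
```

Here the medians were generated from 10 + 2x₁ − x₂, so `b` is [10, 2, −1] up to rounding error. With real assessments the medians will not lie exactly on a plane, and the residuals are themselves useful feedback.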
20 Parameters for the error variance. One approach is to ask about two responses at the same design point. Suppose two patients have identical characteristics: what difference would you expect in their responses? Or: if the first person had a response of 62, what response would you expect the second person to have? Hypothetical data followed by reassessment can be used to learn about the degrees of freedom.
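Under the regression model, two responses at the same design point differ only through their errors: $y_1 - y_2 = \varepsilon_1 - \varepsilon_2 \sim N(0, 2\sigma^2)$. If the expert's answer to "what difference would you expect?" is interpreted as the median absolute difference d (an assumption; it could instead be read as a mean), then $d = \sqrt{2}\,\sigma\,\Phi^{-1}(0.75)$, so $\sigma = d / (\sqrt{2}\,\Phi^{-1}(0.75))$. The function name and the value d = 5 are illustrative.

```python
import math
from statistics import NormalDist


def sigma_from_median_abs_difference(d):
    """Error standard deviation implied by a median absolute difference d
    between two responses at the same design point.

    Assumes y1 - y2 ~ N(0, 2 * sigma**2), so |y1 - y2| has median
    sqrt(2) * sigma * z75, where z75 is the upper quartile of N(0, 1).
    """
    z75 = NormalDist().inv_cdf(0.75)  # about 0.674
    return d / (math.sqrt(2) * z75)


# Hypothetical answer: "I'd expect them to differ by about 5 units."
sigma = sigma_from_median_abs_difference(5.0)
print(round(sigma, 2))
```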
21 Kadane et al. use the fact that the shape of a t-distribution depends upon its degrees of freedom. The 0.5 and 0.75 quantiles of Y at a design point, and a more extreme upper quantile $y_q$, are assessed using the method of bisection. The ratio $(y_q - y_{.50}) / (y_{.75} - y_{.50})$ depends only on the degrees of freedom when Y has a t-distribution. This is done for many design points (over-fitting) and the estimates of the degrees of freedom are combined. A drawback is that people are poor at assessing extreme quantiles.
22 Denham and Mengersen (2007) used the method to quantify opinion about the habitat distribution of an endangered species. A GIS database provided information about vegetation, rock type, rainfall, temperature, presence of roads, etc. Expert opinion was used to relate these features to the probability of presence/absence through a logistic regression. Rather than being specified as numbers, each design point was an actual location at a site in Queensland.
23 Garthwaite and Al-Awadhi (2006) also give a method that was used to quantify opinion about rare species in Queensland. The method asks about the relationship between the response and one explanatory variable at a time. A piecewise-linear relationship is assumed. This choice followed from discussions with the ecologists whose opinions would later be quantified. The method uses interactive graphics to elicit opinion.
24 This is the type of graph for assessing medians for a continuous variable.
25 This is the type of bar-chart formed for a factor.
26 Graph for eliciting conditional quartiles.
27 Various prior distributions were fitted to compensate for systematic biases in the experts' assessments. 1. $(\beta_0, \beta_1, \dots, \beta_k)$ multivariate normal. 2. $\beta_0$ diffuse, $(\beta_1, \dots, \beta_k) \sim \mathrm{MVN}(b, \Sigma)$. 3. $\theta$, $\beta_0$ diffuse, $(\beta_1, \dots, \beta_k) \sim \mathrm{MVN}(\theta b, \theta^2 \Sigma)$. 4. $\gamma$, $\theta$, $\beta_0$ diffuse, $(\beta_1, \dots, \beta_k) \sim \mathrm{MVN}(\theta b, \gamma \Sigma)$. Al-Awadhi and Garthwaite (2006) give results comparing the priors, using cross-validation and squared error loss.
28 (Figure: panels for the little bent-wing bat, common bent-wing bat, plumed frogmouth, powerful owl and greater glider; panel labels include "Prior", "Stepwise logistic regression" and "Prior: no data".)
29 Quantifying opinion about unlikely events Coles and Tawn (1996) look at quantifying opinion about extreme rainfall. 1.Can experts be expected to have meaningful information about extremal behaviour? 2.How is prior information for extremes best elicited? 3.How sensitive are extrapolations to changes in prior specification? 4.How do the results compare with a classical likelihood-based analysis?
30 Data: daily rainfall at a site in south-west England. A generalised extreme value (GEV) distribution, which has three parameters, was used for large rainfalls. The hydrologist was questioned on a scale familiar to him: the 10-, 100- and 1000-year return levels. (The 10-year return level is the daily rainfall that is exceeded, on average, only once every 10 years.) He assessed the median and 0.9 quantile of: the 10-year return level ($q_1$); the difference between the 10-year and 100-year return levels ($q_2$); and the difference between the 100-year and 1000-year return levels ($q_3$).
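To show how the three elicited quantities connect to the GEV parameters, the sketch computes return levels under the assumption that the GEV models the annual maximum daily rainfall, using the standard return-level formula $z_T = \mu - (\sigma/\xi)\,[1 - y_T^{-\xi}]$ with $y_T = -\log(1 - 1/T)$, and $z_T = \mu - \sigma \log y_T$ when $\xi = 0$. Because there are three quantities and three parameters, the assessments can in principle be mapped back to a prior for the GEV parameters. The parameter values are hypothetical, the function names are illustrative, and Coles and Tawn's exact parameterisation may differ.

```python
import math


def gev_return_level(T, mu, sigma, xi):
    """T-year return level of a GEV(mu, sigma, xi) model for annual maxima,
    i.e. the level exceeded by the annual maximum with probability 1/T."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def elicitation_quantities(mu, sigma, xi):
    """q1 = 10-year level, q2 = 100-year minus 10-year level,
    q3 = 1000-year minus 100-year level."""
    z10, z100, z1000 = (gev_return_level(T, mu, sigma, xi) for T in (10, 100, 1000))
    return z10, z100 - z10, z1000 - z100


# Hypothetical GEV parameters (rainfall in mm), for illustration only.
q1, q2, q3 = elicitation_quantities(mu=40.0, sigma=10.0, xi=0.1)
print(round(q1, 1), round(q2, 1), round(q3, 1))
```

Calling `gev_return_level` with T = 30, 300 and 3000 gives the model-implied values needed for the over-fitting check on the next slide.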
31 Over-fitting/feedback. The median and 90th percentile of the 30-, 300- and 3000-year return levels were elicited. Assessments for 30 and 300 years were similar to those predicted by the GEV distribution, but the 3000-year assessments were not. The expert was happy with all his assessments and felt that the model did not extrapolate well to 3000 years.
32 The posterior distributions were entirely consistent with the prior estimates, though substantially more precise. The posterior distribution fitted the more extreme observations better than maximum likelihood did. The prior dominated the data for return intervals longer than those observed in the data.
33 (Figure: the posterior distribution, prior distribution and likelihood plotted together.)
34 Over-fitting and Feedback. Over-fitting and feedback aim to improve the quality of elicited distributions. Over-fitting: elicit more assessments than are necessary to determine a parameter and use some form of averaging to estimate the parameter. Feedback: give the expert some values that are implied by his assessments. For example, having determined an estimate of the regression coefficients, the expected values of the response at other design points can be calculated; then ask whether they represent the expert's opinion.
35 Feedback can be used in combination with over-fitting. After estimating the regression coefficients, calculate the expected value of the response at the design points where medians were assessed. Compare the expected values with the assessments and flag any large differences. Over-fitting will almost always expose inconsistencies, whereas simple feedback typically results in the expert confirming the proposed values. Feedback can also check implications that are not direct assessments, e.g. prior density functions for a regression coefficient can show whether it has the correct sign with high probability.
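A minimal sketch of this feedback check: compute the fitted value $x^\top b$ at each design point where a median was assessed, and flag the points where it differs from the assessment by more than a tolerance. The function name, coefficients, design points, medians and the tolerance of 2 are hypothetical; what counts as a "large" difference would in practice be judged relative to the expert's stated uncertainty.

```python
def feedback(design_points, assessed_medians, b, tolerance):
    """Flag design points where the fitted value x.b differs from the
    expert's assessed median by more than `tolerance`.

    Returns a list of (index, assessed median, fitted value) tuples.
    """
    flagged = []
    for i, (x, median) in enumerate(zip(design_points, assessed_medians)):
        fitted = sum(xj * bj for xj, bj in zip(x, b))
        if abs(fitted - median) > tolerance:
            flagged.append((i, median, fitted))
    return flagged


# Hypothetical coefficients, design points (leading 1 = intercept) and medians.
b = [10, 2, -1]
design_points = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 2, 1]]
medians = [10, 12.5, 9, 18]
for i, assessed, fitted in feedback(design_points, medians, b, tolerance=2):
    print(f"design point {i}: assessed median {assessed}, fitted value {fitted}")
```

Only the last design point is flagged here; the flagged values would then be shown to the expert, who can revise either that assessment or the others.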
36 Making it acceptable to use assessed distributions. How should we conduct elicitation so that it is acceptable to use subjectively assessed probabilities and probability distributions? The UK National Health Service (NHS) initiated a study to estimate the benefits of current bowel cancer services in England and to examine the costs and benefits of alternative developments in service provision. The resulting report states: "Owing to a lack of empirical evidence in a number of areas, several of the model parameters and details of the model structure were elicited from experts."
37 The experts were well chosen: they indeed had expert knowledge of the areas they were questioned about. In advance of the elicitation sessions, discussion took place with the experts to determine appropriate questions, such as identifying any covariates that the expert thought would affect the quantities of interest. Sets of questions were formed in such a way that requirements for statistical coherence were satisfied by the elicited assessments without the expert focusing on these requirements, so the experts could concentrate on representing their opinions. Assessments for some quantities were validated, both by eliciting the opinions of more than one expert and comparing their answers, and by comparing an expert's opinion with data.
38 Interval estimates for a quantity were normally assessed (as well as a point estimate) so as to quantify an expert's uncertainty. Slight differences between an expert's opinions and a limited amount of data were recognised by increasing the uncertainty associated with the data estimate. The model allowed appropriately for uncertainty in the quantities that drove its outputs, and confidence intervals for the outputs were determined, making it clear which conclusions were firm and which could only be tentative. The conduct of the elicitation process and the resulting assessments were reported in detail.
39 Further work needed How should we elicit very small probabilities? If the event of interest has never occurred, the expert must be extrapolating from his/her background knowledge. Are there benefits to trying to model that thought process? As elicitation methods have become more mathematical, psychologists have done less empirical work with them. There is a need for software that encourages psychologists to examine elicitation methods and compare different choices in their implementation.