Effect of the Reference Set on Frequency Inference
Donald A. Pierce, Radiation Effects Research Foundation, Japan
Ruggero Bellio, Udine University, Italy


Effect of the Reference Set on Frequency Inference
Donald A. Pierce, Radiation Effects Research Foundation, Japan
Ruggero Bellio, Udine University, Italy
Paper, this talk, other things at

2 Frequency inferences depend to first order only on the likelihood function, and to higher order on other aspects of the probability model, or reference set.
That is, on "other aspects" not affecting the likelihood function: e.g. censoring models, stopping rules.
Here we study to what extent, and in what manner, second-order inferences depend on the reference set.
Various reasons for interest in this, e.g.:
Foundational: to what extent do frequency inferences violate the Likelihood Principle?
The unattractiveness of specifying censoring models.
Practical effects of stopping rules.

3 Example: Sequential Clinical Trials
Patients arrive and are randomized to treatments, and outcomes are observed as they accrue.
Stop at n patients based on the outcomes to that point. Then the data have a probability model, and the likelihood function is that model viewed as a function of the parameter, defined only up to proportionality (sketched below).
The likelihood function does not depend on the stopping rule, including the rule with fixed n.
First-order inference based only on the likelihood function does not depend on the stopping rule, but higher-order inference does.
How does inference allowing for the stopping rule differ from that for a fixed sample size?
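A reconstruction of the formulas omitted from the transcript, under the setting the slide describes (notation assumed here: outcomes y_1, ..., y_n with density f(y; θ), and a stopping rule depending only on the outcomes observed so far):

\[
p(n, y_1, \ldots, y_n; \theta) \;=\; \Bigl\{\prod_{i=1}^{n} f(y_i; \theta)\Bigr\}\, h(y_1, \ldots, y_n),
\qquad
L(\theta) \;\propto\; \prod_{i=1}^{n} f(y_i; \theta),
\]

where h is the probability, given the accumulated outcomes, that the rule stops at n; it does not involve θ, so the likelihood is the same as for a fixed-n experiment.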

4 Example: Censored Survival Data
Patients arrive and are given treatments. The outcome is response time, and when it is time for analysis some patients have yet to respond.
This involves what is called the censoring model.
First-order inferences depend only on the likelihood function and not on the censoring model.
In what way (and by how much) do higher-order inferences depend on the censoring model? It is unattractive that they should depend on it at all.
The likelihood function based on the data is sketched below, but the full probability model involves matters such as the probability distribution of patient arrival times.
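A standard reconstruction of the censored-data likelihood the slide refers to (assumed notation: response-time density f(t; θ) with survivor function S(t; θ); t_i is the observed time for patient i and δ_i = 1 if the response was observed, 0 if censored at the analysis time):

\[
L(\theta) \;=\; \prod_{i=1}^{n} f(t_i; \theta)^{\delta_i}\, S(t_i; \theta)^{1-\delta_i},
\]

which involves no model for arrival or censoring times, whereas the full probability model for the reference set does.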

5 Typical second-order effects
Binomial regression: a test for trend with 15 observations and an estimate towards the boundary. P-values that should be 5%:
First-order: 7.3%, 0.8%
Second-order: 5.3%, 4.7%
Generally, in settings with substantial numbers of nuisance parameters, and even for large samples, adjustments may be much larger than this --- or they may not be.
Testing ..., and ... when stopping at the following n.
Sequential experiment: underlying data ..., stop when ... or ...
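The following is a hedged illustration of the general point, using a textbook binomial example rather than the slide's own binomial-regression or sequential examples: two reference sets that give identical likelihoods for the same data, but different exact P-values.

```python
# Illustration: 9 successes in 12 Bernoulli(p) trials, testing H0: p = 0.5 vs p > 0.5.
# The likelihood is proportional to p^9 (1-p)^3 under both designs below,
# yet the exact P-values differ because the reference sets differ.
from scipy import stats

successes, failures, p0 = 9, 3, 0.5
n = successes + failures

# Reference set 1: n = 12 fixed in advance, K ~ Binomial(12, p).
p_fixed = stats.binom.sf(successes - 1, n, p0)                 # P(K >= 9) ~ 0.073

# Reference set 2: inverse sampling, continue until the 3rd failure.
# The number of successes before the 3rd failure is NegBinomial(3, 1 - p).
p_inverse = stats.nbinom.sf(successes - 1, failures, 1 - p0)   # P(X >= 9) ~ 0.033

print(f"fixed-n P-value:          {p_fixed:.4f}")
print(f"inverse-sampling P-value: {p_inverse:.4f}")
```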

6 Starting point: the signed LR statistic
Some general notation and concepts: a model for the data, a parametric function of interest, the MLE, the constrained MLE, and the profile likelihood.
The signed LR statistic is N(0,1) to first order, so first-order inference follows from it directly.
To second order, modern likelihood asymptotics yield an adjusted version of it (sketched below).
Only the adjustment depends on the reference set, so this is what we aim to study.
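A hedged reconstruction of the quantities named on the slide, in the notation usual for this literature (assumed here: data y with model f(y; θ), θ = (ψ, λ) with ψ the scalar interest parameter and λ the nuisance parameter; the adjusted statistic is Barndorff-Nielsen's r*):

\[
\hat\theta = \arg\max_{\theta} \ell(\theta), \qquad
\hat\theta_\psi = (\psi, \hat\lambda_\psi) \ \text{maximizing } \ell \text{ for fixed } \psi, \qquad
\ell_p(\psi) = \ell(\hat\theta_\psi),
\]
\[
r = \operatorname{sign}(\hat\psi - \psi)\,\bigl[\, 2\{\ell_p(\hat\psi) - \ell_p(\psi)\} \,\bigr]^{1/2} \;\sim\; N(0,1) \ \text{to first order},
\]
\[
r^* = r + \frac{1}{r}\,\log\frac{u}{r} \;\sim\; N(0,1) \ \text{to second order},
\]

and only the adjustment term, through u, depends on the reference set.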

7 Consider integrated likelihoods of the form sketched below, where the weight is any smooth prior on the nuisance parameter.
Computing a P-value for testing a hypothesis on the interest parameter requires only an ordering of datasets by evidence against the hypothesis.
Then, regardless of the prior, the signed LR statistic based on the integrated likelihood provides, to second order, the same ordering of datasets for evidence against a hypothesis as does r.
Even to higher order, ideal inference should be based on the distribution of r. Modifications of it, such as r*, pertain to its distribution, not to its inferential relevance.
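The form of integrated likelihood presumably intended (notation as in the sketch after slide 6; π is any smooth prior on the nuisance parameter):

\[
\bar L(\psi) \;=\; \int L(\psi, \lambda)\, \pi(\lambda)\, d\lambda .
\]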

8 Now return to the main point of higher-order likelihood asymptotics, namely the adjusted statistic r*.
The theory for this is due to Barndorff-Nielsen (Biometrika, 1986), and he refers to the statistic as r*.
Thinking of the data as the MLE together with an ancillary, the adjustment depends on notorious sample-space derivatives.
These are very difficult to compute, but Skovgaard (Bernoulli, 1996) showed that they can be approximated to second order by covariances of likelihood quantities, as sketched below.
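A hedged sketch of these quantities, in the form in which they are usually presented (assumed here: the data are represented as (θ̂, a) with a ancillary, and U(θ) denotes the score ∂ℓ(θ)/∂θ):

\[
\ell(\theta) = \ell(\theta; \hat\theta, a), \qquad
\ell_{;\hat\theta}(\theta) = \frac{\partial\, \ell(\theta; \hat\theta, a)}{\partial \hat\theta}
\quad \text{(a sample-space derivative)},
\]

and Skovgaard's approximation replaces such derivatives by covariances of likelihood quantities such as

\[
S = \operatorname{Cov}_{\theta_1}\{U(\theta_1),\, U(\theta_2)^{\top}\}, \qquad
q = \operatorname{Cov}_{\theta_1}\{U(\theta_1),\, \ell(\theta_1) - \ell(\theta_2)\},
\]

computed under the model and then evaluated at θ_1 = θ̂ and θ_2 = θ̂_ψ.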

9 It turns out that each of these approximations has a leading term depending only on the likelihood function, with a next term, one order smaller, depending on the reference set.
For example, this holds for the expansion of the approximation to one of the sample-space derivatives; thus we need that quantity only to first order to obtain second-order final results.
A similar expansion gives the same result for the other sample-space derivative.

10 This provides our first main result: if within some class of reference sets (models) we can write the log-likelihood, without regard to the reference set, as a sum of contributions that are stochastically independent, then second-order inference is the same for all of the reference sets (a schematic version is given below).
The reason is that when the contributions are independent, the value of the required covariance must agree to first order with the empirical mean over the contributions, and this empirical mean does not depend on the reference set.
Thus, in this "independence" case, second-order inference, although not determined by the likelihood function, is determined by the contributions to it.
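A schematic rendering of this argument; the notation here is assumed, not taken from the slide:

\[
\ell(\theta) = \sum_{i=1}^{n} \ell_i(\theta), \qquad \ell_i \ \text{stochastically independent under each reference set},
\]

so that, for example,

\[
\operatorname{Cov}_{\theta_1}\{U(\theta_1),\, U(\theta_2)^{\top}\}
= \sum_{i=1}^{n} \operatorname{Cov}_{\theta_1}\{U_i(\theta_1),\, U_i(\theta_2)^{\top}\}
\;\approx\; \sum_{i=1}^{n} u_i(\theta_1)\, u_i(\theta_2)^{\top},
\]

where the u_i are the observed contribution scores; the final expression has the first-order accuracy needed and involves only the likelihood contributions, not the reference set.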

11 A main application of this pertains to censoring models, when censoring and response times for individuals are stochastically independent.
Then the usual contributions to the likelihood, namely the per-patient terms of slide 4, do not depend on the censoring model and are stochastically independent.
So to second order, frequency inference is the same for any censoring model --- even though some higher-order adjustment should still be made.
One should probably either assume some convenient censoring model, or approximate the covariances from the empirical covariances of the contributions to the log-likelihood.
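A hedged sketch of that empirical approximation, continuing the slide-4 notation (the particular form shown is an assumption, not a formula from the slides):

\[
\ell_i(\theta) = \delta_i \log f(t_i;\theta) + (1-\delta_i)\log S(t_i;\theta), \qquad
u_i(\theta) = \partial \ell_i(\theta)/\partial\theta,
\]

with the required covariances then approximated by empirical sums such as \(\sum_i u_i(\theta_1)\, u_i(\theta_2)^{\top}\), as in the sketch following slide 10; no censoring model enters.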

12 Things are quite different when comparing sequential and fixed-sample-size experiments --- usually one cannot have "contributions" that are independent in both reference sets.
But first we need to consider under what conditions second-order likelihood asymptotics applies to sequential settings.
We argue in our paper that it does whenever usual first-order asymptotics applies.
These conditions are given by Anscombe's Theorem: a statistic that is asymptotically standard normal for fixed n remains so when (a) the coefficient of variation of n approaches zero, and (b) the statistic is asymptotically suitably continuous. Discrete n does not in itself invalidate (b).

13 In the key relation we need to consider, following Pierce &amp; Peters (JRSS B, 1992), the decomposition of the adjustment into a nuisance-parameter (NP) part and an information (INF) part, sketched below.
The NP part is related to Barndorff-Nielsen's modified profile likelihood.
NP pertains to the effect of fitting nuisance parameters, and INF pertains to moving from likelihood to frequency inference --- INF is small when the adjusted information is large.
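Only the general shape of the decomposition is shown here; the exact expressions for the two terms are in Pierce &amp; Peters (1992), and the rendering below is an assumption about what the slide displayed:

\[
r^* \;=\; r \;+\; \mathrm{NP} \;+\; \mathrm{INF},
\]

with both adjustment terms of the form \(r^{-1}\log(\cdot)\): NP involves the nuisance-parameter blocks of the observed information at θ̂ and θ̂_ψ (hence the connection to the modified profile likelihood), while INF involves the ratio of a score-type quantity to r.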

14 When the interest and nuisance parameters are chosen as orthogonal, the NP adjustment depends to second order only on the likelihood function.
Thus, in sequential experiments the NP adjustment and the modified profile likelihood do not depend on the stopping rule, but the INF adjustment does.
Except for Gaussian experiments with a regression parameter of interest, there is an INF adjustment both for fixed n and sequentially, but the two differ.
Parameters orthogonal for fixed-size experiments remain orthogonal for any stopping rule, since (for underlying i.i.d. observations) the Wald identity gives the relation sketched below.
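A sketch of the Wald-identity step, under the i.i.d. assumption stated on the slide (notation assumed: N is the random sample size and i^{(1)}(θ) the per-observation expected information):

\[
i_{\mathrm{seq}}(\theta)
= \mathrm{E}_\theta\Bigl\{ -\sum_{i=1}^{N} \frac{\partial^2 \ell_i(\theta)}{\partial\theta\,\partial\theta^{\top}} \Bigr\}
= \mathrm{E}_\theta(N)\; i^{(1)}(\theta),
\]

so any (ψ, λ) block that is zero per observation remains zero under the stopping rule, and orthogonality is preserved.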

15 SUMMARY
When there are contributions to the likelihood that are independent under each of two reference sets, second-order ideal frequency inference is the same for the two.
In sequential settings we need to consider the nuisance-parameter and information adjustments. To second order, the former and the modified profile likelihood do not depend on the stopping rule, but the latter does.
This is all as one might hope, or expect: inference should not, for example, depend on the censoring model, but it should depend on the stopping rule.

16 Appendix: Basis for higher-order likelihood asymptotics
Transform from the data to the MLE and an ancillary, then integrate out the components not of interest.
This provides a second-order approximation to the distribution of the quantity of interest.
The Jacobian, and what results from the integration, are what comprise the adjustment.
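The starting point presumably being referred to is Barndorff-Nielsen's p* approximation to the density of the MLE given an ancillary, stated here in its usual form (the transformation and integration steps that lead from it to r* are not reproduced):

\[
p^*(\hat\theta \mid a;\, \theta) \;=\; c(\theta, a)\, \bigl|\, j(\hat\theta) \,\bigr|^{1/2} \exp\{\ell(\theta) - \ell(\hat\theta)\},
\]

where j is the observed information and c a norming constant.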