IPSS Ch. 2: The Selection Problem. 2.1 The Nature of the Problem: Non-Response, Being Dropped from the Census, Sample Attrition in Longitudinal Surveys, Censored Data

Slide 1: 2.1 The Nature of the Problem
Settings: non-response, being dropped from the census, sample attrition in longitudinal surveys, censored data.
We (social scientists) are interested in treatment effects, e.g., what is the effect of a treatment on an outcome Y?
- Schooling -> market wages
- Welfare -> labor supply
- Sentencing policy -> crime commission
- New drug -> AIDS patients
- Surgery -> life span
- Chemotherapy -> life span
We cannot observe these differences.

Slide 2: The Selection Problem
Example: market wage depends on schooling, work experience, and demographic background (covariates).
Note: the selection problem is logically separate from the extrapolation problem (a new challenge).
- Extrapolation problem: arises because random sampling does not yield observations of y off the support of x.
- Selection problem: arises when a censored random-sampling process does not fully reveal the behavior of y on the support of x.

Slide 3: The Selection Indicator z
Data: (y, z, x), where z is a binary indicator of whether y is observed:
z = 1 if y is observed, z = 0 if it is not; we observe y only when z = 1.
Example:
- y: market wage
- x: education, work experience, race, sex, ... (covariates)
- z: z = 1 if the wage is observed, z = 0 if it is not.
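
As a concrete illustration (my own sketch, not from the book), the following Python fragment builds a toy censored dataset of this form; the covariate, outcome model, and selection rate are all hypothetical. The analyst's file contains x, z, and y only for the rows with z = 1.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.integers(10, 17, size=n)             # covariate, e.g. years of schooling (hypothetical)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, n)      # outcome, e.g. log wage (hypothetical model)
    z = rng.binomial(1, 0.7, size=n)             # selection indicator: 1 = observed, 0 = censored

    y_obs = np.where(z == 1, y, np.nan)          # the analyst sees y only when z == 1
    print("fraction observed:", z.mean())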

Slide 4: Decomposing P(y|x)
By the Law of Total Probability,
(2.1) P(y|x) = P(y|x, z=1) P(z=1|x) + P(y|x, z=0) P(z=0|x),
where P(z=1|x) is the selection probability and P(z=0|x) is the censoring probability.
Can we identify the conditional distribution P(y|x)? Not from the data alone, because P(y|x, z=0) is unobservable:
(2.2) P(y|x) = P(y|x, z=1) P(z=1|x) + γ P(z=0|x),
where γ stands for the unknown censored-outcome distribution P(y|x, z=0).
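
A small numerical check of (2.1)-(2.2), using made-up values of the identifiable pieces (the 0.60 and 0.70 below are hypothetical): whatever value the unknown term takes, the observables pin down P(y ∈ B|x) only up to an interval.

    # Hypothetical identifiable quantities for some fixed x and event B
    p_B_given_obs = 0.60    # P(y in B | x, z = 1)
    p_obs = 0.70            # P(z = 1 | x)
    p_cens = 1 - p_obs      # P(z = 0 | x)

    # Equation (2.2): P(y in B | x) = p_B_given_obs * p_obs + gamma * p_cens,
    # where gamma = P(y in B | x, z = 0) is unknown and can be anything in [0, 1].
    for gamma in (0.0, 0.5, 1.0):
        p_B = p_B_given_obs * p_obs + gamma * p_cens
        print(f"gamma = {gamma:.1f} -> P(y in B | x) = {p_B:.2f}")
    # Prints 0.42, 0.57, 0.72: the data alone cannot distinguish among these.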

Slide 5: Outline of the Rest of the Chapter
- 2.2 The worst-case scenario: no information on the censored distribution P(y|x, z=0)
- 2.3 An empirical illustration
- 2.4 The identifying power of prior information
- 2.5–2.8 Problems of identifying treatment effects

Slide 6: Identification from Censored Samples Alone: Two Negative Facts
Fact 1 (conditional probabilities). Assume exogenous, or ignorable, selection:
(2.3) P(y|x, z=0) = P(y|x, z=1), which implies P(y|x) = P(y|x, z=1).
Can we refute the validity of (2.3)? No. Assumption (2.3) is necessarily consistent with the empirical evidence, because the data are silent about P(y|x, z=0).
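
A small simulation (my own construction, for intuition only) of why (2.3) cannot be refuted: two populations that differ only in the censored-outcome distribution P(y|x, z=0) generate exactly the same distribution of observable data, so no sample can tell ignorable selection apart from non-ignorable selection.

    import numpy as np

    def observed_data(p_y1_obs, p_y1_cens, p_obs, n, seed=0):
        """Simulate (y, z); return only what the analyst sees: z, and y where z == 1."""
        rng = np.random.default_rng(seed)
        z = rng.binomial(1, p_obs, n)
        y = rng.binomial(1, np.where(z == 1, p_y1_obs, p_y1_cens))
        return z, np.where(z == 1, y, -1)        # -1 marks "not observed"

    # Same identifiable pieces, very different censored-outcome distributions (hypothetical numbers)
    zA, yA = observed_data(p_y1_obs=0.6, p_y1_cens=0.6, p_obs=0.7, n=100_000)   # selection ignorable
    zB, yB = observed_data(p_y1_obs=0.6, p_y1_cens=0.1, p_obs=0.7, n=100_000)   # selection not ignorable

    for name, z, y in [("A", zA, yA), ("B", zB, yB)]:
        print(name, "P(z=1) ~", round(z.mean(), 3), " P(y=1 | z=1) ~", round(y[z == 1].mean(), 3))
    # Both lines print (approximately) the same numbers: the observable data cannot distinguish A from B.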

Slide 7: Fact 2 (Conditional Expectations)
(2.4) E(y|x) = E(y|x, z=1) P(z=1|x) + E(y|x, z=0) P(z=0|x).
E(y|x, z=1), P(z=1|x), and P(z=0|x) are identifiable; E(y|x, z=0) is not, so neither is E(y|x).

Slide 8: Bounds on Conditional Probabilities
The selection problem is not fatal even in the absence of prior information: we can still find informative and interpretable bounds.
Let B be a set of outcomes (e.g., "success"). Then
(2.5) P(y ∈ B|x) = P(y ∈ B|x, z=1) P(z=1|x) + P(y ∈ B|x, z=0) P(z=0|x).
P(y ∈ B|x, z=1), P(z=1|x), and P(z=0|x) are identifiable, but there is no information on P(y ∈ B|x, z=0).

Slide 9: The Worst-Case Bounds
Can we say anything about P(y ∈ B|x)? Yes: we can find bounds of the form [lower limit, upper limit].
(2.6) P(y ∈ B|x, z=1) P(z=1|x) ≤ P(y ∈ B|x) ≤ P(y ∈ B|x, z=1) P(z=1|x) + P(z=0|x).
The lower bound sets the unknown P(y ∈ B|x, z=0) to 0; the upper bound sets it to 1.
For the event B = {y ≤ t},
(2.7) P(y ≤ t|x, z=1) P(z=1|x) ≤ P(y ≤ t|x) ≤ P(y ≤ t|x, z=1) P(z=1|x) + P(z=0|x).
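
A minimal Python helper (my own sketch; the function name is invented) that evaluates the worst-case bound (2.6) from the two identifiable quantities:

    def worst_case_bounds(p_B_given_obs, p_obs):
        """Worst-case bounds on P(y in B | x), equation (2.6).

        p_B_given_obs : P(y in B | x, z = 1), identifiable from the observed outcomes
        p_obs         : P(z = 1 | x), the selection probability
        """
        lower = p_B_given_obs * p_obs                 # unknown P(y in B | x, z = 0) set to 0
        upper = p_B_given_obs * p_obs + (1 - p_obs)   # unknown term set to 1
        return lower, upper

    # With the hypothetical numbers used earlier:
    print(worst_case_bounds(0.60, 0.70))   # approximately (0.42, 0.72); the width equals P(z = 0 | x) = 0.30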

Slide 10: Statistical Inference
The selection problem is a failure of identification. The bounds are functions of P(y|x, z=1) and P(z|x); we can estimate the features of these distributions and thereby obtain estimates of the bounds.
Example: to estimate the bound (2.6) on P(y ∈ B|x), estimate P(y ∈ B|x, z=1) and P(z=1|x) as in Section 1.3. The precision of an estimate of the bound can be measured by a confidence interval around the estimate.
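
Continuing the sketch (again my own code, not from the book), the plug-in estimates are just sample frequencies, so estimating the bound reduces to two proportions computed from the censored sample:

    import numpy as np

    def estimate_bounds(y_obs, z, in_B):
        """Plug-in estimate of the worst-case bound (2.6) from a censored sample.

        y_obs : outcomes, meaningful only where z == 1
        z     : 0/1 selection indicators
        in_B  : function returning True when an outcome lies in the event B
        """
        p_obs_hat = z.mean()                                      # estimate of P(z = 1 | x)
        p_B_obs_hat = np.mean([in_B(v) for v in y_obs[z == 1]])   # estimate of P(y in B | x, z = 1)
        lower = p_B_obs_hat * p_obs_hat
        upper = lower + (1 - p_obs_hat)
        return lower, upper

    # Usage with the toy data simulated earlier (event B = {y > 8}, chosen arbitrarily):
    # lower, upper = estimate_bounds(y_obs, z, in_B=lambda v: v > 8.0)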

Slide 11: The Bound versus the Confidence Interval
Distinguish the bound from the confidence interval around its estimate.
- The bound on P(y ∈ B|x) is a population concept: what could be learned about P(y ∈ B|x) if one knew P(y ∈ B|x, z=1) and P(z|x).
- The confidence interval is a sampling concept: the precision with which the bound is estimated when estimates of P(y ∈ B|x, z=1) and P(z|x) are obtained from a sample of fixed size.
The confidence interval is typically wider than the bound but narrows to match the bound as the sample size increases.
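
A rough simulation of this last point (my own construction; the book does not prescribe this particular interval): pad each estimated endpoint by 1.96 standard errors to get a crude 95% confidence region, and watch it shrink toward the population bound as the sample grows. The true values 0.70 and 0.60 below are hypothetical.

    import numpy as np

    rng = np.random.default_rng(1)
    p_obs, p_B_obs = 0.70, 0.60                    # true P(z=1|x) and P(y in B | x, z=1)
    # True bound: (0.42, 0.72); its width is P(z=0|x) = 0.30.

    for n in (100, 1_000, 100_000):
        z = rng.binomial(1, p_obs, n)
        yB = rng.binomial(1, p_B_obs, n) * z       # indicator of the joint event {z = 1 and y in B}
        lo_hat = yB.mean()                         # estimates P(y in B, z = 1 | x) = lower bound
        up_hat = lo_hat + (1 - z.mean())           # add the estimated censoring probability
        se_lo = np.sqrt(lo_hat * (1 - lo_hat) / n)
        se_up = np.sqrt(up_hat * (1 - up_hat) / n)
        print(f"n = {n:>6}: estimated bound [{lo_hat:.3f}, {up_hat:.3f}], "
              f"crude 95% region [{lo_hat - 1.96 * se_lo:.3f}, {up_hat + 1.96 * se_up:.3f}]")
    # The confidence region is wider than the bound but approaches it as n increases.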

Slide 12: Bounding the Probability of Exiting Homelessness
Population: people who are homeless at time t0.
Outcome y: y = 1 if the person has a home at t1, y = 0 if still homeless.
Background x: race, sex, education, etc.
Selection z: z = 1 if interviewed at t1, z = 0 if not.
Conditioning variable: sex.
Males: sample size at t0 = 106; sample size at t1 = 64, of whom 21 had exited homelessness.
P(y=1 | male, z=1) = 21/64, P(z=1 | male) = 64/106.
Bound on P(y=1 | male): [21/106, 63/106] = [0.20, 0.59].

Slide 13: Females, and the Continuous Case
Females: sample size at t0 = 31; sample size at t1 = 14, of whom 3 had exited homelessness.
Bound on P(y=1 | female): [3/31, 20/31] = [0.10, 0.65].
Point: without any restrictions on the attrition process, we still obtain meaningful bounds.
Continuous case. Conditioning variables: sex and income (from the question "What was the best job you ever had?", in $/week).
Sample sizes: males 89, females 22.
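
A quick check of the arithmetic behind the two reported bounds (all counts taken from the two slides above):

    # Males: 106 sampled at t0, 64 interviewed at t1, 21 of them out of homelessness
    p_y1_obs_m, p_obs_m = 21 / 64, 64 / 106
    lower_m = p_y1_obs_m * p_obs_m               # = 21/106, about 0.198
    upper_m = lower_m + (1 - p_obs_m)            # = 63/106, about 0.594

    # Females: 31 sampled at t0, 14 interviewed at t1, 3 of them out of homelessness
    p_y1_obs_f, p_obs_f = 3 / 14, 14 / 31
    lower_f = p_y1_obs_f * p_obs_f               # = 3/31, about 0.097
    upper_f = lower_f + (1 - p_obs_f)            # = 20/31, about 0.645

    print(f"male bound:   [{lower_m:.2f}, {upper_m:.2f}]")    # [0.20, 0.59]
    print(f"female bound: [{lower_f:.2f}, {upper_f:.2f}]")    # [0.10, 0.65]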

Slide 14: Fig. 2.1, Attrition Probabilities P(z=0|x) (figure not reproduced in this transcript)

Slide 15: Fig. 2.2, Estimated Bounds on P(y=1|x) (figure not reproduced in this transcript)
Lower bound: P(y=1|x, z=1) P(z=1|x)
Upper bound: P(y=1|x, z=1) P(z=1|x) + P(z=0|x)

Slide 16: Reading the Bounds
- The estimated bound is tightest at the low end of the income domain and spreads as income increases: the interval is [.24, .55] at income $50 and [.23, .66] at income $600.
- This spreading reflects the fact that the estimated probability of attrition increases with income.
Is the cup part empty or part full?
P(male exits homelessness) = P(y=1 | male) lies in [.20, .59], an improvement over the trivial bound [0.0, 1.0].
Can we narrow the interval? Can we pin down P(y=1 | male)?
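
One way to see why the interval spreads (my own remark, following directly from the formulas on slide 15): the upper bound minus the lower bound equals P(z=0|x), so the width of the estimated bound at each income level is exactly the estimated attrition probability there.

    # Width of the bound: upper - lower
    #   = [P(y=1|x,z=1) P(z=1|x) + P(z=0|x)] - P(y=1|x,z=1) P(z=1|x) = P(z=0|x).
    # Applied to the two intervals read off Fig. 2.2:
    for income, (lo, up) in {50: (0.24, 0.55), 600: (0.23, 0.66)}.items():
        print(f"income ${income}/week: bound width = {up - lo:.2f}")   # 0.31 and 0.43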