Lecture 2 Outline: Thu, Jan 15

Chapter 1.2: Statistical Inference and Study Design. Topics: types of inference; observational studies vs. randomized experiments; confounding variables; randomized experiments; inference to populations (random sampling studies vs. non-random sampling studies).

Drawing Conclusions An inference is a conclusion from the data about some broader context that the data represent. e.g., one egg in a container is rotten -- the rest are rotten; when we flick on a light switch, the light turns on -- flicking on the light switch will generally cause the light to turn on. A statistical inference is an inference justified by a probability model linking the data to a broader context. Statistical inferences include measures of uncertainty about the conclusions (e.g., p-values, confidence intervals)

Two “broader contexts” in statistics Population inference: an inference about population characteristics, like the difference between two population means Causal inference: an inference that a subject would have received a different numerical outcome had the subject belonged to a different group.

Causal Questions Medicine: How effective is a new drug? What is the effect of smoking on one’s chance of developing cancer? Psychology: What change in an individual’s normal solitary performance and behavior occurs when other people are present? What changes in an individual’s moral behavior occur when the individual is commanded by authority? Economics: What is the effect of a change in taxes on labor supply and investment behavior? What is the effect of a change in the minimum wage on employment? Education: What is the effect of smaller class sizes on achievement?

Types of Causal Studies Observational study: Study in which group status is observed, i.e., beyond the control of the researcher. Controlled experiment: Study in which group status is controlled by the researcher. Randomized experiment: Study in which group status is assigned by a chance mechanism.

Examples of Causal Studies Motivation and creativity study (case study 1.1.1). Sex discrimination study (case study 1.1.2). How much health damage does an atomic bomb cause? -- Comparison of chromosomal aberrations of Japanese atomic bomb survivors near the blast and those far from the blast. How many deaths does being a soldier in a war (prevent or) cause? -- Comparison of death rates in the Navy and out of the Navy during the Spanish-American War. How many heart attacks does taking estrogen (prevent or) cause? -- Comparison of heart attack rates of menopausal women taking estrogen and women not taking estrogen.

Causal Inference Main point: statistical inferences of causation can be made from randomized experiments, but not from observational studies. In an observational study, one cannot rule out the possibility that confounding variables are responsible for group differences in the observed outcome. In an observational study, one cannot rule out the possibility of reverse causality or simultaneous causality. Which came first – the chicken or the egg? Beta-carotene intake and morbidity.

Confounding Variables A confounding (lurking) variable is a variable that is related to both group membership and the outcome. Its presence makes it hard to establish the outcome as being a direct consequence of group membership. Examples: the sex discrimination study, the study of death rates in and out of the Navy, and the estrogen study. Although it is possible to control for known confounding variables (via multiple regression), in an observational study we can never be sure that there are not unknown confounding variables that are responsible for group differences in outcome.
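The effect of a confounding variable can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the variable `older`, the 0.8/0.2 membership probabilities, and the outcome scale are all hypothetical and do not come from any study in the lecture. Group membership has no effect on the outcome at all, yet the groups differ sharply until the confounder is held fixed.

```python
import random
from statistics import mean

rng = random.Random(4)  # fixed seed so the sketch is reproducible

# Hypothetical data: each row is (older, in_group, outcome).
rows = []
for _ in range(20_000):
    older = rng.random() < 0.5                          # the confounder
    in_group = rng.random() < (0.8 if older else 0.2)   # membership depends on the confounder
    outcome = (10.0 if older else 0.0) + rng.gauss(0, 1)  # outcome depends ONLY on the confounder
    rows.append((older, in_group, outcome))

def group_difference(data):
    """Mean outcome of group members minus mean outcome of non-members."""
    members = [y for _, g, y in data if g]
    non_members = [y for _, g, y in data if not g]
    return mean(members) - mean(non_members)

naive = group_difference(rows)
within_older = group_difference([r for r in rows if r[0]])
within_younger = group_difference([r for r in rows if not r[0]])

print("Naive group difference:     ", round(naive, 2))
print("Difference among older:     ", round(within_older, 2))
print("Difference among younger:   ", round(within_younger, 2))
```

Comparing within levels of the confounder only works here because the confounder is known and recorded; an unknown confounder could not be handled this way.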

Association Is Not Causation There is a close relationship between the salaries of Presbyterian ministers in Massachusetts and the price of rum in Havana. Are the ministers benefiting from the rum trade or supporting it? A study showed that cigarette smokers have lower college grades than non-smokers. Does the road to good grades lie in giving up smoking?

Do Observational Studies Have Value – Yes! Establishing causation is not always the goal (prediction may be the goal). Establishing causation may be done in other ways. Experiments are not always practical or ethical. Analysis of observational data may lend evidence toward causal theories and suggest the direction of future research.

Criteria for Establishing Causation From Obs. Studies (1) The association is strong. (2) The association is consistent. (3) Higher doses are associated with stronger responses. (4) The alleged cause precedes the effect in time. (5) The alleged cause is plausible. Examples: smoking and lung cancer; radiation from the atomic bomb and chromosomal aberrations.

Randomized Experiments In a randomized experiment, an impersonal chance mechanism is used to assign the units to groups. In a randomized experiment, any relationship between important variables and group membership can only arise through chance. Suppose that there is a treatment and control group and the treatment group has a higher observed response than the control group. In a randomized experiment, the difference must be due either to the treatment or to the play of chance in the random assignment of units to the groups. Statistical inference provides a method for describing how confident we can be that an observed difference between the treatment and control groups did not arise due to chance.

Statistical Inference in the Motivation-Creativity Study The creativity scores tended to be larger in the “intrinsic” than in the “extrinsic” group. Either the intrinsic questionnaire caused a higher score or else the more creative writers happened to be placed in the “intrinsic” group. The probability (p-value) associated with this latter possibility is 0.011. This is moderate to convincing evidence that taking the intrinsic questionnaire in fact caused writers to be more creative.
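The idea behind such a p-value can be sketched with a randomization test: reshuffle the observed scores into two groups many times and count how often chance alone produces a difference at least as large as the one observed. The scores below are made up for illustration and are not the case study data, so the result will not equal 0.011; the function name and the use of the difference in means as the test statistic are likewise assumptions of this sketch.

```python
import random
from statistics import mean

def randomization_p_value(group_a, group_b, n_shuffles=10_000, seed=None):
    """Approximate one-sided p-value: the share of random regroupings whose
    difference in means (A minus B) is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # one hypothetical re-run of the random assignment
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            count += 1
    return count / n_shuffles

# Hypothetical creativity scores (NOT the data from case study 1.1.1).
intrinsic = [22.1, 19.3, 24.0, 20.5, 26.7, 18.2, 23.4, 21.6]
extrinsic = [17.2, 18.5, 15.9, 19.8, 14.7, 20.1, 16.4, 18.0]

p_value = randomization_p_value(intrinsic, extrinsic, seed=0)
print("Approximate one-sided p-value:", p_value)
```

A small p-value says that the "creative writers happened to land in one group" explanation would rarely produce a difference this large.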

Law of Large Numbers and Replication The Law of Large Numbers: Draw independent observations at random from any population with finite mean μ. Decide how accurately you would like to estimate μ. As the number of observations drawn increases, the mean of the observed values eventually comes as close to μ as you specified and stays that close. The law of large numbers guarantees that if enough units are used, any important variables (whether we are aware of them or not) will be divided roughly equally between the two groups.
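A quick simulation can make the law of large numbers concrete. The sketch below assumes a hypothetical population, rolls of a fair six-sided die with mean 3.5, and tracks the running mean as more observations are drawn.

```python
import random

def running_means(draws):
    """Return the mean of the first k draws for k = 1, 2, ..., len(draws)."""
    total = 0.0
    means = []
    for k, x in enumerate(draws, start=1):
        total += x
        means.append(total / k)
    return means

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical population: fair six-sided die, population mean 3.5.
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"mean of first {n:>7} draws: {means[n - 1]:.3f}")
```

The printed means should wander at first and then settle near 3.5 as n grows.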

Random Assignment in JMP To randomly assign units to two groups of size n_1 and n_2 in JMP: (1) Create a column called “random.” (2) Right click on the top of this column, click on formula, click on the random function and then click on Random Uniform. (3) Click on Tables, Sort and then sort by random. (4) Create a column “group.” (5) Label the first n_1 units in the table as Group I and the rest of the units as Group II.
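The same procedure can be sketched outside JMP. The Python version below mirrors the steps above: attach a uniform random number to each unit, sort by it, and label the first n_1 units as Group I. The function name `randomly_assign` and the unit labels are hypothetical.

```python
import random

def randomly_assign(units, n1, seed=None):
    """Split units into Group I (n1 units) and Group II (the rest) by chance."""
    rng = random.Random(seed)
    # Attach a uniform random number to each unit (the "random" column).
    keyed = [(rng.random(), unit) for unit in units]
    # Sorting by the random number puts the units in a random order.
    keyed.sort(key=lambda pair: pair[0])
    ordered = [unit for _, unit in keyed]
    # The first n1 units form Group I, the remainder Group II.
    return ordered[:n1], ordered[n1:]

units = [f"unit{i}" for i in range(1, 11)]  # hypothetical unit labels
group_1, group_2 = randomly_assign(units, n1=5, seed=1)
print("Group I: ", group_1)
print("Group II:", group_2)
```

Passing a seed is only for reproducibility of the illustration; in an actual experiment the assignment would normally be left unseeded or the seed recorded.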

Inference to Populations Goal: Make conclusions about aspects of a population (e.g., mean income in U.S.) based on a sample. Two types of sampling designs. Random sampling study: Units are selected by the investigator from a well-defined population through a chance mechanism with each unit having a known (>0) chance of being selected. Non-random sampling study: Units are selected in a way other than through chance (e.g., units selected by taking volunteers).

Simple Random Sample Simple random sample of size n: Subset of the population of size n selected in such a way that every subset of size n has the same chance of being selected. Equivalent to drawing units out of a hat without replacement. Simple random sample in JMP: Click on Tables, Subset, then put the number n in the box “Sampling Rate or Sample Size.”
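For readers working outside JMP, a simple random sample can be sketched in Python with the standard library's `random.sample`, which draws without replacement so that every subset of size n is equally likely. The population of 100 numbered units is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # sample() selects without replacement, like drawing units out of a hat.
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population of 100 numbered units
sample = simple_random_sample(population, n=10, seed=2)
print(sample)
```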

Random Samples and Statistical Inference Statistical inference about the population can be made based on the sample for random sampling studies (Ch. 1.4.1) by using the sampling design. The sample might turn out to be nonrepresentative (i.e., have markedly different characteristics than the population) but: We can describe the uncertainty due to the chance mechanism of random sampling accurately because we know how the sample was generated (Chapter 1.4.1). The law of large numbers guarantees that if we take a large enough sample, the sample mean (and other characteristics) will be very close to the population mean.

The Literary Digest Poll In the 1936 presidential election, the Literary Digest predicted an overwhelming victory for Landon over Roosevelt. Roosevelt won the election by a landslide – 62% to 38%. What went wrong? The sample was taken by mailing questionnaires to 10 million people whose names and addresses came from sources like telephone books and club membership lists. 2.4 million people returned the questionnaires.

Biased Samples Selection Bias: When the procedure for selecting a sample results in samples that are systematically different from the population. When a selection procedure is biased, taking a large sample does not help. This just repeats the basic mistake on a large scale. Causes of selection bias: voluntary response sample, undercoverage, nonresponse.
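The claim that a large biased sample does not help can be checked with a simulation. The sketch below uses a hypothetical population in which 40% support a candidate, and it assumes (purely for illustration) that supporters are three times as likely as others to end up in the biased sample. A small simple random sample is compared with a much larger biased one.

```python
import random
from statistics import mean

rng = random.Random(3)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 voters: 1 = supports the candidate.
population = [1 if rng.random() < 0.40 else 0 for _ in range(100_000)]
true_share = mean(population)

def biased_sample(pop, n, rng):
    """Draw n units (with replacement) where supporters are 3x as likely to be chosen.
    The 3:1 weighting is an assumption made only for this illustration."""
    weights = [3 if vote == 1 else 1 for vote in pop]
    return rng.choices(pop, weights=weights, k=n)

random_estimate = mean(rng.sample(population, 1_000))           # small random sample
biased_estimate = mean(biased_sample(population, 50_000, rng))  # large biased sample

print("True share:                    ", round(true_share, 3))
print("Random sample, n = 1,000:      ", round(random_estimate, 3))
print("Biased sample, n = 50,000:     ", round(biased_estimate, 3))
```

Under these assumptions the small random sample should land near the true share, while the large biased sample stays far from it no matter how many units are drawn.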

Statistical Inferences Permitted by Study Designs See Display 1.5. Examples: the motivation and creativity study; the sex discrimination study; and a lead study in which researchers measured the lead content in teeth and IQ scores for all 3,229 children attending first and second grade between 1975 and 1978 in Chelsea and Somerville, Mass. IQ scores for those with low lead concentrations were found to be significantly higher than for those with high lead concentrations. Conceptual Exercises 1-12 in Ch. 1 relate to statistical inferences permitted by study designs.

Summary “Random samples and randomized experiments are representative in the same sense that flipping a coin to see who takes out the garbage is fair.” (Chapter 1.5.5) Two key advantages of random samples and randomized experiments over nonrandom samples and observational studies are the following: (1) Uncertainty about representativeness can be incorporated into the statistical analysis. If randomization were abandoned, there would be no way to express uncertainty accurately. (2) The law of large numbers guarantees that if we have a large sample size, we will get accurate results. Without using randomization, our results could be systematically biased even for a large sample.