Lecture 8 Resistance of two-sample t-tools and outliers (Chapters 3.3-3.4) Transformations of the Data (Chapter 3.5)



Outliers and resistance
Outliers are observations relatively far from their estimated means. Outliers may arise in either of two ways:
–(a) the population distribution is long-tailed;
–(b) the observations do not belong to the population of interest (they come from a contaminating population).
A statistical procedure is resistant if one or a few outliers cannot have an undue influence on the result.

Resistance
Illustration for understanding resistance: the sample mean is not resistant; the sample median is.
–Sample: 9, 3, 5, 8, 100
–Mean with outlier: 25, without: 6.25
–Median with outlier: 8, without: 6.5
t-tools are not resistant to outliers because they are based on sample means.
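The arithmetic in this illustration can be checked with a short sketch using only Python's standard `statistics` module. The variable names are chosen here for illustration and are not part of the lecture.

```python
from statistics import mean, median

# The five-point sample from the slide; 100 is the outlier.
sample = [9, 3, 5, 8, 100]
without_outlier = [x for x in sample if x != 100]

mean_with = mean(sample)                   # 125 / 5 = 25
mean_without = mean(without_outlier)       # 25 / 4 = 6.25
median_with = median(sample)               # middle of 3, 5, 8, 9, 100 = 8
median_without = median(without_outlier)   # average of 5 and 8 = 6.5

print("mean:  ", mean_with, "->", mean_without)
print("median:", median_with, "->", median_without)
```

Dropping one observation moves the mean by almost 19 units but the median by only 1.5, which is the sense in which the median is resistant and the mean is not.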

Strategy for Dealing With Outliers
Follow Display 3.6.
Important aspect of the strategy: An outlier does not get swept under the rug simply because it is different from the other observations. To warrant its removal, an explanation for why it is different must be established.

Excluding Observations from Analysis in JMP for Investigating Outliers
–Click on the row you want to exclude.
–Click on the Rows menu and then click Exclude/Unexclude. A red circle with a line through it will appear next to the excluded observation.
–Multiple observations can be excluded.
–To bring an excluded observation back into the analysis, click on the excluded row, click on the Rows menu, and then click Exclude/Unexclude again. The red circle next to the observation should disappear.

Conceptual Question #6
(a) What course of action would you propose for the statistical analysis if it was learned that Vietnam veteran #646 (the largest observation in Display 3.6) worked for several years, after Vietnam, handling herbicides with dioxin?
(b) What would you propose if this was learned instead for Vietnam veteran #645 (second largest observation)?

Rules of thumb for validity of t-tools
Assumptions and rules of thumb for validity of t-tools in the face of violations:
–Normality: Look for gross skewness. Okay if both sample sizes are greater than 30.
–Equal spread: Validity is okay if the ratio of the larger sample standard deviation to the smaller sample standard deviation is less than 2 and the ratio of the larger group size to the smaller group size is less than 2. Consider transformations.
–Outliers: Look for outliers in box plots, especially very extreme points (more than 3 box-lengths away from the box). Apply the examination strategy in Display 3.6.
–Independence: If independence is not appropriate, apply matched pairs if appropriate, or other tools covered later in the course.
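The equal-spread rule of thumb is easy to compute by hand, but a small helper makes the two ratios explicit. This is a minimal sketch: the function name `spread_rule_ok`, the `max_ratio` parameter, and the two data lists are hypothetical, and the numbers are made up purely for illustration.

```python
from statistics import stdev

def spread_rule_ok(group1, group2, max_ratio=2.0):
    """Return (sd_ratio, size_ratio, ok) for the equal-spread rule of thumb.

    ok is True when both the larger-to-smaller SD ratio and the
    larger-to-smaller sample-size ratio are below max_ratio.
    Assumes both groups have a nonzero sample standard deviation.
    """
    sd1, sd2 = stdev(group1), stdev(group2)
    n1, n2 = len(group1), len(group2)
    sd_ratio = max(sd1, sd2) / min(sd1, sd2)
    size_ratio = max(n1, n2) / min(n1, n2)
    ok = sd_ratio < max_ratio and size_ratio < max_ratio
    return sd_ratio, size_ratio, ok

# Made-up numbers, not data from the course.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
sd_ratio, size_ratio, ok = spread_rule_ok(a, b)
print(f"SD ratio {sd_ratio:.2f}, size ratio {size_ratio:.2f}, rule satisfied: {ok}")
```

Here the group sizes are close (6 versus 5) but the SD ratio exceeds 2, so the rule of thumb suggests considering a transformation before relying on the pooled t-tools.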

Case Study 3.1.1: Cloud Seeding
A randomized experiment was conducted to test the hypothesis that massive injection of silver iodide into cumulus clouds can lead to increased rainfall. On each of 52 days that were deemed suitable for cloud seeding, a random mechanism was used to decide whether to seed the target cloud on that day or leave it unseeded as a control. An airplane flew through the cloud in both cases, and the experimenters were blind to whether seeding was used (a double-blind trial).
Question of interest: Did cloud seeding cause higher rainfall in this experiment?

The log transformation
Let log denote the natural logarithm (base e, also written ln): $\log(x) = c$ means $e^{c} = x$, so $\log(2.718) \approx 1$, $\log(7.389) \approx 2$, etc.
Procedure:
–Transform to get two new columns: $Z_1 = \log(Y_1)$ and $Z_2 = \log(Y_2)$.
–Graphically examine to see if the t-tools are appropriate for $Z_1$ and $Z_2$.
–If appropriate, use t-tools on $Z_1$ and $Z_2$.
–Interpret results on the original scale.

Cloud seeding data after log transformation

Interpretation – Causal Inference
If the randomized experiment model with additive treatment effect is thought to hold for the log-transformed data, then an experimental unit that would respond to treatment 1 with a logged outcome of $\log(Y)$ would respond to treatment 2 with a logged outcome of $\log(Y) + \delta$; i.e., the experimental unit responds to treatment 1 with an outcome of $Y$ and to treatment 2 with an outcome of $Y e^{\delta}$.
Multiplicative treatment effect model: The effect of treatment 2 is to multiply the treatment 1 outcome by $e^{\delta}$.
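The step from an additive effect on the log scale to a multiplicative effect on the original scale is a single exponentiation. Writing $\delta$ for the additive treatment effect on the log scale, and $Y_{\text{trt 1}}$, $Y_{\text{trt 2}}$ for the same unit's outcomes under the two treatments (subscript labels introduced here for illustration):

```latex
\begin{align*}
\log(Y_{\text{trt 2}}) &= \log(Y_{\text{trt 1}}) + \delta \\
Y_{\text{trt 2}} &= \exp\bigl(\log(Y_{\text{trt 1}}) + \delta\bigr)
                  = Y_{\text{trt 1}} \, e^{\delta}
\end{align*}
```

For example, $\delta = \log 2 \approx 0.693$ corresponds to treatment 2 doubling the treatment 1 outcome.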

Inference for multiplicative treatment effects
To test whether there is any treatment effect, perform the usual t-test of $\delta = 0$ with the log-transformed data, where $\delta$ is the additive treatment effect on the log scale.
To describe the treatment effect, “back-transform” (exponentiate) the estimate of $\delta$ and the endpoints of the confidence interval for $\delta$ from the log-transformed data.

Log Transformation for Population Inference
Consider comparing the means of two populations. If the populations appear skewed, with the population having the larger center also having the larger spread, using the t-tools to analyze the log-transformed data might be more appropriate. Using the t-tools on the log-transformed data is appropriate (i.e., produces approximately valid results) if $Z_1 = \log(Y_1)$ and $Z_2 = \log(Y_2)$ are approximately normally distributed.

Inference for Population Medians
If the distributions of $Z_1 = \log(Y_1)$ and $Z_2 = \log(Y_2)$ appear approximately normal with equal SD, then we can make inferences about the ratio of the population medians of $Y_1$ and $Y_2$ as follows:
–To test if the population medians are the same, test the null hypothesis that the means of $Z_1$ and $Z_2$ are the same.
–An estimate of the ratio of the population 2 median to the population 1 median is $\exp(\bar{Z}_2 - \bar{Z}_1)$.
–To form a confidence interval for the ratio of population medians, form a confidence interval $(L, U)$ for the difference in the means of $Z_2$ and $Z_1$ (population 2 minus population 1). A confidence interval for the ratio of the population 2 median to the population 1 median is $(e^{L}, e^{U})$.
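The back-transformation can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the course's JMP workflow: the function name `median_ratio_ci` is hypothetical, the data are made up, a pooled (equal-SD) standard error is used, and the t critical value is passed in by hand because the standard library has no t distribution (2.306 is the 0.975 quantile for 8 degrees of freedom).

```python
import math
from statistics import mean, variance

def median_ratio_ci(y1, y2, t_crit):
    """Estimate and confidence interval for median(pop 2) / median(pop 1).

    Works on the log scale with a pooled-variance two-sample interval,
    then exponentiates the estimate and both endpoints.
    t_crit must be the t quantile for len(y1) + len(y2) - 2 degrees of freedom.
    """
    z1 = [math.log(y) for y in y1]
    z2 = [math.log(y) for y in y2]
    n1, n2 = len(z1), len(z2)
    diff = mean(z2) - mean(z1)  # difference in log-scale means, group 2 minus group 1
    pooled_var = ((n1 - 1) * variance(z1) + (n2 - 1) * variance(z2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    lower, upper = diff - t_crit * se, diff + t_crit * se
    return math.exp(diff), (math.exp(lower), math.exp(upper))

# Made-up positive data in which every group 2 value is 3 times a group 1 value.
y1 = [1.0, 2.0, 4.0, 8.0, 16.0]
y2 = [3.0, 6.0, 12.0, 24.0, 48.0]
ratio, (ci_low, ci_high) = median_ratio_ci(y1, y2, t_crit=2.306)
print(f"estimated ratio of medians: {ratio:.3f}")
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

Because the second group is exactly three times the first, the estimated ratio is 3; the interval is wide only because each made-up group has five widely spread values.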

When to use log transformation
What indicates that log might work?
–Distributions are skewed.
–Spread is greater in the distribution with the larger center.
–The data values differ by orders of magnitude; e.g., as a rough guide, the ratio of the largest to the smallest value is >10 (or perhaps >4).
–A multiplicative statement is desirable.
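The largest-to-smallest ratio in the rough guide above is a one-line computation. The helper below is a hypothetical sketch (the name `max_min_ratio` and the sample values are made up), and it rejects non-positive values because the log is undefined for them.

```python
def max_min_ratio(values):
    """Ratio of the largest to the smallest value; requires all values > 0."""
    smallest = min(values)
    if smallest <= 0:
        raise ValueError("log transformation requires strictly positive data")
    return max(values) / smallest

# Made-up measurements spanning several orders of magnitude.
measurements = [0.5, 3.0, 40.0, 250.0]
ratio_max_min = max_min_ratio(measurements)
print(ratio_max_min, ratio_max_min > 10)
```

A ratio this far above 10 is one indication, alongside skewness and unequal spread, that the log scale is worth examining graphically.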

Other transformations
Square root transformation: applies to data that are counts and to measurements of area.
Reciprocal transformation: applies to data that are waiting times (e.g., time to failure of lightbulbs); the reciprocal of a time measurement can often be interpreted directly as a rate or a speed.
Goal of transformation: Establish a scale on which the two groups have roughly the same spread.
–Inferences from the log transformation are directly interpretable when converted back to the original scale of measurement. Other transformations are not so easily interpretable; e.g., the square of the difference between the means of $\sqrt{Y_1}$ and $\sqrt{Y_2}$ is not so easily interpretable.