TROUBLESOME CONCEPTS IN STATISTICS: r² AND POWER

TROUBLESOME CONCEPTS IN STATISTICS: r² AND POWER
N. Scott Urquhart
Director, STARMAP
Department of Statistics
Colorado State University
Fort Collins, CO 80523-1877

STARMAP FUNDING
Space-Time Aquatic Resources Modeling and Analysis Program
The work reported here today was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and STARMAP, the Program he represents. EPA does not endorse any products or commercial services mentioned in this presentation.
This research is funded by U.S. EPA – Science To Achieve Results (STAR) Program, Cooperative Agreement # CR-829095

INTENT FOR TODAY
- To discuss two topics that have caused some of you a bit of confusion:
  - r² in regression
  - Power in the context of tests of hypotheses
- Thanks to Ann Brock and Harriett Bassett for suggesting these topics
- Approach: visually illustrate the idea, then talk about the concepts illustrated
- The sequences of graphs are available on the internet right now (the address is at the end of this handout)
- Questions are welcome

r² IN REGRESSION
- r² provides a summary of the strength of a (linear) regression, which reflects:
  - The relative size of the residual variability,
  - The slope of the fitted line, and
  - How good the observed values of the predictor variable are for prediction (mainly the range of the Xs)
- Let's see these features in action, then look at the formulas

WHAT MAKES r² TICK?
Varying one thing, leaving the remaining things fixed:
- r² increases as residual variation decreases
- r² increases as the slope increases
- r² increases as the range of x increases
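
The three effects listed above can be checked with a small sketch. It assumes the textbook population relationship for simple (bivariate) linear regression, r² = slope² · Var(x) / (slope² · Var(x) + residual variance); this formula and the helper name `population_r2` are illustrative assumptions, not taken from the slides, and the numeric settings are made up.

```python
def population_r2(slope, x_variance, residual_variance):
    """Population r-squared for y = intercept + slope * x + error.

    Assumes simple linear regression with errors independent of x.
    """
    explained = slope ** 2 * x_variance
    return explained / (explained + residual_variance)


# Made-up baseline, then change one ingredient at a time.
baseline = population_r2(slope=1.0, x_variance=4.0, residual_variance=4.0)
less_noise = population_r2(slope=1.0, x_variance=4.0, residual_variance=1.0)
steeper = population_r2(slope=2.0, x_variance=4.0, residual_variance=4.0)
wider_x = population_r2(slope=1.0, x_variance=16.0, residual_variance=4.0)

print(f"baseline:                 {baseline:.2f}")
print(f"less residual variation:  {less_noise:.2f}")
print(f"steeper slope:            {steeper:.2f}")
print(f"wider range of x:         {wider_x:.2f}")
```

Each change raises r² above the baseline while the other two ingredients stay fixed, which is the pattern the graph sequences are meant to show.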

WHAT IS r²?
- r² provides a measure of the fit of a line to a set of data which incorporates:
  - The amount of residual variation,
  - The strength of the line (slope), and
  - How good the set of values of "x" are for estimating the line
- Some areas of endeavor tend to overuse it!

HOW DOES r² TELL US ABOUT VARIATION?
The following graph illustrates this:
- The data scatter has r² = 0.5 (approximately)
- The red points have the same values, but all concentrated at X = 5.
{Strictly speaking the above formulas apply only in the case of bivariate regression.}
{Estimation formulas involve factors of n − 1 and n − 2.}

r²

FORMULAS FOR r²
- But these have little intuitive appeal!
- We'll decompose observations into parts:
  - Mean
  - Regression
  - Residual

DECOMPOSING REGRESSION
- This is really n equations
- Square each of these equations and add them up across i.
- The three cross product terms will each add to zero. (Try it!)
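
The "Try it!" above can be done numerically. The sketch below assumes the usual decomposition of each observation into mean + regression + residual, that is, yᵢ = ȳ + (ŷᵢ − ȳ) + (yᵢ − ŷᵢ) with ŷᵢ from a least-squares line. The equation itself did not survive in this transcript, so that form is an assumption, and the data are made up for illustration.

```python
from statistics import mean

# Made-up illustrative data (not from the presentation).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]


def dot(u, v):
    """Sum of elementwise products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


# Least-squares fit of y on x.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
syy = sum((yi - y_bar) ** 2 for yi in y)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# Decompose every observation: y_i = mean + regression + residual.
mean_part = [y_bar] * len(y)
regression_part = [fi - y_bar for fi in fitted]
residual_part = [yi - fi for yi, fi in zip(y, fitted)]

# The three cross-product sums should each be zero (up to rounding).
cross_terms = (
    dot(mean_part, regression_part),
    dot(mean_part, residual_part),
    dot(regression_part, residual_part),
)

# So the sums of squares add up, and r-squared is the regression share
# of the variation about the mean.
ss_total = dot(y, y)
ss_mean = dot(mean_part, mean_part)
ss_regression = dot(regression_part, regression_part)
ss_residual = dot(residual_part, residual_part)
r2 = ss_regression / (ss_regression + ss_residual)
r2_from_correlation = sxy ** 2 / (sxx * syy)

print("cross terms:", cross_terms)
print("sum of y squared:", ss_total)
print("mean + regression + residual:", ss_mean + ss_regression + ss_residual)
print("r squared:", r2)
```

Because the cross terms vanish, the two routes to r² (share of sums of squares, and squared correlation) agree on any data set fitted by least squares.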

DECOMPOSING REGRESSION (continued)

POWER OF A TEST OF HYPOTHESIS
- Power = Prob("Being right") = Prob(Rejecting a false hypothesis)
- Power depends on two main things:
  - The difference between the hypothesized and true situations, and
  - The strength of the information for making the test
    - Sample size is a very important factor
    - In regression it depends on the same factors as the ones which increase r²
- Again, see it, then talk about it
- Power increases as Δ = μ1 − μ2 increases

POWER VARIES WITH DIFFERENCE (Δ = μ1 − μ2) and SAMPLE SIZE (n)

ON TESTS OF HYPOTHESES (ON THE WAY TO POWER)

                                      TRUE SITUATION
ACTION                                HYPOTHESIS TRUE    HYPOTHESIS FALSE
FAIL TO REJECT THE NULL HYPOTHESIS    CORRECT ACTION     TYPE II ERROR
REJECT THE NULL HYPOTHESIS            TYPE I ERROR       CORRECT ACTION

- Tests of hypotheses are designed to control α = Prob(Type I Error)
- While getting Power = 1 − Prob(Type II Error) as large as possible

ON TESTS OF HYPOTHESES (AN ASIDE)
- Which is worse, a type I error or a type II error?
- It depends tremendously on perspective
- Consider the criminal justice system:
  - Truth: Accused is innocent (H0) or guilty (HA)
  - Action: Accused is acquitted or convicted
  - Type I error = Convict an innocent person
  - Type II error = Acquit a guilty person
- Which is worse? Consider the difference in view of:
  - The accused
  - Society, especially if the accused is a terrorist

COMPUTING THE CRITICAL REGION
- Consider a simple case: X ~ N(μ, 1)
- H0: μ = 4 versus HA: μ ≠ 4
- Critical Region (CR) is X ≤ l or X ≥ u, so
  0.025 = P(X ≤ l) = P((X − 4)/1 ≤ (l − 4)/1) = P(Z ≤ −1.96)
- So l = 4 − 1.96 = 2.04; similarly, u = 5.96

COMPUTING POWER
- Consider a simple case: X ~ N(μ, 1)
- H0: μ = 4 versus HA: μ ≠ 4
- Power (at μ = 5) = ? = Prob(X_A in CR | μ = 5), where X_A ~ N(5, 1)
- Prob(X_A ≤ 2.04) + Prob(X_A ≥ 5.96)
  = Prob(Z ≤ −2.96) + Prob(Z ≥ 0.96)
  = 0.0015 + 0.1685
  = 0.1700
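
The critical region and the power at μ = 5 can be reproduced with Python's standard library. This is a minimal sketch of the same single-observation case, assuming a two-sided test at the 5% level (0.025 in each tail, as on the slide); the variable names are illustrative.

```python
from statistics import NormalDist

mu_0 = 4.0      # mean under H0
sigma = 1.0     # known standard deviation of X
alpha = 0.05    # two-sided level: 0.025 in each tail
mu_true = 5.0   # the alternative at which power is evaluated

# Critical region: reject H0 when X <= lower or X >= upper.
z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
lower = mu_0 - z * sigma
upper = mu_0 + z * sigma

# Power: probability that X lands in the critical region when mu = mu_true.
under_alternative = NormalDist(mu=mu_true, sigma=sigma)
power = under_alternative.cdf(lower) + (1 - under_alternative.cdf(upper))

print(f"critical region: X <= {lower:.2f} or X >= {upper:.2f}")
print(f"power at mu = {mu_true}: {power:.4f}")
```

The small difference from a hand calculation comes only from using the unrounded normal quantile instead of 1.96.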

COMPUTING POWER USING A MEAN BASED ON n = 2 OBSERVATIONS
- Consider a simple case: the mean of two observations follows X̄ ~ N(μ, 1/2), so its standard deviation is 1/√2 ≈ 0.707
- H0: μ = 4 versus HA: μ ≠ 4
- Power (at μ = 5) = ?
- Critical Region (CR) is X̄ ≤ l or X̄ ≥ u, so
  0.025 = P(X̄ ≤ l) = P((X̄ − 4)/0.707 ≤ (l − 4)/0.707) = P(Z ≤ −1.96)
- So l = 4 − (1.96)(0.707) = 2.61; similarly, u = 5.39

COMPUTING POWER USING A MEAN BASED ON n = 2 OBSERVATIONS (continued)
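
The content of this continuation slide did not survive in the transcript. As a hedged sketch of the calculation it presumably carried, the function below generalizes the single-observation power computation to the mean of n independent N(μ, 1) observations, whose standard deviation is 1/√n. The function name `power_of_mean_test` and its defaults are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_of_mean_test(mu_true, n, mu_0=4.0, sigma=1.0, alpha=0.05):
    """Power of the two-sided z-test of H0: mu = mu_0 based on a mean of n.

    Assumes independent normal observations with known sigma.
    """
    std_error = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu_0 - z * std_error
    upper = mu_0 + z * std_error
    mean_dist = NormalDist(mu=mu_true, sigma=std_error)
    return mean_dist.cdf(lower) + (1 - mean_dist.cdf(upper))


powers = {n: power_of_mean_test(mu_true=5.0, n=n) for n in (1, 2, 4)}
for n, p in powers.items():
    print(f"n = {n}: power at mu = 5 is {p:.3f}")
```

At μ = 5 the power comes out at roughly 0.17 for n = 1, 0.29 for n = 2 and 0.52 for n = 4, so it grows with sample size. Evaluating the function at μ = 4 returns α itself, since there the null hypothesis is true.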

COMPUTING POWER USING A MEAN BASED ON n = 4 OBSERVATIONS (continued) (This page is not in the handout – so it all would fit on one page)

DIRECTIONAL NOTE As the alternative has been two-sided throughout this presentation, the power curves are symmetric about the vertical axis. By examining only the positive side, we can see the curves twice as large.

YOU HAVE ACCESS TO THESE PRESENTATIONS
- You can find each of the slide shows shown here today at: http://www.stat.colostate.edu/starmap/learning.html
- Each show begins with authorship & funding slides
- You are welcome to use them, and adapt them
- But please always acknowledge source and funding
- You are free to reorder the graphs if it makes more sense for r² to decrease than increase.
- Urquhart is available to talk to AP Stat classes about statistics as a profession. See content on the web site above.