Critical Analysis. Key Ideas When evaluating claims based on statistical studies, you must assess the methods used for collecting and analysing the data.

Critical Analysis

Key Ideas
When evaluating claims based on statistical studies, you must assess the methods used for collecting and analysing the data. Some critical questions are:
– Is the sampling process free from intentional and unintentional bias?
– Could any outliers or extraneous variables influence the results?
– Are there any unusual patterns that suggest the presence of a hidden variable?
– Has causality been inferred with only correlational evidence?

Example 1: Sample Size and Technique
A manager wants to know if a new aptitude test accurately predicts employee productivity. The manager has all 30 current employees write the test and then compares their scores to their productivities as measured in the most recent performance reviews. The data is ordered alphabetically by employee surname. In order to simplify the calculations, the manager selects a systematic sample using every seventh employee. Based on this sample, the manager concludes that the company should hire only applicants who do well on the aptitude test. Determine whether the manager's analysis is valid.

[Table: Test Score and Productivity for the sampled employees; values not included in the transcript]

Analysis
A linear regression on the sample yields a line of best fit with the equation y = 0.552x + [constant term missing in the transcript] and a correlation coefficient of r = 0.98, which indicates a strong linear correlation between productivity and scores on the aptitude test. These calculations seem to support the manager's conclusion. However, the manager has made the questionable assumption that a systematic sample will be representative of the population. Taking every seventh of 30 employees leaves only four or five data points, a sample so small that statistical fluctuations could seriously affect the results.
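To see how small the manager's sample is, the sketch below enumerates every systematic sample that "every seventh employee" can produce from an alphabetical list of 30. The employees are represented only by their positions 1 to 30, and the function name `systematic_sample` is a hypothetical helper introduced for illustration, since the transcript does not say where the manager started counting.

```python
# Positions 1..30 stand in for the 30 employees in alphabetical order.
employees = list(range(1, 31))
STEP = 7  # "every seventh employee"


def systematic_sample(population, step, start):
    """Return every `step`-th item, beginning at 0-based index `start`."""
    return population[start::step]


# One possible sample for each starting position within the first seven.
samples = {
    start + 1: systematic_sample(employees, STEP, start)
    for start in range(STEP)
}

for first, chosen in samples.items():
    print(f"start at employee {first}: {chosen} (n = {len(chosen)})")

# Distinct sample sizes that can occur, in ascending order.
sample_sizes = sorted({len(chosen) for chosen in samples.values()})
print("possible sample sizes:", sample_sizes)
```

Whatever the starting point, the regression in the analysis above rests on no more than five of the 30 available data points.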

Instead, examine all the data
A scatter plot with all 30 data points does not show any clear correlation at all. A linear regression yields a line of best fit with the equation y = 0.146x + 60 and a correlation coefficient of only 0.154.

Conclusion
The new aptitude test will probably be useless for predicting employee productivity. The sample was far from representative. The manager's choice of an inappropriate sampling technique has resulted in a sample too small to support any valid conclusions. The manager should have done an analysis using all of the data available. Even then, the data set is still somewhat small to use as a basis for a major decision such as changing the company's hiring procedures. Small samples are also particularly vulnerable to the effects of outliers.
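The vulnerability of small samples to outliers can be illustrated with a short calculation. The sketch below uses hypothetical values chosen only for illustration (they are not the manager's data, which the transcript does not include) and a hand-written `pearson_r` helper, so every name in it is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical four-point sample with a weak negative trend.
x_small = [1, 2, 3, 4]
y_small = [2, 1, 2, 1]

# The same sample with a single extreme point added.
x_with_outlier = x_small + [10]
y_with_outlier = y_small + [10]

r_without_outlier = pearson_r(x_small, y_small)             # about -0.45
r_with_outlier = pearson_r(x_with_outlier, y_with_outlier)  # about 0.92

print(f"r without the outlier: {r_without_outlier:.2f}")
print(f"r with the outlier:    {r_with_outlier:.2f}")
```

In this made-up sample, one extra point is enough to turn a weak negative correlation into what looks like a strong positive one.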

Example 2: Extraneous Variables and Sample Bias
An advertising blitz by SuperFast Computer Training Inc. features profiles of some of its young graduates. The number of months of training that these graduates took, their job titles, and their incomes appear prominently in the advertisements.

Graduate                         Months of Training   Income ($000s)
Sarah (software developer)       9                    85
Zack (programmer)                6                    63
Eli (systems analyst)            8                    72
Yvette (computer technician)     5                    52
Kulwinder (web-site designer)    6                    66
Lynn (network administrator)     4                    60

Question a) Analyze the company's data to determine the strength of the linear correlation between the amount of training the graduates took and their incomes. Classify the linear correlation and find the equation of the linear model for the data.

Analysis
The scatter plot for income versus months of training shows a definite positive linear correlation. The regression line is y = 5.44x + 31.9, and the correlation coefficient is 0.90. There appears to be a strong positive correlation between the amount of training and income.

Question b) Use this model to predict the income of a student who graduates from the company's two-year diploma program after 20 months of training. Does this prediction seem reasonable?

Analysis
y = 5.44x + 31.9
y = 5.44(20) + 31.9
y ≈ 141
The linear model predicts that a graduate who has taken 20 months of training will make about $141 000 a year. This amount is extremely high for a person with a two-year diploma and little or no job experience. The prediction suggests that the linear model may not be accurate, especially when applied to the company's longer programs, because 20 months lies far outside the 4 to 9 months of training covered by the data.
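The regression and the 20-month prediction can be checked directly from the six advertised data points. The following is a minimal sketch using the standard least-squares formulas; the function name `least_squares` and the variable names are assumptions made for illustration, and the original slides presumably relied on a calculator or spreadsheet instead.

```python
from math import sqrt

# Data from the advertisement: months of training and income in $000s.
months = [9, 6, 8, 5, 6, 4]
income = [85, 63, 72, 52, 66, 60]


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(months, income)
print(f"y = {slope:.2f}x + {intercept:.1f}, r = {r:.2f}")

# Predict the income for 20 months of training.
target_months = 20
predicted = slope * target_months + intercept
print(f"predicted income: about ${predicted:.0f} thousand")

# Flag whether the prediction lies outside the range of the observed data.
is_extrapolation = not (min(months) <= target_months <= max(months))
print("extrapolation beyond the data:", is_extrapolation)
```

The extrapolation flag makes the weakness of the prediction explicit: the model is being applied at more than twice the longest training period in the sample.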

Question c) Does the linear correlation show that SuperFast's training accounts for the graduates' high incomes? Identify possible extraneous variables.

Analysis
Although the correlation between SuperFast's training and the graduates' incomes appears to be quite strong, the correlation by itself does not prove that the training causes the graduates' high incomes. A number of extraneous variables could contribute to the graduates' success, including
– experience prior to taking the training
– aptitude for working with computers
– access to a high-end computer at home
– family or social connections in the industry
– physical stamina to work very long hours

Question d) Discuss any problems with the sampling technique and the data.

Analysis
The sample is small and could have intentional bias. There is no indication that the individuals in the advertisements were randomly chosen from the population of SuperFast's students. The company may have selected the best success stories in order to give potential customers inflated expectations of future earnings. Also, the company shows youthful graduates, but does not actually state that the graduates earned their high incomes immediately after graduation. The amounts given are incomes, not salaries. The income of a graduate working for a small start-up company might include stock options that could turn out to be worthless. In short, the advertisements do not give you enough information to properly evaluate the data.

Example 3: Detecting a Hidden Variable
An arts council is considering whether to fund the start-up of a local youth orchestra. The council has a limited budget and knows that the number of youth orchestras in the province has been increasing. The council needs to know whether starting another youth orchestra will help the development of young musicians. One measure of the success of such programs is the number of youth-orchestra players who go on to professional orchestras. The council has collected the following data.

[Table: Year, Number of Youth Orchestras, Number of Players Becoming Professionals; values not included in the transcript]

Question a) Does a linear regression allow you to determine whether the council should fund a new youth orchestra? Can you draw any conclusions from other analysis?

Analysis
A scatter plot of the number of youth-orchestra members who become professionals versus the number of youth orchestras shows a weak positive linear correlation. The correlation coefficient is 0.16. Taken alone, this result suggests the conclusion that starting another youth orchestra will not help the development of young musicians. However, notice the two clusters in the scatter plot. This pattern suggests the presence of a hidden variable. You need more information to determine the nature and effect of the possible hidden variable.

Question b) Suppose you discover that one of the country's professional orchestras went bankrupt in [year missing in the transcript]. How does this information affect your analysis?

Analysis
The collapse of a major orchestra means
– there is one fewer orchestra hiring young musicians
– about a hundred experienced players are suddenly available for work with the remaining professional orchestras
The resulting drop in the number of young musicians hired by professional orchestras could account for the clustering of data points you observed in part a). Because of the change in the number of jobs available for young musicians, it makes sense to analyze the clusters separately.

Analyzed separately, both sets of data exhibit a strong linear correlation. The correlation coefficients are 0.93 for the data prior to 1997 and 0.94 for the data from 1997 on. The number of players who go on to professional orchestras is strongly correlated to the number of youth orchestras, so funding the new orchestra may be a worthwhile project for the arts council. The presence of a hidden variable, the collapse of a major orchestra, distorted the data and masked the underlying pattern. However, splitting the data into two sets results in smaller sample sizes, so you still have to be cautious about drawing conclusions.
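The masking effect described above can be reproduced with a small calculation. Because the council's table is not included in the transcript, the sketch below uses hypothetical numbers invented purely for illustration: two periods that each follow a clear upward trend, with the second period shifted downward as it might be after a major orchestra collapses. The names `pearson_r`, `orchestras_before`, and so on are likewise assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, NOT the council's figures.
# Earlier period: number of youth orchestras and players turning professional.
orchestras_before = [10, 11, 12, 13, 14]
professionals_before = [20, 23, 24, 25, 28]

# Later period: same upward trend, but starting from a lower level.
orchestras_after = [15, 16, 17, 18, 19]
professionals_after = [18, 21, 22, 23, 26]

r_pooled = pearson_r(orchestras_before + orchestras_after,
                     professionals_before + professionals_after)
r_before = pearson_r(orchestras_before, professionals_before)
r_after = pearson_r(orchestras_after, professionals_after)

print(f"pooled data:    r = {r_pooled:.2f}")   # weak, about 0.14
print(f"earlier period: r = {r_before:.2f}")   # strong, about 0.98
print(f"later period:   r = {r_after:.2f}")    # strong, about 0.98
```

With these invented values, the pooled correlation is weak even though each period on its own is strongly correlated, which is the same pattern the two clusters reveal in the council's data.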

Conclusions
Although the major media are usually responsible in how they present statistics, you should be cautious about accepting any claim that does not include information about the sampling technique and the analytical methods used. Intentional or unintentional bias can invalidate statistical claims. Small sample sizes and inappropriate sampling techniques can distort the data and lead to erroneous conclusions. Extraneous variables must be eliminated or accounted for. A hidden variable can skew statistical results and yet be hard to detect.