Chapter 10: Re-expressing Data, by Sai Machineni and Hang Ha. AP STATISTICS.


1 Chapter 10: Re-expressing Data, by Sai Machineni and Hang Ha. AP STATISTICS

2 Re-express Data
●We re-express data by taking the logarithm, the square root, the reciprocal, or some other mathematical function of every value in the data set.

3 Goals of Re-expression
●Goal 1: Make the distribution of a variable more symmetric:
-A symmetric distribution is easier to summarize, because we can use the mean and SD. If it is also unimodal, it may be close enough to the Normal model to use the 68-95-99.7 rule.
●Goal 2: Make the spread of several groups more alike:
-Groups that share a common spread are easier to compare.

4 Goals of Re-expression
●Goal 3: Make the form of a scatterplot more nearly linear:
-The greatest value of re-expression may be that we can fit a linear model once the relationship is straight.
●Goal 4: Make the scatter in a scatterplot spread out evenly rather than following a fan shape.
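Goal 1 can be illustrated with a small sketch. The values below are made up for illustration (they are not from the slides): each one is double the previous, which gives a strongly right-skewed set. Comparing the mean with the median before and after taking logs is one rough way to see the distribution become more symmetric.

```python
import math
import statistics

# Made-up, right-skewed values (each is double the previous one).
values = [1, 2, 4, 8, 16, 32, 64, 128, 256]

# Re-express every value with the base-10 logarithm.
logs = [math.log10(v) for v in values]

# In a symmetric distribution the mean and median roughly agree,
# so the gap between them is a crude check for skewness.
raw_gap = statistics.mean(values) - statistics.median(values)
log_gap = statistics.mean(logs) - statistics.median(logs)

print("mean - median, raw data:", round(raw_gap, 2))
print("mean - median, log data:", round(log_gap, 6))
```

On the raw scale the mean sits far above the median; on the log scale the two essentially coincide, so the mean and SD become reasonable summaries.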

5 Ladder of Powers
●The Ladder of Powers places in order the effects that many re-expressions have on the data.
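The slide does not list the rungs, so the sketch below assumes the set commonly used in AP Statistics courses: 2, 1, 1/2, "0" (the logarithm), -1/2, and -1. The function name `reexpress` and the sample values are hypothetical. Negative powers are negated here so that larger data values stay larger after re-expression, which is a common convention rather than a requirement.

```python
import math

# Assumed rungs of the ladder, from strongest "stretch" to strongest "squeeze".
LADDER = [2, 1, 0.5, 0, -0.5, -1]

def reexpress(value, power):
    """Re-express one positive value using a rung of the ladder of powers."""
    if value <= 0:
        raise ValueError("this sketch assumes positive data values")
    if power == 0:
        return math.log10(value)   # the "0" rung stands for the logarithm
    if power < 0:
        return -(value ** power)   # negate so the order of the data is preserved
    return value ** power

# Made-up values spanning several orders of magnitude.
data = [1, 10, 100, 1000]
for power in LADDER:
    print(power, [round(reexpress(v, power), 3) for v in data])
```

Moving down the ladder pulls in the large values more and more, which is why lower rungs are tried on data that are skewed to the right.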

6 Attack of the Logarithms
●Use when none of the data values is zero or negative.
●Try taking the logs of both the x-variable and the y-variable.
●Then re-express the data using some combination.

Model Name | X-axis | Y-axis | Comment
Exponential | x | log(y) | This model is the "0" power in the ladder; useful when values grow by a percent increase.
Logarithmic | log(x) | y | When a scatterplot descends rapidly at the left.
Power | log(x) | log(y) | When one ladder power is too big and the next is too small.
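The three combinations in the table can be compared with a short sketch. The data here are made up to follow an exact power law, y = 3x^2, and `fit_line` is a hypothetical helper that does ordinary least squares by hand. Under that assumption the log(x) vs. log(y) plot should be perfectly straight, with a slope equal to the power.

```python
import math

def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)

# Made-up data following y = 3 * x^2 exactly.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3 * v ** 2 for v in x]
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]

# Fit a straight line to each re-expression named in the table.
exponential = fit_line(x, log_y)       # x vs. log(y)
logarithmic = fit_line(log_x, y)       # log(x) vs. y
power = fit_line(log_x, log_y)         # log(x) vs. log(y)

for name, (a, b, r2) in [("exponential", exponential),
                         ("logarithmic", logarithmic),
                         ("power", power)]:
    print(f"{name:12s} intercept={a:.3f} slope={b:.3f} R^2={r2:.4f}")
```

R^2 is printed only as a quick summary; a residuals plot is still the better check on whether a re-expression has straightened the relationship.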

7 Why Not a Curve?
●We can find "curves of best fit" using the same approach that led us to linear models.
●For many reasons, it is usually better to re-express the data to straighten the plot.

8 What Can Go Wrong?
●Don't expect your model to be perfect
●Don't choose a model based on R^2 alone
●Beware of multiple modes
●Watch out for scatterplots that turn around
●Watch out for negative data values
●Watch out for data far from 1
●Don't stray too far from the ladder

9 Example 1 (#27)
Problem: Researchers studying how a car's gas mileage varies with its speed drove a compact car 200 miles at various speeds on a test track. Their data are shown in the table.

Speed (mph) | 35 | 40 | 45 | 50 | 55 | 60 | 65 | 70 | 75
Miles per gal | 25.9 | 27.7 | 28.5 | 29.5 | 29.2 | 27.4 | 26.4 | 24.2 | 22.8

Create a linear model for this relationship and report any concerns you may have about the model.
Answer: Mileage rises to a peak near 50 mph and then falls, so the scatterplot turns around. No re-expression from this chapter can straighten a relationship like that, so a linear model is not appropriate.
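The concern in Example 1 can be checked numerically. The sketch below fits a least-squares line to the table's data by hand (`fit_line` is a hypothetical helper, not a calculator command) and prints the residuals; a run of negative, then positive, then negative residuals is the signature of a plot that turns around.

```python
def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from the Example 1 table.
speed = [35, 40, 45, 50, 55, 60, 65, 70, 75]
mpg = [25.9, 27.7, 28.5, 29.5, 29.2, 27.4, 26.4, 24.2, 22.8]

intercept, slope = fit_line(speed, mpg)
residuals = [y - (intercept + slope * x) for x, y in zip(speed, mpg)]

print(f"mpg-hat = {intercept:.2f} + ({slope:.4f}) * speed")
for x, r in zip(speed, residuals):
    print(f"{x} mph: residual {r:+.2f}")
```

The fitted slope is negative, but the residuals are below the line at both ends and above it in the middle, so the line misses the rise-then-fall pattern entirely.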

10 Example 2 (#31)
Problem: It's often difficult to find the ideal model for situations in which the data are strongly curved. The table below shows the rapid growth of the number of academic journals published on the Internet during the last decade.

Year (L1) | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997
Number of Journals (L2) | 27 | 36 | 45 | 181 | 306 | 1093 | 2459

a. Create a good model to describe this growth.
log(journals) = -686.76 + 0.346(year), where year is the calendar year
Step 1: Enter the data in STAT > Edit: L1 = Year (coded 0-6, with 1991 as year 0) and L2 = Journals.
Step 2: Check the residuals of the straight-line fit: STAT > CALC > LinReg(a+bx) L1, L2.
Step 3: Start re-expressing: find the log of journals. On the calculator, type log(L2) STO L3 (this stores the logs in L3).
Step 4: Check the scatterplot of the re-expressed data by changing the STAT PLOT settings to Xlist: L1 and Ylist: L3, then press ZOOM 9 (ZoomStat).
Step 5: Test the residuals: perform the regression of log(journals) vs. year with STAT > CALC > 8:LinReg(a+bx) L1, L3, Y1.
Step 6: In STAT PLOT, change Ylist to RESID to view the residuals plot.
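The calculator steps can be mirrored off-calculator as a check. This is a sketch, not the authors' method: `fit_line` is a hypothetical helper doing least squares by hand, and the regression is run on the calendar year so that the coefficients can be compared with the equation on the slide.

```python
import math

def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy ** 2 / (sxx * syy)

# Data from the Example 2 table.
year = [1991, 1992, 1993, 1994, 1995, 1996, 1997]
journals = [27, 36, 45, 181, 306, 1093, 2459]

# Steps 1-2: straight-line fit to the raw counts.
raw_a, raw_b, raw_r2 = fit_line(year, journals)

# Steps 3-5: re-express with log10, then fit again.
log_journals = [math.log10(j) for j in journals]
log_a, log_b, log_r2 = fit_line(year, log_journals)

print(f"raw:  journals-hat = {raw_a:.1f} + {raw_b:.1f}(year), R^2 = {raw_r2:.3f}")
print(f"log:  log(journals)-hat = {log_a:.2f} + {log_b:.3f}(year), R^2 = {log_r2:.3f}")
```

The log fit should land close to the slide's intercept of -686.76 and slope of 0.346. If the years are coded 0-6 as in Step 1, the slope is unchanged but the intercept is different, because it then describes 1991 rather than year 0 of the calendar.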

11 Example 2 Continued
b. Use your model to estimate the number of electronic journals in the year 2000.
To estimate the journals for the year 2000, we must remember that in entering our data we designated 1991 as year 0. That means we'll use 9 for the year 2000 and evaluate Y1(9). Y1 gives log(journals), so the estimate is 10^Y1(9): about 21,497 journals.
c. Comment on your faith in this estimate.
The estimate may be too high. Growth was rapid over 1991-1997, but the year 2000 lies beyond the data, and there is no guarantee that the same pattern continued, so the model should not be trusted that far out.
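The estimate can be reproduced with a short sketch that follows the slide's coding of 1991 as year 0. The helper `fit_line` and the variable names are hypothetical; the key step is undoing the logarithm with a power of 10 after evaluating the line at 9.

```python
import math

def fit_line(xs, ys):
    """Least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Years coded so that 1991 is year 0, as on the slide.
coded_year = [0, 1, 2, 3, 4, 5, 6]
journals = [27, 36, 45, 181, 306, 1093, 2459]
log_journals = [math.log10(j) for j in journals]

intercept, slope = fit_line(coded_year, log_journals)

# The year 2000 is 9 years after 1991.
x_2000 = 2000 - 1991
predicted_log = intercept + slope * x_2000

# The model predicts log(journals), so back-transform with 10**.
estimate = 10 ** predicted_log

print(f"log(journals)-hat at year {x_2000}: {predicted_log:.3f}")
print(f"estimated journals in 2000: {estimate:.0f}")
```

The result should be close to the slide's figure of about 21,497. Because the prediction is made on the log scale, a small error in the fitted line turns into a large error in the journal count, which is one more reason to treat the extrapolation with caution.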

