Ch. 12 More about regression
Ch. 12-2 Transforming to Achieve Linearity
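L1 → Length, L2 → Weight
LinReg for L1 and L2:
$\widehat{\text{weight}} = -299.04 + 25.2(\text{length})$
For a length of 24: $\widehat{\text{weight}} = -299.04 + 25.2(24) = 305.82$ grams (the rounded coefficients shown give 305.76; 305.82 presumably reflects unrounded coefficients)
Look at the data. Is this a good prediction? No.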
Pred: L3 → −299.04+25.2(L1)
Resid (Obs − Exp): L4 → L2 − L3
Resid Plot: stat plot → L1 vs L4 (Length on the x-axis, Resid on the y-axis)
The residual plot is curved, so the relationship is not linear. Transform the data using powers, roots, or logarithms.
Weight is related to volume, and volume is usually proportional to (length)^3, so try $\text{weight} = a \cdot (\text{length})^3$.
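
The residual check can also be sketched outside the calculator. The slide's length and weight data are not reproduced here, so the values below are synthetic (an assumption, generated from an exact cubic purely for illustration), and numpy's `polyfit` stands in for LinReg.

```python
import numpy as np

# Synthetic data (NOT the slide's data): weight is an exact cubic in length.
length = np.arange(5.0, 45.0, 2.0)        # 20 hypothetical lengths
weight = 0.0147 * length**3 + 4.07

# Least-squares line for weight vs length (the role LinReg plays on the calculator).
slope, intercept = np.polyfit(length, weight, 1)

predicted = intercept + slope * length     # Pred
residual = weight - predicted              # Resid = Obs - Exp

# Print the residuals in order of length to look for a pattern.
for x, r in zip(length, residual):
    print(f"length={x:5.1f}  resid={r:9.2f}")
```

With data like these, the residuals are positive at both ends and negative in the middle, which is the curved pattern that signals a straight line is the wrong model.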
L3 → (L1)^3
stat plot → L3 vs L2 ((Length)^3 on the x-axis, Weight on the y-axis)
LinReg for L3 and L2:
$\widehat{\text{weight}} = 4.07 + 0.0147(\text{length})^3$
$\widehat{\text{weight}} = 4.07 + 0.0147(24)^3 = 207.3$ grams
Much better prediction.
$r = 0.997$. What does this mean? $r$ is the correlation coefficient: there is a strong positive linear relationship between weight and (length)^3.
Pred: L4 → 4.07+.0147(L3)
Resid (Obs − Exp): L5 → L2 − L4
Resid Plot: stat plot → L3 vs L5 ((Length)^3 on the x-axis, Resid on the y-axis)
Random scatter = good!
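
As a quick arithmetic check, the two fitted models can be compared at a length of 24. This is a minimal sketch using the rounded coefficients from the slides; the function names are illustrative, not from the source.

```python
def predict_linear(length):
    """Prediction from the straight-line model weight vs length."""
    return -299.04 + 25.2 * length


def predict_cubic(length):
    """Prediction from the model that is linear in (length)^3."""
    return 4.07 + 0.0147 * length**3


length = 24
print(f"linear model: {predict_linear(length):.2f} grams")
print(f"cubic model:  {predict_cubic(length):.2f} grams")
```

The gap of roughly 100 grams between the two predictions is why checking the residual plot matters before trusting the line.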
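Now use LinRegTTest to check each value in the regression output for $\widehat{\text{weight}} = 4.07 + 0.0147(\text{length})^3$. Use the (length)^3 list as the x list.
Constant: Coef = 4.07, T = 0.59, P = 0.563 = tcdf(.59, 9999, 18)×2
(Length)^3: Coef = 0.0147, SE Coef = 0.00024, T = 61.07, P = tcdf(61.07, 9999, 18)×2
S = 18.84, R-Sq = 99.52%
P-values are always two-sided.
$t = \dfrac{b - \beta}{SE_b} = \dfrac{0.0147 - 0}{0.00024} \approx 61.07$ (the output's value; the rounded figures shown give about 61.25)
$SE_b = \dfrac{s}{s_x\sqrt{n-1}}$, with $s_x = 17983.9$
$s = \sqrt{\dfrac{\sum \text{resid}^2}{n-2}}$
R-Sq is $r^2$.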
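
The standard error and t statistic can be reproduced from the summary values on the slide. This is a sketch under one assumption: the sample size n = 20 is not stated on the slide and is inferred here from the 18 degrees of freedom used in tcdf (df = n − 2).

```python
import math

s = 18.84          # S: standard deviation of the residuals
s_x = 17983.9      # standard deviation of the (length)^3 list
n = 20             # assumed: inferred from df = n - 2 = 18
b = 0.0147         # fitted slope (rounded)
beta_0 = 0         # slope under the null hypothesis

se_b = s / (s_x * math.sqrt(n - 1))   # SE_b = s / (s_x * sqrt(n - 1))
t = (b - beta_0) / se_b               # t = (b - beta) / SE_b

print(f"SE_b = {se_b:.5f}")
print(f"t    = {t:.2f}")
```

Because the slope is entered rounded to 0.0147, the t value from this sketch differs slightly from the 61.07 in the output.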
Transforming with logarithms (the same steps work for ln, the natural log):
Linear function: $y = a + bx$. Already in linear form.
Logarithmic function: $y = a + b\log x$. Already in linear form (linear in $\log x$).
Exponential function: $y = ab^x$
$\log y = \log(ab^x)$
$\log y = \log a + \log b^x$
$\log y = \log a + x\log b$ (linear form)
Power function: $y = ax^b$
$\log y = \log(ax^b)$
$\log y = \log a + \log x^b$
$\log y = \log a + b\log x$ (linear form)
Lists: L1 = length, L2 = weight, L3 → log(L1), L4 → log(L2)
Which scatterplot is linear?
x vs y: no
log(x) vs y: no
x vs log(y): no
log(x) vs log(y): yes! So use a power model, $y = ax^b$.
Use your calculator and do LinReg for the log x and log y lists:
$\widehat{\log(\text{weight})} = -1.899 + 3.04\log(\text{length})$
Pred: L5 → −1.899+3.04(L3)
Resid (Obs − Exp): L6 → L4 − L5
Resid Plot: stat plot → L3 vs L6 (log(length) on the x-axis, Resid on the y-axis)
Yes! Random scatter means there's a linear relationship between log(length) and log(weight).
$\widehat{\log(\text{weight})} = -1.899 + 3.04\log(\text{length})$
Back-transform to the power model $y = ax^b$:
$\widehat{\text{weight}} = 10^{-1.899}\cdot(\text{length})^{3.04} = 0.0126\cdot(\text{length})^{3.04}$
On your calculator, put this equation into Y1 and graph it with the original scatterplot (x vs y), with length on the x-axis and weight on the y-axis.
On the AP Exam, you don't need to know the algebraic properties of logs. However, you should know how to use calculations to transform data using logs and how to make predictions using the LSRL.
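
The back-transformation can be checked numerically. This is a minimal sketch using the rounded LSRL coefficients from the slide; the function names and the test length of 24 are illustrative choices.

```python
import math

log_a, b = -1.899, 3.04        # intercept and slope of the log-log LSRL

a = 10 ** log_a                # coefficient of the power model y = a * x^b
print(f"a = {a:.4f}")


def predict_weight_power(length):
    """Predict with the back-transformed power model."""
    return a * length ** b


def predict_weight_via_logs(length):
    """Predict log(weight) with the LSRL, then undo the log."""
    return 10 ** (log_a + b * math.log10(length))


# Both routes give the same prediction.
print(predict_weight_power(24), predict_weight_via_logs(24))
```

Either route is acceptable for a prediction: plug into the power model directly, or predict log(weight) first and then raise 10 to that power.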
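Summary of models and the plot that should look linear:
Linear: $\hat{y} = a + bx$, e.g. $\hat{y} = 2.3 + 4.1x$. Plot x vs y.
Logarithmic: $\hat{y} = a + b\log x$, e.g. $\hat{y} = 2.3 + 4.1\log x$. Plot log(x) vs y (or ln(x) vs y).
Exponential: $\hat{y} = ab^x$, e.g. $\hat{y} = 2.3(4.1)^x$. Plot x vs log(y) (or x vs ln(y)).
Power: $\hat{y} = ax^b$, e.g. $\hat{y} = 2.3x^{4.1}$. Plot log(x) vs log(y) (or ln(x) vs ln(y)).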
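
To see why these plots come out linear, the sketch below generates exact exponential and power data with the example values a = 2.3 and b = 4.1, fits a line to the transformed data, and recovers a and b. The x values are an arbitrary assumption, and numpy's `polyfit` stands in for LinReg.

```python
import numpy as np

a, b = 2.3, 4.1
x = np.linspace(1.0, 10.0, 25)     # arbitrary positive x values

y_exp = a * b**x                   # exponential model y = a * b^x
y_pow = a * x**b                   # power model y = a * x^b

# Exponential: x vs log(y) is linear, log y = log a + x log b.
slope_e, intercept_e = np.polyfit(x, np.log10(y_exp), 1)
a_exp, b_exp = 10**intercept_e, 10**slope_e

# Power: log(x) vs log(y) is linear, log y = log a + b log x.
slope_p, intercept_p = np.polyfit(np.log10(x), np.log10(y_pow), 1)
a_pow, b_pow = 10**intercept_p, slope_p

print(f"exponential fit: a = {a_exp:.3f}, b = {b_exp:.3f}")
print(f"power fit:       a = {a_pow:.3f}, b = {b_pow:.3f}")
```

Note the difference in how b is recovered: for the exponential model the slope is log b, while for the power model the slope is b itself.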
L1 → X (years), L2 → Y (transistors)
Is the scatterplot linear? Nope! Looks exponential.
LinReg for L1 and L2: $\widehat{\text{trans}} = -437897523.2 + 39306478.98(\text{years})$, $r = 0.669$
However, this is the incorrect line of best fit. We need to transform the data.
Plot L1 vs LNY (years since 1970 on the x-axis, ln(transistors) on the y-axis). Linear? Yes!
Since we plot x vs ln(y), we use the form $\ln \hat{y} = a + bx$.
LinReg for L1 and LNY:
$\widehat{\ln(\text{transistors})} = 7.065 + 0.366(\text{years since 1970})$
$r = 0.994$, much better!
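Prediction for the year 2045:
$\widehat{\ln(\text{transistors})} = 7.065 + 0.366(\text{years since 1970})$
$2045 - 1970 = 75$
$\widehat{\ln(\text{transistors})} = 7.065 + 0.366(75) = 34.515$
$e^{\ln(\text{transistors})} = e^{34.515}$
$\widehat{\text{transistors}} = 9.77 \times 10^{14}$, about 977 trillion
What do we need to be careful about? Extrapolation!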
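
The same prediction can be sketched in a few lines, using the rounded coefficients from the slide. The function name is illustrative.

```python
import math


def predict_transistors(year):
    """Predict ln(transistors) with the LSRL, then undo the natural log."""
    years_since_1970 = year - 1970
    ln_prediction = 7.065 + 0.366 * years_since_1970
    return math.exp(ln_prediction)


print(f"{predict_transistors(2045):.3e}")
```

Since 2045 lies well beyond the years used to fit the line, the result is an extrapolation and should be treated with caution.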
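Regression output for x = years since 1970, y = ln(transistors):
Constant: Coef = 7.065
# yrs since 1970: Coef = 0.366, T = 34.91, P = $5.14 \times 10^{-15}$ = tcdf(34.91, 9999, 14)×2
S = 0.544, R-Sq = 98.86%
$t = \dfrac{b - \beta}{SE_b}$
$s = \sqrt{\dfrac{\sum \text{resid}^2}{n-2}}$
R-Sq is $r^2$.
Predicted values: L3 → 7.065+0.366(L1)
Residuals (Obs − Exp): L4 → LNY − L3
Residual Plot (x vs resid): stat plot → L1 vs L4 (years since 1970 on the x-axis, Resid on the y-axis)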
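
Two quantities that the output does not list can be backed out from the values it does give. This is a rough sketch with rounded inputs: the standard error of the slope follows from rearranging t = (b − 0)/SE_b, and r is the positive square root of R-Sq because the slope is positive.

```python
import math

b = 0.366        # slope for years since 1970
t = 34.91        # t statistic for the slope (null hypothesis: slope = 0)
r_sq = 0.9886    # R-Sq = 98.86%

se_b = b / t               # rearranged from t = (b - 0) / SE_b
r = math.sqrt(r_sq)        # slope is positive, so take the positive root

print(f"SE_b = {se_b:.4f}")
print(f"r    = {r:.3f}")
```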
Same example using log base 10:
Make a list called LOGY → log(L2)
Plot L1 vs LOGY. Linear? Yes!
LinReg → L1 and LOGY:
$\widehat{\log(\text{transistors})} = 3.068 + 0.159(\text{years since 1970})$
$\widehat{\log(\text{transistors})} = 3.068 + 0.159(75) = 14.993$
$10^{\log(\text{transistors})} = 10^{14.993}$
$\widehat{\text{transistors}} = 9.84 \times 10^{14}$, about 984 trillion
Very close to the prediction when using natural log (977 trillion).
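
The two predictions agree because the two fitted lines are the same model in different log bases: dividing the natural-log coefficients by ln(10) gives the base-10 coefficients. A short sketch with the rounded coefficients from the slides (the variable names are illustrative):

```python
import math

years = 2045 - 1970

# Predictions from the two fitted lines.
pred_ln = math.exp(7.065 + 0.366 * years)
pred_log10 = 10 ** (3.068 + 0.159 * years)

# Converting the ln coefficients to base 10: log10(y) = ln(y) / ln(10).
a10 = 7.065 / math.log(10)
b10 = 0.366 / math.log(10)

print(f"ln model:    {pred_ln:.3e}")
print(f"log10 model: {pred_log10:.3e}")
print(f"converted coefficients: {a10:.3f}, {b10:.3f}")
```

The small gap between the two predictions comes from rounding the coefficients to three decimal places before exponentiating.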