Predictions 3.9 Bivariate Data.

1 Predictions 3.9 Bivariate Data

2 Linear Model
After we have fitted a linear model to our data, we need to interpret the model and make predictions from its equation.

3 Last Year's Internal Data

4 Interpreting The Equation
Explanatory variable: the variable on the x axis.
Response variable: what you are predicting. This is what is on the y axis.
Gradient: the slope of the line.
×: this means multiply.
y intercept: where the regression line meets the y axis.

5 Interpreting The Equation
Our equation becomes: Gross Profit = [y intercept] + [gradient] × Budget
Don’t forget the units, and round the answer appropriately!

7 Predictions
Using the equation of the linear model, we can predict the gross profit of a movie from its budget.
For example, predict the gross profit of a movie, given it has a budget of $60 million USD.
Gross Profit = [y intercept] + [gradient] × 60 = $224.2 million USD
“Using the linear model, I can predict that for a budget of $60 million USD, a movie can expect to earn a gross profit of $224.2 million USD.”

8 Predictions
For example, predict the gross profit of a movie, given it has a budget of $250 million USD.
Gross Profit = [y intercept] + [gradient] × 250 = $589.1 million USD
“Using the linear model, I can predict that for a budget of $250 million USD, a movie can expect to earn a gross profit of $589.1 million USD.”
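The coefficient values of the linear model are missing from this transcript, but the two worked predictions above are both points on the same straight line, so approximate values can be backed out from them. The sketch below does that arithmetic. The resulting gradient and intercept are estimates recovered from rounded answers, not the values shown on the original slides, and the function name is hypothetical.

```python
# Two predictions quoted on the slides: (budget, predicted gross profit),
# both in $ million USD.
x1, y1 = 60.0, 224.2
x2, y2 = 250.0, 589.1

# Both points lie on the trend line, so the gradient is rise over run
# and the intercept follows from either point.
gradient = (y2 - y1) / (x2 - x1)
intercept = y1 - gradient * x1

def predict_gross_profit(budget_musd):
    """Predicted gross profit ($ million USD) for a budget in $ million USD."""
    return intercept + gradient * budget_musd

print(f"Gross Profit is roughly {intercept:.2f} + {gradient:.2f} * Budget")
for budget in (60, 250, 25):
    profit = predict_gross_profit(budget)
    print(f"budget ${budget}m -> predicted gross profit ${profit:.1f}m")
```

As a check, the recovered line gives about $157 million for a $25 million budget, which agrees with the prediction quoted in the residual commentary later in the deck.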

9 Task
Using the ‘Fast Food’ data, predict the energy for a product with:
50 g of fat
90 g of fat

10 Accuracy/Appropriateness Of Predictions
Possible ideas to comment on:
How well the data points fit the linear model given (visual description)
Effects of clusters and unusual points on the trend line
Issues with interpolation and extrapolation
Context and research into the data which could alter the prediction
Change in data shape (e.g. piecewise)
Residual graph (optional)
Models which fit better (visual proof)
Comparison to a new model where a 3rd variable is introduced

11 1. How Well The Points Follow The Line
You need to give a visual description of the points in relation to the regression line. How close are they to the line, and to each other? Are they above or below the line? Is the relationship weak, moderate or strong? Is the scatter consistent, varied or evenly spread out, or does it change in shape? A close, consistent fit is evidence that the model is reliable for predictions.
Are there other data points with similar x values? How close are they to your predicted y value?
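The last two questions above can also be checked numerically. The following is a minimal sketch, assuming a small invented data set and a hand-written least-squares fit; the data, the window of 10 and the helper names are all hypothetical and are not taken from the presentation.

```python
def fit_line(xs, ys):
    """Least-squares intercept and gradient for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return mean_y - gradient * mean_x, gradient

def nearby_points(xs, ys, x_new, window):
    """Observed (x, y) pairs whose x lies within +/- window of x_new."""
    return [(x, y) for x, y in zip(xs, ys) if abs(x - x_new) <= window]

# Hypothetical data, invented for illustration (both in $ million USD).
budgets = [10, 20, 30, 40, 50, 60, 70, 80]
profits = [130, 140, 175, 180, 210, 220, 250, 260]

intercept, gradient = fit_line(budgets, profits)
x_new = 60
predicted = intercept + gradient * x_new
neighbours = nearby_points(budgets, profits, x_new, window=10)

print(f"predicted y at x = {x_new}: {predicted:.1f}")
for x, y in neighbours:
    print(f"  observed ({x}, {y}) differs from the prediction by {y - predicted:+.1f}")
```

Small differences for the neighbouring points support the prediction; large or one-sided differences are worth commenting on.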

13 2. Effects Of Clusters and Unusual Values
Are there any clusters that may have influenced the regression line? Any big groups/clusters of data? Is the data split into two clusters? Are these clusters above/below the line? Does this affect the line by moving it upwards/downwards?
Are there any gaps where there is no data? Could collecting more data there be useful and perhaps improve the reliability of our predictions?
Are there any unusual values that may have influenced the regression line? Where are these unusual values? What are their x and y values? How would they have affected the regression line? Moved/pulled it upwards/downwards?
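One way to see how much an unusual value pulls the regression line is to fit the line with and without it and compare the gradients. The sketch below uses invented numbers and a hand-written least-squares fit; none of the values come from the presentation's data.

```python
def fit_line(xs, ys):
    """Least-squares intercept and gradient for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return mean_y - gradient * mean_x, gradient

# Hypothetical data: five points on a line, plus one unusual value
# far above the trend at the right-hand end.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
unusual_x, unusual_y = 6, 30

a_without, b_without = fit_line(xs, ys)
a_with, b_with = fit_line(xs + [unusual_x], ys + [unusual_y])

print(f"without the unusual value: y = {a_without:.2f} + {b_without:.2f} * x")
print(f"with the unusual value:    y = {a_with:.2f} + {b_with:.2f} * x")
```

Here the single high point at the end of the x range pulls the right-hand end of the line upwards, which is the kind of effect the questions above ask you to describe.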

17 3. Interpolation and Extrapolation
Interpolation: The process of estimating the value of one variable based on knowing the value of the other variable, where the known value is within the range of values of that variable for the data on which the estimation is based.
Extrapolation: The process of estimating the value of one variable based on knowing the value of the other variable, where the known value is outside the range of values of that variable for the data on which the estimation is based.
We have to be a lot more cautious when extrapolating because we are predicting values outside the range of the data. We therefore have to rely on the validity of our model, as we have no data values to compare our prediction to.
When we extrapolate, we are assuming that the trend continues at the same rate.
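The two definitions above reduce to a range check on the explanatory variable. A minimal sketch, assuming a hypothetical list of observed budgets (the real data set is not included in this transcript):

```python
def prediction_type(x_new, observed_xs):
    """Classify a prediction by whether x_new is inside the observed x range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"

# Hypothetical observed budgets in $ million USD, invented for illustration.
observed_budgets = [2, 10, 25, 60, 120, 200]

for budget in (60, 250):
    print(f"budget ${budget}m: {prediction_type(budget, observed_budgets)}")
```

With this assumed range, the $60 million prediction is an interpolation and the $250 million prediction is an extrapolation, so the second deserves the extra caution described above.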

21 4. Context and Research
After doing some research or investigating the context of the data, we may find something that alters the validity of our predictions.

22 5. Change in Data Shape (e.g. Piecewise)
Is there a limit to one of our variables? Would it be sensible to split the graph into two different trend lines? Could our graph plateau at any stage?

23 6. Models Which Fit Better (Visual Proof)
Figure panels: Linear Model, Quadratic Model
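The slide asks for visual proof, but the comparison can be backed up with a number: fit both models and compare how far the points sit from each curve. The sketch below uses invented curved data and a small hand-written least-squares polynomial fit (normal equations solved by Gaussian elimination). The data and function names are hypothetical, and this is not the software used in the presentation.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 + ..."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        known = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - known) / A[i][i]
    return coeffs

def sum_squared_residuals(xs, ys, coeffs):
    """Sum of squared vertical distances between the points and the fitted curve."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical curved data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.2, 3.9, 9.1, 15.8, 25.3, 35.9]

linear = fit_polynomial(xs, ys, 1)
quadratic = fit_polynomial(xs, ys, 2)
sse_linear = sum_squared_residuals(xs, ys, linear)
sse_quadratic = sum_squared_residuals(xs, ys, quadratic)

print(f"linear model:    sum of squared residuals = {sse_linear:.2f}")
print(f"quadratic model: sum of squared residuals = {sse_quadratic:.2f}")
```

A quadratic can never fit worse than a line on the same data by this measure, so a smaller value alone is not proof; the visual description of how the points follow each curve still carries the argument.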

24 7. Residual Graph (Optional)
The dotted blue line is the linear trend used to make predictions. The green lines can be ignored.
Residuals are the vertical distances between the data points and the linear trend line. These are smoothed into the red line, which represents the best fit for this data.
If the red residual line closely matches the blue linear trend line, then predictions in that range may be reliable.
If the blue linear trend line is above the red residual line, then predictions are likely to be overstated.
If the blue linear trend line is below the red residual line, then predictions are likely to be understated.
Make comments about what you see as supporting the validity, or otherwise, of your predictions.
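The overstated/understated rule above can be approximated numerically. The sketch below fits a line to invented data, computes residuals as observed minus predicted, and averages them within bands of the x axis. Band averages are a crude stand-in for the smoothed red line, not the smoother the plotting software uses, and the data, band edges and function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and gradient for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return mean_y - gradient * mean_x, gradient

def mean_residual_by_band(xs, residuals, edges):
    """Average residual for x in each half-open band [edges[i], edges[i+1])."""
    means = []
    for low, high in zip(edges, edges[1:]):
        band = [r for x, r in zip(xs, residuals) if low <= x < high]
        means.append(sum(band) / len(band) if band else None)
    return means

def verdict(mean_residual):
    """Translate the sign of a band's mean residual into the slide's wording."""
    if mean_residual is None:
        return "no data in this band"
    if mean_residual < 0:
        return "points sit below the trend line: predictions likely overstated"
    if mean_residual > 0:
        return "points sit above the trend line: predictions likely understated"
    return "trend line matches the points on average"

# Hypothetical data, invented for illustration (both in $ million USD).
budgets = [2, 5, 8, 20, 40, 60, 90, 150, 200]
profits = [5, 10, 20, 160, 230, 260, 300, 380, 520]

intercept, gradient = fit_line(budgets, profits)
residuals = [y - (intercept + gradient * x) for x, y in zip(budgets, profits)]

edges = [0, 10, 100, 201]
band_means = mean_residual_by_band(budgets, residuals, edges)
for (low, high), mean in zip(zip(edges, edges[1:]), band_means):
    print(f"budget {low} to under {high}: {verdict(mean)}")
```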

27 The residual line for budgets of less than $10m is considerably below the linear trend, pulled down by the cluster of low budget/low profit movies. For budgets in the $10m to $100m range, there are more movies with gross profit higher than the trend line. This would support my conclusion that the prediction of $157m profit for a movie with a $25m budget may not be reliable.
At the end of the data range, the residual line is heading upwards and away from the trend line, pulled up by the $200m budget/$500m profit movie. It is likely that this upward separation of the residual line from the linear trend would continue to widen at budgets of over $200m. This would support my conclusion that the prediction of $590m profit for a movie with a $250m budget would not be reliable.

28 8. Compare to New Model (3rd Variable)
For Excellence you need to introduce a 3rd variable that could be related to the relationship you are investigating. This can be done by colour coding by that variable or by splitting the graph into two scatter plots.

29 8. Compare to New Model (3rd Variable)
Colour Coding Instructions:
Add To Plot
Code More Variables (should already be selected)
Colour Code by (choose a variable)
Show Changes

30 Notice that the points with the highest BMI values are all Field athletes.

31 8. Compare to New Model (3rd Variable)
Sub-setting Instructions:
Choose a categorical variable to subset by.
Click on the variable name and drag it to “drop name here”.
Copy and paste your graph into your assignment.
Click “Get Summary” and get the results of the equations and correlation coefficient. These can be compared to the original linear model.

32 8. Compare to New Model (3rd Variable)
Sub-setting by a variable: Sex

33 Example Budget vs. Gross Profit, subset by Awards (Yes or No)

34 Example

36 8. Compare to New Model (3rd Variable)
Comment on what variable you are using to create a new model/subset by.
Comment on why you chose this particular variable, i.e. do you think it is related to the original two variables in some way? Would it help explain the relationship?
Describe what you see for EACH scatter plot. Make sure you have a visual description of the points and COMPARE them, i.e. comment on similarities and differences.
Comment on the correlation coefficients. Is there a change in the coefficients from the original model? Is the relationship strong/moderate/weak? Would this give us better predictions?
Research! Include any research about the third variable and its possible relationship with the original two variables, and any other interesting or relevant research.
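The comparison of correlation coefficients described above can be sketched as follows: compute the coefficient for all points together, then separately for each level of the third variable. The records below are invented for illustration (budget and gross profit in $ million USD, with a Yes/No awards flag) and are not the data behind the slides; the function name is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical records: (budget, gross profit, awards).
movies = [
    (10, 100, "Yes"), (20, 155, "Yes"), (30, 195, "Yes"), (40, 250, "Yes"),
    (10, 20, "No"), (20, 32, "No"), (30, 38, "No"), (40, 50, "No"),
]

# Original model: all movies together.
overall_r = correlation([m[0] for m in movies], [m[1] for m in movies])
print(f"all movies: r = {overall_r:.2f}")

# New models: one subset per level of the third variable.
r_by_group = {}
for level in ("Yes", "No"):
    subset = [m for m in movies if m[2] == level]
    r_by_group[level] = correlation([m[0] for m in subset], [m[1] for m in subset])
    print(f"awards = {level}: r = {r_by_group[level]:.2f} ({len(subset)} movies)")
```

In this invented example each subset follows its own line much more closely than the combined data does, which is the kind of change in coefficients the comments above should describe and then interpret.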

37 9. Change The Independent Variable
Select a new independent variable (x axis) and keep the same dependent variable (response variable on the y axis). Write a brief paragraph describing the relationship (i.e. trend, direction, strength, scatter, unusual values, clusters). Does it give a better prediction for your response variable? Stronger correlation (visual description as well as coefficient)? Scatter closer to trend line? What are the limitations for this model?

