Bi-Variate Data PPDAC

Types of data
We are looking for a variable that is affected by the other variables in our spreadsheet. This variable is called the dependent variable because its values are affected by the other variables. It is sometimes called the response variable (because it responds). This variable must be “variable 2” in iNZight, i.e. on the y-axis.

Types of data
The variable on the x-axis is called the explanatory variable or the independent variable.
Axis labels: x-axis = Explanatory or Independent Variable; y-axis = Dependent or Response Variable

Choosing your variables
We can only plot scatter diagrams of numerical data. Non-numerical data such as colours or countries cannot be plotted.

You may like to look at ‘advanced’ – ‘Scatter Plot Matrix’ to get an overview of all the combinations of graphs. Look for areas where the fit isn’t good:
◦ Clusters
◦ Fanning out or in (data points get further from, or closer to, the trend line as x increases)
◦ Gaps in data

Drawing Graphs
Import the correct CSV file into iNZight, then click and drag variables into the variable 1 and variable 2 positions. For each scatter diagram you must add a linear trend line and note the equation and ‘r’ value.
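To see where the equation and the ‘r’ value come from, here is a minimal sketch that works them out by hand. It assumes the linear trend line is an ordinary least-squares fit, which is the usual choice but is not confirmed by these slides. The function name `linear_fit` and the data values are made up for illustration only.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical, made-up values (not real marathon data)
stride_cm = [95, 100, 105, 110, 115, 120]
marathon_min = [290, 275, 268, 250, 241, 230]

slope, intercept, r = linear_fit(stride_cm, marathon_min)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.2f}")
```

The printed equation and r value should be comparable to what a plotting tool reports for the same data, but always use the values from your own graph in the report.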

For Achieved
1. Problem: Write a question that clearly investigates the relationship between two variables.
2. Plan: I will use iNZight to produce a scatter plot and equation. I will observe the graph to decide if the equation is valid.
3. Data: Describe the data including the correct units and show some understanding of the context.
4. Analyse: Use iNZight to draw a scatter graph and produce the trend curve.

For Achieved
5. Analyse: Describe what you see in the scatter graph (use T.A.R.S.O.G. for this).
6. Analyse: Describe the relationship between the two variables in terms of “as xxx increases, yyy...”
7. Prediction: Make a prediction (interpolation) using the iNZight equation. Is it valid? Is it reliable?
8. Conclusion: Answer your problem question. Is there a relationship?

Purpose statement (Basic)
Problem: This report considers the relationship between stride length and the time taken to complete a marathon, for the purpose of predicting the time to run a marathon.
Plan: The independent variable is the stride length, measured in centimetres. The dependent variable is the marathon time, measured in minutes.
Data: The data is a sample taken from marathons in NZ.

4. Analysis

5. Analysis
T is for trend: is it linear or not?
A is for association: is it positive or negative?
R is for relationship: is it strong or weak?
S is for scatter: is it constant or not? Does it fan?
O is for outliers: can you spot any?
G is for groups: are there any?

6. Analysis
As the carat weight of the diamond increases, the price of the diamond increases. For every increase of 1 carat, the price increases by approximately $7800.

7. Predictions
Must include:
◦ An interpolation and an extrapolation
◦ A comment about the strength of the prediction (critique)
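The difference between the two kinds of prediction can be sketched in a few lines. An interpolation uses an x value inside the range of the data; an extrapolation uses one outside it and is usually less reliable. The function name `predict`, the intercept, and the data range below are hypothetical; only the slope of about $7800 per carat comes from the diamond example.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Predict y from the trend line and say which kind of prediction it is."""
    y = slope * x + intercept
    # Inside the observed x range: interpolation. Outside it: extrapolation.
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Slope from the diamond example; intercept and data range are made up
slope = 7800        # dollars per carat
intercept = -2000   # hypothetical
x_min, x_max = 0.2, 1.5  # hypothetical smallest and largest carat in the data

for carat in (1.0, 3.0):
    price, kind = predict(carat, slope, intercept, x_min, x_max)
    print(f"{carat} carat: ${price:.0f} ({kind})")
```

When critiquing a prediction, also look at how close the points sit to the trend line near that x value, not just whether it falls inside the range.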

8. Conclusion Answer your purpose statement by highlighting the key points of the analysis.

Correlation Coefficient
The correlation coefficient is a number between -1 and 1. The sign shows whether the correlation is positive or negative. These are only a guide (different books give different values).
r = 1: Perfect correlation
1 > r > 0.9: Very strong
0.9 > r > 0.7: Significant
0.7 > r > 0.3: Weak
0.3 > r > 0: No correlation
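The guide above can be written as a small lookup, shown here as a sketch only. Two assumptions are made that the guide does not state: the size of r (ignoring the sign) is used for negative values, and a value sitting exactly on a boundary (such as 0.9) is placed in the stronger category. The function name `describe_r` is made up for illustration.

```python
def describe_r(r):
    """Return (strength, direction) for a correlation coefficient, using the guide values."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)  # assumption: negative r is judged by its size
    if size == 1:
        strength = "perfect"
    elif size >= 0.9:   # assumption: boundary values go to the stronger category
        strength = "very strong"
    elif size >= 0.7:
        strength = "significant"
    elif size >= 0.3:
        strength = "weak"
    else:
        strength = "no correlation"
    return strength, direction

for value in (0.95, -0.5, 0.1):
    print(value, describe_r(value))
```

Because different books use different cut-offs, treat the label as a starting point and back it up with what you see in the scatter graph.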

Correlation Coefficient
[Figure: example scatter plots for r = -1, r = 0 and r = 1]
It is only designed to measure linear relationships! (Not appropriate for curved relationships.)
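A quick way to see why r is not appropriate for curved relationships is to compute it for data that follow a perfect curve. In this sketch the y values are exactly x squared, so the two variables are completely related, yet r comes out as zero because there is no straight-line trend. The function name `pearson_r` and the data are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up data on a perfect curve: y = x squared, symmetric about x = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")  # close to zero even though y depends entirely on x
```

This is why the scatter graph should always be checked for a linear trend before quoting r.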


