26134 Business Statistics Week 4 Tutorial Simple Linear Regression Key concepts in this tutorial are listed below 1. Detecting.


1 26134 Business Statistics, Week 4 Tutorial: Simple Linear Regression (Mahrita.Harahap@uts.edu.au)
Key concepts in this tutorial are listed below:
1. Detecting associations using scatterplots
2. Dependent and independent variables
3. Bivariate regression
4. Difference between cause and effect between two variables vs. a relationship between two variables
5. Regression and correlation

2 In statistics we usually want to analyse a population, but collecting data for the whole population is usually impractical, expensive, or impossible. That is why we collect a sample from the population (sampling) and use the sample statistics to make inferences about the population parameters (inference) with a stated level of confidence (confidence level). A population is the collection of all possible individuals, objects, or measurements of interest. A sample is a subset of the population of interest.

3 Regression The linear regression line characterises the relationship between two numerical variables: one independent variable (predictor/explanatory) and one dependent variable (response/outcome). Regression analysis helps us draw insights from the data by showing the impact of one variable on the other. The regression line is based on the equation of a straight line in mathematics: Y = β0 + β1X, where β0 is the intercept and β1 is the slope.
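
A minimal Python sketch of how the intercept and slope of such a line can be estimated by ordinary least squares. The function name `fit_line` and the five data points are made up for illustration; they are not from the tutorial data.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f} * x")
```

For these made-up points the fitted slope is 0.6 and the intercept is 2.2, so each one-unit increase in x is associated with an average increase of 0.6 in y.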

4 X: predictor variable, explanatory variable, independent variable. The variable one can control.
Y: outcome variable, response variable, dependent variable. The outcome to be measured or predicted.

5 Correlation Correlation measures the association between two numerical variables, with the strength of the relationship measured by the correlation coefficient r. The coefficient r is a statistic that quantifies the linear relation between two variables. It falls between -1.00 and 1.00. Its sign indicates the direction of the relationship, and its magnitude (absolute value) indicates the strength of the relationship. NOTE: Regression examines the relationship between one independent variable and one dependent variable, summarised by the slope of the regression line. Correlation indicates the association between two metric variables, with the strength and direction of the relationship measured by the correlation coefficient.
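
A short Python sketch of how the correlation coefficient r can be computed from paired data. The function name `pearson_r` and the data are hypothetical, chosen only to show the sign and range of r.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only
r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # positive association
r_negative = pearson_r([1, 2, 3], [3, 2, 1])              # perfect negative association
print(round(r_positive, 3), r_negative)
```

The first pair of lists gives r of about 0.775 (a positive association), while the second gives exactly -1 because the points lie on a downward-sloping straight line.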

6 Strength & Direction of Correlation Direction: positive or negative. Strength: perfect, strong, moderate, or weak.

7 Difference between cause and effect between two variables vs. a relationship between two variables Cause and effect implies that one variable directly causes change in the other. A relationship implies variables move in the same or opposite direction together, which may be caused by another variable not currently used in the model. If two variables are associated with each other it does not mean one variable directly affects or causes the other. https://www.youtube.com/watch?v=taA0DWqi_jM

8 Regression on Excel
1. To get the “Data Analysis” pack in Excel, click File > Options > Add-ins > Manage: Go > Analysis ToolPak > OK. Then click Data > Data Analysis > Regression > OK.
2. For the scatterplot graph, click Insert > Scatter > Select Data and select the data range (make sure x is horizontal and y is vertical). Right-click on the data points, click “Add Trendline”, and tick “Display Equation on chart”.
3. d) For each additional employee, the average profit per dollar of sales increases by 2.14 cents.

9 Q1.1. In this bivariate analysis, which is the dependent variable and which is the independent variable?
Independent variable: advertisements in sports magazine. Dependent variable: level of sales.
Q1.2. Which statistical technique should be used to establish the strength of association between these two variables?
Correlation.
Q1.3. Draw a diagram representing the expected direction of the relationships described. Be sure to label axes.

10 Q1.4. Which statistical technique would be used to understand the impact of one of the variables on the other?
Regression analysis.
Q1.5. What are some of the statistical assumptions being used in applying this statistical technique and how can these be verified?
Assumptions are:
a) the relationship between the two variables is linear (verified by using a scatter plot);
b) the analysis evaluates the magnitude of the relationship and is used for prediction, but no cause and effect can be attributed (verified by theory);
c) the variables are numeric (verified by interval or ratio metric scales);
d) the error terms are independent and normally distributed (i.e. a normal bell-shaped curve).
Q1.6. What is the benefit of using the statistical technique for understanding the relationship between two variables compared to understanding an association?
Correlation shows the direction and strength of association with a value between -1 (perfectly negative association) and +1 (perfectly positive association), whereas regression allows the impact of one variable on the other to be established and a predictive model to be created. Regression measures the slope of the linear equation.

11 Q2.1. Which value would you use to determine the relationship between the two variables, and does the direction of this relationship make sense?
The beta coefficient = 0.459. Yes, as we would expect higher food quality to lead to customers returning.

12 Hypothesis Testing We use hypothesis testing to draw conclusions about the population parameters based on the statistics of the sample. In statistics, a hypothesis is a statement about a population parameter.
1. The null hypothesis, denoted H0, is a statement or claim about a population parameter that is initially assumed to be true. It is always an equality (e.g. H0: β1 = 0).
2. The alternative hypothesis, denoted H1, is the competing claim: what we are trying to find evidence for (e.g. H1: β1 ≠ 0).
3. Test statistic: a measure of compatibility between the statement in the null hypothesis and the data obtained.
4. Decision criteria: the p-value is the probability of obtaining a test statistic as extreme as or more extreme than the observed sample value, given H0 is true. If p-value ≤ 0.05, reject H0. If p-value > 0.05, do not reject H0.
5. Conclusion: make your conclusion in the context of the problem.
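
The decision criterion in step 4 can be sketched as a small Python function. The name `decide` and the two p-values passed to it are illustrative assumptions; 0.05 is the significance level used on this slide.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is at most alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "do not reject H0"


# Illustrative p-values only
small_p = decide(0.01)
large_p = decide(0.20)
print(small_p)
print(large_p)
```

A p-value exactly equal to 0.05 falls on the "reject H0" side, matching the ≤ in the rule above.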

13 Q2.2. How do we use the t statistic and what does the significance tell us about these variables?
This hypothesis test tells us whether there is enough evidence in our sample data to conclude that there is a significant linear relationship.
H0: β1 = 0. There is no association between the dependent variable and the independent variable (there is no significant linear relationship), i.e. y = β0 + 0*x.
H1: β1 ≠ 0. The independent variable affects the dependent variable (there is a significant linear relationship), i.e. y = β0 + β1*x.
Test statistic: the t-test tells us whether the INDIVIDUAL regression coefficient is different enough from zero to be statistically significant.
P-value = 0.009.
Since p-value = 0.009 < 0.05 (level of significance), we reject the null hypothesis and conclude that we have enough statistical evidence of a significant linear relationship between the two variables.
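
A hedged Python sketch of how the t statistic for the slope is typically obtained: the estimated slope divided by its standard error. The function name `slope_t_statistic` and the data are made up for illustration; they are not the restaurant data behind the p-value above.

```python
import math


def slope_t_statistic(xs, ys):
    """Return t = b1 / SE(b1) for the least-squares slope b1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope, with n - 2 degrees of freedom
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1 / se_b1


# Hypothetical data, for illustration only
t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t, 3))
```

For these five made-up points t is about 2.121 with 3 degrees of freedom. The two-sided 5% critical value from standard t tables for 3 degrees of freedom is 3.182, so with this tiny sample H0: β1 = 0 would not be rejected.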

14 R² Coefficient of Determination Tells us the proportion of the variation in the dependent variable that is accounted for by the independent variable.

15 Q2.3. What does the R² tell us about the relationship between food quality and customers returning?
An R² of 0.263 means food quality explains 26.3% of the variation in whether a customer will return.
Q2.4. How much does perception of food quality not explain whether a customer would return to a restaurant?
Food quality does not explain (1 - 0.263 =) 73.7% of the variation in customers returning.
Q2.5. List three other variables that may explain whether a customer would return to Joe’s restaurant.
Other variables include price, location, food cuisine, quality of staff, ambience, etc. Explaining returns with several variables requires multivariate regression analysis (next week’s topic).
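
A small Python sketch of how R² is computed as one minus the ratio of unexplained to total variation, followed by the "unexplained" arithmetic from Q2.4. The function name `r_squared` and the five data points are hypothetical; only the value 0.263 comes from the tutorial.

```python
def r_squared(xs, ys):
    """Return R^2 = 1 - SSE / SST for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)                      # total variation
    return 1 - sse / sst


# Hypothetical data, for illustration only
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3))

# Unexplained share for the R^2 reported in the tutorial
unexplained = 1 - 0.263
print(f"{unexplained:.1%}")
```

For the made-up points R² is 0.6, so 60% of the variation in y is explained by x. The last two lines reproduce the 73.7% unexplained share from the answer to Q2.4.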

16 Son’s Height = 33.73 + 0.516 × Father’s Height Interpret the coefficients:
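
One way to explore the fitted equation above is to treat it as a prediction rule, sketched here in Python. The function name `predict_son_height` and the input values 70 and 71 are assumptions for illustration; the slide does not state the unit of measurement, so the inputs are simply taken to be in the same unit as the original data.

```python
INTERCEPT = 33.73  # predicted son's height when father's height is 0
SLOPE = 0.516      # change in predicted son's height per one-unit change in father's height


def predict_son_height(father_height):
    """Predicted son's height from the fitted line on the slide."""
    return INTERCEPT + SLOPE * father_height


# Hypothetical father heights, in the (unstated) unit of the data
pred_70 = predict_son_height(70)
pred_71 = predict_son_height(71)
print(round(pred_70, 2))
print(round(pred_71 - pred_70, 3))
```

The difference between the two predictions equals the slope: fathers one unit taller are predicted to have sons 0.516 units taller on average.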

