Multivariate Data Summary. Linear Regression and Correlation.

Presentation transcript:

Multivariate Data Summary

Linear Regression and Correlation

Pearson’s correlation coefficient r.

Slope and Intercept of the Least Squares line
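As a minimal sketch of the two slides above, the following assumes the standard textbook definitions: r = S_xy / sqrt(S_xx S_yy), slope b = S_xy / S_xx and intercept a = y-bar - b x-bar, where S_xx, S_yy and S_xy are sums of squared and cross deviations from the means. The function name `least_squares` and the data are hypothetical and are not taken from the slides.

```python
from math import sqrt

def least_squares(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products about the means
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / sqrt(s_xx * s_yy)        # Pearson's correlation coefficient
    slope = s_xy / s_xx                 # slope of the least squares line
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    return r, slope, intercept

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, slope, intercept = least_squares(x, y)
print(f"r = {r:.4f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For these five points the fitted line is approximately y = 2.2 + 0.6x.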

Scatter Plot Patterns: r = 0.0, r = +0.7, r = +0.9, r = +1.0

r = -0.7, r = -0.9, r = -1.0

Non-Linear Patterns: r can take on arbitrary values between -1 and +1 if the pattern is non-linear, depending on how well you can fit a straight line to the pattern.
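To illustrate the point, the sketch below computes r for the same exact relationship, y = x^2, over two different ranges of x. The data and the helper name `pearson_r` are hypothetical; the helper assumes the usual definition of Pearson's r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / sqrt(s_xx * s_yy)

# The same non-linear pattern, y = x^2, seen over two ranges of x
x_symmetric = [-2, -1, 0, 1, 2]
x_one_sided = [0, 1, 2, 3, 4]

r_symmetric = pearson_r(x_symmetric, [v ** 2 for v in x_symmetric])
r_one_sided = pearson_r(x_one_sided, [v ** 2 for v in x_one_sided])

print(f"symmetric range: r = {r_symmetric:.3f}")
print(f"one-sided range: r = {r_one_sided:.3f}")
```

Over the symmetric range no straight line fits better than a horizontal one, so r is 0. Over the one-sided range the curve is nearly straight and r is close to +1, even though the underlying relationship is identical.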

The Coefficient of Determination

An important identity in statistics: (Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X)

It can also be shown that $r^2$ = proportion of variability in Y explained by X = the coefficient of determination.
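The identity can be checked numerically. The sketch below assumes that "variability" is measured by sums of squares about the mean of Y and that the explained part comes from the least squares line. The data and variable names are hypothetical.

```python
# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares line and its fitted values
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                  # total variability in Y
ss_explained = sum((fi - y_bar) ** 2 for fi in fitted)         # explained by X
ss_unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained by X

r_squared = ss_explained / ss_total  # coefficient of determination
print(ss_total, ss_explained + ss_unexplained, r_squared)
```

Here the total variability is 6, of which 3.6 is explained and 2.4 is unexplained, giving a coefficient of determination of 0.6.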

Categorical Data Techniques for summarizing, displaying and graphing

The frequency table and the bar graph. Suppose we have collected data on a categorical variable X having k categories: 1, 2, …, k. To construct the frequency table, we simply count, for each category i of X, the number of cases falling in that category (f_i). To plot the bar graph, we simply draw a bar of height f_i above each category i of X.
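The construction described above can be sketched as follows. The observations are hypothetical, and the text-based bar graph (one `#` per case) merely stands in for a plotted chart.

```python
from collections import Counter

def frequency_table(values):
    """Count f_i, the number of cases in each category i, in category order."""
    counts = Counter(values)
    return {category: counts[category] for category in sorted(counts)}

def text_bar_graph(freq):
    """Draw one bar per category; the bar length equals the frequency f_i."""
    return "\n".join(f"{category}: {'#' * f} ({f})" for category, f in freq.items())

# Hypothetical observations of a categorical variable X with k = 3 categories
observations = [1, 3, 2, 3, 3, 1, 2, 3]
freq = frequency_table(observations)
print(text_bar_graph(freq))
```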

Example: In this example, data has been collected for n = 34,188 subjects. The purpose of the study was to determine the relationship between the use of antidepressants, mood medication, anxiety medication, stimulants and sleeping pills. In addition, the study was interested in examining the effects of the independent variables (gender, age, income, education and role) on both individual use of the medications and multiple use of the medications.

The variables were:
1. Antidepressant use
2. Mood medication use
3. Anxiety medication use
4. Stimulant use
5. Sleeping pills use
6. Gender
7. Age
8. Income
9. Education
10. Role:
    i. Parent, worker, partner
    ii. Parent, partner
    iii. Parent, worker
    iv. Worker, partner
    v. Worker only
    vi. Parent only
    vii. Partner only
    viii. No roles

Frequency Table for Age

Bar Graph for Age

Frequency Table for Role

Bar Graph for Role

The pie chart: an alternative to the bar chart. Draw a circle (a pie). Divide the circle into segments, with the area of each segment proportional to f_i or p_i = f_i/n.
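Because the area of a segment is proportional to its central angle, dividing the circle reduces to computing an angle of 360 × f_i/n degrees for each category. A minimal sketch with hypothetical category labels and frequencies:

```python
def pie_segments(freq):
    """Return {category: (proportion p_i, central angle in degrees)}."""
    n = sum(freq.values())
    return {category: (f / n, 360 * f / n) for category, f in freq.items()}

# Hypothetical frequencies f_i for three categories
freq = {"A": 10, "B": 30, "C": 60}
segments = pie_segments(freq)

for category, (p, angle) in segments.items():
    print(f"{category}: p = {p:.2f}, angle = {angle:.1f} degrees")
```

The proportions sum to 1 and the angles sum to 360 degrees, so the segments exactly fill the circle.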

Example: In this study the population consists of individuals who received a head injury (n = 22,540). The variable is the mechanism that caused the head injury (InjMech), with categories:
– MVA (motor vehicle accident)
– Falls
– Violence
– Other VA (other vehicle accidents)
– Accidents (industrial accident)
– Other (all other mechanisms for head injury)

Graphical and Tabular Display of Categorical Data: the frequency table, the bar graph, the pie chart

The frequency table

The bar graph

The pie chart

Multivariate Categorical Data

The two-way frequency table. The χ² statistic. Techniques for examining dependence amongst two categorical variables.

Situation: We have two categorical variables R and C. The number of categories of R is r. The number of categories of C is c. We observe n subjects from the population and count x_ij = the number of subjects for which R = i and C = j. R = rows, C = columns.
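The counting step can be sketched as follows, assuming each subject is recorded as a pair (i, j) giving its category of R and its category of C. The function name `two_way_table` and the observations are hypothetical.

```python
def two_way_table(pairs, r, c):
    """Return x, where x[i-1][j-1] counts the subjects with R = i and C = j."""
    x = [[0] * c for _ in range(r)]
    for row_cat, col_cat in pairs:
        x[row_cat - 1][col_cat - 1] += 1
    return x

# Hypothetical observations: (category of R, category of C) for n = 6 subjects
pairs = [(1, 1), (1, 2), (2, 2), (2, 2), (1, 1), (2, 3)]
x = two_way_table(pairs, r=2, c=3)

row_totals = [sum(row) for row in x]        # one total per category of R
col_totals = [sum(col) for col in zip(*x)]  # one total per category of C
n = sum(row_totals)

for row, total in zip(x, row_totals):
    print(row, total)
print(col_totals, n)
```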

Example: Both Systolic Blood Pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects. The categories for Blood Pressure are: < The categories for Cholesterol are: <

Table: two-way frequency table of Serum Cholesterol (rows) by Systolic Blood Pressure (columns), with row and column totals

Example: This comes from the drug use data. The two variables are: 1. Age (C) and 2. Antidepressant Use (R), measured for a sample of n = 33,957 subjects.

Two-way Frequency Table Percentage antidepressant use vs Age

The χ² statistic for measuring dependence amongst two categorical variables. Define $E_{ij} = \frac{R_i C_j}{n}$ = expected frequency in the (i,j)th cell in the case of independence, where R_i is the total for row i and C_j is the total for column j.

         Columns
Rows     1      2      3      4      5      Total
1        x_11   x_12   x_13   x_14   x_15   R_1
2        x_21   x_22   x_23   x_24   x_25   R_2
3        x_31   x_32   x_33   x_34   x_35   R_3
4        x_41   x_42   x_43   x_44   x_45   R_4
Total    C_1    C_2    C_3    C_4    C_5    n

         Columns
Rows     1      2      3      4      5      Total
1        E_11   E_12   E_13   E_14   E_15   R_1
2        E_21   E_22   E_23   E_24   E_25   R_2
3        E_31   E_32   E_33   E_34   E_35   R_3
4        E_41   E_42   E_43   E_44   E_45   R_4
Total    C_1    C_2    C_3    C_4    C_5    n

Justification: in the table of expected frequencies E_ij, with row totals R_i, column totals C_j and overall total n, $\frac{E_{ij}}{R_i} = \frac{C_j}{n}$, that is, the proportion in column j for row i equals the overall proportion in column j;

and $\frac{E_{ij}}{C_j} = \frac{R_i}{n}$, that is, the proportion in row i for column j equals the overall proportion in row i.

The χ² statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$, where E_ij = expected frequency in the (i,j)th cell in the case of independence and x_ij = observed frequency in the (i,j)th cell.
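A minimal sketch of the calculation, assuming the usual definitions: the expected frequency in cell (i, j) is (row i total × column j total) / n, and the statistic sums (observed - expected)² / expected over all cells. The observed table is hypothetical and the function names are illustrative only.

```python
def expected_frequencies(x):
    """Expected cell counts under independence: (row total * column total) / n."""
    row_totals = [sum(row) for row in x]
    col_totals = [sum(col) for col in zip(*x)]
    n = sum(row_totals)
    return [[r_i * c_j / n for c_j in col_totals] for r_i in row_totals]

def chi_square(x):
    """Sum of (observed - expected)^2 / expected over all cells."""
    e = expected_frequencies(x)
    return sum(
        (x_ij - e_ij) ** 2 / e_ij
        for x_row, e_row in zip(x, e)
        for x_ij, e_ij in zip(x_row, e_row)
    )

# Hypothetical 2 x 3 table of observed frequencies
observed = [
    [10, 20, 30],
    [20, 20, 20],
]
expected = expected_frequencies(observed)
chi2 = chi_square(observed)
print(expected)
print(round(chi2, 4))
```

Every row of the expected table has the same distribution across the columns, which is what independence means. The further the observed counts depart from that pattern, the larger the statistic.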

Example: studying the relationship between Systolic Blood Pressure and Serum Cholesterol. In this example we are interested in whether Systolic Blood Pressure and Serum Cholesterol are related or whether they are independent. Both were measured for a sample of n = 1237 cases.

Observed frequencies: two-way table of Serum Cholesterol (rows) by Systolic Blood Pressure (columns), with row and column totals

Expected frequencies: two-way table of Serum Cholesterol (rows) by Systolic Blood Pressure (columns), with row and column totals. In the case of independence, the distribution across a row is the same for each row, and the distribution down a column is the same for each column.

Standardized residuals. The χ² statistic.
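The sketch below assumes the residual for cell (i, j) is defined as (observed - expected) / sqrt(expected), a common convention under which the χ² statistic is the sum of the squared residuals. The definition used on the original slide is not shown in the transcript, so this choice is an assumption. The observed table is hypothetical.

```python
from math import sqrt

def standardized_residuals(x):
    """Residuals (x_ij - E_ij) / sqrt(E_ij), with E_ij computed under independence."""
    row_totals = [sum(row) for row in x]
    col_totals = [sum(col) for col in zip(*x)]
    n = sum(row_totals)
    residuals = []
    for x_row, r_i in zip(x, row_totals):
        row = []
        for x_ij, c_j in zip(x_row, col_totals):
            e_ij = r_i * c_j / n  # expected frequency (assumed definition)
            row.append((x_ij - e_ij) / sqrt(e_ij))
        residuals.append(row)
    return residuals

# Hypothetical 2 x 3 table of observed frequencies
observed = [
    [10, 20, 30],
    [20, 20, 20],
]
residuals = standardized_residuals(observed)
chi2 = sum(res ** 2 for row in residuals for res in row)

for row in residuals:
    print([round(res, 3) for res in row])
print(round(chi2, 4))
```

Large positive or negative residuals point to the cells that contribute most to the statistic, which helps to locate where the dependence lies.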

Example: This comes from the drug use data. The two variables are: 1. Role (C) and 2. Antidepressant Use (R), measured for a sample of n = 33,957 subjects.

Two-way Frequency Table Percentage antidepressant use vs Role

Calculation of χ²: the raw data and the expected frequencies

The residuals. The calculation of χ²

Example: In this example, n = individuals who had been victimized twice by crimes. Rows = crime of first victimization. Cols = crime of second victimization.

Next Topic: Brief introduction to Statistical Packages