Multivariate Data Summary

Presentation on theme: "Multivariate Data Summary"— Presentation transcript:

1 Multivariate Data Summary

2 Linear Regression and Correlation

3 Pearson’s correlation coefficient r.

4 Slope and Intercept of the Least Squares line
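The formulas behind slides 3 and 4 can be sketched in a few lines. This is a minimal illustration that assumes the standard textbook definitions r = S_xy / sqrt(S_xx S_yy), slope b = S_xy / S_xx and intercept a = ȳ − b·x̄. The function name `least_squares_summary` and the sample data are hypothetical, not taken from the slides.

```python
from math import sqrt

def least_squares_summary(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = s_xy / sqrt(s_xx * s_yy)        # Pearson's correlation coefficient
    slope = s_xy / s_xx                 # slope of the least squares line
    intercept = y_bar - slope * x_bar   # intercept of the least squares line
    return r, slope, intercept

# Made-up illustrative data (an assumption, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, slope, intercept = least_squares_summary(x, y)
print(f"r = {r:.4f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The sketch assumes that neither variable is constant, since S_xx or S_yy would then be zero.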

5 Scatter Plot Patterns r = 0.0 r = +0.7 r = +0.9 r = +1.0

6 r = -0.7 r = -0.9 r = -1.0

7 Non-Linear Patterns r can take on arbitrary values between -1 and +1 if the pattern is non-linear, depending on how well you can fit a straight line to the pattern

8 The Coefficient of Determination

9 An important Identity in Statistics
(Total variability in Y) = (variability in Y explained by X) + (variability in Y unexplained by X)

10 It can also be shown: $r^2 = \dfrac{\text{variability in } Y \text{ explained by } X}{\text{total variability in } Y}$ = proportion of variability in Y explained by X = the coefficient of determination
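The identity on slide 9 can be checked numerically. The sketch below assumes the usual sums-of-squares reading of "variability": total = Σ(y − ȳ)², explained = Σ(ŷ − ȳ)², unexplained = Σ(y − ŷ)², where ŷ is the least squares fit. The function name and the data are hypothetical.

```python
def variability_decomposition(x, y):
    """Return (total, explained, unexplained) sums of squares for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]   # points on the fitted line
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((fi - y_bar) ** 2 for fi in fitted)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return total, explained, unexplained

# Made-up illustrative data (an assumption, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
total, explained, unexplained = variability_decomposition(x, y)
r_squared = explained / total   # proportion of variability in Y explained by X
print(total, explained, unexplained, r_squared)
```

Up to floating-point rounding, `total` equals `explained + unexplained`, and `explained / total` equals the square of Pearson's r for the same data.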

11 Categorical Data
Techniques for summarizing, displaying and graphing

12 The frequency table The bar graph
Suppose we have collected data on a categorical variable X having k categories: 1, 2, … , k. To construct the frequency table, we count, for each category i of X, the number of cases falling in that category (f_i). To plot the bar graph, we draw a bar of height f_i above each category i of X.
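The counting step can be sketched as follows. This is a minimal illustration using `collections.Counter` from the standard library. The helper names, the text rendering of the bars and the sample values are assumptions made for the example.

```python
from collections import Counter

def frequency_table(values):
    """Count the number of cases falling in each category, sorted by category."""
    return dict(sorted(Counter(values).items()))

def text_bar_graph(freq, width=40):
    """Render one bar per category, with length proportional to its frequency."""
    largest = max(freq.values())
    lines = []
    for category, f in freq.items():
        bar = "#" * round(width * f / largest)
        lines.append(f"{str(category):>10} | {bar} {f}")
    return "\n".join(lines)

# Made-up category codes for a handful of cases (not the study data)
cases = [1, 3, 2, 3, 3, 1, 2, 3]
freq = frequency_table(cases)
print(freq)
print(text_bar_graph(freq))
```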

13 Example In this example data has been collected for n = 34,188 subjects. The purpose of the study was to determine the relationship between the use of Antidepressants, Mood medication, Anxiety medication, Stimulants and Sleeping pills. In addition, the study was interested in examining the effects of the independent variables (gender, age, income, education and role) on both individual use of the medications and the multiple use of the medications.

14 The variables were:
Antidepressant use, Mood medication use, Anxiety medication use, Stimulant use and Sleeping pills use; gender, age, income, education and Role.
Role categories: Parent, worker, partner; Parent, partner; Parent, worker; worker, partner; worker only; Parent only; Partner only; No roles

15 Frequency Table for Age

16 Bar Graph for Age

17 Frequency Table for Role

18 Bar Graph for Role

19 The pie chart
An alternative to the bar chart. Draw a circle (a pie). Divide the circle into segments, with the area of each segment proportional to f_i or p_i = f_i / n
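Because the area of a circular segment is proportional to its central angle, the segments can be laid out by converting each frequency into a proportion and an angle. The sketch below is illustrative only. The function name and the counts are hypothetical.

```python
def pie_segments(freq):
    """Map each category to (proportion, central angle in degrees)."""
    n = sum(freq.values())
    return {category: (f / n, 360.0 * f / n) for category, f in freq.items()}

# Made-up frequencies (not the head injury data)
freq = {"A": 50, "B": 30, "C": 20}
segments = pie_segments(freq)
for category, (p, angle) in segments.items():
    print(f"{category}: proportion = {p:.2f}, angle = {angle:.1f} degrees")
```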

20 Example In this study the population consists of individuals who received a head injury (n = 22540). The variable is the mechanism that caused the head injury (InjMech), with categories: MVA (motor vehicle accident); Falls; Violence; Other VA (other vehicle accidents); Accidents (industrial accident); Other (all other mechanisms for head injury)

21 Graphical and Tabular Display of Categorical Data.
The frequency table, the bar graph, the pie chart

22 The frequency table

23 The bar graph

24 The pie chart

25 Multivariate Categorical Data

26 The two-way frequency table and the χ² statistic
Techniques for examining dependence between two categorical variables

27 Situation We have two categorical variables R and C.
The number of categories of R is r. The number of categories of C is c. We observe n subjects from the population and count x_ij = the number of subjects for which R = i and C = j. R = rows, C = columns.
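Building the table of counts can be sketched as below. The example assumes each subject is recorded as a pair (i, j) with categories coded 1..r and 1..c. The function name and the sample pairs are hypothetical.

```python
def two_way_table(pairs, r, c):
    """Return an r-by-c list of lists where x[i-1][j-1] counts subjects with R = i and C = j."""
    x = [[0] * c for _ in range(r)]
    for i, j in pairs:
        if not (1 <= i <= r and 1 <= j <= c):
            raise ValueError(f"category out of range: {(i, j)}")
        x[i - 1][j - 1] += 1
    return x

# Made-up observations: (row category, column category) for each subject
pairs = [(1, 1), (1, 2), (2, 2), (2, 2), (2, 3)]
table = two_way_table(pairs, r=2, c=3)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
print(table, row_totals, col_totals)
```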

28 Example Both Systolic Blood pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects. The categories for Blood Pressure are: < The categories for Cholesterol are: <

29 Table: two-way frequency
Rows: Serum Cholesterol. Columns: Systolic Blood pressure.

Serum Cholesterol    <127                   167+    Total
< 200                 117     121      47     22      307
                       85      98      43     20      246
                      119     209      68     43      439
260+                   67      99      46     33      245
Total                 388     527     204    118     1237

30 Example This comes from the drug use data. The two variables are:
Age (C) and Antidepressant Use (R) measured for a sample of n = 33,957 subjects.

31 Two-way Frequency Table
Percentage antidepressant use vs Age

32

33 The χ² statistic for measuring dependence amongst two categorical variables
Define $E_{ij} = \dfrac{R_i C_j}{n}$ = Expected frequency in the (i,j)th cell in the case of independence, where R_i is the total for row i and C_j is the total for column j.

34
         Columns
Row      1      2      3      4      5      Total
1        x11    x12    x13    x14    x15    R1
2        x21    x22    x23    x24    x25    R2
3        x31    x32    x33    x34    x35    R3
4        x41    x42    x43    x44    x45    R4
Total    C1     C2     C3     C4     C5     n

35
         Columns
Row      1      2      3      4      5      Total
1        E11    E12    E13    E14    E15    R1
2        E21    E22    E23    E24    E25    R2
3        E31    E32    E33    E34    E35    R3
4        E41    E42    E43    E44    E45    R4
Total    C1     C2     C3     C4     C5     n

36 Justification
In the case of independence, the proportion in column j for row i equals the overall proportion in column j: $\dfrac{E_{ij}}{R_i} = \dfrac{C_j}{n}$, so $E_{ij} = \dfrac{R_i C_j}{n}$.

37 and
the proportion in row i for column j equals the overall proportion in row i: $\dfrac{E_{ij}}{C_j} = \dfrac{R_i}{n}$, which again gives $E_{ij} = \dfrac{R_i C_j}{n}$.

38 The χ² statistic
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(x_{ij} - E_{ij})^2}{E_{ij}}$
E_ij = Expected frequency in the (i,j)th cell in the case of independence. x_ij = observed frequency in the (i,j)th cell

39 Example: studying the relationship between Systolic Blood pressure and Serum Cholesterol
In this example we are interested in whether Systolic Blood pressure and Serum Cholesterol are related or whether they are independent. Both were measured for a sample of n = 1237 cases

40 Observed frequencies
Rows: Serum Cholesterol. Columns: Systolic Blood pressure.

Serum Cholesterol    <127                   167+    Total
< 200                 117     121      47     22      307
                       85      98      43     20      246
                      119     209      68     43      439
260+                   67      99      46     33      245
Total                 388     527     204    118     1237

41 Expected frequencies
Rows: Serum Cholesterol. Columns: Systolic Blood pressure.

Serum Cholesterol    <127                      167+    Total
< 200                96.29   130.79   50.63   29.29      307
                     77.16   104.80   40.57   23.47      246
                    137.70   187.03   72.40   41.88      439
260+                 76.85   104.38   40.40   23.37      245
Total                  388      527     204     118     1237

In the case of independence, the distribution across a row is the same for each row, and the distribution down a column is the same for each column.
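The expected frequencies depend only on the marginal totals, so they can be recomputed directly from the Total row and column of the table. The sketch below applies E_ij = R_i C_j / n to those totals. The function name is hypothetical.

```python
def expected_frequencies(row_totals, col_totals):
    """Return E[i][j] = R_i * C_j / n, the expected count under independence."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Marginal totals from the blood pressure / cholesterol example (n = 1237)
row_totals = [307, 246, 439, 245]   # serum cholesterol categories
col_totals = [388, 527, 204, 118]   # systolic blood pressure categories

expected = expected_frequencies(row_totals, col_totals)
for row in expected:
    print("  ".join(f"{e:7.2f}" for e in row))
```

Each row of the result sums back to its row total and each column to its column total, which is a quick check on any hand-computed table of expected frequencies.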

42

43 Standardized residuals
$r_{ij} = \dfrac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$
The χ² statistic
$\chi^2 = \sum_{i} \sum_{j} r_{ij}^2$
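The residuals and the statistic can be computed together. The sketch below assumes the usual definition of the standardized residual, r_ij = (x_ij − E_ij) / √E_ij, with χ² as the sum of their squares. It also assumes the third row of the blood pressure and cholesterol table reads 119, 209, 68, 43: the values 119 and 43 are inferred from the marginal totals rather than read from the table. The function name is hypothetical.

```python
from math import sqrt

def chi_square_summary(observed):
    """Return (standardized residuals, chi-square) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        res_row = []
        for j, x in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            res_row.append((x - e) / sqrt(e))       # assumes every expected count is > 0
        residuals.append(res_row)
    chi2 = sum(r * r for res_row in residuals for r in res_row)
    return residuals, chi2

# Rows: serum cholesterol; columns: systolic blood pressure
observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],   # 119 and 43 inferred from the marginal totals (assumption)
    [67, 99, 46, 33],
]
residuals, chi2 = chi_square_summary(observed)
for res_row in residuals:
    print("  ".join(f"{r:6.2f}" for r in res_row))
print(f"chi-square = {chi2:.2f}")
```

Cells with large positive or negative residuals are the ones that contribute most to χ² and show where the table departs from independence.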

44 Example This comes from the drug use data. The two variables are:
Role (C) and Antidepressant Use (R) measured for a sample of n = 33,957 subjects.

45 Two-way Frequency Table
Percentage antidepressant use vs Role

46

47 Calculation of χ²
The raw data. Expected frequencies.

48 The residuals. The calculation of χ²

49 Example In this example n = individuals who had been victimized twice by crimes. Rows = crime of first victimization. Cols = crime of second victimization.

50

51

52 Next Topic: Brief introduction to Statistical Packages

