DTC Quantitative Research Methods: Descriptive Statistics. Thursday 16th October 2014.

Some relevant online course extracts Cramer (1998) Chapter 2: measurement and univariate analysis. Diamond and Jefferies (2001) Chapter 5: measures and displays of spread. Sarantakos (2007) Chapter 5: graphical displays. Huizingh (2007) Chapter 12: SPSS material.

Some basic terminology Quantitative measures are typically referred to as variables. Some variables are generated directly via the data generation process, but other, derived variables may be constructed from the original set of variables later on. As the next slide indicates, variables are frequently referred to in more specific ways.

Cause(s) and effect…? Often, one variable (and occasionally more than one variable) is viewed as being the dependent variable. Variables which are viewed as impacting upon this variable, or outcome, are often referred to as independent variables. However, for some forms of statistical analyses, independent variables are referred to in more specific ways (as can be seen within the menus of SPSS for Windows).

Levels of measurement (Types of quantitative data) A nominal variable relates to a set of categories such as ethnic groups or political parties which is not ordered. An ordinal variable relates to a set of categories in which the categories are ordered, such as social classes or levels of educational qualification. An interval-level variable relates to a ‘scale’ measure, such as age or income, that can be subjected to mathematical operations such as averaging.

How many variables? The starting point for statistical analyses is typically an examination of the distributions of values for the variables of interest. Such examinations of variables one at a time are a form of univariate analysis. Once a researcher moves on to looking at relationships between pairs of variables she or he is engaging in bivariate analyses. … and if they attempt to explain why two variables are related with reference to another variable or variables they have moved on to a form of multivariate analysis.

Looking at categorical variables For nominal/ordinal variables this largely means looking at the frequencies of each category, often pictorially using, say, bar-charts or pie-charts. It is usually easier to get a sense of the relative importance of the various categories if one converts the frequencies into percentages!

Example of a frequency table: place met marital or cohabiting partner, with a frequency and a percentage for each of the categories ‘at school, college or university’, ‘at/through work’, ‘in a pub/cafe/restaurant/bar/club’, ‘at a social event organised by friend(s)’ and ‘other’, plus a TOTAL row.

Example of a pie-chart: one slice per category (at school, college or university; at/through work; in a pub/cafe/restaurant/bar/club; at a social event organised by friend(s); other).

What are percentages? It may seem self-evident, but percentages are a form of descriptive statistic. Specifically, they are useful in describing the distributions (of frequencies) for nominal or ordinal (i.e. categorical) variables. When we consider interval-level variables or more than one variable, we need (somewhat) more sophisticated descriptive statistics.
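
As a sketch of how such percentages are produced, each category's count is divided by the total and multiplied by 100. The category labels and counts below are made up for illustration, not taken from any of the slides' data:

```python
from collections import Counter

# Hypothetical responses on a nominal variable; labels and counts are illustrative only.
responses = (["work"] * 30) + (["school"] * 20) + (["pub"] * 40) + (["other"] * 10)

counts = Counter(responses)
total = sum(counts.values())

# Convert each frequency into a percentage of the total.
percentages = {cat: 100 * n / total for cat, n in counts.items()}

for cat, n in counts.items():
    print(f"{cat:8s} {n:4d} {percentages[cat]:5.1f}%")
```

The percentages necessarily sum to 100%, which is what makes the relative importance of the categories easy to compare.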

Descriptive statistics are data summaries which provide an alternative to graphical representations of distributions of values (or relationships)... aim to describe key aspects of distributions of values (or relationships)... are of most relevance when we are thinking about interval-level variables (scales)

Description or inference? Descriptive statistics summarise relevant features of a set of values. Inferential statistics help researchers decide whether features of quantitative data from a sample can safely be concluded to be present in the population. Generalizing from a sample to a population is part of the process of statistical inference. One objective may be to produce an estimate of the proportion of people in the population with a particular characteristic, i.e. a process of estimation.
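
As a minimal sketch of the estimation idea, a sample proportion and its conventional normal-approximation 95% confidence interval can be computed as below. The sample figures are made up for illustration:

```python
import math

# Hypothetical sample: 180 of 400 respondents report a particular characteristic.
n = 400
successes = 180

p_hat = successes / n                    # point estimate of the population proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
# Conventional normal-approximation 95% confidence interval.
ci_low, ci_high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"estimate = {p_hat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```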

Types of (univariate) descriptive statistics Measures of location (averages)... spread... skewness (asymmetry)... kurtosis We typically want to know about the first two, sometimes about the third, and rarely about the fourth!

What is ‘kurtosis’ anyway? Increasing kurtosis is associated with the “movement of probability mass from the shoulders of a distribution into its center and tails.” (Balanda, K.P. and MacGillivray, H.L. (1988) ‘Kurtosis: A Critical Review’, The American Statistician 42(2): 111–119.) Below, kurtosis increases from left to right...
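
A moment-based sample estimate of (excess) kurtosis can be sketched as follows. Note this is the simple population-moment formula, not the small-sample-corrected version that packages such as SPSS report:

```python
def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# A flat, uniform-looking sample versus one with a heavy centre and long tails.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
peaked = [5, 5, 5, 5, 1, 9, 5, 5, 5, 5]

print(excess_kurtosis(flat))    # negative: flatter than a normal distribution
print(excess_kurtosis(peaked))  # positive: mass concentrated in centre and tails
```

The sign of the result captures the slide's point: moving mass from the shoulders into the centre and tails pushes kurtosis up.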

Visualising ‘scale’ variables For interval-level data the appropriate visual summary of a distribution is a histogram, examining which can allow the researcher to assess whether it is reasonable to assume that the quantity of interest has a particular distributional shape (and whether it exhibits skewness). Unlike bar charts, distances along the ‘horizontal’ dimension of a histogram have a well-defined, consistent meaning: i.e. they represent differences between values on the interval-level scale in question.

Example of a histogram

Measures of location Mean (the arithmetic average of the values, i.e. the result of dividing the sum of the values by the total number of cases) Median (the middle value, when the values are ranked/ordered) Mode (the most common value)

... and measures of spread Standard deviation (and Variance) (This is linked with the mean, as it is based on averaging [squared] deviations from it. The variance is simply the standard deviation squared). Interquartile range / Quartile deviation (These are linked with the median, as they are also based on the values placed in order).

Measures of location and spread: an example (household size). West Midlands: mean = 2.94, median = 2, mode = 2; s.d. = 1.93, skewness = 2.10, kurtosis = 5.54. London: mean = 2.96, median = 3, mode = 2; s.d. = 1.58, skewness = 1.27, kurtosis = 2.24.

Why is the standard deviation so important? The standard deviation (or, more precisely, the variance) is important because it introduces the idea of summarising variation in terms of summed, squared deviations. And it is also central to some of the statistical theory used in statistical testing/statistical inference...

An example of the calculation of a standard deviation Number of seminars attended by a sample of undergraduates: 5, 4, 4, 7, 9, 8, 9, 4, 6, 5. Mean = 61/10 = 6.1. Variance = ((5 – 6.1)² + (4 – 6.1)² + (4 – 6.1)² + (7 – 6.1)² + (9 – 6.1)² + (8 – 6.1)² + (9 – 6.1)² + (4 – 6.1)² + (6 – 6.1)² + (5 – 6.1)²)/(10 – 1) = 36.9/9 = 4.1. Standard deviation = square root of variance = 2.025.
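
This worked example can be checked in code: Python's statistics module uses the same n − 1 divisor for the variance and standard deviation, and statistics.quantiles gives the quartiles behind the interquartile range mentioned on the previous slide:

```python
import statistics

# Number of seminars attended, from the worked example above.
attended = [5, 4, 4, 7, 9, 8, 9, 4, 6, 5]

mean = statistics.mean(attended)      # 6.1
var = statistics.variance(attended)   # sum of squared deviations / (n - 1) = 4.1
sd = statistics.stdev(attended)       # square root of the variance ~ 2.025

# Quartiles (default 'exclusive' method) and the interquartile range.
q1, q2, q3 = statistics.quantiles(attended, n=4)
iqr = q3 - q1

print(mean, var, round(sd, 3))  # 6.1 4.1 2.025
print(q1, q3, iqr)              # 4.0 8.25 4.25
```

Different quartile interpolation methods (and different packages) can give slightly different quartile values for small samples, so the interquartile range should be read as method-dependent.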

The Empire Median Strikes Back! Comparing descriptive statistics between groups can be done graphically in a rather nice way using a form of display called a ‘boxplot’. Boxplots are based on medians and quartiles rather than on the more commonly found mean and standard deviation.

Example of a boxplot

Moving on to bivariate ‘descriptive statistics’... These are referred to as ‘measures of association’, as they quantify the (strength of the) association between two variables The most well-known of these is the (Pearson) correlation coefficient, often referred to as ‘the correlation coefficient’, or even ‘the correlation’ This quantifies the closeness of the relationship between two interval-level variables (scales)

Positive and negative relationships Positive or direct relationships If the points cluster around a line that runs from the lower left to upper right of the graph area, then the relationship between the two variables is positive or direct. An increase in the value of x is more likely to be associated with an increase in the value of y. The closer the points are to the line, the stronger the relationship. Negative or inverse relationships If the points tend to cluster around a line that runs from the upper left to lower right of the graph, then the relationship between the two variables is negative or inverse. An increase in the value of x is more likely to be associated with a decrease in the value of y.

Working out the correlation coefficient (Pearson’s r) Pearson’s r tells us how much one variable changes as the values of another change – their covariation. Variation is measured with the standard deviation, which measures the average variation of each variable from the mean of that variable. Covariation is measured by calculating the amount by which each value of X varies from the mean of X and the amount by which each value of Y varies from the mean of Y, multiplying these differences together, and finding the average (by dividing by n – 1). Pearson’s r is calculated by dividing this by (SD of X) x (SD of Y) in order to standardize it.

Working out the correlation coefficient (Pearson’s r) Because r is standardized it will always fall between +1 and -1. A correlation of either 1 or -1 means perfect association between the two variables. A correlation of 0 means that there is no association. Note: correlation does not mean causation. We can only investigate causation by reference to our theory. However (thinking about it the other way round) there is unlikely to be causation if there is not correlation.

A scatterplot of the values of two interval-level variables

Example of calculating a correlation coefficient (corresponding to the last slide) X = 5, 4, 4, 7, 9, 8, 9, 4, 6, 5; Mean(X) = 6.1. Y = 8, 7, 9, 7, 8, 8, 8, 5, 5, 6; Mean(Y) = 7.1. (5 – 6.1)(8 – 7.1) + (4 – 6.1)(7 – 7.1) +... etc. = 7.9 (covariation). S.D.(X) = 2.02; S.D.(Y) = 1.37. r = (7.9 / 9) / (2.02 x 1.37) = 0.316
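
The same recipe (average covariation divided by the product of the two standard deviations) can be written as a short function and checked against the figures above:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y over the product of their SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from each mean.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Standard deviations, using the same n - 1 divisor as the slides.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return (covariation / (n - 1)) / (sd_x * sd_y)

# Data from the slide above.
X = [5, 4, 4, 7, 9, 8, 9, 4, 6, 5]
Y = [8, 7, 9, 7, 8, 8, 8, 5, 5, 6]
print(round(pearson_r(X, Y), 3))  # 0.316
```

Because of the standardization, a variable correlated with itself gives exactly 1, and reversing the sign of one variable flips the sign of r.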

Looking at the relationship between two categorical variables If two variables are nominal or ordinal, i.e. categorical, we can look at the relationship between them in the form of a cross-tabulation, using percentages to summarize the pattern. (Typically, if there is one variable that can be viewed as depending on the other, i.e. a dependent variable, and the categories of this variable make up the columns of the cross-tabulation, then it makes sense to have percentages that sum to 100% across each row; these are referred to as row percentages).

An example of a cross-tabulation (from Jamieson et al., 2002 #)

‘When you and your current partner first decided to set up home or move in together, did you think of it as a permanent arrangement or something that you would try and then see how it worked?’

                                Both ‘permanent’   Both ‘try and see’   Different answers   TOTAL
Cohabiting without marriage     15 (48%)           4 (13%)              12 (39%)            31 (100%)
Cohabited and then married      16 (67%)           1 (4%)               7 (29%)             24 (100%)
Married without cohabiting      9 (100%)           0 (0%)               0 (0%)              9 (100%)

# Jamieson, L. et al. (2002) ‘Cohabitation and commitment: partnership plans of young men and women’, Sociological Review 50(3): 356–377.

Alternative forms of percentage In the following example, row percentages allow us to compare outcomes between the categories of an independent variable. However, we can also use column percentages to look at the composition of each category of the dependent variable. In addition, we can use total percentages to look at how the cases are distributed across combinations of the two variables.
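
The three kinds of percentage can be illustrated with a small made-up 2x2 table of counts (rows as categories of an independent variable, columns as categories of a dependent variable):

```python
# Hypothetical counts: rows = two groups, columns = two outcomes.
table = [[30, 10],
         [20, 40]]

grand_total = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Row percentages: each cell as a share of its row total (sum to 100% across a row).
row_pct = [[100 * cell / row_totals[i] for cell in row] for i, row in enumerate(table)]

# Column percentages: each cell as a share of its column total.
col_pct = [[100 * table[i][j] / col_totals[j] for j in range(2)] for i in range(2)]

# Total percentages: each cell as a share of the grand total.
total_pct = [[100 * cell / grand_total for cell in row] for row in table]

print(row_pct)
print(col_pct)
print(total_pct)
```

Row percentages compare outcomes between the groups, column percentages describe the composition of each outcome category, and total percentages show how the cases spread across all the combinations.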

Example Cross-tabulation II: Row percentages Derived from: Goldthorpe, J.H. with Llewellyn, C. and Payne, C. (1987). Social Mobility and Class Structure in Modern Britain (2nd Edition). Oxford: Clarendon Press.

Example Cross-tabulation II: Column percentages

Example Cross-tabulation II: Total percentages

Percentages and Association It is possibly self-evident that the differences between the percentages in different rows (or columns) can collectively be viewed as measuring association. In the case of a 2x2 cross-tabulation (i.e. one with two rows and two columns), the difference between the percentages is a measure of association for that cross-tabulation. But there are other ways of quantifying the association in the cross-tabulation…

Odds ratios as a measure of association The patterns in the social mobility table examined in an earlier session can clearly be expressed as differences in percentages (e.g. the differences between the percentages of sons with fathers in classes I and VII who are themselves in classes I and VII). However, an alternative way of quantifying these class differences is to compare the odds of class I fathers having sons in class I as opposed to class VII with the odds of class VII fathers having sons in class I as opposed to class VII. The ratio of these two sets of odds is an odds ratio, which will have a value close to 1.0 if the two sets of odds are similar, i.e. if there is little or no difference between the chances of being in classes I and VII for sons with fathers in classes I and VII respectively.

Odds Ratios vs. % Differences: An Example: Gender and Higher Education

Age group 1:       Degree        No degree
Men                56 (13.0%)    374
Women              70 (13.8%)    438
% difference = -0.8%;  odds ratio = (56/374)/(70/438) = 0.937

Age group 2:       Degree        No degree
Men                56 (14.4%)    334
Women              38 (9.1%)     378
% difference = 5.3%;  odds ratio = (56/334)/(38/378) = 1.668

Age group 3:       Degree        No degree
Men                34 (9.9%)     308
Women              18 (5.2%)     329
% difference = 4.7%;  odds ratio = (34/308)/(18/329) = 2.018

Choice of measure can matter! The choice of differences between percentages versus odds ratios as a way of quantifying differences between groups can matter, as in the preceding example of the ‘effect’ of gender on the likelihood of having a degree, according to age. The % difference values of 4.7%, 5.3% and -0.8% suggest that inequality increased before it disappeared, whereas the odds ratios of 2.018, 1.668 and 0.937 suggest a small decrease in inequality before a larger decrease led to approximate equality! Evidently, there are competing ways of measuring association in a cross-tabulation. But neither differences between percentages nor odds ratios provides an overall summary of the association in a cross-tabulation…
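
Both measures can be computed directly from a 2x2 table of counts. As a check, a sketch using one set of figures from the gender and degree example above (34 and 308 men, 18 and 329 women):

```python
# Counts from the gender and degree example above.
men_degree, men_no_degree = 34, 308
women_degree, women_no_degree = 18, 329

# Percentage-point difference between the two groups' rates of having a degree.
pct_men = 100 * men_degree / (men_degree + men_no_degree)
pct_women = 100 * women_degree / (women_degree + women_no_degree)
pct_difference = pct_men - pct_women

# Odds ratio: the odds of a degree for men divided by the odds for women.
odds_ratio = (men_degree / men_no_degree) / (women_degree / women_no_degree)

print(round(pct_difference, 2))  # 4.75 (the slide's 4.7 subtracts the rounded percentages)
print(round(odds_ratio, 3))      # 2.018
```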

Another measure of association If we need an overall measure of association for two cross-tabulated (categorical) variables, one standard possibility is Cramér’s V. Like the Pearson correlation coefficient it has a maximum of 1, and 0 indicates no relationship, but it cannot take negative values, and it makes no assumption of linearity. It is derived from a test statistic (inferential statistic), chi-square, which we will consider in a later session…
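
Since V is derived from chi-square, it can be sketched from first principles: compute the chi-square statistic from observed and expected counts, then scale by the sample size and the smaller table dimension. The 2x3 counts below are made up for illustration:

```python
import math

def cramers_v(table):
    """Cramér's V for a cross-tabulation given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Chi-square: sum of (observed - expected)^2 / expected over all cells,
    # where expected counts assume the two variables are unrelated.
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_sq += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller of the number of rows and columns
    return math.sqrt(chi_sq / (n * (k - 1)))

# A made-up 2x3 cross-tabulation of counts.
table = [[20, 30, 50],
         [40, 30, 30]]
print(round(cramers_v(table), 3))
```

When observed counts exactly match the expected counts, chi-square is zero and so is V, matching the slide's reading of 0 as no relationship.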

An example of Cramér’s V For the cross-tabulation shown, Cramér’s V = 0.074

Other measures of association for cross-tabulations… In a literature review more than thirty years ago, Goodman and Kruskal identified several dozen of these: Goodman, L.A. and Kruskal, W.H. (1979) Measures of Association for Cross Classifications. New York: Springer-Verlag. … and I added one of my own, Tog, which measures inequality (in a particular way) where both variables are ordinal… One of Tog’s (distant) relatives

What if one variable is a set of categories, and the other is a scale? The equivalent to comparing percentages in this instance is comparing means… but there may be quite a lot of these! So one possible overall measure of association used in this situation is eta-squared (η²) But this is a less familiar measure (at least to researchers in some social science disciplines!)
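
As a sketch of the idea, eta-squared is the between-group sum of squares as a proportion of the total sum of squares, i.e. the share of the variation in the scale variable accounted for by group membership. The group values below are made up for illustration:

```python
def eta_squared(groups):
    """Proportion of total variation in a scale variable explained by group membership."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total sum of squares: squared deviations of every value from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between-group sum of squares: squared deviations of each group's mean
    # from the grand mean, weighted by the group's size.
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical incomes (in £000s) for three categories of some nominal variable.
groups = [[20, 22, 24], [30, 32, 34], [25, 27, 29]]
print(round(eta_squared(groups), 3))  # 0.862
```

A value of 0 means the group means are identical; a value of 1 means all variation lies between the groups (every value equals its own group's mean).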