Sociology 5811: Lecture 4: Other Univariate Descriptives, Quantiles, and Z-Scores Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.


1 Sociology 5811: Lecture 4: Other Univariate Descriptives, Quantiles, and Z-Scores Copyright © 2005 by Evan Schofer Do not copy or distribute without permission

2 Announcements Problem set 1 due next week!

3 Dispersion: The Variance Dispersion can be measured by adding up deviations from the mean –We square each deviation to avoid negative values –And divide by “N-1” (instead of N) to get the average Result: The “variance”: $s^2 = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}$

4 Dispersion: Standard Deviation Result: Standard Deviation –Simply the square root of the variance –Denoted by lower-case s –Most commonly used measure of dispersion Formula: $s = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N - 1}}$
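The variance and standard deviation can be checked by hand on a small dataset. The following is a minimal Python sketch of the two steps described on these slides (square and sum the deviations, divide by N-1, then take the square root). The four data values and the function names are made up for illustration and are not from the lecture.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(y - mean) ** 2 for y in values]
    return sum(squared_deviations) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

# Made-up data for illustration only (mean = 4).
data = [2, 4, 4, 6]
variance = sample_variance(data)  # (4 + 0 + 0 + 4) / 3
sd = sample_sd(data)
print(variance, sd)
```

Dividing by N instead of N-1 would give a slightly smaller value; the difference shrinks as N grows.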

5 Example 1: s = 21.72

6 Example 2: s = 67.62

7 Example 3: s = 102.15

8 Thinking About Dispersion Suppose we observe that the standard deviation of wealth is greater in the U.S. than in Sweden… –What can we conclude about the two countries? Guess which group has a higher standard deviation for income: Men or Women? Why? The standard deviation of a stock’s price is sometimes considered a measure of “risk”. Why? Suppose we polled people on two political issues and the S.D. was much higher for one –How would you interpret that?

9 Other Univariate Stats: Skewness Is a distribution symmetrical? Skewness describes how far a distribution departs from symmetry A “tail” is referred to as “skewness” Tail on left = skewed to left = negative skew Tail on right = skewed to right = positive skew Perfectly symmetrical distributions have no skew Interpretation: The side of the distribution with the tail has fewer cases. More cases are on the other side of the mean…

10 Interpreting Skewness Skewness provides information about inequality –Example: Economic wealth of nations

11 Interpreting Skewness Skewness provides information about inequality in your data Example: Economic wealth of nations… Which way is it skewed? What is the social interpretation? What would be the interpretation if it were skewed in the opposite direction? What are some other social circumstances that might generate skewed distributions? Why?

12 Interpreting Skewness Skewness may reflect “floor” or “ceiling” effects Example: Number of crimes committed by individuals in a sample. Lower bound is zero. Mode is low. Few cases are high. Variable is skewed to right. Example: Country school enrollment ratio. Cannot exceed 100% enrollment in school. Can anyone think of other examples?

13 Calculating Skewness Often, skewness is merely used descriptively But, statisticians have created a measure Zero = perfectly symmetrical Larger absolute value = increasing skew; the sign gives the direction of the tail Based on distance from Mean to Median Remember, Mean moves more if there are extreme cases, as when there is a “tail” Formula:
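The formula itself did not survive in this transcript. One common measure that matches the description above (zero for symmetry, based on the distance from mean to median) is Pearson's median skewness, 3 × (mean − median) / s. The Python sketch below assumes that measure; it may not be the exact formula shown on the slide, and the datasets are made up.

```python
import statistics

def median_skewness(values):
    """Pearson's median skewness: 3 * (mean - median) / s.

    Assumed here as one measure built on the mean-median distance;
    the slide's own formula is not available in the transcript.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample S.D. (divides by N - 1)
    return 3 * (mean - median) / sd

# Made-up data for illustration only.
right_skewed = [0, 0, 1, 1, 2, 3, 10]   # long tail on the right
left_skewed = [0, 7, 8, 9, 9, 10, 10]   # long tail on the left
symmetric = [1, 2, 3, 4, 5]

skew_right = median_skewness(right_skewed)
skew_left = median_skewness(left_skewed)
skew_symmetric = median_skewness(symmetric)
print(skew_right, skew_left, skew_symmetric)
```

The extreme case in the tail pulls the mean toward it while the median barely moves, which is what produces the positive or negative sign.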

14 Notes on Skewness Skewness is often assessed informally “by eye” rather than calculated as a value. Look at a histogram to identify skewness Some statistical techniques work properly only on variables that are not skewed. Thus, it can be very important to identify highly skewed variables.

15 Other Univariate Descriptions: Modes Modes = Peaks –Note: “the mode” also refers to a measure of central tendency – the value associated with the highest peak But, the term is also used more generally: –Uni-modal distribution: One peak –Bi-modal distribution: Two peaks –Multi-modal distribution: Multiple peaks.

16 Interpreting Multi-Modal Distributions Can you think of a reason for multiple modes? The sample is heterogeneous (i.e., made up of more than one group) Height forms a bell-shaped distribution for men and for women, but the peaks are different. A combined sample has two peaks The sample reflects some exogenous structural ordering process Years of education completed is peaked at 12 (high school), 16 (college)

17 Example: Mode, skew How would you describe this variable?

18 Example: Mode, skew How would you describe this variable?

19 Example: Mode, Skew How would you describe this variable?

20 More Univariate Tools Two other issues: –1. How many cases fall below or above a given value? –2. How can we describe a case’s value relative to other cases? Tools: –Cumulative frequency lists/plots –Quantiles (e.g., percentiles, quartiles) –Z-scores

21 Cumulative Frequency List Cumulative Frequency: Number of cases falling in or below a given interval Cumulative frequency graph = “ogive” Cumulative Percentage: Percentage of cases falling in or below a given interval Cumulative frequency lists, graphs can be generated in SPSS: frequency, histogram.

22 Cumulative Percentage List Years of Education (N=2904)

Value      Frequency  Percent  Cumulat %
7 or less  21         1.4      3.9
8          82         5.3      9.3
9          51         3.3      12.6
10         70         4.6      17.2
11         95         6.2      23.4
12         489        31.8     55.4
13         125        8.1      63.5
14         184        12.0     75.6
15         76         4.9      80.5
16         152        9.9      90.5
17         40         2.6      93.1
18         61         4.0      97.1
19         18         1.2      98.2
20         27         1.8      100.0

Q: How do you find the median? The cumulative percentage for 12 indicates that 55% of students have 12 years of education or less
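One way to answer the question about the median: it falls in the first row whose cumulative percentage reaches 50 (in the table above, 12 years, at 55.4%). The Python sketch below builds a cumulative percentage column from frequencies and reads off that row. The frequencies are hypothetical, chosen so the arithmetic is easy to follow; they are not the GSS figures.

```python
from itertools import accumulate

def cumulative_percentages(frequencies):
    """Percentage of cases falling in or below each category."""
    total = sum(frequencies)
    return [100 * running / total for running in accumulate(frequencies)]

def median_category(values, frequencies):
    """First category whose cumulative percentage reaches 50."""
    for value, cum_pct in zip(values, cumulative_percentages(frequencies)):
        if cum_pct >= 50:
            return value

# Hypothetical frequency table (not the GSS data).
years = [10, 11, 12, 13, 14]
counts = [10, 20, 40, 20, 10]

cum_pct = cumulative_percentages(counts)
median_years = median_category(years, counts)
print(cum_pct, median_years)
```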

23 Cumulative % Graph

24 Quantiles Percentiles, quartiles, deciles, etc… General term = quantile Quantiles: Dividing cases up into a fixed number of equal-sized groups –100 groups = percentiles –10 groups = deciles –5 = quintiles –4 = quartiles

25 Quartile: Example Example: Number of CDs owned (N=12) First Quartile: 0 0 9; Second Quartile: 17 19 29; Third Quartile: 46 87 103; Fourth Quartile: 178 202 293 Identifying the quartile of a case is a powerful way of describing where a case falls relative to others –A person with 200 CDs is in the top quartile: 75% have fewer Note: Don’t forget that quantiles are relative –A person of average height in the US would be in the bottom quartile in a dataset of basketball players.
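The grouping in this example can be reproduced by sorting the cases and cutting them into four equal-sized groups. The following Python sketch uses the twelve CD counts from the slide; the function name is hypothetical, and the simple index arithmetic assumes the number of cases divides evenly (other conventions exist for handling ties and uneven splits).

```python
def split_into_quantiles(values, k):
    """Sort the cases and divide them into k groups by rank."""
    ordered = sorted(values)
    n = len(ordered)
    groups = [[] for _ in range(k)]
    for rank, value in enumerate(ordered):
        groups[rank * k // n].append(value)
    return groups

# Number of CDs owned, from the slide (N = 12).
cds = [0, 0, 9, 17, 19, 29, 46, 87, 103, 178, 202, 293]

quartiles = split_into_quantiles(cds, 4)
for number, group in enumerate(quartiles, start=1):
    print(f"Quartile {number}: {group}")
```

Calling the same function with k = 10 or k = 100 would give deciles or percentiles, given enough cases.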

26 Quantiles Also: Upper and lower bounds of quantiles are useful reference points that describe your data –The border of the 2nd and 3rd quartile is the median, the middle of your data –The border of the top quartile (178 CDs) gives you a sense of how many are owned by people toward the upper end of the distribution –Ex: Sometimes people report the “interquartile range”: the range of values that contains the middle 50% of cases.

27 Quantiles Useful questions that Quantiles help answer: 1. How does a particular case compare to others in the dataset? –Example: I scored 57 on a test… is that good? –Strategy: Determine the percentile –If 57 corresponds to the 22nd percentile, then the answer is NO! At least not compared to the others who took the test –Note: Percentiles indicate position relative to other cases

28 Quantiles Useful questions that Quantiles help answer: 2. How does a case’s value on one variable compare to another variable? –If I scored 51 on my math test and 78 on my English test, which is better? –Converting to percentiles allows a direct comparison Ex: 51 on math = 95th percentile; 78 on English = 62nd percentile Conclusion: Math performance was better!
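The comparison above can be sketched in Python by converting each raw score to a percentile within its own set of scores. The convention used here (percentage of cases scoring strictly below) is one of several in use, and both score lists are hypothetical, so the resulting percentiles differ from the 95th and 62nd in the slide's example.

```python
def percentile_rank(score, all_scores):
    """Percentage of cases scoring strictly below the given score.

    One common convention; others count ties differently.
    """
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical class results for illustration only.
math_scores = [20, 25, 30, 35, 40, 42, 45, 48, 51, 60]
english_scores = [60, 65, 70, 75, 78, 80, 85, 88, 90, 95]

math_pct = percentile_rank(51, math_scores)
english_pct = percentile_rank(78, english_scores)
print(math_pct, english_pct)
```

Even though 78 is the larger raw score, the lower raw score of 51 sits higher relative to the other cases on its own test.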

29 Quantiles Useful questions that Quantiles help answer: 3. What values are high or low for a given variable? –Ex: U.S. Census Income Statistics by Quintiles 2001: –Cutoffs: $17,970; $33,314; $53,000; $83,500 –0 to $17,970 = lowest quintile; $17,970 to $33,314 = second quintile; $33,314 to $53,000 = third quintile; $53,000 to $83,500 = fourth quintile; $83,500 to “Bill Gates” = highest quintile –Typical starting salary of sociologist: $50,000

30 Computing Quantiles Calculating quantiles in SPSS: SPSS frequencies command –Options under the Statistics button specify the quantiles –Or, you can rely on the Cumulative Percentage list to identify percentiles or other quantiles Example: Years of education completed (GSS) 95th percentile falls at: 18 years of education Interpretation: 5% are more educated. 95% are less.

31 Z-Score (Standardized Score) The Z-score: Another way to assess relative placement of cases in a distribution Somewhat like a deviation And has other uses You can convert any or all values of a variable to a common scale Running approximately from –3 to +3, with mean = 0 Then you can easily compare across variables Ex: I’m a -.3 on math, a +1.2 on reading Negative = below mean, positive = above mean.

32 Formula for Z-Score For any case in your data, calculate: $Z_i = \frac{Y_i - \bar{Y}}{s_Y}$ Start with a case’s value ($Y_i$)… Then simply subtract the mean and divide by the standard deviation.

33 Z-Score Example Example: In the US, the mean level of education is 13 years, with a S.D. of 3 years Question 1: What is the Z-score of a person who has a high-school degree? (12 yrs) Question 2: What is the Z-score of an advanced graduate student? (22 yrs)
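Both questions can be worked by subtracting the mean and dividing by the standard deviation. A minimal Python sketch, using the mean (13) and S.D. (3) given on the slide; the function name is hypothetical.

```python
def z_score(value, mean, sd):
    """Deviation from the mean, expressed in standard deviations."""
    return (value - mean) / sd

# Values from the slide: mean education = 13 years, S.D. = 3 years.
z_high_school = z_score(12, 13, 3)    # (12 - 13) / 3
z_grad_student = z_score(22, 13, 3)   # (22 - 13) / 3
print(round(z_high_school, 2), z_grad_student)
```

The high-school graduate is about a third of a standard deviation below the mean; the advanced graduate student is three standard deviations above it.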

34 Properties of Z-Scores Z-scores are like deviations Cases on the mean score zero Positive values are above mean, negative below But, like quantiles, Z-scores can be compared across variables with different units or means Simple deviations can’t be compared if units of measurement are different: Ex: height and weight Units of Z-scores are “standard deviations” A Z-score of -1.83 indicates a case is nearly 2 standard deviations below the mean.

35 Z-scoring Whole Variables You can convert an entire variable (all cases) to Z-scores, creating a whole new variable With useful properties Converting to Z-scores preserves the shape of the distribution But, mean and standard deviation are altered Mean = zero Because it is based on deviations Standard Deviation ($s_Y$) = 1 Because each distance from the mean is divided by $s_Y$.

36 Z-Score Example Number of CDs: Mean = 32.5, s = 29.8

Case Num  CDs (Y)  Mean (Y bar)  Deviation (d)  Z-score (d_i/s)
1         20       32.5          -12.5          -.42
2         40       32.5          7.5            +.25
3         0        32.5          -32.5          -1.1
4         70       32.5          37.5           +1.3
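The table can be reproduced, and the properties of a Z-scored variable checked, with a few lines of Python. This sketch uses the four CD counts from the example and the standard library's sample standard deviation (N-1 in the denominator). Computed this way, s is about 29.86, so the Z-scores agree with the slide to within rounding.

```python
import statistics

# Number of CDs for the four cases in the example.
cds = [20, 40, 0, 70]

mean = statistics.mean(cds)    # 32.5
sd = statistics.stdev(cds)     # sample S.D., divides by N - 1

deviations = [y - mean for y in cds]
z = [d / sd for d in deviations]

for case, (y, d, z_i) in enumerate(zip(cds, deviations, z), start=1):
    print(case, y, d, round(z_i, 2))

# The new Z-scored variable has mean 0 and standard deviation 1.
z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
```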

37 Converting Variables to Z-scores GSS Data, N=2904

38

39 Z-Scoring Whole Variables Properties of Z-scored variables 1. Mean = 0, S.D. = 1 –Unit of variable is literally “standard deviations” –If a value = 1, it means the case is 1 S.D. above the mean 2. Z-scored variables are useful for comparing variables with very different units 3. However, the actual meaning of units is lost –Ex: a variable measured in # of CDs makes sense, but a variable in # of S.D.s is harder to interpret

40 Z-Scores and Index Construction Issue: It is often useful to combine several variables to create an “index” Example: Suppose you ask several similar questions on a survey (all on a scale from 1-5): Do you approve of the President’s foreign policy? Do you approve of the President’s domestic policy? Do you approve of the President’s character? You can add all 3 together to make a scale that reflects “overall approval” of Bush For each individual, the scale goes from 3 to 15.

41 Z-Scores and Index Construction Example: Constructing an index

Case #  Foreign  Domestic  Character  Index
1       1        2         2          5
2       4        5         5          14
3       2        3         3          8
4       3        4         1          8
5       4        1         2          7

Index value is simply the sum of the three component variables

42 Z-Scores and Index Construction Suppose you wanted to make an index of the following variables: –1. Approval of foreign policy (measured 1-3) –2. Approval of domestic policy (measured 1-5) –3. Approval of character (measured 1-100) Question: What is the problem with constructing an index from these three measures? Answer: Value of index variable is almost wholly determined by the third variable –It is numerically much larger, and “dominates” the index

43 Z-Scores and Index Construction Calculating Z-scores of each variable (prior to adding them) can help make a better index Reason: Z-scoring variables “standardizes” the dispersion of each component of the index –All vars have same mean (0), standard deviation (1) –Thus, each variable contributes roughly equally to the index. None disproportionately influences it. –Final index of 3 vars: mean = 0; S.D. depends on how strongly the components are correlated (it equals 3 only if they are perfectly correlated) Note: There are many other ways to create indexes… but this is one quick solution
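The difference between a raw-sum index and a Z-scored index can be seen in a short Python sketch. The five cases below are hypothetical responses on the three differently scaled measures (1-3, 1-5, 1-100); the helper name is made up for illustration.

```python
import statistics

def z_scores(values):
    """Convert a whole variable to Z-scores (sample S.D., N - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical survey responses for five cases.
foreign = [1, 3, 2, 3, 1]           # measured 1-3
domestic = [2, 5, 3, 4, 1]          # measured 1-5
character = [40, 90, 55, 20, 70]    # measured 1-100

# Raw sum: almost entirely driven by the 1-100 variable.
raw_index = [f + d + c for f, d, c in zip(foreign, domestic, character)]

# Z-scored sum: each component is on the same scale before adding.
z_index = [
    sum(parts)
    for parts in zip(z_scores(foreign), z_scores(domestic), z_scores(character))
]

print(raw_index)
print([round(z, 2) for z in z_index])
```

The Z-scored index has a mean of zero by construction. Its standard deviation is not fixed in advance: it is larger when the three components move together and smaller when they do not.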

44 Z-Score: Final Remarks Z-scores help us locate cases within a distribution –Example: We know that if Z>0, case is above the mean Under normal circumstances, a case’s Z-score does not tell us exactly which percentile the case falls in… It depends on the shape of the distribution… However, if a variable’s distribution takes on a predictable shape, we can make an accurate determination This will prove useful next week!

