The Scientific Study of Politics (POL 51) Professor B. Jones University of California, Davis.

Fun With Numbers ► Some Univariate Statistics ► Learning to Describe Data

Useful to Visualize Data

Main Features ► Exhibits “Right Skew” ► Some “Outlying” Data Points? ► Question: Are the outlying data points also “influential” data points (on measures of central tendency)? ► Let’s check…

The Mean ► Formally, the mean is given by: $\bar{Y} = \frac{Y_1 + Y_2 + \cdots + Y_N}{N}$ ► Or more compactly: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$
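A minimal Python sketch of the mean: add up the values and divide by the number of cases, N. The function name `mean` and the example values (the five data points used in the median slides) are illustrative choices, not part of the lecture.

```python
def mean(values):
    """Arithmetic mean: the sum of the Y_i divided by N."""
    if len(values) == 0:
        raise ValueError("the mean is undefined for an empty data set")
    return sum(values) / len(values)


# Illustrative data: the five values used in the median slides.
print(mean([32, 5, 23, 99, 54]))  # 42.6
```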

Our Data ► The mean of Y is $\bar{Y} \approx 261$ ► Mechanically: $(\ldots + 88)/67 \approx 261$ ► Problems with the mean? ► It gives no indication of dispersion or variability.

Variance ► The variance is a statistic that describes (squared) deviations around the mean: $s^2 = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})^2}{N-1}$ ► Why “N-1”? ► Interpretation: “Average squared deviation from the mean.”
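A sketch of the sample variance in Python, dividing the sum of squared deviations by N - 1. One standard answer to “Why N-1?”: the deviations are measured from the sample mean rather than the unknown population mean, and dividing by N - 1 instead of N makes $s^2$ an unbiased estimator of the population variance. The helper name `sample_variance` and the example data are hypothetical; the standard library's `statistics.variance` uses the same N - 1 denominator, while `statistics.pvariance` divides by N.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


data = [32, 5, 23, 99, 54]  # illustrative values from the median slides
print(sample_variance(data))       # divides by N - 1 = 4
print(statistics.variance(data))   # standard library, also N - 1
print(statistics.pvariance(data))  # standard library, divides by N = 5
```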

Our Data ► Variance = 202,431.8 ► Mechanically: $[(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_{67} - \bar{Y})^2]/66$ ► Interpretation: “The average squared deviation around $\bar{Y}$ is 202,431.8.” ► Rrrrright. (Who thinks in terms of squared deviations??) ► Answer: no one. ► That’s why we have a standard deviation.

Standard Deviation ► Take the square root of the variance and you get the standard deviation: $s = \sqrt{s^2}$. ► Why we like this: • The metric is now in the original units of Y. ► Interpretation: • The s.d. gives the “average deviation” around the mean. • It’s a measure of dispersion in a metric that makes sense to us.
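A sketch of the standard deviation as the square root of the sample variance (N - 1 denominator). The first printed line takes the square root of the variance reported for the vote data (202,431.8), which comes to roughly 450 votes; the helper `sample_sd` and the five example values are illustrative, and `statistics.stdev` is the standard-library equivalent.

```python
import math
import statistics


def sample_sd(values):
    """Standard deviation: square root of the N - 1 sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("the standard deviation needs at least two values")
    y_bar = sum(values) / n
    variance = sum((y - y_bar) ** 2 for y in values) / (n - 1)
    return math.sqrt(variance)


# Square root of the variance reported for the vote data.
print(round(math.sqrt(202_431.8), 1))  # 449.9

data = [32, 5, 23, 99, 54]  # illustrative values
print(sample_sd(data), statistics.stdev(data))
```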

Our Data ► The standard deviation is: $s = \sqrt{202{,}431.8} \approx 449.9$ ► Mechanically: $\{[(Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \cdots + (Y_{67} - \bar{Y})^2]/66\}^{1/2}$ ► Interpretation: “The average deviation around the mean of about 261 is about 450.” ► Now, suppose Y = Votes… ► “The average number of votes is about 261, and the average deviation around this number is about 450 votes.” ► The dispersion is very large. ► (Imagine the opposite case: the mean test score is 85 percent; the average deviation is 5 percent.)

Revisiting our Data

Skewness and The Mean ► Data often exhibit skew. ► This is often true with political variables. ► We have a measure of central tendency and deviation about this measure (Mean, s.d) ► However, are there other indicators of central tendency? ► How about the median?

Median ► “50th” Percentile: the location at which 50 percent of the cases lie above and 50 percent lie below. ► Since it’s a locational measure, you need to “locate” it. ► Example Data: 32, 5, 23, 99, 54 ► As is, not informative.

Median ► Rank it: 5, 23, 32, 54, 99 ► Median Location = (N+1)/2 (when N is odd) ► = (5+1)/2 = 6/2 = 3 ► The median is at location 3 in the ranked data ► This is 32. ► Hence, M = 32, not 3!! ► Interpretation: “50 percent of the data lie above 32; 50 percent of the data lie below 32.” ► What would the mean be? ► (42.6…data are __________ skewed)
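The ranking-and-locating steps for an odd number of cases can be sketched in Python. The function name `median_odd_n` is hypothetical; it converts the 1-based location (N+1)/2 to Python's 0-based indexing and, for comparison, also computes the mean of the same five values.

```python
def median_odd_n(values):
    """Median for odd N: rank the data, then take the value at location (N + 1) / 2."""
    ranked = sorted(values)              # step 1: rank the data
    n = len(ranked)
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of cases")
    location = (n + 1) // 2              # 1-based location of the median
    return ranked[location - 1]          # Python lists are 0-based


data = [32, 5, 23, 99, 54]
print(median_odd_n(data))                # 32 (the value, not the location 3)
print(sum(data) / len(data))             # 42.6, the mean of the same data
```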

Median ► When N is even: -67, 5, 23, 32, 54, 99 ► M is usually taken to be the average of the two middle scores: • (N+1)/2 = 7/2 = 3.5 • The median location is 3.5, which is between 23 and 32 • M = (23+32)/2 = 27.5 ► All pretty straightforward stuff.
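A general version handles both odd and even N with the same location rule: when (N+1)/2 falls halfway between two positions, average the values on either side. The name `median` is illustrative; Python's `statistics.median` follows the same averaging convention and is used here as a check.

```python
import math
import statistics


def median(values):
    """Median via the location (N + 1) / 2, averaging the two middle values when N is even."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    location = (n + 1) / 2                       # e.g. 3.5 when N = 6
    lower = ranked[math.floor(location) - 1]     # e.g. value at location 3
    upper = ranked[math.ceil(location) - 1]      # e.g. value at location 4
    return (lower + upper) / 2                   # same value twice when N is odd


data = [-67, 5, 23, 32, 54, 99]
print(median(data), statistics.median(data))    # 27.5 27.5
```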

Median Voter Theorem (a side trip) ► One of the most fundamental results in the social sciences is Duncan Black’s Median Voter Theorem (1948). ► The theorem predicts convergence to the median position. ► Why do parties tend to drift toward the center? ► Why do firms locate in close proximity to one another? ► The theorem: “Given single-peaked preferences, majority voting, an odd number of decision makers, and a unidimensional issue space, the position taken by the median voter has an empty winset.” ► That is, under these general conditions, all we need to know is the preference of the median chooser to determine what the outcome will be. No position can beat the median.
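The theorem's claim can be checked numerically under stated assumptions. This sketch assumes five voters with hypothetical ideal points on a single dimension and symmetric single-peaked preferences (each voter prefers whichever option is closer to their ideal point); it then confirms that no alternative on a grid wins a majority against the median ideal point. It is an illustration, not a proof.

```python
def prefers(ideal, a, b):
    """Assumed preferences: a voter strictly prefers a to b if a is closer to the ideal point."""
    return abs(a - ideal) < abs(b - ideal)


def beats(a, b, ideals):
    """True if a wins a strict majority over b in a pairwise vote."""
    votes_for_a = sum(prefers(x, a, b) for x in ideals)
    return votes_for_a > len(ideals) / 2


ideals = [0.1, 0.35, 0.5, 0.8, 0.95]                 # hypothetical ideal points, odd N
median_position = sorted(ideals)[len(ideals) // 2]   # 0.5

alternatives = [i / 100 for i in range(101)]         # grid of positions from 0 to 1
winset = [alt for alt in alternatives if beats(alt, median_position, ideals)]
print(winset)  # []: no alternative beats the median
```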

Dispersion around the Median ► The mean has its standard deviation… ► What about the median? • There is no such thing as a “standard deviation,” per se, around the median. • But there is the IQR. ► Interquartile Range • The median is the 50th percentile. • Suppose we compute the 25th and the 75th percentiles and then take the difference. • The 25th percentile is the “median” of the lower half of the data; the 75th percentile is the “median” of the upper half.

IQR and the 5 Number Summary ► Data: -67, 5, 23, 32, 54, 99 ► 25th Percentile = 5 ► 75th Percentile = 54 ► The IQR is the difference between the 75th and 25th percentiles: 54-5 = 49 ► Hence, M = 27.5; IQR = 49 ► “Five Number Summary”: Min, 25th, 50th, and 75th Percentiles, Max: ► -67, 5, 27.5, 54, 99
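The “median of each half” description of the quartiles translates directly into code. This sketch assumes that, for odd N, the overall median is left out of both halves (one common convention among several); with the six example values it reproduces the five-number summary and IQR on the slide. The function name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(values):
    """Min, 25th, 50th, 75th percentiles, and max, with quartiles as medians of the halves."""
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2                    # for odd N, the middle value is excluded from both halves
    lower_half = ranked[:half]
    upper_half = ranked[n - half:]
    return (
        ranked[0],
        statistics.median(lower_half),
        statistics.median(ranked),
        statistics.median(upper_half),
        ranked[-1],
    )


summary = five_number_summary([-67, 5, 23, 32, 54, 99])
print(summary)                             # (-67, 5, 27.5, 54, 99)
print("IQR =", summary[3] - summary[1])    # IQR = 49
```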

Finding Percentiles ► General Formula: $L = \frac{p}{100} \times n$ ► p is the desired percentile ► n is the sample size ► L is the location of the pth percentile in the ranked data ► If L is a whole number: • The value of the pth percentile is between the Lth value and the next value. Find the mean of those two values. ► If L is not a whole number: • Round L up. The value of the pth percentile is the Lth value.
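The location rule can be sketched as a small Python function. It assumes an integer percentile p strictly between 0 and 100 and uses exact integer arithmetic to decide whether L = (p/100)·n is a whole number; the name `percentile_by_location` is hypothetical. Applied to the six-value example, it returns the 25th, 50th, and 75th percentiles worked out by hand on the next slide.

```python
def percentile_by_location(values, p):
    """p-th percentile using L = (p / 100) * n; assumes an integer p with 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ranked = sorted(values)
    n = len(ranked)
    product = p * n                      # L = product / 100, kept exact as an integer
    if product % 100 == 0:
        # L is a whole number: average the L-th and (L+1)-th ranked values (1-based).
        k = product // 100
        return (ranked[k - 1] + ranked[k]) / 2
    # L is not a whole number: round L up and take the L-th ranked value.
    k = -(-product // 100)               # ceiling division
    return ranked[k - 1]


data = [-67, 5, 23, 32, 54, 99]
for p in (25, 50, 75):
    print(p, percentile_by_location(data, p))
```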

Example ► -67, 5, 23, 32, 54, 99 ► 25th Percentile: L = (25*6)/100 = 1.5 • Round up to 2. The 25th Percentile is 5. ► 75th Percentile: L = (75*6)/100 = 4.5 • Round up to 5. The 75th Percentile is 54. ► 50th Percentile: L = (50*6)/100 = 3 • Take the average of the values at locations 3 and 4. • This is (23+32)/2 = 27.5.
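Statistical software does not always follow this textbook rule, so quartiles can differ across packages, especially in small samples. For comparison, Python's `statistics.quantiles` (Python 3.8+) interpolates between ranked values; neither of its two methods reproduces the 25th and 75th percentiles computed above for these six values, although all three approaches agree on the median.

```python
import statistics

data = [-67, 5, 23, 32, 54, 99]

# Quartile cut points (25th, 50th, 75th) under the two interpolation methods.
print(statistics.quantiles(data, n=4))                      # [-13.0, 27.5, 65.25]
print(statistics.quantiles(data, n=4, method="inclusive"))  # [9.5, 27.5, 48.5]
```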

Our Data ► Median = 120 Votes (location L = [50*67]/100 = 33.5, rounded up to the 34th ranked value) ► 25th Percentile = 46 Votes ► 75th Percentile = 289 Votes ► IQR: 289-46 = 243 Votes ► 5 number summary: • Min = 9, 25th P = 46, Median = 120, 75th P = 289, Max = 3407 ► (massive dispersion!) ► The mean was about 261; the median is 120. ► The mean is much closer to the 75th percentile. ► That’s SKEW in action.

Revisiting our Data: Oddball Cases

“Influential Observations” ► Two data points: • Y = (1013, 3407) ► Suppose we omit them (not recommended in applied research) ► The mean plummets (a drop of 60 votes) ► The s.d. is cut by more than half ► Median = 114 (note that it hardly changed) ► Let’s look at a scatterplot
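The robustness of the median to a couple of extreme values can be demonstrated with a small made-up data set. The thirteen values below are hypothetical (they are not the county vote counts); dropping the two largest mimics the omission described on this slide, and the comparison shows the mean and standard deviation moving far more than the median.

```python
import statistics

# Hypothetical data: eleven moderate values plus two extreme ones.
full = [10, 20, 30, 40, 48, 50, 52, 60, 70, 80, 90, 1000, 3000]
trimmed = sorted(full)[:-2]          # omit the two largest observations

for label, data in (("all data", full), ("two omitted", trimmed)):
    print(
        f"{label}: mean={statistics.mean(data):.1f}, "
        f"s.d.={statistics.stdev(data):.1f}, "
        f"median={statistics.median(data)}"
    )
```

Here the mean falls from 350 to 50 while the median only moves from 52 to 50.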

Useful to Visualize Data

Main Features? ► Y and X are positively related. ► There are clearly visible “outliers.” ► With respect to Y, which “outlier” worries you most? ► Influence!

Simple Description ► You can learn a lot from just these simple indicators. ► Suppose that our Y was a real variable?

Palm Beach County, FL 2000 Election

Descriptive Statistics Help to Clarify Some Issues. ► Palm Beach County • Largely a Jewish community • Heavily Democratic • Yet an overwhelming number of Buchanan votes ► The ballot created massive confusion. ► Margin of victory in Florida: 537 votes. ► Number of Buchanan votes in PBC: 3407