Technology, Data Collection, and Analysis
Association of Private Enterprise Education, April 6-8, 2008
Most people understand that data can be misrepresented via visual sleight-of-hand.
Similar misrepresentation occurs when not enough data is collected, the wrong type of data is collected, or the data is aggregated.
Lesson #1: A single observation is meaningless. Corollary: An anecdote is both meaningless and dangerous.
On January 25, 1994, Bill Clinton gave his first State of the Union Address. The next day, the Dow Jones Industrial Average rose. Pundits took this as evidence of the market’s approval of the policies Clinton outlined in the Address.
A single data point contains no meaning.
A mean is what you get when you collect a bunch of individual data points and average them. Lesson #2: A mean is meaningless. Corollary: A mean is dangerous because obtaining it involves simple math and people trust math they can do.
If a single data point is meaningless, then comparing a single data point to a mean is meaninglessness wrapped in the illusion of meaning.
Comparing a single observation to a time series reveals information because a time series reveals variance. Lesson #3: A variance is meaningful. Corollary: A variance is dangerous because obtaining it involves complicated math and people don’t trust math they can’t do.
Variance over time reveals the significance of a single observation.
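One way to make this concrete is a z-score: the distance of the single observation from the historical mean, measured in standard deviations. The sketch below uses made-up daily percentage changes, not actual Dow Jones data, and the names `z_score`, `past_changes`, and `todays_change` are illustrative assumptions.

```python
from statistics import mean, stdev

def z_score(observation, history):
    """Number of sample standard deviations between `observation`
    and the mean of `history`."""
    return (observation - mean(history)) / stdev(history)

# Hypothetical daily percentage changes in an index (made-up numbers).
past_changes = [0.4, -0.6, 0.9, -0.3, 0.7, -0.8, 0.2, -0.5]

# The single observation we want to interpret (also made up).
todays_change = 0.5

z = z_score(todays_change, past_changes)
print(f"z = {z:.2f}")
```

With these numbers the rise is less than one standard deviation from the historical mean, so on its own it would not stand out from ordinary day-to-day movement.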
Comparing a single observation to a cross-section reveals information because a cross-section reveals variance. Lesson #4: Variance can be revealed both in time series and in cross-sectional data.
Comparing a single observation to both a time series and a cross-section reveals a lot of information because the two dimensions give different information on variance. Data with both dimensions is panel data. Lesson #5: Panel data is extremely meaningful. Corollary: If you thought variances were dangerous, panel data is downright witchcraft.
Variance over time reveals significance relative to the past. Variance over cross-section reveals significance relative to others.
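The two comparisons can be sketched on a tiny panel. The data below are made up, and the unit names and the helper `z_score` are assumptions for illustration only. The same observation is scored once against the unit's own past and once against the other units in the same period.

```python
from statistics import mean, stdev

def z_score(observation, reference):
    """Distance of `observation` from the mean of `reference`,
    in sample standard deviations."""
    return (observation - mean(reference)) / stdev(reference)

# Hypothetical panel: unit -> values for five consecutive periods.
panel = {
    "A": [2.0, 2.1, 1.9, 2.0, 3.0],
    "B": [2.9, 3.1, 3.0, 2.8, 3.2],
    "C": [3.0, 2.7, 3.3, 3.1, 2.9],
    "D": [3.1, 2.9, 3.2, 3.0, 2.8],
}

unit = "A"
latest = panel[unit][-1]

# Time dimension: compare with the unit's own earlier periods.
z_time = z_score(latest, panel[unit][:-1])

# Cross-section dimension: compare with the other units in the latest period.
others_latest = [series[-1] for name, series in panel.items() if name != unit]
z_cross = z_score(latest, others_latest)

print(f"relative to own past: z = {z_time:.2f}")
print(f"relative to others:   z = {z_cross:.2f}")
```

In this made-up panel the latest value for unit A is far outside its own history yet unremarkable next to the other units, which is the kind of distinction neither dimension shows alone.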
This is too complicated. Why not just use time series? Lesson #6: A time series with few observations is as meaningless as a single observation.
A comparison of two points in time reveals that greater trade is associated with greater unemployment.
The fact that trade reduces unemployment is only revealed after examining many observations.
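A small sketch of how two points can point the wrong way: below, a made-up series trends downward overall, yet one pair of adjacent periods shows an increase. The variable names and numbers are hypothetical and are not the trade and unemployment data from the slides.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical observations, one per period (made-up numbers).
trade = [1, 2, 3, 4, 5, 6, 7, 8]
unemployment = [9.0, 8.2, 8.5, 7.1, 7.4, 6.0, 6.3, 5.2]

# Comparing only the second and third periods suggests a positive link.
two_point_slope = (unemployment[2] - unemployment[1]) / (trade[2] - trade[1])

# Using every observation reveals the negative relationship.
full_slope = ols_slope(trade, unemployment)

print(f"two-point slope: {two_point_slope:+.2f}")
print(f"slope over all observations: {full_slope:+.2f}")
```

The two-point comparison is driven entirely by which two periods happen to be picked; the fitted slope uses the variation across all of them.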
Lesson #7: Even with many observations, time does not cure all ills.
Mali: Twenty years’ worth of data reveal a positive relationship between government spending and the HDI.
Austria: Twenty years’ worth of data reveal a positive relationship between government spending and the HDI.
Recall Lesson #1: A single observation is meaningless. If a single observation is meaningless, then perhaps so too is a single time series. Let’s look at the average time series across countries…
Mean Over All Countries: The apparent relationship between HDI and the size of government is seen in a different light after examining many time series.
Recall Lesson #2: A mean is meaningless. How are all the individual countries behaving?
Standard Errors of Means Over All Countries: We get more information from looking at many individual countries than from looking at means.
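To illustrate what a mean hides, the sketch below compares two made-up cross-sections that share the same mean but differ greatly in spread. The standard error of the mean is computed as the sample standard deviation divided by the square root of the number of observations; the values and the helper name `mean_and_se` are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_se(values):
    """Return the mean and the standard error of the mean."""
    return mean(values), stdev(values) / sqrt(len(values))

# Two hypothetical cross-sections of an index for five countries each.
tight = [0.58, 0.60, 0.62, 0.59, 0.61]    # countries behave alike
spread = [0.20, 0.95, 0.40, 0.85, 0.60]   # countries behave very differently

tight_mean, tight_se = mean_and_se(tight)
spread_mean, spread_se = mean_and_se(spread)

print(f"tight:  mean = {tight_mean:.2f}, standard error = {tight_se:.3f}")
print(f"spread: mean = {spread_mean:.2f}, standard error = {spread_se:.3f}")
```

Both groups report the same mean, so only the standard error (or the individual values) shows that the second group's mean summarizes countries that look nothing alike.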
Panel data does not lend itself well to graphing. But panel data contains rich information that is found in neither time series nor cross-sectional data. Econometric techniques can extract that information.
Government Spending that Maximizes HDI: Panel data enables us to filter out noise that occurs across time and across countries to see underlying relationships.
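One common way to do this kind of filtering is a two-way "within" (fixed effects) transformation, which subtracts each unit's mean and each period's mean before estimating a slope. The slides do not say which estimator was used, so the following is only a sketch of that general technique on made-up, noise-free data; every name and number in it is an assumption.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def two_way_demean(table):
    """For a balanced panel (rows = units, columns = periods), return
    v_it - unit mean - period mean + grand mean as a flat list."""
    unit_means = [mean(row) for row in table]
    period_means = [mean(col) for col in zip(*table)]
    grand_mean = mean([v for row in table for v in row])
    return [v - unit_means[i] - period_means[t] + grand_mean
            for i, row in enumerate(table) for t, v in enumerate(row)]

# Hypothetical balanced panel: 3 units observed over 4 periods.
x = [[1, 2, 3, 5],
     [4, 6, 5, 8],
     [7, 8, 10, 9]]
unit_effect = [10, 20, 30]     # differences across units
period_effect = [0, 1, 2, 3]   # shocks common to all units in a period
true_beta = -0.5               # the underlying relationship

y = [[unit_effect[i] + period_effect[t] + true_beta * x[i][t]
      for t in range(4)] for i in range(3)]

# Pooling all observations mixes the unit and period effects into the slope.
flat_x = [v for row in x for v in row]
flat_y = [v for row in y for v in row]
pooled_beta = slope(flat_x, flat_y)

# Removing unit and period means isolates the underlying relationship.
within_beta = slope(two_way_demean(x), two_way_demean(y))

print(f"pooled slope: {pooled_beta:+.2f}")
print(f"within slope: {within_beta:+.2f}")
```

Because the made-up unit effects rise with `x`, the pooled slope comes out positive even though the relationship built into the data is negative; the within slope recovers the built-in value. Real data would add noise, and a real analysis would also report standard errors.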
Panel data can be visualized, but doing so requires animation (see gapminder.org).
Moral of the Story
Data yields the greatest information when the data is:
Disaggregated: reporting averages hides information.
Time series: reporting a snapshot hides trends.
Cross-sectional: reporting one instance of a time series hides atypical trends.
For discerning truth from noise, disaggregated panel data is the tool of choice.