Part 1: Data Presentation 1-1/41 Statistics and Data Analysis Professor William Greene Stern School of Business IOMS Department Department of Economics
Part 1: Data Presentation 1-2/41 Statistics and Data Analysis Part 1 – Data Presentation Telling your story statistically
Part 1: Data Presentation 1-3/41 The Visual Data Do Tell the Story: Napoleon’s March to and from Moscow
Part 1: Data Presentation 1-4/41 What is the story?
Part 1: Data Presentation 1-5/41 Life Expectancy: Highest 15 Countries, 2010. Disability Adjusted Life Expectancy
Part 1: Data Presentation 1-6/41 A Dynamic Picture
Part 1: Data Presentation 1-7/41 Healthcare ‘Efficiency:’ Source: Bloomberg. August 2013 What do we mean by ‘efficiency?’
Part 1: Data Presentation 1-9/41 Source: Bloomberg. August 2013
Part 1: Data Presentation 1-10/41 Probability of Survival to Age 50, Female at Birth: U.S. and 20 Other Wealthy Countries. It is possible to be misled (slightly) by a presentation such as this one. Note the vertical axis. What does this graph tell you?
Part 1: Data Presentation 1-12/41 Does living longer make people happier? Or do people live longer because they are happier?
Part 1: Data Presentation 1-13/41 Does the Picture Tell the Story? New York Times, Page RE1, July 24, 2014 This is the only graphic in the article. The article compares default rates on VA vs. FHA mortgages. Is there anything wrong with this picture?
Part 1: Data Presentation 1-14/41 Data Presentation Agenda. Data Types: cross section and time series. Summarizing Data Graphically: pie chart, bar chart, box plot, histogram. Summarizing Data with Descriptive Statistics: central tendency, spread, distribution (shape).
Part 1: Data Presentation 1-15/41 Data = A Set of Facts A picture of some aspect of the world Pizza Sales by Type What do the data tell you? How can you use the information? What additional information would make these data (more) informative?
Part 1: Data Presentation 1-16/41 Data Types and Measurement. Quantitative: Discrete = count (number of car accidents by city by time); Continuous = measurement (housing prices). Qualitative: Categorical (shopping mall, car brand, trip mode); Ordinal (survey data on attitudes, “How do you feel about…?”: Strongly disagree, Disagree, Neutral, Agree, Strongly agree; Moody’s bond ratings: Aaa, Aa, A, Baa, Ba, B, and so on). Frameworks: cross section, time series.
Part 1: Data Presentation 1-17/41 Discrete, Count Data
Part 1: Data Presentation 1-18/41 Discrete Data – US Crime Statistics; Counts of Occurrences.
Part 1: Data Presentation 1-19/41 Continuous Data Housing Prices and Incomes
Part 1: Data Presentation 1-20/41 Unordered Qualitative Data Travel Mode Between Sydney and Melbourne by 210 Travelers
Part 1: Data Presentation 1-21/41 Ordered Qualitative Data German Health Satisfaction Survey; 27,326 individuals. On a scale from 0 to 10, how do you feel about your health?
Part 1: Data Presentation 1-22/41 Ordered Qualitative Outcomes: Bond Ratings, Movie Ratings
Part 1: Data Presentation 1-23/41 A Problem with Ordered Survey Response Data: “Differential Item Functioning”
Stern Students’ Ranking of Subway Safety (1994)*
Safety   Count   Percent   Cum Pct
1        17      27.87      27.87
2        15      24.59      52.46
3        17      27.87      80.33
4        10      16.39      96.72
5         2       3.28     100.00
Total    61
Scale labels: Very Unsatisfactory, Unsatisfactory, OK, Satisfactory, Very Satisfactory
Is there an objective meaning to “3” on some standard scale? Does everyone’s “1” or “2” or “3” … mean the same thing?
* Jeff Simonoff: Data Presentation and Summary, pp. 3-4
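The Percent and Cum Pct columns of the subway safety table follow directly from the counts. A minimal Python sketch of that arithmetic is below; the function name `frequency_table` is an illustrative choice, not something from the slides.

```python
# Counts per response category, taken from the subway safety table.
counts = {1: 17, 2: 15, 3: 17, 4: 10, 5: 2}


def frequency_table(counts):
    """Return rows of (category, count, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0  # running count used for the cumulative column
    for category in sorted(counts):
        running += counts[category]
        percent = round(100 * counts[category] / total, 2)
        cum_percent = round(100 * running / total, 2)
        rows.append((category, counts[category], percent, cum_percent))
    return rows


for row in frequency_table(counts):
    print("{:>6} {:>6} {:>8.2f} {:>8.2f}".format(*row))
```

The code treats the categories only as ordered labels. It never averages them, which is consistent with the slide's warning that a "3" may not mean the same thing to every respondent.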
Part 1: Data Presentation 1-24/41 Quantitative vs. Qualitative Data. Qualitative Data: No units of measurement. Arithmetic manipulation is usually meaningless. The average of Air and Bus is not Train. Quantitative Data: Units of measurement make sense. Arithmetic computations make sense.
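One way to see the distinction is that qualitative data are naturally summarized by counting, while quantitative data support arithmetic such as averaging. The short sketch below uses made-up values; the `modes` and `prices` lists are hypothetical and do not come from the Sydney-Melbourne or housing data sets.

```python
from collections import Counter
from statistics import mean

# Hypothetical travel-mode responses (qualitative) and prices (quantitative).
modes = ["Air", "Bus", "Train", "Car", "Air", "Car", "Car"]
prices = [210.0, 185.5, 240.0, 199.5]

# Qualitative data: tabulate how often each category occurs.
mode_counts = Counter(modes)
most_common_mode = mode_counts.most_common(1)[0][0]

# Quantitative data: the values have units, so an average is meaningful.
average_price = mean(prices)

print(mode_counts)
print("Most frequent mode:", most_common_mode)
print("Average price:", average_price)
```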
Part 1: Data Presentation 1-25/41 Cross Section Data Housing Prices and Incomes
Part 1: Data Presentation 1-26/41 Time Series Data: Car Thefts
Part 1: Data Presentation 1-27/41 Representing Data: in raw form; transformed to a visual form; summarized graphically; summarized statistically.
Part 1: Data Presentation 1-28/41 Pie Chart vs. Frequency Table Pizza Pies Sold, by Type
Part 1: Data Presentation 1-29/41 Data Representation: Bar Chart vs. Pie Chart Same data. Which is easier to understand? BAR CHART PIE CHART
Part 1: Data Presentation 1-30/41 2013 data. Source: Bloomberg
Part 1: Data Presentation 1-31/41 2013 Valuation of U.S. Sports Teams: Football, Baseball. What story do these figures reveal?
Part 1: Data Presentation 1-32/41 A Box Plot Describes the Distribution of Values in a Set of Data Hawaii Box and Whisker Plot for House Price Listings
Part 1: Data Presentation 1-33/41 Raw Data on Housing Prices and Incomes
Part 1: Data Presentation 1-34/41 Making a Box Plot for Per Capita Income. Maximum = 31136, 3rd Quartile = 24933, Median = 22610, 1st Quartile = 21677, Minimum = 17043. Interquartile Range = IQR = 24933 - 21677 = 3256
Part 1: Data Presentation 1-35/41 Box and Whisker Plot. Box: 75th percentile, median, 25th percentile; interquartile range = IQR. Lower whisker: larger of (Minimum, Median - 1.5 IQR). Upper whisker: smaller of (Maximum, Median + 1.5 IQR). Outliers = extreme observations. What is an outlier? Why do we believe a particular point is an outlier? HOG, pp. 39-43
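The whisker rule on this slide can be applied to the per capita income figures from the preceding slide. The sketch below is illustrative only: the function `whisker_ends` and its `center_on_median` flag are assumptions introduced here. With the flag set, it follows the rule as written on the slide (Median ± 1.5 IQR). With the flag cleared, it uses the more common Tukey convention, which measures 1.5 IQR outward from the quartiles instead.

```python
def whisker_ends(minimum, q1, median, q3, maximum, center_on_median=True):
    """Return (lower, upper) whisker ends for a box and whisker plot."""
    iqr = q3 - q1
    if center_on_median:
        # Rule as written on the slide: fences are Median -/+ 1.5 IQR.
        low_fence, high_fence = median - 1.5 * iqr, median + 1.5 * iqr
    else:
        # Common alternative (Tukey): fences are Q1 - 1.5 IQR and Q3 + 1.5 IQR.
        low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers never extend past the observed minimum or maximum.
    return max(minimum, low_fence), min(maximum, high_fence)


# Summary values for per capita income from the box plot slide.
slide_rule = whisker_ends(17043, 21677, 22610, 24933, 31136)
tukey_rule = whisker_ends(17043, 21677, 22610, 24933, 31136,
                          center_on_median=False)
print("Slide rule:", slide_rule)
print("Tukey rule:", tukey_rule)
```

Under either rule the maximum of 31136 lies beyond the upper whisker end, so it would be drawn as a separate outlier point. Many plotting packages also pull each whisker in to the most extreme observation inside its fence, which this sketch does not attempt.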
Part 1: Data Presentation 1-36/41 Histogram Showing Counts
Part 1: Data Presentation 1-37/41 A Frequency Distribution for Grouped Data
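A frequency distribution for grouped data is built by assigning each observation to a class interval and counting. The sketch below shows one way to do that; the listing prices (in thousands of dollars) and the class boundaries are hypothetical values chosen for illustration, not the data behind the slide.

```python
def grouped_frequencies(values, start, width, n_bins):
    """Count values falling in each class [lower, upper) of equal width."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:  # values outside all classes are ignored
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_bins)]


# Hypothetical house price listings, in thousands of dollars.
listings = [120, 135, 150, 155, 160, 180, 210, 240, 310, 450]
table = grouped_frequencies(listings, start=100, width=100, n_bins=4)

# Print the frequency table with a crude text histogram alongside.
for lower, upper, count in table:
    print("{:>4} to under {:<4} {:>3}  {}".format(lower, upper, count, "*" * count))
```

Each class includes its lower boundary and excludes its upper boundary. That is one convention among several; the choice matters only for observations that fall exactly on a boundary.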
Part 1: Data Presentation 1-38/41 Histogram for House Price Listings HOG, pp. 16-18 A histogram describes the sample data and suggests the nature of the underlying data generating process. Note the “skewness” of the distribution of listings.
Part 1: Data Presentation 1-39/41 Distribution of House Price Listings Asymmetry (skewness) in the histogram of listing prices… … shows up in the box and whisker plot. Note the long whisker at the top of the figure.
Part 1: Data Presentation 1-40/41 A Caution About Graphical Data Summaries Graphical tools can be very badly behaved when: (1) The data have only a few observations. (2) There are wild observations in the data set. The box and whisker plot is distorted (and dominated) by one wildly errant observation.
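The effect of a single wild observation can be illustrated numerically. In the sketch below the data are hypothetical. One errant value stretches the range, which sets the scale of a box and whisker plot, and pulls the mean far from the bulk of the data, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical small sample, then the same sample with one wild observation.
clean = [21, 22, 23, 24, 25]
contaminated = clean + [250]


def summarize(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)


clean_summary = summarize(clean)
contaminated_summary = summarize(contaminated)
print("Clean:       ", clean_summary)
print("Contaminated:", contaminated_summary)
```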
Part 1: Data Presentation 1-41/41 Summary. What story does the data presentation tell? Data in raw form tell no story. Visual representation of data tells something about the data. The representation of the data may reveal something about the underlying process that the data measure. What tool is most informative? Reduction to a small number of features. Visual displays of data: pie charts, box and whisker plots, bar charts, histograms, time series plots. “There are lies, damned lies and statistics.” (Benjamin Disraeli)