2 Lecture Objectives
You should be able to:
Define basic terms
Recognize types of data and data scales
Draw appropriate graphs based on the type of data and the type of analysis desired
Interpret the graphs

3 Basic Terms
Data, Information, and Knowledge
Populations and Samples
Variables and Observations
Types of Data:
Categorical and Numerical
Cross Sectional and Time Ordered

4 Data, Information, and Knowledge
Data are the building blocks of information: observations on entities (observation units). Variables are used to measure observations.
Information is processed data (organized, summarized, analyzed, and filtered) made meaningful and relevant to the situation or phenomenon being understood.
Knowledge is the ability to apply information to decision situations. Meaning associated with information is knowledge: actionable information!
(Diagram labels: Processing, Analysis, Reports, Application, Meaning, Relevance)

5 Populations and Samples
Population: the collection of all possible entities of interest. Described by parameters.
Sample: a subset of the population. Described by statistics.
Statistical inference: the art and science of using samples to draw conclusions about populations.
Statistical inference is the process by which characteristics (aspects) of a population come to be understood. Conclusions about the population are drawn (inferred) based on the knowledge gained from the sample.
A sample should be a good representation of the population.
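
To make the parameter/statistic distinction concrete, here is a minimal sketch. The population is hypothetical (randomly generated ages, not data from the lecture), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up values for illustration).
random.seed(42)
population = [random.randint(18, 80) for _ in range(1000)]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: a number that describes a sample drawn from the population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not equal to, the population mean. Inference is about reasoning from the first to the second.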

6 Variables and Observations
Entity     Height (inches)   Weight (pounds)   Age (years)   Sex (category)
Person 1   67                170               33            Male
Person 2   61                120               38            Female
Person 3*  72                220               62
(Diagram labels: Observations, Measurement)
Variables are characteristics (aspects) of entities that differ from one entity to another. Observations on an entity are the values of these characteristics that have been measured.
So, a dataset is a collection of observations on a group (sample) of entities. Each row is an observation on a particular entity. Each column is an aspect or characteristic of individual entities (measured as variables).
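
The row/column idea can be sketched in code as a list of records. The key names (`height_in`, `weight_lb`, and so on) are hypothetical, and the values assume the slide's numbers read column by column (heights 67, 61, 72; weights 170, 120, 220; ages 33, 38, 62). Person 3's sex is not given on the slide, so it is left as `None`.

```python
# Each dict is one observation (row); each key is a variable (column).
# Person 3's sex is not given on the slide, so it is stored as None.
dataset = [
    {"entity": "Person 1", "height_in": 67, "weight_lb": 170, "age_yr": 33, "sex": "Male"},
    {"entity": "Person 2", "height_in": 61, "weight_lb": 120, "age_yr": 38, "sex": "Female"},
    {"entity": "Person 3", "height_in": 72, "weight_lb": 220, "age_yr": 62, "sex": None},
]

# A variable is a column: the same characteristic across all entities.
heights = [row["height_in"] for row in dataset]

# An observation is a row: all measured values for one entity.
person_2 = dataset[1]

print(heights)
print(person_2)
```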

7 Types of Data: Categorical and Numerical
We can do arithmetic on numerical data (age and salary). These data are actual measurements.
Categorical data are qualitative. Sometimes qualitative data are coded. For example, opinion can be coded 1-5 and arithmetic (calculations) can be performed. Such data are ordinal (they have an implied order).
State is a categorical variable and cannot be used for calculations. Such data are nominal.
(Figure labels: Categorical, Numerical)

8 Data Scales
Data are generally classified into four scales:
Nominal: categorical data
Ordinal: shows ranks; intervals may vary
Interval: intervals are constant; the 0 is arbitrary
Ratio: numeric data with a 'real' 0 value
Ordinal, interval, and ratio data can all be recorded as numbers, but ordinal numbers are rank codes rather than actual measurements.

9 Types of Data: Time Series and Cross-sectional
Time series: variable(s) over time
Year   Population (Millions)
1900   56
1910   58
1920   60
1930   65
1940   76
1950   84
1960   95
1970   120

Cross-sectional: variable(s) at one point in time across multiple entities (countries in this case)
1970
Country   Population (Millions)   GDP ($ Billion)   Gender Ratio
USA       160                     575               0.998
China     800                     155               1.105
India     600
Nigeria   100
Japan     120
Canada    30

10 Numeric Data (Interval or Ratio): Frequency Tables
A frequency table showing a classification of the AGE of attendees at an event.
Class      Frequency   Relative Frequency   Percent
10 to 20   3           0.15                 15
20 to 30   6           0.30                 30
30 to 40   5           0.25                 25
40 to 50   4           0.20                 20
50 to 60   2           0.10                 10
Total                  1.00                 100
Class is a range for the values of a variable.
Frequency is the number of observations associated with a class.
Relative Frequency is the proportion of observations (frequency) associated with a class.
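
As a small sketch of the definitions above, the relative frequency and percent columns can be computed from the class frequencies alone. The variable names are illustrative choices, not part of the lecture.

```python
# Class frequencies from the AGE frequency table.
frequencies = {
    "10 to 20": 3,
    "20 to 30": 6,
    "30 to 40": 5,
    "40 to 50": 4,
    "50 to 60": 2,
}

total = sum(frequencies.values())  # total number of observations

table = {}
for label, freq in frequencies.items():
    relative = freq / total   # proportion of observations in the class
    percent = relative * 100  # the same proportion expressed out of 100
    table[label] = (freq, relative, percent)
    print(f"{label:<10}{freq:>4}{relative:>8.2f}{percent:>6.0f}")
```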

11 Frequency Histograms
A graphical display of the distribution of frequencies

12 Developing Frequency Tables and Histograms
Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (range/classes = 46/5 = 9.2, then round up)
Determine class boundaries (lower limits): 10, 20, 30, 40, 50
Compute class midpoints: 15, 25, 35, 45, 55
Count observations and assign to classes
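
The steps above can be sketched in code as follows. Two details are assumptions rather than something the slide states: the first lower boundary is taken as the minimum rounded down to a multiple of the width (which gives 10 here), and each class includes its lower limit but not its upper limit.

```python
import math

raw = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35,
       37, 38, 41, 43, 44, 46, 53, 58]

# Step 1: sort the raw data in ascending order.
data = sorted(raw)

# Step 2: range = largest value - smallest value.
data_range = data[-1] - data[0]

# Steps 3-4: choose the number of classes, then round range/classes up.
num_classes = 5
width = math.ceil(data_range / num_classes)

# Step 5: lower class limits. Assumption: start at the minimum rounded
# down to a multiple of the width.
start = (data[0] // width) * width
lower_limits = [start + i * width for i in range(num_classes)]

# Step 6: the midpoint of each class is its lower limit plus half the width.
midpoints = [lower + width / 2 for lower in lower_limits]

# Step 7: count observations per class. Assumption: a value equal to a
# boundary belongs to the class that starts at that boundary.
counts = [0] * num_classes
for value in data:
    counts[(value - start) // width] += 1

for lower, mid, count in zip(lower_limits, midpoints, counts):
    print(f"{lower} to {lower + width}: midpoint {mid:g}, frequency {count}")
```

With these assumptions the counts agree with the AGE frequency table: the value 30 falls in the class 30 to 40.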

13 Categorical Data: Bar Charts
(Data table: 12 observations with the variables Obs, Age, Gender, State, and Salary)
State   Freq
FL      3
SC      5
GA      4
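
A bar chart draws one bar per category with a length proportional to its frequency. The following is a minimal text-only sketch using the state frequencies above. The function name `text_bar_chart` is a hypothetical helper, and a plotting library would normally be used instead.

```python
# State frequencies from the slide.
state_freq = {"FL": 3, "SC": 5, "GA": 4}

def text_bar_chart(freq):
    """Return one line per category, with a bar of '#' as long as its frequency."""
    label_width = max(len(label) for label in freq)
    return [
        f"{label:<{label_width}} | {'#' * count} {count}"
        for label, count in freq.items()
    ]

lines = text_bar_chart(state_freq)
print("\n".join(lines))
```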

14 Categorical Data: Pie Charts
State   Freq
FL      3
SC      5
GA      4
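
A pie chart shows each category as a share of the whole, so each slice's angle is its relative frequency times 360 degrees. This is a minimal sketch of that arithmetic for the state frequencies; the variable names are illustrative.

```python
# State frequencies from the slide.
state_freq = {"FL": 3, "SC": 5, "GA": 4}
total = sum(state_freq.values())

# For each state: (percent of the whole, slice angle in degrees).
slices = {}
for state, count in state_freq.items():
    percent = count * 100 / total
    angle = count * 360 / total
    slices[state] = (percent, angle)
    print(f"{state}: {percent:.1f}% of the pie, {angle:.0f} degrees")
```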

15 Numeric Data by Category
State   F       M
FL      66.00   25.00
GA      70.00   74.67
SC      53.67   67.50

17 Two variables, different units
Year   CO        NOx
1990   154,188   25,527
1991   147,128   25,180
1992   140,895   25,261
1993   135,902   25,356
1994   133,558   25,350
1995   126,778   24,955
1996   128,859   24,786
1997   117,911   24,706
1998   115,380   24,347
1999   114,541   22,843
2000   114,465   22,599
2001   106,263   21,546
2002   109,235   21,277
2003   107,062   20,476
2004   104,892   19,564
2005   102,721   18,947
2006   100,552   18,226
Source:

18 Chapter Summary
Categorization: Bar Charts, Pie Charts
Distribution: Stem and Leaf, Histogram, Box Plot
Relationships: Scatter Plots, Line Charts
Multivariate: Spider Plots, Maps, Bubble Charts