Lecture Objectives
You should be able to:
- Define basic terms
- Recognize types of data and data scales
- Draw appropriate graphs based on the type of data and the type of analysis desired
- Interpret the graphs
Basic Terms
- Data, Information, and Knowledge
- Populations and Samples
- Variables and Observations
- Types of Data: Categorical and Numerical; Cross-Sectional and Time Ordered
Data, Information, and Knowledge
- Data are the building blocks of information: observations on entities (observation units). Variables are used to measure observations.
- Information is processed data (organized, summarized, analyzed, and filtered) made meaningful and relevant to the situation or phenomenon being studied.
- Knowledge is the ability to apply information to decision situations. Knowledge is the meaning associated with information: actionable information.
Diagram labels: Processing, Analysis, Reports, Application, Meaning, Relevance.
Populations and Samples
- Population: the collection of all possible entities of interest. Described by parameters.
- Sample: a subset of the population. Described by statistics.
- Statistical inference: the art and science of using samples to draw conclusions about populations. It is the process by which characteristics (aspects) of a population come to be understood: conclusions about the population are drawn (inferred) based on the knowledge gained from the sample.
- A sample should be a good representation of the population.
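A minimal sketch of the parameter/statistic distinction, assuming a made-up population of heights (the values, the population size of 1,000, and the sample size of 50 are hypothetical and not from the lecture). The population mean plays the role of a parameter; the mean of a simple random sample plays the role of a statistic used to estimate it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (inches) of 1,000 people. Not lecture data.
population = [random.gauss(67, 4) for _ in range(1000)]
parameter = statistics.mean(population)   # population mean: a parameter

# Simple random sample of 50 entities drawn without replacement.
sample = random.sample(population, 50)
statistic = statistics.mean(sample)       # sample mean: a statistic

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two printed values will usually be close but not identical, which is why inference from a sample is always a conclusion with some uncertainty attached.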
Variables and Observations

Entity     Height (inches)  Weight (pounds)  Age (years)  Sex (category)
Person 1   67               170              33           Male
Person 2   61               120              38           Female
Person 3*  72               220              62

Variables are characteristics (aspects) of entities that differ from one entity to another. Observations on an entity are the measured values of these characteristics.
So, a dataset is a collection of observations on a group (sample) of entities. Each row is an observation on a particular entity. Each column is an aspect or characteristic of individual entities (measured as a variable).
Types of Data: Categorical and Numerical
- Numerical data (for example, age and salary) are actual measurements, so we can do arithmetic on them.
- Categorical data are qualitative. Qualitative data are sometimes coded: for example, opinion can be coded 1-5, and calculations can then be performed on the codes. Such data are ordinal (they have an implied order).
- State is a categorical variable with no implied order and cannot be used for calculations. Such data are nominal.
Data Scales
Data are generally classified into four scales:
- Nominal: categorical data with no implied order
- Ordinal: shows ranks; intervals between ranks may vary
- Interval: intervals are constant; the zero point is arbitrary
- Ratio: numeric data with a 'real' zero value
Interval and ratio scales are numeric data. Ordinal data are often coded as numbers, but the codes convey only order.
Types of Data: Time Series and Cross-sectional

Time series: variable(s) over time

Year  Population (Millions)
1900  56
1910  58
1920  60
1930  65
1940  76
1950  84
1960  95
1970  120

Cross-sectional: variable(s) at one point in time (1970) across multiple entities (countries in this case)

Country  Population (Millions)  GDP ($ Billion)  Gender Ratio
USA      160                    575              0.998
China    800                    155              1.105
India    600
Nigeria  100
Japan    120
Canada   30
Numeric Data (Interval or Ratio): Frequency Tables
A frequency table showing a classification of the AGE of attendees at an event:

Class     Frequency  Relative Frequency  Percent
10 to 20  3          0.15                15
20 to 30  6          0.30                30
30 to 40  5          0.25                25
40 to 50  4          0.20                20
50 to 60  2          0.10                10
Total                1.00                100

- Class is a range for the values of a variable.
- Frequency is the number of observations associated with a class.
- Relative Frequency is the proportion of observations (frequency) associated with a class.
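As a small illustration of the definitions above, the relative frequency and percent columns can be derived from the frequency column alone. This is only a sketch; the variable names are illustrative.

```python
classes = ["10 to 20", "20 to 30", "30 to 40", "40 to 50", "50 to 60"]
frequencies = [3, 6, 5, 4, 2]   # frequency column from the table

total = sum(frequencies)        # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]
# Percent = relative frequency expressed out of 100.
percent = [100 * f / total for f in frequencies]

for c, f, r, p in zip(classes, frequencies, relative, percent):
    print(f"{c:<10}{f:>3}{r:>7.2f}{p:>6.0f}")
```

The relative frequencies always sum to 1 and the percents to 100, which is a quick check on any frequency table.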
Frequency Histograms
A graphical display of the distribution of frequencies.
Developing Frequency Tables and Histograms
1. Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find the range: 58 - 12 = 46
3. Select the number of classes: 5 (usually between 5 and 15)
4. Compute the class interval (width): range/classes = 46/5 = 9.2, rounded up to a convenient width of 10
5. Determine the class boundaries (lower limits): 10, 20, 30, 40, 50
6. Compute the class midpoints: 15, 25, 35, 45, 55
7. Count the observations and assign them to classes
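The steps above can be sketched in a few lines of Python. Two choices here are assumptions made to reproduce the slide's numbers rather than rules stated in the lecture: the width is rounded up to the next multiple of 10, and each class includes its lower boundary but excludes its upper boundary (so an age of 30 falls in "30 to 40").

```python
import math

# Step 1: raw ages from the slide, sorted in ascending order.
ages = sorted([12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35,
               37, 38, 41, 43, 44, 46, 53, 58])

# Steps 2-3: range and number of classes.
data_range = max(ages) - min(ages)          # 58 - 12
num_classes = 5

# Step 4: raw width is 46 / 5 = 9.2; round up to a multiple of 10 (assumption).
width = math.ceil(data_range / num_classes / 10) * 10

# Step 5: start at the multiple of the width at or below the minimum (assumption).
start = (min(ages) // width) * width
lower_bounds = [start + i * width for i in range(num_classes)]

# Step 6: midpoint of each class.
midpoints = [lb + width / 2 for lb in lower_bounds]

# Step 7: count observations with lower bound included, upper bound excluded.
frequencies = [sum(lb <= a < lb + width for a in ages) for lb in lower_bounds]

# A text histogram: one asterisk per observation in the class.
for lb, mid, f in zip(lower_bounds, midpoints, frequencies):
    print(f"{lb} to {lb + width}  midpoint {mid:g}  {'*' * f} ({f})")
```

The resulting counts match the frequency table for the attendee ages, and the rows of asterisks give a rough sideways view of the histogram.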
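Categorical Data: Bar Charts
Raw data table: 12 observations with columns Obs, Age, Gender, State, Salary (for example, Obs 1: Age 25, Gender M, State FL; Obs 2: Age 28, Gender F, State SC).

Frequency of the categorical variable State:

State  Freq
FL     3
SC     5
GA     4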
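A minimal sketch of how the State frequency table behind a bar chart could be built by counting categories. Only the counts (FL 3, SC 5, GA 4) come from the slide; the order of the individual observations in the list below is hypothetical.

```python
from collections import Counter

# Hypothetical ordering of the 12 observations; only the totals per
# state (FL 3, SC 5, GA 4) are taken from the slide.
states = ["FL", "SC", "GA", "SC", "GA", "FL",
          "SC", "GA", "SC", "FL", "GA", "SC"]

freq = Counter(states)   # maps each category to its number of observations

# A text bar chart: one '#' per observation in the category.
for state, count in freq.items():
    print(f"{state:<3}{'#' * count} {count}")
```

Because State is nominal, the bars can be shown in any order; sorting by count is a common choice for readability.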
Two variables, different units

Year  CO       NOx
1990  154,188  25,527
1991  147,128  25,180
1992  140,895  25,261
1993  135,902  25,356
1994  133,558  25,350
1995  126,778  24,955
1996  128,859  24,786
1997  117,911  24,706
1998  115,380  24,347
1999  114,541  22,843
2000  114,465  22,599
2001  106,263  21,546
2002  109,235  21,277
2003  107,062  20,476
2004  104,892  19,564
2005  102,721  18,947
2006  100,552  18,226

Source:
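The two series in the table occupy very different ranges, so plotting them on one shared axis would flatten the smaller one. The sketch below prints each range and then rescales both series to an index with 1990 = 100. Indexing to a base year is one common way to compare such series and is an assumption here, not necessarily the approach the slide takes (a second vertical axis is another option). The helper name `to_index` is illustrative.

```python
years = list(range(1990, 2007))
co = [154188, 147128, 140895, 135902, 133558, 126778, 128859, 117911, 115380,
      114541, 114465, 106263, 109235, 107062, 104892, 102721, 100552]
nox = [25527, 25180, 25261, 25356, 25350, 24955, 24786, 24706, 24347,
       22843, 22599, 21546, 21277, 20476, 19564, 18947, 18226]

def to_index(series, base=100.0):
    """Rescale a series so that its first value equals `base`."""
    first = series[0]
    return [base * value / first for value in series]

co_index = to_index(co)
nox_index = to_index(nox)

# The raw ranges differ by roughly a factor of five.
print(f"CO range:  {min(co):,} to {max(co):,}")
print(f"NOx range: {min(nox):,} to {max(nox):,}")

# After indexing, both series start at 100 and can share one axis.
for year, c, n in zip(years, co_index, nox_index):
    print(f"{year}  CO {c:6.1f}  NOx {n:6.1f}")
```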
Chapter Summary
- Categorization: Bar Charts, Pie Charts
- Distribution: Stem and Leaf, Histogram, Box Plot
- Relationships: Scatter Plots, Line Charts
- Multivariate: Spider Plots, Maps, Bubble Charts