# Describing Data Charts and Graphs.


Lecture Objectives
You should be able to:
- Define basic terms
- Recognize types of data and data scales
- Draw appropriate graphs based on the type of data and the type of analysis desired
- Interpret the graphs

Basic Terms
- Data, Information, and Knowledge
- Populations and Samples
- Variables and Observations
- Types of Data: Categorical and Numerical; Cross Sectional and Time Ordered

Data, Information, and Knowledge
Data are the building blocks of information. They are observations on entities (observation units), and variables are used to measure them. Information is processed data (organized, summarized, analyzed, and filtered) that has been made meaningful and relevant to the situation or phenomenon being understood. Knowledge is the ability to apply information to decision situations. The meaning associated with information is knowledge: actionable information!

[Diagram labels: Processing, Analysis, Reports; Application, Meaning, Relevance]

Populations and Samples
- Population: the collection of all possible entities of interest. Described by parameters.
- Sample: a subset of the population. Described by statistics.
- Statistical inference: the art and science of using samples to draw conclusions about populations.

Statistical inference is the process by which characteristics (aspects) of a population come to be understood. Conclusions about the population are drawn (inferred) based on the knowledge gained from the sample, so a sample should be a good representation of the population.
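As a rough illustration of the difference between a parameter and a statistic, the sketch below draws a random sample from a made-up population and compares the two means. The population values, the seed, and the sample size of 50 are assumptions chosen for the example; they are not part of the lecture.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up values).
random.seed(42)
population = [random.randint(18, 80) for _ in range(1000)]

# Parameter: a number that describes the whole population.
population_mean = statistics.mean(population)

# Statistic: the same kind of number, computed from a sample only.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

The two means will usually be close but not identical, which is why inference from a sample depends on the sample representing the population well.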

Variables and Observations
| Entity | Height (inches) | Weight (pounds) | Age (years) | Sex (Category) |
| --- | --- | --- | --- | --- |
| Person 1 | 67 | 170 | 33 | Male |
| Person 2 | 61 | 120 | 38 | Female |
| Person 3 | 72 | 220 | 62 | |
| * | | | | |

Variables are characteristics (aspects) of entities that differ from one entity to another. Observations on an entity are the measured values of these characteristics. A dataset is therefore a collection of observations on a group (sample) of entities. Each row is an observation on a particular entity. Each column is an aspect or characteristic of the individual entities, measured as a variable.
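One way to picture the row/column structure is a list of records, sketched below with the values from the slide's table. The field names (`height_in`, `weight_lb`, and so on) are hypothetical, and Person 3's sex is stored as `None` because the transcript does not preserve it.

```python
# Each dict is one observation (row); each key is a variable (column).
# Field names are made up for this sketch. Person 3's sex is missing
# in the transcript, so it is recorded as None rather than guessed.
dataset = [
    {"entity": "Person 1", "height_in": 67, "weight_lb": 170, "age_yr": 33, "sex": "Male"},
    {"entity": "Person 2", "height_in": 61, "weight_lb": 120, "age_yr": 38, "sex": "Female"},
    {"entity": "Person 3", "height_in": 72, "weight_lb": 220, "age_yr": 62, "sex": None},
]

# Reading across a row gives one observation on one entity.
first_observation = dataset[0]

# Reading down a column gives every measured value of one variable.
heights = [row["height_in"] for row in dataset]

print(first_observation)
print(heights)  # [67, 61, 72]
```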

Types of Data: Categorical and Numerical
Numerical data (for example, age and salary) are actual measurements, so we can do arithmetic on them. Categorical data are qualitative. Qualitative data are sometimes coded: opinion, for example, can be coded 1-5, and calculations can then be performed on the codes. Such data are ordinal (they have an implied order). State is a categorical variable and cannot be used for calculations. Such data are nominal.
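The sketch below shows the difference in practice: coded opinions can be averaged, while states can only be counted. The survey responses and the 1-5 coding scheme are made-up assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (made-up for this sketch).
opinions = ["Agree", "Neutral", "Strongly agree", "Disagree", "Agree"]
states = ["FL", "SC", "GA", "SC", "FL"]

# Ordinal data: the categories have an implied order, so they can be coded 1-5.
opinion_codes = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}
coded = [opinion_codes[o] for o in opinions]
print(coded)        # [4, 3, 5, 2, 4]
print(mean(coded))  # 3.6

# Nominal data: no order, so we count categories instead of averaging them.
state_counts = Counter(states)
print(state_counts)
```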

Data Scales
Data are generally classified into four types:
- Nominal: categorical data
- Ordinal: shows ranks; intervals may vary
- Interval: intervals are constant; the zero is arbitrary
- Ratio: numeric data with a ‘real’ zero value

Ordinal, Interval, and Ratio scales are all numeric data.

Types of Data: Time Series and Cross-sectional
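Time series: variable(s) over time.

| Year | Population (Millions) |
| --- | --- |
| 1900 | 56 |
| 1910 | 58 |
| 1920 | 60 |
| 1930 | 65 |
| 1940 | 76 |
| 1950 | 84 |
| 1960 | 95 |
| 1970 | 120 |

Cross-sectional: variable(s) at one point in time (1970) across multiple entities (countries in this case).

| Country | Population (Millions) | GDP (\$ Billion) | Gender Ratio |
| --- | --- | --- | --- |
| USA | 160 | 575 | 0.998 |
| China | 800 | 155 | 1.105 |
| India | 600 | | |
| Nigeria | 100 | | |
| Japan | 120 | | |
| Canada | 30 | | |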

Numeric Data (Interval or Ratio): Frequency Tables
A frequency table classifying the AGE of attendees at an event:

| Class | Frequency | Relative Frequency | Percent |
| --- | --- | --- | --- |
| 10 to 20 | 3 | 0.15 | 15 |
| 20 to 30 | 6 | 0.30 | 30 |
| 30 to 40 | 5 | 0.25 | 25 |
| 40 to 50 | 4 | 0.20 | 20 |
| 50 to 60 | 2 | 0.10 | 10 |
| Total | 20 | 1.00 | 100 |

- Class is a range for the values of a variable.
- Frequency is the number of observations associated with a class.
- Relative Frequency is the proportion of observations (frequency) associated with a class.
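The relative frequency and percent columns follow directly from the class frequencies. A minimal sketch of that arithmetic, using the frequencies from the AGE table (the variable names are illustrative):

```python
# Class frequencies from the AGE table.
frequencies = {
    "10 to 20": 3,
    "20 to 30": 6,
    "30 to 40": 5,
    "40 to 50": 4,
    "50 to 60": 2,
}

total = sum(frequencies.values())  # 20 observations in all

# Relative frequency = class frequency / total number of observations.
relative = {cls: f / total for cls, f in frequencies.items()}

# Percent = relative frequency * 100.
percent = {cls: 100 * r for cls, r in relative.items()}

for cls in frequencies:
    print(f"{cls:>8}  {frequencies[cls]:>2}  {relative[cls]:.2f}  {percent[cls]:.0f}")
```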

Frequency Histograms
A graphical display of the distribution of frequencies.

Developing Frequency Tables and Histograms
1. Sort the raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
2. Find the range: 58 - 12 = 46
3. Select the number of classes: 5 (usually between 5 and 15)
4. Compute the class interval (width): 10 (range/classes = 46/5 = 9.2, then round up)
5. Determine the class boundaries (limits): 10, 20, 30, 40, 50 (the last class ends at 60)
6. Compute the class midpoints: 15, 25, 35, 45, 55
7. Count the observations and assign them to classes
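The steps above can be sketched in a few lines of Python. Two details are assumptions rather than something the slide states: "round up" is read as rounding to the next multiple of 10, and each class is taken to include its lower limit and exclude its upper limit (so 30 falls in "30 to 40"), which reproduces the frequencies in the AGE table.

```python
import math

raw = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
       32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

data = sorted(raw)
data_range = data[-1] - data[0]          # 58 - 12 = 46
num_classes = 5

# 46 / 5 = 9.2; assumed rule: round up to the next multiple of 10.
width = math.ceil(data_range / num_classes / 10) * 10

# Start the first class at the multiple of `width` at or below the minimum.
start = (data[0] // width) * width

lower_limits = [start + i * width for i in range(num_classes)]
midpoints = [lo + width / 2 for lo in lower_limits]

# Assumed convention: lower limit included, upper limit excluded.
counts = [sum(lo <= x < lo + width for x in data) for lo in lower_limits]

# A text-only histogram: one asterisk per observation.
for lo, c in zip(lower_limits, counts):
    print(f"{lo}-{lo + width}: {'*' * c}")
```

A plotting library would normally draw the bars; the text version keeps the sketch free of dependencies.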

Categorical Data: Bar Charts
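| Obs | Age | Gender | State | Salary |
| --- | --- | --- | --- | --- |
| 1 | 25 | M | FL | |
| 2 | 28 | F | SC | 36 |
| 3 | 31 | | GA | 44 |
| 4 | 35 | | | 38 |
| 5 | | | | 56 |
| 6 | | | | 68 |
| 7 | 42 | | | 79 |
| 8 | 51 | | | 64 |
| 9 | 55 | | | 88 |
| 10 | 61 | | | 71 |
| 11 | 62 | | | 92 |
| 12 | 65 | | | 54 |

| State | Freq |
| --- | --- |
| FL | 3 |
| SC | 5 |
| GA | 4 |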

Categorical Data: Pie Charts
| State | Freq |
| --- | --- |
| FL | 3 |
| SC | 5 |
| GA | 4 |
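A pie chart shows each category as a share of the whole. As a small sketch, the state frequencies can be converted into shares and slice angles as follows (the variable names are illustrative):

```python
state_freq = {"FL": 3, "SC": 5, "GA": 4}
total = sum(state_freq.values())  # 12 observations

# Each state's share of the whole, and the matching angle in a 360-degree pie.
shares = {state: f / total for state, f in state_freq.items()}
angles = {state: 360 * share for state, share in shares.items()}

for state in state_freq:
    print(f"{state}: {shares[state]:.1%} of the pie, {angles[state]:.0f} degrees")
```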

Numeric Data by Category
| State | F | M |
| --- | --- | --- |
| FL | 66.00 | 25.00 |
| GA | 70.00 | 74.67 |
| SC | 53.67 | 67.50 |
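The slide does not say how these values were computed, but they look like a numeric variable averaged within each state and gender group. Under that assumption, a grouped mean could be sketched as below. The records are hypothetical and are not the slide's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (state, gender, salary) records; not the slide's data.
records = [
    ("FL", "F", 60), ("FL", "F", 72), ("FL", "M", 25),
    ("GA", "F", 70), ("GA", "M", 80), ("GA", "M", 70),
]

# Collect the numeric values that belong to each (state, gender) category.
groups = defaultdict(list)
for state, gender, salary in records:
    groups[(state, gender)].append(salary)

# Summarize each category with its mean.
group_means = {key: mean(values) for key, values in groups.items()}

for (state, gender), avg in sorted(group_means.items()):
    print(f"{state} {gender}: {avg:.2f}")
```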

Bivariate Numerical Data: Scatter Plot

Two variables, different units
| Year | CO | NOx |
| --- | --- | --- |
| 1990 | 154,188 | 25,527 |
| 1991 | 147,128 | 25,180 |
| 1992 | 140,895 | 25,261 |
| 1993 | 135,902 | 25,356 |
| 1994 | 133,558 | 25,350 |
| 1995 | 126,778 | 24,955 |
| 1996 | 128,859 | 24,786 |
| 1997 | 117,911 | 24,706 |
| 1998 | 115,380 | 24,347 |
| 1999 | 114,541 | 22,843 |
| 2000 | 114,465 | 22,599 |
| 2001 | 106,263 | 21,546 |
| 2002 | 109,235 | 21,277 |
| 2003 | 107,062 | 20,476 |
| 2004 | 104,892 | 19,564 |
| 2005 | 102,721 | 18,947 |
| 2006 | 100,552 | 18,226 |

Source:
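When two series sit on very different scales, one common way to compare them on a single chart is to index each to its first year (1990 = 100). This is offered as an illustrative technique, not necessarily what the original slide plotted. The values are taken from the table; the helper name `to_index` is hypothetical.

```python
years = list(range(1990, 2007))
co = [154188, 147128, 140895, 135902, 133558, 126778, 128859, 117911, 115380,
      114541, 114465, 106263, 109235, 107062, 104892, 102721, 100552]
nox = [25527, 25180, 25261, 25356, 25350, 24955, 24786, 24706, 24347,
       22843, 22599, 21546, 21277, 20476, 19564, 18947, 18226]

def to_index(series):
    """Rescale a series so that its first value equals 100."""
    base = series[0]
    return [100 * value / base for value in series]

co_index = to_index(co)
nox_index = to_index(nox)

# Both series now start at 100, so their relative declines are comparable.
for year, c, n in zip(years, co_index, nox_index):
    print(f"{year}: CO {c:6.1f}   NOx {n:6.1f}")
```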

Chapter Summary
- Categorization: Bar Charts, Pie Charts
- Distribution: Stem and Leaf, Histogram, Box Plot
- Relationships: Scatter Plots, Line Charts
- Multivariate: Spider Plots, Maps, Bubble Charts