Organizing and Displaying Epidemiologic Data with Tables and Graphs

Learning Objectives Discuss the difference between tables and graphs for written reports versus oral presentations Create and interpret one and two variable tables Create and interpret a line graph Create and interpret an epidemic curve Create and interpret one and two variable bar charts Describe when to use each type of table, graph, and chart

Can you summarize the age and sex of the case-patients at a glance?
Case No. Date of Onset Age Sex 1 21 Nov 9 M 2 39 3 22 Nov 29 F Here is part of a line listing of 3 patients who developed symptoms of gastroenteritis in November. Can you summarize the age and sex of the case-patients at a glance? In other words, is it easy or difficult to summarize the age and sex of these 3 patients? (With only 3 patients, it should be easy.)

Case No. Date of Onset Age Sex 1 21 Nov 9 M 2 39 3 22 Nov 29 F 4 10 5 55 6 11 With 6 patients, can you still summarize the age and sex distribution relatively quickly and easily? (Still relatively quick and easy.)

Case No. Age Sex 1 9 M 2 39 3 29 F 4 10 5 55 6 11 7 8 17 Case No. Age Sex 11 10 M 12 6 13 9 14 40 15 F 16 17 18 43 19 71 20 Case No. Age Sex 21 38 F 22 34 23 9 M 24 10 25 6 26 11 27 28 41 29 30 Case No. Age Sex 31 10 M 32 F 33 8 34 9 35 36 11 37 38 39 7 40 16 With 40 patients, is it still easy to summarize the age and sex distribution? (No.)

Basic Methods for Organizing and Presenting Data
Data can be organized through creation of: Tables Graphs Charts Data represent observations that require organization in order to provide useful information. Each piece of data is like a puzzle piece: on its own, it doesn’t tell you much, but when it is put with the rest of the pieces and they are arranged systematically, it can be part of a meaningful picture. To organize data and information, epidemiologists often choose to present their findings in the form of tables, charts, graphs and maps. In the slides that follow, we will discuss each method of data organization in greater detail.

Why organize and present data?
To summarize when a data set has too many records to look at individually To become familiar with the data before analysis, and to catch errors To look for (and display) Patterns Trends Relationships Exceptions / outliers To communicate findings to others The organization and presentation of data is essential for several different reasons. First, most data sets have too many records to summarize simply by looking at the line listing or the individual case report forms. Organizing and summarizing data condenses a large amount of information into a smaller, more comprehensible amount of data. Second, organizing and summarizing data helps the investigator become familiar with the data, and helps identify problems with the data such as the number of records with missing values, illegal values (if 1=Male and 2=Female, what does 3 mean?), and outliers (is weight = 440 pounds [200 kg] real or a data entry error?). Third, summarizing data helps the investigator identify patterns (mostly children or mostly adults?), trends (increasing or decreasing over time or seasonal?), relationships (were the people who ate pastries more likely to get sick than people who did not eat pastries?), and exceptions or outliers. Finally, organizing and presenting data is an extremely useful way to communicate and share information with others.

Written vs. Oral Presentation
Time unlimited Details OK White, grey and black Oral Time < 1 min Less detail Colors possible Please note that this is a PowerPoint presentation, i.e., an oral presentation. We use similar methods for organizing, summarizing, and presenting data for manuscripts and other written reports, but the circumstances differ so the details can differ as well. For a written report, the reader can take as much time as he/she wants to review a table or graph. So details are okay. For example, the tables in the weekly MMWR, a Demographic Health Survey (DHS) report, or most annual reports are very detailed, and a local health director or program director can take the time to find the data relevant to him/her. However, during an oral presentation, the presenter controls the amount of time available, and a slide usually remains on the screen for less than a minute. So a table or graph used in an oral presentation must be understandable almost instantly, and the audience does not have time to search for details buried in the slide. Finally, many printers and most photocopiers print in black and white, so printed colors are often lost. It is safer to use black and white and shades in between. For an oral presentation, colors can actually facilitate understanding, so appropriately selected colors are fine.

How to organize data Identify what data you have
Use tables and graphs to summarize; catch errors; identify patterns, relationships Decide how best to summarize the data to communicate the findings Use tables and graphs to communicate the findings effectively Really, this process should have started when the study was being designed – what do I want to know? That dictates what data needs to be collected and how it will be analyzed. Once the data have been collected, the next step is to identify the data available for analysis. We will talk about the various types of tables and graphs that are then used to summarize the data, to catch errors and clean the data, to characterize the demographics of the population that has been studied, and finally to identify patterns, trends, relationships, and exceptions / outliers within the data. These tables and graphs that you use to better understand the data may not be the best ones for presenting the key findings to others, so the next step is to decide which tables and graphs WOULD be best for communicating the findings, and finally to produce those tables and graphs in ways to most effectively communicate the findings to the readers or listening audience.

Tables Now we will talk about tables.

Tables Data are arranged in rows and columns Quantitative information
Usually, presents frequency of occurrence of some event or characteristic in different subgroups In the very simplest terms, a table is a set of data that is organized into rows and columns. In general and particularly in epidemiology, tables present quantitative data. The simplest tables present the frequency of occurrence of some event or characteristic in different subgroups. Click before reading: Every row and column in a table should have corresponding labels that are clear and concise, and each row and column should be accompanied by totals. A successfully and correctly structured table should be self-explanatory, meaning that it is easily understandable and capable of standing alone. If the table is associated with a report or scientific manuscript, the information contained within the table should make sense when viewed all by itself, independent of the report or manuscript to which it pertains. Click before reading: If any codes, abbreviations, symbols, exclusions, or data sources are used, these should always be explained by a footnote. There are many different types of tables that can be used to organize and present data. Some of the more popular include: 1 variable tables 2 and 3 variable tables Tables of other statistical measures with cells containing rates, means, relative risks or other measures. Today, we will be focused solely on 2 variable tables, but it is helpful to be aware that other types of tables do exist. In many cases, tables can actually serve as the basis for other charts and graphs. What this means is that the data organized within a table can be depicted in other ways.

Earthquake-related injury
Tables Descriptive Title (What, where, when)
Type of injury by sex, Port-au-Prince field hospital, Haiti, January 13 – May 28, 2010
Sex        Earthquake-related injury   Other injury   Total
Male                              74            259     333
Female                            85            151     236
Unknown                            3              9      12
Total                            162            419     581
Callouts on the slide: Descriptive title; Clear, concise labels; Row; Column; Cell; Row totals; Column totals; Unknown, if needed; Footnote, source
In the very simplest terms, a table is a set of data that is organized into rows and columns. <Click> <Click> The boxes within the table that contain data are called “cells.” Usually, and particularly in epidemiology, these cells contain quantitative data. <Click> Each row and column in a table should have corresponding labels that are clear and concise. In most simple tables, the categories should include all values that the variable can take. Note that the sex variable has category labels “Male,” “Female,” and it even has an Unknown category. Each row and column should be accompanied by totals. Include a row or column for unknown or missing values, if needed. A successfully and correctly structured table, particularly one for a written report or scientific manuscript, should be self-explanatory, meaning that it is easily understandable and capable of standing alone. The title should describe the What (topic), Where, and When. So the table should make sense when viewed all by itself, independent of the report or manuscript to which it pertains. If any codes, abbreviations, symbols, exclusions, or data sources are used, these should always be explained by a footnote.
CDC. Post-earthquake injuries treated at a field hospital — Haiti, MMWR 59:

Types of Tables 1-variable table (frequency distribution)
Range of values of a single variable Number of observations with each value 2-variable table Counts shown according to 2 variables at once 3-variable table Counts shown according to 3 variables at once Composite (combination) tables There are many different types of tables that can be used to organize and present data. Some of the more popular include: 1 variable tables 2 and 3 variable tables For publication purposes, several simple tables, each of a different variable, can be combined and presented in a single “composite” table. We will discuss the first two types of tables and briefly show you an example of the more complicated kinds of tables.

Example of 1-Variable Table — Tuberculosis Cases by Sex, U.S., 2009
Table 1. Number of Reported Cases of Tuberculosis, by Sex, United States, 2009
Sex        # Cases
Males        6,990
Females      4,544
Unknown         11
Total       11,545
Here is an example of a simple 1-variable table. The disease (the What) is tuberculosis, the variable is sex, the measure is the number or count of reported cases. You can see that a total of 11,545 cases of TB were reported in the United States in 2009, and that more cases were reported in males than in females.
CDC. Reported Tuberculosis in the U.S., Atlanta: CDC, October 2010.

Example of 1-Variable Table — Tuberculosis Cases by Age, U.S., 2009
Table 2. Number of Reported Cases of Tuberculosis, by Age, United States, 2009
Age Group (years)   # Cases
≤ 5                     401
5 – 14                  245
15 – 24               1,274
25 – 44               3,893
45 – 64               3,434
≥65                   2,292
Unknown                   6
Total                11,545
Here is another example, from the same TB data set. For sex, there were only two categories – males and females. For age, there are 80 or more categories. It is impractical to list every possibility, so the CDC TB program grouped the ages into a small number of age groups. The largest number of cases occurred in the 25–44-year-old age group, followed by the 45–64-year-old age group.
CDC. Reported Tuberculosis in the U.S., Atlanta: CDC, October 2010.

Example of 1-Variable Table, with Percent Column
Table 2. Number of Reported Cases of Tuberculosis, by Age, United States, 2009
Age Group (years)   # Cases   Percent
≤ 5                     401      3.5%
5 – 14                  245      2.1%
15 – 24               1,274     11.0%
25 – 44               3,893     33.7%
45 – 64               3,434     29.7%
≥65                   2,292     19.9%
Unknown                   6      0.1%
Total                11,545    100.0%
One variable tables (also called frequency distributions) list the values or categories that a variable can take, and the frequency with which each value appears in the data set. In addition, some tables add a percent column, as shown here. When a percent column is added, the table provides the relative frequency or proportional distribution of each value. The table shows the proportion of the total number of observations that appears in that category. The percent is computed by dividing the number of values in each category by the total number in the table. Percents are particularly useful for comparing sets of data with unequal numbers of observations.
CDC. Reported Tuberculosis in the U.S., Atlanta: CDC, October 2010.

Example of 1-variable Table, with
Percent and Cumulative Percent Columns
Table 2. Number of Reported Cases of Tuberculosis, by Age, United States, 2009
Age Group   # Cases   Percent   Cum Pct
≤ 5             401      3.5%      3.5%
5 – 14          245      2.1%      5.6%
15 – 24       1,274     11.0%     16.6%
25 – 44       3,893     33.7%     50.4%
45 – 64       3,434     29.7%     80.1%
≥65           2,292     19.9%     99.9%
Unknown           6      0.1%    100.0%
Total        11,545    100.0%
You can add yet another column, the cumulative percent or cumulative frequency. Here, the percent of a given row is added together with the percents of all previous rows. So the 5.6% cumulative percent for the 5-14 category is the sum of 3.5% plus 2.1%. One can then say that only 5.6% of TB cases occurred in persons younger than 15 years of age. You can also see that 50.4% of cases, or roughly half the cases, occurred in persons 44 years of age and under. Thus the median age (at 50%) is in the 25–44 year age category, probably close to the upper end of that range. The cumulative percent or cumulative frequency column shows the percentage of the total number of observations that have a value less than or equal to the upper limit of the category. The cumulative frequency is computed by adding the values in that category and all previous categories, then dividing by the total number in the table. The cumulative frequency column is particularly useful for identifying the median and for comparing sets of data with unequal numbers of observations.
CDC. Reported Tuberculosis in the U.S., Atlanta: CDC, October 2010.
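The percent and cumulative percent columns can be sketched in a few lines of Python. This is only an illustration, not the CDC's method: the function name frequency_table is made up for the example, and the counts for the two youngest age groups (401 and 245) are taken from the Total column of the three-variable TB table shown later in this presentation. Note that the cumulative percent is computed from accumulated counts, as described above, rather than by summing rounded percents.

```python
# Counts by age group for reported TB cases, United States, 2009.
# The first two counts come from the Total column of the three-variable
# table later in the deck (an assumption made for this sketch).
counts = [
    ("<=5", 401),
    ("5-14", 245),
    ("15-24", 1274),
    ("25-44", 3893),
    ("45-64", 3434),
    (">=65", 2292),
    ("Unknown", 6),
]


def frequency_table(rows):
    """Return (label, count, percent, cumulative percent) for each row."""
    total = sum(n for _, n in rows)
    table = []
    running = 0
    for label, n in rows:
        running += n  # cases in this category and all previous categories
        table.append((label, n, 100 * n / total, 100 * running / total))
    return table


for label, n, pct, cum in frequency_table(counts):
    print(f"{label:<8}{n:>7,}{pct:>7.1f}%{cum:>7.1f}%")
```

Because the cumulative column is built from counts, the row in which it first passes 50% identifies the category that contains the median.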

Creating Categories Mutually exclusive, all inclusive Choices
Standard categories for the disease Equal intervals Equal numbers within each group Include category for unknown values When analyzing data, begin with more categories, then collapse into a smaller number of categories for presentation Earlier, you saw that the CDC TB program created categories or groupings for age. Does every program use the same age groups that the TB program used? <No> How does one choose categories? Here are some guidelines. First the categories should be mutually exclusive and all inclusive. That is, the categories should cover every possible value, with no overlap. Second, if standard categories exist for your particular disease, use those categories. For example, rotavirus infection occurs in children under 5 years. So a surveillance report on rotavirus might use age categories of 0-2 months, 3-5 months, 6-11 months, months, months, months, months, and 60+ months. Would a surveillance report on Alzheimer’s Disease use the same age categories? <Obviously, no.> So your first choice should be to use the categories that are standards for the disease you are working on. An alternative is to use equal intervals, such as 10-year age groups (0-9, 10-19, 20-29, etc.). A third choice is to create quartiles or quintiles, with equal numbers within each group. Because most epidemiologic data sets, particularly surveillance data sets, contain missing values, be sure to include a category for unknown or missing values. Finally, this is a tip -- when analyzing data, begin with more categories, so you can see patterns and unusual features. For presentation purposes, you can collapse the data into a smaller number of categories, which can be grasped more quickly by the readers or audience.
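The guidelines above (mutually exclusive, all inclusive, equal intervals, and a category for unknown values) can be sketched as a small Python function. The function name age_group, the 10-year width, the open-ended top group, and the sample ages are all assumptions chosen for illustration; a real program would use the standard categories for its disease where they exist.

```python
def age_group(age, width=10, top=80):
    """Assign an age in whole years to an equal-width group such as '20-29'.

    Missing ages (None) go to 'Unknown', and ages at or above `top` share
    one open-ended group, so every record lands in exactly one category.
    """
    if age is None:
        return "Unknown"
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


ages = [9, 39, 29, 10, 55, 11, None, 83]  # hypothetical values
groups = [age_group(a) for a in ages]
print(groups)
```

Starting with a narrow width and re-running with a wider one is one way to follow the tip about beginning with more categories and collapsing them for presentation.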

Some Standard Categories in U.S.
Notifiable Diseases P&I mortality NCHS mortality HIV/AIDS < 1 year 1-4 5-9 10-14 15-19 20-24 25-29 30-39 40-49 50-59 ≥60 Not stated Total < 28 days 28 d – 1 yr 1-14 15-24 25-44 45-64 65-74 75-84 ≥85 Unknown 5-14 25-34 35-44 45-54 55-64 < 5 years 5–12 13–14 15–19 20–24 25–29 30–34 35–39 40–44 45–49 50–54 55–59 60–64 ≥65 Even within the same agency, different programs use different age categories. Some are relatively obvious, but some are not. Five- and 10-year age groups are very common. For mortality data, the first category tends to be either deaths in children <28 days (which is the definition of neonatal mortality) or deaths in children <1 year (which is the definition of infant mortality). But look at the HIV/AIDS age categories. Would any of you have chosen those exact categories? When you think about the epidemiology of HIV/AIDS, the categories make sense (<5 years captures mother-to-child transmission), 5-12 is a period of very low risk, begins sexual activity and drug use, then the categories conform to standard 5-year age groups.

Two-Variable Tables Shows counts according to two variables simultaneously Also called “cross-tab” or contingency tables A two-variable table shows data broken down by two variables simultaneously. Two-variable tables are also called “cross-tabs” (short for cross-tabulation) or contingency tables. Earlier, you saw a table of TB cases by sex, and a separate table of TB cases by age group. We could use a two-variable table to show you a table of TB cases by sex and age group at the same time.
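A cross-tabulation can be built directly from a line listing by counting each (row, column) pair. The sketch below uses Python's collections.Counter; the eight cases are hypothetical and are not taken from the TB data.

```python
from collections import Counter

# Hypothetical line listing: (age group, sex) for each case.
cases = [
    ("5-14", "M"), ("25-44", "M"), ("25-44", "F"), ("5-14", "M"),
    ("45-64", "F"), ("5-14", "F"), ("25-44", "F"), ("5-14", "M"),
]

cells = Counter(cases)                          # one count per table cell
row_totals = Counter(age for age, _ in cases)   # totals by age group
col_totals = Counter(sex for _, sex in cases)   # totals by sex

sexes = sorted(col_totals)
print(f"{'Age group':<10}" + "".join(f"{s:>6}" for s in sexes) + f"{'Total':>7}")
# Sort age groups by their lower bound so the rows appear in age order.
for age in sorted(row_totals, key=lambda g: int(g.split("-")[0])):
    row = "".join(f"{cells[(age, s)]:>6}" for s in sexes)
    print(f"{age:<10}{row}{row_totals[age]:>7}")
print(f"{'Total':<10}" + "".join(f"{col_totals[s]:>6}" for s in sexes) + f"{len(cases):>7}")
```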

Example of Two-variable Table
Table 3. Number of Reported Cases of Tuberculosis, by Age and Sex, United States, 2009 Age Group Females Males Unk Total ≤ 5 – 15 – ,274 25 – 44 1,641 2, ,893 45 – 64 1,153 2, ,434 ≥ , ,292 Unknown Total 4,554 6, ,545 This is the table of number of reported cases of tuberculosis by age group and sex. CDC. Reported Tuberculosis in the U.S., Atlanta: CDC, October 2010.

Drank from stream near Campsite 6?
Example of Two-by-Two Table
Drank from stream near Campsite 6?   Ill   Well   Total
Yes                                   18      4      22
No                                     5     39      44
Total                                 23     43      66
This is an example of a particular type of two-variable table, the so-called Two-by-Two table. It is called a Two-by-Two table because each of the two variables has only two categories. These data are from an outbreak of gastroenteritis among campers. What can you learn from this small, compact table?
Beginning with the totals outside the table,
– a total of 66 campers are included in this analysis
– 23 became ill, 43 did not
– 22 drank from the stream near campsite 6, 44 did not
Looking at the data within the table
– 18 (of 22) who drank water from the stream near campsite 6 became ill.
– Among the 44 campers who did not drink from the stream, only 5 became ill.
So a lot of data is packed into this one little table.

Drank from stream near Campsite 6?
Example of Two-by-Two Table
Drank from stream near Campsite 6?   Ill   Well   Total   Attack Rate (%)
Yes                                   18      4      22             81.8%
No                                     5     39      44             11.4%
Total                                 23     43      66
In addition, some epidemiologists like to provide the attack rate next to the table. For the 22 campers who drank from the stream, 18 became ill, which is an attack rate of 81.8%. In contrast, the attack rate for those who did not drink from the stream was only 11.4% (5 / 44). So drinking from the stream seems strongly related to risk of illness.
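The attack rates beside the two-by-two table are simply the number ill divided by the row total. A minimal Python sketch using the cell counts from the campsite table follows; the names two_by_two and attack_rate are illustrative only.

```python
# Cell counts from the two-by-two table: (ill, well) for each exposure group.
two_by_two = {
    "Yes": (18, 4),   # drank from the stream near Campsite 6
    "No": (5, 39),    # did not drink from the stream
}


def attack_rate(ill, well):
    """Percent of the group that became ill."""
    return 100 * ill / (ill + well)


for exposure, (ill, well) in two_by_two.items():
    rate = attack_rate(ill, well)
    print(f"{exposure:<4}{ill:>4}{well:>5}{ill + well:>6}{rate:>7.1f}%")

# Column totals, which belong in the bottom row of the table.
total_ill = sum(ill for ill, _ in two_by_two.values())
total_well = sum(well for _, well in two_by_two.values())
```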

Example of Three-variable Table
Table 3. Number of Reported Cases of Tuberculosis, by Age, Sex, and Birth Country, United States, 2009
                 Females             Males
Age group      U.S.    Other      U.S.    Other     Total*
≤ 5             167       20       183       31       401
5–14             82       37        72       54       245
15–24           178      377       207      499     1,274
25–44           411    1,215       635    1,591     3,893
45–64           463      669     1,172    1,080     3,434
65+             365      509       631      761     2,292
Total        1,667*   2,829*    2,900*   4,019*    11,545
* Totals include cases with missing age, sex, or birth country
Very briefly, I will show you a three-variable table, but we will not be going into this level of detail in the practical session following this. Usually too busy for a PowerPoint slide. On paper, three variables is the maximum most people can digest. Three variables can sometimes fit on a PowerPoint slide, but how long would it take a member of the audience to identify the important patterns in this table?
CDC. Reported Tuberculosis in the U.S., Atlanta: CDC, October 2010.

Composite (Combination) Tables
Combines two or more 1-way or 2-way tables Uses limited space efficiently Well suited for written and oral presentations, but simple tables must be prepared first Because space in journals is limited, sometimes several tables like the ones you have seen are combined into a “composite” table. These composite tables are more compact than a series of individual tables, so they use the available space more efficiently. However, to create a composite table, the individual tables must (and should) be prepared first.

Composite Table Example
I have included here a copy of a composite slide, so that you may refer to it from the handouts in your own time. Suffice it to say, composite tables are very detailed and require a lot of time to understand. Here is an example of a composite table from an article that appeared in the Journal of Infectious Diseases. The article describes a study of avian influenza (H5N1) virus among poultry vs. laboratory workers in Nigeria. The investigators conducted a study (questionnaire and serum) of 295 poultry workers and 25 lab workers with median 14 days of exposure to suspected or confirmed H5N1-infected poultry, with minimal personal protective equipment. This table is actually a series of two-variable tables. One variable is a demographic variable, and the second variable is type of worker (poultry or lab). How many different tables are combined into this one table? Age Sex Occupation Level of education Monthly household expenses So this single table (Table 1) combines 5 different tables. Ref: Ortiz JR, Katz MA, Mahmoud MN, et al. Lack of evidence of avian-to-human transmission of avian influenza A (H5N1) virus among poultry workers, Kano, Nigeria, J Infect Dis 2007; 196:

Why Tables? When too many records, summarize in table (or graph)
Allow you to identify, explore, understand, and present distributions, trends, relationships, variations, and exceptions in the data Tables serve as basis for graphs – always create a table first!

Some Tips for Creating Printed Tables
Keep it simple Should be self-explanatory Title (what, where, when) with table number Label each row and column clearly and concisely Include units of measurement (years, mg/dl, etc.) Show totals for rows and columns Explain codes, abbreviations, symbols Note any exclusions in a footnote Note source in a footnote

Graphs Now we will talk about graphs.

Graphs Display quantitative data using a set of coordinates
Rectangular graphs (x, y coordinates) most common x axis along bottom = method of classification, often time y axis along side = frequency, usually number, percent or rate

Graphs: Advantages and Disadvantages
Easy to understand and interpret Reveal patterns in data Useful for generating hypotheses Useful before formal data analysis Disadvantage Loss of detail

Graph Types Arithmetic-scale line graph Histogram
Many other types, not covered in this lecture Semilogarithmic-scale line graph Frequency polygon Cumulative frequency curve Survival curve Scatter diagram

Arithmetic Scale Line Graph
# Cases Useful to portray data collected over time Intervals on y-axis are equal An arithmetic-scale line graph is probably the most common type of graph used, particularly with surveillance data The features of an arithmetic-scale line graph are: It is a rectangular graph (Remember, X-axis is the axis along the bottom, Y-axis is the axis along the side) <Click> X-axis has equal intervals, so a certain distance (such as 1 cm) represents the same number of years (e.g., 5 years), anywhere along the line X-axis usually portrays time Y-axis also has equal intervals Y-axis portrays number, rate, or proportion, for example, number of cases of disease, or incidence rate per 100,000 population, or percent of population with a particular characteristic (e.g., percent of population who smoke) over time So that is the reason arithmetic-scale line graphs are used – to portray data collected over time, i.e., to portray the time trend or pattern Multiple diseases or other characteristics can be displayed on the same graph, so patterns can be compared Start y-axis at 0; use scale breaks only if you must Intervals on x-axis are equal

Creating a Line Graph Make x-axis longer than y-axis (best ratio 5:3)
X-axis: Match x-axis scale to intervals used during data collection Y-axis: Always start y-axis with 0 Identify largest value, round up for maximum Y value Select reasonable intervals for y-axis Plot data Create title Add comments, footnotes Now, let’s focus on creating an arithmetic-scale line graph.
1. Draw x- and y-axes. Most visually appealing X:Y ratio (and best for PowerPoint, computer screens, and projection screens) is 5:3
2. X-axis: Match x-axis scale to intervals used during data collection, e.g., for what range of years do you have data?
3. Y-axis: Always start y-axis with 0. Determine range of values on y-axis by identifying the largest value. Select interval size for y-axis that will provide enough intervals to illustrate data in adequate detail
4. Plot the data
5. Create a title that describes the data, the location, and the time period
6. Add notations, footnotes, indicate the source

Creating a Line Graph: X-axis and Y-axis
First of all, when creating a line graph, create a rectangle with the X-axis along the bottom and the Y-axis along the left side. For most written reports and PowerPoint presentations, a “landscape” orientation works best, with an X:Y ratio of about 5:3.

Creating a Line Graph: Complete X-axis, Label X-axis
Data for Years 1960 – 2008 Next, determine the range of data for the x-axis, and create appropriate intervals. For this example we have data on the number of reported measles cases in the U.S. from 1960 to 2008. While we can have tick marks for every year, the x-axis could not fit a label for each year. Labelling every 5 years seems reasonable. Don’t forget to label the x-axis itself with the word “year”.

Creating a Line Graph: Complete Y-axis, Label Y-axis
481,530 cases in 1963 Number of Cases Next, determine the range of data for the y-axis, and create appropriate intervals. The y-axis must start at 0. The maximum value was over 481,000 cases of measles in 1963. What would you use as the maximum value for the y-axis? (Reasonable choice – 500,000) <Click> At CDC they chose to give themselves more room at the top, so they chose 600,000. And they chose to label intervals of 100,000, which seems reasonable. Don’t forget to label the y-axis itself with the “number of cases”.
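The y-axis rule (start at 0, round the largest value up, use equal intervals) can be sketched as follows. The function name y_axis_ticks is hypothetical, and the interval is still a judgment call left to whoever draws the graph.

```python
import math


def y_axis_ticks(largest, interval):
    """Tick values from 0 up to the first multiple of `interval` >= largest."""
    top = math.ceil(largest / interval) * interval
    return list(range(0, top + interval, interval))


# Largest yearly count in the measles example, with 100,000-case intervals.
ticks = y_axis_ticks(481_530, 100_000)
print(ticks)
```

With these inputs the axis tops out at 500,000, the "reasonable choice" mentioned above; adding one more interval gives the extra headroom of a 600,000 maximum.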

Creating a Line Graph: Plot the Data
Number of Cases The next step is to plot the actual data. <Click>

Creating a Line Graph: Add Title
Number of Reported Cases of Measles by Year, United States, 1960–2008 Number of Cases The next step is to add a title. The title should describe what the data are (number or rate, disease or condition), where, and when

Number of Reported Cases of Measles by Year, United States, 1960–2008
Creating a Line Graph: Add Comments, Footnotes, Source Number of Reported Cases of Measles by Year, United States, 1960–2008 Number of Cases Vaccine licensed Finally, add comments or notations, footnotes, and source What do you think about the bump at about 1990? How might you explore, at least graphically, what was happening at that time? (Answer = Inset) CDC. Summary of Notifiable Diseases, U.S., Atlanta: CDC, June 2010.

Number of Reported Cases of Measles by Year, United States, 1960–2008
Graph with Inset Number of Reported Cases of Measles by Year, United States, 1960–2008 Number of Cases Vaccine licensed Inset = magnified portion of the larger graph, to see small section of data in more detail Variable remains the same CDC. Summary of Notifiable Diseases, U.S., Atlanta: CDC, June 2010.

Age-Adjusted Death Rates for Leading Causes of
Death, United States, Deaths per 100,000 Here is another arithmetic-scale line graph. Line graphs are very useful for comparing 2 or more series of data. In this graph, you can see the downward trend of heart disease, while mortality from cancer is relatively flat. Do you think that, in a few years, mortality from heart disease will fall below mortality from cancer?

Comments on Arithmetic-Scale Line Graph
Method of choice for plotting rates over time X-axis almost always time (rarely, age) Y-axis can be counts, proportions, or rates Y-axis should start with 0 Determine largest value of Y needed to plot Round off that number and divide into intervals Set distance on either axis represents same quantity anywhere on that axis Good for comparing 2 or more sets of data

Histogram “Epidemic curve” in outbreak investigations
Frequency distribution of quantitative data x axis continuous, usually time (onset or diagnosis date) No spaces between adjacent columns, i.e., adjacent columns “touch” Easiest to interpret with equal class (x) intervals Column height proportional to number of observations in that interval
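An epidemic curve is a histogram of onset times grouped into equal class intervals. The sketch below bins hypothetical onset times (hours after an arbitrary starting point) into 6-hour intervals and prints one X per case, similar to drawing each column as a stack of boxes. The data and the interval width are assumptions made for the example.

```python
from collections import Counter

# Hypothetical onset times, in hours after the start of the first day.
onset_hours = [13, 20, 22, 26, 27, 29, 30, 31, 33, 35, 38, 41, 47, 52]
INTERVAL = 6  # equal class intervals of 6 hours

# Map each onset time to the start of its interval and count the cases.
bins = Counter((h // INTERVAL) * INTERVAL for h in onset_hours)
first, last = min(bins), max(bins)

# List every interval, including empty ones, so the x-axis stays continuous.
curve = [(start, bins.get(start, 0))
         for start in range(first, last + INTERVAL, INTERVAL)]

for start, n in curve:
    print(f"{start:>3}-{start + INTERVAL - 1:<3} " + "X" * n)
```

Keeping the empty intervals matters: dropping them would place non-adjacent time periods side by side and distort the shape of the curve.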

Histogram

No spaces between adjacent columns
Number of Cases of Salmonella Enteritidis by Date of Onset, Chicago, February 2000 Party One Case No spaces between adjacent columns This is an epidemic curve from an investigation of Salmonella Enteritidis among party attendees by date and time of symptom onset, Chicago, Illinois, February 2000. As you can see, there is no space between adjacent columns. Often, the columns of a histogram are shown as a single bar. Feb Date and Time of Symptom Onset

Number of Cases of Salmonella Enteritidis
by Date of Onset, Chicago, February 2000 Party One Case Here is the same histogram. For outbreaks, some epidemiologists like to draw the columns of an epidemic curve as a stack of boxes. This is personal preference. Feb Date and Time of Symptom Onset

Number of Cases of Salmonella Enteritidis
by Date of Onset, Chicago, February 2000 Party Probable Case Culture-confirmed Case Either way, cases can be distinguished from one another by shading. Here, culture-confirmed cases are in gray, probable cases in white. Feb Date and Time of Symptom Onset

Charts Now we will talk about charts.

Charts Display quantitative data using only one coordinate
Most appropriate for comparing data with discrete categories Common types include: Bar charts Pie charts Maps Other

Bar Charts Can be vertical or horizontal
Use for variable with discrete, non-linear categories, such as county Has space between “columns”, since categories are not continuous 4 types – simple, grouped, stacked, 100% Best type depends on desired emphasis

Reported TB Cases by Race/Ethnicity United States, 2001 (Simple Bar)
Here is a bar chart showing the number of reported cases of tuberculosis in the United States by race/ethnicity. Do the colors add any meaning to the chart? <No>

Here is the same chart, without the distracting colors. What about the order of the columns? <Seemingly random. Not alphabetic, not by number of cases>

HCV Prevalence by Selected Groups, United States
Hemophilia Injecting drug users Hemodialysis STD clients Gen’l pop’n adults Surgeons Pregnant women Military personnel Average Percent Anti-HCV Positive

Number of Reported Tuberculosis Cases by Birth Country and Year, U. S
(Grouped Bar Chart) Number of Cases No. of Cases

Number of Reported Tuberculosis Cases by Birth Country and Year, U. S
(Stacked Bar Chart) Number of Cases No. of Cases

100% Component Bar Chart All bars same height (100%)
Components shown as proportions of the total, not actual values Good for comparing how components contribute to the whole within a group Not useful for comparing relative sizes of the components across different groups because the denominator changes Not useful for comparing relative sizes of the various categories of the main variable, the size of the group represented by each bar
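The point about changing denominators can be seen with a small calculation. In this Python sketch the counts are hypothetical (they are not the TB figures from the slides), and the names counts, to_percent, and shares are illustrative.

```python
# Hypothetical counts by group; not the TB figures from the slides.
counts = {
    "Year 1": {"U.S.-born": 600, "Foreign-born": 400},
    "Year 2": {"U.S.-born": 300, "Foreign-born": 450},
}


def to_percent(components):
    """Convert one bar's component counts to percentages of that bar's total."""
    total = sum(components.values())
    return {name: 100 * n / total for name, n in components.items()}


shares = {group: to_percent(parts) for group, parts in counts.items()}

for group, parts in shares.items():
    line = ", ".join(f"{name} {pct:.0f}%" for name, pct in parts.items())
    print(f"{group}: {line}")
```

In this made-up example the foreign-born share rises from 40% to 60% while the count rises only from 400 to 450, because the total fell from 1,000 to 750. A 100% component bar chart shows the first fact and hides the second.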

(100% Component Bar Chart)
Number of Reported Tuberculosis Cases by Birth Country and Year, U.S., (100% Component Bar Chart) Proportion of Cases No. of Cases

Pie Charts Show components of a whole
Size of “slice” = proportional contribution of each component Hard to compare two or more pie charts Begin at 12 o’clock with largest slice and proceed clockwise Provide label and percent for each slice Don’t use 3-D!

Reported TB Cases by Race/Ethnicity United States, 2001 (Pie Chart)
Hispanic (25%) Black, non-Hispanic (30%) Asian/Pacific Islander (22%) White, non-Hispanic (21%) American Indian/ Alaska Native (1%)
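As a rough sketch of the "largest slice first, clockwise from 12 o'clock" rule, the following Python orders the slices and computes where each one would start. The percentages are the ones printed on the slide; the function name `order_slices` and the angle convention (degrees clockwise from 12 o'clock) are assumptions made for this illustration.

```python
def order_slices(percents):
    """Return (label, percent, start_angle) tuples, largest slice first.

    start_angle is measured in degrees clockwise from 12 o'clock.
    """
    total = sum(percents.values())
    ordered = sorted(percents.items(), key=lambda item: item[1], reverse=True)
    slices = []
    start = 0.0
    for label, pct in ordered:
        slices.append((label, pct, start))
        # Each slice spans its share of the full 360 degrees
        start += 360.0 * pct / total
    return slices


# Percentages as printed on the slide; they sum to 99, presumably from rounding
tb_percents = {
    "Hispanic": 25,
    "Black, non-Hispanic": 30,
    "Asian/Pacific Islander": 22,
    "White, non-Hispanic": 21,
    "American Indian/Alaska Native": 1,
}

slices = order_slices(tb_percents)
for label, pct, start in slices:
    print(f"{label} ({pct}%) starts at {start:.1f} degrees")
```

Dividing by the actual total rather than 100 keeps the slices closing the full circle even when the rounded percentages do not add up to exactly 100.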

Some Tips for Creating Printed Graphs
Should be self-explanatory. Title (what, where, when) with figure number. Label each axis clearly and concisely. Include units of measurement (years, mg/dl, etc.). In epidemiology, start Y-axis at zero. Epidemic curve = histogram.

Selecting the Right Presentation Method 1
Type of Graph or Diagram: Application
Arithmetic Scale Graph: Number, proportion or rate over time
Histogram: 1. Frequency distribution for a continuous variable 2. Number of cases during an epidemic (epidemic curve) or over time
<Skip over these two slides quickly. They serve as a reminder of what kind of data suits what presentation method.>

Selecting the Right Presentation Method 2
Type of Graph or Diagram: Application
Simple bar chart: Compare the size or frequency of different categories of the same variable
Grouped bar chart: Compare the size or frequency of different categories across 2 or more variables
Stacked bar chart: Compare totals and display component parts for 2 or more categories of second variable
Pie chart: Display parts of a whole

Question 1 — What’s Wrong With This Graph?
Reported Tuberculosis Cases, United States (Y-axis: No. of Cases; X-axis: Year) First, describe the trend. <Number of TB cases declined from 1980 to about 1984, then flattened, then rose in the late 1980s, peaked in 1992, then resumed the decline.> What’s wrong (or at least misleading) about this figure? <Y-axis does not start at 0>

Reported Tuberculosis Cases, United States, 1981-2007
Answer 1 – Misleading. Here is the same graph, with the Y-axis beginning at 0. The description of the trend is exactly the same: the number of TB cases declined from 1980 to about 1984, then flattened, then rose in the late 1980s, peaked in 1992, then resumed the decline. But in the previous graph it looked like TB was close to elimination. In this graph, you can see that we still have work to do!
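One way to see why a truncated Y-axis misleads is to compare how tall two points look on the page with how large they really are relative to each other. The sketch below uses hypothetical case counts chosen only for illustration (they are not read from the graph), and `apparent_ratio` is a made-up helper name.

```python
def apparent_ratio(high, low, axis_start=0.0):
    """Ratio of the plotted heights of two values above the axis starting point."""
    if low <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (high - axis_start) / (low - axis_start)


# Hypothetical case counts, chosen only to illustrate the effect
peak, latest = 26000, 14000

ratio_from_zero = apparent_ratio(peak, latest)                     # Y-axis starts at 0
ratio_truncated = apparent_ratio(peak, latest, axis_start=12000)   # truncated Y-axis

print("Apparent ratio, axis from 0:", round(ratio_from_zero, 2))
print("Apparent ratio, truncated axis:", round(ratio_truncated, 2))
```

With these assumed numbers, the truncated axis makes the peak look 7 times as high as the latest value, while the true ratio is under 2. The trend is the same; only the visual impression changes.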

Question 2 — What’s Wrong With This Epi Curve?
Should be histogram (adjacent columns should touch) Eliminate 3-D effect (unnecessary) Axes should be labeled Needs a title

Number of Cases of Gastroenteritis, Warehouse Workers, TN, August 2003
Figure annotations: Catered dinner; * Not counted as case

Question 3 — What’s Wrong With This Graph?
Rate* of Invasive Pneumococcal Disease by Age Group -- United States, 1998 Intervals on the X-axis should be the same width across the axis. What are possible solutions? (1) Histogram with different column widths (difficult to draw, looks odd). (2) Use a bar chart (no presumption that age groups span the same number of years). * Rate per 100,000 population

Rate* of Invasive Pneumococcal Disease by Age Group – U.S., 1998
* Rate per 100,000 population

Question 4 — What’s Wrong With This Table?
Number of Reported Cases of Syphilis (P&S) by Age, United States, 2002 Age Group (years) # Cases < 15 – 20 – 25 – 30 – 35 1,097 35 – 40 1,367 40 – 45 1,023 45 – Total 6,862 Answer: The age categories overlap. In which interval would you record a 20-year-old with syphilis?

Number of Reported Cases of Syphilis (P&S) by Age, United States, 2002
Age Group (years) # Cases < 15 – 20 – 25 – 30 – 34 1,097 35 – 39 1,367 40 – 44 1,023 45 – ≥ Total 6,862
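The overlap problem can be checked mechanically. The sketch below is an illustration under assumptions: the bin boundaries for the younger age groups are inferred from the pattern of the tables (only 30–34, 35–39, and 40–44 are legible in the corrected version), and `find_overlaps` and `assign_bin` are hypothetical helper names.

```python
def find_overlaps(bins):
    """Return pairs of adjacent (low, high) bins, inclusive bounds, that share an age."""
    ordered = sorted(bins)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if b[0] <= a[1]
    ]


def assign_bin(age, bins):
    """Return every bin containing the age (exactly one when bins are well formed)."""
    return [(low, high) for low, high in bins if low <= age <= high]


# Assumed reading of the original labels: neighboring groups share an endpoint
overlapping = [(15, 20), (20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]
# Corrected style: each group ends one year before the next begins
corrected = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]

print("20-year-old, original groups:", assign_bin(20, overlapping))
print("20-year-old, corrected groups:", assign_bin(20, corrected))
print("Overlaps remaining after correction:", find_overlaps(corrected))
```

With the original groups a 20-year-old fits two rows; with the corrected groups the same patient fits exactly one.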

Summary Data can be organized through the creation of tables, graphs and charts. The purpose of creating these visual displays is to verify and analyze the data, explore patterns and trends, and communicate information to others. An effective figure should be interpretable without any additional information.

Summary 2 Tables can illustrate the number of people with particular characteristics and can provide valuable information about relationships between 2 variables. Line graphs are useful for showing patterns or trends over some variable, usually time. Histograms are most commonly used in epidemiology for epidemic curves (cases by time). Bar charts provide a visual display of data from a one-variable table, but grouped bar charts can show 2 variables.

Conclusion Choose the tool that best serves the data and purpose
Start with tables Use appropriate titles and labels Print ≠ PowerPoint KISS (message, colors, dimensions)
