Published by Kathleen Alexander. Modified over 9 years ago.
1
1 Introduction to Biostatistics (BIO/EPI 540) Data Presentation Graphs and Tables Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material
2
2 Class Plan Data Presentation (Lec 2 overview) Example (hand/SAS) Mean and variance Describing Data (and in next class) Simulating Data (and in next class)
3
3 Outline Descriptive Statistics – means of organizing and summarizing observations Types of data Data presentation and numerical summary measures
4
4 Types of data Nominal Data Ordinal Data Rank Data Discrete Data Continuous Data
5
5 Types of data Nominal Data 1: male 0:female Nominal data values fall into unordered categories or classes
6
6 Types of data Ordinal Data Observations with order among categories are referred to as ordinal 1.Mild 2.Moderate 3.Severe
8 Example: Death of Manatees in Florida (nominal categories)
Cause                    1999   1998
Floodgates/Canal Lock      15      9
Human Related               8      6
Natural                    43     21
Perinatal                  52     53
Watercraft                 82     66
Undetermined               69     76
Total                     263    231
Source: Florida Fish and Wildlife Conservation Commission
9
9 Cause19991998 Rank Floodgates/Canal Lock 15 9 4 Human Related 8 6 5 Natural 43 21 3 Perinatal 52 53 2 Watercraft 82 66 1 Undetermined 69 76 Total 263231 Example: Death of Manatees in Florida Florida Fish and Wildlife Conservation Commission Ranked data
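The ranks on this slide can be reproduced with a short sketch. It assumes the ranks are assigned by the 1999 count, largest first, and that "Undetermined" is left unranked, as on the slide. The variable names are illustrative only.

```python
# 1999 counts copied from the slide; "Undetermined" is omitted because
# the slide does not assign it a rank.
deaths_1999 = {
    "Floodgates/Canal Lock": 15,
    "Human Related": 8,
    "Natural": 43,
    "Perinatal": 52,
    "Watercraft": 82,
}

# Assumption: rank 1 goes to the cause with the most deaths in 1999.
ordered = sorted(deaths_1999, key=deaths_1999.get, reverse=True)
ranks = {cause: position for position, cause in enumerate(ordered, start=1)}

for cause in ordered:
    print(f"{ranks[cause]}  {cause}  ({deaths_1999[cause]})")
```

Ranks keep only the ordering of the causes; the sizes of the gaps between counts are discarded.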
10 Types of data: Discrete Data
Both order and magnitude are important. Data consist of a restricted set of values.
Example: data on number of children per subject
Subject   Number of children
1         2
2         3
3         1
4         2
5         4
11
11 Types of data Continuous Data Data represents measurable quantities, but are not restricted to taking on specific values US adult heights US adult individual cholesterol measurements
12
12 Outline Descriptive Statistics – means of organizing and summarizing observations Types of data Data presentation and numerical summary measures
13
13 Data Presentation Nominal / Ordinal Data: –Frequency (relative frequency) tables –Bar charts Discrete/ Continuous Data: –Histogram (Frequency Polygon) –One way scatter plot Continuous Data: –Box plot –2 way scatter plot –Line Graph
14
14 Example: Serum cholesterol level of men aged 25-34 years. Cholesterol Level (mg/100 ml) Number of Men 80—119 13 120—159150 160—199442 200—239299 240—279115 280—319 34 320—359 9 360—399 5 Total1,067 Frequency Table
15
15 Example: Serum cholesterol level of men aged 25-34 years. Cholesterol Level (mg/100 ml) Number of Men Relative Frequency (%) 80—119 13 1.2 120—15915014.1 160—19944241.4 200—23929928.0 240—279115 10.8 280—319 34 3.2 320—359 9 0.8 360—399 5 0.5 Total1,067100.0 Frequency Table
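As a check on the table, the relative frequency column can be recomputed from the counts. This is a minimal sketch: the dictionary layout and variable names are illustrative, and only the counts come from the slide.

```python
# Counts copied from the frequency table (men aged 25-34 years).
counts = {
    "80-119": 13,
    "120-159": 150,
    "160-199": 442,
    "200-239": 299,
    "240-279": 115,
    "280-319": 34,
    "320-359": 9,
    "360-399": 5,
}

total = sum(counts.values())

# Relative frequency (%) = 100 * count / total, rounded to one decimal
# place as on the slide.
relative = {level: round(100 * n / total, 1) for level, n in counts.items()}

for level, pct in relative.items():
    print(f"{level:>8}  {counts[level]:>5}  {pct:>5.1f}")
print(f"{'Total':>8}  {total:>5}")
```

Relative frequencies make groups of different sizes comparable, which is why the later two-age-group table reports percentages rather than counts.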
16
16 Bar Chart http://www.ncsu.edu/labwrite/res/gh/gh-bargraph.html#horizbar Label axes; Leave space between bars Car defects in three factories
17
17 Data Presentation Nominal / Ordinal Data: –Frequency (relative frequency) tables –Bar charts Discrete/ Continuous Data: –Histogram (Frequency Polygon) Continuous Data: –Box plot –2 way scatter plot –Line Graph
18 Histogram Example
19
19 Histogram Choosing the number of bins – depends on range of data Equal widths of bins recommended When data demands unequal bin widths, take care to plot area proportional to relative frequency Key points
20
20 Histogram A histogram represents percentages by areas* Density scale (Y axis): the height of each block (bin) equals the percentage in that block (bin) divided by the bin width Total area of histogram = 100% When bin widths are equal – it is common for the histogram to show just the counts in each bin Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf Key points
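The density-scale rule can be illustrated with the cholesterol counts from the earlier frequency table. This is only a sketch: it assumes each class such as 80-119 is treated as the interval from 80 up to 120, so every bin is 40 mg/100 ml wide, and the variable names are illustrative.

```python
# Assumed bin edges: each class "80-119", "120-159", ... is taken as
# the half-open interval [80, 120), [120, 160), and so on.
edges = [80, 120, 160, 200, 240, 280, 320, 360, 400]
counts = [13, 150, 442, 299, 115, 34, 9, 5]

total = sum(counts)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
percents = [100 * n / total for n in counts]

# Density scale: height = percentage in the bin / bin width.
heights = [p / w for p, w in zip(percents, widths)]

# The block areas (height * width) add back up to 100%.
area = sum(h * w for h, w in zip(heights, widths))

# With equal widths, height is a constant multiple of the count, so a
# histogram of raw counts has the same shape.
ratios = [h / n for h, n in zip(heights, counts)]

print(f"total area = {area:.1f}%")
```

With unequal widths the ratio of height to count is no longer constant, which is why counts alone would then distort the picture.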
21
21 Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf Histogram - example
22
22 Percent Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf Histogram - example
23
23 Histogram Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
24 Histogram Constructing a 100% area histogram Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
25 Histogram Constructing a 100% area histogram Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
26 Histogram: Constructing a 100% area histogram [figure; y-axis: density]
Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
27 Frequency Polygon - Example
Serum cholesterol level of men (1976-1980 survey)
Cholesterol Level (mg/100 ml)   Relative Frequency 25-34 yrs (%)   Relative Frequency 55-64 yrs (%)
80-119                            1.2                                0.4
120-159                          14.1                                3.9
160-199                          41.4                               21.6
200-239                          28.0                               37.3
240-279                          10.8                               22.9
280-319                           3.2                               10.4
320-359                           0.8                                2.9
360-399                           0.5                                0.6
Total                           100.0                              100.0
28 Frequency Polygon - Example
29 Frequency Polygon - Example
Serum cholesterol level of men aged 25-34 years.
Cholesterol Level (mg/100 ml)   Relative Frequency (%)   Cumulative Relative Frequency (%)
80-119                            1.2                       1.2
120-159                          14.1                      15.3
160-199                          41.4                      56.7
200-239                          28.0                      84.7
240-279                          10.8                      95.5
280-319                           3.2                      98.7
320-359                           0.8                      99.5
360-399                           0.5                     100.0
Total                           100.0
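The cumulative column is a running sum of the relative frequencies. A minimal sketch, using the percentages from the slide and illustrative variable names:

```python
from itertools import accumulate

levels = ["80-119", "120-159", "160-199", "200-239",
          "240-279", "280-319", "320-359", "360-399"]
percents = [1.2, 14.1, 41.4, 28.0, 10.8, 3.2, 0.8, 0.5]

# Running sum, rounded to one decimal place to absorb floating-point noise.
cumulative = [round(c, 1) for c in accumulate(percents)]

# The first class whose cumulative percentage reaches 50% contains the median.
median_class = next(lvl for lvl, c in zip(levels, cumulative) if c >= 50)

for lvl, p, c in zip(levels, percents, cumulative):
    print(f"{lvl:>8}  {p:>5.1f}  {c:>6.1f}")
print("class containing the median:", median_class)
```

Reading percentiles off the running sum like this is the numerical counterpart of reading them off a cumulative frequency polygon.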
30 Frequency Polygon - Example
31 Frequency Polygon - Example
32 Data Presentation
Nominal / Ordinal Data:
– Frequency (relative frequency) tables
– Bar charts
Discrete / Continuous Data:
– Histogram (Frequency Polygon)
Continuous Data:
– Box plot
– 2 way scatter plot
– Line Graph
33 Example - Dyslipidemia in HIV Cohort
The histogram reveals an asymmetric, skewed distribution.
34 Example - Dyslipidemia in HIV Cohort
Natural log transformation of the data results in a more symmetric distribution.
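The effect of the log transformation can be sketched numerically. The values below are hypothetical, chosen only to be right-skewed. They are not the cohort's data, and the moment-based skewness measure is an assumption made for illustration rather than something the slides specify.

```python
import math

# Hypothetical right-skewed measurements (NOT the HIV cohort data).
raw = [50, 80, 100, 120, 150, 200, 400, 1200]


def skewness(values):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


logged = [math.log(v) for v in raw]  # natural log transformation

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

A skewness closer to zero after the transformation corresponds to the more symmetric histogram described on the slide.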
35 Box plot: Dyslipidemia in HIV Cohort (natural log transformed triglyceride measurements)
– The box marks the 25th, 50th and 75th percentiles.
– UB (LB) = most extreme data point that is within 1.5 times the box width (IQR) of the 75th (25th) percentile.
– Points beyond UB or LB are plotted as outliers.
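The box plot quantities can be computed as below. This is a sketch on hypothetical values, not the cohort's measurements. The quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is one of several conventions, and statistical packages may give slightly different quartiles.

```python
import statistics

# Hypothetical log-transformed measurements (NOT the HIV cohort data).
data = [4.1, 4.4, 4.6, 4.7, 4.9, 5.0, 5.2, 5.3, 5.6, 7.4]

# 25th, 50th and 75th percentiles (assumed convention: "inclusive").
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # box width

# Fences: 1.5 box widths beyond the 25th and 75th percentiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# UB / LB: the most extreme data points still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
lb, ub = min(inside), max(inside)

# Everything outside the fences is drawn as an individual outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1:.3f} median={median:.3f} Q3={q3:.3f} IQR={iqr:.3f}")
print(f"LB={lb} UB={ub} outliers={outliers}")
```

Note that UB and LB are actual data points, not the fences themselves, so the whiskers can be shorter than 1.5 box widths.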
36 Box plot Dyslipidemia in HIV Cohort
37 2 way scatter plot: Dyslipidemia in HIV Cohort
Reveals the relationship between 2 continuous variables.
38 Summary
Data Types:
– Nominal
– Ordinal
– Discrete
– Continuous
Data presentation (Nominal / Ordinal data):
– Tables (Frequency, Relative Frequency)
– Bar charts
Data presentation (Discrete / Continuous):
– Histogram (Frequency Polygon)
Data presentation (Continuous):
– Box plot, shapes of distributions
– 2 way scatter plot
39 In-Class Example
Distance willing to travel to a household hazardous waste site:
Distance       Freq
< 1 mile         75
>1-2 miles       90
>2-5 miles       45
>5-10 miles      90
Total           300
Histogram, Polygon, Cum % Dist.
40 In-Class Example
Distance willing to travel to a household hazardous waste site:
Distance       Freq    %    %/mile
< 1 mile         75   25    25
>1-2 miles       90   30    30
>2-5 miles       45   15     5
>5-10 miles      90   30     6
Total           300
Histogram, Polygon, Cum % Dist.
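The % and %/mile columns, together with the cumulative percentages needed for the cumulative distribution, can be worked out as follows. The sketch assumes the first class runs from 0 to 1 mile. The tuple layout and names are illustrative.

```python
# (lower edge, upper edge, frequency) in miles; the first class "< 1 mile"
# is assumed to start at 0.
bins = [(0, 1, 75), (1, 2, 90), (2, 5, 45), (5, 10, 90)]

total = sum(freq for _, _, freq in bins)

rows = []
cumulative = 0.0
for lo, hi, freq in bins:
    percent = 100 * freq / total   # relative frequency (%)
    density = percent / (hi - lo)  # histogram height, % per mile
    cumulative += percent          # running total for the cum. % plot
    rows.append((lo, hi, percent, density, cumulative))

for lo, hi, percent, density, cum in rows:
    print(f"{lo:>2}-{hi:<2} miles  {percent:>5.1f}%  {density:>5.1f} %/mile  {cum:>5.1f}%")
```

The >5-10 mile class holds 30% of respondents but has a height of only 6% per mile, because that 30% is spread over a 5-mile-wide bin. Plotting raw percentages would overstate it relative to the narrower bins.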
41 Histogram of Travel Distance (miles) for n=300 [figure; x-axis: Distance (Miles), y-axis: Density]
42 Polygon of Travel Distance (miles) for n=300 [figure; x-axis: Distance (Miles), y-axis: Density]
43 Cumulative % of Travel Distance (miles) for n=300 [figure; x-axis: Distance (Miles), y-axis: Cum. Percent]