
1 Introduction to Biostatistics (BIO/EPI 540) Data Presentation Graphs and Tables Acknowledgement: Thanks to Professor Pagano (Harvard School of Public.


1 1 Introduction to Biostatistics (BIO/EPI 540) Data Presentation Graphs and Tables Acknowledgement: Thanks to Professor Pagano (Harvard School of Public Health) for lecture material

2 2 Class Plan Data Presentation (Lec 2 overview) Example (hand/SAS) Mean and variance Describing Data (and in next class) Simulating Data (and in next class)

3 3 Outline Descriptive Statistics – means of organizing and summarizing observations Types of data Data presentation and numerical summary measures

4 4 Types of data Nominal Data Ordinal Data Rank Data Discrete Data Continuous Data

5 5 Types of data: Nominal Data. Nominal data values fall into unordered categories or classes. Example: 1 = male, 0 = female.

6 6 Types of data: Ordinal Data. Observations with order among categories are referred to as ordinal. Example: 1 = Mild, 2 = Moderate, 3 = Severe.

7 7

8 8 Example: Death of Manatees in Florida (Florida Fish and Wildlife Conservation Commission). Nominal categories
  Cause                   1999   1998
  Floodgates/Canal Lock     15      9
  Human Related              8      6
  Natural                   43     21
  Perinatal                 52     53
  Watercraft                82     66
  Undetermined              69     76
  Total                    263    231

9 9 Example: Death of Manatees in Florida (Florida Fish and Wildlife Conservation Commission). Ranked data
  Cause                   1999   1998   Rank
  Floodgates/Canal Lock     15      9      4
  Human Related              8      6      5
  Natural                   43     21      3
  Perinatal                 52     53      2
  Watercraft                82     66      1
  Undetermined              69     76
  Total                    263    231
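The ranks on this slide appear to follow the 1999 counts, with rank 1 for the most frequent cause and no rank for "Undetermined". A minimal Python sketch of that ranking, under that assumption (the variable names are illustrative, not from the lecture):

```python
# Manatee deaths in Florida, 1999, by cause (counts from the slide).
# "Undetermined" is left out because the slide assigns it no rank.
deaths_1999 = {
    "Floodgates/Canal Lock": 15,
    "Human Related": 8,
    "Natural": 43,
    "Perinatal": 52,
    "Watercraft": 82,
}

# Sort causes from most to least deaths; rank 1 = most frequent cause.
ordered = sorted(deaths_1999, key=deaths_1999.get, reverse=True)
ranks = {cause: position for position, cause in enumerate(ordered, start=1)}

for cause, count in deaths_1999.items():
    print(f"{cause:<22} {count:>3}  rank {ranks[cause]}")
```

Ranking keeps only the ordering of the counts; the size of the gap between, say, Watercraft and Perinatal is lost.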

10 10 Types of data: Discrete Data. Both order and magnitude are important. Data consist of a restricted set of values, e.g. data on number of children per subject:
  Subject   Number of children
  1         2
  2         3
  3         1
  4         2
  5         4

11 11 Types of data: Continuous Data. Data represent measurable quantities and are not restricted to taking on specific values. Examples: US adult heights; US adult individual cholesterol measurements.

12 12 Outline Descriptive Statistics – means of organizing and summarizing observations Types of data Data presentation and numerical summary measures

13 13 Data Presentation. Nominal/Ordinal data: frequency (relative frequency) tables, bar charts. Discrete/Continuous data: histogram (frequency polygon), one-way scatter plot. Continuous data: box plot, two-way scatter plot, line graph.

14 14 Frequency Table. Example: Serum cholesterol level of men aged 25-34 years.
  Cholesterol Level (mg/100 ml)   Number of Men
  80-119                                     13
  120-159                                   150
  160-199                                   442
  200-239                                   299
  240-279                                   115
  280-319                                    34
  320-359                                     9
  360-399                                     5
  Total                                   1,067

15 15 Frequency Table. Example: Serum cholesterol level of men aged 25-34 years.
  Cholesterol Level (mg/100 ml)   Number of Men   Relative Frequency (%)
  80-119                                     13                      1.2
  120-159                                   150                     14.1
  160-199                                   442                     41.4
  200-239                                   299                     28.0
  240-279                                   115                     10.8
  280-319                                    34                      3.2
  320-359                                     9                      0.8
  360-399                                     5                      0.5
  Total                                   1,067                    100.0
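The relative frequency column is each count divided by the total, expressed as a percentage. A short Python sketch that recomputes the column from the counts on this slide (the variable names are illustrative):

```python
# Number of men aged 25-34 in each serum cholesterol interval (mg/100 ml).
counts = {
    "80-119": 13, "120-159": 150, "160-199": 442, "200-239": 299,
    "240-279": 115, "280-319": 34, "320-359": 9, "360-399": 5,
}

total = sum(counts.values())

# Relative frequency (%) = 100 * count / total.
rel_freq = {interval: 100 * n / total for interval, n in counts.items()}

for interval, n in counts.items():
    print(f"{interval:<8} {n:>5} {rel_freq[interval]:>6.1f}")
print(f"{'Total':<8} {total:>5} {sum(rel_freq.values()):>6.1f}")
```

Relative frequencies make groups of different sizes comparable, which is what later slides rely on when two age groups are plotted together.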

16 16 Bar Chart. Example: car defects in three factories. Label axes; leave space between bars. Source: http://www.ncsu.edu/labwrite/res/gh/gh-bargraph.html#horizbar

17 17 Data Presentation. Nominal/Ordinal data: frequency (relative frequency) tables, bar charts. Discrete/Continuous data: histogram (frequency polygon). Continuous data: box plot, two-way scatter plot, line graph.

18 18 Histogram Example

19 19 Histogram: Key points. The number of bins depends on the range of the data. Equal bin widths are recommended. When the data demand unequal bin widths, take care to plot area proportional to relative frequency.

20 20 Histogram: Key points. A histogram represents percentages by areas. Density scale (y-axis): the height of each block (bin) equals the percentage in that block divided by the bin width. Total area of the histogram = 100%. When bin widths are equal, it is common for the histogram to show just the counts in each bin. Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf
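The density-scale rule can be checked numerically. The Python sketch below uses made-up bin edges and percentages (they are not from the lecture) and a hypothetical helper, density_heights, to show that dividing each percentage by its bin width gives bars whose areas sum to 100%:

```python
def density_heights(percents, edges):
    """Return the density-scale height (% per unit) of each histogram bin."""
    widths = [right - left for left, right in zip(edges, edges[1:])]
    return [p / w for p, w in zip(percents, widths)]

# Hypothetical example: three bins of unequal width.
edges = [0, 10, 20, 50]      # bin boundaries
percents = [20, 50, 30]      # percentage of observations in each bin

heights = density_heights(percents, edges)
widths = [right - left for left, right in zip(edges, edges[1:])]

# Area of each bar = height * width = percentage in that bin.
total_area = sum(h * w for h, w in zip(heights, widths))

print(heights)
print(total_area)
```

Plotting the raw percentages instead would make the widest bin look far more populated than it is, which is why area rather than height carries the percentage.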

21 21 Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf Histogram - example

22 22 Percent Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf Histogram - example

23 23 Histogram Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

24 24 Histogram Constructing a 100% area histogram Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

25 25 Histogram Constructing a 100% area histogram Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

26 26 Histogram: Constructing a 100% area histogram (figure; y-axis: density) Source: http://www.stat.berkeley.edu/users/rice/Stat2/Chapt3.pdf

27 27 Frequency Polygon - Example: Serum cholesterol level of men (1976-1980 survey)
  Cholesterol Level (mg/100 ml)   Relative Frequency 25-34 yrs (%)   Relative Frequency 55-64 yrs (%)
  80-119                                                       1.2                                0.4
  120-159                                                     14.1                                3.9
  160-199                                                     41.4                               21.6
  200-239                                                     28.0                               37.3
  240-279                                                     10.8                               22.9
  280-319                                                      3.2                               10.4
  320-359                                                      0.8                                2.9
  360-399                                                      0.5                                0.6
  Total                                                      100.0                              100.0

28 28 Frequency Polygon - Example

29 29 Frequency Polygon - Example: Serum cholesterol level of men aged 25-34 years.
  Cholesterol Level (mg/100 ml)   Relative Frequency (%)   Cumulative Relative Frequency (%)
  80-119                                             1.2                                 1.2
  120-159                                           14.1                                15.3
  160-199                                           41.4                                56.7
  200-239                                           28.0                                84.7
  240-279                                           10.8                                95.5
  280-319                                            3.2                                98.7
  320-359                                            0.8                                99.5
  360-399                                            0.5                               100.0
  Total                                            100.0
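The cumulative column is a running sum of the relative frequencies. A brief Python sketch using the percentages from this slide (the variable names are illustrative):

```python
from itertools import accumulate

intervals = ["80-119", "120-159", "160-199", "200-239",
             "240-279", "280-319", "320-359", "360-399"]
rel_freq = [1.2, 14.1, 41.4, 28.0, 10.8, 3.2, 0.8, 0.5]

# Running total of the percentages, rounded to one decimal to hide
# floating-point noise.
cumulative = [round(c, 1) for c in accumulate(rel_freq)]

# First interval at which the cumulative percentage reaches 50%:
# the interval that contains the median.
median_interval = next(i for i, c in zip(intervals, cumulative) if c >= 50)

for interval, c in zip(intervals, cumulative):
    print(f"{interval:<8} {c:>6.1f}")
print("Median falls in:", median_interval)
```

Reading percentiles off a cumulative plot works the same way: find where the curve crosses the percentage of interest.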

30 30 Frequency Polygon - Example

31 31 Frequency Polygon - Example

32 32 Data Presentation. Nominal/Ordinal data: frequency (relative frequency) tables, bar charts. Discrete/Continuous data: histogram (frequency polygon). Continuous data: box plot, two-way scatter plot, line graph.

33 33 Example - Dyslipidemia in HIV Cohort Histogram reveals an asymmetric, skewed distribution

34 34 Example - Dyslipidemia in HIV Cohort Natural log transformation of the data results in a more symmetric distribution
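The effect of a log transformation on skewness can be illustrated with simulated data. The cohort measurements are not reproduced in the slides, so the Python sketch below draws right-skewed values from a lognormal distribution; the parameters (5.0, 0.6), the seed, and the sample size are arbitrary assumptions, and skewness is a helper written for this example:

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness; near 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)

random.seed(540)  # fixed seed so the simulation is repeatable

# Simulated right-skewed measurements (NOT the HIV cohort data).
raw = [random.lognormvariate(5.0, 0.6) for _ in range(2000)]

# Natural log transformation of every measurement.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)

print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

The raw values have a clearly positive skewness, while the logged values are close to symmetric, which mirrors the change in histogram shape described on the slides.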

35 35 Box plot: Dyslipidemia in HIV Cohort (natural log transformed triglyceride measurements). Figure labels: 25th percentile, 50th percentile, 75th percentile, LB, UB, outliers. UB (LB) = most extreme data point that is within 1.5 times the box width (IQR) of the 75th (25th) percentile.
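The box plot quantities can be computed directly from the definition on this slide. The Python sketch below uses a small made-up data set (not the cohort data) and the standard-library statistics.quantiles with the "inclusive" method; software packages use slightly different quartile conventions, so values may differ a little from SAS output:

```python
import statistics

# Hypothetical measurements, chosen so that one point is an outlier.
data = [2, 4, 4, 5, 6, 7, 8, 9, 20]

# 25th, 50th and 75th percentiles.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # box width

# Fences: 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# LB / UB: most extreme data points still inside the fences.
lb = min(x for x in data if x >= lower_fence)
ub = max(x for x in data if x <= upper_fence)

# Points beyond the fences are drawn individually as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr)
print(lb, ub, outliers)
```

Note that LB and UB are actual data points, not the fences themselves, so the whiskers never extend past the observed data.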

36 36 Box plot Dyslipidemia in HIV Cohort

37 37 2 way scatter plot Dyslipidemia in HIV Cohort Reveals relationship between 2 continuous variables

38 38 Summary. Data types: nominal, ordinal, discrete, continuous. Data presentation (nominal/ordinal data): tables (frequency, relative frequency), bar charts. Data presentation (discrete/continuous): histogram (frequency polygon). Data presentation (continuous): box plot, shapes of distributions, two-way scatter plot.

39 39 In-Class Example. Distance willing to travel to a household hazardous waste site:
  Distance       Freq
  < 1 mile         75
  >1-2 miles       90
  >2-5 miles       45
  >5-10 miles      90
  Total           300
Histogram, Polygon, Cum % Dist.

40 40 In-Class Example. Distance willing to travel to a household hazardous waste site:
  Distance       Freq     %   %/mile
  < 1 mile         75    25       25
  >1-2 miles       90    30       30
  >2-5 miles       45    15        5
  >5-10 miles      90    30        6
  Total           300
Histogram, Polygon, Cum % Dist.
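The % and %/mile columns, along with the cumulative percentages needed for the third plot, can be reproduced as follows. This Python sketch assumes the first interval runs from 0 to 1 mile, so the bin edges are 0, 1, 2, 5 and 10 miles (the variable names are illustrative):

```python
from itertools import accumulate

# Bin edges in miles; "< 1 mile" is assumed to start at 0.
edges = [0, 1, 2, 5, 10]
freq = [75, 90, 45, 90]

n = sum(freq)
widths = [right - left for left, right in zip(edges, edges[1:])]

# Percentage of respondents in each distance interval.
percent = [100 * f / n for f in freq]

# Density scale: percentage per mile = percentage / bin width.
density = [p / w for p, w in zip(percent, widths)]

# Cumulative percentage at the right-hand edge of each interval.
cum_percent = list(accumulate(percent))

# Total area under the density-scale histogram should be 100%.
total_area = sum(d * w for d, w in zip(density, widths))

for w, f, p, d, c in zip(widths, freq, percent, density, cum_percent):
    print(f"width {w} mi  freq {f:>3}  {p:>5.1f}%  {d:>5.1f} %/mile  cum {c:>5.1f}%")
```

Although the last interval holds as many respondents as the second, its bar is only one fifth as tall on the density scale because it is five times as wide.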

41 41 Histogram of Travel Distance (miles) for n=300 (figure; y-axis: Density; x-axis: Distance (Miles), 0 to 10)

42 42 Polygon of Travel Distance (miles) for n=300 (figure; y-axis: Density; x-axis: Distance (Miles), 0 to 10)

43 43 Cumulative % of Travel Distance (miles) for n=300 (figure; y-axis: Cum. Percent, 0 to 100; x-axis: Distance (Miles), 0 to 10)

