# BIOSTATISTICS Asst. Prof. NIRUWAN TURNBULL, PhD Faculty of Public Health Mahasarakham University 1.

## Presentation on theme: "BIOSTATISTICS Asst. Prof. NIRUWAN TURNBULL, PhD Faculty of Public Health Mahasarakham University 1."— Presentation transcript:

BIOSTATISTICS Asst. Prof. NIRUWAN TURNBULL, PhD Faculty of Public Health Mahasarakham University 1

In God we trust. All others must bring data. 3

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” H.G. Wells 4

5

 Statistical ideas can be intimidating and difficult  Thus:  Statistical results are often “skipped-over” when reading scientific literature  Data is often mis-interpreted 6

“On average, my class is doing well. Half of my students think that 2+2=3, the other half thinks that 2+2=5.” 7

 A Bar Chart is a map of the locations of the nearest taverns  A p-value is the result of a urinalysis  Martingale residuals are the droppings of a rare bird  A t-test is a blinded taste test between black and green tea 8

 Spends Saturday nights in the library  Favorite Movie: Revenge of the Nerds  Doesn’t play golf  A necessary evil  SMART! 9

“A statistician is a person that is good with numbers but that lacks the personality to become an accountant.” 10

 The science of collecting, monitoring, analyzing, summarizing, and interpreting data  This includes study design 11

 Statistics applied to biological (life) problems, including:  Public health  Medicine  Ecological and environmental  Much more statistics than biology, however biostatisticians must learn the biology as well 12

 Identify and develop treatments for disease and estimate their effects.  Identify risk factors for diseases.  Design, monitor, analyze, interpret, and report results of clinical studies.  Develop statistical methodologies to address questions arising from medical/public health data. 13

 Combines rigors of mathematics with uncertainties of the real world.  Can make contribution to advancement of science, statistics, medicine, and public health.  Can study diseases/health problems in which you may have an interest (cancers, HIV, reproductive health, …). 14

 Much of life is composed of a systematic component (i.e., signal) and a random component (i.e., error or noise)  Example:  Smoking is associated with lung cancer.  Yet not everyone that smokes, gets lung cancer, and not everyone that gets lung cancer, smokes  Yet we know that there is an association (a systematic component) 15

 Our challenge  Identify the systematic component (separate it from the random component), estimate it, and perhaps make inferences with it 16

 Population  A group of individuals that we would like to know something about  Parameter  A characteristic of the population in which we have a particular interest  Often denoted with Greek letters ( μ, σ, ρ )  Examples:  The proportion of the population that would respond to a certain drug  The association between a risk factor and a disease in a population 17

 คือ ค่าต่างๆ ที่รวบรวมมาจากประชากรหรือ คำนวณได้จากประชากร ใช้อักษรกรีกเป็น สัญลักษณ์ ได้แก่   แทนค่า ค่าเฉลี่ยเลขคณิต   แทนค่า ส่วนเบี่ยงเบนมาตรฐาน   2 แทนค่า ความแปรปรวน   แทนค่า สัมประสิทธิ์สหสัมพันธ์

 คือค่าต่างๆ ที่รวบรวมมาจากกลุ่มตัวอย่างหรือ คำนวณได้จากกลุ่มตัวอย่าง ใช้ตัว ภาษาอังกฤษเป็นสัญลักษณ์ ได้แก่  แทนค่า ค่าเฉลี่ยเลขคณิต  s แทนค่า ส่วนเบี่ยงเบนมาตรฐาน  s 2 แทนค่า ความแปรปรวน  r แทนค่า สัมประสิทธิ์สหสัมพันธ์

 Sample  A subset of a population (hopefully representative)  Statistic  A characteristic of the sample  Examples:  The observed proportion of the sample that responds to treatment  The observed association between a risk factor and a disease in this sample 20

 Studying populations is too expensive and time- consuming, and thus impractical  If a sample is representative of the population, then by observing the sample we can learn something about the population  And thus by looking at the characteristics of the sample (statistics), we may learn something about the characteristics of the population (parameters) 21

Populations and Samples 22 Sample / Statistics x, s, s 2 Population Parameters μ, σ, σ 2

 Two steps  Descriptive Statistics  Describe the sample  Inferential Statistics  Make inferences about the population using what is observed in the sample  Primarily performed in two ways:  Hypothesis testing  Estimation 23

 Samples are random  If we had chosen a different sample, then we would obtain different statistics (sampling variation or random variation)  However, note that we are trying to estimate the same (constant) population parameters 24

 Describe the Sample  Begin one variable at a time  Describe important variables in your analyses (e.g., endpoints, demographics, confounders, etc.) 25

 Several types of data  Nominal  Ordinal  Discrete  Continuous  Time-to-event with censoring  The type of data influences the analysis methods to be employed 26

ScaleCharacteristic Question Examples Nominal Is A different than B?Marital status Eye color Gender Religious affiliation Race Ordinal Is A bigger than B?Stage of disease Severity of pain Level of satisfaction Interval By how many units do A and B differ? Temperature SAT score Ratio How many times bigger than B is A? Distance Length Time until death Weight Height 27

 Mutually exclusive unordered categories  Examples  Sex (male, female)  Race/ethnicity (white, black, latino, asian, native american, etc.)  Site  Can summarize in:  Tables – using counts and percentages  Bar chart/graph 28

 Ordered Categories  Examples  Adverse events  Mild, moderate, severe, life-threatening, death  Income  Low, medium, high 29

 Often only integer numbers are possible  If there are many different discrete values, then discrete data is often treated as continuous  Examples: CD4 count, HIV viral load  If there are very few discrete values, then discrete data is often treated as ordinal 30

 Any value on the continuum is possible (even fractions or decimals)  Examples:  Height  Weight  Many “discrete” variables are often treated as continuous  Examples: CD4 count, viral load 31

 Time to an event (continuous variable)  The event does not have to be survival  Concept of “Censoring”  If we follow a person until the event, then the survival time is clear  If we follow someone for a length of time but the event does not occur, the the time is censored (but we still have partial information; namely that the event did not occur during the follow up period)  Examples: time to progression (cancer), time to response, time to relapse, time to death 32

 Think of data as a rectangular matrix of rows and columns  Simplest structure  Rows represent the “experimental unit” (e.g., person)  Each row is an independent observation  Columns represent “variables” measured on the experimental unit  More complex structures  Multiple rows per person (e.g., multiple timepoints) 33

34

 It is ALWAYS a good idea to summarize your data (at least for important variables)  You become familiar with the data and the characteristics of the sample that you are studying  You can also identify problems with data collection or errors in the data (data management issues)  Range checks for illogical values 35

 Some visual ways to summarize data (one variable at a time):  Tables  Graphs  Bar charts  Histograms  Box plots 36

 Summarizes a variable with counts and percentages  The variable is categorical (e.g., nominal or ordinal) 37

Cause of Death FrequencyPercent Motor Vehicle48 Drowning14 House Fire12 Homicide77 Other19 Total100 38

 Note that you can take a continuous variable and create categories with it  How do you create categories for a continuous variable?  Choose cutoffs that are biologically meaningful  Natural breaks in the data  Precedent from prior research 39

 How to choose the categories?  Talk to physician about risk categories  May look at National Cholesterol Education Program (NCEP) guidelines and categories 40

41

 Bar Graphs  Nominal data  No order to horizontal axis  Histograms  Continuous or ordinal data on horizontal axis  Box Plots  Continuous data 42

 The width of the plot has no meaning  25% of the data <13  25% of the data 13-16  25% of the data 16-18  25% of the data >18 43

1. http://www.medpagetoday.com/lib/content/Medpage- Guide-to-Biostatistics.pdf http://www.medpagetoday.com/lib/content/Medpage- Guide-to-Biostatistics.pdf 2. http://faculty.franklin.uga.edu/dhall/sites/faculty.frank lin.uga.edu.dhall/files/lec1.pdf http://faculty.franklin.uga.edu/dhall/sites/faculty.frank lin.uga.edu.dhall/files/lec1.pdf 3. http://www.cartercenter.org/resources/pdfs/health/ep hti/library/lecture_notes/env_health_science_student s/ln_biostat_hss_final.pdf http://www.cartercenter.org/resources/pdfs/health/ep hti/library/lecture_notes/env_health_science_student s/ln_biostat_hss_final.pdf 4. http://www.meduniwien.ac.at/msi/biometrie/Lehre/p hdpropedeutics/Skripten/Medical%20Biostatistics%20 I%20version%202009-10.pdf http://www.meduniwien.ac.at/msi/biometrie/Lehre/p hdpropedeutics/Skripten/Medical%20Biostatistics%20 I%20version%202009-10.pdf 5. http://www.hstathome.com/tjziyuan/biostatistical%20 method.pdf http://www.hstathome.com/tjziyuan/biostatistical%20 method.pdf 44

- Read through Biostatistics book from http://www.hstathome.com/tjziyuan/biostatistical%20method.pdf http://www.hstathome.com/tjziyuan/biostatistical%20method.pdf, then - We need four groups to solve out these problems, then present it, what are they talking about? 1. Statistical Methods for Assessing Biomarkers (P.81-109) 2. Power and Sample Size Considerations in Molecular Biology (P.111-130) 3. Multiple Tests for Genetic Effects in Association Studies (P. 143-168) 4. Statistical Considerations in Assessing Molecular Markers for Cancer Prognosis and Treatment Efficacy (P. 169-189) 45

Download ppt "BIOSTATISTICS Asst. Prof. NIRUWAN TURNBULL, PhD Faculty of Public Health Mahasarakham University 1."

Similar presentations