
Introduction and descriptive statistics 30th August 2006 Tron Anders Moger.


1 Introduction and descriptive statistics 30th August 2006 Tron Anders Moger

2 New England Journal of Medicine, Editorial, Jan. 6, 2000, p. 42-49 The eleven most important developments in medicine in the past millennium –Elucidation of human anatomy and physiology –Discovery of cells and their substructures –Elucidation of the chemistry of life –Application of statistics to medicine –Development of anesthesia –Discovery of the relation of microbes to disease –Elucidation of inheritance and genetics –Knowledge of the immune system –Development of body imaging –Discovery of antimicrobial agents –Development of molecular pharmacotherapy

3 Introduction A lot of knowledge appears through numbers and quantitative data. Problems in interpreting statistical results are often underestimated. It is important to learn “numerical literacy”: the ability to understand numbers and quantitative relationships.

4 Number of births in former East Germany

5 Mortality in Tanzania and Norway

6 Research and numbers Numbers often appear in medical research. The numbers are often uncertain; they have variability. They must be organized in order to interpret them. We wish to generalize the results to the general population.

7 Statistical data Appear from: Numerical measurements with an instrument on a continuous scale (Continuous data). Examples: – Fever: 39.6 (Unproblematic) – IQ: 116 (Problematic) Categorization (categorical data). Examples: –Man / woman (Unproblematic) –Depressed / Not depressed (Problematic)

8 Reliability: Precision of data? How much will they differ if the measurements are repeated? Validity: Do we capture what we are really interested in? Is the measurement relevant? Variability in the data

9 Reliability of lung function measurements 6 repeated measurements on 12 students.

10 Reliability of questionnaire/interview Alcohol use (men 31-50 years): –Mean number of times alcohol users say that they have felt intoxicated: 1993 (questionnaire): 14.1 times per year 1994 (interview): 7.3 times per year In 1994 they used the word drunk.

11 Reliability of clinical study Sackett et al: Clinical Epidemiology (Little, Brown and Company, 1985). Pictures of the eye of 100 patients are studied by two clinicians to see if there is evidence of retinopathy.

                        Second clinician
                        No      Yes
First clinician   No    46      10
                  Yes   12      32

Observed agreement: (46+32)/100 = 78%
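The observed agreement on this slide can be recomputed from the four cell counts. The sketch below is only an illustration; the dictionary layout and the function name `observed_agreement` are assumptions made here, not part of the original material.

```python
# 2x2 agreement table from the slide.
# Keys are (first clinician's rating, second clinician's rating).
table = {
    ("No", "No"): 46,
    ("No", "Yes"): 10,
    ("Yes", "No"): 12,
    ("Yes", "Yes"): 32,
}

def observed_agreement(counts):
    """Share of patients on whom both clinicians gave the same rating."""
    total = sum(counts.values())
    agree = sum(n for (first, second), n in counts.items() if first == second)
    return agree / total

print(f"Observed agreement: {observed_agreement(table):.0%}")
```

Only the diagonal cells (No/No and Yes/Yes) count as agreement, which is why the numerator is 46 + 32.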

12 Sources of variation in data Laboratory variation Observer variation Instrument variation Measurement variation Biological variation between individuals Day to day variation within the same individual/hospital

13 Generalization Sample: The units, experiments, individuals etc. that are in the study E.g.: –15 patients with migraine –Neurophysiological study on rats Population: The collection of units etc. one wishes the results to apply for –All patients with migraine –All repetitions of the neurophysiological experiment

14 Pairs of terms

Sample                                Population
Histogram                             Probability distribution
Mean                                  Expectation
Proportion                            Risk
Measurements of cholesterol level     Cholesterol level in the population
Weather                               Climate

15 Types of data: Continuous data. Data measured on a continuous scale, e.g. height, weight, age. Can be truly continuous (with decimals) or discrete (integers) Categorical data. Data in categories, e.g. gender, education level, grouped age, hospital department. Can be nominal or ordinal.

16 Data in SPSS (and other statistical software): IMPORTANT: One line in the data file always corresponds to one observation! It is common to have an id variable for each observation. If a measurement is missing, leave the cell empty. To create a new variable in SPSS, choose Data->Insert variable in the Data View window, or write the variable name in Name in the Variable View window.

17 Data coding: For continuous data, enter the value of the variable. For categorical data, define a suitable coding, e.g. 0=male and 1=female, or 0=grammar school, 1=high school and 2=college/university degree. In Variable View, the coding can be defined in Values. In Label you can write further information about the variable.
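Outside SPSS, the same conventions (one line per observation, an id variable, numeric codes for categories, an empty cell for a missing value) can be sketched in a few lines of Python. The three data rows and the helper `decode` are hypothetical and only meant to illustrate the coding scheme described above.

```python
import csv
import io

# Hypothetical data file: one line per observation, with an id variable.
# The empty cell in the last row is a missing education value.
raw = """id,gender,education
1,0,2
2,1,1
3,1,
"""

# Codings taken from the slide.
GENDER = {0: "male", 1: "female"}
EDUCATION = {0: "grammar school", 1: "high school", 2: "college/university degree"}

def decode(cell, labels):
    """Turn a coded cell into its label; an empty cell stays missing (None)."""
    return labels[int(cell)] if cell != "" else None

rows = [
    {
        "id": int(r["id"]),
        "gender": decode(r["gender"], GENDER),
        "education": decode(r["education"], EDUCATION),
    }
    for r in csv.DictReader(io.StringIO(raw))
]

for row in rows:
    print(row)
```

Keeping the missing cell as `None` rather than inventing a code such as 9 or -1 avoids accidentally treating it as a real category.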

18 Descriptive statistics Tables Graphs, plots Measures of central tendency Measures of variability

19 Types of graphs Histogram Box-plot Scatter plot Line plot Bar plot

20 The age of 100 medical students

21 How can you get an overview of these data in SPSS? Explore! Choose Analyze - Descriptive Statistics - Explore. Select the relevant variables by clicking them, and transferring them to Dependent List. Choose Plots, remove the check on “Stem and leaf” and check “Histogram” instead. Click Continue and OK.

22 Histogram: The distribution of age among the students (n=100)

23 Box-plot: The distribution of age among the students

24 Measures of central tendency Mean. The students: 22.2 years. Median: The middle observation when the observations are arranged in increasing order. The students: 22.0 years. The mean is influenced by extreme observations. The median is robust.
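A small illustration of why the median is called robust. The ages below are hypothetical and are not the course data set; the sketch only shows how one extreme observation moves the mean but not the median.

```python
from statistics import mean, median

# Hypothetical ages, not the 100 students from the slides.
ages = [19, 20, 21, 22, 22, 23, 24]
ages_with_outlier = ages + [45]  # add one much older student

for label, data in [("without outlier", ages), ("with outlier", ages_with_outlier)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

With these numbers the mean rises from about 21.6 to 24.5 years, while the median stays at 22.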

25 Measures of variability Standard deviation. The students: 3.06 years. Coefficient of variation: $(s/\bar{x}) \times 100\%$, where $s$ is the standard deviation and $\bar{x}$ is the mean. The students: 13.8%. Quartiles: Arrange the data in increasing order. The 25% quartile is at the observation where 25% of the observations have lower values, and 75% of the observations have higher values. (In SPSS: Check Percentiles in the Statistics menu in Explore.) The students: 25% quartile: 20.0 years; 75% quartile: 23.0 years.
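The coefficient of variation reported for the students can be checked directly from the standard deviation (3.06) and the mean (22.2). The sketch below does that check and then computes the same measures for a small hypothetical sample; the function name `coefficient_of_variation` and the sample ages are assumptions for illustration.

```python
from statistics import mean, quantiles, stdev

def coefficient_of_variation(sd, avg):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / avg * 100

# Check against the values reported on the slide (s = 3.06, mean = 22.2).
print(f"CV from slide values: {coefficient_of_variation(3.06, 22.2):.1f}%")

# Hypothetical ages, not the course data set.
ages = [19, 20, 21, 22, 22, 23, 24, 30]
s = stdev(ages)                    # sample standard deviation (n - 1 denominator)
q1, q2, q3 = quantiles(ages, n=4)  # 25%, 50% and 75% cut points

print(f"sd = {s:.2f}, CV = {coefficient_of_variation(s, mean(ages)):.1f}%")
print(f"quartiles: {q1}, {q2}, {q3}")
```

Statistical packages use slightly different interpolation rules for quartiles, so the values from `statistics.quantiles` need not match SPSS output exactly on the same data.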

26 How to get separate plots for each category of a categorical variable, e.g. gender Click Analyze - Descriptive Statistics - Explore. Move the continuous variable to Dependent List. Move gender to Factor List That’s it!

27 Separate boxplots for each gender

28 Relationship between two continuous variables: Scatter plot! Choose Graphs - Scatter - Define. Choose a variable for the Y-axis and one for the X-axis. Separate markers for separate groups are achieved by transferring the categorical variable to Set Markers by. You can also include a regression line by choosing “Fit line at total”, or a line for each category by choosing “Fit line at subgroups”.

29 Scatter plot, weight versus height for the students

30 Scatter plot, weight versus height, with regression lines. Regression is covered in much more detail later.

31 Correlation coefficient A numerical measure of the relationship between two continuous variables x and y. Ranges between -1 and 1. Values close to 0: No linear relationship. Values close to 1 or -1: Almost linear relationship.
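Assuming the slide refers to the usual Pearson correlation coefficient (the slide does not name it), the calculation can be sketched as follows. The heights and weights are hypothetical, and `pearson_r` is a name chosen here for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled to lie in [-1, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), not the students' data.
heights = [160, 165, 170, 175, 180, 185]
weights = [55, 62, 64, 72, 75, 83]

print(f"r = {pearson_r(heights, weights):.2f}")
```

Points lying exactly on an increasing straight line give r = 1, and points on a decreasing line give r = -1. A value near 0 only rules out a linear relationship, not a curved one.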

32 Descriptive statistics for categorical variables It is not very useful to calculate the mean for e.g. educational level. We would like to find the percentages within each category in the study. Choose Analyze->Descriptive Statistics->Frequencies and move the variable to Variable(s).

33 Frequency table The last column shows the cumulative distribution, which always reaches 100% in the final row.
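A frequency table with a cumulative column can be built by hand to see where the 100% comes from. This is a minimal sketch on hypothetical education codes (using the coding from the data coding slide); `frequency_table` is an illustrative name, not an SPSS function.

```python
from collections import Counter

EDUCATION = {0: "grammar school", 1: "high school", 2: "college/university degree"}

# Hypothetical coded observations, one per person.
education = [0, 1, 1, 2, 2, 2, 1, 0, 2, 1]

def frequency_table(values):
    """Return (code, count, percent, cumulative percent) per category, in code order."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for code in sorted(counts):
        percent = counts[code] / n * 100
        cumulative += percent  # running total of the percentages so far
        rows.append((code, counts[code], percent, cumulative))
    return rows

for code, count, percent, cumulative in frequency_table(education):
    print(f"{EDUCATION[code]:<26} {count:>3} {percent:>6.1f} {cumulative:>6.1f}")
```

The cumulative column is a running sum of the percentages, so its last entry is 100% whenever every observation falls in some category.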

34 Simple bar plot

35 Relationships between categorical variables Choose Analyze->Descriptive Statistics ->Crosstabs Move one variable to Rows, and another to Columns Click Cells, and check relevant percentages (Rows, Columns or Total)
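What the Rows percentage option reports can be sketched as follows: each cell count is divided by its row total. The observations are hypothetical and the function name `crosstab_row_percent` is an assumption for illustration.

```python
from collections import Counter

# Hypothetical (race, smoker) observations, one pair per person.
observations = [
    ("white", "no"), ("white", "yes"), ("white", "no"), ("white", "yes"),
    ("black", "no"), ("black", "no"), ("black", "yes"),
    ("other", "no"), ("other", "no"), ("other", "no"),
]

def crosstab_row_percent(pairs):
    """Cross-tabulate pairs and express each cell as a percentage of its row total."""
    cell = Counter(pairs)
    row_total = Counter(row for row, _ in pairs)
    columns = sorted({col for _, col in pairs})
    return {
        row: {col: cell[(row, col)] / row_total[row] * 100 for col in columns}
        for row in sorted(row_total)
    }

table = crosstab_row_percent(observations)
for row, percents in table.items():
    print(row, {col: f"{p:.1f}%" for col, p in percents.items()})
```

Row percentages answer "what share of each row group falls in each column category". Column or total percentages would divide by the column totals or the grand total instead.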

36 Crosstable: Relationship between race and smoking

37 Bar plot: Relationship between race and smoking

38 Line plot for ordinal categorical variables (time-series plot)

39 Conclusion Tons of different options on how to present results You will (hopefully) learn to understand which option is most relevant for each problem during this course

