
Basic statistical methods in psychiatric research


1 Basic statistical methods in psychiatric research
Abiodun O. Adewuya MBChB, FWACP, FMCPsych. Mental Health in Africa: Time for Action. Addis Ababa; 8.30 am, 25th April 2006

2 Warning!!!! I AM NO STATISTICIAN!!!

3 Day 1
Definitions and language of statistics
The meaning and classification of variables
Introduction to descriptive and inferential statistics
Univariate analysis: the distribution, the central tendency, the dispersion
Basic bivariate analysis: chi-square, t-test, ANOVA, correlation coefficient

4 Day 2
Regression analysis
Odds ratio
Data reduction and factor analysis
Screening and diagnostic tests
Reliability, ROC curve
Meta-analysis
Sample size calculations

5 Statistics The science of collecting, summarizing, presenting and interpreting data, and of using them to estimate the magnitude of associations and test hypotheses.

6 Language of statistics
Population: everyone/everything you wish to study
Sample: a subset of the population
Variable: a characteristic of each member of the population
Census: a study of the entire population
Parameter: a number describing a characteristic of a population
Statistic: a number describing a characteristic of a sample
Sampling error: the difference between the characteristics of the population and those of the sample drawn from it

7 Sampling error
The size of the sampling error is determined by:
Sample size – the larger the sample, the smaller the error
Variation – how different members of the population are from one another with regard to the variables being studied; the higher the variability, the larger the sampling error
Non-sampling errors include:
Respondents lying
Measurement errors
Errors due to non-response

8 Variables The raw data of a study consist of observations made on individuals. These may be people, but may also be red blood cells, urine specimens, rats, hospitals, etc. Any aspect of an individual that is measured (like blood pressure) or recorded (like age or sex) is called a variable. There may be only one variable in a study, or there may be many.

9 Classification of variables
Categorical:
Nominal – binary or multiple
Ordinal (ordered)
Numerical:
Discrete
Continuous

10 Nominal variables Allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories. For example, all we can say is that 2 individuals are different in terms of variable A (e.g., they are of different race), but we cannot say which one "has more" of the quality represented by the variable. Typical examples of nominal variables are gender, race, color, city, etc.

11 Ordinal variables Allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but they still do not allow us to say "how much more." A typical example of an ordinal variable is the socioeconomic status of families. For example, we know that upper-middle is higher than middle, but we cannot say that it is, for example, 18% higher. Likert scales are ordinal scales used to collect information on attitudes, degrees of agreement, frequency of use, etc.

12 Numerical variables
Discrete: numerical but limited to a number of discrete values, usually whole numbers, e.g. episodes of diarrhoea, relapses in schizophrenia
Continuous: a measurement on a continuous scale, e.g. temperature, weight, height

13 Derived variables
Often we derive variables for our analysis from those originally recorded:
Categorized or calculated from recorded variables, e.g. age from "Today's date – DOB"; age groups; BMI from weight and height
Based on a threshold or cut-off, e.g. low birth weight (Yes if <2.5 kg, No if ≥2.5 kg)
Transformed variables, e.g. scoring the WHOQOL-BREF
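As a minimal illustration, here is how such derivations might look in Python (the recorded values are made up; the cut-off is the <2.5 kg rule above):

```python
# A minimal sketch of deriving variables from recorded ones.
from datetime import date

dob = date(1980, 4, 25)                        # recorded date of birth
age_years = (date.today() - dob).days // 365   # derived: age from "today - DOB"

weight_kg, height_m = 68.0, 1.72
bmi = weight_kg / height_m ** 2                # derived: BMI = weight / height^2

birth_weight_kg = 2.3
low_birth_weight = birth_weight_kg < 2.5       # derived via a cut-off: True ("Yes")

print(age_years, round(bmi, 1), low_birth_weight)
```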

14 Outcome & exposure variables
Outcome: the variable that is the focus of attention, whose variation we are seeking to understand
Exposure: we are mainly interested in identifying factors or exposures that may influence the size or the occurrence of the outcome variable
Outcome = dependent variable
Exposure = independent variable

15 Descriptive & inferential statistics
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. Inferential statistics are used to make deductions or draw conclusions about a population from sample data.

16 Univariate Analysis Univariate analysis involves the examination, across cases, of one variable at a time. There are three major characteristics of a single variable that we tend to look at: the distribution, the central tendency, and the dispersion.

17 The distribution A summary of the frequency of individual values, or ranges of values, for a variable; frequencies can be reported as counts or percentages. One of the most common ways to describe a single variable is with a frequency distribution. Frequency distributions can be depicted in two ways: as a table or as a graph (bar chart, pie chart, histogram, etc.).

18 The normal distribution
Called the Gaussian distribution after its discoverer, Gauss. The frequency distribution is symmetrical about the mean and bell-shaped; the bell is narrow for a small SD and wide for a large SD. Normally distributed variables include blood pressure, temperature, haemoglobin level, etc. Distributions that are not normal include income, etc.

19 The central tendency The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency: the mean, the median, and the mode.

20 The mean The mean or average is probably the most commonly used method of describing central tendency. To compute the mean, you add up all the values and divide by the number of values. For example, the mean or average quiz score is determined by summing all the scores and dividing by the number of students taking the exam. Consider the test score values: 15, 20, 21, 20, 36, 15, 25, 15. The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

21 The median The median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order and then locate the score in the center of the sample. For example, if there are 500 scores in the list, the median lies halfway between scores #250 and #251. If we order the 8 scores shown above, we get: 15, 15, 15, 20, 20, 21, 25, 36. There are 8 scores, so scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate (average them) to determine the median.

22 The mode The mode is the most frequently occurring value in the set of scores. To determine the mode, you might again order the scores as shown above and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode. In some distributions there is more than one modal value; for instance, in a bimodal distribution there are two values that occur most frequently. Notice that for the same set of 8 scores we got three different values: 20.875, 20, and 15 for the mean, median, and mode respectively. If the distribution is truly normal (i.e., bell-shaped), the mean, median, and mode are all equal to each other.
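These three estimates are easy to verify with Python's built-in statistics module, using the example scores above:

```python
# Central tendency of the example scores.
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(statistics.mean(scores))    # 20.875
print(statistics.median(scores))  # 20.0 (average of the two middle scores)
print(statistics.mode(scores))    # 15 (occurs three times)
```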

23 Dispersion Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion: the range and the standard deviation.

24 The range The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 − 15 = 21.

25 The standard deviation
The standard deviation is a more accurate and detailed estimate of dispersion, because an outlier can greatly exaggerate the range (as is true in this example, where the single outlier value of 36 stands apart from the rest of the values). The standard deviation shows the relation that a set of scores has to the mean of the sample. It is the square root of the sum of the squared deviations from the mean, divided by the number of scores minus one.
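Both dispersion measures for the same example scores, again with the statistics module (stdev uses the n − 1 denominator just described):

```python
# Dispersion of the example scores: range and sample standard deviation.
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(max(scores) - min(scores))  # range: 36 - 15 = 21
print(statistics.stdev(scores))   # sample SD (n-1 denominator), ~7.08
```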

26 Inferential Statistics
With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think, or to judge the probability that an observed difference between groups is a dependable one rather than one that happened by chance in this study. In short, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.

27 Pearson Chi-square The Pearson Chi-square is the most common test for significance of the relationship between categorical variables. This measure is based on the fact that we can compute the expected frequencies in a two-way table (i.e., frequencies that we would expect if there was no relationship between the variables).

28 The only assumption underlying the use of the Chi-square (other than random selection of the sample) is that the expected frequencies are not very small. The reason for this is that, actually, the Chi-square inherently tests the underlying probabilities in each cell; and when the expected cell frequencies fall, for example, below 5, those probabilities cannot be estimated with sufficient precision

29 The value of the Chi-square and its significance level depend on the overall number of observations and the number of cells in the table. Relatively small deviations of the relative frequencies across cells from the expected pattern will prove significant if the number of observations is large.

30 Yates Correction. The approximation of the Chi-square statistic in small 2 × 2 tables can be improved by reducing the absolute value of the differences between expected and observed frequencies by 0.5 before squaring (Yates' correction). This correction, which makes the estimation more conservative, is usually applied when the table contains only small observed frequencies, so that some expected frequencies become less than 5.
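In Python, scipy runs this test and also reports the expected frequencies, so you can check the assumption above; correction=True applies the Yates correction (the counts here are invented for illustration):

```python
# Pearson chi-square on a 2x2 table; correction=True is the Yates correction.
from scipy.stats import chi2_contingency

table = [[20, 30],   # e.g. exposed: cases, controls
         [40, 10]]   # unexposed: cases, controls
chi2, p, dof, expected = chi2_contingency(table, correction=True)
print(chi2, p)
print(expected)      # check that no expected count falls below 5
```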

31 Fisher Exact Test This test is only available for 2 × 2 tables. It is used when:
the overall total of the table is less than 20, or
the overall total is between 20 and 40 and the smallest of the four expected numbers is less than 5
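scipy provides this as fisher_exact; the small, made-up table below is the kind of case the rules above describe:

```python
# Fisher's exact test for a small 2x2 table.
from scipy.stats import fisher_exact

table = [[3, 7],
         [8, 2]]
odds_ratio, p = fisher_exact(table)
print(odds_ratio, p)
```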

32 One-Sample T Test Tests whether the mean of a single variable differs from a specified constant, e.g. whether the average IQ score for a group of students differs from 100 (the presumed population norm).
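A quick sketch with scipy, using invented IQ scores:

```python
# One-sample t-test: does the group mean differ from 100?
from scipy.stats import ttest_1samp

iq_scores = [95, 102, 110, 98, 105, 99, 112, 101]
t, p = ttest_1samp(iq_scores, popmean=100)
print(t, p)
```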

33 The Paired-Samples T Test
This procedure compares the means of two variables for a single group. It computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0. Normally this is a "BEFORE" and "AFTER" comparison for the same group, e.g. the mean BPRS score before antipsychotic treatment and the mean BPRS score after 2 weeks of haloperidol.
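For example (BPRS scores invented for illustration):

```python
# Paired-samples t-test: before vs after treatment for the same patients.
from scipy.stats import ttest_rel

bprs_before = [52, 48, 60, 55, 47, 58]
bprs_after  = [40, 42, 50, 43, 39, 49]
t, p = ttest_rel(bprs_before, bprs_after)
print(t, p)
```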

34 The Independent-Samples T Test
This procedure compares the means for two groups of cases, e.g. a YES group and a NO group. Ideally, for this test, the subjects should be randomly assigned to the two groups, so that any difference in response is due to the YES/NO difference and not to other factors. E.g. are the BPRS score differences due to atypical (YES) or conventional (NO) antipsychotics?
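A corresponding sketch (the score changes are invented):

```python
# Independent-samples t-test: atypical (YES) vs conventional (NO) groups.
from scipy.stats import ttest_ind

atypical     = [12, 15, 9, 14, 11, 13]   # BPRS improvement, YES group
conventional = [8, 10, 7, 11, 6, 9]      # BPRS improvement, NO group
t, p = ttest_ind(atypical, conventional)
print(t, p)
```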

35 The One-Way ANOVA ANOVA is used to test the hypothesis that several means are equal. The technique is an extension of the two-sample t-test, applied to a quantitative dependent variable grouped by a single factor (independent) variable. In addition to determining that differences exist among the means, you may want to know which means differ.
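scipy's f_oneway runs the omnibus test across any number of groups (data invented):

```python
# One-way ANOVA: are the three group means equal?
from scipy.stats import f_oneway

group_a = [12, 15, 9, 14, 11]
group_b = [8, 10, 7, 11, 6]
group_c = [16, 18, 14, 17, 15]
f, p = f_oneway(group_a, group_b, group_c)
print(f, p)
```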

36 The One-Way ANOVA
There are two types of tests for comparing means:
a priori contrasts – tests set up before running the experiment
post hoc tests – run after the experiment has been conducted
Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ. Post hoc range tests identify homogeneous subsets of means that are not different from each other. Pairwise multiple comparisons test the difference between each pair of means: with equal variances assumed, Bonferroni; with unequal variances, Dunnett (see the sketch below).
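As one simple post hoc approach, here is a pairwise comparison with a Bonferroni correction (each p-value multiplied by the number of comparisons); this is a minimal sketch, not the only valid procedure:

```python
# Post hoc pairwise t-tests with a Bonferroni correction.
from itertools import combinations
from scipy.stats import ttest_ind

groups = {
    "A": [12, 15, 9, 14, 11],
    "B": [8, 10, 7, 11, 6],
    "C": [16, 18, 14, 17, 15],
}
n_comparisons = 3  # number of pairs among three groups
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    t, p = ttest_ind(g1, g2)
    print(f"{name1} vs {name2}: adjusted p = {min(1.0, p * n_comparisons):.4f}")
```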

37 Correlations Correlation is a measure of the relation between two or more variables. The measurement scales used should be at least interval scales. Correlation coefficients can range from −1.00 to +1.00. A value of −1.00 represents a perfect negative correlation, while a value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation. For normally distributed data use Pearson; otherwise use Spearman.
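Both coefficients are one call each in scipy (the paired measurements are invented):

```python
# Pearson (parametric) and Spearman (rank-based) correlation coefficients.
from scipy.stats import pearsonr, spearmanr

x = [1.2, 2.4, 3.1, 4.8, 5.0, 6.3]
y = [2.0, 2.9, 3.5, 5.1, 4.8, 6.8]
print(pearsonr(x, y))   # (r, p): r close to +1 for these data
print(spearmanr(x, y))  # use when the data are not normally distributed
```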

38 The Partial Correlations
This procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. It is very similar to regression.
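A minimal sketch of the idea, assuming a single control variable: regress each variable on the control, then correlate the residuals (the data and the helper name partial_corr are ours, for illustration only):

```python
# Partial correlation of x and y controlling for z, via residuals.
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, z):
    x, y, z = map(np.asarray, (x, y, z))
    zm = np.column_stack([z, np.ones_like(z)])  # control variable + intercept
    res_x = x - zm @ np.linalg.lstsq(zm, x, rcond=None)[0]  # x with z removed
    res_y = y - zm @ np.linalg.lstsq(zm, y, rcond=None)[0]  # y with z removed
    return pearsonr(res_x, res_y)  # p-value is approximate (one df too many)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 2.1, 3.2, 3.9, 5.5, 5.8]
z = [0.5, 1.0, 1.4, 2.1, 2.4, 3.0]
print(partial_corr(x, y, z))
```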

39 Basic inferential statistics
Nominal vs nominal – chi-square
Nominal vs ordinal – chi-square
Ordinal vs ordinal – chi-square

40 Binary nominal (yes/no, male/female) vs continuous – independent-samples t-test
Multinomial / ordinal vs continuous – ANOVA
Continuous vs continuous – correlation coefficient (normally distributed – Pearson; not normally distributed – Spearman)

41 Statistical significance (p-value)
The probability that the observed relationship (e.g., between variables) or difference (e.g., between means) in a sample occurred by pure chance ("luck of the draw"), when in the population from which the sample was drawn no such relationship or difference exists. It tells us something about the degree to which the result is "true" (in the sense of being "representative of the population"). More technically, the p-value is a decreasing index of the reliability of a result: the higher the p-value, the less we can believe that the observed relation between variables in the sample is a reliable indicator of the relation between the respective variables in the population.

42 Specifically, the p-value represents the probability of error involved in accepting our observed result as valid, that is, as "representative of the population." For example, a p-value of .05 (i.e., 1/20) indicates that there is a 5% probability that the relation between the variables found in our sample is a "fluke." In other words, assuming that in the population there was no relation between those variables whatsoever, and we were repeating experiments like ours one after another, we could expect that in approximately one of every 20 replications the relation between the variables in question would be equal to or stronger than in ours.
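The "1 in 20" claim is easy to check by simulation: draw both groups from the same population (so there is no true difference) and count how often a test still reports p < .05 (a sketch using numpy and scipy):

```python
# Simulating the false-positive rate: with no true difference, about 5%
# of replications still give p < .05 by chance alone.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_reps, false_positives = 1000, 0
for _ in range(n_reps):
    a = rng.normal(0, 1, 30)  # both samples from the same population
    b = rng.normal(0, 1, 30)
    if ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1
print(false_positives / n_reps)  # close to 0.05
```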

43 Thank you

