BME STATS WORKSHOP Introduction to Statistics. Part 1 of workshop.


1 BME STATS WORKSHOP Introduction to Statistics

2 Part 1 of workshop

3 The way to think about inferential statistics They are tools that allow us to make black-and-white statements even though the data do not clearly provide answers. –This is to say that we will use probabilities, which speak in shades of grey, but will make statements with respect to rejecting or failing to reject some null hypothesis.

4 Inferencing from data analysis As scientists we have the unique privilege of using ingenious tools and methods that help us make informed decisions. One of those tools is statistical analysis. It allows us to more accurately determine what our data are really telling us. This workshop should help you draw better conclusions from your data by using simple but effective statistical tools to cut through the shades of grey often encountered in research.

5 The Essence of Inferential Statistics 1.We compare a statistic obtained from acquired data to a theoretical distribution of that statistic. Thus, a statistic is always judged relative to a distribution. You will surely have conducted t-tests in the past to compare measures from a control group with an experimental group. That t value is evaluated against a distribution of ts. In statistics, size does matter: large t values increase the likelihood that the investigator can claim significant results.

6 Essence cont'd 2.Signal-to-noise ratio. Most statistics used in this workshop, such as the t statistic, are made up of differences due to treatment and differences due to individuals (also called error). Error is simply random variation.

7 Essence cont'd 3.Rare events. This is related directly to point one. In order to conclude that a treatment was effective, the obtained statistic has to be sufficiently rare. We will find out that large statistical values are considered rare. For a better understanding of these points we will describe a Monte Carlo experiment.

8 The Plan! 1.Constructing a distribution. 2.How to apply a statistic obtained from an experiment. 3.Interpretation of a result. 4.What does a significant result mean?

9 Constructing a Distribution: Some Definitions Sample distribution: –A distribution of values from some measurement. This measurement can be of anything, such as height, weight or age to name a few. Sampling distribution: –A distribution of a statistic obtained from a sample distribution. This statistic can be a mean, mode, median, variance or anything else that is a calculation from individual measures. As we will see, the t statistic can be used to construct a sampling distribution.

10 Distributions Sample distributions are often bell shaped (normal), but this is not guaranteed. On occasion exponential, rectangular or odd-shaped distributions are observed. Sampling distributions, on the other hand, are almost always normally shaped. This is true even if the measurements used to calculate the statistic come from non-normal distributions.

11 How to construct a sampling distribution of the t statistic. An example under the null hypothesis of equal means We first need a sample distribution of some measure from some population with specific parameters, such as 25-year-old women. The measurement of interest could be height. We then randomly sample from this distribution to make up two groups of a specified sample size. –Ex. Two groups of ten individuals. From these two groups a t value is calculated. This t value is then plotted. After this calculation, the individuals are returned to the sample distribution. This process of "sampling with replacement" is repeated as many times as possible. Using computers you might opt for 1000 or more samplings. Thus, you would have a sampling distribution of 1000 ts.
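The sampling procedure described above can be sketched in a few lines of Python (a sketch outside the workshop, which uses SPSS; the population mean of 165 cm and SD of 7 cm for women's heights are made-up illustration values):

```python
# Monte Carlo sketch: repeatedly draw two groups of n = 10 from ONE normal
# population (so the null hypothesis is true), compute t for each pair,
# and collect the t values into a sampling distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 10, 1000
ts = np.empty(reps)
for i in range(reps):
    g1 = rng.normal(loc=165, scale=7, size=n)  # heights; assumed mean/SD
    g2 = rng.normal(loc=165, scale=7, size=n)  # same population as g1
    ts[i] = stats.ttest_ind(g1, g2).statistic

# Two-tailed 5% cutoffs from the simulated distribution; with df = 18
# these should land near the tabled values of -2.101 and +2.101.
lo, hi = np.percentile(ts, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))
```

With 1000 replications the cutoffs wobble a little from run to run; more replications tighten them toward the theoretical values.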

12 How to use a sampling distribution of ts In any sampling distribution there are a number of values that are extreme. This is normal and we will use this concept to make decisions about our experiments. Traditionally, we determine the t value at which point all values greater make up 5% of all values in that distribution. If we are concerned about both tails of that distribution we will find the value at which point all values greater make up 2.5% of all values on the positive tail and 2.5% on the negative tail.

13 How to use cont'd. We then conduct an experiment in which we have a control group and an experimental group. We calculate a t statistic from this experiment. This t value is evaluated against the sampling distribution of ts we have constructed. If our obtained value is greater than the value from the distribution that marks the 5% cutoff, we state that the experiment produced a significant result. In other words, the control group was significantly different from the experimental group.

14 Some specifics about using a t distribution. What does stating significance really mean? First of all, when we find a t value that lies outside the critical values of a distribution we should really start by saying, "the obtained value would be rare if calculated from two groups drawn from the same population." We would then follow up that statement with, "Since that value is rare yet was obtained from an experiment, it is reasonable to conclude that the groups do not come from the same population." –This amounts to saying that the treatment was effective. Thus, we have a significant result.

15 Monte Carlo How will building a distribution help us understand statistics?

16 Monte Carlo Building a t distribution

17 Distributions: ts How do you build distributions of a statistic? In this case, t. 1) You start with a population of interest. 2) Calculate the means of two samples with a specific number of individuals (n1 = xx, n2 = xx). 3) Calculate the t statistic using those two samples. 4) Do this again and again, possibly 1000 times or more, repeating the process as often as you can. Remember that these distributions are built under the null hypothesis.

18 Family of ts The larger the sample size used, the less variability in the results. As we can see here, the greater the degrees of freedom (df), the less extreme the obtained values, resulting in a tighter distribution. Note: degrees of freedom for the two-sample t-test are calculated as n1 + n2 − 2. Thus, for a sample size of 10 per group the df is 18.
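The effect of df on the tails can be checked directly against SciPy's theoretical t distribution (a quick sketch, not from the workshop slides):

```python
# Two-tailed 5% critical values of t shrink toward the normal-distribution
# value of 1.96 as the degrees of freedom grow, i.e. the distribution tightens.
from scipy import stats

for df in (5, 10, 18, 30, 1000):
    crit = stats.t.ppf(0.975, df)  # upper 2.5% point; lower tail is symmetric
    print(df, round(crit, 3))
```

For df = 18 (two groups of 10) this gives 2.101, the value the Monte Carlo cutoffs approximate.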

19 Theoretical Distribution of ts. We use this table to determine the critical values. The computer uses the density functions.

20 Variables Independent variable: –That variable you manipulate. Subjects are allocated to groups Dependent variable –That variable which depends on the manipulation. Measures such as weight or height or some other variable that varies depending on treatment

21 Cause and effect Cause can only be inferred when subjects are randomly allocated to groups. –Random allocation ensures that all characteristics are evenly distributed across all groups. This way, differences between groups cannot be due to biases in the subject selection, a very important element of experimental design.

22 An example of data analysis

23 Comparing Reaction time Following Alcohol Consumption. University males were recruited to participate in an experiment in which they consumed a specific amount of alcohol. The males were randomly separated into two groups. One group consumed the alcohol and the other some non-alcoholic drink. Ten minutes after the second drink was consumed the subjects were asked to push a button on a box the moment they heard a buzzer. When the button was pushed the buzzer stopped. The investigator recorded the amount of time the buzzer sounded in milliseconds.

24 Hypotheses We state hypotheses in terms of populations. This is to say that we are making statements about what we think exists in the real world. From our sample we will reject or fail to reject the null hypothesis. Here we have a situation in which we are predicting differences only. This is a non-directional hypothesis. H0: μc = μa H1: μc ≠ μa

25 The data (time in ms)
Control group  Alcohol group
150            200
110            250
200            220
135            225
 90            250
111            234

26 Results from an output provided by SPSS The probability of a Type 1 error is shown inside the red box, which was added by me (not SPSS). Commonly, investigators call this the significance level. It should be noted that statisticians would not label that value as such.

27 Critical Values A critical value is the value in a theoretical distribution beyond which less than a specific percentage of values can be found. –We typically use 5%. In our example we have 12 scores from 12 individuals, thus 10 degrees of freedom. –From the distribution of all ts we can determine how large a t calculated from our experiment must be for us to reject the null hypothesis of equal means. –That value (see table previously shown) is 2.228. –Our obtained t (−5.465) is larger in magnitude than the critical value, so we reject the null hypothesis in favour of the alternate. –You will notice that the t value for our experiment is negative. What is important is the magnitude, not the direction; if we were to reverse the groups in our calculations the value would have been positive.
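The slide's numbers can be reproduced outside SPSS; a sketch using SciPy's equal-variance two-sample t-test on the reaction-time data:

```python
from scipy import stats

control = [150, 110, 200, 135, 90, 111]
alcohol = [200, 250, 220, 225, 250, 234]
res = stats.ttest_ind(control, alcohol)  # pooled-variance t-test, df = 10
crit = stats.t.ppf(0.975, 10)            # two-tailed 5% critical value
print(round(res.statistic, 3))           # -5.465, as on the slide
print(round(crit, 3))                    # 2.228, as in the table
```

Since |−5.465| > 2.228, the null hypothesis of equal means is rejected.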

28 Interpretation of the results Alcohol increases the amount of time needed to turn off the buzzer suggesting that the subjects are impaired in their reactions. We are able to make this statement because the t value obtained here would be rare if the samples came from the same population. Due to this situation, we give ourselves permission to reject the null hypothesis of equal means in the population.

29 Some Important Concepts

30 The standard deviation The concepts of variance and standard deviation (SD) are central to statistics. They are used to determine whether individuals or samples fall inside or outside the normal range. Anyone who is more than 1.96 SDs away from the population mean of some measure is said not to belong to that population. However, this is only true when we have population parameters (more on this later).

31 A few formulas to help us along. Variance: s² = Σ(x − x̄)² / (n − 1). Standard deviation (SD): s = √s². Standard error of the mean (SEM): SEM = s / √n.
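These formulas can be computed directly; a sketch using the control-group reaction times from the alcohol example:

```python
import numpy as np

x = np.array([150, 110, 200, 135, 90, 111], dtype=float)
n = x.size
var = np.sum((x - x.mean()) ** 2) / (n - 1)  # sample variance, n - 1 denominator
sd = np.sqrt(var)                            # standard deviation
sem = sd / np.sqrt(n)                        # standard error of the mean
print(round(var, 1), round(sd, 1), round(sem, 1))  # 1528.7 39.1 16.0
```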

32 Variability is Important The greater the variability the greater the noise. Note here that with greater variability in the data, more overlap of the sample distributions is observed. This will result in smaller signal to noise ratios. Thus, when we have more variability we will need larger sample sizes to detect mean differences (more on this later). Keep this in mind when reviewing the upcoming slides.

33 T-Test The two-sample t-test compares two sample means. It is evident from the formula that the smaller the variability, the larger the t value.
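Writing the formula out by hand shows where the variability sits, in the denominator; a sketch on the alcohol-experiment data, checked against SciPy:

```python
import numpy as np
from scipy import stats

g1 = np.array([150, 110, 200, 135, 90, 111], dtype=float)
g2 = np.array([200, 250, 220, 225, 250, 234], dtype=float)
n1, n2 = g1.size, g2.size

# Pooled-variance t: the treatment difference is the numerator (signal),
# the variability is the denominator (noise) -- less noise, larger t.
sp2 = (np.sum((g1 - g1.mean()) ** 2) + np.sum((g2 - g2.mean()) ** 2)) / (n1 + n2 - 2)
t_manual = (g1.mean() - g2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

assert np.isclose(t_manual, stats.ttest_ind(g1, g2).statistic)
```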

34 Hypothesis Testing revisited. We always determine whether or not a statistic is rare given the null hypothesis, never the alternate hypothesis. You might remember this from the Monte Carlo studies. Thus we have to deal with the concepts of the Type 1 and Type 2 error.

35 Type 1 error The probability of being wrong when stating that samples are from different populations. This is the p < .05 that we use to reject the null hypothesis of equal means in the population. –If we have a p of .02, it means that the probability of being wrong when stating that two samples come from different populations is .02. –The .05 is a cutoff that is generally considered acceptable.

36 Type 2 error. The probability of failing to reject the null hypothesis when the null is not true. In truth, the samples are most likely from different populations. Often, we simply don’t have enough power or the tools are not sensitive enough to detect these differences.

37 Assumptions of a Distribution What are they and why are they important?

38 Assumptions are rules They are the rules by which distributions are constructed. These rules must be followed in order for a statistic obtained from an experiment to be compared to the theoretical distribution. If your experiment breaks these rules, it is possible that you will be either too conservative or too liberal when making a statement about the reality of the population.

39 Assumptions 1.Samples come from a normally distributed population 2.Both samples have equal variances (homogeneity of variance) 3.Samples are made up of randomly selected individuals 4.Both samples should be of equal sample size.

40 What to do when we violate assumptions 1. We can transform the data so that the sample can have the characteristics desired. 2. We can use distribution free statistics. –These statistics are insensitive to violations of assumptions. However, they do have limitations (more in later sessions).
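One widely used distribution-free option (offered here as an illustration ahead of the later sessions, not as the workshop's own example) is the Mann-Whitney U test, which ranks the data instead of assuming normality:

```python
from scipy import stats

control = [150, 110, 200, 135, 90, 111]
alcohol = [200, 250, 220, 225, 250, 234]
res = stats.mannwhitneyu(control, alcohol, alternative="two-sided")
print(res.pvalue)  # small p: the groups differ without any normality assumption
```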

41 Part 2 of workshop

42 Starting out with PASW (formerly SPSS but now SPSS again) An introduction

43 What is SPSS It stands for "Statistical Package for the Social Sciences." It started life as a text-driven program (SPSSx), migrated to the PC as line code and finally made it to the Windows environment. This is the version we enjoy today.

44 Do you need the latest version? No. With each new version there are graphical changes and on occasion additional statistical tools. –However, the basics do not change. An analysis of variance conducted with version 10 will produce the same results as those with version 19 (the latest at the time of this workshop).

45 Latest version cont’d One problem is with the output of different versions. –Older versions of SPSS cannot read the output of newer versions. Thus, the outputs are not backward compatible. –One way to get around this issue is to use the export function in the newer versions to save the outputs as PDF, DOC, or PPT so that the results can be read.

46 Getting started If you've used Excel in the past, then you have a base from which to work. SPSS uses a worksheet that is similar, but not identical, to Excel's. –However, the similarities end there.

47 Learning Curve If you use SPSS on a regular basis, you should be somewhat proficient in a week or two. –Developing an expertise will take you somewhat longer depending on your interest and statistics knowledge. –Let's get started!

48 This is what you see when you start the program. In front of you is the worksheet in the “data view”. You enter all your data in the worksheet.

49 You also have the option of “variable view” by clicking on the tab below or clicking on the column heading “var”.

50 The variable view is where you write down the name of your variable (variable name). Also in this view you have the option of providing variable labels and other descriptors that can help you recognize your data. Name your variable.

51 Let’s start with a short review on variables. Independent variable (IV): That variable which is manipulated. Dependent variable (DV): That variable whose measures depend on some manipulation. Any experiment can have more than one IV or DV. These variables have to be set up correctly in a worksheet in order to properly analyze data.

52 Let’s say that the study is designed to determine if a certain drug facilitates weight loss We will need an independent variable….say Drug Type. –We could have two groups based on drug treatment. Drug 1 Drug 2 We will also need a dependent variable…say weight. –In the worksheet we will indicate the weight for each individual after being on the drug for a period of time.
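The worksheet layout described above, one column coding the IV and one holding the DV, can be mimicked in pandas; the group labels and weight values below are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "drug": [1, 1, 1, 2, 2, 2],                      # IV: group codes, as in SPSS
    "weight": [80.5, 77.2, 79.0, 72.1, 70.8, 73.4],  # DV: made-up weights (kg)
})
df["drug"] = df["drug"].map({1: "Drug 1", 2: "Drug 2"})  # value labels
print(df.groupby("drug")["weight"].mean())
```

Each row is one individual, exactly as in the SPSS data view.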

53 Entering data. We simply click on an empty box and begin typing as appropriate. Shown here are the designations for group membership for the IV in our fictitious experiment with two groups.

54 Back to the variable view where we change the variable name and add a label which will help us remember what that variable means for future reference. Also, the variable label is the text that will be printed on the output following an analysis.

55 Clicking on the empty square under values allows for the user to specify group names. The number value is assigned a label by the user.

56 On returning to the worksheet, the group labels and the variable name specified by you replace the default labels.

57 We will now add the dependent variable with data IV DV

58 Some Descriptive Statistics PASW easily allows us to produce descriptive statistics. –Mean –Standard deviation –Standard error –Median –Etc….

59 You conduct all analyses from the Analyze option. Here we are asking for PASW to show descriptive statistics using the Means sub-option.

60 Many options for descriptive statistics are available

61 Relevant output table is shown here. Note that the statistics requested in the earlier slide are displayed in this table.

62 Graphs: Can be constructed from a number of options You may wish to use the chart builder option but users who are familiar with older versions of this program sometimes find it difficult to change. I like the legacy option which retains the old method. In the next slide we will see a graph using the error bar option.

63 Here we have the 95% intervals but typically you would want the error bars to represent one standard error.

64 Finally an analysis We will conduct a two sample independent t-test.

65 Here we specify the tests of means in the compare means option

66 You must indicate which groups will be compared. You must use the number assigned to the groups.

67 Levene’s test determines if the variance in one group is different from the other. This is an important assumption. The results are significant. Sig. (2-tailed) is the Type 1 error.

68 Let’s add a third group The same method as building the database in the first place applies to adding a group. With the addition of a third group we will need to perform an analysis of variance (ANOVA) with posthoc tests.
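With three groups, the overall test can be sketched with SciPy's one-way ANOVA; the first two groups are the reaction-time data from earlier, and the third group's values are invented for illustration:

```python
from scipy import stats

g1 = [150, 110, 200, 135, 90, 111]
g2 = [200, 250, 220, 225, 250, 234]
g3 = [300, 310, 295, 320, 305, 315]  # hypothetical third group
f, p = stats.f_oneway(g1, g2, g3)
print(round(f, 2), p)  # large F with a tiny p: at least one mean differs
```

A posthoc test such as Tukey's, as used on the slide, would then identify which pairs of groups differ.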

69 Significant results.

70 Interpretation The ANOVA indicates that there are differences between the groups. This result allowed for conducting a posthoc Tukey test. –All groups are considered different from one another. –This is shown by the observation that all comparisons are significant.

71 A graph of the results obtained from the Univariate sub- option is shown here.

72 Adding a second IV will allow us to conduct an interaction analysis using the Univariate sub-option. We observe a significant main effect for IV1 but not IV2. Also, there is no significant interaction between IV1 and IV2 on the dependent variable. See graph.

73

74 After all this you might want to explore the interaction You would run a simple main effects analysis, which can be done through a syntax window. You write a program; this was the norm when PASW was SPSSx and SPSS was text driven.

75 Syntax This program allows us to determine if there are differences on the dependent variable of one IV at levels (groups) of another variable.
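A simple main effects analysis asks, within each level of one IV, whether the groups of the other IV differ. A rough sketch using per-level t-tests on invented data (the MANOVA syntax shown in these slides instead tests each simple effect against the pooled ANOVA error term, which is the proper approach):

```python
from scipy import stats

# cells[level of IV1] = (IV2 group 1 scores, IV2 group 2 scores); made-up data
cells = {
    1: ([12, 14, 11, 13, 15], [13, 15, 12, 14, 16]),
    2: ([22, 24, 21, 23, 25], [22, 23, 21, 24, 25]),
    3: ([32, 34, 31, 33, 35], [32, 34, 31, 33, 36]),
}
results = {level: stats.ttest_ind(a, b) for level, (a, b) in cells.items()}
for level, r in results.items():
    print(level, round(r.statistic, 2), round(r.pvalue, 3))
```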

76 Results of the Simple Main Effects Analysis Next slide.

77 The default error term in MANOVA has been changed from WITHIN CELLS to WITHIN+RESIDUAL. Note that these are the same for all full factorial designs.

* * * A n a l y s i s  o f  V a r i a n c e * * *
30 cases accepted. 0 cases rejected because of out-of-range factor values. 0 cases rejected because of missing data. 6 non-empty cells. 1 design will be processed.

Tests of Significance for DependentVar using UNIQUE sums of squares

Source of Variation                     SS   DF       MS      F  Sig of F
WITHIN+RESIDUAL                    2469.20   24   102.88
INDEPENDENTVAR2 WITHIN
  INDEPENDENTVAR(1)                  16.90    1    16.90    .16   .689
INDEPENDENTVAR2 WITHIN
  INDEPENDENTVAR(2)                   6.40    1     6.40    .06   .805
INDEPENDENTVAR2 WITHIN
  INDEPENDENTVAR(3)                    .10    1      .10    .00   .975
INDEPENDENTVAR                    14581.40    2  7290.70  70.86   .000
(Model)                           14604.80    5  2920.96  28.39   .000
(Total)                           17074.00   29   588.76

R-Squared = .855   Adjusted R-Squared = .825

78 Here is how you would set up a database for a repeated measures design 1.Arrange groups in columns so that group one has data in column 1, group 2 in column 2 and so on. 2.Specify the IV in PASW. 3.Define the groups by specifying which column belongs to which group. 4.Click on OK.

79 Group data are in columns Use repeated measures option

80 Give the variable a name and indicate the number of groups (3 in this case) Click on add to get this popup.

81 Results are significant. We can say that there are mean differences between the groups, but we cannot say which pairs of groups differ. Always interpret using the Greenhouse-Geisser correction.

82

