Laboratory for Interdisciplinary Statistical Analysis Anne Ryan Virginia Tech.

Laboratory for Interdisciplinary Statistical Analysis Anne Ryan agryan@vt.edu Virginia Tech

www.lisa.stat.vt.edu Laboratory for Interdisciplinary Statistical Analysis 1948: The Statistical Laboratory was founded as a division of the Virginia Agricultural Experiment Station to help agronomists design experiments and calculate sums of squares.

1949: Based on the success of the Statistical Laboratory, the Department of Statistics at Virginia Polytechnic Institute (VPI) was founded, the 3rd oldest statistics department in the United States.

1973: The Statistical Laboratory was re-formed as the Statistical Consulting Center to assist with statistical analyses in every college of Virginia Polytechnic Institute & State University (VPI&SU).

2007: The Graduate Student Assembly led a movement to save statistical consulting and collaboration from death by budget cuts, ensuring that graduate students could receive help with their research. The College of Science, Provost, Vice President of Research, Graduate School, and six additional colleges agreed that researchers should be able to receive free statistical consulting and collaboration.

2008: The Statistical Consulting Center was re-organized as the Laboratory for Interdisciplinary Statistical Analysis (LISA) to collaborate with researchers across the Virginia Tech (VT) campuses.

Established in 2008

Year   Clients   Hours
2000   299       1368
2001   293       1938
2002   321       2220
2003   304       2192
2004   274       1775
2005   211       495
2006   171       541
2007   190       965
2008   895       2184
2009   719       3093
2010   1124      4420


LISA helps VT researchers benefit from the use of statistics:
• Experimental Design
• Data Analysis
• Interpreting Results
• Grant Proposals
• Software (R, SAS, JMP, SPSS...)
Our goal is to improve the quality of research and the use of statistics at Virginia Tech.

Collaboration: LISA statisticians meet with faculty, staff, and graduate students to understand their research and think of ways to help them using statistics.

Walk-In Consulting: Every day from 1-3 PM, clients get answers to their (quick) questions about using statistics in their research.

Short Courses: Short courses are designed to teach graduate students how to apply statistics in their research.

All services are FREE for VT researchers. We assist with research, not class projects or homework.

How can LISA help?
• Formulate the research question.
• Screen data for integrity and unusual observations.
• Implement graphical techniques to showcase the data: what is the story?
• Develop and implement an analysis plan to address the research question.
• Help interpret results.
• Communicate! Help with writing the report or giving the talk.
• Identify future research directions.

To request a collaboration meeting, go to www.lisa.stat.vt.edu
1. Sign in to the website using your VT PID and password.
2. Enter your information (email address, college, etc.).
3. Describe your project (project title, research goals, specific research questions, whether you have already collected data, special requests, etc.).
4. Wait 0-3 days, then contact the LISA collaborators assigned to your project to schedule an initial meeting.

Introduction to R
R is a free software environment for statistical computing and graphics.
Download: http://www.r-project.org/
Topics Covered:
• Data objects in R, loops, import/export datasets, data manipulation
• Graphing
• Basic Analyses: T-tests, Regression, ANOVA

Linear Regression & Structural Equation Modeling
• Linear regression is used to model the relationship between a continuous response and a continuous predictor.
• Structural equation modeling (SEM) is a modeling technique that investigates causal relationships among variables.
• Topics: time-related latent variables, modification indices and critical ratios in exploratory analyses, and computation of implied moments, factor score weights, total effects, and indirect effects.

Generalized Linear Models
• A modeling technique for situations where the errors are not necessarily normal.
• Can handle binary responses, counts, etc.
• Uses a link function to relate the response to the linear model.
• Covers: basic statistical concepts of GLMs and how they relate to regression with normal errors.

Mixed Models and Random Effects
• Mixed Model: A statistical model that has both random effects and fixed effects.
• Fixed Effect: Levels of the factor are predetermined.
• Random Effect: Levels of the factor were chosen at random.
The primary focus of the course will be to identify scenarios where a mixed model approach is appropriate. The concepts will be explained almost wholly through examples in SAS or in R.


• Defense: Represent the accused (defendant).
• Prosecution: Hold the “Burden of Proof,” the obligation to shift the assumed conclusion from an oppositional opinion to one’s own position through evidence.
• What’s the Assumed Conclusion? ANSWER: The accused is innocent until proven guilty. The prosecution must convince the judge/jury that the defendant is guilty beyond a reasonable doubt.

Burden of Proof: the obligation to shift the conclusion using evidence
• Trial: Innocent until proven guilty.
• Hypothesis Test: Accept the status quo (what is believed before) until the data suggest otherwise.

Decision Criteria
• Trial: Evidence has to be convincing beyond a reasonable doubt.
• Hypothesis Test: Evidence this strong would occur by chance less than 100α% of the time (e.g., 5%) if the status quo were true.

1. Test
2. Assumptions
3. Hypotheses
4. Mechanics
5. Conclusion

• State the name of the testing method to be used.
• It is important not to get off track at the very beginning.
• Hypothesis tests we will perform:
  ◦ One-sample t test for μ
  ◦ Two-sample t test for μ
  ◦ Paired t test
  ◦ ANOVA

• List all the assumptions required for your test to be valid.
• All tests have assumptions.
• If the assumptions are not met, you should still comment on how this affects your results.

• State the hypotheses of interest.
• There are two hypotheses:
  ◦ Null Hypothesis: Denoted H_0
  ◦ Alternative Hypothesis: Denoted H_a
• Examples of possible hypotheses:

• For hypothesis testing there are three popular versions of testing:
  ◦ Left-Tailed Hypothesis Test
  ◦ Right-Tailed Hypothesis Test
  ◦ Two-Tailed or Two-Sided Hypothesis Test

3. Two-Tailed or Two-Sided Hypothesis Test: The researcher is interested in looking both above and below the hypothesized value.

• Mechanics is the computational part of the test.
• What is part of the Mechanics step?
  ◦ Stating the Significance Level
  ◦ Finding the Rejection Rule
  ◦ Computing the Test Statistic
  ◦ Computing the p-value

• Significance Level: Here we choose a value to use as the significance level, which is the threshold at which we are willing to start rejecting the null hypothesis.
• Denoted by α.
• The default value is α = 0.05; use α = 0.05 unless otherwise noted.

• Rejection Rule: State our criterion for rejecting the null hypothesis.
  ◦ “Reject the null hypothesis if p-value < 0.05.”
• p-value: The probability, assuming the null hypothesis is true, of obtaining a point estimate at least as “extreme” as the observed value, where the definition of “extreme” is taken from the alternative hypothesis.

• Test Statistic: Compute the test statistic, which is usually a standardization of your point estimate.
• It translates your point estimate, a statistic, so that it follows a known distribution and can be used for a test.

• p-value: After computing the test statistic, you can compute the p-value.
• Use software to compute p-values.
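As a rough illustration of what software does at this step, the sketch below turns a t statistic into a left-tailed, right-tailed, or two-tailed p-value. It is only a sketch: the function names `t_pdf`, `t_cdf`, and `p_value` are made up for this example, and the t distribution is integrated numerically with Simpson's rule rather than with a statistics library. In practice a package such as JMP, R, or SAS would be used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about zero."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area


def p_value(t, df, tail="two"):
    """p-value for a t statistic; tail is 'left', 'right', or 'two'."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("tail must be 'left', 'right', or 'two'")
```

The `tail` argument mirrors the alternative hypothesis: "left" for H_a with <, "right" for H_a with >, and "two" for H_a with ≠.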

• Conclusion: The last step of the hypothesis test, just as it is the last step when computing confidence intervals.
• Conclusions should always include:
  ◦ Decision: reject or fail to reject
  ◦ Linkage: why you made the decision (interpret the p-value)
  ◦ Context: what your decision means in the context of the problem

• Note: Your decision can only be one of two choices:
  1. Reject H_0: the data give strong indication that H_a is more likely.
  2. Fail to reject H_0: the data give no strong indication that H_a is more likely.
• When conducting hypothesis tests, we assume that H_0 is true; therefore the decision CANNOT be to accept the null hypothesis.
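The two possible decisions can be written as a tiny rule. This is a minimal sketch, assuming the rejection rule "reject if p-value < α" with the default α = 0.05; the function name `decide` is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Return one of the two allowed decisions for a hypothesis test."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    # There is deliberately no "accept H_0" branch.
    return "reject H_0" if p_value < alpha else "fail to reject H_0"


print(decide(0.0001))  # strong evidence against H_0
print(decide(0.20))    # no strong evidence against H_0
```

A full conclusion would still add the linkage (the p-value) and the context of the problem to this decision.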

• The one-sample t-test is used to test whether the population mean is different from a specified value.
• Example: Is the mean height of 12-year-old girls greater than 60 inches?
Image: http://office.microsoft.com/en-us/images

• The population mean is not equal to a specified value.
  Null Hypothesis, H_0: μ = μ_0
  Alternative Hypothesis, H_a: μ ≠ μ_0
• The population mean is greater than a specified value.
  H_0: μ = μ_0
  H_a: μ > μ_0
• The population mean is less than a specified value.
  H_0: μ = μ_0
  H_a: μ < μ_0

 The sample is random.  The population from which the sample is drawn is either normal or the sample size is large. 46

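• Step 3: Calculate the test statistic:
  t = (x̄ - μ_0) / (s / √n)
  where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size.
• Step 4: Calculate the p-value based on the appropriate alternative hypothesis.
• Step 5: Write a conclusion.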
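A minimal sketch of Step 3 for the one-sample t-test, using the standard statistic t = (x̄ - μ_0) / (s / √n) with n - 1 degrees of freedom. The heights below are made-up values for illustration only, and the function name `one_sample_t` is hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H_0: mu = mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical heights (inches) of five 12-year-old girls; H_0: mu = 60.
heights = [61, 59, 63, 62, 60]
t_stat, df = one_sample_t(heights, 60)
print(t_stat, df)
```

The p-value in Step 4 is then read from a t distribution with `df` degrees of freedom, in the tail or tails named by the alternative hypothesis.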

http://en.wikipedia.org/wiki/Iris_flower_data_set

• Steps 2-4: JMP Demonstration
  ◦ Analyze → Distribution
  ◦ Y, Columns: Sepal Width
  ◦ Normal Quantile Plot
  ◦ Test Mean
  ◦ Specify Hypothesized Mean: 3.5

• Two-sample t-tests are used to determine whether the population mean of one group is equal to, larger than, or smaller than the population mean of another group.
• Example: Is the mean cholesterol of people taking drug A lower than the mean cholesterol of people taking drug B?

• The population means of the two groups are not equal.
  H_0: μ_1 = μ_2
  H_a: μ_1 ≠ μ_2
• The population mean of group 1 is greater than the population mean of group 2.
  H_0: μ_1 = μ_2
  H_a: μ_1 > μ_2
• The population mean of group 1 is less than the population mean of group 2.
  H_0: μ_1 = μ_2
  H_a: μ_1 < μ_2

• The two samples are random and independent.
• Either the populations from which the samples are drawn are normal or the sample sizes are large.
• The populations have the same standard deviation.

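• Step 3: Calculate the test statistic:
  t = (x̄_1 - x̄_2) / (s_p √(1/n_1 + 1/n_2))
  where s_p = √(((n_1 - 1) s_1² + (n_2 - 1) s_2²) / (n_1 + n_2 - 2)) is the pooled standard deviation.
• Step 4: Calculate the appropriate p-value.
• Step 5: Write a conclusion.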
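A minimal sketch of Step 3 for the two-sample case, assuming the pooled (equal standard deviation) form of the t statistic, which matches the equal-standard-deviation assumption listed above. The data values and the function name `pooled_t` are hypothetical.

```python
import math
import statistics


def pooled_t(x1, x2):
    """Return (t statistic, degrees of freedom) for H_0: mu_1 = mu_2."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / se
    return t, n1 + n2 - 2


# Made-up measurements for two independent groups.
group_1 = [1.0, 2.0, 3.0]
group_2 = [2.0, 4.0, 6.0]
t_stat, df = pooled_t(group_1, group_2)
print(t_stat, df)
```

If the equal-standard-deviation assumption is doubtful, an unpooled (Welch) version of the test is a common alternative; that variant is not shown here.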

• A researcher would like to know whether the mean sepal width of setosa irises is different from the mean sepal width of versicolor irises.
• The researcher randomly selects 50 setosa irises and 50 versicolor irises and measures their sepal widths.
• Step 1 Hypotheses:
  H_0: μ_setosa = μ_versicolor
  H_a: μ_setosa ≠ μ_versicolor
http://en.wikipedia.org/wiki/Iris_flower_data_set
http://en.wikipedia.org/wiki/Iris_versicolor

• Steps 2-4: JMP Demonstration
  ◦ Analyze → Fit Y By X
  ◦ Y, Response: Sepal Width
  ◦ X, Factor: Species
  ◦ Means/ANOVA/Pooled t
  ◦ Normal Quantile Plot → Plot Actual by Quantile

Step 5 Conclusion: There is strong evidence (p-value < 0.0001) that the mean sepal widths for the two varieties are different.

• The paired t-test is used to compare the population means of two groups when the samples are dependent.
• Example: A researcher would like to determine whether background noise causes people to take longer to complete math problems. The researcher gives 20 subjects two math tests, one in complete silence and one with background noise, and records the time each subject takes to complete each test.

• The population mean difference is not equal to zero.
  H_0: μ_difference = 0
  H_a: μ_difference ≠ 0
• The population mean difference is greater than zero.
  H_0: μ_difference = 0
  H_a: μ_difference > 0
• The population mean difference is less than zero.
  H_0: μ_difference = 0
  H_a: μ_difference < 0

 The sample is random.  The data is matched pairs.  The differences have a normal distribution or the sample size is large. 62

Step 3: Calculate the test statistic:
  t = d̄ / (s_d / √n)
where d̄ (d bar) is the mean of the differences, s_d is the standard deviation of the differences, and n is the number of pairs.
Step 4: Calculate the p-value.
Step 5: Write a conclusion.
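A minimal sketch of the paired test statistic: take the difference within each pair, then apply the one-sample formula t = d̄ / (s_d / √n) to the differences. The before and after values are invented for illustration, and `paired_t` is a hypothetical name.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for H_0: mu_difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of observations")
    diffs = [a - b for a, b in zip(after, before)]  # After - Before
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Made-up measurements on three subjects.
before = [10.0, 10.0, 10.0]
after = [11.0, 12.0, 13.0]
t_stat, df = paired_t(before, after)
print(t_stat, df)
```

Working on the differences is what distinguishes this from the two-sample test: each subject serves as its own control, so the two columns are not treated as independent samples.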

• A researcher would like to determine whether a fitness program increases flexibility. The researcher measures the flexibility (in inches) of 12 randomly selected participants before and after the fitness program.
• Step 1: Formulate a Hypothesis
  H_0: μ_{After - Before} = 0
  H_a: μ_{After - Before} > 0
Image: http://office.microsoft.com/en-us/images

• Steps 2-4: JMP Analysis
  ◦ Create a new column of After - Before
  ◦ Analyze → Distribution
  ◦ Y, Columns: After - Before
  ◦ Normal Quantile Plot
  ◦ Test Mean
  ◦ Specify Hypothesized Mean: 0

Step 5 Conclusion: There is no evidence that the fitness program increases flexibility.

• ANOVA is used to determine whether three or more populations have different means.
[Figure: groups A, B, and C; axis label: Medical Treatment]

• The first step is to use the ANOVA F test to determine whether there are any significant differences among the population means.
• If the ANOVA F test shows that the population means are not all the same, then follow-up tests can be performed to see which pairs of population means differ.

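ANOVA model: y_ij = μ_i + ε_ij, where y_ij is observation j in group i, μ_i is the mean of group i, and ε_ij is random variation. In other words, for each group the observed value is the group mean plus some random variation.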

• Step 1: We test whether there is a difference in the population means.

• The samples are random and independent of each other.
• The populations are normally distributed.
• The populations all have the same standard deviation.
• The ANOVA F test is robust to moderate departures from the assumptions of normality and equal standard deviations.

Compare the variation within the samples to the variation between the samples.
[Figure: two panels showing groups A, B, and C; axis label: Medical Treatment]

Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F

The mean square for groups, MSG, measures the variability of the sample averages: MSG = SSG / (k - 1), where k is the number of groups. SSG stands for the sum of squares for groups.

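The mean square error, MSE, measures the variability within the groups: MSE = SSE / (N - k), where N is the total number of observations and k is the number of groups. SSE stands for the sum of squares for error.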
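The two mean squares combine into the ANOVA F statistic, F = MSG / MSE, with k - 1 and N - k degrees of freedom (k groups, N observations in total). The sketch below computes it from raw data; the three groups are made-up numbers and `anova_f` is a hypothetical name.

```python
import statistics


def anova_f(groups):
    """Return (F, df_groups, df_error) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    # SSG: variation of the group means around the grand mean.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: variation of the observations around their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msg = ssg / (k - 1)
    mse = sse / (n_total - k)
    return msg / mse, k - 1, n_total - k


# Made-up responses for three treatments.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
c = [6.0, 7.0, 8.0]
f_stat, df_groups, df_error = anova_f([a, b, c])
print(f_stat, df_groups, df_error)
```

Here the group means (2, 3, and 7) are far apart relative to the spread inside each group, so the variation between groups dominates and F is large.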

• Step 4: Calculate the p-value.
• Step 5: Write a conclusion.

• A researcher would like to determine whether three drugs provide the same relief from pain.
• 60 patients are randomly assigned to a treatment (20 people in each treatment).
• Step 1: Formulate the Hypotheses
  H_0: μ_{Drug A} = μ_{Drug B} = μ_{Drug C}
  H_a: The μ_i are not all equal.
Image: http://office.microsoft.com/en-us/images

• JMP demonstration
  ◦ Analyze → Fit Y By X
  ◦ Y, Response: Pain
  ◦ X, Factor: Drug
  ◦ Normal Quantile Plot → Plot Actual by Quantile
  ◦ Means/ANOVA

Step 5 Conclusion: There is strong evidence that the mean pain levels for the three drugs are not all the same.

• The p-value of the overall F test indicates that the mean level of pain is not the same for patients taking drugs A, B, and C.
• We would like to know which pairs of treatments are different.
• One method is to use Tukey’s HSD (honestly significant differences).

• Tukey’s test simultaneously tests H_0: μ_i = μ_j against H_a: μ_i ≠ μ_j for all pairs of factor levels. Tukey’s HSD controls the overall type I error.
• JMP demonstration
  ◦ Oneway Analysis of Pain By Drug → Compare Means → All Pairs, Tukey HSD
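To see why the overall type I error needs controlling, the sketch below lists every pair of factor levels and estimates how the chance of at least one false rejection grows when each pair is tested separately at α = 0.05. It is an illustration only: it assumes the tests are independent, which pairwise comparisons are not, and it does not implement Tukey's HSD itself. The function names are hypothetical.

```python
from itertools import combinations


def all_pairs(levels):
    """Every unordered pair of factor levels to be compared."""
    return list(combinations(levels, 2))


def unadjusted_familywise_error(num_tests, alpha=0.05):
    """P(at least one false rejection) for independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests


pairs = all_pairs(["A", "B", "C"])
print(pairs)
print(unadjusted_familywise_error(len(pairs)))
```

With three drugs there are three pairwise comparisons, and under the independence assumption the unadjusted overall error rate is already well above 0.05.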

The JMP output shows that drugs A and C are significantly different.

• We are interested in the effect of two categorical factors on the response.
• We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
  ◦ An interaction effect means that the effect of one factor on the response depends on the level of the other factor.
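One informal way to picture an interaction is to compute the effect of one factor separately at each level of the other and see whether those effects agree. The sketch below does this with made-up cell means for a hypothetical factor A (low, high) and factor B (low, medium, high). It is purely descriptive; the formal check is the interaction F test in the two-way ANOVA.

```python
def simple_effects(cell_means, a_low, a_high, b_levels):
    """Effect of moving factor A from a_low to a_high at each level of B."""
    return {b: cell_means[(a_high, b)] - cell_means[(a_low, b)] for b in b_levels}


# Hypothetical cell means of the response, keyed by (level of A, level of B).
cell_means = {
    ("low", "low"): 10.0, ("high", "low"): 12.0,
    ("low", "medium"): 11.0, ("high", "medium"): 15.0,
    ("low", "high"): 12.0, ("high", "high"): 20.0,
}
effects = simple_effects(cell_means, "low", "high", ["low", "medium", "high"])
# If the effect of A were the same at every level of B, there would be
# no sign of interaction in these sample means.
suggests_interaction = len(set(effects.values())) > 1
print(effects, suggests_interaction)
```

In this invented example the effect of A grows as B increases, which is the pattern an interaction plot would show as non-parallel lines.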

• We would like to determine the effect of two alloys (low, high) and three cooling temperatures (low, medium, high) on the strength of a wire.
• JMP demonstration
  ◦ Analyze → Fit Model
  ◦ Y: Strength
  ◦ Highlight Alloy and Temp and click Macros → Factorial to Degree
  ◦ Run Model
Image: http://office.microsoft.com/en-us/images

Conclusion: There is strong evidence of an interaction between alloy and temperature.

• The one-sample t-test allows us to test whether the population mean of a group is equal to a specified value.
• The two-sample t-test and paired t-test allow us to determine whether the population means of two groups are different.
• ANOVA allows us to determine whether the population means of several groups are different.

• For information about using SAS, SPSS, and R to do ANOVA:
  http://www.ats.ucla.edu/stat/sas/topics/anova.htm
  http://www.ats.ucla.edu/stat/spss/topics/anova.htm
  http://www.ats.ucla.edu/stat/r/sk/books_pra.htm

• Fisher’s Irises Data (used in the one-sample and two-sample t-test examples).
• Flexibility data (paired t-test example): Michael Sullivan III. Statistics: Informed Decisions Using Data. Upper Saddle River, New Jersey: Pearson Education, 2004: 602.

• Special thanks to Jennifer Kensler for course materials and help with JMP!
