Spring 2011, 6.813/6.831 User Interface Design and Implementation. Lecture 15: Experiment Analysis.


1 Spring 2011, 6.813/6.831 User Interface Design and Implementation, Lecture 15: Experiment Analysis

2 UI Hall of Fame or Shame?

3 Nanoquiz
closed book, closed notes
submit before time is up (paper or web)
we'll show a timer for the last 20 seconds

4 1. To maximize external validity, a good research method to use is: (choose one best answer)
   A. formative evaluation
   B. field study
   C. survey
   D. lab study
2. When deciding between between-subjects and within-subjects designs for an experiment, the most important issues to consider are: (choose all best answers)
   A. the setting of the experiment
   B. ordering effects
   C. individual differences
   D. tasks
3. Louis Reasoner is running a user study to compare two input devices, and he decides to let participants in his user study decide which device they'll use in the study. This decision threatens: (choose one best answer)
   A. reliability
   B. external validity
   C. internal validity
   D. experimenter bias

5 Today's Topics
Hypothesis testing
Graphing with error bars
T test
ANOVA test

6 Experiment Analysis
Hypothesis: Mac menubar is faster to access than Windows menubar
– Design: between-subjects, randomized assignment of interface to subject

Windows   Mac
625       647
480       503
621       559
633       586

7 Statistical Testing
Compute a statistic summarizing the experimental data
    mean(Win)
    mean(Mac)
Apply a statistical test
– t test: are two means different?
– ANOVA (ANalysis Of VAriance): are three or more means different?
Test produces a p value
– p value = probability of observing a difference at least this large purely by chance, i.e., if there were no real difference
– If p < 0.05, the difference between Windows and Mac is statistically significant at the 5% level

8 Standard Error of the Mean
N = 4: Error bars overlap, so can't conclude anything
N = 10: Error bars are disjoint, so Windows may be different from Mac

9 Graphing Techniques
Pros
– Easy to compute
– Give a feel for your data
Cons
– Not a substitute for statistical testing
(Figure: bar chart with error bars, and Tukey box plots marking min, 25th percentile, median, 75th percentile, and max.)

10 Quick Intro to R
R is an open source programming environment for data manipulation
– includes statistics & charting
Get the data in
    win = scan()    # win = [625, 480, … ]
    mac = scan()    # mac = [647, 503, … ]
Compute with it
    means = c(mean(win), mean(mac))    # means = [584, 508]
    stderrs = c(sd(win)/sqrt(10), sd(mac)/sqrt(10))    # stderrs = [23.29, 26.98]
Graph it
    plot = barplot(means, names.arg=c("Windows", "Mac"), ylim=c(0,800))
    error.bar(plot, means, stderrs)    # error.bar is a user-defined helper, not a base R function
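
The graphing step above calls error.bar, which base R does not provide. The following is a minimal sketch of one way to define it, assuming the helper only needs to draw a vertical bar of plus or minus one standard error on top of each bar. The function body, the `sem` helper, and the `cap` parameter are illustrative assumptions, not the lecture's own code. The data are the 10-subject values shown on the later t test slides.

```r
# Menu access times from the lecture's 10-subject data set
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)

# Standard error of the mean: sample standard deviation over sqrt(N)
sem <- function(x) sd(x) / sqrt(length(x))

# Hypothetical helper (not part of base R): draw a vertical bar
# from y - err to y + err at each x position, with flat caps
error.bar <- function(x, y, err, cap = 0.1) {
  arrows(x, y - err, x, y + err, angle = 90, code = 3, length = cap)
}

means   <- c(mean(win), mean(mac))
stderrs <- c(sem(win), sem(mac))

# barplot() returns the x position of each bar's midpoint
mids <- barplot(means, names.arg = c("Windows", "Mac"), ylim = c(0, 800))
error.bar(mids, means, stderrs)
```

Using `length(x)` instead of a hard-coded `sqrt(10)` keeps the standard error correct if the number of participants changes.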

11 Hypothesis Testing
Our hypothesis: position of menubar matters
– i.e., mean(Mac times) < mean(Windows times)
– This is called the alternative hypothesis (also called H1)
If we're wrong: position of menu bar makes no difference
– i.e., mean(Mac) = mean(Win)
– This is called the null hypothesis (H0)
We can't really disprove the null hypothesis
– Instead, we argue that the chance of seeing a difference at least as extreme as what we saw is very small if the null hypothesis is true

12 Statistical Significance
Compute a statistic from our experimental data
    X = mean(Win) – mean(Mac)
Determine the probability distribution of the statistic assuming H0 is true
    Pr( X = x | H0 )
Measure the probability of getting the same or greater difference
    Pr( X > x0 | H0 )          one-sided test
    2 Pr( X > |x0| | H0 )      two-sided test
If that probability is less than 5%, then we say
– "We reject the null hypothesis at the 5% significance level"
– equivalently: "difference between menubars is statistically significant (p < .05)"
Statistically significant does not mean scientifically important
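
One way to make "the probability distribution of the statistic assuming H0 is true" concrete is an exact permutation test. This is a different technique from the t test the lecture goes on to use, shown here only as an illustration. The sketch assumes that under H0 the Windows and Mac labels are interchangeable, so every split of the 20 times into two groups of 10 is equally likely. The variable names are illustrative.

```r
# Menu access times from the lecture's 10-subject data set
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)

all_times <- c(win, mac)
n <- length(win)

# Every way of labelling 10 of the 20 times as "Windows" (one column each)
idx <- combn(length(all_times), n)

# Sum of the "Windows" group under each relabelling
win_sums <- colSums(matrix(all_times[idx], nrow = n))

# mean(Win) - mean(Mac) equals (2 * win_sum - total) / n,
# so comparing sums is equivalent to comparing mean differences
observed <- sum(win)
total <- sum(all_times)

# Pr(X >= x0 | H0): share of relabellings at least as extreme as observed
p_one_sided <- mean(win_sums >= observed)
p_two_sided <- mean(abs(2 * win_sums - total) >= abs(2 * observed - total))

print(c(one_sided = p_one_sided, two_sided = p_two_sided))
```

Because both groups have the same size, the null distribution is symmetric and the two-sided probability is exactly twice the one-sided one, matching the factor of 2 in the formula above.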

13 T test
T test compares the means of two samples A and B
Two-sided:
– H0: mean(A) = mean(B)
– H1: mean(A) <> mean(B)
One-sided:
– H0: mean(A) = mean(B)
– H1: mean(A) < mean(B)
Assumptions:
– samples A & B are independent (between-subjects, randomized)
– normal distribution
– equal variance

14 Running a T Test (Excel)

Win   Mac
625   647
480   503
621   559
633   586

                               Windows   Mac
Mean                           589.8     573.9
Variance                       5368.5    3574.0
Observations                   4         4
Pooled Variance                4471.2
Hypothesized Mean Difference   0
df                             6
t Stat                         0.336
P(T<=t) one-tail               0.374
t Critical one-tail            1.943
P(T<=t) two-tail               0.748
t Critical two-tail            2.447

15 Running a T Test

Win   Mac
625   647
480   503
621   559
633   586
694   458
599   380
505   477
527   409
651   589
505   472

                               Windows   Mac
Mean                           584.0     508.1
Variance                       5409.3    7295.0
Observations                   10        10
Pooled Variance                6352.2
Hypothesized Mean Difference   0
df                             18
t Stat                         2.130
P(T<=t) one-tail               0.024
t Critical one-tail            1.734
P(T<=t) two-tail               0.047
t Critical two-tail            2.101

16 Running a T Test (in R)
    t.test(win, mac)
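
A point worth checking when comparing R's output with the Excel table: by default t.test runs Welch's test, which does not assume equal variances, while the Excel output reports a pooled variance. The sketch below shows both variants plus the pooled t statistic computed by hand. It uses the integer times as printed on the slide, so results may differ from the Excel figures in the last digit. The variable names are illustrative.

```r
# Menu access times from the lecture's 10-subject data set
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)

# R's default: Welch's t test (unequal variances, fractional df)
welch <- t.test(win, mac)

# Student's t test with pooled variance, the form the Excel output reports
pooled <- t.test(win, mac, var.equal = TRUE)

# One-sided version: H1 is mean(win) > mean(mac), i.e. Mac is faster
one_sided <- t.test(win, mac, var.equal = TRUE, alternative = "greater")

# The same pooled t statistic by hand (equal group sizes)
n <- length(win)
pooled_var <- (var(win) + var(mac)) / 2
t_by_hand <- (mean(win) - mean(mac)) / sqrt(pooled_var * (2 / n))

print(pooled)
print(c(t_by_hand = t_by_hand, df_welch = unname(welch$parameter)))
```

With equal group sizes the two variants give the same t statistic and differ only in the degrees of freedom, so the p values are close but not identical.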

17 Using Factors in R
    time = c(win,mac)
    menubar = factor(c(rep("win",10),rep("mac",10)))
    # time = [ 625, 480, …, 647, 503, …]
    # menubar = [ win, win, …, mac, mac, …]
    t.test(time ~ menubar)

18 Paired T Test
For within-subject experiments with two conditions
Uses the mean of the differences (each user against themselves)
H0: mean(A_i – B_i) = 0
H1: mean(A_i – B_i) <> 0 (two-sided test)
    or mean(A_i – B_i) > 0 (one-sided test)

19 Reading a Paired T Test

Win   Mac
625   647
480   503
621   559
633   586
694   458
599   380
505   477
527   409
651   589
505   472

                               Windows   Mac
Mean                           584.0     508.1
Variance                       5409.3    7295.0
Observations                   10        10
Pearson Correlation            0.370
Hypothesized Mean Difference   0
df                             9
t Stat                         2.675
P(T<=t) one-tail               0.013
t Critical one-tail            1.833
P(T<=t) two-tail               0.025
t Critical two-tail            2.262

20 Running a Paired T Test (in R)
    t.test(time ~ menubar, paired=TRUE)
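
The paired test can also be run by passing the two vectors directly, which works the same way across R versions. Some recent R releases are reported to reject `paired=` in the formula interface, so the two-vector form is the safer sketch. It assumes, as the "Reading a Paired T Test" table does, that row i of both columns comes from the same user. The by-hand calculation shows that a paired t test is a one-sample t test on the per-user differences.

```r
# Treat row i of both columns as the same user's two measurements
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)

# Paired t test: two-vector interface
paired <- t.test(win, mac, paired = TRUE)

# The same statistic by hand: mean difference over its standard error
d <- win - mac
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))

print(paired)
print(c(t_by_hand = t_by_hand))
```

The degrees of freedom drop from 18 to 9 because the test now works with 10 differences rather than 20 independent times.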

21 Analysis of Variance (ANOVA)
Compares more than 2 means
One-way ANOVA
– 1 independent variable with k >= 2 levels
– H0: all k means are equal
– H1: the means are different (so the independent variable matters)

22 Running a One-Way ANOVA (Excel)

Win   Mac   Bottom
625   647   485
480   503   436
621   559   512
633   586   564
694   458   560
599   380   587
505   477   391
527   409   488
651   589   555
505   472   446

Groups    Count   Sum     Average   Variance
Windows   10      5839    584.0     5409.3
Mac       10      5080    508.1     7295.0
Bottom    10      5023    502.3     4175.8
Total     30      15943   531.5     6671.6

Source of Variation   SS       df   MS      F       P-value   F crit
Between Groups        41555    2    20777   3.693   0.038     3.354
Within Groups         151920   27   5626
Total                 193476   29

23 Running ANOVA (in R)
    # time = [ 625, 480, …, 647, 503, …, 485, 436, …]
    # menubar = [ win, win, …, mac, mac, …, btm, btm, …]
    fit = aov(time ~ menubar)
    summary(fit)
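
A self-contained version of the one-way ANOVA, as a sketch using the integer times printed in the Excel table. The printed values appear to be rounded, so sums of squares may differ slightly from the Excel output. The names `tab`, `f_value`, and `p_value` are illustrative.

```r
# Times for the three menubar positions, as printed on the Excel slide
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)
btm <- c(485, 436, 512, 564, 560, 587, 391, 488, 555, 446)

# Long format: one time per row, with a factor naming its condition
time <- c(win, mac, btm)
menubar <- factor(rep(c("win", "mac", "btm"), each = 10))

fit <- aov(time ~ menubar)

# summary() returns a list whose first element is the ANOVA table
tab <- summary(fit)[[1]]
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]

print(tab)
```

The first row of the table corresponds to "Between Groups" and the Residuals row to "Within Groups" in the Excel layout.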

24 Running Within-Subjects ANOVA (in R)
    # time = [ 625, 480, …, 647, 503, …, 485, 436, …]
    # menubar = [ win, win, …, mac, mac, …, btm, btm, …]
    # subject = [ u1, u1, u2, u2, …, u1, u1, u2, u2 …, u1, u1, u2, u2, …]
    fit = aov(time ~ menubar + Error(subject/menubar))
    summary(fit)
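
A runnable sketch of the within-subjects call, under the assumption (for illustration only) that row i of the three data columns belongs to the same user, so each of 10 users contributes one time per menubar position. The subject labels u1 to u10 and their layout are chosen to match the way `time` is stacked here, and they are not taken from the lecture.

```r
# Treat row i of each column as user i's time in that condition
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)
btm <- c(485, 436, 512, 564, 560, 587, 391, 488, 555, 446)

time <- c(win, mac, btm)
menubar <- factor(rep(c("win", "mac", "btm"), each = 10))

# u1..u10 repeated once per condition, matching the stacking order of time
subject <- factor(rep(paste0("u", 1:10), times = 3))

# The Error() term tells aov that menubar varies within each subject
fit <- aov(time ~ menubar + Error(subject / menubar))
print(summary(fit))
```

The summary is split into error strata. The menubar effect is tested in the `subject:menubar` stratum, where differences between users have already been removed.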

25 Tukey HSD Test
Tests pairwise differences for significance after a significant ANOVA test
– More stringent than multiple pairwise t tests
Be careful in general about applying multiple statistical tests

Pair              Test statistic
Win vs. Mac       3.201
Mac vs. Bottom    0.242
Win vs. Bottom    3.443
critical value    3.5

26 Tukey HSD Test (in R)
    TukeyHSD(fit)
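
A self-contained sketch of the Tukey HSD call on the three-condition data, together with a by-hand studentized range statistic for comparison with the table of pairwise values. It assumes those tabulated values are q statistics of the form difference / sqrt(MS_within / n), which reproduces them closely but is not stated in the lecture. Data are the integer times as printed.

```r
win <- c(625, 480, 621, 633, 694, 599, 505, 527, 651, 505)
mac <- c(647, 503, 559, 586, 458, 380, 477, 409, 589, 472)
btm <- c(485, 436, 512, 564, 560, 587, 391, 488, 555, 446)

time <- c(win, mac, btm)
menubar <- factor(rep(c("win", "mac", "btm"), each = 10))
fit <- aov(time ~ menubar)

# Pairwise comparisons with adjusted p values (columns: diff, lwr, upr, p adj)
hsd <- TukeyHSD(fit)
print(hsd$menubar)

# Studentized range statistics by hand (assumed form of the slide's values)
msw <- summary(fit)[[1]][["Mean Sq"]][2]   # within-groups mean square
n <- 10                                    # observations per group
q_win_mac <- (mean(win) - mean(mac)) / sqrt(msw / n)
q_win_btm <- (mean(win) - mean(btm)) / sqrt(msw / n)

# Critical value at the 5% level for 3 means and 27 residual df
q_crit <- qtukey(0.95, nmeans = 3, df = 27)
print(c(q_win_mac = q_win_mac, q_win_btm = q_win_btm, q_crit = q_crit))
```

On these data no pair exceeds the critical value even though the overall ANOVA is significant, which illustrates how much more stringent the test is than separate pairwise t tests.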

27 Two-Way ANOVA
2 independent variables with j and k levels, respectively
Tests whether each variable has an effect independently
Also tests for interaction between the variables

28 Two-way Within-Subjects ANOVA (in R)
    # time = [ 625, 480, …, 647, 503, …, 485, 436, …]
    # menubar = [ win, win, …, mac, mac, …, btm, btm, …]
    # device = [ mouse, pad, …, mouse, pad, …, mouse, pad, …]
    # subject = [ u1, u1, u2, u2, …, u1, u1, u2, u2 …, u1, u1, u2, u2, …]
    # parentheses matter: subject/menubar*device would parse as (subject/menubar)*device
    fit = aov(time ~ menubar*device + Error(subject/(menubar*device)))
    summary(fit)
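
The lecture gives no data for the device variable, so the sketch below uses simulated placeholder times purely to show the shape of the data frame and of the call. The numbers are random and carry no meaning, and the design (10 users, each tested on every menubar and device combination) is an assumption.

```r
set.seed(1)

# Full within-subjects design: every user sees all 3 x 2 combinations
design <- expand.grid(device  = c("mouse", "pad"),
                      menubar = c("win", "mac", "btm"),
                      subject = paste0("u", 1:10))

# SIMULATED placeholder times, not experimental data
design$time <- round(rnorm(nrow(design), mean = 550, sd = 75))

# menubar * device expands to both main effects plus their interaction;
# the parenthesized Error() term nests all of them within subject
fit <- aov(time ~ menubar * device + Error(subject / (menubar * device)),
           data = design)
print(summary(fit))
```

Each effect is tested in its own error stratum: menubar against `subject:menubar`, device against `subject:device`, and the interaction against `subject:menubar:device`.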

29 Other Tests
Two discrete-valued variables
– "does past experience affect menubar preference?"
    independent var {WinUser, MacUser}
    dependent var {PrefersWinMenu, PrefersMacMenu}
– contingency table
– Fisher exact test and chi square test
Two (or more) scalar variables
– Regression

          PrefersWin   PrefersMac
WinUser   25           9
MacUser   8            19
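
Both tests named above are available in base R. The sketch below runs them on the contingency table, reading the flattened counts as 25 and 9 for Windows users and 8 and 19 for Mac users. That reading is an assumption about how the digits split. The name `prefs` is illustrative.

```r
# Rows: past experience; columns: stated preference
prefs <- matrix(c(25,  9,
                   8, 19),
                nrow = 2, byrow = TRUE,
                dimnames = list(user    = c("WinUser", "MacUser"),
                                prefers = c("PrefersWin", "PrefersMac")))

# Fisher's exact test: exact p value, suited to small counts
fisher <- fisher.test(prefs)

# Chi-square test of independence (R applies Yates' continuity
# correction by default for 2 x 2 tables)
chisq <- chisq.test(prefs)

print(prefs)
print(c(fisher_p = fisher$p.value, chisq_p = chisq$p.value))
```

A small p value here would suggest that menubar preference is associated with past experience. Neither test says anything about the direction or size of that association beyond what the table itself shows.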

30 Tools for Statistical Testing
Web calculators
Excel
Statistical packages
– Commercial: SAS, SPSS, Stata
– Free: R

31 Summary
Use statistical tests to establish significance of observed differences
Graphing with error bars is cheap and easy, and great for getting a feel for data
Use t test to compare two means, ANOVA to compare 3 or more means

32 UI Hall of Fame or Shame?

