Data Collection and Analysis


1 Data Collection and Analysis

2 Scientific Method
Form hypothesis
Collect data
Analyze
Accept/reject hypothesis

3 Empirical Experiment
Typical question: Which visualization is better in a given situation, Lifelines or PerspectiveWall?

4 Question
Does the vis tool (Lifelines or PerspWall) have an effect on user performance time for task X?
Null hypothesis: no effect (Lifelines = PerspWall)
Goal: disprove the null hypothesis by providing a counter-example, i.e., show an effect

5 Variables
Independent variables (what you vary):
  Tool or technique (Lifelines, Perspective Wall)
  Task type (find, count, compare)
  Data size (100, 1000, …)
Dependent variables (what you measure):
  User performance time
  Errors
  Subjective satisfaction (survey)
  HCI metrics

6 Example: 2 x 3 design
Ind Var 1, Vis. Tool (rows): Lifelines, Persp. Wall
Ind Var 2, Task Type (columns): Task1, Task2, Task3
Each of the 2 x 3 = 6 cells holds the measured user performance times (dep var) for n users

7 Groups
“Between-subjects” variable: one group of users per treatment
  Group 1: 20 users, Lifelines
  Group 2: 20 users, PerspWall
  Total: 40 users, 20 per cell
“Within-subjects” (repeated-measures) variable: all users perform all treatments
  Counterbalance the order to control for order effects
  Group 1: 20 users, Lifelines then PerspWall
  Group 2: 20 users, PerspWall then Lifelines
  Total: 40 users, 40 per cell
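The counterbalancing above can be sketched in a few lines of Python. This is a minimal sketch, assuming users are shuffled with a fixed seed and then split in half; the helper name `counterbalance` and the user IDs are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Sequence, Tuple


def counterbalance(user_ids: Sequence[str], first: str, second: str,
                   seed: int = 0) -> Dict[str, Tuple[str, str]]:
    """Randomly split users into two equal-sized order groups.

    Half the users see `first` then `second`; the other half see the
    reverse order, so order effects are balanced across the two tools.
    """
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)  # random assignment to groups
    half = len(shuffled) // 2
    return {
        uid: (first, second) if i < half else (second, first)
        for i, uid in enumerate(shuffled)
    }


# Hypothetical IDs for 40 users, as in the slide's example.
users = [f"user{i:02d}" for i in range(40)]
orders = counterbalance(users, "Lifelines", "PerspWall")
print(Counter(orders.values()))  # 20 users in each order
```

Shuffling before splitting keeps the assignment random while guaranteeing equal group sizes for an even number of users.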

8 Data
Measure the dependent variables
Spreadsheet columns: Lifelines task 1, 2, 3; PerspWall task 1, 2, 3

9 Averages
Mean user performance times in seconds (dep var), by Vis. Tool (Ind Var 1) and Task Type (Ind Var 2):
              Task1   Task2   Task3
Lifelines      37.2    54.5   103.7
Persp. Wall    29.8    53.2   145.4
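A cell average like those in the table can be computed from one row per trial. A minimal sketch, assuming the data are stored as (tool, task, seconds) tuples rather than the per-column spreadsheet layout; `cell_means` is a hypothetical helper and the toy values are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One row per trial: (vis tool, task, performance time in seconds).
Trial = Tuple[str, str, float]


def cell_means(trials: Iterable[Trial]) -> Dict[Tuple[str, str], float]:
    """Average the performance times within each (tool, task) cell."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for tool, task, secs in trials:
        cells[(tool, task)].append(secs)
    return {cell: mean(times) for cell, times in cells.items()}


# Made-up toy values for illustration only; not the study's data.
toy = [
    ("Lifelines", "Task1", 30.0),
    ("Lifelines", "Task1", 40.0),
    ("PerspWall", "Task1", 25.0),
    ("PerspWall", "Task1", 35.0),
]
print(cell_means(toy))
# {('Lifelines', 'Task1'): 35.0, ('PerspWall', 'Task1'): 30.0}
```

Each mean collapses a whole cell to one number, which is the information loss the following slide warns about.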

10 PerspWall better than Lifelines?
Problem with averages: lossy
  Compares only 2 numbers
  What about the 40 data values?
[Figure: performance time (secs), Lifelines vs. PerspWall]

11 Another Picture
Need statistics that take all the data into account
[Figure: performance time (secs), Lifelines vs. PerspWall]

12 Statistics
t-test: compares 1 dep var on 2 treatments of 1 ind var
ANOVA (ANalysis Of VAriance): compares 1 dep var on n treatments of m ind vars
Result: is there a “significant difference” between treatments?
  p = probability of seeing a difference at least this large if the null hypothesis were true
  Typical cut-off (significance level): p < 0.05
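The t-test on this slide can be sketched in plain Python. This is a minimal version of the equal-variance (pooled) two-sample t statistic; it stops at t and df because turning t into p needs the t distribution's CDF, which libraries such as SciPy provide (`scipy.stats.ttest_ind` returns both). The helper name `pooled_t` is our own.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic assuming equal variances.

    Returns (t, df) with df = n1 + n2 - 2. The p-value still has to be
    read from a t distribution with df degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Toy values, not study data.
t, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t({df})={t:.2f}")  # t(4)=-3.67
```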

13 Statistics in Microsoft Excel
Enter data into a spreadsheet
Go to Tools…, Data Analysis… (you may first need to enable the Analysis ToolPak under Add-Ins)
Select the appropriate analysis

14 t-tests in Excel
Used to compare two groups of data
Most common: “t-Test: Two-Sample Assuming Equal Variances”
Other t-tests:
  t-Test: Paired Two Sample for Means
  t-Test: Two-Sample Assuming Unequal Variances
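The two other t-test variants can be sketched the same way. The paired test works on per-user differences, which fits the within-subjects design; the unequal-variance option is generally understood to be Welch's t-test. A minimal sketch with hypothetical helper names (`paired_t`, `welch_t`) and made-up times; p-values again need a t distribution.

```python
import math
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic on per-user differences (within-subjects designs)."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample t statistic without assuming equal variances (Welch).

    df uses the Welch-Satterthwaite approximation and is usually not an integer.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical times (secs) for the same 3 users on both tools.
lifelines = [10.0, 12.0, 14.0]
perspwall = [8.0, 11.0, 11.0]
print(paired_t(lifelines, perspwall))  # t is about 3.46 with df = 2
print(welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```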

15 ANOVAs in Excel
Allow more than two groups of data to be compared
Most common: “Anova: Single Factor”
Other ANOVAs:
  Anova: Two-Factor With Replication
  Anova: Two-Factor Without Replication
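Excel's “Anova: Single Factor” is a one-way ANOVA. A minimal sketch of that computation, assuming each group is a list of performance times; `one_way_anova` is a hypothetical helper, and `scipy.stats.f_oneway` is one library routine that also returns the p-value.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int, float]:
    """Single-factor ANOVA: returns (F, df_between, df_within, MSE)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    mse = ss_within / df_within  # mean square error (within groups)
    f_stat = (ss_between / df_between) / mse
    return f_stat, df_between, df_within, mse


# Toy data: three hypothetical conditions with three users each.
f_stat, df_b, df_w, mse = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(f"F({df_b},{df_w})={f_stat:.2f}, MSE={mse:.2f}")  # F(2,6)=27.00, MSE=1.00
```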

16 p < 0.05
Found a “statistically significant difference”
The averages determine which is ‘better’
Conclusion:
  Vis Tool has an “effect” on user performance for task1
  PerspWall gives better user performance than Lifelines for task1
  “95% confident that PerspWall better than Lifelines”
  Not “PerspWall beats Lifelines 95% of the time”
Found a counter-example to the null hypothesis
  Null hypothesis: Lifelines = PerspWall
  Hence: Lifelines ≠ PerspWall

17 p > 0.05: Hence, same?
Vis Tool has no effect on user performance for task1? Lifelines = PerspWall?
Be careful! We did not detect a difference, but the tools could still be different
  Did not find a counter-example to the null hypothesis
  Provides evidence for Lifelines = PerspWall, but not proof
Boring! Basically found nothing
How could this happen?
  Not enough users (other tests, such as a power analysis, can check this)
  Need better tasks, data, …
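The decision logic of the last two slides can be written as a small function. A minimal sketch, assuming lower performance time is better and a cut-off of 0.05; `interpret` is a hypothetical helper, and the p-values below are invented for illustration.

```python
def interpret(mean_lifelines: float, mean_perspwall: float, p: float,
              alpha: float = 0.05) -> str:
    """Turn a test result into the slides' conclusion for one task.

    Assumes lower performance time is better.
    """
    if p < alpha:
        # Significant: the averages decide which tool is 'better'.
        better = "PerspWall" if mean_perspwall < mean_lifelines else "Lifelines"
        return f"Significant difference (p={p:.3f}): {better} is faster on average"
    # Not significant: absence of evidence, not evidence of equality.
    return f"No difference detected (p={p:.3f}): not proof that the tools are equal"


# Task1 means from the Averages slide; the p-values are hypothetical.
print(interpret(37.2, 29.8, p=0.01))
print(interpret(37.2, 29.8, p=0.30))
```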

18 Reporting Results
Often considered the most important section of professional papers
Statistics are NOT the most important part of the results section
Statistics are used to back up differences described in a figure or table

19 Reporting Means, SDs, t-tests
Give means and standard deviations, then the t-test:
… the mean number was significantly greater in condition 1 (M=9.13, SD=2.52) than in condition 2 (M=5.66, SD=3.01), t(44)=3.45, p=.01
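Building these strings by hand is error-prone, so a small formatter can help. A minimal sketch that follows the slide's format (no space around “=”, no leading zero on p, “p<.001” for very small values); the helper names `fmt_p`, `fmt_desc` and `fmt_t` are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def fmt_p(p: float) -> str:
    """p without a leading zero, and p<.001 for very small values."""
    if p < 0.001:
        return "p<.001"
    digits = 2 if p >= 0.01 else 3
    return "p=" + f"{p:.{digits}f}".lstrip("0")


def fmt_desc(values: Sequence[float]) -> str:
    """Mean and standard deviation in the (M=..., SD=...) form."""
    return f"M={mean(values):.2f}, SD={stdev(values):.2f}"


def fmt_t(t: float, df: int, p: float) -> str:
    """t statistic with its degrees of freedom, followed by p."""
    return f"t({df})={t:.2f}, {fmt_p(p)}"


print(fmt_t(3.45, 44, 0.01))      # t(44)=3.45, p=.01
print(fmt_desc([1.0, 2.0, 3.0]))  # M=2.00, SD=1.00
```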

20 What Are Those Numbers?
… the mean number was significantly greater in condition 1 (M=9.13, SD=2.52) than in condition 2 (M=5.66, SD=3.01), t(44)=3.45, p=.01
M is the mean
SD is the standard deviation
t is the t statistic
The number in parentheses is the degrees of freedom (df)
p is the probability of a difference at least this large occurring by chance alone, i.e., if there were no true difference

21 Reporting ANOVAs
… for the three conditions, F(2,52)=17.24, MSE= , p<.001
F(x,y): F value with x between-groups and y within-groups degrees of freedom (df)
MSE: mean square error (the within-groups error term)
p: probability of a difference at least this large occurring by chance alone
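The same formatting idea extends to ANOVA results. A minimal sketch in the slide's format; `fmt_anova` is a hypothetical helper, and because the slide's example gives no MSE value, the MSE and p-value used below are invented.

```python
def fmt_p(p: float) -> str:
    """p without a leading zero, and p<.001 for very small values."""
    if p < 0.001:
        return "p<.001"
    digits = 2 if p >= 0.01 else 3
    return "p=" + f"{p:.{digits}f}".lstrip("0")


def fmt_anova(f_stat: float, df_between: int, df_within: int,
              mse: float, p: float) -> str:
    """F(between df, within df), then MSE and p."""
    return f"F({df_between},{df_within})={f_stat:.2f}, MSE={mse:.2f}, {fmt_p(p)}"


# Hypothetical MSE and p-value; F and df match the slide's example.
print(fmt_anova(17.24, 2, 52, mse=1.5, p=0.0002))  # F(2,52)=17.24, MSE=1.50, p<.001
```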

