# Analysing Eye-Tracking Data


Hayward Godwin, University of Southampton

Outline
Part 1: Eye-tracking measures (an overview), Data Viewer reports, and the Organise-Analyse-Visualise approach in R
Part 2: Try it yourself!

Eye-Tracking Measures: An Overview
For a detailed review, see Rayner (2009).

“Global” versus “Local” measures
Global measures are computed at the overall (or global) level of a trial and ignore what was being fixated at any point in time (e.g., mean fixation duration for a trial). Local measures are computed for each object or stimulus in a trial, paying attention to what was being fixated at any point in time (e.g., mean fixation duration for target words in a reading study). Many measures can be computed at both a global and a local level.

Mean Fixation Duration (global): the mean duration of fixations in a trial
Example (“Search for a blue square target”): the fixation durations are 125, 130, 110, 90, 190 and 150 ms.
Mean fixation duration = (125 + 130 + 110 + 90 + 190 + 150)/6 = 132.5 ms

Mean Fixation Duration (local): the mean duration of fixations on a specific object type
Example (“Search for a blue square target”): the target receives two fixations, of 110 ms and 190 ms.
Mean fixation duration for target = (110 + 190)/2 = 150 ms
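As a sketch of how the two levels differ in code, here is a minimal R example. The distractor labels are hypothetical (the slide only tells us the target received the 110 ms and 190 ms fixations):

```r
# Toy version of the trial above: one row per fixation, with its duration (ms)
# and the object it landed on (distractor labels are invented for illustration)
fix <- data.frame(
  dur = c(125, 130, 110, 90, 190, 150),
  obj = c("d1", "d2", "target", "d3", "target", "d4"),
  stringsAsFactors = FALSE
)

# Global: mean over every fixation in the trial
global_mean <- mean(fix$dur)                      # 795/6 = 132.5 ms

# Local: mean over fixations on the target only
local_mean <- mean(fix$dur[fix$obj == "target"])  # (110 + 190)/2 = 150 ms
```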

Number of Fixations (global): the number of fixations made in a trial
Example (“Search for a blue square target”): number of fixations = 6

Number of Fixations (local): the number of fixations on a specific object type
Example (“Search for a blue square target”): number of fixations on the target = 2

Total Gaze Duration (global): the sum of fixation durations in a trial
Example (“Search for a blue square target”): total gaze duration = 125 + 130 + 110 + 90 + 190 + 150 = 795 ms

Total Gaze Duration (local): the sum of fixation durations on a specific object type
Example (“Search for a blue square target”): total gaze duration for target = 110 + 190 = 300 ms

First-pass Gaze Duration: the sum of fixation durations on the first visit (or pass) of an object
Example (“Search for a blue square target”): first-pass gaze duration for target = 110 ms (the second fixation on the target, of 190 ms duration, occurs on the second pass, so it is excluded)
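The “first pass” logic can be sketched in base R with `rle` (run-length encoding), treating each contiguous run of fixations on an object as one visit. The fixation order here is hypothetical; the durations follow the slide:

```r
# Fixations in (hypothetical) temporal order: the target is visited once,
# left, and then revisited, so only the 110 ms fixation is first-pass
fix <- data.frame(
  obj = c("d1", "target", "d2", "d3", "target"),
  dur = c(130, 110, 90, 150, 190),
  stringsAsFactors = FALSE
)

runs   <- rle(fix$obj)                        # contiguous runs = visits/passes
ends   <- cumsum(runs$lengths)                # last fixation index of each run
starts <- ends - runs$lengths + 1             # first fixation index of each run
first  <- which(runs$values == "target")[1]   # the first pass on the target

first_pass <- sum(fix$dur[starts[first]:ends[first]])  # 110 ms
```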

Single Fixation Duration: the mean of fixation durations for objects that are only ever fixated once
Example (“Search for a blue square target”): this is one of the cleanest measures in eye-tracking, since when an object is fixated only once we can chart the time taken to fully process that object. Here, only two objects are ever fixated once (highlighted in the original figure). Since the target object is fixated twice, this trial would be excluded from the single fixation duration calculations.

Proportion of Objects Fixated (global): the proportion of objects directly fixated
Example (“Search for a blue square target”): 3 of the 5 objects are fixated, so proportion fixated = 3/5 = 0.6

Proportion of Objects Fixated (local): the proportion of objects directly fixated, broken down by object type
Example (“Search for a blue square target”): proportion of distractors fixated = 2/4 = 0.5; probability of fixating the target = 1/1 = 1

Saccade Onset Latency: the time from display onset to the start of the first saccade
Example (“Search for a blue square target”): if the display appears at time 0, the saccade onset latency here is 130 ms

Mean Number of Visits: the mean number of times each object is visited
Example (“Search for a blue square target”): count up the number of times each object is visited, then divide by the number of objects that were visited. Do NOT include zero values for unvisited objects. Here, mean number of visits = 4/3 ≈ 1.33

Mean Saccade Length: the mean length of all saccades (typically measured in degrees of visual angle)
Example (“Search for a blue square target”): the saccade lengths are 1.2, 1.4, 2.2, 0.2 and 3.4.
Mean length of all saccades = (1.2 + 1.4 + 2.2 + 0.2 + 3.4)/5 = 1.68

Verification Time: the time between first fixating the target and the button press
Example (“Search for a blue square target”): find when the button press occurred. If we find that it occurred 150 ms into the second fixation (of 190 ms) on the target, then verification time runs from the start of the first fixation on the target up to that point. A better way to do this is to find the time at which the first fixation on the target starts and subtract that value from the RT.

Scanpath Ratio: the sum of saccade lengths on the way to the target divided by the shortest distance to the target
Example (“Search for a blue square target”): the saccade lengths are 1.2, 1.4, 2.2, 0.2 and 3.4, and the shortest distance to the target is 5.2.
Scanpath ratio = (1.2 + 1.4 + 2.2 + 0.2 + 3.4)/5.2 = 8.4/5.2 ≈ 1.62

Notes on Measures
There are many, many measures that can be run. Just because you can run them, it doesn’t mean that you should. Focus on running only the measures that address your research questions, and avoid running or reporting additional ones for the sake of it (i.e., avoid fishing!).

Data Viewer Reports

Fixation Report
One row of data for every fixation in your study (per trial, per participant). You will typically need to use the fixation report if you are running visual search/scene perception studies. Use fixation reports to filter out fixations that coincide with other events, such as display changes, button-press responses, etc. This can be done by filtering using the Interest Period (as you’ll see in the tutorials), but often you’ll end up removing some fixations you still want. Fixation reports can also be used to re-compute the size of interest areas and capture fixations that fell just outside of interest areas.

Fixation Report – Important Columns
- RECORDING_SESSION_LABEL: the recording session ID
- TRIAL_INDEX: the trial number
- CURRENT_FIX_INDEX: the fixation ID (index) for the current fixation
- CURRENT_FIX_DURATION: the duration of the current fixation
- CURRENT_FIX_BUTTON_PRESS_X: the time during the current fixation that a button was pressed
- CURRENT_FIX_INTEREST_AREA_LABEL: the interest area label of the current fixation (“.” if the eyes are not on an IA)
- CURRENT_FIX_NEAREST_INTEREST_AREA_LABEL: the nearest IA to the eyes
- CURRENT_FIX_NEAREST_INTEREST_AREA_DISTANCE: the distance to the CENTRE of the nearest IA

You can also get NEXT_ and PREVIOUS_ versions of all of these measures.

Interest Area Report
One row of data for every interest area in your study (per trial, per participant). Reading researchers typically use this type of report. They typically set the interest period to the time period of the trial itself, enabling the filtering out of any unnecessary fixations.

Interest Area Report – Important Columns
- RECORDING_SESSION_LABEL: the recording session ID
- TRIAL_INDEX: the trial number
- IA_DWELL_TIME: total time spent on the IA (the sum of all fixations on the IA)
- IA_FIRST_FIXATION_DURATION: often referred to as First Fixation Duration in reading research. The duration of the first fixation on the interest area (first pass only; if the target region is skipped, this will have no value)
- IA_FIRST_RUN_DWELL_TIME: often referred to as Gaze Duration in reading research. The sum of all fixations on the IA during the first pass. You also use this column to calculate Single Fixation Duration, after removing all occurrences where the IA region was fixated more than once
- IA_ID/IA_LABEL: the ID number and label for the interest area
- IA_REGRESSION_IN: returns 0 or 1
- IA_REGRESSION_IN_COUNT: returns the number of regressions into the IA
- IA_REGRESSION_OUT: returns 0 or 1
- IA_REGRESSION_OUT_COUNT: returns the number of regressions out of the IA
- IA_REGRESSION_PATH_DURATION: often referred to as Go-Past Time in reading research. The sum of all fixations that occur before passing to the right of the target interest area (to a greater-numbered IA_ID)
- IA_SKIP: returns 0 or 1

Message Report
One row of data for every message that occurred during the study (per trial, per participant). If you want an accurate view of when things happened during your study, the message report is the one to use. This is particularly important for gaze-contingent studies where display changes occur. You can technically get most of the messages from the fixation report; however, some messages do get missed from it.

Message Report – Important Columns
- RECORDING_SESSION_LABEL: the recording session ID
- TRIAL_INDEX: the trial number
- CURRENT_MSG_LABEL: the message label
- CURRENT_MSG_TEXT: the message text details
- CURRENT_MSG_TIME: the time the message occurred

Sample Report
One row of data for every sample recorded by the eye-tracker during the study (per trial, per participant). If you have your EyeLink running at 1000 Hz, that gives you 1,000 rows of data per second of recording, so sample reports are typically tens of millions of rows in size. You’ll only need to use a sample report if you have certain highly customised setups (e.g., moving displays) or want an idea of millisecond-by-millisecond pupil size (as is the case in pupillometry).

The Organise-Analyse-Visualise Approach in R

Data
In the past, data could easily be organised in Excel, analysed in SPSS, and visualised in SPSS/Excel/SigmaPlot. With the size and complexity of eye-tracking studies, this is no longer really possible. We can now do all three steps in R, making the transition between them easier:
- Organise: data.table
- Analyse: ezANOVA
- Visualise: ggplot

Organising your Scripts for Reproducible Results
However you do things, it’s best to have a consistent approach to organising your R scripts. I have two types of script: ORGANISE__XYZ.R scripts that organise the data, and ANALYSE__XYZ.R scripts that analyse and visualise the data. However you set up your own R scripts, find an approach and stick to it. This makes it easier to copy and paste existing scripts, and being consistent means you can go back to old work and understand it more easily.

Organise: the data.table package
Why use data.table? It does things very quickly, and it extends (builds upon) data.frame objects, meaning that everything you can do to a data.frame object, you can do to a data.table. We’re now going to go through some examples of what it can do and how to use it. The example code will be given out later, so there’s no need to type or run through it now.

Create a data.frame
Create a normal data.frame. It lists different trials for a set of participants and gives you their RT (Reaction Time) in ms.

Convert data.frame to data.table
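The slides’ screenshots aren’t reproduced in this transcript, so here is a minimal sketch of what that might look like. The column names ppt, trial, trialType and RT follow the slides; the values are invented:

```r
library(data.table)

# A normal data.frame: trials for a set of participants, with RT in ms
df <- data.frame(
  ppt       = rep(1:2, each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(520, 610, 480, 650, 500, 590, 470, 700)
)

# Convert it to a data.table (or build one directly with data.table())
DT <- as.data.table(df)

# Because data.table extends data.frame, DT is still a data.frame too
class(DT)
```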

Add Keys
For large data sets you will want to set keys. When data are keyed, they can be processed faster. A key is set on one or more columns in your data.table; when a column is part of the key, data.table can group the data by that column more rapidly. In our example, let’s set participant ID (ppt) and trialType as keys using the setkey command, so we can group the data by these values more rapidly.
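A sketch of the setkey step, using the same invented toy data as above:

```r
library(data.table)
DT <- data.table(
  ppt       = rep(1:2, each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(520, 610, 480, 650, 500, 590, 470, 700)
)

# Key the table by participant and trial type: this sorts the rows and
# lets data.table locate and group by these columns much faster
setkey(DT, ppt, trialType)

key(DT)  # "ppt" "trialType"
```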

Basic Syntax
{WHERE} allows you to select only certain rows. In other words, you can get the command you run to focus only on the data cells WHERE certain conditions are met. {SELECT} is where you tell data.table what columns or values you want back. In other words, you SELECT certain values. {GROUPBY} allows you to group the output data in different ways; this is a bit like pivot tables in Excel.
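These three clauses map onto the three arguments of data.table’s `DT[i, j, by]` form: i is WHERE, j is SELECT, and by is GROUP BY. A minimal sketch with the invented toy data:

```r
library(data.table)
DT <- data.table(
  ppt       = rep(1:2, each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(520, 610, 480, 650, 500, 590, 470, 700)
)

# DT[WHERE, SELECT, GROUPBY]
result <- DT[trialType == "hard",   # WHERE:   rows where the condition holds
             .(meanRT = mean(RT)),  # SELECT:  the values you want back
             by = ppt]              # GROUPBY: one output row per participant
result
```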

Getting means
How about the mean RT overall? In other words, we are SELECTing the mean of the RT column.

Getting means
The overall RT isn’t that interesting, though. Let’s GROUP BY trialType: in other words, we are SELECTing the mean of the RT column but GROUPING BY the trialType column.

Getting means
Now let’s group by participant and trialType: in other words, we are SELECTing the mean of the RT column but GROUPING BY the trialType and ppt columns.

Getting means
But what if we only want the means for trials 3 and 4? How do we do that? We use WHERE! (Reminder: “==” means “is equal to”.) In other words, we are SELECTing the mean of the RT column, GROUPING BY the trialType and ppt columns, but only including values WHERE trial is 3 or 4.
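The four “getting means” queries above, as they might look in data.table (the slides’ screenshots aren’t reproduced here; the toy data are invented):

```r
library(data.table)
DT <- data.table(
  ppt       = rep(1:2, each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(520, 610, 480, 650, 500, 590, 470, 700)
)

m1 <- DT[, mean(RT)]                          # overall mean RT
m2 <- DT[, mean(RT), by = trialType]          # GROUP BY trialType
m3 <- DT[, mean(RT), by = .(ppt, trialType)]  # GROUP BY ppt and trialType
m4 <- DT[trial == 3 | trial == 4,             # WHERE trial is 3 or 4
         mean(RT),
         by = .(ppt, trialType)]
```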

Adding Columns
data.table also offers more convenient syntax for adding columns. For example, you can add a newColumn column with a value of 1, and you can combine this with WHERE and GROUP BY commands.
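A sketch of the column-adding syntax (data.table’s `:=` operator; column names and data are invented):

```r
library(data.table)
DT <- data.table(
  ppt       = rep(1:2, each = 4),
  trial     = rep(1:4, times = 2),
  trialType = rep(c("easy", "hard"), times = 4),
  RT        = c(520, 610, 480, 650, 500, 590, 470, 700)
)

# := adds (or updates) a column in place, "by reference"
DT[, newColumn := 1]

# Combined with WHERE: rows not matching the condition get NA in the new column
DT[trialType == "hard", slowTrial := RT > 600]

# Combined with GROUP BY: each row receives its group's mean
DT[, pptMeanRT := mean(RT), by = ppt]
```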

Joins and Merges
Suppose we forgot to include information relating to which condition each participant was in. How do we get that in there? We can use a join! A join is a special type of operation that combines two datasets. To do this, create a new data.table listing the participant ID and the condition, then follow the steps on the next slide. Joins (or merges) hunt down identical column names and then join the data from one table with that from another.

Performing the Join
Create a new data.table containing the condition information and set the keys. Then, to perform the join, it’s one simple command.
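A sketch of that join, with invented condition labels (cDT is the slides’ name for the condition table):

```r
library(data.table)

DT  <- data.table(ppt = rep(1:2, each = 2), RT = c(520, 610, 500, 590))
# The "forgotten" condition information: one row per participant
cDT <- data.table(ppt = 1:2, condition = c("control", "experimental"))

# Set matching keys on both tables...
setkey(DT, ppt)
setkey(cDT, ppt)

# ...and the join is one simple command: for each row of DT, look up
# the matching row of cDT via the shared key column (ppt)
joinedDT <- cDT[DT]

# merge(DT, cDT, by = "ppt") gives the same result via the merge interface
joinedDT
```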

We then have our joined-up data: DT joined with cDT gives joinedDT.


Other Types of Join
We’ve just done our first join! Note that we’ve just joined by one column, but there is no theoretical limit to how many columns you can join by at once. There are many types of join that you may want to use (e.g., left, right, natural, outer, full, Cartesian product, etc.). The main point is to make sure that the column names match in the tables you are trying to join, or else things will go horribly wrong.

Analysing Data Worked Example

Worked Example: Mean Fixation Durations (global)
Let’s begin by taking data from a fixation report. We’ll analyse it, compute mean fixation durations (global), run an ANOVA, and then plot a graph. The data and scripts required are on the website, but let’s walk through it together first.

Computing Mean Fixation Durations (global) Example from a fixation report
First we compute the by-trial, by-participant means; this gives us the mean fixation duration for each participant and each trial. Then we take the mean of these to get the means by participant.
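A sketch of that two-step aggregation, assuming the fixation report has the columns named earlier (the values here are invented):

```r
library(data.table)

# Hypothetical slice of a fixation report
fixRep <- data.table(
  RECORDING_SESSION_LABEL = rep(c("p1", "p2"), each = 4),
  TRIAL_INDEX             = rep(c(1, 1, 2, 2), times = 2),
  TRIAL_TYPE              = rep(c("easy", "easy", "hard", "hard"), times = 2),
  CURRENT_FIX_DURATION    = c(125, 130, 110, 90, 190, 150, 180, 220)
)

# Step 1: by-trial, by-participant means
trialMeans <- fixRep[, .(meanFix = mean(CURRENT_FIX_DURATION)),
                     by = .(RECORDING_SESSION_LABEL, TRIAL_INDEX, TRIAL_TYPE)]

# Step 2: average those over trials to get by-participant means
pptMeans <- trialMeans[, .(meanFix = mean(meanFix)),
                       by = .(RECORDING_SESSION_LABEL, TRIAL_TYPE)]
pptMeans
```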

Computing Mean Fixation Durations (global) Example from a fixation report
This is what we now have: each participant (RECORDING_SESSION_LABEL) grouped by TRIAL_TYPE, with a DV (mean fixation duration). What next?

Computing Mean Fixation Durations (global) Example from a fixation report
Now we analyse the data using ezANOVA! This comes from the ez package. Note: make sure that all columns that are factors in your ANOVA are coded as factors in R before proceeding.

Computing Mean Fixation Durations (global) Example from a fixation report
ezANOVA syntax:
- data: the data.table name
- dv: the dependent variable column
- wid: the column containing participant IDs
- within: a list of within-subjects factors
- between: a list of between-subjects factors

Computing Mean Fixation Durations (global) Example from a fixation report
Here, we want to see if the within-subjects variable TRIAL_TYPE influences fixation durations. Most of the output should be self-explanatory (it’s significant!). Note that ges is generalised eta-squared, a measure of effect size (remember: APA format wants effect sizes now). Be sure to cite the relevant paper when you use it.
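The slide’s code isn’t reproduced in this transcript, so here is a sketch of what the ezANOVA call could look like, with invented by-participant means standing in for the real data:

```r
library(ez)

# Hypothetical by-participant means; factors must be factors before running
pptMeans <- data.frame(
  RECORDING_SESSION_LABEL = factor(rep(paste0("p", 1:4), each = 2)),
  TRIAL_TYPE              = factor(rep(c("easy", "hard"), times = 4)),
  meanFix                 = c(118, 140, 122, 151, 110, 138, 125, 160)
)

res <- ezANOVA(data   = pptMeans,
               dv     = .(meanFix),                  # dependent variable
               wid    = .(RECORDING_SESSION_LABEL),  # participant IDs
               within = .(TRIAL_TYPE))               # within-subjects factor(s)

res$ANOVA  # F, p and ges (generalised eta-squared)
```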

Computing Mean Fixation Durations (global) Example from a fixation report
Let’s plot it! To produce a plot, we first use ezStats to get the descriptive means. The nice thing here is that ezStats has the same syntax as ezANOVA (i.e., you can copy/paste). Take a look at the values it returns.
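A sketch of the ezStats call, using the same invented by-participant means as above:

```r
library(ez)

pptMeans <- data.frame(
  RECORDING_SESSION_LABEL = factor(rep(paste0("p", 1:4), each = 2)),
  TRIAL_TYPE              = factor(rep(c("easy", "hard"), times = 4)),
  meanFix                 = c(118, 140, 122, 151, 110, 138, 125, 160)
)

# Same arguments as ezANOVA, so you can copy/paste the call
stats <- ezStats(data   = pptMeans,
                 dv     = .(meanFix),
                 wid    = .(RECORDING_SESSION_LABEL),
                 within = .(TRIAL_TYPE))
stats  # one row per TRIAL_TYPE, with N, Mean, SD and FLSD
```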

Computing Mean Fixation Durations (global) Example from a fixation report
Now, let’s plot it! We use ggplot to do the plotting.

The plotting code has several parts:
- the data.table containing the means for plotting
- setting up the aesthetics of the plot, with x being the values plotted along the x-axis and y being the value plotted on the y-axis
- drawing points (as opposed to bars/lines)
- controlling the axes and making the plot APA format
- saving the plot to disk
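Putting those parts together, here is a sketch of the plotting code. The descriptive means are invented stand-ins for the ezStats output, and theme_classic() is one way (an assumption, not the slide’s exact code) to get a plain, APA-friendly look:

```r
library(ggplot2)

# Stand-in for the ezStats output: one row per condition
stats <- data.frame(TRIAL_TYPE = c("easy", "hard"),
                    Mean       = c(118.75, 147.25))

p <- ggplot(stats, aes(x = TRIAL_TYPE, y = Mean)) +  # aesthetics: x and y
  geom_point(size = 3) +                             # draw points, not bars/lines
  coord_cartesian(ylim = c(100, 160)) +              # control the axes
  labs(x = "Trial Type",
       y = "Mean Fixation Duration (ms)") +
  theme_classic()                                    # plain, APA-friendly look

ggsave("meanFixDur.png", p, width = 4, height = 4)   # save the plot to disk
```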

Graphing with ggplot
There’s a very large number of options when plotting with ggplot; we will only cover very basic ones here. More information can be found in the ggplot2 documentation and elsewhere online.

Computing Mean Fixation Durations (local) Example from a fixation report
Next, we want to see if the within-subjects variable TRIAL_TYPE influences fixation durations AND if fixation durations differ for each interest area type. We have two types of interest area: TARGET and DISTRACTOR. We therefore compute local mean fixation durations, comparing target and distractor fixation durations. We also now need to remove fixations that did not fall on an interest area; the column to use is CURRENT_FIX_INTEREST_AREA_LABEL.

Computing Mean Fixation Durations (local) Example from a fixation report
Same process as before: compute by-trial means and then by-ppt means The only difference now is that we’re removing fixations that didn’t land on an interest area (i.e., WHERE CURRENT_FIX_INTEREST_AREA_LABEL is “.”) We’re also now GROUPING BY the CURRENT_FIX_INTEREST_AREA_LABEL column
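A sketch of that filtering and grouping step (“.” marks fixations that were not on any interest area; the data are invented):

```r
library(data.table)

# Hypothetical fixation report rows for one trial
fixRep <- data.table(
  RECORDING_SESSION_LABEL         = rep("p1", 6),
  TRIAL_INDEX                     = rep(1, 6),
  TRIAL_TYPE                      = rep("easy", 6),
  CURRENT_FIX_INTEREST_AREA_LABEL = c(".", "TARGET", "DISTRACTOR_A",
                                      ".", "TARGET", "DISTRACTOR_A"),
  CURRENT_FIX_DURATION            = c(125, 110, 130, 90, 190, 150)
)

# WHERE: drop fixations that didn't land on an interest area;
# GROUP BY now also includes the interest area label
trialMeans <- fixRep[CURRENT_FIX_INTEREST_AREA_LABEL != ".",
                     .(meanFix = mean(CURRENT_FIX_DURATION)),
                     by = .(RECORDING_SESSION_LABEL, TRIAL_INDEX, TRIAL_TYPE,
                            CURRENT_FIX_INTEREST_AREA_LABEL)]
trialMeans
```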

Computing Mean Fixation Durations (local) Example from a fixation report
Now it’s time to run the ANOVA This is done the same as before, just now we have one more within- subjects factor But the results are similar: only TRIAL_TYPE is significant

Computing Mean Fixation Durations (local) Example from a fixation report
Next, we get the means as before. Again, we are now adding CURRENT_FIX_INTEREST_AREA_LABEL to our list of grouping (within-subjects factor) columns.

Sneak Peek at the Graph
Note that this graph has two panels (or, in ggplot’s language, two facets): one for DISTRACTOR_A objects and one for TARGET objects. How do we get it to do that?

The facet_wrap command will create facets for every level of CURRENT_FIX_INTEREST_AREA_LABEL
You’re not limited to creating facets for only one column. Try out facet_wrap(TRIAL_TYPE~CURRENT_FIX_INTEREST_AREA_LABEL) and see what happens
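A sketch of the faceted plot, with invented descriptive means:

```r
library(ggplot2)

# Invented descriptive means: one row per condition x interest-area label
stats <- data.frame(
  TRIAL_TYPE                      = rep(c("easy", "hard"), times = 2),
  CURRENT_FIX_INTEREST_AREA_LABEL = rep(c("DISTRACTOR_A", "TARGET"), each = 2),
  Mean                            = c(140, 155, 150, 170)
)

p <- ggplot(stats, aes(x = TRIAL_TYPE, y = Mean)) +
  geom_point(size = 3) +
  facet_wrap(~ CURRENT_FIX_INTEREST_AREA_LABEL)  # one facet per IA label
p
```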

Writing it up
When writing up eye-tracking data, don’t just assume the reader knows why you examined each measure. Given the complexity and number of possible measures, it’s vital that you are extremely clear, both in your own head and when you write things up, about why each measure was examined and what that measure is telling you. If people start complaining that you’ve explained it too much and that it’s bordering on being patronising, then you’re doing it right.

Writing it up From Godwin, Hyde, Taunton, Calver, Blake & Liversedge (2013)
Simple approach:
- Begin by stating what the measure has been shown to demonstrate in the past
- Make a prediction for that measure in your own study
- Then describe how you examined it
- Finally, describe what it showed

Don’t just bombard the reader with F and t values.

Writing it up From Sheridan & Reingold (2013)


Writing it up From Fitzsimmons & Drieghe (2013)


The bigger picture This approach forms part of a larger picture when writing up your work Let’s just note a few pointers before finishing

The bigger picture: Introduction
- First paragraph: general context of the work; prelude to the main points
- Middle paragraphs: existing research on the topic, highlighting what has been missed or not done (either at all or perfectly) before
- Ending paragraphs: say how your work will overcome the limitations of previous work, clearly noting how what you have done fills a gap in the existing literature and human knowledge. Tell them why your work is awesome. State your research question(s). Applied relevance also gets noted if relevant
- Final paragraph(s): make a series of clear and direct predictions. State WHY you are examining each measure and PREDICT what you think each will show you

The bigger picture: Results
- First paragraph: describe what you are going to do in your results and why
- Second paragraph: describe how you cleaned your eye-tracking data
- Middle paragraphs: go through each of your measures in the same order as you predicted them in your introduction. For each one, state WHY you are analysing it, WHAT it shows you, and whether it confirms or rejects your predictions

The bigger picture: Discussion
- First paragraph: re-state what you did in the study and remind the reader of your goals and research questions
- Middle paragraphs: go through each of your measures in the same order as you predicted them in your introduction. For each one, state WHY you analysed it, what the outcome was, and WHAT THAT MEANS in relation to your predictions
- Later paragraphs: draw the results together for an overall picture. State applied implications if necessary. Suggest future studies that would be cool. Never end by saying something along the lines of “more research is needed.”

The rest of today
Next up: head to the website and go through the Part 4: Data Viewer section. Then go through the Part 5: Data Analysis section, which outlines the bits we’ve gone through above, plus some extra pieces here and there. That’s it.