Scene video and scanpath mapping#

In this tutorial, we will map gaze data from an eye-tracking recording to video frames, estimate a scanpath, and overlay the gaze fixations on the video. We will use the pyneon library to work with Neon eye-tracking recordings, which contain video and event data, including gaze information.


1. Setup: Loading a Neon Recording#

First, we load the Neon recording, which contains video and gaze data. Ensure that you have installed the required libraries such as pyneon and have the recording dataset available.

[1]:
# Import necessary libraries
import sys
import numpy as np
from pyneon import NeonDataset, NeonRecording
import pandas as pd

# Define path to the recording directory
recording_dir = "../../data/OpticalFlow/data"

# Load the recording
recording = NeonRecording(recording_dir)

# Print recording details
print(recording)

Recording ID: 9265f7c1-e3ce-4108-9dd1-639305591f79
Wearer ID: 990a3de0-5a1a-4f07-bf9b-d82369213dbf
Wearer name: Qian
Recording start time: 2024-07-24 16:31:22.223000
Recording duration: 96.226 s
                 exist              filename                                              path
3d_eye_states     True     3d_eye_states.csv     ..\..\data\OpticalFlow\data\3d_eye_states.csv
blinks            True            blinks.csv            ..\..\data\OpticalFlow\data\blinks.csv
events            True            events.csv            ..\..\data\OpticalFlow\data\events.csv
fixations         True         fixations.csv         ..\..\data\OpticalFlow\data\fixations.csv
gaze              True              gaze.csv              ..\..\data\OpticalFlow\data\gaze.csv
imu               True               imu.csv               ..\..\data\OpticalFlow\data\imu.csv
labels            True            labels.csv            ..\..\data\OpticalFlow\data\labels.csv
saccades          True          saccades.csv          ..\..\data\OpticalFlow\data\saccades.csv
world_timestamps  True  world_timestamps.csv  ..\..\data\OpticalFlow\data\world_timestamps.csv
scene_video_info  True     scene_camera.json     ..\..\data\OpticalFlow\data\scene_camera.json
scene_video       True             video.mp4             ..\..\data\OpticalFlow\data\video.mp4


2. Mapping Gaze Data to Video Frames#

In Neon recordings, gaze events are not natively synchronized with the scene video. To map gaze data to specific video frames, we can use the map_gaze_to_video method. This method requires the pyneon.video object to determine the video timestamps, the pyneon.fixations object to make use of the Pupil Labs fixation detection pipeline, and the pyneon.gaze object for improved temporal resolution of the gaze estimate.

By default, Neon reports each fixation with a single coordinate, computed as the average of all gaze coordinates over the interval denoted as a fixation. However, this clashes with the functional definition of a fixation used by Neon, namely tracking a fixed point in space.

Imagine looking at a fixed point, for example a street sign, while walking past it. Despite the movement of your body and the relative movement of the sign, the fixation will stay stabilised on the sign. As such, the average gaze coordinate over the entire duration will not correspond to the location of the sign, or of the fixation, at any given point in time. Feeding this point into an optical flow algorithm would, with high likelihood, lead to tracking anything but the sign.

Therefore, we use partial averages of the gaze locations around the respective frame’s timestamp. As the video is sampled at 30 Hz while the gaze output nominally reaches 200 Hz, we expect to average over roughly 6 subsequent gaze points per frame. This achieves a trade-off between the recency of the reported gaze position at a given frame and error minimisation, by averaging over microsaccades around the actual fixation target as well as random errors.
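
To make the idea concrete, the following is a minimal sketch of such per-frame averaging. It assumes the standard Neon CSV exports (gaze.csv and world_timestamps.csv with timestamp [ns] and gaze x/y [px] columns) and illustrates the principle rather than PyNeon’s actual implementation.

import pandas as pd

# Illustration only: average the gaze samples that fall within one frame
# period (~33 ms) around each scene-video timestamp. Column names follow
# the Neon CSV export convention and should be checked against your data.
gaze = pd.read_csv(f"{recording_dir}/gaze.csv")
frames = pd.read_csv(f"{recording_dir}/world_timestamps.csv")

half_window = 0.5 * 1e9 / 30  # half a frame period in nanoseconds

rows = []
for ts in frames["timestamp [ns]"]:
    window = gaze[
        (gaze["timestamp [ns]"] >= ts - half_window)
        & (gaze["timestamp [ns]"] < ts + half_window)
    ]
    rows.append(
        {
            "frame timestamp [ns]": ts,
            "gaze x [px]": window["gaze x [px]"].mean(),
            "gaze y [px]": window["gaze y [px]"].mean(),
            "n gaze samples": len(window),  # roughly 6 at 200 Hz gaze / 30 Hz video
        }
    )

frame_gaze = pd.DataFrame(rows)
print(frame_gaze.head())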

[2]:
# Map gaze data to the video timestamps
recording.map_gaze_to_video()

# Inspect the mapped gaze data
print(recording.mapped_gaze.tail())

After mapping, each frame gets a current gaze position as well as a fixation status. Currently, three types of fixation status are used:

  1. start denoting the first frame corresponding to a fixation

  2. during corresponding to intermediate frames of the same fixation

  3. end denoting the last frame of the fixation

This distinction will become relevant for tracking the scanpath with optical flow: while a fixation is still active, we get up-to-date gaze information; only after it ends does tracking become necessary.
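
As a small illustration of why the end frames matter, one could pull them out of the mapped data as seeds for the subsequent tracking step. The column name used below ("fixation status") is an assumption about the mapped_gaze layout and may differ in your version.

# Illustration only: frames on which a fixation ends are exactly the points
# that the optical flow step will need to keep tracking in later frames.
mapped = recording.mapped_gaze
fixation_ends = mapped[mapped["fixation status"] == "end"]
print(f"{len(fixation_ends)} fixations end during the recording")
print(fixation_ends.head())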


3. Estimating the Scanpath#

Having matched every frame with a gaze coordinate, we can now get into the meat of the scanpath estimation. In dynamic scenes, the same object will not occupy the same scene-camera location over time. Therefore, we need to continuously map past fixation points as long as they are still visible in the frame.

The estimate_scanpath method achieves this by feeding fixation points denoted as end into a Lucas-Kanade sparse optical flow algorithm. This algorithm compares the video in the vicinity of the point with the subsequent frame and updates the point’s location according to its movement. While a point is tracked, its status is flagged as tracked. In practice, many scene frames will contain multiple past fixations at once. Our implementation carries all of them and repeatedly performs an optical flow estimation for each point. Only when a point can no longer be tracked is it flagged as lost and subsequently dropped from the next frame.
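
For readers unfamiliar with Lucas-Kanade tracking, the sketch below shows the underlying OpenCV call on two consecutive scene frames. It is a simplified stand-in for what estimate_scanpath does internally; the window size, pyramid level, and starting coordinate are arbitrary choices for illustration.

import cv2
import numpy as np

def track_point(prev_gray, next_gray, point):
    # Track a single (x, y) coordinate from one frame to the next with
    # Lucas-Kanade sparse optical flow. Returns the new location and a status.
    prev_pts = np.array([[point]], dtype=np.float32)  # shape (1, 1, 2)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(90, 90), maxLevel=2,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03),
    )
    if status[0][0] == 1:
        return tuple(next_pts[0][0]), "tracked"
    return (np.nan, np.nan), "lost"

# Example: track an arbitrary point between the first two frames of the scene video
cap = cv2.VideoCapture(f"{recording_dir}/video.mp4")
ok1, frame1 = cap.read()
ok2, frame2 = cap.read()
cap.release()
if ok1 and ok2:
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)
    print(track_point(gray1, gray2, (800.0, 600.0)))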

It should be noted that this algorithm is not optimised for performance and will take a considerable amount of time to run on limited hardware. On our machines, it takes roughly half the duration of the video, though this benchmark depends heavily on the density of past fixation points and the available computational resources.

[3]:
# Estimate the scanpath based on the mapped gaze data
recording.estimate_scanpath()

# Save the estimated scanpath as a pickle and CSV
recording.estimated_scanpath.to_pickle("estimated_scanpath.pkl")
recording.estimated_scanpath.to_csv("estimated_scanpath.csv")

# Inspect the estimated scanpath
print(recording.estimated_scanpath.head())
   time                                          fixations
0  0.00    fixation id   x   y fixation status
0       ...
1  0.05    fixation id   x   y fixation status
1       ...
2  0.10    fixation id   x   y fixation status
2       ...
3  0.15    fixation id   x   y fixation status
3       ...
4  0.20    fixation id   x   y fixation status
4       ...

We should take a moment to understand the format of the estimated_scanpath property. Since we want a scanpath mapped to every single video frame, we create it as a dataframe of dataframes. Every row carries the timestamp of the underlying video frame and stores a dataframe in the fixations cell. In this inner dataframe, every present fixation is listed with an id, coordinates, and a fixation status, as seen below. The benefit of treating it as a dataframe is that we can use intuitive pandas indexing, allowing us, for example, to get the list of fixations at frame 1334. This frame corresponds to a turn with many lost fixations.

Because Neon takes some time to start up, the first frames will usually not yield any usable results. We still carry them for consistency.

[4]:
print(recording.estimated_scanpath["fixations"][1334])
   fixation id            x           y fixation status
0        102.0  1195.487429  495.576714          during
1         98.0    28.327660  473.548737         tracked
2         97.0   621.986206  691.748291         tracked
3         96.0   154.725586  919.655640         tracked
4         93.0   258.866272  648.244324         tracked
5         91.0    24.861883  335.163666         tracked
6         88.0          NaN         NaN            lost
7         81.0    88.962830  318.220032         tracked
8         80.0    11.042568  376.608856         tracked
9         79.0          NaN         NaN            lost
10        78.0          NaN         NaN            lost
11        74.0          NaN         NaN            lost

4. Understanding Fixation Status#

Each fixation is assigned a status that indicates its lifecycle:

  • start: first frame of a fixation

  • during: intermediate frame of a fixation

  • end: last frame of a fixation

  • tracked: the fixation has ended and is followed by the optical flow algorithm

  • lost: tracking has failed; the fixation is dropped from subsequent frames
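
To see this lifecycle in action, one can collect the rows belonging to a single fixation id across all frames of the estimated scanpath. The small helper below is illustrative and relies only on the structure shown above.

import pandas as pd

def fixation_trajectory(scanpath: pd.DataFrame, fixation_id: float) -> pd.DataFrame:
    # Gather the location and status of one fixation across all video frames.
    rows = []
    for time, fixations in zip(scanpath["time"], scanpath["fixations"]):
        match = fixations[fixations["fixation id"] == fixation_id]
        if not match.empty:
            row = match.iloc[0]
            rows.append(
                {"time": time, "x": row["x"], "y": row["y"],
                 "status": row["fixation status"]}
            )
    return pd.DataFrame(rows)

# Fixation 98 was still being tracked at frame 1334 in the output above
trajectory = fixation_trajectory(recording.estimated_scanpath, 98.0)
print(trajectory["status"].value_counts())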


5. Overlaying Fixations on the Video#

Now that we have the scanpath, we can overlay the gaze fixations on the video. This creates a video output with overlaid fixations, where:

  • A blue dot represents the current gaze location.

  • Green dots represent tracked fixations.

  • A red dot indicates no fixation (saccades or blinks).

Further, we draw connecting lines between past fixations to show the scanpath up to the current frame. The show_video option creates a live output of the video rendering, but also increases the runtime.

[5]:
# Overlay the scanpath on the video and show the output
recording.overlay_scanpath_on_video("../../data/OpticalFlow/test.mp4", show_video=True)

Summary#

  • Mapping Gaze to Video: We used the map_gaze_to_video method to match gaze data with video frames based on timestamps.

  • Estimating Scanpath: The scanpath was estimated using estimate_scanpath, which tracks fixations and uses optical flow to follow past fixations across scene changes.

  • Overlaying Fixations: The fixations were visualized on the video by calling overlay_scanpath_on_video.

This workflow can be used to process eye-tracking data, align it with video frames, and visualize gaze movements within video recordings.