Scene Video and Scanpath Mapping#

In this tutorial, we will map gaze data from an eye-tracking recording to video frames, estimate a scanpath, and overlay the gaze fixations on the video. We will use the pyneon library to work with Neon eye-tracking recordings, which contain video and event data, including gaze information.


1. Setup: Loading a Neon Recording#

First, we load the Neon recording, which contains video and gaze data. Ensure that you have installed the required libraries such as pyneon and have the recording dataset available.

[6]:
# Import necessary libraries
from pyneon import Dataset, get_sample_data, Stream

# Load a sample recording
dataset_dir = get_sample_data("markers", format="cloud")
dataset = Dataset(dataset_dir)

recording = dataset[1]
print(recording)

Data format: cloud (version: 2.5)
Recording ID: c17cd630-764e-4e61-87ee-95be3d6b8181
Wearer ID: 028e4c69-f333-4751-af8c-84a09af079f5
Wearer name: Pilot
Recording start time: 2025-09-22 00:31:44.395000
Recording duration: 35977000000 ns (35.977 s)


2. Mapping Gaze Data to Video Frames#

In Neon recordings, gaze samples are not inherently synchronized with the scene video frames. To map gaze data to specific video frames, we can use the sync_gaze_to_video method. It relies on the pyneon.video object to determine video timestamps, the pyneon.fixations object to make use of the Pupil Labs fixation detection pipeline, and the pyneon.gaze object for the higher time resolution of the gaze estimates.

By default, Neon reports each fixation with a single coordinate, computed as the average of all gaze coordinates over the interval labeled as a fixation. However, this clashes with the functional definition of a fixation used by Neon: tracking a fixed point in space, which may itself move across the scene camera image.

Imagine looking at a fixed point, for example a street sign, while you are walking past it. Despite the movement of your body and the relative movement of the sign, the fixation will be stabilized. As such, taking an average gaze coordinate over the entire duration will not correspond to the location of the sign, or the fixation, at any given point in time. Feeding this point into an optical flow algorithm would, with high likelihood, lead to tracking anything but the sign.

Therefore, we use partial averages of gaze locations around each frame’s timestamp. As the video is sampled at 30 Hz while the gaze output nominally reaches 200 Hz, we expect each average to cover about six to seven consecutive gaze samples (200 / 30 ≈ 6.7). This trades off recency of the reported gaze position at a given frame against error reduction: averaging smooths out microsaccades around the actual fixation target as well as random errors.
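To make the idea of partial averaging concrete, here is a minimal sketch on synthetic data. It is not PyNeon's implementation: the function name average_gaze_per_frame and the choice of a window of half a frame period on either side of each frame timestamp are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right


def average_gaze_per_frame(frame_ts, gaze_ts, gaze_xy, half_window_ns):
    """Average the gaze samples within +/- half_window_ns of each frame timestamp.

    Hypothetical helper for illustration. gaze_ts must be sorted. Returns one
    (x, y) tuple per frame, or None when no gaze sample falls in the window.
    """
    result = []
    for t in frame_ts:
        lo = bisect_left(gaze_ts, t - half_window_ns)
        hi = bisect_right(gaze_ts, t + half_window_ns)
        window = gaze_xy[lo:hi]
        if not window:
            result.append(None)
            continue
        n = len(window)
        mean_x = sum(x for x, _ in window) / n
        mean_y = sum(y for _, y in window) / n
        result.append((mean_x, mean_y))
    return result


# Synthetic data: gaze at 200 Hz (5 ms apart), video at 30 Hz, one second long.
gaze_ts = [i * 5_000_000 for i in range(200)]        # timestamps in ns
gaze_xy = [(float(i), 2.0 * i) for i in range(200)]  # gaze drifting along a line
frame_period_ns = 1_000_000_000 // 30
frame_ts = [k * frame_period_ns for k in range(30)]

per_frame = average_gaze_per_frame(
    frame_ts, gaze_ts, gaze_xy, half_window_ns=frame_period_ns // 2
)
print(per_frame[3])
```

With these rates, each window away from the start of the recording holds six or seven gaze samples, which matches the 200 / 30 ratio discussed above.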

[7]:
# Map gaze data to the video timestamps
synced_gaze = recording.sync_gaze_to_video()

The result assigns each frame a current gaze position as well as a fixation status. Currently, three types of fixation status are used:

  1. start denoting the first frame corresponding to a fixation

  2. during corresponding to intermediate frames of the same fixation

  3. end denoting the last frame of the fixation

This distinction becomes relevant when tracking the scanpath with optical flow. While a fixation is still active, we get up-to-date gaze information for it; tracking only becomes necessary after it ends.
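The three statuses can be illustrated with a small sketch that labels frames from fixation intervals. The function label_fixation_status and its input layout are hypothetical and only mirror the description above; how PyNeon handles edge cases, such as a fixation that covers a single frame, is not covered by this tutorial, so the choice made here is an assumption.

```python
def label_fixation_status(frame_ts, fixations):
    """Label each frame with the fixation it falls in and its status.

    Hypothetical helper for illustration. `fixations` is a list of
    (fixation_id, start_ns, end_ns) tuples that do not overlap.
    Returns one (fixation_id, status) pair per frame; frames outside any
    fixation get (None, None).
    """
    labels = [(None, None)] * len(frame_ts)
    for fix_id, start, end in fixations:
        inside = [i for i, t in enumerate(frame_ts) if start <= t <= end]
        for i in inside:
            labels[i] = (fix_id, "during")
        if inside:
            labels[inside[0]] = (fix_id, "start")
            # Assumption: a fixation covering a single frame ends up as "end",
            # so that it would still be handed over to tracking.
            labels[inside[-1]] = (fix_id, "end")
    return labels


# Ten frames, 10 time units apart, and two fixations.
frame_ts = [k * 10 for k in range(10)]
fixations = [(1, 10, 40), (2, 60, 65)]
labels = label_fixation_status(frame_ts, fixations)
for t, label in zip(frame_ts, labels):
    print(t, label)
```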

3. Estimating the Scanpath#

Having matched every frame with a gaze coordinate, we can now get into the core of scanpath estimation. In dynamic scenes, the same object will not occupy the same scene-camera location over time. Therefore, we need to continuously map past fixation points as long as they are still visible in the frame.

The estimate_scanpath method achieves this by feeding fixation points (those marked as end) into a Lucas-Kanade sparse optical flow algorithm. This algorithm compares the video region around each point with the subsequent frame, updating the point location according to its motion. While a point is tracked, its status is set to tracked. In practice, many scene frames will contain multiple past fixations; our implementation tracks them and repeatedly performs an optical flow estimation for each point. When a point can no longer be tracked it is marked lost and dropped for subsequent frames.

Note: this algorithm is not optimized for performance and may take considerable time on limited hardware. In the run shown below, it processed the 1044 frames of a 36 s recording in about 35 s, which is roughly real time, but the speed depends heavily on the density of past fixation points and the available computational resources.
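For readers unfamiliar with Lucas-Kanade, the following sketch shows the core least-squares step on a synthetic image pair using NumPy only. It is a single-level, single-iteration illustration and not the code that estimate_scanpath runs; practical implementations typically add image pyramids and iterative refinement. The helper names gaussian_blob and lucas_kanade_step are made up for this example.

```python
import numpy as np


def gaussian_blob(size, center, sigma):
    """Return a size x size image with a Gaussian blob at center = (x, y)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma**2))


def lucas_kanade_step(prev, curr, point, half_win):
    """Estimate the (dx, dy) motion of the patch around `point` between frames.

    Solves grad_x * dx + grad_y * dy = -grad_t in the least-squares sense over
    a square window of side 2 * half_win + 1 centered on `point` = (x, y).
    """
    x, y = point
    rows = slice(y - half_win, y + half_win + 1)
    cols = slice(x - half_win, x + half_win + 1)
    grad_y, grad_x = np.gradient(prev)  # axis 0 is y (rows), axis 1 is x (cols)
    grad_t = curr - prev                # temporal difference between frames
    A = np.column_stack([grad_x[rows, cols].ravel(), grad_y[rows, cols].ravel()])
    b = -grad_t[rows, cols].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow


# A blob that moves by (0.6, -0.4) pixels between two synthetic frames.
prev_frame = gaussian_blob(41, (20.0, 20.0), sigma=5.0)
curr_frame = gaussian_blob(41, (20.6, 19.6), sigma=5.0)
flow = lucas_kanade_step(prev_frame, curr_frame, point=(20, 20), half_win=15)
print(flow)  # close to the true shift of (0.6, -0.4)
```

The estimate is only accurate for small motions relative to the size of the image structure, which is one reason tracked points can eventually be lost.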

[8]:
# Estimate the scanpath based on the mapped gaze data
from pyneon.video import estimate_scanpath

scanpath_df = estimate_scanpath(recording.scene_video, synced_gaze)
scanpath_df.index.name = "timestamp [ns]"
scanpath = Stream(scanpath_df)

# Inspect the estimated scanpath
print(scanpath.data.head())
Estimating scanpath: 100%|██████████| 1044/1044 [00:35<00:00, 29.01it/s]
                                                             fixations  \
timestamp [ns]
1758493904395000000    fixation id gaze x [px] gaze y [px] fixation...
1758493904445000000    fixation id gaze x [px] gaze y [px] fixation...
1758493904495000000    fixation id gaze x [px] gaze y [px] fixation...
1758493904545000000    fixation id gaze x [px] gaze y [px] fixation...
1758493904595000000    fixation id gaze x [px] gaze y [px] fixation...

                     frame index
timestamp [ns]
1758493904395000000            0
1758493904445000000            1
1758493904495000000            2
1758493904545000000            3
1758493904595000000            4

Let us take a moment to understand the format of scanpath.data. To map a scanpath to every video frame, we build it as a dataframe of dataframes. Each row is indexed by a timestamp, holds the frame index of the underlying video, and stores a dataframe in its fixations cell. In that nested dataframe, every fixation present in the frame has an id, coordinates, and a fixation status. Treating it as a dataframe enables intuitive pandas indexing; for example, you can get the list of fixations at frame 500.

Because Neon can take some time to start, the first frames usually do not yield usable results. We keep them for consistency.

[9]:
# Print the fixations for video frame 500. "frame index" is the index of the video frame, not the index of the dataframe.
print(scanpath.data.loc[scanpath.data["frame index"] == 500, "fixations"].values[0])
  fixation id  gaze x [px] gaze y [px] fixation status
0          32   794.401571  622.680714          during
1          31    828.15155  648.225586         tracked
2          20   319.503265  492.966797         tracked
3          19   372.389526  775.100464         tracked
4          18  1205.629761   745.42688         tracked
5          17  1214.341553  401.155457         tracked
6          16  1181.176392  460.626556         tracked
7          10    415.88559  809.316284         tracked
8           7  1187.457275  749.382629         tracked
9           5  1210.753784  408.591949         tracked

4. Understanding Fixation Status#

Each fixation is assigned a status that indicates its lifecycle:

  • start: first frame of fixation

  • during: intermediate frames of fixation

  • end: last frame of fixation

  • tracked: the fixation has ended and is followed by the optical flow algorithm

  • lost: tracking failed; the fixation is dropped from subsequent frames
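One way to read this lifecycle is as a small state machine. The transition table below is an interpretation of the status descriptions in this tutorial, not something taken from the PyNeon source; in particular, allowing a fixation to go from end directly to lost is an assumption.

```python
# Hypothetical transition table derived from the status descriptions above.
ALLOWED_NEXT = {
    "start": {"during", "end"},
    "during": {"during", "end"},
    "end": {"tracked", "lost"},  # assumption: tracking may fail immediately
    "tracked": {"tracked", "lost"},
    "lost": set(),               # a lost fixation is dropped
}


def is_valid_lifecycle(statuses):
    """Return True if consecutive statuses of one fixation follow ALLOWED_NEXT."""
    return all(b in ALLOWED_NEXT[a] for a, b in zip(statuses, statuses[1:]))


print(is_valid_lifecycle(["start", "during", "end", "tracked", "tracked", "lost"]))
print(is_valid_lifecycle(["start", "tracked"]))
```

A check like this can help when inspecting the per-frame fixation tables, for example to spot a fixation that appears as tracked without ever having been marked as end.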


5. Overlaying Fixations on the Video#

Now that we have the scanpath, we can overlay the gaze fixations on the video. This creates a video output with overlaid fixations, where:

  • A blue dot represents the current gaze location.

  • Green dots represent tracked fixations.

  • A red dot indicates no fixation (saccades or blinks).

Further, we draw connecting lines between past fixations to show the scanpath. The show_video option displays a live preview while the video is rendered, but also increases the runtime.

[10]:
# Overlay the scanpath on the video and show the output
from pyneon.vis import overlay_scanpath

overlay_scanpath(
    recording.scene_video,
    scanpath.data,
    circle_radius=10,
    line_thickness=2,
    text_size=1,
    max_fixations=10,
    show_video=True,
    output_path=None,
)
Plotting scanpath on scene video: 100%|██████████| 1044/1044 [00:25<00:00, 40.55it/s]

Summary#

  • Mapping Gaze to Video: We used the sync_gaze_to_video method to match gaze data with video frames based on timestamps.

  • Estimating Scanpath: The scanpath was estimated using pyneon.video.estimate_scanpath, which tracks fixations and uses optical flow to follow past fixations across scene changes.

  • Overlaying Fixations: The fixations were visualized on the video by calling pyneon.vis.overlay_scanpath.

This workflow can be used to process eye-tracking data, align it with video frames, and visualize gaze movements within video recordings.