{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Scene video and scanpath mapping\n", "\n", "In this tutorial, we will map gaze data from an eye-tracking recording to video frames, estimate a scanpath, and overlay the gaze fixations on the video. We will use the `pyneon` library to work with Neon eye-tracking recordings, which contain video and event data, including gaze information.\n", "\n", "---\n", "\n", "## 1. Setup: Loading a Neon Recording\n", "\n", "First, we load the Neon recording, which contains video and gaze data. Ensure that you have installed the required libraries such as `pyneon` and have the recording dataset available.\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\n", "Dataset | 1 recordings\n", "\n", "Recording ID: 9a141750-95ca-48ee-9693-53bbb896b87e\n", "Wearer ID: c4f68887-e96c-467f-a901-0fc9fce09c0a\n", "Wearer name: JGH\n", "Recording start time: 2025-06-16 12:49:27.817000\n", "Recording duration: 357.538s\n", " exist filename path\n", "3d_eye_states True 3d_eye_states.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\3d_eye_states.csv\n", "blinks True blinks.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\blinks.csv\n", "events True events.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\events.csv\n", "fixations True fixations.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\fixations.csv\n", "gaze True gaze.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\gaze.csv\n", "imu True imu.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\imu.csv\n", "labels True labels.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\labels.csv\n", "saccades True saccades.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\saccades.csv\n", "world_timestamps True world_timestamps.csv C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\world_timestamps.csv\n", "scene_video_info True scene_camera.json C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\scene_camera.json\n", "scene_video True 11f35cc2_0.0-357.538.mp4 C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\11f35cc2_0.0-357.538.mp4\n", "\n" ] } ], "source": [ "# Import necessary libraries\n", "import sys\n", "import numpy as np\n", "from pyneon import get_sample_data, Dataset, Recording\n", "import pandas as pd\n", "\n", "\n", "# Download sample data (if not existing) and return the path\n", "sample_dir = get_sample_data(\"Artworks\")\n", "print(sample_dir)\n", "\n", "dataset_dir = sample_dir / \"Timeseries Data + Scene Video\"\n", "dataset = Dataset(dataset_dir)\n", "print(dataset)\n", "\n", "recording = 
dataset[0]\n", "print(recording)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 2. Mapping Gaze Data to Video Frames\n", "\n", "In Neon recordings, gaze events are not naturally synchronized with the video. To map gaze data to specific video frames, we can use the `map_gaze_to_video` method. This method requires the `pyneon.video` object for determination of video timestamps, the `pyneon.fixations` object to make use of PupilLabs fixation detection pipeline and the `pyneon.gaze` object for improved time resolution of gaze estimation.\n", "\n", "By default, Neon reports fixations with a single coordinate. This is computed as average between all gaze coordinates over the interval dennoted as a fixation. However, this clashes with the funcional definition of a fixation as _tracking a fixed point in space_, used by Neon.\n", "\n", "Imagine looking at a fixed point, for example a street sign, while you are walking past it. Despite the movement of your body and the relative movement of the sign, the fixation will be stabilised. As such, taking an average gaze coordinate over the enntire duration will not correspond to the location of the sign, or the fixation, ar any given point in time. Feeding this point into an optical flow algorithm would, with high likelihood, lead to tracking anything but the sign.\n", "\n", "Therefore, we use partial averages of gaze locations around the respective frame's timestamp. As the video is sampled at 30Hz while the gaze output nominally reaches 200Hz, we expect to take the average over 6 subsequent gaze points. This achieves a trade-off between recency of the reported gaze position at the given frame and error minimisation, by averaging over microsaccades around the actual fixation target as well as random errors." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " gaze x [px] gaze y [px] worn fixation id blink id \\\n", "timestamp [ns] \n", "1750071325151422222 NaN NaN \n", "1750071325201422222 NaN NaN \n", "1750071325251422222 NaN NaN \n", "1750071325301422222 NaN NaN \n", "1750071325351422222 NaN NaN \n", "\n", " azimuth [deg] elevation [deg] frame_idx \n", "timestamp [ns] \n", "1750071325151422222 NaN NaN 10690 \n", "1750071325201422222 NaN NaN 10691 \n", "1750071325251422222 NaN NaN 10692 \n", "1750071325301422222 NaN NaN 10693 \n", "1750071325351422222 NaN NaN 10694 \n" ] } ], "source": [ "# Map gaze data to the video timestamps\n", "synced_gaze = recording.sync_gaze_to_video(overwrite=True)\n", "\n", "# Inspect the mapped gaze data\n", "print(synced_gaze.data.tail())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above, we can see that each frame gets a current gaze position as well as a fixation status. Currently, three types of fixation status are used:\n", "\n", "1. `start` denoting the first frame corresponding to a fixation\n", "2. `during` corresponding to intermediate frames of the same fixation\n", "3. `end` denoting the last frame of the fixation\n", "\n", "This determination will become relevant for tracking the scanpath with optical flow. After all, while a fixation is still active, we get up-to-date gaze information. Only after its end, tracking becomes necessary." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 3. Estimating the Scanpath\n", "\n", "Having matched every frame with a gaze coordinate, we can now get into the meat of the scanpath estimation. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "---\n",
"\n",
"## 3. Estimating the Scanpath\n",
"\n",
"Having matched every frame with a gaze coordinate, we can now get into the meat of the scanpath estimation. In dynamic scenes, the same object will not occupy the same scene-camera location over time. Therefore, we need to continuously map past fixation points for as long as they are still visible in the frame.\n",
"\n",
"The `estimate_scanpath` method achieves this by feeding fixation points flagged as `end` into a Lucas-Kanade sparse optical flow algorithm. This algorithm compares the image region around the point with the subsequent frame and updates the point's location according to the estimated motion. While a point is tracked, its status is flagged as `tracked`. In practice, many scene frames will contain multiple past fixations at once; our implementation carries all of them and repeatedly performs an optical flow estimation for each point. Only when a point can no longer be tracked is it flagged as `lost` and subsequently dropped from the next frame.\n",
"\n",
"It should be noted that this algorithm is not optimised for performance and will take a considerable amount of time to run on limited hardware. On our computers, it takes roughly half the duration of the video, though this benchmark depends heavily on the density of past fixation points and the available computational resources.\n" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading saved scanpath from C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\derivatives\\scanpath.pkl\n",
"                                                             fixations  \\\n",
"timestamp [ns]                                                           \n",
"1750070967817000000   fixation id  gaze x [px]  gaze y [px] fixation...  \n",
"1750070967867000000   fixation id  gaze x [px]  gaze y [px] fixation...  \n",
"1750070967917000000   fixation id  gaze x [px]  gaze y [px] fixation...  \n",
"1750070967967000000   fixation id  gaze x [px]  gaze y [px] fixation...  \n",
"1750070968017000000   fixation id  gaze x [px]  gaze y [px] fixation...  \n",
"\n",
"                     frame_idx  \n",
"timestamp [ns]                  \n",
"1750070967817000000          0  \n",
"1750070967867000000          1  \n",
"1750070967917000000          2  \n",
"1750070967967000000          3  \n",
"1750070968017000000          4  \n" ] } ], "source": [ "# Estimate the scanpath based on the mapped gaze data\n",
"scanpath = recording.estimate_scanpath()\n",
"\n",
"# Inspect the estimated scanpath\n",
"print(scanpath.data.head())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We should take a moment to understand the format of `scanpath.data`. Since we want a scanpath mapped onto every single video frame, we create it as a dataframe of dataframes. Every row carries both the timestamp and the frame index of the underlying video, and stores a dataframe in the `fixations` cell. In that inner dataframe, every fixation present in the frame is listed with an id, coordinates and a fixation status, as seen below. The benefit of this dataframe-of-dataframes structure is that we can use intuitive pandas indexing, allowing us, for example, to get the list of fixations present at frame 2000.\n",
"\n",
"As a quirk of Neon taking some time to start up, the first frames will usually not yield any usable results. Still, we carry them for consistency, as the next cell illustrates." ] },
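{ "cell_type": "markdown", "metadata": {}, "source": [ "To see this startup quirk, we can peek at the fixation table stored for the very first frame. This is a minimal pandas sketch on `scanpath.data`; the exact content of the earliest frames may differ between recordings." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal sketch: inspect the fixation table of the very first video frame.\n",
"# Early frames, recorded while Neon is still starting up, typically carry little usable data.\n",
"first_row = scanpath.data.iloc[0]\n",
"print(\"frame_idx:\", first_row[\"frame_idx\"])\n",
"print(first_row[\"fixations\"])" ] },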
" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " fixation id gaze x [px] gaze y [px] fixation status\n", "0 174 903.107571 428.153429 during\n", "1 173 924.962036 437.257629 tracked\n", "2 172 922.353638 493.392975 tracked\n", "3 171 960.446289 465.010101 tracked\n", "4 170 835.920654 440.765076 tracked\n", "5 169 833.580444 467.248718 tracked\n", "6 168 698.10437 662.543335 tracked\n", "7 167 693.992065 630.123047 tracked\n", "8 166 713.760315 561.713867 tracked\n", "9 165 543.353333 473.653381 tracked\n", "10 164 698.986816 486.05957 tracked\n", "11 163 704.265442 439.653992 tracked\n", "12 162 750.842773 539.532349 tracked\n" ] } ], "source": [ "# print fixations when column frame_idx is 1334. Frame_idx is not the idx of the dataframe, but the index of the video frame.\n", "print(scanpath.data.loc[scanpath.data[\"frame_idx\"] == 2000, \"fixations\"].values[0])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 4. Understanding Fixation Status\n", "\n", "Each fixation is assigned a status that indicates its lifecycle:\n", "\n", "- **start**: first frame of fixation\n", "- **during**: intermediate frames of fixation\n", "- **end**: last frame of fixation\n", "- **tracked**: Optical flow algorithm tracks fixation\n", "- **lost**: Tracking is lost, fixation is no longer tracked and gets dropped\n", "\n", "---\n", "\n", "## 5. Overlaying Fixations on the Video\n", "\n", "Now that we have the scanpath, we can overlay the gaze fixations on the video. This creates a video output with overlaid fixations, where:\n", "\n", "- A **blue dot** represents the current gaze location.\n", "- **Green dots** represent tracked fixations.\n", "- A **red dot** indicates no fixation (saccades or blinks).\n", "\n", "Further, we draw connecting lines between past fixations to show the scanpath for the current video. The show_video option creates a live-output of the video rendering, but also increases the runtime." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading saved scanpath from C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\derivatives\\scanpath.pkl\n", "Overlay video already exists at C:\\Users\\jan-gabriel.hartel\\Documents\\GitHub\\PyNeon\\data\\Artworks\\Timeseries Data + Scene Video\\artworks-9a141750\\derivatives\\scanpath.mp4; skipping render.\n", "`show_video=True` has no effect because rendering was skipped.\n" ] } ], "source": [ "# Overlay the scanpath on the video and show the output\n", "recording.overlay_scanpath(show_video=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Summary\n", "\n", "- **Mapping Gaze to Video**: We used the `map_gaze_to_video` method to match gaze data with video frames based on timestamps.\n", "- **Estimating Scanpath**: The scanpath was estimated using `estimate_scanpath`, which tracks fixations and uses optical flow to follow past fixations across scene changes.\n", "- **Overlaying Fixations**: The fixations were visualized on the video by calling `overlay_fixations_on_video`.\n", "\n", "This workflow can be used to process eye-tracking data, align it with video frames, and visualize gaze movements within video recordings.\n" ] } ], "metadata": { "kernelspec": { "display_name": "pyneon", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.3" } }, "nbformat": 4, "nbformat_minor": 2 }