{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Interpolate data and concatenate streams\n", "\n", "Informative as it is, raw Neon data is not always easy to work with. Different data streams (e.g., gaze, eye states, IMU) are sampled at different rates, don't necessarily share a common start timestamp, and within each stream data might not have been sampled at a constant rate. This tutorial demonstrates how to deal with these issues by interpolating data streams and concatenating them into a single DataFrame.\n", "\n", "We will use the same ``boardView`` dataset as in the [previous tutorial](read_recording.ipynb)." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from pyneon import get_sample_data, NeonRecording\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "recording_dir = (\n", " get_sample_data(\"boardView\")\n", " / \"Timeseries Data + Scene Video\"\n", " / \"boardview1-d4fd9a27\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now access raw data from gaze, eye states, and IMU streams." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "recording = NeonRecording(recording_dir)\n", "gaze = recording.gaze\n", "eye_states = recording.eye_states\n", "imu = recording.imu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Irregularly sampled data\n", "Data points from each stream are indexed by `timestamp [ns]`, which denotes the UTC time of the sample in nanoseconds. But are these samples equally spaced in time? Let's take a look at the first few samples of each stream, where, due to device boot-up, the sampling may be irregular." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Take the first 0.5 seconds of gaze data\n", "gaze_begin = gaze.crop(0, 0.5, by=\"time\")\n", "# And the corresponding eye states and IMU data\n", "eye_states_begin = eye_states.restrict(gaze_begin)\n", "imu_begin = imu.restrict(gaze_begin)\n", "\n", "\n", "# Define a function to plot the timestamps of the gaze, eye states, and IMU data\n", "def plot_timestamps(gaze, eye_states, imu):\n", " _, ax = plt.subplots(figsize=(8, 2))\n", " ax.scatter(gaze.ts, np.ones_like(gaze.ts), s=5)\n", " ax.scatter(eye_states.ts, np.ones_like(eye_states.ts) * 2, s=5)\n", " ax.scatter(imu.ts, np.ones_like(imu.ts) * 3, s=5)\n", " ax.set_yticks([1, 2, 3])\n", " ax.set_yticklabels([\"Gaze\", \"Eye states\", \"IMU\"])\n", " ax.set_ylim(0.5, 3.5)\n", " ax.set_xlabel(\"Timestamp (ns)\")\n", " plt.show()\n", "\n", "\n", "plot_timestamps(gaze_begin, eye_states_begin, imu_begin)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As apparent from the figure above, in addition to the apparently later onset of IMU data, during the first 0.5 seconds of the recording the data suffers from dropouts and irregular sampling. While this is a common issue for the first few samples due to device boot-up, it could also happen at any time during the recording. For example, even in the middle of a recording (5 - 5.5 seconds), some irregular sampling might still be observed:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gaze_middle = gaze.crop(5, 5.5, by=\"time\")\n", "eye_states_middle = eye_states.restrict(gaze_middle)\n", "imu_middle = imu.restrict(gaze_middle)\n", "\n", "plot_timestamps(gaze_middle, eye_states_middle, imu_middle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some irregular sampling in the IMU data is observed in this segment of the recording as well. How frequent are these irregularities? Let's take a look at the distribution of the time differences between consecutive samples, and compare them to the expected time difference for a regular, nominal (as specified by Pupil Labs) sampling rate." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nominal sampling frequency of gaze: 200 Hz. Actual: 199.36 Hz\n", "Nominal sampling frequency of eye states: 200 Hz. Actual: 199.36 Hz\n", "Nominal sampling frequency of IMU: 110 Hz. Actual: 113.88 Hz\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(\n", " f\"Nominal sampling frequency of gaze: {gaze.sampling_freq_nominal} Hz. \"\n", " f\"Actual: {gaze.sampling_freq_effective:.2f} Hz\"\n", ")\n", "print(\n", " f\"Nominal sampling frequency of eye states: {eye_states.sampling_freq_nominal} Hz. \"\n", " f\"Actual: {eye_states.sampling_freq_effective:.2f} Hz\"\n", ")\n", "print(\n", " f\"Nominal sampling frequency of IMU: {imu.sampling_freq_nominal} Hz. \"\n", " f\"Actual: {imu.sampling_freq_effective:.2f} Hz\"\n", ")\n", "\n", "fig, axs = plt.subplots(3, 1, tight_layout=True)\n", "\n", "axs[0].hist(gaze.ts_diff, bins=50)\n", "axs[0].axvline(1e9 / gaze.sampling_freq_nominal, c=\"red\", label=\"Nominal\")\n", "axs[0].set_title(\"Gaze\")\n", "\n", "axs[1].hist(eye_states.ts_diff, bins=50)\n", "axs[1].axvline(1e9 / eye_states.sampling_freq_nominal, c=\"red\", label=\"Nominal\")\n", "axs[1].set_title(\"Eye states\")\n", "\n", "axs[2].hist(imu.ts_diff, bins=50)\n", "axs[2].axvline(1e9 / imu.sampling_freq_nominal, c=\"red\", label=\"Nominal\")\n", "axs[2].set_title(\"IMU\")\n", "axs[2].set_xlabel(\"Time difference [ns]\")\n", "\n", "for i in range(3):\n", " axs[i].set_yscale(\"log\")\n", " axs[i].set_ylabel(\"Counts\")\n", " axs[i].legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For gaze and eye states data, the empirical distribution of time differences is close to the expected value (though with some integer multiples of the nominal sampling rate, which hints at possible eye video frame drops). For IMU data, the distribution is much wider." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpolating data streams\n", "\n", "Given the presence of irregular sampling, if you want to perform analyses that assume continuous data streams, interpolation is necessary. 
PyNeon uses the `scipy.interpolate.interp1d` [(API reference)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html) function to interpolate data streams. For instances of `NeonStream`, we can call `interpolate()`, which returns a copy of the object with its data interpolated using the default parameters." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Nominal sampling frequency of gaze: 200 Hz. Actual (after interpolation): 200.03 Hz\n", "Only one unique time difference: [5000000]\n", "The new gaze stream is uniformly sampled: True\n" ] } ], "source": [ "# Resample to the nominal sampling frequency\n", "gaze_resampled = gaze.interpolate()\n", "\n", "# Three ways you can check if the resampling was successful:\n", "# 1. Compare the effective sampling frequency to the nominal sampling frequency\n", "print(\n", " f\"Nominal sampling frequency of gaze: {gaze_resampled.sampling_freq_nominal} Hz. \"\n", " f\"Actual (after interpolation): {gaze_resampled.sampling_freq_effective:.2f} Hz\"\n", ")\n", "# 2. Check the number of unique time differences\n", "print(f\"Only one unique time difference: {np.unique(gaze_resampled.ts_diff)}\")\n", "# 3. Call the `is_uniformly_sampled` property (boolean)\n", "print(\n", " f\"The new gaze stream is uniformly sampled: {gaze_resampled.is_uniformly_sampled}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above example, we resampled the gaze data with default parameters, which means that the resampled data will have the same start and end timestamps as the original data, and the sampling rate is set to the nominal sampling frequency (200 Hz, as specified by Pupil Labs). Note that resampling does not change the data types of the columns. 
For example, the `bool`-type `worn` column and the integer-type `fixation_id` column are preserved.\n", "\n", "Alternatively, one can resample the gaze data to any desired timestamps by specifying the `new_ts` parameter. This is especially helpful when synchronizing different data streams. For example, we can resample the gaze data (~200 Hz) to the timestamps of the IMU data (~110 Hz)." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original gaze data length: 6091\n", "Original IMU data length: 3459\n", "Gaze data length after resampling to IMU: 3459\n" ] } ], "source": [ "print(f\"Original gaze data length: {gaze.data.shape[0]}\")\n", "print(f\"Original IMU data length: {imu.data.shape[0]}\")\n", "gaze_resampled_to_imu = gaze.interpolate(new_ts=imu.ts)\n", "print(\n", " f\"Gaze data length after resampling to IMU: {gaze_resampled_to_imu.data.shape[0]}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Concatenating different streams\n", "\n", "Building on this resampling method, different streams can be concatenated into a single DataFrame by resampling them to common timestamps. The method `concat_streams()` provides this functionality. It takes a list of stream names and resamples them to common timestamps, bounded by the latest start and earliest end timestamps of the streams. The new sampling frequency can either be specified directly or taken from the lowest/highest sampling frequency of the streams.\n", "\n", "In the following example, we will concatenate the gaze, eye states, and IMU streams into a single DataFrame using the default parameters (e.g., using the lowest sampling frequency of the streams)." 
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Concatenating streams:\n", "\tGaze\n", "\t3D eye states\n", "\tIMU\n", "Using lowest sampling rate: 110 Hz (['imu'])\n", "Using latest start timestamp: 1732621490607650343 (['imu'])\n", "Using earliest last timestamp: 1732621520979070343 (['gaze' '3d_eye_states'])\n", " gaze x [px] gaze y [px] worn fixation id blink id \\\n", "1732621490607650343 705.518843 554.990998 1 1 \n", "1732621490616741252 704.882466 553.793144 1 1 \n", "1732621490625832161 707.703787 556.712159 1 1 \n", "1732621490634923070 711.389879 553.846843 1 1 \n", "1732621490644013979 709.281775 555.543777 1 1 \n", "\n", " azimuth [deg] elevation [deg] pupil diameter left [mm] \\\n", "1732621490607650343 -7.085339 3.473196 3.346414 \n", "1732621490616741252 -7.126717 3.550038 3.363306 \n", "1732621490625832161 -6.944033 3.363048 3.368352 \n", "1732621490634923070 -6.707339 3.547815 3.365432 \n", "1732621490644013979 -6.842683 3.438366 3.374732 \n", "\n", " pupil diameter right [mm] eyeball center left x [mm] \\\n", "1732621490607650343 3.360563 -32.282935 \n", "1732621490616741252 3.359459 -32.249418 \n", "1732621490625832161 3.350918 -32.216781 \n", "1732621490634923070 3.361014 -32.249155 \n", "1732621490644013979 3.365781 -32.234159 \n", "\n", " ... acceleration x [g] acceleration y [g] \\\n", "1732621490607650343 ... -0.067383 -0.340820 \n", "1732621490616741252 ... -0.062619 -0.315917 \n", "1732621490625832161 ... -0.052682 -0.329993 \n", "1732621490634923070 ... -0.058795 -0.334090 \n", "1732621490644013979 ... 
-0.060815 -0.322199 \n", "\n", " acceleration z [g] roll [deg] pitch [deg] yaw [deg] \\\n", "1732621490607650343 0.932129 1.923968 -20.230545 132.920122 \n", "1732621490616741252 0.925714 1.923949 -20.227639 132.924402 \n", "1732621490625832161 0.924432 1.920479 -20.228228 132.927574 \n", "1732621490634923070 0.931812 1.916587 -20.228839 132.931756 \n", "1732621490644013979 0.925476 1.916104 -20.227092 132.937429 \n", "\n", " quaternion w quaternion x quaternion y quaternion z \n", "1732621490607650343 0.395828 -0.085287 -0.154390 0.901227 \n", "1732621490616741252 0.395796 -0.085271 -0.154370 0.901246 \n", "1732621490625832161 0.395766 -0.085242 -0.154389 0.901258 \n", "1732621490634923070 0.395728 -0.085207 -0.154411 0.901275 \n", "1732621490644013979 0.395683 -0.085190 -0.154403 0.901297 \n", "\n", "[5 rows x 34 columns]\n" ] } ], "source": [ "concat_stream = recording.concat_streams([\"gaze\", \"eye_states\", \"imu\"])\n", "print(concat_stream.data.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We show an exemplary sampling of eye, imu and concatenated data below. 
It can be seen that the IMU data has consecutive missing samples, which can in turn be filled in by interpolation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Select the same 5-5.3 s window from every stream.\n", "# Assumes `concat_stream` supports `crop()` and `restrict()` like the streams above.\n", "concat_slice = concat_stream.crop(5, 5.3, by=\"time\")\n", "gaze_slice = gaze.restrict(concat_slice)\n", "eye_states_slice = eye_states.restrict(concat_slice)\n", "imu_slice = imu.restrict(concat_slice)\n", "\n", "# Keep the underlying DataFrames for the next cell\n", "raw_imu_data_slice = imu_slice.data\n", "concat_data_slice = concat_slice.data\n", "\n", "# Plot the timestamps of all streams in the same scatter plot\n", "plt.figure(figsize=(15, 4))\n", "plt.scatter(\n", "    gaze_slice.ts,\n", "    np.zeros_like(gaze_slice.ts) + 2,\n", "    label=\"Raw gaze data\",\n", "    color=\"red\",\n", ")\n", "plt.scatter(\n", "    eye_states_slice.ts,\n", "    np.zeros_like(eye_states_slice.ts) + 1,\n", "    label=\"Raw eye states data\",\n", "    color=\"orange\",\n", ")\n", "plt.scatter(\n", "    imu_slice.ts,\n", "    np.zeros_like(imu_slice.ts),\n", "    label=\"Raw IMU data\",\n", "    color=\"blue\",\n", ")\n", "plt.scatter(\n", "    concat_slice.ts,\n", "    np.zeros_like(concat_slice.ts) - 1,\n", "    label=\"Concatenated data\",\n", "    color=\"green\",\n", ")\n", "# Put an x-tick at every concatenated timestamp and add gridlines\n", "plt.xticks(concat_slice.ts, labels=None)\n", "plt.yticks([])\n", "plt.grid()\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A linear interpolation allows us to estimate missing values. 
In the end, the concatenated DataFrame combines all continuous data in one place." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the raw IMU data and the interpolated data in the same plot\n", "plt.figure(figsize=(16, 3))\n", "plt.scatter(\n", " raw_imu_data_slice[\"time [s]\"],\n", " raw_imu_data_slice[\"acceleration z [g]\"],\n", " label=\"Raw IMU data\",\n", " color=\"blue\",\n", ")\n", "plt.plot(\n", " concat_data_slice[\"time [s]\"],\n", " concat_data_slice[\"acceleration z [g]\"],\n", " label=\"Interpolated IMU data\",\n", " color=\"green\",\n", ")\n", "plt.legend()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 2 }