Interpolate data and concatenate streams#

Raw Neon data is not always easy to work with. A data stream (e.g., gaze, eye states, IMU) might not have been sampled at a constant rate. Additionally, streams may have differing sampling rates and lack synchronized start timestamps. All of these issues add complexity to continuous data analysis and sensor fusion.

This tutorial demonstrates how to deal with these issues by interpolating data streams and concatenating them into a single DataFrame. We will use the same boardView dataset as in the previous tutorial.

[1]:
import numpy as np
from pyneon import get_sample_data, NeonRecording

import matplotlib.pyplot as plt
import seaborn as sns

recording_dir = (
    get_sample_data("boardView")
    / "Timeseries Data + Scene Video"
    / "boardview1-d4fd9a27"
)

We now access raw data from gaze, eye states, and IMU streams.

[2]:
recording = NeonRecording(recording_dir)
gaze = recording.gaze
eye_states = recording.eye_states
imu = recording.imu

Irregular sampling in data streams#

Data points from each stream are indexed by timestamp [ns], which denotes the UTC time of the sample in nanoseconds. Are these samples uniformly distributed over time? We can examine the initial samples from each stream to assess their distribution, where, due to device boot-up, the sampling may be irregular.

[3]:
# Take the first 0.3 seconds of gaze data
gaze_begin = gaze.crop(0, 0.3, by="time")
# And the corresponding eye states and IMU data
eye_states_begin = eye_states.restrict(gaze_begin)
imu_begin = imu.restrict(gaze_begin)


# Define a function to plot the timestamps of the gaze, eye states, and IMU data
def plot_timestamps(gaze, eye_states, imu, concat_stream=None):
    _, ax = plt.subplots(figsize=(8, 2))
    ax.scatter(gaze.ts, np.ones_like(gaze.ts), s=5)
    ax.scatter(eye_states.ts, np.ones_like(eye_states.ts) * 2, s=5)
    ax.scatter(imu.ts, np.ones_like(imu.ts) * 3, s=5)
    # If a concatenated stream (explained later) is provided, plot its timestamps as well
    if concat_stream is not None:
        ax.scatter(concat_stream.ts, np.ones_like(concat_stream.ts) * 4, s=5)
        ax.set_yticks([1, 2, 3, 4])
        ax.set_yticklabels(["Gaze", "Eye states", "IMU", "Concatenated"])
        ax.set_ylim(0.5, 4.5)
    else:
        ax.set_yticks([1, 2, 3])
        ax.set_yticklabels(["Gaze", "Eye states", "IMU"])
        ax.set_ylim(0.5, 3.5)
    ax.set_xlabel("Timestamp (ns)")
    plt.show()


plot_timestamps(gaze_begin, eye_states_begin, imu_begin)
../_images/tutorials_interpolate_and_concat_5_0.png

As shown in the figure above, in addition to the apparently later onset of IMU data, the first 0.3 second of gaze and eye states data indeed suffers from dropouts.

In addition to dropouts, irregular sampling may also occur, especially for IMU data from our experience. For example, it can be observed in the middle of this recording (5 - 5.3 seconds):

[4]:
gaze_middle = gaze.crop(5, 5.3, by="time")
eye_states_middle = eye_states.restrict(gaze_middle)
imu_middle = imu.restrict(gaze_middle)

plot_timestamps(gaze_middle, eye_states_middle, imu_middle)
../_images/tutorials_interpolate_and_concat_7_0.png

How frequent are these irregularities? We now examine the distribution of the time differences between consecutive samples, and compare them to the expected time difference for a regular, nominal (as specified by Pupil Labs) sampling rate.

[5]:
print(
    f"Nominal sampling frequency of gaze: {gaze.sampling_freq_nominal} Hz. "
    f"Actual: {gaze.sampling_freq_effective:.1f} Hz"
)
print(
    f"Nominal sampling frequency of eye states: {eye_states.sampling_freq_nominal} Hz. "
    f"Actual: {eye_states.sampling_freq_effective:.1f} Hz"
)
print(
    f"Nominal sampling frequency of IMU: {imu.sampling_freq_nominal} Hz. "
    f"Actual: {imu.sampling_freq_effective:.1f} Hz"
)

fig, axs = plt.subplots(3, 1, tight_layout=True)

axs[0].hist(gaze.ts_diff, bins=50)
axs[0].axvline(1e9 / gaze.sampling_freq_nominal, c="red", label="Nominal")
axs[0].set_title("Gaze")

axs[1].hist(eye_states.ts_diff, bins=50)
axs[1].axvline(1e9 / eye_states.sampling_freq_nominal, c="red", label="Nominal")
axs[1].set_title("Eye states")

axs[2].hist(imu.ts_diff, bins=50)
axs[2].axvline(1e9 / imu.sampling_freq_nominal, c="red", label="Nominal")
axs[2].set_title("IMU")
axs[2].set_xlabel("Time difference [ns]")

for i in range(3):
    axs[i].set_yscale("log")
    axs[i].set_ylabel("Counts")
    axs[i].legend()
plt.show()
Nominal sampling frequency of gaze: 200 Hz. Actual: 199.4 Hz
Nominal sampling frequency of eye states: 200 Hz. Actual: 199.4 Hz
Nominal sampling frequency of IMU: 110 Hz. Actual: 113.9 Hz
../_images/tutorials_interpolate_and_concat_9_1.png

For gaze and eye states data, the empirical distribution of time differences is close to the expected value. However, some integer multiples of the nominal sampling rate suggest possible eye video frame drops. For IMU data, the distribution is much wider.

Interpolating data streams#

Given the presence of irregular sampling, if you want to perform analyses that assume continuous data streams, interpolation is necessary. PyNeon uses the scipy.interpolate.interp1d (API reference) function to interpolate data streams. The interpolate() method of NeonStream creates a new object with interpolated data.

With the default parameters, the interpolated data will have the same start timestamp as the original data, and the sampling rate is set to the nominal sampling frequency specified by Pupil Labs (200 Hz for gaze and eye states, 110 Hz for IMU).

[6]:
# Interpolate to the nominal sampling frequency
gaze_resampled = gaze.interpolate()

# Three ways you can check if the resampling was successful:
# 1. Compare the effective sampling frequency to the nominal sampling frequency
print(
    f"Nominal sampling frequency of gaze: {gaze_resampled.sampling_freq_nominal} Hz. "
    f"Actual (after interpolation): {gaze_resampled.sampling_freq_effective:.1f} Hz"
)
# 2. Check the number of unique time differences
print(f"Only one unique time difference: {np.unique(gaze_resampled.ts_diff)}")
# 3. Call the `is_uniformly_sampled` property (boolean)
print(
    f"The new gaze stream is uniformly sampled: {gaze_resampled.is_uniformly_sampled}"
)
print(gaze_resampled.data.dtypes)
Nominal sampling frequency of gaze: 200 Hz. Actual (after interpolation): 200.0 Hz
Only one unique time difference: [5000000]
The new gaze stream is uniformly sampled: True
gaze x [px]        float64
gaze y [px]        float64
worn                 Int32
fixation id          Int32
blink id             Int32
azimuth [deg]      float64
elevation [deg]    float64
dtype: object

Note that after intepolation, the data types of the columns are preserved.

Alternatively, one can also resample the gaze data to any desired timestamps by specifying the new_ts parameter. This is especially helpful when synchronizing different data streams. For example, we can resample the gaze data (~200Hz) to the timestamps of the IMU data (~110Hz).

[7]:
print(f"Original gaze data length: {len(gaze)}")
print(f"Original IMU data length: {len(imu)}")
gaze_resampled_to_imu = gaze.interpolate(new_ts=imu.ts)
print(f"Gaze data length after resampling to IMU: {len(gaze_resampled_to_imu)}")
Original gaze data length: 6091
Original IMU data length: 3459
Gaze data length after resampling to IMU: 3459

By default, float-type data is interpolated using the 'cubic' method. We are examine its behavior by comparing interpolated data with the raw data.

[8]:
gaze_resampled_begin = gaze_resampled.restrict(gaze_begin)

fig, ax = plt.subplots(figsize=(8, 2))
sns.lineplot(
    ax=ax,
    data=gaze_begin.data,
    x=gaze_begin.ts,
    y="gaze x [px]",
    marker="o",
    label="Raw",
)
sns.lineplot(
    ax=ax,
    data=gaze_resampled_begin.data,
    x=gaze_resampled_begin.ts,
    y="gaze x [px]",
    marker="^",
    alpha=0.6,
    label="Interpolated",
)
plt.legend()
plt.show()
../_images/tutorials_interpolate_and_concat_16_0.png

Concatenating different streams#

There might be cases where you want to concatenate different streams into a single one to facilitate further analysis (e.g., epoching). This is possible by interpolating the streams’ data to common timestamps and concatenating them into a single DataFrame.

The method concat_streams() of NeonRecording provides such functionality. It takes a list of stream names and resamples them to common timestamps, defined by the latest start and earliest end timestamps of the streams. The new sampling frequency can either be directly specified or taken from the lowest/highest sampling frequency of the streams.

In the following example, we concatenate the gaze, eye states, and IMU streams into a single DataFrame using the default parameters.

[9]:
concat_stream = recording.concat_streams(["gaze", "eye_states", "imu"])
print(concat_stream.data.head())
print(concat_stream.data.columns)
Concatenating streams:
        Gaze
        3D eye states
        IMU
Using lowest sampling rate: 110 Hz (['imu'])
Using latest start timestamp: 1732621490607650343 (['imu'])
Using earliest last timestamp: 1732621520979070343 (['gaze' '3d_eye_states'])
                     gaze x [px]  gaze y [px]  worn  fixation id  blink id  \
1732621490607650343   705.518843   554.990998     1            1      <NA>
1732621490616741252   704.882466   553.793144     1            1      <NA>
1732621490625832161   707.703787   556.712159     1            1      <NA>
1732621490634923070   711.389879   553.846843     1            1      <NA>
1732621490644013979   709.281775   555.543777     1            1      <NA>

                     azimuth [deg]  elevation [deg]  pupil diameter left [mm]  \
1732621490607650343      -7.085339         3.473196                  3.346414
1732621490616741252      -7.126717         3.550038                  3.363306
1732621490625832161      -6.944033         3.363048                  3.368352
1732621490634923070      -6.707339         3.547815                  3.365432
1732621490644013979      -6.842683         3.438366                  3.374732

                     pupil diameter right [mm]  eyeball center left x [mm]  \
1732621490607650343                   3.360563                  -32.282935
1732621490616741252                   3.359459                  -32.249418
1732621490625832161                   3.350918                  -32.216781
1732621490634923070                   3.361014                  -32.249155
1732621490644013979                   3.365781                  -32.234159

                     ...  acceleration x [g]  acceleration y [g]  \
1732621490607650343  ...           -0.067383           -0.340820
1732621490616741252  ...           -0.062619           -0.315917
1732621490625832161  ...           -0.052682           -0.329993
1732621490634923070  ...           -0.058795           -0.334090
1732621490644013979  ...           -0.060815           -0.322199

                     acceleration z [g]  roll [deg]  pitch [deg]   yaw [deg]  \
1732621490607650343            0.932129    1.923968   -20.230545  132.920122
1732621490616741252            0.925714    1.923949   -20.227639  132.924402
1732621490625832161            0.924432    1.920479   -20.228228  132.927574
1732621490634923070            0.931812    1.916587   -20.228839  132.931756
1732621490644013979            0.925476    1.916104   -20.227092  132.937429

                     quaternion w  quaternion x  quaternion y  quaternion z
1732621490607650343      0.395828     -0.085287     -0.154390      0.901227
1732621490616741252      0.395796     -0.085271     -0.154370      0.901246
1732621490625832161      0.395766     -0.085242     -0.154389      0.901258
1732621490634923070      0.395728     -0.085207     -0.154411      0.901275
1732621490644013979      0.395683     -0.085190     -0.154403      0.901297

[5 rows x 34 columns]
Index(['gaze x [px]', 'gaze y [px]', 'worn', 'fixation id', 'blink id',
       'azimuth [deg]', 'elevation [deg]', 'pupil diameter left [mm]',
       'pupil diameter right [mm]', 'eyeball center left x [mm]',
       'eyeball center left y [mm]', 'eyeball center left z [mm]',
       'eyeball center right x [mm]', 'eyeball center right y [mm]',
       'eyeball center right z [mm]', 'optical axis left x',
       'optical axis left y', 'optical axis left z', 'optical axis right x',
       'optical axis right y', 'optical axis right z', 'gyro x [deg/s]',
       'gyro y [deg/s]', 'gyro z [deg/s]', 'acceleration x [g]',
       'acceleration y [g]', 'acceleration z [g]', 'roll [deg]', 'pitch [deg]',
       'yaw [deg]', 'quaternion w', 'quaternion x', 'quaternion y',
       'quaternion z'],
      dtype='object')

We show an exemplary sampling of eye, imu and concatenated data below. It can be seen that the concatenated data is regularly sampled at the nominal sampling rate of IMU.

[10]:
concat_stream_middle = concat_stream.restrict(gaze_middle)
plot_timestamps(gaze_middle, eye_states_middle, imu_middle, concat_stream_middle)
../_images/tutorials_interpolate_and_concat_20_0.png

Despite different sampling from the original streams, the concatenated data still respects the original values of the data. For example, the gaze x [px] and acceleration x [g] data from the raw and concatenated stream are quite comparable.

[11]:
fig, axs = plt.subplots(2, 1, figsize=(8, 4), tight_layout=True)
sns.lineplot(
    ax=axs[0],
    data=gaze_middle.data,
    x=gaze_middle.ts,
    y="gaze x [px]",
    marker="o",
    label="Raw gaze",
)
sns.lineplot(
    ax=axs[0],
    data=concat_stream_middle.data,
    x=concat_stream_middle.ts,
    y="gaze x [px]",
    marker="^",
    label="Concatenated",
)
sns.lineplot(
    ax=axs[1],
    data=imu_middle.data,
    x=imu_middle.ts,
    y="acceleration x [g]",
    marker="o",
    label="Raw IMU",
)
sns.lineplot(
    ax=axs[1],
    data=concat_stream_middle.data,
    x=concat_stream_middle.ts,
    y="acceleration x [g]",
    marker="^",
    label="Concatenated",
)
plt.legend()
plt.show()
../_images/tutorials_interpolate_and_concat_22_0.png