Reading a Neon dataset/recording#

In this tutorial, we will show how to load a single Neon recording downloaded from Pupil Cloud.

Reading sample data#

We will use a sample recording produced by the NCC Lab called OfficeWalk. It's a project with two recordings and multiple enrichments, and it can be downloaded with the get_sample_data() function:

[1]:
import sys
from pyneon import get_sample_data, NeonDataset, NeonRecording

sample_dir = get_sample_data("OfficeWalk")

The OfficeWalk data has the following structure:

OfficeWalk
├── Timeseries Data
│   ├── walk1-e116e606
│   │   ├── info.json
│   │   ├── gaze.csv
│   │   └── ....
│   ├── walk2-93b8c234
│   │   ├── info.json
│   │   ├── gaze.csv
│   │   └── ....
│   ├── enrichment_info.txt
│   └── sections.csv
├── OfficeWalk_FACE-MAPPER_FaceMap
├── OfficeWalk_MARKER-MAPPER_TagMap_csv
└── OfficeWalk_STATIC-IMAGE-MAPPER_ManualMap_csv

The Timeseries Data folder contains what PyNeon calls a NeonDataset. It contains multiple recordings, each with its own info.json file and data files. These recordings can either be loaded individually as NeonRecordings or together as a whole NeonDataset.

If loading a NeonDataset, specify the path to the Timeseries Data folder to create a NeonDataset object:

[2]:
dataset_dir = sample_dir / "Timeseries Data"
dataset = NeonDataset(dataset_dir)
print(dataset)
NeonDataset | 2 recordings

NeonDataset has a recordings attribute that holds a list of NeonRecording objects, which can be accessed by their index.

[3]:
first_recording = dataset[0]
print(type(first_recording))
<class 'pyneon.recording.NeonRecording'>
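
Since recordings is a plain Python list, one can also iterate over it directly, for example to inspect every recording in the dataset (a minimal sketch):

# dataset.recordings holds NeonRecording objects in a list
for rec in dataset.recordings:
    print(type(rec))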

Alternatively, one can directly load a single NeonRecording by specifying the path to the recording’s folder:

[4]:
recording_dir = dataset_dir / "walk1-e116e606"
recording = NeonRecording(recording_dir)
print(type(recording))
<class 'pyneon.recording.NeonRecording'>

Data and metadata of a NeonRecording#

An overview of the basic metadata and contents of a NeonRecording can be obtained by printing the object. An instantiated NeonRecording locates the data files in the recording directory but, to stay memory efficient, does not load them until they are requested.

[5]:
print(recording)

Recording ID: e116e606-5f3f-4d34-8727-040b8762cef8
Wearer ID: bcff2832-cfcb-4f89-abef-7bbfe91ec561
Wearer name: Qian
Recording start time: 2024-08-30 17:37:01.527000
Recording duration: 98.213 s
                  exist              filename                                                                                                           path
3d_eye_states      True     3d_eye_states.csv     C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\3d_eye_states.csv
blinks             True            blinks.csv            C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\blinks.csv
events             True            events.csv            C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\events.csv
fixations          True         fixations.csv         C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\fixations.csv
gaze               True              gaze.csv              C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\gaze.csv
imu                True               imu.csv               C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\imu.csv
labels             True            labels.csv            C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\labels.csv
saccades           True          saccades.csv          C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\saccades.csv
world_timestamps   True  world_timestamps.csv  C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\world_timestamps.csv
scene_video_info  False                  None                                                                                                           None
scene_video       False                  None                                                                                                           None

As seen in the output, this recording contains every file other than the scene video. This is because we downloaded the “Timeseries Data” instead of “Timeseries Data + Scene Video” from Pupil Cloud. For more information on how to process video files, see the video tutorial.

Individual data streams can be accessed as properties of the NeonRecording object. For example, the gaze data can be accessed as recording.gaze; upon first access, the tabular data is loaded into memory.

[6]:
print(f"recording._gaze size before accessing `gaze`: {sys.getsizeof(recording._gaze)}")

gaze = recording.gaze
print(f"recording.gaze is of type: {type(gaze)}")
print(f"recording._gaze size after accessing `gaze`: {sys.getsizeof(recording._gaze)}")
recording._gaze size before accessing `gaze`: 16
recording.gaze is of type: <class 'pyneon.stream.NeonGaze'>
recording._gaze size after accessing `gaze`: 48

On the other hand, accessing data that is unavailable, such as the scene video here, simply returns None.

[7]:
video = recording.video
print(video)
None
C:\Users\qian.chu\Documents\GitHub\pyneon\pyneon\recording.py:273: UserWarning: Scene video not loaded because no video or video timestamps file was found.
  warnings.warn(
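
Because unavailable streams come back as None instead of raising an error, downstream code can guard against them with a simple check. A minimal sketch (the processing step is left as a placeholder):

# Guard against recordings downloaded without the scene video
if recording.video is not None:
    pass  # process the scene video here
else:
    print("No scene video found; skipping video-based analysis.")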

We can access the timeseries data of the gaze stream as a pandas DataFrame through the stream's data attribute. The columns of the DataFrame include timestamp [ns] and the channel data columns. During loading, PyNeon strips the redundant section id and recording id columns and adds a more human-readable time [s] column that gives the time of each sample in seconds relative to the start of the stream.

[8]:
print(gaze.data.head())
        timestamp [ns]  gaze x [px]  gaze y [px]  worn  fixation id  blink id  \
0  1725032224852161732     1067.486      620.856  True            1      <NA>
1  1725032224857165732     1066.920      617.117  True            1      <NA>
2  1725032224862161732     1072.699      615.780  True            1      <NA>
3  1725032224867161732     1067.447      617.062  True            1      <NA>
4  1725032224872161732     1071.564      613.158  True            1      <NA>

   azimuth [deg]  elevation [deg]  time [s]
0      16.213030        -0.748998  0.000000
1      16.176285        -0.511733  0.005004
2      16.546413        -0.426618  0.010000
3      16.210049        -0.508251  0.015000
4      16.473521        -0.260388  0.020000
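
Because gaze.data is a regular pandas DataFrame, the usual pandas operations apply. For instance, using the fixation id column shown above, one can extract all gaze samples belonging to the first fixation (a minimal sketch):

# Select all gaze samples assigned to fixation 1
fixation_1 = gaze.data[gaze.data["fixation id"] == 1]
print(f"Fixation 1 comprises {len(fixation_1)} gaze samples")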

PyNeon also automatically assigns appropriate data types to the columns, such as nullable Int64 for timestamps, nullable Int32 for event IDs, and float64 for floating-point data.

[9]:
print(gaze.data.dtypes)
timestamp [ns]       Int64
gaze x [px]        float64
gaze y [px]        float64
worn                  bool
fixation id          Int32
blink id             Int32
azimuth [deg]      float64
elevation [deg]    float64
time [s]           float64
dtype: object
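
The nullable integer types are what allow event ID columns such as blink id to hold <NA> for samples that fall outside any event. This makes it straightforward to, for example, count how many gaze samples were recorded during blinks (a minimal sketch):

# Samples with a non-missing blink id occurred during a blink
n_blink = gaze.data["blink id"].notna().sum()
print(f"{n_blink} of {len(gaze.data)} gaze samples fall within a blink")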

Data streams and events#

Up to this point, PyNeon simply reads and re-organizes the raw .csv files. Let's plot some samples from the gaze and imu streams along with a saccade from the saccades events.

[10]:
import matplotlib.pyplot as plt
import seaborn as sns

gaze_color = "royalblue"
gyro_color = "darkorange"

imu = recording.imu
saccades = recording.saccades

# Create a figure
fig, ax = plt.subplots(figsize=(10, 5))
ax2 = ax.twinx()
ax.yaxis.label.set_color(gaze_color)
ax2.yaxis.label.set_color(gyro_color)

# Visualize the 2nd saccade
saccade = saccades.data.iloc[1]
ax.axvspan(
    saccade["start timestamp [ns]"], saccade["end timestamp [ns]"], color="lightgray"
)
ax.text(
    (saccade["start timestamp [ns]"] + saccade["end timestamp [ns]"]) / 2,
    1050,
    "Saccade",
    horizontalalignment="center",
)

# Visualize gaze x and gyro x
sns.scatterplot(
    ax=ax,
    data=gaze.data.head(100),
    x="timestamp [ns]",
    y="gaze x [px]",
    color=gaze_color,
)
sns.scatterplot(
    ax=ax2,
    data=imu.data.head(60),
    x="timestamp [ns]",
    y="gyro x [deg/s]",
    color=gyro_color,
)
[10]:
<Axes: xlabel='timestamp [ns]', ylabel='gyro x [deg/s]'>
../_images/tutorials_read_recording_19_1.png

It's apparent that at the beginning of the recording there are missing data points in both the gaze and imu streams, presumably due to the time it takes for the sensors to start up and stabilize. We will show how to handle missing data using resampling in the next tutorial. For now, it's important to be aware of such gaps and to exercise great caution before assuming that the data are continuously and evenly sampled.
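
One way to locate such gaps before resampling is to inspect the inter-sample intervals, which for the nominally 200 Hz gaze stream should be roughly 5 ms. Below is a minimal sketch; the 10 ms threshold is an arbitrary choice for illustration:

# Inter-sample intervals in milliseconds
intervals_ms = gaze.data["timestamp [ns]"].diff() / 1e6
# Flag intervals well above the nominal 5 ms spacing
gaps = intervals_ms[intervals_ms > 10]
print(f"{len(gaps)} intervals exceed 10 ms; the longest is {intervals_ms.max():.1f} ms")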

PyNeon also calculates the effective (as opposed to the nominal) sampling frequency of each stream by dividing the number of samples by the duration of the recording.

[11]:
print(
    f"Gaze: nominal sampling frequency = {gaze.sampling_freq_nominal}, "
    f"effective sampling frequency = {gaze.sampling_freq_effective}"
)
print(
    f"IMU: nominal sampling frequency = {recording.imu.sampling_freq_nominal}, "
    f"effective sampling frequency = {recording.imu.sampling_freq_effective}"
)
Gaze: nominal sampling frequency = 200, effective sampling frequency = 197.8078038925275
IMU: nominal sampling frequency = 110, effective sampling frequency = 115.35532450871617
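
As a sanity check, the effective frequency can be reproduced by hand from the data, following the definition above (the exact value may differ marginally depending on how the duration endpoints are counted):

# Number of samples divided by the stream duration in seconds
duration_s = gaze.data["time [s]"].iloc[-1] - gaze.data["time [s]"].iloc[0]
print(f"Manual estimate for gaze: {len(gaze.data) / duration_s:.2f} Hz")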

Visualizing gaze heatmap#

Finally, we will show how to plot a heatmap of the gaze/fixation data.

[12]:
fig, ax = recording.plot_distribution()
C:\Users\qian.chu\Documents\GitHub\pyneon\pyneon\recording.py:273: UserWarning: Scene video not loaded because no video or video timestamps file was found.
  warnings.warn(
../_images/tutorials_read_recording_23_1.png

We can clearly see that the recorded data show a centre bias, a well-known effect in eye-movement statistics. Along the y axis, fixations tend to occur below the horizon, which is indicative of a walking task in which the participant often looks at the floor ahead of them.
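
Since plot_distribution() returns standard matplotlib objects, the heatmap can be customized and saved like any other figure, for example (the title and file name here are just placeholders):

# Add a title and save the heatmap to disk
ax.set_title("Gaze distribution (OfficeWalk, walk1)")
fig.savefig("gaze_heatmap.png", dpi=300)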