Reading a Neon dataset/recording#
In this tutorial, we will show how to load a single Neon recording downloaded from Pupil Cloud.
Reading sample data#
We will use a sample recording produced by the NCC Lab called OfficeWalk. It's a project with 2 recordings and multiple enrichments, and it can be downloaded with the get_sample_data() function:
[1]:
import sys
from pyneon import get_sample_data, NeonDataset, NeonRecording
sample_dir = get_sample_data("OfficeWalk")
The OfficeWalk data has the following structure:
OfficeWalk
├── Timeseries Data
│ ├── walk1-e116e606
│ │ ├── info.json
│ │ ├── gaze.csv
│ │ └── ....
│ ├── walk2-93b8c234
│ │ ├── info.json
│ │ ├── gaze.csv
│ │ └── ....
│ ├── enrichment_info.txt
│ └── sections.csv
├── OfficeWalk_FACE-MAPPER_FaceMap
├── OfficeWalk_MARKER-MAPPER_TagMap_csv
└── OfficeWalk_STATIC-IMAGE-MAPPER_ManualMap_csv
The Timeseries Data folder contains what PyNeon calls a NeonDataset. It holds multiple recordings, each with its own info.json file and data files. These recordings can either be loaded individually as NeonRecordings or together as a whole NeonDataset.
If loading a NeonDataset, specify the path to the Timeseries Data folder to create a NeonDataset object:
[2]:
dataset_dir = sample_dir / "Timeseries Data"
dataset = NeonDataset(dataset_dir)
print(dataset)
NeonDataset | 2 recordings
NeonDataset has a recordings attribute that contains a list of NeonRecording objects. These NeonRecording objects can be accessed by their index.
[3]:
first_recording = dataset[0]
print(type(first_recording))
<class 'pyneon.recording.NeonRecording'>
Alternatively, one can directly load a single NeonRecording by specifying the path to the recording's folder:
[4]:
recording_dir = dataset_dir / "walk1-e116e606"
recording = NeonRecording(recording_dir)
print(type(recording))
<class 'pyneon.recording.NeonRecording'>
Data and metadata of a NeonRecording#
An overview of the basic metadata and contents of a NeonRecording can be obtained by printing the object. To stay memory-efficient, an initiated NeonRecording locates data files in the recording directory but does not load them until they are requested.
[5]:
print(recording)
Recording ID: e116e606-5f3f-4d34-8727-040b8762cef8
Wearer ID: bcff2832-cfcb-4f89-abef-7bbfe91ec561
Wearer name: Qian
Recording start time: 2024-08-30 17:37:01.527000
Recording duration: 98.213 s
exist filename path
3d_eye_states True 3d_eye_states.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\3d_eye_states.csv
blinks True blinks.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\blinks.csv
events True events.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\events.csv
fixations True fixations.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\fixations.csv
gaze True gaze.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\gaze.csv
imu True imu.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\imu.csv
labels True labels.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\labels.csv
saccades True saccades.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\saccades.csv
world_timestamps True world_timestamps.csv C:\Users\qian.chu\Documents\GitHub\pyneon\data\OfficeWalk\Timeseries Data\walk1-e116e606\world_timestamps.csv
scene_video_info False None None
scene_video False None None
As seen in the output, this recording contains every file other than the scene video. This is because we downloaded the “Timeseries Data” instead of “Timeseries Data + Scene Video” from Pupil Cloud. For more information on how to process video files, see the video tutorial.
Individual data streams can be accessed as properties of the NeonRecording object. For example, the gaze data can be accessed as recording.gaze, and upon first access, the tabular data is loaded into memory.
[6]:
print(f"recording._gaze size before accessing `gaze`: {sys.getsizeof(recording._gaze)}")
gaze = recording.gaze
print(f"recording.gaze is of type: {type(gaze)}")
print(f"recording._gaze size after accessing `gaze`: {sys.getsizeof(recording._gaze)}")
recording._gaze size before accessing `gaze`: 16
recording.gaze is of type: <class 'pyneon.stream.NeonGaze'>
recording._gaze size after accessing `gaze`: 48
On the other hand, if you try to access unavailable data, such as the video, it will simply return None.
[7]:
video = recording.video
print(video)
None
C:\Users\qian.chu\Documents\GitHub\pyneon\pyneon\recording.py:273: UserWarning: Scene video not loaded because no video or video timestamps file was found.
warnings.warn(
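Downstream code should therefore guard against missing streams before using them. A trivial sketch of this pattern, using None directly in place of an absent recording.video:

```python
# Stand-in for recording.video when no scene video was downloaded
video = None

# Guard before any video-based processing
if video is None:
    print("No scene video available; skipping video-based steps.")
else:
    print("Scene video loaded.")
```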
We can access the timeseries data of the gaze stream as a pandas DataFrame through its data attribute. The columns of the DataFrame include timestamp [ns] and channel data columns. During loading, PyNeon strips the redundant section id and recording id columns and adds a more human-readable time [s] column that represents the time of each sample in seconds relative to the start of the data stream.
[8]:
print(gaze.data.head())
timestamp [ns] gaze x [px] gaze y [px] worn fixation id blink id \
0 1725032224852161732 1067.486 620.856 True 1 <NA>
1 1725032224857165732 1066.920 617.117 True 1 <NA>
2 1725032224862161732 1072.699 615.780 True 1 <NA>
3 1725032224867161732 1067.447 617.062 True 1 <NA>
4 1725032224872161732 1071.564 613.158 True 1 <NA>
azimuth [deg] elevation [deg] time [s]
0 16.213030 -0.748998 0.000000
1 16.176285 -0.511733 0.005004
2 16.546413 -0.426618 0.010000
3 16.210049 -0.508251 0.015000
4 16.473521 -0.260388 0.020000
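The time [s] column can be reproduced from the raw timestamps. Here is a minimal sketch using hypothetical nanosecond timestamps (mimicking the first samples shown above), not a call into PyNeon itself:

```python
import pandas as pd

# Hypothetical nanosecond timestamps, similar to the gaze output above
ts = pd.Series(
    [1725032224852161732, 1725032224857165732, 1725032224862161732],
    dtype="Int64",
)

# time [s]: seconds relative to the first sample of the stream
time_s = (ts - ts.iloc[0]) / 1e9
print(time_s.tolist())  # [0.0, 0.005004, 0.01]
```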
PyNeon also automatically sets the columns to appropriate datatypes, such as Int64 for timestamps, Int32 for event IDs, and float64 for float data.
[9]:
print(gaze.data.dtypes)
timestamp [ns] Int64
gaze x [px] float64
gaze y [px] float64
worn bool
fixation id Int32
blink id Int32
azimuth [deg] float64
elevation [deg] float64
time [s] float64
dtype: object
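These nullable integer dtypes matter in practice: fixation id and blink id use pandas' &lt;NA&gt; for samples that belong to no event, which a plain int64 column cannot represent. A small sketch of filtering on such a column, with made-up values rather than the actual recording:

```python
import pandas as pd

# Made-up gaze samples; blink id is <NA> outside of blinks (nullable Int32)
df = pd.DataFrame(
    {
        "gaze x [px]": [1067.486, 1066.920, 1072.699, 1067.447],
        "blink id": pd.array([pd.NA, 1, 1, pd.NA], dtype="Int32"),
    }
)

# Keep only samples recorded outside of blinks
outside_blinks = df[df["blink id"].isna()]
print(len(outside_blinks))  # 2
```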
Data streams and events#
Up to this point, PyNeon simply reads and re-organizes the raw .csv files. Let's plot some samples from the gaze and imu streams, along with a saccade from the saccades events.
[10]:
import matplotlib.pyplot as plt
import seaborn as sns
gaze_color = "royalblue"
gyro_color = "darkorange"
imu = recording.imu
saccades = recording.saccades
# Create a figure
fig, ax = plt.subplots(figsize=(10, 5))
ax2 = ax.twinx()
ax.yaxis.label.set_color(gaze_color)
ax2.yaxis.label.set_color(gyro_color)
# Visualize the 2nd saccade
saccade = saccades.data.iloc[1]
ax.axvspan(
saccade["start timestamp [ns]"], saccade["end timestamp [ns]"], color="lightgray"
)
ax.text(
(saccade["start timestamp [ns]"] + saccade["end timestamp [ns]"]) / 2,
1050,
"Saccade",
horizontalalignment="center",
)
# Visualize gaze x and gyro x
sns.scatterplot(
ax=ax,
data=gaze.data.head(100),
x="timestamp [ns]",
y="gaze x [px]",
color=gaze_color,
)
sns.scatterplot(
ax=ax2,
data=imu.data.head(60),
x="timestamp [ns]",
y="gyro x [deg/s]",
color=gyro_color,
)
[10]:
<Axes: xlabel='timestamp [ns]', ylabel='gyro x [deg/s]'>
It's apparent that at the beginning of the recording, there are some missing data points in both the gaze and imu streams, presumably due to the time it takes for the sensors to start up and stabilize. We will show how to handle missing data via resampling in the next tutorial. For now, it's important to be aware of such gaps: assuming the data is continuously and evenly sampled requires great caution.
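One way to locate such gaps is to inspect the inter-sample intervals directly. A minimal sketch with hypothetical timestamps, assuming a nominal 5 ms sampling interval (i.e. 200 Hz):

```python
import pandas as pd

# Hypothetical nanosecond timestamps: a 5 ms grid with one 30 ms gap
ts = pd.Series([0, 5_000_000, 10_000_000, 40_000_000, 45_000_000], dtype="Int64")

# Inter-sample intervals in milliseconds
intervals_ms = (ts.diff().dropna() / 1e6).astype(float)
nominal_ms = 5.0

# Flag intervals noticeably longer than the nominal one
gaps = intervals_ms[intervals_ms > 1.5 * nominal_ms]
print(len(gaps))  # 1 gap (30 ms)
```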
PyNeon also calculates the effective (as opposed to the nominal) sampling frequency of each stream by dividing the number of samples by the duration of the recording.
[11]:
print(
f"Gaze: nominal sampling frequency = {gaze.sampling_freq_nominal}, "
f"effective sampling frequency = {gaze.sampling_freq_effective}"
)
print(
f"IMU: nominal sampling frequency = {recording.imu.sampling_freq_nominal}, "
f"effective sampling frequency = {recording.imu.sampling_freq_effective}"
)
Gaze: nominal sampling frequency = 200, effective sampling frequency = 197.8078038925275
IMU: nominal sampling frequency = 110, effective sampling frequency = 115.35532450871617
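The effective frequency can also be reproduced by hand from the timestamps. A sketch with hypothetical data, following the samples-divided-by-duration formula stated above (PyNeon's exact sample-count convention may differ by one):

```python
import pandas as pd

# Five hypothetical samples (ns) spanning 0.02 s
ts = pd.Series([0, 5_000_000, 10_000_000, 15_000_000, 20_000_000], dtype="Int64")

duration_s = float(ts.iloc[-1] - ts.iloc[0]) / 1e9  # 0.02 s
effective_freq = len(ts) / duration_s
print(effective_freq)  # 250.0
```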
Visualizing gaze heatmap#
Finally, we will show how to plot a heatmap of the gaze/fixation data.
[12]:
fig, ax = recording.plot_distribution()
C:\Users\qian.chu\Documents\GitHub\pyneon\pyneon\recording.py:273: UserWarning: Scene video not loaded because no video or video timestamps file was found.
warnings.warn(
We can clearly see that the recorded data shows a center bias, a well-known effect in gaze statistics. Along the y-axis, fixations tend to occur below the horizon, which is indicative of a walking task where the participant more often looks at the floor in front of them.