Preprocessing module#

pyneon.preprocess.crop(data: DataFrame, tmin: Number | None = None, tmax: Number | None = None, by: Literal['timestamp', 'time'] = 'timestamp') DataFrame#

Crop data to a specific time range.

Parameters:
  • data (pd.DataFrame) – Data to crop. Must contain a monotonically increasing timestamp [ns] or time [s] column.

  • tmin (number, optional) – Start time or timestamp to crop the data to. If None, the minimum timestamp or time in the data is used. Defaults to None.

  • tmax (number, optional) – End time or timestamp to crop the data to. If None, the maximum timestamp or time in the data is used. Defaults to None.

  • by ("timestamp" or "time", optional) – Whether tmin and tmax are UTC timestamps in nanoseconds or relative times in seconds. Defaults to “timestamp”.

Returns:

Cropped data.

Return type:

pd.DataFrame
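
The cropping behaviour described above can be illustrated with a minimal pandas sketch. This is not pyneon's implementation: `crop_sketch` is a hypothetical stand-in, and treating both bounds as inclusive is an assumption.

```python
import pandas as pd


def crop_sketch(data, tmin=None, tmax=None, by="timestamp"):
    """Illustrative stand-in for crop; not pyneon's implementation."""
    # Column names follow the parameter description above.
    col = "timestamp [ns]" if by == "timestamp" else "time [s]"
    lo = data[col].min() if tmin is None else tmin
    hi = data[col].max() if tmax is None else tmax
    # Assumption: both bounds are inclusive.
    return data[(data[col] >= lo) & (data[col] <= hi)]


df = pd.DataFrame(
    {
        "timestamp [ns]": [0, 10, 20, 30, 40],
        "time [s]": [0.0, 1.0, 2.0, 3.0, 4.0],
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
# Keep the samples between 1 s and 3 s of relative time.
cropped = crop_sketch(df, tmin=1.0, tmax=3.0, by="time")
```

Leaving `tmin` or `tmax` as None falls back to the data's own minimum or maximum, so `crop_sketch(df)` returns every row.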

pyneon.preprocess.interpolate(new_ts: ndarray, data: DataFrame, float_kind: str = 'linear', other_kind: str = 'nearest') DataFrame#

Interpolate a data stream to a new set of timestamps.

Parameters:
  • new_ts (np.ndarray) – New timestamps to evaluate the interpolant at.

  • data (pd.DataFrame) – Data to interpolate. Must contain a monotonically increasing timestamp [ns] column.

  • float_kind (str, optional) – Kind of interpolation applied on columns of float type, by default “linear”. For details see scipy.interpolate.interp1d.

  • other_kind (str, optional) – Kind of interpolation applied on columns of other types, by default “nearest”.

Returns:

Interpolated data.

Return type:

pandas.DataFrame
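
As a rough illustration of the two interpolation kinds, the sketch below resamples float columns linearly and all other columns by nearest neighbour. It uses only NumPy rather than scipy.interpolate.interp1d, so it is an approximation of the idea and not the library's code; `interpolate_sketch` is a hypothetical name, and `np.interp` clamps values outside the original time range.

```python
import numpy as np
import pandas as pd


def interpolate_sketch(new_ts, data):
    """Linear interpolation for float columns, nearest neighbour otherwise."""
    old_ts = data["timestamp [ns]"].to_numpy()
    out = pd.DataFrame({"timestamp [ns]": new_ts})
    # For each new timestamp, find the neighbouring old samples
    # and pick whichever one is closer in time.
    right = np.clip(np.searchsorted(old_ts, new_ts), 1, len(old_ts) - 1)
    left = right - 1
    nearest = np.where(new_ts - old_ts[left] <= old_ts[right] - new_ts, left, right)
    for col in data.columns.drop("timestamp [ns]"):
        values = data[col].to_numpy()
        if pd.api.types.is_float_dtype(data[col]):
            out[col] = np.interp(new_ts, old_ts, values)
        else:
            out[col] = values[nearest]
    return out


data = pd.DataFrame(
    {"timestamp [ns]": [0, 10, 20], "x": [0.0, 1.0, 2.0], "label": ["a", "b", "c"]}
)
resampled = interpolate_sketch(np.array([5, 12]), data)
```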

pyneon.preprocess.concat_streams(rec: NeonRecording, stream_names: str | list[str] = 'all', sampling_freq: Number | str = 'min', interp_float_kind: str = 'linear', interp_other_kind: str = 'nearest', inplace: bool = False) DataFrame#

Concatenate data from different streams under common timestamps. Since the streams may have different timestamps and sampling frequencies, interpolation of all streams to a set of common timestamps is performed. The latest start timestamp and earliest last timestamp of the selected streams are used to define the common timestamps.

Parameters:
  • rec (NeonRecording) – NeonRecording object containing the streams to concatenate.

  • stream_names (str or list of str) – Stream names to concatenate. If “all”, all streams are used. If a list, items must be in {"gaze", "imu", "eye_states"} ("3d_eye_states" is also tolerated as an alias for "eye_states").

  • sampling_freq (float or int or str, optional) – Sampling frequency of the concatenated streams. If numeric, the streams will be interpolated to this frequency. If "min", the lowest nominal sampling frequency of the selected streams will be used. If "max", the highest nominal sampling frequency will be used. Defaults to "min".

  • interp_float_kind (str, optional) – Kind of interpolation applied on columns of float type. Defaults to "linear". For details see scipy.interpolate.interp1d.

  • interp_other_kind (str, optional) – Kind of interpolation applied on columns of other types. Defaults to "nearest".

  • inplace (bool, optional) – If True, replace the selected stream data with the interpolated data during concatenation. Defaults to False.

Returns:

concat_data – Concatenated data.

Return type:

pandas.DataFrame
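
The rule for the common timestamps (latest start, earliest end, fixed sampling frequency) can be sketched as follows. `common_timestamps` and its list-of-arrays input are hypothetical, and whether the end point is included is an assumption; the library may construct the grid differently.

```python
import numpy as np


def common_timestamps(streams_ts, sampling_freq):
    """Build a shared time grid from sorted per-stream timestamp arrays in ns."""
    start = max(ts[0] for ts in streams_ts)  # latest first timestamp
    end = min(ts[-1] for ts in streams_ts)  # earliest last timestamp
    step_ns = int(1e9 / sampling_freq)
    # Assumption: the end point is included when it falls on the grid.
    return np.arange(start, end + 1, step_ns, dtype=np.int64)


# Two made-up streams: 200 Hz from 0 s to 1 s, and 100 Hz from 0.1 s to 0.9 s.
gaze_ts = np.arange(0, 1_000_000_001, 5_000_000)
imu_ts = np.arange(100_000_000, 900_000_001, 10_000_000)
new_ts = common_timestamps([gaze_ts, imu_ts], sampling_freq=100)
```

Each stream would then be interpolated onto `new_ts` before the columns are joined.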

pyneon.preprocess.concat_events(rec: NeonRecording, event_names: str | list[str]) DataFrame#

Concatenate different events. All columns in the selected event types will be present in the final DataFrame. An additional type column denotes the event type. If "events" is in event_names, its timestamp [ns] column will be renamed to start timestamp [ns], and the name and type columns will be renamed to message name and message type respectively to provide a more readable output.

Parameters:
  • rec (NeonRecording) – NeonRecording object containing the events to concatenate.

  • event_names (str or list of str) – Event name or list of event names to concatenate. Event names must be in {"blinks", "fixations", "saccades", "events"} (singular forms are tolerated).

Returns:

concat_events – Concatenated events.

Return type:

pandas.DataFrame
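
A small pandas sketch of the idea: stack event tables that have different columns and tag each row with its event type. The column names "duration [ms]" and "fixation x [px]" and the final sort by start timestamp are assumptions made for illustration.

```python
import pandas as pd

# Made-up event tables; real recordings carry more columns.
blinks = pd.DataFrame({"start timestamp [ns]": [100, 500], "duration [ms]": [120, 90]})
fixations = pd.DataFrame(
    {"start timestamp [ns]": [50, 300], "fixation x [px]": [640.0, 700.0]}
)

frames = []
for name, df in {"blinks": blinks, "fixations": fixations}.items():
    df = df.copy()
    df["type"] = name  # extra column denoting the event type
    frames.append(df)

# Columns missing from one event type are filled with NaN.
combined = (
    pd.concat(frames, ignore_index=True)
    .sort_values("start timestamp [ns]")
    .reset_index(drop=True)
)
```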

pyneon.preprocess.window_average(new_ts: ndarray, data: DataFrame, window_size: int | None = None) DataFrame#

Take the average over a time window to obtain smoothed data at new timestamps.

Parameters:
  • new_ts (np.ndarray) – New timestamps to evaluate the window average at. The median of the differences between the new timestamps must be larger than the median of the differences between the old timestamps. In other words, only downsampling is supported.

  • data (pd.DataFrame) – Data to apply window average to. Must contain a monotonically increasing timestamp [ns] column.

  • window_size (int, optional) – Size of the time window in nanoseconds. If None, the window size is set to the median of the differences between the new timestamps. Defaults to None.

Returns:

Data with window average applied.

Return type:

pd.DataFrame
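
The downsampling described above can be sketched as a centred window mean. `window_average_sketch` is a hypothetical stand-in; centring the window on each new timestamp and closing it on both ends are assumptions.

```python
import numpy as np
import pandas as pd


def window_average_sketch(new_ts, data, window_size=None):
    """Average all samples that fall inside a window around each new timestamp."""
    old_ts = data["timestamp [ns]"].to_numpy()
    if window_size is None:
        # Default from the description: median spacing of the new timestamps.
        window_size = int(np.median(np.diff(new_ts)))
    rows = []
    for t in new_ts:
        # Assumption: window centred on t and closed on both ends.
        mask = (old_ts >= t - window_size / 2) & (old_ts <= t + window_size / 2)
        rows.append(data.loc[mask].drop(columns="timestamp [ns]").mean())
    out = pd.DataFrame(rows)
    out.insert(0, "timestamp [ns]", new_ts)
    return out


data = pd.DataFrame(
    {"timestamp [ns]": np.arange(0, 100, 10), "x": np.arange(10, dtype=float)}
)
new_ts = np.array([20, 60])
smoothed = window_average_sketch(new_ts, data)
```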

pyneon.preprocess.map_gaze_to_video(rec: NeonRecording) DataFrame#

Map gaze data to video frames.

Parameters:
  • rec (NeonRecording) – Recording object containing gaze and video data.

  • resamp_float_kind (str) – Interpolation method for float columns.

  • resamp_other_kind (str) – Interpolation method for non-float columns.

pyneon.preprocess.estimate_scanpath(rec: NeonRecording, lk_params: None | dict = None) DataFrame#

Map fixations to video frames.

Parameters:
  • rec (NeonRecording) – Recording object containing gaze and video data.

  • lk_params (dict, optional) – Parameters for the Lucas-Kanade optical flow algorithm. Defaults to None.

pyneon.preprocess.overlay_scanpath_on_video(rec: NeonRecording, video_output_path: Path | str = 'sacnpath_overlay_video.mp4', circle_radius: int = 10, show_lines: bool = True, line_thickness: int = 2, show_video: bool = False, max_fixations: int = 10) None#

Overlay fixations and gaze data on video frames and save the resulting video.

Parameters:
  • rec (NeonRecording) – Recording object containing gaze and video data.

  • video_output_path (Path or str) – Path where the video with fixations will be saved.

  • circle_radius (int) – Radius of the circle used to represent fixations.

  • line_thickness (int) – Thickness of the lines connecting successive fixations.

  • show_video (bool) – Flag to display the video with fixations overlaid in

pyneon.preprocess.create_epoch(data: DataFrame, times_df: DataFrame | None = None, t_refs: list | ndarray | None = None, t_before: ndarray | float | None = None, t_after: ndarray | float | None = None, description: ndarray | str | None = None, global_t_ref: int | float = 0, time_unit: str = 'ns')#

Create epochs in the data streams based on the input epochs DataFrame or provided times.

Parameters:
  • data (pd.DataFrame) – Data stream to create epochs from. Must contain a ‘timestamp [ns]’ or ‘start timestamp [ns]’ column.

  • times_df (pd.DataFrame, optional) – DataFrame containing epoch information with the following columns:
      - ‘t_ref’: Reference time of the epoch, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.
    If provided, other time-related parameters are ignored.

  • t_refs (list or np.ndarray, optional) – List or array of reference times for the epochs. Units specified by time_unit.

  • t_before (float, np.ndarray, or list, optional) – Time before the reference time to start the epoch, in seconds.

  • t_after (float, np.ndarray, or list, optional) – Time after the reference time to end the epoch, in seconds.

  • description (str, np.ndarray, or list, optional) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_refs. Units specified by time_unit. Default is 0.

  • time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.

Returns:

  • epochs (pd.DataFrame) – DataFrame where each row corresponds to an epoch, containing the data belonging to the epoch as a nested DataFrame. Columns include:
      - ‘epoch id’: Unique identifier for the epoch.
      - ‘t_ref’: Reference time of the epoch, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.
      - ‘epoch data’: DataFrame containing the data within the epoch.

  • annotated_data (pd.DataFrame) – Original data with added columns:
      - ‘epoch id’: Identifier of the epoch to which the data point belongs.
      - ‘t_rel’: Time relative to the epoch reference time, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.

Notes

  • If times_df is provided, it is used to create epochs, and other time-related parameters are ignored.

  • If times_df is not provided, t_refs, t_before, t_after, and description must be provided.

  • The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.
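
To make the annotated output concrete, here is a minimal sketch that labels each sample with the epoch it falls into, given a times_df already expressed in nanoseconds. `annotate_epochs` is a hypothetical helper, not the library's function. It assumes closed intervals and, as a simplification, lets a later epoch overwrite an earlier one where they overlap.

```python
import numpy as np
import pandas as pd


def annotate_epochs(data, times_df):
    """Add 'epoch id', 't_rel' and 'description' columns to a copy of data."""
    annotated = data.copy()
    annotated["epoch id"] = np.nan
    annotated["t_rel"] = np.nan
    annotated["description"] = None
    ts = annotated["timestamp [ns]"]
    columns = zip(
        times_df["t_ref"], times_df["t_before"], times_df["t_after"], times_df["description"]
    )
    for epoch_id, (t_ref, before, after, desc) in enumerate(columns):
        # Samples within [t_ref - t_before, t_ref + t_after] belong to the epoch.
        mask = ts.between(t_ref - before, t_ref + after)
        annotated.loc[mask, "epoch id"] = epoch_id
        annotated.loc[mask, "t_rel"] = ts[mask] - t_ref
        annotated.loc[mask, "description"] = desc
    return annotated


data = pd.DataFrame(
    {"timestamp [ns]": np.arange(6) * 1_000_000_000, "x": np.arange(6, dtype=float)}
)
times_df = pd.DataFrame(
    {
        "t_ref": [2_000_000_000],
        "t_before": [1_000_000_000],
        "t_after": [1_000_000_000],
        "description": ["stim"],
    }
)
annotated = annotate_epochs(data, times_df)
```

Samples outside every epoch keep NaN in 'epoch id' and 't_rel'.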

pyneon.preprocess.extract_event_times(event_data: DataFrame, t_before: float, t_after: float, event_name: str = 'all') DataFrame#

Construct event times from a list or array of reference times.

Parameters:
  • t_refs (list or np.ndarray) – List or array of reference times. Units specified by time_unit.

  • t_before (float, np.ndarray, or list) – Time before the reference time to start the epoch, in seconds.

  • t_after (float, np.ndarray, or list) – Time after the reference time to end the epoch, in seconds.

  • description (str, np.ndarray, or list) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_refs. Units specified by time_unit. Default is 0.

  • time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.

Returns:

event_times – DataFrame containing the constructed event times with columns:
      - ‘t_ref’: Reference time of the event, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the event.

Return type:

pd.DataFrame

Notes

  • The t_refs and global_t_ref are combined and converted to nanoseconds according to time_unit.

  • The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.

pyneon.preprocess.construct_event_times(t_refs: list | ndarray, t_before: ndarray | float | None, t_after: ndarray | float | None, description: ndarray | str, global_t_ref: int | float = 0, time_unit: str = 'ns') DataFrame#

Construct event times from a list or array of reference times.

Parameters:
  • t_refs (list or np.ndarray) – List or array of reference times.

  • t_before (float or np.ndarray) – Time before the reference time to start the epoch, in seconds.

  • t_after (float or np.ndarray) – Time after the reference time to end the epoch, in seconds.

  • description (str or np.ndarray) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time, by default 0.

  • time_unit (str, optional) – Unit of time for the reference times (‘ns’ or ‘s’), by default “ns”.

Returns:

event_times – DataFrame containing the constructed event times.

Return type:

pd.DataFrame
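
The unit handling can be sketched as below: reference times are shifted by global_t_ref and converted to nanoseconds according to time_unit, while t_before and t_after are always taken in seconds. `construct_event_times_sketch` is a hypothetical re-implementation written for illustration only.

```python
import numpy as np
import pandas as pd


def _per_event(value, n):
    """Repeat a scalar (or pass through an array) so there is one entry per event."""
    return np.broadcast_to(np.asarray(value), (n,)).copy()


def construct_event_times_sketch(
    t_refs, t_before, t_after, description, global_t_ref=0, time_unit="ns"
):
    t_refs = np.asarray(t_refs) + global_t_ref
    if time_unit == "s":
        t_ref_ns = np.round(t_refs * 1e9).astype(np.int64)
    else:
        t_ref_ns = t_refs.astype(np.int64)
    n = len(t_ref_ns)
    # t_before and t_after are given in seconds and stored in nanoseconds.
    before_ns = np.round(_per_event(t_before, n) * 1e9).astype(np.int64)
    after_ns = np.round(_per_event(t_after, n) * 1e9).astype(np.int64)
    return pd.DataFrame(
        {
            "t_ref": t_ref_ns,
            "t_before": before_ns,
            "t_after": after_ns,
            "description": _per_event(description, n),
        }
    )


events = construct_event_times_sketch(
    [1.0, 2.5], t_before=0.25, t_after=0.5, description="stim",
    global_t_ref=0.5, time_unit="s",
)
```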

class pyneon.preprocess.Epoch(data: DataFrame, times_df: DataFrame | None = None, t_ref: ndarray | None = None, t_before: ndarray | Number | None = None, t_after: ndarray | Number | None = None, description: ndarray | None = None, global_t_ref: int | float = 0, time_unit: str = 'ns')#

Bases: object

Class to create and manage epochs in the data streams.

Parameters:
  • data (pd.DataFrame) – Data stream to create epochs from. Must contain a ‘timestamp [ns]’ or ‘start timestamp [ns]’ column.

  • times_df (pd.DataFrame, optional) – DataFrame containing epoch information with the following columns:
      - ‘t_ref’: Reference time of the epoch, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.
    If provided, t_ref, t_before, t_after, description, global_t_ref, and time_unit are ignored.

  • t_ref (np.ndarray or list, optional) – Array or list of reference times for the epochs. Units specified by time_unit.

  • t_before (float, np.ndarray, or list, optional) – Time before the reference time to start the epoch, in seconds.

  • t_after (float, np.ndarray, or list, optional) – Time after the reference time to end the epoch, in seconds.

  • description (str, np.ndarray, or list, optional) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_ref. Units specified by time_unit. Default is 0.

  • time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.

Notes

  • If times_df is provided, it is used to create epochs, and the other time-related parameters are ignored.

  • If times_df is not provided, t_ref, t_before, t_after, and description must be provided.

  • The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.

to_numpy(sampling_rate=100, columns=None)#

Converts epochs into a NumPy array with dimensions (n_epochs, n_times, n_channels). Resamples epochs to a fixed sampling rate.

Parameters:
  • sampling_rate (int) – The sampling rate to resample the data to, in Hz (samples per second).

  • columns (list of str, optional) – List of column names to extract from the DataFrame. If None, all columns except ‘t_rel’ are used.

Returns:

  • epochs_np (np.ndarray) – NumPy array of shape (n_epochs, n_times, n_channels).

  • info (dict) – A dictionary containing:
      - ‘column_ids’: List of provided column names.
      - ‘t_rel’: The common time grid, in nanoseconds.
      - ‘nan_status’: String indicating whether NaN values were found in the data.

Notes

  • The time grid (t_rel) is in nanoseconds.

  • If NaN values are present after interpolation, they are noted in nan_status.
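
The resampling to a fixed grid can be sketched with plain NumPy. `epochs_to_numpy` is a hypothetical function that takes a list of per-epoch DataFrames with a 't_rel' column. It assumes every epoch shares the same t_before and t_after, and the exact 'nan_status' strings are made up.

```python
import numpy as np
import pandas as pd


def epochs_to_numpy(epoch_frames, t_before_ns, t_after_ns, sampling_rate=100, columns=None):
    """Resample each epoch onto a common t_rel grid and stack into a 3D array."""
    step_ns = int(1e9 / sampling_rate)
    t_rel = np.arange(-t_before_ns, t_after_ns + 1, step_ns)
    if columns is None:
        columns = [c for c in epoch_frames[0].columns if c != "t_rel"]
    out = np.full((len(epoch_frames), len(t_rel), len(columns)), np.nan)
    for i, df in enumerate(epoch_frames):
        x_old = df["t_rel"].to_numpy()
        for j, col in enumerate(columns):
            # Grid points outside the epoch's own time range become NaN.
            out[i, :, j] = np.interp(
                t_rel, x_old, df[col].to_numpy(), left=np.nan, right=np.nan
            )
    nan_status = "NaN values found" if np.isnan(out).any() else "No NaN values"
    info = {"column_ids": columns, "t_rel": t_rel, "nan_status": nan_status}
    return out, info


# Two made-up epochs spanning -10 ms to +20 ms around their reference times.
epoch_a = pd.DataFrame({"t_rel": [-10_000_000, 20_000_000], "x": [0.0, 3.0]})
epoch_b = pd.DataFrame({"t_rel": [-10_000_000, 0, 20_000_000], "x": [1.0, 1.0, 1.0]})
epochs_np, info = epochs_to_numpy(
    [epoch_a, epoch_b], t_before_ns=10_000_000, t_after_ns=20_000_000
)
```

With the default 100 Hz rate the grid step is 10 ms, so this example yields an array of shape (2, 4, 1).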