Preprocessing module

pyneon.preprocess.interpolate(new_ts: ndarray, data: DataFrame, float_kind: str = 'linear', other_kind: str = 'nearest') → DataFrame

Interpolate a data stream to a new set of timestamps.

Parameters:
  • new_ts (np.ndarray) – An array of new timestamps (in nanoseconds) at which to evaluate the interpolant.

  • data (pd.DataFrame) – Data to interpolate. Must have a monotonically increasing index named timestamp [ns].

  • float_kind (str, optional) – Kind of interpolation applied on columns of float type, by default "linear". For details see scipy.interpolate.interp1d.

  • other_kind (str, optional) – Kind of interpolation applied on columns of other types, by default "nearest". For details see scipy.interpolate.interp1d.

Returns:

Interpolated data.

Return type:

pandas.DataFrame
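
To make the difference between the two interpolation kinds concrete, here is a minimal pure-Python sketch that evaluates one float column linearly and one non-float column by nearest neighbour. It is not pyneon's implementation (the parameter descriptions above point to scipy.interpolate.interp1d for that). The helpers interp_linear and interp_nearest and the column names gaze_x and worn are hypothetical, and returning NaN outside the sampled range is an assumption of this sketch.

```python
from bisect import bisect_left


def interp_linear(new_ts, ts, values):
    """Linearly interpolate values sampled at sorted ts onto new_ts."""
    out = []
    for t in new_ts:
        if t < ts[0] or t > ts[-1]:
            out.append(float("nan"))  # no extrapolation in this sketch
            continue
        i = bisect_left(ts, t)  # first index with ts[i] >= t
        if ts[i] == t:
            out.append(float(values[i]))
        else:
            w = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


def interp_nearest(new_ts, ts, values):
    """Take the value of the sample closest in time (ties go to the earlier one)."""
    out = []
    for t in new_ts:
        i = bisect_left(ts, t)
        if i == 0:
            out.append(values[0])
        elif i == len(ts):
            out.append(values[-1])
        elif ts[i] - t < t - ts[i - 1]:
            out.append(values[i])
        else:
            out.append(values[i - 1])
    return out


ts = [0, 10, 20]            # original timestamps [ns]
gaze_x = [0.0, 1.0, 3.0]    # float column, interpolated linearly
worn = [True, True, False]  # non-float column, nearest neighbour
new_ts = [4, 10, 17]

gaze_x_new = interp_linear(new_ts, ts, gaze_x)
worn_new = interp_nearest(new_ts, ts, worn)
```

Nearest-neighbour interpolation keeps non-float columns (flags, labels, integer codes) within their original set of values, which linear interpolation would not.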

pyneon.preprocess.window_average(new_ts: ndarray, data: DataFrame, window_size: int | None = None) → DataFrame

Take the average over a time window to obtain smoothed data at new timestamps.

Parameters:
  • new_ts (np.ndarray) – An array of new timestamps (in nanoseconds) at which to compute the windowed averages. The median interval between these new timestamps must be larger than the median interval between the original data timestamps, i.e., np.median(np.diff(new_ts)) > np.median(np.diff(data.index)). In other words, only downsampling is supported.

  • data (pd.DataFrame) – Data to apply window average to. Must have a monotonically increasing index named timestamp [ns].

  • window_size (int, optional) – The size of the time window (in nanoseconds) over which to compute the average around each new timestamp. If None (default), the window size is set to the median interval between the new timestamps, i.e., np.median(np.diff(new_ts)). The window size must be larger than the median interval between the original data timestamps, i.e., window_size > np.median(np.diff(data.index)).

Returns:

Data with window average applied.

Return type:

pd.DataFrame
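
The windowing rule described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the helper window_average_sketch is hypothetical, the window is assumed to be centred on each new timestamp with inclusive edges, and an empty window is assumed to yield NaN.

```python
from statistics import median


def window_average_sketch(new_ts, ts, values, window_size=None):
    """Average the samples that fall in a window centred on each new timestamp."""
    if window_size is None:
        # Default described above: median interval between the new timestamps.
        window_size = median(b - a for a, b in zip(new_ts, new_ts[1:]))
    half = window_size / 2
    out = []
    for t in new_ts:
        in_window = [v for s, v in zip(ts, values) if t - half <= s <= t + half]
        out.append(sum(in_window) / len(in_window) if in_window else float("nan"))
    return out


ts = [0, 10, 20, 30, 40, 50]              # original timestamps [ns]
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
new_ts = [10, 30, 50]                     # coarser grid: downsampling only

smoothed = window_average_sketch(new_ts, ts, values)
```

With the default window of 20 ns, the first two windows each cover three original samples, while the last one covers only two because it extends past the end of the data.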

pyneon.preprocess.concat_streams(rec: NeonRecording, stream_names: str | list[str] = 'all', sampling_freq: Number | str = 'min', interp_float_kind: str = 'linear', interp_other_kind: str = 'nearest', inplace: bool = False) → DataFrame

Concatenate data from different streams under common timestamps. Since the streams may have different timestamps and sampling frequencies, all streams are interpolated to a set of common timestamps. The common timestamps span from the latest first timestamp to the earliest last timestamp of the selected streams.

Parameters:
  • rec (NeonRecording) – NeonRecording object containing the streams to concatenate.

  • stream_names (str or list of str) – Stream names to concatenate. If "all", then all streams will be used. If a list, items must be in {"gaze", "imu", "eye_states"} ("3d_eye_states" is also tolerated as an alias for "eye_states").

  • sampling_freq (float or int or str, optional) – Sampling frequency of the concatenated streams. If numeric, the streams will be interpolated to this frequency. If "min" (default), the lowest nominal sampling frequency of the selected streams will be used. If "max", the highest nominal sampling frequency will be used.

  • interp_float_kind (str, optional) – Kind of interpolation applied on columns of float type. Defaults to "linear". For details see scipy.interpolate.interp1d.

  • interp_other_kind (str, optional) – Kind of interpolation applied on columns of other types. Defaults to "nearest". For details see scipy.interpolate.interp1d.

  • inplace (bool, optional) – If True, replace the selected stream data with the interpolated data during concatenation. Defaults to False.

Returns:

concat_data – Concatenated data.

Return type:

pandas.DataFrame
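
The choice of common timestamps can be illustrated with a short sketch. It assumes an evenly spaced grid that starts at the latest first timestamp and does not go past the earliest last timestamp; how pyneon spaces the grid and treats the end point is not stated here, so those details are assumptions. The helper common_timestamps and the sample streams are hypothetical.

```python
def common_timestamps(stream_ts, sampling_freq):
    """Build a shared timestamp grid [ns] for several streams.

    stream_ts maps a stream name to its sorted timestamps in nanoseconds.
    """
    start = max(ts[0] for ts in stream_ts.values())  # latest first timestamp
    end = min(ts[-1] for ts in stream_ts.values())   # earliest last timestamp
    step = round(1e9 / sampling_freq)                # nanoseconds per sample
    return list(range(start, end + 1, step))


# Made-up timestamps for two streams that start and stop at different times.
stream_ts = {
    "gaze": list(range(0, 100_000_001, 5_000_000)),
    "imu": list(range(3_000_000, 93_000_001, 10_000_000)),
}

grid = common_timestamps(stream_ts, sampling_freq=100)
```

Each stream would then be interpolated onto grid, so that every row of the concatenated table refers to the same instant in all streams.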

pyneon.preprocess.concat_events(rec: NeonRecording, event_names: str | list[str]) → DataFrame

Concatenate different events. All columns of the selected event types will be present in the final DataFrame. An additional type column denotes the event type. If "events" is in event_names, its timestamp [ns] column will be renamed to start timestamp [ns], and the name and type columns will be renamed to message name and message type respectively to prevent confusion between physiological events and user-supplied messages.

Parameters:
  • rec (NeonRecording) – NeonRecording object containing the events to concatenate.

  • event_names (str or list of str) – Event name or list of event names to concatenate. Event names must be in {"blinks", "fixations", "saccades", "events"} (singular forms are tolerated).

Returns:

concat_events – Concatenated events.

Return type:

pandas.DataFrame
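
The renaming and labelling described above can be sketched with plain dictionaries standing in for DataFrame rows. This is only an illustration: the helper concat_events_sketch and the sample rows are made up, and using the event name as the value of the type column, filling absent columns with None, and sorting by start timestamp are assumptions of the sketch.

```python
def concat_events_sketch(tables):
    """Merge event tables given as {event name: list of row dicts}."""
    renames = {
        "timestamp [ns]": "start timestamp [ns]",
        "name": "message name",
        "type": "message type",
    }
    rows = []
    for event_name, table in tables.items():
        for row in table:
            if event_name == "events":
                # Rename message columns so they cannot clash with "type".
                row = {renames.get(key, key): value for key, value in row.items()}
            rows.append({**row, "type": event_name})
    # Union of all columns, in order of first appearance.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    rows = [{col: row.get(col) for col in columns} for row in rows]
    return sorted(rows, key=lambda row: row["start timestamp [ns]"])


tables = {
    "blinks": [{"start timestamp [ns]": 200, "end timestamp [ns]": 350}],
    "events": [{"timestamp [ns]": 100, "name": "trial start", "type": "recording"}],
}

merged = concat_events_sketch(tables)
```

Without the renaming step, the type column of the message table would collide with the new type column that labels the event kind.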

pyneon.preprocess.create_epoch(data: DataFrame, times_df: DataFrame | None = None, t_refs: list | ndarray | None = None, t_before: ndarray | float | None = None, t_after: ndarray | float | None = None, description: ndarray | str | None = None, global_t_ref: int | float = 0, time_unit: str = 'ns')

Create epochs in the data streams based on the input epochs DataFrame or provided times.

Parameters:
  • data (pd.DataFrame) – Data stream to create epochs from. Must contain a ‘timestamp [ns]’ or ‘start timestamp [ns]’ column.

  • times_df (pd.DataFrame, optional) – DataFrame containing epoch information with the following columns:
      - ‘t_ref’: Reference time of the epoch, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.
    If provided, other time-related parameters are ignored.

  • t_refs (list or np.ndarray, optional) – List or array of reference times for the epochs. Units specified by time_unit.

  • t_before (float, np.ndarray, or list, optional) – Time before the reference time to start the epoch, in seconds.

  • t_after (float, np.ndarray, or list, optional) – Time after the reference time to end the epoch, in seconds.

  • description (str, np.ndarray, or list, optional) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_refs. Units specified by time_unit. Default is 0.

  • time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.

Returns:

  • epochs (pd.DataFrame) – DataFrame where each row corresponds to an epoch, containing the data belonging to the epoch as a nested DataFrame. Columns include:
      - ‘epoch id’: Unique identifier for the epoch.
      - ‘t_ref’: Reference time of the epoch, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.
      - ‘epoch data’: DataFrame containing the data within the epoch.

  • annotated_data (pd.DataFrame) – Original data with added columns:
      - ‘epoch id’: Identifier of the epoch to which the data point belongs.
      - ‘t_rel’: Time relative to the epoch reference time, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.

Notes

  • If times_df is provided, it is used to create epochs, and other time-related parameters are ignored.

  • If times_df is not provided, t_refs, t_before, t_after, and description must be provided.

  • The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.
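
The annotation step can be sketched as follows, with a list of dictionaries standing in for times_df (all values already in nanoseconds). It is a simplified illustration rather than pyneon's code: the helper annotate_epochs is hypothetical, the window bounds are assumed to be inclusive, and samples outside every epoch are simply left out.

```python
def annotate_epochs(timestamps, times):
    """Tag every sample that falls inside an epoch window.

    timestamps: sample times in ns.
    times: rows shaped like times_df, with all durations in ns.
    """
    annotated = []
    for epoch_id, epoch in enumerate(times):
        start = epoch["t_ref"] - epoch["t_before"]
        end = epoch["t_ref"] + epoch["t_after"]
        for ts in timestamps:
            if start <= ts <= end:  # inclusive bounds are an assumption
                annotated.append({
                    "timestamp [ns]": ts,
                    "epoch id": epoch_id,
                    "t_rel": ts - epoch["t_ref"],
                    "description": epoch["description"],
                })
    return annotated


# One sample every 0.1 s for 1 s, and a single epoch around t = 0.5 s.
timestamps = list(range(0, 1_000_000_001, 100_000_000))
times = [{
    "t_ref": 500_000_000,
    "t_before": 100_000_000,
    "t_after": 200_000_000,
    "description": "stimulus",
}]

annotated = annotate_epochs(timestamps, times)
```

Here t_rel is negative for samples before the reference time and positive after it, which is what allows epochs to be aligned on a common axis later.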

pyneon.preprocess.extract_event_times(event_data: DataFrame, t_before: float, t_after: float, event_name: str = 'all') → DataFrame

Extract event times from the event data DataFrame.

Parameters:
  • event_data (pd.DataFrame) – DataFrame containing the event data.

  • t_before (float) – Time before the event to start the epoch, in seconds.

  • t_after (float) – Time after the event to end the epoch, in seconds.

  • event_name (str, optional) – Name of the event to extract times for. Default is ‘all’.

Returns:

event_times – DataFrame containing the extracted event times with the following columns:
  - ‘t_ref’: Reference time of the event, in nanoseconds.
  - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
  - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
  - ‘description’: Description or label associated with the event.

Return type:

pd.DataFrame

pyneon.preprocess.construct_event_times(t_refs: list | ndarray, t_before: ndarray | float | None, t_after: ndarray | float | None, description: ndarray | str, global_t_ref: int | float = 0, time_unit: str = 'ns') → DataFrame

Construct event times from a list or array of reference times.

Parameters:
  • t_refs (list or np.ndarray) – List or array of reference times. Units specified by time_unit.

  • t_before (float or np.ndarray) – Time before the reference time to start the epoch, in seconds.

  • t_after (float or np.ndarray) – Time after the reference time to end the epoch, in seconds.

  • description (str or np.ndarray) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time, by default 0.

  • time_unit (str, optional) – Unit of time for the reference times (‘ns’ or ‘s’), by default “ns”.

Returns:

event_times – DataFrame containing the constructed event times.

Return type:

pd.DataFrame
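
The unit handling is the easiest part to get wrong, so here is a small sketch of how such a table could be assembled. It assumes that scalar t_before, t_after and description values are repeated for every reference time, and that global_t_ref is added before converting to nanoseconds. The helper construct_times_sketch is hypothetical and returns a list of dictionaries rather than a DataFrame.

```python
def construct_times_sketch(t_refs, t_before, t_after, description,
                           global_t_ref=0, time_unit="ns"):
    """Build epoch rows with every time expressed in nanoseconds."""
    scale = 1_000_000_000 if time_unit == "s" else 1
    n = len(t_refs)

    def broadcast(value):
        # Repeat a scalar once per reference time; pass sequences through.
        return list(value) if isinstance(value, (list, tuple)) else [value] * n

    rows = []
    for ref, before, after, desc in zip(
        t_refs, broadcast(t_before), broadcast(t_after), broadcast(description)
    ):
        rows.append({
            "t_ref": round((ref + global_t_ref) * scale),  # time_unit -> ns
            "t_before": round(before * 1e9),               # seconds -> ns
            "t_after": round(after * 1e9),                 # seconds -> ns
            "description": desc,
        })
    return rows


event_times = construct_times_sketch(
    t_refs=[1.0, 2.5], t_before=0.2, t_after=0.5,
    description="flash", global_t_ref=10, time_unit="s",
)
```

Note that t_before and t_after are converted from seconds regardless of time_unit, which only affects the reference times and global_t_ref.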

class pyneon.preprocess.Epoch(data: DataFrame, times_df: DataFrame | None = None, t_ref: ndarray | None = None, t_before: ndarray | Number | None = None, t_after: ndarray | Number | None = None, description: ndarray | None = None, global_t_ref: int | float = 0, time_unit: str = 'ns')

Bases: object

Class to create and manage epochs in the data streams.

Parameters:
  • data (pd.DataFrame) – Data stream to create epochs from. Must contain a ‘timestamp [ns]’ or ‘start timestamp [ns]’ column.

  • times_df (pd.DataFrame, optional) – DataFrame containing epoch information with the following columns:
      - ‘t_ref’: Reference time of the epoch, in nanoseconds.
      - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
      - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
      - ‘description’: Description or label associated with the epoch.
    If provided, t_ref, t_before, t_after, description, global_t_ref, and time_unit are ignored.

  • t_ref (np.ndarray or list, optional) – Array or list of reference times for the epochs. Units specified by time_unit.

  • t_before (float, np.ndarray, or list, optional) – Time before the reference time to start the epoch, in seconds.

  • t_after (float, np.ndarray, or list, optional) – Time after the reference time to end the epoch, in seconds.

  • description (str, np.ndarray, or list, optional) – Description or label associated with the epoch.

  • global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_ref. Units specified by time_unit. Default is 0.

  • time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.

Notes

  • If times_df is provided, it is used to create epochs, and the other time-related parameters are ignored.

  • If times_df is not provided, t_ref, t_before, t_after, and description must be provided.

  • The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.

to_numpy(sampling_rate=100, columns=None)

Converts epochs into a NumPy array with dimensions (n_epochs, n_times, n_channels). Resamples epochs to a fixed sampling rate.

Parameters:
  • sampling_rate (int, optional) – The sampling rate to resample the data to, in Hz (samples per second). Defaults to 100.

  • columns (list of str, optional) – List of column names to extract from the DataFrame. If None, all columns except ‘t_rel’ are used.

Returns:

  • epochs_np (np.ndarray) – NumPy array of shape (n_epochs, n_times, n_channels).

  • info (dict) – A dictionary containing:
      - ‘column_ids’: List of provided column names.
      - ‘t_rel’: The common time grid, in nanoseconds.
      - ‘nan_status’: String indicating whether NaN values were found in the data.

Notes

  • The time grid (t_rel) is in nanoseconds.

  • If NaN values are present after interpolation, they are noted in nan_status.
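
The resampling idea can be sketched with nested lists in place of a NumPy array. Everything here is an assumption made for illustration: the helpers to_grid and epochs_to_array are hypothetical, the grid is assumed to run from -t_before to t_after in steps of 1 / sampling_rate, linear interpolation is assumed, and the two nan_status strings are invented.

```python
from bisect import bisect_left
from math import isnan


def to_grid(t_rel, values, grid):
    """Linearly interpolate one channel onto grid; NaN outside its range."""
    out = []
    for t in grid:
        if t < t_rel[0] or t > t_rel[-1]:
            out.append(float("nan"))
            continue
        i = bisect_left(t_rel, t)  # first index with t_rel[i] >= t
        if t_rel[i] == t:
            out.append(float(values[i]))
        else:
            w = (t - t_rel[i - 1]) / (t_rel[i] - t_rel[i - 1])
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


def epochs_to_array(epochs, columns, t_before_ns, t_after_ns, sampling_rate=100):
    """Return nested lists indexed as [epoch][time][channel], plus an info dict."""
    step = round(1e9 / sampling_rate)  # grid spacing in ns
    grid = list(range(-t_before_ns, t_after_ns + 1, step))
    array = []
    for epoch in epochs:
        channels = [to_grid(epoch["t_rel"], epoch[col], grid) for col in columns]
        # Transpose (n_channels, n_times) into (n_times, n_channels).
        array.append([list(sample) for sample in zip(*channels)])
    has_nan = any(isnan(v) for ep in array for sample in ep for v in sample)
    info = {
        "column_ids": columns,
        "t_rel": grid,
        "nan_status": "NaN present" if has_nan else "no NaN",
    }
    return array, info


# Two epochs sampled at different instants relative to their reference time.
epochs = [
    {"t_rel": [-10_000_000, 0, 10_000_000, 20_000_000],
     "x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 1.0, 1.0, 1.0]},
    {"t_rel": [-10_000_000, 5_000_000, 20_000_000],
     "x": [0.0, 3.0, 6.0], "y": [2.0, 2.0, 2.0]},
]

array, info = epochs_to_array(epochs, ["x", "y"], 10_000_000, 20_000_000)
```

Resampling every epoch onto the same grid is what makes a rectangular (n_epochs, n_times, n_channels) array possible even when the epochs contain different numbers of raw samples.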