Preprocessing module
- pyneon.preprocess.interpolate(new_ts: ndarray, data: DataFrame, float_kind: str = 'linear', other_kind: str = 'nearest') → DataFrame
Interpolate a data stream to a new set of timestamps.
- Parameters:
new_ts (np.ndarray) – An array of new timestamps (in nanoseconds) at which to evaluate the interpolant.
data (pd.DataFrame) – Data to interpolate. Must have a monotonically increasing index named "timestamp [ns]".
float_kind (str, optional) – Kind of interpolation applied to columns of float type. Defaults to "linear". For details, see scipy.interpolate.interp1d.
other_kind (str, optional) – Kind of interpolation applied to columns of other types. Defaults to "nearest". For details, see scipy.interpolate.interp1d.
- Returns:
Interpolated data.
- Return type:
pd.DataFrame
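The behaviour described above can be sketched with scipy.interpolate.interp1d directly. This is a minimal illustration and not PyNeon's actual implementation: the helper name interpolate_sketch and the example columns are hypothetical, and the sketch assumes every column is numeric.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d


def interpolate_sketch(new_ts, data, float_kind="linear", other_kind="nearest"):
    """Illustrative stand-in for the documented behaviour (numeric columns only)."""
    out = pd.DataFrame(index=pd.Index(new_ts, name=data.index.name))
    for col in data.columns:
        # Float columns use float_kind; all other dtypes use other_kind
        is_float = pd.api.types.is_float_dtype(data[col])
        kind = float_kind if is_float else other_kind
        f = interp1d(data.index.to_numpy(), data[col].to_numpy(), kind=kind)
        values = f(new_ts)
        # interp1d returns floats, so restore the original dtype for non-float columns
        out[col] = values if is_float else values.astype(data[col].dtype)
    return out


# Hypothetical stream sampled every 10 ns
data = pd.DataFrame(
    {"gaze x [px]": [0.0, 10.0, 20.0], "worn": [1, 1, 0]},
    index=pd.Index([0, 10, 20], name="timestamp [ns]"),
)
resampled = interpolate_sketch(np.array([4, 12, 18]), data)
```

Here the float column is interpolated linearly, while the integer column takes the value of the nearest original sample.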
- pyneon.preprocess.window_average(new_ts: ndarray, data: DataFrame, window_size: int | None = None) → DataFrame
Take the average over a time window to obtain smoothed data at new timestamps.
- Parameters:
new_ts (np.ndarray) – An array of new timestamps (in nanoseconds) at which to compute the windowed averages. The median interval between these new timestamps must be larger than the median interval between the original data timestamps, i.e., np.median(np.diff(new_ts)) > np.median(np.diff(data.index)). In other words, only downsampling is supported.
data (pd.DataFrame) – Data to apply the window average to. Must have a monotonically increasing index named "timestamp [ns]".
window_size (int, optional) – The size of the time window (in nanoseconds) over which to compute the average around each new timestamp. If None (default), the window size is set to the median interval between the new timestamps, i.e., np.median(np.diff(new_ts)). The window size must be larger than the median interval between the original data timestamps, i.e., window_size > np.median(np.diff(data.index)).
- Returns:
Data with window average applied.
- Return type:
pd.DataFrame
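As a rough illustration of the windowing rule, the following sketch averages all original samples that fall within a window around each new timestamp. It is not PyNeon's code: the helper name window_average_sketch is hypothetical, and centring the window on each timestamp with inclusive edges is an assumption.

```python
import numpy as np
import pandas as pd


def window_average_sketch(new_ts, data, window_size=None):
    """Illustrative only; edge handling in PyNeon may differ."""
    new_ts = np.asarray(new_ts)
    old_ts = data.index.to_numpy()
    if window_size is None:
        # Documented default: median interval between the new timestamps
        window_size = np.median(np.diff(new_ts))
    if window_size <= np.median(np.diff(old_ts)):
        raise ValueError("window_size must exceed the median original interval")
    rows = []
    for t in new_ts:
        # Assumption: window centred on t, closed at both ends
        mask = (old_ts >= t - window_size / 2) & (old_ts <= t + window_size / 2)
        rows.append(data[mask].mean())
    out = pd.DataFrame(rows)
    out.index = pd.Index(new_ts, name=data.index.name)
    return out


# Hypothetical stream sampled every 10 ns, downsampled to every 30 ns
ts = np.arange(0, 100, 10)
data = pd.DataFrame({"x": ts.astype(float)}, index=pd.Index(ts, name="timestamp [ns]"))
smoothed = window_average_sketch(np.array([20, 50, 80]), data)
```

With the default window of 30 ns, each output value is the mean of the three original samples surrounding the new timestamp.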
- pyneon.preprocess.concat_streams(rec: NeonRecording, stream_names: str | list[str] = 'all', sampling_freq: Number | str = 'min', interp_float_kind: str = 'linear', interp_other_kind: str = 'nearest', inplace: bool = False) → DataFrame
Concatenate data from different streams under common timestamps. Since the streams may have different timestamps and sampling frequencies, interpolation of all streams to a set of common timestamps is performed. The latest start timestamp and earliest last timestamp of the selected streams are used to define the common timestamps.
- Parameters:
rec (NeonRecording) – NeonRecording object containing the streams to concatenate.
stream_names (str or list of str) – Stream names to concatenate. If "all", all streams are used. If a list, items must be in {"gaze", "imu", "eye_states"} ("3d_eye_states" is also tolerated as an alias for "eye_states").
sampling_freq (float or int or str, optional) – Sampling frequency of the concatenated streams. If numeric, the streams will be interpolated to this frequency. If "min" (default), the lowest nominal sampling frequency of the selected streams will be used. If "max", the highest nominal sampling frequency will be used.
interp_float_kind (str, optional) – Kind of interpolation applied to columns of float type. Defaults to "linear". For details, see scipy.interpolate.interp1d.
interp_other_kind (str, optional) – Kind of interpolation applied to columns of other types. Defaults to "nearest".
inplace (bool, optional) – If True, replace the selected stream data with the interpolated data during concatenation. Defaults to False.
- Returns:
concat_data – Concatenated data.
- Return type:
pd.DataFrame
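The common time base described above (latest start, earliest last timestamp, one target frequency) might be built along the following lines. This is a sketch under assumptions: the helper common_timestamps and the example timestamp arrays are hypothetical, and whether the end point is included is a guess.

```python
import numpy as np


def common_timestamps(stream_ts, sampling_freq):
    """Build a shared time base (ns) from a dict of stream name -> timestamps (ns)."""
    start = max(ts[0] for ts in stream_ts.values())   # latest start timestamp
    end = min(ts[-1] for ts in stream_ts.values())    # earliest last timestamp
    step = int(1e9 / sampling_freq)                   # sample interval in ns
    # Assumption: the end point is included when it falls on the grid
    return np.arange(start, end + 1, step, dtype=np.int64)


# Hypothetical streams: 200 Hz from 0 to 1 s, and 100 Hz from 0.1 to 0.9 s
gaze_ts = np.arange(0, 1_000_000_001, 5_000_000, dtype=np.int64)
imu_ts = np.arange(100_000_000, 900_000_001, 10_000_000, dtype=np.int64)
new_ts = common_timestamps({"gaze": gaze_ts, "imu": imu_ts}, sampling_freq=100)
```

Each selected stream would then be interpolated onto new_ts before the columns are joined.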
- pyneon.preprocess.concat_events(rec: NeonRecording, event_names: str | list[str]) → DataFrame
Concatenate different events. All columns in the selected event type will be present in the final DataFrame. An additional "type" column denotes the event type. If "events" is in event_names, its "timestamp [ns]" column will be renamed to "start timestamp [ns]", and the "name" and "type" columns will be renamed to "message name" and "message type" respectively, to prevent confusion between physiological events and user-supplied messages.
- Parameters:
- Returns:
concat_events – Concatenated events.
- Return type:
pd.DataFrame
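The renaming and tagging described above can be illustrated with plain pandas. The tables, column sets, and type labels below are hypothetical, and sorting by onset is an assumption rather than documented behaviour.

```python
import pandas as pd

# Hypothetical event tables; real column sets depend on the recording
blinks = pd.DataFrame({"start timestamp [ns]": [100, 400], "duration [ms]": [120, 90]})
messages = pd.DataFrame(
    {"timestamp [ns]": [250], "name": ["trial start"], "type": ["recording"]}
)

# Rename the message columns so they do not clash with the added "type" column
messages = messages.rename(
    columns={
        "timestamp [ns]": "start timestamp [ns]",
        "name": "message name",
        "type": "message type",
    }
)

# Tag each table with its event type, stack them, and (as an assumption) sort by onset
tables = {"blinks": blinks, "events": messages}
concat = pd.concat(
    [df.assign(type=name) for name, df in tables.items()], ignore_index=True
).sort_values("start timestamp [ns]", ignore_index=True)
```

Columns that exist for only one event type are filled with NaN in the rows of the other types.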
- pyneon.preprocess.create_epoch(data: DataFrame, times_df: DataFrame | None = None, t_refs: list | ndarray | None = None, t_before: ndarray | float | None = None, t_after: ndarray | float | None = None, description: ndarray | str | None = None, global_t_ref: int | float = 0, time_unit: str = 'ns')
Create epochs in the data streams based on the input epochs DataFrame or provided times.
- Parameters:
data (pd.DataFrame) – Data stream to create epochs from. Must contain a ‘timestamp [ns]’ or ‘start timestamp [ns]’ column.
times_df (pd.DataFrame, optional) – DataFrame containing epoch information with the following columns:
  - ‘t_ref’: Reference time of the epoch, in nanoseconds.
  - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
  - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
  - ‘description’: Description or label associated with the epoch.
If provided, other time-related parameters are ignored.
t_refs (list or np.ndarray, optional) – List or array of reference times for the epochs. Units specified by time_unit.
t_before (float, np.ndarray, or list, optional) – Time before the reference time to start the epoch, in seconds.
t_after (float, np.ndarray, or list, optional) – Time after the reference time to end the epoch, in seconds.
description (str, np.ndarray, or list, optional) – Description or label associated with the epoch.
global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_refs. Units specified by time_unit. Default is 0.
time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.
- Returns:
epochs (pd.DataFrame) – DataFrame where each row corresponds to an epoch, containing the data belonging to the epoch as a nested DataFrame. Columns include:
  - ‘epoch id’: Unique identifier for the epoch.
  - ‘t_ref’: Reference time of the epoch, in nanoseconds.
  - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
  - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
  - ‘description’: Description or label associated with the epoch.
  - ‘epoch data’: DataFrame containing the data within the epoch.
annotated_data (pd.DataFrame) – Original data with added columns:
  - ‘epoch id’: Identifier of the epoch to which the data point belongs.
  - ‘t_rel’: Time relative to the epoch reference time, in nanoseconds.
  - ‘description’: Description or label associated with the epoch.
Notes
If times_df is provided, it is used to create epochs, and other time-related parameters are ignored.
If times_df is not provided, t_refs, t_before, t_after, and description must be provided.
The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.
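To make the annotated_data output concrete, the sketch below tags each sample with the epoch it falls into, using a table in the times_df layout. It is an illustration and not PyNeon's implementation; the example data are hypothetical, and treating both window edges as inclusive is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical data stream sampled every 0.1 s (timestamps in ns)
data = pd.DataFrame(
    {
        "timestamp [ns]": np.arange(0, 2_000_000_000, 100_000_000, dtype=np.int64),
        "value": np.arange(20.0),
    }
)

# Epoch table in the times_df layout (all times in nanoseconds)
times_df = pd.DataFrame(
    {
        "t_ref": [500_000_000, 1_500_000_000],
        "t_before": [200_000_000, 200_000_000],
        "t_after": [300_000_000, 300_000_000],
        "description": ["cue", "target"],
    }
)

annotated = data.copy()
annotated["epoch id"] = np.nan
annotated["t_rel"] = np.nan
annotated["description"] = None

for epoch_id, row in times_df.iterrows():
    start = row["t_ref"] - row["t_before"]
    end = row["t_ref"] + row["t_after"]
    # Assumption: both window edges are inclusive
    in_epoch = annotated["timestamp [ns]"].between(start, end)
    annotated.loc[in_epoch, "epoch id"] = epoch_id
    annotated.loc[in_epoch, "t_rel"] = annotated.loc[in_epoch, "timestamp [ns]"] - row["t_ref"]
    annotated.loc[in_epoch, "description"] = row["description"]
```

Samples outside every epoch keep NaN in ‘epoch id’ and ‘t_rel’.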
- pyneon.preprocess.extract_event_times(event_data: DataFrame, t_before: float, t_after: float, event_name: str = 'all') → DataFrame
Extract event times from the event data DataFrame.
- Parameters:
- Returns:
event_times – DataFrame containing the extracted event times with the following columns:
  - ‘t_ref’: Reference time of the event, in nanoseconds.
  - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
  - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
  - ‘description’: Description or label associated with the event.
- Return type:
pd.DataFrame
- pyneon.preprocess.construct_event_times(t_refs: list | ndarray, t_before: ndarray | float | None, t_after: ndarray | float | None, description: ndarray | str, global_t_ref: int | float = 0, time_unit: str = 'ns') → DataFrame
Construct event times from a list or array of reference times.
- Parameters:
t_refs (list or np.ndarray) – List or array of reference times.
t_before (float or np.ndarray) – Time before the reference time to start the epoch, in seconds.
t_after (float or np.ndarray) – Time after the reference time to end the epoch, in seconds.
description (str or np.ndarray) – Description or label associated with the epoch.
global_t_ref (int or float, optional) – Global reference time to be added to each reference time, by default 0.
time_unit (str, optional) – Unit of time for the reference times (‘ns’ or ‘s’), by default “ns”.
- Returns:
event_times – DataFrame containing the constructed event times.
- Return type:
pd.DataFrame
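The unit handling described above (reference times in time_unit, t_before and t_after in seconds, output in nanoseconds) can be sketched as follows. The helper construct_event_times_sketch is hypothetical and only mirrors the documented parameters; it is not PyNeon's code.

```python
import numpy as np
import pandas as pd


def construct_event_times_sketch(
    t_refs, t_before, t_after, description, global_t_ref=0, time_unit="ns"
):
    """Illustrative only; follows the documented units."""
    if time_unit == "s":
        # Reference times and global_t_ref are in seconds: convert to ns
        t_ref_ns = np.rint(
            (np.asarray(t_refs, dtype=float) + global_t_ref) * 1e9
        ).astype(np.int64)
    else:
        # Already in ns: stay in integers to keep full precision
        t_ref_ns = np.asarray(t_refs, dtype=np.int64) + int(global_t_ref)
    n = len(t_ref_ns)
    return pd.DataFrame(
        {
            "t_ref": t_ref_ns,
            # t_before and t_after are always given in seconds
            "t_before": np.rint(np.broadcast_to(t_before, (n,)) * 1e9).astype(np.int64),
            "t_after": np.rint(np.broadcast_to(t_after, (n,)) * 1e9).astype(np.int64),
            "description": np.broadcast_to(description, (n,)).tolist(),
        }
    )


event_times = construct_event_times_sketch(
    [1.0, 2.5], t_before=0.2, t_after=0.3, description="stimulus",
    global_t_ref=10, time_unit="s",
)
```

Scalar t_before, t_after, and description values are broadcast to every reference time, while arrays are used element by element.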
- class pyneon.preprocess.Epoch(data: DataFrame, times_df: DataFrame | None = None, t_ref: ndarray | None = None, t_before: ndarray | Number | None = None, t_after: ndarray | Number | None = None, description: ndarray | None = None, global_t_ref: int | float = 0, time_unit: str = 'ns')
Bases: object
Class to create and manage epochs in the data streams.
- Parameters:
data (pd.DataFrame) – Data stream to create epochs from. Must contain a ‘timestamp [ns]’ or ‘start timestamp [ns]’ column.
times_df (pd.DataFrame, optional) – DataFrame containing epoch information with the following columns:
  - ‘t_ref’: Reference time of the epoch, in nanoseconds.
  - ‘t_before’: Time before the reference time to start the epoch, in nanoseconds.
  - ‘t_after’: Time after the reference time to end the epoch, in nanoseconds.
  - ‘description’: Description or label associated with the epoch.
If provided, t_ref, t_before, t_after, description, global_t_ref, and time_unit are ignored.
t_ref (np.ndarray or list, optional) – Array or list of reference times for the epochs. Units specified by time_unit.
t_before (float, np.ndarray, or list, optional) – Time before the reference time to start the epoch, in seconds.
t_after (float, np.ndarray, or list, optional) – Time after the reference time to end the epoch, in seconds.
description (str, np.ndarray, or list, optional) – Description or label associated with the epoch.
global_t_ref (int or float, optional) – Global reference time to be added to each reference time in t_ref. Units specified by time_unit. Default is 0.
time_unit (str, optional) – Unit of time for the reference times and global_t_ref (‘ns’ for nanoseconds or ‘s’ for seconds). Default is ‘ns’.
Notes
If times_df is provided, it is used to create epochs, and the other time-related parameters are ignored.
If times_df is not provided, t_ref, t_before, t_after, and description must be provided.
The t_before and t_after parameters are always expected in seconds and will be converted to nanoseconds internally.
- to_numpy(sampling_rate=100, columns=None)
Convert epochs into a NumPy array with dimensions (n_epochs, n_times, n_channels), resampling each epoch to a fixed sampling rate.
- Parameters:
- Returns:
epochs_np (np.ndarray) – NumPy array of shape (n_epochs, n_times, n_channels).
info (dict) – A dictionary containing:
  - ‘column_ids’: List of provided column names.
  - ‘t_rel’: The common time grid, in nanoseconds.
  - ‘nan_status’: String indicating whether NaN values were found in the data.
Notes
The time grid (t_rel) is in nanoseconds.
If NaN values are present after interpolation, they are noted in nan_status.
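The resampling onto a common time grid might look like the sketch below, which stacks two irregularly sampled epochs into an (n_epochs, n_times, n_channels) array. This is an assumption-laden illustration: the epoch frames are hypothetical, np.interp stands in for whatever interpolation PyNeon uses, and the nan_status strings are invented.

```python
import numpy as np
import pandas as pd

sampling_rate = 100                                   # Hz
step_ns = int(1e9 / sampling_rate)                    # 10 ms grid spacing in ns
t_rel = np.arange(-20_000_000, 20_000_001, step_ns)   # common time grid (ns)

# Two hypothetical epochs sampled at irregular relative times
epoch_frames = [
    pd.DataFrame({"t_rel": [-20_000_000, -5_000_000, 20_000_000],
                  "x": [-20.0, -5.0, 20.0], "y": [1.0, 1.0, 1.0]}),
    pd.DataFrame({"t_rel": [-20_000_000, 0, 20_000_000],
                  "x": [0.0, 2.0, 4.0], "y": [5.0, 3.0, 1.0]}),
]
columns = ["x", "y"]

epochs_np = np.full((len(epoch_frames), len(t_rel), len(columns)), np.nan)
for i, frame in enumerate(epoch_frames):
    for j, col in enumerate(columns):
        # Linear interpolation onto the shared grid (np.interp as a stand-in)
        epochs_np[i, :, j] = np.interp(
            t_rel, frame["t_rel"].to_numpy(), frame[col].to_numpy()
        )

info = {
    "column_ids": columns,
    "t_rel": t_rel,
    # Hypothetical status strings; the real wording is not documented here
    "nan_status": "NaN found" if np.isnan(epochs_np).any() else "no NaN",
}
```

Because every epoch shares the same t_rel grid, the resulting array can be averaged or indexed across epochs directly.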