Stratified Multivariate Multiscale Dispersion Entropy for Physiological Signal Analysis

Multivariate entropy quantification algorithms are becoming a prominent tool for the extraction of information from multi-channel physiological time-series. However, in the analysis of physiological signals from heterogeneous organ systems, certain channels may overshadow the patterns of others, resulting in information loss. Here, we introduce the framework of Stratified Entropy to prioritize each channel's dynamics based on its allocation to a respective stratum, leading to a richer description of the multi-channel time-series. As an implementation of the framework, three algorithmic variations of the Stratified Multivariate Multiscale Dispersion Entropy are introduced. These variations and the original algorithm are applied to synthetic time-series, waveform physiological time-series, and derivative physiological data. In the synthetic time-series experiments, the variations successfully prioritize channels according to their strata allocation while maintaining the low computation time of the original algorithm. In experiments on waveform physiological time-series and derivative physiological data, increased discrimination capacity was noted for multiple strata allocations in the variations when benchmarked against the original algorithm. This suggests improved physiological state monitoring by the variations. Furthermore, our variations can be modified to utilize a priori knowledge for the stratification of channels. Thus, our research provides a novel approach for the extraction of previously inaccessible information from multi-channel time-series acquired from heterogeneous systems.


I. INTRODUCTION
Increased amounts of physiological data are becoming available due to advances in physiological recording technology, across applications ranging from wearable devices to clinical environments [1]- [3]. The analysis of these data can contribute to effective prognosis, early stage intervention, personalised treatments, and improved clinical decision making. However, for the successful development of algorithms capable of extracting viable information, certain characteristics of the data have to be considered. These include their multivariate nature due to the interaction of multiple organ systems in human physiology [4]- [8], the potentially nonlinear nature of their dynamics [9]- [14], and the low data quality arising from the recording conditions [15]- [17].
Entropy quantification algorithms are becoming a prominent tool for measuring the dynamics of uni- and multi-channel time-series [18]. These algorithms can be broadly categorised into those based on Shannon Entropy [19] and those based on Conditional Entropy, defined as the quantity of information observed in a sample at time point n that cannot be explained by previous samples up to time point n − 1 [20]. They have been successful in a variety of applications such as the monitoring of machine operation [21], [22] and the analysis of financial time-series [23], [24].
The quantification of entropy, as a measure of the complexity of physiological signals, is of direct interest for monitoring a system's physiological states, particularly when considering the Critical Slow Down (CSD) and Loss of Complexity (LoC) paradigms, as well as their combination within the "entropy pump" (EP) hypothesis. The CSD paradigm considers that during frail or pathological states, a slowing down is observed in the capacity of the system to recover from external stressors, resulting in increased output complexity for certain regulatory variables [25]- [27]. The LoC paradigm suggests that when the equilibrium of a system is disrupted, multiple processes that displayed multiscale complexity produce output measurements of reduced complexity, indicating a loss in the system's flexibility and capacity to adapt in the presence of external stressors [10], [28]. These seemingly opposite paradigms are combined in the EP hypothesis, which separates physiological parameters into regulated and effector variables. According to this hypothesis, homeostasis is achieved through an "entropy pump" that maintains a stable, low-complexity output for regulated variables by means of the complex and variable outputs of effector variables [8], [29]. A pathological state arises when the direction of the pump is disrupted: the complexity of regulated variables increases, as per the CSD paradigm, while the complexity of effector variables decreases, in accordance with the LoC paradigm.
Entropy has been extensively used to analyse physiological signals. Examples include algorithms based on Shannon Entropy, such as Permutation Entropy (PEn) [30] and Dispersion Entropy (DisEn) [31], [32], which have been used, respectively, to analyse electroencephalogram (EEG) signals to track the state of consciousness of patients under the effect of anaesthetic drugs [33], and to analyse blood pressure signals to quantify the effect of aging on the reduction of the recorded signal's variability [31]. Algorithms based on Conditional Entropy have also been utilized, such as Approximate Entropy (ApEn) [34] for the investigation of abnormalities in respiratory function caused by panic disorders [35], Sample Entropy (SampEn) [36] for the analysis of neonatal heart rate variability to diagnose sepsis [37], and Fuzzy Entropy (FuzzyEn) on surface electromyography (EMG) signals for the detection of motion [38].
For the effective analysis of physiological dynamics, multi-channel time-series have to be analyzed both in a univariate and a multivariate manner. This is a necessary step to ensure that cross-channel dynamics can be quantified, allowing the study of dynamics that develop across different components of the same organ system as well as across distinct systems [4]- [8]. For this reason, recent research has focused on producing multivariate variations of entropy algorithms to extract features from two or more channels: DisEn [39], PEn [40], ApEn, SampEn [41], and FuzzyEn [42].
However, while multivariate algorithms can extract an output feature from a multi-channel time-series, the approach is limited with regards to the total information retrieved. The dynamics of certain input channels may overshadow those of others due to the potentially different dominant frequencies amongst the physiological signals of each channel. This becomes apparent when multi-channel time-series are comprised of signals that arise from heterogeneous organ systems such as the combination of electrocardiograms (ECG) [43], EEG [44], arterial blood pressure (BP) [45], and nasal respiratory (RESP) signals [46], [47], whose dominant frequencies and temporal structures display clear differences.
As a step towards addressing this challenge, recent studies have suggested non-uniform multiscale embedding between the input channels. This approach aims to find the optimal combination of scales for the analysis of the multi-channel time-series, so that each channel is analyzed at the scale where most of its dynamics arise, by modifying the time-delay [48] or the embedding dimension used for each channel [49]. While this approach offers an interesting and modular configuration of analysis, it faces challenges that limit its applicability. These are the potential mismatch of each channel's data length with the optimal scale values, the limitation of multiscale analysis to specific scales for each channel resulting in an incomplete multiscale output, the instability of the method for an increased number of channels, and the potential for overshadowing to occur even at optimal scale combinations.
A different approach for the analysis of interdependencies within a group of multi-channel time-series arises from the utilization of Cross-Entropy algorithms, developed for ApEn, SampEn [36], FuzzyEn [50], and PEn [51]. With them, an entropy-based feature quantifies the coupling between two channels. The variations of SampEn and FuzzyEn are nondirectional, while the variations of ApEn and PEn are directional. In the latter cases, one of the two channels acts as a "designated" channel in the measurement. Thus, the potential overshadowing of each channel's dynamics could be avoided, since each channel has the opportunity to be designated. However, this approach is limited to bivariate measurements between two channels. Therefore, it cannot capture higher-order dynamics arising jointly from three or more channels. A second limitation is that, by definition, it measures the coupling between the two channels and is not a measurement of their combined dynamics.
In this study, we propose the framework of Stratified-Entropy, which combines positive elements of both the Multivariate and Cross-Entropy algorithms to increase the information that can be extracted from a set of multi-channel time-series. It does so by allowing each channel's dynamics to have a different level of prioritization during the quantification of the output entropy value, based on its allocation to a respective stratum. Namely, the main contributions of the presented work are:
• The introduction of the Stratified-Entropy framework as a new form of multivariate and multiscale analysis that increases the amount of information extracted from a multi-channel time-series via entropy quantification algorithms.
• The implementation of the Stratified-Entropy framework through the introduction of three novel algorithms of Stratified Multivariate Multiscale Dispersion Entropy (SmvMDE) that prioritize channels during the calculation of the output entropy value based on their allocation to hierarchical strata.
• The analysis and benchmarking of the SmvMDE algorithms through experiments applied to synthetic time-series, waveform physiological time-series, and derivative physiological data.

A. Stratified Entropy Framework
Within the framework of Stratified Entropy, strata are defined with a clear hierarchy of prioritization. The number of strata can vary based on the implementation of the framework. Each channel is allocated to one of the available strata and every channel has a weighted contribution in the calculation of the output entropy feature based on their allocated stratum.
For the purposes of this study, all three variations have been designed based on a two strata configuration, a core stratum (prioritized) and a periphery stratum. The potential extension to configurations with higher numbers of strata is discussed in Subsection III-E.3. The variations differ in how the dynamics of channels allocated to the core stratum are prioritised over the periphery channels.
The following subsections start with a description of the original mvMDE algorithm, continue with the introduction of the SmvMDE variations and the changes they introduce to mvMDE, and describe the experiments conducted to analyse and benchmark their operation.

B. Multivariate Multiscale Dispersion Entropy
DisEn arises from the integration of Shannon Entropy with symbolic dynamics, aiming to quantify the degree of irregularity in an input time-series segment. It achieves good discrimination capacity between different types of physiological activity while maintaining a low computational time [31], [32]. The mvMDE algorithm allows the multivariate quantification of DisEn from multi-channel time-series taking into consideration both temporal and spatial dynamics, across multiple time scales [39].

1) Coarse-Graining Process for Multiscale Implementation:
For the successful quantification of a time-series' complexity across multiple time scales, a number of coarse-graining procedures have been suggested. These include the widely used moving average approach [52]- [54], low-pass Butterworth filtering [54], [55], and empirical mode decomposition [55]. This study builds upon the original algorithmic implementation of mvMDE and therefore utilizes the moving average coarse-graining approach for simplicity [39], although other alternatives provide better frequency responses. Based on this approach, in a set of p-channel time-series $Y = \{y_{k,b}\}$ ($k = 1, 2, \ldots, p$; $b = 1, 2, \ldots, L$), each channel is processed separately and divided into non-overlapping segments of length equal to the defined time scale factor, τ. For each segment, an average value is calculated and used to derive the coarse-grained multi-channel time-series $X = \{x_{k,i}^{(\tau)}\}$ as follows:

$$x_{k,i}^{(\tau)} = \frac{1}{\tau} \sum_{b=(i-1)\tau+1}^{i\tau} y_{k,b}, \qquad 1 \le i \le N = \left\lfloor \frac{L}{\tau} \right\rfloor, \quad 1 \le k \le p \tag{1}$$

where L is the original channel length and N the resulting coarse-grained channel length.
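The moving-average coarse graining in (1) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the authors' Matlab code; the function name `coarse_grain` and the p × L array layout are our own assumptions.

```python
import numpy as np


def coarse_grain(Y: np.ndarray, tau: int) -> np.ndarray:
    """Moving-average coarse graining of a p x L multichannel array (sketch).

    Each channel is cut into non-overlapping segments of length tau and every
    segment is replaced by its mean, giving a p x N array with N = L // tau.
    Trailing samples that do not fill a complete segment are discarded.
    """
    Y = np.asarray(Y, dtype=float)
    p, L = Y.shape
    N = L // tau
    return Y[:, : N * tau].reshape(p, N, tau).mean(axis=2)


# Two channels of length L = 7 coarse-grained at tau = 3 (N = 2).
Y = np.arange(14, dtype=float).reshape(2, 7)
X = coarse_grain(Y, 3)
print(X)  # rows are [1, 4] and [8, 11]
```

Because each channel is reshaped independently, all channels share the same N at a given τ, which keeps the coarse-grained channels aligned in time for the multivariate embedding.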
2) Application of Mapping Function: For the implementation of mvMDE, a recommended step is the application of a non-linear mapping function to each channel, such as the normal cumulative distribution function (NCDF) [39]. The selection of a non-linear over a linear mapping function seeks to ensure that maximum and minimum amplitude values, that can be significantly larger or smaller than the mean value of the channel, do not disrupt the allocation of samples to classes by forcing the majority of samples to be assigned to a small number of classes [31], [32], [56]. For multiscale implementations using NCDF, the mean and standard deviation of the original non coarse-grained time-series are used and remain constant for the mapping process across all temporal scale factors. This ensures that the mapping based on the NCDF remains fixed and is not affected by the averaging taking place during the coarse graining process [39].
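The fixed NCDF mapping can be illustrated as follows. The assignment of mapped values to c classes via round(c·y + 0.5) follows the usual DisEn formulation [31], which is not spelled out in this section, so treat that step and the helper names `ncdf_map` and `to_classes` as assumptions of this sketch. The key point shown is that μ and σ are taken once from the original series and reused at every τ.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def ncdf_map(x, mu, sigma):
    """Normal CDF mapping to (0, 1) using a fixed mean and standard deviation."""
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))


def to_classes(x, mu, sigma, c):
    """Assign each mapped sample to one of c classes (labels 1..c)."""
    y = ncdf_map(x, mu, sigma)
    z = np.round(c * y + 0.5).astype(int)
    # NumPy rounds halves to even (unlike Matlab's round); clipping keeps 1..c.
    return np.clip(z, 1, c)


rng = np.random.default_rng(0)
y_orig = rng.standard_normal(1200)
# mu and sigma come from the original, non-coarse-grained series and stay fixed
mu, sigma = y_orig.mean(), y_orig.std()
for tau in (1, 2, 5):
    n = len(y_orig) // tau
    x_tau = y_orig[: n * tau].reshape(n, tau).mean(axis=1)
    z = to_classes(x_tau, mu, sigma, c=6)
    print(tau, np.bincount(z, minlength=7)[1:])
```

Because the mapping is fixed, the narrower spread of the averaged series at larger τ shows up as fewer samples in the extreme classes instead of being rescaled away.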
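3) Algorithm for mvMDE: For a set of p-channel time-series

For each of the $c^m$ unique dispersion patterns, their relative frequency is calculated as follows, with # being the symbol that denotes the cardinality of the set:

$$p(\pi_{v_0 \ldots v_{m-1}}) = \frac{\#\{(j, q) \mid j \le N-(m-1)d,\ q \le m\, p^m,\ \phi_q(j) \text{ has type } \pi_{v_0 \ldots v_{m-1}}\}}{(N-(m-1)d)\, m\, p^m} \tag{2}$$

5) Calculation of Multivariate Dispersion Entropy: Utilizing the relative frequencies of the dispersion patterns, which consider both the temporal and spatial domains as above, the output entropy value for X is calculated based on Shannon's entropy and normalized to the range 0 to 1 by dividing by $\ln c^m$:

$$\mathrm{DisEn}(X, m, c, d) = -\frac{1}{\ln c^m} \sum_{\pi=1}^{c^m} p(\pi_{v_0 \ldots v_{m-1}}) \ln p(\pi_{v_0 \ldots v_{m-1}})$$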
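Once every dispersion pattern instance has been collected, the relative frequencies and the normalized entropy reduce to counting. The sketch below assumes, hypothetically, that each instance is represented as a tuple of class labels; it writes each term as f·ln(1/f), which equals −f·ln f, and divides by ln(c^m) as described above.

```python
import math
from collections import Counter


def normalized_dispersion_entropy(patterns, c, m):
    """Shannon entropy of dispersion-pattern frequencies, normalized by ln(c**m).

    `patterns` holds every dispersion-pattern instance (e.g. tuples of class
    labels), so len(patterns) plays the role of the total instance count in
    the denominator of the relative frequencies.
    """
    counts = Counter(patterns)
    total = sum(counts.values())
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    return entropy / math.log(c ** m)


uniform = [(1, 1), (1, 2), (2, 1), (2, 2)]  # all c**m = 4 patterns, once each
constant = [(1, 2)] * 4                       # a single repeated pattern
print(normalized_dispersion_entropy(uniform, c=2, m=2))   # close to 1.0
print(normalized_dispersion_entropy(constant, c=2, m=2))  # 0.0
```

Patterns that never occur contribute nothing (the limit of f·ln f as f → 0 is 0), so only observed patterns need to be counted.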

C. Stratified-Dispersion Entropy Variations
Building on the original mvMDE, we introduce three variations of SmvMDE as implementations of the Stratified Entropy framework. With their two-strata configuration, the SmvMDE variations separate the channels into two sets: the set of one or more designated channels, which are allocated to the "core" stratum, and the set of secondary channels, which are allocated to the "periphery" stratum.
The original mvMDE treats all embedded subvectors as equal. Instead, the SmvMDE variations prioritise subvectors that contain samples retrieved from designated channels. The Threshold (T-SmvMDE), Soft Threshold (ST-SmvMDE), and Proportional (P-SmvMDE) variations use distinct approaches for adjusting the contribution of each combination by modifying the third and fourth steps of the original mvMDE algorithm, as described in Subsection II-B.3. The code implementing the SmvMDE variations in Matlab is publicly available at: https://github.com/EvangelosKafantaris/SmvMDE.git.

1) Threshold Variation: T-SmvMDE defines the minimum number of samples extracted from designated channels that each subvector should contain in order to be considered. This is achieved through a new input parameter: the threshold (t). The $m \cdot p^m$ subvectors initially utilized in the original mvMDE are reduced to a subset of size $l_t$ that only includes subvectors with t or more samples from designated channels among their m samples. As a result, for each multivariate embedded vector Z(j), only the subvectors $\phi_q(j)$ ($q = 1, \ldots, l_t$) are mapped to dispersion patterns, which reduces the number of dispersion pattern instances to $(N - (m - 1)d)\, l_t$. Fig. 1 displays a diagram illustrating the T-SmvMDE subvector selection process.
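The T-SmvMDE selection only needs to know which channel each sample of a subvector comes from. The sketch below assumes a hypothetical representation in which a subvector is a tuple of m channel indices; how the subvectors are enumerated from Z(j) follows the original mvMDE and is not reproduced here, so the `itertools.product` enumeration in the example is purely illustrative.

```python
from itertools import product


def designated_count(channels, designated):
    """h: number of samples in a subvector that come from designated (core) channels."""
    return sum(ch in designated for ch in channels)


def threshold_select(subvectors, designated, t):
    """Keep only the subvectors with at least t samples from designated channels."""
    return [sv for sv in subvectors if designated_count(sv, designated) >= t]


# Illustrative only: 3 channels, m = 2, channel 0 designated, threshold t = 1.
candidates = list(product(range(3), repeat=2))  # 9 channel combinations
kept = threshold_select(candidates, designated={0}, t=1)
print(len(candidates), len(kept))  # 9 5
print(kept)  # [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
```

With t = 0 every subvector is kept and the selection falls back to the behaviour of mvMDE, while raising t toward m keeps only subvectors dominated by core-stratum samples.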
For each unique dispersion pattern, its relative frequency is calculated with a modified denominator to match the reduced number of dispersion pattern instances:

$$p(\pi_{v_0 \ldots v_{m-1}}) = \frac{\#\{(j, q) \mid j \le N-(m-1)d,\ q \le l_t,\ \phi_q(j) \text{ has type } \pi_{v_0 \ldots v_{m-1}}\}}{(N-(m-1)d)\, l_t}$$

2) Soft Threshold Variation: As an intermediate algorithm between T-SmvMDE and mvMDE, ST-SmvMDE combines the t input parameter with an additional reduced weight (w) parameter, which reduces the contribution of subvectors that do not meet the threshold t without removing them completely. The w parameter ranges from a minimum value of 0, where the output matches that of T-SmvMDE, to a maximum value of 1, where the output matches that of the original mvMDE algorithm, since no reduction of contribution occurs.
Based on t, the subvectors are split into two subsets: a primary subset of size $l_p$, whose contribution to the calculation of a dispersion pattern's frequency remains unchanged, and a secondary subset of size $l_s$, whose impact is reduced by multiplying the number of respective dispersion pattern instances by w. Consequently, for each Z(j), the subvectors $\phi_p(j)$ ($p = 1, \ldots, l_p$) are formed from the primary subset and $\phi_s(j)$ ($s = 1, \ldots, l_s$) from the secondary subset. Therefore, the maximum number of instances for a dispersion pattern becomes $(N - (m - 1)d)(l_p + l_s w)$. As a result, the relative frequency of each unique dispersion pattern is:

$$p(\pi_{v_0 \ldots v_{m-1}}) = \frac{\#\{(j, p) \mid j \le N-(m-1)d,\ p \le l_p,\ \phi_p(j) \text{ has type } \pi_{v_0 \ldots v_{m-1}}\} + w\, \#\{(j, s) \mid j \le N-(m-1)d,\ s \le l_s,\ \phi_s(j) \text{ has type } \pi_{v_0 \ldots v_{m-1}}\}}{(N-(m-1)d)(l_p + l_s w)}$$

3) Proportional Variation: The third variation, P-SmvMDE, requires no additional parameters. Instead of utilizing a threshold to filter subvectors, it allocates them to subsets based on the number of samples in each combination that are retrieved from designated channels, and applies a proportional factor to each subset. With m being the length of each subvector and h the number of samples extracted from designated channels, this factor is defined as h/m. Therefore, the proportional factor ranges from a minimum of 0 to a maximum of 1, and the total number of subsets to which the subvectors are allocated is m + 1. Consequently, for each Z(j), the subvectors $\phi_r^{(h)}(j)$ ($r = 1, \ldots, l_h$) are formed from the subset with h designated samples ($h = 0, 1, \ldots, m$), with $l_h$ being the size of that subset. Hence, the maximum number of instances (α) for a dispersion pattern becomes

$$\alpha = (N - (m - 1)d) \sum_{h=0}^{m} \frac{h}{m}\, l_h$$

The relative frequency of each unique dispersion pattern is calculated by counting the dispersion pattern instances in the subvectors of each subset, multiplying them by the respective factor h/m, and dividing by the maximum number of instances:

$$p(\pi_{v_0 \ldots v_{m-1}}) = \frac{1}{\alpha} \sum_{h=0}^{m} \frac{h}{m}\, \#\{(j, r) \mid j \le N-(m-1)d,\ r \le l_h,\ \phi_r^{(h)}(j) \text{ has type } \pi_{v_0 \ldots v_{m-1}}\}$$
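All three variations can be read as the same counting procedure with a different weight per subvector, where the weight depends only on h, the number of designated samples it contains: mvMDE uses 1, T-SmvMDE uses 1 if h ≥ t and 0 otherwise, ST-SmvMDE replaces that 0 by w, and P-SmvMDE uses h/m. Since each Z(j) contributes the same set of subvector positions, summing the weights over all instances reproduces the denominators (N − (m − 1)d)l_t, (N − (m − 1)d)(l_p + l_s w), and α. The sketch below is our own unified illustration, not the authors' implementation; the (channels, pattern) instance format and all function names are hypothetical.

```python
from collections import defaultdict


def weighted_frequencies(instances, designated, weight):
    """Relative frequency of each dispersion pattern under a stratum weighting.

    `instances` is an iterable of (channels, pattern) pairs, one per dispersion
    pattern instance; `weight(h)` gives the contribution of an instance whose
    subvector holds h samples from designated channels.
    """
    numer = defaultdict(float)
    denom = 0.0
    for channels, pattern in instances:
        w_h = weight(sum(ch in designated for ch in channels))
        numer[pattern] += w_h
        denom += w_h  # accumulates the "maximum number of instances"
    if denom == 0.0:
        raise ValueError("no instance receives a non-zero weight")
    return {pattern: value / denom for pattern, value in numer.items()}


def mvmde_weight(h):
    return 1.0


def t_weight(t):
    return lambda h: 1.0 if h >= t else 0.0


def st_weight(t, w):
    return lambda h: 1.0 if h >= t else w


def p_weight(m):
    return lambda h: h / m


# m = 2, channel 0 designated; h = 1, 0, 2, 0 for the four instances.
inst = [((0, 1), "A"), ((1, 2), "B"), ((0, 0), "A"), ((2, 1), "B")]
for name, wfun in [("mvMDE", mvmde_weight), ("T", t_weight(1)),
                   ("ST", st_weight(1, 0.5)), ("P", p_weight(2))]:
    print(name, weighted_frequencies(inst, {0}, wfun))
```

In the example, pattern "B" only occurs in subvectors without designated samples, so its frequency drops from 0.5 (mvMDE) to 1/3 (ST with w = 0.5) and to 0 (T and P).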

D. Synthetic Time-Series Experiments
The SmvMDE variations and the original mvMDE are applied to synthetic time-series, to study the differences in their operation and their multiscale outputs.
1) Uncorrelated white Gaussian and 1/f noise: We use combinations of uncorrelated white Gaussian noise (WGN) and 1/f noise due to their differences in complexity and irregularity. Complexity in a time-series arises from consistent structural dynamics and therefore, when measured, is expected to follow a stable multiscale profile [52], [57]. Irregularity consists of random fluctuations that do not arise from underlying structural dynamics and is expected to have a decreasing multiscale profile. The complexity of 1/f noise is higher than that of WGN, while the irregularity of WGN is higher than that of 1/f noise [41], [58]. Thus, multivariate combinations of WGN and 1/f time-series have been used in previous research to test multiscale entropy quantification algorithms [59], [60]. Given the operation of SmvMDE, the output entropy value is affected to a larger degree by channels allocated to the core stratum than by those in the periphery. This would not affect experimental setups 1) and 4), which contain only one type of noise. However, it would lead to different results for setups 2) and 3), which contain both WGN and 1/f channels, depending on their allocation to strata. Therefore, for the SmvMDE variations, experimental setups 2) and 3) are expanded. In a first iteration, the designated channel assigned to the core is one of the WGN channels, followed by a second iteration where a 1/f channel is designated. This results in a total of six experimental setups for SmvMDE:
1) Three WGN channels.
2) Two WGN channels and one 1/f channel, with WGN designated.
3) One WGN channel and two 1/f channels, with WGN designated.
4) Two WGN channels and one 1/f channel, with 1/f designated.
5) One WGN channel and two 1/f channels, with 1/f designated.
6) Three 1/f channels.
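The six setups can be generated as follows. This section does not state how the 1/f noise was produced, so the spectral-shaping method below (scaling the amplitude spectrum of white noise by 1/√f, which gives a power spectrum proportional to 1/f) is an assumption, as are the names `pink_noise`, `SETUPS`, and `make_setup`. In setups 1) and 6) all channels are alike, so the designated index there is arbitrary.

```python
import numpy as np


def pink_noise(n, rng):
    """1/f noise: white Gaussian noise whose amplitude spectrum is scaled by 1/sqrt(f)."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)
    scale[1:] = 1.0 / np.sqrt(freqs[1:])  # power ~ 1/f; DC component removed
    x = np.fft.irfft(spectrum * scale, n=n)
    return (x - x.mean()) / x.std()


def white_noise(n, rng):
    return rng.standard_normal(n)


# (channel types, index of the designated channel) for setups 1) to 6)
SETUPS = {
    1: (("wgn", "wgn", "wgn"), 0),
    2: (("wgn", "wgn", "1/f"), 0),
    3: (("wgn", "1/f", "1/f"), 0),
    4: (("wgn", "wgn", "1/f"), 2),
    5: (("wgn", "1/f", "1/f"), 1),
    6: (("1/f", "1/f", "1/f"), 0),
}


def make_setup(k, n, rng):
    kinds, designated = SETUPS[k]
    gen = {"wgn": white_noise, "1/f": pink_noise}
    return np.vstack([gen[kind](n, rng) for kind in kinds]), designated


rng = np.random.default_rng(1)
X, core = make_setup(4, 300, rng)
print(X.shape, core)  # (3, 300) 2
```

Each of the 40 repetitions would call `make_setup` with fresh random draws, once for a channel length of 15,000 and once for 300 samples.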
3) Statistical Analysis: Each experimental setup is repeated 40 times independently, and the respective mean and standard deviation are calculated for each τ value (1 to 20). All experimental setups are replicated for channel lengths of 15,000 and 300 samples to assess potential differences between long and short time-series. The parameter values used for mvMDE and SmvMDE are chosen based on the limitations introduced by the short time-series and match those used in the original mvMDE study to allow easy comparison between both studies [39]. They are displayed in Table I.

4) Computational Time Experiments: To ensure that the SmvMDE variations maintain the low computation time of the original mvMDE, 2-channel, 5-channel, and 8-channel time-series are formed from uncorrelated WGN with channel lengths ranging from 1,000 up to 100,000 samples. Each experimental setup is repeated over 20 independent realizations, and the average computation time is calculated and reported for the mvMDE and SmvMDE algorithms. For the implementation of the SmvMDE algorithms, an arbitrary designated channel is selected. The computations are carried out on a PC with an Intel(R) Core(TM) i7-8750H CPU @ 2.2 GHz and 16 GB RAM running MATLAB R2018b. The parameter values of mvMDE and SmvMDE remain the same, with the exception of τ max being reduced from 20 to 10 to be consistent with [39].

E. Waveform Physiological Time-Series Experiments
Experiments are conducted on waveform physiological time-series to study the extent to which the SmvMDE algorithms increase the discrimination capacity between physiological states. We benchmark the effect size difference of output distributions extracted using SmvMDE against those extracted using mvMDE.
1) MIT-BIH Polysomnographic Database: To access multichannel time-series formulated from high sampling rate signals recorded from different organs, the publicly available MIT-BIH Polysomnographic Database is used. It contains a total of 18 records of multiple physiological waveforms, initially recorded for the evaluation of chronic obstructive sleep apnea (OSA) syndrome and sampled at 250 Hz [61], [62].
For the purpose of this study, we select the records slp41 and slp45 due to the availability of extensive sections of healthy stage 2 sleep; and the records slp04 and slp16 due to the existence of multiple incidents of OSA with arousal during stage 2 sleep. All records contain complete and synchronized recordings of EEG, ECG, BP, and RESP signals. The EEG signal is split into the frequency bands of: delta (0.5-3.5 Hz), theta (4-7.5 Hz), alpha (8-11.5 Hz), sigma (12-15.5 Hz), and beta (16-19.5 Hz) [63]. Hence, 8-channel time-series are extracted from each record consisting of the channels: Delta, Theta, Alpha, Sigma, Beta, ECG, BP, and RESP.
2) Formulation and Selection of Analysis Windows: These time-series are split into 8-channel non-overlapping windows with 7,500 samples per channel, corresponding to the 30-second annotation interval of the database. Based on the annotations, we extracted 235 multi-channel "healthy" windows corresponding to healthy stage 2 sleep (slp41 = 96 windows, slp45 = 139 windows), and 235 multi-channel "apnea" windows corresponding to OSA with arousal during stage 2 sleep (slp04 = 140 windows, slp16 = 95 windows).
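The window length follows directly from the sampling rate: 30 s × 250 Hz = 7,500 samples per channel. A minimal windowing sketch is given below; the per-epoch label strings (`"S2"`, `"S2-OSA"`) and the helper names are hypothetical placeholders, since the database's annotation format is not reproduced here.

```python
import numpy as np

FS = 250              # Hz, sampling rate of the database
EPOCH_S = 30          # seconds per annotation interval
WIN = FS * EPOCH_S    # 7,500 samples per channel


def split_windows(X):
    """Cut a p x L recording into non-overlapping windows of shape (n, p, WIN)."""
    p, L = X.shape
    n = L // WIN
    return X[:, : n * WIN].reshape(p, n, WIN).transpose(1, 0, 2)


def select_windows(windows, labels, wanted):
    """Keep the windows whose (hypothetical) per-epoch label equals `wanted`."""
    return [w for w, lab in zip(windows, labels) if lab == wanted]


X = np.random.default_rng(0).standard_normal((8, 3 * WIN + 100))
wins = split_windows(X)
print(wins.shape)  # (3, 8, 7500)
healthy = select_windows(wins, ["S2", "S2-OSA", "S2"], "S2")
print(len(healthy))  # 2
```

Aligning the windows with the annotation epochs means each window carries exactly one sleep-stage or event label.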
3) Calculation of DisEn: The parameter values for the extraction of multiscale entropy distributions from the 235 "healthy" and 235 "apnea" windows are chosen based on the considerations discussed in Subsection III-E and displayed in Table I under the waveform physiological time-series (PT) column. Per window, we obtain ten values, one for each τ (1 to 10).
We use mvMDE to obtain one multiscale distribution from the "healthy" and one from the "apnea" dataset. To study the SmvMDE variations (T, ST, and P), each variation is applied in eight iterations per dataset, with a different channel designated in each iteration. This leads to the extraction of eight multiscale distributions from each dataset, which allow us to study how the prioritization of each channel's dynamics affects the output entropy values and the physiological differentiation capacity of SmvMDE.

4) Statistical Analysis: To benchmark the differentiation capacity of the SmvMDE variations against mvMDE, the following steps are completed for each τ separately:
1) We compute the Hedges' g effect size [64] for the "healthy" versus "apnea" output distributions.
2) We calculate the effect size difference when moving from mvMDE to a certain SmvMDE variation with a particular designated channel.
3) We estimate the confidence intervals for each calculated effect size difference to verify their significance.
In Step 3, bootstrapping is applied to the "healthy" and "apnea" output distributions to estimate the confidence intervals. The bootstrapping is implemented by sampling with replacement the sets of 235 multiscale entropy values in each output distribution. For each output distribution of the SmvMDE variations, 40 independent realizations of bootstrapped distributions are generated. No bootstrapping is applied to the output distribution of mvMDE, since we seek to benchmark the SmvMDE distributions against the same, original mvMDE results.
To implement this analysis, the bootstrapped distributions of each SmvMDE variation and the original distribution of mvMDE are used in the following steps, which are applied for each designated channel selection and at each τ (1 to 10):
1) Each of the 40 bootstrapped "healthy" distributions is paired at random with one of the 40 bootstrapped "apnea" distributions. (This pairing is kept the same across all SmvMDE variations for consistency.)
2) The Hedges' g effect size is calculated between the two distributions of each pair, resulting in 40 sets of Hedges' g effect size values.
3) Hedges' g effect size values are also computed between the "healthy" and "apnea" distributions of mvMDE.
4) The benchmarking effect size values of mvMDE are subtracted from the effect size values extracted from each pair of bootstrapped distributions. This results in 40 multiscale sets of effect size differences, whose mean and 95% confidence intervals are calculated.
We plot the mean and 95% confidence intervals of the effect size difference separately for each designated channel selection and τ value (1 to 10).
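The benchmarking steps at a single τ can be sketched as follows. Hedges' g is computed as the standardized mean difference with pooled standard deviation, multiplied by the small-sample correction J = 1 − 3/(4(n₁ + n₂) − 9). Two choices are our own assumptions because the section does not specify them: the 95% interval is taken as the 2.5th and 97.5th percentiles of the 40 differences, and g is signed as "healthy" minus "apnea" (take absolute values before subtracting if magnitudes are intended). The function names are hypothetical.

```python
import numpy as np


def hedges_g(a, b):
    """Hedges' g: mean difference over the pooled SD, times the correction J."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                     / (n1 + n2 - 2))
    j = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return j * (a.mean() - b.mean()) / pooled


def effect_size_difference(healthy_s, apnea_s, healthy_mv, apnea_mv, rng,
                           n_boot=40, pairing=None):
    """Mean and 95% interval of g(bootstrapped SmvMDE) - g(mvMDE) at one tau."""
    g_ref = hedges_g(healthy_mv, apnea_mv)  # mvMDE benchmark, not bootstrapped
    boot_h = [rng.choice(healthy_s, size=len(healthy_s)) for _ in range(n_boot)]
    boot_a = [rng.choice(apnea_s, size=len(apnea_s)) for _ in range(n_boot)]
    if pairing is None:  # pass the same pairing for every variation
        pairing = rng.permutation(n_boot)
    diffs = np.array([hedges_g(boot_h[i], boot_a[pairing[i]]) - g_ref
                      for i in range(n_boot)])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return diffs.mean(), (lo, hi)


rng = np.random.default_rng(0)
healthy = rng.normal(0.60, 0.05, 235)
apnea = rng.normal(0.58, 0.05, 235)
mean_diff, ci = effect_size_difference(healthy, apnea, healthy, apnea, rng)
print(round(mean_diff, 3), ci)  # near 0 when SmvMDE and mvMDE outputs coincide
```

Sampling with replacement is the default of `Generator.choice`, so each bootstrapped distribution keeps the original size of 235 values.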

F. Derivative Physiological Data Experiments
The operation of SmvMDE is also studied for derivative data of low temporal resolution. The performance of the SmvMDE variations is benchmarked against that of mvMDE via the difference in output entropy for separate individuals when moving from physiological states of low to high external stress.
1) Maximal Exercise Dataset: For the application of SmvMDE to derivative physiological data the publicly available Treadmill Maximal Exercise Test Dataset is used [17], [65]. This dataset was collected, curated, and published by the Exercise Physiology and Human Performance Lab of the University of Malaga. The recordings include five cardiorespiratory variables: heart rate (HR, in beats per min), oxygen consumption (VO2, in mL/min), carbon dioxide production (VCO2, in mL/min), respiration rate (RR, in respirations/min), and pulmonary ventilation (VE, in L/min). All variables were recorded in a synchronized manner with the sampling event being each breath measurement, resulting in a varied sampling period (usually in the range of 1-4 s).
Each test consisted of an individual walking and running on a treadmill, starting with a warm-up period at treadmill speeds close to 5 km/h, followed by a period of gradual speed increase that reached speeds in the range of 14 to 17 km/h, and ending with a cool-down period at speeds close to 5 km/h. A total of 857 individuals participated in the study, with some people having more than one test, resulting in 992 recordings. The participants' ages ranged from 10 to 63 years.
2) Formulation and Selection of Analysis Windows: Two physiological state classes are formed: a low speed (LS) class that corresponds to data recorded during the warm-up until the speed reached 7 km/h, and a high speed (HS) class that corresponds to data recorded while the treadmill speed was higher than 15 km/h. We selected recordings with at least 120 synchronised samples for each class to ensure an adequate window size for analysis. In the few cases where an individual had more than one eligible test, the first one was selected. A total of 98 eligible recordings, with ages ranging from 14 to 50 years, are extracted.
3) Calculation of DisEn: The extracted data are 98 pairs of multivariate 120-sample windows, with each pair including one segment from the LS class and one from the HS class. Due to the low temporal resolution of the data and the consequent small window size, the analysis is done only at temporal scale τ = 1. The selected parameter values are displayed in Table I. All SmvMDE variations (T, ST, and P) are applied in five iterations each, during which a different channel is designated. Consequently, for each algorithm and designated channel selection, 98 pairs of DisEn values are extracted.

4) Statistical Analysis: For each experimental setup and within each of the 98 pairs of DisEn values, the entropy difference observed when moving from the LS to the HS state is recorded. Boxplots are generated to compare the output difference distributions between SmvMDE and mvMDE for each designated channel selection. Additionally, the mean absolute difference observed in each difference distribution and the number of entries that displayed an increased absolute value of difference during each SmvMDE configuration, compared to their mvMDE values, are reported. Finally, to highlight a potential directionality that could match the EP hypothesis [8], [29], the number of entries with a higher entropy value in the LS state than in the HS state is also reported for each configuration.
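The three summary quantities reported per configuration reduce to a few vectorized comparisons. In this sketch the LS to HS difference is taken as HS minus LS, and the function and key names are hypothetical; the toy values are for illustration only, not results from the dataset.

```python
import numpy as np


def ls_to_hs_summary(ls_s, hs_s, ls_mv, hs_mv):
    """Summaries for one configuration (one entry per recording pair).

    *_s are SmvMDE values and *_mv the mvMDE values of the same recordings.
    """
    ls_s, hs_s = np.asarray(ls_s, float), np.asarray(hs_s, float)
    d_s = hs_s - ls_s                                            # SmvMDE change
    d_mv = np.asarray(hs_mv, float) - np.asarray(ls_mv, float)  # mvMDE change
    return {
        "mean_abs_diff": float(np.mean(np.abs(d_s))),
        "n_larger_than_mvmde": int(np.sum(np.abs(d_s) > np.abs(d_mv))),
        "n_ls_above_hs": int(np.sum(ls_s > hs_s)),
    }


summary = ls_to_hs_summary(ls_s=[0.8, 0.5, 0.6], hs_s=[0.6, 0.7, 0.6],
                           ls_mv=[0.7, 0.5, 0.6], hs_mv=[0.6, 0.6, 0.5])
print(summary)
```

The `n_ls_above_hs` count is the directional quantity linked to the EP hypothesis, since it ignores magnitudes and only records whether entropy decreased from LS to HS.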

A. Synthetic Time-Series Experiments
The results of the application of mvMDE and SmvMDE to 3-channel time-series of WGN and 1/f noise are presented in Fig. 2 and Fig. 3 for univariate lengths of 15,000 and 300 samples, respectively. For each experimental setup, replicated for 40 independent iterations, the mean and standard deviation of the DisEn values are plotted for each τ (1 to 20).
1) mvMDE Operation: The operation of mvMDE matches the patterns that have been verified by prior research [39]. As τ increases, the output entropy value shows a stronger decline for the 3-channel WGN time-series. As the number of 1/f channels increases, the output entropy value follows a more stable profile, with the 3-channel 1/f time-series being the most stable.
2) SmvMDE Operation: For experimental setups that contain solely WGN channels or solely 1/f channels, the operation of all three SmvMDE variations is identical to mvMDE, as expected. In contrast, in the other experiments, a stronger decline of the output entropy is observed as τ increases when a WGN channel is designated. Instead, when a 1/f channel is designated, the output follows a more stable profile for increasing values of τ.
When comparing the results of the three SmvMDE variations for the same experimental setup:
1) Using the mvMDE output values as reference, the largest deviations are observed for the P-SmvMDE variation, followed by the T-SmvMDE and then the ST-SmvMDE variation.
2) The ST-SmvMDE outputs lie between those of T-SmvMDE and mvMDE, as expected from its design and the w value set to 0.5.
3) The larger deviation of the P-SmvMDE outputs compared with T-SmvMDE is expected, considering that for m = 2 the P-SmvMDE variation gives a higher prioritization to the core stratum than the respective implementation of T-SmvMDE with m = 2 and t = 1: P-SmvMDE weights subvectors with one and two designated samples by 1/2 and 1, respectively, whereas T-SmvMDE weights both by 1.

Fig. 3 displays the results for the 300-sample experiments. For all tested algorithms, the outputs follow the same patterns as their 15,000-sample equivalents, indicating that the operation of SmvMDE remains the same regardless of time-series length. However, for all experimental setups, the standard deviation values are increased, with the increase being stronger for larger τ values, as expected. Consequently, overlaps can be observed between the outputs of the SmvMDE variations for experimental setups that combine WGN and 1/f noise. This indicates that, during the analysis of multi-channel time-series with SmvMDE, the window being analyzed should contain more samples than the respective minimum for mvMDE.

B. Computational Time
The results in Table II indicate that the SmvMDE variations maintain the low computation time of the original mvMDE, as expected, since no computationally critical operations have been modified and the linear time complexity is preserved. Across all variations, the main factor affecting computation time is the univariate length of the time-series. For experimental setups with the same univariate length, the differences in computation time between mvMDE and the SmvMDE variations become more noticeable as the number of channels increases.
The maximum differences in computation time are noted for the experimental setup with a time-series length of 100,000 samples and 8 channels. The maximum increase, 1.672 seconds (4.98%), occurs when moving from mvMDE to ST-SmvMDE, while the maximum decrease, 1.135 seconds (3.38%), occurs when moving from mvMDE to T-SmvMDE. The shorter computation time of T-SmvMDE is an expected benefit of the lower number of subvectors used in that variation.
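As a quick consistency check on these figures, each reported (seconds, percent) pair implies the mvMDE computation time it is relative to. The short calculation below assumes that both percentages refer to the same mvMDE run of the 100,000-sample, 8-channel setup; the variable names are illustrative.

```python
# Reported differences for the 100,000-sample, 8-channel setup.
increase_s, increase_pct = 1.672, 4.98  # mvMDE -> ST-SmvMDE
decrease_s, decrease_pct = 1.135, 3.38  # mvMDE -> T-SmvMDE

# Assuming both percentages are relative to the same mvMDE run,
# each pair implies that run's duration: baseline = delta / (pct / 100).
baseline_from_increase = increase_s / (increase_pct / 100)
baseline_from_decrease = decrease_s / (decrease_pct / 100)

print(f"implied mvMDE time: {baseline_from_increase:.2f} s vs {baseline_from_decrease:.2f} s")
```

Both pairs imply an mvMDE run of roughly 33.6 seconds, so the reported increase and decrease are mutually consistent relative changes of a few percent.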

C. Waveform Physiological Time-Series Experiments
The results of the statistical analysis of the output entropy distributions extracted from the 235 "healthy" and 235 "apnea" 8-channel windows using T-SmvMDE and P-SmvMDE are presented in Fig. 4, with each subplot corresponding to a different designated channel selection. The mean Hedges' g effect size difference and its 95% confidence interval are plotted for each τ (1 to 10). For clarity, only the confidence intervals that do not overlap with 0 are plotted.
ST-SmvMDE is, by design, intermediate between T-SmvMDE and mvMDE. Thus, its outputs also follow an intermediate pattern, closer to the operation of mvMDE, which leads to smaller effect size differences.
1) T-SmvMDE Operation: Benchmarking T-SmvMDE against mvMDE indicates that prioritizing the following channels leads to consistent increases in the difference between the output entropy distributions extracted from "healthy" and "apnea" windows:
• ECG, RESP, and Alpha channels across all values of τ.
• Beta channel for τ values of 2 to 10.
• BP channel for τ values of 5 to 10.
The multiple cases of increased effect size indicate that T-SmvMDE may quantify differences between the two states that the direct application of mvMDE could not highlight.
2) P-SmvMDE Operation: The corresponding benchmarking results for P-SmvMDE indicate that the difference between the output entropy distributions increases when one of the following channels is prioritized:
• Alpha across all values of τ.
• Beta for τ values of 3 to 8.
• ECG for τ values of 7 to 10.
All designated channels displaying increased discrimination capacity for P-SmvMDE were also highlighted by T-SmvMDE. However, the increases observed when moving to P-SmvMDE were smaller in magnitude and occurred for fewer designated channel selections than for T-SmvMDE. With the parameter values used for the SmvMDE variations in this setup, T-SmvMDE assigns a higher prioritization to the core stratum over the periphery than P-SmvMDE, which may indicate that this particular application benefits from implementations with stronger prioritization. Furthermore, within the framework of Stratified Entropy, the finding that certain prioritization choices are more effective at extracting distinct feature distributions between physiological states highlights the potential for feature selection methodologies that aim to optimize physiological classification tasks.
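The effect sizes summarized in Fig. 4 can be illustrated with a minimal sketch of Hedges' g between two groups, using the pooled standard deviation and the small-sample correction J = 1 - 3/(4(n1 + n2) - 9). This section does not state how the plotted confidence intervals were obtained (they may, for example, be bootstrapped), so the normal-approximation interval below, the function names, and the sign convention (second group minus first) are assumptions.

```python
import math
from statistics import mean, variance


def hedges_g(x, y):
    """Bias-corrected standardized mean difference, (mean(y) - mean(x)) / pooled SD."""
    n1, n2 = len(x), len(y)
    s_pooled = math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))
    d = (mean(y) - mean(x)) / s_pooled
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return j * d


def hedges_g_ci(x, y, z=1.96):
    """Hedges' g with an approximate (normal) 95% confidence interval."""
    n1, n2 = len(x), len(y)
    g = hedges_g(x, y)
    se = math.sqrt((n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2)))
    return g, g - z * se, g + z * se


def excludes_zero(lo, hi):
    """True when the interval does not overlap 0, i.e. it would be plotted in Fig. 4."""
    return lo > 0 or hi < 0


healthy = [1, 2, 3, 4, 5]  # toy entropy values, not data from the study
apnea = [2, 3, 4, 5, 6]
g, lo, hi = hedges_g_ci(healthy, apnea)
print(round(g, 4), excludes_zero(lo, hi))
```

With only five toy values per group the interval still overlaps 0; with 235 windows per group, as in this experiment, the intervals are considerably narrower.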

D. Derivative Physiological Data Experiments
The DisEn differences observed when moving from the LS to the HS state for each of the 98 exercise tests are displayed in the boxplots of Fig. 5. Each panel corresponds to a different designated channel selection and shows the distributions of differences obtained with mvMDE, T-SmvMDE, and P-SmvMDE. The mean absolute difference obtained with mvMDE is 0.0894. For each SmvMDE variation and designated channel, Table III lists the mean absolute difference, the number of entries with an increased entropy difference compared to mvMDE, and the number of entries with larger entropy during LS than during HS.
1) SmvMDE Operation: When benchmarking SmvMDE against mvMDE, an improvement in differentiation capacity is noted when the VCO2, RR, or VE channel is designated. This improvement is consistent for both T-SmvMDE and P-SmvMDE, with increases in both the mean absolute difference and the number of entries with an increased LS-HS difference. The selection of HR as the designated channel displayed increased differentiation capacity for P-SmvMDE. As in Subsection III-C, most of the designated channels for which SmvMDE shows increased discrimination capacity are common to T-SmvMDE and P-SmvMDE. This indicates that, while the two variations prioritize strata in different ways, both can highlight similar dynamics that were overshadowed by traditional multivariate analysis.
Notably, designating the RR channel yields the largest mean absolute difference for both T-SmvMDE and P-SmvMDE, as well as the largest number of entries where the LS DisEn values are higher than the HS ones. This points towards a LoC process [10], [28] when moving from a steady state to a state that induces increased stress on the system, in alignment with the EP hypothesis [8], [29].
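The three Table III summaries can be sketched as follows for one designated channel. We assume that the LS-HS difference is computed per exercise test as HS minus LS, and that an "increased entropy difference compared to mvMDE" means a larger absolute difference than mvMDE for the same test; both interpretations, the function name, and the toy values are illustrative, not taken from the study.

```python
def summarize_ls_hs(ls, hs, ls_ref, hs_ref):
    """Table III-style summaries for one SmvMDE variation and designated channel.

    ls, hs:         per-test DisEn values of the SmvMDE variation in the LS and HS states.
    ls_ref, hs_ref: per-test DisEn values of mvMDE for the same tests (reference).
    """
    diff = [h - l for l, h in zip(ls, hs)]
    diff_ref = [h - l for l, h in zip(ls_ref, hs_ref)]
    mean_abs = sum(abs(v) for v in diff) / len(diff)
    # Entries whose absolute LS-HS difference exceeds that of mvMDE (assumed definition).
    n_increased = sum(abs(a) > abs(b) for a, b in zip(diff, diff_ref))
    # Entries with larger entropy during LS than during HS (loss-of-complexity direction).
    n_ls_higher = sum(l > h for l, h in zip(ls, hs))
    return mean_abs, n_increased, n_ls_higher


# Toy values for three hypothetical exercise tests.
stats = summarize_ls_hs(
    ls=[0.8, 0.7, 0.9], hs=[0.7, 0.75, 0.6],
    ls_ref=[0.8, 0.7, 0.9], hs_ref=[0.75, 0.72, 0.5],
)
print(stats)
```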
E. On the Implementation of Stratified Entropy
1) Input Window Length: In entropy quantification, the selection of the c and m parameters defines the minimum length of each channel within the input window. The univariate DisEn algorithm can analyze short-length time-series [32], with a minimum length of $L > c^m \cdot \tau_{\max}$. The mvMDE variation further improves the capacity to operate on short-length time-series, due to its use of larger multivariate embedding vectors compared with their univariate counterparts [39], leading to a minimum length of
$$L > \frac{c^m \cdot \tau_{\max}}{\binom{m \cdot p}{m}}.$$
For SmvMDE variations, the minimum input length lies between the limits of univariate DisEn and mvMDE. As shown in Subsection III-A, overlap is observed among the large-τ outputs for short-length time-series when the same time-series is analyzed with different channels prioritized. Consequently, we recommend a stricter minimum, closer to that of univariate DisEn, $L > c^m \cdot \tau_{\max}$, when deploying SmvMDE variations.
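The two bounds can be compared numerically with a short sketch. The parameter values c = 6, m = 2, and τ_max = 10 are illustrative choices, not necessarily those used in the experiments; the helper names are hypothetical.

```python
from math import comb


def min_len_disen(c, m, tau_max):
    """Univariate DisEn bound: the window length L must exceed c**m * tau_max."""
    return c ** m * tau_max


def min_len_mvmde(c, m, tau_max, p):
    """mvMDE bound: the DisEn bound divided by the binomial factor C(m*p, m)."""
    return c ** m * tau_max / comb(m * p, m)


c, m, tau_max = 6, 2, 10  # illustrative parameter values
for p in (3, 8):
    print(f"p={p}: C(m*p, m)={comb(m * p, m)}, "
          f"mvMDE bound={min_len_mvmde(c, m, tau_max, p)}, "
          f"DisEn bound={min_len_disen(c, m, tau_max)}")
```

Because the binomial factor grows quickly with the number of channels p, the mvMDE bound for an 8-channel window is far below the univariate bound, which is why adopting the stricter, DisEn-like minimum for SmvMDE is a substantially more conservative choice.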
2) Number of Designated Channels: Since m is an exponent in the minimum input window length, increasing it sharply raises the required window length, and an increased value of m might be needed when the number of designated channels increases during the implementation of Stratified Entropy. In such cases, it is important to ensure that there are not multiple subvectors consisting entirely of samples from designated channels, since these would be treated equally and result in an output profile resembling that of mvMDE. Furthermore, while increasing m would allow additional designated channels, this might not be an optimal approach, since dynamics could then be overshadowed within the core stratum itself. Thus, we recommend that most Stratified Entropy applications follow a conservative approach when allocating channels to the core stratum.
3) Number of Strata: The total number of strata in a Stratified Entropy implementation affects both its design and its implementation, since appropriate algorithmic steps must be formulated to prioritize channels according to their strata allocation, and proper parameter values must be selected for effective operation. For example, when expanding the presented SmvMDE variations from a two-strata to a three-strata configuration, T-SmvMDE and ST-SmvMDE could be modified to operate with two different t values (and, for ST-SmvMDE, two different w values) depending on which strata are prioritized, while P-SmvMDE could be modified to use two tiers of proportional factors. This design modification could be complemented by an appropriate increase of m, allowing samples with different prioritization to be included in the same subvectors, similarly to the process discussed for multiple designated channels.
However, while expanding the total number of strata is possible, it increases algorithmic complexity and restricts the range of effective parameter values, particularly of m, as discussed above. Therefore, configurations with more strata might be most relevant for applications that clearly benefit from them despite the increased complexity, for example when a priori knowledge of a hierarchy of channels exists.

F. Limitations and Future Work
Our algorithmic variations illustrate successful implementations of the Stratified Entropy framework, with effective prioritization of the channels allocated to the core stratum over the periphery and the extraction of novel features. However, it is important to implement the framework with additional entropy quantification algorithms, such as PEn, to obtain a more complete perspective on its utility. Furthermore, since the framework can be applied in a modular manner and at low computational cost, it would be worthwhile to combine Stratified Entropy with other variations of entropy algorithms to target specific applications. Examples include its integration with the aforementioned non-uniform multiscale embedding to incorporate a priori knowledge, optimal scale selection for each channel, or the use of a fuzzy membership function in DisEn [66].
The results of the derivative physiological data experiments indicate a directionality in agreement with the LoC paradigm and the EP hypothesis. Hence, it would be important to replicate the analyses on other datasets and to study the capacity of SmvMDE to quantify the directionality of EP phenomena. Moreover, combining SmvMDE with machine learning would enable physiological state classification and prediction tasks. In such settings, strata allocations should be selected with appropriate justification or through effective feature selection processes to avoid data dredging or overfitting.

IV. CONCLUSION
We introduce the framework of Stratified Entropy and present three algorithmic variations for its implementation. By allocating channels to different strata, Stratified Entropy allows the dynamics of certain channels to be prioritized over those of others, enabling the extraction of features that are not accessible through traditional multivariate entropy analysis. The SmvMDE variations extend mvMDE with algorithmic steps that prioritize samples from channels in the prioritized stratum during the calculation of the entropy value.
The results from the application of SmvMDE to time-series consisting of uncorrelated WGN and 1/f noise indicate that the variations successfully prioritize the dynamics of the designated channel. The low computation time of the original mvMDE is maintained because no computationally critical steps were modified. When the SmvMDE variations are applied to 8-channel waveform physiological time-series, certain SmvMDE features produce distributions with a larger statistical difference between healthy and OSA stage 2 sleep windows, indicating increased discrimination capacity of SmvMDE over mvMDE for applications that would benefit from a stratification of the time-series' channels. The corresponding results from low temporal resolution derivative physiological data further highlight the increased discrimination capacity of SmvMDE and its potential use to detect the directionality of "entropy pump phenomena".
The presented framework is flexible with regard to the number of channels allocated to the prioritized stratum and the total number of strata. Furthermore, it can be extended to other entropy quantification algorithms and combined with machine learning. Consequently, with appropriate algorithmic design and parameter configuration, we expect the framework of Stratified Entropy to provide novel and effective methodologies for the extraction of viable physiological information.