DynDSE: Automated Multi-Objective Design Space Exploration for Context-Adaptive Wearable IoT Edge Devices

We describe a simulation-based Design Space Exploration procedure (DynDSE) for wearable IoT edge devices that retrieve events from streaming sensor data using context-adaptive pattern recognition algorithms. We provide a formal characterisation of the design space, given a set of system functionalities, components, and their parameters. An iterative search evaluates configurations according to a set of requirements in simulations with actual sensor data. The inherent trade-offs embedded in conflicting metrics are explored to find an optimal configuration given the application-specific conditions. Our metrics include retrieval performance, execution time, energy consumption, memory demand, and communication latency. We report a case study for the design of electromyographic-monitoring eyeglasses with applications in automatic dietary monitoring. The design space included two spotting algorithms and two sampling algorithms, intended for real-time execution on three microcontrollers. DynDSE yielded configurations that balance retrieval performance and resource consumption with an F1 score above 80% at an energy consumption that was 70% below that of the default, non-optimised configuration. We expect that the DynDSE approach can be applied to find suitable wearable IoT system designs in a variety of sensor-based applications.


Introduction
Autonomous wearable IoT devices are being used for physiological and behavioural health monitoring [1] and provide relevant health status information to their wearers [2,3]. Miniaturised electronics embedded in wearable accessories, garments, etc., provide the resources to retrieve pattern events from streaming sensor data and to interact with the wearer, which led to the concept of edge computing [4]. Edge computing aims to process data at the device end, rather than in the cloud, to reduce network load and service response time. Furthermore, reducing communication bandwidth often lowers energy consumption and mitigates privacy and security concerns. For example, in medical IoT monitoring applications [5], a device may retrieve relevant events using embedded machine learning methods, thus sending only abstract event information to the cloud. Nevertheless, resource constraints are a key feature of IoT devices. A wearable IoT device typically consists of multiple sensors, a microcontroller (µC), which runs data processing algorithms, memory, and a radio module for data communication (Figure 1). Therefore, the optimisation of resource-constrained IoT edge devices has high priority in the design process [6].
The design of an IoT device can be interpreted as the selection of an optimal hardware and software configuration from the wide design space of possible options. Certain system configurations may not be compatible with specific system requirements, e.g., energy saving and retrieval performance.
Figure 1. DynDSE procedure for wearable IoT edge devices. X |E , Ω: design space; X |E, Ω: system configuration; π: benefit metric set; ρ: cost metric set; z π : benefit requirement set; z ρ : cost requirement set; s i,k : data sample i from channel k.
Available µCs present differences in terms of resource consumption, execution time, and energy consumption. With the complex interplay of hardware and software components, it is a difficult task to quantify resource use and to manually identify optimal configurations that fulfil the system requirements best. The size of the architectural design space often makes manual exploration of embedded systems unfeasible. Automated Design Space Exploration (DSE) [7,8] provides a computational framework to identify optimal configurations. The design problem is exacerbated when the system includes functions that cannot be statically approximated, and by those with a dynamic effect on the resource-performance trade-off. Sampling strategies, e.g., context-adaptive sampling [9], are a basic dynamic function of wearable IoT devices. Context-adaptive sampling is a dynamic sampling strategy that tunes the sensor's sampling rate based on a context measure, thus aiming to minimise energy consumption. The stochastic and variable nature of human behavioural patterns induces variability into context-adaptive sampling behaviour, which, in turn, drastically affects the resource-performance trade-off. Therefore, the main challenge for the design of a context-adaptive wearable IoT device lies in the identification of viable configurations that fulfil the system requirements under dynamically varying conditions. With DynDSE, we explicitly incorporate context-adaptive system behaviour in the design exploration and simulate systems with actual sensor data.
In this paper, we provide the following contributions: 1. We present a simulation-based Design Space Exploration (DynDSE) procedure for wearable IoT devices that employ context-adaptive pattern recognition algorithms for event retrieval, see Figure 1.
We provide a formal characterisation of the design space, given a set of system functionalities, components, and their parameters. An iterative search evaluates configurations according to system requirements. The inherent trade-offs embedded in conflicting objectives are explored to find an optimal configuration. 2. We perform a wearable IoT application evaluation to analyse the resource-performance trade-off, considering static and dynamic design aspects, through simulations with actual data of Electromyography (EMG)-monitoring eyeglasses in automated dietary monitoring.
In the present work, we formally introduce the DynDSE exploration framework in Section 3 and relevant optimisation metrics in Section 4. Subsequently, a comprehensive wearable IoT application case is presented and analysed in Sections 5 and 6 to detail the potential of DynDSE.

Related Work
DSE frameworks are used for hardware/software co-design of heterogeneous multiprocessor and system-on-chip architectures [10], embedded systems [11], or Field Programmable Gate Array (FPGA) platforms [12]. In conventional DSE approaches, multiple metrics, e.g., energy consumption, memory demand, and cost, must be optimised concurrently according to the application requirements. The conflicting nature of objectives, which reflect many system characteristics, produces trade-offs inherent to the overall system performance. A decision on the most adequate system configuration needs to be taken according to a multi-objective optimisation process. The majority of DSE methods belongs to one of three analysis and evaluation categories, i.e., prototype-based, analytics-based, and simulation-based. The three categories differ in terms of a design time-modelling accuracy trade-off [8]. For example, the prototype-based evaluation provides adequate modelling accuracy, but requires development time with limited exploration capability. The analytics-based evaluation relies on an analytical description of component interactions, which allows designers to explore larger design space portions in acceptable time. However, especially with complex architectures, the modelling accuracy of analytics-based evaluations is limited. The simulation-based evaluation is the most versatile approach, as time-modelling accuracy trade-offs of different designs can be achieved by tuning the simulation characteristics. For example, a lower simulation abstraction level, i.e., simulating digital signals between registers and combinational logic, yields higher accuracy but increases simulation time when analysing software stacks. A higher simulation abstraction level, i.e., simulating system components at the cycle level, computes more efficiently, at the cost of averaging inter-cycle system state information.
Moreover, simulation-based DSE enables dynamic profiling at run time, which allows the designer to quantify and optimise complex dynamic component interactions and workload.
In the last two decades, many DSE approaches have been proposed to design wearable devices. For example, Bharatula et al. [13] proposed a design method to achieve a resource-performance trade-off for a highly-miniaturised multi-sensor context recognition system. Their evaluation showed that, through variations of the system design space, the optimisation method was able to extend the battery lifetime by several hours. The same research group introduced multiple metrics to analyse the resource-performance trade-off of a wearable system, i.e., an accelerometer, a microphone, a light sensor, and a TI MSP430F1611 µC [14]. The authors presented an experimental validation in which a manual multi-objective optimisation was applied to find the best system configuration. In contrast to the works of Bharatula et al., we propose an analytical characterisation of the system architecture with the aim of automating the DSE modelling. Anliker et al. [15,16] presented an automatic design methodology based on abstract modules and task-dependent constraint sets. A multi-objective function incorporated recognition accuracy, energy consumption, and wearability, applied to the classification of three modes of locomotion. In contrast, our simulation-based analysis is based on a realistic dataset collected in daily living. We evaluated the relevant effects of the free-living settings on the metrics. Beretta et al. [17] presented a model-based design optimisation to analytically characterise the energy consumption of a wearable sensor node. The authors described a multi-objective exploration algorithm to evaluate system configurations and the relative trade-offs. The method was application-driven with a fixed system architecture. Stäger et al. [18] took into account several system configuration options, including sensor types, sensor parameters, features, and classifiers.
A case study was described related to the detection of interactions with household appliances by means of a wrist-worn microphone and accelerometer. Evidence was presented for an improvement in battery lifetime by a factor of 2-4 with only little degradation in recognition performance.
The aforementioned DSE approaches focused on the evaluation and exploration of wearable device architectures under static workloads. However, today's wearable systems may adopt opportunistic sensing strategies to balance energy consumption and the information acquired by wearable or mobile systems. For example, Rault et al. [19] provided an analysis of techniques for energy consumption reduction in wearable sensors for healthcare applications. Opportunistic sensing strategies consider dynamic effects on the resource consumption trade-off, which are not considered in a static DSE. For example, an adaptive sampling scheme may reduce the sampling rate according to a detected lower signal entropy. For instance, Mesin [20] proposed an adaptive sampling scheme based on sample prediction, where a non-uniform schedule increased the sampling rate only during bursts of physical activity. A multi-layer perceptron predicted subsequent samples and their uncertainties, triggering a measurement when the uncertainty of the prediction exceeded a threshold. In contrast, our approach employs a transparent state-based reactive model to estimate the relevance of future samples. Scarabottolo et al. [21] presented a dynamic sampling strategy for low-power embedded devices. The sampling rate tuning was based on the analysis of the signal's spectral content. Rieger and Taylor [22] proposed a low-power analog system for real-time adaptive sampling rate tuning, proportional to the signal curvature. Different from our approach, a priori knowledge was required. Moreover, in contrast to Rieger and Taylor's [22] low-power analog system, we consider a pattern spotting problem, as frequently required in wearable IoT systems, to analyse performance.

Design Space Representation
The design space configurations consist of a set X of functionalities, realised by a set E of components, which are characterised by the set Ω of parameters. An example is provided at the end of this section. Formally, the set of functionalities X is expressed as: where the functionality Ξ ξ is an element of the set X , ξ is the index set to X , and N Ξ is the number of elements of ξ.
A functionality Ξ ξ is carried out by one or more components grouped in the set ξ , indexed by the index set q ξ , expressed as: where the component ξ,q is an element of the set ξ , q ξ is the index set to ξ , and N ξ is the number of components associated with the functionality ξ.
Overall, the design space consists of a collection E of system component sets, indexed by the collection Q of index sets, expressed as: A component ξ,q is characterised by one or more component parameters grouped in the set ω ξ,q , indexed by the index set w ξ,q , expressed as: where the component parameter ω ξ,q,w is an element of the set ω ξ,q , w ξ,q is the index set to ω ξ,q , and N ξ,q is the number of component parameters associated with the functionality ξ and the component q. Overall, the design space consists of a collection Ω of component parameter sets, indexed by the collection W of index sets, expressed as: For example, a spotting algorithm Ξ 1 is represented by a component 1,1 , e.g., an FFT-based algorithm, characterised by a component parameter ω 1,1,1 , e.g., the data frame size. Data sampling Ξ 2 is represented by a component 2,1 , e.g., a uniform sampling. A processing unit Ξ 3 is represented by a component 3,2 , e.g., a Texas Instruments µC, characterised by a component parameter ω 3,2,1 , e.g., the µC's clock frequency. In a compact form, the design space is expressed as:

Configuration Generation
The configuration generation selects a design candidate to be evaluated in the simulation, see Section 3.3. The configuration generation is composed of two main stages. In the first stage, for each functionality Ξ ξ , a component set c ξ , indexed by the index set q c ξ , is selected as: where q c ξ represents a subset of the index set q ξ . Overall, a configuration consists of a collection E of system component sets, indexed by the collection Q c of index sets, expressed as: In the second stage, for each component ξ,q ∈ c ξ a component parameter set ω c ξ,q , indexed by the index set w c ξ,q , is selected as: where w c ξ,q represents a subset of the index set w ξ,q . Overall, a system configuration consists of a collection Ω of system component parameter sets, indexed by the collection W c of index sets, expressed as: In a compact form, a system configuration is expressed as:

Configuration Evaluation
A metric estimates benefits π or costs ρ of a configuration. The configuration evaluation is based on two sets of metrics. The benefit metric set is defined as: where the element π p (X |E, Ω) is a benefit metric and N p is the number of benefit metrics. The benefit metric set π is subjected to the benefit requirement set as: Similarly, the cost metric set is defined as: where the element ρ r (X |E, Ω) is a cost metric and N r is the number of cost metrics. The cost metric set ρ is subjected to the cost requirement set as: Each π p (X |E, Ω) and ρ r (X |E, Ω) maps a configuration X |E, Ω to the real space IR, i.e., X |E, Ω → IR. The overall optimisation process is formally described as follows. Given a design space X |E , Ω, the goal of DynDSE is to find the configuration X |E, Ω, which maximises benefits π and minimises costs ρ, while respecting the respective sets of requirements. The problem can be interpreted as a constrained multi-objective optimisation: The optimisation provides a set of mutually conflicting solutions, which reflects the trade-offs in the design. To define optimality, one can usually exploit the concept of Pareto-dominance, i.e., a decision maker prefers one configuration to another if it is equal or better in all objectives and strictly better in at least one. As Künzli et al. [23] pointed out, several approaches exist to solve the multi-objective optimisation, e.g., exploration by hand [24], exhaustive search [25], or reduction to a single objective [26], as done in this work. Table 1 lists the metrics introduced in this section and their respective requirements.
Table 1. Our metrics and their respective system requirements. The table also indicates the elements which affect the system requirements. T denotes the runtime duration and m the data frame length.
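The Pareto-dominance preference and the requirement checks described above can be sketched as follows; this is a minimal illustration with all objectives expressed as values to maximise, and the metric tuples, requirement sets, and function names are our own placeholders:

```python
def dominates(a, b):
    """True if metric tuple a Pareto-dominates b: equal or better in
    all objectives, strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def feasible(benefits, costs, z_pi, z_rho):
    """Check benefit requirements (lower bounds) and cost
    requirements (upper bounds) for one configuration."""
    return (all(p >= z for p, z in zip(benefits, z_pi))
            and all(r <= z for r, z in zip(costs, z_rho)))

def pareto_front(points):
    """Keep points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

The exhaustive search then only needs to rank the feasible, non-dominated configurations, e.g., by a summed single objective as done later in the case study.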

Optimisation Metrics

Retrieval Performance Metric
The retrieval performance of an event retrieval algorithm is expressed through the precision-recall metric [27], as follows: From the application perspective, the algorithm must be able to keep an adequate level of retrieval performance. The retrieval performance requirement is usually defined by expert knowledge, and we refer to it as z 1 π and z 2 π .
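The sample-based evaluation used later in the case study (counts of relevant, retrieved, and correctly retrieved samples) can be sketched as below; the function name and the guards against empty counts are ours:

```python
def retrieval_performance(n_relevant, n_retrieved, n_correct):
    """Sample-based precision, recall, and F1 score.
    n_relevant:  samples belonging to ground-truth events
    n_retrieved: samples marked as events by the spotter
    n_correct:   retrieved samples overlapping ground truth
    """
    precision = n_correct / n_retrieved if n_retrieved else 0.0
    recall = n_correct / n_relevant if n_relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```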

Execution Time Metric
A measure of computational complexity denotes the execution time of an algorithm, obtained by pairing the algorithm module and the processing module. An algorithm's abstraction is the decomposition of an algorithm into distinct stages, in which each stage is composed of one or more functions. Each function f is broken down by counting additions and subtractions (Add), multiplications (Mult), divisions (Div), square roots (Root), exponentials (Exp), and comparisons (Comp).
The number of machine cycles n cyc f to compute a function f on a µC is defined as: where n op f,x is the number of executions related to an arithmetical operation x to compute a function f and n cyc x is the number of cycles to execute an arithmetical operation x on a µC. To estimate the execution time ET f of a function f , the number of machine cycles n cyc f is divided by the clock frequency ν of the µC, as follows: The definition of the execution time ET depends on the runtime mode. When the runtime mode is real time, ET is computed as: When the runtime mode is online, ET is computed as: where f r indexes the processed data frames. We refer to the system requirement for ET as z 1 ρ .
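The cycle-counting model for a single function can be illustrated as follows; the per-operation cycle counts are placeholder values, since the real numbers are µC-specific (cf. Table 5):

```python
# Per-operation cycle counts for a hypothetical µC (placeholder
# values; real counts come from the µC's datasheet, cf. Table 5).
CYCLES = {"add": 1, "mult": 2, "div": 12, "root": 30, "exp": 50, "comp": 1}

def exec_time(op_counts, clock_hz, cycles=CYCLES):
    """Execution time ET_f of a function f: total machine cycles
    n_cyc_f (sum of per-operation counts times per-operation cycle
    costs) divided by the µC clock frequency ν in Hz."""
    n_cyc = sum(n * cycles[op] for op, n in op_counts.items())
    return n_cyc / clock_hz
```

For example, a function with 100 additions and 50 multiplications on a 1 MHz clock would take (100·1 + 50·2)/10⁶ s = 0.2 ms under these placeholder costs.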

Energy Consumption Metric
µC energy consumption: Most µCs support an active state and a stand-by state. The average energy consumption in active state EC µC f , related to the computation of a function f , is proportional to the ET f and defined as: where EC µC f is expressed in Wh, P µC act is the power consumption of the µC in active mode expressed in W, and ET f is expressed in hours.
The average energy consumption in stand-by state EC µC stb is modelled as: where T stb is the time period of inactivity expressed in hours, and P µC stb is the power consumption in stand-by mode expressed in W.
The µC energy consumption was calculated as: where ∑ f r ∑ f EC µC f denotes the active state energy. Sensor energy consumption: The average instantaneous energy consumption EC s t for a single sensing component is computed by applying the following equations: where I s act is the sensor's average current in active state, I s stb is the sensor's average current in stand-by state, D t is the instantaneous duty cycle rate, EC s t is expressed in Wh, V is the voltage level of the sensing component, and t r is the temporal resolution expressed in hours.
The sensor energy consumption was calculated as: Flash/non-volatile memory energy consumption: To estimate the flash and programmable memory energy consumption, we formulated an energy model inspired by Konstantakos et al. [28]. Writing energy consumption was determined by: where b indicates a block, I m write is the average current consumption in writing mode, and the write time t w is the time required to write a memory block. The energy required to read a memory block was neglected. A static memory energy consumption term was computed as: where I m b is the stand-by state current value, and T is the total simulation time. The memory energy consumption was calculated as: Radio transmission energy consumption: To estimate the energy consumption of the wireless communication, we relied on the energy model described by Prayati et al. [29]. The model considered the following three stages for transmission: (1) initialisation of the transmission and transfer of the frame data from memory to the radio chip FIFO buffer, (2) back-off timeout, and (3) packet transmission via the wireless channel.
In order to calculate the energy consumption to transmit a packet, the following formula was applied: where I trans is the transmission current, p indicates a packet, and t trans is the time required to prepare and send a packet. As EC r p is expressed in Wh, t trans must be converted into hours. In this work, we considered I trans = 21.7 mA, when a transmission power threshold of 0 dBm is chosen, and t trans = 16 ms as the time required to prepare and send a packet of 114 Bytes.
The radio transmission energy consumption was calculated as: Total energy consumption: The energy consumption metric EC is computed as follows: The application context imposes an energy budget requirement on the behavioural monitoring. For example, while monitoring dietary behaviour, the wearable system should be able to work uninterruptedly for the entire day, in order not to miss relevant activities. The system requirement z 2 ρ for EC indicates the energy required to deploy the application for the entire runtime. We defined z 2 ρ as the quantity in mW calculated as: where BC is the battery capacity in mWh, RT is the required runtime of the application expressed in hours, e.g., 16 h, and the factor 0 < φ ≤ 1 considers the effect of external factors, which can affect the battery life.
In this work we considered φ = 0.9.
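Assuming the requirement z 2 ρ is the derated battery capacity divided by the required runtime (a reading suggested by the listed variables; the exact formula is not reproduced here), the budget can be computed as:

```python
def energy_budget_mw(battery_capacity_mwh, runtime_h, phi=0.9):
    """Average power budget z2ρ in mW so that the battery lasts the
    required runtime. ASSUMPTION: z2ρ = BC * φ / RT, with 0 < φ <= 1
    derating the nominal capacity for external factors."""
    assert 0 < phi <= 1
    return battery_capacity_mwh * phi / runtime_h

# e.g., a hypothetical 400 mWh battery over a 16 h day with φ = 0.9
budget = energy_budget_mw(400, 16)
```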

Memory Demand Metric
The memory demand MD is an upper bound of the memory required by the system to execute an event retrieval. MD is computed by considering four terms, i.e., the code memory m c , the data memory m d , the processing memory m f , and the event memory m e .
The memory demand is defined as: MD = m c + m d + m f + m e , where m f = m f,Int + m f,Float and m e = m e,Int + m e,Float , m f,Int and m e,Int are the memory required to store integer values, and m f,Float and m e,Float are the memory required to store float values. We refer to the system requirement for MD as z 3 ρ , which represents the maximum amount of memory available to store information on a certain µC.
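The four-term upper bound can be written out as a short sketch (argument names are ours; values are in bytes):

```python
def memory_demand(m_c, m_d, m_f_int, m_f_float, m_e_int, m_e_float):
    """Upper bound MD = m_c + m_d + m_f + m_e, where the processing
    memory m_f and the event memory m_e each split into integer and
    float storage. All quantities in bytes."""
    m_f = m_f_int + m_f_float
    m_e = m_e_int + m_e_float
    return m_c + m_d + m_f + m_e
```

A configuration is feasible only if the returned MD stays below the µC's requirement z 3 ρ.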

Communication Latency Metric
The communication latency is the time span between the event-related raw sensor data measurements and the delivery moment at the receiver of the retrieved event information. The communication latency metric CL is computed as follows: where MPS is the transmitter's maximum payload size, e.g., 216 bits for Bluetooth Low Energy (BLE), connInterval defines the time between connection events, i.e., it ranges from 7.5 ms to 4.0 s in steps of 1.25 ms for BLE, and ν tr is the transmission data rate. We refer to the system requirement for CL as z 4 ρ , which represents the communication latency tolerance for the application.

Smart Eyeglasses to Monitor Eating in Free-Living
We implemented our DynDSE for the design optimisation of 3D-printed regular-looking eyeglasses, which accommodate processing electronics, EMG electrodes, antenna, and power supply. Smart eyeglasses are particularly suited for automated dietary monitoring to unobtrusively detect intake and eating events from activities around the head throughout everyday life, thus replacing classic food journaling and supporting disease management [30,31]. As typical wearable IoT devices, smart eyeglasses could process data locally and provide estimates to other body-worn devices, e.g., smartphones. Furthermore, the monitoring task is a typical example of wearable IoT applications in remote health assistance.
Two pairs of electrodes were symmetrically integrated at the eyeglasses frame, located around the temple ear bends. Contraction of temporalis muscles was monitored bilaterally resulting in two EMG signal channels.
The EMG sensor data stream was segmented into eating and non-eating periods by pattern spotting algorithms. The spotting algorithms were designed to extract features in continuous EMG sensor data and perform one-class classification to identify eating events (i.e., time span between the start and the end of an eating activity), see Section 5.2.1.

Design Space Representation
We considered N Ξ = 4 system functionalities: Ξ 1 , an algorithm for event retrieval; Ξ 2 , a data sampling strategy; Ξ 3 , a µC; and Ξ 4 , a runtime mode. Table 2 presents the design space considered in this work. Our design space included two spotting algorithms, i.e., 1,1 and 1,2 , which were considered for execution on three µCs, i.e., 3,1 , 3,2 and 3,3 . Moreover, two data sampling strategies, i.e., 2,1 and 2,2 , were considered, while applying two runtime modes, i.e., 4,1 and 4,2 . To deal with the requirements in this case study, we relaxed the optimisation problem of Equation (18) into one of maximisation, as follows: max X |E,Ω ⊆ X |E ,Ω ∑ p π p (X |E, Ω) The above constrained optimisation problem was solved by a grid search-based approach, evaluating a grid of possible configurations with an exhaustive search. At each iteration, a system configuration was generated and the sensor data processed through the simulation. The metrics were estimated and compared with the respective requirements.
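The exhaustive grid search over the design space can be sketched as below; the component names and the two callbacks are illustrative stand-ins for the simulation-based evaluation, not the paper's exact identifiers:

```python
import itertools

# Hypothetical design space: candidate components per functionality.
DESIGN_SPACE = {
    "algorithm": ["fft_spotting", "wpd_spotting"],
    "sampling": ["uniform", "context_adaptive"],
    "mcu": ["PSoC1_M8C", "MSP430F1611", "CortexM3"],
    "runtime": ["online", "real_time"],
}

def grid_search(evaluate, meets_requirements):
    """Exhaustively evaluate every configuration; keep the feasible
    one with the largest summed benefit (the relaxed single
    objective). `evaluate` returns (benefits, costs) tuples."""
    best, best_score = None, float("-inf")
    for values in itertools.product(*DESIGN_SPACE.values()):
        cfg = dict(zip(DESIGN_SPACE.keys(), values))
        benefits, costs = evaluate(cfg)
        if not meets_requirements(benefits, costs):
            continue  # violates a requirement z_pi or z_rho
        score = sum(benefits)
        if score > best_score:
            best, best_score = cfg, score
    return best, best_score
```

In the actual procedure, `evaluate` would run the sensor-data simulation for each configuration and `meets_requirements` would compare the estimated metrics with the requirement sets.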

Algorithm (Ξ 1 )
FFT-based spotting: The first pattern spotting method was introduced by Zhang and Amft [32] to identify eating moments. An online non-overlapping sliding window segmentation with length m, expressed in seconds, was used to extract features in continuous EMG data. A one-class classification was performed by a one-class SVM (oc-SVM). Details of the feature extraction and one-class classification can be found in [32]. The oc-SVM was trained applying the leave-one-participant-out (LOPO) cross-validation strategy. Hyperparameter optimisation was performed using a grid search approach.
WPD-based spotting: We designed the second pattern spotting method inspired by the classification task presented in [33]. An online non-overlapping sliding window segmentation with length m, expressed in seconds, was used. From each segmented frame, the maximum sample value was extracted and compared to an experimentally determined threshold. When the maximum sample value was lower than the threshold, the frame was not fed to the spotting pipeline. In the pre-processing module, the EMG signals were passed through a 50 Hz notch filter to remove power-line interference, likely to occur in free-living, de-trended by a 20 Hz digital high-pass filter, and rectified. In the feature extraction module, the signal was passed through a Wavelet Packet Decomposition (WPD) to extract c features, i.e., the WPD coefficients, in the time-frequency domain. The depth level of the tree decomposition was kept constant, i.e., l = 2. A principal component analysis (PCA) was subsequently used to reduce the number of features from c to d. After normalisation, the features were used as discriminants of the target class by an oc-SVM with v = 1500 support vectors.
The oc-SVM was trained applying the LOPO cross-validation strategy. Hyperparameter optimisation was performed using a grid search approach. Table 3 presents a breakdown of the spotting algorithms.
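The cheap early-rejection step of the WPD-based spotting (discarding frames whose maximum sample stays below the threshold, before any filtering, WPD, PCA, or oc-SVM work) can be sketched as follows; the frame length and threshold values are application-specific placeholders:

```python
def gate_frames(signal, frame_len, threshold):
    """Split a sampled EMG stream into non-overlapping frames of
    frame_len samples and forward only frames whose maximum sample
    reaches the experimentally chosen threshold; the rest are
    dropped before the spotting pipeline."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [f for f in frames if max(f) >= threshold]
```

This gating is what makes the WPD pipeline's average execution time data-dependent, which is exactly the dynamic behaviour DynDSE simulates with real sensor data.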

Data Sampling (Ξ 2 )
A context-adaptive sampling algorithm needs two main components, i.e., a context measure and a response model, to adapt the sampling rate depending on an estimate of the relevance of future samples. As a context measure, we employed a basic representation of the EMG signal energy. A feedforward state-based model that alternates between attentive and inattentive (sleep) states was adopted as the response model. Our sampling strategy was based on the n-shots measure paradigm: the sensor wakes up, takes n samples, and goes back to sleep. The energy content of the n samples was used to compute the context measure as: where e k is the signal energy for the k th channel, s i,k is the i th value sampled from the n-shots measurement, θ t is the context measure, t is the time step, and K is the number of channels connected to the same system. A linear mapping function converted θ t to a candidate duty cycle rate D * t+1 : where D l is the minimum duty cycle rate, set at θ l , which was estimated from the signal noise. A maximum duty cycle rate D h was set at θ h . We adjusted the model's sensitivity by tuning θ h . The behaviour of the response model is described by Equation (42). The computation of the duty cycle rate for the next period was based on a comparison between the candidate duty cycle rate D * t+1 and the threshold value D TH . When D * t+1 exceeds the threshold D TH , the response model switches from the inattentive state to the attentive state. The attentive state is characterised by a monotonically increasing duty cycle rate. When D * t+1 drops below the threshold D TH and the attention time τ expires, the response model switches back from the attentive state to the inattentive state. The duty cycle rate's decision rules for the two states were computed as follows: (1) Inattentive state: If (D * t+1 < D TH and τ elapsed): Table 4 presents a breakdown of the context-adaptive sampling algorithm. More details can be found in our previous work [34].
Table 4. Analytical breakdown of the context-adaptive sampling algorithm. The variable n denotes the number of samples taken during the n-shots measure, and g denotes the number of channels. The function is executed at every n-shots measure.
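The n-shots context measure, linear mapping, and two-state response model can be sketched as follows. The ramp increment and the exact fallback rule in the inattentive state are our assumptions, since the decision-rule equations are detailed in [34]:

```python
def context_measure(channels):
    """Signal energy of an n-shots measurement, averaged over the
    K channels (one sample list per channel)."""
    energies = [sum(s * s for s in ch) for ch in channels]
    return sum(energies) / len(channels)

def linear_map(theta, theta_l, theta_h, d_l, d_h):
    """Map the context measure θ to a candidate duty cycle rate
    D*_{t+1}, clipped to the range [d_l, d_h]."""
    if theta <= theta_l:
        return d_l
    if theta >= theta_h:
        return d_h
    return d_l + (theta - theta_l) * (d_h - d_l) / (theta_h - theta_l)

class ResponseModel:
    """Two-state (inattentive/attentive) feedforward response model.
    ASSUMPTIONS: the attentive state ramps the duty cycle by a fixed
    increment per step; the inattentive state returns the candidate
    rate; τ is counted in steps."""
    def __init__(self, d_th, tau, ramp=0.1, d_h=1.0):
        self.d_th, self.tau, self.ramp, self.d_h = d_th, tau, ramp, d_h
        self.attentive, self.timer, self.d = False, 0, 0.0

    def step(self, d_candidate):
        if d_candidate >= self.d_th:
            self.attentive, self.timer = True, self.tau  # (re)arm τ
        elif self.attentive:
            self.timer -= 1
            if self.timer <= 0:           # τ elapsed below threshold
                self.attentive = False
        if self.attentive:
            self.d = min(self.d + self.ramp, self.d_h)  # monotone ramp
        else:
            self.d = d_candidate
        return self.d
```

A usage pattern would feed each n-shots measurement through `context_measure`, then `linear_map`, then `ResponseModel.step` to obtain the duty cycle for the next period.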

The processing module simulates the behaviour of an energy-efficient µC while computing the algorithm's functions at a certain clock frequency. We considered three widely used µCs, i.e., the PSoC1 M8C, TI MSP430F1611, and ARM CortexM3. A Li-Ion polymer battery provided energy to the system. Each µC was provided with memory for local data processing, i.e., RAM and non-volatile Flash memory. Table 5 shows the number of machine cycles n cyc x related to the arithmetical operation x for the different µCs. Two runtime modes were considered in this work, i.e., online and real time. In our application, real-time mode implies that the output of the data processing must be available as soon as a data frame has been recorded. With online data processing, the temporal requirement is more relaxed, as the output of the data processing must be available by the end of the runtime.

Sensor Dataset
Ten healthy volunteers (4 females, 6 males), aged between 20 and 30 years, wore the EMG-monitoring eyeglasses for one day. The application data used in the simulation were sensor data collected at a uniform sampling rate, i.e., 256 Hz. The eyeglasses were put on after getting up in the morning and kept on until bedtime. When a risk of contact with water existed, the participants were allowed to remove the eyeglasses. Participants manually logged the occurrence of eating events in a diet journal with a one-minute resolution.

Multi-Objective Computation
Our framework is based on a simulation which reproduces the functionalities of a wearable IoT system. The sensor's and µC's behaviour can be emulated with a finite-state machine approach, e.g., Buschhoff et al. [35], in order to reproduce dependencies between the system components and between hardware and software.
We evaluated the algorithm's retrieval performance considering the total number of samples of all eating events according to the ground truth labels, the total number of samples of all retrieved eating events, and the number of samples correctly retrieved as belonging to eating event segments.
Applying Equation (20), we computed the number of machine cycles for an instance of the oc-SVM used in the WPD-based spotting algorithm with hyper-parameters (m = 256, d = 20). For simplicity, we assumed that all operations were performed using the float data type. According to Equation (21), the execution time ET_f is computed from the number of machine cycles n^cyc_f as inversely proportional to the clock frequency ν of the considered µC. Table 6 lists the clock frequencies of the three candidate µCs. For example, the execution time ET_f of the oc-SVM on an ARM Cortex-M3 was 114.3 ms.
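As a sketch of Equation (21), the execution time can be estimated by weighting per-operation counts with the µC-specific machine cycles (cf. Table 5) and dividing by the clock frequency ν (cf. Table 6). The operation names and cycle counts in the usage example are illustrative placeholders, not values taken from Table 5:

```python
def execution_time_s(op_counts, cycles_per_op, clock_hz):
    """ET_f = n_cyc_f / nu: total machine cycles over clock frequency.

    op_counts:     operations executed by the algorithm, e.g. {'mul': ..., 'add': ...}
    cycles_per_op: machine cycles per operation for the target uC
    clock_hz:      uC clock frequency nu in Hz
    """
    n_cyc = sum(count * cycles_per_op[op] for op, count in op_counts.items())
    return n_cyc / clock_hz  # seconds
```

With, say, 1000 multiplications at 4 cycles each and 500 additions at 1 cycle each on a 1 MHz clock, ET_f = 4500 / 10^6 = 4.5 ms.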
According to Equation (24), the energy consumption EC^µC_f is computed by multiplying ET_f by the µC's active power consumption P^µC_act. The µC's switching behaviour is regulated by the time constraints given by the ET and the sensor data frequency. Table 6 lists the current consumption I^µC_act in the active state, I^µC_stb in the stand-by state, and the supply voltage V of the three candidate µCs. For example, the energy consumption EC^µC_f to execute an instance of the oc-SVM on an ARM Cortex-M3 was 0.732 µWh. Tables 5 and 6 list the memory specifications and the data resolution for the three candidate µCs. Each µC had a RAM and a larger Flash memory, whose capacities represented system constraints.
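The per-frame energy estimate can be sketched as follows: the µC draws I_act while computing and I_stb for the remainder of the frame period, following the switching model described above. This is our simplified reading of the model, and the numeric values in the usage example are invented, not Table 6 entries:

```python
def frame_energy_wh(et_s, frame_period_s, i_act_a, i_stb_a, v):
    """Energy per data frame in watt-hours: the uC is active during the
    execution time ET_f and in stand-by for the rest of the frame period."""
    e_active_j = et_s * i_act_a * v                        # joules while active
    e_standby_j = max(frame_period_s - et_s, 0.0) * i_stb_a * v
    return (e_active_j + e_standby_j) / 3600.0             # J -> Wh
```

For instance, with ET_f = 100 ms in a 1 s frame period at 10 mA active, 1 mA stand-by, and 3 V supply, the frame costs (0.003 + 0.0027) J ≈ 1.58 µWh.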
The memory demand m_d was defined according to the runtime mode, i.e., real-time or online. In real-time mode, m_d was the amount of data memory required for a window of size m. In online mode, m_d was estimated as the peak memory footprint required to process the data stream continuously without data loss, using a ring buffer.
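The two memory-demand definitions can be sketched as follows: real-time mode stores a single frame, while online mode tracks the peak occupancy of a ring buffer fed at the frame rate and drained at the µC's processing speed. This is a simplified discrete-time model of our own for illustration, not the paper's original estimator:

```python
def md_realtime(window_samples, bytes_per_sample):
    """Real-time mode: only one data frame must be held in memory."""
    return window_samples * bytes_per_sample

def md_online(window_samples, bytes_per_sample, frame_period_s, et_s, n_frames):
    """Online mode: peak ring-buffer occupancy while frames arrive every
    frame_period_s and each frame takes et_s to process."""
    backlog = 0      # frames waiting (or being processed) in the ring buffer
    peak = 0
    budget = 0.0     # accumulated processing time
    for _ in range(n_frames):
        backlog += 1                         # a new frame arrives
        peak = max(peak, backlog)
        budget += frame_period_s             # time elapses until the next frame
        done = min(backlog, int(budget // et_s))
        backlog -= done
        budget -= done * et_s
        if backlog == 0:
            budget = 0.0                     # an idle uC cannot bank time
    return peak * window_samples * bytes_per_sample
```

When ET_f fits within the frame period, the online footprint equals the real-time one; when processing is slower than the data rate, the backlog (and thus m_d) grows.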
Depending on their characteristics, the considered µCs had a certain latency that shaped the distribution of their memory footprint. For example, Figure 3 shows the distribution of the memory footprint m_d for one day of processing. The BLE transmission specifications define the maximum ν_tr as 305 kbps and the MPS as 215 bits [36]. Equation (38) defines the actual ν_tr by tuning the connInterval parameter, which ranged from 7.5 ms to 4.0 s in steps of 1.25 ms. For example, a retrieved eating event was represented by two time stamps marking the start and end of the event. A time stamp can be represented as an unsigned short in 2 bytes (i.e., 5 bits for the day, 5 bits for the hour, and 6 bits for the minute). According to Equation (37), the communication latency CL to deliver an event retrieved by the FFT-based spotting (m = 13) on an ARM Cortex-M3 was 886 ms.
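The 2-byte time-stamp layout described above (5 bits for the day, 5 bits for the hour, 6 bits for the minute) can be packed and unpacked with simple bit operations; the function names below are ours:

```python
def pack_timestamp(day, hour, minute):
    """Pack a coarse time stamp into 16 bits:
    [day:5 | hour:5 | minute:6], fitting an unsigned short."""
    assert 0 <= day < 32 and 0 <= hour < 32 and 0 <= minute < 64
    return (day << 11) | (hour << 6) | minute

def unpack_timestamp(packed):
    """Recover (day, hour, minute) from the 16-bit packed value."""
    return (packed >> 11) & 0x1F, (packed >> 6) & 0x1F, packed & 0x3F
```

A retrieved event then occupies 4 bytes (two packed time stamps), which is what makes the event payload so small relative to raw sensor data.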
Considering the typical event frequencies in human daily behaviour and the negligible memory footprint for event information, communication latency was omitted from the following analyses.

Multi-Objective Visualisation
To visualise whether the metrics lie within the boundary conditions set by the system requirements, we used radar plots, which support the representation of the trade-offs across the design space. Each axis corresponds to one objective, i.e., P, R, EC, ET, and MD, and is normalised by the respective requirement, yielding a feasibility region within the unit polygon. The unit polygon thus represents an upper bound given by the requirements: a point beyond it indicates an unacceptable configuration.
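The normalisation behind the radar plots can be sketched as follows: each metric is scaled by its requirement so that feasible configurations fall inside the unit boundary. For "larger is better" objectives (P, R) we invert the ratio so that exceeding the requirement still maps inside the boundary; this inversion is our illustrative choice, and the paper's exact normalisation may differ:

```python
def normalise(metrics, requirements, larger_is_better=('P', 'R')):
    """Scale each objective by its requirement: values <= 1 lie inside
    the radar plot's feasibility region. Assumes P and R are nonzero."""
    norm = {}
    for name, value in metrics.items():
        req = requirements[name]
        norm[name] = req / value if name in larger_is_better else value / req
    return norm

def is_feasible(norm):
    """A configuration is acceptable if no normalised axis exceeds 1."""
    return all(v <= 1.0 for v in norm.values())
```

For example, a configuration with P = 0.9 against a 0.8 requirement maps to 0.89 on the P axis, i.e., inside the feasibility region.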

Multi-Objective Analysis
Figure 4a,b show the results of the cross-validated FFT-based and WPD-based spotting algorithms in uniform sampling mode when tuning their spotting parameters. Specifically, precision P, recall R, and F1-score are reported for all spotting-parameter combinations. For the FFT-based spotting, tuning the parameter m varied the precision P from 48.8% for m = 1 to 95.7% for m = 33. The recall R ranged from 62.9% for m = 29 to 99.2% for m = 1. The F1-score ranged from 63.3% for m = 1 to 85.2% for m = 9. For the WPD-based spotting, tuning the parameters m and d varied the precision P from 38.7% for m = 0.25, d = 50 to 95.7% for m = 0.5, d = 15. The recall R ranged from 52.7% for m = 0.25, d = 28 to 86.2% for m = 1, d = 40. The F1-score ranged from 49.1% for m = 0.25, d = 50 to 85.9% for m = 1, d = 40. The system requirements z^π_1 and z^π_2 are indicated as horizontal lines; evidently, only a subset of the spotting parameters respects the requirements.
The degree of sampling reduction was changed by tuning θ_h in order to find a balance between retrieval performance and resource consumption. Figure 5a,b show the trade-off between F1-score and sampling reduction for the FFT-based and WPD-based spotting algorithms. In both methods, the retrieval performance can be kept above 80% while achieving a sampling reduction of over 70%. Comparing the two methods, the WPD-based spotting appears more robust against the down-sampling effect, showing a lower degradation of retrieval performance for a given degree of sampling reduction. [Figure 5 caption: average retrieval performance vs. sampling reduction when varying θ_h for (a) FFT-based and (b) WPD-based spotting. Individual lines correspond to the spotting parameters that respect the P and R requirements in Table 1; the F1-score requirement is derived from the same requirements. Context-adaptive sampling parameters: D_h = 1, D_l = 0.1, D_TH = 0.6, n = 4, τ = 3 s, θ_l = 10 mV.]
Figures 6-11 show the trade-offs across the metrics for different system configurations. Figures 6 and 7 show results for the FFT-based spotting in real-time mode for uniform and context-adaptive sampling, respectively. In uniform sampling (Figure 6), the largest requirement breach on any µC was due to the energy consumption EC. In context-adaptive sampling (Figure 7), the reduction of the energy consumption EC yielded a set of feasible configurations on the ARM Cortex-M3. In both sampling modes, the execution time ET fulfilled the real-time requirement on every µC, although the memory demand MD compromised the feasibility of the configurations on the PSoC1 M8C. Figures 8 and 9 show results for the WPD-based spotting in real-time mode for uniform and context-adaptive sampling. In uniform sampling (Figure 8), the largest requirement breaches on any µC were due to the execution time ET and the energy consumption EC. In context-adaptive sampling (Figure 9), the reduction of the execution time ET yielded a set of feasible configurations on the ARM Cortex-M3. Overall, the memory demand MD was negligible in all configurations, as only a single data frame had to be stored. Figures 10 and 11 show results for the WPD-based spotting in online mode for uniform and context-adaptive sampling. The online mode required more intensive memory use due to the longer ring buffer, which in turn made z^ρ_3 a more stringent requirement. In uniform sampling (Figure 10), the largest requirement breaches on any µC were due to the energy consumption EC and the memory demand MD.
In context-adaptive sampling (Figure 11), the reduction of the energy consumption EC yielded a set of feasible configurations on the ARM Cortex-M3. Figure 12 depicts the energy consumption estimated from the best system configuration for the individual participants. For three participants, the boundary conditions were not respected, implying a reduced application runtime with respect to the system requirements.

Discussion
The simulation-based DynDSE presented here targets the design of wearable IoT devices that run time-variable recognition algorithms on the device. Processing a large volume of data locally and enabling local inference is key to a scalable IoT network. Furthermore, manually tuning hardware and algorithms in a physical implementation is tedious. The simulation-based DynDSE approach can cover a wide configuration space to identify a balance between resources and event retrieval performance. Our case study demonstrated interactions and dependencies among hardware and algorithm components and justified the need for co-designing the associated functionalities. We chose two example retrieval algorithms to illustrate different effects on the optimisation result (cf. Figures 9-12). The FFT-based oc-SVM spotting was published before [32]. The WPD-based spotting showed improvements in precision P and recall R, but also implications for the execution time ET, thus further illustrating the design trade-offs captured by our methodology. The processing functionality (Ξ_3) affected the execution time ET, due to the µC type and speed, and the memory demand MD, due to the µC's memory capacity. The µC's energy consumption EC depended heavily on the algorithmic complexity, see Figure 12. When executing the WPD-based spotting, the µC's energy consumption was comparable with the sensor's; as the computational complexity shrank with the FFT-based spotting, the µC's energy consumption became negligible. The algorithm (Ξ_1) and data sampling (Ξ_2) functionalities were the configuration elements with the largest influence on the system's metrics. Tuning the algorithm parameters affected precision P, recall R, and the required memory demand MD. Data sampling had the highest impact on the energy consumption EC, while tuning the algorithm parameters had the lowest.
We found that our context-adaptive sampling strategy kept the performance of the spotting pipeline at an average F1-score above 80% while reaching almost 70% reduction in resource consumption.
DynDSE requirements in the application evaluation were set according to Table 1. While our analysis reached retrieval performances (P, R) above 80% for optimised parameter settings, our requirements z^π_1 and z^π_2 followed literature recommendations [37] suggesting that even 70% retrieval performance has relevant application value. The energy consumption requirement z^ρ_2 was set considering the capacity and size of standard lithium-ion batteries and the application runtime, according to Equation (35). The execution time and memory demand requirements z^ρ_1 and z^ρ_3 were dictated by the algorithm and the µC characteristics, respectively. While the retrieved system configurations and their performances appear relevant, a direct comparison to prior work is not feasible, due to the diversity in analysis goals, applications, and dataset characteristics. First, many investigations optimise for a fraction of the DynDSE metrics only, e.g., recognition performance. Second, sensor and algorithm choices span a wide value space for performance metrics. Our investigation aimed at defining a generalisable procedure that provides trade-off indicators across a variety of design space options and could thus assist designers in taking decisions and investigating details depending on application relevance. Figures 7 and 8 show the analysis of the variance in the resource-performance trade-off. Under the same P and R, a higher sampling reduction can be achieved, which corresponds to lower energy consumption EC. In context-adaptive sampling mode, the resource consumption is proportional to the event frequency and the duration of the event patterns. Consequently, for one configuration, resource savings vary according to individual behaviour. Population-averaged models are not guaranteed to fulfil the system requirements for every individual.
Our case study included a homogeneous study group of university students; nevertheless, the resource consumption estimation did not respect the boundary conditions for all participants, as shown in Figure 12. Personalising models increases computational complexity and entails a more complicated deployment. A reasonable approach is to account for the heterogeneity of the population by defining subpopulations with similar behaviour and including a safety margin.
We derived approximate machine cycle numbers, which limits the accuracy of the execution time estimation. The exact number of cycles depends strongly on the algorithm implementation and the compiler. Thus, our analysis could be complemented with target-dependent hardware and machine instruction simulators. Another source of inaccuracy is the energy consumption estimate, as we did not consider the overhead of the surrounding electronic circuits. For simplicity, we considered only floating-point operations; differentiating between integer and floating-point operations, as well as between variable ranges and data types, would further improve the modelling accuracy.
The metric set serves as a mapping between the design space and the application specifications, and its definition largely depends on the application requirements. Therefore, a direct comparison of metric outcomes across applications is limited. However, depending on the defined metric set, well-determined functional implications and system properties can be identified and compared.
For example, Bharatula et al. [14] defined four conflicting metrics and analysed the inherent trade-off on an activity recognition task: flexibility (which included estimates of memory demand and the µC's operating frequency), electronic packaging, relative recognition performance, and energy consumption. The four metrics were highlighted as orthogonal, meaning that optimising all four at the same time is not feasible. Similar to our work, the authors embedded recognition algorithm characteristics, namely classification accuracy, in the trade-off analysis. However, we included the execution time ET to highlight dynamic design aspects that appear during runtime. The ET metric linked the algorithmic computational complexity with the µC hardware characteristics in time. Understanding the system's temporal constraints enables DynDSE to leverage dynamic system behaviour for context-adaptivity. Azariardi's [38] DSE included ET and classification accuracy in the metric set but omitted energy consumption. The authors were able to estimate temporal constraints for SVM processing and investigated how the DSE solution matched the application requirements and free-living use. However, assumptions were needed to compensate for the missing energy consumption (EC) metric. The application of Beretta et al. [17] consisted of a wearable node transmitting sensor data using compressive sensing. The objectives were the node's energy consumption, the percentage root-mean-square difference (PRD) approximating the information loss due to compression, the communication delay, and the packet error rate (PER) of the radio transmission. The solution space was compared with the one reported by Kumar et al. [39], which optimised only energy consumption and communication delay. For the same energy consumption and communication delay, the PRD and PER were significantly higher. Moreover, Kumar et al. discovered only 2.3% of the solution space covered by Beretta's work.
From the above comparison, it appears that the descriptive power of the trade-off analysis in DSE depends on a careful selection of the metrics; neglecting metrics may yield misleading results. The same conclusion applies to our work. For example, consider Figures 9 and 10: when omitting EC, it may seem that the choice of sampling mode does not affect the system behaviour. With EC included in our trade-off analysis, it becomes evident that the uniform sampling mode does not provide any feasible configuration. DSE frameworks that consider hardware-software co-design can, in principle, achieve higher system performance as a consequence of the flexibility given by a finer model granularity. For example, Shoaib et al. [40] optimised the individual processing stages of an SVM pipeline by exploring hardware architectures based on custom instructions and coprocessor computations. The authors reported a reduction in energy consumption of almost three orders of magnitude compared to that of a low-power µC, as targeted by our work. Their energy consumption metric was computed as the sum of several real measurements of the hardware components involved in the SVM processing stages. The design space solutions included specific hardware to run kernel-based classification in varying contexts. The optimisation potential of hardware-software co-design comes at the cost of an expensive design process, which includes custom-made platforms and design space and metric definitions that rely on hardware-specific knowledge. Overall, dynamic system configurations have rarely been considered in DSE for wearable systems. The inclusion of data sampling strategies in the design space enabled us to adapt system designs to context. Moreover, memory demand has been included infrequently in DSE objective sets, although memory limitations are common in µCs and represent a bottleneck for deploying embedded recognition algorithms, as evident from Figures 9 and 10. We argue that memory demand should be considered in the design phase.
This work focuses on one typical wearable IoT application in order to derive a detailed analysis of the design space spanned by two retrieval algorithms, three µCs, and two sampling procedures introducing dynamic variations. Nevertheless, we kept the DynDSE design space formalism general, such that a wide variety of other components, system architectures, metrics, and IoT applications could be explored, including other hardware, data, and recognition algorithms. Thus, the DynDSE approach does not depend on the particular application considered nor does the method require modifications for other applications. Rather, we deem it essential to match the DynDSE approach with appropriate sensor data to drive the simulation.
For design spaces larger than the one considered here, DynDSE may require approximate search rules. Nevertheless, the exhaustive search deployed here remains a suitable option for coarse design selection before investigating further design variables in subsequent, local explorations.
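The exhaustive search over a small design space can be sketched as an enumeration of all component combinations, keeping only configurations that satisfy every requirement and ranking them by F1-score. The metric names and requirement directions follow the paper, while `evaluate` stands in for the simulation and is a hypothetical callable of ours:

```python
from itertools import product

def exhaustive_dse(design_space, evaluate, req):
    """Enumerate every configuration, simulate it via evaluate(), and
    return the feasible ones sorted by F1-score (best first).

    design_space: dict of component -> list of options,
                  e.g. {'algorithm': [...], 'sampling': [...], 'uc': [...]}
    evaluate:     callable(config) -> dict with keys P, R, EC, ET, MD
    req:          requirement bounds (P, R lower bounds; EC, ET, MD upper bounds)
    """
    feasible = []
    keys = list(design_space)
    for values in product(*design_space.values()):
        config = dict(zip(keys, values))
        m = evaluate(config)
        if (m['P'] >= req['P'] and m['R'] >= req['R'] and
                m['EC'] <= req['EC'] and m['ET'] <= req['ET'] and
                m['MD'] <= req['MD']):
            feasible.append((config, m))

    def f1(m):
        return 2 * m['P'] * m['R'] / (m['P'] + m['R'])

    return sorted(feasible, key=lambda cm: f1(cm[1]), reverse=True)
```

Because the search is exhaustive, its cost grows with the product of the option counts, which is what motivates approximate rules for larger design spaces.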

Conclusions and Future Work
We introduced a general methodology for multi-objective DynDSE applied to context-adaptive wearable IoT edge devices, which retrieve events from streaming sensor data using pattern recognition algorithms. We provided a formal characterisation of the configuration space given a set of system functionalities, components and their parameters. A constrained optimisation problem was formulated to identify an optimal system configuration according to application-dependent system requirements. The simulation can provide crucial information about the compatibility of system components. The method is particularly suitable to analyse design options at an early stage of the development process, to approximate key system design aspects, e.g., size of wireless battery powered devices, to confirm software and hardware choices under given design constraints, and to review designs under varying data patterns.
Further investigations may consider automated, on-demand resource distribution between functions of an embedded system that incorporates the DynDSE methodology. Dynamic resource management may result in wearable IoT systems that reconfigure themselves at runtime according to dynamic conditions. Furthermore, the increasing ubiquity of and interconnection among wearable IoT devices raise concerns about security and privacy, as malicious interactions become more likely. System security objectives could be incorporated into the dynamic optimisation to represent varying privacy concerns. Nevertheless, further research is needed to effectively quantify security and privacy concerns in metrics.

Funding: This work was partially funded by the EU H2020 MSCA ITN ACROSSING project (GA no. 616757).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.