# A Case Study on Minimum Energy Operation for Dynamic Time Warping Signal Processing in Wearable Computers

Javad Birjandtalab, Qingxue Zhang, Roozbeh Jafari Department of Electrical Engineering University of Texas at Dallas, Richardson, TX 75080–3021 Email: {birjandtalab, qingxue.zhang, rjafari}@utdallas.edu

Abstract- Miniaturization and form factor reduction in wearable computers lead to enhanced wearability. Power optimization typically translates to form factor reduction and is hence of paramount importance. This paper presents a power consumption analysis for various operating modes in circuits suitable for wearable computers, which are typically equipped with sensors that provide time-series data (e.g., acceleration, ECG). Dynamic time warping (DTW) is considered a suitable signal processing technique for wearable computers, particularly due to its low computational complexity and its robustness to speed variations (acceleration and deceleration) in time-series data. Wearable computers usually have very low computational performance requirements, which this work exploits to minimize the system-level energy consumption. We compare three modes of operation, namely minimum energy operating point (MEOP), minimum voltage operating point (MVOP) and nominal voltage operating point (NVOP), all leveraging sleep transistors when circuits are inactive. The results show that the MVOP, in conjunction with sleep transistors, provides the lowest energy budget and reduces energy consumption compared to the MEOP, which is known as a suitable operating mode for ultra-low power circuits.

Keywords—Minimum energy operating point; Movement monitoring; Dynamic time warping, Low Performance application

## I. INTRODUCTION

Wearable computers offer enormous opportunities for monitoring human activities of daily living and for pervasive computing, thanks to recent advances in technology miniaturization. However, several challenges must still be addressed before wearable computers are seamlessly integrated into our daily life. The form factor and size of wearable computers impact their wearability. Power optimization not only leads to smaller batteries and smaller systems, but also provides opportunities to leverage new sources of energy harvesting, such as body heat and movement, leading to batteryless units. Energy harvesting from the human body is known to provide small power budgets (approximately tens of $\mu$W) [1].

Wearable computers are typically composed of sensors, processing units and wireless transceivers. Although the wireless transceivers appear to require higher power budgets, duty cycling and activating them less frequently can reduce their power budget significantly. There are also numerous efforts to reduce the power budget of sensors [2]. The objective of our study is to focus on the processing units and determine the feasible operating mode for circuits that execute signal processing algorithms. Numerous circuit-level techniques investigate power and performance trade-offs for ultra-low power digital systems [3-5]. Near-threshold operation has been shown to mitigate some of the challenges of scaling and of the steadily increasing number of transistors [6]. Lower performance applications, in particular in wearable computers, are expected to show different energy consumption behavior in terms of the optimal operating point. Sub-threshold and near-threshold operation appear to have merits, despite the challenges concerning large delay variations in the presence of process, voltage, and temperature (PVT) variations [7]. Operating at near-threshold voltage (NTV) offers a 1-2 orders of magnitude reduction in power [8-10]. The minimum energy operating point mostly occurs in the sub-threshold region [6]. Similar investigations have also been carried out for memory modules [11].

Several operating modes for circuits have been proposed, each with advantages and disadvantages in terms of power, delay and robustness. The nominal voltage operating point (NVOP), operating with a voltage of 1V, has been used extensively. Typically, NVOP-based circuits perform the computations as fast as possible and enter the sleep mode when the circuits are not in operation. NVOP offers several advantages, including robustness to PVT variations and the ability to implement frequency scaling. The minimum energy operating point (MEOP) is another alternative, operating in the near/sub-threshold region. It has been shown that the total energy consumption, including dynamic and leakage, is minimized at the MEOP; however, the circuits are expected to be operational at all times. If sleep modes are also taken into account, it is not clear whether the total energy consumption is still minimal. The third, least explored alternative is the minimum voltage operating point (MVOP), which reduces the operating voltage aggressively to the smallest possible value. This paper provides a thorough analysis of the effectiveness of each operating mode, in particular targeting applications of wearable computers. Finally, the paper proposes the use of the MVOP for several applications of wearable computers instead of the MEOP, which is commonly believed to be the most power-efficient operating condition.

Dynamic time warping (DTW) is a commonly used algorithm that measures similarity between two vectors or time series. DTW has been used in various applications including handwritten signature, speech, and video recognition. It has also been used for human movement monitoring in health-care and wellness applications using wearable computers. Several investigations have focused on system-level power optimization in DTW-based architectures [12, 13]. In this paper, we take a system-level view and attempt to connect system-level and circuit-level power optimization, based on the fact that the performance requirement is relatively low for wearable computers. We study several operating modes and show their effectiveness in circuits that are targeted towards DTW-based architectures. This investigation provides cues for design techniques aimed at creating future hardware accelerators for wearable computers.

The rest of the paper is structured as follows. Section II presents an overview of human movement monitoring and the details of the DTW algorithm. Section III presents the minimum energy operating point concept. Section IV describes the operating points under study, and Section V presents the experimental results on the trade-offs between the various operating modes.

## II. BACKGROUND

## A. Human Movement monitoring

Human movement monitoring using wearable computers has been used for gait analysis, rehabilitation, fall prevention, sports medicine and recovery monitoring [14]. Wearable computers equipped with motion sensors, namely accelerometers, gyroscopes, magnetometers, pressure and flex sensors, offer novel paradigms to monitor human movements and extract relevant information about the speed, stability and control when performing the movements. Movement monitoring is performed by observing time-series data generated by sensors, potentially placed on different parts of the body (*e.g.*, in glasses, watches, and straps placed on the ankle). Dynamic time warping has proved to be a very effective technique for processing time-series data and identifying movements of interest.

## B. Review of Dynamic Time Warping

Dynamic time warping (DTW) is an algorithm for measuring similarity between two sequences which may vary in time or amplitude. We call one time series the sensor input, that is, data obtained from motion sensors, and the other the template, that is, the pattern of the movement of interest. Assume the sensor input and the template are defined by  $X = x_1, x_2, \dots, x_n$  and  $T = t_1, t_2, \dots, t_m$,  respectively. The objective of DTW is to measure the distance D(X, T) between the two sequences, X and T. This is accomplished by finding the optimal warping path in the  $n \times m$  cost matrix F(i, j) where  $0 \le i \le n$ ,  $0 \le j \le m$ . The cost matrix is calculated as follows [15]:

$$F(i,j) = dist(x_i, t_j) + \min \begin{cases} F(i-1, j-1) \\ F(i, j-1) \\ F(i-1, j) \end{cases}$$



Fig. 1. DTW hardware block diagram

The optimal warping path essentially finds the best alignment between the two sequences, such that their distance is minimized, while allowing acceleration or deceleration in the sequences. This feature is particularly useful because movements might be performed slower or faster, and DTW automatically accounts for that. This operation can be accomplished by a series of adders in an approach similar to dynamic programming. The hardware block diagram of DTW is shown in Fig. 1. Typically, software-based DTW approaches use the Euclidean distance; however, we opted for the Manhattan distance due to its hardware friendliness, as it requires neither multiplications nor square roots.

Once the optimal warping path is calculated, its final distance cost is compared to a threshold. If the distance cost is smaller than the threshold, the two time series are considered similar to each other, leading to the identification of the movement of interest.
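The recurrence and the threshold test above can be sketched in software as follows. This is a minimal illustration, not the hardware implementation of Fig. 1: the function names (`manhattan`, `dtw_distance`, `is_match`), the tuple representation of 3-axis samples, and the boundary condition $F(0,0)=0$ with all other border cells set to infinity are assumptions. The boundary condition is a common DTW initialization that the text does not state.

```python
def manhattan(a, b):
    """Manhattan distance between two multi-axis samples (no multiply or sqrt)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def dtw_distance(x, t):
    """Return D(X, T), the accumulated cost of the optimal warping path."""
    n, m = len(x), len(t)
    inf = float("inf")
    # Row 0 and column 0 hold the assumed boundary condition.
    f = [[inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = manhattan(x[i - 1], t[j - 1]) + min(
                f[i - 1][j - 1], f[i][j - 1], f[i - 1][j]
            )
    return f[n][m]


def is_match(x, t, threshold):
    """A movement is detected when the warped distance is below the threshold."""
    return dtw_distance(x, t) < threshold


# Hypothetical 3-axis data: the input repeats the template at a slower pace.
template = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
slower = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), (2, 2, 2)]
print(dtw_distance(slower, template))  # 0: the slower repetition warps onto the template
```

Because each cell needs only an absolute difference, a three-way minimum and an addition, the same loop body maps naturally onto adders and comparators.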

## III. MINIMUM ENERGY OPERATING POINT

## A. Dynamic and leakage energy

Power dissipation and circuit delay are two factors that vary when the supply voltage  $V_{DD}$  changes. The total energy can be divided into dynamic energy and leakage energy. Dynamic energy is represented by:

$$E_{Dyn} = f \times N \times C \times V_{DD}^2$$

where *f* is the switching frequency, N is the number of clock cycles required to complete an operation, C is the total switched capacitance, and  $V_{DD}$  is the supply voltage. The dynamic energy decreases when the supply voltage is reduced. The leakage energy is modeled using:

$$E_{Leak} = V_{DD} \times I_s \times e^{\frac{V_{gs} - V_{th}}{nV_T}} \times \left(1 - e^{\frac{-V_{ds}}{V_T}}\right) \times T$$

where  $I_s$  is a technology-dependent current,  $V_{gs}$  is the gate-to-source voltage,  $V_{ds}$  is the drain-to-source voltage,  $V_{th}$  is the threshold voltage,  $V_T$  is the thermal voltage, n is the sub-threshold parameter, and T is the delay of the circuit [16]. The delay or latency of the circuit increases when the voltage is reduced, leading to increased leakage energy. The total energy is the sum of the dynamic and leakage energy. We will later show the trade-offs between the leakage and the dynamic energy and their contribution to the total energy when the supply voltage is varied.
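The two energy expressions can be evaluated numerically with a short sketch such as the one below. The functions follow the equations term by term; every numeric value in the loop (capacitance, $I_s$, threshold voltage, delays) is a hypothetical placeholder chosen only to make the script run, not a value from the synthesized design. The off-state bias $V_{gs}=0$, $V_{ds}=V_{DD}$ is likewise an assumption.

```python
import math


def dynamic_energy(f, n_cycles, c_switched, vdd):
    """E_Dyn = f * N * C * V_DD^2, term by term as in the text."""
    return f * n_cycles * c_switched * vdd ** 2


def leakage_energy(vdd, i_s, v_gs, v_th, n, v_t, v_ds, delay):
    """E_Leak = V_DD * I_s * exp((V_gs - V_th)/(n*V_T)) * (1 - exp(-V_ds/V_T)) * T."""
    i_sub = i_s * math.exp((v_gs - v_th) / (n * v_t)) * (1.0 - math.exp(-v_ds / v_t))
    return vdd * i_sub * delay


# Hypothetical values, for illustration only.
V_T = 0.026  # thermal voltage near room temperature (V)
for vdd, delay in [(1.0, 1e-7), (0.35, 5e-5), (0.2, 2e-4)]:
    e_dyn = dynamic_energy(f=1.0, n_cycles=1, c_switched=1e-12, vdd=vdd)
    e_leak = leakage_energy(vdd, i_s=1e-6, v_gs=0.0, v_th=0.4, n=1.5,
                            v_t=V_T, v_ds=vdd, delay=delay)
    print(f"VDD={vdd:.2f} V  E_dyn={e_dyn:.3e}  E_leak={e_leak:.3e}")
```

The sketch makes the opposing trends explicit: the dynamic term shrinks quadratically with $V_{DD}$, while the leakage term grows in proportion to the delay T, which itself increases as the voltage is lowered.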

## B. Sleep transistors

Sleep modes, also known as power gating techniques, are utilized to deactivate cells that are idle. Sleep transistors also show a trade-off between delay and leakage. The key factor is the width of the sleep transistors: wider transistors leak more but reduce the delay. The product of delay and leakage determines the total leakage energy. Sleep transistors can be added using various configurations such as footer, header or both [17]. An earlier investigation presents the challenges in using sleep transistors [18]. We use both header and footer sleep transistors in our architecture. Fig. 2 shows how the sleep transistors are incorporated.



Fig. 2. Notion of sleep transistor

## IV. VARIOUS OPERATING POINTS

A directed acyclic graph (DAG), G(V,E), is used to model our signal processing algorithm, namely DTW. Each node in V defines a computation and each edge in E defines the data dependency and the communication between two nodes. A computation starts when all of its inputs are available, and it takes a specific amount of time to finish. Each node has an intrinsic delay D. The critical path is defined as the longest path from the input to the final output. The critical path delay limits the application throughput and the maximum frequency of operation,  $f_{max}$ . For example, if the critical path delay is 10 µs, the maximum operating frequency  $f_{max}$  will be 100kHz. The application delay,  $D_{app}$ , as defined in [19], is determined by the application, in particular by how frequently the inputs are available and how fast the final results must be computed. For example, in the case of wearable computers, if specific sensors are sampled at 20Hz, the output of the DAG, or the final result, must be determined before the next sample arrives. Therefore, the application delay,  $D_{app}$ , will be 50ms. As seen from this example, there is often a mismatch between the application delay,  $D_{app}$ , and the critical path or circuit delay,  $D_{cir}$  (in this example, 50ms vs. 10 µs). If the application delay, which is often fixed, is orders of magnitude larger than the critical path delay, the circuit will complete the computation early and will either have to go to sleep mode or remain active and leaking until the next set of inputs (and computations) is available. This introduces an interesting trade-off in selecting the operating supply voltage and the critical path delay such that the leakage current, especially for the applications of wearable computers, is minimized.
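The arithmetic in this example can be checked with a few lines of code. The sketch below uses only the two numbers quoted in the text (a 10 µs critical path and a 20 Hz sampling rate); the helper names and the idle-fraction metric are illustrative additions rather than quantities defined by the paper.

```python
def max_frequency_hz(critical_path_delay_s):
    """f_max is the reciprocal of the critical path delay."""
    return 1.0 / critical_path_delay_s


def application_delay_s(sample_rate_hz):
    """D_app: the result must be ready before the next sample arrives."""
    return 1.0 / sample_rate_hz


def idle_fraction(d_cir_s, d_app_s):
    """Share of each application period in which the circuit has nothing to do."""
    if d_cir_s > d_app_s:
        raise ValueError("circuit delay exceeds application delay")
    return 1.0 - d_cir_s / d_app_s


d_cir = 10e-6                       # 10 us critical path delay
d_app = application_delay_s(20.0)   # sensor sampled at 20 Hz

print(f"f_max = {max_frequency_hz(d_cir) / 1e3:.0f} kHz")    # 100 kHz
print(f"D_app = {d_app * 1e3:.0f} ms")                       # 50 ms
print(f"idle fraction = {idle_fraction(d_cir, d_app):.4f}")  # 0.9998
```

With these numbers the circuit is idle for more than 99.9% of each period, which is why the energy spent while idle matters so much for this class of application.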
If the circuit operates in the NVOP mode with a small critical path delay (VDD at approximately 1V) while the application delay is a few orders of magnitude larger, the circuit consumes significantly more leakage energy than at the MEOP (VDD at around 300-400mV). If the operating voltage is pushed down to the MVOP (VDD at around 200mV), the critical path delay increases while the leakage current decreases. Please note that for all these scenarios, the application delay remains constant. For example, assume the application delay dictates an adder to operate at 10 kHz ( $D_{app} = 100 \ \mu s$ ). The adder is synthesized and the critical path delay is determined to be 0.2 µs, 50 µs, and 200 µs for NVOP, MEOP, and MVOP, respectively. MVOP cannot meet this deadline, so the operating point should be selected somewhere between MEOP and MVOP to meet the requirements of the application, that is  $D_{cir} = D_{app} = 100 \ \mu s$ . If the application requires the adder to operate at 1 kHz, the application delay would be 1ms, which can be accommodated by all three operating modes (NVOP, MEOP and MVOP). In this case, the optimal operating point could be MVOP since a higher supply voltage leads to increased leakage energy consumption. This will be shown later in this paper. Since the low performance signal processing algorithms for wearable computers typically involve long sleep times, larger margins or guard-bands in the clock can be included to compensate for PVT variation. This will impact neither the correctness of the signal processing nor the power analysis. Power reduction is extremely important for wearable computers. If a watch-like wearable computer is expected to operate as a pedometer, a 2X power reduction halves the number of battery recharges. On the other hand, wearable computers typically do not require highly complex signal processing or computations. These two characteristics suggest that MEOP and MVOP could be ideal modes for computation in wearable computers.

We illustrate our case study focusing on dynamic time warping (DTW), which is commonly used for many wearable computing applications. The critical path delay varies with the operating point voltage. Fig. 3 shows the application delay,  $D_{app}$ , along with  $(P_V, d_V)$ ,  $(P_E, d_E)$ , and  $(P_N, d_N)$ , the active power-delay pairs for the MVOP, MEOP, and NVOP modes, respectively. The power includes both the leakage and dynamic power while the computation is being executed. Following the completion of the computation, the circuit goes to sleep for the remaining duration of  $D_{app}$  until the next computation is required. The nominal voltage operating point (NVOP) has the highest power consumption since the leakage and dynamic power both increase with the supply voltage. The minimum voltage operating point (MVOP) has the largest delay since the delay grows as the supply voltage is reduced. The minimum energy operating point (MEOP) provides a trade-off between power and delay.

Please note that for all these scenarios, although the leakage power is reduced close to zero during sleep mode, the amount of leakage is still a function of the supply voltage and increases for higher voltages. We denote the total energy for each operating mode by  $E_{total}$ . The dynamic current,  $I_{Dyn}$ , and the leakage current,  $I_{leak}$ , are both extracted. The total energy is the sum of the energy during the computation (dynamic and leakage) and the energy during sleep.

$$P_{leak} = I_{leak} \times V_{DD}$$
$$P_{Dyn} = I_{Dyn} \times V_{DD}$$
$$P = P_{leak} + P_{Dyn}$$

$$E_{total} = P \times D_{cir} + P_{leak} \times (D_{app} - D_{cir})$$



Fig. 3. Power versus delay for different operating mode

In higher performance applications where  $D_{app} \approx D_{cir}$ , the circuit never goes to sleep and therefore the active term  $P \times D_{cir}$  (that is,  $P_E \times d_E$  for MEOP) is dominant, making MEOP a suitable solution. In lower performance applications, however, where  $D_{app} \gg D_{cir}$ , the sleep term  $P_{leak} \times (D_{app} - D_{cir})$  becomes dominant, hence the need to reduce the leakage current as much as possible.
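The mode selection implied by the expression for $E_{total}$ can be sketched as below. The critical path delays are those of the adder example in this section (0.2 µs, 50 µs, 200 µs); the supply voltages follow the approximate values given for each mode; all current values are hypothetical placeholders chosen for illustration and are not measurements from the DTW core. As in the equation, the same $P_{leak}$ is used for the sleep interval.

```python
from dataclasses import dataclass


@dataclass
class OperatingMode:
    name: str
    vdd: float     # supply voltage (V)
    i_dyn: float   # dynamic current while computing (A), hypothetical
    i_leak: float  # leakage current (A), hypothetical
    d_cir: float   # critical path delay (s)


def total_energy(mode, d_app):
    """E_total = P * D_cir + P_leak * (D_app - D_cir)."""
    if mode.d_cir > d_app:
        raise ValueError(f"{mode.name} cannot meet D_app")
    p_leak = mode.i_leak * mode.vdd
    p_dyn = mode.i_dyn * mode.vdd
    p = p_leak + p_dyn
    return p * mode.d_cir + p_leak * (d_app - mode.d_cir)


def best_mode(modes, d_app):
    """Lowest-energy mode among those whose delay fits within D_app."""
    feasible = [m for m in modes if m.d_cir <= d_app]
    return min(feasible, key=lambda m: total_energy(m, d_app))


modes = [
    OperatingMode("NVOP", vdd=1.00, i_dyn=1e-4, i_leak=1e-6, d_cir=0.2e-6),
    OperatingMode("MEOP", vdd=0.35, i_dyn=1e-6, i_leak=5e-8, d_cir=50e-6),
    OperatingMode("MVOP", vdd=0.20, i_dyn=5e-7, i_leak=1e-8, d_cir=200e-6),
]

print(best_mode(modes, d_app=100e-6).name)  # MEOP: MVOP is too slow for 10 kHz
print(best_mode(modes, d_app=1e-3).name)    # MVOP: the sleep term dominates at 1 kHz
```

With these placeholder currents the ranking flips between the two application delays, which is the qualitative behavior the section describes; the actual crossover depends on the extracted currents.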

TABLE I shows several application configurations for the DTW algorithm. The DTW algorithm is expected to operate on a 3-axis accelerometer, where the sampling frequency can vary from 4 to 100Hz. Each sample can have 4, 6 or 12 bits. Four target movements are expected to be detected, each having a duration of approximately one second. Each acquired sample requires 6 additions and 3 inversions. Although the application configurations may vary, our proposed configurations facilitate investigating the power consumption of the three operating modes, namely NVOP, MEOP and MVOP. This analysis can easily be scaled to other application configurations. The notion of the minimum voltage operating point is applicable to many low performance configurations in which the application frequency  $1/D_{app}$  is much smaller than the circuit frequency  $1/D_{cir}$ .

TABLE I. DTW APPLICATION CONFIGURATIONS

| Parameter                | Value                      |
|--------------------------|----------------------------|
| Bit resolution           | 4, 6, 12 bit               |
| Input sampling frequency | 4, 10, 100 Hz              |
| Sensor                   | 3-axis accelerometer       |
| Movement duration        | 1 second                   |
| Number of movements      | 4                          |
| Operations per sample    | 6 additions + 3 inversions |

## V. RESULTS

## A. Accuracy Analysis

We created a set of experiments to measure the accuracy of the DTW algorithm for various bit resolutions and sampling frequencies when a 3-axis accelerometer is used. Five subjects wore a sensor node on the right thigh and performed two movements thirty times. The target movements last approximately one second and include sit-to-stand and kneeling. The samples were acquired from the 3-axis accelerometer with 16-bit resolution. We converted the 16-bit input data to 12-bit, 6-bit, and 4-bit data. The sampling frequency for the experiments was set at 100 Hz. To create the 10Hz and 4Hz sampling frequencies, the original data were down-sampled. In order to show the accuracy of the DTW-based movement recognition algorithm for various bit resolutions and sampling frequencies, we report the precision and recall as two significant accuracy metrics. We denote by  $t_p$  the number of true positives, that is, the number of correctly detected target movements, by  $t_n$  the number of true negatives, by  $f_n$  the number of false negatives and by  $f_p$  the number of false positives. The precision and recall are defined as:

$$Precision = \frac{t_p}{t_p + f_p}$$
$$Recall = \frac{t_p}{t_p + f_n}$$
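As a quick numerical check of the two metrics, the sketch below uses the standard definitions, precision $t_p/(t_p+f_p)$ and recall $t_p/(t_p+f_n)$. The counts in the example are hypothetical and are not taken from the experiments; the zero-denominator guard is an added convention.

```python
def precision(tp, fp):
    """Fraction of reported detections that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true target movements that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 30 correct detections, 1 false alarm, 1 missed movement.
tp, fp, fn = 30, 1, 1
print(f"precision = {100 * precision(tp, fp):.2f}%")  # 96.77%
print(f"recall    = {100 * recall(tp, fn):.2f}%")     # 96.77%
```

Note that the number of true negatives does not enter either metric.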

As the bit resolution and sampling frequency are decreased, the precision and recall are expected to decrease, and so is the power consumption. TABLE II and TABLE III show the precision and recall (in percent) for various bit resolutions and input frequencies for the sit-to-stand and kneeling movements.

TABLE II. DETECTION OF SIT-TO-STAND (PRECISION AND RECALL)

|       | 12Bit     |        | 6Bit      |        | 4Bit      |        |
|-------|-----------|--------|-----------|--------|-----------|--------|
|       | Precision | Recall | Precision | Recall | Precision | Recall |
| 100Hz | 96.77     | 96.77  | 96.77     | 96.77  | 96.77     | 96.77  |
| 10Hz  | 96.77     | 96.77  | 93.75     | 96.77  | 90.90     | 96.77  |
| 4Hz   | 93.77     | 96.77  | 90.90     | 96.77  | 90.90     | 96.77  |

TABLE III. DETECTION OF KNEELING (PRECISION AND RECALL)

|       | 12Bit     |        | 6Bit      |        | 4Bit      |        |
|-------|-----------|--------|-----------|--------|-----------|--------|
|       | Precision | Recall | Precision | Recall | Precision | Recall |
| 100Hz | 100       | 100    | 100       | 100    | 90.47     | 100    |
| 10Hz  | 86.36     | 100    | 79.16     | 100    | 72.72     | 84.21  |
| 4Hz   | 68.00     | 89.47  | 65.38     | 89.47  | 52.00     | 68.42  |
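The resolution and rate reduction used in the accuracy experiments could be emulated in software as sketched below. The text does not state how the 16-bit data were converted or how the 100 Hz data were down-sampled, so truncation by arithmetic right shift and plain decimation without anti-alias filtering are assumptions, as are the function names.

```python
def reduce_resolution(sample, from_bits=16, to_bits=12):
    """Keep the most significant bits of a signed sample (assumed truncation)."""
    if to_bits > from_bits:
        raise ValueError("target resolution exceeds source resolution")
    return sample >> (from_bits - to_bits)  # arithmetic shift on Python ints


def downsample(samples, from_hz, to_hz):
    """Keep every k-th sample, k = from_hz / to_hz (assumed plain decimation)."""
    if from_hz % to_hz != 0:
        raise ValueError("rates must divide evenly for plain decimation")
    return samples[:: from_hz // to_hz]


one_second = list(range(100))                  # placeholder for 1 s of data at 100 Hz
print(len(downsample(one_second, 100, 10)))    # 10 samples at 10 Hz
print(len(downsample(one_second, 100, 4)))     # 4 samples at 4 Hz
print(reduce_resolution(32767, 16, 4))         # 7, the largest signed 4-bit value
```

A one-second movement is thus represented by 100, 10 or 4 samples per axis, which helps explain why recognition accuracy drops at the lowest rate.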

## B. Energy Analysis

We wrote the behavioral code for DTW in Verilog and verified the logic blocks with HSPICE. The DTW core was synthesized using the TI 45nm technology process. To extract the leakage energy, we calculated the leakage current during idle time. We extracted the longest path using statistical timing analysis performed with the Synopsys PrimeTime tool. Statistical timing analysis calculates the delay of each internal logic unit and computes the expected timing of the entire circuit. We then identified the critical path; all paths from primary inputs to primary outputs were considered in the worst-case delay extraction. We used the same netlist for all operating modes since there is no considerable difference in area when implementing circuits with different very low frequency specifications. Fig. 4 shows the critical path delay  $D_{cir}$  for various bit resolutions and operating modes. The sub-threshold region leads to longer delays, almost 3 to 4 orders of magnitude larger than the nominal voltage delay. Please note that the delays are shown on a logarithmic scale. We assumed the clock period would be twice the critical path delay. Under this assumption, we calculated the leakage current,  $I_{leak}$ , by averaging the current in idle periods. We also calculated the total current,  $I_{total}$ , when the circuit was operational, and determined the dynamic current,  $I_{Dynamic}$ , by subtracting the leakage current from the total current. We applied several random test signals and measured both the dynamic and leakage currents by averaging the current over all test signals.
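The current extraction described above amounts to simple averaging, which can be sketched as follows. The data layout (one pair of active and idle current samples per test signal), the function name and the numeric values are assumptions for illustration; they do not reproduce the simulator output format.

```python
from statistics import mean


def extract_currents(traces):
    """Return (I_dynamic, I_leak) averaged over all test signals.

    traces: list of (active_samples, idle_samples) pairs, currents in amperes.
    """
    i_leak = mean(mean(idle) for _, idle in traces)        # idle-period average
    i_total = mean(mean(active) for active, _ in traces)   # operational average
    return i_total - i_leak, i_leak


# Hypothetical current samples for two random test signals.
traces = [
    ([3e-6, 5e-6], [1e-7, 1e-7]),
    ([4e-6], [1e-7]),
]
i_dyn, i_leak = extract_currents(traces)
print(f"I_dynamic = {i_dyn:.2e} A, I_leak = {i_leak:.2e} A")
```

Averaging per signal first and then across signals gives each test signal equal weight regardless of its length, which is one of several reasonable conventions.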

Fig. 5 shows the energy consumption for supply voltages ranging from 200 to 800mV. We observe that the minimum energy point occurs at 350mV and the corresponding energy per DTW operation is 43.8fJ. As expected, the leakage energy per operation increases in the sub-threshold region as the supply voltage decreases. Please note that these measurements do not include sleep modes. In other words, Fig. 5 shows the total energy consumption versus supply voltage under the assumption that the circuit delay and the application delay are equal ( $D_{cir} = D_{app}$ ). In this case, the total energy consumption at 0.2 V and at 0.6 V is the same. Please note, however, that over a fixed time interval the leakage energy is higher at 0.6 V than at 0.2 V, since the leakage power increases with the supply voltage. Therefore, operating at 0.6 V does not lead to better results at lower frequencies, especially when  $D_{app} \gg D_{cir}$ .

We calculate the total energy in three different operating modes: NVOP+Sleep, MEOP+Sleep and MVOP+Sleep. We repeat the simulations for different sampling frequencies and different bit resolutions. We consider 4Hz, 10Hz, and 100Hz for the data sampling frequency, which is subsequently used to determine  $D_{app}$  (250ms, 100ms and 10ms, respectively).



Fig. 4. Critical path delay for various operating modes



Fig. 5. Total, dynamic, and leakage energy of DTW



Fig. 6. Energy consumption for 4-bit DTW in various operating modes



Fig. 7. Energy consumption for 6-bit DTW in various operating modes



Fig. 8. Energy consumption for 12-bit DTW in various operating modes

Fig. 6, Fig. 7 and Fig. 8 show the total energy consumption in the various operating modes for the 4-bit, 6-bit and 12-bit DTW modules, respectively, when detecting one target movement. The energy consumption is shown on a logarithmic scale. Decreasing the sampling frequency does not significantly impact the energy consumption, as the idle time ( $D_{app} - D_{cir}$ ) is the dominating factor. There is almost two orders of magnitude difference in energy consumption between NVOP and MEOP. The choice between MEOP and MVOP depends on the application delay requirements. The results show that for the 4Hz and 10Hz sampling frequencies, which both correspond to relatively slow DTW, MVOP is more suitable, with an average improvement of 50%, whereas at a 100Hz sampling frequency MEOP wins, showing a 10% energy reduction compared to MVOP.

## VI. CONCLUSION

Wearable computers introduce an interesting class of computing where the computational requirements are modest and power optimization is of significant importance. In this paper, we considered a popular signal processing algorithm for wearable computers called dynamic time warping. We investigated several operating modes, namely the nominal voltage operating point (NVOP), where the VDD is approximately 1V, the minimum energy operating point (MEOP), where the VDD is approximately 300-400mV, and the minimum voltage operating point (MVOP), where the VDD is approximately 200mV. We synthesized a hardware core for DTW and showed that MVOP appears to be the most suitable mode, as the leakage (and the total power consumption) is reduced compared to the MEOP. The total power consumption corresponding to the NVOP appears to be two to three orders of magnitude higher than that of the MEOP and MVOP, making it a less appealing alternative. NVOP is the current practice in state-of-the-art low-power microcontrollers, which complete the computations as fast as possible and then go to sleep.

## VII. ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation, under grant CNS-1150079. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding organizations.

#### References

[1] J. Olivo, D. Brunelli, and L. Benini, "A kinetic energy harvester with fast start-up for wearable body-monitoring sensors," in *Pervasive Computing Technologies for Healthcare (PervasiveHealth), 2010 4th International Conference on*, pp. 1–7, March 2010.

[2] H. Ghasemzadeh, E. Guenterberg, K. Gilani, and R. Jafari, "Action coverage formulation for power optimization in body sensor networks," in *Design Automation Conference, 2008. ASPDAC 2008. Asia and South Pacific*, pp. 446–451, March 2008.

[3] A. Chandrakasan, S. Sheng, and R. Brodersen, "Low-power cmos digital design," *Solid-State Circuits, IEEE Journal of*, vol. 27, pp. 473–484, Apr 1992.

[4] R. W. Brodersen, Low power digital CMOS design. Springer, 1995.

[5] A. Tajalli and Y. Leblebici, "Design trade-offs in ultra-low-power digital nanoscale cmos," *Circuits and Systems I: Regular Papers, IEEE Transactions on*, vol. 58, no. 9, pp. 2189–2200, 2011.

[6] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming moore's law through energy efficient integrated circuits," *Proceedings of the IEEE*, vol. 98, no. 2, pp. 253–266, 2010.

[7] D. Markovic, C. C. Wang, L. P. Alarcon, T.-T. Liu, and J. M. Rabaey, "Ultralow-power design in near-threshold region," *Proceedings of the IEEE*, vol. 98, no. 2, pp. 237–252, 2010.

[8] S. Hanson, M. Seok, Y.-S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw, "A low-voltage processor for sensing applications with picowatt standby mode," *Solid-State Circuits, IEEE Journal of*, vol. 44, no. 4, pp. 1145–1155, 2009.

[9] M. Pons, J.-L. Nagel, D. Severac, M. Morgan, D. Sigg, P.-F. Ruedi, and C. Piguet, "Ultra low-power standard cell design using planar bulk cmos in subthreshold operation," in *Power and Timing Modeling, Optimization and Simulation (PATMOS), 2013 23rd International Workshop on*, pp. 9–15, IEEE, 2013.

[10] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, D. Blaauw, and T. Austin, "A 2.60 pj/inst subthreshold sensor processor for optimal energy efficiency," in *VLSI Circuits, 2006. Digest of Technical Papers. 2006 Symposium on*, pp. 154–155, IEEE, 2006.

[11] Y. Chen, Z. Yu, H. Nan, and K. Choi, "Ultralow power sram design in near threshold region using 45nm cmos technology," in *Electro/Information Technology (EIT), 2011 IEEE International Conference on*, pp. 1–4, IEEE, 2011.

[12] R. Jafari and R. Lotfian, "A low power wake-up circuitry based on dynamic time warping for body sensor networks," in *Body Sensor Networks (BSN), 2011 International Conference on*, pp. 83–88, IEEE, 2011.

[13] R. Lotfian and R. Jafari, "An ultra-low power hardware accelerator architecture for wearable computers using dynamic time warping," in *Proceedings of the Conference on Design, Automation and Test in Europe*, pp. 913–916, EDA Consortium, 2013.

[14] L. Atallah, G. G. Jones, R. Ali, J. J. Leong, B. Lo, and G.-Z. Yang, "Observing recovery from knee-replacement surgery by using wearable sensors," in *Body Sensor Networks (BSN), 2011 International Conference on*, pp. 29–34, IEEE, 2011.

[15] M. Müller, "Dynamic time warping," in *Information Retrieval for Music and Motion*, pp. 69–84, Springer, 2007.

[16] A. Wang and A. Chandrakasan, "A 180-mv subthreshold fft processor using a minimum energy design methodology," *Solid-State Circuits, IEEE Journal of*, vol. 40, no. 1, pp. 310–319, 2005.

[17] K. Agarwal, K. Nowka, H. Deogun, and D. Sylvester, "Power gating with multiple sleep modes," in *Proceedings of the 7th International Symposium on Quality Electronic Design*, pp. 633–637, IEEE Computer Society, 2006.

[18] K. Shi and D. Howard, "Challenges in sleep transistor design and implementation in low-power designs," in *Proceedings of the 43rd annual Design Automation Conference*, pp. 113–116, ACM, 2006.

[19] D. Markovic and R. W. Brodersen, *DSP Architecture Design Essentials*. Springer, 2012.