# Optimizing the Power Delivery Network in a Smartphone Platform 

Woojoo Lee, Student Member, IEEE, Yanzhi Wang, Student Member, IEEE, Donghwa Shin, Member, IEEE, Naehyuck Chang, Fellow, IEEE, and Massoud Pedram, Fellow, IEEE


#### Abstract

Smartphones consume a significant amount of power. Indeed, they can hardly provide a full day of use between charging operations even with a 2000 mAh battery. While power minimization and dynamic power management techniques have been heavily explored to improve the power efficiency of modules (processors, memory, display, GPS, etc.) inside a smartphone platform, there is one critical factor that is often overlooked: the power conversion efficiency of the power delivery network (PDN). This paper focuses on dc-dc converters, which play a pivotal role in the PDN of the smartphone platform. Starting from detailed models of the dc-dc converter designs, two optimization methods are presented: 1) static switch sizing to maximize the efficiency of a dc-dc converter under statistical loading profiles and 2) dynamic switch modulation to achieve high efficiency enhancement under dynamically varying load conditions. To verify the efficacy of the optimization methods in actual smartphone platforms, this paper also presents a characterization procedure for the PDN. The procedure is as follows: 1) group the modules in the smartphone platform together and use profiling to estimate their average and peak power consumption levels and 2) build an equivalent dc-dc converter model for the power delivery path from the battery source to each group of modules and use linear regression to estimate the conversion efficiency of the corresponding equivalent converter. Experimental results demonstrate that the static switch sizing can achieve $6 \%$ power conversion efficiency enhancement, which translates to $19 \%$ reduction in power loss for general usage of the smartphone. The dynamic switch modulation accomplishes similar improvement under the same condition, while also achieving high efficiency enhancement in various load conditions.


Index Terms: DC-DC power converter, low-power design, power delivery network (PDN), smartphone.

## I. INTRODUCTION

Growing demand for increased smartphone functionality and the need to support all kinds of popular applications on the smartphone have been driving the trend toward including many high-performance modules (such as high-speed processors, fast wireless interfaces, large high-resolution displays, and sophisticated sensors) on the smartphone platform. The usability of smartphones has, however, been hampered by their low service time between successive charging operations. This is because the electrical energy storage density of modern batteries has been advancing at a relatively low pace compared to the rate at which functional and performance improvements have been made to the smartphone platform and components. The latter, however, comes at the expense of increased power consumption in the smartphone platform.

Consequently, there has been a surge of interest in reducing power consumption of the smartphone platform. Some recent works have focused on developing power macromodels for the modules in the smartphone platforms [2]-[5]. Similarly, dynamic power management (DPM) techniques [6], [7] have been widely investigated and employed in various platforms, including smartphones.

While power modeling and DPM in the smartphone platforms have been heavily investigated, there is one critical factor that has often been overlooked, and that is the power conversion efficiency of the power delivery network (PDN) in smartphones. The PDN provides the battery power to all the modules. The conceptual diagram of the PDN in Fig. 1 shows that it consists of dc-dc converters. In reality, dc-dc converters in the PDN of a smartphone inevitably dissipate power, and power dissipations from all converters inside the platform can result in a considerable amount of power loss. Given that the overall PDN's power efficiency is the ratio of the power consumed by all the smartphone modules to the power drawn from the smartphone battery, Fig. 2 shows that the overall power efficiency of a real smartphone platform is around $60 \%-75 \%$. Improving the power conversion efficiency can thus ensure appreciably longer battery life. This paper focuses on power conversion efficiency in the smartphone platform and introduces an optimization procedure for improving it.

Modern dc-dc converters exhibit high peak power conversion efficiency, but their efficiency can drop dramatically under adverse load conditions (i.e., out-of-range output current levels) [1], [8]. In other words, a state-of-the-art switching dc-dc converter can exhibit low conversion efficiency when there is a mismatch between the converter characteristics and its load. To tackle this drawback, a few approaches have been proposed. To design an optimal structure of the dc-dc converters, two methods have been introduced [9], [10]. A multiswitching scheme has been proposed to adaptively change the converter characteristics according to the load conditions [11], [12]. Component tuning in a dc-dc converter to ensure that the converter operates with high efficiency under the given load condition has been suggested in [1] and [13].

Fig. 1. Conceptual diagram of the PDN in a smartphone platform.

Fig. 2. Measured traces of the power conversion efficiency of the PDN in the Qualcomm Snapdragon MDP MSM8660.

Starting with detailed models of the dc-dc converter designs, this paper presents two optimization methods to minimize the power loss due to the dc-dc converters according to the load conditions. First, we propose a static switch sizing (S3) method. The objective is to statically perform optimal sizing on the output stage drivers of the converter (i.e., the power MOSFET switches) at design time, according to statistical information about the load behavior. Next, we extend the multiswitching scheme to adaptively turn on/off the switches inside the dc-dc converter, depending on the required amount of load current. This method, called dynamic switch modulation (DSM), enables the dynamic control of the dc-dc converter so as to minimize its power loss under dynamically changing load conditions. This paper provides sophisticated control policies of the multiple switches as well as design optimization algorithms to find the number of switches and their optimum sizes.

To apply the proposed optimization methods to the actual smartphone platform, we perform the PDN characterization [1]. This paper proposes a characterization procedure, based on: 1) development of an equivalent dc-dc converter model; 2) module grouping; and 3) linear regression. The proposed equivalent dc-dc converter model can effectively model different types of converters and their cascade connections to represent a power delivery path from the battery cell to a collection of load devices. Each equivalent dc-dc converter model has its own conversion efficiency coefficients, and we perform characterization to identify these coefficients. The module grouping procedure enhances the accuracy of linear regression used for the conversion efficiency characterization.

This paper also provides extensive experimental results. We verify the accuracy of power conversion efficiency characterization with real measurement data. The results point to the fact that the power conversion efficiency of the target smartphone platform is quite low. Next, the load current profiles for each module in the smartphone platform are collected. Finally, we apply the two proposed optimization methods (i.e., S3 and DSM) to ensure that the converters operate at the most energy-efficient points. The experimental results demonstrate that the S3 achieves $6 \%$ overall efficiency enhancement, which translates to $19 \%$ power loss reduction for the general smartphone usage pattern. The results of DSM show that it can accomplish efficiency enhancements as high as those of the S3. Furthermore, DSM can achieve the efficiency enhancement over the whole load current range.

The remainder of this paper is organized as follows. Section II provides some background on the dc-dc converter model. In Section III, the two optimization methods are presented. Section IV introduces the characterization procedures of the power conversion efficiency. Section V is dedicated to the experimental work, while Section VI concludes this paper.

## II. DC-DC CONVERTER MODEL

Typical dc-dc converters in the smartphone platforms can be classified into three types according to their circuit implementation and operating principles: inductive dc-dc converters, low-dropout linear regulators (LDOs), and capacitive dc-dc converters. The inductive dc-dc converters achieve very high power conversion efficiencies for a wide range of output loads. These types of converters can step up the output voltage so that it becomes higher than the input voltage (i.e., boost), or step down the output voltage so that it is lower than the input voltage (i.e., buck). On the other hand, the output voltage of an LDO can only be lower than its input voltage. In general, LDOs offer low-noise output voltage, low area overhead, and ease of integration. However, because of their low power conversion efficiency, they are normally used only to provide power for noise-sensitive RF or analog modules in smartphones. The capacitive dc-dc converters have lower area overhead than the inductive dc-dc converters, and achieve better power conversion efficiency than LDOs. However, unlike the inductive dc-dc converters where the power conversion efficiencies depend only on parasitics of their components, the conversion efficiencies of the capacitive dc-dc converters are limited by their output resistance. Thus, their efficiency drops significantly as the conversion ratio moves away from the ideal ratio of a given topology and operating mode [14]. In this paper, we consider only the buck-type inductive dc-dc converters and the LDOs. Together, these converters can provide low-noise output voltages with high conversion efficiency to the various modules in the smartphone platforms.


Fig. 3. Circuit diagram of a buck-type inductive dc-dc converter.

## A. Inductive DC-DC Converter Model

The inductive dc-dc converter consists of an inductor, a capacitor, two MOSFET switches, and a pulse-width-modulation (PWM) controller. Fig. 3 shows the simplified schematic of the buck-type inductive dc-dc converter (simply called dc-dc converter in the remainder of this paper). The PWM controller appropriately charges or discharges the output node to keep the output voltage of the converter at a desired target level. The high-frequency switching noise is rejected by the L-C filter, whereas a small but important portion of the noise appears as output voltage ripple. Major power losses arise from the on-resistance of power switches and the parasitic resistance of passive elements in the design.

In Fig. 3, the pMOS switch is shown as $s w 1$. Its ON-resistance and ON-state gate charge are denoted by $R_{s w 1}$ and $Q_{s w 1}$, respectively. Similarly, the nMOS switch, shown as $s w 2$ in the figure, has an ON-resistance $R_{s w 2}$ and gate charge $Q_{s w 2}$. Parasitic series resistances of the inductor, $L$, and the capacitor, $C$, are denoted by $R_{L}$ and $R_{C}$, respectively. Depending on the physical source of power consumption, the equation for the dc-dc converter power losses may be derived from the following three models: conduction loss, switching loss, and controller power consumption, denoted by $P_{\text {conduction }}, P_{\text {switching }}$, and $P_{\text {controller }}$, respectively [1], [8]. The power loss in the dc-dc converter, $P_{\text {inductive }}$, is the sum of the three terms

$$
\begin{align*}
P_{\text {inductive }}= & P_{\text {conduction }}+P_{\text {switching }}+P_{\text {controller }}  \tag{1}\\
= & I_{\text {out }}^{2}\left(R_{L}+D R_{s w 1}+(1-D) R_{s w 2}\right) \\
& +(\Delta I)^{2}\left(R_{L}+D R_{s w 1}+(1-D) R_{s w 2}+R_{C}\right) / 12 \\
& +V_{i n} f_{s w}\left(Q_{s w 1}+Q_{s w 2}\right)+V_{\text {in }} I_{\text {controller }} \tag{2}
\end{align*}
$$

where the first and second terms of (2) account for dc and ac conduction losses, respectively, the third and fourth terms of (2) are the switching loss and controller power consumption, respectively, $I_{\text {out }}$ is the output current, $V_{\text {in }}$ and $V_{\text {out }}$ are the input and output voltages, $D$ and $(1-D)$ are the PWM duty ratios of the pMOS and nMOS switches, respectively, $f_{s w}$ is the switching frequency, $I_{\text {controller }}$ is the current used in the control logic section of the converter, and $\Delta I=(1-D) V_{\text {out }} /\left(L_{f} f_{s w}\right)$ is the amplitude of the maximum current ripple at the inductor.

Finally, the conversion efficiency of a dc-dc buck converter, $\eta_{\text {inductive }}$, can be written as

$$
\begin{equation*}
\eta_{\text {inductive }}=\frac{P_{\text {out }}}{P_{\text {in }}}=\frac{V_{\text {out }} I_{\mathrm{out}}}{V_{\text {out }} I_{\mathrm{out}}+P_{\text {inductive }}} 100(\%) . \tag{3}
\end{equation*}
$$
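As a minimal sketch of how (2) and (3) can be evaluated, the following Python functions compute the loss terms and the resulting efficiency at a few load currents. All component values in `PARAMS` are hypothetical placeholders rather than values from this paper, and the duty ratio is assumed to be the ideal buck ratio $D=V_{\text {out }} / V_{\text {in }}$, which the text does not state explicitly.

```python
def inductive_loss(i_out, v_in, v_out, f_sw, l_f, r_l, r_c,
                   r_sw1, r_sw2, q_sw1, q_sw2, i_controller):
    """Buck converter power loss in watts, term by term as in (2)."""
    d = v_out / v_in                                # assumed ideal buck duty ratio
    delta_i = (1 - d) * v_out / (l_f * f_sw)        # inductor current ripple
    r_path = r_l + d * r_sw1 + (1 - d) * r_sw2      # duty-weighted series resistance
    p_dc = i_out ** 2 * r_path                      # dc conduction loss
    p_ac = delta_i ** 2 * (r_path + r_c) / 12       # ac conduction loss
    p_switching = v_in * f_sw * (q_sw1 + q_sw2)     # gate-charge switching loss
    p_controller = v_in * i_controller              # controller consumption
    return p_dc + p_ac + p_switching + p_controller


def inductive_efficiency(i_out, v_out, p_loss):
    """Conversion efficiency in percent, as in (3)."""
    p_out = v_out * i_out
    return 100.0 * p_out / (p_out + p_loss)


# Hypothetical component values, chosen only for illustration.
PARAMS = dict(v_in=3.7, v_out=1.2, f_sw=1e6, l_f=4.7e-6, r_l=0.05, r_c=0.01,
              r_sw1=0.1, r_sw2=0.1, q_sw1=1e-9, q_sw2=1e-9, i_controller=1e-3)


def efficiency_at(i_out):
    """Efficiency of the illustrative converter at a given load current."""
    return inductive_efficiency(i_out, PARAMS["v_out"],
                                inductive_loss(i_out, **PARAMS))


for i_out in (0.01, 0.3, 2.0):
    print(f"I_out = {i_out:4.2f} A -> efficiency = {efficiency_at(i_out):5.1f} %")
```

With these placeholder values, the load-independent terms (switching and controller) dominate at light load and the $I_{\text {out }}^{2}$ conduction term dominates at heavy load, so the efficiency peaks at an intermediate current.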



Fig. 4. Circuit diagram of a low-dropout linear regulator (LDO).

From (2), the power losses due to the pMOS switch, $P_{\text {pMOS }}$, and nMOS switch, $P_{\text {nMOS }}$, may be expressed as

$$
\begin{align*}
& P_{\mathrm{pMOS}}=C_{\mathrm{ox}} W_{p} L_{\min } \frac{m}{m-1} V_{i n}^{2} f_{s w}+\frac{D I_{\mathrm{out}}^{2}}{\mu_{p} C_{\mathrm{ox}} \frac{W_{p}}{L_{\min }}\left(V_{\mathrm{in}}-\left|V_{\mathrm{pth}}\right|\right)}  \tag{4}\\
& P_{\mathrm{nMOS}}=C_{\mathrm{ox}} W_{n} L_{\min } \frac{m}{m-1} V_{i n}^{2} f_{s w}+\frac{(1-D) I_{\mathrm{out}}^{2}}{\mu_{n} C_{o x} \frac{W_{n}}{L_{\min }}\left(V_{\mathrm{in}}-V_{n t h}\right)} . \tag{5}
\end{align*}
$$

In (4) and (5), $C_{\mathrm{ox}}$ is the gate capacitance per unit area. $W_{p}$ is the gate width of the pMOS power FET, and $W_{n}$ is the gate width of the nMOS power FET. $L_{\text {min }}$ is the minimum gate length of the given technology. $\mu_{p}$ is the hole mobility in the pMOS device, and $\mu_{n}$ is the electron mobility in the nMOS device. $V_{p t h}$ and $V_{n t h}$ are the threshold voltages of the pMOS and nMOS devices, respectively. $m$ is the tapering factor for the (super buffer-like) gate driver of the power FETs. The output ripple of the converter, $\Delta V$, is strictly limited by the normal operating conditions of the processor. Typically, $\Delta V$ must be less than $10 \%$ of the nominal output level. The PWM frequency, $f_{s w}$, and values of the passive components $L$ and $C$ significantly affect the magnitude of $\Delta V$. Using the same notation as in the previous subsection, $\Delta V$ may be expressed as [15]

$$
\begin{equation*}
\Delta V=\frac{\left(V_{\mathrm{out}}+V_{s w 2}+V_{L}\right)\left(1-\frac{V_{\mathrm{out}}+V_{s w 2}+V_{L}}{V_{i n}-V_{s w 1}+V_{s w 2}}\right)}{8 L C f_{s w}^{2}} \tag{6}
\end{equation*}
$$

where $V_{s w 1}, V_{s w 2}$, and $V_{L}$ are the voltage-drops by $s w 1, s w 2$, and $L$, respectively.

According to (4), (5), and (6), the higher $f_{s w}$ is, the smaller $\Delta V$ is, but the power dissipation $P_{\text {switching }}$ goes up. On the other hand, a smaller value of $f_{s w}$ gives rise to a need for bigger $L$ or $C$ in order to meet the specified $\Delta V$ requirement.
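The frequency tradeoff described above can be checked numerically with a short sketch of (6). The voltage drops and component values below are hypothetical; only the functional form is taken from the text.

```python
def output_ripple(v_in, v_out, v_sw1, v_sw2, v_l, l, c, f_sw):
    """Output voltage ripple in volts, following (6)."""
    v_eff = v_out + v_sw2 + v_l                     # output plus series drops
    ratio = v_eff / (v_in - v_sw1 + v_sw2)          # effective conversion ratio
    return v_eff * (1 - ratio) / (8 * l * c * f_sw ** 2)


# Hypothetical operating point, for illustration only.
BASE = dict(v_in=3.7, v_out=1.2, v_sw1=0.05, v_sw2=0.05, v_l=0.05,
            l=4.7e-6, c=10e-6, f_sw=1e6)

ripple_base = output_ripple(**BASE)
# Doubling f_sw divides the ripple by four.
ripple_fast = output_ripple(**{**BASE, "f_sw": 2 * BASE["f_sw"]})
# Halving f_sw needs a four times larger L*C product to keep the same ripple.
ripple_slow = output_ripple(**{**BASE, "f_sw": BASE["f_sw"] / 2,
                               "c": 4 * BASE["c"]})

print(ripple_base, ripple_fast, ripple_slow)
```

Because $\Delta V$ scales with $1 /\left(L C f_{s w}^{2}\right)$, any reduction of $f_{s w}$ made to save switching power has to be paid for with a quadratically larger $L C$ product.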

## B. LDO Power Loss Model

A typical LDO consists of an error amplifier, a pass transistor, and a feedback resistor network. The power loss of the LDO, denoted by $P_{L D O}$, is given by

$$
\begin{equation*}
P_{L D O}=I_{\text {out }}\left(V_{\text {in }}-V_{\text {ref }} \sigma\right)+I_{q} V_{\text {in }} \tag{7}
\end{equation*}
$$

where $V_{r e f}$ is the reference voltage in the error amplifier, $\sigma=\left(R_{1}+R_{2}\right) / R_{2}$ corresponds to the voltage divider's gain coefficient, and $I_{q}$ denotes the quiescent current of the LDO. Unlike the switching converter in which the MOSFET switches dominate the total power loss, the pass transistor in the LDO has a negligible impact on its total power loss [8]. Therefore, the power loss due to internal resistance of the pass transistor does not need to be explicitly accounted for in the model. Thus the conversion efficiency of the LDO, $\eta_{L D O}$, may be expressed as

Fig. 5. Load current distributions of one core in MSM 8660 and a result of the derived $f\left(I_{\text {out }}\right)$.

$$
\begin{equation*}
\eta_{L D O}=\frac{V_{\mathrm{out}} I_{\mathrm{out}}}{V_{\text {in }} I_{\text {in }}}=\frac{\sigma V_{\mathrm{ref}} I_{\mathrm{out}}}{V_{\text {in }}\left(I_{\mathrm{out}}+I_{q}\right)} \tag{8}
\end{equation*}
$$
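A short sketch of (7) and (8) follows. The numerical values are hypothetical; the check at the end only confirms that the loss in (7) equals input power minus output power, which is the relation that links the two equations.

```python
def ldo_loss(i_out, v_in, v_ref, sigma, i_q):
    """LDO power loss in watts, following (7)."""
    return i_out * (v_in - v_ref * sigma) + i_q * v_in


def ldo_efficiency(i_out, v_in, v_ref, sigma, i_q):
    """LDO conversion efficiency as a fraction, following (8)."""
    return sigma * v_ref * i_out / (v_in * (i_out + i_q))


# Hypothetical values: R1 = R2 gives sigma = 2, so V_out = sigma * V_ref = 1.2 V.
V_IN, V_REF, SIGMA, I_Q = 3.7, 0.6, 2.0, 50e-6

for i_out in (1e-4, 1e-2, 0.2):
    p_out = SIGMA * V_REF * i_out
    p_in = V_IN * (i_out + I_Q)
    # The loss in (7) is the difference between input and output power.
    assert abs((p_in - p_out) - ldo_loss(i_out, V_IN, V_REF, SIGMA, I_Q)) < 1e-12
    print(i_out, ldo_efficiency(i_out, V_IN, V_REF, SIGMA, I_Q))
```

The efficiency approaches, but never exceeds, $V_{\text {out }} / V_{\text {in }}$ as the load current grows relative to $I_{q}$, which is why LDOs are inefficient when the dropout is large.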

## III. DC-DC CONVERTER OPTIMIZATION

Optimizing dc-dc converters has the goal of reducing the power losses without incurring any performance degradation. This is because, unlike typical low-power design techniques that often exploit a tradeoff between performance, service quality, and power efficiency, the converter optimization technique does not shut off or slow down the overall system.

Fig. 6. Statistical data for the smartphone usage patterns, sourced from [19]. (a) Pattern I. (b) Pattern II.

Enhancement of the overall efficiency of a dc-dc converter can greatly increase the overall system power efficiency [16], [17]. DC-DC converters show very high overall efficiency under desirable operating conditions. However, their efficiency can be low if they are operating outside the recommended range of input and output voltages and load currents [1], [8]. Therefore, ensuring that each dc-dc converter in the system is operating under the desirable operating conditions is an effective way of improving the system power efficiency. For example, [9] presents a dynamic programming-based approach to design the structure of the PDN in a system, while at the same time selecting the most suitable dc-dc converter or LDO for each node of the PDN. Reference [18] proposes the concept of parallel connections of high frequency dc-dc converters for distributed energy storage systems. In contrast, this paper starts with a fixed conversion tree structure, but performs MOSFET switch reconfiguration based on the load current demands and converter characteristics, so as to improve the overall power conversion efficiency in a smartphone platform.

## A. Static Switch Sizing (S3)

Gate widths of the switches have a substantial impact on the efficiency of the dc-dc converter. From (4) and (5), $P_{\text {pMOS }}$ and $P_{\mathrm{nMOS}}$ are convex functions of the gate width. A smaller gate width reduces the switching loss but increases the conduction loss, and vice versa for a larger gate width. For a given $I_{\text {out }}$, the function to find the optimum pMOS gate width is thus obtained by solving $d P_{\mathrm{pMOS}} / d W_{p}=0$ [12], [19]

$$
\begin{equation*}
W_{p, o p t}\left(I_{\mathrm{out}}\right)=\frac{I_{\mathrm{out}}}{C_{\mathrm{ox}} V_{\mathrm{in}}} \sqrt{\frac{D(m-1)}{\mu_{p}\left(V_{i n}-\left|V_{p t h}\right|\right) f_{s w} m}} . \tag{9}
\end{equation*}
$$

The function to find the optimum nMOS gate width is derived in a similar manner, and its expression is as follows:

$$
\begin{equation*}
W_{n, o p t}\left(I_{\mathrm{out}}\right)=\frac{I_{\mathrm{out}}}{C_{o x} V_{i n}} \sqrt{\frac{(1-D)(m-1)}{\mu_{n}\left(V_{i n}-V_{n t h}\right) f_{s w} m}} . \tag{10}
\end{equation*}
$$
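The closed forms (9) and (10) can be sanity-checked against the loss expressions (4) and (5) with a small numerical sketch. The technology numbers in `Tech` are hypothetical round values, not the parameters used in the experiments of this paper; `switch_loss` covers both FET types by passing $D$ or $(1-D)$ as the conduction duty.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Tech:
    """Hypothetical device and converter parameters (SI units)."""
    c_ox: float = 0.02      # gate capacitance per unit area, F/m^2
    l_min: float = 45e-9    # minimum gate length, m
    mu: float = 0.01        # carrier mobility, m^2/(V s)
    v_th: float = 0.4       # magnitude of the threshold voltage, V
    v_in: float = 1.8       # input voltage, V
    f_sw: float = 1e8       # switching frequency, Hz
    m: float = 4.0          # gate driver tapering factor


def switch_loss(w, i_out, duty, t):
    """Gate-drive plus conduction loss of one power FET, as in (4) and (5)."""
    gate = t.c_ox * w * t.l_min * t.m / (t.m - 1) * t.v_in ** 2 * t.f_sw
    conduction = duty * i_out ** 2 / (
        t.mu * t.c_ox * (w / t.l_min) * (t.v_in - t.v_th))
    return gate + conduction


def optimum_width(i_out, duty, t):
    """Loss-minimizing gate width, as in (9) and (10)."""
    return i_out / (t.c_ox * t.v_in) * sqrt(
        duty * (t.m - 1) / (t.mu * (t.v_in - t.v_th) * t.f_sw * t.m))


PMOS = Tech(mu=0.01)
NMOS = Tech(mu=0.03)
D = 0.5

w_p = optimum_width(0.5, D, PMOS)        # pMOS conducts for a fraction D
w_n = optimum_width(0.5, 1 - D, NMOS)    # nMOS conducts for a fraction 1 - D
print(w_p, w_n)
```

At the optimum width the gate-drive term and the conduction term are equal, and the optimum scales linearly with $I_{\text {out }}$, which is what motivates matching the switch size to the load statistics.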

It is important that the optimum gate widths obtained from (9) and (10) satisfy a design constraint whereby the resulting output ripple of the converter, $\Delta V$, is less than its allowed limit. As described in (6), changing the switch sizes can affect $\Delta V$. If the derived optimum switch sizes violate the $\Delta V$ constraint, we will increase $L$ or $C$ for the converter. Finally, the power loss of the converter in (2) is recalculated to ensure that the overall transistor sizing plus potential change to $L$ and $C$ reduce the net power loss. For reference, our experimental work in this paper shows that the worst case of $\Delta V$ increment from the default switch sizes to the optimum switch sizes is $14 \%$. In other words, if $\Delta V$ for the default switch sizes is $5 \%$, then the resulting $\Delta V$ should be less than $5.7 \%$ (i.e., $5+5 \cdot 0.14$ ). We thus assume that the $\Delta V$ changes are small enough to satisfy the voltage ripple constraints. Detailed results of the $\Delta V$ increment are presented in Section V-D.

In (9) and (10), the optimum gate widths are derived for a fixed output current, $I_{\text {out }}$. However, $I_{\text {out }}$ in the smartphone is different depending on its usage pattern. Therefore, the goal here is to find the optimum gate widths such that the high-conversion-efficiency operating conditions for the converter match the current distribution that is produced by the actual usage profile of common smartphone applications. The optimization objective is thus to maximize the overall conversion efficiency of the smartphone based on its typical (expected) daily usage. Treating the total current used in the smartphone as a continuous random variable, we denote its probability density function by $f\left(I_{\text {out }}\right)$. Because there are many mobile use cases generating various $I_{\text {out }}$ distributions, finding a general case of $f\left(I_{\text {out }}\right)$ is challenging. We propose a method utilizing the statistical data of mobile device usage patterns and measured data from running mobile applications as benchmarks, as detailed next. First, we obtain a fine-grained classification of diverse mobile use cases. Next, we find mobile applications representing each distinct class of use cases. We perform extensive measurement of output currents of the dc-dc converters in the smartphone platform when different applications are running. In addition, to derive the correct probability distribution of $f\left(I_{\text {out }}\right)$, we acquire the average runtime of each class of use cases (applications) from the previous studies published in [20]-[22].

Fig. 5 shows example results of the derived $f\left(I_{\text {out }}\right)$ distribution for a processor core in the Qualcomm's MDP. To derive $f\left(I_{\text {out }}\right)$, we ran ten representative mobile applications (they are call, Facebook, Skype-videochat, clock, camera, Google-map, Neocore, SMS, system setting, and Youtube) on the MDP. Next, we classified the ten applications into seven classes presented in [21]: 1) communication (contains SMS, call, and Skype-videochat); 2) browsing (contains Web browsing); 3) media (contains camera and Youtube); 4) productivity (contains clock); 5) system (contains system setting); 6) games (contains Neocore); and 7) maps (includes Google Maps). We determined the average usage time of each class of applications based on the statistical data for the mobile device usage patterns [21]. As shown in Fig. 6, the reference introduced two representative smartphone use patterns (patterns I and II), each of which has its own proportions of the usage time for the aforesaid application classes.

From the derived $f\left(I_{\text {out }}\right)$, (9) is modified to find the expected value of the optimum pMOS width from the S3 ($W_{p, S 3}$)

$$
\begin{equation*}
W_{p, S 3}=\frac{\sqrt{\int I_{\mathrm{out}}^{2} f\left(I_{\mathrm{out}}\right) d I_{\mathrm{out}}}}{C_{o x} V_{\text {in }}} \sqrt{\frac{D(m-1)}{\mu_{p}\left(V_{i n}-\left|V_{p t h}\right|\right) f_{s w} m}} . \tag{11}
\end{equation*}
$$



Fig. 7. Circuit diagram for dynamic switch modulation.


Fig. 8. Concept of DSM operation with two parallel-connected pMOS switches.

Similarly, the expected value of the optimum nMOS width from the S3 ($W_{n, S 3}$) can be calculated as follows:

$$
\begin{equation*}
W_{n, S 3}=\frac{\sqrt{\int I_{\mathrm{out}}^{2} f\left(I_{\mathrm{out}}\right) d I_{\mathrm{out}}}}{C_{o x} V_{\text {in }}} \sqrt{\frac{(1-D)(m-1)}{\mu_{n}\left(V_{i n}-V_{n t h}\right) f_{s w} m}} . \tag{12}
\end{equation*}
$$
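The S3 width can be illustrated with a minimal sketch in which $f\left(I_{\text {out }}\right)$ is replaced by a hypothetical three-point load histogram, so that the integral in (11) becomes a weighted sum. All constants are placeholder values and do not come from the measurements in this paper. The sketch compares the S3 width with the width that (9) would give for the mean current alone.

```python
from math import sqrt

# Hypothetical technology and converter constants (SI units).
C_OX, L_MIN, V_IN, F_SW, M = 0.02, 45e-9, 1.8, 1e8, 4.0
MU_P, V_PTH_ABS, D = 0.01, 0.4, 0.5

# Width-independent factors of the two loss terms in (4).
GATE_COEFF = C_OX * L_MIN * F_SW * V_IN ** 2 * M / (M - 1)
COND_COEFF = MU_P * C_OX * (V_IN - V_PTH_ABS) / L_MIN

# Factor that multiplies the current in (9) and the rms current in (11).
WIDTH_SCALE = sqrt(D * (M - 1) / (MU_P * (V_IN - V_PTH_ABS) * F_SW * M)) / (C_OX * V_IN)


def s3_width(currents, probs):
    """pMOS width from (11) with the integral replaced by a discrete sum."""
    rms = sqrt(sum(p * i ** 2 for i, p in zip(currents, probs)))
    return rms * WIDTH_SCALE


def expected_loss(w, currents, probs):
    """Expected pMOS loss from (4), averaged over the load histogram."""
    return sum(p * (GATE_COEFF * w + D * i ** 2 / (COND_COEFF * w))
               for i, p in zip(currents, probs))


# Hypothetical load histogram: mostly light load with occasional bursts.
CURRENTS = [0.05, 0.2, 0.8]
PROBS = [0.6, 0.3, 0.1]

w_s3 = s3_width(CURRENTS, PROBS)
w_mean = sum(p * i for i, p in zip(CURRENTS, PROBS)) * WIDTH_SCALE

print("S3 width:", w_s3, "expected loss:", expected_loss(w_s3, CURRENTS, PROBS))
print("mean-current width:", w_mean, "expected loss:", expected_loss(w_mean, CURRENTS, PROBS))
```

Because the conduction loss grows with $I_{\text {out }}^{2}$, the rare high-current bursts pull the S3 width above the width suggested by the mean current, and sizing for the mean alone gives a higher expected loss.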

## B. Dynamic Switch Modulation (DSM)

The S3 is only applicable when the load condition is given a priori. Any fixed sizing solution tends to result in suboptimal dc-dc conversion efficiency under dynamically changing load conditions, which may be very different from the one for which the static sizing solution was originally obtained. Furthermore, the higher the variance of the load current distribution is, the lower is the guarantee of optimality of the S3 solution. The optimum efficiency under dynamically changing load conditions can be obtained by adaptively turning on or off some of the multiple parallel-connected switches [11], [12]. However, the different gate voltages needed for each switch set in [11] require additional dc-dc converters, which tends to cause area/control overheads. Furthermore, the number of switches (which was fixed to three in [11] and [12]) and their sizes should be determined judiciously in order to achieve the maximum efficiency under given design specifications (i.e., for possible ranges of the load currents of various smartphone modules). Our proposed approach is an extension of the multiple switch scheme, which we call DSM. This task is to find the optimum number of parallel-connected output driver switches, their sizes, and on/off conditions under dynamically varying load conditions.

Fig. 7 shows a simple schematic drawing of the load-adaptive dc-dc converter. There are $N$ pairs of switches connected in parallel. These switches are arranged such that the first switch has the minimum width (denoted by $W_{p 1}$ and $W_{n 1}$ ), and the last switch has the maximum width (denoted by $W_{p N}$ and $W_{n N}$ ). The maximum effective width (i.e., the sum of widths of all parallel-connected FETs of the same type) is large enough to support the maximum output current, $I_{\text {out, max }}$. For a smaller $I_{\text {out }}$ value, some of the nMOS and pMOS switches are turned off. Depending on the $I_{\text {out }}$ value, a different on/off combination of the switches can be used to achieve the maximum dc-dc conversion efficiency (which is equivalent to minimizing $P_{\mathrm{pMOS}}$ and $P_{\mathrm{nMOS}}$ ).

Fig. 9. Simulated power conversion efficiencies by changing the widths of the pMOS switch in Fig. 3.

We denote the effective width of the turned-on switch combination as $W_{e f f, t y p e, i}$, where type implies the switch type, i.e., $p$ (pMOS) or $n$ (nMOS), and $i$ denotes the $i$ th smallest effective width for the switch configuration (among all possible combinations of the same type of switch).

Fig. 8 is an example of the DSM on a dc-dc converter using two parallel-connected pMOS switches, which can independently be turned on or off at any time. The two pMOS switches give rise to three effective widths for the pMOS switch, $W_{e f f, p, 1}$, $W_{e f f, p, 2}$ and $W_{e f f, p, 3}$. Consequently, the output current range is divided into three operation ranges. The result of DSM in the figure, identified as a thick (red) line, shows that the maximum efficiency in each output current range is achieved by adaptively turning on the appropriate combination of two pMOS switches. It then follows that, for each output current range, the optimum switch combination must be found.
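The two-switch example can be mimicked with a minimal sketch that enumerates the on/off combinations and picks, for each load current, the effective width with the lowest pMOS loss from (4). The switch widths and technology constants are hypothetical, and the exhaustive selection shown here is only an illustration of the concept, not the control policy developed in this paper.

```python
from itertools import combinations

# Hypothetical technology and converter constants (SI units).
C_OX, L_MIN, V_IN, F_SW, M = 0.02, 45e-9, 1.8, 1e8, 4.0
MU_P, V_PTH_ABS, D = 0.01, 0.4, 0.5

# Width-independent factors of the two loss terms in (4).
GATE_COEFF = C_OX * L_MIN * F_SW * V_IN ** 2 * M / (M - 1)
COND_COEFF = MU_P * C_OX * (V_IN - V_PTH_ABS) / L_MIN


def effective_widths(switch_widths):
    """Sorted effective widths of all non-empty on/off combinations."""
    sums = {sum(combo)
            for r in range(1, len(switch_widths) + 1)
            for combo in combinations(switch_widths, r)}
    return sorted(sums)


def pmos_loss(w_eff, i_out):
    """pMOS loss from (4) for a given effective width and load current."""
    return GATE_COEFF * w_eff + D * i_out ** 2 / (COND_COEFF * w_eff)


def best_width(i_out, widths):
    """Effective width that minimizes the pMOS loss at this load current."""
    return min(widths, key=lambda w: pmos_loss(w, i_out))


# Two hypothetical parallel pMOS switches give three effective widths.
WIDTHS = effective_widths([2e-3, 6e-3])

for i_out in (0.1, 0.3, 1.0):
    print(f"I_out = {i_out} A -> W_eff = {best_width(i_out, WIDTHS):.4f} m")
```

With these placeholder values the smallest switch alone is best at light load, the larger switch alone at medium load, and both switches together at heavy load, which reproduces the three operating ranges of the example.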

Note that the output current range can be divided into a larger number of bins by increasing the number of parallel-connected switches of the same type. A larger bin count greatly increases the flexibility to achieve high efficiency over a wider range of output current values. However, the increased area and power consumption due to higher complexity of the control circuitry is an important consideration in determining the optimal number of switches $(N)$. To determine the optimum $N$ and the size of each switch, we first investigate and determine the maximum and minimum effective widths of each type of switch. For the maximum effective width, we use the constraint that it should be large enough to drive $I_{\text {out, max }}$. Therefore, the maximum effective width of pMOS switch should satisfy the following constraint:

$$
\begin{equation*}
W_{e f f, p, M} \geq \frac{I_{o u t, \max } L_{\min }}{\mu_{p} C_{o x}\left(V_{\text {in }}-\left|V_{p t h}\right|\right)\left(V_{\text {in }}-V_{o u t, \max }-R_{L} I_{o u t, \max }\right)} \tag{13}
\end{equation*}
$$

where $M$ is the number of all possible switch combinations (it is $2^{N}-1$ ), and $V_{\text {out,max }}$ is the maximum available output voltage of the dc-dc converter. $I_{\text {out, max }}$ can be obtained from measurements or looked up from a data sheet. We determine the maximum effective width of nMOS switches in a similar manner.
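The constraint in (13) can be evaluated directly, as in the sketch below. The numerical values are hypothetical, and the explicit headroom check is an added safeguard that is not part of (13) itself.

```python
def min_max_effective_width(i_out_max, v_out_max, r_l, v_in,
                            v_pth_abs, mu_p, c_ox, l_min):
    """Lower bound on the largest pMOS effective width, following (13)."""
    headroom = v_in - v_out_max - r_l * i_out_max   # voltage left across the switch
    if headroom <= 0:
        raise ValueError("no voltage headroom left across the pMOS switch")
    return i_out_max * l_min / (mu_p * c_ox * (v_in - v_pth_abs) * headroom)


def drives_max_current(switch_widths, bound):
    """True if all switches turned on together satisfy the bound."""
    return sum(switch_widths) >= bound


# Hypothetical values, for illustration only.
BOUND = min_max_effective_width(i_out_max=2.0, v_out_max=1.2, r_l=0.05,
                                v_in=1.8, v_pth_abs=0.4, mu_p=0.01,
                                c_ox=0.02, l_min=45e-9)

print("required width:", BOUND)
print("2 mm + 6 mm switches sufficient:", drives_max_current([2e-3, 6e-3], BOUND))
```

The bound grows faster than linearly with $I_{\text {out, max }}$ because a larger current both raises the numerator and reduces the voltage headroom in the denominator.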

To determine the minimum size for the effective widths, we use our observation from the experimental work. Fig. 9 shows the result of simulating the dc-dc converter model in Fig. 3, for various widths of the pMOS switch. The model parameters are determined from the $45-\mathrm{nm}$ BSIM4 predictive technology model for bulk CMOS [23], $f_{s w}=330 \mathrm{MHz}, L=6.8 \mathrm{nH}$, and $C=4 \mathrm{nF}$. According to the results, using switches smaller than a certain width does not achieve a high efficiency improvement even in the low current region. Therefore, the minimum effective widths should not be made too small.

Next, we consider the boundary conditions in the output current regions. We define the $i$ th smallest boundary condition in the output current range as $I_{b d, t y p e, i}$, where type is the switch type, while $i$ is the $i$ th smallest current value. Thus, $I_{b d, t y p e, i}$ is the boundary condition between two consecutive switch combination regions, each of which has the corresponding optimum effective widths, $W_{\text {eff,type }, i}$ and $W_{\text {eff }, \text { type }, i+1}$. The example with the two pMOS switches in Fig. 8 shows that there are two boundary conditions, $I_{b d, p, 1}$ and $I_{b d, p, 2}$. From (4), the boundary condition for pMOS switches may be calculated as

$$
\begin{equation*}
I_{b d, p, i}=C_{o x} V_{i n} \sqrt{\mu_{p} f_{s w} W_{e f f, p, i} W_{e f f, p, i+1} \frac{m}{D(m-1)}} \tag{14}
\end{equation*}
$$

The boundary condition for nMOS switches can be derived in the same way, and expressed as

$$
\begin{equation*}
I_{b d, n, i}=C_{o x} V_{i n} \sqrt{\mu_{n} f_{s w} W_{e f f, n, i} W_{e f f, n, i+1} \frac{m}{(1-D)(m-1)}} \tag{15}
\end{equation*}
$$

Finally, we derive the objective functions for pMOS and nMOS sizing that minimize the expected power loss of pMOS $\left(P_{p M O S}\right)$ and $\mathrm{nMOS}\left(P_{n M O S}\right)$ under the whole range of the possible output current values

$$
\begin{align*}
& \min \left(\sum_{i=1}^{M-1} \int_{I_{b d, p, i}}^{I_{b d, p, i+1}}\left(\alpha W_{e f f, p, i}+\frac{D I_{\mathrm{out}}^{2}}{\beta W_{e f f, p, i}}\right) f\left(I_{\mathrm{out}}\right) d I_{\mathrm{out}}\right)  \tag{16}\\
& \min \left(\sum_{i=1}^{M-1} \int_{I_{b d, n, i}}^{I_{b d, n, i+1}}\left(\alpha W_{e f f, n, i}+\frac{(1-D) I_{\mathrm{out}}^{2}}{\gamma W_{e f f, n, i}}\right) f\left(I_{\mathrm{out}}\right) d I_{\mathrm{out}}\right) \tag{17}
\end{align*}
$$

where $\alpha=C_{o x} L_{\text {min }} f_{s w} V_{i n}^{2} m /(m-1), \beta=\mu_{p} C_{o x}\left(V_{i n}-\left|V_{p t h}\right|\right) /$ $L_{\min }$ and $\gamma=\mu_{n} C_{o x}\left(V_{i n}-V_{n t h}\right) / L_{\text {min }} . I_{b d, p, M}$ and $I_{b d, n, M}$ are equal to $I_{o u t, m a x}$, whereas $I_{b d, p, 1}$ and $I_{b d, n, 1}$ equal the minimum output current. $f\left(I_{\text {out }}\right)$ is the load current distribution.
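To make the pMOS objective in (16) concrete, the sketch below evaluates it for a hypothetical discrete load histogram, so that the integrals reduce to weighted sums and each current value is served by the effective width with the lowest loss. The exhaustive search over a small candidate grid for $N=2$ is used purely for illustration and is not the heuristic developed in this paper; all constants, the histogram, and the candidate widths are placeholder assumptions.

```python
from itertools import combinations
from math import sqrt

# Hypothetical technology and converter constants (SI units).
C_OX, L_MIN, V_IN, F_SW, M = 0.02, 45e-9, 1.8, 1e8, 4.0
MU_P, V_PTH_ABS, D = 0.01, 0.4, 0.5

ALPHA = C_OX * L_MIN * F_SW * V_IN ** 2 * M / (M - 1)   # alpha in (16)
BETA = MU_P * C_OX * (V_IN - V_PTH_ABS) / L_MIN         # beta in (16)

# Hypothetical load histogram standing in for f(I_out).
CURRENTS = [0.05, 0.2, 0.8]
PROBS = [0.6, 0.3, 0.1]


def loss(w_eff, i_out):
    """Integrand of (16) for one effective width and one load current."""
    return ALPHA * w_eff + D * i_out ** 2 / (BETA * w_eff)


def effective_widths(switch_widths):
    """Effective widths of all non-empty on/off combinations."""
    return sorted({sum(c) for r in range(1, len(switch_widths) + 1)
                   for c in combinations(switch_widths, r)})


def dsm_expected_loss(switch_widths):
    """Discrete form of (16): each current uses its best effective width."""
    widths = effective_widths(switch_widths)
    return sum(p * min(loss(w, i) for w in widths)
               for i, p in zip(CURRENTS, PROBS))


def single_width_expected_loss(w):
    """Expected loss when one fixed width serves every load current."""
    return sum(p * loss(w, i) for i, p in zip(CURRENTS, PROBS))


# Best single fixed width for this histogram (discrete form of (11)).
W_S3 = sqrt(D * sum(p * i ** 2 for i, p in zip(CURRENTS, PROBS)) / (ALPHA * BETA))

# Candidate switch widths in meters; W_S3 is included for comparison.
CANDIDATES = [1e-3, 2e-3, 4e-3, 8e-3, W_S3]
BEST_PAIR = min(combinations(CANDIDATES, 2), key=dsm_expected_loss)

print("best pair:", BEST_PAIR, "expected loss:", dsm_expected_loss(BEST_PAIR))
print("single width:", W_S3, "expected loss:", single_width_expected_loss(W_S3))
```

Even this coarse search shows why the number of combinations matters: the search space grows with the number of candidate widths raised to the power $N$, and every candidate set requires evaluating all $2^{N}-1$ effective widths.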

Solving (16) and (17) is not straightforward. This is primarily because, as we also stated before, the number of possible combinations $(M)$ increases exponentially as the number of switches $(N)$ grows. In addition, we also have to abide by other design considerations, e.g., limitations on the control complexity and area overhead. Therefore, $N$ should be carefully selected, i.e., it must be small enough so as not to significantly increase the control and area overheads, but large enough to enable the DSM in response to varying load conditions. Even if $N$ is limited to a small number, assuming that $f\left(I_{\text {out }}\right)$ follows a uniform distribution may not guarantee the optimality of the solution. This is because actual load conditions can be discretely distributed (e.g., for modules whose ON/OFF operation is controlled by user activities, such as the camera, the SD card, and so on). We thus classify $f\left(I_{\text {out }}\right)$ into three cases: discrete, continuous, and discretizable, as described in Fig. 10. We present the heuristic solution of the switch selection and sizing problem for each case in the following subsections.

Fig. 10. Flowchart to classify $f\left(I_{\text {out }}\right)$ into three different cases.

1) Discrete $f\left(I_{\text {out }}\right)$ : We define $f\left(I_{\text {out }}\right)$ as discrete when it has (discretely) dominant load current values. For example, if a dc-dc converter powers a set of modules that includes modules that can be turned on and off under control, its $f\left(I_{\text {out }}\right)$ may have several discrete load current values. If the discrete values are dominant in the distribution, the problem is to select and size the switches so that the effective widths of the switches match the widths corresponding to the discrete current values, calculated by (9) and (10). According to the switch type, the calculated widths are added to a set $G_{p}$ (for pMOS) or $G_{n}$ (for nMOS). We then define cover as follows: a set $S$ covers a width $w$ if some effective width configured from elements of $S$ matches $w$ within $\pm \Delta$, where $\Delta$ should be small enough. If the given design specification has enough switches $(N)$ so that the effective widths can easily cover all the required widths in $G_{p}$ and $G_{n}$, then the problem can be solved straightforwardly. However, $N$ is likely to be quite small in a common design specification. With the given $N$, the problem is then to find a minimum set for each switch type ( $T_{p}$ and $T_{n}$, where $\left|T_{p}\right|,\left|T_{n}\right| \leq N$ ) that covers the maximum number of the widths in $G_{p}$ and $G_{n}$. Algorithm 1 solves this problem. The function coverage in Algorithm 1 is a simple dynamic programming routine that determines whether the current set of switches ( $S$ ) can cover the required width $(w)$. Performing OptP_widths in Algorithm 1 returns the set $T_{p}$ that covers the maximum number of elements in $G_{p}$. The optimum set for the nMOS switches can be obtained in a similar manner.

```
Algorithm 1 To find a minimum set of the optimum widths
of pMOS switches \(\left(T_{p}\right)\) under the given number of switches
\((N)\) and the discretized \(I_{\text {out }}\)
    Initialization
    define \(I_{\text {out }, i}, N, \Delta \quad \triangleright I_{\text {out }, i}\) is the \(i^{\text {th }}\) discrete current value
    \(W_{i}=W_{p, \text { opt }}\left(I_{\text {out }, i}\right)\) and \(G_{p}=\left\{W_{1}, W_{2}, \ldots, W_{K}\right\} \quad \triangleright\) from (9)
    \(\max \leftarrow 0 \quad \triangleright \max\) will be updated to the maximum number of
    elements in \(G_{p}\) covered by the set \(T_{p}\) from OptP_widths
    function coverage(\(w, S\))
        if \(|w| \leq \Delta\) then return 1
        for each \(s \in S\) do
            if coverage(\(w-s, S \cap\{s\}^{c}\)) \(=1\) then return 1
        return 0
    function OptP_widths(\(n, m, S\)) \(\quad \triangleright\) main function
        for \(n \leq i \leq N\) do \(\quad \triangleright i\) is the number of switches in the set
            for \(m \leq j \leq K\) do \(\quad \triangleright\) to add \(W_{j}\) into the set \(S\)
                \(S \leftarrow S \cup\left\{W_{j}\right\}, c \leftarrow 0\)
                for \(1 \leq k \leq K\) do \(\quad \triangleright\) to check \(W_{k}\)
                    if coverage(\(W_{k}, S\)) \(=1\) then \(c \leftarrow c+1\)
                    else if \(|S|<i\) and \(j \leq k\) then
                        OptP_widths(\(i, k, S\))
                        \(c \leftarrow c+1\)
                if \(\sum_{s \in S} s \geq W_{e f f, p, M}\) and \(c \geq \max\) then
                    \(\max \leftarrow c, T_{p} \leftarrow S \quad \triangleright\) to update \(\max\) and \(T_{p}\)
            if \(\max =K\) then break
        return \(T_{p}\)
```
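The cover test and the set search can be illustrated with a short Python sketch. It is not the authors' implementation: `coverage` mirrors the recursive subset-sum check of Algorithm 1, while `best_switch_set` is a simplified exhaustive search that stands in for OptP_widths. The names `best_switch_set` and `min_total` (playing the role of the $W_{e f f, p, M}$ constraint) are hypothetical, and candidate switch widths are assumed to be drawn from the required widths themselves.

```python
from itertools import combinations


def coverage(w, switches, delta):
    """Return True if some subset of `switches` sums to w within +/- delta.

    Mirrors the recursive `coverage` function of Algorithm 1: subtract one
    switch width at a time and recurse on the remaining switches.
    """
    if abs(w) <= delta:
        return True
    for idx, s in enumerate(switches):
        rest = switches[:idx] + switches[idx + 1:]
        if coverage(w - s, rest, delta):
            return True
    return False


def best_switch_set(required, n_max, delta, min_total=0.0):
    """Exhaustive stand-in for OptP_widths (an assumption, not the paper's search).

    Tries every candidate set of at most n_max widths taken from `required`,
    skips sets whose total width is below `min_total`, and returns the
    smallest set that covers the largest number of required widths.
    """
    best, best_count = (), 0
    for size in range(1, n_max + 1):
        for cand in combinations(required, size):
            if sum(cand) < min_total:
                continue
            count = sum(coverage(w, cand, delta) for w in required)
            if count > best_count:  # strict '>' keeps the smaller set on ties
                best, best_count = cand, count
        if best_count == len(required):
            break  # every required width is covered
    return best, best_count
```

With required widths `[1.0, 2.0, 3.0]`, two switches of widths 1.0 and 2.0 suffice, because turning both on yields the third effective width.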
2) Continuous $f\left(I_{\text {out }}\right)$ : Some dc-dc converters power modules that have more than two operation levels as set by user preferences. The brightness level of the display module and the volume level of the speaker module are representative examples. If the load current of each module's operation level is known, then $f\left(I_{\text {out }}\right)$ may belong to the discrete case. However, our experience with the Qualcomm MDP shows that the load current conditions of the various operation levels typically overlap, so we cannot find discrete breakpoints in $f\left(I_{\text {out }}\right)$. Furthermore, the user preference is random, so that all the load current conditions have the same probability of being chosen. We therefore define this case as continuous, and treat $f\left(I_{\text {out }}\right)$ as a uniform distribution. In this case, finding a set of the effective widths ( $G_{p}$ or $G_{n}$ ) can be formulated as a simple arithmetic progression problem: find $M$ effective widths between the given minimum and maximum effective widths. Next, Algorithm 1 is applied to the resultant set of effective widths to find the switches that cover the maximum number of the effective widths.
3) Discretizable $f\left(I_{\text {out }}\right)$ : Some dc-dc converters supply power to modules that cannot be controlled by the user. In this case, we propose to use the statistical load profiles of the dc-dc converter. As a result, the dc-dc converter not only can deal with dynamically varying load conditions, but also is more likely to be tuned for actual load conditions, compiled from typical smartphone use patterns. The way to obtain the statistical load profiles is described in Section III-A.

We propose an approach that adapts $K$-means clustering to extract discrete values from the load current values ( $I_{\text {out }}$ ). The measured data of $I_{\text {out }}$ are first expanded into $I_{\text {out }}^{\prime}$, in which the $i$ th value of $I_{\text {out }}$ is duplicated $\lambda f\left(I_{\text {out }}\right)$ times, where $\lambda$ is a factor that adjusts the weight of $f\left(I_{\text {out }}\right)$. Then, $I_{\text {out }}^{\prime}$ is divided evenly into $K$ parts, and the initial means of all parts are calculated. In the update procedure of the $K$-means clustering, the new means, set to be the centroids of the parts, are recalculated until the means converge. Finally, the optimal widths corresponding to the resultant means, obtained for each type of switch, form $G_{p}$ and $G_{n}$, respectively. They are then applied to Algorithm 1 to find the minimum switch set that covers the maximum number of elements in $G_{p}$ and $G_{n}$.

Fig. 11. Types I and II equivalent converter models.
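The clustering procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `kmeans_1d` is hypothetical, the duplicated list is sorted before it is split into $K$ parts (the text only says the split is even), and duplication counts are rounded to the nearest integer.

```python
def kmeans_1d(values, weights, k, lam=1000, max_iter=100):
    """Weighted 1-D K-means using the duplication scheme described in the text.

    values  : measured load current values I_out
    weights : f(I_out) for each value (relative frequency)
    lam     : weight factor (lambda); each value is duplicated round(lam * f) times
    """
    # Build I'_out by duplicating each value according to its weight.
    expanded = []
    for v, w in zip(values, weights):
        expanded.extend([v] * int(round(lam * w)))
    expanded.sort()  # assumption: split the sorted list into k even parts
    n = len(expanded)
    if n < k:
        raise ValueError("not enough duplicated samples for k clusters")

    # Initial means: one per evenly sized part.
    parts = [expanded[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(p) / len(p) for p in parts]

    for _ in range(max_iter):
        # Assignment step: attach every sample to its nearest mean.
        clusters = [[] for _ in range(k)]
        for x in expanded:
            nearest = min(range(k), key=lambda j: abs(x - means[j]))
            clusters[nearest].append(x)
        # Update step: each new mean is the centroid of its cluster.
        new_means = [sum(c) / len(c) if c else means[j]
                     for j, c in enumerate(clusters)]
        if new_means == means:  # converged
            break
        means = new_means
    return means
```

The returned means are load current values; converting them to widths still requires (9) and (10), which are outside this sketch.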

## IV. Power Delivery Network Characterization

Prior to verifying the efficacy of the proposed dc-dc converter optimization methods in an actual smartphone platform, the power conversion efficiency of the PDN in the target platform should be characterized. However, the characterization is not a trivial task unless the PDN structure, the converter specifications, and all the node voltages and branch currents of the PDN are available. Such a white-box approach is generally not possible for commercial smartphone platforms.

In this paper, we attempt a gray-box approach by introducing an equivalent converter concept. Modules in the platform are powered through the PDN, which is composed of a set of converters, as shown in Fig. 11. The converter set can be an empty set (direct connection), a single dc-dc converter, a cascade connection of a dc-dc converter and an LDO, (rarely) a cascade connection of multiple dc-dc converters, etc. The equivalent converter models the set of converters on the path from the battery source to each module (or set of modules). In other words, the proposed equivalent converter abstraction treats the set of converters as a single equivalent converter. The abstraction enables a gray-box approach by which one can group modules in a smartphone platform by their required supply voltage levels, which can be obtained from datasheets. Once we identify the power conversion efficiency of the PDN in the smartphone platform, we can effectively evaluate the efficiency improvement obtained by applying the proposed dc-dc converter optimization methods.

## A. Equivalent Converter Model

We classify the equivalent converter models as representing either a single dc-dc converter or a cascaded connection of a dc-dc converter and an LDO, called type I and type II equivalent converters, respectively. We assume that the battery output current flows through a voltage regulator in order to produce a constant voltage throughout the full discharge cycle of the battery. Without loss of generality, types I and II equivalent converter models can represent most power conversion tree structures in the PDN [9], [17], [24]. Most digital logic components can be powered by a single dc-dc converter from the battery to the module; this gives rise to the type I converter model. A cascade connection of two or more dc-dc converters is rare, because increasing the number of cascaded dc-dc converters generally increases the cost and area overhead with little (or no) benefit in terms of the conversion efficiency. LDOs are often an indispensable component to provide low-ripple output voltage for switching noise-sensitive RF and analog modules. It is uncommon to use a single LDO from the battery to a load device due to the required large dropout voltage and hence loss of LDO power efficiency. Instead, it turns out to be more energy-efficient to first convert the battery voltage using a dc-dc converter to an internal voltage slightly higher than the device voltage, and subsequently use an LDO for the final power conversion.

According to (2) and (7), the power loss of the equivalent converter may be expressed as

$$
\begin{equation*}
P_{e q v}=A\left(\delta I_{q}+\sum_{i=1}^{N} I_{m o d, i}\right)^{2}+\delta \zeta \sum_{i=1}^{N} I_{m o d, i}+\left(B+\delta \nu I_{q}\right) \tag{18}
\end{equation*}
$$

where $N$ is the number of modules connected to the equivalent converter, $I_{\text {mod }, i}$ is the input current of the $i$ th module, parameter $A$ for the dc-dc converter is given by $A=R_{L}+D R_{s w 1}+(1-D) R_{s w 2}$, $B$ is the sum of the second, third, and last terms of (2), $\delta=0$ for a type I and $\delta=1$ for a type II equivalent converter, $\nu$ is the input voltage of the LDO, and $\zeta=\left(\nu-V_{\text {ref }} k\right)$. We can further simplify (18) by defining the output current of the equivalent converter, $I_{\text {eqv_out }}=\sum_{i=1}^{N} I_{m o d, i}$, and thus the power loss for both types of equivalent converter models can be expressed as

$$
\begin{equation*}
P_{e q v}=a I_{\text {eqv_out }}{ }^{2}+b I_{\text {eqv_out }}+c \tag{19}
\end{equation*}
$$

where the coefficients $a, b$, and $c$ are derived from (18), and are largely dependent on the converter design specification, such as the power MOSFET gate width, inductor IR loss, and controller loss [8]. Calculating those coefficients is the key step of the power conversion efficiency characterization.
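Expanding the square in (18) shows how the coefficients of (19) arise. The step below is a worked sketch that assumes only that $\delta \in\{0,1\}$, so that $\delta^{2}=\delta$, and that $I_{q}$ does not vary with the load:

$$
\begin{align*}
P_{e q v} & =A\left(\delta I_{q}+I_{\text {eqv_out }}\right)^{2}+\delta \zeta I_{\text {eqv_out }}+\left(B+\delta \nu I_{q}\right) \\
& =\underbrace{A}_{a} I_{\text {eqv_out }}^{2}+\underbrace{\delta\left(2 A I_{q}+\zeta\right)}_{b} I_{\text {eqv_out }}+\underbrace{B+\delta\left(A I_{q}^{2}+\nu I_{q}\right)}_{c}
\end{align*}
$$

Under this reading, a type I converter ( $\delta=0$ ) has $a=A$, $b=0$, and $c=B$, whereas the LDO stage of a type II converter contributes the linear term and part of the constant term.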

## B. Module Grouping and Regression Analysis

Measurement (or estimation) of the output current of all the equivalent converters enables us to estimate the unknown coefficients of the equivalent converter model. The input and output voltage levels of each equivalent converter can be obtained from the device datasheets. For example, the Qualcomm MDP MSM8660 [25] incorporates embedded power sensors that monitor and report current values of each module in the platform with fine granularity. When the target platform does not provide embedded current sensors, we can estimate the module current values by activity profiling [2]-[5].

Profiling various applications, which result in diverse usage patterns of the system modules, provides sufficient information and data to perform regression analysis and estimate the unknown coefficients. Linear regression analysis is a widely used method in system identification, requiring: 1) a well-designed model and 2) sufficient experimental data to extract the best-fit model coefficients. In reality, however, independent control of each module is a challenging task due to the lack of direct control knobs. For example, if we run an application that activates a camera module, currents flowing into the CPU, GPU, memory, and other associated components also ramp up and down correspondingly. We must thus apply linear regression analysis to the whole system (including all smartphone modules) simultaneously, while trying to vary the activity level of each module by running different applications. However, this method may not produce sufficient data to cover the whole range of activities for all smartphone modules, especially when the number of modules is large (e.g., the Qualcomm MDP has 27 embedded modules). This is a potential source of inaccuracy for regression analysis due to the weak training set issue.

TABLE I
Grouping Results for Qualcomm MDP MSM8660

| Group | Modules | Voltage |
| :---: | :---: | :---: |
| 1 and 2 | Group 1: CPU core0 <br> and Group 2: CPU core1 | 1.225 V |
| 3 | Internal Memory, Audio DSP, and <br> Digital core (includes GPU and modems) | 1.1 V |
| 4 | Audio codec Vdd, LPDDR2, ISM, <br> DRAM, and Camera-digital | 1.2 V |
| 5 | Audio codec IO, IO PAD3, Display IO, <br> DRAM Vdd1, Camera IO, PLL, <br> and eMMC host interface | 1.8 V |
| 6 | Camera analog, Haptic, SD card, <br> Touch screen, eMMC (Flash), IO PAD2, <br> SD card, and Ambient light sensor | 2.85 V |
| 7 | Display memory and Display backlight | 3.8 V |

We tackle the problem by performing a module grouping in order to reduce the number of unknown coefficients that must be determined during the characterization process. This grouping procedure reduces the burden in terms of generating sufficient data to perform the linear regression analysis. The idea is that system modules that require the same operating voltage level can be combined into one group, and each group of modules is connected to the battery source via a single equivalent converter, as illustrated in Fig. 11. This method matches well with low power design practices that try to minimize the number of converters, due to their cost and internal power losses.

Given that the number of different voltage levels required by various modules in a smartphone platform is typically less than 10 [17], [24], the grouping procedure significantly reduces the number of parameters to be determined in linear regression. For example, the classification result of the Qualcomm MDP in Table I shows that the platform requires only seven groups although the module count is 27.

Finally, the total power loss of the smartphone, $P_{\text {loss }}$, is given by

$$
\begin{equation*}
P_{l o s s}=\sum_{k=1}^{G} P_{e q v, k}=\sum_{k=1}^{G}\left(a_{k} I_{\text {eqv_out }, k}^{2}+b_{k} I_{\text {eqv_out }, k}+c_{k}\right) \tag{20}
\end{equation*}
$$

where $G$ is the number of groups, $P_{e q v, k}$ is the power loss of the $k$ th equivalent converter corresponding to the $k$ th group of modules, $I_{\text {eqv_out }, k}$ denotes the output current of that equivalent converter, which can be measured using embedded sensors in the Qualcomm MDP, and $a_{k}, b_{k}$, and $c_{k}$ are the coefficients of the equivalent converter model (to be determined by linear regression). We treat the battery voltage presented to the power conversion tree as being (nearly) constant, which is valid considering the function of the regulator between the battery cell/pack and the equivalent converter. Therefore, we may assume that $a_{k}, b_{k}$, and $c_{k}$ are constant values.

TABLE II
Extracted Coefficients for Each Group

| k | $a_{k}$ | $b_{k}$ | $c_{k}$ | k | $a_{k}$ | $b_{k}$ | $c_{k}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1,2 | 0.4427 | 0.0025 | 0.0170 | 5 | 0.1971 | 0.5232 | 0.0128 |
| 3 | 0.4079 | 0.1742 | 0.0675 | 6 | 0.1814 | 0.2928 | 0.0320 |
| 4 | 0.1152 | 0.1757 | 0.0077 | 7 | 0.4091 | 0.3871 | 0.0289 |
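The regression on (20) can be illustrated with an ordinary least-squares fit. The sketch below is an assumption-laden illustration, not the authors' MATLAB procedure: it assumes that a total conversion loss sample is available for each profiling instant (for example, battery power minus the summed module power), it lumps all constant terms into a single intercept, and the names `solve_linear` and `fit_loss_model` are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: profiles do not excite every group")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_loss_model(currents, losses):
    """Least-squares fit of P_loss = sum_k (a_k I_k^2 + b_k I_k) + c_ext.

    currents : list of samples, each a list of G group output currents (A)
    losses   : measured total conversion loss per sample (W), an assumed input
    Returns (a, b, c_ext) with a and b as lists of length G.
    """
    g = len(currents[0])
    # One design row per sample: [I_1^2 .. I_G^2, I_1 .. I_G, 1].
    rows = [[i * i for i in s] + list(s) + [1.0] for s in currents]
    p = 2 * g + 1
    # Normal equations: (X^T X) theta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, losses)) for i in range(p)]
    theta = solve_linear(xtx, xty)
    return theta[:g], theta[g:2 * g], theta[2 * g]
```

A singular system signals exactly the weak training set issue discussed above: the profiled applications did not vary the group currents independently enough to separate the coefficients.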

## V. EXPERIMENTAL WORK

## A. Experimental Setup

The Qualcomm MDP MSM8660 is used as the actual smartphone platform. It runs Google Android OS 2.3 on a 1.5 GHz Snapdragon asynchronous dual-core CPU and is equipped with a 3D-supporting GPU, a 3.61 in WVGA multitouch screen, 1-GB internal RAM, 16-GB on-board flash, WiFi, Bluetooth, GPS, dual-side cameras, etc. We perform power measurement of each module using the application profiling tool called Trepn. Use of Trepn ensures higher accuracy of the measurements. Note, however, that our proposed method is independent of the measurement tools; e.g., we may use activity profiling for power measurement provided by Google or based on techniques presented in the literature [2]-[5]. The data collected from the MDP MSM8660 are then processed in MATLAB for the characterization as well as the optimization procedures.

## B. Coefficient Identification

As shown in Table I, the Qualcomm MDP modules can be classified into seven groups based on their operating voltage levels. Some modules, such as the CPU cores in the MDP, use dynamic voltage and frequency scaling techniques that require a range of variable supply voltage levels. Consequently, we keep each CPU core in a separate group but treat the equivalent converters of these groups as identical to each other. Group 7 is associated with the display, and therefore the backlight brightness level mostly determines the current demand in this group. Group 7 coefficients are easy to identify because we can independently control the brightness of the display. In other words, we first perform the linear regression to identify the coefficients of the equivalent converter model of Group 7, separately from the other groups.

For the remaining six groups, we profile various applications and collect sufficient data for the regression analysis as explained earlier. It is difficult to identify the $c_{k}$ coefficient of every $k$ th equivalent converter directly from the linear regression process. Rather, we only extract $c_{\text {ext }}$, which corresponds to the sum of all the constant terms in (20), i.e., $c_{e x t}=\sum_{k=1}^{G} c_{k}$. We find an approximate value for each $c_{k}$ as $c_{k}=c_{\text {ext }}\left(P_{\text {group }, k} / P_{\text {group }, \text { total }}\right)$, where $P_{\text {group }, k}$ denotes the power consumption of Group $k$, and $P_{\text {group }, \text { total }}$ is the total power consumption of all the groups. The $P_{\text {group }, k}$ and $P_{\text {group }, \text { total }}$ values are available from the embedded sensors in the MDP.
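The apportioning rule for $c_{k}$ is a simple proportional split, sketched below. The function name `split_constant_term` is hypothetical, and the group powers are assumed to be averages over the profiled traces.

```python
def split_constant_term(c_ext, group_power):
    """Apportion c_ext to each group in proportion to its power consumption.

    c_ext       : lumped constant term extracted by the regression
    group_power : P_group,k for each group (same unit for all groups)
    Returns the list of approximate c_k values, which sum to c_ext.
    """
    total = sum(group_power)
    if total <= 0:
        raise ValueError("total group power must be positive")
    return [c_ext * p / total for p in group_power]
```

For instance, a group drawing three quarters of the total power receives three quarters of the lumped constant loss.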


Fig. 12. Conversion efficiencies for all groups. (a) Groups 1 and 2. (b) Group 3. (c) Group 4. (d) Group 5. (e) Group 6. (f) Group 7.


Fig. 13. Part of traces of total power consumption: measured data and modeled data.

The extracted coefficients of the seven equivalent converters are reported in Table II. The power conversion efficiency of Group $k$, derived from $P_{\text {group }, k} /\left(P_{\text {group }, k}+P_{\text {eqv }, k}\right)$, is shown in Fig. 12. We then verify the characterization results of the equivalent converters. Fig. 13 compares the system power consumption trace from the real measurement, as reported by a built-in battery sensor, with the estimation obtained from our extracted equivalent converter coefficients. The trace includes ten mobile applications, as stated in Section III-A. We measure the error as a signal-to-noise ratio, and the resulting average error is 0.075. The standard deviation of the error is 0.059. The worst case average error is 0.128 and is seen for Neocore (a rare but important synchronization problem with the built-in sensor causes the extreme worst error in this case). We also run four completely new mobile benchmarks (different from the ones used for the regression analysis): Antutu [26], Vellamo [27], Quadrant [28], and GLBenchmark [29]. These benchmarks are designed to test the performance of various modules in the smartphone platform. In particular, Vellamo includes HTML5 and METAL chapters to evaluate the mobile web browser performance and the mobile processors, respectively. GLBenchmark and Antutu include 3D tests for the GPU. Quadrant performs CPU, memory, and I/O tests. Therefore, we believe these four new benchmarks are sufficient to evaluate our regression analysis. The resulting average error and standard deviation are 0.047 and 0.046 for Antutu, 0.062 and 0.040 for Vellamo:Metal, 0.092 and 0.052 for Vellamo:Html5, 0.064 and 0.045 for Quadrant, and 0.065 and 0.058 for GLBenchmark. We have thus confirmed that the results of the power conversion efficiency characterization process are accurate enough for the subsequent optimization process.

TABLE III
$W_{d e f}$ of Equivalent Converter Models

| Group | 1,2 | 3 | 4 | 5 | 6 | 7 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $W_{\text {def }}$ | 1.2401 | 1.1033 | 1.3109 | 1.4033 | 1.4102 | 0.7368 |

Fig. 14. Relation between the power conversion efficiency and $W$ : Group 7. (a) Efficiency: Group 7. (b) Power loss: Group 7.

## C. Default Widths Extraction

Given that the pMOS switch typically carries less current per unit width than the nMOS switch, the pMOS switch is much larger than the nMOS switch in dc-dc converters [12]. We thus focus on scaling the width of the pMOS switch, and the nMOS switch is sized to have the same resistance as the pMOS switch (i.e., the widths of both switches are linearly proportional to each other). From (4) and (5), the dc-dc converter power loss model may be generally expressed as a function of $W$

$$
\begin{equation*}
P_{\text {converter }}=\left(\frac{r_{1}}{W}+r_{2}\right) I_{\text {out }}^{2}+r_{3} W+r_{4} \tag{21}
\end{equation*}
$$

where $W$ is linearly proportional to the widths of both the pMOS and nMOS switches, and $r_{1}, r_{2}, r_{3}$, and $r_{4}$ are constants.

Given that the two MOSFET switches in a dc-dc converter dominate the power loss of the equivalent converter, and $I_{q}$ is small, we rewrite (19) as

$$
\begin{equation*}
P_{e q v, k}=\left(\frac{r_{1, k}}{W_{k}}+r_{2, k}\right) I_{\text {eqv_out }, k}^{2}+b_{k} I_{\text {eqv_out }, k}+r_{3, k} W_{k}+r_{4, k} \tag{22}
\end{equation*}
$$

where $P_{\text {eqv,k }}$ and $I_{\text {eqv_out,k }}$ are the power loss and the output current of the $k$ th equivalent converter, respectively, corresponding to the $k$ th group of modules, and $W_{k}, r_{1, k}, r_{2, k}, r_{3, k}$, and $r_{4, k}$ are the coefficients of the equivalent converter model, which are determined by regression. In the regression procedure, we carefully set the initial condition so as not to be trapped in a local minimum. The resultant coefficient $W_{k}$ is then the default value of $W$ ( $W_{\text {def }}$ ) of the $k$ th equivalent converter, which is shown in Table III.

## D. Simulation Results: Static Switch Sizing

Fig. 14(a) shows an example in which $W$ changes the efficiency graph of Group 7. Fig. 14(b) shows that the power loss plots have a convex functional form in terms of $W$. From (11) and (22), the optimal $W$ of the $k$ th group is calculated by

$$
\begin{equation*}
W_{o p t, k}=\sqrt{\int_{I_{\text {eqv_out }, k}} I_{\text {out }}^{2} f_{k}\left(I_{\text {out }}\right) d I_{\text {out }}} \sqrt{\frac{r_{1, k}}{r_{3, k}}} \tag{23}
\end{equation*}
$$

where $f_{k}\left(I_{\text {out }}\right)$ is the load current distribution of the $k$ th group.
In order to derive $f_{k}\left(I_{\text {out }}\right)$, we use the collected loading profiles. As introduced in Section III-A, we run the ten representative mobile applications, and the loading profiles of all the modules in the $k$ th group are measured for each application. All the applications except clock and system setting are run under the same setup, where WiFi is turned on and the backlight level of the display is the highest. Clock is measured under the median backlight level with WiFi on, whereas system setting is measured under the lowest backlight level with WiFi off. For call, we take into account the automatic screen turn-off during the call. We derive two types of load current distributions, according to the two representative smartphone usage patterns, patterns I and II, introduced in Fig. 6. Fig. 12 shows the resulting $f_{k}\left(I_{\text {out }}\right)$ from pattern I.

From (22), the expected power loss of an equivalent converter can be generally expressed as

$$
\begin{align*}
E\left[P_{\text {eqv }}\right]= & \left(\frac{r_{1}}{W}+r_{2}\right) \int I_{\mathrm{out}}^{2} f\left(I_{\mathrm{out}}\right) d I_{\mathrm{out}}+b \int I_{\mathrm{out}} f\left(I_{\mathrm{out}}\right) d I_{\mathrm{out}} \\
& +r_{3} W+r_{4} \tag{24}
\end{align*}
$$

We denote the efficiency and power loss for different setups as $\eta_{\text {setup }}$ and $P_{\text {setup }}$, where setup can be def or opt: def denotes the default setup, whereas opt denotes the optimal setup of the dc-dc converter. Then, $\eta_{\text {setup }}$ can be calculated by $P_{\text {group }} /\left(P_{\text {group }}+P_{\text {setup }}\right)$, and $P_{\text {setup }}$ can be derived from (24) with $W=W_{\text {setup }}$. $P_{\text {group }}$ is the power consumed by all the modules in the group. Finally, we define the power conversion efficiency enhancement ( $\operatorname{Gain}_{\eta}$ ) and power loss reduction ( $\operatorname{Gain}_{P}$ ) by

$$
\begin{align*}
\operatorname{Gain}_{\eta} & =\left(\frac{\eta_{\text {opt }}}{\eta_{\text {def }}}-1\right) 100(\%) \\
\operatorname{Gain}_{P} & =\left(1-\frac{P_{o p t}}{P_{\text {def }}}\right) 100(\%) \tag{25}
\end{align*}
$$
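Equations (23) to (25) can be evaluated directly on measured load current samples. The following Python sketch is illustrative only: it replaces the integrals over $f\left(I_{\text {out }}\right)$ with sample means, and the function names are hypothetical. Setting the derivative of (24) with respect to $W$ to zero gives $-r_{1} E\left[I_{\text {out }}^{2}\right] / W^{2}+r_{3}=0$, which is the origin of (23).

```python
import math


def optimal_width(samples, r1, r3):
    """Eq. (23) with the integral replaced by the sample mean of I_out^2."""
    mean_sq = sum(i * i for i in samples) / len(samples)
    return math.sqrt(mean_sq) * math.sqrt(r1 / r3)


def expected_loss(samples, w, r1, r2, r3, r4, b):
    """Eq. (24) evaluated on empirical load current samples."""
    n = len(samples)
    mean_sq = sum(i * i for i in samples) / n
    mean = sum(samples) / n
    return (r1 / w + r2) * mean_sq + b * mean + r3 * w + r4


def gains(p_group, p_def, p_opt):
    """Eq. (25): efficiency enhancement and power loss reduction, in percent."""
    eta_def = p_group / (p_group + p_def)
    eta_opt = p_group / (p_group + p_opt)
    gain_eta = (eta_opt / eta_def - 1.0) * 100.0
    gain_p = (1.0 - p_opt / p_def) * 100.0
    return gain_eta, gain_p
```

Because (24) is convex in $W$, the width returned by `optimal_width` gives a lower expected loss than any nearby width for the same samples.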

Table IV shows the S3 results for both patterns I and II, where the values of $W_{\text {opt }}$ are $W_{\text {opt }, I}$ for pattern I, and $W_{\text {opt }, \text { II }}$ for pattern II. The overall power conversion efficiency enhancements for patterns I and II are $6 \%$ and $5.5 \%$, which correspond to $19 \%$ and $18 \%$ power loss reductions during power conversion, respectively.

To check how much the voltage ripple increases when changing from $W_{d e f, k}$ to $W_{o p t, k}$, we define a parameter called voltage ripple change (\%), calculated as $\Delta V_{o p t, k} / \Delta V_{d e f, k} \cdot 100$, where $\Delta V_{d e f, k}$ and $\Delta V_{o p t, k}$ are obtained by substituting $W_{d e f, k}$ and $W_{o p t, k}$ into (6), respectively. From the regression results and the possible range of output current for each group, $V_{s w 1}+V_{s w 2}=r_{1} I_{\text {out }}$ and $V_{L}=r_{2} I_{\text {out }}$. Then, all the possible $V_{s w 1}, V_{s w 2}$, and $V_{L}$ are considered to derive the maximum voltage ripple change.

TABLE IV
Static Switch Sizing (S3) Results (\%) of Patterns I and II

| $k$ | $W_{\text {opt }, I}$ | Gain $_{\eta}$ | Gain $_{P}$ | $W_{\text {opt }, \text { II }}$ | Gain $_{\eta}$ | Gain $_{P}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 0.2976 | 7.9718 | 29.7828 | 0.3365 | 6.4977 | 26.7376 |
| 2 | 0.1886 | 14.0471 | 38.7954 | 0.2073 | 12.3355 | 37.0320 |
| 3 | 0.3793 | 4.4908 | 11.4699 | 0.3869 | 3.9988 | 10.4214 |
| 4 | 0.1671 | 10.9028 | 29.7479 | 0.1834 | 9.7412 | 28.0067 |
| 5 | 0.1271 | 12.8754 | 27.7512 | 0.0900 | 12.4529 | 27.2171 |
| 6 | 0.1176 | 14.2869 | 34.0873 | 0.1228 | 13.6310 | 33.3380 |
| 7 | 0.2130 | 2.0874 | 11.0332 | 0.2145 | 2.0615 | 10.9291 |
| $t$ | - | 6.0157 | 19.0699 | - | 5.5536 | 18.0396 |

TABLE V
Voltage Ripple Change (\%) From $W_{d e f, k}$ to $W_{o p t, k}$ for Patterns I and II

|  | $k=1$ | $k=2$ | $k=3$ | $k=4$ | $k=5$ | $k=6$ | $k=7$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $I$ | 8.3 | 11.6 | 5.5 | 14.3 | 7.7 | 1.9 | 0.1 |
| $I I$ | 7.1 | 11.0 | 5.5 | 13.4 | 10.2 | 1.8 | 0.1 |

The results for each group are reported in Table V. Because the worst case is only $14 \%$, and the equivalent converter model includes an LDO, we can safely state that the resulting voltage ripple will satisfy the design constraints.

## E. Simulation Results: Dynamic Switch Modulation

According to the classification flow in Fig. 10, Groups 1, 2, 3, and 5 are classified as discretizable, Groups 4 and 6 as discrete, and Group 7 as continuous. In this section, only the results from smartphone usage pattern I are presented for brevity (the results from pattern II are almost the same as those from pattern I).

As a result of the $K$-means clustering procedure with $K=7$ and $\lambda=1000$, each of Groups 1, 2, 3, and 5 has seven discrete load current values. Table VI shows the resulting values and their corresponding optimum width values. From Algorithm 1, we can derive the set of switches of each group that covers the maximum number of the width values in Table VI. The boundary conditions of the load current regions are calculated by $I_{b d, i}=\sqrt{W_{i} W_{i+1} r_{3} / r_{1}}$, which is derived from (14).
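The boundary formula follows from equating the width-dependent loss of two neighboring effective widths, $r_{1} I^{2} / W_{i}+r_{3} W_{i}=r_{1} I^{2} / W_{i+1}+r_{3} W_{i+1}$, which gives $I^{2}=W_{i} W_{i+1} r_{3} / r_{1}$. A minimal Python sketch of the resulting run-time width selection is shown below. It is an illustration under assumptions: the function names are hypothetical, and the controller is assumed to observe the load current directly.

```python
import math
from bisect import bisect_right


def boundaries(widths, r1, r3):
    """I_bd,i = sqrt(W_i * W_{i+1} * r3 / r1) for consecutive sorted widths."""
    w = sorted(widths)
    return [math.sqrt(w[i] * w[i + 1] * r3 / r1) for i in range(len(w) - 1)]


def select_width(i_out, widths, r1, r3):
    """Pick the effective width whose load current region contains i_out."""
    w = sorted(widths)
    # The number of boundaries at or below i_out is the index of the region.
    return w[bisect_right(boundaries(w, r1, r3), i_out)]
```

Below the first boundary the smallest effective width is selected, and each boundary crossed moves the selection to the next larger width.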

Table VII shows examples of the resulting efficiency enhancement ( $\operatorname{Gain}_{\eta, \text { method }}$ ) of Groups 1, 2, and 3 when $\Delta=0.4$ and $N=3$ ( method $=D S M 1$ ) or $N=4$ ( method $=D S M 2$ ). The results from the S3 ( method $=S 3$ ) are also provided for comparison. We assume that the power losses of the controller are the same for all methods. The table includes the results of five applications. The results of the other five applications are omitted in this paper, but they are similar to those of the applications in the table. In addition, in order to demonstrate the effectiveness of DSM under varying load conditions, three cases of (fixed) high load current conditions are also explored, although they are rarely observed when running the common applications.

Table VII shows that $\operatorname{Gain}_{\eta, D S M 2}$ is slightly better than $\operatorname{Gain}_{\eta, D S M 1}$, and $\operatorname{Gain}_{\eta, D S M 1}$ is generally better than $\operatorname{Gain}_{\eta, S 3}$. For Groups 1 and 2, high efficiency enhancement is achieved for all the methods when the applications require low load current (i.e., System setting and Call in both groups, and

TABLE VI
Results of $K$-Means Clustering for Groups 1, 2, 3, and 5: $I_{o u t, k}^{\prime}$ Is the $k$ th Mean Value (mA), and $W_{\text {opt }, k}$ Is Its Corresponding Optimal Width

| Group | $I_{\text {out }, 1}^{\prime}$ | $W_{\text {opt }, 1}$ | $I_{\text {out }, 2}^{\prime}$ | $W_{\text {opt }, 2}$ | $I_{\text {out }, 3}^{\prime}$ | $W_{\text {opt }, 3}$ | $I_{\text {out }, 4}^{\prime}$ | $W_{\text {opt }, 4}$ | $I_{\text {out }, 5}^{\prime}$ | $W_{\text {opt }, 5}$ | $I_{\text {out }, 6}^{\prime}$ | $W_{\text {opt }, 6}$ | $I_{\text {out }, 7}^{\prime}$ | $W_{\text {opt }, 7}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |

TABLE VII
Efficiency Enhancement Results (\%) of the Dynamic Switch Modulation ( $\operatorname{Gain}_{\eta, D S M}$ ) and the Static Switch Sizing ( $\operatorname{Gain}_{\eta, S 3}$ )

| Group 1-DSM1: $N=3$, Width set=\{0.1509, $0.4404,1.1547\}$ and $D S M 2$ : $N=4$, Width set $=\{0.1509,0.3017,0.4404,1.1547\}$ |  |  |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Case | Gain $_{\eta, S 3}$ | Gain $_{\eta, D S M 1}$ | Gain $_{\eta, D S M 2}$ | Case | Gain $_{\eta, S 3}$ | Gain $_{\eta, D S M 1}$ | Gain $_{\eta, D S M 2}$ |
| System setting | 9.7606 | 10.0825 | 10.2681 | Neocore | 3.9914 | 4.4510 | 4.4730 |
| Call | 8.7008 | 8.6859 | 8.9136 | $I_{\text {out }}=150 \mathrm{~mA}$ | -3.7896 | 0.1598 | 0.4380 |
| Skype-videochat | 3.8615 | 4.4978 | 4.5696 | $I_{\text {out }}=250 \mathrm{~mA}$ | -9.5422 | 0.0685 | 0.0851 |
| Facebook | 3.9612 | 4.5575 | 4.6576 | $I_{\text {out }}=350 \mathrm{~mA}$ | -13.7721 | 0.7423 | 0.8465 |
| Group 2: DSM1 with $N=3$, width set $=\{0.1088, 0.4250, 0.9744\}$; DSM2 with $N=4$, width set $=\{0.1088, 0.1918, 0.2514, 0.9744\}$ |  |  |  |  |  |  |  |
| Case | $\mathrm{Gain}_{\eta, \mathrm{S3}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM1}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM2}}$ | Case | $\mathrm{Gain}_{\eta, \mathrm{S3}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM1}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM2}}$ |
| System setting | 10.0681 | 11.7873 | 12.2481 | Neocore | 12.1440 | 11.2321 | 12.2433 |
| Call | 12.9531 | 12.3106 | 13.1478 | $I_{\text {out }}=150 \mathrm{~mA}$ | -6.4464 | 0.5639 | 0.5639 |
| Skype-videochat | 1.8863 | 4.2109 | 4.2728 | $I_{\text {out }}=250 \mathrm{~mA}$ | -13.9895 | -0.1039 | 0.0045 |
| Facebook | 7.8378 | 8.2429 | 8.7913 | $I_{\text {out }}=350 \mathrm{~mA}$ | -19.5088 | 0.3518 | 2.4955 |
| Group 3: DSM1 with $N=3$, width set $=\{0.2970, 0.3945, 0.8983\}$; DSM2 with $N=4$, width set $=\{0.2970, 0.3717, 0.4122, 0.8983\}$ |  |  |  |  |  |  |  |
| Case | $\mathrm{Gain}_{\eta, \mathrm{S3}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM1}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM2}}$ | Case | $\mathrm{Gain}_{\eta, \mathrm{S3}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM1}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM2}}$ |
| System setting | 4.4993 | 4.5576 | 4.6029 | Neocore | -3.0773 | 0.4821 | 0.4636 |
| Call | 4.8733 | 4.9012 | 4.9499 | $I_{\text {out }}=200 \mathrm{~mA}$ | 2.7330 | 2.8415 | 8.3743 |
| Skype-videochat | -0.0035 | 1.5326 | 1.5531 | $I_{\text {out }}=250 \mathrm{~mA}$ | 0.7476 | 4.7393 | 4.8130 |
| Facebook | 3.2841 | 3.4753 | 3.5499 | $I_{\text {out }}=300 \mathrm{~mA}$ | -5.8153 | 2.1294 | 2.2558 |

Facebook and Neocore in Group 2), with gains of around $8\%$ to $12\%$. On the other hand, for the applications requiring higher load current (i.e., Skype-videochat in both groups, and Facebook and Neocore in Group 1), the efficiency enhancement for Groups 1 and 2 is lower, at around $1\%$ to $4\%$. This is because the efficiency of the default setup ($\eta_{\text{def}}$) is higher under high load current conditions than under low load current conditions. Meanwhile, the S3 achieves high efficiency enhancement under low load current conditions, as shown in Table VII, but it has the drawback that the efficiency is reduced under high load current conditions, where $\mathrm{Gain}_{\eta, \mathrm{S3}}$ can even be negative. DSM, in contrast, can achieve efficiency enhancement over a wide load current range. For example, for Skype-videochat in Group 2, $\mathrm{Gain}_{\eta, \mathrm{S3}}$ is about $1.9\%$, whereas $\mathrm{Gain}_{\eta, \mathrm{DSM1}}$ and $\mathrm{Gain}_{\eta, \mathrm{DSM2}}$ are both above $4\%$. Furthermore, for Groups 1 and 2 at fixed load currents of 150, 250, and 350 mA, the results show that DSM avoids the efficiency degradation in the high current region (its gains stay near zero or positive), whereas the S3 does not. The results for Group 3 in Table VII are similar.

The $K$-means clustering result for Group 5 in Table VI shows that the gap between the minimum and maximum load current conditions is only 12 mA. Thus, a single switch set ($N=1$) sized by the S3 is sufficient. For DSM with $N=2$, $\{0.0968, 0.1231\}$ can be used as the set of switch widths.
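The clustering step referred to above can be illustrated with a minimal one-dimensional $K$-means sketch (Lloyd's algorithm). The function name `kmeans_1d`, the quantile-based initialization, and the sample trace are assumptions made for illustration; they are not the measured profiles or the exact procedure behind Table VI.

```python
def kmeans_1d(samples, k, max_iterations=100):
    """Cluster scalar load-current samples (mA) and return the sorted cluster means.

    Initialization picks evenly spaced order statistics so the result is
    deterministic (an illustrative choice, not taken from the paper).
    """
    ordered = sorted(samples)
    n = len(ordered)
    centers = [ordered[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for x in samples:
            # Assign each sample to its nearest current center.
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Recompute each center as the mean of its cluster (keep it if empty).
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


# Hypothetical load-current trace in mA (NOT measured data).
trace = [20.1, 19.8, 20.5, 26.0, 25.7, 26.3, 31.9, 32.0, 31.6]
means = kmeans_1d(trace, k=3)
spread = max(means) - min(means)
print(means, spread)
```

When the spread between the smallest and largest cluster means is small, as reported for Group 5, a single statically sized switch is a reasonable choice.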

Camera-digital in Group 4 is a module whose on/off operation is controlled by the user. Furthermore, as shown in Fig. 15, it dominates the power consumption of the group ($55\%$-$65\%$). When the camera is on, the average load current of Group 4 is 62.7269 mA, and when it is off, the average load current of Group 4 is 19.5208 mA. From (23), these current values


Fig. 15. Ratio of the power consumed by camera digital to the power consumed by all the modules in Group 4.


Fig. 16. Load current distribution of display modules in Group 7 according to the ten brightness levels.

correspond to switch widths of 0.5127 and 0.1595, respectively. Similarly, SD card and Camera analog are user-controlled on/off modules in Group 6. We thus have four discrete load current values, 15, 29, 37, and 51 mA, according to the (SD card, Camera analog) states off/off, off/on, on/off, and on/on. These current values correspond to optimum effective widths of 0.1128, 0.2181, 0.2783, and 0.3836, respectively. When $N=2$, $\{0.1128, 0.2783\}$ can be used as the set of switch widths. Table VIII shows the efficiency enhancement

TABLE VIII
Efficiency Enhancement Results of Groups 4 and 6

|  | Group 4 |  | Group 6 |  |
| :---: | :---: | :---: | :---: | :---: |
| Case | $\mathrm{Gain}_{\eta, \mathrm{S3}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM}}$ | $\mathrm{Gain}_{\eta, \mathrm{S3}}$ | $\mathrm{Gain}_{\eta, \mathrm{DSM}}$ |
| System setting | 16.7919 | 16.8653 | 12.9967 | 12.9725 |
| Neocore | 5.6650 | 5.6295 | 13.7939 | 13.7870 |
| Skype-videochat | -1.4018 | 1.6159 | 5.3388 | 6.0069 |
| Camera | 0.1997 | 2.3499 | 6.0982 | 6.6365 |

TABLE IX
Results of the Power Loss Gain (\%) for Ten Applications

| Application | $\mathrm{Gain}_{P, \mathrm{S3}}$ | $\mathrm{Gain}_{P, \mathrm{DSM1}}$ | $\mathrm{Gain}_{P, \mathrm{DSM2}}$ |
| :---: | :---: | :---: | :---: |
| Call | 18.3237 | 18.8361 | 19.0967 |
| Camera | 15.1582 | 15.9335 | 16.1485 |
| Clock | 22.4631 | 23.3820 | 23.4811 |
| Facebook | 17.2478 | 18.3995 | 18.6534 |
| GoogleMap | 16.6835 | 18.3276 | 18.5324 |
| Neocore | 4.8926 | 11.1143 | 11.1352 |
| Skype-videochat | 9.5858 | 12.2300 | 12.2963 |
| SMS | 18.4505 | 19.4249 | 19.6605 |
| System setting | 21.0103 | 21.5473 | 21.6712 |
| Youtube | 17.7074 | 18.2816 | 18.5089 |

results for Groups 4 and 6 for four applications. As in the previous cases of Groups 1, 2, and 3, the results for Group 4 show that DSM performs as well as the S3 under low load current conditions (i.e., System setting and Neocore), while also maintaining positive enhancement under high load current conditions (i.e., Skype-videochat and Camera). For Group 6, on the other hand, both methods yield almost the same results. This is because the load current range of Group 6 is narrow and, in addition, the applications may not frequently require the maximum current.
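To make the Group 6 example concrete, the following sketch enumerates the effective widths reachable with the two-switch set $\{0.1128, 0.2783\}$ and picks the closest one for each of the four required widths. It assumes that the effective width is the sum of the widths of the switches turned on in parallel and that the controller selects the nearest reachable width; both are illustrative assumptions, and the helper names are hypothetical.

```python
from itertools import combinations


def effective_widths(widths):
    """Map every reachable effective width to the switch subset producing it.

    Assumes enabled switches act in parallel, so their widths add up.
    """
    options = {}
    for r in range(1, len(widths) + 1):
        for subset in combinations(widths, r):
            options[round(sum(subset), 4)] = subset
    return options


def closest_configuration(required_width, widths):
    """Return (effective width, switch subset) nearest to the required width."""
    options = effective_widths(widths)
    best = min(options, key=lambda w: abs(w - required_width))
    return best, options[best]


# Load current (mA) -> optimum effective width, as listed for Group 6.
required_group6 = {15: 0.1128, 29: 0.2181, 37: 0.2783, 51: 0.3836}
switch_set = (0.1128, 0.2783)

for current_mA, width in required_group6.items():
    chosen, subset = closest_configuration(width, switch_set)
    print(f"{current_mA} mA: required {width}, chosen {chosen} using {subset}")
```

Under these assumptions the 15 mA and 37 mA conditions are matched exactly, the 51 mA condition is served by both switches together, and the 29 mA condition falls back to the nearest available width.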

Group 7 consists of two modules, display memory and display backlight. The display backlight has various brightness levels that can be set according to user preference. We divide the brightness range into ten levels and measure the load current of Group 7 for each level. The load current range induced by each brightness level overlaps those of the adjacent levels. Fig. 16 shows the resulting load current distribution of Group 7 when all the levels are equally likely to occur. Next, we select seven discrete current values forming an arithmetic sequence whose minimum and maximum values are 11 and 66 mA, respectively. These current values correspond to the required width values. From Algorithm 1 with $\Delta=0.01$, the set $\{0.0355, 0.1225, 0.2128\}$ is derived. The possible effective widths from this set can cover the seven required width values (thus, $N=3$ is enough in this case). Finally, we obtain $\mathrm{Gain}_{\eta, \mathrm{S3}}=3.9483\%$ and $\mathrm{Gain}_{\eta, \mathrm{DSM}}=4.3424\%$ for the case in which all the levels are equally likely to be chosen.
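As a small illustration of the counting argument for Group 7, the sketch below generates the seven evenly spaced current values between 11 and 66 mA and lists the distinct effective widths available from the three-switch set. It assumes that the effective width is the sum of the widths of the enabled switches, so that three switches yield $2^{3}-1=7$ non-empty combinations; it does not reproduce Algorithm 1 or the mapping from current to required width.

```python
from itertools import combinations


def arithmetic_levels(low, high, count):
    """Return `count` evenly spaced values from `low` to `high`, inclusive."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def subset_sums(widths):
    """Sorted distinct sums over all non-empty subsets of the switch widths."""
    sums = {round(sum(subset), 4)
            for r in range(1, len(widths) + 1)
            for subset in combinations(widths, r)}
    return sorted(sums)


levels_mA = arithmetic_levels(11, 66, 7)          # seven discrete load currents
settings = subset_sums((0.0355, 0.1225, 0.2128))  # widths reachable with N = 3

print([round(x, 2) for x in levels_mA])
print(settings, len(settings))
```

Because the number of reachable settings grows as $2^{N}-1$, a small number of switches can serve a comparatively large number of discrete load conditions.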

For interested readers, we also provide Table IX to show the detailed results for the ten applications.
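The relation between an efficiency gain and a power loss gain can be sketched as follows. For a fixed output power, the conversion loss is $P_{\text{loss}} = P_{\text{out}}(1/\eta - 1)$, so a modest relative increase in $\eta$ removes a larger fraction of the loss. The baseline efficiency of 70% and the interpretation of the enhancement as a relative gain are assumptions chosen for illustration, not values taken from the measurements.

```python
def loss_per_output_watt(eta):
    """Conversion loss normalized to the delivered output power: 1/eta - 1."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return 1.0 / eta - 1.0


def loss_reduction(eta_before, eta_after):
    """Fraction of the conversion loss removed, assuming equal output power."""
    if eta_before >= 1.0:
        raise ValueError("baseline efficiency must be below 1")
    return 1.0 - loss_per_output_watt(eta_after) / loss_per_output_watt(eta_before)


eta_default = 0.70                   # hypothetical baseline efficiency
eta_improved = eta_default * 1.06    # 6% relative enhancement (assumption)
reduction = loss_reduction(eta_default, eta_improved)
print(f"loss reduction: {100 * reduction:.1f}%")
```

With these illustrative numbers the loss reduction comes out close to 19%, which is of the same order as the entries in Table IX.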

## VI. CONCLUSION

This paper demonstrated that significant power loss occurs during power conversion in the PDN of a smartphone platform.

To mitigate this problem, this paper focused on the dc-dc converters in the PDN and introduced two optimization methods for them. S3 was presented to configure the switches in dc-dc converters so that their optimal operating conditions match the general load current conditions. The general load current distributions for all modules in the platform were derived from the measured loading profiles and smartphone usage patterns. DSM was also presented to overcome the limitation of the S3, which may not be optimal under dynamically varying load conditions. By exploiting the multiswitching scheme, detailed procedures to select and size the switches were introduced. To verify the presented methods in an actual smartphone platform, the PDN characterization procedure was performed. Using the proposed equivalent converter model and grouping method, the power conversion efficiency of the PDN in the target smartphone platform was characterized. Finally, we applied the proposed optimization methods to the platform. The experimental results showed that the S3 achieves 6\% overall efficiency enhancement, which translates to $19\%$ power loss reduction for the general smartphone usage pattern. The DSM accomplishes a similar improvement under the same condition and, furthermore, achieves high efficiency enhancement under various load conditions. In the design flow, both the S3 and DSM methods can be applied only after obtaining the load current distributions for the modules. S3 is simple to implement, but it may not produce the optimal transistor widths under dynamically changing load conditions, or even when the load distribution has a high variance. On the other hand, DSM has more control/area overhead than S3, but it can achieve high conversion efficiency enhancement under all load conditions.
Note that if the load current distributions change relative to those used for the initial optimization, because of newly added applications or changing usage patterns, the DSM method will continue to provide power efficiency enhancement because of its adaptability, whereas the S3 method will fail to do so.

## REFERENCES

[1] W. Lee, Y. Wang, D. Shin, N. Chang, and M. Pedram, "Power conversion efficiency characterization and optimization for smartphones," in Proc. Int. Symp. Low Power Electron. Design, 2012, pp. 103-108.
[2] L. Zhang, B. Tiwana, R. P. Dick, Q. Zhiyun, Z. M. Mao, W. Zhaoguang, and Y. Lei, "Accurate online power estimation and automatic battery behavior based power model generation for smartphones," in Proc. Int. Conf. Hardware/Software Codesign Syst. Synthesis, 2010, pp. 105-114.
[3] M. Dong and L. Zhong, "Self-constructive high-rate system energy modeling for battery-powered mobile systems," in Proc. Int. Conf. Mobile Syst. Appl. Services, 2011, pp. 335-348.
[4] A. Pathak, Y. C. Hu, and M. Zhang, "Fine-grained energy accounting on smartphones with Eprof," in Proc. EuroSys, 2011, pp. 29-42.
[5] D. Shin, N. Chang, W. Lee, Y. Wang, Q. Xie, and M. Pedram, "Online estimation of the remaining energy capacity in mobile systems considering system-wide power consumption and battery characteristics," in Proc. Asia South Pacific Des. Autom. Conf., 2013, pp. 59-64.
[6] L. Benini, A. Bogliolo, and G. D. Micheli, "A survey of design techniques for system-level dynamic power management," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 8, no. 3, pp. 299-316, Jun. 2000.
[7] Q. Qiu, Q. Wu, and M. Pedram, "Dynamic power management in a mobile multimedia system with guaranteed quality-of-service," in Proc. Design Autom. Conf., 2001, pp. 834-839.
[8] Y. Choi, N. Chang, and T. Kim, "DC-DC converter-aware power management for low-power embedded systems," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 26, no. 8, pp. 1367-1381, Aug. 2007.
[9] B. Amelifard and M. Pedram, "Optimal design of the power-delivery network for multiple voltage-island system-on-chips," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 28, no. 6, pp. 888-900, Jun. 2009.
[10] Z. Zeng, X. Ye, Z. Feng, and P. Li, "Tradeoff analysis and optimization of power delivery networks with on-chip voltage regulation," in Proc. Design Autom. Conf., 2010, pp. 831-836.
[11] O. Abdel-Rahman, J. A. Abu-Qahouq, L. Huang, and I. Batarseh, "Analysis and design of voltage regulator with adaptive FET modulation scheme and improved efficiency," IEEE Trans. Power Electron., vol. 23, no. 2, pp. 896-906, Mar. 2008.
[12] S. Kudva and R. Harjani, "Fully-integrated on-chip DC-DC converter with a 450X output range," IEEE J. Solid-State Circuits, vol. 46, no. 8, pp. 1940-1951, Aug. 2011.
[13] A. A. Sinkar, H. Wang, and N. S. Kim, "Workload-aware voltage regulator optimization for power efficient multicore processors," in Proc. Design Autom. Test Eur., 2012, pp. 1134-1137.
[14] J. Wibben and R. Harjani, "A high-efficiency DC-DC converter using 2nH integrated inductors," IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 844-854, Apr. 2008.
[15] R. Erickson and D. Maksimovic, Fundamentals of Power Electronics. Berlin, Germany: Springer, 2001.
[16] J. Xiao, A. Peterchev, J. Zhang, and S. Sanders, "An ultra-low-power digitally-controlled buck converter IC for cellular phone applications," in Proc. Appl. Power Electron. Conf., 2004, pp. 383-391.
[17] C. Shi, B. C. Walker, E. Zeisel, E. B. Hu, and G. H. McAllister, "A highly integrated power management IC for advanced mobile applications," in Proc. Custom Integr. Circuits Conf., 2006, pp. 85-88.
[18] Y. Du, M. Wang, R. T. Meitl, S. Lukic, and A. Q. Huang, "High-frequency high-efficiency DC-DC converter for distributed energy storage modularization," in Proc. Int. Experts Consultants, 2010, pp. 1832-1837.
[19] S. Musunuri and P. L. Chapman, "Optimization of CMOS transistors for low power dc-dc converters," in Proc. Power Electron. Specialist Conf., 2005, pp. 2151-2157.
[20] A. Shye, B. Scholbrock, and G. Memik, "Into the wild: Studying real user activity patterns to guide power optimizations for mobile architectures," in Proc. Int. Symp. Microarchitecture, 2009, pp. 168-178.
[21] F. Hossein, M. Ratul, K. Srikanth, L. Dimitrios, G. Ramesh, and E. Deborah, "Diversity in smartphone usage," in Proc. Int. Conf. Mobile Syst. Appl. Services, 2010, pp. 179-194.
[22] T. M. T. Do, J. Blom, and D. Gatica-Perez, "Smartphone usage in the wild: A large-scale analysis of applications and context," in Proc. Int. Conf. Multimodal Interaction, 2011, pp. 353-360.
[23] PTM [Online]. Available: http://ptm.asu.edu (accessed 2013 Jun.)
[24] Texas Instruments. Handset: Smartphone Solutions [Online]. Available: http://www.ti.com/solution/handset_smartphone (accessed 2013 Jun.)
[25] Qualcomm. Snapdragon MDP MSM8660 Datasheet [Online]. Available: https://developer.qualcomm.com/mobile-development/development-devices/snapdragon-mdp-legacy-devices (accessed 2013 Jun.)
[26] Antutu [Online]. Available: http://www.antutu.net (accessed 2013 Jun.)
[27] Vellamo [Online]. Available: http://www.quicinc.com/vellamo (accessed 2013 Jun.)
[28] Quadrant [Online]. Available: http://www.aurorasoftworks.com (accessed 2013 Jun.)
[29] GLBenchmark [Online]. Available: http://gfxbench.com (accessed 2013 Jun.)


Woojoo Lee (S'12) received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2007, and the M.S. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 2010. He is currently pursuing the Ph.D. degree in electrical engineering at the Department of Electrical and Electronic Engineering, University of Southern California, under the supervision of Prof. M. Pedram.
His current research interests include low-power VLSI design, system-level power management, and embedded system designs.


Yanzhi Wang (S'12) received the B.S. degree with distinction in electronic engineering from Tsinghua University, Beijing, China, in 2009. He is currently pursuing the Ph.D. degree in electrical engineering at the Department of Electrical and Electronic Engineering, University of Southern California, Los Angeles, CA, USA, under the supervision of Prof. M. Pedram.

His current research interests include system-level power management, next-generation energy sources, hybrid electrical energy storage systems, near-threshold computing, and the smart grid. He has published around 60 papers in these areas.


Donghwa Shin (S'05-M'12) received the B.S. degree in computer engineering and the M.S. and Ph.D. degrees in computer science and electrical engineering from Seoul National University, Seoul, Korea, in 2005, 2007, and 2012, respectively.
He is currently with the Dipartimento di Automatica e Informatica-EDA Group, Politecnico di Torino, Torino, Italy, as a Research Assistant. His current research interests include system-level low-power techniques for embedded systems and hybrid power system design for embedded systems.


Naehyuck Chang (F'12) received the B.S., M.S., and Ph.D. degrees from the Department of Control and Instrumentation, Seoul National University, Seoul, Korea, in 1989, 1992, and 1996, respectively.
He joined the Department of Computer Engineering, Seoul National University, in 1997, where he is currently a Professor with the Department of Electrical Engineering and Computer Science and is the Vice Dean of the College of Engineering. His current research interests include low-power embedded systems, hybrid electrical energy storage systems, and next-generation energy sources.
Dr. Chang has served on technical program committees in many EDA conferences, including DAC, ICCAD, ISLPED, DATE, CODES+ISSS, and ASP-DAC. He was a TPC Chair or Co-Chair of RTCSA 2007, ISLPED 2009, ESTIMedia 2009 and 2010, and CODES+ISSS 2012, and will serve as the TPC Chair of ICCD 2014, and ASP-DAC 2015. He was the General Vice Chair of ISLPED 2010, and the General Chair of ISLPED 2011 and ESTIMedia 2011. He has served as an Associate Editor of IEEE TCAS-I, IEEE TCAD, ACM TODAES, and ACM TECS, Springer DAES, and was a Guest Editor of ACM TODAES in 2010, and ACM TECS in 2010 and 2011. He is the ACM SIGDA Chair and an ACM Distinguished Scientist.


Massoud Pedram (F'01) received the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, CA, USA, in 1991.

He is the Stephen and Etta Varra Professor with the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA. He holds ten U.S. patents and has published four books, 12 book chapters, and more than 130 archival and 320 conference papers. His current research interests range from low power electronics, energy-efficient processing, and cloud computing to photovoltaic cell power generation, energy storage, and power conversion, and from RT-level optimization of VLSI circuits to synthesis and physical design of quantum circuits.
Dr. Pedram was a recipient of the 1996 Presidential Early Career Award for Scientists and Engineers, is an ACM Distinguished Scientist, and currently serves as the Editor-in-Chief of the ACM Transactions on Design Automation of Electronic Systems and the IEEE Journal on Emerging and Selected Topics in Circuits and Systems. For this research, he and his students have received six conference and two IEEE Transactions Best Paper Awards. He has also served on the Technical Program Committee of a number of premier conferences in his field and was the Founding Technical Program Co-Chair of the 1996 International Symposium on Low Power Electronics and Design and the Technical Program Chair of the 2002 International Symposium on Physical Design.


[^0]:    Manuscript received March 28, 2013; revised July 7, 2013; accepted August 7, 2013. Date of current version December 16, 2013. This work was supported in part by grants from the Software and Hardware Foundations of the Division of Computer and Communication Foundations of the U.S. National Science Foundation, in part by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MEST) under Grant 2012R1A6A3A03038938, and in part by the Center for Integrated Smart Sensors funded by the Ministry of Science, ICT and Future Planning as Global Frontier Project under Grant CISS-2012054193. The ICT at Seoul National University provided research facilities for this paper. This paper was presented in part at the 2012 International Symposium on Low Power Electronics and Design, Redondo Beach, CA, USA [1]. This paper was recommended by Associate Editor Y. Shin.
    W. Lee, Y. Wang, and M. Pedram are with the Department of Electrical and Electronic Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: woojoole@usc.edu; yanzhiwa@usc.edu; pedram@usc.edu).
    D. Shin is with the Dipartimento di Automatica e Informatica, Politecnico di Torino, Torino 10129, Italy (e-mail: donghwa.shin@polito.it).
    N. Chang is with the Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea (e-mail: naehyuck@elpl.snu.ac.kr).
    Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
    Digital Object Identifier 10.1109/TCAD.2013.2282287

