## Low-power pipelined phase accumulator with sequential clock gating for DDFSs

## Y.S. Kim, J. Lee, Y. Hong, J.E. Kim and K.-H. Baek

A pipelined phase accumulator (PACC) for direct digital frequency synthesisers (DDFSs) is presented. High-speed operation requires a highly pipelined PACC, which in turn requires a large number of pre-skewing flip-flops (F/Fs) and makes the clock signals a major source of power dissipation. Since the input data do not change every cycle, clock gating can save power by reducing unnecessary clock switching in the pre-skewing F/Fs. Sequential clock gating for pipelined PACCs is proposed. Compared with the conventional pipelined PACCs with and without clock gating, the proposed scheme reduces power dissipation by up to 55.4 and 77.2%, respectively, for the 32-bit 8-pipeline-stage PACCs.

Introduction: Direct digital frequency synthesisers (DDFSs) have been in greater demand than phase-locked loops for communication systems requiring precise control and fast frequency switching [1]. To enhance the speed of conventional DDFSs, a highly pipelined phase accumulator (PACC) is adopted, which is followed by a phase-to-sine amplitude converter (PSC) and a D/A converter. As the pipeline depth increases for higher-speed operation, a PACC becomes the most power-consuming block in a DDFS [2]. Many studies have focused on reducing the power dissipation in a PACC and its clock distribution. Clock gating schemes that reduce the number of pre-skewing flip-flops (F/Fs) in a PACC have been presented in [3, 4]. However, the updating speed of a frequency control word (FCW) is limited by the depth of pipelining. In this Letter, sequential clock gating is proposed to reduce power dissipation in the clock distribution of a PACC. Similar to conventional clock gating, pre-skewing F/Fs are turned off when their values are unchanged. Once the FCW updates, pre-skewing F/Fs are enabled sequentially column-wise. Comparisons are presented for three types of PACCs: a conventional pipelined PACC; a pipelined PACC with a single clock gating scheme; and a pipelined PACC with the proposed sequential clock gating.

Conventional pipelined PACC: In a conventional N-bit pipelined PACC, an N-bit adder and N F/Fs are partitioned into M blocks to increase the throughput of the accumulator by a factor of M, where N = 32 and M = 8 in Fig. 1. The N-bit phase output *out* is then truncated to k bits to reduce hardware complexity. Pre-skewing F/Fs composed of shift registers are added in front of the partitioned accumulator (ACC) to synchronise the N-bit FCW with the carry-out signals from each stage. As the pipelining depth M increases for higher-speed operation, the power consumption and chip area increase because the number of pre-skewing registers grows as $N(M+1)/2$. Post-skewing F/Fs synchronise the pre-skewed accumulator output by adding latency for each pipeline stage, and their count is $k(k \cdot M/N - 1)/2$. The global clock CK is distributed throughout the PACC and toggles the individual F/Fs. Since the system can neither detect changes in the input values of the individual F/Fs nor shut off portions of the F/Fs, the clock distribution for CK in the pre-skewing F/Fs continuously consumes the same amount of power, regardless of the value of the FCW. Assuming that dynamic power dissipation is dominant for a clock distribution network, its power dissipation for the pre-skewing F/Fs becomes $P_{\rm D} = \alpha C_{load} V_{\rm DD}^2 f_{\rm CK}$, where $\alpha$ (= 1) is the switching activity and $C_{load}$ is the load capacitance connected to CK.
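As a quick check of the expressions above, the following Python sketch evaluates the two F/F counts for the Fig. 1 parameters and the dynamic power formula. The 800-MHz clock matches the simulation setup of this Letter; the load capacitance (1 pF) and supply voltage (1.8 V) are placeholder assumptions, not values reported here, and the function names are illustrative.

```python
def pre_skew_count(n_bits, m_stages):
    """Number of pre-skewing F/Fs, N(M+1)/2."""
    return n_bits * (m_stages + 1) // 2


def post_skew_count(n_bits, m_stages, k_bits):
    """Number of post-skewing F/Fs, k(k*M/N - 1)/2.

    Assumes k*M is a multiple of N, i.e. the k output bits cover
    a whole number of pipeline stages.
    """
    stages_kept = k_bits * m_stages // n_bits
    return k_bits * (stages_kept - 1) // 2


def clock_dynamic_power(alpha, c_load, v_dd, f_ck):
    """Dynamic power of the clock network, P_D = alpha * C_load * V_DD^2 * f_CK."""
    return alpha * c_load * v_dd ** 2 * f_ck


N, M, K = 32, 8, 12  # parameters of Fig. 1
print(pre_skew_count(N, M))       # 144
print(post_skew_count(N, M, K))   # 12

# Assumed values: C_load = 1 pF and V_DD = 1.8 V are placeholders.
p_d = clock_dynamic_power(alpha=1.0, c_load=1e-12, v_dd=1.8, f_ck=800e6)
print(f"{p_d * 1e3:.3f} mW")
```

With $\alpha = 1$ the clock network dissipates this power in every cycle, whether or not the FCW changes, which is the motivation for gating the pre-skewing clocks.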



**Fig. 1** *Simplified block diagram of pipelined PACC with global clock CK* (N = 32, M = 8 and k = 12)

Single clock gating: Clock gating is a widely used technique in synchronous circuits for reducing dynamic power dissipation by disabling portions of a clock distribution network at the expense of additional logic [3, 4]. Clock gating can be applied to pre-skewing F/Fs in a PACC because the *load* signal indicates when the FCW is updated. When the FCW is not updated, the clock distribution for the pre-skewing F/Fs in Fig. 2a is disabled. All the pre-skewing F/Fs are tied to a single gated clock GCK so that they are turned on and off simultaneously. GCK is generated by a gated clock generator with CK and *load* as inputs, where *load* goes high once the FCW is updated. Fig. 2b depicts a detailed block diagram of the conventional gated clock generator for pre-skewing F/Fs. A shift register with M (= 8) F/Fs passes the *load* signal along, so the output of an M-input OR gate connected to its taps remains high for M clock cycles. The timing diagram of the conventional single clock gating scheme is shown in Fig. 2c.



Fig. 2 Single clock gating

a Block diagram for pre-skewing F/Fs (N=32 and M=8)

b Gated clock generator

c Timing diagram
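The behaviour of the gated clock generator in Fig. 2b can be sketched in Python as follows. This is a cycle-level model under simplifying assumptions (one list entry per clock cycle, and the shift register capturing *load* in the cycle in which it is asserted); the function name and the example stimuli are illustrative and not taken from the Letter.

```python
def single_gating_enable(load, m_stages=8):
    """Per-cycle enable of the single gated clock GCK.

    load: list of 0/1 samples, one per clock cycle.
    The enable is the OR of all taps of an M-stage shift register
    that passes the load pulse along.
    """
    shift = [0] * m_stages
    enable = []
    for ld in load:
        shift = [ld] + shift[:-1]       # shift the load pulse by one stage
        enable.append(int(any(shift)))  # M-input OR of the register taps
    return enable


# One FCW update followed by idle cycles: GCK stays enabled for M = 8 cycles.
one_update = [1] + [0] * 11
print(single_gating_enable(one_update))
# [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# FCW updated every 8 cycles: GCK is never disabled.
every_eight = ([1] + [0] * 7) * 2
print(single_gating_enable(every_eight))
```

In the second stimulus the enable never drops, so all pre-skewing F/Fs are clocked in every cycle even though the gating logic is present.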

Proposed sequential clock gating: The single clock gating scheme has two main drawbacks. The first is that GCK needs to be enabled for at least M clock cycles, as shown in Fig. 2c, resulting in unwanted power consumption because GCK is connected to all pre-skewing F/Fs. The second is that no power savings can be obtained when the FCW is updated more often than once every M clock cycles. Unlike the single clock gating scheme for pre-skewing F/Fs, the proposed sequential clock gating works with multiple clock signals, as shown in Fig. 3a. Each column of the F/F array is connected to its own gated clock, and the FCW is sequentially loaded from left to right. Fig. 3b shows the detailed block diagram of the sequential clock generator, which is composed of a shift register and AND gates. After being activated by the *load* signal, each gated clock is enabled and disabled sequentially for only one clock cycle, as depicted in Fig. 3c. Thus, the proposed scheme overcomes the first drawback of the single clock gating scheme and thereby reduces the dynamic power consumption of the clock distribution for pre-skewing F/Fs. The dotted area in the timing diagram marks the interval in which power is saved compared with the single clock gating scheme. Since each gated clock signal is activated whenever the *load* signal is applied, the second drawback of the conventional single clock gating scheme is eliminated in the proposed design without increasing hardware complexity.
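A minimal Python sketch of the sequential enables described above is given below. It assumes a cycle-level model in which one shift-register tap gates the clock of one F/F column (gated clock j = CK AND tap j) and the register captures *load* in the cycle in which it is asserted; the function name and stimulus are hypothetical.

```python
def sequential_gating_enables(load, m_stages=8):
    """Per-cycle enables of the M column clocks.

    load: list of 0/1 samples, one per clock cycle.
    Returns one list of M enables per cycle; column j is enabled
    while the load pulse sits in stage j of the shift register.
    """
    shift = [0] * m_stages
    enables = []
    for ld in load:
        shift = [ld] + shift[:-1]    # move the load pulse one column to the right
        enables.append(list(shift))  # each tap is ANDed with CK for its column
    return enables


rows = sequential_gating_enables([1] + [0] * 11)
for cycle, row in enumerate(rows):
    print(cycle, row)

# Number of cycles each column clock is enabled for a single FCW update.
cycles_per_column = [sum(col) for col in zip(*rows)]
print(cycles_per_column)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

In this model each column clock is enabled for one cycle per FCW update, whereas a single gated clock would keep every column enabled for all M cycles.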



Fig. 3 Proposed sequential clock gating

*a* Block diagram for pre-skewing F/Fs (N = 32 and M = 8)

*b* Gated clock generator

*c* Timing diagram

Simulation results and comparisons: A 0.18-µm CMOS technology is used for simulations with three types of 32-bit PACCs: a conventional pipelined PACC; a pipelined PACC with single clock gating; and a pipelined PACC with the proposed sequential clock gating. Eight pipeline stages are applied to all PACCs to operate at an 800-MHz clock speed. For a fair comparison, the rest of each PACC, except the pre-skewing F/Fs and their clock buffers, is identically designed. When the FCW is set to be updated every eight clock cycles, the single clock gating scheme exhibits the worst performance of the three, mainly because the clock gating circuitry consumes additional power without reducing the power in the clock distribution. The power breakdown for the three types of PACCs is depicted in Fig. 4, where each power is normalised to the conventional pipelined scheme. More than 60% of the power is consumed by the pre-skewing clock buffers for a pipelined PACC, whether or not a single clock gating scheme is used. By using the proposed sequential clock gating scheme, the power consumption of the pre-skewing clock buffers can be reduced by more than 47%. The overall power of the proposed PACC can be reduced by more than 51.5% compared with the other schemes.



Fig. 4 Power breakdown for three types of PACCs

The power consumption of a PACC depends on the FCW update rate. Fig. 5 compares the power dissipation of the three types of PACCs for various FCW update rates. For an FCW update rate of up to 8, the relative power consumption for single clock gating increases, as discussed. If the FCW update rate is larger than 8, the power consumption of the single clock gating scheme becomes lower than that of the conventional pipelined scheme. The proposed scheme consumes the least power of the three, except when the FCW is updated every clock cycle. For the FCW update rates considered, the power consumption of the proposed scheme can be reduced by up to 55.4 and 77.2% compared with the conventional pipelined PACCs with and without a single clock gating scheme, respectively.
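A first-order way to see this trend is to count how often a pre-skewing F/F receives a clock edge under each scheme. The Python sketch below does so under stated assumptions: the FCW update rate is interpreted as a periodic update once every `interval` clock cycles, and the power of the gating logic and clock buffers is ignored. It therefore illustrates the trend only and does not reproduce the simulated percentages. The function name and scheme labels are hypothetical.

```python
def clocked_fraction(scheme, interval, m_stages=8):
    """Fraction of cycles in which a pre-skewing F/F is clocked.

    scheme:   "conventional", "single" or "sequential"
    interval: clock cycles between FCW updates (>= 1), assumed periodic
    """
    if interval < 1:
        raise ValueError("interval must be at least 1 clock cycle")
    if scheme == "conventional":
        return 1.0                                  # CK always toggles
    if scheme == "single":
        return min(m_stages, interval) / interval   # GCK on for M cycles per update
    if scheme == "sequential":
        return 1 / interval                         # each column on for 1 cycle per update
    raise ValueError(f"unknown scheme: {scheme}")


for interval in (1, 4, 8, 16, 32):
    row = [clocked_fraction(s, interval)
           for s in ("conventional", "single", "sequential")]
    print(interval, row)
```

In this model the single gated clock is active as often as the ungated clock for intervals up to M = 8 and only drops below it for longer intervals, while at an interval of one cycle all three schemes clock the F/Fs in every cycle, consistent with the exception noted above.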



Fig. 5 Comparison of power dissipation for three types of PACCs

Conclusion: A low-power pipelined PACC for DDFSs is presented. As the pipeline depth increases for higher-speed operation, a PACC becomes the most power-consuming block in a DDFS, mainly because of the increased number of pre-skewing F/Fs and their clock buffers. A single clock gating scheme can be applied to reduce dynamic power dissipation in the clock distribution. However, it achieves no power savings when the FCW is updated too frequently relative to the pipeline depth M. The proposed sequential clock gating overcomes this problem by driving the pre-skewing F/Fs with multiple gated clock signals, each enabled only for the cycle in which its column loads the FCW. The proposed scheme is not limited to CMOS technology and can also be used with various compound semiconductor technologies.

Acknowledgments: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant no. 2013028468) and also by the Chung-Ang University Excellent Student Scholarship in 2012.

© The Institution of Engineering and Technology 2013 2 August 2013

doi: 10.1049/el.2013.2588

One or more of the Figures in this Letter are available in colour online.

Y.S. Kim, J. Lee, Y. Hong, J.E. Kim and K.-H. Baek (School of Electrical and Electronics Engineering, Chung-Ang University, 221 Heukseok-dong, Dongjak-gu, Seoul 156-756, Korea) E-mail: kbaek@cau.ac.kr

## References

- Yeoh, H.C., Jung, J.-H., Jung, Y.-H., and Baek, K.-H.: 'A 1.3-GHz 350-mW hybrid direct digital frequency synthesizer in 90-nm CMOS', IEEE J. Solid-State Circuits, 2010, 45, pp. 1845-1855
- Yang, C.-Y., Weng, J.H., and Chang, H.Y.: 'A 5-GHz direct digital frequency synthesizer using an analog-sine-mapping technique in 0.35-µm SiGe BiCMOS', IEEE J. Solid-State Circuits, 2011, 46, pp. 2064-2072
- Kim, Y.S., and Kang, S.-M.: 'A high speed low-power accumulator for direct digital frequency synthesizer'. IEEE MTT-S Int. Microwave Sym. Dig., San Francisco, CA, USA, June 2006, pp. 502-505
- Jung, Y.-H., Yoo, T., Cho, S.-J., and Baek, K.-H.: 'Pipelined phase accumulator using sequential FCW loading scheme for DDFSs', IET Electron. Lett., 2012, 48, pp. 1044-1046