# Power Optimization Approach of ORCA Processor for 32/28nm Technology Node

Davit Babayan

National Polytechnic University of Armenia, Synopsys Armenia CJSC, Yerevan, Armenia; e-mail: davitb@synopsys.com

## ABSTRACT

This paper presents a power optimization method for the RISC-architecture ORCA processor based on power gating, aimed at a significant reduction of leakage power consumption. When combined with the multi-voltage power reduction method, the presented approach significantly decreases both the dynamic and the leakage power of the ORCA processor.

## 1. INTRODUCTION

The ORCA processor is a 32-bit microprocessor core with two main interfaces: a PCI interface and a source-synchronous double data rate (DDR) interface to SDRAM. The SDRAM bus can address PC266-type memory, and the DDR data bus is synchronous with both edges of the incoming and outgoing clocks. The CLOCK\_GEN sub-block contains two PLLs (phase-locked loops) and a clock multiplier for the functional clocks (Fig. 1); the two PLLs cancel the clock tree insertion delay for the PCI I/O interface timing and for the SDRAM input interface timing. The RESET\_BLOCK sub-block contains synchronizing reset circuitry for the global asynchronous prst\_n signal; this circuitry is used in functional mode and bypassed in test mode. The processor core is a high-speed RISC machine with a power save mode, in which the BLENDER block is shut down and RISC\_CORE runs at half its frequency. All asynchronous interfaces between clock domains are isolated with dual-port FIFOs [3].



Fig.1. ORCA TOP (functional block diagram)

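Control inputs select the operating frequencies: the PCI bus runs at 33 MHz when its control bit is 0 and at 66 MHz when it is 1, and RISC\_CORE runs at 200 MHz when its control bit is 0 and at 100 MHz when it is 1.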
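The frequency selection above can be summarized as a lookup from control bits to clock frequency and period. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the bit-to-frequency mapping is taken directly from the text (RISC\_CORE at 100 MHz corresponds to the half-frequency power save mode).

```python
# Hypothetical lookup of the frequency-select control bits described in the text.
PCI_FREQ_MHZ = {0: 33.0, 1: 66.0}     # PCI bus: 0 -> 33 MHz, 1 -> 66 MHz
RISC_FREQ_MHZ = {0: 200.0, 1: 100.0}  # RISC_CORE: 0 -> 200 MHz, 1 -> 100 MHz


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def clock_setup(pci_sel: int, risc_sel: int) -> dict:
    """Return the selected frequencies and clock periods for one control setting."""
    pci = PCI_FREQ_MHZ[pci_sel]
    risc = RISC_FREQ_MHZ[risc_sel]
    return {
        "pci_mhz": pci,
        "pci_period_ns": period_ns(pci),
        "risc_mhz": risc,
        "risc_period_ns": period_ns(risc),
    }


if __name__ == "__main__":
    for pci_sel in (0, 1):
        for risc_sel in (0, 1):
            s = clock_setup(pci_sel, risc_sel)
            print(f"PCI sel={pci_sel}: {s['pci_mhz']:.0f} MHz ({s['pci_period_ns']:.2f} ns), "
                  f"RISC sel={risc_sel}: {s['risc_mhz']:.0f} MHz ({s['risc_period_ns']:.2f} ns)")
```

The 200 MHz setting gives the 5 ns RISC\_CORE period against which the timing results in the tables below are reported.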

## 2. PREVIOUS RESEARCH

Earlier research on ORCA processor power reduction applied the multi-voltage method, using different supply voltages for different power domains (with the RISC core as a separate domain). Compared with the standard design, power consumption decreased by about 15% at an area overhead of about 12%, while the timing characteristics remained essentially unchanged (RISC core clock frequency 200 MHz) [1].

| Parameter            | Multi-voltage            |
|----------------------|--------------------------|
| Frequency            | 200 MHz                  |
| Data required time   | 20.21 ns                 |
| Data arrival time    | -20.20 ns                |
| Slack (MET)          | 0.01 ns                  |
| Total power          | 75.46 mW (-15%)          |
| Macro/black box area | 16340.796387 μm²         |
| Total cell area      | 661980.75374 μm² (+15%)  |
| Total area           | 678321.550135 μm² (+12%) |

Table 1. Timing, power and area results for the multi-voltage implementation of the ORCA/RISC core (percentages relative to the standard design)
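The entries of Table 1 are tied together by two identities: setup slack equals data required time minus data arrival time (the report lists the arrival time with a negative sign because it is subtracted), and total area equals macro/black box area plus cell area. A minimal sketch that checks both for Table 1; the function names are illustrative, not part of any tool.

```python
def setup_slack_ns(required_ns: float, arrival_ns: float) -> float:
    """Setup slack: data required time minus data arrival time (MET if >= 0)."""
    return required_ns - arrival_ns


def area_mismatch_um2(macro_um2: float, cell_um2: float, total_um2: float) -> float:
    """Difference between the reported total area and macro area + cell area."""
    return total_um2 - (macro_um2 + cell_um2)


# Values copied from Table 1 (multi-voltage ORCA/RISC core implementation).
slack = setup_slack_ns(20.21, 20.20)
mismatch = area_mismatch_um2(16340.796387, 661980.75374, 678321.550135)

print(f"slack = {slack:.2f} ns ({'MET' if slack >= 0 else 'VIOLATED'})")
print(f"area mismatch = {mismatch:.6f} um^2")
```

The mismatch is on the order of 1e-5 μm², i.e. rounding in the reported values.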

A detailed investigation of the ORCA processor structure showed that the RISC core contains more than 1000 registers and that about 60% of the total power is spent in registers [2]. This suggested that power gating would be effective in reducing the power of the RISC core. Power gating the RISC core requires replacing all of its registers with retention-type registers; this reduces power but at the same time increases area.
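Why the two methods act differently can be seen from the usual first-order CMOS power model (a standard textbook relation, not taken from the original study). Multi-voltage design lowers the supply voltage, so dynamic power falls quadratically; power gating disconnects the supply of an idle domain, so during shutdown both terms of that domain vanish, apart from the leakage of the always-on retention latches and of the power switch itself.

$$
P_{\text{total}} = P_{\text{dyn}} + P_{\text{leak}}, \qquad
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f, \qquad
P_{\text{leak}} = V_{DD} \, I_{\text{leak}}
$$

Here $\alpha$ is the switching activity, $C_L$ the switched capacitance, $f$ the clock frequency and $I_{\text{leak}}$ the leakage current of the block.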

## 3. THE POWER GATING IMPLEMENTATION

Power gating is one of the main power reduction methods. In addition to a power switch that disconnects the supply of the gated domain, its implementation requires ISOLATION and RETENTION cells. An ISOLATION cell usually consists of a two-input NAND gate from the library and two transistors (a p-MOS connected to VDD and an n-MOS connected to ground), with the ENABLE signal connected to the transistor gates (Fig. 2).



Fig.2. ISOLATION cell structure

ISOLATION cells are placed at the boundary of a power domain that can be shut down. When the ENABLE signal is applied, they hold the outputs of the sub-block at a stable value while the domain is inactive [5].
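The behavior (not the transistor structure of Fig. 2) of an output isolation cell can be sketched as follows. The sketch assumes an active-low isolation control and a clamp value of 1, matching the low isolation sense and clamp value 1 in the UPF of Fig. 5; `None` stands for the undefined output of a powered-off domain. Function and argument names are hypothetical.

```python
from typing import Optional


def isolate(data: Optional[int], iso_n: int, clamp_value: int = 1) -> Optional[int]:
    """Behavioral model of an output isolation cell.

    data: output of the gated domain (None = undefined while the domain is off).
    iso_n: isolation control, active low (isolation applied when iso_n == 0).
    clamp_value: constant forced on the output while isolation is applied.
    """
    if iso_n == 0:
        # Isolation applied: the always-on side sees a stable constant.
        return clamp_value
    # Isolation released: the cell is transparent, so an undefined input stays undefined.
    return data


if __name__ == "__main__":
    print(isolate(None, iso_n=0))  # 1: domain off, output clamped
    print(isolate(0, iso_n=1))     # 0: normal operation, data passes through
```

The last case in the test (`isolate(None, 1)`) shows why isolation must stay applied for the whole time the domain is off.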

In power-off (shutdown) mode, the register state must be saved and then restored after wake-up; this is done with RETENTION registers, sometimes called SAVE/RESTORE registers (Fig. 3). These registers have a second, lower-voltage backup supply (VDDG) that stays on even when the main supply of the domain is switched off.
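The save/restore principle can be illustrated with a small behavioral model: the main flip-flop loses its value when the domain supply is switched off, while a shadow latch on the always-on backup supply keeps the saved value. This is a sketch under simplifying assumptions; the class and method names are hypothetical, and the library retention cells are not modeled.

```python
from typing import Optional


class RetentionRegister:
    """Behavioral sketch of a retention (save/restore) flip-flop.

    main: normal flip-flop state, powered by the switchable domain supply.
    shadow: retention latch, powered by the always-on backup supply (VDDG).
    """

    def __init__(self) -> None:
        self.main: Optional[int] = 0
        self.shadow: Optional[int] = None
        self.main_powered = True

    def clock(self, d: int) -> None:
        """Capture d on a clock edge; only possible while the main supply is on."""
        if not self.main_powered:
            raise RuntimeError("cannot clock a powered-off register")
        self.main = d

    def save(self) -> None:
        """Copy the main state into the always-on shadow latch."""
        self.shadow = self.main

    def power_off(self) -> None:
        """Switch off the main supply: the main state is lost (unknown)."""
        self.main_powered = False
        self.main = None

    def power_on(self) -> None:
        """Restore the main supply; the main state is still unknown."""
        self.main_powered = True

    def restore(self) -> None:
        """Copy the shadow latch back into the main flip-flop."""
        if not self.main_powered:
            raise RuntimeError("restore requires the main supply to be on")
        self.main = self.shadow


if __name__ == "__main__":
    reg = RetentionRegister()
    reg.clock(1)
    reg.save()
    reg.power_off()
    print(reg.main)  # None: state lost in the main flip-flop
    reg.power_on()
    reg.restore()
    print(reg.main)  # 1: state recovered from the shadow latch
```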



Fig.3. RETENTION register structure

## 4. DESIGN PROCESS

The design flow of ORCA with power gating fits fully into the standard digital design flow with UPF integration, as shown in Fig. 4.



Fig.4. ORCA design steps with power gating method.

The power gating method was applied to the RISC sub-block, since it contains both high- and low-performance parts. The design specification describes the differences between the two low-power optimization methods (power gating and multi-voltage design [1]). A Unified Power Format (UPF) description was developed for the power gating implementation and used in both the logical and the physical design steps (Fig. 5).

```tcl
create_power_domain TOP
create_power_domain RISC -elements RISC

## TOPLEVEL CONNECTIONS
# VDD
create_supply_port VDD
create_supply_net VDD -domain TOP
connect_supply_net VDD -ports VDD
create_supply_net VDD -domain RISC -reuse
# VSS
create_supply_port VSS
create_supply_net VSS -domain TOP
create_supply_net VSS -domain RISC -reuse
connect_supply_net VSS -ports VSS
# VDDG
create_supply_port VDDG
create_supply_net VDDG -domain TOP
create_supply_net VDDG -domain RISC -reuse
connect_supply_net VDDG -ports VDDG
create_supply_net VDDGS -domain RISC

## PRIMARY POWER NETS
set_domain_supply_net TOP -primary_power_net VDD -primary_ground_net VSS
set_domain_supply_net RISC -primary_power_net VDDGS -primary_ground_net VSS

## RISC SETUP SWITCH
create_power_switch risc_sw \
    -domain RISC \
    -input_supply_port {in VDDG} \
    -output_supply_port {out VDDGS} \
    -control_port {risc_sd PwrCtrl/risc_sd} \
    -on_state {state2002 in {risc_sd}}

set_isolation risc_iso_out \
    -domain RISC \
    -isolation_power_net VDD -isolation_ground_net VSS \
    -clamp_value 1 \
    -applies_to outputs
set_isolation_control risc_iso_out \
    -domain RISC \
    -isolation_signal PwrCtrl/risc_iso \
    -isolation_sense low \
    -location parent

# RETAIN
set_retention risc_ret -domain RISC \
    -retention_power_net VDDG -retention_ground_net VSS
set_retention_control risc_ret -domain RISC \
    -save_signal {PwrCtrl/risc_restore low} \
    -restore_signal {PwrCtrl/risc_restore high}
map_retention_cell risc_ret \
    -domain RISC \
    -lib_cells {RDFFNX1 RDFFARX2}

# ADD PORT STATE INFO
add_port_state VDD -state {HV 0.95}
add_port_state VDDG -state {LV 0.7}
add_port_state risc_sw/out -state {LV 0.7} -state {OFF off}
add_port_state VSS -state {GND 0}

## CREATE PST
create_pst orca_pst -supplies {VDD VDDG VDDGS}
add_pst_state function1 -pst orca_pst -state {HV LV LV}
add_pst_state sleep -pst orca_pst -state {HV LV OFF}
```

Fig.5. Unified Power Format (UPF) for power gating
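The power state table (PST) at the end of the UPF lists the legal supply combinations. The sketch below encodes the port states and the two PST states of Fig. 5 and checks two properties implied by them: the RISC domain (primary supply VDDGS, the switch output) is unpowered only in `sleep`, and the retention supply VDDG is on in every state. The dictionaries mirror the UPF; the helper functions are hypothetical.

```python
# Port states from the UPF; VDDGS takes the states declared on risc_sw/out.
# None represents the OFF ("off") state.
PORT_STATES = {
    "VDD": {"HV": 0.95},
    "VDDG": {"LV": 0.7},
    "VDDGS": {"LV": 0.7, "OFF": None},
}

# create_pst orca_pst -supplies {VDD VDDG VDDGS} and its two states.
SUPPLIES = ("VDD", "VDDG", "VDDGS")
PST = {
    "function1": ("HV", "LV", "LV"),
    "sleep": ("HV", "LV", "OFF"),
}


def voltages(state: str) -> dict:
    """Map each PST supply to its voltage (None = switched off) in one PST state."""
    return {net: PORT_STATES[net][s] for net, s in zip(SUPPLIES, PST[state])}


def risc_powered(state: str) -> bool:
    """The RISC domain is powered when its primary supply VDDGS is on."""
    return voltages(state)["VDDGS"] is not None


def retention_supply_always_on() -> bool:
    """Retention registers need VDDG (retention power net) on in every state."""
    return all(voltages(s)["VDDG"] is not None for s in PST)


if __name__ == "__main__":
    for name in PST:
        print(name, voltages(name), "RISC on" if risc_powered(name) else "RISC off")
```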

As shown in the UPF diagram (Fig. 6), two power domains (TOP and RISC) were defined. ISOLATION cells were placed at the boundary of the RISC domain, and its standard registers were replaced with RETENTION registers. UPF-driven synthesis used the same design constraints as the multi-voltage design: PCI clock at 75 MHz, system RISC clock at 200 MHz, SDRAM clock at 75 MHz, and 30% physical utilization. The power, timing and area results of the power gating and multi-voltage designs are compared in Table 2.



Fig.6. Power gating UPF diagram for ORCA
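The paper does not describe the control sequence generated by the PwrCtrl block. As an assumption, a commonly used order is: save state and apply isolation before switching the domain off, and on wake-up switch on, restore, and only then release isolation. The sketch below only checks that a list of events respects these ordering rules; the event names and rule list are hypothetical.

```python
from typing import List

# Hypothetical ordering rules for one shutdown / wake-up cycle of the RISC domain.
# In the UPF of Fig. 5 these steps would be driven by PwrCtrl/risc_iso (isolation),
# PwrCtrl/risc_restore (save/restore) and PwrCtrl/risc_sd (power switch).
ORDER_RULES = [
    ("assert_isolation", "switch_off"),  # clamp outputs before the supply drops
    ("save", "switch_off"),              # save state while the domain is powered
    ("switch_off", "switch_on"),
    ("switch_on", "restore"),            # restore needs the main supply back
    ("restore", "release_isolation"),    # outputs valid again before unclamping
]


def check_sequence(events: List[str]) -> List[str]:
    """Return the ordering rules violated by a list of events (empty if none)."""
    position = {event: i for i, event in enumerate(events)}
    violations = []
    for before, after in ORDER_RULES:
        if before not in position or after not in position:
            violations.append(f"missing {before!r} or {after!r}")
        elif position[before] > position[after]:
            violations.append(f"{before!r} must precede {after!r}")
    return violations


if __name__ == "__main__":
    good = ["save", "assert_isolation", "switch_off",
            "switch_on", "restore", "release_isolation"]
    bad = ["save", "switch_off", "assert_isolation",
           "switch_on", "restore", "release_isolation"]
    print(check_sequence(good))  # []
    print(check_sequence(bad))   # ["'assert_isolation' must precede 'switch_off'"]
```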

| Parameter            | Power gating          | Multi-voltage |
|----------------------|-----------------------|---------------|
| Frequency            | 200 MHz               | 200 MHz       |
| Data required time   | 27.48 ns              | 20.21 ns      |
| Data arrival time    | -24.86 ns             | -20.20 ns     |
| Slack (MET)          | 2.62 ns               | 0.01 ns       |
| Total power          | 69.42 mW (~-8%)       | 75.46 mW      |
| Macro/black box area | 16340.8 μm²           | 16340.8 μm²   |
| Total cell area      | 807616.35 μm² (+22%)  | 661980.75 μm² |
| Total area           | 823956.35 μm² (~+21%) | 678321.55 μm² |

Table 2. Timing, power and area results for the power gating and multi-voltage design methods (percentages for power gating are relative to the multi-voltage design)

Total power was reduced by more than 8% compared with the multi-voltage design (Table 2) and by roughly 22% compared with the standard design, since the two relative reductions compound (0.85 × 0.92 ≈ 0.78). However, the total design area increased by ~21%, driven by a ~22% increase in cell area: retention flip-flops are much larger than standard flip-flops, and additional isolation cells are inserted. The 200 MHz frequency is still supported, with an increase of about 7 ns in input-to-output latency. The power and area of the power gating, multi-voltage and standard designs are compared in Fig. 7 and Fig. 8, respectively.
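The percentages quoted for Table 2 follow from the raw values. The sketch below recomputes them and, assuming the multi-voltage design is exactly 15% below the standard design (Table 1), combines the two reductions multiplicatively, since relative reductions do not add. Variable names are illustrative.

```python
def pct_change(new: float, ref: float) -> float:
    """Relative change of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


# Values copied from Table 2: power gating (pg) and multi-voltage (mv).
power_pg_mw, power_mv_mw = 69.42, 75.46
cell_pg_um2, cell_mv_um2 = 807616.35, 661980.75
total_pg_um2, total_mv_um2 = 823956.35, 678321.55

power_change = pct_change(power_pg_mw, power_mv_mw)
cell_change = pct_change(cell_pg_um2, cell_mv_um2)
total_change = pct_change(total_pg_um2, total_mv_um2)

# Assumption: multi-voltage power = 0.85 x standard-design power (Table 1, -15%).
MV_VS_STANDARD = 0.85
reduction_vs_standard = 100.0 * (1.0 - MV_VS_STANDARD * power_pg_mw / power_mv_mw)

print(f"power vs multi-voltage:      {power_change:+.1f}%")          # -8.0%
print(f"cell area vs multi-voltage:  {cell_change:+.1f}%")           # +22.0%
print(f"total area vs multi-voltage: {total_change:+.1f}%")          # +21.5%
print(f"power reduction vs standard: {reduction_vs_standard:.1f}%")  # 21.8%
```

If the multi-voltage reduction in Table 1 is rounded, the combined figure shifts accordingly; Fig. 7 gives the measured comparison.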



Fig.7. Power consumption for power gating, multi-voltage and standard designs.



Fig.8. Die area of power gating, multi-voltage and standard designs.

## 5. CONCLUSION

Power gating is an efficient method for reducing the power consumption of the ORCA/RISC processor. Compared with another power optimization method, multi-voltage design [1], power gating reduces power by more than 8% under the same timing specification, at the cost of a ~21% area increase. Power gating is therefore preferable when this area increase is acceptable.

## 6. ACKNOWLEDGEMENTS

The design was implemented using the SAED 32/28nm EDK developed by the Synopsys Armenia Educational Department [4], with the Synopsys Design Compiler and IC Compiler tools made available by the same department.

## REFERENCES

[1] Melikyan V., Babayan D., Babayan E., Petrosyan P., Melkonyan V. "32/28 nm low power ORCA processor with multi-voltage supply." ICSMN-2015.

[2] Hart J. M., Cho H., Ge Y., Gruber G., Huang D. "A 3.6 GHz 16-Core SPARC SoC Processor in 28 nm." IEEE Journal of Solid-State Circuits, vol. 49, no. 1, January 2014.

[3] Synopsys Inc. ORCA documentation. Synopsys Inc., 2008.

[4] Goldman R., Bartleson K., Wood T., Kranen K., Melikyan V., Babayan E. "32/28nm Educational Design Kit: Capabilities, deployment and future." 2013 IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), 2013.

[5] Gourisetty V., et al. "Low power design flow based on Unified Power Format and Synopsys tool chain." 3rd Interdisciplinary Engineering Design Education Conference (IEDEC), IEEE, 2013.