Xu N, Ma Y, Liu J *et al.* Thermal-aware post layout voltage-island generation for 3D ICs. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 28(4): 671–681 July 2013. DOI 10.1007/s11390-013-1367-8

# Thermal-Aware Post Layout Voltage-Island Generation for 3D ICs

Ning Xu<sup>1</sup> (徐 宁), Senior Member, CCF, Yu-Chun Ma<sup>2,3,\*</sup> (马昱春), Jia Liu<sup>1,2</sup> (刘 佳) and Shou-Chun Tao<sup>1,2</sup> (陶守春)

<sup>1</sup>School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China

<sup>2</sup>Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

<sup>3</sup> Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China

E-mail: 873124566@qq.com; myc@mail.tsinghua.edu.cn; liujia251267875@gmail.com; tsc211@126.com

Received July 16, 2012; revised February 7, 2013.

**Abstract** To reduce the interconnect delay and improve the chip performance, three-dimensional (3D) chip emerged with the rapid increasing of chip integration and chip power density. Therefore, thermal issue is one of the critical challenges in 3D IC design due to the high power density. Multiple Supply Voltages (MSV) technique provides an efficient way to optimize power consumption which in turn may alleviate the hotspots. But the voltage assignment is limited not only by the performance constraints of the design, but also by the physical layout of circuit modules since the modules with the same voltage should be gathered to reduce the power-network routing resource. Especially in 3D designs, the optimization using MSV technique becomes even more complicated since the high temperature also influences the power consumption and delay on paths. In this paper, we address the voltage-island generation problem for MSV designs in 3D ICs based on a mixed integer linear programming (MILP) model. First, we propose a general MILP formulation for voltage-island generation to optimize thermal distribution as well as power-network routing resources while maintaining the whole chip performance. With the thermal-power interdependency, an iterative optimization approach is proposed to obtain the convergence. Experimental results show that our thermal-aware voltage-island generation approach can reduce the maximal on-chip temperature by 23.64% with a reasonable runtime and save the power-network routing resources by 16.71%.

Keywords three-dimensional integrated circuit, multiple supply voltage, thermal, mixed integer linear programming

## 1 Introduction

With the fast development of VLSI technology, interconnect delay and performance degradation have become its critical bottleneck. Three-dimensional (3D) integration, as Fig.1 shows, has drawn much attention due to its potential to solve this problem. Using 3D ICs, designers can not only stack dies vertically in the same stack, but also connect components on different dies by through-silicon via (TSV) with the reduced interconnect delay. Nevertheless, the stacked layers with the active blocks increase the power density and the thermal conductivity of dielectric between layers is extremely low. Consequently, thermal issue is one of the significant challenges in 3D IC design. To remove this challenge, one positive way is to reduce power consumption<sup>[1]</sup>. Especially in 3D designs, the temperature relies on the power distribution greatly. It is a critical issue to optimize the power consumption and temperature distribution with performance constraints considered.



Fig.1. 3D IC technology.

As one of the techniques to optimize power consumption, Multiple Supply Voltage (MSV) has been widely used to control the power and performance tradeoff<sup>[2-8]</sup>. Since lowering supply voltage may cause performance degradation, in order to minimize the drawback, designers usually trade timing slacks for power reduction by selecting appropriate supply voltages. Besides, there are extra costs to use MSV technology, including the cost of power supply network routing resource for different modules in different supply voltages, and the cost of additional level-shifters which are necessary when a block with low-voltage needs to be shifted to a

Regular Paper

This work is supported by the National Natural Science Foundation of China under Grant No. 61076035 and TNList Crossdiscipline Foundation of Tsinghua University, China.

<sup>\*</sup>Corresponding Author

 $<sup>\</sup>textcircled{O}2013$ Springer Science + Business Media, LLC & Science Press, China

high-voltage. In order to minimize these costs, voltageisland appears by clustering blocks with the same voltage into one or more islands.

In 3D IC design, some literatures tackle the thermalaware optimization based on layout optimization. Cong et al.<sup>[9]</sup> proposed a thermal-driven floorplanning algorithm for 3D ICs. It employs a simulated annealing with an integrated compact thermal model. Huang et al.<sup>[10]</sup> proposed a thermal-aware floorplanning for 3D microprocessors. The power consumption of interconnect was considered during floorplanning. But the key point to alleviate thermal issue is to reduce the power consumption. Both [11] and [12] proposed the TSV planning approaches in 3D designs to optimize both the thermal issue and power issue. Though MSV has been proved to be an effective approach to optimizing the power distribution, most of the voltage-island generation methods are based on 2D designs. Hu *et al.*<sup>[2]</sup> used a simulated annealing (SA) based algorithm to generate voltage-islands on core-based 2D System-on-Chip (SoC) designs at the floorplanning stage. After that, Hung et al.<sup>[3]</sup> took power reduction and thermal distribution into account using combined genetic algorithm (GA) with simulated annealing algorithm (SA). Mak and Chen<sup>[4]</sup> formulated the generation of voltage-island as an integer-linear-programming (ILP) problem whose objective is to reduce total power. Chang et al. used dynamic programming<sup>[5]</sup> and mixed integer linear programming (MILP)<sup>[6]</sup> to generate voltage-islands in 2D floorplan with power network routing resource, power reduction and speed degradation considered. In addition, Ma et al.<sup>[7]</sup> proposed an incremental floorplanning by moving blocks incrementally so that the power consumption and the voltage-island generation can be further optimized. But most of the previous researches are based on 2D IC design and few of them take thermalaware design issue into consideration. Though in [8] an MSV assignment method was proposed to minimize the total power consumption of 3D IC with thermal issue considered, the proposed method is based on grid-based post placement approach. But the module-based floorplanning could not be handled using the proposed approach. In current 3D IC technologies, there are several different technologies for P/G network including TSVbased P/G network<sup>[11]</sup> or wire bonded P/G network<sup>[8]</sup>. Due to the limitation of TSV techniques, the most used technology would be the wire bonded P/G network in which P/G network on each layer is connected to the package I/O directly by wire bonding, so that there is no directed connection between P/G networks on different layers. Therefore, the P/G network in 3D design could be regarded as the stack of several P/G networks in 2D chip. But the most critical issue in 3D IC would

be the severe thermal issue and the voltage-island generation on the multiple stacked layers would face more challenges compared with the lateral design space. The main challenges include:

Thermal-Aware Voltage Assignment. As shown in Fig.2, the temperature distribution varies a lot among different layers that on the higher layer, the temperature value is higher due to the accumulative thermal resistance. Different with the voltage assignment in 2D designs, which should lower the voltage on the noncritical paths as much as possible, the voltage assignment in 3D designs could assign low voltage values to the hot blocks with timing constraints considered.



Fig.2 Temperature diagram for 3D IC design: 4 vertically stacked layers in Z (vertical) direction. T is the temperature and Si j means the j-th silicon layer.

Power-Thermal-Performance Dependency. In previous 2D designs, performance evaluation is based on fixed temperature estimation since the thermal distribution on a single layer is relatively even. But in 3D cases, the various temperature values among layers may in turn change the power distribution and the delay of the paths. And the high temperature also influences the delay of each component which will also in turn influence the voltage assignment results. Therefore, the voltage-island generation and the thermal distribution influence with each other greatly.

In order to guarantee the performance in 3D MSV designs, the voltage assignment should be operated in an iterative process considering the thermal distribution, power consumption and the layout of the circuit blocks. In this paper, we propose a method to optimize the voltage-island generation so that not only the maximal on-chip temperature can be reduced due to the proper power reduction, but also the performance and voltage-island distribution can be optimized at the same time. Our major contributions are summarized as follows:

1) An MILP-based thermal-aware voltage-island generation methodology is proposed to formulate the voltage assignment problem in 3D ICs. Based on the abstract thermal evaluation model, we set up a new MILP formulation for 3D voltage-island generation problem so that not only the power consumption and temperature distribution can be optimized, but also the additional cost such as power routing resources and the amount of the level shifters can be considered.

2) The voltage-island generation is a chicken-egg problem considering the interdependency between the temperature distribution and power density distribution. Therefore, we propose an iterative process based on MILP formulation to tackle this thermal-power interdependency with various voltage assignments. With the effective iterative approach, we can achieve tradeoff between thermal optimization and other objectives considering various constraints simultaneously.

The rest of the paper is organized as follows: Section 2 defines our problem; Section 3 introduces our thermal-aware model; the MILP-based voltage-island generation methodology is described in Section 4; Section 5 reports the experimental results and Section 6 is the conclusion.

#### 2 Problem Definition

Multi-Supply Voltage (MSV) was introduced to provide higher flexibility in controlling the power and performance trade-off. In region-based MSV, circuits are partitioned into "voltage-islands" where each island occupies a contiguous physical space and operates at one supply voltage. For thermal reduction in voltage-island generation process, considering different voltage-island distribution may bring diverse costs of power-network routing resources. In terms of these, our target is to generate appropriate voltage-islands for the whole chip with maximal thermal reduction considering the performance constraints. The inputs of the problem include:

• A packing consisting of n rectangular blocks from a set  $B = \{b_1, b_2, \ldots, b_n\}$  placed on L layers without overlaps, for each  $b_i \in B$ ,  $(x_i, y_i)$  and  $(rx_i, ry_i)$  illustrate the lower-left and upper-right coordinates respectively,  $Z_i$ means the layer number  $b_i$  is placed.

• The netlist to describe the connections between blocks in B.

• Timing constraints  $T_{\text{cycle}}$  which mean that the maximal delay along a path including both blocks and wire connections should not exceed  $T_{\text{cycle}}$ .

• A set of candidate voltages V provided for each block  $b_i \in B$  to choose,  $V = \{V_1^i, V_2^i, \ldots, V_m^i\}$ , m is the number of available voltages. Each voltage  $V_k^i \in V$  has a corresponding power density  $p_k^i$  and a corresponding delay  $d_k^i$ . Normally the larger voltage is assigned, the larger the power and the smaller delay will generate.

The objective is to minimize temperature and power-

network routing resources with different voltage-islands generated. To explain our approach, several definitions are introduced.

**Definition 1** (Voltage-Island). To favor the construction of P/G network with multiple voltages, the blocks with the same voltage assignment on the same layer are always clustered into one or more voltageislands that in each voltage-island all the block are operated at the same supply voltage and each block is adjacent to at least one other block in the island.

Fig.3 shows the layout of voltage-islands in a 2-layer design. In this paper we focus on the wire-bonded P/G network in which the dies on each layer have their own P/G networks as shown in Fig.3(c), which are supplied by the power pads located at the package around the chip. Since voltage-island assignment should be planned before P/G planning, we focus the voltage assignment in this paper considering the estimation of power network resource on individual dies. For each block, there are four possible voltage levels to be assigned. From the layout, we can see that the blocks with the same voltage are clustered to compose the voltage-island so that the power network resources, we use the following notion.



Fig.3. Voltage-islands. (a) Clustering of four voltage-islands,  $\{b_1, b_2, b_4\}$ ,  $\{b_3, b_5\}$ ,  $\{b_6\}$  and  $\{b_7\}$ . (b) Layout of the voltage-islands in 3D ICs. (c) Wire-bonded 3D P/G network.

**Definition 2** (Power-Network Routing Resource). The power-network routing resource (PNRR) of a voltage-island is evaluated by half perimeter evaluation of the minimal bounding box enclosing the island on the same die.

As shown in Fig.3(a), the black solid box stands for the minimal rectangle which can cover voltage-island  $\{b_3, b_5\}$ . Therefore, PNRR of this voltage-island is evaluated by the sum of the width and height of the box. For the voltage-island construction in Fig.3(b), there are four islands on the top layer and three islands on the bottom layer. Thus PNRR cost is evaluated for each layer and summed together to obtain the final evaluation.

To obtain the feasible design with cool temperature, the blocks will be assigned with different voltage levels. To alleviate the maximal temperature on the top layer, the blocks on that layer tend to choose low voltage value. But other constraints, especially the timing constraints may limit the assignments of the low voltages. In turn, the high temperature may cause the increase of the delay on modules. Then our goal is to obtain a stable voltage assignment so that all the metrics can be optimized including temperature and power routing resources, etc. Therefore the objective of our problem can be expressed as

$$\min(\alpha \times Temp + \beta \times PNRR),$$

where *Temp* is the estimation of temperature, PNRR is the estimation of power network routing resources which will be explained in the following section,  $\alpha$ ,  $\beta$ are weights to balance the values for these two items.

#### 3 Thermal-Aware Models

## 3.1 Thermal Resistance Model

There exists a prominent duality between the thermal conductivity and current flow. Wilkerson *et al.*<sup>[13]</sup> proposed a thermal resistance model similar to the current-resistance model in circuit theory. It has been widely used for temperature profiling in 3D ICs in many literatures<sup>[9,14-17]</sup>. The 3D circuit is divided by a series of tile stacks with the thermal current source and the corresponding resistors. As shown in Fig.4(a), each tile stack is composed by vertically-stacked tile stacks on different layers, as shown in Fig.4(b). These tile stacks are connected by lateral thermal resistance  $R_{\text{lateral}}$  and form a resistance network. Within each tile stack, a thermal resistor  $R_i$  is modeled for the *i*-th layer, while the thermal resistance of the bottom layer and silicon substrate is modeled as  $R_b$ .  $P_i$  represents the power density of all grids in the resistance network, as shown in Fig.4(c). Then we can use the following equation to obtain the thermal profile as a function of power profile.

$$\begin{pmatrix} R_{11} & \cdots & R_{1m} \\ \vdots & \ddots & \vdots \\ R_{m1} & \cdots & R_{mm} \end{pmatrix} \begin{pmatrix} P_1 \\ \vdots \\ P_m \end{pmatrix} = \begin{pmatrix} T_1 \\ \vdots \\ T_m \end{pmatrix}.$$

Through these linear equations, we can capture the temperature in each layer for each block. The ambient temperature is set to 27°C.



Fig.4. Resistence thermal model for 3D ICs<sup>[13]</sup>. (a) Tiles stack array. (b) Single tile stack. (c) Tile stack analysis.

#### 3.2 Thermal-Aware Timing Analysis

As we know that temperature has a strong impact on gate leakage current, sub-threshold current, and then the delay of each block. Thus the increased temperature will increase the delay for each module and then influence the voltage assignment which will in turn influence the power distribution and temperature as well. In this paper, we adopt the temperature aware delay calculation equation<sup>[8]</sup> as follows:

$$Delay = \alpha_0 + \alpha_1 L_{\text{eff}} + \alpha_2 t_{ox} + \alpha_3 T, \qquad (1)$$

where,  $L_{\text{eff}}$  is the effective channel length,  $t_{ox}$  is the oxide thickness, T is the temperature value and  $\alpha_i$ , i = 0, 1, 2, 3, are coefficients.

With the connection between blocks, we can set up a directed acyclic graph to show the signal paths which is called as timing graph. In this graph, each node represents a block in a given packing, and the weight on an edge indicates the delay between two blocks. In this work, we apply static timing analysis (STA)<sup>[8]</sup> to obtain the timing slack for each node. The slack is calculated as follows:

$$t_i^{\alpha} = \max_{j \in FI(i)} (t_j^{\alpha} + td(j) + d(i, j)),$$
  
$$t_i^{r} = \min_{k \in FO(i)} (t_K^{r} - td(i) - d(i, k)),$$
  
slack<sub>i</sub> =  $t_i^{r} - t_i^{\alpha},$ 

where  $t_i^{\alpha}$ ,  $t_i^r$  and slack<sub>i</sub> denote the arrival time, the required time and the timing slack of node *i*, respectively. td(j) is the gate delay of block *j*, and d(i, j) is the interconnect delay between *i* and *j*. Here we give an example timing graph in Fig.5. As a part of a timing topology graph shown in Fig.5, we give (delay, arrival time, required time) pair for each node.

As we know, the blocks in each netlist share the same total timing slack. However in order to obtain maximal power reduction, we need to distribute the slack to the



Fig.5. Timing analysis example.

nodes which can provide the most benefits. To ensure the performance constraints, the delay in each netlist should not exceed the timing constraint  $T_{\text{cycle}}$ . Also, for each block, the arrival time should not exceed the required time. These constraints are all delay related which should be considered to drive our optimization flow by formulating all these factors into the MILP formulation.

# 4 MILP-Based Voltage-Island Generation Methodology

Our method is based on the MILP formulation similar to the method used in [6] to generate voltage-island for a given floorplan. But in thermal-aware 3D design, we need to put focus on the thermal effects in this formulation. Before going into the detail of the MILP formulation, we need to find a way to represent the voltage-islands so that the adjacency between blocks can be represented in MILP formulation.

#### 4.1 Representation of Voltage-Islands

For voltage-island design, we need to obtain the connection of physical location for all blocks. Lee *et al.*<sup>[6]</sup> proposed a method called Relax Block Adjacency to capture the physical proximity of blocks in a floorplan. Therefore some specific definitions are introduced as following.

**Definition 3** (Relax Block Adjacency). Suppose that  $b_i$  and  $b_j$  are two blocks on the same layer, given a constant  $l \ge 0$ , block  $b_j$  is considered *l*-adjacent to  $b_i$  if  $b_j$  overlaps with the bounding box that encloses  $b_i$ and whose perimeter is longer than  $b_i$  by *l* around the boundaries. In Fig.6, for example, we consider block  $b_2$  to be *l*-adjacent with block  $b_1$  since the minimal distance between  $b_1$  and  $b_2$  is less than *l*. Obviously, the relaxed block adjacency enables us to explore a larger solution space for voltage-island generation. Therefore, besides the aforementioned problem inputs, we require a user-specified constant *l* to define the relaxed block adjacency.

To cluster voltage-islands in a floorplan, we require a data structure to represent the block adjacency in the floorplan. In this paper, we employ block-adjacency graph (BAG)<sup>[6]</sup> to represent the block adjacency relation.



Fig.6. Example for relax block adjacency. The block  $b_2$  is *l*-adjacent to  $b_1$ , while  $b_3$  is not.

**Definition 4** (Block-Adjacency Graph). Given a floorplan with a set B of blocks and a constant l, the block-adjacency graph (BAG) is an undirected graph G = (N, E) where each node  $n_i \in N$  represents a block  $b_i \in B$ , and each edge  $(n_i, n_j) \in E$  represents an ladjacency between blocks  $b_i$  and  $b_j$ . Besides, each edge  $(n_i, n_j)$  is associated with a weight which equals the half perimeter wirelength of the minimal bounding box of the blocks  $b_i$  and  $b_j$ . Different with the timing graph defined above, the block adjacency graph is an undirected graph.

Fig.7(a) shows a floorplan with seven blocks, and Fig.7(b) shows the corresponding BAG with l = 0. Based on BAG, we can represent the block adjacency and find the abutted blocks to one block efficiently.



Fig.7. Block-adjacency graph. (a) Floorplan with seven blocks.(b) Corresponding BAG with l = 0.

## 4.2 ILP Formulation for Voltage-Island Generation

The goal of our problem is to minimize the temperature and the routing cost with several constraints. According to Section 3, the resistance thermal model cannot be solved efficiently and it is hard to represent the temperature value directly in linear programming formulation. But the temperature can be treated as the accumulation of projected power. Therefore, we propose a weighted power formulation to evaluate the temperature in MILP formulation.

### 4.2.1 MILP Model for Temperature

The temperature values heavily rely on the power consumption. The total power consumption comes from not only circuit blocks, but also the level-shifters. To generate voltage-island for a given floorplan, we assume the number of available voltage levels is m. Then for each block  $b_i$  in a given floorplan, m binary variables:  $BV_1^i, BV_2^i, BV_3^i, \ldots, BV_m^i$ , are introduced to represent which voltage level is chosen. Since each block can just operate at one voltage, for each block i, only one variable in  $BV_1^i, BV_2^i, BV_3^i, \ldots, BV_m^i$  can be chosen as 1 and all the others should be 0.

$$\sum_{k=1}^{m} BV_{k}^{i} = 1.$$
 (2)

For power consumption  $P_i$  for each block, we formulate it using the given power density  $p_i^k$  for voltage level k,

$$P_i = \sum_{k=1}^m p_i^k \times A_i \times BV_k^i,$$

where  $A_i$  denotes the area of block  $b_i$ .

For the power consumption of level-shifter  $PS_{ij}$ , let binary variable  $VS_{ij}$  illustrate the voltage equivalence between block  $b_i$  and output node  $b_j$ . Since the levelshifter will be required and bring power cost  $PS_{ij}$  only if the voltage between block  $b_i$  and output node  $b_j$  is distinct. So we have,

$$PS_{ij} = P_{LS} \times VS_{ij}, \tag{3}$$

where  $P_{LS}$  is the power for one lever-shifter. For binary variable  $VS_{ij}$ , we define that  $VS_{ij} = 0$  only if the voltage of block  $b_i$  is equal to the voltage of its output node, otherwise  $VS_{ij} = 1$ . So, for each output node  $b_j$ of node  $b_i$  in a timing graph and each voltage level k in m, we have,

$$VS_{ij} \ge BV_k^i - BV_k^j$$
, for  $k \in [1, m]$ . (4)

Formula (4) means that if the voltage assignments of node *i* and node *j* are the same, then no level shifter is needed since for all possible levels,  $VS_{ij} = 0$ . But if the assignments of node *i* and node *j* are not equal, then  $BV_k^i - BV_k^j$  is not always equal to 0, and it also could be  $\pm 1$  for some levels. Then  $VS_{ij}$  equals 1 which means that there is a level shifter needed and (4) is satisfied.

Now the total power can be formulated as:

$$\sum_{i=1}^{n} \left( P_i + \sum_{j=1}^{FO} PS_{ij} \right).$$

where n is the number of blocks in the given floorplan, FO is the number of output nodes of i in the timing graph.

With the formulation of the power consumption on the whole chip, we can discuss the temperature distribution. It is discussed in [2] that the temperature distribution is quite uneven between different layers. The top layer that is far from the bottom heat sink has much higher average temperature than the layer closer to the heat sink. Normally, the hotspot is on the top layer and blocks on lower layers affect the thermal distribution in hotspot region according to the thermal resistance model. The temperature can be formulated as the accumulation of projected power. Therefore, the estimation for temperature increase would be determined in two successive stages. Firstly, the blocks on the lower layers will be projected to the top layer while the power value is also projected. Then the maximal temperature can be estimated by a lateral thermal model on the top layer.

Take Fig.8 for example: hotpot block A is on the top layer, block B is on the second layer and its projection on the top layer, C, has the same shape and horizontal position with B. To make the virtual block C has the same thermal effects with B, according to [16], the power of block C can be determined according to the thermal resistance model as follows:

$$P_C = P_B \times \frac{R_b + \sum_{i=1}^3 R_i}{R_b + \sum_{i=1}^4 R_i}.$$
 (5)



Fig.8. Power projection from B to C.

Now, projected block C is on the same layer with hotspot block A. Then, in the second stage, temperature increase from the projected block to the hotspot is calculated as below:

$$\Delta T_i = \theta(\alpha, r) \times P_i \times \alpha/k, \tag{6}$$

where:

$$\theta(\alpha, \gamma) = \begin{cases} c_1 I_0(mr^+) + \frac{t}{2\alpha B_i}, & \text{if } 0 \leqslant r^+ < 1, \\ c_4 K_0(mr^+), & \text{if } r^+ > 1, \end{cases}$$

$$c_1 = \frac{-t/2\alpha B_i}{I_0(m) + K_0(m) \frac{I_1(m)}{K_1(m)}}, \quad c_4 = -c_1 \frac{I_1(m)}{K_1(m)}, \\ r^+ = \frac{r}{\alpha}, \quad B_i = \frac{ht}{g}, \quad m = \sqrt{\frac{2h}{gt}}.$$

The meanings of the symbols are: r is the distance between the hotspot and the center of block  $b_i$ , h is the coefficient of thermal transfer, t is block thickness, g is the thermal conductivity of material per unit volume,  $P_i$  is block power,  $I_0$  and  $I_1$  are modified Bessel functions of the first kind while  $K_0$  and  $K_1$  are modified Bessel functions of the second kind.

#### 4.2.2 MILP Model for Voltage-Island Clustering

Power-network routing resources are evaluated by the half perimeter of the bounding box of the clustered blocks with the same voltage assignment. Since the voltage-island construction is dynamic, it is hard to directly formulate such cost into the objective function of our MILP form. Instead of evaluating the PNRR cost directly, we provide another term which could represent the cost of clustering blocks into voltage-islands. We approximate the routing cost among every two adjacent blocks in the given floorplan by comparing the equivalency of voltages between them. Let  $r_{id}$  represent the voltage equivalence between the adjacent block *i* and block *d* represented by BAG, that means, for each voltage level k,

$$r_{id} = \begin{cases} 0, & \text{if } BV_k^i = BV_k^d, \\ 1, & \text{if } BV_k^i \neq BV_k^d. \end{cases}$$

Formulate it as constraint of  $BV_k^i$ ,

$$r_{id} = |BV_k^i - BV_k^d|.$$

Then change it into linear form,

1

$$r_{id} \geqslant BV_k^i - BV_k^d, \tag{7}$$

$$r_{id} \ge -BV_k^i + BV_k^a. \tag{8}$$

Thus, to minimize the total power-network routing resources PNRR, we could minimize the sum of  $r_{id}$  as follows:

$$\min\Big(\sum_{i=0}^n\sum_{d=0}^{Ad}(r_{id})\Big),\,$$

where n is the number of blocks in the given floorplan, Ad is the number of nodes adjacent with i.

#### 4.2.3 MILP Formulation for Timing Constraints

As mentioned in Subsection 3.2, we need to distribute the timing slack properly to ensure the performance constraints, while the delay on each path should not exceed  $T_{\text{cycle}}$  and the arrival time should not exceed the required time in each block. Considering this, we traverse the timing graph to obtain input nodes and output nodes of a certain block to capture the length between two adjacent blocks. MILP model can represent the whole design through proper constraint formulation.

For each block  $b_i$ , let  $td_i$  denote the gate delay of  $b_i$ . Then,

$$td_i = \sum_{i=1}^n \sum_{k=1}^m d_k^i \times J_k^i \times BV_k^i, \tag{9}$$

where,  $d_k^i$  is the given delay and  $J_k^i$  is the corresponding coefficient for certain voltage level k.

For node i,  $t_i^{\alpha}$ ,  $t_i^{r}$ , and  $td_i$  are the arrival time, required time and gate delay of node i respectively. To guarantee the performance, we have:

$$t_i^r + td_i \leqslant T_{\text{cycle}},\tag{10}$$

$$t_i^r - t_i^\alpha \ge 0. \tag{11}$$

Let  $w_{is}$  denote the weight between  $b_i$  and  $b_s$  in the timing graph ( $b_s$  is the input node of  $b_i$  in the timing graph).  $w_{ij}$  denotes the weight between  $b_i$  and  $b_j$ , and  $b_j$  is the output node of  $b_i$  in the timing graph. Here the weights on edges are evaluated as weighted HPWL (half-perimeter wirelength) of the bounding box between adjacent blocks.

For each node pair connected by an edge in the timing graph,

$$t_i^{\alpha} \ge t_s^{\alpha} + td_s + w_{is}, \tag{12}$$

$$t_i^r \leqslant t_j^r - td_j + w_{ij}. \tag{13}$$

# 4.3 MILP Formulation for Objectives and Constraints

After all the above relevant MILP models have been proposed, we can establish metrics for the two objectives *Temp* and PNRR as follows:

$$\min\left(\alpha \times \sum_{i=1}^{n} T_{\text{factor}}(b_i) \times \left(P_i + \sum_{j=1}^{FO} PS_{ij}\right) + \beta \times \sum_{i=0}^{n} \sum_{d=0}^{Ad} (r_{id})\right) \quad \text{s.t.}$$

1) voltage assignment constraints: (2), (3), (4);

2) power network routing resources constraints: (7), (8);

3) timing constraints: (9), (10), (11), (12), (13); where,  $T_{\text{factor}}(b_i)$  is the temperature influenced power weight for block  $b_i$  obtained by (1), (5) and (6). Then, we employ a leading linear programming solver GLPK<sup>1</sup> to obtain an initial optimized voltage and power distribution.

<sup>&</sup>lt;sup>(1)</sup>GLPK, http://www.gnu.org/software/glpk/, July 2012.

#### 678

# 4.4 Iterative Voltage-Island Generation Flow

With MILP formulation we can obtain the voltageisland and power distribution on each layer for temperature reduction. Since the temperature may influence both the delay and the power distribution, our initial optimized result is not rigorous enough without the feedback of temperature. To tackle this interaction, we employ the thermal resistance model to generate the initial temperature distribution of each layer. With the initial temperature distribution, we can update the delay on both blocks and wires. Then, MILP-based models can iteratively help us gain more temperature reduction with more accuracy. The complete flow of our method is shown in Fig.9. First, we use the input parameters to construct a BAG. We construct the performance and power-network routing resources constraints, as well as the timing constraints and objective function. According to the model we have constructed, we adopt a leading LP solver GLPK to solve the MILP problems and obtain an initial voltage-island and power distribution, based on which we use SparseLib<sup>(2)</sup> to solve the equations in thermal resistance model to get the whole chip temperature. Then, we iteratively update the timing distribution with temperature influence and solve the updated model until the temperature reduction is less than the threshold.



Fig.9. Overall iterative flows for voltage-island generation.

## 5 Experimental Results

We have implemented our thermal-aware voltageisland generation methodology using C++ language, and all experiments are run in Linux server. We test our optimization methodology on GSRC benchmarks<sup>[9]3</sup>. The original power dissipations of blocks are from [9]. In addition, we provide four available voltage levels, which are 1.5, 1.2, 1.0 and 0.8 Volts. We use four levels from 1 to 4 to represent different voltage levels from high to low. The initial floorplan are 4-layer stacked. Initially, all the blocks are set with the highest voltage level and we figure out the delay of critical path  $(D_{\text{critical}})$  considering the interconnect delay. The timing constraint  $T_{\text{cycle}}$  is set to  $1.15 \times D_{\text{critical}}$ .

#### 5.1 Voltage-Island Generation Results

First, to illustrate the different voltage-island generation for 2D and 3D chips, we implement the traditional 2D voltage-island generation introduced in [6]. We provide two levels of supply voltage: 1 and 4, where rank 1 represents the higher voltage. Also, we set four layers for tests in 3D. Fig.10 shows the voltage distribution graph of n100 on one layer. Meanwhile, Fig.11



Fig.10. Voltage distribution graph of test circuit n100 in 2D.



Fig.11. Example of the process of iteration in voltage-island generation using test circuit n100 in 3D. Different grey levels denote different supply voltages. The dark ones denote higher voltage and the light ones mean lower voltage. (a) Result from models not considering temperature influence. (b) First cycle considering temperature impact. (c) Final result obtained by our thermalaware models.

<sup>&</sup>lt;sup>(2)</sup>SparseLib, http://math.nist.gov/sparselib++/, July 2012.

<sup>&</sup>lt;sup>(3)</sup>GSRC Floorplan Benchmark Suite, http://vlsicad.eecs.umich.edu/BK/GSRCbench/HARD/, July 2012.

shows the corresponding voltage distribution graph of n100 in 3D chips with the same cycle-time constraint. Different grey levels represent different supplied voltages. Fig.11(a) is obtained without considering temperature impact. Fig.11(b) and Fig.11(c) show our iterative process and the corresponding generated voltage distribution respectively. The detailed iterative process will be given in the following part. From the results we can see that, in 2D designs, because of the long wirelength, less blocks are assigned with the lower voltages, which causes more power consumption compared with 3D.

Also, Table 1 reports the comparison between 2D and 3D voltage-island generation in terms of power (P), maximum temperature (T), delay (D) and area (A). We can verify that, in some aspects, there indeed exist some advantages in 3D design, such as high performance (20% improvement), low power consumption (4% reduction) and small area. However, we also notice that the thermal issue is critical and the maximal onchip temperature increases about 116% from 2D to 3D. The high temperature also limits the power reduction in 3D designs. As shown in Table 1, though the power consumption can be saved for n100, n200 and n300, for ami33 and ami49, due to the hot spot generated on the top layer, low voltage values are assigned to hot blocks, therefore, the optimization of power consumption will be traded off to gain thermal balance.

## 5.2 Effects of Iterative Optimization

To verify the effects of our thermal-aware iterative optimization method, we run different applications using two kinds of models, one considering temperature influence and the other does not. Table 2 shows their average achievement comparison with original 3D benchmarks. Besides the power value (P) and temperature (T), we also compare the runtime (RT) and the power network resources estimation (PNRR) defined in Subsection 2. From the table we can clearly figure out that our thermal-aware optimization can help achieve further thermal reduction. The temperature can be reduced by 9.49% by the MSV design without consideration of thermal effects. But with thermal effects considered, our approach can reduce the maximal temperature by about 23.64%. Besides, there are other benefits such as the saving of PNRR while the performance is still maintained.

From the results for n100 in Fig.11, we can clearly see the effects on individual layers using our thermalaware iterative process. By redistributing the supply voltage of blocks, more blocks on higher layer are assigned with lower voltage; while it is the opposite situation on the lower layer that more power consumption is pushed to the lower layer and the power value on the top layer with higher resistance is reduced. In this way, we can obtain temperature reduction.

## 5.3 Trade-Off Between Performance and Temperature

To capture the influence of actual cycletime to the thermal-driven models, we carry out another experiment. We set different limits for  $T_{\rm cycle}$  to the same circuit. As shown in Table 3, for circuit n100, with the increase cycletime from 150 ms to 180 ms, our thermaldriven models can achieve further temperature reduction by 7% making use of the timing slacks, with 40.5%

Bench-2D3DComparison (3D-2D)/2D (%) mark  $P(\mu w)$ Max. D (ms) $A \ (\mu m^2)$  $P(\mu w)$ Max.  $D \,(\mathrm{ms})$  $A \ (\mu m^2)$  $\mathbf{P}$ Max. DΑ  $T (^{\circ}C)$  $T (^{\circ}C)$  $T (^{\circ}C)$ ami33 $2\,578\,457$ 110.239.99  $1\,220\,110$  $2\,952\,740.0$ 247.335.82 $358\,062$ 14.51124.4-10.44-70.65ami49 48 541 600 95.7169.9539650800 53855690.0 208.399.60 12309800 10.95117.6-41.40-68.95n100  $352\,142$ 133.1193.13 205 332  $233\,956.9$ 288.4149.4150388 -33.56116.6-22.64-75.46n200  $413\,652$ 128.5239.80 291 973 367 686.0 279.0199.72 $59\,450$ -11.11 117.1-16.71-79.64n300  $499\,564$ 137.0283.60 $346\,895$  $487\,813.0$ 283.2249.98 80 6 47 -2.35106.7-11.85-76.75Average -4.31116.5-20.69-74.29

Table 1. Results of Voltage-Island Generation in 2D and 3D

Table 2. Results of Iterative Optimization Process with/Without Consideration of Thermal-Issue in 3D

| Bench-                | Without Considering Thermal Influence |                   |        |           | Considering Thermal Influence |                   |        |           |       | $\Delta$ PNRR |
|-----------------------|---------------------------------------|-------------------|--------|-----------|-------------------------------|-------------------|--------|-----------|-------|---------------|
| $\operatorname{mark}$ | $\Delta P$ (%)                        | $\Delta T \ (\%)$ | RT(s)  | PNRR (mm) | $\Delta P$ (%)                | $\Delta T \ (\%)$ | RT(s)  | PNRR (mm) | . (%) | (%)           |
| ami33                 | -6.41                                 | -8.37             | 0.35   | 396376    | -6.33                         | -12.60            | 0.60   | 371771    | 71.43 | -6.21         |
| ami49                 | -5.68                                 | -7.39             | 1.62   | 417389    | -5.36                         | -20.67            | 2.03   | 389739    | 25.31 | -6.63         |
| n100                  | -8.33                                 | -10.54            | 10.45  | 191973    | -7.27                         | -27.56            | 17.22  | 141249    | 64.78 | -26.42        |
| n200                  | -8.00                                 | -9.05             | 103.67 | 300647    | -7.20                         | -27.51            | 152.50 | 236693    | 47.10 | -21.27        |
| n300                  | -9.14                                 | -12.09            | 189.98 | 429197    | -7.98                         | -29.89            | 231.80 | 330388    | 22.01 | -23.02        |
| Average               | -7.51                                 | -9.49             | -      | -         | -6.83                         | -23.64            | -      | -         | 46.13 | -16.71        |

|           | Cycletime  |                 |       |        |  |            |                 |            |        |   |                       |                        |
|-----------|------------|-----------------|-------|--------|--|------------|-----------------|------------|--------|---|-----------------------|------------------------|
| Layer     | 150        |                 |       |        |  |            | 18              | Comparison |        |   |                       |                        |
|           | $P(\mu W)$ | $T (^{\circ}C)$ | RT(s) | D (ms) |  | $P(\mu W)$ | $T (^{\circ}C)$ | RT(s)      | D (ms) | - | $\Delta P \; (\mu W)$ | $\Delta T (^{\circ}C)$ |
| Layer 0   | 49252.5    | 144.4           | 23.7  | 149.4  |  | 49252.5    | 100.2           | 33.3       | 172.6  |   | 0.0                   | -30.62                 |
| Layer 1   | 63744.2    | 198.0           | 23.7  | 149.4  |  | 63744.2    | 175.1           | 33.3       | 172.6  |   | 0.0                   | -11.80                 |
| Layer 2   | 58800.0    | 201.2           | 23.7  | 149.4  |  | 41209.0    | 186.5           | 33.3       | 172.6  |   | -29.9                 | -7.30                  |
| Layer 3   | 62160.2    | 213.8           | 23.7  | 149.4  |  | 41440.1    | 198.7           | 33.3       | 172.6  |   | -33.3                 | -7.00                  |
| Wholechip | 233956.9   | 213.8           | 23.7  | 149.4  |  | 195645.8   | 198.7           | 33.3       | 172.6  |   | -17.2                 | -7.00                  |

Table 3. Results for the Influence of Different Cycletime in 3D with Test Circuit n100

runtime increments. The reason for this gain is that we can assign lower voltage to more blocks if the timing constraints can be relaxed.

## 6 Conclusions

In this paper, we proposed a thermal-aware voltageisland generation methodology to optimize the thermal distribution and power-network routing resources in 3D ICs while the performance would not be compromised. Mixed integer linear programming (MILP) models were devised for thermal reduction in the process of voltage-island generation. The interdependency between the temperature distribution and power density distribution makes the voltage-island generation problem a chicken-egg problem. Therefore, we proposed an iterative process based on MILP formulation to tackle this thermal-power interdependency with various voltage assignments. With the effective formulations based on MILP, our iterative approach can achieve trade-off between thermal optimization and other objectives considering various constraints for 3D ICs simultaneously. Experimental results show that our methodology, considering the impact of temperature, can achieve 23.64% reduction of maximal on-chip temperature while the method not considering temperature influence can only reduce 9.79% maximal on-chip temperature. At the same time 16.71% of PNRR reduction can be obtained by our approach. In our future work, we could propose an incremental flow to further conquer the dependency between layout optimization and voltage assignment.

#### References

- Hua H, Mineo C, Schoenfliess K, Sule A, Melamed S, Jenkal R, Davis W R. Exploring compromises among timing, power and temperature in three-dimensional integrated circuits. In *Proc. the 43th Design Automation Conference*, June 2006, pp.997-1002.
- [2] Hu J, Shin Y, Dhanwada N, Marculescu R. Architecting voltage-islands in core based system-on-a-chip designs. In *Proc. Int. Symp. Low Power Electronics and Design*, August 2004, pp.180-185.
- [3] Hung W, Link G, Xie Y, Vijaykrishnan N, Dhanwadaf N, Conner J. Temperature-aware voltage islands architecting in system-on-chip design. In Proc. IEEE International Conference on Computer Design, October 2005, pp.689-696.
- [4] Mak W K, Chen J W. Voltage island generation under per-

formance requirement for SoC designs. In *Proc. Asia and South Pacific Design Automation Conference*, January 2007, pp.798-803.

- [5] Lee W P, Liu H Y, Chang Y W. Voltage island aware floorplanning for power and timing optimization. In Proc. International Conference on Computer-Aided Design, November 2006, pp.389-394.
- [6] Lee W P, Liu H Y, Chang Y W. An ILP algorithm for postfloorplanning votage-island generation considering powernetwork planning. In Proc. International Conference on Computer-Aided Design, November 2007, pp.650-655.
- [7] Ma Y C, Qiu X, He X Q, Hong X L. Incremental power optimization for multiple supply voltage design. In Proc. International Symposium on Quality of Electronic Design, March 2009, pp.280-286.
- [8] Yu S A, Huang P Y, Lee Y M. A multiple supply voltage based power reduction method in 3-D ICs considering process variations and thermal effects. In Proc. Asia and South Pacific Design Automation Conference, January 2009, pp.55-60.
- [9] Cong J, Wei J, Zhang Y. A thermal-driven floorplanning algorithm for 3D ICs. In Proc. International Conference on Computer-Aided Design, November 2004, pp.306-313.
- [10] Hung W L, Link G M, Xie Y, Vijaykrishnan N, Irwin M J. Interconnect and thermal-aware floorplanning for 3D microprocessors. In Proc. the 7th International Symposium on Quality Electronic Design, March 2006, pp.98-104.
- [11] Yu H, Ho J, He L. Allocating power ground vias in 3D ICs for simultaneous power and thermal integrity. ACM Transactions on Design Automation of Electronic Systems (TO-DAES), 2009, 14(3): Article No. 41.
- [12] Yu H, Shi Y, He L, Karnik T. Thermal via allocation for 3D ICs considering temporally and spatially variant thermal power. *IEEE Transactions on Very Large Scale Integration* Systems (TVLSI), 2008, 16(12): 1609-1619.
- [13] Wilkerson P, Raman A, Turowski M. Fast, automated thermal simulation of three-dimensional integrated circuits. In Proc. the 9th Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, June 2004, pp.706-713.
- [14] Cong J, Zhang Y. Thermal via planning for 3-D ICs. In Proc. International Conference on Computer-Aided Design, November 2005, pp.745-752.
- [15] Cong J, Zhang Y. Thermal-driven multilevel routing for 3-D ICs. In Proc. Asia and South Pacific Design Automation Conference, January 2005, pp.121-126.
- [16] Cheng Y, Tsai C, Teng C, Kang S. Electrothermal Analysis of VLSI Systems. Springer, 2000.
- [17] Ma Y C, Li X, Wang Y, Hong X L. Thermal-aware incremental floorplanning for 3D ICs based on MILP formulation. *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences*, 2009, E92-A(12): 2979-2989.

Ning Xu et al.: Thermal-Aware Post Layout Voltage-Island Generation for 3D ICs



Ning Xu received his Ph.D. degree in electronic science and technology from the University of Electronic Science and Technology of China in 2003. Later, he was a postdoctoral fellow with Tsinghua University from 2003 to 2005. Currently, he is a professor at the Computer Science Department of Wuhan University of Technology. Dr. Xu's research in-

terests include computer-aided design of VLSI circuits and systems, computer architectures, data mining, and highly combinatorial optimization algorithms.



Yu-Chun Ma received her B.S. degree in computer science from Xi'an Jiaotong University in 1999, and Ph.D. degree with honors in computer science from Tsinghua University in the summer of 2004, and continued her research as a postdoctoral scholar in Tsinghua University till the summer of 2006. Currently, she is an associate professor in Ts-

inghua University. She has published more than 80 research papers in the areas of physical design and synthesis for VLSI, 3D IC, etc. Dr. Ma is a recipient of the best paper awards in ASICON2007, ISVLSI2012 and the best paper nomination in ASP-DAC2010.





Jia Liu received her B.S. and M.S. degrees in computer science from Wuhan University of Technology in 2009 and 2012, respectively. From 2010 to 2011, she was a visiting student working in Tsinghua University. Her research interests include 3D IC design and physical optimization.

**Shou-Chun Tao** got his M.S. degree in computer science and technology from Wuhan University of Technology in 2012. In 2011, he was a visiting student working in Tsinghua University. His research interests include physical optimization algorithms and multiple supply voltage design optimization.