Standard cell placement is the problem of determining the coordinates of every movable cell of a design within a predefined chip area, organized into rows whose height equals that of the cells, such that no cells overlap and no cell extends beyond the chip boundaries. A typical placement procedure consists of three distinct steps: global placement, legalization and detailed placement.
During the global placement step, the cells are spread across the chip area so that one or more target functions, such as total interconnection wirelength, critical path length and timing, are optimized. In most cases the output is a design in which cells sit at their optimal positions with respect to the target functions, but overlaps and/or overflows remain, rendering the design unroutable and necessitating an additional legalization step. Figure 1 depicts the ibm01 benchmark circuit after global placement with mPL6. During legalization, cells are aligned to rows and overlaps are eliminated. Figure 2 illustrates a legalized version of the ibm01 benchmark circuit, where every cell is aligned to a specific row and all overlaps have been resolved. The main performance metrics for legalization algorithms are total wirelength, overall cell displacement and runtime. The wirelength of each net is modeled as the half-perimeter wirelength (HPWL) of the bounding box of its cells, while the total cell displacement is the sum, over all cells, of each cell's displacement along the x and y axes. Both metrics should be kept to a minimum, since wirelength directly affects the routability of the design and displacement indicates how far the solution has drifted from the optimal (but unrealizable) global placement. Finally, in detailed placement, heuristics apply minor alterations to the design that improve a specific metric, usually at the expense of another.
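The two quality metrics defined above can be computed directly. The sketch below is illustrative only: it assumes each cell is represented by a single pin location (for example its center) and that a net is a list of cell indices; the function names are hypothetical.

```python
# Hypothetical data layout: pos[i] = (x, y) pin location of cell i
# (here simply its center); a net is a list of cell indices.

def net_hpwl(pos, net):
    """Half-perimeter of the bounding box enclosing all cells of one net."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(pos, nets):
    """Total wirelength: HPWL accumulated over all nets."""
    return sum(net_hpwl(pos, net) for net in nets)


def total_displacement(global_pos, legal_pos):
    """Sum over cells of |dx| + |dy| between global and legalized positions."""
    return sum(abs(lx - gx) + abs(ly - gy)
               for (gx, gy), (lx, ly) in zip(global_pos, legal_pos))


if __name__ == "__main__":
    pos = [(0.0, 0.0), (3.0, 1.0), (1.0, 4.0)]
    print(total_hpwl(pos, [[0, 1], [0, 1, 2]]))  # prints 11.0
```

Both functions treat lower values as better, which is why legalizers try to keep both small at the same time.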
Figure 1. Cell placement by mPL6 (ibm01 benchmark circuit).
Figure 2. Cell placement by running Tetris over mPL6 (ibm01 benchmark circuit).
In this paper we introduce a scheme that improves the solution quality of two of the fastest existing legalization algorithms (Tetris and Abacus) while retaining, and even extending, their runtime advantage over more complex methods. The rest of the paper is organized as follows. Section 2 presents related work on global placement and legalization. Section 3 describes the algorithm, which is evaluated in Section 4. Finally, Section 5 summarizes the results and provides our concluding remarks.
2. Related Work
Standard cell placement has been at the forefront of academic research throughout the last decades. Early global placement approaches were based on simulated annealing, the most prominent being Timberwolf. Timberwolf's global placement procedure is divided into two separate stages because it incorporates a global routing routine. In the first stage, simulated annealing places the cells so as to minimize the total wirelength; in the second stage, feed-through cells are inserted (to complete the global routing) and simulated annealing is re-applied to re-place the cells and minimize total wirelength.
Min-cut partitioning has also been used to tackle the global placement problem. FengShui performs min-cut multi-way partitioning (using hMetis) to spread the cells throughout the available region. Capo, unlike FengShui, does not explicitly use multi-way partitioning; instead, the partitioning direction is dictated by placement feedback, and additional cutline-shifting functionality is incorporated.
A significant number of academic global placers formulate quadratic optimization problems. Gordian uses the connectivity between cells to formulate a series of quadratic programming problems: it iteratively partitions the available area and adds new constraints that restrict cell movement, while recalculating each cell's position to obtain minimum wirelength. FastPlace3 focuses on improving overall runtime at the expense of final solution quality, and POLAR achieves speedup by applying parallelization techniques throughout the formulation and solution of the problem.
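The core step shared by quadratic placers can be illustrated in one dimension. This is not Gordian itself (its partitioning and movement constraints are omitted); it is a minimal sketch, assuming two-pin weighted connections and at least one fixed pad per connected group of movable cells. Setting the gradient of the quadratic wirelength to zero places each movable cell at the weighted average of its neighbours, and Gauss-Seidel sweeps (chosen here for brevity; production placers typically use sparse conjugate-gradient solvers) solve the resulting linear system. All names are hypothetical.

```python
def quadratic_place_1d(n_movable, fixed, edges, iters=10_000, tol=1e-12):
    """Minimize sum(w * (x_u - x_v)**2) over two-pin connections in one dimension.

    n_movable: movable cells are indexed 0 .. n_movable-1
    fixed:     dict mapping a fixed pad name (str) to its coordinate
    edges:     list of (u, v, w); u and v are a movable index (int) or a pad name (str)
    """
    # Adjacency list of each movable cell: (neighbour, weight)
    adj = [[] for _ in range(n_movable)]
    for u, v, w in edges:
        if isinstance(u, int):
            adj[u].append((v, w))
        if isinstance(v, int):
            adj[v].append((u, w))

    x = [0.0] * n_movable

    def coord(node):
        return x[node] if isinstance(node, int) else fixed[node]

    # Zero gradient gives x_i = sum(w_ij * x_j) / sum(w_ij);
    # Gauss-Seidel sweeps update positions in place until they stop changing.
    for _ in range(iters):
        change = 0.0
        for i in range(n_movable):
            wsum = sum(w for _, w in adj[i])
            if wsum == 0:
                continue  # unconnected cell: any position is optimal
            new = sum(w * coord(j) for j, w in adj[i]) / wsum
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:
            break
    return x
```

For a chain of two movable cells between pads at 0 and 9 with unit weights, the cells converge to 3 and 6, i.e. evenly spaced, which is the quadratic-wirelength optimum. Such solutions typically cluster cells together, which is why placers add spreading constraints and a subsequent legalization step.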
Furthermore, force-directed methods have been applied to obtain better global placement results. Kraftwerk2 and ePlace create a set of forces that are applied to each movable node of the design. The goal is to spread the nodes until equilibrium is reached, at which point the target metric is optimized.
Significant work has also been published on legalization, the most important step of the placement procedure because of its direct connection to the manufacturability of the design. Tetris is one of the classic approaches to the legalization problem. It is a simple yet elegant solution that is used as an add-on in a plethora of placers, since it can generate a legal solution in reasonable runtime. Its simplicity has led to the development of various heuristics that significantly improve all legalization-related metrics (total half-perimeter wirelength, displacement, runtime). A two-phase overlap removal procedure has also been proposed: cells are first moved vertically until the row capacity constraints are met, and overlaps within each row are then removed through a topological shortest path calculation that achieves minimum perturbation and minimum half-perimeter wirelength.
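The greedy idea behind Tetris can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes each row is filled strictly left to right (a single "frontier" per row), that cells have the row height, and that cost is the Manhattan displacement |dx| + |dy|. Published Tetris variants differ in such details, and all names here are hypothetical.

```python
def tetris_legalize(cells, num_rows, row_height, chip_width):
    """Greedy Tetris-style legalization (illustrative sketch).

    cells: list of (x, y, width) lower-left corners from global placement
    Returns the legal (x, y) lower-left corner of every cell, in input order.
    """
    frontier = [0.0] * num_rows  # leftmost free x-coordinate of each row
    legal = [None] * len(cells)

    # Process cells in order of increasing x-coordinate
    for i in sorted(range(len(cells)), key=lambda c: cells[c][0]):
        gx, gy, w = cells[i]
        best = None
        for r in range(num_rows):
            # Place as close to gx as possible without overlapping the row
            # frontier or overflowing the right chip boundary
            x = max(frontier[r], min(gx, chip_width - w))
            if x + w > chip_width:
                continue  # row is full for this cell
            y = r * row_height
            cost = abs(x - gx) + abs(y - gy)
            if best is None or cost < best[0]:
                best = (cost, r, x, y)
        if best is None:
            raise ValueError(f"cell {i} does not fit in any row")
        _, r, x, y = best
        frontier[r] = x + w  # the cell permanently blocks this part of the row
        legal[i] = (x, y)
    return legal
```

Each cell is placed once and never revisited, which is what makes Tetris fast and also why early decisions can cost displacement for later cells.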
Abacus shares some common characteristics with Tetris. Cells are placed in order of their x-coordinate, but, in contrast to Tetris, when cells overlap within a row, a cluster of the cells involved is created and its ideal position is calculated by solving a quadratic problem, which allows previously placed cells to be re-placed. OAL further extends Abacus by using a linear instead of a quadratic wirelength model for measuring cell displacement, together with a differentiated cell insertion policy that places cells based on their width. Domocus builds on Abacus to provide a fast and efficient legalization scheme. Using coarse-grained parallelism, it divides the chip area into equally sized zones that can be processed independently, incurring minimal synchronization overhead. This parallelization scheme is also applicable to any full-fledged legalizer of this particular category (e.g. Tetris). Additionally, another fast legalization approach differs from previous algorithms by applying a fast row selection technique based on k-medoid clustering.
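The cluster mechanism of Abacus can be sketched for a single row. The sketch assumes cells already assigned to the row, a weighted quadratic displacement objective, and a total cell width that fits in the row; row selection (trial placement in each row, keeping the cheapest) is omitted. For a cluster, the unconstrained optimum is x_c = q_c / e_c, where e_c sums the cell weights and q_c sums each weight times the cell's global x minus its offset inside the cluster. Names are hypothetical.

```python
def abacus_place_row(cells, row_xmin, row_xmax):
    """Cluster-based placement of the cells of one row (Abacus-style sketch).

    cells: list of (gx, width, weight); gx is the global x of the lower-left corner
    Returns the legal x of every cell, in input order.
    """
    clusters = []  # each: x (position), e (sum of weights), q, w (width), members

    def collapse():
        # Move the last cluster to its optimum; merge with the previous one on overlap
        while True:
            c = clusters[-1]
            c["x"] = c["q"] / c["e"]  # minimizes sum e_i * (x_i - gx_i)^2
            c["x"] = min(max(c["x"], row_xmin), row_xmax - c["w"])
            if len(clusters) >= 2:
                p = clusters[-2]
                if p["x"] + p["w"] > c["x"]:
                    # Append cluster c to p; c's cells are offset by p's current width
                    p["q"] += c["q"] - c["e"] * p["w"]
                    p["e"] += c["e"]
                    p["w"] += c["w"]
                    p["members"].extend(c["members"])
                    clusters.pop()
                    continue  # the merged cluster may now overlap its predecessor
            break

    for i in sorted(range(len(cells)), key=lambda c: cells[c][0]):
        gx, w, e = cells[i]
        last = clusters[-1] if clusters else None
        if last is None or last["x"] + last["w"] <= gx:
            clusters.append({"x": gx, "e": e, "q": e * gx, "w": w, "members": [i]})
        else:
            # Overlap: add the cell to the end of the last cluster
            last["q"] += e * (gx - last["w"])
            last["e"] += e
            last["w"] += w
            last["members"].append(i)
        collapse()

    # Cells of a cluster are abutted from left to right
    result = [0.0] * len(cells)
    for c in clusters:
        x = c["x"]
        for i in c["members"]:
            result[i] = x
            x += cells[i][1]
    return result
```

For two unit-weight cells of width 2 at global x = 2 and x = 3, the cluster lands at 1.5, so both cells are displaced by 0.5 instead of one cell absorbing the whole overlap, which is the behaviour that distinguishes Abacus from Tetris.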
BONNPLACE partitions the placement area into bins and assigns cells to them. The bin assignment is balanced by an algorithm that realizes flow between bins, thereby eliminating overflowing bins. This algorithm is a slight modification of the successive shortest path algorithm that ensures only realizable flow augmentations are chosen, and that each one is realized before the next augmentation.
Finally, a history-based legalization scheme has also been proposed, in which legal placement solutions are generated through a min-cost problem formulation. Viable solutions are automatically translated into actual cell movements, whereas non-viable solutions are recorded in a history archive at each iteration so that similar flow realization attempts are avoided in the future.
3. Our Algorithm
This section presents an in-depth description of our algorithm. The algorithm is designed as a legalization add-on for global placers, but it can also be applied as a detailed placement heuristic. A typical standard cell placement framework is considered, meaning that all cells and rows have equal height and the solution consists of determining the cells' optimal locations while satisfying the legality constraints (no overlaps and no overflows).
Initially, all movable nodes are sorted by the x-coordinate of their lower-left corner, as in most academic legalizers. The algorithm then searches, for each cell, for the most cost-effective row in which to place it, where the insertion cost of a cell in a given row is defined as the cell's total displacement.
After a cell is inserted, every movable object that belongs to the same nets as that cell is moved by the same amount in the same direction. This policy applies only to objects that the shift does not push out of bounds and that have not already been moved in a previous iteration of the algorithm. The process terminates when all cells have been processed. Figure 3 shows the pseudocode of the algorithm.
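The steps just described can be sketched as follows. This is a hedged reconstruction from the prose, not the pseudocode of Figure 3. Assumptions (labeled in the code as well): within a row, cells are placed at the leftmost legal position at or after their x-coordinate (Tetris-style frontier), displacement is measured as |dx| + |dy|, the processing order is fixed by the initial sort, and "already moved" means already legalized or already dragged by a previous cell. The function name is hypothetical.

```python
def connectivity_legalize(cells, nets, num_rows, row_height, chip_width):
    """Connectivity-based legalization (hypothetical sketch, not the authors' code).

    cells: list of (x, y, width) lower-left corners from global placement
    nets:  list of nets, each a list of cell indices
    Returns the legal (x, y) lower-left corner of every cell, in input order.
    """
    pos = [[x, y] for x, y, _ in cells]  # working positions, updated by the drag step
    width = [w for _, _, w in cells]
    chip_height = num_rows * row_height

    # Cells that share at least one net with each cell
    neighbours = [set() for _ in cells]
    for net in nets:
        for a in net:
            neighbours[a].update(b for b in net if b != a)

    frontier = [0.0] * num_rows  # ASSUMPTION: each row is filled left to right
    moved = set()                # ASSUMPTION: legalized or already dragged cells
    legal = [None] * len(cells)

    # Step 1: order cells by the x-coordinate of their lower-left corner
    for i in sorted(range(len(cells)), key=lambda c: cells[c][0]):
        gx, gy = pos[i]
        w = width[i]

        # Step 2: choose the row with the smallest insertion cost (displacement)
        best = None
        for r in range(num_rows):
            x = max(frontier[r], min(gx, chip_width - w))
            if x + w > chip_width:
                continue
            y = r * row_height
            cost = abs(x - gx) + abs(y - gy)
            if best is None or cost < best[0]:
                best = (cost, r, x, y)
        if best is None:
            raise ValueError(f"cell {i} does not fit in any row")
        _, r, x, y = best
        frontier[r] = x + w
        legal[i] = (x, y)
        moved.add(i)

        # Step 3: drag every connected, not yet moved cell by the same vector,
        # unless the shift would push it outside the chip
        dx, dy = x - gx, y - gy
        for j in neighbours[i]:
            if j in moved:
                continue
            nx, ny = pos[j][0] + dx, pos[j][1] + dy
            if 0 <= nx <= chip_width - width[j] and 0 <= ny <= chip_height - row_height:
                pos[j] = [nx, ny]
                moved.add(j)
    return legal
```

With two connected cells at (0, 0.75) and (5, 0.5) in a two-row chip, the first cell moves up to row 1 and drags its neighbour up by 0.25, so the neighbour also lands in row 1; without the net, the same sketch places it in row 0, and the net's bounding box becomes taller.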
The main advantage of the proposed scheme is that it brings the connectivity between nodes into the decision process. Figure 4 illustrates the legalization of a small set of cells in a six-row circuit. Assume that cells 0, 1, 2, 3 and 4 form net N1, while cells 0, 4, 5 and 6 form net N2. The optimization goal of the algorithm is to reduce the total wirelength, which is measured for each net as follows: the minimum bounding rectangle that encloses all of its cells is calculated, and its half perimeter is accumulated. The red and blue boxes depict the HPWL of N1 and N2, respectively. Our methodology (Figure 4(d)) clearly results in a smaller HPWL than Abacus and Tetris (Figures 4(b) and 4(c)).
Figure 3. Pseudocode for the connectivity-based legalization scheme.
Figure 4. A legalization example for three different legalizers. (a) Initial placement; (b) Abacus legalizer; (c) Tetris legalizer; (d) Proposed legalizer.
4. Experimental Results
The ISPD05 benchmark circuits were used to evaluate the performance of our approach. Global placements were obtained with four commonly used placers, namely Gordian, NTUPlace3, mPL6 and FengShui. The designs were then legalized with our scheme, and the results were compared against a classic legalization algorithm (Tetris) and a state-of-the-art legalization algorithm (Abacus). For each circuit we recorded the percentage improvement of our algorithm (CB) over Tetris (T) and Abacus (A) as follows:
improvement (T) = (performance (T) − performance (CB))/performance (T)
improvement (A) = (performance (A) − performance (CB))/performance (A)
To characterize the overall performance on the entire dataset, we used the average improvement over all circuits. Experiments were executed on a Linux server with two 6-core Intel Xeon E5-2630 CPUs running at 2.3 GHz.
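The improvement formulas and their averaging can be written down directly. This small sketch assumes that every metric (HPWL, displacement, runtime) is lower-is-better, that the tables report 100 times the improvement ratio, and that the average is a plain arithmetic mean over circuits; the function names are hypothetical.

```python
def improvement(baseline, ours):
    """Relative improvement of CB over a baseline (T or A) for a lower-is-better metric."""
    return (baseline - ours) / baseline


def average_improvement(baselines, ours):
    """Arithmetic mean of per-circuit improvements, expressed as a percentage."""
    ratios = [improvement(b, o) for b, o in zip(baselines, ours)]
    return 100.0 * sum(ratios) / len(ratios)
```

A positive value means CB is better than the baseline; a negative value means it is worse. Averaging per-circuit ratios weights every circuit equally, regardless of its size.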
The performance gains achieved by our connectivity-based approach were significant. As observed in Tables 1-4, applying the connectivity-based legalization scheme on top of the four aforementioned global placers consistently produced better HPWL than applying Tetris. The same holds for execution time, which the proposed methodology drastically reduces. Specifically, in the case of Gordian global placement, our approach achieved improvements of 68.84% in HPWL and 79.77% in displacement over Tetris, and a 4.14% reduction in HPWL over Abacus. It is worth mentioning that our approach outperforms Abacus in runtime in every experiment performed.
Table 1. Performance improvement over Tetris and Abacus (Gordian global placer).
Table 2. Performance improvement over Tetris and Abacus (NTUPlace3 global placer).
Table 3. Performance improvement over Tetris and Abacus (mPL6 global placer).
Finally, Table 5 illustrates the average performance improvement of our approach over Tetris and Abacus.
Table 4. Performance improvement over Tetris and Abacus (FengShui global placer).
Table 5. Average performance improvement over Tetris and Abacus.
5. Conclusions
In this paper we have proposed a connectivity-based legalization scheme that produces higher-quality results than both Tetris and Abacus with competitive execution time. In effect, the proposed method delivers Abacus-level results in Tetris-level execution time, improving on both legalization approaches. More specifically, our approach improves HPWL by 67.46% on average over Tetris and by 1.77% over Abacus, while being at least two orders of magnitude better in execution time. We attribute this to the fact that our scheme takes cell connectivity into account, whereas the other approaches focus on minimizing displacement via dynamic programming.