Statistical analysis shows that, in a prototypical RISC machine, 72 percent of the instructions perform addition (or subtraction) in the datapath  . It is even reported to reach 80 percent in ARM processors . Therefore, binary addition is one of the most common arithmetic operations that a computer processor performs. According to the hardware design principal, make the common case fast , faster hardware for addition operation can achieve faster processor. Designing a faster binary adder is therefore an interesting research issue.
Researchers are thoroughly investigating the addition operation since the beginning of modern computing  and they are still working on it. Recently the architecture and performance of a new adder PASTA have been discussed by Rahman et al. , which uses recursive process for generating the final result. Though PASTA is an asynchronous adder and uses ripple fashioned carry propagation technique, its performance is compared with various reputed synchronous and asynchronous adders and shown competitive in all respect . The mentionable achievements of PASTA are: simple design (area and interconnection-wise equivalent to ripple carry adder), logarithmic average time performance and highly practical and efficient completion detection unit.
PASTA is a self-timed adder and it has a completion detection mechanism. An adder which can announce the completion of its operation can take the advantage of the shorter average-case propagation delay and, in turn, exhibit average case performance . Some recent studies      further investigated the architecture of PASTA but none of those studies changed its carry propagation technique. Though PASTA has appreciable average case performance, improving its worst case performance can make it more acceptable. This paper presents four different enhanced architectures of PASTA with modified carry propagation technique. This modification makes its worst case delay nearly half without adversely degrading its other performances. The proposed implementations are not only appropriate for asynchronous systems but also suitable for Globally Asynchronous Locally Synchronous (GALS) systems  because of their improved worst case performance and efficient completion detection mechanism. For the ease of explanation, the term Enhanced Parallel Self-Timed Adder (EPASTA) is used to indicate the proposed adder circuits.
Total nine different 16-bit adders are implemented. Five of them are existing adders and the remaining four are the four different varieties of the proposed EPASTA. The carry propagation delays of the proposed adders are compared with the delays of the existing adders.
2.1. Existing Adders
The basic architecture of an n-bit PASTA is adopted from  and it is illustrated in Figure 1. One can intuitively understand, by examining the architecture of
Figure 1. General block diagram of PASTA .
PASTA, that the carry propagation mechanism of this adder is similar to that of the basic ripple carry adder. Its operation is divided into two phases namely Initial Phase and Recursive Phase. The common selector (SEL) sends signal for all the 2 × 1 multiplexers. It determines the appropriate phase. In initial phase SEL = 0 and in recursive phase SEL = 1. In initial phase the multiplexer in the ith bit position allows ai and bi to go to the corresponding half adder and the half adder produces the initial values of the sum (Si) and the output carry ( ) bits. In the recursive phase, the feedback path allows the initial sum to be added to the input carry Ci recursively. The values of the sum and the output carry bits are recalculated for every recursion cycle. The recursion is terminated when the stopping criterion is met. The completion detection unit produces an asserted Terminate signal for indicating the completion of operation.
The lookahead-carry generation techniques of four well known tree-like parallel synchronous adders are used in the EPASTA. Chosen parallel synchronous adders are: Block Carry Lookahead Adder (BCLA), Kogge Stone Adder (KSA), Brent-Kung Adder (BKA) and Sklansky’s Conditional Sum Adder (SCSA). These adders are well known and have preferable worst-case performances. The basic structures of these adders were explained in    . Four types of sub-circuits are repeatedly used in these adders. For the ease of explanation, these sub-circuits are termed in this paper as modules and are represented by four different symbols. These sub-circuits and their corresponding symbols are shown in Figure 2. Here, i, j and k indicate the bit positions, where i ≥ j ≥ k. The X-module is used in all four types of adders. This module of the ith bit position computes the following outputs:
Here, ai and bi are the ith bit of the n-bit operands A and B respectively. The symbol, Ci represents input carry of ith position or the carry output of the (i-1)th position. For all these expressions . The BCLA uses the Y-module that computes the carry bits. It also computes the block-carry-propagate and the block-carry-generate signals as follows:
Figure 2. Four repeatedly used sub-circuits and their corresponding symbols.
where, and .
The other three adders (KSA, BKA and SCSA) use gray-module and white-module in addition with X-module. These adders also operate on the principle of Block Carry Propagate and Block Carry Generate  . The gray-module and the white-module are the subdivisions of the Y-module of BCLA. The gray-module computes the Block Carry Propagate and Block Carry Generate whereas the white-module computes the carry Cj. The internal circuits of the gray-module and the white-module are shown in Figure 2(c) and Figure 2(d) respectively.
2.2. Enhanced Parallel Self-Timed Adder (EPASTA)
The basic architecture of PASTA has a uniform design. An n-bit PASTA is constructed from n+1 similar blocks of circuitry. In the rest of this paper the circuitry of each block of PASTA is termed as a PASTA-block. Structure of a PASTA-block is shown in Figure 3(a). This PASTA-block is used in the proposed architecture. In order to increase the visibility of the EPASTA architecture, a symbol is defined to represent this PASTA-block as shown in Figure 3(b). The new symbol is termed as EP-module. For the sake of simplicity of the figure, the SEL terminal is not shown in the EP-module.
Carry propagation in PASTA is similar to that of ripple carry adder. Therefore the worst case computation time of PASTA is not so satisfactory. Adders having lower worst case computation time, computes carry earlier by using lookahead
Figure 3. PASTA-block and a symbol to represent it in the EPASTA.
carry generation techniques. Any carry can be computed by using only two levels of basic logic gates , but the size of the circuit increases exponentially with the size of the operands. For large operands (n > 4) it is impractical and inefficient to use two level of logic for carry computation because of the limitation of fan-in and fan-out, irregular structure, use of many long wires, etc.  . Therefore, the practical carry-lookahead schemas usually use tree-like circuits that have simple regular structures     . Four different architectures of EPASTA, having four different lookahead-carry generation schemes, are presented below.
The and the terminals of the EP-module, in the initial phase, can be expressed as and . These outputs are same as the , outputs of the X-module. In the first cycle of iterative phase, the Sum terminal of the EP-module will produce . This output is exactly same the Sum output of the X-module. Therefore, the EP-module is equivalent to the X-module. The EPASTA circuits are implemented by replacing the X-modules of the four selected tree-like parallel adder circuits. The first of the four architectures of 16-bit EPASTA, which is constructed by replacing the X-modules of BCLA with the EP-modules, is shown in Figure 4. In order to store the correct value of Cout an additional EP-module is attached after the most significant bit position. Moreover for sensing the completion of operation a completion detection unit similar to that of PASTA is attached. This version of EPASTA uses the lookahead-carry generator of BCLA for computing the carry bits.
It has been shown that the X-module and the EP-module are interchangeable. So, the second version of EPASTA, which uses the lookahead-carry generator of KSA, can be constructed easily by replacing the X-modules of KSA with the EP-modules. Similar to the previous version of EPASTA an additional EP-module should be added for storing the correct value of Cout. A completion detection unit should be added for sensing the completion of operation. Figure 5 illustrates the architecture of a16-bit EPASTA with lookahead-carry generator of KSA.
Similar procedure is followed to construct the other two versions of EPASTA. The X-modules of the BKA and SCSA are replaced by the EP-modules. Both the
Figure 4. Architecture of a 16-bit EPASTA with lookahead-carry generator of BCLA.
Figure 5. Architecture of a 16-bit EPASTA with lookahead-carry generator of KSA.
implementations require an additional EP-module for capturing the final value of Cout. The completion detection unit is also included in both the implementations for realizing the completion of operations. The EPASTA implementations with the lookahead-carry generators of BKA and SCSA are illustrated in Figure 6 and Figure 7 respectively.
Figure 6. Architecture of a 16-bit EPASTA with lookahead-carry generator of BKA.
Figure 7. Architecture of a 16-bit EPASTA with lookahead-carry generator of SCSA.
In this section the simulation results are presented for all the adders that are discussed in this paper. Though the illustrated adders are 16-bit adders, actually 32-bit versions of those adders are implemented for simulation. All the simulation are done by using an industry standard software tool and executed on 64 bit Linux platform. Simulation is performed for three different TSMC processes.
Three different types of adder operation are analyzed and those are: worst-case, best-case and average case. The best-case addition does not involve any carry propagation and hence incurs only a single bit adder delay for producing the result. The worst-case involves maximum carry propagation cascaded delays due to the propagation length of 32-bits. The average case shows how the separate carry propagation is limited within their individual propagation chain and can progress simultaneously with the other carry propagation chains. Some test-cases, representative of these cases, are chosen. The dataset used for this experiment is summarized in Table 1. The expected carry chain length for n-bit binary numbers is established in . A carry chain length indicates the maximum number of consecutive PASTA-blocks that propagate a carry bit (1). While adding two n-bit binary numbers, a carry chain length m indicates that a carry bit will propagate through maximum of m consecutive PASTA-blocks for at least once in the whole n-bit addition operation. Since carry chain length is a factor, five different random numbers are chosen. Among these numbers, three have maximum carry chain length of 5 and two have maximum carry chain length of 6. Thus they represent an average carry chain length of 5.4. The delay is measured at 70% transition point for the related signals.
The delay performances of different adders are shown in Table 2. It is divided into three parts to differentiate between conventional synchronous parallel adders, basic architecture of PASTA and four versions of EPASTA. The top part shows the results for conventional adders (i.e. BKA, BCLA, KSA and SCSA). Worst-case delay is important for these adders because they do not have any completion sensing mechanism. The adders in the middle and the bottom part
Table 1. Dataset for comparing different adders.
Table 2. Spice timing report for different 32-bit adders.
are asynchronous adders which have completion detection mechanism and whose complete operation is divided into two phase (initial phase and recursive phase). So for computing the delay of these adders the following relation is used:
Here tinitial represents the time required for the state transition of the initial phase and trecursive represents the delay between SEL and the Terminate signals. The value of ttotal is listed in Table 2. Since the major concern of this study is to analyze the performance of the proposed adders with respect to PASTA, the minimum delays of the asynchronous adders are shown in bold face.
Table 2 shows that all of the four proposed architectures of EPASTA give better worst-case delay compared to that of PASTA and the amount of improvement is minimum 3.45 nS and maximum 4.66 nS (i.e. minimum 45.25% and maximum 61.09%). Though, with respect to PASTA, the best-case and average case delays of EPASTA increase most of the time, the amount of this increment is not so high (maximum 1.18 nS).
Among the four proposed architectures, EPASTA with lookahead-carry generator of KSA gives best result for the worst case. For the other two cases (best case and average case), EPASTA with lookahead-carry generator of CLA performs better than the other architectures of EPASTA.
The major objective of this paper was to analyze the practicality of using the lookahead-carry generator with the newly introduced self-timed adder PASTA. Moreover, this study also investigates a probable solution to improve the worst case performance of PASTA. The results show that the proposed architecture gives better worst case performance than the basic architecture of PASTA without major compromise of the best and average case performances. Moreover, these results also support the practicality of the proposed architecture of self-timed adders.
 Franklin, M.A. and Pan, T. (1994) Performance Comparison of Asynchronous Adders. Proceedings of IEEE Symposium on Advanced Research in Asynchronous Circuits and Systems, Salt Lake City, 3-5 November 1994, 117-125. https://doi.org/10.1109/ASYNC.1994.656299
 Rahman, M.Z., Kleeman, L. and Habib, M.A. (2014) Recursive Approach to the Design of a Parallel Self-Timed Adder. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 23, 213-217. https://doi.org/10.1109/TVLSI.2014.2303809
 Sivakumar, M. and Omkumar, S. (2018) Design and FPGA Implementation of FBMC Transmitter by Using Clock Gating Technique based QAM, In verse FFT and Filter Bank for Low Power and High Speed Applications. Journal of Electrical Engineering & Technology, 13, 2479-2484.
 Sivakumar, M. and Omkumar, S. (2016) Integration of Optimized GDI Logic Based NOR Gate and Half Adder into PASTA for Low Power & Low Area Applications. International Journal of Applied Engineering Research, 11, 2629-2633.
 Kogge, P.M. and Stone, H.S. (1973) A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Transactions on Computers, 100, 786-793.
 Ngai, T.F., Irwin, M.J. and Rawat, S. (1986) Regular Area-Time Efficient Carry-Lookahead Adders. Journal of Parallel and Distributed Computing, 3, 92-105.