Received 18 February 2016; accepted 15 March 2016; published 27 June 2016
1. Introduction
Modern portable digital electronics face trade-offs between performance, power consumption, and supply-voltage requirements. Where high performance is required, designs assume a sufficient energy supply and a stable power source that maintains a constant supply voltage throughout system operation. The supply voltage is then set above the transistor’s threshold voltage, which is called above-threshold operation. In low-power and ultra-low-voltage (ULV) applications, however, the energy supply is strictly limited. Such systems benefit from techniques for active energy minimization and standby-power reduction, such as dynamic voltage scaling, forced transistor stacking and power gating. ULV systems target applications in which the energy supply is so limited that it is very difficult or even impractical to maintain high supply voltages. Examples include wearable electronics, intelligent remote sensors, implantable medical devices and energy-harvesting systems. By scaling the supply voltage below the device threshold voltage, sub-threshold circuit design can reduce the energy per cycle by one or more orders of magnitude. This makes it well suited to these emerging energy-constrained applications.
Device dimensions and supply voltage are scaled aggressively to achieve greater transistor density and lower power consumption. This scaling degrades the speed of logic circuits because the effective gate-source voltage of the transistors is reduced. On the one hand, the ever-growing market for portable electronic devices demands low-power building blocks that enable long-lasting battery-operated systems. On the other hand, modern high-performance processing applications need high throughput, which drives operating frequencies and circuit complexity upward and requires very high-speed circuits. Current digital design techniques do not offer logic circuits that can reliably operate at deep sub-threshold voltages in high-speed applications. Hence, operability is the primary goal when constructing ULV systems for high-speed applications. Domino logic is a high-performance circuit configuration. It is usually embedded in a static logic environment and tightly coupled with the clocking scheme. Domino CMOS has become the prevailing logic family for high-performance CMOS applications and is extensively used in high-speed processors. It offers several advantages over static CMOS logic: higher speed, reduced transistor count (resulting in reduced die area) and hazard-free operation.
Gates based on the semi-floating-gate (SFG) technique have been proposed for ULV NP-domino logic structures. Ultra-low-voltage SFG (ULVSFG) logic implemented in a modern CMOS process requires frequent initialization to avoid significant leakage. The input signals are applied through input capacitors (Cin) to the gates of the evaluation transistors (EN). As a result, these nodes (SFG nodes) can reach a voltage level higher than the power supply voltage (VDD). The main aim is to increase the current through the evaluation transistors (EN) and so obtain higher speed in the evaluation phase. In this paper, we use the SFG concept to speed up conventional dual-rail logic. We compare the performance of the designed SFG dual-rail NOR gate with that of the conventional dual-rail NOR gate, and simulation results show significant speed improvements. Furthermore, we discuss a new keeper transistor structure that improves the stability of the gate. It holds the voltage of the floating-gate node and the output node when the input signal is delayed. This matters especially when these gates are used in a chain of gates in a large system.
This paper is organized as follows. Section 2 briefly introduces conventional dynamic single-rail and dual-rail domino logic and discusses the ULVSFG inverter and NOR gates. Section 3 presents the proposed ULVSFG dual-rail domino NOR gates and elaborates on the delay and stability of the new logic gate. Section 4 gives and compares simulation results for the different NOR gates. Section 5 concludes the paper.
2. Domino Dynamic ULV Logic
2.1. Conventional Dynamic and Dual-Rail Domino NOR
Figure 1 shows the conventional dynamic, precharge-to-1, single-rail and dual-rail NOR gates. This type of domino logic is widely used in high-speed applications such as high-speed processors (e.g., the 1-GHz, 0.75 W ARM Cortex-A8 designed by INTRINSITY). It has been studied in detail in many papers and books. Domino logic gates operate in two phases, “precharge” and “evaluate”. Compared to a simple static CMOS NOR, dynamic domino logic achieves higher speed at the cost of higher power consumption. However, a major limitation of single-rail domino logic is that only non-inverting logic can be implemented, and this restriction has limited the widespread use of pure domino logic. Dual-rail domino logic overcomes this limitation by providing true and complemented outputs. The cost is approximately double the area and power consumption.
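The monotonicity restriction, and how dual-rail encoding lifts it, can be illustrated with a small logic-level model. This is a hedged sketch, not a circuit simulation. A signal is modeled as a list of 0/1 samples taken during one evaluation phase, and all function names (`is_monotonic_rising`, `dual_rail_trace`, and so on) are hypothetical. A single-rail inversion turns a rising edge into a falling one, which a following domino stage cannot accept. A dual-rail inversion is only a swap of the two rails, and both rails stay monotonic.

```python
from typing import List, Tuple

# A signal sampled during one evaluation phase (logic levels 0/1).
Trace = List[int]


def is_monotonic_rising(trace: Trace) -> bool:
    """True if the trace never makes a 1 -> 0 transition (valid domino input)."""
    return all(a <= b for a, b in zip(trace, trace[1:]))


def static_invert(trace: Trace) -> Trace:
    """Single-rail inversion: a rising edge becomes a falling edge."""
    return [1 - v for v in trace]


def dual_rail_trace(value: int, arrival: int, length: int) -> Tuple[Trace, Trace]:
    """Dual-rail encoding of `value` that becomes valid at sample `arrival`.

    Both rails stay at 0 until `arrival`; afterwards exactly one rail is 1:
    the true rail if value == 1, otherwise the complementary rail.
    """
    true_rail = [value if t >= arrival else 0 for t in range(length)]
    comp_rail = [1 - value if t >= arrival else 0 for t in range(length)]
    return true_rail, comp_rail


def dual_rail_invert(rails: Tuple[Trace, Trace]) -> Tuple[Trace, Trace]:
    """Dual-rail inversion: simply swap the true and complementary rails."""
    true_rail, comp_rail = rails
    return comp_rail, true_rail


if __name__ == "__main__":
    rising = [0, 0, 1, 1]
    print(is_monotonic_rising(static_invert(rising)))  # False: falling edge
    inverted = dual_rail_invert(dual_rail_trace(value=1, arrival=2, length=4))
    print(all(is_monotonic_rising(r) for r in inverted))  # True
```

The model keeps only the property that matters here. Every rail either stays at 0 or makes a single 0-to-1 transition, so inverting is a wiring change rather than an extra logic stage.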
2.2. Semi-Floating-Gate Domino Logic
Figure 2(a) shows the dynamic, precharge-to-1, single-rail inverter using the SFG technique. This kind of ULV NP-domino logic has been introduced and discussed in detail in several papers. The main purpose of the ULVSFG domino logic style is to increase the current level of the transistors at low supply voltages without increasing the transistor widths. Compared to complementary static CMOS, the current level and speed can be increased by using different initialization voltages for the gates and applying capacitively coupled inputs. In these topologies, the Voffset+ pins are connected to VDD and the Voffset- pins are connected to GND. The high-speed N-type ULVSFG domino NOR gate (precharge to 1) is shown in Figure 2(b). The clock signals serve two roles. They are the control signals for the recharge transistors RP and RN, and they are the reference signals for the NMOS evaluation transistors EN. Both the inverter and the NOR gate shown in Figure 2 operate in two phases, called “precharge” and “evaluate”, and follow the sequence of the normal NP-domino logic style. The virtual ground signal (Virt.GND) is synchronized with the clock signal. The transistors driving it are sized to supply the current needed through the EN transistors during the evaluation phase.
Figure 1. Conventional precharge-to-1 dynamic NOR gates: (a) single-rail; (b) dual-rail.
Figure 2. Simple precharge-to-1 ULVSFG gates: (a) inverter; (b) NOR gate.
When CK is low (0) and Virt.GND is high (1), both the inverter and the NOR gate are in the precharge phase. During this phase, the RP transistors turn on and recharge the gates of the EN transistors to VDD. Meanwhile, the RN transistors turn on and pull the gates of the EP transistors to 0, so the EP transistors turn on and precharge the output nodes to VDD. The keeper transistors, KN and KP, are inactive during this phase for two reasons. The output node is precharged to VDD, and, since the gates follow NP-domino logic, the input signals are low (0).
In the evaluation phase of both the inverter and the NOR gate shown in Figure 2, the clock signal CK switches from 0 to 1 and Virt.GND = 0. Both recharge transistors RP and RN switch off, and the charge on the gate terminals (VP and VN) becomes semi-floating. The output nodes remain high until an input transition occurs. The input signals (IN) must rise monotonically to ensure correct operation of N-type domino logic. This is satisfied only if IN is low at the beginning of the evaluation phase and makes at most a single 0-to-1 transition during that phase. When this transition happens (IN rises from 0 to 1), capacitive coupling from the input node to the SFG node raises the voltage of the semi-floating gates (VN) well above VDD. This increases the current of the EN devices and speeds up the evaluation. In this case, the keeper transistors KN are turned off and the voltage of the semi-floating node (VN) remains stable. The keeper transistors KP, in turn, are turned on and raise the gate voltage of the EP transistors (VP), eventually turning the EP transistors off. This reduces the static current, which directly affects the noise margin and the power consumption of the logic. In the second scenario (no change at the input), the output remains high. In this case, the keeper transistors KN keep discharging the VN nodes and thereby turn the EN transistors off. Meanwhile, the keeper transistors KP remain off and the VP nodes float. The main purpose of the KN keeper transistors is therefore to turn off the evaluation transistors (EN) when there is no rising input edge (the input signals remain 0). This minimizes current dissipation during the evaluation phase and significantly reduces the static power consumption.
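The boost of the SFG node described above can be estimated from charge conservation on the floating node. The sketch below makes three simplifying assumptions. The SFG node is treated as ideally floating. Its capacitance is modeled as the input capacitor Cin plus a lumped parasitic capacitance Cpar to ground, so that an input step ΔVin shifts the node by ΔVin·Cin/(Cin + Cpar). Leakage and keeper currents are ignored. The values Cin = 2 fF and VDD = 300 mV are those used in Section 4. Cpar = 1 fF is purely a hypothetical value.

```python
def coupling_ratio(c_in: float, c_par: float) -> float:
    """Fraction of an input step that reaches a floating node (capacitive divider)."""
    if c_in <= 0 or c_par < 0:
        raise ValueError("c_in must be positive and c_par non-negative")
    return c_in / (c_in + c_par)


def sfg_node_voltage(v_init: float, dv_in: float, c_in: float, c_par: float) -> float:
    """SFG node voltage after the input steps by dv_in.

    Charge conservation on an ideally floating node gives
        dV_N = dv_in * c_in / (c_in + c_par).
    Leakage and keeper currents are ignored.
    """
    return v_init + dv_in * coupling_ratio(c_in, c_par)


if __name__ == "__main__":
    VDD = 0.3        # supply voltage (V), as in the simulations
    C_IN = 2e-15     # input capacitor (F), as in the simulations
    C_PAR = 1e-15    # parasitic capacitance (F): hypothetical value
    # Node precharged to VDD, input rising from 0 to VDD:
    print(f"{sfg_node_voltage(VDD, VDD, C_IN, C_PAR):.2f} V")       # 0.50 V
    # Same input edge, but node already discharged to VDD/2:
    print(f"{sfg_node_voltage(VDD / 2, VDD, C_IN, C_PAR):.2f} V")   # 0.35 V
```

With these assumed values, a node precharged to VDD is boosted to about 500 mV. Starting from VDD/2, the same input edge only lifts it to about 350 mV. This is qualitatively the effect of a partially discharged SFG node discussed in Section 4.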
The ULVSFG logic demonstrates significant speed improvements over conventional static CMOS logic. However, as mentioned before, a major limitation of single-rail domino logic is that only non-inverting logic can be implemented. As a solution, we propose a dual-rail version of the ULVSFG NOR gate in the next section.
3. Proposed Dual-Rail NOR Gates
As mentioned before, the NOR gate shown in Figure 2(b) is single-rail logic. Although it is fast compared to static CMOS logic, it is not sufficient for implementing inverting logic.
3.1. Dual-Rail SFG NOR Gate
Figure 3 shows the proposed ULVSFG, precharge-to-1, dual-rail, NP-domino NOR gate. Its operation is similar to that of the single-rail version shown in Figure 2(b). The circuit receives both the true (A, B) and the complementary (Ā, B̄) versions of the input signals. It produces both the true (Q) and the complementary (Q̄) output signals and follows the sequence of normal NP-domino logic (precharge and evaluate phases). As in the single-rail version, the SFG technique is used to boost the current of the EN transistors in the evaluate phase. In the precharge phase, as discussed before, the output nodes are charged to VDD by turning the RN and EP devices on, and the gates of the EN devices are charged to VDD by turning the RP devices on. The gate is an NP-domino type, as in the single-rail version. All input signals, including the complementary versions, therefore come from preceding dual-rail domino gates, which are precharged to 0. These inputs are low during the precharge phase and make a conditional 0-to-1 transition during evaluation. In the evaluate phase, each input signal either remains at 0 or rises to the high logic level (VDD). When a true input signal (A, B) remains low (0), its complementary version goes high. When the true signal goes from 0 to 1, its complementary version remains low (0). Both Q and Q̄ are precharged to 1. In the evaluation phase, each output either remains at 1 or goes low (0), depending on the input signals. The functionality of the gate is therefore quite similar to that of the conventional dual-rail gate.
Figure 3. Dual-rail ULVSFG NOR gate.
In this topology, the role of the keeper transistors (KN and KP) is the same as in the single-rail version discussed in Section 2. The KN transistors remain off when there is a rising input edge. When there is no rising input edge (the input signals remain 0), the KN devices turn on and discharge the floating nodes (VN). This turns the evaluation transistors (EN) off and significantly reduces current dissipation, and hence static power consumption, during the evaluation phase. Simulation results show that in the evaluation phase a falling transition (1 to 0) at the output Q takes less than 50 ps. The same transition in the conventional dual-rail NOR gate shown in Figure 1(b) takes 1.7 ns. In both circuits, however, the falling transition (1 to 0) on the Q̄ side is slower than on the Q side, because both circuits use stacked (cascode) transistors in the Q̄ path. For the proposed NOR gate, the falling transition at Q̄ takes 140 ps, whereas for the conventional NOR gate shown in Figure 1(b) it takes 2.4 ns. Note, however, that structures using the SFG technique (both single- and dual-rail versions) are sensitive to the delay of the input signals. If the input signals rise sufficiently late, the voltages on the SFG nodes are discharged by the ON currents of the keeper transistors (KN). The leakage currents of the devices connected to the SFG nodes discharge them as well. The structures then lose the benefit of a voltage above VDD on the SFG nodes. This causes a significant speed reduction in the evaluation phase and can even make the gate fail for a delayed input signal. This condition arises when these structures are used in circuits with a large logic depth.
The timing of the rising input signals is therefore important for the proper operation of structures that use the floating-gate technique. If the voltages of the SFG nodes are lower than VDD, the EN transistors will not be able to pull the output nodes down to GND. This timing issue imposes a constraint in the form of a valid time window for a delayed input edge. The evaluation speed of the SFG gates depends on the delay of the input edge. This delay affects the next gate and all gates in a chain, which limits the number of cascaded gates. Simulation results show that the evaluation delay of the proposed SFG dual-rail NOR gate is approximately 50 ps at a 300 mV power supply when the input edge is delayed by less than 1 ns. For a 1.7 ns delay of the input signal, the evaluation delay becomes more than 80 ps. Furthermore, the simulated data show that the swing will not be sufficient for high-speed operation.
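The time window for a delayed input edge can be illustrated with a deliberately simple model. While it waits for the input edge, the SFG node is assumed to decay exponentially from VDD with a single time constant τ. This lumps the keeper ON current and the leakage into one RC-like term. The input edge then adds a fixed boost. The time constant, the boost and the required node voltage are hypothetical parameters and are not fitted to the simulations reported here. Real keeper and leakage currents are nonlinear, so this model only shows the trend.

```python
import math
from typing import Optional


def sfg_voltage_after_wait(v0: float, wait: float, tau: float) -> float:
    """SFG node voltage after waiting `wait` seconds for the input edge,
    assuming first-order exponential decay with time constant `tau`."""
    return v0 * math.exp(-wait / tau)


def max_input_delay(v0: float, boost: float, v_required: float,
                    tau: float) -> Optional[float]:
    """Longest input-edge delay for which the boosted SFG node still reaches
    `v_required`, i.e. the largest t with v0 * exp(-t / tau) + boost >= v_required.

    Returns math.inf if the boost alone suffices, and None if the requirement
    cannot be met even with zero delay.
    """
    margin = v_required - boost       # voltage the node itself must still hold
    if margin <= 0:
        return math.inf
    if margin > v0:
        return None
    return tau * math.log(v0 / margin)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the trend.
    t_max = max_input_delay(v0=0.3, boost=0.2, v_required=0.45, tau=5e-9)
    print(f"max input delay ~ {t_max * 1e9:.2f} ns")  # ~0.91 ns
```

In this simplified picture, a faster node discharge (smaller τ) or a higher required voltage shrinks the window. This is consistent with the qualitative argument above that delayed edges limit the logic depth.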
3.2. Modified Dual-Rail SFG Domino NOR Gate
To make the proposed ULVSFG NOR gate more robust and less dependent on the delay of the input edge, a new keeper structure is needed. In this structure, the keeper devices should remain off until the input signals rise. They should then turn on conditionally, depending on the transitions at the output nodes. One way to achieve this is to drive the drain terminals of the keeper devices (KN) with signals that are linked to the timing of the input signals. These signals should be precharged to 1 when the output of the ULV gate is precharged to 1. They should switch to 0 if both input signals (A and B) remain 0 for the entire evaluation phase. Figure 4 shows the modified version of the dual-rail SFG NOR gate, which is tolerant to the delay of the input signal. In this topology, the drain terminals of the keeper devices (KN) are connected to the output nodes of the complementary side instead of to the signal used in the NOR gate of Figure 3. In this modified version, the keeper devices KN1 and KN2 turn ON if both input signals A and B remain low. In that case, both Ā and B̄ go high (1) and, consequently, a high-to-low transition occurs at Q̄. In this condition, KN3 and KN4 are OFF and Q remains high. On the other hand, if there is at least one rising transition on the input signals (A, B), a high-to-low transition occurs at the NOR output (Q), and KN1 and KN2 are OFF. In this case, KN3 and KN4 turn ON and discharge the SFG nodes VN3 and VN4. The modified NOR gate of Figure 4 holds the voltage of the SFG nodes (VN) until an input edge arrives. As a result, larger logic depths become feasible with the SFG technique.
Figure 4. Modified dual-rail ULVSFG NOR gate.
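The keeper rule of the modified gate, together with the dual-rail NOR function it serves, can be summarized in a logic-level model. This is an idealized abstraction that ignores analog effects and timing. Q is assumed to be discharged through a parallel A/B network and Q̄ through a stacked Ā/B̄ network, consistent with the stacked transistors in the Q̄ path. Each KN keeper is treated as active only once the output of the opposite side has fallen. The function names are hypothetical.

```python
from typing import Dict, Tuple


def dual_rail_nor(a: int, b: int) -> Tuple[int, int]:
    """End-of-evaluation outputs (Q, Q_bar) of an idealized precharge-to-1
    dual-rail NOR. Both outputs start at 1 (precharge)."""
    a_bar, b_bar = 1 - a, 1 - b
    q = 0 if (a or b) else 1               # parallel pull-down on the true side
    q_bar = 0 if (a_bar and b_bar) else 1  # stacked pull-down on the complementary side
    return q, q_bar


def active_keepers(a: int, b: int, inputs_arrived: bool = True) -> Dict[str, bool]:
    """Which KN keepers discharge their SFG node in the modified gate.

    KN1/KN2 (true side, VN1/VN2) have their drains on Q_bar, and KN3/KN4
    (complementary side, VN3/VN4) on Q, so a keeper can only pull its SFG
    node down after the opposite output has fallen to 0.
    """
    # Before any input edge arrives, both outputs are still precharged to 1.
    q, q_bar = dual_rail_nor(a, b) if inputs_arrived else (1, 1)
    return {
        "KN1": q_bar == 0, "KN2": q_bar == 0,
        "KN3": q == 0, "KN4": q == 0,
    }


if __name__ == "__main__":
    print(active_keepers(0, 0, inputs_arrived=False))  # all False: SFG nodes held
    print(active_keepers(1, 0))  # only KN3/KN4 active
```

The first call shows why the modified gate tolerates late inputs. As long as no input edge has arrived, both outputs are still high, so no keeper can drain the boosted SFG nodes. In the gate of Figure 3, by contrast, the KN keepers can discharge the SFG nodes before a late edge arrives.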
4. Simulation Results
The designed logic circuits were simulated using Cadence software (version 6.1.6) with a typical 90 nm TSMC CMOS technology. Low-threshold-voltage devices were chosen to speed up the circuits. To verify the effect of the SFG technique on the performance of the ULV dual-rail NOR gate, the NOR gates shown in Figure 1(b), Figure 3 and Figure 4 were designed with the same settings. All use the same device sizes, power supply voltage (300 mV) and load capacitance (CL = 2 fF), and their characteristics were then studied. In all designed circuits, 2 fF input capacitors (Cin) were used. Simulation results show that this is the optimum capacitance for maximum speed when minimum-size devices are used for the precharge devices. For the ULVSFG NOR gates shown in Figure 3 and Figure 4, simulation results show that in the evaluation phase a falling transition (1 to 0) at the output Q takes less than 50 ps. The same transition in the conventional dual-rail NOR gate shown in Figure 1(b) takes 1.7 ns. In both circuits, however, the falling transition (1 to 0) on the Q̄ side is slower than on the Q side, because both circuits use stacked (cascode) transistors in the Q̄ path. For the proposed NOR gate, the falling transition at Q̄ takes 140 ps, whereas for the conventional NOR gate shown in Figure 1(b) it takes 2.4 ns. Figure 5 shows transient simulation results for the simple ULVSFG NOR gate of Figure 3 and the modified version of Figure 4 when the input signal arrives with different delays relative to the CK signal. In these cases, there is at least one rising edge on the true-side input signals of the NOR gates, which pulls the output voltage down to Virt.GND. In Figure 5(a), the input signal has no significant delay relative to CK. Both NOR circuits pull the output voltage down to Virt.GND in almost the same evaluation time.
In this case, the voltage of the SFG node (VN1) is larger than 500 mV for both circuits, well above VDD = 300 mV. Note that the Virt.GND signal has settled to 0 before the input signal arrives. In Figure 5(b), the input signal is delayed by 3 ns relative to CK, and both NOR circuits shown in Figure 3 and Figure 4 manage to pull the output voltage down to Virt.GND. However, the modified version is faster than the ULVSFG NOR gate shown in Figure 3, since it holds the voltage of the SFG node longer and has a larger voltage at that node. In this case, the voltage of the SFG node (VN1) in the modified version is larger than 500 mV, while it is reduced to 460 mV in the simple ULVSFG NOR gate. In Figure 5(c), the input signal is delayed by 4.5 ns relative to CK, and both NOR circuits shown in Figure 3 and Figure 4 still manage to pull the output voltage down to Virt.GND. However, the modified version is much faster than the ULVSFG NOR gate shown in Figure 3. It holds the voltage of the SFG node longer and has a larger voltage at that node when the input signal rises. In this case, the voltage of the SFG node (VN1) in the modified version is larger than 500 mV, while it is reduced to less than 400 mV in the simple ULVSFG NOR gate. The evaluation delay of the simple ULVSFG NOR gate thus increases significantly with the delay of the input signal, as shown in Figures 5-7. Figure 6 shows the simulated responses of the ULVSFG NOR gates for an input-signal edge delayed by 7 ns relative to the clock signal (CK). As expected, the leakage currents and the ON currents of the KN transistors reduce the voltage of the SFG node in the ULVSFG NOR gate of Figure 3 well below VDD. Given the voltage swing of the capacitively coupled input signal, the EN transistors are then not able to pull the output down to 0 (Virt.GND). Clearly, the responses of the simple ULVSFG gates are significantly affected by the input delay, as depicted in Figure 7. The modified version shown in Figure 4 behaves differently. Its functionality is not affected by the delayed input edge, and the voltages of its SFG nodes remain stable at VDD. With the transistor sizes listed in Table 1, the longest input-edge delay for which the simple ULVSFG NOR gate still responds with the correct logic value is approximately 4.7 ns. The timing of the input signals significantly degrades the performance of the simple ULVSFG NOR gates, as shown in Figures 5-7. The initial charge of the SFG node is reduced below VDD/2, and the current provided by the EN transistor is reduced significantly.
Figure 5. Transient simulation results when a 1-to-0 transition occurs in Q.
The input swing produces the same capacitive transfer to the SFG node. Because the SFG node voltage has dropped to about VDD/2, however, the peak voltage on this node is only approximately 300 mV. Another crucial aspect is the output voltage swing. In a chained structure, the succeeding gate would receive an input signal with a lower swing than expected and would therefore respond more slowly. Hence, the simple ULVSFG NOR gate has a smaller noise margin than the modified version. In summary, the simple ULVSFG NOR gate suffers from both increased output delay and a degraded voltage swing (smaller noise margin). For smaller input delays (e.g., less than 3 ns), the simple ULVSFG NOR gate is affected only by a reduced evaluation speed compared with the modified version shown in Figure 4. The modified version is insensitive to the delay of the input signals and has a better output signal swing and noise margin. As the input delay increases, the noise margin of the simple gate degrades significantly, whereas this is not the case for the modified ULVSFG NOR gate.
Figure 6. Transient simulation results when a 1-to-0 transition occurs in Q and the input signals have a significant delay relative to the CK signal.
Figure 7. Evaluation delay of the simple and modified ULVSFG NOR gates for input signals delayed relative to the clock signal CK.
Table 1. Size of transistors used in the ULVSFG NOR gates.
Figure 7 shows the evaluation delay of the simple and modified ULVSFG NOR gates as a function of the input delay (relative to CK). For input delays above 1.5 ns, the delay of the simple ULVSFG NOR gate increases almost exponentially. The delay of the modified ULVSFG NOR gate, in contrast, remains stable at approximately 50 ps. According to these data, the improvement reaches 25 times at an input delay of 4.8 ns. The delay of the proposed NOR gates is less than 5% of that of the conventional dual-rail NOR gate with the same device sizes and power supply voltage. Simulation results also show that the proposed circuit operates properly with power supplies down to 100 mV. At such low supply voltages, the speed drops significantly, the structures become more sensitive to process variations, and their overall performance degrades. Nevertheless, ULVSFG structures are faster and more robust than conventional static CMOS and dual-rail domino logic at these ultra-low supply voltages, as reported in earlier work.
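The comparison with the conventional gate can be checked directly against the delays reported for the falling output transitions. For Q, the reported values are less than 50 ps versus 1.7 ns. For Q̄, they are 140 ps versus 2.4 ns. The short calculation below simply redoes this arithmetic, treating 50 ps as an upper bound. On these numbers, the Q edge of the proposed gate is at least 34 times faster, taking under 3% of the conventional delay. The slower Q̄ edge is about 17 times faster, taking roughly 5.8% of the conventional delay.

```python
# Falling-edge delays reported in the text (seconds); for Q the proposed-gate
# value "less than 50 ps" is used as an upper bound.
REPORTED = {
    "Q": {"proposed": 50e-12, "conventional": 1.7e-9},
    "Q_bar": {"proposed": 140e-12, "conventional": 2.4e-9},
}


def speedup(proposed: float, conventional: float) -> float:
    """How many times faster the proposed gate is than the conventional one."""
    return conventional / proposed


def relative_delay_percent(proposed: float, conventional: float) -> float:
    """Proposed-gate delay as a percentage of the conventional-gate delay."""
    return 100.0 * proposed / conventional


if __name__ == "__main__":
    for output, d in REPORTED.items():
        print(f"{output}: {speedup(**d):.1f}x faster, "
              f"{relative_delay_percent(**d):.1f}% of the conventional delay")
```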
5. Conclusion
In this paper, a new NOR gate based on the ULVSFG dual-rail domino logic structure is presented. Applying the floating-gate technique to the conventional dual-rail NOR gate increases the speed of the circuit significantly, at the cost of a more complex structure. With the proposed method, the delay of the ULVSFG domino dual-rail NOR gate in the evaluation phase is reduced by more than 20 times, and the structure becomes significantly more robust. The delay of the proposed NOR gates is less than 5% of that of the conventional dual-rail NOR gate with the same device sizes and power supply voltage. A new keeper structure is also introduced that makes the SFG technique more robust against the delay of the input signal. With this keeper structure, logic of large depth becomes feasible with the SFG technique. Simulation results using 90 nm TSMC CMOS process parameters and Cadence software confirm the predicted improvements.
References
Heald, R. (2000) A Third-Generation SPARC V9 64-b Microprocessor. IEEE Journal of Solid-State Circuits, 35, 1526-1538.
Silberman, J., et al. (1998) A 1.0-GHz Single-Issue 64-Bit Power PC Integer Processor. IEEE Journal of Solid-State Circuits, 33, 1600-1607.
Sung, R.J.-H. and Elliott, D.G. (2007) Clock-Logic Domino Circuits for High-Speed and Energy-Efficient Microprocessor Pipelines. IEEE Transactions on Circuits and Systems II: Express Briefs, 54, 460-464.
Berg, Y., Wisland, D.T. and Lande, T.S. (1999) Ultra Low-Voltage/Low-Power Digital Floating-Gate Circuits. IEEE Transactions on Circuits and Systems, 46, 930-936.
Kotani, K., Shibata, T., Imai, M. and Ohmi, T. (1995) Clocked-Neuron-MOS Logic Circuits Employing Auto-Threshold-Adjustment. IEEE International Solid-State Circuits Conference, San Francisco, 15-17 February 1995, 320-321, 388.
Berg, Y. and Mirmotahari, O. (2012) Ultra Low-Voltage and High Speed Dynamic and Static CMOS Precharge Logic. FTFC 2012: The 11th International Conference of Faible Tension Faible Consommation (FTFC), IARIA, Paris, 6-8 June 2012, 1-4.
Berg, Y. (2011) Novel High Speed Differential CMOS Flip-Flop for Ultra Low-Voltage Applications. Proceedings of the 9th Edition of IEEE New Circuits and Systems Conference (NEWCAS), Bordeaux, 26-29 June 2011, 241-244.
Berg, Y. and Mirmotahari, O. (2012) Novel High-Speed and Ultra-Low-Voltage CMOS NAND and NOR Domino Gates. Proceedings of the 5th International Conference on Advances in Circuits, Electronics and Micro-Electronics, Rome, 19-24 August 2012.