Received 20 March 2016; accepted 1 May 2016; published 4 May 2016
Content Addressable Memory (CAM) is a special type of computer memory used in certain very-high-speed searching applications, such as routers and cache memories. A CAM accesses data based on its content, whereas Random Access Memory (RAM) accesses data based on its address. A RAM returns the word stored at a given address, but a CAM compares incoming data with all stored words in parallel and returns the address of the matched data. CAMs are built from conventional semiconductor memory (usually SRAM), which enables a search operation to complete in a single clock cycle, whereas a RAM may need two or three clock cycles.
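The difference between address-based and content-based access can be sketched in a few lines of Python. This is only a software illustration: the function names and the sample data are hypothetical, and the list comprehension is a sequential stand-in for the parallel comparison that CAM hardware performs.

```python
from typing import List


def ram_read(memory: List[int], address: int) -> int:
    """RAM-style access: the caller supplies an address and gets the word."""
    return memory[address]


def cam_search(memory: List[int], key: int) -> List[int]:
    """CAM-style access: the caller supplies a word and gets matching addresses.

    Hardware compares every stored word at once; this comprehension is a
    sequential software stand-in for that parallel comparison.
    """
    return [addr for addr, word in enumerate(memory) if word == key]


# Hypothetical 4-entry memory holding 4-bit words.
stored = [0b1010, 0b0111, 0b1100, 0b0111]
print(ram_read(stored, 2))         # word stored at address 2
print(cam_search(stored, 0b0111))  # addresses whose content equals the key
```

The sketch returns every matching address; a real CAM would typically pass multiple matches through a priority encoder to select one.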
One of the attractive features of CAM is its parallel comparison. However, this parallel comparison also causes high power consumption. For example, in embedded processors, the fully associative TLBs (Translation Lookaside Buffers) built with CAM consume about 17% of the chip power   . This high power consumption raises the junction temperature, which increases the leakage current and in turn reduces both the performance and the reliability of the chip.
Many studies have designed low-power CAMs. For example, binary-weighted connections have been used in a CAM architecture based on a clustered-sparse network to reduce the number of parallel comparisons during a search  . The CAM array has also been divided into several equally sized sub-blocks so that only a few sub-blocks need to be activated, which reduces the power consumption. Partitioning the CAM array can be expected to reduce the power consumption, but it requires extra control circuits that depend on the search pattern and themselves consume power. In a self-timed overlapped search mechanism for a high-throughput CAM  , a NAND-type word circuit is partitioned into two segments of different sizes that operate sequentially. Only when the input sub-word matches the stored sub-word in the first segment is the larger subsequent segment operated using its sub-word. As a result, most of the subsequent segments remain unused, which saves power. The same partitioning of a NAND-type word circuit into two sequentially operated segments of different sizes was introduced in a reordered overlapped search mechanism  .
There are two main sources of high power consumption in CAM: the highly capacitive search lines (SLs) and match lines (MLs). Many circuit-level techniques have been proposed to reduce the power consumption of CAM. In paper  , the SL power consumption is reduced by comparing the stored data with low-swing search data on the SLs. A swapped CAM cell that improves SL performance and reduces SL power consumption has been realized in paper  . In paper  , the authors use dual supply voltages to balance the power and delay budgets between the priority encoding circuit and the comparison circuit; the memory array and the priority encoder are powered by the low and the high supply voltage, respectively. To reduce the voltage swing on the ML buses, an ML amplifier with self-power-off is utilised. A differential ML sense amplifier has also been adopted for CAM in  . In that approach, a self-disabled sensing technique is used to check the charge current after the matching results have been generated. To enhance the search performance and to save ML power, both papers   chose NAND-type CAM. Another method to reduce the power consumption was introduced in  : the traditional ternary CAM (TCAM) is modified to reduce the ML capacitance, and it is divided into two unequal segments for charge sharing. To reduce the leakage power of TCAM, a self-gating method was introduced in  . A segmented ML that is selectively pre-charged is used to reduce the power consumption in paper  .
Efforts have been made to reduce the ML voltage swing in order to save power. In many of these studies, a special amplifier or extra control circuits are needed to sense the voltage and to decide the charge balance time, respectively. For example, in paper  the authors designed a positive-feedback (FB) ML sense amplifier to reduce the power consumption. In this paper, a content addressable memory using automatic charge balancing with a self-control mechanism is proposed to reduce the ML power consumption without any special or extra sense amplifiers. Two kinds of CAM structure and CAM cell, namely N-CAM and P-CAM, are implemented in the proposed architecture. By exploiting the complementary features of N-CAM and P-CAM together with the self-control mechanism, the dynamic power consumption of the ML can be reduced.
The proposed design, simulated in Cadence Virtuoso using gpdk 180 nm technology with a 1.8 V supply voltage, reduces the power consumption by up to 8.12% for 32 TLB entries.
The rest of the paper is organized as follows. The traditional CAM cell and the NOR-type and NAND-type CAM structures are discussed first, followed by the charge balance content addressable memory. The simulations of the proposed designs and their features are then compared in detail. Finally, the results and conclusion of this paper are provided.
2. Content Addressable Memory (CAM)
In CAM design, there are generally two kinds of ML structure, namely NAND-type and NOR-type CAM. In NAND-type CAM, the search process is slower but the power consumption is lower, whereas in NOR-type CAM the search process is faster but the power consumption is higher   .
2.1. Traditional CAM Cell
The core of a content addressable memory is the CAM cell array. The traditional CAM cell consists of one 6T SRAM cell, one 2T comparing unit and the evaluation logic. The SRAM cell stores the data to be compared, and the comparing unit performs the comparison between the stored data and the search data. The evaluation transistors, N7 of the XOR-type CAM cell in Figure 1(a) and N10 of the XNOR-type CAM cell in Figure 1(b), are gate-controlled by the comparison result and are needed to connect the ML to, or disconnect it from, ground. SL and SL\ represent the SL pair, BL and BL\ represent the bit line pair, and WL indicates the word line. Depending on the application, the comparing unit can be implemented as an XOR or an XNOR function. Note that both the XOR and XNOR functions are implemented so as to minimize the area cost. In the XOR-type CAM cell, when the stored data D is equal to the search data on SL, i.e., a match, the pull-down transistor N7 is turned off to prevent the match line from being discharged to 0. On the contrary, in the XNOR-type CAM cell, when the comparison result is a match, the pull-down transistor N10 is turned on to discharge the ML to 0.
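The behaviour of the two evaluation transistors can be written as a small truth-table model. The following Python sketch is a purely logical abstraction of the cells in Figure 1; the function names are hypothetical and no transistor-level detail is modelled.

```python
def xor_cell_pulldown_on(d: int, sl: int) -> bool:
    """XOR-type cell (Figure 1(a)): N7 conducts only on a mismatch."""
    return (d ^ sl) == 1


def xnor_cell_pulldown_on(d: int, sl: int) -> bool:
    """XNOR-type cell (Figure 1(b)): N10 conducts only on a match."""
    return (d ^ sl) == 0


# Enumerate all combinations of stored bit D and search bit SL.
for d in (0, 1):
    for sl in (0, 1):
        print(d, sl, xor_cell_pulldown_on(d, sl), xnor_cell_pulldown_on(d, sl))
```

For every input combination exactly one of the two functions returns True, which reflects the complementary roles of N7 and N10.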
2.2. NOR-Type CAM
Figure 2 shows the NOR-type CAM structure, for which the XOR CAM cell is used. A search is performed in two phases, namely the initial phase and the evaluation phase. In the case of a mismatch, the ML is discharged to 0 almost instantly because of the short pull-down path. This type of CAM therefore provides high search performance, but at the cost of high power consumption.
2.3. NAND-Type CAM
Figure 3 shows the NAND-type CAM structure, for which the XNOR CAM cell is used. In contrast to the NOR-type CAM, the NAND-type CAM has been developed to reduce the power consumption. In the case of a match, the load capacitance of the ML is small and only one ML is discharged to 0. However, this lower power consumption comes at the cost of a slower search.
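The word-level difference between the two structures can be sketched as follows. This Python model is an idealised logical view, assuming that both match lines start the evaluation at 1; the function names are hypothetical.

```python
from typing import Sequence


def nor_ml_after_search(stored: Sequence[int], search: Sequence[int]) -> int:
    """NOR-type word: cells pull down in parallel, so any mismatch discharges the ML."""
    any_mismatch = any(d != s for d, s in zip(stored, search))
    return 0 if any_mismatch else 1


def nand_ml_after_search(stored: Sequence[int], search: Sequence[int]) -> int:
    """NAND-type word: cells form a series chain, so the ML discharges only on a full match."""
    all_match = all(d == s for d, s in zip(stored, search))
    return 0 if all_match else 1


word = [1, 0, 1, 1]
print(nor_ml_after_search(word, [1, 0, 1, 1]), nand_ml_after_search(word, [1, 0, 1, 1]))
print(nor_ml_after_search(word, [1, 1, 1, 1]), nand_ml_after_search(word, [1, 1, 1, 1]))
```

Because most words mismatch in a typical search, most NOR-type match lines discharge while most NAND-type match lines do not, which is the origin of the power difference described above.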
3. Charge Balance Content Addressable Memory (CAM)
3.1. NOR-Type P-CAM Cell
To implement the automatic charge balancing CAM, a NOR-type P-CAM structure must be designed.
The P-CAM cell in Figure 4 is similar to the XOR-type CAM cell described in Figure 1(a), but it operates in the opposite way. The major differences between the P-CAM cell and the XOR CAM cell are the comparing unit and the evaluation logic. The evaluation logic of the XOR CAM cell is a pull-down NMOS, whereas the P-CAM cell uses a pull-up PMOS for the evaluation. When the data on the search line (SL) is equal to the stored data (D), the transistor P5 is turned off to prevent the ML from being charged to VDD. On the other hand, when the SL data is not equal to D, the transistor P5 is turned on and the ML is charged to VDD. Figure 5 shows the NOR-type P-CAM structure. In the initial phase of a data search, the ML is pre-discharged to 0 by setting pre\ to 1. In the evaluation phase, when one or more cells are mismatched, the ML is charged to VDD; otherwise, the ML remains at 0. For convenience, we rename the traditional NOR-type CAM to N-CAM, because the evaluation logic of the traditional NOR-type CAM is an NMOS.
Figure 1. The traditional (a) XOR-type CAM cell (b) XNOR-type CAM cell.
Figure 2. NOR-type PCAM cell.
Figure 3. NAND-type PCAM cell.
Figure 4. The P-CAM cell.
Figure 5. The NOR-type P-CAM structure.
When the search is a match, the match line of N-CAM is high while the match line of P-CAM is low. On the contrary, when the search is a mismatch, the match line of N-CAM is low and the match line of P-CAM is high. Similar to the N-CAM structure, P-CAM also provides good search performance. However, it still consumes a large amount of power because the match line is frequently charged and discharged. In the initial phase, the match line of N-CAM is charged to VDD and the match line of P-CAM is discharged to 0. Then, in the evaluation phase, when one or more cells of a word are mismatched, the match line of N-CAM is discharged to 0 and the match line of P-CAM is charged to VDD. Unfortunately, in general applications, most of the CAM words are mismatched. In order to reduce the power consumption of the MLs, the different features of N-CAM and P-CAM are investigated. Table 1 summarizes the states of the P-CAM and N-CAM cells, where SL denotes the search line, D the data stored in the CAM cell, and ML the match line.
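The complementary match line behaviour described above can be captured in a short logical model. The Python sketch below uses hypothetical function names and represents VDD and GND as 1 and 0; it shows that a mismatched word toggles both kinds of match line, while a matched word toggles neither.

```python
def initial_phase() -> dict:
    """Initial phase: N-ML is precharged to 1 (VDD), P-ML is pre-discharged to 0."""
    return {"N-ML": 1, "P-ML": 0}


def evaluation_phase(word_matches: bool) -> dict:
    """Evaluation phase: a mismatch discharges the N-ML and charges the P-ML."""
    if word_matches:
        return {"N-ML": 1, "P-ML": 0}
    return {"N-ML": 0, "P-ML": 1}


def toggles(word_matches: bool) -> dict:
    """True where a match line changes level between the two phases."""
    before, after = initial_phase(), evaluation_phase(word_matches)
    return {line: before[line] != after[line] for line in before}


for matched in (True, False):
    print(matched, evaluation_phase(matched), toggles(matched))
```

Since most words mismatch, most match lines toggle in every search, which is why the match line is the target of the power reduction that follows.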
3.2. The Charge Balancing CAM Architecture
Based on the complementary characteristics of the N-CAM match line and the P-CAM match line, the CAM can be partitioned into two parts: one composed of N-CAMs and the other composed of P-CAMs. N-ML denotes the match line of N-CAM, and P-ML denotes the match line of P-CAM. The N-ML and P-ML of a word are connected by an NMOS bridge transistor (BT), which is controlled by the signal bal to balance the voltage levels of N-ML and P-ML. The whole charge balancing circuit is composed of the NMOS BTs and one AND gate.
Table 1. The match line state of N-CAM and the P-CAM cell.
The signal bal is generated from the signal pre. One buffer is added at the output of the AND gate so that the bal signal is generated only after all CAM words have completed their evaluations. The charge balancing timing diagrams are shown in Figure 7. In the proposed self-control automatic charge balancing CAM architecture, one CAM word is made up of 50% N-CAM cells and 50% P-CAM cells.
3.3. Operation of Automatic Charge Balance CAM
The search operation of the automatic charge balancing CAM architecture consists of two phases, namely the initial phase and the evaluation phase.
3.3.1. Initial Phase
The control signal pre is set to 0, as shown in Figure 6 and Figure 7(a), so that the PMOS transistors are turned on to charge all N-MLs to VDD, and pre\ is set to 1, which turns on the NMOS transistors to discharge all P-MLs to GND. To prevent a short circuit, all of the search line pairs of N-CAM should be reset to 1. At the same time, the signal bal is 0 because of pre, and the BTs are automatically turned off to prevent a short circuit between the N-MLs and P-MLs.
3.3.2. Evaluation Phase
The evaluation phase of the charge balancing architecture includes two operations: evaluating and then balancing. During the evaluating operation, the stored data are compared with the search data; as shown in Figure 7(b) and Figure 7(c), when one or more CAM cells are mismatched, the N-ML is discharged to GND and the P-ML is charged to VDD. Note that all BTs in the charge balancing circuit remain turned off until the DN-ML and DP-ML signals are automatically generated. After evaluating, conventional sense amplifiers can be used to detect the status of the ML. Since the voltage level of either N-ML or P-ML after evaluating is the same as in the traditional CAM, no special sense amplifier is needed. As shown in Figure 7(a), for the balancing operation the signals DN-ML and DP-ML are high and the signal bal switches from 0 to VDD to turn on the BTs. The voltage level of the P-ML is then balanced with the voltage level of the N-ML through the NMOS BT. Three voltage levels may occur after the balancing operation.
• VDD/2: N-CAM and P-CAM are both a match or both a mismatch.
• VDD: N-CAM is a match and P-CAM is a mismatch.
• GND: N-CAM is a mismatch and P-CAM is a match.
For one given CAM word, there are four possible cases; the balanced voltage level of each state is given in Table 2.
1) The N-CAM and the P-CAM are both a match. Since no charge/discharge happens, the N-ML stays at VDD and the P-ML stays at GND. The voltage level is thus balanced to VDD/2. For the next initial operation, the N-ML charges to VDD and the P-ML discharges to GND again.
2) N-CAM is a match and the corresponding P-CAM is a mismatch. Because the matched N-ML stays at VDD and the mismatched P-ML is charged to VDD after evaluating, the balanced voltage level is high after charge balancing. For the next initial operation, only the P-ML needs to be discharged to GND.
3) N-CAM is a mismatch and the corresponding P-CAM is a match. Because the matched P-ML stays at GND and the mismatched N-ML is discharged to GND after evaluating, the balanced voltage level is low after charge balancing. For the next initial operation, only the N-ML needs to be charged to VDD.
Figure 6. The automatic charge balancing CAM.
Figure 7. The charge balancing timing diagram: (a) control signals; (b) P-CAM match and N-CAM match at the first evaluation phase, P-CAM mismatch and N-CAM match at the second evaluation phase; (c) P-CAM mismatch and N-CAM mismatch at the first evaluation phase, P-CAM match and N-CAM mismatch at the second evaluation phase.
Table 2. The balanced voltage level (BVL) of each state.
4) The N-CAM and the corresponding P-CAM are both a mismatch. After evaluating, the N-ML and P-ML are at GND and VDD, respectively. Then, the charge balancing circuit balances the voltage levels of the P-ML and the N-ML, and both settle at VDD/2. For the next initial operation, the N-ML only needs to be charged by VDD/2 and the P-ML only needs to be discharged by VDD/2.
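The four cases above follow from simple charge sharing between the two match lines. The Python sketch below is an idealised illustration, not the paper's simulation: it assumes equal N-ML and P-ML capacitances (one word is half N-CAM and half P-CAM cells), an ideal bridge transistor with no threshold-voltage drop, and a 1.8 V supply; all function names are hypothetical.

```python
VDD = 1.8  # supply voltage in volts, as used in the simulations


def evaluate(n_match: bool, p_match: bool, vdd: float = VDD) -> tuple:
    """Match line voltages after evaluating: (N-ML, P-ML)."""
    n_ml = vdd if n_match else 0.0   # a mismatch discharges the N-ML
    p_ml = 0.0 if p_match else vdd   # a mismatch charges the P-ML
    return n_ml, p_ml


def balance(n_ml: float, p_ml: float, c_n: float = 1.0, c_p: float = 1.0) -> float:
    """Ideal charge sharing between N-ML and P-ML (equal capacitances assumed)."""
    return (c_n * n_ml + c_p * p_ml) / (c_n + c_p)


def next_initial_swing(bvl: float, vdd: float = VDD) -> tuple:
    """Voltage the N-ML must be charged by and the P-ML discharged by."""
    return vdd - bvl, bvl


for n_match in (True, False):
    for p_match in (True, False):
        bvl = balance(*evaluate(n_match, p_match))
        print(n_match, p_match, round(bvl, 3), next_initial_swing(bvl))
```

Under these assumptions the both-mismatch case needs only a VDD/2 swing on each line in the next initial operation, instead of the full VDD swing of an unbalanced design.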
In Figure 6, there are two signals, DP-ML and DN-ML, which are essential for ensuring that all the MLs have been evaluated before the voltage levels are balanced for the next search. Figure 8 and Figure 9 show the signals DP-ML and DN-ML, respectively.
DP-ML (Dummy P-ML) is generated by a dummy P-CAM word that consists of two P-CAM cells, namely DP0 and DP2, and n-2 PMOS transistors. The structures of DP0 and DP2 are similar to the P-CAM cell, but the SL and BL of DP2 are modified. The search line and bit line pairs of DP2 are connected to SL0, SL0\, BL0 and BL0\, and the data stored in DP0 and DP2 are always opposite to one another. Since the data stored in DP0 and DP2 are always different, this results in the worst-case DP-ML charge time.
DN-ML (Dummy N-ML) in Figure 9 is generated by a dummy N-CAM word that consists of two N-CAM cells, namely DN1 and DN3, and n-2 NMOS transistors. The structures of DN1 and DN3 are similar to the N-CAM cell, but the SL and BL of DN3 are modified. The search line and bit line pairs of DN3 are connected to SL1, SL1\, BL1 and BL1\, and the data stored in DN1 and DN3 are always opposite to one another. Since the data stored in DN1 and DN3 are always different, this results in the worst-case DN-ML discharge time.
4. Simulation Results of the Proposed Architecture
Figure 10 shows the Cadence waveform of the content addressable memory using automatic charge balancing with the self-control mechanism. Lines A, B, C and D indicate the signal pre, the signal bal in Figure 7, the waveform of a specific N-ML and the waveform of a specific P-ML, respectively. Figure 10 brings out three cases, which are discussed below.
Case 1: In the second diagram of Figure 10, the P-ML is a mismatch and the corresponding N-ML is a match. First, in the initial phase, the P-ML is discharged to GND and the N-ML is charged to VDD. Second, in the evaluation phase, the N-ML holds its high voltage while the P-ML also charges to high. Both the N-ML and the P-ML hold the high voltage in the balancing operation, so in the next initial phase only the P-ML needs to be discharged to GND and the N-ML remains at high voltage.
Case 2: In the third diagram of Figure 10, the N-ML is a mismatch and the corresponding P-ML is a match. In the initial phase, the operation is the same as in Case 1. In the evaluation phase, however, the P-ML holds its low voltage and the N-ML is discharged to GND. Both remain in the low voltage state in the balancing operation, so in the next initial phase the N-ML is recharged to VDD whereas the P-ML remains in the low voltage state.
Case 3: In the fourth diagram of Figure 10, the N-ML and the P-ML are both a mismatch. As in the other cases, there are two phases; in the initial phase, the operation is the same as in Case 1 and Case 2. In the evaluation phase, the N-ML is discharged to low voltage and the P-ML charges to high voltage. The P-ML then balances its charge with the N-ML in the balancing operation, so that in the next initial phase the N-ML only needs to be charged by half of VDD.
Table 3 and Table 4 provide the dynamic and static power consumption of the proposed architecture. For 4, 32, 64 and 128 entries, the proposed architecture reduces the power by about 8.02%, 7.82%, 19.37% and 49.54%, respectively, compared to the traditional N-CAM. The dynamic power saving grows with the number of CAM entries because more ML power can be saved. Table 4 gives the static power consumption, which arises from the architectural change in which half of the N-CAM cells are replaced by P-CAM cells. In addition, the DN-CAM word and the DP-CAM word consume leakage power when they are not active.
Figure 8. The DP-ML.
Figure 9. The DN-ML.
Table 3. The dynamic power consumption of the proposed architecture.
Power Unit: Watt.
Table 4. The static power consumption of the proposed architecture.
Power Unit: Watt.
Figure 10. Cadence Virtuoso waveform simulation.
5. Low Power Master-Slave Match Line (MSML) Design
Another method to reduce the power dissipated by the match line is to combine the master-slave architecture with a charge minimization technique.
5.1. Typical XOR CAM Cell
The XOR-type CAM cell in Figure 11 is implemented using Cadence. The CAM cell is then used in the MSML design to reduce the match line switching activity and power.
Figure 11. Typical XOR type CAM cell.
5.2. Features of MSML Design
Unlike the conventional design, in which only a single ML is used, the MSML design uses one master ML (MML) and several slave MLs (SMLs) to perform the search operation.
Instead of discharging the entire MML to 0, only the mismatched slave MLs draw charge from the master ML and are then discharged. The charge loss is thereby minimized.
Because the master ML only needs to be refilled with the charge that was distributed to the mismatched slave MLs, which is much less than the full ML charge refill in the conventional design, the ML power consumption can be reduced effectively.
Theoretically, the MSML design can reduce the ML power by 50% even in the worst case; that is, a 50% power saving is possible independent of the match case.
6. MSML Design
In Figure 12, MS2 consists of one MML and two SMLs. Unlike the conventional CAM design, which uses a single ML, our design uses both the MML and the SMLs to perform the search operation. By sharing the charge between the MML and the SMLs, we can reduce the MML refill swing effectively, so that the search power dissipated in the MMLs is greatly reduced. As shown in Figure 12, besides the MML and SMLs, an additional final ML (FML) is used to indicate the match result. Note that the parasitic capacitance of the FML is generally smaller than that of the MML.
The search operation in this design has two phases, namely the pre-charge phase and the match evaluation phase.
Pre-charge phase: In this phase, the MML and FML are first pre-charged to high, and the control signal PRE is also high. All the SMLs, that is, SML1 and SML2 shown in Figure 12, are discharged to 0. All the BLs are reset to 0 because the search data are not yet available, so the XOR results are also 0.
Match evaluation phase: In this phase, the search data are loaded onto the search lines to start the matching process, and the control signal PRE is pulled down to 0. Table 5 shows the key node voltages and the path connection/disconnection for the match and mismatch cases.
Case 1: If both SML1 and SML2 are a match, this is a match case. Neither of the charging paths S1 and S2 conducts. All the ML logic levels remain the same as in the pre-charge phase: the MML and the FML are 1, and both SML1 and SML2 are 0.
Case 2: Only one of SML1 and SML2 is a mismatch; assume that SML1 is a mismatch and SML2 is a match. In the SML1 segment, because at least one share transistor is turned on to conduct the charge sharing path S1, the MML charge is distributed to SML1. This raises the SML1 voltage while the MML voltage level goes down. After the charge sharing is complete, both the MML and SML1 settle at the same voltage, the final balance voltage. According to the charge sharing Equation (1), the final balance voltage VB can be derived as follows:
$$V_B = \frac{C_{MML}\, V_{MML}}{C_{MML} + C_{SML1}} \qquad (1)$$
Figure 12. MSML design configured with two SMLs.
Table 5. Key node voltage and path connection/disconnection for the match and mismatch cases.
where CMML and CSML1 are the capacitances of MML and SML1, respectively, and VMML is the MML initial voltage. Because the MML capacitance is roughly two times the SML1 capacitance, the result can be simplified as 2VMML/3.
Case 3: If both SML1 and SML2 are a mismatch, this is a mismatch case. Because both charge sharing paths S1 and S2 conduct, the MML charge is distributed to SML1 and SML2. For the MS2 configuration, this is the worst case. The final balance voltage VB can be derived as in Equation (2), where CSML2 is the capacitance of SML2:
$$V_B = \frac{C_{MML}\, V_{MML}}{C_{MML} + C_{SML1} + C_{SML2}} \qquad (2)$$
The MML voltage drop in Case 3 is larger than that of Case 2, so the power consumption of Case 3 is expected to be larger than that of Case 2.
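The three MS2 cases can be checked numerically with an ideal charge-sharing model. The Python sketch below is an illustration under stated assumptions rather than the simulated circuit: the MML capacitance is taken as twice each SML capacitance (as stated for Case 2), the two SMLs are assumed equal, every SML starts at 0 V, and the MML starts at 1.8 V; the function name and values are hypothetical.

```python
from typing import Sequence


def balance_voltage(v_mml: float, c_mml: float, c_mismatched_smls: Sequence[float]) -> float:
    """Final balance voltage after the MML shares charge with the mismatched SMLs.

    The SMLs are assumed to start at 0 V, so only the MML contributes charge.
    """
    return c_mml * v_mml / (c_mml + sum(c_mismatched_smls))


V_MML = 1.8   # initial MML voltage in volts (assumed equal to the supply)
C_MML = 2.0   # MML capacitance in arbitrary units
C_SML = 1.0   # capacitance of each SML (assumed half of C_MML)

case1 = balance_voltage(V_MML, C_MML, [])              # both SMLs match
case2 = balance_voltage(V_MML, C_MML, [C_SML])         # one SML mismatches
case3 = balance_voltage(V_MML, C_MML, [C_SML, C_SML])  # both SMLs mismatch
print(case1, case2, case3)
```

With these assumed capacitances the MML settles at two thirds of its initial voltage in Case 2 and at one half in Case 3, so the refill swing, and with it the ML power, is largest when both SMLs mismatch.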
7. MSML Power Consumption
Figure 13 shows the match line power consumption for every case. The ML power consumption increases with the number of mismatched SMLs. As the number of SMLs increases, the real balance voltage decreases because of the increased circuit overhead.
The worst-case match delay of the MSML CAM for different configurations is shown in Table 6. Among the four configurations of the MSML design, MS8 differs slightly from the other configurations.
The power consumption of the various CAM designs is tabulated in Table 7. The four configurations show only slight variations from one another.
The power consumption of the CAM using automatic charge balancing is compared with that of the MSML design in Table 8. For CAMs with few entries, such as 4 or 32, the ML power consumption increases slightly because of the higher number of components. As the number of CAM entries increases further, the ML power is reduced, as shown in Table 8.
Figure 13. ML Power Consumption for 128-bits CAM word with various MSML configurations.
Figure 14. MDs for 128-bits CAM word with various MSML configurations.
Table 6. Worst-case MD for various CAM design with different word size.
Table 7. Power consumption for various CAM design with different word size.
Power Unit: Watt.
Table 8. Comparison of CAM automatic charge balance and MSML design.
A CAM using automatic charge balancing with a self-control mechanism and a master-slave match line design have been proposed in this paper. The charge balancing technique reduces the power consumption of the CAM without any extra control signals by balancing the voltage levels of N-ML and P-ML. Based on gpdk 180 nm CMOS technology, the simulations in this paper show that the power consumption can be reduced by up to 38.87% with respect to the traditional N-CAM. The MSML design uses the charge refill technique to reduce the power consumption by minimizing the MML charge loss. The power consumption of the MSML design is reduced by up to 19% and 58% with respect to the CAM using the charge balancing self-control mechanism and the traditional N-CAM, respectively. It is therefore concluded that the charge balance technique consumes less power than the traditional N-CAM, and that the MSML design is slightly better than the CAM using automatic charge balancing with the self-control mechanism.