An Efficient Overflow Detection and Correction Scheme in RNS Addition through Magnitude Evaluation

Show more

1. Introduction

The Residue Number System (RNS) has gained prominence in recent years due to its seemingly inherent features such as parallelism and carry-propagation free arithmetic computations. Notwithstanding the fact that, RNS is currently being applied in Digital Signal Processing (DSP) intensive computations like digital filtering, convolutions, correlations, Discrete Fourier Transform (DFT) computations, Fast Fourier Transform (FFT) computations and Direct Digital Frequency synthesis [1] [2] [3] ; researchers in the area are still working hard around the clock in order that the RNS becomes a general purpose processor. These efforts have not completely come to fruition because of challenges, including conversion to and from RNS and decimal/binary number systems, the moduli sets to use, overflow detection and correction, magnitude evaluation, and scaling.

An RNS number X, is represented as ${x}_{i}={\left|X\right|}_{{m}_{i}}$ , where ${m}_{i}=\left\{{m}_{1},{m}_{2},\cdots ,{m}_{n}\right\}$ , a set of pairwise relatively prime integers such that ${m}_{1}\ne {m}_{2}\ne \cdots \ne {m}_{n}$ and $\mathrm{gcd}\left({m}_{1},{m}_{2}\right),\cdots ,\mathrm{gcd}\left({m}_{n-1},{m}_{1}\right)=1$ . The residue set ${x}_{i}=\left[{x}_{1},{x}_{2},\cdots ,{x}_{n}\right]$ is uniquely represented provided X lies within the legitimate range $\left[0,M-1\right]$ where $M={\displaystyle {\prod}_{i=1}^{n}{m}_{i}}$ is the Dynamic Range (DR) for the chosen moduli set. Let X and Y be two different integers within the DR, if $X\odot Y$ , ( $\odot $ are the arithmetic operations $+,-,\times ,\xf7$ ), results in a value that is outside the legitimate range, then overflow is said to have occurred.

Overflow in general computing occurs if a calculated value is greater than its intended storage location in memory [4] [5] ; this relates to the DR in RNS which situation usually arises during addition and multiplication operations and failure to detect it will lead to improper or wrong representation of numbers and calculated results. Thus detecting overflow is one of the fundamental issues in the design of efficient RNS systems [6] .

The conversion of an RNS number into its decimal/binary equivalent number (a process called reverse conversion) has long been mainly based on the Chinese Remainder Theorem (CRT) and the Mixed Radix Conversion (MRC) techniques with few modifications being their variants of recent times. Whiles the former deals with the modulo-M operation, the later does not but computes sequentially which tends to reduce the complexity of the architecture. Computations can be done using the MRC as follows:

$X={e}_{1}+{e}_{2}{m}_{1}+{e}_{3}{m}_{1}{m}_{2}+\cdots +{e}_{n}{m}_{1}{m}_{2}\cdots {m}_{n-1}$ (1)

where ${e}_{i},i=1,2,\cdots ,n$ are the Mixed Radix Digits (MRDs) and computed as follows:

${e}_{1}={x}_{1}$

${e}_{2}={\left|\left({x}_{2}-{e}_{1}\right){\left|{m}_{1}^{-1}\right|}_{{m}_{2}}\right|}_{{m}_{2}}$

${e}_{3}={\left|\left(\left({x}_{3}-{e}_{1}\right){\left|{m}_{1}^{-1}\right|}_{{m}_{3}}-{e}_{2}\right){\left|{m}_{2}^{-1}\right|}_{{m}_{3}}\right|}_{{m}_{3}}$

$\vdots $

${e}_{n}={\left|\left(\cdots \left(\left({x}_{3}-{e}_{1}\right){\left|{m}_{1}^{-1}\right|}_{{m}_{n}}-{e}_{2}\right){\left|{m}_{2}^{-1}\right|}_{{m}_{n}}-\cdots -{e}_{n-1}\right){\left|{m}_{n-1}^{-1}\right|}_{{m}_{n}}\right|}_{{m}_{n}}$ (2)

The MRDs ${e}_{i}$ are within the range $0\le {e}_{i}\le {m}_{i}$ , and a positive number, X, in the interval $\left[0,M\right]$ can be uniquely represented. The magnitude of a number can become crucial in the determination overflow in RNS. The sign of an RNS number is determined by partitioning M into two parts: $0\le X<\lfloor M/2\rfloor $ (for positive integers) and $\lfloor M/2\rfloor \le X<M$ (for negative integers).

Recently, some techniques have been developed to detect overflow without necessarily completing the reverse conversion process; in [7] , an algorithm to detect overflow in the moduli set $\left({2}^{n}-3,{2}^{n}-1,{2}^{n},{2}^{n}+1,{2}^{n}+3\right)$ by adding a redundant modulus 2 to this moduli set and making use of ROM and XOR gates was proposed. In [8] , a method for detecting overflow in the moduli set $\left({2}^{n}-1,{2}^{n},{2}^{n}+1\right)$ based on group of numbers is presented where numbers within $\left[0,M-1\right]$ are distributed among several groups. Then, by using the groupings, the scheme is able to diagnose in the process of addition of two numbers, whether overflow has occurred or not. The scheme in [3] evaluated the sign of the sum of two numbers X and Y and used it to detect overflow but adopted a residue-to-binary converter proposed by [9] . The scheme in [10] presented a scheme by an Operands Examination Method for overflow detection for the moduli set $\left({2}^{n}-1,{2}^{n},{2}^{n}+1\right)$ during RNS addition. All these schemes either relied on complete reverse conversion process as in the case of [3] , or other costly and time consuming procedures such as base extension, group number and sign detection as in [8] and [10] .

In this paper, a new technique for detecting and correcting overflow during the addition of two RNS numbers for the moduli set $\left\{{2}^{n}-1,{2}^{n},{2}^{n}+1\right\}$ is presented; the technique evaluates the sign of an RNS number by performing a partial reverse conversion using the mixed radix conversion method. The sign of the addends is evaluated using only the MRDs, which is then used to detect the occurrence of overflow during RNS addition. The rest of the paper is organized as follows: Section 2 presents the proposed method, an anticipated hardware implementation (albeit theoretical) is presented in Section 3 with its realization in Section 4. Numerical illustrations are shown in Section 5 whiles the performance of the proposed scheme is evaluated in Section 6. The final part of this paper is the conclusion in Section 7.

2. Proposed Method

Given the moduli set $\left\{{2}^{n}-1,{2}^{n},{2}^{n}+1\right\}$ , where ${m}_{1}={2}^{n}+1$ , ${m}_{2}={2}^{n}$ and ${m}_{3}={2}^{n}-1$ , then

$M={2}^{n}\left({2}^{n}+1\right)\left({2}^{n}-1\right)$ (3)

This implies

$M/2={2}^{n-1}\left({2}^{n}+1\right)\left({2}^{n}-1\right)=\left({2}^{n}+1\right)\left({2}^{2n-1}-{2}^{n-1}\right)$ (4)

Lemma 1: Given the moduli set $\left\{{2}^{n}-1,{2}^{n},{2}^{n}+1\right\}$ , where ${m}_{1}={2}^{n}+1$ , ${m}_{2}={2}^{n}$ and ${m}_{3}={2}^{n}-1$ for every integer $n>1$ , the following hold true [10] :

${\left|{m}_{1}^{-1}\right|}_{{m}_{2}}=1$ (5)

${\left|{m}_{2}^{-1}\right|}_{{m}_{3}}=1$ (6)

${\left|{m}_{1}^{-1}\right|}_{{m}_{3}}={2}^{n-1}$ (7)

Therefore, we can re-write (2) as;

${e}_{1}={x}_{1}$

${e}_{2}={\left|\left({x}_{2}-{e}_{1}\right)1\right|}_{{2}^{n}}={\left|{x}_{2}-{x}_{1}\right|}_{{2}^{n}}$

${e}_{3}={\left|\left(\left({x}_{3}-{e}_{1}\right){2}^{n-1}-{e}_{2}\right)1\right|}_{{2}^{n}-1}={\left|\left({x}_{3}-{e}_{1}\right){2}^{n-1}-{e}_{2}\right|}_{{2}^{n}-1}$ (8)

Theorem 1: For the given moduli set, any integer $X\ge M/2$ if and only if

${e}_{3}={2}^{n}-{2}^{n-1}$ (9)

or

${e}_{3}={2}^{n}-{2}^{n-1}-1\text{\hspace{0.17em}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{AND}\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.17em}}{e}_{2}={2}^{n}-{2}^{n-1}$ (10)

for any $n>1$ .

Proof: If it can be shown that by substituting (9) and (10) into Equation (4) that, $X\ge \left({2}^{n}+1\right)\left({2}^{2n-1}-{2}^{n-1}\right)$ then, it implies $X\ge M/2$ .

Assume (9) is true, then

$\begin{array}{c}X={e}_{1}+\left({2}^{n}+1\right){e}_{2}+\left({2}^{n}-{2}^{n-1}\right){2}^{n}\left({2}^{n}+1\right)\\ ={e}_{1}+\left({2}^{n}+1\right)\left[{e}_{2}+{2}^{2n}-{2}^{2n-1}\right]\\ >\left({2}^{n}+1\right)\left({2}^{2n-1}-{2}^{n-1}\right),\text{\hspace{0.17em}}\forall n>1\end{array}$

Also, assume (10) is true, then

$\begin{array}{c}X\ge {e}_{1}+\left({2}^{n}-{2}^{n-1}\right)\left({2}^{n}+1\right)+\left({2}^{n}-{2}^{n-1}-1\right)\left({2}^{2n}+{2}^{n}\right)\\ ={e}_{1}+\left({2}^{n}+1\right)\left[{2}^{2n}-{2}^{2n-1}-{2}^{n-1}\right]\\ \ge \left({2}^{n}+1\right)\left({2}^{2n-1}-{2}^{n-1}\right),\forall n>1\end{array}$ ∎

Thus, from (9) and (10), it is possible to determine the sign of an RNS number X; whether $X\ge M/2$ (for a negative number) or $X<M/2$ (for a positive number).

The proposed method uses comparison by computing the MRDs of each of the addends to determine which half of the RNS range it belongs rather than performing a full reverse conversion. To detect overflow during addition of two addends X and Y based on the moduli set $\left\{{2}^{n}-1,{2}^{n},{2}^{n}+1\right\}$ , a single bit that indicates the sign of that addend is defined. Now, based on this bit, three cases will then be considered:

1) Overflow will definitely occur if both of the addends are equal to or greater than half of the dynamic range (M/2).

2) Overflow will not occur if both of the addends are less than M/2.

3) Overflow may or may not occur if only one of the addends is equal or greater than M/2 and will require further processing to determine whether overflow will occur or not.

Let the magnitude evaluation of the addends $\left(X,Y\right)$ be represented by $\beta $ , such that if $\beta =1$ or $\beta =0$ represents a positive number or a negative number respectively as shown in Equation (11). The evaluation of the undetermined case in (3) is also represented by a single bit λ in (12).

$\beta =\{\begin{array}{l}1;{e}_{3}={2}^{n}-{2}^{n-1}\\ 1;{e}_{3}={2}^{n}-{2}^{n-1}-1\text{\hspace{0.17em}}\text{AND}\text{\hspace{0.17em}}{e}_{2}={2}^{n}-{2}^{n-1}\\ 0;\text{otherwise}\end{array}$ (11)

and,

$\lambda =\{\begin{array}{l}1;{x}_{1}+{y}_{1}\ge {2}^{n}+1\\ 0;\text{otherwise}\end{array}$ (12)

The proposed method will then detect overflow as follows:

$overflow=\{\begin{array}{l}0;{\beta}_{X}+{\beta}_{Y}=0\\ 1;{\beta}_{X}\cdot {\beta}_{Y}=1\\ \lambda ;{\beta}_{X}\oplus {\beta}_{Y}=1\end{array}$ (13)

where $\left(+,\cdot ,\oplus \right)$ refer to the logical operations (OR, AND, XOR), respectively. For clarity, “1” means overflow occurs whilst “0” means no overflow.

Correction Unit

Let Z be the sum of the two addends. By substituting the individual MRDs for both addends (X and Y), Z can be obtained as follows;

$\begin{array}{c}Z=X+Y\\ =\left[{e}_{1}\left(X\right)+{e}_{2}\left(X\right){m}_{1}+{e}_{3}\left(X\right){m}_{1}{m}_{2}\right]+\left[{e}_{1}\left(Y\right)+{e}_{2}\left(Y\right){m}_{1}+{e}_{3}\left(Y\right){m}_{1}{m}_{2}\right]\\ =\left({e}_{1}\left(X\right)+{e}_{1}\left(Y\right)\right)+\left({e}_{2}\left(X\right)+{e}_{2}\left(Y\right)\right){m}_{1}+\left({e}_{3}\left(X\right)+{e}_{3}\left(Y\right)\right){m}_{1}{m}_{2}\end{array}$

by letting ${\psi}_{i}={e}_{i}\left(X\right)+{e}_{i}\left(Y\right)$ , we shall have

$Z={\psi}_{1}+{\psi}_{2}{m}_{1}+\psi {m}_{1}{m}_{2}$ (14)

Thus by adding the individual MRDs of the two addends, we obtain the sum Z according to (1) without having to compute separately for its MRDs. The value of Z obtained from (14) is the correct result of the addition whether overflow occurs or not. In case of overflow occurrence, the redundant modulus is employed by shifting M one bit to the left in order to accommodate the value.

3. Hardware Implementation

From Equation (8), the MRD’s ${e}_{1}$ , ${e}_{2}$ and ${e}_{3}$ can be represented in binary as;

${e}_{1}=\underset{n+1}{\underset{\ufe38}{{e}_{1,n}{e}_{1,n-1}\cdots {e}_{1,1}{e}_{1,0}}}$ (15)

${e}_{2}=\underset{n}{\underset{\ufe38}{{e}_{2,n-1}{e}_{2,n-2}\cdots {e}_{2,1}{e}_{2,0}}}$ (16)

${e}_{3}=\underset{n}{\underset{\ufe38}{{e}_{3,n-1}{e}_{3,n-2}\cdots {e}_{3,1}{e}_{3,0}}}$ (17)

Equations (15) to (17) can further be simplified as follows;

${e}_{1}={x}_{1}=\underset{n+1}{\underset{\ufe38}{{x}_{1,n}{x}_{1,n-1}\cdots {x}_{1,1}{x}_{1,0}}}$ (18)

$\begin{array}{c}{e}_{2}={\left|{x}_{2}-{x}_{1}\right|}_{{2}^{n}}={\left|{x}_{2}+{t}_{1}\right|}_{{2}^{n}}\\ ={\left|\underset{n}{\underset{\ufe38}{{x}_{2,n-1}{x}_{2,n-2}\cdots {x}_{2,1}{x}_{2,0}}}+\underset{n}{\underset{\ufe38}{{t}_{1,n-1}{t}_{1,n-2}\cdots {t}_{1,1}{t}_{1,0}}}\right|}_{{2}^{n}}\\ \underset{n}{=\underset{n}{\underset{\ufe38}{{e}_{2,n-1}{e}_{2,n-2}\cdots {e}_{2,1}{e}_{2,0}}}}\end{array}$ (19)

where,

$\begin{array}{c}{t}_{1}={\left|-{x}_{1}\right|}_{{2}^{n}}={\left|-\left(\underset{n+1}{\underset{\ufe38}{{x}_{1,n}{x}_{1,n-1}\cdots {x}_{1,1}{x}_{1,0}}}\right)\right|}_{{2}^{n}}={\left|-\left(\underset{n}{\underset{\ufe38}{{x}_{1,n-1}{x}_{1,n-2}\cdots {x}_{1,1}{x}_{1,0}}}\right)\right|}_{{2}^{n}}\\ ={\left|\underset{n}{\underset{\ufe38}{{\stackrel{\xaf}{x}}_{1,n-1}{\stackrel{\xaf}{x}}_{1,n-2}\cdots {\stackrel{\xaf}{x}}_{1,1}{\stackrel{\xaf}{x}}_{1,0}}}\right|}_{{2}^{n}}\end{array}$ (20)

and

$\begin{array}{c}{e}_{3}={\left|{2}^{n-1}{x}_{3}-{2}^{n-1}{e}_{1}-{e}_{2}\right|}_{{2}^{n}-1}={\left|{t}_{2}+{t}_{3}+{t}_{4}\right|}_{{2}^{n}-1}\\ ={\left|\underset{n}{\underset{\ufe38}{{t}_{2,n-1}\cdots {t}_{2,1}{t}_{2,0}}}+\underset{n}{\underset{\ufe38}{{t}_{3,n-1}\cdots {t}_{3,1}{t}_{3,0}}}+\underset{n}{\underset{\ufe38}{{t}_{4,n-1}\cdots {t}_{4,1}{t}_{4,0}}}\right|}_{{2}^{n}-1}\\ =\underset{n}{\underset{\ufe38}{{e}_{3,n-1}{e}_{3,n-2}\cdots {e}_{3,1}{e}_{3,0}}}\end{array}$ (21)

where

${t}_{2}={\left|{2}^{n-1}{x}_{3}\right|}_{{2}^{n}-1}={\left|{2}^{n-1}\underset{n}{\underset{\ufe38}{\left({x}_{3,n-1}{x}_{3,n-2}\cdots {x}_{3,1}{x}_{3,0}\right)}}\right|}_{{2}^{n}-1}=\underset{n}{\underset{\ufe38}{{x}_{3,0}{x}_{3,n-1}\cdots {x}_{3,2}{x}_{3,1}}}$ (22)

$\begin{array}{c}{t}_{3}={\left|-{2}^{n-1}{x}_{1}\right|}_{{2}^{n}-1}={\left|-{2}^{n-1}\underset{n+1}{\underset{\ufe38}{\left({x}_{1,n}{x}_{1,n-1}\cdots {x}_{1,1}{x}_{1,0}\right)}}\right|}_{{2}^{n}-1}\\ ={\left|-{2}^{n-1}\left({x}_{1,n}\times {2}^{n}+\underset{n}{\underset{\ufe38}{{x}_{1,n-1}\cdots {x}_{1,1}{x}_{1,0}}}\right)\right|}_{{2}^{n}-1}\end{array}$ (23)

Since, ${x}_{1}$ is a number that is smaller than ${2}^{n}+1$ , two cases are considered for ${x}_{1}$ . First, when ${x}_{1}$ is smaller than ${2}^{n}$ , and second, when ${x}_{1}$ is equal to ${2}^{n}$ [11] . If ${x}_{1,n}=0$ , we have

${t}_{31}={\left|-{2}^{n-1}\underset{n}{\underset{\ufe38}{\left({x}_{1,n-1}{x}_{1,n-2}\cdots {x}_{1,1}{x}_{1,0}\right)}}\right|}_{{2}^{n}-1}=\underset{n}{\underset{\ufe38}{{\stackrel{\xaf}{x}}_{1,0}{\stackrel{\xaf}{x}}_{1,n-1}\cdots {\stackrel{\xaf}{x}}_{1,2}{\stackrel{\xaf}{x}}_{1,1}}}$ (24)

Else if ${x}_{1,n}=1$ , the following binary vector can be obtained as

${t}_{32}={\left|-{2}^{n-1}\times {2}^{n}\left(\underset{n-1}{\underset{\ufe38}{00\cdots 0}}{x}_{1,n}\right)\right|}_{{2}^{n}-1}=0\underset{n-1}{\underset{\ufe38}{11\cdots 1}}$ (25)

Therefore, ${t}_{3}$ is calculated as

${t}_{3}=\{\begin{array}{l}{t}_{31},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}{x}_{1,n}=0\\ {t}_{32},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{if}\text{\hspace{0.17em}}{x}_{1,n}=1\end{array}$ (26)

And finally,

${t}_{4}={\left|-{e}_{2}\right|}_{{2}^{n}-1}={\left|-\underset{n}{\underset{\ufe38}{\left({e}_{2,n-1}{e}_{2,n-2}\cdots {e}_{2,1}{e}_{2,0}\right)}}\right|}_{{2}^{n}-1}=\underset{n}{\underset{\ufe38}{{\stackrel{\xaf}{e}}_{2,n-1}{\stackrel{\xaf}{e}}_{2,n-2}\cdots {\stackrel{\xaf}{e}}_{2,1}{\stackrel{\xaf}{e}}_{2,0}}}$ (27)

Let γ and ω represent the MRDs of the two integers X and Y respectively. Thus from equations (19) to (21), we have

${\psi}_{i}={\gamma}_{i}+{\omega}_{i}$ (28)

which implies

${\psi}_{1}={\gamma}_{1}+{\omega}_{1}=\underset{n+1}{\underset{\ufe38}{{\gamma}_{1,n}{\gamma}_{1,n-1}\cdots {\gamma}_{1,0}}}+\stackrel{n+1}{\stackrel{\ufe37}{{\omega}_{1,n}{\omega}_{1,n-1}\cdots {\omega}_{1,0}}}={\psi}_{1,n}{\psi}_{1,n-1}\cdots {\psi}_{1,0}$ (29)

${\psi}_{2}={\gamma}_{2}+{\omega}_{2}=\underset{n+1}{\underset{\ufe38}{{\gamma}_{2,n-1}{\gamma}_{2,n-2}\cdots {\gamma}_{2,0}}}+\stackrel{n+1}{\stackrel{\ufe37}{{\omega}_{2,n-1}\cdots {\omega}_{2,0}}}={\psi}_{2,n-1}{\psi}_{2,n-2}\cdots {\psi}_{2,0}$ (30)

finally,

${\psi}_{3}={\gamma}_{3}+{\omega}_{3}=\underset{n+1}{\underset{\ufe38}{{\gamma}_{3,n-1}{\gamma}_{3,n-2}\cdots {\gamma}_{3,0}}}+\stackrel{n+1}{\stackrel{\ufe37}{{\omega}_{3,n-1}\cdots {\omega}_{3,0}}}={\psi}_{3,n-1}{\psi}_{3,n-2}\cdots {\psi}_{3,0}$ (31)

and so, Z is implemented as;

$Z={z}_{1}+{z}_{3}+{z}_{4}=\underset{3n+1}{\underset{\ufe38}{\stackrel{2n}{\stackrel{\ufe37}{0\cdots 0}}\underset{n+1}{\underset{\ufe38}{{z}_{1,n}\cdots {z}_{1,0}}}+0\underset{3n}{\underset{\ufe38}{{z}_{3,3n-1}\cdots {z}_{3,0}}}+\stackrel{n+1}{\stackrel{\ufe37}{0\cdots 0}}\underset{2n}{\underset{\ufe38}{{z}_{4,2n-1}\cdots {z}_{4,0}}}}}$ (32)

where,

${z}_{1}={\psi}_{1}$ (33)

$\begin{array}{c}{z}_{3}={z}_{2}+{2}^{2n}{\psi}_{3}=\underset{n}{\underset{\ufe38}{{\psi}_{3,n-1}\cdots {\psi}_{3,0}}}\stackrel{2n}{\stackrel{\ufe37}{00\cdots 0}}\u22c9\u22ca\underset{2n}{\underset{\ufe38}{{z}_{2,2n-1}\cdots {z}_{2,0}}}\\ ={z}_{3,3n-1}{z}_{3,3n-2}\cdots {z}_{3,1}{z}_{3,0}\end{array}$ (34)

$\begin{array}{c}{z}_{2}={2}^{n}{\psi}_{2}+{\psi}_{2}=\underset{n}{\underset{\ufe38}{{\psi}_{2,n-1}\cdots {\psi}_{2,0}}}\stackrel{n}{\stackrel{\ufe37}{00\cdots 0}}\u22c9\u22ca\underset{n}{\underset{\ufe38}{{\psi}_{2,n-1}\cdots {\psi}_{2,0}}}\\ ={z}_{2,2n-1}{z}_{2,2n-2}\cdots {z}_{2,1}{z}_{2,0}\end{array}$ (35)

and,

${z}_{4}={2}^{n}{\psi}_{3}=\underset{n}{\underset{\ufe38}{{\psi}_{3,n-1}\cdots {\psi}_{3,1}{\psi}_{3,0}}}\stackrel{n}{\stackrel{\ufe37}{00\cdots 0}}={z}_{4,2n-1}{z}_{4,2n-2}\cdots {z}_{4,1}{z}_{4,0}$ (36)

4. Hardware Realization

The hardware realization of the proposed scheme is divided into four parts as shown in Figures 1-4. First is the Partial Conversion Part (PCP) shown in Figure 1, which evaluates the MRDs based on (18), (19) and (21) with their parameters clearly defined according to (20) and (21) - (27). The PCP begins with an Operands Preparation Unit (OPU) which prepares the operands in (20), (22) and (26) by simply manipulating the routing of the bits of the residues. Also, an n-bit 2:1 Multiplexer (MUX) is used for obtaining (26). ADD1 is an n-bit Carry Propagate Adder (CPA) and is used to compute (19), meanwhile (21) is obtained by using an $\left(n-1\right)$ -bit CPA as ADD2 whose save ( ${s}_{1}$ ) and carry ( ${c}_{1}$ ) are then added using ADD3 which is also an $\left(n-1\right)$ -bit CPA. These MRDs are used to determine the sign of the RNS number in Figure 2. Thus, the critical path for the PCP unit is made up of one $\left({2}^{n}\right)$ modulo adder and two $\left({2}^{n}-1\right)$ modulo adders.

Figure 1. Partial conversion part (PCP).

Figure 2. Magnitude evaluation part (MEP).

Figure 3. Overflow detection part (ODP).

Figure 4. Overflow correction part (OCP).

Second, is the Magnitude Evaluation Part (MEP) shown in Figure 2, which evaluates whether an RNS number is positive or negative according to Equation (11). The MEP uses one AND gate and an OR gate. These are both two input monotonic gates. Next, is the Overflow Detection Part (ODP) which compares the sign bits of the two addends by using an AND gate according to (13) which is then ORed with the evaluated bit of the undetermined case in (12) as shown in Figure 3. This is where the scheme detects the occurrence of overflow during the addition of two numbers.

Lastly, in Figure 4 is the Overflow Correction Part (OCP). The OCP evaluates the individual MRDs of the two addends separately to achieve the sum Z in (14). This is done using five adders; four regular CPAs and one carry save adder (CSA). This is computed according to (29) - (36). ADD4, ADD5 and ADD6 add separately the MRDs ${e}_{1},{e}_{2}$ and ${e}_{3}$ respectively for the two addends. The result of ADD4 is of importance because it is used in evaluating the undetermined case in (12). ${z}_{2}$ is a result of concatenation as well as ${z}_{3}$ which do not require any hardware. ADD7 is a CSA which computes the result of ${z}_{1},{z}_{2}$ and ${z}_{3}$ whose save ( ${s}_{2}$ ) and carry ( ${c}_{2}$ ) are added using ADD 8 which is a CPA in order to get accurate sum Z whether overflow occurs or not. The schematic diagrams for the proposed scheme are presented in Figures 1-4.

The area (A) and time (D) requirements of the proposed scheme are estimated based on the unit-gate model as used in [12] and [13] for fair comparison. In this model, each two-input monotonic gate such as AND, OR, NAND, NOR has area $A=1$ and delay $D=1$ , each two-input gate XOR/XNOR has $A=D=2$ , The area and delay of an inverter is a negligible fraction of a unit, and it is thus assumed to require zero units of area and delay [14] . A 2:1 multiplexer has an area $A=3$ and delay $D=2$ ; A full adder has an area of seven gates and a delay of four gates but a CSA has a constant delay. Also, the adder requirements based on this model as presented in [14] is adopted for the comparison since the adopted adders are similar to the adders used for the proposed scheme. The results state that an estimation modulo;

$\left({2}^{n}\right):A=5n+\left(\frac{3}{2}\right)n{\mathrm{log}}_{2}n,$

$D=2{\mathrm{log}}_{2}n+3$

$\left({2}^{n}-1\right):A=12n+3n\left({\mathrm{log}}_{2}n-1\right),$

$D=2{\mathrm{log}}_{2}n+3$

Therefore, the hardware requirements of the scheme are as follows:

${A}_{PCP}={A}_{ADD1}+{A}_{MUX}+{A}_{ADD2}+{A}_{ADD3}=23n+\left(\frac{15}{2}\right){\mathrm{log}}_{2}n+3$

${A}_{MEP}=2\left({A}_{AND}\right)=2$

${A}_{ODP}=2\left({A}_{AND}\right)=2$

${A}_{OCP}={A}_{ADD4}+{A}_{ADD5}+{A}_{ADD6}=63n+14$

The estimated delay of the scheme will be as follows:

${D}_{PCP}={D}_{ADD1}+{D}_{ADD2}+{D}_{ADD3}=4{\mathrm{log}}_{2}n+5$

${A}_{MEP}=2\left({A}_{AND}\right)=2$

${A}_{ODP}=2\left({A}_{AND}\right)=2$

${D}_{OCP}={D}_{ADD4}+{A}_{ADD7}+{A}_{ADD8}=9gates$

Now, in order to make an effective comparison, the proposed scheme is divided into two: Proposed Scheme I for when the OCP is not included in the comparison and Proposed Scheme II for the OCP being included in the comparison. The delay of the OCP overrides the delay of the delays of the MEP and the ODP if Proposed Scheme II is consider since they will all be computed in parallel and the critical path in that case will be dictated by the OCP. The area for the PCP and the MEP is double for two numbers X and Y but this is not the case for the delay of the two numbers since they are computed in parallel. Thus, the total area and delay of the proposed schemes are:

${A}_{TOTAL\left(I\right)}=2{A}_{PCP}+2{A}_{MEP}+{A}_{ODP}=46n+15{\mathrm{log}}_{2}n+10$

${D}_{TOTAL\left(I\right)}={D}_{PCP}+{D}_{MEP}=4{\mathrm{log}}_{2}n+7$

and,

${A}_{TOTAL\left(II\right)}=2{A}_{PCP}+2{A}_{MEP}+{A}_{ODP}+{A}_{OCP}=109n+15{\mathrm{log}}_{2}n+24$

${D}_{TOTAL\left(II\right)}={D}_{PCP}+{D}_{OCP}=4{\mathrm{log}}_{2}n+16$

5. Numerical Illustrations

This subsection presents numerical illustrations of the proposed scheme.

Checking overflow in the sum of 49 and 21 using RNS moduli set {3, 4, 5}

$X=49={\left(4,1,1\right)}_{RNS\langle 5|4|3\rangle}={\left(100,01,01\right)}_{RNS\langle 101|100|11\rangle}$

$Y=21={\left(1,1,0\right)}_{RNS\langle 5|4|3\rangle}={\left(001,01,00\right)}_{RNS\langle 101|100|11\rangle}$

$Z={\left(\left(100,01,01\right)+\left(001,01,00\right)\right)}_{RNS\langle 101|100|11\rangle}={\left(000,10,01\right)}_{RNS\langle 101|100|11\rangle}$

A reverse conversion of ${\left(000,10,01\right)}_{RNS\langle 101|100|11\rangle}$ will result in the decimal number 10. Whilst the sum of the decimal numbers 49 and 21 is 70 which is obvious of overflow occurring.

Checking for RNS overflow using the proposed technique

${e}_{2}\left(49\right)={\left|01-100\right|}_{100}=001$

${e}_{3}\left(49\right)={\left|\left(01-01\right)\left(10\right)-01\right|}_{11}={\left|-01\right|}_{11}=10={2}^{2}-2$

which implies, $\beta \left(49\right)=1$ from (11).

Also,

${e}_{2}\left(21\right)={\left|01+|-001{|}_{100}\right|}_{100}={\left|01+11\right|}_{100}=000$

${e}_{3}\left(21\right)={\left|\left(11-01\right)\left(10\right)-11\right|}_{11}={\left|100-11\right|}_{11}=01={2}^{2}-3$

But ${e}_{2}\left(21\right)=000$ , which implies $\beta \left(21\right)=0$ from (11).

Therefore, from (13) $overflow=\lambda $ and needs further processing.

Since ${e}_{3}\left(49\right)+{e}_{3}\left(21\right)=10+01=11>{2}^{2}-2$ , the scheme detects overflow occurring after processing.

Correction unit

$\begin{array}{c}Z=\left(100+001\right)+\left(01+00\right)\left(101\right)+\left(10+01\right)\left(101\right)\left(100\right)\\ =1000110={\left(70\right)}_{decimal}\end{array}$

Checking overflow in the sum of 10 and 11 using RNS moduli set {3, 4, 5}

$X=49={\left(4,1,1\right)}_{RNS\langle 5|4|3\rangle}={\left(100,01,01\right)}_{RNS\langle 101|100|11\rangle}$

$Y=11={\left(1,3,2\right)}_{RNS\langle 5|4|3\rangle}={\left(001,11,10\right)}_{RNS\langle 101|100|11\rangle}$

$Z={\left(\left(000,10,01\right)+\left(001,11,10\right)\right)}_{RNS\langle 101|100|11\rangle}={\left(001,01,00\right)}_{RNS\langle 101|100|11\rangle}$

RNS to decimal conversion of ${\left(001,11,10\right)}_{RNS\langle 101|100|11\rangle}$ will result in the decimal number 21, which is correct result of 10 + 11.

Checking for RNS overflow using the proposed algorithm

${e}_{2}\left(10\right)={\left|10-000\right|}_{100}=10={2}^{2}-2$

${e}_{3}\left(10\right)={\left|\left(01-000\right)\left(10\right)-10\right|}_{11}={\left|10-10\right|}_{11}=00<{2}^{2}-3$

which implies, $\beta \left(10\right)=0$ since ${e}_{3}\left(10\right)=00<{2}^{2}-3$ , from (11).

Also,

${e}_{2}\left(11\right)={\left|11+|-001{|}_{100}\right|}_{100}={\left|11+11\right|}_{100}=10={2}^{2}-2$

${e}_{3}\left(11\right)={\left|\left(10-001\right)\left(10\right)-10\right|}_{11}={\left|10-10\right|}_{11}=00<{2}^{2}-3$

this implies, $\beta \left(11\right)=0$ since ${e}_{3}\left(10\right)=00<{2}^{2}-3$ , from (11).

Thus, from (13) $overflow=0$ , which implies no overflow has occurred according to the proposed scheme after processing.

Correction unit

$\begin{array}{c}Z=\left(000+001\right)+\left(10+10\right)\left(101\right)+\left(00+00\right)\left(101\right)\left(100\right)\\ =010101={\left(21\right)}_{decimal}\end{array}$

6. Performance Evaluation

The performance of the proposed scheme is compared to schemes in [3] and [8] ; the scheme in [8] does not contain a correction unit; the scheme by [3] has a correction unit but is not included in the comparison. And so both schemes do not have the correction component in the comparison. Table 1 shows the analysis of the proposed scheme with that of similar state-of-the art schemes.

As shown in Table 1, the proposed scheme for detecting overflow (Proposed Scheme I) in the given moduli set is very cheap in terms of hardware resources

Table 1. Area, delay comparison.

and faster than the scheme by [8] but requires a little hardware resources than the scheme by [3] albeit slower than the Proposed Scheme I. However, the complete proposed scheme (Proposed Scheme II) for detecting and correcting overflow requires more hardware resources than the other compared schemes but faster than both schemes by [3] and [8] .

Clearly, Proposed Scheme I completely outperforms the similar state-of-the-art scheme by [8] for detecting overflow, but the trust of this work is to detect and correct overflow anytime it occurs; in so doing it has made tremendous gains in speed as shown in Table 1.

Table 2 shows a detailed analysis of the complexities and delay of the proposed scheme with that of the similar state-of-the-art schemes.

Table 2 reveals interesting results theoretically, from the analysis it is clear that the Proposed Scheme I requires less resources than what is required by [8] . From the table, smaller values of n shows that Proposed Scheme II requires more resources than that by [8] but drastically improves upon this requirements up to over 51% better than [8] for higher values of n (i.e. n > 4), this is clearly shown in the graph in Figure 5. The analysis from the table also shows that whilst for smaller values of n (say n = 1), the Proposed Scheme I is better than the scheme by [3] in terms of hardware resources, it tends to require up to about 18% resources more than that by [3] .

From Figure 5, the scheme by [8] sharply increases for higher values of n followed by the Proposed Scheme II whilst the scheme by [3] requires the lesser resources. Regarding the delay, the proposed schemes (Proposed I and Proposed II) completely outperforms both schemes by up to over 35% than the scheme by [8] and over 90% faster than the scheme by [3] as shown in Table 2 and in Figure 5. It is worth noting that whiles the scheme by [3] performs better in terms of hardware resources, it tends to be the worst performer for speed and the

Table 2. Area, delay analysis for various values of n.

Figure 5. Graphs of area and delay analysis of the various compared schemes.

percentage difference shows that Proposed I is more efficient. It is clear from the graphs that in terms of delay, the scheme in [3] sharply increases with increasing values of n whiles the marginal increase of the rest of the schemes are minimal.

7. Conclusion

Detecting overflow in RNS arithmetic computations is very important but can be difficult, more so, if it has to be corrected. In this paper, an ingenious technique of detecting overflow by use of the MRC method through magnitude evaluation as well correcting the overflow when it occurs was presented. This technique did not require full reverse conversion but used the MRDs to evaluate the sign of a number to detect the occurrence of overflow. With this technique, the correct value of the sum of two numbers is guaranteed whether overflow occurred or not. The scheme has been demonstrated theoretically to be very fast than similar-state-of-the-art scheme but required a little more hardware resources. However, the Proposed Scheme I, which is the one without the correction component completely outperformed the scheme in [8] in terms of both area and delay requirements. Also, results from Table 2 and Figure 5 showed that for higher values of n, the Proposed Scheme II also outperformed the scheme by [8] . Future works will focus on simulating the theoretical results and implementing it on FPGA boards.

References

[1] Omondi, A. and Premkumar, B. (2007) Residue Number Systems: Theory and Implementation, vol. 2. Imperial College Press, London.

https://doi.org/10.1142/p523

[2] Agbedemnab, P.A. and Bankas, E.K. (2015) A Novel RNS Overflow Detection and Correction Algorithm for the Moduli Set {2^n−1,2^n,2^n+1}. International Journal of Computer Applications in Technology, 110, 30-34.

https://doi.org/10.5120/19403-0925

[3] Younes, D. and Steffan, P. (2013) Universal Approaches for Overflow and Sign Detection in Residue Number System Based on {2n − 1, 2n, 2n + 1}. The Eighth International Conference on Systems, Seville, 27 January-1 February 2013, 77-81.

[4] Daabo, M.I. and Gbolagade, K.A. (2012) RNS Overflow Detection Scheme for the Moduli set {M − 1, M}. Journal of Computing, 4, 39-44.

[5] Daabo, M.I. (2015) Overflow Detection Schemes for Residue Number System Architecture. PhD Thesis, University for Development Studies, Tamale, Ghana.

[6] Debnath, R.C. and Pucknell, D.A. (1978) On Multiplicative Overflow Detection in Residue Number System. Electronics Letters, 14, 129-130.

https://doi.org/10.1049/el:19780088

[7] Askarzadeh, M., Hosseinzadeh, M. and Navi, K. (2009) A New Approach to Overflow Detection in Moduli Set 2n−3, 2n−1, 2n+1, 2n+3. 2009 Second International Conference on Computer and Electrical Engineering, 1, 439-442.

https://doi.org/10.1109/ICCEE.2009.197

[8] Rouhifar, M., Hosseinzadeh, M., Bahanfar, S. and Teshnehlab, M. (2011) Fast Overflow Detection in Moduli set {2^n-1,2^n,2^n+1}. International Journal of Computer Science Issues, 8, 407-414.

[9] Piestrak, S.J. (1995) A High-Speed Realization of a Residue to Binary Number System Converter. IEEE Transactions on Circuits and Systems. Part II: Analog and Digital Signal Processing, 42, 661-663.

https://doi.org/10.1109/82.471401

[10] Siewobr, H. and Gbolagade, K.A. (2014) RNS Overflow Detection by Operands Examination. International Journal of Computer Applications in Technology, 85, 1-5.

https://doi.org/10.5120/14938-2906

[11] Molahosseini, A.S., Navi, K., Dadkhah, C., Kavehei, O. and Timarchi, S. (2010) Efficient Reverse Converter Designs for the New 4-Moduli Sets {2^n−1, 2^n, 2^n+1, 2^(2n+1)−1} and {2^n−1, 2^n+1, 2^n, 2^(2n)+1} Based on New CRTs. IEEE Transactions on Circuits and Systems I: Regular Papers, 57, 823-835.

https://doi.org/10.1109/TCSI.2009.2026681

[12] Zimmermann, R. (1999) Efficient VLSI Implementation of Modulo (2^n±1) Addition and Multiplication. Proceedings 14th IEEE Symposium on Computer Arithmetic, Adelaide, 14-16 April 1999, 158-167.

[13] Chang, C.H., Low, J. and Yung, S. (2011) Simple, Fast, and Exact RNS Scaler for the Three-Moduli Set {2 ^n−1, 2^n, 2^n+1}. IEEE Transactions on Circuits and Systems I: Regular Papers, 58, 2686-2697.

https://doi.org/10.1109/TCSI.2011.2142950

[14] Sousa, L. (2015) 2^n RNS Scalers for Extended 4-Moduli Sets. IEEE Transactions on Computers, 99, 1-14.