Deep Neural Network Based Behavioral Model of Nonlinear Circuits


1. Introduction

Generating accurate circuit models is a common and efficient step towards the system-level design of electronic devices. To meet the rigorous requirements of modern electronic systems, developing efficient modeling methods has become an important research topic. According to the type of data needed for extraction, circuit models are divided into two categories: physical models and behavioral models [1]. Physical models are created by analyzing the circuitry of the devices, while behavioral models characterize the devices in terms of input and output signals, without resorting to their internal constitution.

Usually, modeling linear circuits is simple because their parameters are constant and the output is proportional to the input. However, most circuits are nonlinear. The nonlinearities contained in transistors and diodes make the modeling process relatively difficult. Over the past decades, artificial neural networks have proven to be a powerful tool for nonlinear regression. Neural networks trained on measured data can represent the nonlinear behavior of electronic devices. Several neural-network-based modeling applications have been reported recently. Cao Y. and Zhang Q.J. presented a new approach for developing recurrent neural network models of nonlinear circuits [2]. Tarver C. *et al.* developed a neural-network method to combat nonlinearities in power amplifiers [3]. Chen Z. *et al.* proposed a method for data-driven behavioral modeling of electronic circuits using recurrent neural networks [4].

Theoretically, a traditional shallow neural network with a single hidden layer can approximate any nonlinear function with arbitrary accuracy if the number of neurons is increased without constraint. However, increasing the number of hidden nodes causes an exponential increase in the number of model parameters and requires many more training examples [5]. In recent years, deep neural networks (DNNs) have attracted considerable research attention. Unlike traditional neural networks, deep neural networks possess multiple hidden layers and need fewer parameters. It has been demonstrated that deeper networks can fit more complex functions by composing the functions learned in earlier layers [6].

In this paper, a nonlinear circuit behavioral modeling technique is developed on the basis of deep neural networks. The remaining parts of the paper are organized as follows. Section 2 describes the structure of a deep feedforward neural network. In Section 3, a power amplifier circuit is created as the object to be modeled. Training and validation are presented in Section 4.

2. Deep Feedforward Neural Networks

There is no clear threshold of depth that divides shallow neural networks from deep neural networks, but it is generally agreed that a deep structure requires two or more hidden layers. The hidden layers may contain equal or different numbers of neurons. These neurons learn the mathematical mapping, whether linear or nonlinear, from the input to the output.

In this work, a feedforward neural network with multiple hidden layers is adopted to model nonlinear circuits. In a multi-layer neural network, the neurons are arranged in a layered fashion, in which the input and output layers are separated by multiple hidden layers. This layer-wise architecture is referred to as feedforward because information only travels forward in the network: from the input nodes, through the hidden layers, and finally to the output nodes. Although a single hidden layer with enough neurons is sufficient to make the network a universal approximator, there are substantial benefits to using multiple hidden layers.

Figure 1 shows a feedforward network with two hidden layers. It contains *m* input nodes and one output node. The input data vector is represented by

$X={\left[\begin{array}{cccc}{x}_{1}& {x}_{2}& \cdots & {x}_{m}\end{array}\right]}^{\text{T}}$, (1)

where the superscript T denotes the transpose of the matrix. The first hidden layer contains *n* neurons. The connection weight matrix from the input layer to the first hidden layer is

${W}_{1}=\left[\begin{array}{cccc}{w}_{111}& {w}_{112}& \cdots & {w}_{11m}\\ {w}_{121}& {w}_{122}& \cdots & {w}_{12m}\\ \vdots & \vdots & \ddots & \vdots \\ {w}_{1n1}& {w}_{1n2}& \cdots & {w}_{1nm}\end{array}\right]$. (2)

The bias of the first hidden layer is

${B}_{1}={\left[\begin{array}{cccc}{b}_{11}& {b}_{12}& \cdots & {b}_{1n}\end{array}\right]}^{\text{T}}$. (3)

*F*(∙) is the activation function in the hidden layers. The sigmoid function and the hyperbolic tangent function are two classical activation functions used to incorporate nonlinearity in neural networks. The sigmoid function outputs a value in (0, 1), which is helpful in computations that should be interpreted as probabilities. The hyperbolic tangent function has a shape similar to that of the sigmoid function, except that it is horizontally rescaled and vertically translated to (−1, 1). Since the voltage and current of electric circuits may be both positive and negative, the hyperbolic tangent function is preferable to the sigmoid function in this application. Furthermore, its zero-centered output and larger gradient make it easier to train with than the sigmoid. The hyperbolic tangent function is given by

$F\left(z\right)=\mathrm{tanh}\left(z\right)=\frac{{e}^{z}-{e}^{-z}}{{e}^{z}+{e}^{-z}}$. (4)
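As a quick numerical check, the identity in Equation (4) can be evaluated directly and compared against a library implementation (a minimal NumPy sketch):

```python
import numpy as np

def tanh_from_exp(z):
    """Hyperbolic tangent computed from its exponential definition, Equation (4)."""
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)
# The definition agrees with NumPy's built-in tanh and stays within (-1, 1)
assert np.allclose(tanh_from_exp(z), np.tanh(z))
assert np.all(np.abs(tanh_from_exp(z)) < 1.0)
```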

So the output of the first hidden layer is calculated by

${H}_{1}=\mathrm{tanh}\left({W}_{1}\times X+{B}_{1}\right)$. (5)

Figure 1. A feedforward neural network with two hidden layers.

The second hidden layer contains *p* neurons. The connection weight between two hidden layers is

${W}_{2}=\left[\begin{array}{cccc}{w}_{211}& {w}_{212}& \cdots & {w}_{21n}\\ {w}_{221}& {w}_{222}& \cdots & {w}_{22n}\\ \vdots & \vdots & \ddots & \vdots \\ {w}_{2p1}& {w}_{2p2}& \cdots & {w}_{2pn}\end{array}\right]$. (6)

The bias of the second hidden layer is denoted by

${B}_{2}={\left[\begin{array}{cccc}{b}_{21}& {b}_{22}& \cdots & {b}_{2p}\end{array}\right]}^{\text{T}}$. (7)

Hence, the output of the second hidden layer is

${H}_{2}=\mathrm{tanh}\left({W}_{2}\times {H}_{1}+{B}_{2}\right)$. (8)

The output of the network is

$y={W}_{3}\times {H}_{2}$ (9)

where *W*_{3} is the connection weight from the second hidden layer to the output and is given by

${W}_{3}=\left[\begin{array}{cccc}{w}_{31}& {w}_{32}& \cdots & {w}_{3p}\end{array}\right]$. (10)
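Equations (1)-(10) amount to two matrix-vector products with tanh nonlinearities, followed by a linear output layer. A minimal NumPy sketch of this forward pass, with arbitrary example sizes and random parameters standing in for trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, p = 3, 4, 5                  # input size, neurons in hidden layers 1 and 2
X = rng.standard_normal((m, 1))    # input vector, Equation (1)
W1 = rng.standard_normal((n, m))   # input-to-hidden weights, Equation (2)
B1 = rng.standard_normal((n, 1))   # first hidden-layer bias, Equation (3)
W2 = rng.standard_normal((p, n))   # hidden-to-hidden weights, Equation (6)
B2 = rng.standard_normal((p, 1))   # second hidden-layer bias, Equation (7)
W3 = rng.standard_normal((1, p))   # hidden-to-output weights, Equation (10)

H1 = np.tanh(W1 @ X + B1)          # Equation (5)
H2 = np.tanh(W2 @ H1 + B2)         # Equation (8)
y = W3 @ H2                        # Equation (9), linear output neuron
```

The output layer is linear (no tanh), so the model can produce values outside (−1, 1), which is necessary when the amplifier output exceeds one volt.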

In neural network training, the loss function measures the difference between the actual output and the output predicted by the model for a single training example; the average of the loss over the complete training data set is termed the cost function. The choice of loss function depends on the application at hand and is critical to the training and modeling process. Many loss functions, such as cross entropy and hinge loss, have been adopted in different types of applications. For regression problems, the squared error loss is most frequently used: for each training example, it is the square of the difference between the actual and predicted values. Here, the mean squared error (MSE) is adopted as the cost function. It is computed by

$\text{MSE}=10\mathrm{lg}\left[\frac{1}{K}{\displaystyle \underset{i=1}{\overset{K}{\sum}}{\left({\stackrel{^}{y}}_{i}-{y}_{i}\right)}^{2}}\right]\text{\hspace{0.17em}}\left(\text{dB}\right)$, (11)

where *K* is the number of examples in the data set, ${\stackrel{^}{y}}_{i}$ is the *i*-th example output, and ${y}_{i}$ is the *i*-th output value of the model.
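Equation (11) translates into a few lines of code. The sketch below uses short illustrative vectors, not simulation data:

```python
import numpy as np

def mse_db(y_hat, y):
    """Mean squared error expressed in dB, Equation (11)."""
    return 10.0 * np.log10(np.mean((y_hat - y) ** 2))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.01, 1.99, 3.02])
err = mse_db(y_pred, y_true)  # roughly -37 dB for these values
```

Expressing the cost in dB compresses the wide dynamic range of the error, which makes the learning curves in Section 4 easier to compare.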

MSE measures the average squared difference between the actual values and those predicted by the model. The aim of training is to minimize the MSE. Gradient descent is one of the most popular algorithms for optimizing neural networks. It updates the parameters in the direction of steepest descent, which reduces the number of iterations and the time required to search large data sets [7]. The parameters are updated incrementally each time an error calculation is completed, which improves convergence. Model parameters are updated by

${\theta}_{q+1}={\theta}_{q}-\eta \times {\nabla}_{\theta}J\left(\theta \right)$ (12)

where *θ* denotes the parameters to be optimized (namely the connection weights *W* and the biases *B* in this application), the subscript *q* stands for the iteration number, *η* is the learning rate, ${\nabla}_{\theta}$ is the gradient with respect to *θ*, and $J\left(\theta \right)$ is the cost function.
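A one-variable sketch of the update rule in Equation (12), applied to a toy quadratic cost rather than the network's MSE, shows the iteration converging to the minimizer:

```python
def gradient_descent(grad, theta0, eta=0.1, steps=100):
    """Iterate Equation (12): theta_{q+1} = theta_q - eta * grad J(theta_q)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# Toy cost J(theta) = (theta - 3)^2 with gradient 2*(theta - 3);
# repeated updates recover the minimizer theta* = 3.
theta_star = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
```

With *η* = 0.1, the distance to the minimizer shrinks by a factor of 0.8 per step, so 100 iterations are more than enough here; too large a learning rate would instead cause divergence.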

3. Power Amplifiers

The parameters of linear circuits are constant. Their output response is directly proportional to the input. Contrary to linear circuits, the parameters of nonlinear circuits vary with current and voltage. Only a few simple nonlinear circuits are adequately described by equations that have a closed form solution. Most of them may possess multiple solutions or may not possess a solution at all [8]. Therefore, analyzing nonlinear circuits is usually difficult and we often characterize them by means of models.

Nonlinear circuit models are categorized into physical models and behavioral models. Physical models are generated according to the internal circuitry of the devices, while behavioral models are established by means of input-output signals. For a complicated electronic system, acquiring its precise physical model is impractical. So the behavioral model is a better choice. In behavioral modeling, the electronic system is treated as a black box, whose internal constitution is unknown. The aim of behavioral modeling is to depict the system’s input-output relation in a mathematical form [9].

In nonlinear circuits, the superposition principle does not hold and harmonic frequencies are generated naturally. Nonlinear circuits are also capable of exhibiting self-sustained oscillation, which may be desired, as in the case of frequency dividers and free-running oscillators, or undesired, as in the case of power amplifiers (PAs) [10].

The power amplifier is a typical nonlinear device, which is widely used in electronic systems. It receives an electrical signal and reprocesses it to amplify or increase its power. The main features of a power amplifier are the circuit’s power efficiency and the maximum amount of power that the circuit is capable of handling. To attain large output power and high energy efficiency, power amplifiers are often driven to maximum ratings, which results in serious nonlinear distortion.

Next, we take a push-pull PA as an example to illustrate the proposed modeling technique. The schematic of the power amplifier is shown in Figure 2. The circuit simulation is performed in Multisim. There are two stages in the amplifier circuit. The first stage is a small-signal amplifier and the second stage is an output transformerless complementary-symmetry amplifier [11]. The input signal is provided by a function generator XFG1, the output signal is observed on an oscilloscope XSC1, and the distortion of the output signal is measured by a distortion analyzer XDA1.

An ideal amplifier is capable of amplifying a pure sinusoidal signal to provide a larger version and the resulting waveform is a pure single-frequency sinusoidal signal. When the device works at nonlinear regions, distortion occurs and the output will not be an exact duplicate of the input signal [12]. The distortion of the output signal is measured by total harmonic distortion (THD), which is defined as

$\text{THD}=\sqrt{{d}_{2}^{2}+{d}_{3}^{2}+\cdots +{d}_{N}^{2}}\times 100\%$. (13)

In Equation (13), ${d}_{i}\left(i=\text{2},\text{3},\cdots ,N\right)$ is the *i*-th harmonic distortion determined by

${d}_{i}=\frac{\left|{A}_{i}\right|}{\left|{A}_{1}\right|}\times 100\%$, (14)

where *A*_{1} is the amplitude of the fundamental and *A*_{i} is the amplitude of the *i*-th harmonic component.
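Given the harmonic amplitudes, Equations (13) and (14) translate directly into code. The sketch below sums the ratios of the harmonics above the fundamental; the amplitude values are illustrative, not measured:

```python
import numpy as np

def thd(amplitudes):
    """THD per Equations (13)-(14). `amplitudes` is [A1, A2, ..., AN] with A1
    the fundamental. Returns THD as a fraction (multiply by 100 for percent)."""
    A = np.asarray(amplitudes, dtype=float)
    d = np.abs(A[1:]) / np.abs(A[0])   # harmonic distortion ratios d_2 ... d_N
    return np.sqrt(np.sum(d ** 2))

# Example: a 1 V fundamental with 3% second and 4% third harmonics
# gives THD = sqrt(0.03^2 + 0.04^2) = 5%.
assert np.isclose(thd([1.0, 0.03, 0.04]), 0.05)
```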

Besides linearity, power efficiency is also an important consideration. The power efficiency of an amplifier is defined as the ratio of the output power *P*_{out} to the input power *P*_{in}. The power efficiency and the THD for different amplitudes of the input signal *V*_{Ain} are listed in Table 1. The power and THD data are acquired by the wattmeter and the distortion analyzer in Multisim, respectively. When the input signal is small, the linearity of the device is good but the power efficiency is poor. As the input increases, the linearity degrades but the power efficiency improves. For a practical PA, there must be a compromise between linearity and power efficiency.

Figure 2. The schematic of the PA to be modeled.

Figure 3. Input signal vs. output signal (*V*_{Ain} = 0.2 V and *f* = 1 kHz).

Figure 4. Input signal vs. output signal (*V*_{Ain} = 1.8 V and *f* = 1 kHz).

Table 1. Testing results of the power amplifier in Figure 2.

4. Training and Validation

The proposed DNN model is trained and validated in TensorFlow, a popular open-source machine learning framework developed by Google. TensorFlow is used in conjunction with Python to implement algorithms, deep learning applications, and much more, and it contains a symbolic math library developed specifically for machine learning applications such as deep neural networks.

To model the power amplifier discussed in Section 3, a neural network with three hidden layers is constructed, with 25 neurons in each hidden layer. Part of the Python code is as follows.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

PA_dat = np.loadtxt('E:\deep\PA_dat.txt') # load raw data generated by Multisim simulator
a = 25 # number of neurons in each hidden layer
...
W1 = tf.Variable(tf.random_normal([1, a]), name="weight1") # weights between input and 1st hidden layer
B1 = tf.Variable(tf.random_normal([1, a]), name="bias1") # bias of 1st hidden layer
S1 = tf.nn.tanh(tf.add(tf.matmul(XX, W1), tf.matmul(tf.ones([c, 1]), B1))) # output of 1st hidden layer
...
init = tf.global_variables_initializer() # initialize network parameters
cost = tf.reduce_mean(tf.square(Y - Z)) # cost function: MSE
learning_rate = 0.05 # learning rate
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost) # gradient descent
training_epochs = 1000 # epoch number
with tf.Session() as sess: # train the model
    sess.run(init)
    for epoch in range(training_epochs):
        sess.run(optimizer, feed_dict={X: train_X, Y: train_Y})

For the purpose of comparison, a fifth-order polynomial model trained on the same data set is used as a reference model. The learning curves of the two models are shown in Figure 5. After 1000 training epochs, the MSE of the DNN model is about −126 dB, while the MSE of the polynomial model is around −113 dB. Apart from its higher accuracy, the DNN model also converges faster than the polynomial model. However, it should be noted that the DNN model uses more parameters and is more complicated.
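For intuition, a reference model of this kind can be sketched as a least-squares polynomial fit. The transfer curve below is a hypothetical stand-in (a tanh-shaped soft saturation resembling PA compression), not the simulated amplifier data:

```python
import numpy as np

# Hypothetical memoryless transfer curve with soft saturation at large drive
x = np.linspace(-1.8, 1.8, 400)
y = np.tanh(2.0 * x)

# Fifth-order polynomial reference model fitted by least squares
coeffs = np.polyfit(x, y, deg=5)
y_poly = np.polyval(coeffs, x)

# Fit quality in dB, per Equation (11)
err_db = 10.0 * np.log10(np.mean((y_poly - y) ** 2))
```

A fixed-order polynomial cannot track the flat saturation region closely, which is one reason the DNN, whose composed tanh layers match this shape naturally, achieves a lower MSE.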

The validation data set is also generated by the Multisim circuit simulator. The frequency of the validation signals ranges from 0.1 kHz to 1 MHz. One of the input-output waveform pairs is shown in Figure 6. The input signal consists of three frequency components, whose mathematical expression is

${v}_{\text{in}}=0.8\mathrm{sin}\left(2\text{\pi}{f}_{1}t\right)+0.5\mathrm{sin}\left(2\text{\pi}{f}_{2}t\right)+0.4\mathrm{sin}\left(2\text{\pi}{f}_{3}t\right)\text{\hspace{0.17em}}\text{\hspace{0.17em}}\left(\text{V}\right)$ (15)

where *f*_{1} = 10 kHz, *f*_{2} = 20 kHz, and *f*_{3} = 30 kHz. The validation result is shown in Figure 7. It illustrates that the model fits the output data in the validation set precisely. On the validation set, the MSE of the DNN model is about −123 dB, while that of the polynomial model is −112 dB. Therefore, compared with the commonly used polynomial model, the proposed DNN model improves accuracy on both the training and validation data sets.
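A validation input of this form follows directly from Equation (15). In the sketch below, the sample rate and duration are assumptions chosen well above the highest tone, not values taken from the simulation setup:

```python
import numpy as np

fs = 1.0e6                     # assumed sample rate, well above the 30 kHz tone
t = np.arange(2000) / fs       # 2 ms of signal
f1, f2, f3 = 10e3, 20e3, 30e3  # tone frequencies from Equation (15)

# Three-tone validation input of Equation (15), in volts
v_in = (0.8 * np.sin(2 * np.pi * f1 * t)
        + 0.5 * np.sin(2 * np.pi * f2 * t)
        + 0.4 * np.sin(2 * np.pi * f3 * t))
```

Driving the PA with several closely spaced tones exercises the intermodulation products that a single-tone test cannot reveal, which is why a multi-tone signal is a natural choice for validation.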

Figure 5. Learning curves of DNN model and polynomial model.

Figure 6. One of input-output waveforms used for validation.

Figure 7. Model validation.

5. Conclusion

This paper presented a DNN based behavioral modeling approach for nonlinear circuits. A power amplifier was taken as an example to illustrate the proposed modeling method. A feedforward deep neural network with three hidden layers was adopted to model the amplifier. The results show that compared with the commonly used polynomial model, the proposed model not only improves precision but also provides a faster convergence rate. Applying other newly developed neural networks to modeling nonlinear circuits is an obvious future extension of this work.

References

[1] Jin, Z., Zhou, Y. and Song, Z. (2009) Behavioral Modeling of RF Power Amplifiers Using Modified Volterra Series. Journal of Circuits, Systems, and Computers, 18, 351-359.

https://doi.org/10.1142/S0218126609005113

[2] Cao, Y. and Zhang, Q.J. (2009) A New Training Approach for Robust Recurrent Neural-Network Modeling of Nonlinear Circuits. IEEE Transactions on Microwave Theory and Techniques, 57, 1539-1553.

https://doi.org/10.1109/TMTT.2009.2020832

[3] Tarver, C., Jiang, L., Sefidi, A. and Cavallaro, J.R. (2019) Neural Network DPD via Backpropagation through a Neural Network Model of the PA. Proceedings of the 53rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, 3-6 November 2019, 358-362.

https://doi.org/10.1109/IEEECONF44664.2019.9048910

[4] Chen, Z., Raginsky, M. and Rosenbaum, E. (2017) Verilog-A Compatible Recurrent Neural Network Model for Transient Circuit Simulation. Proceedings of IEEE 26th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), San Jose, 15-18 October 2017.

https://doi.org/10.1109/EPEPS.2017.8329743

[5] Liu, Z., Hu, X., Liu, T., Li, X., Wang, W. and Ghannouchi, F.M. (2020) Attention-Based Deep Neural Network Behavioral Model for Wideband Wireless Power Amplifiers. IEEE Microwave and Wireless Components Letters, 30, 82-85.

https://doi.org/10.1109/LMWC.2019.2952763

[6] Aggarwal, C.C. (2018) Neural Networks and Deep Learning. Springer Nature Switzerland AG, Cham.

[7] Basodi, S., Ji, C., Zhang, H. and Pan, Y. (2020) Gradient Amplification: An Efficient Way to Train Deep Neural Networks. Big Data Mining and Analytics, 3, 196-207.

https://doi.org/10.26599/BDMA.2020.9020004

[8] Tavazoei, M.S., Kakhki, M.T. and Bizzarri, F. (2020) Nonlinear Fractional-Order Circuits and Systems: Motivation, a Brief Overview, and Some Future Directions. IEEE Open Journal of Circuits and Systems, 1, 220-232.

https://doi.org/10.1109/OJCAS.2020.3029254

[9] Riaza, R. (2020) Homogeneous Models of Nonlinear Circuits. IEEE Transactions on Circuits and Systems-I, 67, 2002-2015.

https://doi.org/10.1109/TCSI.2020.2968306

[10] Boylestad, R.L. and Nashelsky, L. (2011) Electronic Devices and Circuit Theory. 11th Edition, Pearson Education Inc., New Jersey.

[11] Bansal, R. and Majumdar, S. (2017) Nonlinear Modelling of Differential Amplifier Circuit. Proceedings of the IEEE international Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, 22-24 March 2017, 743-749.

https://doi.org/10.1109/WiSPNET.2017.8299860

[12] Suárez, A. (2009) Analysis and Design of Autonomous Microwave Circuits. John Wiley & Sons, Inc., Hoboken, New Jersey.