History was made in 2016 when Google's AlphaGo program defeated a Go world champion. One of the key components of this program is the deep neural network, and the amazing performance of AlphaGo provides motivation and excitement for extensive research in the area of deep learning. One technique responsible for the success of deep learning is the use of many layers of neural networks, in which the output of one layer can be the input of the next and each layer is made up of a number of interconnected neurons. More importantly, there is a nonlinear activation function inside each neuron; otherwise all these deep neural networks would essentially be a single-layer network. This is due to the fact that a composition of many linear transformations is again a linear transformation, which can serve only as a linear regression technique. It is the nonlinearity of neural networks that gives them the ability to approximate any continuous function and to solve complicated tasks like language translation, image classification, or even playing the game of Go. Furthermore, the main source of this nonlinearity is the nonlinear activation function inside each neuron of the network.
Mathematically, neural networks can be viewed as weighted directed graphs in which neurons are nodes and connections among the neurons are directed edges with weights. The weights represent the strength of the interconnections between neurons, so each neuron makes decisions by summing the weighted evidence (input). Neural networks learn by adjusting their weights and biases iteratively during a training session to produce the desired output; the rules for changing the weights during training are called the learning algorithm. Neural networks have found a wide array of applications in supervised learning, unsupervised learning, and reinforcement learning.
Quantum computers are developing rapidly, and with them, publicly accessible quantum computers are becoming a reality. This trend makes the study of quantum machine learning algorithms on a real quantum computer possible. With its unique quantum mechanical features such as superposition, entanglement, and interference, quantum computing offers a new paradigm for computing. Research has shown that artificial intelligence, and in particular machine learning, can benefit from quantum computing. It is reasonable to hope that the next historical breakthrough in artificial intelligence like AlphaGo may well be realized on a quantum computer.
We have recently finished two reports using IBM's 5Q quantum computer [1, 2]. The first analyzes a distance-based quantum classifier: we show the prediction probability distributions of this classifier on several well-designed datasets to reveal its inner workings, and we extend the original binary classifier quantum circuit to a multi-class classifier. The second compares quantum hardware performance on the decision making of an AI agent between an ion trap quantum system and a superconducting quantum system. Our investigation suggests that the superconducting quantum system tends to be more accurate and to underestimate the values used by the learning agent, while a previous study shows that their system tends to overestimate them.
Since the early days of quantum computing, how to use it in machine learning to gain a quantum speedup has been a long-standing endeavor [4-7]. Neural networks can perform versatile learning tasks like clustering, classification, regression, pattern recognition, and more. As such, classical neural networks are among the top targets for researchers seeking quantum counterparts. Numerous efforts have been made, but unfortunately without much success. One of the major difficulties is how to create a quantum nonlinear activation function like the classical sigmoid function inside a neuron, the fundamental building unit of each network layer, since the operations on quantum states are required to be linear and unitary under the laws of quantum mechanics.
Using a new technique called repeat-until-success, a recent work shows that it is possible to design a quantum circuit that creates a nonlinear activation function inside a quantum neuron. Such a neuron can function as a classical neuron by being in the state |0⟩ or |1⟩, but it can also be in a superposition of both states, a feat no classical neuron can accomplish. The key to their design of the nonlinear activation function is the use of a periodic tangent function. Therefore, it requires the input of this function to be restricted to the range [0, π/2), which is a serious constraint for its use in real-world problems. The periodicity of the function also makes it ill-suited for training with the efficient gradient descent method, since its derivatives oscillate. When datasets are large, an efficient training algorithm is much desired.
In later work, nonlinearity is realized in a quantum perceptron using the evolution of a Hamiltonian. Inspired by the work in [9, 10], we set our goal to remove the above restriction by creating a new nonlinear activation function that can take any real number as its input and whose neuron can be trained with gradient descent just like a classical neuron, letting this neuron enjoy the features of a fully quantum neuron as well as those of a fully classical neuron.
2. Neural Networks
Neural networks are typically organized in layers, each of which is made of neurons. One commonly used kind of neuron has binary states, as proposed by McCulloch and Pitts. A neuron can be in an active state or a resting state, represented by 1 or 0 in mathematical notation.
Figure 1 shows the structure of a neuron: inputs and their corresponding weights, a bias b, and the activation function f that combines all the components of the input to generate an output. Note that without f, the output is a straightforward linear transformation of the input. Also, the value of each input can be any real number. A neuron is a simple computing unit with many inputs and one output.
A neuron has several inputs with one weight for each input, the weight of that specific connection. When the neuron receives inputs, it sums all the inputs multiplied by their corresponding connection weights, plus a bias. The purpose of the bias is to make sure that even when all the inputs are 0 there can be an output from the neuron. For mathematical convenience, we usually treat the bias as a normal weight that corresponds to a constant input of 1. After computing the weighted sum of its inputs, the neuron passes it through its activation function, which normalizes the result to get the desired output, depending on the purpose of the learning task, such as classification or regression. The key feature of neurons is that they can learn. The behavior of a neural network depends on both the weights and the activation function. Two simple examples of activation functions are the step function, which returns 1 if the input is positive and 0 otherwise, and the sigmoid function, which is a smooth version of the step function.
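As a concrete sketch of the computation just described, a single neuron can be emulated in a few lines of Python (the function names are our own; the step and sigmoid activations follow the definitions above):

```python
import math

def step(z):
    # Returns 1 if the input is positive, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    # A smooth version of the step function.
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def neuron_bias_as_weight(inputs, weights_with_bias, activation):
    # Treating the bias as a normal weight on a constant input of 1
    # gives the same result as passing it separately.
    return neuron(inputs + [1.0], weights_with_bias, 0.0, activation)
```

For example, neuron([1.0, 0.0], [0.5, 0.5], -0.2, step) fires (returns 1) because the weighted sum 0.5 − 0.2 is positive.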
The output of one layer becomes the input of the next layer. The first layer (input layer) receives its inputs, and its output serves as the input for the next layer. This relay of information is repeated until reaching the final layer (output layer). Networks with this kind of layout are called feedforward neural networks; other layouts are also possible. A neural network learns from data and stores its learned knowledge in the weights of the connections among its neurons.
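The layer-to-layer relay of a feedforward network can likewise be sketched (a minimal illustration with sigmoid neurons; in any real use the weights would come from training rather than being hand-picked):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each row of weights holds one neuron's connection weights;
    # the layer's output vector becomes the next layer's input.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feedforward(x, layers):
    # Relay the signal through each layer in turn.
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x
```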
Figure 1. A diagram showing how a neuron works: inputs x, weights w, bias b, and activation function f.
3. Design of a quantum neuron
Biologically inspired, a classical artificial neuron is a mathematical function serving as a model of biological neurons. The hard part of creating a quantum neuron is the design of a nonlinear activation function, as the laws of quantum mechanics require the operations on quantum states to be linear.
3.1. Repeat-Until-Success (RUS) Circuit
Implementing a quantum algorithm on a quantum computer requires translating the high-level description of the algorithm into a low-level physical quantum circuit representation. This task is usually accomplished in two steps: the first is to select a universal gate set, and the second is to apply a decomposition algorithm that can create a quantum circuit from a sequence of gates in this set. The Solovay-Kitaev theorem [12, 13] is the first result guaranteeing that a single-qubit unitary operation can be efficiently approximated by a sequence of gates from a universal gate set. Since then, many advances have been made, but the circuits designed in this line of work are all deterministic. A newer approach uses non-deterministic quantum circuits. In this kind of circuit, a unitary operation is applied to a quantum state only if a certain expected measurement outcome is observed; otherwise a cheap unitary operation can be applied to reverse the step. This process can be repeated until the desired unitary operation is performed, and these circuits are therefore called "Repeat-Until-Success" (RUS) circuits. A clear advantage of RUS circuits is their extremely low resource cost.
3.2. Yet, Another Quantum Neuron
Different versions of quantum neurons have been proposed, but all of them fall short as true quantum neurons. So why is this neuron promising? The short answer is that this time its nonlinear activation function can be executed on a quantum computer.
The idea for creating a nonlinear activation function inside a quantum neuron is to use a qubit to represent a superposition of the two states |0⟩ and |1⟩ with this formula: R_y(2θ)|0⟩ = cos(θ)|0⟩ + sin(θ)|1⟩, where θ ∈ [0, π/2] and R_y is a rotation operator defined by the Pauli Y operator. When θ = 0 or θ = π/2, this qubit is either |0⟩ or |1⟩. In this case, it works like a classical neuron, but when 0 < θ < π/2, this qubit is in a superposition of |0⟩ and |1⟩, and no classical neuron can do that. The novelty of the original work is using the circuit in Figure 2 to implement this idea on a quantum computer. Further, this circuit can be repeated, say k times, to move any point θ closer to one of the ends of the interval [0, π/2]. So when θ > π/4, the repetitions move the output of the circuit closer to π/2, and otherwise move it closer to 0. It takes k extra ancilla qubits to carry out the k repetitions.
Figure 2. The repeat-until-success (RUS) circuit created in the original work to generate their nonlinear activation function (their Figure 1(c)). It realizes a rotation corresponding to the nonlinear angle q(θ) = arctan(tan²(θ)) when the measurement on the ancilla qubit q is 0; otherwise the circuit is repeated until a 0 is measured. X, Y, and Z denote the three Pauli operators and M represents a measurement.
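Assuming the nonlinearity realized by one successful round of the RUS circuit has the tangent-based form q(θ) = arctan(tan²(θ)), the sharpening effect of k repetitions can be checked with a short classical sketch:

```python
import math

def q(theta):
    # Nonlinear map realized by one round of the RUS circuit (on success),
    # assuming the tangent-based form q(theta) = arctan(tan^2(theta)).
    return math.atan(math.tan(theta) ** 2)

def repeat_q(theta, k):
    # k repetitions push theta toward 0 or pi/2; pi/4 is the fixed point
    # that acts as the decision threshold.
    for _ in range(k):
        theta = q(theta)
    return theta
```

Starting below the π/4 threshold, e.g. θ = 0.5, three repetitions drive the angle toward 0; starting above it, e.g. θ = 1.0, they drive it toward π/2.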
The activation function used in the original work (Figure 3) is q(θ) = arctan(tan²(θ)), which is based on the periodic tangent function. As a result, the input to the activation function of their quantum neuron is limited to the range [0, π/2), a serious constraint for any real application. As Figure 1 shows, the input to an activation function can be any real number. The approach taken in the original work to solve this problem is to use a scaling factor to confine the input within [0, π/2), which has to be tuned for each different application of their neuron. Our approach to resolving this difficulty is to use a function based on the sigmoid, which is not periodic (Figure 3). Another advantage of our activation function is that its threshold point is 0, which is usually better than the π/4 used in the original work. As we know, the domain of arcsine is [−1, 1] and its range is [−π/2, π/2], while the domain of arctangent is all real numbers and its range is (−π/2, π/2). Essentially, over the half range [0, π/2) the two activation functions both look like a typical sigmoid function. Finally, how to train these neurons is critical for their practical use in real-world problems. As the right plot in Figure 3 shows, the activation function of the original work is not suited for gradient descent because its derivatives oscillate over a large domain.
Note that the sigmoid function is a widely used activation function in classical neural networks. With its range between 0 and 1, it is a good choice if the network needs to produce a probability at the end. It has a very nice mathematical property, σ'(x) = σ(x)(1 − σ(x)), so its derivative is easy to compute, and the function itself can be considered a smoothed-out version of the step function. But the maximum value of its derivative is 0.25, so during the backpropagation training process the errors are shrunk to at most a quarter of their size at each layer. Near the two ends of the sigmoid function, its values tend to be flat, implying the gradient is almost zero. As a result, it gives rise to the problem of vanishing gradients. Therefore it may not be a good choice for deep neural networks.
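The derivative identity and the 0.25 maximum mentioned above can be checked numerically (a small sketch; the finite-difference check is our own addition):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # Uses the identity sigma'(z) = sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, h=1e-6):
    # Central finite difference, to confirm the closed-form derivative.
    return (f(z + h) - f(z - h)) / (2 * h)
```

The derivative peaks at z = 0 with value exactly 0.25 and is nearly zero on the flat ends, which is the source of the vanishing gradients.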
To show the ability of the quantum neurons, the Nelder-Mead algorithm is employed in the original work to train them to solve the XOR problem. To demonstrate the advantage of our activation function, here we use gradient descent to train the quantum neurons with our nonlinear activation function to solve the OR problem and another, simpler binary classification problem. To this end, two binary classification datasets are described in Table 1.
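As a purely classical illustration of why a sigmoid-shaped activation trains well with gradient descent, a single sigmoid neuron learns the OR problem in a short training loop (this sketch emulates only the classical mathematics, not the quantum circuit; the learning rate and epoch count are our own choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# OR dataset: inputs and target class labels.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w = [0.0, 0.0]
b = 0.0
lr = 2.0  # learning rate (our own choice)

for _ in range(500):
    for x, t in data:
        y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Gradient of the squared error, passed back through the sigmoid.
        grad = (y - t) * y * (1.0 - y)
        w[0] -= lr * grad * x[0]
        w[1] -= lr * grad * x[1]
        b -= lr * grad
```

After training, the neuron's output is above 0.5 exactly for the three class-1 points.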
Figure 3. The left function is the one used in the original work, the middle one is our own invention, and the right one is the left function plotted over a larger domain to show its periodic nature.
4.1. Test Our Neurons on IBM’s 5Q Processor and IBM’s Quantum Simulator
Figure 5 and Figure 6 illustrate that our quantum neurons can differentiate the data points of classes 0 and 1 within 4 steps of training, regardless of whether they run on the simulator or on the real device. Here we calculate the probability of each data point belonging to class 1: the closer this probability is to 1, the more likely the point is in class 1; otherwise, it is more likely to be in class 0. One trend in these figures is that the gaps between the probabilities for class 1 and class 0 are bigger on the simulator than on the real device. Since it is easier to distinguish the data points when the gaps are bigger, our experiments imply that the performance of our quantum neurons is better on the simulator than on the real device.
4.2. The Advantage of Our Activation Function in Generating a ReLU Activation Function
Another common activation function is the Rectified Linear Unit (ReLU). This is a simple function, but it is nonetheless nonlinear. It has become popular in recent years because it is efficient to compute and can speed up the training process for large networks, compared with more complicated functions like the sigmoid. Its derivative takes a value of 1 for positive inputs, so it alleviates the vanishing gradient problem introduced by the sigmoid function. The gradient of the ReLU function does not vanish as x increases, a sharp contrast with the sigmoid function. However, its derivative is zero when x is negative, which can result in "dead" neurons that can never be activated during the whole training period. Nonetheless, it is used in most convolutional neural networks and deep learning models.
Table 1. Two datasets created for binary classification to test the quantum neurons with our nonlinear activation function.
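In code, ReLU and its derivative are one-liners (a sketch; the zero gradient on the negative side is exactly what makes "dead" neurons possible):

```python
def relu(x):
    # max(0, x): identity for positive inputs, zero otherwise.
    return x if x > 0 else 0.0

def relu_prime(x):
    # Derivative is 1 for positive x and 0 for negative x, so a neuron
    # whose pre-activation stays negative receives no gradient at all.
    return 1.0 if x > 0 else 0.0
```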
The original work proposed a circuit (Figure 7) that uses their nonlinear activation function to generate the ReLU function. Here we use our sigmoid-based nonlinear function to implement this circuit. To show the major difference between their function and ours, we intentionally use a larger domain for their function, i.e., (−π/2, π/2), to highlight the issues that may be caused by its periodicity. We can see from Figure 8 that their ReLU is only valid on the domain [0, π/2); beyond that it does not look like a normal ReLU. In contrast, our ReLU function behaves like a normal ReLU function regardless of the size of the domain.
Figure 5. To test the performance of our nonlinear activation function, we train the quantum neurons with our function on dataset 1 and dataset 2. There are two data points in dataset 1, where x_0 is in class 0 and x_1 is in class 1. Similarly, dataset 2 has four data points, with x_00 in class 0 and x_10, x_01, and x_11 in class 1. We calculate the probability of each data point belonging to class 1 using IBM's simulator with 8192 shots.
Figure 6. The same as Figure 5, but this time we use 1000 shots on IBM's 5Q computer to calculate the values.
Figure 7. A circuit created in the original work that uses their nonlinear activation function to generate the ReLU function (their Figure 9).
Figure 8. ReLU functions generated by the activation function of the original work (top plot) and by ours (bottom plot) using the circuit in Figure 7. P(110) means the probability of seeing 110 in the three measurements, which are taken in sequence to generate the ReLU curves.
To summarize our findings, our nonlinear activation function offers the following advantages that the original one cannot: 1) it is not periodic, so it can take any real number as input; 2) it can be trained with efficient gradient descent; 3) it can generate a ReLU function that looks more like a classical ReLU function.
As demonstrated by Google's AlphaGo, deep learning has gained its reputation from unprecedented success in many areas, including computer vision, speech recognition, natural language processing, and more. The backbone of this great achievement is the neural network, a computational model inspired by biological neural networks. Mathematically, neural networks are nonlinear statistical techniques for estimating unknown functions by training on input data. The most important element in a neural network is the neuron, and the key feature of a neuron is that it can learn.
As quantum computing moves further into machine learning, how to create quantum neural networks has become an attractive topic. One of the challenges in the design of quantum neurons is how to create a nonlinear activation function, as quantum operations are required to be unitary and linear. There have been a number of publications focused on creating quantum neural networks [4-7]. One study has found that none of the proposals for quantum neural networks fully exploits both the advantages of quantum physics and those of computing in neural networks.
One recent breakthrough shows that it is possible to construct a simple quantum circuit to generate a nonlinear activation function. However, this function is based on a periodic function, so it can only take inputs in the range [0, π/2), a serious restriction for its use in real-world problems. A subsequent report uses the evolution of a Hamiltonian to produce nonlinearity in a quantum perceptron.
Inspired by the work in [9, 10], we think the original activation function could be further investigated and improved. To this end, our study proposes a new nonlinear activation function that removes this limit, making it possible to create a real quantum neuron. Another advantage of our neurons is that they can be trained with efficient gradient descent while the original one cannot. Our neurons are tested on IBM's quantum simulator and IBM's 5Q quantum computer on two datasets for binary classification, and they show their benefit in generating a ReLU activation function that is closer to the classical counterpart than the original one.
After finishing the study reported here, we found one recent paper which creates a quantum neural network with classical learning parameters. As a result, their network can also be trained with gradient descent.
We thank the IBM Quantum Experience for the use of their quantum computers, and the IBM researchers Dr. Andrew W. Cross, Dr. Douglas T. McClure, and Dr. Ali Javadi, who helped us learn how to use them.