Interpolation of Generalized Functions Using Artificial Neural Networks

1. Introduction: Main Definitions, Representations and Relations between Known Distributions

$\delta \left(x\right)=\begin{cases}0, & x\ne 0,\\ \infty , & x=0,\end{cases}$ (1)

is used in Green’s representation formula for the general solution of nonhomogeneous boundary value problems. Later, in the 1930s, Paul Dirac systematically used the δ function to describe a point charge localized at a given point. In practical analysis, definition (1) of Dirac’s δ must be supplemented by

$\int_{-\infty}^{\infty}\delta \left(x\right)\,\text{d}x=1,\quad \int_{-\infty}^{\infty}f\left(x\right)\delta \left(x-{x}_{0}\right)\,\text{d}x=f\left({x}_{0}\right),$ (2)

for an arbitrary continuous function f.

On the other hand, Heaviside used the θ function given by

$\theta \left(x\right)=\begin{cases}1, & x>0,\\ 0, & x<0,\end{cases}$ (3)

to extend the notion of the Laplace integral transform in telegraphic communications. Here the value θ(0) depends on the particular problem and can be 0, 1 or 0.5.

By definition, Dirac’s and Heaviside’s functions are related by

$\theta \left(x\right)=\int_{-\infty}^{x}\delta \left(\xi \right)\,\text{d}\xi .$ (4)

In other words, θ is the antiderivative of δ in the sense of generalized functions. On the other hand, the function max {x, 0} is the antiderivative of θ, therefore

$\int_{-\infty}^{x}\theta \left(\xi \right)\,\text{d}\xi =\int_{-\infty}^{x}\left(x-\xi \right)\delta \left(\xi \right)\,\text{d}\xi =\mathrm{max}\left\{x,0\right\},$

which follows directly from the second equality in (2) with f(x) = x.

There exists a linear relation between Heaviside’s generalized function and the sign function:

$\theta \left(x\right)=\frac{1}{2}+\frac{1}{2}\mathrm{sign}\left(x\right).$

Evidently, here the value θ(0) = 0.5 is considered. However, it is possible to write such a formula with θ(0) = 0 or θ(0) = 1.
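As a quick numerical illustration of this relation, the following Python sketch compares a Heaviside step with θ(0) = 0.5 against 1/2 + 1/2 sign(x) at a few sample points. It assumes NumPy is available; the helper name `theta` and its `theta0` parameter are illustrative choices, not notation from the paper.

```python
import numpy as np

def theta(x, theta0=0.5):
    """Heaviside step function; theta0 is the value assigned at x = 0."""
    return np.heaviside(x, theta0)

# Sample points on both sides of the jump and at the jump itself.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Linear relation with the sign function (valid for theta(0) = 0.5).
via_sign = 0.5 + 0.5 * np.sign(x)

print(theta(x))
print(via_sign)
```

Passing `theta0=0.0` or `theta0=1.0` gives the other two conventions for θ(0) mentioned above; the relation with the sign function then no longer holds at x = 0 in this simple form.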

Other known generalized functions can be defined through θ. For instance, the characteristic function defined by

${\chi}_{\left[a,b\right]}\left(x\right)=\begin{cases}1, & x\in \left[a,b\right],\\ 0, & \text{else},\end{cases}$

can be expressed in terms of θ according to

${\chi}_{\left[a,b\right]}\left(x\right)=\theta \left(x-a\right)-\theta \left(x-b\right).$

The rectangular function defined by

$\mathrm{rect}\left(x\right)=\begin{cases}1, & \left|x\right|<0.5,\\ 0.5, & \left|x\right|=0.5,\\ 0, & \left|x\right|>0.5,\end{cases}$

can be expressed in terms of θ as follows:

$\mathrm{rect}\left(x\right)=\theta \left(x+\frac{1}{2}\right)-\theta \left(x-\frac{1}{2}\right)$

or

$\mathrm{rect}\left(x\right)=\theta \left(\frac{1}{4}-{x}^{2}\right).$

As above, here the value θ(0) = 0.5 is considered as well.
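The two representations of rect through θ can be checked against the piecewise definition with a short Python sketch. It assumes NumPy and the convention θ(0) = 0.5; the function names are illustrative. The quadratic form is written with the argument 1/4 − x², which vanishes exactly at |x| = 1/2.

```python
import numpy as np

def theta(x):
    """Heaviside step with theta(0) = 0.5."""
    return np.heaviside(x, 0.5)

def rect_def(x):
    """Piecewise definition: 1 inside, 0.5 on the edges, 0 outside."""
    ax = np.abs(x)
    return np.where(ax < 0.5, 1.0, np.where(ax == 0.5, 0.5, 0.0))

def rect_diff(x):
    """rect as a difference of two shifted steps."""
    return theta(x + 0.5) - theta(x - 0.5)

def rect_quad(x):
    """rect as a single step of a quadratic argument."""
    return theta(0.25 - x ** 2)

# Points inside, on the edges of, and outside the interval [-1/2, 1/2].
x = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])
print(rect_def(x))
print(rect_diff(x))
print(rect_quad(x))
```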

Another well-known generalized function is defined through the convolution

$R\left(x\right)=\theta \left(x\right)\ast \theta \left(x\right),$

where * denotes the convolution operation. This function is called the ramp function and has many applications in engineering (for example, in half-wave rectification, which converts alternating current into direct current by passing only positive voltages), artificial neural networks (where it serves as an activation function), finance, statistics, fluid mechanics, etc.

According to the definition of the Heaviside function, the ramp function can also be represented as

$R\left(x\right)=\theta \left(x\right)\ast \theta \left(x\right)=\int_{-\infty}^{x}\theta \left(x-\xi \right)\theta \left(\xi \right)\,\text{d}\xi =\int_{-\infty}^{x}\theta \left(\xi \right)\,\text{d}\xi =x\theta \left(x\right).$
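The chain of equalities for the ramp function can be illustrated numerically. The sketch below, which assumes NumPy, approximates the convolution integral by a Riemann sum on a truncated grid and compares it with max{x, 0} and xθ(x). The step size `h` and the truncation `half_width` are arbitrary illustrative values, and the result is only meaningful for |x| smaller than `half_width`.

```python
import numpy as np

def theta(x):
    """Heaviside step with theta(0) = 0.5."""
    return np.heaviside(x, 0.5)

def ramp_by_convolution(x, h=1e-3, half_width=5.0):
    """Riemann-sum approximation of (theta * theta)(x) on a truncated grid."""
    xi = np.arange(-half_width, half_width, h)
    # The integrand is nonzero only where 0 <= xi <= x.
    return float(np.sum(theta(x - xi) * theta(xi)) * h)

for x in (-1.0, 0.0, 0.7, 1.5):
    # Columns: x, convolution estimate, max{x, 0}, x * theta(x).
    print(x, ramp_by_convolution(x), max(x, 0.0), x * float(theta(x)))
```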

2. Approximation of Main Generalized Functions by Means of Locally Measurable Functions

The theory of generalized functions is a well-developed branch of mathematics that is crucial for the rigorous analysis of many applied systems. Nevertheless, the rigorous definitions are of little direct use in numerical analysis, because generalized functions are not even proper functions. In numerical analysis, approximations of the generalized functions by proper functions are used instead.

In practice, the approximation of generalized functions is based on constructing a sequence f_{n} of measurable functions that gives the desired generalized function in the limit n → ∞. For instance, the sequence

${\delta}_{n}\left(x\right)=\frac{n}{\sqrt{\pi}}\mathrm{exp}\left(-{n}^{2}{x}^{2}\right),$

which is also called the Gauss kernel, tends to Dirac’s generalized function as n → ∞. The sequence δ_{n} is called a δ-like sequence. Several other δ-like sequences can be found in the literature. Examples include

${\delta}_{n}\left(x\right)=\frac{1}{\pi}\frac{n}{1+{n}^{2}{x}^{2}},$

which is also called the Poisson kernel,

${\delta}_{n}\left(x\right)=\frac{1}{2\pi s\left(x\right)}\mathrm{sin}\left(nx+\frac{1}{2}x\right),\quad s\left(x\right)=\mathrm{sin}\left(\frac{1}{2}x\right),$

which is also called the Dirichlet kernel, and

${\delta}_{n}\left(x\right)=\frac{1}{2\pi n{s}^{2}\left(x\right)}{\mathrm{sin}}^{2}\left(\frac{n}{2}x\right),$

which is also called the Fejér kernel. Note that for all the kernels mentioned,

${\delta}_{n}\in {L}_{loc}^{1}\left(-\infty ,\infty \right)$

i.e., they are locally integrable (and, in particular, measurable) functions.
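The defining properties (2) can be checked numerically for a δ-like sequence. The following Python sketch, assuming NumPy, evaluates the total mass and the sifting integral for the Gauss and Poisson kernels with a hand-written trapezoid rule. The test function cos, the point x_0 = 0.3, the integration interval [−10, 10] and the helper names are illustrative assumptions. The Dirichlet and Fejér kernels are left out of this sketch.

```python
import numpy as np

def gauss_kernel(x, n):
    """Gauss kernel: n / sqrt(pi) * exp(-n^2 x^2)."""
    return n / np.sqrt(np.pi) * np.exp(-(n * x) ** 2)

def poisson_kernel(x, n):
    """Poisson kernel: (1 / pi) * n / (1 + n^2 x^2)."""
    return n / (np.pi * (1.0 + (n * x) ** 2))

def trapezoid(y, x):
    """Composite trapezoid rule on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def sift(kernel, f, x0, n, grid):
    """Approximate the integral of f(x) * delta_n(x - x0) over the grid."""
    return trapezoid(f(grid) * kernel(grid - x0, n), grid)

grid = np.linspace(-10.0, 10.0, 200001)
x0 = 0.3

for n in (5, 20, 50):
    mass_g = trapezoid(gauss_kernel(grid, n), grid)
    mass_p = trapezoid(poisson_kernel(grid, n), grid)
    err_g = abs(sift(gauss_kernel, np.cos, x0, n, grid) - np.cos(x0))
    err_p = abs(sift(poisson_kernel, np.cos, x0, n, grid) - np.cos(x0))
    # Columns: n, Gauss mass, Poisson mass, Gauss sifting error, Poisson sifting error.
    print(n, mass_g, mass_p, err_g, err_p)
```

The Poisson kernel has heavy tails, so on a truncated interval its mass and sifting integral approach their limits noticeably more slowly than those of the Gauss kernel.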

Taking into account (4), similar θ-like sequences can be constructed for approximating Heaviside’s θ. For instance,

${\theta}_{n}\left(x\right)=\frac{1}{2}\left[1+\mathrm{tanh}\left(nx\right)\right],$ (5)

often referred to as the logistic function,

${\theta}_{n}\left(x\right)=\frac{1}{2}\left[1+\frac{2}{\pi}\mathrm{arctan}\left(nx\right)\right],$

${\theta}_{n}\left(x\right)=\frac{1}{2}\left[1+\mathrm{erf}\left(nx\right)\right],$

also called the erf approximation,

${\theta}_{n}\left(x\right)=\frac{1}{2}\left[1+\frac{2}{\pi}\mathrm{Si}\left(\pi nx\right)\right],$

where Si denotes the sine integral, and

${\theta}_{n}\left(x\right)=\mathrm{exp}\left[-\mathrm{exp}\left(-nx\right)\right].$

Similar sequences can be constructed for the functions sign, χ, rect and R above. For example,

$\mathrm{rect}_{n}\left(x\right)=\mathrm{exp}\left[-\mathrm{exp}\left(-n\left(\frac{1}{4}-{x}^{2}\right)\right)\right]$

and

$\mathrm{rect}_{n}\left(x\right)=\frac{1}{2}\left[1+\mathrm{tanh}\left(n\left(\frac{1}{4}-{x}^{2}\right)\right)\right],$

can be viewed as rect-like sequences.

On the other hand, the expression

${R}_{n}\left(x\right)=x\,\mathrm{exp}\left[-\mathrm{exp}\left(-nx\right)\right]$

can be used as an approximation to the ramp function.
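The behaviour of these sequences for growing n can be explored with a small Python sketch. It assumes NumPy and the standard-library `math.erf`; the function names are illustrative, and the sine-integral variant is omitted because neither NumPy nor the standard library provides Si. The error is measured at a few points away from the jump, with the largest n kept moderate so that the inner exponential does not overflow.

```python
import math
import numpy as np

def theta_tanh(x, n):
    """Sequence (5): 1/2 [1 + tanh(n x)]."""
    return 0.5 * (1.0 + np.tanh(n * x))

def theta_arctan(x, n):
    return 0.5 * (1.0 + 2.0 / np.pi * np.arctan(n * x))

def theta_erf(x, n):
    # math.erf is scalar-only, so it is vectorized for array input.
    return 0.5 * (1.0 + np.vectorize(math.erf)(n * x))

def theta_double_exp(x, n):
    return np.exp(-np.exp(-n * x))

def rect_tanh(x, n):
    """rect-like sequence built from sequence (5)."""
    return theta_tanh(0.25 - x ** 2, n)

def ramp_double_exp(x, n):
    """Ramp-like sequence x * exp[-exp(-n x)]."""
    return x * theta_double_exp(x, n)

# Points away from the discontinuity at x = 0.
x = np.array([-1.0, -0.1, 0.1, 1.0])
step = np.heaviside(x, 0.5)

for n in (1, 10, 100):
    errs = [float(np.max(np.abs(f(x, n) - step)))
            for f in (theta_tanh, theta_arctan, theta_erf, theta_double_exp)]
    # Maximum deviation from the exact step for each sequence.
    print(n, errs)
```

In this sketch the arctan-based sequence converges visibly more slowly than the others, since its deviation from the step decays only algebraically in nx.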

3. Approximation of Generalized Functions Using Artificial Neural Networks

The neural network providing the approximation consists of an input layer, a hidden layer and an output layer. The quadratic error of approximation

$\epsilon ={\left(f\left(x\right)-{f}_{app}\left(x\right)\right)}^{2}$

is considered, where f is the original function and f_{app} is its approximation. Moreover, in all examples below the θ-like sequence (5) is considered. Other sequences can be applied in exactly the same way. The learning rate is always fixed to 10^{−3} for simplicity.
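A minimal sketch of such a network is given below in Python with NumPy. It is not the implementation used for the figures, whose training algorithm and software are not specified here. The sketch assumes that the hidden units use the θ-like function (5) as activation (with the sharpness n absorbed into the weights), a linear output unit, the mean quadratic error over 201 equally spaced samples of rect on [−1, 1], and plain full-batch gradient descent with the learning rate 10^{−3}. The number of steps, the initialization and all names are hypothetical choices.

```python
import numpy as np

def rect(x):
    """Target function: 1 for |x| < 0.5, 0.5 at |x| = 0.5, 0 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 0.5, 1.0, np.where(ax == 0.5, 0.5, 0.0))

def sigma(z):
    """Theta-like function (5) used as the hidden activation (assumption)."""
    return 0.5 * (1.0 + np.tanh(z))

def forward(x, w1, b1, w2, b2):
    """One hidden layer followed by a linear output unit."""
    h = sigma(np.outer(x, w1) + b1)      # shape (samples, nodes)
    return h, h @ w2 + b2

def train(nodes=50, steps=2000, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 201)
    t = rect(x)
    w1 = rng.normal(0.0, 1.0, nodes)     # input-to-hidden weights
    b1 = rng.normal(0.0, 1.0, nodes)     # hidden biases
    w2 = rng.normal(0.0, 0.1, nodes)     # hidden-to-output weights
    b2 = 0.0                             # output bias
    losses = []
    for _ in range(steps):
        h, y = forward(x, w1, b1, w2, b2)
        err = y - t
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of the mean quadratic error.
        g_y = 2.0 * err / x.size
        g_w2 = h.T @ g_y
        g_b2 = float(np.sum(g_y))
        # Derivative of sigma is 2 * sigma * (1 - sigma).
        g_z = np.outer(g_y, w2) * 2.0 * h * (1.0 - h)
        g_w1 = x @ g_z
        g_b1 = np.sum(g_z, axis=0)
        # Plain gradient-descent update.
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    _, y = forward(x, w1, b1, w2, b2)
    losses.append(float(np.mean((y - t) ** 2)))
    return x, t, y, losses

x, t, y, losses = train()
print("initial mean quadratic error:", losses[0])
print("final mean quadratic error:  ", losses[-1])
```

With such a small learning rate and plain gradient descent, many more steps (or a more elaborate optimizer) would likely be needed before the fit becomes close; the sketch is only meant to make the structure of the network and of the error explicit.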

The approximation of the rect function for different numbers of nodes is presented in Figure 1. A better approximation with less error can be obtained by increasing the learning rate or the number of nodes. The error is plotted in Figure 2, from which it is seen that the least error is $\epsilon \sim {10}^{-4}$. It is evident from Figure 3 that the well-known Gibbs phenomenon does not occur here [9].

Figure 1. Approximation of the rect function in [−1, 1] with 50 nodes (upper) and 100 nodes (lower).

Figure 2. Quadratic error of approximation of the rect function in [−1, 1] with 100 nodes.

Figure 3. Function fit (upper), regression behavior (middle) and network performance (lower) for the rect function in [−1, 1] with 100 nodes.

4. Conclusions

The possibility of approximating generalized functions by artificial neural networks is considered by means of locally measurable approximations of Dirac’s delta function and Heaviside’s theta function. Considering the quadratic error of approximation, the rect function is approximated taking into account the relation between the rect function and Heaviside’s theta function. It is shown that, owing to the use of neural networks, the Gibbs phenomenon does not occur. Using similar representation formulas and other approximations to Heaviside’s function, the characteristic and sign functions can also be approximated by artificial neural networks.

The results can be employed in numerical analysis of problems containing discontinuous phenomena, switching dynamics, etc.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Teodorescu, P.P., Kecs, W.W. and Toma, A. (2003) Distribution Theory: With Applications in Engineering and Physics. Wiley-VCH Verlag, Weinheim.

[2] Vladimirov, V.S. (2002) Methods of the Theory of Generalized Functions. Taylor & Francis, London, New York.

[3] Grubb, G. (2009) Distributions and Operators. Springer, Berlin, Heidelberg.

[4] Weigend, A.S. and Gershenfeld, N.A. (Eds.) (1994) Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, Reading.

[5] Zhang, W. and Barrion, A. (2006) Function Approximation and Documentation of Sampling Data Using Artificial Neural Networks. Environmental Monitoring and Assessment, 122, 185.

https://doi.org/10.1007/s10661-005-9173-6

[6] Zainuddin, Z. and Ong, P. (2007) Function Approximation Using Artificial Neural Networks. WSEAS Transactions on Mathematics, 1, 173-178.

[7] Zhang, Q. and Benveniste, A. (1991) Approximation by Nonlinear Wavelet Networks. IEEE Transactions on Neural Networks, 6, 3417-3420.

[8] Ferrari, S. and Stengel, F. Smooth Function Approximation Using Neural Networks. Lecture Notes.

http://ce.sharif.edu/courses/84-85/2/ce667/resources/root/Seminar_no_10/tnnlatestmanu.pdf

[9] Weisstein, E.W. (2013) Gibbs Phenomenon. From MathWorld—A Wolfram Web Resource.