Predictive Block-Matching Algorithm for Wireless Video Sensor Network Using Neural Network

Show more

1. Introduction

A good wireless multimedia sensor network (WMSN) must be energy efficient, for the possibility of operating in harsh environment and lack of maintenance [1] [2] [3] . However, video encoding requires massive amount of calculation power. Taking H.264 standard as an example, processing an HDTV (720P) video signal in real time requires a 3600 GIPS processor with 5570 GB/s memory band-width [4] [5] , which exceeds the computing power of most desktop processor. Apart from implementing more advanced processers with higher energy efficiency ratio, which will increase the hardware cost, using more efficient encoding algorithm is a more practical option.

For a video encoding system, predictive coding is the key part and requires more calculation than other parts such as entropy coding. Motion estimation plays an important role in predictive coding [6] [7] . Therefore, in the field of video stream transmission for WMSNs, an energy-efficient block-matching algorithm is of high importance. There are variety of block-matching search algorithms for motion estimation, such as three-step search, four-step search, diamond search, modified diamond-square search and gradient descent search, etc. [6] [7] . Prediction technique proposed by Zhang and Zafar [8] , [9] uses a simple prediction model to reduce the search windows size for block-matching, therefore significantly reduced the calculation complexity of motion estimation, with only small sacrifice on image quality. It has been widely used by many researchers in different block-matching algorithms [10] [11] [12] . The prediction model proposed by Zhang and Zafer is straight forward, simply assuming that the current block has the same motion vector as the block at the same place in the previous frame or around it in the same frame. However, in our test the model did not provide high prediction accuracy. Instead of taking a mean value, we believe the reference motion vectors have certain weights. Therefore, we thought of artificial neural network (ANN).

An ANN mimics somewhat the learning process of a human brain. Instead of complex rules and mathematical routines, ANNs are able to learn the key information patterns within a multidimensional information domain [13] [14] . In this report we aim to use a simple back propagation neural network instead of the traditional prediction model, to improve the prediction accuracy. The results show that the proposed neural network can significantly improve the prediction accuracy and leads to better image quality. Because of the simplicity of the neural network, the required calculation is little.

2. Background Research

2.1. Motion Estimation

The principle of block-matching motion estimation is to find out the movement of each microblock compared with the previous frame. Getting motion vector is the first step to do predictive encoding for video compression. To find out the movement information, the current frame is divided into 16 × 16 pixels micro- blocks. Each microblock is compared to the reference frame in a search window of a certain size, therefore locate a microblock which has a minimum MAD (mean absolute difference) value as is shown in Figure 1. The lengths of the microblock moved along the x axis and y axis in pixels are the motion vectors.

2.1.1. Three Step Search

Three-step search (TSS) algorithm was first introduced in 1981 by Koga et al. [15] . It is a classic search algorithm for motion estimation. It is an efficient algorithm which needs only 3 three steps to finish the search in a search window

Figure 1. Block-matching motion estimation.

of ±7. If the search window is larger than ±7, then this algorithm needs more than 3 steps to finish the search. TSS uses a coarse-to-fine search pattern. Starting from the initial point, TSS compares 8 search points around it with a certain search step and finds the direction of block movement, then perform a more accurate search by reducing the search step. The description of TSS algorithm is detailed as follows:

Step 1: The step size is set as half of (search window size −1). Together with the initial point, the SAD values of 8 search points at the distance of step size are compared.

Step 2: The point which has the minimum SAD value is picked as the new initial point, the step size is halved.

Step 1 and 2 are repeated until the step size is smaller than 1.

Figure 2 shows the search pattern of TSS algorithm.

2.1.2. Enhanced Modified Orthogonal Search

Orthogonal search (OS) algorithm was introduced by Soongsathitanon et al. [16] in 2005. The search pattern is horizontally and vertically conducted alternatively (Figure 3(a)). It may be considered as improved TSS algorithm, which takes less number of search points. However, this algorithm has the same disadvantage as TSS does: it is inefficient for small motion estimation.

Modified Orthogonal search (MOS) algorithm was introduced by Metkar et al. [17] in 2010, meant to solve the small motion estimation problem. It adds 8 new search points as a 3 × 3 square around the initial point in the first search step (Figure 3(b)). The added 8 search points improve its performance on small motion estimation.

Enhanced modified orthogonal search (EMOS) algorithm was introduced by Pandian et al. [18] in 2011. It replaced the 3 × 3 square by a small diamond search pattern (Figure 3(c)). EMOS further improved the motion estimation performance compared with MOS, especially in reducing search points.

Figure 2. Three-step search algorithm (search window ±7).

Figure 3. Search algorithms for motion estimation [18] . (a) Orthogonal search algorithm; (b) Modified orthogonal search algorithm; (c) Enhanced modified orthogonal search algorithm.

Step 1: An extra small diamond search pattern around the initial point is added in addition to the original horizontal search points in the first step of OS algorithm. If the initial point (0, 0) of the search window has the minimum SAD value, the block is assumed to have zero movement and search will be terminated. Otherwise, the point which has the minimum SAD value is selected as the new initial point.

Step 2: Step size is halved if the minimum SAD point is any one of the points in the small diamond pattern. 2 new search points at a distance of step size in the horizontal direction from the initial point together with the initial point itself are compared. The initial point is moved to the winning point.

Step 3: Two points in the vertical direction at a distance of the step size from the initial point are compared with the initial point. The initial point is moved to the winning one.

Step 4: Halve the step size until it is smaller than 1, etc.

The early termination of the algorithm at step 1 helped to reduce the computations significantly also retain the simplicity and regularity of EMOS algorithm. The number of search points needed to find the motion vector for EMOS as compared with the MOS, OS and TSS algorithms.

2.2. Predictive Block-Matching Motion Estimation Scheme

Predictive block-matching motion estimation scheme was introduced in 1991, namely predictive pattern search (PPS) [8] [9] . PPS has two prediction modes: inter-block prediction and inter-frame prediction. Inter-block prediction assumes neighboring blocks usually have the same or similar moving direction and distance. Therefore, via the reference of a nearby motion vector, the search window for the current block can be reduced, hence reduces the number of search points. Inter-frame prediction assumes a current block moves to the same or similar direction at the same or a similar distance compared with the corresponding block in the previous frame. Figure 4 demonstrates the way inter-block and inter-frame prediction works.

Predictive Enhanced Modified Orthogonal Search (PEMOS), performs both inter-block and inter-frame prediction. The estimation of the current vector can be calculated using an autoregressive model given by [8] [9] :

$E\left\{{V}_{i+1}\left(m,n\right)\right\}={\displaystyle \underset{p,q\in \theta}{\sum}}{\alpha}_{p,q}{V}_{i}\left(m-q,n-q\right)$ (1)

where $\left\{{\alpha}_{p\mathrm{,}q}\right\}$ is a set of prediction coefficients and $\theta $ can be a causal, semi- causal or noncausal set defined by [8] :

Causal set: $\theta =\left\{p,q:q>0\forall p\right\}\cup \left\{p,q:q=0,p>0\right\}$

Semi-causal set: $\theta =\left\{p,q:q>0\forall p\right\}\cup \left\{p,q:q=0,\forall p\ne 0\right\}$

Noncausal set: $\theta =\left\{p,q:\forall \left(p,q\right)\in \delta \right\}$

Figure 4. Predictive pattern search. (a) Inter-block predictive pattern search scheme; (b) Inter-frame predictive pattern search scheme.

According to [8] and [9] , for inter-block prediction, a casual model is used. For inter-frame prediction, any one of the three might be used.

Predictive search scheme has been added to different search algorithms and proved effective. For example, [10] proposed predictive three step search, which adds predictive block-matching scheme to three step search. The results show that predictive three step search significantly reduces computational complexity, with negligible decrease on image quality.

Another example is [11] , which added predictive block-matching scheme to EMOS. EMOS is believed to be one of the most efficient block-matching algorithms [6] . According to [11] , the proposed algorithm used Zhang and Zafer’s prediction model to estimate the motion vectors, therefore the first step of EMOS can be skipped, i.e., a smaller search window is used. If the estimated motion vector is very small, an even smaller search pattern will be used. As a result, PEMOS can reduce computational complexity up to 30%, also with very slight decrease on image quality.

2.3. Back Propagation Neural Network

Rumelhart and McCelland published a book namely Parallel Distributed Processing [19] , discussed the algorithm Error Back Propagation in detail. Today back propagation neural network has become one of the most widely used artificial neural network models [13] . It is widely used in the field of function approximation, pattern recognition, data mining, system identification, automation technology and so on [14] .

Figure 5 shows a typical single hidden layer back propagation neural network

Figure 5. An example of back propagation neural network.

model. It includes an input layer, a hidden layer and an output layer.

For the hidden layer, we have:

${y}_{i}=f\left(ne{t}_{j}\right),\text{\hspace{0.17em}}j=1,2,\cdots ,m$ (2)

where

$ne{t}_{j}={\displaystyle \underset{i=0}{\overset{n}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{v}_{ij}{x}_{i},\text{\hspace{0.17em}}j=1,2,\cdots ,m$ (3)

For the output layer, we have:

${o}_{k}=f\left(ne{t}_{k}\right),\text{\hspace{0.17em}}k=1,2,\cdots ,l$ (4)

where

$ne{t}_{k}={\displaystyle \underset{j=0}{\overset{m}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{w}_{jk}{y}_{j},\text{\hspace{0.17em}}k=1,2,\cdots ,m$ (5)

In Equations (2) and (3), the transform function f (x) can be sigmoid functions similar as Equations (7) or (8), according specific applications.

$f\left(x\right)=\frac{1}{1+{\text{e}}^{-x}}$ (6)

$f\left(x\right)=\frac{1-{\text{e}}^{-x}}{1+{\text{e}}^{-x}}$ (7)

3. Proposed Neural Network for Predictive Block-Matching

A back propagation neural network model with 5 inputs, 1 hidden layer which contains 5 neutrons, and 1 output is used in this report (Figure 6).

Figure 6. Proposed back propagation neural network.

The five inputs are: the motion vector at the same place in the previous frame, the motion vectors at top left, top, top right and left of the current block in the current frame. The number of hidden neutrons is decided applying the following equation [20] :

$\begin{array}{l}\text{Number}\text{\hspace{0.17em}}\text{of}\text{\hspace{0.17em}}\text{hidden}\text{\hspace{0.17em}}\text{neurons}\\ =\frac{1}{2}\left(\text{inputs}+\text{outputs}\right)\cdot \sqrt{\text{number}\text{\hspace{0.17em}}\text{of}\text{\hspace{0.17em}}\text{training}\text{\hspace{0.17em}}\text{patterns}}\end{array}$ (8)

The number of training patterns was assumed to be 4. Although the actual training patterns are more than 4, in order to keep the simplicity of neural network model, we consider motion vector values (positive, positive), (positive, negative), (negative, positive), (negative, negative) as 4 patterns. The training data contains 2000 sets of values generated using full search in the platform MATLAB. A linear transform function is used for computation simplicity.

4. Results and Discussion

In order to find out the effectiveness of neural network prediction, three block-matching algorithms introduced in Section 0were tested, respectively full search, three step search and enhanced modified orthogonal search. We recorded prediction accuracy with different prediction schemes, image quality in terms of peak signal to noise ratio (PSNR) and energy efficiency in terms of average search points required per block.

In our test, for traditional prediction scheme, the mean value of inter-block and inter-frame prediction is used for the estimation of motion vector $E\left[{V}_{i+1}\left(m\mathrm{,}n\right)\right]$ .

Several videos were tested using different block-matching search algorithms. Table 1 shows the prediction accuracy of the proposed back propagation neural network compared with traditional prediction scheme. Tables 2-7 illustrate average search points needed per block, and the average peak signal-to-noise ratio (PSNR) which represents the image quality for different video and different search algorithms.

Table 1. Prediction accuracy comparison between neural network and traditional method.

Table 2. PSNR (dB) performance with full search (search window ±7).

Table 3. Average search points per block with full search (search window ±7).

Table 4. PSNR (dB) performance with three step search (search window ±7).

Table 5. Average search points per block with three step search (search window ±7).

Table 6. PSNR (dB) performance with EMOS (search window ±7).

Table 7. Average search points per block with EMOS (search window ±7).

Table 3 and Table 6 show that using prediction schemes does have negative effect on image quality, causing the drop of average PSNR. Videos with larger movements such as “Bus” and “Stefan” appears to be more affected compared with videos with smaller movements such as “Akiyo” and “Claire”. However the average search points decrease to around a quarter using predicting schemes. Compared with traditional prediction scheme, neural network prediction can provide slightly higher image quality, because of its higher prediction accuracy as is shown in Table 1.

Using predictive block-matching schemes on three step search and EMOS shows similar results, as is illustrated in Tables 4-7. Orthogonal search (OS) can be seen as an improved version of TSS, and EMOS is improved OS. EMOS is already one of the most efficient block-matching algorithms. However, using predictive schemes is still an effective way to further improve its efficiency. Using neural network to predict the motion vector again is more accurate than the traditional approach, and leads to higher image quality without adding too much computation complexity.

Table 8 illustrates the computational complexity of the neural network for each block. Take EMOS as an example, assuming number of required search points per block is 7. The required numerical operation includes 3584 add or sub and 1792 absolute value for each block. The additional calculation complexity for the neural network is negligible compared with the existing calculation. Considering the improvement on prediction accuracy and image quality, it is very feasible to use neural network for motion vector prediction.

5. Future Work

As the proposed back propagation neural network requires little calculation, in the future, implementation on real-time system might be carried out. The simplicity of the neural network model makes it not difficult to be implemented on either FPGA systems or embedded systems with general processors. Its performance in practice is to be found out and compared with the MATLAB simulation results after that.

6. Conclusion

This report proposed a predictive model for block-matching based on neural network. This model is a back propagation neural network with 5 neutrons, aiming to improve the prediction accuracy of predictive block-matching schemes, with little computation cost. The results approve that the proposed model is effective. Firstly, Compared with the traditional predictive scheme, neural network shows great advantage in prediction accuracy, especially when the video has large movements. In addition, we did three sets of tests, respectively: without using prediction,

Table 8. Computational complexity of back propagation neural network.

using neural network prediction, and using traditional prediction scheme. For each case, three widely used block-matching algorithms were tested. Using neural network results in better image quality than using traditional prediction, and the value average search points stay the same or become less. In some cases, the image quality is better than without using prediction. Because of the simplicity of the neural network model, it can be easily implemented on existing designs, and requires very little computational power. In conclusion, using the proposed neural network is an effective way to improve the image quality of video encoding systems using predictive block-matching algorithms, and the cost on computational complexity is very small.

Acknowledgements

The work was partly supported by the Zhejiang Natural Science Foundation under grant No. LY13F010014.

References

[1] Akyildiz, I.F., Melodia, T. and Chowdhury, K.R. (2007) A Survey on Wireless Multimedia Sensor Networks. Computer Networks, 51, 921-960.

https://doi.org/10.1016/j.comnet.2006.10.002

[2] Akyildiz, I.F., Melodia, T. and Ghosal, D. (2008) Wireless Sensor Network Survey. Computer Networks, 14, 32-39.

[3] Yick, J., Mukherjee, B. and Chowdhury, K.R. (2007) Wireless Multimedia Sensor Networks: A Survey. IEEE Wireless Communications, 52, 2292-2330.

[4] Chen, T.C., Lian Jr, C. and Chen, L.G. (2007) Hardware Architecture Design of an H. 264/AVC Video Codec. Proceedings of the 2006 Asia and South Pacific Design Automation Conference, Yokohama, 24-27 January 2006, 750-757.

[5] Chen, T.C., Chien, S.Y. and Huang, Y.W. (2006) Analysis and Architecture Design of an HDTV720p 30 Frames/s H. 264/AVC Encoder. IEEE Transactions on Circuits and Systems for Video Technology, 16, 673-688.

https://doi.org/10.1109/TCSVT.2006.873163

[6] Chee, C.S., Jambek, A.B. and Hussin, R. (2012) Review of Energy Efficient Block-Matching Motion Estimation Algorithms for Wireless Video Sensor Networks. 2012 IEEE Symposium on Computers & Informatics (ISCI), Penang, 18-20 March 2012, 241-246.

https://doi.org/10.1109/ISCI.2012.6222702

[7] Huang, Y.W., Chen, C.Y. and Tsai, C.H. (2006) Survey on Block Matching Motion Estimation Algorithms and Architectures with New Results. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, 42, 297-320.

https://doi.org/10.1007/s11265-006-4190-4

[8] Zafar, S., Zhang, Y.Q. and Baras, J.S. (1991) Predictive Block-Matching Motion Estimation for TV coding. I. Inter-Block Prediction. IEEE Transactions on Broadcasting, 37, 97-101.

https://doi.org/10.1109/TBC.1991.1492730

[9] Zhang, Y.Q. and Zafar, S. (1991) Predictive Block-Matching Motion Estimation for TV Coding. II. Inter-Frame Prediction. IEEE Transactions on Broadcasting, 37, 102-105.

https://doi.org/10.1109/11.99095

[10] Amirpour, H., Mousavinia, A. and Shamsi, N. (2013) Predictive Three Step Search (PTSS) Algorithm for Motion Estimation. 2013 8th Iranian Conference on Machine Vision and Image Processing (MVIP), Zanjan, 10-12 September 2013, 48-52.

https://doi.org/10.1109/IranianMVIP.2013.6779948

[11] Yan, Z.G. and Welsen, S. (2016) Predictive Enhanced Modified Orthogonal Search Algorithm with Two Patterns for Wireless Video Sensor Network. Asia Pacific Conference on Multimedia and Broadcasting, Bali, 17-19 November 2016, 60-65.

[12] Ahmed, Z., Hussain, A.J. and Al-Jumeily, D. (2011) Mean Predictive Block Matching (MPBM) for Fast Block-Matching Motion Estimation. 3rd European Workshop on Visual Information Processing, Paris, 4-6 July 2011, 67-72.

[13] Demuth, H.B., Beale, M.H. and De Jess, O. (2014) Neural Network Design, Martin Hagan.

[14] Schmidhuber, J. (2015) Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117.

[15] Koga, T. (1981) Motion-Compensated Interframe Coding for Video Conferencing. Proceedings of National Telecommunication Conference, New Orleans, 29 November-3 December 1981, G5.3.1-G5.3.5.

[16] Soongsathitanon, S., Woo, W.L. and Dlay, S.S. (2005) Fast Search Algorithms for Video Coding Using Orthogonal Logarithmic Search Algorithm. IEEE Transactions on Consumer Electronics, 51, 552-559.

[17] SMetkar, S.P. and Talbar, S.N. (2010) Fast Motion Estimation Using Modified Orthogonal Search Algorithm for Video Compression. Signal, Image and Video Processing, 4, 123-128.

https://doi.org/10.1007/s11760-009-0104-9

[18] Pandian, S., Bala, G.J. and Anitha, J. (2011) Enhanced Modified Orthogonal Search for Motion Estimation. Recent Advances in Intelligent Computational Systems, Trivandrum, 22-24 September 2011, 907-910.

[19] Rumelhart, D.E. and McClelland, J.L. (1988) Parallel Distributed Processing.

[20] Kalogirou, S.A. and Bojic, M. (2000) Artificial Neural Networks for the Prediction of the Energy Consumption of a Passive Solar Building. Energy, 25, 479-491.