Human Face Super-Resolution Based on Hybrid Algorithm


1. Introduction

Image super-resolution is a classical problem in computer vision. It aims to infer a high-resolution (HR) image containing crucial details from the given low-resolution (LR) images. Face hallucination is a branch of image super-resolution that exploits prior knowledge specific to the face domain. It was first introduced by Baker and Kanade [1] and has attracted growing attention because of its practical importance in many face-based applications such as face recognition and face alignment. With the development of machine learning, numerous learning-based methods have been proposed to solve the face hallucination problem. Learning-based algorithms have been shown to achieve higher magnification factors with better visual quality than other super-resolution techniques such as bicubic interpolation and reconstruction-based methods.

Interpolation-based algorithms are the most basic methods in face super-resolution research; they include nearest-neighbor, bilinear and bicubic interpolation. Reconstruction-based methods are fast and give a modest improvement in image quality. However, because they are limited to the information contained in the original image, they cannot remove the ambiguity caused by low-resolution sampling. Freeman et al. [2] proposed an example-based image super-resolution method that retrieves vectors from an external database through nearest-neighbor search. Timofte combined sparse dictionary learning with neighbor embedding and proposed anchored neighborhood regression, which speeds up the algorithm. Dong et al. [3] proposed a single-image super-resolution method based on a deep convolutional neural network (CNN) and showed that traditional sparse-coding-based algorithms can also be viewed as a kind of deep convolutional network. Dong's method directly optimizes an end-to-end mapping between LR and HR images and achieves excellent reconstruction performance.

Inspired by the above literature, we apply deep learning to face hallucination [4]. We improve the deep convolutional neural network model by adding pooling layers and adjusting the convolution kernel sizes, which reduces the number of parameters and increases computation speed. Finally, the iterative back-projection method is applied as a post-processing step to reconstruct the face image [5].

2. Methodology

The image acquisition process may be affected by motion blur, optical blur, aliasing caused by down-sampling, and various kinds of noise, all of which degrade the captured image. Elad proposed a matrix-vector formulation of this low-resolution imaging model [2]:

$y=DHwX+n$ (1)

Here X represents the high-resolution image, y the low-resolution image and n additive Gaussian noise, while D, H and w denote the down-sampling matrix, the blur matrix and the geometric transformation matrix, respectively. Face hallucination is the inverse of this face image degradation process: given the low-resolution image y, the goal is to restore the original high-resolution image X.
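To make the degradation model of Eq. (1) concrete, the sketch below simulates it on a small grayscale image stored as nested Python lists, applying the geometric transformation w, then the blur H, then the down-sampling D, and finally adding the noise n. Every operator here is an assumption chosen for illustration: w is an integer translation, H is a 3 × 3 box blur, D keeps every s-th pixel, and n is i.i.d. Gaussian noise. A real system would use the actual motion model and point-spread function.

```python
import random


def warp(img, dx, dy):
    """w (assumed): integer translation by (dx, dy); borders filled by edge replication."""
    h, wd = len(img), len(img[0])
    return [[img[min(max(i - dy, 0), h - 1)][min(max(j - dx, 0), wd - 1)]
             for j in range(wd)] for i in range(h)]


def blur(img):
    """H (assumed): 3x3 box blur with edge replication, a stand-in for the real PSF."""
    h, wd = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(wd):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    total += img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), wd - 1)]
            row.append(total / 9.0)
        out.append(row)
    return out


def decimate(img, s):
    """D: keep every s-th pixel in each direction."""
    return [row[::s] for row in img[::s]]


def degrade(X, s=2, dx=0, dy=0, sigma=0.0, seed=0):
    """y = D H w X + n: warp, blur, down-sample, then add Gaussian noise."""
    rng = random.Random(seed)
    y = decimate(blur(warp(X, dx, dy)), s)
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in y]


X = [[float(4 * i + j) for j in range(4)] for i in range(4)]  # toy 4x4 "HR image"
y = degrade(X, s=2, sigma=0.0)
print(len(y), len(y[0]))  # 2 2
```

The operators are applied right to left, matching the order in which they act on X in the imaging model.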

Convolutional neural networks (CNNs) are a biologically inspired variant of multi-layer perceptrons (MLPs), specialized for image processing. First popularized by LeCun et al., they are similar to other hierarchical feature extraction methods such as the Neocognitron and HMAX.

A typical CNN consists of alternating convolutional and pooling layers followed by an output classification layer. Each layer contains several feature maps, or groups of neurons, arranged in a rectangular configuration. Each neuron is connected to a local region of the previous layer, called its receptive field. The receptive field itself is simply a set of weighted connections; that is, each connecting edge has a weight. The group of weights applied by a neuron is called a weight kernel. A distinguishing property of these networks is that all neurons in a feature map share the same weight kernel. The idea behind this configuration is that a spatial feature detector, for example a vertical edge detector, should be useful across the entire image rather than at only one location. The convolutional layers perform most of the processing in these networks, while the feature maps in the pooling layers simply down-sample their corresponding feature maps in the convolutional layer [6].
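Weight sharing and pooling can be illustrated with a small, dependency-free sketch: one hand-picked 3 × 3 vertical-edge kernel (an illustrative choice, not a learned one) is slid over the whole image, and a 2 × 2 max-pooling step then down-samples the resulting feature map. As in most CNN libraries, the "convolution" is implemented as cross-correlation without flipping the kernel.

```python
def conv2d_valid(img, kernel):
    """Apply ONE shared kernel at every position (valid cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def max_pool(fmap, size=2):
    """Down-sample a feature map by taking the maximum of each non-overlapping window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# Hand-crafted vertical-edge detector: responds to left-to-right intensity changes.
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Toy 4x6 image: dark left half, bright right half (a single vertical edge).
img = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
fmap = conv2d_valid(img, VERTICAL_EDGE)
print(fmap)            # [[0, 3, 3, 0], [0, 3, 3, 0]]
print(max_pool(fmap))  # [[3, 3]]
```

Because the same kernel is reused at every position, the edge is detected wherever it appears, and the layer needs only nine weights regardless of image size.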

In the past few years, deep-learning-based methods have developed rapidly. In computer vision they are applied not only to image classification but also to tasks ranging from face recognition to semantic segmentation. Recently, deep learning has also been applied to low-level vision tasks, including image denoising, image enhancement and image super-resolution. The seminal work on the super-resolution convolutional neural network (SRCNN) was done by Dong et al. [7].

The SRCNN model consists of three convolutional layers, whose roles broadly mirror the steps of sparse-coding-based super-resolution. The three layers perform patch extraction and representation, non-linear mapping, and reconstruction [8]:

1) Patch extraction and representation: patches are extracted from the low-resolution image, and each patch is represented as a high-dimensional vector. These vectors form a set of feature maps, and the dimension of each vector equals the number of maps.

2) Non-linear mapping: this step non-linearly maps each high-dimensional vector onto another high-dimensional vector. Each mapped vector is conceptually the representation of an HR patch. These vectors form another set of feature maps.

3) Reconstruction: this step aggregates the above HR patch-wise representations to generate the final HR image, which is expected to be similar to the ground-truth HR image X.
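The three steps above can be sketched as a plain-Python forward pass, so that it runs without a deep-learning framework. The weights are placeholders that a real model would learn; the zero padding (same output size) and the tiny identity-like kernels in the usage example are assumptions chosen so the output can be checked by hand, and they are not the configuration used by Dong et al.

```python
def conv_layer(maps, weights, biases, relu=True):
    """One convolutional layer with zero padding ('same' output size).

    maps:    list of C_in input feature maps, each an H x W list of lists
    weights: weights[o][c] is the k x k kernel (k odd) linking input map c to output map o
    biases:  one bias per output map
    """
    h, w = len(maps[0]), len(maps[0][0])
    out = []
    for o, kernels in enumerate(weights):
        k = len(kernels[0])
        r = k // 2
        fmap = [[float(biases[o])] * w for _ in range(h)]
        for c, ker in enumerate(kernels):
            for i in range(h):
                for j in range(w):
                    for a in range(k):
                        for b in range(k):
                            ii, jj = i + a - r, j + b - r
                            if 0 <= ii < h and 0 <= jj < w:
                                fmap[i][j] += ker[a][b] * maps[c][ii][jj]
        if relu:
            fmap = [[max(0.0, v) for v in row] for row in fmap]
        out.append(fmap)
    return out


def srcnn_forward(y, params):
    """y: interpolated LR image (one channel); params: three (weights, biases) pairs."""
    (w1, b1), (w2, b2), (w3, b3) = params
    f1 = conv_layer([y], w1, b1)                  # 1) patch extraction and representation
    f2 = conv_layer(f1, w2, b2)                   # 2) non-linear mapping (1x1 kernels)
    return conv_layer(f2, w3, b3, relu=False)[0]  # 3) linear reconstruction


DELTA3 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
params = [([[DELTA3]], [0.0]),    # one 3x3 filter in the first layer
          ([[[[1.0]]]], [0.0]),   # one 1x1 filter in the mapping layer
          ([[DELTA3]], [0.0])]    # one 3x3 reconstruction filter

y = [[0.1, 0.2], [0.3, 0.4]]
print(srcnn_forward(y, params))  # [[0.1, 0.2], [0.3, 0.4]]
```

With these hypothetical identity kernels the network simply reproduces a non-negative input; trained kernels would instead map the interpolated image towards the HR ground truth.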

2.1. Iterative Back Projection Algorithm

The iterative back-projection (IBP) algorithm proposed by Irani is a representative method for restoring the original image [4]. Here it serves as a post-processing step of the image super-resolution algorithm. The result of single-image interpolation is usually used as the initial estimate $x_0$ of the high-resolution image. According to the imaging model, the simulated low-resolution image can then be expressed as:

${y}_{0}=H{x}_{0}+n$ (2)

If the estimate $x_0$ equals the original high-resolution image and the simulated imaging process matches the real one, the simulated low-resolution image $y_0$ is identical to the observed low-resolution image $y$. Otherwise, the difference between $y$ and $y_0$ is projected back onto $x_0$ to correct it.

In the IBP algorithm, the HR image is obtained by back-projecting the difference between the simulated LR images and the observed LR images through up-sampling, an inverse blur filter and an inverse motion transform [5] [9]. This process iteratively updates the estimated HR image until the error energy reaches a minimum or the maximum number of iterations is reached. The back-projected error and the iterative update can be written as:

${E}^{\left(n\right)}=\frac{1}{N}{\displaystyle {\sum}_{k=1}^{N}{H}_{k}^{BP}\left({y}_{k}-{\stackrel{^}{y}}_{k}^{\left(n\right)}\right)}$ (3)

${\stackrel{^}{z}}^{\left(n+1\right)}={\stackrel{^}{z}}^{\left(n\right)}+\lambda {E}^{\left(n\right)}$ (4)

where ${\stackrel{^}{z}}^{\left(n\right)}$ and ${\stackrel{^}{z}}^{\left(n+1\right)}$ denote the super-resolution images obtained from the nth and (n + 1)th iterations, respectively; ${\stackrel{^}{y}}_{k}^{\left(n\right)}$ denotes the kth simulated LR image generated from ${\stackrel{^}{z}}^{\left(n\right)}$ under the imaging degradation model; ${y}_{k}$ is the kth observed LR image and N is the number of observed LR images; ${E}^{\left(n\right)}$ is the averaged back-projected difference between the observed and simulated LR images; ${H}_{k}^{BP}$ is the kth back-projection operator; and λ is the step size.
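Equations (3) and (4) can be sketched as a short loop. This is an illustrative version under simplifying assumptions that do not come from the paper: all N observations share the same imaging operator (no motion), the simulated imaging H is s × s block averaging, and the back-projection operator H^BP is pixel replication. The loop stops when the error energy falls below a tolerance or the iteration limit is reached, as described above.

```python
def simulate(z, s):
    """Assumed imaging model H: average each s x s block (box blur + decimation)."""
    return [[sum(z[i * s + a][j * s + b] for a in range(s) for b in range(s)) / (s * s)
             for j in range(len(z[0]) // s)]
            for i in range(len(z) // s)]


def back_project(e, s):
    """Assumed back-projection operator H^BP: up-sample the LR error by pixel replication."""
    return [[e[i // s][j // s] for j in range(len(e[0]) * s)]
            for i in range(len(e) * s)]


def ibp(z0, observations, s, lam=1.0, max_iter=50, tol=1e-12):
    """Iterate z <- z + lam * E, with E = (1/N) * sum_k H^BP(y_k - H z)  (Eqs. 3-4)."""
    z = [row[:] for row in z0]
    n_obs = len(observations)
    for _ in range(max_iter):
        y_hat = simulate(z, s)                       # simulated LR image from current z
        E = [[0.0] * len(z[0]) for _ in range(len(z))]
        energy = 0.0
        for y in observations:
            diff = [[a - b for a, b in zip(ry, rh)] for ry, rh in zip(y, y_hat)]
            energy += sum(d * d for row in diff for d in row)
            bp = back_project(diff, s)
            E = [[e + v / n_obs for e, v in zip(re_, rb)] for re_, rb in zip(E, bp)]
        if energy < tol:                             # error energy small enough: stop
            break
        z = [[zv + lam * ev for zv, ev in zip(rz, re_)] for rz, re_ in zip(z, E)]
    return z


# Usage: start from a flat 4x4 estimate and one observed 2x2 LR image.
lr = [[1.0, 2.0], [3.0, 4.0]]
z = ibp([[0.0] * 4 for _ in range(4)], [lr], s=2)
print(z)  # [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0], [3.0, 3.0, 4.0, 4.0], [3.0, 3.0, 4.0, 4.0]]
```

With these particular operators and λ = 1 the loop converges after one update; a smaller λ or a more realistic blur model would need several iterations.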

2.2. The Hybrid Algorithm

On its own, the iterative back-projection algorithm does not achieve outstanding reconstruction quality, but it can be combined with other super-resolution methods to improve their performance. In this paper, the super-resolution algorithm based on convolutional neural networks is improved and combined with the iterative back-projection algorithm [10] to form a new hybrid algorithm.

2.2.1. The Convolution Layer

In the convolutional layers, we mainly consider how the size and number of convolution kernels affect the output quality and processing speed of the model. The super-resolution CNN proposed by Dong et al. [3] [7] has three convolutional layers with kernel sizes of 9 × 9, 1 × 1 and 5 × 5. The improved algorithm in this paper uses kernel sizes of 3 × 3, 1 × 1 and 3 × 3 for its three convolutional layers, as shown in Figure 1.

A larger convolution kernel generally gives a better super-resolution result, but it also increases the computational cost [11]. In our improved CNN model, the first-layer kernel is changed to 3 × 3, which effectively reduces the number of parameters while still covering the image features. Enlarging the second-layer kernel would increase the number of parameters, so it remains 1 × 1. The third-layer kernel is 3 × 3. Because the kernel size of the third layer may affect the quality of the super-resolution images, we trained and tested the 3-1-3 and 3-1-5 models with the same number of iterations. The experimental results show that, per 100 iterations, the 3-1-5 model takes 7.1 seconds longer than the 3-1-3 model, while the average PSNR of its output super-resolution images is only 0.1 dB higher. Weighing output quality against computing speed, we select the 3-1-3 model.

Figure 1. The flow chart of the super-resolution algorithm.

Together, the number and the size of the convolution kernels determine the super-resolution quality. Starting from the 9-1-5 model proposed by Dong et al., we tested different numbers of kernels in the first and second layers of the improved 3-1-3 model and finally selected ${n}_{1}=64,{n}_{2}=64$.
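To see where the 3-1-3 configuration saves parameters, the sketch below counts the trainable weights and biases of a three-layer f1-f2-f3 network on a single-channel (luminance) input. The 9-1-5 setting uses n1 = 64 and n2 = 32, the values reported by Dong et al. for their basic SRCNN; the 3-1-3 setting uses n1 = n2 = 64 as selected above, and we assume the same filter numbers for 3-1-5. Pooling layers are treated as parameter-free, which is a simplifying assumption.

```python
def conv_params(f1, f2, f3, n1, n2, channels=1):
    """Trainable parameters (weights + biases) of a three-layer f1-f2-f3 convolutional network."""
    weights = f1 * f1 * channels * n1 + f2 * f2 * n1 * n2 + f3 * f3 * n2 * channels
    biases = n1 + n2 + channels
    return weights + biases


for name, cfg in [("9-1-5 (Dong et al.)", (9, 1, 5, 64, 32)),
                  ("3-1-5", (3, 1, 5, 64, 64)),
                  ("3-1-3", (3, 1, 3, 64, 64))]:
    print(name, conv_params(*cfg))
# 9-1-5 (Dong et al.) 8129
# 3-1-5 6401
# 3-1-3 5377
```

The saving comes from the first and third layers; with n2 = 64 the 1 × 1 mapping layer (4096 weights) is actually larger than in the 9-1-5 model (2048 weights). Since each weight is applied once per output pixel, the per-pixel computation roughly scales with these counts.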

Following the kernel-number selection in [10], we tested the improved 3-1-3 model with different numbers of kernels under the same number of iterations and the same learning rate. The super-resolution test images were evaluated after $5\times {10}^{4}$ iterations with a learning rate of ${10}^{-3}$, and the resulting PSNR values are compared in Table 1.

We also considered how the amount of computation and information affects the speed of image super-resolution. The patch size is set to 33 × 33, which appropriately enlarges the training sub-images. The experimental results show that increasing the input block size speeds up training and shortens training time.
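Cropping 33 × 33 training sub-images can be sketched as follows. The stride of 14 is a hypothetical value chosen for illustration, since the stride is not stated here; a smaller stride yields more, and more strongly overlapping, sub-images.

```python
def extract_subimages(img, size=33, stride=14):
    """Crop size x size sub-images from a 2-D image on a regular grid."""
    h, w = len(img), len(img[0])
    crops = []
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            crops.append([row[left:left + size] for row in img[top:top + size]])
    return crops


face = [[(i * 61 + j) % 256 for j in range(61)] for i in range(61)]  # toy 61x61 image
patches = extract_subimages(face)
print(len(patches), len(patches[0]), len(patches[0][0]))  # 9 33 33
```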

2.2.2. Pooling Layer

Dong et al. applied a convolutional neural network to image super-resolution without any pooling layer. In addition to modifying the size and number of convolution kernels, we introduce pooling layers [12] after the first and second convolutional layers. Pooling reduces the size of the output feature maps, lowering their dimension and speeding up training, and it helps to avoid overfitting. With the pooling layers added, the network depth reaches five layers, and a deeper network is better able to learn from image data. Pooling layers also reduce the number of parameters [13] [14].

Table 1. The test PSNR values for different numbers of convolution kernels.

To give the pooling units translation invariance, so that the image still produces the same features after a small translation, we choose contiguous regions of the image as pooling regions and pool only the features generated by the same hidden unit. The number of feature maps entering the pooling layer therefore does not change, but the size of each feature map is reduced. This process is essentially down-sampling [15] and can be expressed as:

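${x}_{j}^{k}=f\left({\beta}_{j}^{k}\,\text{down}\left({x}_{j}^{k-1}\right)+{b}_{j}^{k}\right)$ (5)

where ${x}_{j}^{k}$ is the jth feature map of layer k, down(·) is the pooling (down-sampling) function, ${\beta}_{j}^{k}$ and ${b}_{j}^{k}$ are the multiplicative coefficient and additive bias of that feature map, and f is the activation function.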
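Equation (5) can be sketched for a single feature map as below. Using a block sum for down(·) and tanh for f are assumptions; with β = 1/4 and a 2 × 2 window the layer reduces to mean pooling. Because each map is pooled independently, the number of feature maps is unchanged while each map shrinks.

```python
import math


def down(x, s=2):
    """down(.): sum over non-overlapping s x s blocks of one feature map."""
    return [[sum(x[i + a][j + b] for a in range(s) for b in range(s))
             for j in range(0, len(x[0]) - s + 1, s)]
            for i in range(0, len(x) - s + 1, s)]


def pooling_layer(x, beta, b, s=2, f=math.tanh):
    """Eq. (5) for feature map j: f(beta * down(x_j^{k-1}) + b)."""
    return [[f(beta * v + b) for v in row] for row in down(x, s)]


fmap = [[float(4 * i + j) for j in range(4)] for i in range(4)]
print(pooling_layer(fmap, beta=0.25, b=0.0, f=lambda v: v))  # [[2.5, 4.5], [10.5, 12.5]]
```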

2.3. Architecture of Hybrid Algorithm

The complete hybrid model has nine layers: four convolutional layers, two pooling layers, two sampling layers (one for down-sampling and one for up-sampling) and one difference layer [13] [16]. The function of each layer is described below:

1) The first five layers form the improved super-resolution convolutional neural network (SRCNN model) and carry out patch extraction and representation, non-linear mapping and reconstruction.

2) Down-sampling layer. This operation down-samples the image reconstructed by the SRCNN part, yielding an LR version of the reconstructed image.

3) Difference layer. This operation computes the difference between the down-sampled version of the original HR image and the LR version obtained above. The difference is treated as the reconstruction simulation error, and it can also be regarded as a prior that guides the reconstruction.

4) Up-sampling layer. This operation up-samples the simulation error to produce an HR version of it.

5) Update layer. This operation convolves the up-sampled simulation error, and the final HR image is obtained by combining the SRCNN reconstruction with this convolved simulation error.
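Layers 2 to 5 can be sketched as a single correction step applied to the SRCNN output. The block-average down-sampling, pixel-replication up-sampling, edge-replicated 3 × 3 convolution and the plain addition used for the final update are all assumptions made for illustration; in the hybrid network the update-layer kernel would be learned. Here `lr` stands for the down-sampled version of the original HR image used by the difference layer.

```python
def down_layer(img, s):
    """Down-sampling layer (assumed): average each s x s block."""
    return [[sum(img[i * s + a][j * s + b] for a in range(s) for b in range(s)) / (s * s)
             for j in range(len(img[0]) // s)]
            for i in range(len(img) // s)]


def up_layer(img, s):
    """Up-sampling layer (assumed): nearest-neighbour pixel replication."""
    return [[img[i // s][j // s] for j in range(len(img[0]) * s)]
            for i in range(len(img) * s)]


def conv3_same(img, kernel):
    """3x3 convolution (cross-correlation) with edge replication; keeps the input size."""
    h, w = len(img), len(img[0])
    return [[sum(kernel[a][b] * img[min(max(i + a - 1, 0), h - 1)][min(max(j + b - 1, 0), w - 1)]
                 for a in range(3) for b in range(3))
             for j in range(w)]
            for i in range(h)]


def hybrid_update(sr, lr, s, kernel):
    """Down-sample, take the difference, up-sample, convolve, then add the correction to sr."""
    sim_lr = down_layer(sr, s)                                               # 2) down-sampling
    err = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(lr, sim_lr)]   # 3) difference
    err_hr = up_layer(err, s)                                                # 4) up-sampling
    corr = conv3_same(err_hr, kernel)                                        # 5) convolution
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(sr, corr)]     #    update


sr = [[0.0] * 4 for _ in range(4)]   # stand-in for the SRCNN output
lr = [[1.0, 2.0], [3.0, 4.0]]        # down-sampled version of the original HR image
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(hybrid_update(sr, lr, 2, identity)[0])  # [1.0, 1.0, 2.0, 2.0]
```

Structurally this is one IBP step from Eqs. (3) and (4) with a single observation, except that the back-projection filter becomes a trainable convolution.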

3. Experimental Results

For testing and comparison, we generate low-resolution images by applying Gaussian blur to the original high-resolution images and then down-sampling them. The initial high-resolution image is fed into the offline-trained convolutional neural network model, which uses the learned model parameters to produce the final high-resolution image [15]. To ensure an objective and fair comparison, network models trained for the same number of iterations are compared. Because of hardware constraints, the maximum number of iterations of the hybrid algorithm is $5\times {10}^{5}$.

We compare our method with bicubic interpolation [17], the neighbor embedding (NE) algorithm [18], the ScSR algorithm [19] [20] and the SRCNN algorithm [21]. Performance is evaluated in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). The models are tested on 500 images of the BANVA dataset. Figure 2 shows the visual quality of the face hallucination results generated by our method and by the competing methods.
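The two metrics can be sketched as follows for 8-bit grayscale images (a peak value of 255 is an assumption). PSNR follows the standard definition 10·log10(peak²/MSE). The SSIM function is a simplified single-window version computed over the whole image with the usual constants K1 = 0.01 and K2 = 0.03; standard SSIM instead averages this quantity over local windows (typically 11 × 11 Gaussian windows), so its values will differ from published tables.

```python
import math


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    sq = [(a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)]
    mse = sum(sq) / len(sq)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def ssim_global(ref, test, peak=255.0):
    """Single-window SSIM over the whole image (simplified; not the windowed standard SSIM)."""
    x = [v for row in ref for v in row]
    y = [v for row in test for v in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


ref = [[0.0, 0.0], [0.0, 0.0]]
out = [[10.0, 10.0], [10.0, 10.0]]
print(round(psnr(ref, out), 2))  # 28.13
```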

Figure 2. Results of different algorithms on the test images.

Table 2. PSNR (dB) and SSIM results on the test images for different methods.

Table 2 compares the proposed improved method with the competing methods in terms of PSNR and SSIM; the best results are highlighted in bold. All methods use the same up-scaling factor in our experiments. As the table shows, the proposed method improves both PSNR and SSIM: the modified network achieves better results than the competing methods, which indicates that introducing an image prior for face hallucination is effective.

4. Conclusion

In this paper, by analyzing the training process of convolutional neural networks, we have made a series of improvements to the CNN-based image super-resolution algorithm. Compared with traditional algorithms, the results show that the improved algorithm gives better reconstruction, sharper edges and clearer images. The improved CNN algorithm achieves better results with fewer iterations and significantly reduces training time.

References

[1] Baker, S. and Kanade, T. (2000) Hallucinating Faces. IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, 28-30 March 2000, 83-88.

https://doi.org/10.1109/AFGR.2000.840616

[2] Freeman, W.T. and Pasztor, E.C. (1999) Learning Low-Level Vision. International Conference on Computer Vision, Kerkyra, 20-27 September 1999, 1182-1189.

https://doi.org/10.1109/ICCV.1999.790414

[3] Dong, C., et al. (2014) Learning a Deep Convolutional Network for Image Super-Resolution. European Conference on Computer Vision, 184-199.

[4] Irani, M. and Peleg, S. (1990) Super Resolution from Image Sequences. 10th International Conference on Pattern Recognition, Atlantic City, 16-21 June 1990.

https://doi.org/10.1109/ICPR.1990.119340

[5] Plank, J., Beck, M. and Elwasif, W. (1999) IBP: The Internet Backplane Protocol, 93, 679-688.

[6] Lei, J., et al. (2018) An Image Guided Algorithm for Range Map Super-Resolution. Society of Photo-Optical Instrumentation Engineers, 43.

https://doi.org/10.1117/12.2288074

[7] Dong, C., et al. (2016) Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 295-307.

https://doi.org/10.1109/TPAMI.2015.2439281

[8] Zhou, S.K., Chellappa, R. and Moghaddam, B. (2004) Intra-Personal Kernel Space for Face Recognition. IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, 19 May 2004, 235-240.

https://doi.org/10.1109/AFGR.2004.1301537

[9] Yan, W.U. (2009) One Improved Super-Reconstruction Algorithm Based on IBP Theory. Infrared, 12, 11-15.

[10] Gao, H., Zeng, J. and Zhao, Y. (2016) Super-Resolution Reconstruction Algorithm Based on Adaptive Convolution Kernel Size Selection. Applications of Digital Image Processing XXXIX, Article ID: 997120.

[11] Krizhevsky, A., Sutskever, I. and Hinton, G.E. (2012) ImageNet Classification with Deep Convolutional Neural Networks. International Conference on Neural Information Processing Systems, 1097-1105.

[12] Wang, N., et al. (2014) A Comprehensive Survey to Face Hallucination. International Journal of Computer Vision, 106, 9-30.

https://doi.org/10.1007/s11263-013-0645-9

[13] Grm, K., et al. (2018) Face Hallucination Using Cascaded Super-Resolution and Identity Priors.

[14] Jiang, J., et al. (2018) Deep CNN Denoiser and Multi-Layer Neighbor Component Embedding for Face Hallucination. IJCAI.

https://doi.org/10.24963/ijcai.2018/107

[15] Shi, J., et al. (2018) Hallucinating Face Image by Regularization Models in High-Resolution Feature Space. IEEE Transactions on Image Processing, 27, 2980-2995.

https://doi.org/10.1109/TIP.2018.2813163

[16] Ma, X., Zhang, J. and Qi, C. (2010) Hallucinating Face by Position-Patch. Pattern Recognition, 43, 2224-2236.

https://doi.org/10.1016/j.patcog.2009.12.019

[17] Ruangsang, W. and Aramvith, S. (2017) Efficient Super-Resolution Algorithm Using Overlapping Bicubic Interpolation. 6th Global Conference on Consumer Electronics, Nagoya, 24-27 October 2017, 1-2.

[18] Cao, Q.I., et al. (2016) Super-Resolution Algorithm of Infrared Video Image Based on Sparse Representation.

[19] Yang, J., et al. (2010) Image Super-Resolution via Sparse Representation. IEEE Transactions on Image Processing, 19, 2861-2873.

https://doi.org/10.1109/TIP.2010.2050625

[20] Wang, Y., et al. (2018) Improved Algorithm of Image Super Resolution Based on Residual Neural Network. Journal of Computer Applications.

[21] Wang, Y., et al. (2016) End-to-End Image Super-Resolution via Deep and Shallow Convolutional Networks.