Once the forward and backward hidden-layer outputs have been calculated, the output of the BRNN is obtained from formula (14), and the outputs of the successive layers of the DBRNN are then obtained in turn.
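The bidirectional forward pass can be illustrated with a small pure-Python sketch. One hidden sequence is computed left to right and another right to left, and each output combines both, so the output at time t sees both past and future context. The tanh activation, the linear output, and the weight names (`W_f`, `U_f` for the forward layer, `W_b`, `U_b` for the backward layer, `V_f`, `V_b` for the output) are assumptions for illustration; formula (14) in the paper may differ in detail.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def brnn_forward(xs, W_f, U_f, W_b, U_b, V_f, V_b):
    """Forward pass of a single-layer BRNN (assumed: tanh hidden units, linear output).

    xs: list of T input vectors.
    Returns (forward hidden states, backward hidden states, outputs).
    """
    T = len(xs)
    hidden = len(U_f)  # U_f is a hidden x hidden recurrent matrix
    h_f = [[0.0] * hidden for _ in range(T)]
    h_b = [[0.0] * hidden for _ in range(T)]
    prev = [0.0] * hidden
    for t in range(T):  # forward direction: t = 1 .. T
        prev = [math.tanh(z) for z in add(matvec(W_f, xs[t]), matvec(U_f, prev))]
        h_f[t] = prev
    nxt = [0.0] * hidden
    for t in reversed(range(T)):  # backward direction: t = T .. 1
        nxt = [math.tanh(z) for z in add(matvec(W_b, xs[t]), matvec(U_b, nxt))]
        h_b[t] = nxt
    # The output at each step combines past context (h_f) and future context (h_b).
    ys = [add(matvec(V_f, h_f[t]), matvec(V_b, h_b[t])) for t in range(T)]
    return h_f, h_b, ys
```

In a deep BRNN, one simple way to stack layers is to feed each layer's per-step outputs `ys` to the next layer as its input sequence `xs`.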
4.5. Time Domain Backward Propagation Algorithm
Because the RNN models the continuity and causality of speech signals, its error signals cannot be propagated only from the top layer down, as in other neural networks; they must also be propagated backward along the time axis. The algorithm is therefore called back-propagation through time (BPTT). The derivation starts from a unidirectional RNN with a single hidden layer, then extends it to a single-layer BRNN, and finally covers all operations of the DBRNN. As for the DNN, a loss function is defined for the training sample X of each training iteration. The error signals of the output layer and the hidden layer at time t are denoted, respectively, as:
The error signal propagated to the hidden layer at time t has two sources: the error signal of the output layer at time t, and the error signal of the hidden layer at time t + 1. Applying the chain rule gives:
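For concreteness, the two error sources can be written out for a standard single-hidden-layer RNN. The notation is an assumption, not taken from the paper: a_t = W x_t + U h_{t-1} is the hidden pre-activation, h_t = f(a_t) the hidden output, o_t = V h_t the output pre-activation, and L the loss, with error signals defined as derivatives of the loss with respect to the pre-activations.

```latex
\begin{aligned}
\delta^{o}_{t} &= \frac{\partial \mathcal{L}}{\partial o_t}, \qquad
\delta^{h}_{t} = \frac{\partial \mathcal{L}}{\partial a_t},\\
\delta^{h}_{t} &= f'(a_t) \odot \left( V^{\top} \delta^{o}_{t} + U^{\top} \delta^{h}_{t+1} \right),
\qquad \delta^{h}_{T+1} = \mathbf{0}.
\end{aligned}
```

The first term inside the parentheses carries the output-layer error at time t and the second carries the hidden-layer error from time t + 1. Because each step needs the error of the following step, the recursion has to be evaluated from t = T down to t = 1.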
Formula (17) shows that the error signals propagate backward along the time axis, from time T to time 1, which is why the algorithm is called BPTT.
The gradients of the RNN parameters are therefore:
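To make the backward recursion and the gradient sums concrete, here is a minimal sketch for a scalar RNN (one input, one tanh hidden unit, one linear output) with squared-error loss L = ½ Σ_t (o_t − y_t)². The model, the loss, and the names `w` (input weight), `u` (recurrent weight), and `v` (output weight) are illustrative assumptions, not the paper's network. What it shows is the backward walk over time and the summation of per-step contributions: dL/dv = Σ δ^o_t h_t, dL/du = Σ δ^h_t h_{t-1}, dL/dw = Σ δ^h_t x_t.

```python
import math
from typing import List, Tuple


def forward(w: float, u: float, v: float, xs: List[float]) -> Tuple[List[float], List[float]]:
    """Run a scalar RNN; returns hidden states h_0..h_T (h_0 = 0) and outputs o_1..o_T."""
    hs = [0.0]
    outs = []
    for x in xs:
        h = math.tanh(w * x + u * hs[-1])
        hs.append(h)
        outs.append(v * h)
    return hs, outs


def loss(w: float, u: float, v: float, xs: List[float], ys: List[float]) -> float:
    """Squared-error loss summed over all time steps."""
    _, outs = forward(w, u, v, xs)
    return 0.5 * sum((o - y) ** 2 for o, y in zip(outs, ys))


def bptt(w: float, u: float, v: float, xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Gradients (dL/dw, dL/du, dL/dv) by back-propagation through time."""
    hs, outs = forward(w, u, v, xs)
    T = len(xs)
    dw = du = dv = 0.0
    delta_h_next = 0.0  # delta^h_{T+1} = 0: no error arrives from beyond the last step
    for t in range(T, 0, -1):  # walk the time axis backwards: T, T-1, ..., 1
        delta_o = outs[t - 1] - ys[t - 1]  # output-layer error at time t
        h_t, h_prev = hs[t], hs[t - 1]
        # Two sources: output error at time t and hidden error at time t + 1.
        delta_h = (1.0 - h_t ** 2) * (v * delta_o + u * delta_h_next)
        dv += delta_o * h_t
        du += delta_h * h_prev
        dw += delta_h * xs[t - 1]
        delta_h_next = delta_h
    return dw, du, dv
```

A central finite difference such as (loss(w + 1e-6, u, v, xs, ys) − loss(w − 1e-6, u, v, xs, ys)) / 2e-6 should agree closely with the `dw` returned by `bptt`, which is a cheap sanity check on a hand-derived backward pass. An SGD step then moves each parameter against its gradient, for example w ← w − η·dw for a learning rate η.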
The model parameters are then updated iteratively with the stochastic gradient descent (SGD) algorithm until training converges.
For the BRNN, every time step depends on both past and future context, so the error signals of the forward and backward hidden layers are obtained in the same way:
Finally, the gradients are computed as follows:
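A sketch of the BRNN case follows, again in assumed notation rather than the paper's. The forward layer computes h^f_t = f(a^f_t) with a^f_t = W_f x_t + U_f h^f_{t-1}, the backward layer computes h^b_t = f(a^b_t) with a^b_t = W_b x_t + U_b h^b_{t+1}, and the output pre-activation is o_t = V_f h^f_t + V_b h^b_t, with h^f_0 = h^b_{T+1} = 0. Each direction receives the shared output error plus a recurrent error from its own neighbouring time step, and every gradient is a sum over time:

```latex
\begin{aligned}
\delta^{f}_{t} &= f'(a^{f}_{t}) \odot \left( V_f^{\top} \delta^{o}_{t} + U_f^{\top} \delta^{f}_{t+1} \right), &
\delta^{f}_{T+1} &= \mathbf{0},\\
\delta^{b}_{t} &= f'(a^{b}_{t}) \odot \left( V_b^{\top} \delta^{o}_{t} + U_b^{\top} \delta^{b}_{t-1} \right), &
\delta^{b}_{0} &= \mathbf{0},\\
\frac{\partial \mathcal{L}}{\partial V_f} &= \sum_{t=1}^{T} \delta^{o}_{t} \, (h^{f}_{t})^{\top}, &
\frac{\partial \mathcal{L}}{\partial V_b} &= \sum_{t=1}^{T} \delta^{o}_{t} \, (h^{b}_{t})^{\top},\\
\frac{\partial \mathcal{L}}{\partial W_f} &= \sum_{t=1}^{T} \delta^{f}_{t} \, x_t^{\top}, &
\frac{\partial \mathcal{L}}{\partial U_f} &= \sum_{t=1}^{T} \delta^{f}_{t} \, (h^{f}_{t-1})^{\top},\\
\frac{\partial \mathcal{L}}{\partial W_b} &= \sum_{t=1}^{T} \delta^{b}_{t} \, x_t^{\top}, &
\frac{\partial \mathcal{L}}{\partial U_b} &= \sum_{t=1}^{T} \delta^{b}_{t} \, (h^{b}_{t+1})^{\top}.
\end{aligned}
```

The forward layer's errors are accumulated from t = T down to t = 1, while the backward layer's errors run the opposite way, from t = 1 up to t = T, mirroring the direction in which each layer was evaluated.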
The model parameters are then updated. Building on these results, the learning and training of the DBRNN can be implemented.
5. Experiments and Result Analysis
5.1. Experimental Environment
The experimental equipment is as follows:
- Hardware: 1) The core processing unit of the module is a Samsung S5PV210 64/32-bit processor based on the ARM Cortex™-A8 core, with a 1 GHz clock frequency, 32 KB/32 KB L1 data/instruction caches, a 512 KB L2 cache, and a performance of 2000 DMIPS (Dhrystone million instructions per second). 2) An onboard speech processing module amplifies, filters, samples, and digitizes the speech signal through A/D and D/A conversion, and provides a LINE audio input/output interface and a microphone (MIC) input interface. 3) Other onboard modules include ZigBee, RFID, Wi-Fi, GPRS, an RS232 serial port, a USB interface, and so on.
- Software: Embedded Linux is used as the development platform. Its kernel is small and easy to tailor, which makes it well suited to embedded systems; it supports ARM-architecture CPUs and a large number of external devices; and the size and functions of the system can be customized, with a rich set of drivers available.
The main programs for deep hybrid neural network speech recognition and semantic control are developed on a Linux host machine with its compilation tools. They are then cross-compiled into executables for the ARM S5PV210 processor and burned onto the development motherboard.
5.2. Experimental Process and Results
The implementation process of speech recognition and semantic control is as follows. First, speech recognition consists of two phases: training and recognition. In the training phase, speech signals are captured through input devices such as microphones, converted by A/D conversion, and encoded and decoded as digital signals. The proposed hybrid neural network is then trained on these signals, and the training results are burned into Flash memory for use in the subsequent recognition phase. Second, in the recognition phase, after the input speech signal has been processed by the audio codec, the embedded Linux operating system running on the ARM Cortex™-A8 is notified and matches the signal against the reference samples stored in Flash. The best match is taken as the recognition result and mapped to the corresponding semantic vocabulary. Finally, based on the semantic result, the corresponding I/O outputs are controlled through the system call functions of the embedded Linux operating system. For example, the system can turn LED lights on and off in smart homes, industrial equipment, and other settings. The Linux kernel runs on the ARM Cortex™-A8 and calls the device drivers, which must implement at least the open, read, write, and close system calls. In the experiment, we also drew on the YueQian development boards and the speech components of iFLYTEK. The experimental results are as follows.
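The recognition phase matches the processed input against reference samples stored in Flash and keeps the best match. The matching procedure is not specified here, so the sketch below substitutes one classical technique, dynamic time warping (DTW) over per-frame feature vectors, and returns the label of the nearest stored template. This is not necessarily the authors' method; the feature representation, the template dictionary, and the labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Classic O(N*M) dynamic time warping distance between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: diagonal match, step in seq_a, step in seq_b.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def best_match(features: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the label of the stored reference template closest to the input."""
    return min(templates, key=lambda label: dtw(features, templates[label]))
```

DTW tolerates differences in speaking rate because it can stretch either sequence; a neural recognizer such as the hybrid network described here would typically replace the template store with learned models, but the "find the best-scoring candidate, then map it to a semantic label" structure stays the same.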
After connecting the power supply of the development board and the serial cable (one end to the PC, the other to the development board), we use the SecureCRT software to download the programs to the ARM Cortex™-A8 board and carry out the cross-compilation. Voice data are captured with recording devices, and the results are shown in Figure 6.
Using the ESP8266 tool developed by us, we burn the proposed hybrid neural network and other algorithms into the storage of the ARM Cortex™-A8 board for embedded speech recognition processing. The results are shown in Figure 7.
The speech recognition and semantic control system implemented in this paper can recognize voice data both from audio files and directly from a microphone or other input devices. Recognition of voice data from audio files is shown in (a) and (b) of Figure 8.
The system also recognizes voice data captured directly from the microphone and other input devices, for example the commands “开灯” (turn on the light) and “关灯” (turn off the light). In the experiment, six lights numbered 1 to 6 were used, and any single light, such as No. 3 or No. 6, could be switched on or off. The results are shown in (a), (b) and (c) of Figure 9.
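Once a command such as “开灯” or “关灯” and a light number have been recognized, the semantic result has to be turned into an I/O action through the open, write, and close system calls that the drivers are required to implement. The sketch below assumes a hypothetical per-light device file named `led<N>` in a configurable directory and a `1`/`0` payload; the path layout, the payload, and the function name are illustrative assumptions, not the board's actual driver interface.

```python
import os

# Hypothetical mapping from recognized semantic vocabulary to an LED state.
COMMANDS = {"开灯": b"1", "关灯": b"0"}  # turn light on / turn light off
NUM_LIGHTS = 6  # lights numbered 1..6, as in the experiment


def control_light(command: str, light_id: int, dev_dir: str) -> str:
    """Write the LED state for one light using low-level open/write/close calls.

    dev_dir/led<N> is an assumed device-file layout, not a real driver path.
    Returns the path that was written.
    """
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    if not 1 <= light_id <= NUM_LIGHTS:
        raise ValueError(f"light id must be 1..{NUM_LIGHTS}, got {light_id}")
    path = os.path.join(dev_dir, f"led{light_id}")
    # O_CREAT is included only so the sketch also runs against an ordinary directory.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, COMMANDS[command])
    finally:
        os.close(fd)  # always release the file descriptor
    return path
```

For example, `control_light("开灯", 3, "/path/to/leds")` would switch on light No. 3 under this assumed layout; validating the light number before any system call keeps an out-of-range recognition result from touching the hardware.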
Based on the recognition process above, the lights were also controlled with two different types of circuit boards. The results are shown in (a), (b) and (c) of Figure 10.
Figure 6. Cross-compiling and recording speech (the speech recognition control in this paper works on Chinese speech).
Figure 7. Burning the recognition algorithm programs onto the board.
Figure 8. Recognition of voice data from audio files (the speech recognition control in this paper works on Chinese speech).
Figure 9. Recognition of voice data directly from the microphone and other input devices (the speech recognition control in this paper works on Chinese speech).
Figure 10. Control of the lights on two types of circuit boards (the speech recognition control in this paper works on Chinese speech).
6. Summary and Outlook
This paper studied semantic interaction control as a critical component for constructing an intelligent Internet of Things ecosystem. First, we proposed a novel deep hybrid neural network algorithm that integrates a deep bidirectional recurrent neural network with a deep back-propagation neural network; together, they perform the acoustic analysis, speech recognition, and natural language understanding that jointly constitute human-machine voice interaction. Second, we designed a voice control motherboard with an ARM-series embedded chip as its core, whose onboard modules include ZigBee, RFID, Wi-Fi, GPRS, an RS232 serial port, a USB interface, and others. Third, we combined the algorithm, software, and hardware so that machines can “understand” human speech and “think” about and “comprehend” human intentions, providing critical components for intelligent vehicles, intelligent offices, intelligent service robots, intelligent industry, and other parts of an intelligent Internet of Things ecosystem. Finally, the experimental results indicate that the embedded semantic interaction control performs well, with fast response and high accuracy, supporting the construction of an intelligent Internet of Things ecosystem.
Building on this semantic interaction control for the intelligent Internet of Things ecosystem, our future work will focus on its commercialization and large-scale deployment.
This research was funded by the National Natural Science Foundation (Grants 61171141 and 61573145), the Public Research and Capacity Building of Guangdong Province (Grant 2014B010104001), the Basic and Applied Basic Research of Guangdong Province (Grant 2015A030308018), the Main Project of the Natural Science Fund of JiaYing University (Grant 2017KJZ02), and the Key Research Bases for Humanities and Social Sciences jointly built by the province and cities in regular institutions of higher learning of Guangdong Province (Grant 18KYKT11). The authors are grateful for this support.
Compliance with Ethical Standards
Funding
This study was funded by the National Natural Science Foundation (grant numbers 61171141 and 61573145), the Public Research and Capacity Building of Guangdong Province (grant number 2014B010104001), the Basic and Applied Basic Research of Guangdong Province (grant number 2015A030308018), the Main Project of the Natural Science Fund of JiaYing University (grant number 2017KJZ02), and the Key Research Bases for Humanities and Social Sciences jointly built by the province and cities in regular institutions of higher learning of Guangdong Province (grant number 18KYKT11).
Conflicts of Interest
Hai-jun Zhang declares that he has no conflict of interest. Ying-hui Chen declares that she has no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
 Xu, J., Yang, G., Yin, Y.F., Man, H. and He, H.B. (2014) Sparse-Representation-Based Classification with Structure-Preserving Dimension Reduction. Cognitive Computation, 6, 608-621.
 Zhang, S.X., Zhao, R., Liu, C., et al. (2016) Recurrent Support Vector Machines for Speech Recognition. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, 20-25 March 2016, 5885-5889.
 Zhang, H.-J. and Xiao, N.-F. (2016) Parallel Implementation of Multilayered Neural Networks Based on Map-Reduce on Cloud Computing Clusters. Soft Computing, 20, 1471-1483.
 Deng, L. (2016) Industrial Technology Advances: Deep Learning from Speech Recognition to Language and Multimodal Processing. APSIPA Transactions on Signal and Information Processing, Cambridge University Press, Cambridge.
 Weng, C., Yu, D., Seltzer, M.L., et al. (2015) Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 1670-1679.
 Hernández-Munoz, J., Vercher, J., Munoz, L., Galache, J., Presser, M., Gómez, L. and Pettersson, J. (2011) Smart Cities at the Forefront of the Future Internet. In: Domingue, J., Galis, A., Gavras, A., Zahariadis, T. and Lambert, D., Eds., The Future Internet, Springer-Verlag, Berlin, Heidelberg, 447-462.
 Miao, Y. and Bu, Y. (2010) Research on the Architecture and Key Technology of Internet of Things (IoT) Applied on Smart Grid. 2010 International Conference on Advances in Energy Engineering, Beijing, 19-20 June 2010, 69-72.
 Bi, Z., Xu, L. and Wang, C. (2014) Internet of Things for Enterprise Systems of Modern Manufacturing. IEEE Transactions on Industrial Informatics, 10, 1537-1546.
 Gauvain, J.-L. and Lee, C.-H. (1994) Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains. IEEE Transactions on Speech and Audio Processing, 2, 291-298.
 Leggetter, C.J. and Woodland, P.C. (1995) Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models. Computer Speech & Language, 9, 171-185.
 Memisevic, R. and Hinton, G.E. (2010) Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines. Neural Computation, 22, 1473-1492.
 Bengio, Y., Courville, A. and Vincent, P. (2013) Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798-1828.
 Dahl, G., Yu, D., Deng, L., et al. (2012) Context-Dependent Pretrained Deep Neural Networks for Large Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20, 30-42.
 Hinton, G.E., Deng, L., Yu, D., et al. (2012) Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine, 29, 82-97.
 Hinton, G.E. and Sejnowski, T.J. (1986) Learning and Relearning in Boltzmann Machines. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, Vol. 1, 282-317.
 Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) Learning Internal Representations by Error Propagation. In: Rumelhart, D.E. and McClelland, J.L., Eds., Parallel Distributed Processing, MIT Press, Cambridge, Vol. 1, 318-362.
 Jiang, X., Zhang, Y., Zhang, W., et al. (2013) A Novel Sparse Auto-Encoder for Deep Unsupervised Learning. Proceedings of the International Conference on Advanced Computational Intelligence, Hangzhou, 19-21 October 2013, 256-261.
 Ng, A., Ngiam, J., et al. (2014) UFLDL Tutorial.
 Ashton, K. (2009) That “Internet of Things” Thing. RFID Journal, 22, 97-114.
 Sundmaeker, H., Guillemin, P., Friess, P. and Woelfflé, S. (2010) Vision and Challenges for Realising the Internet of Things, Cluster of European Research Projects on the Internet of Things—CERP IoT.
 Gluhak, A., Krco, S., Nati, M., Pfisterer, D., Mitton, N. and Razafindralambo, T. (2011) A Survey on Facilities for Experimental Internet of Things Research. IEEE Communications Magazine, 49, 58-67.
 Atzori, L., Iera, A. and Morabito, G. (2011) SIoT: Giving a Social Structure to the Internet of Things. IEEE Communications Letters, 15, 1193-1195.
 Cherrier, S., Salhi, I., Ghamri-Doudane, Y.M., Lohier, S. and Valembois, P. (2014) BeC3: Behaviour Crowd Centric Composition for IoT Applications. Mobile Networks and Applications, 19, 18-32.
 Cherrier, S., Ghamri-Doudane, Y.M., Lohier, S. and Roussel, G. (2014) Fault-Recovery and Coherence in Internet of Things Choreographies. IEEE World Forum on Internet of Things, Seoul, 6-8 March 2014, 532-537.
 Segars, S. (1998) ARM9 Family High Performance Microprocessors for Embedded Applications. Proceedings International Conference on Computer Design. VLSI in Computers and Processors, Austin, 5-7 October 1998, 230-235.
 You, Y., Qian, Y., He, T., et al. (2015) An Investigation on DNN-Derived Bottleneck Features for GMM-HMM Based Robust Speech Recognition. Proceedings of IEEE China Summit and International Conference on Signal and Information Processing, Chengdu, 12-15 July 2015, 30-34.
 Qian, Y., He, T., Deng, W., et al. (2015) Automatic Model Redundancy Reduction for Fast Back-propagation for Deep Neural Networks in Speech Recognition. Proceedings of International Joint Conference on Neural Networks, Killarney, 12-17 July 2015, 1-6.
 Huang, J.T., Li, J. and Gong, Y. (2015) An Analysis of Convolutional Neural Networks for Speech Recognition. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, 19-24 April 2015, 4989-4993.
 iFLYTEK (2017).