China is the country that sells the most industrial robots, and it has the largest industrial-robot market in the world. According to the Investigation and Analysis of the Current Situation of the Industrial Robot Industry in China from 2020 to 2026 and its Development Trend Forecast Report, sales of industrial robots in China have risen continuously since 2002, increasing China’s share of the global market year by year. In 2018, sales of industrial robots in China reached 140,000 units, accounting for 33% of global sales; the total number of robots in use was 575,000, accounting for 23% of the world total; ontology market sales reached 26 billion Yuan; and the integrated market exceeded 100 billion Yuan, ranking first in the world. Local enterprises in China are becoming increasingly competitive, and domestic industrial robot manufacturers such as Siasun Robotics and Automation and Midea Group have sprung up. In 2016, Midea acquired the KUKA Robotics Corporation, one of the top four industrial robot manufacturers. China has vast market demand for industrial robots thanks to its complete industrial chain, which involves 39 industry categories and 110 sub-categories in the national economy. Among all categories, automobile manufacturing currently applies industrial robots most widely and has established a complete industrial chain. However, safety incidents caused by robots have attracted growing public attention worldwide since 1978, when the first death due to a sudden robot failure occurred in Hiroshima, Japan. The safety of industrial robots has also become a critical topic in China, especially given that the country has the world’s largest sales volume and installed base of industrial robots.
Traditional fault detection relies on a series of methods derived from signal processing, including expert systems, artificial neural networks, fuzzy mathematics, and fault tree analysis. The data quality for these approaches is constrained by the sensors embedded in the signal acquisition card. In a real-world environment, sensors are affected by external factors such as heat, humidity, and electromagnetic radiation. Consequently, sensor failure is one of the main causes of system abnormalities in industrial robots. Industrial robots in automobile manufacturing are used for welding, spraying, handling, and other special tasks. Each type of robot comes in many models produced by manufacturers from Japan, Germany, the United States, and China. As a result, robot communication protocols are complex and follow no single uniform standard. Besides, because electromagnetic interference shielding of the internal environment is often not considered, thermal radiation and other effects can lead to inaccurate measurements by the monitoring system embedded in the execution system. Apart from signal interference, device aging, metal fatigue, and lack of maintenance are also major factors that lead to robot failure.
To avoid the problems of inconsistent communication protocols and sensor interference, we propose a vision-based fault detection method for industrial robots in this paper. The proposed method does not rely on the specific underlying communication protocol adopted by the robot, and it performs fault diagnosis in a non-contact manner. Because industrial robots move periodically, their motion mode can be extracted and used to estimate the robot’s current fault state. Our method can be used for real-time analysis of industrial robots’ operation video. Image segmentation is used to extract the region of interest that contains the industrial robot in each video frame. The image hashing method is applied to encode the robot’s action and posture. Each code corresponds to a unique running state of the robot, and the periodic pose coding sequence serves as the normal robot motion mode. When the robot’s action in a video frame deviates from the known normal motion mode, an abnormal action warning is triggered.
Industrial robots are an important production tool for intelligent manufacturing and advanced industry. China is a major manufacturing country and has deployed a large number of industrial robots in its manufacturing industry. Unfortunately, because some factories fail to pay close attention to maintenance, industrial robots operating at high intensity, under high workload, and for long hours may accidentally injure nearby personnel, leading to irreparable losses. The traditional fault diagnosis method is mainly based on sensors and signal processing, combined with mathematical analysis for robot fault diagnosis. Such a method is expensive and sensitive to sensor readings, resulting in the following problems:
1) The sensor data can be disturbed by the external environment. Dust, water vapor, electromagnetic radiation, aging, and shell damage can cause sensor misreading and even failure.
2) The communication protocols of different types of robots are complex and inconsistent. The family of industrial robots is huge, with various forms and brands. As each manufacturer has developed its own communication protocol, it is challenging to design a general detection method that applies to the majority of types of industrial robots.
3) As the monitoring system is embedded in the execution system, the two systems often affect each other, which makes the combined system bulky and tightly coupled and causes interference between its subsystems.
4) A large number of sensors are distributed across the robot’s motors, arms, end effectors, and other locations, which raises the cost of robot repair and maintenance.
By contrast, data-driven and vision-based approaches leverage advances in image segmentation. Convolutional neural networks, represented by AlexNet, have won the ImageNet challenges since 2012, and computer vision has made significant progress since then. In recent years, Mask R-CNN has marked a milestone in segmentation, delivering pixel-level instance segmentation of objects at near-real-time speed. There are various types of task-oriented industrial robots in practice, including welding, transporting, and palletizing robots. The size and shape of robots also differ significantly as their use cases vary. Due to the various sources of interference and environmental noise in workshops, collecting sufficient reliable data samples for deep learning methods remains a challenge.
Image data analysis requires extracting the key information and reducing the dimension of the data to facilitate storage and calculation in practice. Image hashing provides a solution to such requirements. Locality-sensitive hashing classifies data according to a hash function and stores similar data in the same hash bucket. Spectral hashing treats the encoding problem as a graph cut and reduces the data dimension by analyzing the high-dimensional data with spectral analysis. Anchor graph hashing replaces the original adjacency matrix with an approximate adjacency matrix generated by the nearest neighbor graph between the centroids of data clusters and each data sample. To simplify the calculation, supervised discrete hashing maps hash codes to the labels, thus avoiding the calculation of the similarity matrix.
In this paper, we propose a new vision-based fault detection method for industrial robots. To extract the hash sequence of the normal motion pattern, we apply image segmentation, image hashing, and sequence pattern analysis to the scene in which industrial robots operate periodically. We can then detect fault behaviors by comparing the hash value of the current motion with that of the normal motion pattern.
3. Fault Detection for Industrial Robots
The proposed fault detection framework for industrial robots is shown in Figure 1. The system takes a single frame of the video stream as input. If the robot operates normally in the current frame, the system proceeds to the next frame. If the robot behaves abnormally, an emergency stop command is sent to the robot console. When monitoring robots in an industrial environment, single frames of robot action images are obtained by slicing the monitoring video. Each image is segmented in real time to extract the region of interest that contains the industrial robot. The hash value of the current frame is then calculated from the extracted region of interest. All the calculated hash values are stored in the hash sequence library, from which the robot motion modes are obtained by pattern extraction. The hash sequence of the action modes is used as the reference for characterizing the robot’s current motion posture. The matching module applies sequential matching. To account for environmental noise, we further apply approximate matching within reliable confidence intervals. If the matching module’s output is normal, the same calculation continues for the next frame; if the output is abnormal, the robot’s emergency stop control program is triggered immediately and the console performs emergency braking.
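The control flow of Figure 1 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors’ implementation: the names `monitor`, `segment`, `hash_frame`, `is_normal`, and `emergency_stop` are hypothetical placeholders for the modules in the framework, and the toy usage feeds plain integers instead of video frames.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Frame = TypeVar("Frame")


def monitor(
    frames: Iterable[Frame],
    segment: Callable[[Frame], Frame],
    hash_frame: Callable[[Frame], int],
    is_normal: Callable[[List[int]], bool],
    emergency_stop: Callable[[int], None],
) -> Optional[int]:
    """Process frames one by one and stop at the first abnormal frame.

    Returns the index of the abnormal frame, or None if every frame passed.
    All five callables are hypothetical stand-ins for the modules in Figure 1.
    """
    history: List[int] = []  # plays the role of the hash sequence library
    for index, frame in enumerate(frames):
        region = segment(frame)              # region of interest with the robot
        history.append(hash_frame(region))   # hash of the current posture
        if not is_normal(history):           # matching module
            emergency_stop(index)            # notify the robot console
            return index
    return None


# Toy usage: integers stand in for frames (assumed data, not from the paper).
stops: List[int] = []
template = [1, 2, 3]
result = monitor(
    frames=[1, 2, 3, 1, 2, 9, 1],
    segment=lambda f: f,
    hash_frame=lambda f: f,
    is_normal=lambda h: h[-1] == template[(len(h) - 1) % len(template)],
    emergency_stop=stops.append,
)
print(result, stops)
```

Passing the modules in as callables keeps the loop independent of how segmentation, hashing, and matching are actually implemented.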
3.1. Image Segmentation
Among all color spaces, RGB, CMY, HSV, and HSI are the most commonly used in practice. RGB defines colors based on the principle of object luminescence and human eye recognition. It is a hardware-oriented color model and does not separate brightness from saturation. In contrast, CMY is based on object reflection and is mainly used in the printing industry (e.g., the four-color CMYK ink cartridge in a printer). HSV and HSI are collectively referred to as the HSX model, which is convenient for digital color processing. As for the channels, H stands for hue, S for saturation, and X can be either I for intensity or V for value.
Thanks to their surface coating, industrial robots generally stand out from the surrounding working environment with distinctive colors. As one of the final steps of production, the surface of an industrial robot is sprayed with an integrated surface coating. Under the HSV color model, one can segment the robot regions via band filtering. Let us consider blue industrial robots as an example. By looking up the RGB to HSV conversion table, we can observe that the blue color falls into the following ranges under the HSV model: 100 - 124 for hue, 43 - 255 for saturation, and 46 - 255 for value. We can further narrow the bandwidth with prior knowledge of the color range, which leads to 100 - 110 for hue, 103 - 255 for saturation, and 46 - 255 for value. As illustrated in Figure 2, the segmentation results are the least sensitive to saturation. In contrast, the robot region cannot be fully extracted when the hue and value are either too high or too low. With the ranges specified above, the proposed model can achieve precise segmentation of industrial robots and effectively avoid various external environmental disturbances.
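The band filter described above can be illustrated with a short sketch. It assumes that the quoted ranges use the 8-bit OpenCV-style HSV scale (hue 0 - 179, saturation and value 0 - 255), which the text does not state explicitly. To stay dependency-free it uses the standard-library `colorsys` module on nested lists of RGB tuples; the helper names `rgb_to_hsv_8bit`, `in_band`, and `band_mask` are hypothetical.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]

# Blue bands quoted in the text, on the assumed 8-bit scale (H: 0-179, S/V: 0-255).
WIDE_BAND = ((100, 43, 46), (124, 255, 255))
NARROW_BAND = ((100, 103, 46), (110, 255, 255))


def rgb_to_hsv_8bit(r: int, g: int, b: int) -> Pixel:
    """Convert an 8-bit RGB pixel to HSV on the assumed OpenCV-style scale."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))


def in_band(hsv: Pixel, band=NARROW_BAND) -> bool:
    """True if every HSV channel lies inside the inclusive [lower, upper] band."""
    lower, upper = band
    return all(lo <= c <= hi for c, lo, hi in zip(hsv, lower, upper))


def band_mask(image: Sequence[Sequence[Pixel]], band=NARROW_BAND) -> List[List[int]]:
    """Binary mask: 1 where the pixel falls inside the band, 0 elsewhere."""
    return [[1 if in_band(rgb_to_hsv_8bit(*px), band) else 0 for px in row]
            for row in image]


# Toy 2 x 2 image: azure blue, gray, red, pure blue (assumed colors).
image = [[(0, 128, 255), (128, 128, 128)],
         [(255, 0, 0), (0, 0, 255)]]
print(band_mask(image, NARROW_BAND))
print(band_mask(image, WIDE_BAND))
```

In this toy image, pure blue has a hue of 120 on the assumed scale, so it passes the wide band but is rejected by the narrowed one, which shows how tightening the hue range trades coverage for selectivity.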
Figure 1. Fault detection framework of industrial robots.
Figure 2. Adjusting HSV to segment blue industrial robot accurately.
3.2. Image Hashing
The image hash algorithm calculates a hash value of the industrial robot’s action in the proposed method. As a metric for image similarity in the engineering field, image hashing is widely used in image searching/retrieval. The hash-sequence library stores the image hash values of the robot in chronological order. On the one hand, the subsequent calculation can be effectively reduced as we convert 2-dimensional images to 1-dimensional hash sequences, reducing the complexity for both computation and searching. On the other hand, the target data for processing is effectively compressed as the hash values become fingerprints for the original images. The robot’s postures are encoded in the hash values, and the Hamming distances between them reflect the degrees of difference. For applications that require low latency at the cost of low image resolution, we adopt the average hashing algorithm, which first down-samples the input images to 8 × 8 pixel blocks and then binarizes each pixel by checking whether the intensity is larger than the average intensity of the block (i.e., 1 for true and 0 for false). Finally, the algorithm applies a hash function on the flattened 64-bit array. The advantages of the image hashing module are therefore twofold: it compresses the input data and provides an identifier for robot postures. Considering the fault tolerance of image hashing, in this method, we assume that the robot is in normal motion as long as confidence intervals are approximately matched.
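The average hashing step can be sketched as follows. This is a minimal illustration, assuming a grayscale image given as a list of rows that is at least 8 × 8 pixels, with block averaging as the down-sampling method. As a simplification, the sketch packs the 64 bits directly into an integer fingerprint rather than applying a further hash function, so that Hamming distances between fingerprints stay meaningful. The function names are hypothetical.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[float]]


def downsample(gray: Gray, size: int = 8) -> List[List[float]]:
    """Shrink a grayscale image to size x size by averaging pixel blocks.

    Assumes the image has at least `size` rows and `size` columns.
    """
    rows, cols = len(gray), len(gray[0])
    small = []
    for i in range(size):
        r0, r1 = i * rows // size, (i + 1) * rows // size
        row = []
        for j in range(size):
            c0, c1 = j * cols // size, (j + 1) * cols // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray: Gray, size: int = 8) -> int:
    """Set a bit to 1 where the down-sampled pixel is brighter than the mean."""
    pixels = [p for row in downsample(gray, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Toy 16 x 16 images (assumed data): a dark/bright split at different columns.
half = [[0] * 8 + [255] * 8 for _ in range(16)]
shifted = [[0] * 10 + [255] * 6 for _ in range(16)]
flipped = [[255] * 8 + [0] * 8 for _ in range(16)]

h_half, h_shifted, h_flipped = map(average_hash, (half, shifted, flipped))
print(hex(h_half), hamming(h_half, h_shifted), hamming(h_half, h_flipped))
```

A small shift of the bright region changes only a few bits, whereas the mirrored image differs in all 64 bits, which is the property the Hamming distance relies on when comparing postures.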
3.3. Operation Mode Extraction of Industrial Robots
In the hash sequence library, the hash values are collected and stored in chronological order. The algorithm’s input is a hash sequence $S = \{h_1, h_2, \ldots, h_n\}$, where $h_i$ are the hash values and $n$ is the total number of hash values in the sequence. Over a sufficiently long period, the hash sequence library contains a series of complete robot motion periods. The algorithm’s output is the motion mode of the industrial robot, denoted by a hash sequence $T = \{H_1, H_2, \ldots, H_m\}$, where $H$ represents the hash value and $m$ is the number of hash values in the sequence, satisfying $m \le n$. The algorithm divides the original hash sequence at a random split index, obtaining two sub-sequences $S_1$ and $S_2$, and iteratively computes the common prefix subsequence set. Eventually, the best common prefix subsequence among all divisions is taken as the output for the input hash sequence $S$.
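One way to read this extraction step is sketched below. It is an interpretation, not the authors’ algorithm: instead of random split indices it scans the split indices in increasing order, and it takes "best" to mean the shortest prefix that the remainder of the sequence keeps repeating. Matching is exact here, because with a tolerance a slowly changing sequence could be mistaken for a period of one frame. The names `common_prefix_len` and `extract_motion_mode` are hypothetical.

```python
from typing import List, Sequence


def common_prefix_len(left: Sequence[int], right: Sequence[int]) -> int:
    """Length of the common prefix of two hash sub-sequences."""
    n = 0
    for x, y in zip(left, right):
        if x != y:
            break
        n += 1
    return n


def extract_motion_mode(seq: Sequence[int]) -> List[int]:
    """Return the shortest prefix that the rest of the sequence repeats.

    For each split index k, the sequence is divided into seq[:k] and seq[k:].
    A split is accepted when the whole first part is a prefix of the second
    part and every later hash equals the hash one period earlier.
    If no split qualifies, the full sequence is returned unchanged.
    """
    for k in range(1, len(seq) // 2 + 1):
        if common_prefix_len(seq[:k], seq[k:]) != k:
            continue
        if all(seq[i] == seq[i - k] for i in range(k, len(seq))):
            return list(seq[:k])
    return list(seq)


# Toy hash sequence (assumed data): three full cycles plus a partial one.
sequence = [1, 2, 3, 4] * 3 + [1, 2]
print(extract_motion_mode(sequence))
```

The sequence must cover at least two full periods for a split to be accepted, which is consistent with the requirement that the library span a series of complete motion periods.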
3.4. Abnormal Detection of Industrial Robots
For the final sequential approximate matching stage, template-matching algorithms such as brute-force (BF) and Knuth-Morris-Pratt (KMP) are often adopted. The BF algorithm matches the target string against the template string character by character, and the KMP algorithm speeds this up by avoiding backtracking in the target string. Similar to the letters in string matching, each hash value in our proposed method is treated as a unit character. Unlike the reverse-order search and fuzzy matching used in some database systems, our matching module matches hash values sequentially according to their timestamps. To enhance the system’s robustness against noise interference, we treat two hash values as approximately equal if their Hamming distance is less than a small threshold (e.g., 5).
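The sequential approximate matching can be sketched as a brute-force comparison against a cyclic template. This is an illustrative sketch under stated assumptions: the starting phase is found by matching the first frame against the template, the template is treated as repeating indefinitely, and the threshold of 5 is the example value from the text. The names `approx_equal` and `first_anomaly` are hypothetical.

```python
from typing import Optional, Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def approx_equal(a: int, b: int, threshold: int = 5) -> bool:
    """Two hashes match if their Hamming distance is below the threshold."""
    return hamming(a, b) < threshold


def first_anomaly(
    frames: Sequence[int], template: Sequence[int], threshold: int = 5
) -> Optional[int]:
    """Return the index of the first frame that breaks the cyclic template.

    Returns None when every frame matches. The starting phase is taken from
    the first template entry that approximately equals the first frame.
    """
    if not frames:
        return None
    phase = next(
        (p for p, h in enumerate(template) if approx_equal(frames[0], h, threshold)),
        None,
    )
    if phase is None:
        return 0  # the very first frame matches no known posture
    for i, h in enumerate(frames):
        expected = template[(phase + i) % len(template)]
        if not approx_equal(h, expected, threshold):
            return i
    return None


# Toy template and observations (assumed data); 0x01 and 0xFE carry one-bit noise.
template = [0x00, 0xFF, 0xFF00, 0xFFFF]
normal = [0x01, 0xFF, 0xFF00, 0xFFFF, 0x00, 0xFE]
faulty = normal + [0x00]  # the template expects 0xFF00 at this position

print(first_anomaly(normal, template), first_anomaly(faulty, template))
```

Because frames are compared in timestamp order against the expected template position, a posture that is valid elsewhere in the cycle but occurs at the wrong time is still reported as abnormal.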
4.1. Experiment Setup
The industrial robots’ operation videos are simulated by the Industrial Robot Center of the Wuhan Institute of Technology (WIT). When simulating the operation, we add a scene containing the robot’s abnormal state after ten or more consecutive cycles of normal periodic movements. Once a method detects the abnormal robot action, it should trigger the alarm immediately. The experiments are conducted on a desktop with an Intel Core i7-8700KF CPU, 32 GB of memory, and an Nvidia 1080 Ti GPU. The software environment and libraries include Python 3.7 with PyCharm, OpenCV-Python, and NumPy.
4.2. Experiment Design
One assumption of the experiment design is that industrial robots operate in pre-defined periodic motions under normal conditions. Thus, the target task is to distinguish between the normal and abnormal motion states of industrial robots. The patterns of normal motions can be extracted from the periodically fluctuating signals in the historical motion information. By contrast, abnormal motions do not follow such a motion pattern. To segment the robot’s actual motion region accurately, we specify the color range under the HSV color model. To facilitate pose detection, we apply the image hashing method to encode an industrial robot’s motion state. The hashing method converts continuous video frames to consecutive hash sequences. We then extract the suffix array from ten or more periodic hash sequences, which serves as the hash-sequence template characterizing industrial robots’ normal motion. At test time, the method calculates a hash value for the current frame and compares it with the normal-motion template. If the current frame’s hash value matches the template hash sequence within the confidence interval, the robot is considered normal; otherwise, an alarm is triggered.
4.3. Analysis of Experiment Results
The video data set is provided by the robot center of the Wuhan Institute of Technology. It contains the simulation video of an industrial robot pressing a screw on a working table. We repeat the video ten times to further simulate the normal periodic movements of surface welding. Following the repeated normal movements, we append an abnormal movement of breaking away from the squeezing screw, which is obtained by reversing the original video clip. In the image segmentation, the HSV color space is used to extract the blue industrial robot with a hue (H) range of 100 - 124, a saturation (S) range of 103 - 255, and a value (V) range of 46 - 255. The average image hash (ahash) statistics of the segmented images are as follows. The original video frame, the hash distribution of the robot region after image segmentation, and the hash value of the robot posture are shown in Figure 3.
According to the experimental analysis, the average processing time for hash values with the same interval is 0.073 s per frame, and that for hash values with different intervals is 0.164 s per frame. The accuracy and the reproductivity rate of the system are 99.02% and 98.58%, respectively. The performance of our proposed method meets the requirements for deployment in practice. As suggested by the simulation, industrial robots’ normal operations are periodic, and abnormal movements appear as outliers relative to the normal-movement distribution. According to the results shown in Figure 3, color filtering can segment target robots from the surrounding environment, reducing the impact of noise. The bottom-right corner of Figure 3 illustrates the results from average hashing, where the horizontal axis denotes the frame index and the vertical axis denotes the hash value. From frame 49 to frame 310, the robot is in normal movement. From frame 310 to frame 330, the hash values are outliers compared with the normal ones. Therefore, the method detects the robot’s abnormal movement and triggers the emergency stop program.
Figure 3. Simulation of normal and abnormal states of the industrial robot, as well as the distribution of hash values after image segmentation.
In conclusion, the simulation experiment demonstrates that the proposed method can accurately detect industrial robots’ abnormal actions from their normal periodic movements. A motion template is built by extracting the motion pattern from normal actions. By comparing the current frame’s hash value with that of the motion template, one can tell the robot’s current motion condition. In terms of image segmentation, the HSV color space is suitable for filtering and capturing the blue region containing target robots. However, the current image segmentation module is not yet perfect because false positives and negatives still exist in the outputs: the blue marker for measuring 3D coordinates is also included, whereas the welding guns and their bases, together with the shaded areas, are sometimes missing. Our image segmentation module currently adopts the traditional image-analysis methodology and could become less robust under complex working environments that contain more interference. In future work, we will incorporate deep-learning approaches to push the envelope on segmentation accuracy further.
 Dong, K. (2019) 2020-2026 Investigation and Analysis of the Current Situation of China’s Industrial Robot Industry and Forecast of Its Development Trend. Mechanical and Electronic Industry, No. 8, 229-333.
 Krizhevsky, A., Sutskever, I. and Hinton, G. (2012) ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 1, 1097-1105.
 He, K.M., Gkioxari, G., Dollár, P. and Girshick, R. (2017) Mask R-CNN. IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 386-397.
 Indyk, P. and Motwani, R. (1998) Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, Dallas, May 1998, 604-613. https://doi.org/10.1145/276698.276876
 Shen, F.M., Shen, C.H., Liu, W. and Shen, H.T. (2015) Supervised Discrete Hashing. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 7-12 June 2015, 37-45. https://ieeexplore.ieee.org/document/7298598