In recent years, the frequent disappearance of children has become a widespread concern in China. So far, there are no public statistics on the number of missing children in China. As a reference, according to the statistics of the public security administration of the Shanghai Public Security Bureau, public security departments at all levels registered 9627 missing persons in 2001, about 2.13 times the 4526 disappearances recorded in 1995. The first and second phases of the Ministry of Public Security's emergency release platform for information on missing children went into operation in May and November 2016. With the assistance of new media and mobile terminal applications, the public security organs can immediately push information about missing children to people nearby, so that more people can obtain accurate information and provide timely clues to help the public security organs solve trafficking cases as soon as possible and find the missing children. However, a series of problems remain unsolved, such as the low success rate of searches, missing the best time to find a missing child, and enthusiastic members of the public being unable to take part in the rescue. Several systems for finding missing children are now available, such as China's Child Safety Emergency Response (CCSER), Rui Jie Xun Zi and Bao Bei Hui Jia. Among them, the CCSER can form an early-warning protection ring with different radii based on the location of a child's disappearance within 3 hours. Rui Jie Xun Zi provides clues for parents through biometric technology, comparing uploaded photos with repositories of missing children. Bao Bei Hui Jia is the public service website of a volunteer association, which aims to provide a platform for communication between parents of missing children and volunteers.
In the United States, AMBER Alert is used in all 50 states and covers 18 countries worldwide. It uses the US Emergency Alert System to disseminate alerts widely through commercial radio stations, satellite radio stations, television stations, electronic mail, electronic street signs, and text messages. At the same time, the United States has set up the National Center for Missing & Exploited Children, which works with law enforcement agencies such as the Federal Bureau of Investigation and operates a round-the-clock hotline in a variety of languages to rescue abducted children in the shortest possible time. However, most existing search systems for missing children follow the "network search" model and cannot fully mobilize social resources, at home or abroad, to take part in the search. Because public participation is low and information spreads slowly in the critical period right after parents discover that a child is missing, the Missing Children Mobile GIS Mutual Assistance System of China (MCMAS) was developed in response.
With the popularity of smart mobile devices (mobile phones, iPads, etc.) and the rapid development of mobile Internet technology, more and more information on missing children is being uploaded to the MCMAS database. This means that there is a huge amount of missing-children data with location information. We call this explosively growing body of new data the spatial big data of missing children. These data are characterized by large volume, heterogeneity and uncertainty. Importantly, the data are closely related to the circumstances in which children disappear. Extracting interesting and useful patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data because of the complexity of spatial data types, spatial relationships and spatial autocorrelation. Spatial data are growing rapidly, and a number of needs arise for spatial data mining techniques: modeling semantically rich spatial properties such as topology, statistical interpretation models for spatial patterns, improving computational efficiency, preprocessing spatial data and many others. Many techniques, such as classification, decision trees, fuzzy logic and neural networks, have been applied to mining spatial data.
Traditional data analysis is generally based on statistical analysis. Common statistical analysis methods include analysis of variance and regression analysis. Analysis of variance is a statistical method for testing whether the means of several populations are all equal. It can be seen as an extension of the t test to more than two groups. It studies the influence of categorical variables on a dependent variable, such as whether they are related or not. The method determines whether the independent variable has a significant effect on the dependent variable by examining whether the means of the data groups are equal.
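As a minimal sketch of the test described above, the one-way analysis of variance compares the variation between group means with the variation inside the groups. The function name `one_way_anova_f` and the sample values are illustrative assumptions and do not come from the paper's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical sample values, used only to exercise the function.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_statistic = one_way_anova_f(samples)
```

A large F value indicates that the group means differ by more than the within-group scatter would explain; the significance level is then read from the F distribution with (k - 1, n - k) degrees of freedom.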
In statistical analysis, one variable often changes with another, yet the relationship frequently cannot be described by an exact mathematical formula; it can only be found by statistically examining a large number of observations to uncover the relationships and rules between them. Regression analysis is the usual method for solving this problem.
By reducing the influence of outliers, the observation data are analyzed, calculated and summarized from a quantitative perspective. Most of the recent work on spatial data has used various clustering techniques because of the nature of the data. Clustering is the partitioning of a given set of data into several disjoint subsets in a multidimensional space. The process does not depend on prior information about the class attributes of the data, and considers only the characteristic attributes of the data themselves as well as the correlation (distance or similarity) between the data. The main task of a clustering algorithm is to group the data set according to the principle of "intra-class similarity" and "inter-class dissimilarity". The concise and intuitive approach is to require that the data of each class be sufficiently close to a representative point or core point of that class, and at the same time far from the representative points or core points of other classes; these points are generally referred to as clustering centers. Typical algorithms include average link, the K-means algorithm, and the K-medoids algorithm. Both average link and K-means represent a class by its mean value; of the two, the K-means algorithm has lower computational complexity. Since the mean is easily affected by outliers or noise, the K-medoids algorithm instead selects, from each class, the data point with the smallest total distance to the other members as the clustering center. This improves robustness to noise points and outliers, but also increases computational complexity and is not suitable for large-scale data. Kaufman and Ng introduced random sampling techniques and proposed, in turn, the Clara and Clarans algorithms, which overcome the complexity of the K-medoids algorithm and make it suitable for large-scale data.
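The difference between a mean-based center and a medoid can be illustrated with a small sketch. The helper names `centroid` and `medoid` and the sample coordinates are assumptions made for illustration only.

```python
from math import dist


def centroid(points):
    """Mean-based cluster center, as used by average link and K-means."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """Member with the smallest total distance to all other members."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


# Four nearby points plus one outlier (hypothetical coordinates).
cluster = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
mean_center = centroid(cluster)    # pulled far towards the outlier
medoid_center = medoid(cluster)    # stays inside the dense group
```

The medoid requires all pairwise distances within the cluster, which is the source of the higher computational cost mentioned above.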
As a result, most analysis results are monotonous, abstract, non-visual data reports. Their limitation is that they show statistical results only within a certain range and lack dynamic analysis of regional spatial information. In particular, the results are coarse-grained and inefficient to produce when large amounts of data are involved. Data analysts can only rely on personal experience and memory to draw inferences, which is not conducive to big data exploration and knowledge discovery for missing children. Visual analysis is an analysis method for large-scale, dynamic, fuzzy and uncertain data. Its advantage is that abstract data can be transformed into vivid visual images, which fully mobilizes people's abilities in perception, cognition, insight, judgment, summary, decision-making and so on. In this paper, we use spatial data mining, visualization and GIS technology to explore the geographical features of missing children and their influencing factors by analyzing the spatial and temporal information in the form of thematic maps.
This paper is structured as follows: Section 2 describes the background of the Missing Children Mobile GIS Mutual Assistance System of China and its main functions. Section 3 gives the main idea behind the construction of the thermodynamic diagram, and Section 4 shows the distribution of missing children in China and in individual provinces, then uses analysis of variance and regression analysis to explore the relationships between population density, economic status and missing children. Finally, the conclusions of our study and the outlook for future work and the MCMAS are summarized in the closing section.
2. The Missing Children Mobile GIS Mutual Assistance System of China
2.1. System Introduction
The Missing Children Mobile GIS Mutual Assistance System of China (MCMAS) is an intelligent and efficient visualization platform for integrating and sharing information on missing children. It was launched in 2015 by a team of teachers and students from Beijing University of Civil Engineering and Architecture. Based on an analysis of the practical problems that parents face when looking for missing children, we extended the single-mode search to a public, spatial-location-based multi-mode search. In contrast to previous approaches, we have increased public participation in helping families search for their lost children. To achieve automatic spatial tracking and an efficient message feedback mechanism, the MCMAS mainly uses face recognition, accurate message pushing and time-sequenced map labeling over the mobile Internet, which greatly reduces the time needed to find missing children and improves the rate of successful rescue.
2.2. System Function
The MCMAS provides an intelligent and efficient platform for sharing and publishing information for the relatives of missing children and for society. Integrating spatial data as measures enables the aggregation of geometric properties, and therefore provides a better global understanding of georeferenced phenomena. Its database shows the aggregation of missing children at different locations in different time periods, which reflects the spatial and temporal characteristics of missing children. Where in a city are disappearances most frequent? What rules govern their spatial distribution? Such questions are a matter of social concern that researchers have long paid close attention to, and they are not easy to answer. The MCMAS, however, is able to provide reliable data support for the study of such problems. In this paper, we use the GIS visual analysis method as an analytical tool to mine and analyze the big data of missing children.
3. Methods and Experiment
3.1. Experiment Data
The provinces and cities of China form the study area, with a particular focus on the high-density gathering areas of missing children. The basic data used, listed in Table 1, include: 1) Missing-children data obtained from the database for the period from May 25, 2016 to May 25, 2017, amounting to more than a thousand records. The data are fine-grained down to the individual and cover the whole country, which meets the data requirements of large-scale modeling; the spatial attributes are accurate coordinates, which directly support the analysis of the spatial distribution of missing children. 2) Data from the 6th national population census of China (2011). 3) Gross Domestic Product (GDP) data for 2016 from the National Bureau of Statistics of China. The missing-children data are preprocessed on the ArcGIS platform: the number of missing children is converted into color values, divided into different density levels, and combined with the street blocks identified in each city.
Table 1. Data types and sources.
aThe Missing Children Mobile GIS Mutual Assistance System of China; bGross Domestic Product.
3.2. Experiment Methods
Because the missing-children data are strongly dispersed in space, they are first processed with a clustering algorithm; the intensity and color are then calculated, and finally the spatial thermodynamic diagram of the data is generated to present the results of the spatial analysis.
3.2.1. Data Clustering
Since the missing-children data contain geographical coordinates, this coordinate information can be used for geographical clustering. The clusters in the missing-children data are of similar size and not too numerous, so a general-purpose clustering algorithm in plane geometry is sufficient. These characteristics match the conditions of the K-means algorithm. In order to improve the efficiency and precision of clustering, the K-means algorithm is selected; it optimizes the squared error criterion through an iterative process and finally partitions the data set. The process is shown in Figure 1, where yellow represents input or intermediate data, blue represents a processing step, red represents a decision, and green represents output results. The specific process is:
1) K entities are selected at random, and each entity is taken as the centroid of a cluster.
2) According to the nearest-distance principle, the remaining entities are assigned to the nearest centroid, and the centroid of each cluster is recalculated.
3) Step 2) is repeated until the squared error criterion converges, which completes the clustering process. The squared error criterion is defined in Equation (1):
$$E = \sum_{i=1}^{K} \sum_{p \in C_i} \lVert p - m_i \rVert^{2} \qquad (1)$$
where E denotes the squared error criterion, K the number of clusters, p a spatial entity belonging to cluster C_i, and m_i the centroid of C_i.
Figure 1. Procedure of cluster of missing children data.
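The three steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the implementation used in the MCMAS: the function name `kmeans`, the `max_iter` and `seed` parameters, the use of Euclidean distance on plane coordinates, and the sample points are all hypothetical. Convergence is detected when the centroids stop changing, at which point the squared error criterion E no longer changes either.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster 2-D points; return (centroids, clusters, squared error E)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step 1: random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                         # step 2: assign to nearest centroid
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]         # keep the old centroid if a cluster is empty
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:           # step 3: stop once nothing changes
            break
        centroids = new_centroids
    # Squared error criterion: sum of squared distances to each cluster centroid.
    error = sum(
        dist(p, centroids[i]) ** 2
        for i, members in enumerate(clusters)
        for p in members
    )
    return centroids, clusters, error


# Hypothetical plane coordinates forming two well-separated groups.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, groups, squared_error = kmeans(sample_points, k=2)
```

Because the initial centroids are random, different seeds can give different partitions on less clearly separated data; running the procedure several times and keeping the result with the smallest E is a common safeguard.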
3.2.2. Calculate the Intensity of the Particle
Intensity refers to the depth of a hot spot; the depth of the hot-spot color directly shows how the data vary. The intensity is given by Equation (2):
where I represents the transparency of the particle, Z represents the eigenvalue in the data set that is to be rendered as temperature, and Zmin and Zmax represent the minimum and maximum values of the eigenvalue, respectively.
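Equation (2) itself is not reproduced in this text. One form that is consistent with the symbol definitions is min-max normalization, I = (Z - Zmin) / (Zmax - Zmin), which maps every eigenvalue into the range from 0 to 1. The sketch below assumes that form; the function name `intensity` and the handling of the degenerate case are illustrative choices, not details taken from the paper.

```python
def intensity(z, z_min, z_max):
    """Min-max normalization of an eigenvalue z into [0, 1] (assumed form of Eq. 2)."""
    if z_max == z_min:
        # All values are identical; return 0.0 to avoid dividing by zero.
        return 0.0
    return (z - z_min) / (z_max - z_min)


# Hypothetical eigenvalues, for illustration only.
values = [2, 5, 8, 12]
alphas = [intensity(v, min(values), max(values)) for v in values]
```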
3.2.3. Generate Thermodynamic Diagram
The coordinates of each missing child are taken as a center point, and a circle with a transparency gradient is drawn around it according to the gradient value. The center point is the darkest, and the color fades gradually from the center to the edge of the circle; this is repeated until the gradients of all points have been drawn. Each pixel of the resulting transparency image is then mapped through the color matrix to generate the thermodynamic diagram.
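The drawing procedure can be sketched on a small pixel grid. This is a simplified illustration, not the ArcGIS-based rendering used in the study: the function names `render_alpha` and `colorize`, the linear fall-off, the integer pixel coordinates, and the two-color palette standing in for the color matrix are all assumptions.

```python
from math import hypot


def render_alpha(points, width, height, radius):
    """Accumulate one radial gradient per point (1 at the center, 0 at the edge)."""
    alpha = [[0.0] * width for _ in range(height)]
    for cx, cy in points:
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                d = hypot(x - cx, y - cy)
                if d < radius:
                    # Overlapping circles add up; the sum is capped at 1.0.
                    alpha[y][x] = min(1.0, alpha[y][x] + (1.0 - d / radius))
    return alpha


def colorize(a):
    """Map a transparency value in [0, 1] to RGB, green (low) to red (high)."""
    return (round(255 * a), round(255 * (1 - a)), 0)


# One hypothetical point in the middle of an 11 x 11 grid.
alpha_image = render_alpha([(5, 5)], width=11, height=11, radius=4)
heat_map = [[colorize(a) for a in row] for row in alpha_image]
```

Where several points lie close together their gradients overlap, so the accumulated transparency is higher there, and the mapped color moves towards the hot end of the palette.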
3.2.4. Analysis of Thermodynamic Diagram
The spatial thermodynamic diagram is the main visualization method used to present the results of the missing-children analysis. Differences in color and brightness reflect spatial differences in the data: the closer the color is to red, the higher the data density, and the closer to green, the lower the density, with a continuous change from red to green. The thermodynamic diagram is divided into 10 levels of missing-children density; the highest regional value is 10 and the lowest is 1. After preprocessing, the gathering areas of missing children at different density levels were obtained for the whole country.
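The paper does not state how the density values are divided into the 10 levels. As one possible reading, the sketch below assumes equal-interval classification between the minimum and maximum density; the function name `density_level` and the sample values are hypothetical.

```python
def density_level(value, v_min, v_max, levels=10):
    """Assign a density value to one of `levels` equal-width classes (1 = lowest)."""
    if v_max == v_min:
        return 1
    fraction = (value - v_min) / (v_max - v_min)
    # The maximum value would fall into class levels + 1, so clamp it.
    return min(levels, int(fraction * levels) + 1)


# Hypothetical density values, for illustration only.
densities = [0, 12, 55, 73, 100]
classes = [density_level(d, 0, 100) for d in densities]
# Areas with a level greater than 5, i.e. the high-density gathering areas.
high_density = [d for d, c in zip(densities, classes) if c > 5]
```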
It was found that the areas of high missing-children density (especially those with a density value above five) show a concentric-circle distribution pattern in Figure 2; that is, the density of missing children decreases gradually from the center to the periphery. Observation of the overall distribution of missing children shows that it is similar to the distributions of GDP and population in China. Considering the spatial and temporal characteristics of the missing-children data, this paper takes the high-density gathering areas with a density level greater than 5 as the key analysis areas for exploring the spatial characteristics of missing children. After screening out the high-density areas, we further analyze, on the ArcGIS platform, the correlation with population density and economic status in the corresponding provinces, and explore the impact of population density and economic status on missing children.
Figure 2. Thermodynamic graph of missing children of China (2016.5-2017.5).
4. Experiment Results and Discussions
4.1. Analysis by Region
Rather than comparing with other regions of the world, this study focuses on China and takes the domestic provinces as the units of reference. The 2016 data show that the high-density areas of missing children are mostly concentrated in the east and spread radially towards the central region (Figure 2). Among the provinces and municipalities of China, there are twelve main high-density gathering areas (see Figure 3). The four areas with the highest proportion of missing children are Beijing, Guangdong, Henan and Shanghai, with 19.7%, 11%, 11%, and 10% of the cases, respectively (Figures 4-7). According to the results, missing children follow a clear regional pattern: the large cities in the east show a high concentration, while the western provinces show a low density or none at all.
In Figure 4, we can see that the missing children are mainly distributed in six districts of Beijing: Chaoyang has the largest number, followed by Fengtai and Haidian. There are relatively few cases in the suburban districts and almost none in the outer suburban counties. Chaoyang is one of the largest districts in the near suburbs of Beijing, with a permanent resident population of 3.545 million, so its population base is large. Moreover, Chaoyang is Beijing's main area for foreign affairs, and international exchanges there are frequent. These factors are likely to have an impact on child abduction. Fengtai is one of the urban districts of Beijing and is home to Beijing South Railway Station, one of the most modern railway passenger stations in China. The station handles heavy passenger traffic, and its floor area and train-handling capacity are among the largest in China. The number of passengers arriving in and leaving Beijing every day is large, and the crowds are so big and so mixed that cases of lost children occur easily. Haidian has many universities and schools and is the capital's famous scientific, cultural and educational district. The number of teenagers there is therefore larger, and disappearances of children occur more easily.
Figure 3. Regional distribution of missing children of China (2016.5-2017.5).
Figure 4. Beijing thermodynamic graph of missing children (2016.5-2017.5).
In Figure 5, we can see that the missing children in Guangdong are mainly distributed in Guangzhou, Dongguan and Shenzhen. The number of missing children in the rest of the province is small or zero. Guangzhou is the capital of Guangdong province, a megacity and a national central city of China. Its economy is developed and its population is large. Guangzhou's many employment opportunities and higher wages attract so many migrant workers that population flows are large. These factors are more likely to threaten children's safety. Dongguan is an important transportation hub and foreign trade port in Guangdong. It receives tourists from all over the world every year and provides good employment opportunities for people from outside the city. Hence the population is dense and mixed, and child abduction is serious. Shenzhen is a national economic center, an international city, and the first special economic zone established under China's reform and opening up. The city has the largest port of entry and exit in China. Shenzhen's long history, rich culture, tourism resources and well-developed economy attract large numbers of visitors from all over the country and even the world, which is a potential factor in cases of lost children.
Figure 5. Guangdong thermodynamic graph of missing children (2016.5-2017.5).
In Figure 6, we can see that in Henan the situation of missing children is more serious in Zhengzhou and Nanyang, while the situation in the rest of the province is more favorable. Zhengzhou is the capital of Henan province and an important city in central China. Its population is 9.569 million, so both the population base and the population density are large. It has relatively good transportation and education resources. However, these factors also create conditions for the disappearance of children. Nanyang is a secondary central city and the political, economic, cultural, scientific and educational, transportation, financial and trade center of southwestern Henan. Its urban area is larger, its population is 10.057 million, and population mobility is high, so cases of lost children occur easily.
Figure 6. Henan thermodynamic graph of missing children (2016.5-2017.5).
In Figure 7, we can see that within Shanghai the situation of missing children is worst in Pudong New Area. Its economy is developed, transportation is convenient, both the migrant and the resident populations are large, and the population density is high, so cases of missing children are more likely to happen. Pudong New Area is Shanghai's second largest administrative region. One of Shanghai's symbolic cultural landmarks, the Oriental Pearl Radio and Television Tower, is located there and attracts visitors from all over the world throughout the year. Because of the large flows of people and the dense population, cases of lost children occur easily.
Figure 7. Shanghai thermodynamic graph of missing children (2016.5-2017.5).
4.2. Analysis by Economic Status
To explore the correlation between economic status and the proportion of missing children, a correlation analysis was carried out; Figure 8 shows that the proportion of missing children is positively correlated with economic status.
Figure 8. The correlation between the proportion of missing children and GDP in China (2016.5-2017.5).
The proportion of missing children in Beijing is 19.7%, the highest in China, so Beijing is a high-incidence area for missing children. Beijing's per capita GDP ranks among the top three in China, and the disappearance of children appears closely related to this factor. In Henan, Hebei, Shandong, Anhui, Shaanxi, Sichuan, Hubei and Jiangxi, the proportion of missing children is higher than in other areas and per capita GDP lies in the upper range nationally; here the disappearance of children is positively correlated with GDP. In Inner Mongolia, Xinjiang, Qinghai and Tibet, per capita GDP is relatively low and the incidence of missing children is also relatively low, so the situation is more favorable. In Jilin, Heilongjiang, Zhejiang, Fujian and other economically developed areas, the number of missing children is relatively small and shows no significant correlation with per capita GDP. The situation in Tianjin is unusual: its per capita GDP is among the highest in the country, but only 12 children were reported missing, a proportion of 1.7%, which is relatively low compared with other regions and shows no close connection with per capita GDP. This may be related to the area of the municipalities directly under the central government and to the promotion of the MCMAS.
Overall, the more economically developed provinces have a higher proportion of missing children, which indicates that economic status has a direct influence on the proportion of missing children.
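The kind of correlation analysis discussed in this subsection can be sketched with the Pearson correlation coefficient and a least-squares line. The function names `pearson_r` and `fit_line` are illustrative, and the numbers are hypothetical placeholders rather than the provincial values behind Figure 8.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept of y on x (simple linear regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical placeholder values, NOT taken from the paper's data set.
gdp_per_capita = [3.0, 5.0, 7.0, 9.0, 11.0]     # arbitrary units
missing_share = [1.0, 2.5, 3.0, 5.5, 6.0]       # percent of all cases
r = pearson_r(gdp_per_capita, missing_share)
slope, intercept = fit_line(gdp_per_capita, missing_share)
```

A coefficient close to +1 corresponds to the positive correlation described above; provinces such as Tianjin, which depart from the trend, would appear as points far from the fitted line.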
4.3. Analysis by Population Density
The correlation analysis between the proportion of missing children and the population density in high density areas is illustrated in Figure 9.
The situation of child abduction is most serious in Beijing, where the number of missing children is 142. Beijing has a large population base, and its population density ranks among the top three in the country. In Henan, Hebei, Shandong, Shaanxi, Sichuan, Jiangxi and Guangdong, the population density is larger than in other provinces and cities and the proportion of missing children is higher, so the two are positively correlated. Cases of missing children are relatively rare in the sparsely populated regions with large land areas: Ningxia, Inner Mongolia, Tibet, Xinjiang, Qinghai and Yunnan. In provinces such as Fujian, Hainan and Chongqing, where the population density is higher than the national average, the proportion of missing children is relatively small and shows no significant relationship with population density. The situation in Tianjin and Shanghai is anomalous: the population density of these two municipalities is among the top three in the country, but the number of missing children is lower than in other provinces and cities. The reason may be the same as in the preceding analysis of the number of missing children and GDP.
Figure 9. The correlation between the proportion of missing children and the population density in China (2016.5-2017.5).
In general, there is a large proportion of missing children in the region where the population density is high, but there are also a few cities with a relatively low population density and a larger proportion of missing children, such as Shanghai.
5. Conclusions
Spatiotemporal big data analysis has a growing influence on people's lives. Based on the spatial data mining of the MCMAS, the thermodynamic map shows the spatial and temporal distribution of missing children from a broad view. At the same time, the relationship between population density, economic status and the proportion of missing children was investigated by correlation analysis. We have made a preliminary exploration of the patterns and rules in the occurrence of missing-person cases. With the development of spatiotemporal big data, more methods are needed for mining and analyzing spatial data in order to cope with larger and higher-dimensional data in the future. Since the MCMAS is currently used only in China, only data on missing children in China can be obtained. Future research will include not only an in-depth analysis of multi-dimensional factors on the current basis, but also data collected from other countries.
This research is supported by Enhancement of research capability and teaching integration of Mobile GIS based on MCMAS (No. KYJJ2017034), Outstanding Youth Researcher Program of Beijing University of Civil Engineering and Architecture (No. 21082716012), National Natural Science Foundation of China (No. 41301489), Beijing Natural Science Foundation (No. 4142013), Outstanding Youth Teacher Program of Beijing Municipal Education Commission (No. YETP1647), Foundation of Key Laboratory for Urban Geomatics of National Administration of Surveying, Mapping and Geoinformation (No. 20141206NY), Student Leading Team Program of Beijing University of Civil Engineering and Architecture (No. 00082816001), and the BUCEA Post Graduate Innovation Project (No. PG2017018).
 Liu, J. (2016) The Missing Children Mutual-Assistance System of China.
 Parimala, M., Lopez, D. and Senthilkumar, N.C. (2011) A Survey on Density Based Clustering Algorithms for Mining Large Spatial Databases. International Journal of Advanced Science and Technology, No. 7, 59-66.
 Shiota, S., Okamoto, Y., Okada, G., Takagaki, K., Takamura, M., Mori, A., et al. (2017) The Neural Correlates of the Metacognitive Function of Other Perspective: A Multiple Regression Analysis Study. Neuroreport, 28, 671-676.
 Ng, R.T. and Han, J. (2002) Clarans: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge & Data Engineering, 14, 1003-1016.
 Liu, J., Yao, Y., Gong, X., Cheng, H., Feng, Y., Fu, L. and Du, M. (2016) The Design and Cloud Achievement of the Missing Children Mobile GIS Mutual Assistance System of China. In: International Conference on Cartographic Visualization of Big Data for Early Warning & Disaster/Crisis Management (EW&CM): Methodology, Techniques, and Applications, 26-30.
 Berrahou, L., Lalande, N., Serrano, E., Molla, G., Bimonte, S., Bringay, S., et al. (2015) A Quality-Aware Spatial Data Warehouse for Querying Hydroecological Data. Computers & Geosciences, 85, 126-135.
 Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R. and Wu, A.Y. (2002) An Efficient k-Means Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis & Machine Intelligence, 24, 881-892.
 Shameem, M.U.S. and Ferdous, R. (2009) An Efficient k-Means Algorithm Integrated with Jaccard Distance Measure for Document Clustering. Asian Himalayas International Conference on Internet, Kathmandu, 3-5 November 2009, 1-6.