In recent years, due to the high dependence of mineral projects on the precise determination of the tonnage of mineral materials, the various methods have been developed to estimate the grade such as geometric methods based on distance and geostatistics  .
Each method has limitations and disadvantages which affect the accuracy of the estimation . One of the new methods is the grade estimation using the clustering. The cluster analysis methods are widely used in the earth sciences. The cluster grouping method is used to classify geochemical data .
The cluster analysis connects the observations to each other which together have many similarities, then, the observations consecutively connect them which are mostly similar to previous observations . In other words, in clustering, we try to divide the data into clusters that the similarity is maximized between the data within each cluster and it is minimized between the data within the different clusters . No classes already exist in the clustering method and in fact, the variables are not divided independently and dependently, but here, the search is performed to access groups of data which are similar to each other and the behaviors can be better identified by discovering these similarities and it can be operated to achieve a better result based on them . The clustering method is an indirect method; this means that it can be used even when there is no previous information from the internal structure of the database. This method can be used to discover hidden patterns and improve the performance of direct methods .
The K-means method is one of the methods for clustering the data in data mining. It is an exclusive and planar method which has been widely studied by different researchers and attempts to cluster the samples with the specified number of k classes so that the total Euclidean intervals of each sample are minimized from the center of the class which has been assigned to it .
The K-means methods are used to properly analyze the data behavior and available analyses to each other. Some of its applications include: the division of the geological terrain  , the classification of the effect of vegetation and the recovery of water health in the Mediterranean coast forests  , the presentation of geochemical patterns in mineral areas  , predicting the organic carbon in the intelligent systems  , and determining the effect of gas diffusion in urban environments .
In this article, the behavior of the gold, arsenic, and antimony elements has been evaluated using K-means method, MATLAB and SPSS software based on the data collected from the drainage sediments and then, the gold grade is estimated using the results.
2. The Study Area
The study area is the geochemical sheet 1:100.000 of stream sediments called Tarq which is provided by the Geological Survey of Iran with code 6356 and it is located in the geographic latitudes are 24˚51' to 24˚52', and the latitudes are 55˚32' to 33˚33' .
This area is located in Isfahan province as well as the area divides into two distinct eastern and western parts by the Kohroud Mountain range with the west bank and Karkas Mountain with a height of 3895 m. Northeast low altitude plain has an average height of 1000 m which is outside the studied area and the southwest plain called Robat Sultan has nearly 1650 m altitude and there will be farming more or less in it .
Kahroud and Garrison Mountains divided the drainages of an area into two distinct networks from each other. Eastern network flows to the sabulous and southwest drainages join the Zayandehrood River in the east of Isfahan. Tame rivers have a constant flow on the eastern hillside of Karkas Mountain and most of the springs are on the hillsides of this mountain (Figure 1). Some springs have Calcium bicarbonate to a great extent which they create the sediment of travertine stone in the area. The annual rainfall was reported at 51.8 mm .
General Geology of the Area
The stratigraphy of the zone consists of Precambrian to Quaternary rocks, which
Figure 1. Location of samples and stream in the area .
are described from the old to the new, respectively. The upper Precambrian and the former Cambrian units are yellow-tufted dolomites which in the shale and lime layers are observed. This unit has a thickness of about 360 m  .
The sedimentary rocks are more clastic and the lower part of them is related to Cambrian and Ordovician which are located in the sequence of red sandstone layers. Generally, this unit consists of yellow silicified dolomite rocks, red shale, bright yellow or gray Trilobite Limestone, Red and green sandy shale  .
The doleritic rocks are formed in the Silurian and Devonian unit and red ce-ment clay and hematite sandstones are located in these dolerites. In the following, the alternation of yellow sandstone layers and dark dolomite are created with interlayers of red shale. In the upper part, the layers of limestone and dolomite have been made with a thickness of 140 which contain brachiopod, trilobite, coral, and tentaculite fossils  . This unit is bright gray which turns into black brachiopod limestone in the upper part and it has an interlayer of oolite and pizzolite (Figure 2).
3. The Study Method
637 samples of drainage sediments have been collected from the area and have been analyzed using ICP-MS method. In this paper, in order to estimate the gold grade and to study the behavior of arsenic, antimony, and gold elements, only these three elements have been investigated due to being paragenesis as well as the correlation coefficient. As can be seen from Table 1, the values of the correlation coefficient of these three elements are good and indicate the relationship with these three elements with each other. Also, given that most geochemical data are either closed or compound, if the proper conversion is not performed for data expansion, the wrong results are obtained. Therefore, data must be separated before doing any processing on the data. In this study, ilr method  was used to separate the data.
The K-Means Algorithm
The K-means algorithm starts with a given value for K (number of classes) and tries to estimate the following cases.
Finding the points as centers of clusters, in fact, these points are the same average points of each cluster.
Assigning each sample data to a cluster that data has the smallest distance to the center of that cluster . In the simple form of this method, first, the points are selected randomly as much as needed clusters. Then, the data is assigned to one of these clusters according to the similarity and so, new clusters are obtained .
New centers can be calculated for them in each of iterations by repeating the same steps and averaging of data and again the data can be attributed to new clusters . The important steps of this algorithm are summarized as follows  .
Figure 2. Tarq geology sheet 1:100,000, Isfahan .
Table 1. Spearman correlation coefficients matrix for gold, arsenic, and antimony elements.
1) First, k members randomly are selected as the number of clusters among the n members (k is the number of clusters)
2) Zj vector is calculated based on Equation (1) which represents the center of each class Cj.
3) In this equation, x represents the vector of a sample which is a member of Cj and #Cj represents the number of samples which are members of the Cj class. It should be noted that relation (1) is used to calculate the center of each class during solving and usually, k samples are randomly selected at the start of the algorithm and are considered as the center of each class .
4) Calculate the target function of the classification based on Equation (5) which calculates the total distance of samples from the center of the classes.
5) Minimize the objective function of Equation (2) and find the proper classification on the M set with the number k of classes.
And software has been introduced by the author to speed up the operation above .
4. Results and Discussion
4.1. Results and Discussion on the Results
In various studies, such as the relationship of altered diorite with magnetite mineral in the iron belt of Chile  , the relationship between copper and molybdenum of porphyry copper ore  , and the relationship between elements of the platinum group of porphyry copper ore  , the behavior of elements has been measured relative to each other in various methods. In the current study, k optimum value has been calculated using the K-means method for clustering the drainage sediment data in the Tarq area with three grade value of elements of gold, arsenic, and antimony (taking into account the coordinates of the sampling points), because the elements of arsenic and antimony are important elements in determining the geochemical halos of the gold element.
In this study, two appropriate criteria have been used to determine the appropriate value of k to determine the number of clusters. The first used benchmark is the S(i) that the number of clusters is changed from 3 to 10 based on it and then, the obtained results are analyzed to select optimal k using the above benchmark .
An appropriate benchmark has been determined according to Equation (3) for determining optimum k. The obtained classifications are measured based on the benchmark.
In the above equation, S(i) expresses the utility rate of the ith sample in its class, the parameter Aveg_Within(i) represents the average distance between the ith sample and the other samples in that class which ith sample is a member of that class and the parameter Aveg_Between(i,k) represents the average distance between the ith sample and the other samples which are members of another class such as k  .
The results are analyzed by calculating the utility rate as an average utility. The utility rate varies between −1 and +1; as this value approaches to +1, the sample is a member of more appropriate classification and as it approaches −1, it has an inappropriate classification and the zero number means that the presence of the sample is not very important in the current classification or another classification. So, the value of Equation (6) is calculated for each sample and then the obtained results are analyzed by calculating the average numbers as the average utility rate of the classification.
The second used benchmark is the quality function. According to the information, the best cluster maximizes the total similarity between the cluster center and all cluster members and minimizes the total similarity between cluster centers. First, a range is determined for the number of clusters to select the best cluster which is between 3 and 10 in this research. Then p(k) is calculated for each value k.
The k value is selected as the optimal number of clusters which maximizes p(k). In this way, the number of clusters can be selected for which the distance is maximized between cluster centers and the similarity of cluster centers with the members within each cluster. The quality of clustering results is defined with k clusters as follow  :
In these equations, O is set of cluster centers; Cn is centers of clusters; On is the set of elements which has not been selected as cluster centers; Tc is the set of all elements which is clustered; ηn is the average similarity the between the center of the cluster Cn and all the cluster elements of On, ηm is the average similarity the between the center of the cluster Cm and all the elements of the cluster Om and finally, δnm is defined as the similarity of Cn and On .
4.2. The Monitoring of Gold, Arsenic, and Antimony Elements Relative to Each Other
In order to study the behavior of the elements relative to each other, the cluster profile and the utility rate of each sample were determined in pair for classifications k = 3 and k = 10 for the elements of Gold, Arsenic, and Antimony, and the results of the utility rate of the classes have been compared together, the best class is determined according to the utility rate of the classes, and then the centers of the clusters of each class are determined according to it.
As shown in Figure 3, class 4 is selected as the best class according to the class profile diagrams and the utility rates of the best class for the two gold and arsenic elements, because as much as the utility rate is close to 1, the samples are more correctly located in the class. According to the diagram, the little negative values are also found in this classification.
The utility rate average in this classification is equal to 0.7574 which is greater
Figure 3. The profile of clusters and utility rates with 4 to 6 classes related to two elements of Gold and Arsenic.
than the average utility rate of other classes and based on Figure 4 and Figure 5 for other elements, the best classification is selected regarding the clusters profile and average utility rate. The class 9 and class 3 are selected as the best classification for Gold and Antimony with the average utility rate 0.7136 and for arsenic and antimony with the average utility rate 0.5872, respectively.
The value of k has been increased to 20 due to the changes in the utility rates and to ensure the results, but the utility rate has not exceeded the rate of the best classification of each class, and it has decreased for more than 20.
The diagram of Validation value S(i) can be shown in Figure 6 based on changing the number of clusters to select the optimal cluster number which is easier to compare. In other words, a cluster is selected as the optimal number of clusters which has the highest value of S(i).
Figure 6(a) shows the value of S(i) for the two elements of gold and arsenic and Figure 6(b) shows it for two elements of gold and antimony which has the highest value in the cluster 9. Figure 6(c) shows the value of S(i) for two elements of Arsenic and Antimony which has the highest value in the cluster 3.
Also, the proper number of clusters is determined according to the quality function and using the value of p(k). The value of p(k) has been calculated using
Figure 4. The profile of clusters and utility rates with 7 to 10 classes related to two elements of Gold and Antimony.
Figure 5. The profile of clusters and utility rates with 3 to 6 classes related to two elements of Antimony and Arsenic.
Figure 6. The validation value S(i) based on the number of clusters ((a) Gold-Arsenic, (b) Gold and Antimony, (c) Arsenic and Antimony).
Equation (6) for different k values to determine the number of clusters. As stated, the maximum value of p(k) represents the proper number of clusters. Table 2 shows the values of p(k) corresponding to the number of clusters. The highest value is 6198 in monitoring the two elements of gold and arsenic. Therefore, the most suitable number of clusters is 4 and so the number of clusters is 9 for two elements of gold and antimony. The number of clusters is 3 for two elements of arsenic and antimony. According to its sampling location, the most suitable number of clusters is 3 for the elements of gold, arsenic, and antimony. As can be seen, the proper number of clusters is obtained from the quality function which is consistent with the standard results of S(i).
For the best classification, the centers of classes are shown in Figure 7 for two elements of gold and arsenic per 4 classes and in Figure 8 for two elements of gold and antimony per 9 classes and in Figure 9 for two elements of arsenic and antimony per 3 classes.
Based on this classification shown in Figure 7, the grade of Arsenic element first decreases and then increases by increasing the grade of the gold element. According to this decreasing and increasing, the best curve is second order with positive concavity. The fitted line is and its correlation coefficient has been obtained R2 = 0.9927.
The grade of the gold and antimony (Figure 8) increases by increasing the grade of the gold element. Based on this increasing, its best fit is a line with a
Table 2. The values of p(k) for the number of different clusters.
Figure 7. The best line fitted to the centers of classes per four classes for gold and arsenic elements.
Figure 8. The best line fitted to the centers of classes per nine classes for gold and Antimony elements.
Figure 9. The best line fitted to the centers of classes per three classes for arsenic and Antimony elements.
positive slope. The line equation is and the correlation coefficient has been obtained R2 = 0.9235.
Due to the behavior of gold relative to arsenic and antimony, Arsenic decreases and antimony increases by increasing gold, for this reason, we see the unexpected behavior that two elements of arsenic and antimony do not have an absolute definite relationship. Based on the classification shown in Figure 9, the grade of Antimony element increases and then decreases by increasing the grade of Arsenic element. The fitted line equation is and the correlation coefficient of the equation fitted to the center of the classes is equal to R2 = 1.
4.3. Investigating the Gold Behavior Regarding the Grade of Arsenic and Antimony Elements
The cluster profile and the utility rate of the classification are given in Figure 10 according to the value of k = 3 to k = 10 for the three elements of gold, arsenic, and antimony (considering the length and width of the points).
As observed, based on the obtained results for different values of k (3 to 10), the best classification of the specimens has 5 classes with the characteristic of the
Figure 10. The profile of clusters and utility rates with 3 to 6 classes for gold, arsenic, and antimony (with coordinates).
gold, arsenic, and antimony grade, and the length and width of the samples.
Also, according to Figure 11, the highest value of S(i) belongs to 5 classes. The value of k has been increased to 20 due to the changes in the utility rates and to ensure the results, but the utility rate has not exceeded the rate of this class and it has decreased for more than 20 classes.
Therefore, in order to obtain center characteristics of classes, all input values must be in a standard interval to prevent the error in calculations and obtain accurate estimation value given the fact that coordinates are considered as input features besides the grade of gold, arsenic, and antimony and the interval of coordinates variation and grade values are different. For this reason, all inputs were selected in the interval [1, 0] using Equation (10).
The characteristics of cluster centers are given with five classes in Table 3.
5. The Prediction of the Gold Grade
In this section, a relationship has been determined between gold, arsenic, and antimony using the multi-variable regression in the SPSS software according to the length and width of the samples selected from all samples or cluster centers, so that the elements grade can be estimated using obtained relationship.
The gold value is introduced to the software as a dependent variable and the values of arsenic and antimony, length, and width of the points are introduced as
Figure 11. The change in the validation value S(i) based on the number of clusters (for three Gold, Arsenic, and Antimony elements).
Table 3. The normalized characteristics of cluster centers.
5.1. The Estimation with All Samples
The formula of the multiple regression line is determined (Equation (12)) according to the coefficients in Table 4.
The obtained R-value represents a parabola equation that the regression model has been able to explain the variations in terms of y (Gold). Here, the R-value is 70 and therefore, 70% of variations of gold values are due to Xs (i.e. arsenic, antimony, length and width of the sampling points). To validate the gold grade estimation based on the obtained equation (Equation (12)), a number of actual data should be compared with the values obtained from the equation to measure the accuracy of the estimator. Therefore, 30% of the samples randomly separated before the multi-variable regression based on the values of the antimony and arsenic elements and the length and width of the sampling points and replacing it in Equation (12) and then, according to these samples, the value of the gold element is estimated and is compared with its actual value. The results are shown in Figure 12 as a dispersion diagram. Figure 12 shows the correlation between the real and estimated grades with a correlation coefficient of 52% which indicates the relative accuracy of the used method.
Table 4. Regression coefficients.
Figure 12. Estimated value of gold versus actual value.
Considering the actual gold amount with the Kriging method, the area map (Figure 13) is drawn to compare with the resulting map with the grades obtained from Equation (12) (Figure 14). The point of these maps is the normalization of all of the parameters. For this reason, it can only be used visually for the accuracy of the estimation.
5.2. The Estimation with K-Means Clustering Centers
The estimations were performed exactly with the above method but without data and only with the centers of the classes; According to regression, it is suggested that the coordinates be eliminated from the prediction system due to lack of grade justification. Therefore, the coefficient is considered only for the two values of arsenic and antimony in this method and this fit is 100% definitive for centers and the Equation (13) is obtained from the results of Table 5.
Given the generality of estimation, it was tested on 100 data randomly (Figure 15).
This method is about 72% more accurate than the previous one and it can be said that the K-means method is a more efficient method.
Figure 13. Schematic representation of actual Gold grades on the map from the highest (red) to the lowest grade (green).
Figure 14. Schematic representation of predicated Gold grades on the map from the highest (red) to the lowest grade (green).
Table 5. Regression coefficients with the centers of classes.
Figure 15. Estimated value of gold versus actual value.
Considering the evidence of gold mineralization in the Khoni zone, investigating the extent of geochemical halos and the behavior of gold paragenesis elements is important in the region. For this purpose, the behavior of the elements of gold, arsenic, and antimony was compared with each other in the area using K-means method and it was examined between two elements. The equations were presented along with correlation coefficients. Then, the relationship of elements was determined using this method by taking into account the latitude and longitude of the samples in order to estimate the grade and more accurate estimation of the appearance and extent of the geochemical halos in the studied area. According to the results obtained from the process of the mentioned elements, the equation is as Equation (13) for the estimation of the gold grade based on four parameters of arsenic grade, antimony grade, length and width of sampling points and the correlation coefficient has been reported 72%. These results can be used in a variety of topics  - .
Thank you Dr. Seyyed Saeed Ghanadpour and Ms. Neda Mahoosh Mohammadi for help on the above topic as well as providing specialized help and creating this method  .
 Osterholt, V. and Dimitrakopoulos, R. (2018) Simulation of Orebody Geology with Multiple-Point Geostatistics—Application at Yandi Channel Iron Ore Deposit, WA, and Implications for Resource Uncertainty. In: Dimitrakopoulos, R., Ed., Advances in Applied Strategic Mine Planning, Springer, Cham, 335-352.
 Ozkan, E., Iphar, M. and Konuk, A. (2017) Fuzzy Logic Approach in Resource Classification. International Journal of Mining, Reclamation and Environment, 33, 183-205.
 Carranza, E.J.M. and Zuo, R. (2017) Introduction to the Thematic Issue: Analysis of Exploration Geochemical Data for Mapping of Anomalies. Geochemistry: Exploration, Environment, Analysis, 17, 183-185.
 Majdifar, S. and Kamali, G.H. (2014) Iron Grade Estimation Using ANFIS Algorithm at Tappeghermez Anomaly of Sangan Mine. Journal of Analytical and Numerical Methods in Mining Engineering, No. 5, 15-32. (In Persian)
 Abbas Zadeh, S., Rahimi Pour, G.H. and Najmodini, M. (2013) Recognition of Cu-Porphyry Mineralization Areas by Using One and Multivariate Integration Methods on Drainage Geochemical Data in Ghale Askar Area, Kerman Province. Journal of Analytical and Numerical Methods in Mining Engineering, No. 6, 1-9. (In Persian)
 Malyszko, D. and Wierzchon, S.T. (2007) Standard and Genetic K-Means Clustering Techniques in Image Segmentation. 6th International Conference on Computer Information Systems and Industrial Management Applications, Minneapolis, 28-30 June 2007, 299-304.
 Chen, T.W. and Chien, S.Y. (2009) Bandwidth Adaptive Hardware Architecture of K-Means Clustering for Video Analysis. IEEE Transactions on VLSI Systems, 18, 957-966.
 Mora, J.L., Armas-Herrera, C.M., Guerra, J.A., Rodríguez-Rodríguez, A. and Arbelo, C.D. (2012) Factors Affecting Vegetation and Soil Recovery in the Mediterranean Woodland of the Canary Islands (Spain). Journal of Arid Environments, 87, 58-66.
 Meshkani, S.A., Mehrabi, B., Yaghubpur, A. and Alghalandis, Y.F. (2011) The Application of Geochemical Pattern Recognition to Regional Prospecting: A Case Study of the Sanandaj-Sirjan Metallogenic Zone, Iran. Journal of Geochemical Exploration, 108, 183-195.
 Sfidari, E., Kadkhodaie-Ilkhchi, A. and Najjari, S. (2012) Comparison of Intelligent and Statistical Clustering Approaches to Predicting Total Organic Carbon Using Intelligent Systems. Journal of Petroleum Science and Engineering, 63, 190-205.
 Wegner, T., Hussein, T., Hämeri, K., Vesala, T., Kulmala, M. and Weber, S. (2012) Properties of Aerosol Signature Size Distributions in the Urban Environment as Derived by Cluster Analysis. Atmospheric Environment, 61, 350-360.
 Egozcue, J.J., Pawlowsky-Glahn, V. and Mateu-Figueras, G. (2003) Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology, 35, 279-300.
 Zhou, S., Zhou, K., Wang, J., Yang, G. and Wang, S. (2017) Application of Cluster Analysis to Geochemical Compositional Data for Identifying Ore-Related Geochemical Anomalies. Frontiers of Earth Science, 12, 491-505.
 Ghannad Pour, S., Hezarkhani, A. and Mahvash mohammadi, N. (2013) Investigation of Pb Behavior Circumstance than Zn in Porphyry Copper System of Parkam, Kerman Using K-Means Method. First Geological Conference of Iranian Plateau, Tehran, 2018, 23-41. (In Persian)
 Shirazi, A., Shirazy, A., Saki, S. and Hezarkhani, A. (2018) Introducing a Software for Innovative Neuro-Fuzzy Clustering Method Named NFCMR. Global Journal of Computer Sciences: Theory and Research, 8, 62-69.
 Menard, J.J. (1995) Relationship between Altered Pyroxene Diorite and the Magnetite Mineralization in the Chilean Iron Belt, with Emphasis on the El Algarrobo Iron Deposits (Atacama Region, Chile). Mineral Deposita, 30, 268-274.
 Xu, L., et al. (2012) Relationships between Porphyry Cu-Mo Mineralization in the Jinshajiang-Red Rivermetallogenic Belt and Tectonic Activity: Constraints from Zircon U-Pb and Molybdenite Re-Os Geochronology. Ore Geology Reviews, 48, 460-473.
 Yaghini, M., Ghannad Pour, F. and Khadmatlu, S. (2008) Offering an Clustering Innovative Method in Data Mining by Using Genetic Algorithms to Solve a Real Case Study in the Rail Transportation Industry. Iran Data Mining Conference, Tehran, 2018, 45-60. (In Persian)
 Mahvash Mohammadi, N. and Hezarkhani, A. (2015) Estimation OG Grade Gold in Khooni Deposit Using the Behavior of Gold Arsenic and Antimony by Clustering K-Means Method. Journal of Analytic and Numerical Methods in Mining Engineering, 5, 77-92.
 Ghannadpour, S.S. and Hezarkhani, A. (2015) Investigation of Cu, Mo, Pb, and Zn Geochemical Behavior and Geological Interpretations for Parkam Porphyry Copper System, Kerman, Iran. Arabian Journal of Geosciences, 8, 7273-7284.
 Alahgholi, S., Shirazy, A. and Shirazi, A. (2018) Geostatistical Studies and Anomalous Elements Detection, Bardaskan Area, Iran. Open Journal of Geology, 8, 697.
 Shirazi, A., Shirazy, A. and Karami, J. (2018) Remote Sensing to Identify Copper Alterations and Promising Regions, Sarbishe, South Khorasan, Iran. International Journal of Geology and Earth Sciences, 4, 36-52.
 Shirazi, A., Shirazy, A., Saki, S. and Hezarkhani, A. (2018) Geostatistics Studies and Geochemical Modeling Based on Core Data, Sheytoor Iron Deposit, Iran. Journal of Geological Resource and Engineering, 6, 124-133.
 Shirazi, A., Hezarkhani, A., Shirazy, A. and Shahrood, I.R.A.N. (2018) Exploration Geochemistry Data-Application for Cu Anomaly Separation Based on Classical and Modern Statistical Methods in South Khorasan, Iran. International Journal of Science and Engineering Applications, 7, 39-44.
 Khakmardan, S., Shirazi, A., Shirazy, A. and Hosseingholi, H. (2018) Copper Oxide Ore Leaching Ability and Cementation Behavior, Mesgaran Deposit in Iran. Open Journal of Geology, 8, 841.