Received 8 June 2016; accepted 23 July 2016; published 26 July 2016
1. Introduction
Since the 1970s, pattern recognition methods have been employed to detect hidden information in economic geology. Applications of clustering algorithms are among the most successful in geochemical exploration. Characterizing the spatial distribution of elements in ore deposits guides geological exploration. A combination of mathematical and geological knowledge can be used to identify and predict potential exploration targets. Numerous investigations have been conducted in recent years to identify the distribution of elements and to assign data samples to appropriate clusters. The selection of an appropriate method depends on the complexity of the problem. One such method is based on artificial neural networks (ANNs). Many studies show that an ANN can be applied successfully to a wide range of engineering problems. The capabilities of an ANN (i.e. pattern recognition and memorization) also suit the inherent uncertainties and imperfections found in geochemical problems.
An important application of neural networks is clustering. Clustering is an unsupervised method of grouping data using a given measure of similarity. It attempts to organize unlabeled feature vectors into clusters (natural groups) such that samples within a cluster are similar to each other but differ from those in other clusters. Cluster analysis is an important and useful tool for analyzing large datasets that contain many variables and experimental parameters. Therefore, the application of cluster analysis to complex datasets has attracted a high level of scientific interest in various areas of geochemical research. To investigate the distribution of elements, a robust classification scheme is essential for clustering geochemical samples into homogeneous groups.
Several common clustering techniques, such as principal component analysis, fuzzy k-means clustering and Q-mode hierarchical cluster analysis, have been used to divide geochemical samples into homogeneous groups, with the ultimate objective of characterizing the quality of elements, for example to assess the chemistry of groundwater and identify the geological factors. Ji et al. (2007) developed semi-hierarchical correspondence cluster analysis and showed its application to the division of geological units using geochemical data systematically collected from an area around Tahe in Heilongjiang Province, north China. Meshkani et al. (2011) used hierarchical and k-means clustering to identify the distribution of lead and zinc in the Sanandaj-Sirjan metallogenic zone in Iran. Ziaii et al. (2009) introduced the neuro-fuzzy method for separating anomalies and showed that it is more efficient than multivariate statistics. These methods are efficient at grouping geochemical samples by chemical similarity, but they are not useful for the visual assessment of the results or for presenting maps that show geochemical facies. The self-organizing map (SOM) method is likely to become a complementary or alternative tool to these clustering methods.
The SOM is related to adaptive k-means, but it produces a topological feature map, which is more than a plain cluster analysis. After training, the input vectors are spatially ordered in the array, i.e. neighboring input vectors in the map are more similar than more remote ones. The self-organizing map approach is based on an unsupervised learning algorithm and has excellent visualization capabilities, including techniques that use the reference vectors of the SOM to give an informative picture of the data. Sun et al. (2009) applied the SOM method to classify Pb-Zn-Mo-Ag anomalies in the mining area around Sheduolong in Qinghai Province, China. In 2012, Abedi et al. used the SOM method and fuzzy c-means (FCM) to produce a deposit exploration map for the Now Chun copper deposit in Iran. They used vectors with 13 features from three layers of geological, geophysical and geochemical information as input data. The topology preservation property makes the SOM a popular choice in data analysis. The most important advantages of the SOM, namely its visualization capability and its output map, led the authors to use this method to reach a better conclusion about the distribution of REEs in the study area. The objective of this paper is to show that the self-organizing map is an applicable and suitable approach for zoning a deposit based on rare earth elements.
2. Geological Settings of Study Area
There are significant concentrations of iron ore in central and northeastern Iran. Magnetite is the main mineral in most of the important iron ore bodies. The deleterious elements are commonly phosphorus and sulfur, in the form of apatite, pyrite and, rarely, chalcopyrite. The iron deposits of Iran can be divided into two main groups, magmatogenic and volcano-sedimentary. Metasomatism is the main cause of iron concentration in the ore deposits of central Iran. Figure 1 shows the geographical location of the major iron deposits in Central Iran. Moore and Modabberi (2003) suggested that the separation of an iron oxide melt and the ensuing hydrothermal processes, dominated by alkali metasomatism, were both involved to different degrees in the formation of Choghart and other similar deposits in central Iran.
Figure 1. Geological map of Bafq mineral province.
The Choghart deposit occurs in the Bafq mining district, which is part of the narrow N-S trending Pan-African rift zone at the eastern margin of the so-called Lut block. The main orebody at Choghart is a roughly vertical, discordant, pipe-shaped body plunging 73˚NNW that has been explored to a depth of 600 m, where it appears to interfinger with intrusive metasomatized and fragmental wall-rock. The thickness of the metasomatic aureole varies widely. The orebody is hosted by volcanic members (intrusive and extrusive alkali rhyolites) of the epicontinental to continental Infracambrian Esfordi Formation. The orebody and the metamorphosed country rock are cut by several diabasic dikes. The plain that surrounds the orebody and its metamorphosed intrusive and volcanic country rocks is composed of 150 m of Quaternary formations and recent alluvium, consisting of fine-grained sand and gravel, magnetite boulders, gypsum and intrusive fragments. Hematite is the second most abundant mineral after magnetite. Although some primary hematite is found in the drill cores, most of the hematite is secondary in origin. Some goethite and hydrous iron oxide occur at the surface but disappear rapidly with increasing depth. Calcite, dolomite, secondary hematite and talc occur throughout the orebody as veinlets and as cementing material of oxidized ore. Rutile and goethite are probably the result of total transformation of the earlier formed martite.
In the Early Cambrian, granitic plutons intruded the Precambrian sequence, and felsic to intermediate volcanic and volcano-sedimentary rocks were formed. This sequence is composed of an unmetamorphosed series that includes interlayered micro-conglomerates, sandstones, black siltstones and shales, dolomites and dolomitic limestones, mafic to felsic volcanic rocks, volcanoclastic beds and tuffaceous shales. A simplified geological map of the Choghart pit, based on the different rock types, and the locations of the samples within the study area are shown in Figure 2.
The main minerals at Choghart include magnetite, hematite (martite), actinolite, tremolite and, occasionally, pyrite and albite. Apatite-bearing magnetite commonly forms at the margins of the deposit. Figure 3 illustrates examples of optical microscopy investigations of three samples from three different rock types, namely host rock, iron ore body and metasomatite.
The Early Cambrian igneous rocks of the Bafq mining district have a bimodal nature. The chondrite-normalized REE patterns display significant variation from LREE to HREE with no considerable Eu anomalies for the basaltic rocks, and show obvious enrichment in the LREE with significant negative Eu anomalies for the rhyolitic domes. REE enrichment is closely associated with the formation of phosphate minerals in many IOA deposits. However, bastnaesite and allanite are sometimes significant. In this study, apatite is the main REE-bearing mineral in the Choghart iron ore deposit. Figure 4 shows some examples of apatite observed in this deposit.
Edfelt (2007) explained that there are a few complications in the phosphate-REE relationship in some Kiruna district deposits. Hence, the relationship between REE and phosphate minerals in such deposits should be better understood. In these deposits, apatites characteristically contain 2000 - 6000 ppm REE. Daliran (2002) reported that Bafq district apatites contain up to 1.75 wt.% REE. Some studies suggest that post-depositional REE leaching may have occurred in apatite that hosts inclusions of monazite and xenotime. U-Pb dating of monazite inclusions in apatite indicates that REE redistribution in apatite may have occurred repeatedly during hydrothermal processes several million years after the formation of the IOA deposits.
Figure 2. Simplified geological map of Choghart pit and sample locations (red frame shows study area) (simplified and modified after Dehghan (2011)).
Figure 3. Microscopic images: (a) Host rock sample, consisting of plagioclase and calcite with fine grain background texture of quartz and sericite, porphyry texture, thin section, XPL; (b) Iron ore sample, magnetite, apatite with some martitized magnetite, polished section; (c) Metasomatite sample, microgranular texture, pyroxene (red), apatite (gray), magnetite and hematite, thin section, XPL.
Figure 4. Apatite samples of Choghart deposit.
3. Data Set
The data set is a collection of 112 lithological samples that were assayed in the laboratory. Nineteen features, comprising the coordinates x, y, z and the concentrations of phosphate (P2O5) and REEs (Table 1), were selected as the input data set. The concentrations of REEs were determined by inductively coupled plasma mass spectrometry (ICP-MS) because of its sensitivity for trace elements. Phosphate (P2O5) contents were measured with an X-ray fluorescence (XRF) spectrometer. Phosphate is in percent and the other elements are in ppm. It should be noted that these values have been normalized so that they can be used in the clustering methods.
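The text does not state which normalization was used. As a minimal sketch, assuming a column-wise min-max rescaling to [0, 1], features measured in percent and in ppm can be put on a common scale before clustering. The function name and the sample values below are hypothetical and are not taken from Table 1.

```python
def min_max_normalize(rows):
    """Rescale each column of `rows` (a list of equal-length lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical samples: [P2O5 in percent, Ce in ppm, La in ppm].
samples = [
    [0.2, 40.0, 20.0],
    [3.0, 55.0, 30.0],
    [31.0, 250.0, 120.0],
]
scaled = min_max_normalize(samples)
for row in scaled:
    print([round(value, 3) for value in row])
```

Without such a step, a distance-based method would be dominated by whichever feature has the largest numeric range, here the ppm-scale concentrations.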
According to the results of the ICP-MS analysis, cerium, lanthanum, neodymium and yttrium are the most abundant rare earth elements in Choghart. These elements are associated with apatite. Therefore, the distribution of phosphorus in this region is associated with the distribution of rare earth elements. Since apatite is the main source of these elements in the study area, phosphate was chosen as an input variable.
The geological setting, together with the information from the field studies and microscopic investigations described in Section 2, was used to assess the validity of the SOM output map.
4. Self-Organizing Map
The self-organizing map (SOM) is a type of artificial neural network (ANN) that is applied to clustering as an unsupervised method. The method was first developed by Kohonen in the early 1980s, and its typical application is to produce a two-dimensional map from a multidimensional space. It uses a network to approximate the probability density function of the input space in a way that preserves the topological structure of that space: if two vectors are close together in the input space, they are mapped to the same or to neighboring neurons. The neurons are arranged on a rectangular grid, and the weights of neighboring neurons are updated repeatedly. Figure 5 shows the physical scheme of a self-organizing map.
Table 1. Rare earth elements and phosphate used as input data, with the average, minimum and maximum of each element. Phosphate (P2O5) is in percent and other elements are in ppm.
In this method, random initial weights, wkj, are first assigned to the neurons:
where K is the number of rows and J is the number of columns. A random vector is then selected from the input data. The next step is to calculate the distances between this vector and the neurons and to find the closest one, the winning neuron (Equation (2)). The winning neuron and its neighboring neurons then converge toward the input vector. For this purpose, the neighborhood function is defined according to Equation (3).
A rectangular or hexagonal neighborhood can be defined. However, the Gaussian kernel is commonly used, as follows:
where η(t) is the learning rate factor and σ(t) is the width of the kernel. Both η(t) and σ(t) are monotonically decreasing functions. To determine the accuracy of the map, the error is calculated according to Equation (4), and the iteration stops when this error is small enough.
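For reference, one widely used textbook formulation of the steps just described is sketched below. The notation is an assumption and may differ from the paper's own Equations (1) to (4): x(t) is the input vector selected at iteration t, c is the index of the winning neuron, r_kj is the grid position of neuron (k, j), and N is the number of samples. The last line is the average quantization error, which is one common choice for the map error.

```latex
\begin{align*}
  c &= \arg\min_{k,j}\, \lVert \mathbf{x}(t) - \mathbf{w}_{kj}(t) \rVert
     && \text{(winning neuron)} \\
  \mathbf{w}_{kj}(t+1) &= \mathbf{w}_{kj}(t)
     + h_{c,kj}(t)\,\bigl[\mathbf{x}(t) - \mathbf{w}_{kj}(t)\bigr]
     && \text{(weight update)} \\
  h_{c,kj}(t) &= \eta(t)\,
     \exp\!\left(-\frac{\lVert \mathbf{r}_{c} - \mathbf{r}_{kj} \rVert^{2}}{2\,\sigma^{2}(t)}\right)
     && \text{(Gaussian neighborhood)} \\
  E &= \frac{1}{N}\sum_{i=1}^{N} \lVert \mathbf{x}_{i} - \mathbf{w}_{c(i)} \rVert
     && \text{(average quantization error)}
\end{align*}
```

In this form the winning neuron receives the full learning rate η(t), while the update of every other neuron is damped according to its grid distance from the winner.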
To determine the cluster boundaries, the unified distance matrix (U-matrix) can be calculated. The U-matrix expresses, for each neuron, the distance to the neighboring weight vectors. Large values within the U-matrix indicate the position of cluster boundaries.
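The training loop and the U-matrix described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the study: it assumes a rectangular grid, a Gaussian neighborhood, an exponentially decaying learning rate and kernel width, a U-matrix defined as the mean distance to the four grid neighbors, and a fixed iteration count in place of an error-based stopping rule. All function names and the toy data are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid index (k, j) of the neuron whose weights are closest to x."""
    best, best_dist = (0, 0), float("inf")
    for k, row in enumerate(weights):
        for j, w in enumerate(row):
            dist = math.dist(x, w)
            if dist < best_dist:
                best, best_dist = (k, j), dist
    return best


def train_som(data, rows=2, cols=2, iterations=500, eta0=0.5, sigma0=1.0, seed=0):
    """Train a rectangular SOM and return its weights as weights[k][j]."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights w_kj.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for t in range(iterations):
        # Assumed exponential decay; any monotonically decreasing schedule would do.
        decay = math.exp(-t / iterations)
        eta, sigma = eta0 * decay, sigma0 * decay
        x = rng.choice(data)                     # random input vector
        ck, cj = best_matching_unit(weights, x)  # winning neuron
        for k in range(rows):
            for j in range(cols):
                grid_dist_sq = (k - ck) ** 2 + (j - cj) ** 2
                h = eta * math.exp(-grid_dist_sq / (2 * sigma ** 2))
                w = weights[k][j]
                for d in range(dim):
                    w[d] += h * (x[d] - w[d])    # pull the neuron toward x
    return weights


def quantization_error(weights, data):
    """Mean distance from each sample to the weights of its winning neuron."""
    total = 0.0
    for x in data:
        k, j = best_matching_unit(weights, x)
        total += math.dist(x, weights[k][j])
    return total / len(data)


def u_matrix(weights):
    """Mean weight distance from each neuron to its 4-connected grid neighbors."""
    rows, cols = len(weights), len(weights[0])
    u = [[0.0] * cols for _ in range(rows)]
    for k in range(rows):
        for j in range(cols):
            dists = [
                math.dist(weights[k][j], weights[nk][nj])
                for nk, nj in ((k - 1, j), (k + 1, j), (k, j - 1), (k, j + 1))
                if 0 <= nk < rows and 0 <= nj < cols
            ]
            u[k][j] = sum(dists) / len(dists) if dists else 0.0
    return u


# Toy data: two well-separated groups of already normalized 2-D samples.
group_a = [[0.01 * i, 0.02 * i] for i in range(5)]
group_b = [[1.0 - 0.01 * i, 1.0 - 0.02 * i] for i in range(5)]
data = group_a + group_b

untrained = train_som(data, iterations=0)
trained = train_som(data)
print("quantization error before training:", quantization_error(untrained, data))
print("quantization error after training: ", quantization_error(trained, data))
print("U-matrix:", u_matrix(trained))
```

A large entry in the returned U-matrix marks a neuron whose weight vector lies far from those of its grid neighbors, which is how a cluster boundary shows up on the map.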
5. Results and Discussion
The optimum number of clusters was determined with the silhouette criterion, which provides a graphical validation for evaluating the number of clusters and comparing different scenarios. The method is based on calculating the distances between cluster members and the distances between cluster prototypes. The silhouette value of each point shows how similar that point is to the others in its own cluster compared with points in other clusters. The number of clusters was varied from 2 to 10, the well-known k-means algorithm and the SOM were both applied for clustering, and the results were evaluated using the silhouette criterion. Finally, four clusters were chosen as the optimal number, since this case gave the best silhouette values (Figure 6). Positive values show that samples are clustered appropriately, and the width of each sample's bar is an expression of confidence. However, for 12 samples the silhouette values are negative, indicating that they have probably been assigned to the wrong cluster. Because the k-means and SOM methods follow similar logic, they put the samples in the same clusters. Accordingly, the silhouette results of both methods are similar.
Figure 5. Physical structure of self-organizing map.
Figure 6. Silhouette plot and overall average silhouette width: k-means clustering (left) and self-organizing map (right).
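The silhouette value can be illustrated with a short sketch. It assumes the standard definition s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance from that point to the members of any other cluster, and it assumes Euclidean distance. The function name and the four toy points are hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Return the silhouette value of every point (requires at least two clusters)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # convention for a single-member cluster
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
good = silhouette_values(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_values(points, [0, 1, 0, 1])   # deliberately mixed grouping
print("mean silhouette, natural grouping:", sum(good) / len(good))
print("mean silhouette, mixed grouping:  ", sum(bad) / len(bad))
```

Repeating this for each candidate number of clusters and keeping the grouping with the highest average value mirrors the selection procedure described above. A negative value flags a sample that sits closer, on average, to another cluster than to its own.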
The goal of the SOM is to represent all input vectors in a high-dimensional space by prototypes in a low-dimensional space such that distance and topology are preserved as much as possible. Therefore, in this study, the high-dimensional dataset (19 features: the coordinates x, y, z and the concentrations of phosphate (P2O5) and REEs) has been evaluated in a two-dimensional space. A schematic diagram of the structure of the self-organizing map used in this study is shown in Figure 7. Using the self-organizing map, the samples of the study area can be assigned to four clusters, as shown in Figure 8: 13, 32, 38 and 29 samples were clustered in zones 1 to 4, respectively. Each hexagon represents a neuron. A 2 × 2 network, composed of four neurons, has been used in this study.
Figure 7. Schematic diagram of the structure of self-organizing map in this study.
Figure 8. Determining the number of samples for each cluster.
Table 2. Average concentrations of rare earth elements for samples in the different zones separated using the self-organizing map algorithm.
The average contents of REEs and phosphate (P2O5) for the samples located in each zone have been calculated and are presented in Table 2. Comparing the results with the laboratory and field studies, these four zones can be described and summarized as follows:
・ Zone 1: phosphate type
This zone is mainly composed of samples with high contents of phosphorus in the form of apatite. The average phosphate content in this zone is 31%. This type is directly related to rare earth elements and contains the highest amount, with an average of 652 ppm REEs.
・ Zone 2: albitophyre type
The phosphate concentration is lowest in this zone, and the REE concentration is also low. They are 0.2% and 121 ppm for phosphate and REEs, respectively.
・ Zone 3: metasomatic and phosphorus iron ore
The samples of this zone are mostly iron ore affected by metasomatism. Moreover, the contents of phosphate and apatite as well as rare earth elements are relatively high. The concentrations of phosphate and REEs are 3% and 124 ppm, respectively.
・ Zone 4: iron ore type
This zone consists of iron ore. The concentrations of phosphate and REEs are 1% and 105 ppm, respectively.
Since the self-organizing map has a two-dimensional topology, the relations between the centers of the 19-dimensional clusters are illustrated in a two-dimensional map. The weight distance matrix, or unified distance matrix (U-matrix), is one of the tools of the SOM. Figure 9 shows the neighbor weight distances. Lines display the relationship between neighboring neurons: the darker the color, the greater the distance between the neurons, and the lighter the color, the smaller the distance. On this map, the distance between Zone 1 and Zone 2 is the largest; they are the most prone and least prone zones for rare earth elements, respectively. The minimum distance is between Zones 2 and 4, which both have the lowest REE contents. Finally, Zone 1 (phosphate type) is the most promising zone for rare earth elements.
Figure 10 shows the locations of the samples. For better visual separation, the samples of each zone are shown in a distinct color. Thus, a distinction between the zones (or clusters) can be seen based on the coordinates.
6. Conclusions
According to the literature, the usual clustering approaches in geochemical exploration have centered on the k-means algorithm. However, k-means and other popular methods such as the MLP neural network do not take the topology of the samples into account. The self-organizing map (SOM) is a type of artificial neural network (ANN) applied to clustering; its advantage is that it incorporates the topological setting of the dataset and produces a two-dimensional map from a multidimensional input dataset. This method has already been used in some fields of the Earth sciences, such as geophysics and seismology. In this study, the SOM was applied to find the distribution of REEs in the Choghart iron ore deposit. After the optimal number of clusters was found using the silhouette criterion, a two-dimensional map was composed. Finally, the study area was subdivided into four zones, which agree well with the rock types. Field studies and laboratory analysis confirm that there are four different rock types. Given that only REEs and phosphate (not all elements) were used for clustering, it can be concluded that the algorithm has worked well. In addition, this study shows that the preservation of topology is one of the advantages of this method for geochemical exploration. Comparison of the results with the laboratory analysis and field observations confirms the validity of this study.
Figure 9. SOM neighbor weight distances.
Figure 10. The result of SOM. The studied area has been divided into 4 zones.