The most important data mining problems based on unsupervised learning approach is Clustering and it is very useful for the extraction of data distribution and patterns in the datasets. The clustering process is used to discover both the dense and sparse regions in a dataset. The two main broad approaches are partitioning approach and hierarchical approach. The hierarchical clustering creates a hierarchy of clusters from small to big or big to small and consequently it is named as agglomerative or divisive clustering techniques respectively. Clustering of numerical data has been studied in the past  . However, practically we may have different types of data such as binary, categorical, spatial, ordinal, and temporal or mixture of these. New methods and interesting algorithms for clustering categorical and spatial data have been proposed in the recent times     .
Association rule mining is an important data mining problem which derives associations among data and was formulated by Agrawal et al.  . Extracting association rules from temporal dataset is also an interesting data-mining problem. In  , Ale et al. have proposed a method of extracting association rules which hold within an itemset’s lifetime, where the lifetime of an itemset is the time-period between the first appearance and the last appearance of the itemset in the transactions. In  , the work proposed by Ale and Rossi  is extended by considering time-gap between two consecutive appearance of an itemset in the transactions. The algorithm discussed in  mines all locally frequent itemsets along with the list of time-intervals. Each frequent itemsets is associated with a time-intervals lists, where it is frequent. From locally frequent itemsets, we can define various periodic patterns by considering the time-stamps as calendar dates. All such patterns are discussed in detail by the same authors in   . While extracting periodicity of a frequent pattern, if the associated time- intervals overlapping, then they can be piled up to form fuzzy interval  . Thus, we can have some periodic patterns with each pattern is associated with a fuzzy time interval which describes its period.
In this paper, we devise an agglomerative hierarchical clustering method to explore clusters among such periodic patterns. We define the similarity measure on the corresponding fuzzy time intervals  associated with the periodic patterns. Then a merge function is defined in terms of the similarity as the union of the pair of periodic frequent itemsets. If the value of the similarity function of fuzzy time intervals is greater than some pre-assigned thresholds, then the corresponding frequent itemsets pairs are merge to form larger cluster/itemset and their corresponding fuzzy time intervals are also aggregated  . Finally, in this paper we present an algorithm for the clustering of periodic patterns.
The rest of the paper is arranged as follows. Section 2 presents a brief literature review related to the existing clustering algorithms. In section 3, we present some basic definitions and results used in this paper. The proposed agglomerative clustering algorithm is discussed in section 4. In section 5, we discuss some analysis of experiments and results. Finally, we wind up the paper with possible future enhancements of the proposed work in section 6.
2. Related Works
In this segment, we present a brief assessment of the existing research findings related to our work. In  , Gibson et al. have proposed an algorithm for clustering categorical data. Their approach is based dynamical system. In  , authors have proposed a clustering algorithm called BIRCH. The algorithm proposed in  is a hierarchical agglomerative algorithm and it is an efficient algorithm for large datasets. In  , authors have proposed a robust method for clustering categorical data using summaries. It is known as ROCK and it is an agglomerative hierarchical clustering. In  , authors have done a survey on clustering time- series data. In  , authors have discussed algorithm for mining temporal data. In  , the authors have done a survey on temporal data clustering. Concept of fuzzy sets  has been widely used in different areas including cluster analysis, association rule mining, pattern recognition and signal processing in the last couple of years. In  , Dutta and Mahanta have proposed an algorithm for clustering large categorical database. The approached used in  is a fuzzy set based approach. In  , author has made a survey on fuzzy clustering. In  , authors have discussed about the applications of fuzzy sets in pattern recognition.
Finding associations among data has also attracted a large number of researchers. In  , authors have presented a nice and efficient algorithm for association rules extractions. In  , authors have presented a modified A-priori algorithm for the extractions of temporal association rules. In  , authors have extended the work of  by adding the time-gap between two consecutive transactions containing an itemset. The algorithm  , gives all locally frequent itemsets along with the lists of time intervals. Here each frequent itemset is linked with a time intervals list in which it is frequent. To compute the periodicity of such frequent itemsets if the time intervals associated with them have overlapping, then a method of redefining the time intervals is proposed in  , which turns out to be fuzzy time intervals.
3. Problem Definition
In this section, we present a summarized view of some definitions and results on which our proposed algorithm is based.
3.1. Fuzzy Sets
Let X be the universe of discourse, then the fuzzy set A of X is characterized by , where = the membership functions of x in A and .
3.2. a-Cut of Fuzzy Sets
An α-cut of the fuzzy set A of X is actually a crisp set Aα with elements x of X having membership greater than or equal to α i.e. .
3.3. Convex Fuzzy Sets
A fuzzy set is said to be convex if all it’s α- cuts are convex sets.
3.4. Fuzzy Numbers
A fuzzy number is a convex set defined in the real line whose membership value is 1 for at least one x Î X.
3.5. Trapezoidal Fuzzy Numbers or Fuzzy Intervals
A trapezoidal fuzzy numbers denoted by A = (a, b, c, d), where it’s membership function is given by
In short, we can express the above membership function as
It is to be mentioned here that our fuzzy time intervals associated with periodic frequent patterns are actually trapezoidal fuzzy numbers. The fuzzy time intervals are formed using method  and is an L-R fuzzy intervals   .
3.6. Generalized Trapezoidal Fuzzy Numbers
A generalized trapezoidal fuzzy numbers is represented as A = (a, b, c, d, h), where its membership is given by
In short, we can express the above membership function as
3.7. Similarity Measure
Let 0 ≤ a ≤ 1, 0 ≤ b ≤ 1, then the similarity measure between a and b is given by
Let A = (a1, a2, a3, a4: hA) and B = (b1, b2, b3, b4: hB) be two generalized trapezoidal fuzzy numbers, then the similarity measure is defined in  as follows
The larger the value of S(A, B) the more similarity between A and B. Obviously, 0 ≤ S(A, B) ≤ 1, A and B will be identical if S(A, B) = 1.
3.8. Merger of Periodic Frequent Itemsets Belonging to Two Different Clusters
Let A and B be two periodic frequent itemsets having periods T and S respectively. Then the merge function is defined , if and only if S(T, S) ³ θ, where, θ is pre-defined threshold (small positive numbers). It is to be mentioned here that corresponding fuzzy time intervals of T and S associated A and B will be aggregated  to form new fuzzy time interval.
4. Proposed Clustering Algorithm
In this segment, we describe our proposed clustering algorithm based on the notion explained in the previous section. The proposed algorithm takes as input, all periodic patterns with fuzzy time intervals describing their periods. The fuzzy time intervals are constructed using the methods discussed in  . Assuming that each pattern is associated with exactly one fuzzy time interval, we want to do the clustering of periodic patterns in such a way that each cluster will contain similar type of periodic patterns. The similarity between periodic patterns is defined in terms their corresponding fuzzy time intervals. The similarity measure, S() function is discussed in section III. The itemset A and B having fuzzy time intervals T1 and T2 are said to be similar if and only if the value S(T1, T2) is greater than some pre-defined threshold.
Initially, each pattern is assigned to a separate cluster. Thereafter, for each pair of clusters the similarity value S( ) is calculated and merge function is applied (to generate a new bigger cluster) if the S( ) is greater than the threshold. And their corresponding periods/fuzzy time intervals are aggregated  . The process of merging continues till no merger of clusters is possible or there is only one cluster at the top. In bellow we present the pseudo code for the proposed algorithm.
Frequent Pattern Clustering Algorithm (n, θ)
Input: The number of frequent patterns n and threshold θ
Output: A set of cluster S of with fuzzy time intervals
Initially set of clusters is empty
1) S ¬ f
2) read each frequent pattern A[i] with fuzzy time intervals T[i]
3) To construct a cluster C with T if a cluster C1 Î S with sim(T1, T) ³ θ
4) Then C = merge (C1, C) with T = aggregate (T1, T)
5) remove C1 and T1 from S
6) add C with its fuzzy time intervals to S
7) Process continue till no merger is possible.
8) return S
5. Experiment & Discussions
For experimentation, we have used a synthetic dataset T10I4D100K, available from FIMI1 website. As the dataset is non-temporal, we consider the temporal features, the calendar dates and execute the algorithm  to get the periodic patterns and then execute our clustering algorithm for the threshold value θ = 0.4. The clustering results along with the number of misclassified itemsets obtained are presented in the Table 1. We also represent trend of number of clusters with respect to the number of transactions in graphical form given in Figure 1 and with a bar diagram in Figure 2.
6. Conclusion & Future Works
In this paper, we have presented an agglomerative-hierarchical clustering algorithm to find clusters among periodic patterns with fuzzy time intervals. The
Table 1. Clustering results along with the number of misclassified itemsets for different sets of transactions.
Figure 1. Graph of no. transaction vs. no of clusters/itemsets.
Figure 2. Bar diagram cluster and misclassified items.
algorithm starts with as many clusters as the periodic patterns having fuzzy time intervals. Then, if their similarity value is greater than pre-defined threshold, the pairs of clusters are merged. The similarity is defined on fuzzy time intervals associated with periodic patterns. After each level the corresponding fuzzy time intervals are updated by aggregation. Although we have used the agglomerative- hierarchical approach in this paper; any other approach can also be considered provided the similarity measure is properly defined.