The mathematics behind fractals began to take shape in the 17th century, when the mathematician and philosopher Leibniz considered recursive self-similarity (although he made the mistake of thinking that only the straight line was self-similar in this sense). Iterated functions in the complex plane were investigated in the late 19th and early 20th centuries by Henri Poincaré, Felix Klein, Pierre Fatou and Gaston Julia. However, without the aid of modern computer graphics, they lacked the means to visualize the beauty of many of the objects that they had discovered. In the 1960s, Benoît Mandelbrot started investigating self-similarity in papers such as "How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension", which built on earlier work by Lewis Fry Richardson. Finally, in 1975 Mandelbrot coined the word fractal to denote an object whose Hausdorff-Besicovitch dimension is greater than its topological dimension. Fractal image compression (FIC) was introduced by Barnsley and Sloan. In another work they introduced a better way to compress images, and since then FIC has been widely studied by many scientists. FIC is based on the idea that any image contains self-similarities, that is, it consists of small parts similar to itself or to some bigger part in it, so in FIC iterated function systems are used for modeling. Jacquin presented a more flexible method of FIC than Barnsley's, based on recurrent iterated function systems (RIFSs), which he was the first to introduce. The RIFSs that have been used in image compression schemes consist of transformations with a constant vertical contraction factor. Fisher improved the partition of Jacquin. A hexagonal structure called the Spiral Architecture (SA) was proposed by Sheridan in 1996. Bouboulis et al. introduced an image compression scheme using fractal interpolation surfaces which are attractors of some RIFSs.
Kramm presented a quite fast algorithm that manages to merge low-scale redundancy from multiple images. In this work, neural networks are used to optimize the process, by using self-organizing neural networks to provide domain classification. The experimental results are presented and the performance of the algorithms is discussed.
Artificial Neural Networks (ANNs) have been used for solving many problems, especially in cases where the results are very difficult to achieve by traditional analytical methods. A number of studies applying ANNs to image compression have already been published. It is important to emphasize that, although there is no sign that neural networks can take over the existing techniques, research on neural networks for image compression is still making advances. Possibly in the future this could have a great impact on the development of new technologies and algorithms in this area.
J. Stark first proposed applying neural networks to iterated function systems (IFSs). His method used a Hopfield neural network to solve the underlying linear programming problem and obtain the Hutchinson metric quickly. However, his neural network approach cannot obtain the fractal code automatically. A few methods for optimizing the exhaustive search, based on clustering the set of domain blocks, have been suggested. But the majority of these methods either decrease the computational complexity only slightly or result in high losses of image quality. The method of clustering by means of an artificial self-organizing Kohonen neural network is least afflicted with these disadvantages.
2. Neural Networks
A neural net is an artificial representation of the human brain that tries to simulate its learning process. The term "artificial" means that neural nets are implemented in computer programs that are able to handle the large number of necessary calculations during the learning process. To show where neural nets have their origin, let's have a look at the biological model: the human brain.
2.1. The Components of a Neural Net
Generally speaking, there are many different types of neural nets, but they all have nearly the same components. If one wants to simulate the human brain using a neural net, it is obvious that some drastic simplifications have to be made: First of all, it is impossible to "copy" the true parallel processing of all neural cells. Although there are computers that have the ability of parallel processing, the large number of processors that would be necessary to realize it cannot be afforded by today's hardware. Another limitation is that a computer's internal structure can't be changed while performing any tasks.
And how can electrical stimulation be implemented in a computer program? These facts lead to an idealized model for simulation purposes. Like the human brain, a neural net also consists of neurons and connections between them. The neurons transport incoming information on their outgoing connections to other neurons. In neural net terms these connections are called weights. The "electrical" information is simulated with specific values stored in those weights. By simply changing these weight values, the changing of the connection structure can also be simulated.
As you can see, an artificial neuron looks similar to a biological neural cell, and it works in the same way. Input is sent to the neuron on its incoming weights. This input is processed by the propagation function, which adds up the values of all incoming weights. The resulting value is compared with a certain threshold value by the neuron's activation function: if the input exceeds the threshold value, the neuron will be activated, otherwise it will be inhibited. If activated, the neuron sends an output on its outgoing weights to all connected neurons, and so on. Figure 1(a) shows a neural net structure. In a neural net, the neurons are grouped in layers, called neuron layers. Usually each neuron of one layer is connected to all neurons of the preceding and the following layer (except the input layer and the output layer of the net). The information given to a neural net is propagated layer-by-layer from input layer to output layer through either none, one or more hidden layers. Depending on the learning algorithm, it is also possible that information is propagated backwards through the net.
Figure 1(b) shows a neural net with three neuron layers.
Note that this is not the general structure of a neural net. For example, some neural net types have no hidden layers, or the neurons in a layer are arranged as a matrix. What is common to all neural net types is the presence of at least one weight matrix, the connections between two neuron layers.
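The neuron model described above can be sketched in a few lines of code; the function and variable names here are illustrative, not taken from the paper:

```python
import numpy as np

# A minimal sketch of the neuron described above: the propagation function
# sums the weighted inputs, and the activation function compares that sum
# with a threshold.
def neuron_output(inputs, weights, threshold):
    net = np.dot(inputs, weights)        # propagation function
    return 1 if net > threshold else 0   # threshold activation

# The neuron fires only when the weighted sum exceeds the threshold.
print(neuron_output(np.array([1.0, 1.0]), np.array([0.5, 0.5]), 0.8))  # 1
print(neuron_output(np.array([1.0, 0.0]), np.array([0.5, 0.5]), 0.8))  # 0
```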
2.2. Types of Neural Nets
As mentioned before, several types of neural nets exist. They can be distinguished by their type (feedforward or feedback), their structure and the learning algorithm they use. The type of a neural net indicates whether the neurons of one of the net's layers may be connected among each other. Feedforward neural nets allow only neuron connections between two different layers, while nets of the feedback type also have connections between neurons of the same layer.
2.3. Supervised and Unsupervised Learning
In supervised learning, the net is trained by presenting input patterns together with the corresponding target outputs. Neural nets that learn unsupervised have no such target outputs; it can't be determined what the result of the learning process will look like. During the learning process, the units (weight values) of such a neural net are "arranged" inside a certain range, depending on given input values. The goal is to group similar units close together in certain areas of the value range. This effect can be used efficiently for pattern classification purposes.
Figure 1. (a) Structure of a neuron in a neural net. (b) Neural net with three neuron layers.
3. The Self-Organizing Mapping (SOM)
3.1. Competitive Learning and Clustering
Competitive learning is a learning procedure that divides a set of input patterns into clusters that are inherent to the input data. A competitive learning network is provided only with input vectors and thus implements an unsupervised learning procedure. We will show its equivalence to a class of "traditional" clustering algorithms shortly.
3.2. Winner Selection: Euclidean Distance
In the competitive structures, a winning processing element is determined for each input vector based on the similarity between the input vector and the weight vector.
To this end, the winning neuron k is selected as the one whose weight vector is closest to the input pattern x, using the Euclidean distance measure; the winning unit k can be determined by

||w_k − x|| ≤ ||w_o − x||   for all o,   (1.1)

where the index k refers to the winning unit.
Once the winner k has been selected, the weights are updated according to:

w_k(t + 1) = w_k(t) + γ (x(t) − w_k(t)).

Note that only the weights of the winner are updated. The weight update above effectively implements a shift of the weight vector w_k towards the input vector x.
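The winner selection of Equation (1.1) and the winner-only shift update can be sketched as follows; the learning rate gamma is an assumed illustrative parameter:

```python
import numpy as np

# Euclidean winner selection (Eq. (1.1)) and the winner-only weight update.
def winner(weights, x):
    # index k of the weight vector closest to the input pattern x
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

def update(weights, x, gamma=0.5):
    k = winner(weights, x)
    weights[k] += gamma * (x - weights[k])  # only the winner is shifted
    return k, weights

w = np.array([[0.0, 0.0], [10.0, 10.0]])
k, w = update(w, np.array([1.0, 1.0]))
print(k, w[0])  # winner 0 moves halfway towards the input: [0.5 0.5]
```

Note that the second weight vector is left untouched, exactly as the text states.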
3.3. Cost Function
Earlier it was claimed that a competitive network performs a clustering process on the input data, i.e., input patterns are divided into disjoint clusters such that similarities between input patterns in the same cluster are much bigger than similarities between inputs in different clusters. Similarity is measured by a distance function on the input vectors, as discussed before. A common criterion to measure the quality of a given clustering is the square error criterion, given by

E = Σ_p ||w_{k^p} − x^p||²,   (1.4)

where k^p is the winning neuron when input x^p is presented. The weights w are interpreted as cluster centers. It is not difficult to show that competitive learning indeed seeks to find a minimum for this square error by following the negative gradient of the error function:
Theorem 3.1 The error function for pattern x^p,

E^p = (1/2) Σ_i (w_{ki} − x_i^p)²,

where k is the winning unit, is minimized by the weight update rule

Δw_k = γ (x^p − w_k).   (1.6)
Proof 1 We calculate the effect of a weight change on the error function. So we have that

Δ_p w_o = −γ ∂E^p/∂w_o,

where γ is a constant of proportionality. Now, we have to determine the partial derivative of E^p:

∂E^p/∂w_{io} = w_{io} − x_i^p   if unit o wins, and 0 otherwise,

such that

Δ_p w_{io} = γ (x_i^p − w_{io}),

which is Equation (1.6) written down for one element of w_o. Therefore, Equation (1.4) is minimised by repeated weight updates using Equation (1.6).
3.4. Winner Selection: Dot Product
For the time being, we assume that both input vectors x and weight vectors w_o are normalised to unit length. Each output unit o calculates its activation value y_o according to the dot product of input and weight vector:

y_o = Σ_i w_{io} x_i = w_o · x.   (1.10)

In a next pass, the output neuron k is selected with maximum activation:

y_k ≥ y_o   for all o.   (1.11)

Activations are reset such that y_k = 1 and y_{o≠k} = 0.
This is the competitive aspect of the network, and we refer to the output layer as the winner-take-all layer. The winner-take-all layer is usually implemented in software by simply selecting the output neuron with highest activation value.
We now prove that Equation (1.1) reduces to (1.10) and (1.11) if all vectors are normalised.
Proposition 3.2 Let x, w_k and w_o be normalised vectors, and let k be selected such that w_k · x ≥ w_o · x for all o. Then k also minimises the Euclidean distance ||w_o − x||.

Proof 2 Let x be a normalised input vector, and let the winning unit k be determined by (1.1), i.e., by the minimum of the quantity ||w_o − x||. Then we have that

||w_o − x||² = (w_o − x) · (w_o − x) = ||w_o||² − 2 w_o · x + ||x||² = 2 − 2 w_o · x,

where ||x|| = ||w_o|| = 1. Hence minimising the distance ||w_o − x|| is equivalent to maximising the dot product w_o · x.
The Euclidean distance norm is therefore a more general case of Equations (1.10) and (1.11).
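The equivalence for unit-length vectors can be checked numerically; the data here are random illustrative vectors, not from the paper:

```python
import numpy as np

# For unit-length vectors, minimising the Euclidean distance (Eq. (1.1))
# selects the same winner as maximising the dot product (Eqs. (1.10)-(1.11)),
# since ||x - w||^2 = 2 - 2 w.x when ||x|| = ||w|| = 1.
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 3))
w /= np.linalg.norm(w, axis=1, keepdims=True)  # normalise weight vectors
x = rng.normal(size=3)
x /= np.linalg.norm(x)                         # normalise the input

k_euclid = int(np.argmin(np.linalg.norm(w - x, axis=1)))
k_dot = int(np.argmax(w @ x))
print(k_euclid == k_dot)  # True
```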
It can be shown that this network converges to a situation where only the neuron with highest initial activation survives, whereas the activations of all other neurons converge to zero. Once the winner k has been selected, the weights are updated according to:

w_k(t + 1) = (w_k(t) + γ x(t)) / ||w_k(t) + γ x(t)||,   (1.20)

where the divisor ensures that all weight vectors are normalised. Note that only the weights of the winner are updated. The weight update given in Equation (1.20) effectively rotates the weight vector w_k towards the input vector x. Each time an input is presented, the weight vector closest to this input is selected and is subsequently rotated towards the input. Consequently, weight vectors are rotated towards those areas where many inputs appear: the clusters in the input.
Previously it was assumed that both inputs and weight vectors were normalised. Using the activation function given in Equation (1.10) gives a "biologically plausible" solution. Figure 2(b) shows how the algorithm would fail if unnormalised vectors were used.
An almost identical process of moving cluster centres is used in a large family of conventional clustering algorithms known as square-error clustering methods, e.g., k-means, FORGY, ISODATA and CLUSTER. From now on, we will simply assume that a winner k is selected, without being concerned which algorithm is used.
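A minimal competitive-learning run over a whole data set illustrates the cluster-centre movement that square-error methods such as k-means also perform; the data set, learning rate and initialisation are illustrative assumptions:

```python
import numpy as np

# Two well-separated clouds of input patterns; the winner-only update
# moves each weight vector (cluster centre) towards "its" cloud.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),    # cloud around (0, 0)
                  rng.normal(5.0, 0.1, (50, 2))])   # cloud around (5, 5)
w = data[[0, 50]].copy()   # initialise each centre to one sample per cloud

gamma = 0.1
for epoch in range(20):
    for x in rng.permutation(data):
        k = int(np.argmin(np.linalg.norm(w - x, axis=1)))  # winner
        w[k] += gamma * (x - w[k])                         # shift the winner

print(np.sort(w[:, 0]))   # the centres end up near 0 and 5
```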
4. The Inverse Problem of Fractals
The term fractal was coined by Mandelbrot in 1975 for such sets, from the Latin word fractus, meaning broken. He provided a precise technical definition: "a fractal is a set with Hausdorff dimension strictly greater than its topological dimension". Instead of giving a precise definition of fractals, which would almost certainly exclude some interesting cases, it is better to regard a fractal as a set that has the following properties:
1) Has a "fine" structure.
2) Has some type of self-similarity.
3) Is difficult to describe, globally or locally, by classic Euclidean geometry.
4) Usually has a non-integer dimension.
5) Is defined by a simple model that can be rendered recursively or iteratively.
6) Is usually a strange attractor for a dynamical system.
4.1. Iterated Function System
Barnsley in 1988 introduced the iterated function system (IFS) as an application of the theory of discrete dynamical systems and as a useful tool to build fractals and other self-similar sets. The mathematical theory of IFS is one of the bases of fractal modeling techniques and is a powerful tool for producing mathematical fractals
Figure 2. (a) Selection with normalised vectors; (b) the selection fails if unnormalised vectors are used.
such as the Cantor set, the Sierpinski gasket, etc., as well as for representing real-world fractals such as clouds, trees, faces, etc. An IFS is defined through a finite set of contractive affine mappings, mostly of the form:

w_i : R^n → R^n,   i = 1, …, N.

In the particular two-dimensional case, affine maps have the following form:

w(x, y) = (a x + b y + e, c x + d y + f).

This map can be characterized by the six constants a, b, c, d, e, f, which establish the code of the map.
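Such six-constant codes can be rendered with the "chaos game" (iterating a randomly chosen map); the Sierpinski gasket codes below are standard, while the rendering parameters are illustrative:

```python
import random

# The Sierpinski gasket as an IFS of three contractive affine maps,
# each coded by the six constants (a, b, c, d, e, f) as above.
MAPS = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]

def apply_map(code, x, y):
    a, b, c, d, e, f = code
    return a * x + b * y + e, c * x + d * y + f

random.seed(0)
x, y = 0.0, 0.0
points = []
for _ in range(10000):
    x, y = apply_map(random.choice(MAPS), x, y)  # chaos-game iteration
    points.append((x, y))

# all iterates stay inside the unit square, where the attractor lives
print(all(0.0 <= px <= 1.0 and 0.0 <= py <= 1.0 for px, py in points))  # True
```

Plotting the collected points would show the familiar gasket emerging after a few hundred iterations.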
4.2. Fractal Inverse Problem
The fractal inverse problem is an important research area with a great number of potential application fields. It consists of finding a fractal model or code that generates a given object. This concept was introduced by Barnsley with the well-known collage theorem. When the considered object is an image, we often speak about fractal image compression. A method was proposed by Jacquin to solve this kind of inverse problem.
4.3. The Collage Theorem
Since the number of points in fractal sets is infinite and their organization is complicated, it is difficult to specify exactly the generating IFS. From the practical point of view, it is acceptable for the required IFS to be chosen such that its attractor is close to a given image within a pre-defined tolerance.
The collage theorem is very useful for simplifying the inverse problem for fractal images, and it has been addressed by many researchers as well.
Theorem 4.1 Let {X; w_1, w_2, …, w_N} be a hyperbolic IFS with contractivity factor s, and let W be the associated Hutchinson map. Then, for any L ∈ H(X),

h(L, A) ≤ h(L, W(L)) / (1 − s),

where A is the fixed point (attractor) of W and h is the Hausdorff metric. Hence, if h(L, W(L)) ≤ ε, then

h(L, A) ≤ ε / (1 − s).
Proof 3 see .
The theorem can be used as follows. Given a fractal image L, find a set of contractive mappings that map L into smaller copies of itself such that the union of the smaller copies is as close as possible to the target image. The determined contractions are the IFS codes, with corresponding Hutchinson operator W. The theorem states that the attractor A of the determined IFS approximates the target image (i.e., A ≈ L). It also implies that the more accurately the IFS maps the image to itself, the more accurately the IFS approximates the image.
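The bound can be illustrated numerically on the classical middle-third Cantor IFS; the discretisation parameters are illustrative assumptions:

```python
import numpy as np

# Collage-theorem illustration for the Cantor IFS w1(x) = x/3,
# w2(x) = x/3 + 2/3 (contractivity s = 1/3), on discretised point sets.
def hausdorff(A, B):
    d = np.abs(A[:, None] - B[None, :])
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def W(L):  # Hutchinson operator of the Cantor IFS
    return np.concatenate([L / 3.0, L / 3.0 + 2.0 / 3.0])

L = np.linspace(0.0, 1.0, 2001)   # the "target": the whole unit interval
A = np.array([0.0, 1.0])          # iterate W to approximate the attractor
for _ in range(10):
    A = W(A)

s = 1.0 / 3.0
bound = hausdorff(L, W(L)) / (1.0 - s)   # collage bound: (1/6)/(2/3) = 1/4
print(hausdorff(L, A) <= bound)          # True: h(L, A) = 1/6 <= 1/4
```

The poor collage of the interval by the Cantor maps (error 1/6) correctly predicts, via the bound, how far the interval is from the Cantor attractor.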
5. Fractal Image Compression by Means of Kohonen Network
The Kohonen layer is a winner-take-all (WTA) layer. Thus, for a given input vector, only one Kohonen-layer output is 1, whereas all the others are 0. No training vector is required to achieve this behaviour; hence the name self-organizing map layer (SOM layer).
Let r be the vector of intensities of the range block R, and let d be the vector of intensities of the domain block D after it has been transformed (downsampled) to the size of the corresponding range block, so that r, d ∈ R^n. Let P be the operator of orthogonal projection onto the orthogonal complement of the linear envelope of the vector 1 = (1, 1, …, 1); for Pz ≠ 0 we shall define the operator:

φ(z) = P z / ||P z||.
Theorem 5.1 Assume that Pr ≠ 0 and Pd ≠ 0. Let us define the function g in the following way:

g(δ) = δ √(1 − δ²/4).

For Δ = ||φ(r) − φ(d)||, the minimum distance will be determined by the formula:

E(r, d) = min_{s,o} ||r − (s d + o 1)|| = ||P r|| g(Δ).
Proof 4 See .
This theorem means that the smaller the distance Δ, the smaller the error E. Let x = φ(r) and w = φ(d), which we call the range-vector and the domain-vector respectively.
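A small sketch of this preprocessing (array shapes and test data are assumed for illustration): removing the mean implements P, and a least-squares fit of contrast s and offset o gives the collage error that the theorem relates to the φ-distance:

```python
import numpy as np

def phi(z):
    pz = z - z.mean()              # P z: remove the component along 1
    return pz / np.linalg.norm(pz)

def collage_error(r, d):
    # least-squares fit of s and o in ||r - (s*d + o*1)||
    A = np.column_stack([d, np.ones_like(d)])
    (s, o), *_ = np.linalg.lstsq(A, r, rcond=None)
    return np.linalg.norm(r - (s * d + o))

rng = np.random.default_rng(2)
r = rng.normal(size=16)
d_close = 2.0 * r + 3.0            # affinely related: phi-distance ~ 0
d_far = rng.normal(size=16)        # unrelated domain vector

print(collage_error(r, d_close) < collage_error(r, d_far))       # True
print(np.linalg.norm(phi(r) - phi(d_close)) < 1e-8)              # True
```

As the theorem predicts, the domain whose φ-vector is closest to φ(r) also yields the smallest collage error.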
Let us apply the Kohonen network algorithm to cluster the domain vectors. In the beginning, the domain vectors train the network (learning); later, the range vectors are input to the network to choose the optimal domain.
5.1. Global Codebook
The idea of the global codebook is to assign a fixed domain pool to the entire range pool, or to a specific class of it (e.g. the set of range blocks that have the same size in a quadtree partition). In the local codebook, by contrast, each range block in the range pool has its own domain pool, which can be selected in many ways. One method of selecting the domain pool for a range block is to construct a set of domain blocks that are spatially close to the range block.
5.2. Ranges Filtering Algorithm
1) Set i = 0.
2) If the range R_i is not flat (its intensity variance exceeds a tolerance), go to 5.
3) Set the arguments of this range (position and the mean are enough) to the first codebook.
4) Omit R_i from the range pool and mark it as coded by its mean.
5) Set i = i + 1.
6) If i is less than the number of ranges, go to 2.
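The filtering step above can be sketched as follows, under the assumption that "flat" ranges (low intensity variance) are coded by position and mean alone; the threshold eps is an assumed parameter:

```python
import numpy as np

def filter_ranges(ranges, eps=1.0):
    codebook, remaining = [], []
    for i, block in enumerate(ranges):
        if np.var(block) < eps:
            codebook.append((i, float(block.mean())))  # position and mean
        else:
            remaining.append(i)          # left for fractal coding proper
    return codebook, remaining

flat = np.full((4, 4), 7.0)              # a perfectly flat range block
textured = np.arange(16.0).reshape(4, 4) # a block with real structure
cb, rest = filter_ranges([flat, textured])
print(cb, rest)   # flat block coded by its mean; block 1 remains
```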
5.3. (Training) Domains Clustering Algorithm
1) Network initialization. Let the neurons of the net be {c_j, w_j}, j = 1, …, M, where c_j is the cluster (neuron) number j and w_j is the weight vector of neuron number j, initialized to a random domain vector.
2) Search the nearest cluster for the current domain vector x: choose the winner neuron c_k by finding k such that ||w_k − x|| ≤ ||w_j − x|| for all j, and add the index of the vector x into the corresponding memory L_k.
3) Update the weight vector of the winning neuron in the following way: w_k(t + 1) = w_k(t) + γ (x − w_k(t)).
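The training phase above can be sketched as a one-layer winner-take-all net; the number of domains, the number of clusters M and the learning rate gamma are assumed illustrative values:

```python
import numpy as np

# Domain-clustering (training): weights initialised to random domain
# vectors, winner search by Euclidean distance, winner-only update,
# and a per-cluster memory of domain indices.
rng = np.random.default_rng(3)
domains = rng.normal(size=(100, 16))   # assumed: 100 domain vectors
M = 8                                  # assumed number of clusters
w = domains[rng.choice(len(domains), M, replace=False)].copy()
memory = [[] for _ in range(M)]        # per-cluster index lists L_k

gamma = 0.2
for i, x in enumerate(domains):
    k = int(np.argmin(np.linalg.norm(w - x, axis=1)))  # winner search
    memory[k].append(i)                                # remember the domain
    w[k] += gamma * (x - w[k])                         # update the winner

print(sum(len(m) for m in memory))   # every domain assigned once: 100
```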
Table 1. Results of the classic algorithm (without neural networks) vs the neurofractal.
5.4. (Encoding) Ranges Matching Algorithm
1) For each range vector x, find the winning cluster c_k, and let E_min = ∞ and j = 0.
2) Set d_best = 0.
3) Let E be the distance between x and the domain vector whose index is stored at position j of the memory L_k.
4) If E < E_min, then set E_min = E and d_best = L_k(j).
5) If all domains in L_k have been checked, go to 7.
6) Set j = j + 1 and go to 3.
7) Set the parameters of the best domain d_best to the codebook.
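The encoding phase can be sketched as follows; the cluster centres, memories and vector shapes are assumed illustrative data, not the paper's:

```python
import numpy as np

# Encoding: route each range vector to its winning cluster, then compare
# it only against the domains stored in that cluster's memory.
def encode(range_vecs, w, memory, domains):
    codebook = []
    for r in range_vecs:
        k = int(np.argmin(np.linalg.norm(w - r, axis=1)))  # winning cluster
        best = min(memory[k],
                   key=lambda i: np.linalg.norm(domains[i] - r))
        codebook.append((k, best))   # cluster and best-matching domain index
    return codebook

rng = np.random.default_rng(4)
domains = rng.normal(size=(20, 4))
w = domains[:3].copy()               # pretend: 3 trained cluster centres
memory = [list(range(0, 7)), list(range(7, 14)), list(range(14, 20))]
code = encode(rng.normal(size=(5, 4)), w, memory, domains)
print(len(code))   # one codebook entry per range vector: 5
```

The search cost per range drops from the size of the whole domain pool to the size of one cluster memory, which is the point of the clustering.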
6. Results and Discussion
Gray-level images have been considered for training the network. A range pool is created from ranges of fixed size, together with a pool of domain blocks.
The computer simulations have been carried out in Visual C# environment on Pentium Dual CPU with 1.73 GHz and 2.00 GB RAM and the results have been presented in Table 1.
Table 1 compares fractal image compression results for the standard scheme introduced in and for the self-organizing method. The neural method classifies domains using the self-organizing neural network approach; in each case, a total of 320 domain cells were used. A larger number of domains would have increased encoding times while providing only marginally better compression ratios. The self-organizing method is faster than the filtered-ranges method and therefore faster than the baseline method.
 Bouboulis, P., Dalla, P.L. and Drakopoulos, V. (2006) Image Compression Using Recurrent Bivariate Fractal Interpolation Surfaces. International Journal of Bifurcation and Chaos, 16, 2063-2071.
 Bressloff, P.C. and Stark, J. (1991) Neural Networks, Learning Automata and Iterated Function Systems. In: Crilly A.J., Earnshaw, R.A. and Jones, H., Eds., Fractals and Chaos, Springer-Verlag, 145-164.
 Saupe, D., Hamzaoui, R. and Hartenstein, H. (1996) Fractal Image Compression—An Introductory Overview. In: Saupe, D. and Hart, J., Eds., Fractal Models for Image Synthesis, Compression and Analysis, ACM, New Orleans, SIGGRAPH’96 Course Notes 27.