Researchers have been interested in the relationship between fractals and DNA structure for many years. Recently, Anitas and Slyamov  studied multiscale fractals representing DNA sequences using small-angle scattering analysis. Cattani and Pierro  carried out a multifractal analysis of binary images of DNA in order to define a methodological approach to the classification of DNA sequences. Badea and her collaborators  characterized the geometry of some medical images of tissues in terms of complexity parameters such as the fractal dimension (FD). Carlo Cattani presented an analysis of DNA based on the indicator matrix, together with an elementary approach to the fractal estimation of DNA sequences, in the book  edited by Elloumi and Zomaya. Albrecht-Buehler  explicitly identified the GA-sequences as a class of fractal genomic sequences. Ainsworth  investigated how the cell nucleus holds the molecules that manage human DNA in the right location. In a book edited by Crilly, Earnshaw and Jones, Voss applied standard spectral density measurement techniques to demonstrate the ubiquity of low-frequency noise and long-range fractal correlations.
The study of the genome, or of DNA sequences, through fractal analysis is of considerable interest. DNA sequences can be regarded as sequences over the alphabet $\Sigma = \{A, C, G, T\}$. Subsequences that never appear in a DNA sequence are called forbidden words. A method for visualizing the forbidden words      has been developed by B.-L. Hao and coworkers since 2000; this method is now called Hao’s frame representation. More recently, C.-X. Huang and S.-L. Peng discussed this method in detail and provided many illustrative graphics   . These geometric pictures suggest that forbidden words exhibit certain fractal properties. In this work we also generated several fractal graphs associated with DNA sequences containing forbidden words, as shown in Figure 1.
It is important to explore the mechanism that generates the fractals associated with the forbidden words in a sequence. H. J. Jeffrey   and P. Tiňo   tried to associate the forbidden words with IFS (Iterated Function Systems) via the chaos game algorithm. Denote by $\Sigma^*$ the set of all finite sequences over $\Sigma$. How, then, can one find a generating formula, i.e. a mapping  , where $w$ is a sequence that contains no forbidden subsequence, or a corresponding iteration method? As pointed out by P. Tiňo, the measure generated by such an IFS is multifractal, and therefore the generating formula would be relatively complicated.
In order to detect the structure of symbolic sequences, one needs to determine their topological and metric properties and to be able to visualize them. This requires a graphical representation that preserves these topological and metric properties, so that the corresponding fractal graphs are revealed directly. Such a representation method is therefore both important and necessary.
For an alphabet of cardinality 3, the well-known CGR (Chaos Game Representation) method was first introduced by M. F. Barnsley using points in an equilateral triangle, with the substrings of a string shown graphically (see  ). For an alphabet of cardinality 4, the CGR method was later generalized by H. J. Jeffrey so that DNA sequences could be visualized (see   ). Other authors have transformed DNA sequences into pseudo-random walks in the plane or in three-dimensional space    . Note also that an iterated function system can be used to construct a graphical representation of DNA sequences   . Points in the unit square then represent substrings of a DNA sequence, and the four vertices of the unit square are labelled with the four nucleotides.
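To make the four-letter chaos game concrete, here is a minimal Python sketch. It assumes the corner assignment A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0) and a start point at the centre of the square; both are illustrative conventions, not taken from the cited works. Each new point is the midpoint between the previous point and the vertex labelled by the current nucleotide, so the point reached after reading a string lies in the quadrant of its last letter.

```python
from typing import Dict, List, Tuple

# ASSUMED corner assignment for the four nucleotides (illustrative only).
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str, start: Tuple[float, float] = (0.5, 0.5)) -> List[Tuple[float, float]]:
    """Chaos Game Representation: one point per prefix of `seq`."""
    x, y = start
    points: List[Tuple[float, float]] = []
    for base in seq.upper():
        cx, cy = CORNERS[base]          # raises KeyError for non-ACGT symbols
        # Jump halfway towards the vertex labelled by this nucleotide.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ACGT"):
        print(point)
```

With this convention, `cgr_points("ACGT")` gives the points (0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125) and (0.78125, 0.40625).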
In applications, the frame representation method proposed by Hao et al. is more intuitive and visual   . For the $k$-frame, the unit square is divided by equally spaced vertical and horizontal lines into $4^k$ congruent small squares of side length $2^{-k}$ and area $4^{-k}$. For an alphabet of cardinality 4, each small square of side length $2^{-k}$ represents one string in $\Sigma^k$, the strings being arranged in a regular pattern (see the 1-, 2- and 3-frame graphs in Figures 2(a)-(c)).
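The $k$-frame indexing can be sketched in a few lines of Python. The letter-to-bit assignment below (A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0), giving a column bit and a row bit) is a hypothetical choice, and we let the first letter of a word select the coarsest quadrant, matching the order of quaternary digits used in Section 3. Hao et al. may use a different letter layout or reading order; for a fixed $k$, any other such convention only permutes the cells.

```python
from typing import Dict, Tuple

# HYPOTHETICAL bit assignment: letter -> (column bit, row bit).
BITS: Dict[str, Tuple[int, int]] = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def frame_cell(word: str) -> Tuple[int, int]:
    """Return (column, row) of `word` in the k-frame, with k = len(word).

    The unit square is split into 2**k x 2**k cells of side 2**-k. Each letter
    contributes one bit to the column and one to the row; the first letter
    supplies the most significant bits (coarsest quadrant).
    """
    col = row = 0
    for letter in word.upper():
        cbit, rbit = BITS[letter]
        col = 2 * col + cbit
        row = 2 * row + rbit
    return col, row


def cell_square(word: str) -> Tuple[float, float, float]:
    """Lower-left corner (x, y) and side length of the sub-square for `word`."""
    k = len(word)
    col, row = frame_cell(word)
    side = 2.0 ** -k
    return col * side, row * side, side
```

For instance, `frame_cell("GA")` returns `(2, 2)` and `cell_square("GA")` returns `(0.5, 0.5, 0.25)`: under this convention the word GA occupies the 2-frame cell whose lower-left corner is (0.5, 0.5).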
Figure 1. Graphs of some forbidden words.
Figure 2. The frame representation method of B.L. Hao et al. (a) 1-frame graph; (b) 2-frame graph; (c) 3-frame graph.
With the frame representation method of B.-L. Hao, the repetition structure of the subsequences (i.e., the strings in $\Sigma^k$) of a DNA sequence can be visualized easily and drawn efficiently. The short strings that are avoided or under-represented in a genome sequence form its forbidden words, and these forbidden words are what give rise to the fractal patterns in the constructed graphs.
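A forbidden word of length $k$ is then simply an empty cell of the $k$-frame. The sketch below counts the overlapping $k$-letter windows of a sequence, lists the words that never occur, and prints the $k$-frame as text ('#' for occurring words, '.' for absent ones). It uses a hypothetical letter-to-bit assignment (A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0)) with the first letter selecting the coarsest quadrant, and it treats "never occurs" as forbidden; detecting merely under-represented words would need a statistical threshold that is not attempted here.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGT"
# HYPOTHETICAL (column bit, row bit) per nucleotide; first letter = coarsest quadrant.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_counts(seq: str, k: int) -> Dict[str, int]:
    """Count every overlapping length-k substring of `seq` over ACGT."""
    counts = {"".join(w): 0 for w in product(ALPHABET, repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:              # skip windows containing N or other symbols
            counts[word] += 1
    return counts


def absent_words(seq: str, k: int) -> List[str]:
    """Length-k words that never occur in `seq` (forbidden words of length k)."""
    return sorted(w for w, c in kmer_counts(seq, k).items() if c == 0)


def frame_ascii(seq: str, k: int) -> str:
    """Render the k-frame as text: '#' for occurring words, '.' for absent ones."""
    n = 2 ** k
    grid = [["."] * n for _ in range(n)]
    for word, c in kmer_counts(seq, k).items():
        col = row = 0
        for letter in word:
            cb, rb = BITS[letter]
            col, row = 2 * col + cb, 2 * row + rb
        if c > 0:
            grid[n - 1 - row][col] = "#"    # first text line is the top row
    return "\n".join("".join(r) for r in grid)
```

For example, the sequence "ACGTACGT" contains only the 2-letter words AC, CG, GT and TA, so `absent_words("ACGTACGT", 2)` returns the remaining 12 words.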
P. Tiňo   proved the equivalence of the CGR method and the frame representation method of B.-L. Hao et al. He noted that the cardinality of the alphabet can be generalized to any square integer (i.e., $|\Sigma| = b^2$ for some integer $b$). In this paper we extend the above methods and relax the restriction on the cardinality of the alphabet.
The paper is organized as follows. In Section 2, we first convert the problem into a discussion of a certain type of generalized Cantor set, which naturally corresponds to a multifractal. In Section 3, we derive Hao’s frame representation from the one-to-one correspondence between the unit interval and the unit square  . Several examples of generalized Cantor sets, together with their fractal graphs, are given at the end of the paper.
2. Forbidden Words and the Generalized Cantor Set
Relabel the alphabet as $\Sigma = \{0, 1, 2, 3\}$. We first give the following definition.
Definition 2.1 Let  . Denote by $B$ the set consisting of $l$ finite sequences of length  :
Then the infinite sequences over $\Sigma$
are called DNA sequences with no forbidden words in $B$, also known as allowed sequences.
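It is well known that when $x \in [0,1]$ is expanded in ternary representation, $x = \sum_{i \ge 1} a_i 3^{-i}$, the subset of $[0,1]$ consisting of the points whose ternary digits satisfy $a_i \in \{0, 2\}$ for all $i$
is the (middle-thirds) Cantor set. Similarly, using the quaternary expansion, we give the following definition.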
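Definition 2.2 Let $x \in [0,1]$ be represented by its quaternary expansion $x = \sum_{i \ge 1} a_i 4^{-i}$ with $a_i \in \{0, 1, 2, 3\}$.
The set of all $x$ whose digit sequence $a_1 a_2 a_3 \cdots$ contains no word of $B$ is called the generalized Cantor set $C_G$.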
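Since each letter of the relabelled alphabet $\{0, 1, 2, 3\}$ can serve as a quaternary digit, every infinite sequence over this alphabet determines a point of $[0,1]$ through its quaternary expansion. Hence the study of DNA sequences (1), (2) that contain no forbidden words from $B$ can be converted into the study of the generalized Cantor set $C_G$.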
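Membership in $C_G$ can only be checked on a finite prefix of the expansion. The Python sketch below assumes our reading of Definition 2.2 (the quaternary digits of $x$ must avoid every word of $B$, with forbidden words written as digit strings such as "11"). It uses exact `Fraction` arithmetic and takes the greedy (terminating) expansion for quaternary rationals that have two expansions. A `True` result therefore only means that no forbidden word appears among the first $n$ digits; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Iterable, List


def base_digits(x: Fraction, n: int, base: int = 4) -> List[int]:
    """First n digits of x in [0, 1) in the given base (greedy expansion)."""
    if not 0 <= x < 1:
        raise ValueError("x must lie in [0, 1)")
    digits = []
    for _ in range(n):
        x *= base
        d = int(x)            # floor, since x >= 0
        digits.append(d)
        x -= d
    return digits


def avoids(digits: List[int], forbidden: Iterable[str]) -> bool:
    """True if no forbidden word (a string of digits, base <= 10) occurs in `digits`."""
    s = "".join(str(d) for d in digits)
    return not any(b in s for b in forbidden)


def in_generalized_cantor_prefix(x: Fraction, B: Iterable[str], n: int) -> bool:
    """Check the first n quaternary digits of x against the forbidden set B."""
    return avoids(base_digits(x, n), B)
```

For example, $1/3$ has quaternary digits $1, 1, 1, \ldots$, so `in_generalized_cantor_prefix(Fraction(1, 3), ["11"], 10)` returns `False`.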
Let , , , and
Then, the condition in Definition 2.2 can be rewritten as
Theorem 2.1 The generalized Cantor set $C_G$ can be generated by an iterative method.
Proof. At each step of the quaternary expansion of $x$, we have
Substituting (7) into (6), we obtain
In general, we let
and, letting the number of steps tend to infinity, we obtain the generalized Cantor set $C_G$ of Definition 2.2.
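The iteration in the proof can be mirrored computationally: at step $n$, keep the intervals of length $4^{-n}$ whose address (the first $n$ quaternary digits) contains no word of $B$, and refine each kept interval into four children. The sketch below is our own illustration of that idea, not a transcription of the paper’s iteration formulas. It tests only the suffix ending at the new digit, which suffices because the parent word was already checked. The hypothetical `base` parameter is included so that, for instance, base 3 with $B = \{1\}$ reproduces the steps of the classical Cantor set.

```python
from fractions import Fraction
from typing import Iterable, List, Tuple


def allowed_words(B: Iterable[str], n: int, base: int = 4) -> List[str]:
    """Digit strings of length n (digits 0..base-1, base <= 10) avoiding every word of B."""
    forbidden = [b for b in B if b]     # ignore empty words
    words = [""]
    for _ in range(n):
        refined = []
        for w in words:
            for d in range(base):
                cand = w + str(d)
                # w already avoids B, so only a match ending at the new digit can appear.
                if not any(cand.endswith(b) for b in forbidden):
                    refined.append(cand)
        words = refined
    return words


def level_intervals(B: Iterable[str], n: int, base: int = 4) -> List[Tuple[Fraction, Fraction]]:
    """Closed intervals of length base**-n kept at step n of the construction."""
    h = Fraction(1, base ** n)
    intervals = []
    for w in allowed_words(B, n, base):
        left = sum((Fraction(int(c), base ** (i + 1)) for i, c in enumerate(w)), Fraction(0))
        intervals.append((left, left + h))
    return intervals
```

For instance, `level_intervals(["1"], 1, base=3)` returns the two intervals $[0, 1/3]$ and $[2/3, 1]$, the first step of the classical construction.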
Here  ,  are intervals in $[0,1]$ of length  . By the iteration (7) in the proof, the iteration acts differently on the $l$ subintervals than on the remaining intervals. Hence we have  .
Corollary 2.3 The generalized Cantor set is multifractal.
Proof. In the construction of the generalized Cantor set $C_G$, the measure on each removed portion is repeatedly redistributed to the neighboring sections. Thus $C_G$ is multifractal.
Clearly, the construction of generalized Cantor sets applies equally to base-$p$ representations for any integer $p \ge 2$.
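Because the construction works in any base $p$, a rough numerical check is easy. The sketch below counts the number $N_n$ of base-$p$ digit strings of length $n$ that avoid $B$ (dynamic programming over the last few digits) and returns the standard box-counting estimate $\log N_n / (n \log p)$ at scale $p^{-n}$. This generic estimate for digit-restricted sets is offered as an illustration, not as a method taken from the paper, and for some $B$ its convergence in $n$ is slow. With $p = 3$ and $B = \{1\}$ one has $N_n = 2^n$, recovering the classical Cantor dimension $\log 2 / \log 3 \approx 0.6309$.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable


def count_allowed(B: Iterable[str], n: int, base: int = 4) -> int:
    """N_n: number of length-n digit strings in the given base (<= 10) avoiding B."""
    forbidden = [b for b in B if b]
    longest = max((len(b) for b in forbidden), default=1)
    keep = longest - 1                  # trailing digits that can take part in a new match
    counts: Dict[str, int] = {"": 1}    # last `keep` digits -> number of strings
    for _ in range(n):
        nxt: Dict[str, int] = defaultdict(int)
        for tail, c in counts.items():
            for d in range(base):
                window = tail + str(d)
                if any(window.endswith(b) for b in forbidden):
                    continue            # a forbidden word ends at the new digit
                nxt[window[-keep:] if keep else ""] += c
        counts = nxt
    return sum(counts.values())


def box_dimension_estimate(B: Iterable[str], n: int, base: int = 4) -> float:
    """Box-counting estimate log(N_n) / (n log base) at scale base**-n (n >= 1)."""
    N = count_allowed(B, n, base)
    if N == 0:
        raise ValueError("no allowed strings of this length: the set is empty")
    return math.log(N) / (n * math.log(base))
```

For example, forbidding the single quaternary word "11" leaves 57 of the 64 strings of length 3, so `count_allowed(["11"], 3)` returns 57.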
3. Hao’s Frame Representation of the Generalized Cantor Set $C_G$
The theoretical foundation of this construction for DNA sequences can be found in  . The subintervals arising in the quaternary expansion of $x$ correspond one-to-one to the subsquares obtained by repeatedly dividing the unit square (and each of its subsquares) into 4 equal subsquares. Cantor sets are constructed in one dimension, in $[0,1]$, while Sierpinski sets are constructed in two dimensions, within $[0,1]^2$. Using this correspondence between the unit interval and the unit square, we can convert the study of generalized Cantor sets into the study of generalized Sierpinski sets on the unit square.
Let . The binary expansion of is
The expansion can be related to the quaternary expansion of as follows:
Thus the forbidden words in can be represented as
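The link between the quaternary digits of $x$ and the binary digits of a point of the unit square can be illustrated by digit interleaving. We assume here the convention $a_i = 2u_i + v_i$, where $u_i$ and $v_i$ are the $i$th binary digits of two coordinates $y, z \in [0, 1)$; the names $y$, $z$, $u_i$, $v_i$ and this particular digit order are our own assumptions, and the paper’s relation may pair the digits differently. Under any such convention, the first $n$ quaternary digits address exactly one sub-square of side $2^{-n}$.

```python
from fractions import Fraction
from typing import List, Tuple


def quaternary_to_square(digits: List[int]) -> Tuple[Fraction, Fraction]:
    """Map quaternary digits a_1..a_n to the lower-left corner (y, z) of the
    corresponding sub-square of side 2**-n.

    ASSUMED convention: a_i = 2*u_i + v_i, where u_i is the i-th binary digit
    of y and v_i is the i-th binary digit of z.
    """
    y = z = Fraction(0)
    for i, a in enumerate(digits, start=1):
        if not 0 <= a <= 3:
            raise ValueError("quaternary digits must be 0..3")
        u, v = divmod(a, 2)             # a = 2*u + v
        y += Fraction(u, 2 ** i)
        z += Fraction(v, 2 ** i)
    return y, z


def square_to_quaternary(y: Fraction, z: Fraction, n: int) -> List[int]:
    """Inverse map for y, z in [0, 1): read n binary digits of each and recombine."""
    digits = []
    for _ in range(n):
        y, z = 2 * y, 2 * z
        u, v = int(y), int(z)           # next binary digit of each coordinate
        y, z = y - u, z - v
        digits.append(2 * u + v)
    return digits
```

For example, the digits 3, 0, 1 map to $(y, z) = (1/2, 5/8)$, and reading three binary digits of that point back gives 3, 0, 1 again.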
Definition 3.1 Let  have the binary expansion (10). We call
the generalized Sierpinski set corresponding to the generalized Cantor set $C_G$.
Theorem 3.1 The generalized Sierpinski set can be generated by an iterative method.
Proof. At each step, the binary expansion of  is
Substituting (15) into (14), we obtain
From the correspondence between numbers and subsquares, Hao’s frame representation follows naturally. The second-order Hao’s frame representation can be derived from the correspondence illustrated in Figure 3.
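Putting the forbidden-word condition and the interval-square correspondence together, a level-$n$ approximation of the generalized Sierpinski set can be drawn as text: enumerate the quaternary words of length $n$ that avoid $B$ and mark the sub-square that each one addresses. The sketch assumes the digit convention $a_i = 2u_i + v_i$ (column bit $u_i$, row bit $v_i$, first digit selecting the coarsest quadrant), which may differ from the correspondence shown in Figure 3. With $B = \{3\}$ the upper-right quadrant is removed at every scale, giving a Sierpinski-triangle-like pattern.

```python
from typing import Iterable, List


def allowed_words(B: Iterable[str], n: int) -> List[str]:
    """Quaternary strings of length n that contain no word of B."""
    forbidden = [b for b in B if b]
    words = [""]
    for _ in range(n):
        # A parent word already avoids B, so only check matches ending at the new digit.
        words = [w + d for w in words for d in "0123"
                 if not any((w + d).endswith(b) for b in forbidden)]
    return words


def sierpinski_grid(B: Iterable[str], n: int) -> List[str]:
    """Level-n approximation of the generalized Sierpinski set as lines of text.

    A cell is marked '#' when the quaternary word addressing it avoids B.
    ASSUMED convention: a_i = 2*u_i + v_i, u_i -> column bit, v_i -> row bit.
    """
    size = 2 ** n
    grid = [["."] * size for _ in range(size)]
    for w in allowed_words(B, n):
        col = row = 0
        for ch in w:
            u, v = divmod(int(ch), 2)
            col, row = 2 * col + u, 2 * row + v
        grid[size - 1 - row][col] = "#"     # first text line is the top row
    return ["".join(r) for r in grid]


if __name__ == "__main__":
    # Forbid the single digit 3: the upper-right quadrant disappears at every scale.
    print("\n".join(sierpinski_grid(["3"], 3)))
```

At level 1, `sierpinski_grid(["3"], 1)` returns `["#.", "##"]`, and each further level repeats this pattern inside every marked cell.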
The following examples illustrate the analytic structure of some DNA sequences, together with the fractal graphs of the corresponding generalized Cantor sets.
Example 3.2 Let , . Then . Hence the arithmetic expression of the generalized Cantor set is
The corresponding symbolic sequence is
which is shown graphically in Figure 4.
Example 3.3 Let , . Then . Hence the arithmetic expression of the generalized Cantor set is
The corresponding symbolic sequence is
which is shown graphically in Figure 5.
Example 3.4 Let , . Then . Hence the arithmetic expression of the generalized Cantor set is
The corresponding symbolic sequence is
which is shown graphically in Figure 6.
Figure 3. Hao’s frame representation of .
Figure 4. .
Figure 5. .
Figure 6. .
Figure 7. Other examples.
We have established relations between generalized Cantor sets and DNA sequences with missing (forbidden) words, and we have associated Hao’s frame representation and the generalized Sierpinski set with the generalized Cantor sets. The authors are interested in applying this analytical representation method to study the graphical results of research on space-filling curves (cf.    ).
 Anitas, E.M. and Slyamov, A. (2017) Structural Characterization of Chaos Game Fractals Using Small-Angle Scattering Analysis. PLoS ONE, 12, e0181385.
 Badea, A.F., et al. (2013) Fractal Analysis of Elastographic Images for Automatic Detection of Diffuse Diseases of Salivary Glands: Preliminary Results. Computational and Mathematical Methods in Medicine, 2013, Article ID: 347238.
 Hao, B.-L., Xie, H.-M. and Chen, G.-Y. (2000) Factorizable Language: From Dynamics to Bacterial Complete Genomes. Physica A: Statistical Mechanics and Its Applications, 288, 10-20.
 Hao, B.-L., Xie, H.-M., Yu, Z.-G. and Chen, G.-Y. (2000) Avoided Strings in Bacterial Complete Genomes and a Related Combinatorial Problem. Annals of Combinatorics, 4, 247-255.
 Yu, Z.-G., Hao, B.-L., Xie, H.-M. and Chen, G.-Y. (2000) Dimensions of Fractals Related to Languages Defined by Tagged Strings in Complete Genomes. Chaos, Solitons & Fractals, 11, 2215-2222.
 Huang, C.X. and Peng, S.L. (2008) Fractals of Forbidden Words and Approximating Their Box Dimensions. Physica A: Statistical Mechanics and Its Applications, 387, 703-716.
 Tiňo, P. (1999) Spatial Representation of Symbolic Sequences through Iterative Function Systems. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 29, 386-393.
 Tiňo, P. (2002) Multifractal Properties of Hao’s Geometric Representations of DNA Sequences. Physica A: Statistical Mechanics and Its Applications, 304, 480-494.
 Berthelsen, C.L., Glazier, J.A. and Skolnik, M.H. (1992) Global Fractal Dimension of Human DNA Sequences Treated as Pseudorandom Walks. Physical Review A, 45, 8902-8913.
 Stanley, H.E., Buldyrev, S.V., Goldberger, A.L., Goldberger, Z.D., Havlin, S., Mantegna, R.N., Ossadnik, S.M., Peng, C.K. and Simons, M. (1994) Statistical Mechanics in Biology: How Ubiquitous Are Long-Range Correlations? Physica A: Statistical Mechanics and Its Applications, 205, 214-253.
 Solovyev, V.V., Korolev, S.V. and Lim, H.A. (1993) A New Approach for the Classification of Functional Regions of DNA Sequences Based on Fractal Representation. International Journal of Genomic Research, 1, 109-128.
 Bagga, S., Girdhar, A., Trivedi, M.C. and Yang, Y.Z. (2016) RMI Approach to Cluster Based Cache Oblivious Peano Curves. 2016 Second International Conference on Computational Intelligence & Communication Technology, Ghaziabad, India, 12-13 February 2016, 89-95.
 Platos, J., Kromer, P. and Snasel, V. (2015) Efficient Area Association Using Space Filling Curves. 2015 International Conference on Intelligent Networking and Collaborative Systems, Taipei, 2-4 September 2015, 322-326.