Proteins are called the workhorse molecules of life, playing a crucial role in essentially every activity of living organisms. A protein molecule is made from one or more long chains of amino acids, which normally fold into a well-defined three-dimensional structure. It is the precise shape of the folded structure that determines the function of a protein in a cell.
Most cellular processes are not carried out by random collisions between freely diffusing proteins. Proteins usually interact with other proteins and assemble into complexes to carry out their function. Understanding and controlling the formation of protein complexes is therefore crucial for understanding biological activity in the cell. In particular, structural characterization of the components of complexes, such as shape complementarity at protein-protein interfaces, is the key to understanding the function of proteins.
In the last two decades, a huge number of protein structures have been determined experimentally via high-throughput structural genomics pipelines. However, experimental determination of their functions has lagged far behind because the process is labor-intensive and time-consuming. Improved computational approaches to predicting the function of proteins with known structure are urgently needed.
It is, however, extremely difficult to describe the shape of proteins without visual inspection on a three-dimensional display. The fundamental question is how to describe the geometry of shapes as complicated as those of proteins.
In most previous studies, the surface of proteins is described using concepts developed in computational geometry and topology, such as the Voronoi diagram, Delaunay simplices, and the alpha shape representation. As for protein complexes, the topological arrangement of their subunits is usually represented as a graph.
Figure 1. The Hamiltonian cycle problem on a regular triangular mesh. (a) A region in a regular triangular lattice; (b) A Hamiltonian cycle through the region.
Figure 2. An extended version of the Hamiltonian cycle problem on a regular triangular mesh: de novo design of complexes of closed trajectories of triangles. Shown are all three sets of closed trajectories of triangles that cover the specified region. In this case, the region has no Hamiltonian cycle.
In this paper, we propose a novel mathematical toy model intended for the structural study of protein complexes. While physics and mathematics have inspired each other throughout their long relationship, the relationship between biology and mathematics is still in its early stages. In our case, the relationship is between real protein complexes and the new mathematical toy model. It is therefore critical to justify why such new toy models are relevant and practically useful.
To illustrate the usefulness of mathematical tools in biology, we mention the case of the Estrada index, introduced by Ernesto Estrada in 2000. The Estrada index was originally proposed as a molecular structure descriptor, and over the past decade the Estrada index and the normalized Laplacian Estrada index have been used extensively in mathematics to investigate protein structure. The Estrada indices have also found a range of applications in chemistry and complex networks. More recently, dynamic versions of the Estrada indices have been proposed to study large-scale time-evolving networks, which arise naturally in areas ranging from peer-to-peer telecommunication to online human social behavior to neuroscience.
As for other mathematical approaches to protein structure analysis, most of them apply known mathematical techniques to the structural study of proteins, such as distance geometry, knot theory, and persistent homology. Differential geometric techniques have also been applied to the analysis of the backbone structure of proteins.
In our model, instead of open chains of amino acids, we consider closed trajectories of n-simplices using the discrete differential geometry of n-simplices ( ). The interaction of open chains of amino acids (i.e., proteins) is then mimicked by “recombination”, such as fusion and fission, of closed chains of n-simplices. The advantage of our model lies in the correspondence between the shape of a complex of closed trajectories of n-simplices and (a projection image of) the intersection of a pair of (n+1)-dimensional cones.
Using the mathematical toy model, we consider the problem of designing protein complexes from scratch (de novo design of protein complexes). That is, we consider the problem of finding a set of closed trajectories of n-simplices that forms a specified n-dimensional shape: de novo design of complexes of closed trajectories of n-simplices. For simplicity, we consider the case of n = 2 only.
The problem we consider here is an extended version of the Hamiltonian cycle problem on a regular triangular mesh. A Hamiltonian cycle of triangles (i.e., 2-simplices) is a closed trajectory through a given triangular mesh that visits each triangle exactly once, where the trajectory passes from one triangle to the next through a common edge. As shown in Figure 1(a), meshes are given as a region in a two-dimensional regular triangular lattice. In this case, a Hamiltonian cycle is obtained as shown in Figure 1(b).
To study the formation of a complex of closed trajectories of triangles, we consider not only a single closed trajectory but also multiple closed trajectories of triangles covering the given region. In the case of Figure 2, two closed trajectories are required.
In what follows, we will propose a novel method for finding all the sets of closed trajectories which cover a given region of triangles.
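To make the combinatorial problem concrete, the following is a minimal brute-force sketch, not the cone-based method developed in this paper. It assumes a hypothetical tile encoding `(i, j, t)` (with `t = 0` for an upward and `t = 1` for a downward triangle) and treats a cover by closed trajectories as a choice of shared edges in which every triangle is entered and left exactly once. The function names are illustrative only.

```python
def neighbors(tile):
    """Edge-adjacent triangles of a tile (i, j, t) in a regular triangular lattice.

    Assumed encoding (not from the paper):
    t = 0 is the upward triangle with vertices (i, j), (i+1, j), (i, j+1);
    t = 1 is the downward triangle with vertices (i+1, j), (i, j+1), (i+1, j+1).
    """
    i, j, t = tile
    if t == 0:
        return [(i, j, 1), (i - 1, j, 1), (i, j - 1, 1)]
    return [(i, j, 0), (i + 1, j, 0), (i, j + 1, 0)]


def cycle_lengths(tiles, edges):
    """Sizes of the connected components of the chosen edge set, sorted."""
    adjacent = {t: [] for t in tiles}
    for a, b in edges:
        adjacent[a].append(b)
        adjacent[b].append(a)
    seen, lengths = set(), []
    for start in tiles:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            current = stack.pop()
            size += 1
            for nxt in adjacent[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        lengths.append(size)
    return sorted(lengths)


def closed_trajectory_covers(region):
    """Yield, for every cover of `region` by disjoint closed trajectories,
    the sorted list of trajectory lengths (exhaustive backtracking)."""
    tiles = sorted(set(region))
    tile_set = set(tiles)
    edges = sorted({tuple(sorted((a, b)))
                    for a in tiles for b in neighbors(a) if b in tile_set})
    degree = {t: 0 for t in tiles}    # chosen edges per tile
    pending = {t: 0 for t in tiles}   # undecided edges per tile
    for a, b in edges:
        pending[a] += 1
        pending[b] += 1
    chosen = []

    def search(k):
        if k == len(edges):
            if all(d == 2 for d in degree.values()):
                yield cycle_lengths(tiles, chosen)
            return
        a, b = edges[k]
        pending[a] -= 1
        pending[b] -= 1
        for use in (True, False):
            if use:
                degree[a] += 1
                degree[b] += 1
                chosen.append((a, b))
            # Each tile must still be able to end with exactly two chosen edges.
            if all(degree[x] <= 2 and degree[x] + pending[x] >= 2 for x in (a, b)):
                yield from search(k + 1)
            if use:
                chosen.pop()
                degree[a] -= 1
                degree[b] -= 1
        pending[a] += 1
        pending[b] += 1

    yield from search(0)


# The six triangles around one lattice vertex form a hexagonal region.
hexagon = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1)]
print(list(closed_trajectory_covers(hexagon)))  # [[6]]
```

A region has a Hamiltonian cycle exactly when some cover consists of a single length, and it has no decomposition at all when the generator yields nothing. The search is exponential in the number of shared edges, so it is only meant for small regions.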
3. Differential Structure on the Mesh
To define a differential structure on a regular triangular mesh, we stack unit cubes diagonally in three-dimensional Euclidean space (Figure 3(a)).
By piling up unit cubes in an orderly manner along the diagonal direction, a “mountain range-like shape object” consisting of multiple triangular cones is obtained, as shown in the upper part of Figure 3(c). If we draw a thick straight line diagonally on the three upper faces of each unit cube, we obtain a “drawing” on the slope of the mountain range-like shape object (Figure 3(a) and Figure 3(d)). It is this drawing that specifies a flow of “slant” triangles (along the thick polygonal lines) on the slope.
Then, we define a flow of “flat” triangles on a plane perpendicular to the diagonal direction by projecting the flow of “slant” triangles onto the plane (the lower part of Figure 3(c)). In the case of Figure 3(c), we obtain a closed trajectory of flat triangles of length 30, among others. In this section we give the precise definition of the differential structure on a regular triangular mesh.
To save space, we use monomials in indeterminates and to represent the coordinates of points in three-dimensional Euclidean space. For example, point is identified with monomial , where denotes the set of all integers. Then, points , and are represented by monomials , and respectively. (Note that for all pairs of i and j.)
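The symbols of this convention are missing from the present text. One reading that is consistent with the surrounding sentences is sketched below. The indeterminate names $x_1, x_2, x_3$ and the choice of the three example points are assumptions made here for illustration, not symbols confirmed by the source.

```latex
\[
  (l, m, n) \in \mathbb{Z}^3 \;\longleftrightarrow\; x_1^{l}\, x_2^{m}\, x_3^{n},
  \qquad
  (1,0,0) \mapsto x_1,\quad (0,1,0) \mapsto x_2,\quad (0,0,1) \mapsto x_3,
  \qquad
  x_i x_j = x_j x_i .
\]
\[
  x_1^{l} x_2^{m} x_3^{n} \cdot x_1^{l'} x_2^{m'} x_3^{n'}
  = x_1^{l+l'} x_2^{m+m'} x_3^{n+n'} .
\]
```

Under this reading, multiplying monomials corresponds to adding lattice vectors, which is why the indeterminates are required to commute.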
3.1. Triangle Tiles
Shown in the upper part of Figure 3(a) is a unit cube with a thick straight line drawn diagonally on each of the upper three faces, which is located at the origin O of a three-dimensional Cartesian coordinate system defined by three axes , and . Let , , , and . Then, the upper face on the -plane is divided into two “slant triangles”, and , by the line segment . The other upper faces are also divided into two “slant triangles” similarly.
Figure 3. Differential structure on a regular triangular mesh. (a) A unit cube and its projection on a plane perpendicular to the diagonal direction; (b) The projection of “slant triangles” onto a “flat triangle”; (c) A “mountain range-like shape object” obtained by piling up unit cubes orderly along the diagonal direction, whose peaks are , , , and ; (d) A “drawing” on the slope of the mountain range-like shape object of (c).
Shown in the lower part of Figure 3(a) is the projected image of the unit cube on a plane perpendicular to the diagonal direction. The unit cube at O is projected onto a hexagon, which is divided into six “flat triangles” by the image of the three thick line segments on the cube.
The schematic drawing of Figure 3(b) shows the projection of slant triangles onto a flat triangle. Using the projection, we will define a discrete differential structure on the set of flat triangles, i.e., a regular triangular mesh.
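The projection can be checked on lattice points with a short sketch. It assumes that the projection direction is the main diagonal (1, 1, 1) of the cube, so that two lattice points have the same image exactly when they differ by a multiple of (1, 1, 1). The function name `project` and the oblique coordinates `(l - n, m - n)` are illustrative choices, not notation from the paper.

```python
from itertools import product


def project(point):
    """Representative of a lattice point modulo shifts along (1, 1, 1).

    The pair (l - n, m - n) is unchanged when (1, 1, 1) is added to the point,
    so it labels the image of the point on the projection plane
    (in oblique lattice coordinates, not Euclidean ones).
    """
    l, m, n = point
    return (l - n, m - n)


cube_vertices = list(product((0, 1), repeat=3))
images = {project(v) for v in cube_vertices}
center = project((0, 0, 0))          # (0, 0, 0) and (1, 1, 1) share this image
outline = sorted(images - {center})  # the six corners of the hexagon

print(len(images))  # 7
print(outline)      # [(-1, -1), (-1, 0), (0, -1), (0, 1), (1, 0), (1, 1)]
```

The eight cube vertices collapse to seven images: the bottom and top vertices of the cube both land on the center of the hexagon, and the remaining six vertices form its outline.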
Let be the symmetric group on a finite set of three symbols. For and , let denote the convex hull of three points , and , i.e.,
where denotes the set of all real numbers.
For example, the “slant triangle” defined above is denoted by , where and .
Definition 3.1 We define the set of all slant (triangle) tiles by
The set of all flat (triangle) tiles is defined as the quotient of by “shift operator” , i.e.,
We identify with the projection image of “slant triangles” on a plane perpendicular to vector mentioned above. Then, the schematic drawing of Figure 3(b) shows the equivalence class of a slant tile and the corresponding flat tile .
3.2. Tangent Space at a Flat Triangle Tile
A tangent bundle-like local structure is defined on by
Let . Then, we obtain
Definition 3.3 (Tangent space) For , we call the tangent space of at .
Definition 3.4 For , the gradient of is defined by
Then, we can identify with
via the one-to-one correspondence
Note that the monomial of corresponds to the direction of the thick line on the “slant triangle” which is described in subsection 3.1 above (Figure 3).
3.3. Vector Field on B2
Having defined a tangent bundle-like structure on a set of triangles, we now consider the inverse of the projection map .
Definition 3.5 A section of is a map such that
For a section of , the value of on is given by
for some . Let and be two adjacent slant tiles of in defined by
Figure 4. Local trajectory. (a) The local trajectory specified by . and ; (b) The smoothness condition on a section . Colored gray is and colored white is . Shown above are the gradient of the white tile. The gradient of the gray tile is ; (c) Smooth sections of on a hexagonal region composed of six flat tiles; (d) Sections of that do not satisfy the smoothness condition. The corresponding singular flat tiles are colored gray in the lower part.
Then, the set of three slant tiles, , makes up a “continuous mountain path” along the thick polygonal line (i.e., along the gradient ) at in (Figure 4(a)). By projecting these slant tiles on , we obtain a trajectory of flat tiles of length three at .
To consider the “smoothness” of the section , we first define a local trajectory passing through as follows.
Definition 3.6 Let . The local trajectory specified by is the set
of three consecutive flat tiles passing through .
Let be a section on . Then, ( ) can assume one of the three values of the corresponding tangent space . For example, can assume one of the three values of
where and .
However, some of the slant tiles are not connected smoothly to in . In this case,
is not connected smoothly to as shown in Figure 4(b).
To obtain a “smooth” trajectory, we will impose a condition on sections of .
Definition 3.7 (Smoothness condition) Let be a section of and . Let , where . The smoothness condition on at is defined by
In what follows, we only consider sections of that satisfy the smoothness condition at every flat tile of .
Remark corresponds to (the direction of) the contact edge between and . corresponds to (the direction of) the contact edge between and .
Definition 3.8 (Vector field) A vector field on is a section of that satisfies the smoothness condition at every flat tile of .
Shown in Figure 4(c) are all the six types of “local” smooth sections of on a hexagonal region composed of six flat tiles of . By patching these “local” sections together, we will obtain a “global” section of .
Note that some of the “global” sections do not satisfy the smoothness condition as shown in Figure 4(d). A singular flat tile of a section of is a flat tile where does not satisfy the smoothness condition. A singular flat tile is assigned either no gradient (i.e., no thick edge), two gradients (i.e., two thick edges), or three gradients.
Let be a trajectory of a vector field V, where I is a subset of the set of natural numbers. Then, we can define the second derivative of the trajectory as follows.
Definition 3.9 The second derivative of V along is a binary-valued (U or D) function defined by
where and .
In  , the conformation of a protein backbone structure is encoded into a 16-valued sequence using the second derivative of trajectories of tetrahedrons (i.e., 3-simplices).
3.4. Vector Fields Induced by Tangent Cones
In the beginning of this section, we constructed a “mountain range-like shape object” by piling up unit cubes diagonally. (Using the terminology defined above, it is a section of .)
Unit cubes are piled up to form a union of triangular cones, which can be identified by its top vertexes. For example, the object shown in the upper part of Figure 3(c) is identified by five peaks , , , , and .
Definition 3.10 For , a tangent cone is defined by
We denote the set of all the “top vertexes” of by .
Then, the mountain range-like shape object of Figure 3(c) is given by
For a tangent cone c, let be the set of all the slant tiles on the surfaces of c, i.e.,
For and ( ), set
Then, is given as follows.
Lemma 3.11 For ( ),
Proof. Let . Then, for some . is the coordinate of z with respect to “origin” p. In particular,
The result follows immediately.
The surfaces of a tangent cone c induce a vector field of as follows.
Definition 3.12 For ( ), a vector field induced by c is defined by
where such that . The value is uniquely determined at every flat tile of .
For example, in Figure 3(c), the thick polygonal lines on the surfaces of the tangent cone show the vector field induced by the tangent cone.
Note that all the smooth sections shown in Figure 4(c) are induced by a tangent cone as indicated in the figure.
Proposition 3.13 For any vector field V, there exists a tangent cone c such that .
Proof. Let V be a vector field on and let be a decomposition of into hexagonal regions of six flat tiles:
For , we let denote the restriction of V on the hexagon .
Because of the smoothness condition, V is locally induced by a tangent cone as shown in Figure 4(c). That is, there exists a tangent cone for each such that , i.e.,
Moreover, by considering all combinations, we can assume
for any pair of adjacent hexagons and , where denotes the union of two tangent cones and , i.e., .
Suppose that for . Then,
In particular, such that
In other words, tangent cone is (partially) covered by tangent cone on .
Then, there exists a circular loop of hexagons of around such that
where is the set of all the hexagons of contained in the circular region surrounded by . Because of the shape of the tangent cones, such that is adjacent to and is (partially) covered by on , i.e.,
which is in contradiction to Equation (1).
4. The Boundary of a Closed Trajectory
We now return to the problem described in Section 2. Using the terminology of Section 3, the problem is stated as follows.
Problem 4.1. (De novo design of complexes of closed trajectories of triangles) Given a region in , find all the vector fields on which give a decomposition of the region into closed trajectories.
If there exists such a vector field, we can describe the boundary of the region using a pair of three-dimensional cones as explained in this section.
The cones are defined in another lattice which is associated with . Recall that the three-dimensional lattice is generated by , and . The associated lattice is defined as follows.
Definition 4.2. The conjugate lattice is the lattice which is generated by , and .
Note that the gradient of a slant tile corresponds to one of the three coordinate axes of the conjugate lattice . In particular, a trajectory of slant tiles corresponds to a zig-zag walk (with gaps) on the grid of .
Two types of cones are defined in :
Definition 4.3. For , a cotangent cone is defined by
For , a cotangent roof is defined by
In other words, is obtained by putting as many unit cubes as possible on .
For example, shown in Figure 5(b) is
Figure 5. Cotangent roofs associated with a closed trajectory on . (a) The boundary of the closed trajectory of Figure 3(c); (b) The cotangent roof of the region, where , and ; (c) The inverted cotangent roof of the region, where and .
Inverted cones are also defined similarly:
Definition 4.4. For , an inverted cotangent cone is defined by
For , an inverted cotangent roof is defined by
For example, shown in Figure 5(c) is
Then, the boundary of a closed trajectory of a vector field on can be described using a pair of a cotangent roof and an inverted cotangent roof as shown below.
Let w be a cotangent cone. We denote by the set of all the lattice points of which reside on the surface of the cone w. is called the set of boundary lattice points of w. The boundary lattice points of an inverted cotangent cone are defined in the same manner.
Proposition 4.5. Let be a vector field of induced by a tangent cone c whose top vertexes are in . Let ( ) be a closed trajectory of . Let be the region swept by the trajectory .
Then, there exist a cotangent cone w and an inverted cotangent cone such that the boundary of is uniquely specified by the intersection of and .
The pair is called a boundary pair (of the region ) and the specified region is denoted by , i.e. .
Proof. Let be a subset of such that
Because of the smoothness condition, we may assume slant tiles of are connected “smoothly” in without any gap. Let A be the set of all vertexes of the slant tiles of . Define cones w and iv by
Then, the boundary of is obtained by connecting the points of on , where denotes the projection of the lattice points of on the corresponding vertexes of flat tiles of .
Remark is not defined if .
For example, in the case of the closed trajectory given in Figure 3(d), the boundary of is uniquely specified by
as shown in Figure 5.
Corollary 4.6. Let R be a region in . Then, R has a closed-trajectory decomposition if and only if there exists a pair of a cotangent cone w and an inverted cotangent cone iv such that . The pair is also called a boundary pair (of R).
Proof. (⇒) Let ( ) be a closed-trajectory decomposition of R and let be their boundary pairs. Set
(⇐) A closed-trajectory decomposition of R is induced by .
Remark Let be the set of all tangent cones. Let be the set of all cotangent cones. Let be the set of all the regions in which are defined by boundary pairs. Then, an -valued function is defined on by := “the region in which is specified by the intersection of and ”. In particular,
for a boundary pair .
5. Extended Hamiltonian Cycle Problem on B2
By Corollary 4.6, we can paraphrase Problem 4.1 as follows.
Problem 5.1. (De novo design of complexes of closed trajectories of triangles) Given a boundary pair , find all the tangent cones which induce a vector field that gives a decomposition of the region into closed trajectories (Figure 6).
Figure 6. The extended Hamiltonian cycle problem on (See also Figure 1). (a) A pair of a cotangent cone and an inverted cotangent cone which specifies the boundary of a region: and , where ; (b) A tangent cone which induces such a vector field whose trajectories don’t traverse the specified boundary: , where .
One of the solutions to the problem is obtained immediately, i.e., (Figure 6(b)). In this section, we consider how to find all solutions to the problem.
5.2. Closed-Trajectory Decomposition
Definition 5.2. For , a tangent roof is defined by
In other words, is obtained by putting as many unit cubes as possible on .
Definition 5.3. For , a (tangent) ceiling is defined by
For , a (tangent) floor is defined by
It follows immediately that
where denotes the set of all the “top vertexes” of a cone c.
For a boundary pair , let be the set of all the tangent cones c such that
Now all solutions to Problem 5.1 are obtained as follows:
Proposition 5.4. induces all decompositions of into closed trajectories.
Proof. ( ) Let be the set of all the vector fields whose trajectories don’t traverse the boundary of . Then,
because . In particular, induces a decomposition of into closed trajectories.
( ) Suppose that a decomposition of into closed trajectories is given. Then, it can be extended to a vector field V on . For example, a flow of triangles on is induced by
Then, there exists a tangent cone c such that by Proposition 3.13. Then
because trajectories of don’t traverse the boundary of .
5.3. Fusion and Fission of Closed Trajectories
For a vector field V on and a region R of , let be the set of all closed trajectories of V in R. gives a closed-trajectory decomposition of R if it exists. The number of the closed trajectories of is denoted by .
For a boundary pair , let and ( ) be two tangent cones of . Then, vector fields and induce two different decompositions of : and . The correspondence
gives a “recombination” of closed trajectories from one to the other. In particular, it gives a “fusion” and “fission” of closed trajectories on if
In the case of Figure 7, the region has three decompositions but no Hamiltonian cycle:
Moreover, it is not difficult to show the following proposition (in the case of closed trajectories of triangles):
Proposition 5.5. When a closed trajectory is merged with a closed trajectory of length 6 (which occupies a hexagonal region), they do not fuse together to form a single closed trajectory.
In other words, closed trajectories always split when they interact with a hexagon.
Figure 7. All solutions to the extended Hamiltonian cycle problem for a boundary pair , where (See also Figure 2). Set . (a) The vector field induced by ; (b) The vector field induced by which is obtained by putting a cube (with top vertex ) on ; (c) The vector field induced by , which is obtained by putting a cube (with top vertex ) on the vector field of (b).
See Table 1 for the distribution of the length of closed trajectories of n-simplices ( ).
Table 1. The distribution of the length of closed trajectories of n-simplices ( ). Two closed trajectories are identified if and only if their sequences of the second derivative coincide with each other by rotational shift, inversion, or reversion.
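The identification rule used for Table 1 can be made concrete with a small sketch. It assumes that the second derivative along a closed trajectory is written as a cyclic string over the letters U and D, that “inversion” means exchanging U and D, that “reversion” means reading the sequence backwards, and that any combination of the three operations is allowed. These readings, and the function name `canonical_form`, are assumptions made here for illustration.

```python
def canonical_form(sequence):
    """Smallest representative of a cyclic U/D sequence under rotational
    shift, reversal, and U<->D exchange (assumed meaning of the rule)."""
    swap = str.maketrans("UD", "DU")
    variants = []
    for s in (sequence, sequence[::-1]):        # original and reversed
        for t in (s, s.translate(swap)):        # as is and with U/D exchanged
            variants.extend(t[k:] + t[:k] for k in range(len(t)))  # rotations
    return min(variants) if variants else ""


# Two sequences describe the same closed trajectory (under the assumed rule)
# exactly when their canonical forms agree.
print(canonical_form("UUD") == canonical_form("UDD"))    # True
print(canonical_form("UUDD") == canonical_form("UDUD"))  # False
```

Counting distinct canonical forms for each length then gives a distribution of the kind tabulated in Table 1, under the stated assumptions.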
We have considered an extended version of a two-dimensional Hamiltonian cycle problem in a three-dimensional setting, where the boundary of a two-dimensional region is uniquely specified by a pair of three-dimensional cones, i.e., a boundary pair. Using the discrete differential geometry of triangles, all decompositions of the region into closed trajectories of triangles are obtained immediately from the intersection of the boundary pair.
In the structural study of protein complexes, it is essential to characterize surface features such as bumps (convexity) and dents (concavity) of protein molecules. However, mathematical surface characterization, in which the surface of protein molecules is usually studied in a three-dimensional setting, has not produced satisfactory results so far.
This paper proposes a novel mathematical approach to the structural study of protein complexes, i.e., an approach from a four-dimensional setting, where the surface of protein molecules is to be described by a pair of four-dimensional cones (with multiple top vertexes) as in the case of complexes of closed trajectories of triangles.
In our approach, protein molecules are to be represented as closed trajectories of tetrahedrons, where shape complementarity is expressed inherently. In particular, fusion and fission of molecules (i.e., closed trajectories) can be defined naturally.
As a future research subject, we are considering whether there exist any (algebraic) equations that a given boundary pair satisfies. If there exists a set of such equations that specifies the given boundary pair, it is possible to represent the shape of a protein molecule as a solution of a system of equations. In particular, we would obtain another protein molecule with the same function if a given set of equations has more than one solution.
 Pereira-Leal, J.B., Levy, E.D. and Teichmann, S.A. (2006) The Origins and Evolution of Functional Modules: Lessons from Protein Complexes. Philosophical Transactions of the Royal Society B: Biological Sciences, 361, 507-517.
 Gerstein, M. and Richards, F.M. (2001) Protein Geometry: Volumes, Areas, and Distances. In: Rossmann, M.G. and Arnold, E., Eds., The International Tables for Crystallography, Vol. F, Chap. 22, Kluwer, Dordrecht, 531-539.
 Levy, E.D., Pereira-Leal, J.B., Chothia, C. and Teichmann, S.A. (2006) 3D Complex: A Structural Classification of Protein Complexes. PLoS Computational Biology, 2, e155.
 Perica, T., Marsh, J.A., Sousa, F.L., Natan, E., Colwell, L.J., Ahnert, S.E. and Teichmann, S.A. (2012) The Emergence of Protein Complexes: Quaternary Structure, Dynamics and Allostery. Biochemical Society Transactions, 40, 475-491.
 Goriely, A., Hausrath, A. and Neukirch, S. (2008) The Differential Geometry of Proteins and Its Applications to Structure Determination. Biophysical Reviews and Letters, 3, 77-101.
 Woolfson, D.N., Bartlett, G.J., Burton, A.J., Heal, J.W., Niitsu, A., Thomson, A.R. and Wood, C.W. (2015) De Novo Protein Design: How Do We Expand into the Universe of Possible Protein Structures? Current Opinion in Structural Biology, 33, 16-26.