Received 18 June 2016; accepted 8 August 2016; published 11 August 2016
For a given function, defined in a domain, let us calculate its partial derivatives in the logarithmic space:
where is an arbitrary point. If in the domain for all j, then v clearly is a
power function in of the form where. In this paper we study piecewise constant approximations of the quantities (1) or, in other words, nonlinear approximations of the function v by piecewise power functions.
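The displayed formulas were lost in extraction; assuming the standard log-derivative definition of kinetic orders used in Biochemical Systems Theory, the quantities (1) presumably take the form

```latex
f_j(x) \;=\; \frac{\partial \log v}{\partial \log x_j}
       \;=\; \frac{x_j}{v(x)}\,\frac{\partial v}{\partial x_j}(x),
\qquad j = 1,\dots,m,
```

and if all $f_j$ are constant in the domain, integration gives the power form

```latex
v(x) \;=\; a \prod_{j=1}^{m} x_j^{\,f_j}, \qquad a > 0 .
```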
This study is first of all motivated by applications in Systems Biology, where many networks can be described via compartment models
with the influx and efflux functions and, respectively.
For instance, in a typical metabolic network used in Biochemical Systems Theory the index refers to the n internal metabolites. The influx function accounts for the rate (velocity) of production (synthesis), while the efflux function accounts for the rate of degradation of the metabolite.
Another important example is gene regulatory networks which in many cases can be described as a system of nonlinear ordinary differential equations of the form
where is the gene concentration (i = 1, …, n) at time t, while the regulatory functions and depend on the response functions, which control the activity of gene k and which are assumed to be sigmoid-type functions  .
The derivatives in the logarithmic space are very important local characteristics of biological networks. In Biochemical Systems Theory these derivatives are known as the kinetic orders of the function v, while in Metabolic Control Analysis (see e.g.  ) they are called elasticities. From the mathematical point of view, these quantities measure the local response of the function v to changes in its variables (for instance, the local response of an enzyme or other chemical reaction to changes in its environment). Thus, they describe the local sensitivity of the function v, a terminology which is widespread in, e.g., the engineering sciences.
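As an illustration (not taken from the paper), the kinetic order of a sigmoid Hill-type rate can be estimated numerically as a log-log derivative; the response function and all parameter values below are hypothetical.

```python
import numpy as np

def kinetic_order(v, x, j, h=1e-6):
    """Estimate f_j = d(log v)/d(log x_j) at the point x by a central difference."""
    xp = np.array(x, dtype=float); xp[j] *= 1.0 + h
    xm = np.array(x, dtype=float); xm[j] *= 1.0 - h
    return (np.log(v(xp)) - np.log(v(xm))) / (np.log(xp[j]) - np.log(xm[j]))

# hypothetical Hill-type response function v(x) = V x^n / (K^n + x^n)
V, K, n = 2.0, 1.0, 4.0
hill = lambda x: V * x[0]**n / (K**n + x[0]**n)

f = kinetic_order(hill, [0.5], 0)
# analytically, the kinetic order of a Hill function is n K^n / (K^n + x^n)
f_exact = n * K**n / (K**n + 0.5**n)
```

Note that this kinetic order varies from n (for x much smaller than K) to 0 (for x much larger than K), which is precisely why a single power law, with a constant kinetic order, cannot represent such a sigmoid rate globally.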
If all influx and efflux functions in (2) have constant kinetic orders, one obtains the so-called “synergetic system”, or briefly “S-system”:
where the exponents, represent all the (constant) kinetic orders associated with (4). The right-hand side of an S-system thus contains power functions, and analysis based on S-systems is therefore called the “Power-Law (PL) Formalism”; see e.g.  .
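The S-system equations themselves are elided here; in Biochemical Systems Theory they are standardly written as follows (a reconstruction, with non-negative rate constants $\alpha_i$, $\beta_i$ and kinetic orders $g_{ij}$, $h_{ij}$):

```latex
\dot{x}_i \;=\; \alpha_i \prod_{j=1}^{n} x_j^{\,g_{ij}}
          \;-\; \beta_i \prod_{j=1}^{n} x_j^{\,h_{ij}},
\qquad i = 1,\dots,n .
```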
The Power-Law Formalism has been successfully applied to a wide range of problems, for example, to metabolic systems  , gene circuits   and signalling networks  . Such systems are very advantageous in biological applications, as the systems’ format considerably simplifies mathematical and numerical analysis, such as steady state analysis, sensitivity and stability analysis, etc. For instance, calculation of steady states for S-systems is a linear problem (see  ). For these and other biological and mathematical reasons, it was suggested in   to classify such systems as “a canonical nonlinear form” in systems biology.
In many models, however, the kinetic orders may vary considerably. A typical example is a model coming from Generalized Mass Action
where the power functions describe the rate of process no. r, while is a stoichiometric factor that stands for the number of molecules of produced, i.e. or. Collecting the processes in (5) into a net process of synthesis (positive terms) and a net process of degradation (negative terms) results in an aggregated system (2), which is not an S-system.
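The Generalized Mass Action system (5) is elided in the extracted text; its standard form is presumably (a reconstruction, with rate constants $c_r$, kinetic orders $f_{rj}$ and stoichiometric factors $\gamma_{ir}$):

```latex
\dot{x}_i \;=\; \sum_{r} \gamma_{ir}\, c_r \prod_{j=1}^{n} x_j^{\,f_{rj}},
\qquad i = 1,\dots,n .
```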
Another example of generic systems with non-constant kinetic orders stems from Saturable and Cooperativity Formalism  reflecting two essential features of biological systems, which gave the name to this formalism (see  for more details). In this case, the system (5) becomes
where and, , and are real numbers.
Another version of Saturable and Cooperativity Formalism, which is mentioned in  , is defined as follows:
where and, , , , and are real numbers.
In the case of gene regulatory networks (3) the sensitivities (1) are non-constant as well, even if one considers the functions and to be multilinear in. In addition, the use of non-multilinear functions is also known in this theory  .
Taking into account the importance of kinetic orders/elasticities/sensitivities (1) in Systems Biology, on the one hand, and the convenience of the well-developed analysis of S-systems (stability theory  , parameter estimation routines  , software packages) on the other, a new kind of generic representation of compartment systems (2) was suggested in   (see also   for further applications of this representation). According to this idea, the entire operating domain is divided into partition subsets where all kinetic orders can be viewed as constants. In other words, the system (2) is approximated by a set of S-systems, each being active only in its own partition subset. This way of representing (2) is called the “Piecewise Power-Law Formalism”  .
From the biological point of view, piecewise power-law representations are useful in many respects, when compared to other ways of approximation, as they take into account biologically relevant characteristics (kinetic orders) rather than the standard partial derivatives. Therefore, piecewise S-systems preserve important biological structures and, at the same time, do not destroy the relatively simple mathematical structure of plain S-systems. For this reason, approximations of a general target function by piecewise power approximations may be of great importance for biological and other modelling. A rigorous mathematical justification of the idea of piecewise power-law approximations is the main purpose of the present paper. More precisely, we consider mean-square and uniform convergence of approximations by piecewise power functions to the target function, provided that the associated partitions of the operating domain satisfy some additional assumptions. One of the challenges is that partitions of the operating domain may not be chosen freely in applications. For instance, the partitions may directly stem from biological properties of the model  . Other ways of constructing partitions can be dictated by optimality-oriented algorithms. In the Appendix (see also  ) we describe such a method, which goes back to the paper   and which is based on an automatic procedure allowing one to obtain simultaneously the best possible polyhedral partition and the respective best possible piecewise linear approximation in the logarithmic space.
The main results of the paper are presented in Section 3 (mean-square convergence of piecewise power approximations) and in Section 4 (uniform convergence of piecewise power approximations). Several auxiliary results are proved in Appendices A.1-A.3, while Appendix A.4 presents an approximation algorithm which provides an automated partition and the respective best possible approximation in the logarithmic space for a given number of subdomains. Finally, in Appendix A.5 we explain by example why a direct piecewise power-law fitting is ill-posed.
Throughout the paper we use the following notations (see Table 1). Let and let be given by
Let be a domain in the space which we call Cartesian. We assume it to be a closed and bounded (i.e. compact) subset of. Let be its image in the logarithmic space and be a measurable partition of. This means that are all Borel measurable subsets of, for every and for any natural N. In some results and algorithms will be a polyhedral domain in the logarithmic space, and will be a polyhedral partition.
Below is the measurable partition of which is the image of the partition under the
inverse logarithmic transformation.
Table 1. Overview of the basic terminology and notation used in the paper (LS = least-squares).
We also put
Let be a least-squares (LS) power-law fitting to the function v on. For we consider the piecewise power function whenever. We put
Let be an LS linear approximation to the function on. For we consider the piecewise linear function whenever. We put also
Recall that the parameters and of the linear functions are uniquely obtained from the following minimization criterion in the logarithmic space:
Alternatively, one can define approximations of the target function v by power functions minimizing the distance in the space:
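The elided criteria can presumably be reconstructed as follows. In the logarithmic space, with coordinates $y_j = \log x_j$, criterion (8) is an ordinary least-squares problem, while the Cartesian criterion (9) is nonlinear in the exponents:

```latex
\text{(8)}\quad \min_{b,\,f}\;\int_{\Omega}\Big(\log v(e^{y}) - b - \sum_{j} f_j\, y_j\Big)^{2}\,dy,
\qquad
\text{(9)}\quad \min_{a,\,f}\;\int_{D}\Big(v(x) - a\prod_{j} x_j^{\,f_j}\Big)^{2}\,dx,
```

where the intercept recovers the multiplier via $a = e^{b}$.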
Our last minimization criterion looks similar to (6) but is, in fact, very different, as the minimum here is taken over all polyhedral partitions of the polyhedral domain and all corresponding linear functions ().
The main advantage of the criteria (8) and (10) is their linearity, which provides the uniqueness of the solution and also makes the process of finding the solution computationally cheap, as it is based on explicit matrix formulas. On the other hand, the use of the logarithmic transformation requires caution. The influences of the data values will change, as will the error structure of the model. Note that the criterion (8) only requires a standard linear regression, while the criterion (10) requires a special regression algorithm, still linear, but much more involved (see Appendix A.4 for details).
The criterion (9) gives the best possible approximation in terms of the LS error in the Cartesian space. However, a nonlinear regression algorithm must be used in this case, which is less advantageous, especially when the number of estimated parameters is large. In addition, the nonlinear regression may have other drawbacks, one of which is ill-posedness (see Appendix A.5).
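To make the comparison concrete, here is a minimal sketch (with synthetic, hypothetical data) of the log-space criterion (8): fitting a power function by ordinary linear regression in the logarithmic space, where the intercept recovers log a and the remaining coefficients recover the kinetic orders.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic target: v(x1, x2) = a * x1^f1 * x2^f2 (a hypothetical example)
a_true, f_true = 2.5, np.array([1.3, -0.7])
X = rng.uniform(0.5, 3.0, size=(200, 2))
v = a_true * np.prod(X**f_true, axis=1)

# criterion (8): ordinary least squares in the logarithmic space
Y = np.log(X)                                   # logarithmic coordinates y_j = log x_j
A = np.column_stack([np.ones(len(Y)), Y])       # design matrix [1, y_1, y_2]
coef, *_ = np.linalg.lstsq(A, np.log(v), rcond=None)
a_hat, f_hat = np.exp(coef[0]), coef[1:]        # multiplier and kinetic orders
```

With noisy data the same closed-form regression applies, but, as noted above, the logarithmic transformation changes the error structure, so the fit is no longer optimal in the Cartesian LS sense.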
3. Mean-Square Convergence of Piecewise Power Approximations
The results of this section provide the mean-square convergence (L2-convergence) of piecewise approximations by power functions. The involved parameters may be e.g. obtained according to one of the minimization criteria (8) or (9).
The main technical challenges stemming from the nature of these minimization algorithms can be summarized as follows: 1) the L2-convergence of the approximations in the logarithmic space may not imply the L2-convergence of their images in the Cartesian space (and vice versa); 2) it is not evident that automatic dissections of the operating domain, as e.g. in the algorithms based on the minimization criterion (10), make the diameters of the partition subsets go to zero even if the number of partition subsets tends to ∞.
Three propositions below deal with L2-convergence in the logarithmic domain.
Proposition 1. Let the target function be measurable and bounded on and. Suppose
that the measurable partitions satisfy the property (). Then for the corresponding
LS approximations in and in we have, in the respective L2-norms, if.
To prove this proposition we need the following lemma, the proof of which can be found in Appendix A.1:
Lemma 1. Let v be measurable and on for some constants. Let
and the measurable partitions satisfy the property (). Then there exists a sequence
of continuous on functions satisfying the properties on any,
for all and (resp.) L2-converges to (resp. v) on (resp.).
Proof of Proposition 1. We use the sequences
from Lemma 1, which both converge in the L2-sense in the respective domains.
Since is the LS piecewise linear approximation in, we have
In the next proposition we do not assume that.
Proposition 2. Let be a polyhedral domain in, the function be square integrable in and
be the optimal polyhedral partition of obtained by the algorithm described in Appendix A.4. Then
for the corresponding LS approximations in we have in the L2-norm, if.
Proof. Evidently, for the L2-function there exists a sequence of polyhedral partitions of such
that as and a sequence of piecewise constant functions given
by whenever for which in the L2-norm if.
For the optimal polyhedral approximation we obtain
In particular, the assumption on is fulfilled if the target function v is measurable and bounded on.
The case of the L2-convergence of the approximations, given as, is more involved. The reason is that the L2-convergence of the sequence does not necessarily imply the L2-convergence of the sequence.
We introduce the following notation. Given a partition subset of we put
where the point is the center of mass of the convex set given by
Let be the symmetric -matrix with the entries defined as
Below we fix a matrix norm; recall that all matrix norms are equivalent. One of them is the Euclidean norm, which is defined via the maximal eigenvalue:. In the case of symmetric, positive definite matrices (like above) we can write that.
We say that the sequence of partitions () of satisfies the condition () if there exists
a constant such that
If the chosen norm is Euclidean, then the latter estimate can be rewritten as
where is the least (positive) eigenvalue of the matrix (;).
Informally speaking, this property means that the partition subsets cannot be too different from each other in shape. Assume, for instance, that the partition sets are enclosed in rectangular boxes. The result below says that the sequence of such boxes satisfies the property () if and only if the ratio of the longest and the shortest edges of the boxes is bounded above, i.e. the boxes are not “too thin”.
Proposition 3. A sequence of rectangular boxes satisfies the property () if and only if
, where (resp.) is the length of the shortest (resp. longest) edge of the box.
Proof. We calculate the matrix (13).
We fix N and the Nth rectangular box given by
Let be the center of mass and.
and, the matrix (13) becomes. The least eigenvalue of the matrix is equal
to, i.e. to. The diameter of the box can be estimated from above by the constant, which also dominates the asymptotics of the diameter. Therefore the condition () is fulfilled for the given sequence of rectangular boxes if and only if the sequence is bounded above. ∎
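The computation in the proof can be checked numerically: for a rectangular box the matrix (13) of centered second moments is (up to Monte Carlo error) diagonal with entries L_j^2/12, so its least eigenvalue is min_j L_j^2/12, while the squared diameter is the sum of the L_j^2. A sketch with hypothetical edge lengths:

```python
import numpy as np

rng = np.random.default_rng(1)

# edge lengths of a rectangular box (a hypothetical partition subset)
L = np.array([2.0, 0.5, 1.0])
pts = rng.uniform(0.0, 1.0, size=(200000, 3)) * L   # uniform samples in the box
c = pts.mean(axis=0)                                # center of mass
B = (pts - c).T @ (pts - c) / len(pts)              # matrix (13), Monte Carlo estimate

lam_min = np.linalg.eigvalsh(B)[0]                  # least eigenvalue
diam = np.linalg.norm(L)                            # diameter of the box
ratio = lam_min / diam**2                           # bounded below iff box not "too thin"
```

Shrinking the shortest edge while keeping the longest fixed drives `ratio` to zero, which is exactly the situation excluded by the condition ().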
The next lemma is proved in Appendix A.2.
Lemma 2. Assume that the target function is measurable and bounded on and.
Assume further that the sequence of partitions () of satisfies the condition (). Then
the corresponding LS approximations and are uniformly bounded on and, respectively, i.e. there exist constants and such that
for all and all
for all and all
The main result of this section is the following theorem:
Theorem 4. Let the target function v be measurable and bounded on.
1) If the measurable partitions have the property (), then and
in the -norm as.
2) Assume that is a polyhedral domain in, while a sequence of polyhedral partitions of
and associated LS piecewise linear approximations satisfy the criterion (10) for each. Assume
further that the partitions () satisfy the condition (). Then in the -norm as.
Proof. To prove the first part of the theorem, we apply Lemma 1 and obtain
as, since is the LS piecewise power approximation in.
In the second part of the theorem, we use either Proposition 1 or Proposition 2, which yields the L2-convergence of the LS approximations to the function. Applying Lemma 2 we obtain the uniform boundedness of the approximations: for some M and any. Then we have
where The latter estimate is due to the uniform Lipschitz continuity of the function on the interval:
This estimate proves the L2-convergence of the LS approximations to the target function v. ∎
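The uniform Lipschitz estimate invoked in the last step is presumably the mean value theorem bound for the exponential function: if the approximations and the target are uniformly bounded by M in the logarithmic space, then

```latex
\left| e^{a} - e^{b} \right| \;=\; e^{\xi}\,\lvert a-b\rvert \;\le\; e^{M}\,\lvert a-b\rvert
\qquad \text{for all } a,b \in [-M, M] \text{ and some } \xi \text{ between } a \text{ and } b,
```

so the L2-error in the Cartesian space is dominated by $e^{M}$ times the L2-error in the logarithmic space.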
4. Uniform Convergence of Approximations
In the previous section we studied convergence of LS approximations in the L2-norm. In many applications, however, it is desirable to consider their uniform convergence. This may, for instance, be of interest if we include the obtained approximations in models based on differential equations, as it is well known that convergence of (approximations of) solutions is only guaranteed by uniform convergence of (approximations of) the right-hand sides.
The main result of this section is formulated in terms of kinetic orders and of the target function and its piecewise power approximations.
Theorem 5. Let the target function be a C1-function (i.e. differentiable with the continuous partial
derivatives). Let the sequence of partitions () of have the following two properties:
1) The closure of each coincides with the closure of its interior
Assume, in addition, that for any, there exist points, such that piecewise power approximations (=on) satisfy
Then uniformly on as
Proof. We fix N and consider the corresponding partition of the domain. Let,
, ,. Clearly, By assumption, for
we have where On the other hand, the mean value theorem yields where depends on y. Therefore
The uniform continuity of the continuous vector function on and the property that
imply that, given an, the estimate
is fulfilled for sufficiently large N.
Since (15) holds for any we also obtain that for sufficiently large N i.e. uniformly on as.
As the uniform convergence of the sequence implies its uniform boundedness, there is M such that for all. Therefore,
where. This gives the uniform convergence of to v as. ∎
Our last result shows that the LS approximations converge uniformly in the scalar case. This is due to the fact that in the scalar case the equalities (14) are always fulfilled.
Corollary 1. Let the target function v be continuous on () and. Assume
that the sequence of partitions () of has the property ().
Then for the corresponding LS power approximations and we have, () uniformly on.
The proof of the corollary follows directly from the previous theorem and the following lemma, the proof of which is given in Appendix A.3.
Lemma 3. Let a linear function be the LS approximation of a function on the entire interval. Then there exist and such that
5. Discussion and Conclusions
Piecewise power-law representations may be very useful as practical approximations to target functions which are defined analytically or numerically. However, a rigorous mathematical justification of these approximations is often neglected. Unfortunately, such an analysis is not always straightforward, especially if one puts additional a priori assumptions on the approximations, which is quite common in many applications.
We showed in the present paper that under additional assumptions power approximations do converge to the target function. We studied least-squares and uniform convergence, both of which are widely used (explicitly or implicitly) in applications.
Our analysis dealt with two types of regression: linear regression in the logarithmic space and power-law regression in the Cartesian space. The first procedure has all the advantages of linear regression, but the transformation back to the Cartesian space distorts the error structure of the problem; the least-squares error for the resulting piecewise power-law fitting is in general less accurate than the corresponding error for a power-law regression of the original data. As a partial remedy, it may be advantageous to apply power-law regression to the original data over each of the partition subsets back in the Cartesian space. Yet, being a nonlinear regression, this procedure is essentially ill-posed. Thus, both kinds of regression have their strong and weak sides, so that the choice between them must be made by modelling considerations.
In many cases, it may also be advantageous to use the classical linear regression in combination with optimal partitions of the operating domain. In the logarithmic space this procedure is again linear and can be automated, but this may also cause several technical problems when proving the convergence of the corresponding approximations.
In the present paper, we offered a partial mathematical justification of the analysis based on piecewise power approximations, stemming from both kinds of regression, by verifying their convergence in the mean-square (L2) and uniform sense. Uniform convergence is e.g. important if target functions are included in differential equations, as it is the uniform, and not the L2-convergence, which is inherited by the solutions of the equations. However, a comprehensive analysis of convergence of solutions of differential equations, approximated by piecewise S-systems, is beyond the scope of this paper and will be discussed in a separate publication.
The work of the first author was partially supported by an EEA grant coordinated by Universidad Complutense de Madrid, Spain, and by the grant #239070 of the Norwegian Research Council.
A.1. Proof of Lemma 1, Section 3
Let us prove the lemma for the domain. Notice that the function is measurable on the compact set and satisfies the estimate on. By Lusin’s theorem, for any N, there is a uniformly continuous function on such that for all N and the Lebesgue measure of the set is less than. Now, there exists a for which,
implies. We define for some.
Let be chosen in such a way that. If, then we have
as and. Therefore, the Lebesgue measure of the set is less than, so that the sequence converges to in measure and, due to its uniform boundedness, also in the L2-sense on.
A similar argument applies to the sequence on, where we use the sequence instead of.
A.2. Proof of Lemma 2, Section 3
Clearly, is measurable and bounded on. Let.
Let us fix a partition subset. Our aim now is to find estimates for the norms of orthonormal basis functions, in the linear subspace of the space consisting of all linear functions and equipped with the scalar product
One basis is given by the set (11). However, this set is not necessarily orthogonal.
First of all, we choose and observe that its norm is equal to 1. Using the description (11) of the basis functions defined via the center of mass we directly deduce from (12) that is orthogonal to any linear combination of the other basis functions. The challenge is therefore to estimate the norms of linear
combinations, where are real numbers.
In the proof below we often omit one of the variables in, that is either l, or y, depending on a particular interpretation of this basis. Writing means that we regard it as a vector for each particular y, i.e. (the component is excluded in further considerations). Omitting y () means that we treat as a function of y for a given l, i.e. as an element of the space.
As, we require the following constraints on the coefficients:
(where is the Euclidean norm in and is the scalar product of two vectors) with the constraint.
Diagonalization of the symmetric, positive definite matrix with the help of an orthogonal matrix Q gives the matrix containing the eigenvalues of on the diagonal. Putting and using, we obtain from (16) that
with the constraint, where the constant is evidently an upper estimate for the functions (11) on the partition subset. The maximum value of the expression under the above constraint is, where is the minimal eigenvalue of the matrix. Due to the condition () we get
that, where the constant does not depend on i and N.
The final step in the proof of the lemma uses the explicit representation of the LS approximation:
Therefore, () and
This implies also the uniform boundedness of the approximations on. The proof of the lemma is complete.
A.3. Proof of Lemma 3, Section 4
Let us first prove the existence of. Assume the converse, i.e. that for all. Let, for instance, for all. Put. Then the linear function satisfies the estimates. Therefore
This, however, contradicts the definition of the least squares approximations. The case is treated in a similar manner.
Assume now that for all. We shall prove that in this case the graph of the scalar linear function intersects the graph of in at least two points from the interval.
From the first part of the proof we know that at least one intersection point does exist. Assume that there is exactly one point such that. Without loss of generality we may assume that, where. Since and for all, we obtain that for and for (one of these sets may be empty). Consider a new linear approximation given by, where a sufficiently small is chosen in such a way that the graphs of the functions and still have exactly one intersection point in (namely, d by construction).
It is easy to see that such a does exist. Indeed, in a vicinity U of the point d we have that, so that for small we have, and hence d is the only intersection point of the graphs of the functions and in U. Outside U, i.e. inside the compact set, the continuous function is non-zero, so that. Choosing in such a way that guarantees that the graphs of the functions and meet only in d.
We now complete our analysis of the scalar case by observing that for such
simply because the graph of is closer to the graph of than the graph of l is. This contradicts the assumption that l is the LS approximation of. We have therefore proved that there exists such that
A.4. Piecewise Power-Law Regression
In this subsection we describe a numerically stable way to find a best possible, in some sense, piecewise power approximation to an arbitrary target function. The method was suggested in  and was based on the piecewise linear regression from  .
Let, be a target function (e.g. the in- and efflux functions in (2)), possibly only given as a data set obtained from some measurements. The task is to find a set of power functions which approximate the target function in a “best possible” way for a given number N of partition sets. The problem is essentially non-convex and therefore not well-posed numerically. However, using the logarithmic space one can convert this problem into a linear one. Performing an automated piecewise linear regression based on Artificial Neural Networks (ANN) introduced in  , this algorithm returns an optimal polyhedral partition of the domain in the logarithmic space and the corresponding optimal set of linear functions which are the best approximations to the image of the target function in the logarithmic space. Returning to the Cartesian space produces piecewise power functions that are not necessarily the best possible approximations in the sense of the metric in, but this approximation procedure is numerically stable.
As before, we assume to be the image of under the logarithmic transformation, i.e.,. Let also N be a fixed number of partition subsets of a polyhedral set. The task is to construct numerically the function,, which is piecewise linear and which is the best possible LS approximation to. The inverse logarithmic transformation of will then give a piecewise power approximation of the target function v, which is best possible with respect to the logarithmic distance. To simplify the notation we will below write, thus removing the index N.
A polyhedral partition satisfies the following assumptions: for every, ,
This partition gives rise to a partition of the original domain. Applying the inverse logarithmic transformation, we obtain
The piecewise linear function is assumed to have the following representation:
where are all polyhedral sets defined by (17) and defining a partition of the logarithmic domain.
Scalar weights for and the numbers for, , uniquely characterize the function and the corresponding partition. Below the weights are collected in a vector for every.
The piecewise linear algorithm has two targets.
Target 1: The weights vector (), which should be reconstructed as soon as the partition is known.
Target 2: Scalars (,) which should be estimated for every
The aim of the piecewise linear regression: given a function represented via a data set, a number of partition subsets N and a natural number c, find a piecewise linear function
and the polyhedral partition such that
where the minimum is taken over all polyhedral partitions and all linear functions (). The parameter c is the number of nearest neighboring points required by the implementation of the method. The neighboring points are used to aggregate approximations into clusters.
The algorithm consists of the following steps.
Step 1. Local regression: Define the set containing the kth point and the samples associated with its c nearest neighbors, and perform linear regression to obtain the weights fitting the samples in.
Step 2. Clustering: Perform a clustering process to subdivide the set of weight vectors into N groups with similar features.
Step 3. Classification: Apply a classification algorithm based on a pattern-recognition method to produce the coefficients describing the partition subsets.
Step 4. Regression: For every perform linear regression on the samples with to obtain the weights for the ith approximation.
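The four steps above can be sketched as follows in a one-dimensional toy setting (all data and parameters are hypothetical; the actual implementation uses an ANN-based pattern-recognition classifier for Step 3, which is replaced here by the cluster labels themselves):

```python
import numpy as np

rng = np.random.default_rng(2)

# samples of a continuous piecewise linear target in the logarithmic space
y = np.sort(rng.uniform(0.0, 2.0, 300))
g = np.where(y < 1.0, 2.0 * y + 0.5, -1.0 * y + 3.5)   # slopes 2 and -1

N, c = 2, 10   # number of partition subsets, nearest-neighbor count

# Step 1 (local regression): fit a line through each point's c nearest neighbors
W = np.empty((len(y), 2))
for k in range(len(y)):
    idx = np.argsort(np.abs(y - y[k]))[:c]
    A = np.column_stack([y[idx], np.ones(c)])
    W[k], *_ = np.linalg.lstsq(A, g[idx], rcond=None)   # local (slope, intercept)

# Step 2 (clustering): group the local weight vectors into N clusters (plain k-means)
centroids = W[[np.argmin(W[:, 0]), np.argmax(W[:, 0])]]  # deterministic initialization
for _ in range(50):
    labels = np.argmin(((W[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    centroids = np.array([W[labels == i].mean(axis=0) for i in range(N)])

# Step 3 (classification) is replaced here by the cluster labels; the real
# algorithm fits a classifier that yields the polyhedral partition subsets.

# Step 4 (regression): refit one linear function per partition subset
pieces = []
for i in range(N):
    m = labels == i
    A = np.column_stack([y[m], np.ones(m.sum())])
    w, *_ = np.linalg.lstsq(A, g[m], rcond=None)
    pieces.append(w)
slopes = sorted(w[0] for w in pieces)
```

The recovered slopes approximate the two kinetic orders of the underlying piecewise power function; the inverse logarithmic transformation would then turn each linear piece into a power function on the corresponding subset.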
The inverse logarithmic transformation of results in a piecewise power approximation of the function and a partition defined by (18).
A modification of this algorithm, which is suggested in  , applies a power-law regression to the original data over each of the N partition subsets of the optimal partition, so that the increase in difficulty is modest even though the regression is now nonlinear. The partitioning is found from the first three steps of the above algorithm, and the new algorithm proceeds as follows:
Step 4A. Power-law regression: For every perform power-law regression on the samples with to obtain the power functions for each partition subset given by (18).
The algorithm is implemented in the free MATLAB toolbox Hybrid Identification Toolbox (HIT).
A.5. Ill-Posedness of LS Power-Law Fitting
In this section we show that the nonlinear power-law regression, i.e. the one based on the minimization criterion (9), is ill-posed. This ill-posedness is caused by the fact that the set from (9) is non-convex.
Let us assume that is only known with a certain accuracy, as is often the case. Mathematically, we describe this situation by letting v depend on a (small) parameter, i.e. (so that becomes a function of as well). It turns out, as we show below, that for certain values of, small perturbations may cause a “jump” in the corresponding power-law representation, i.e. while the functions remain close to each other, the least-squares minimization criterion (9) may produce power-law representations that are very different.
Below we illustrate this fact analytically using a specific example. For the sake of simplicity, let us consider a target function of one variable, so that
If, then the and f that minimize (20) are also functions of.
We first consider a simpler problem assuming that, so that after rescaling we may assume that and rewrite (20) as
After applying the substitution we obtain
Now, let and consider the function and its LS power approximation.
It is easily seen that the projection function is discontinuous in (see Figure A1), while Figure A2 gives a graphical representation of this discontinuity in for one value of y.
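The specific function of the example is lost in extraction, but the mechanism can be reproduced with any target mixing two power laws: the LS error of criterion (9), viewed as a function of the exponent f, then has two separated local minima, so a small perturbation of the data can flip the global minimizer from one to the other. A numerical sketch with a hypothetical target:

```python
import numpy as np

# hypothetical target mixing two power laws; we fit a single power function
# a*x^f to it by least squares in the Cartesian space (criterion (9))
x = np.logspace(-1.0, 1.0, 401)   # log-symmetric grid on [0.1, 10]
v = x**3 + x**(-3)                # symmetric under x -> 1/x

def J(f):
    """LS error of the best a*x^f fit; a is optimal in closed form for fixed f."""
    p = x**f
    a = (v @ p) / (p @ p)
    return ((v - a * p)**2).sum()

fs = np.linspace(-5.0, 5.0, 1001)
errs = np.array([J(f) for f in fs])
f_best = fs[np.argmin(errs)]      # one of two symmetric minimizers, near f = +-3
```

Since J is symmetric in f while its minimizers sit away from f = 0, the point f = 0 is a local maximum between two local minima: the objective is non-convex, and a small asymmetric perturbation of v decides which of the two minima becomes global, reproducing the discontinuity seen in Figure A1 and Figure A2.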
Figure A1. (a) The continuous lines represent the functions for different values of, and the dotted lines give their LS approximations within the operating interval. The blue and green colors correspond to the values, while the red and black colors describe the case of. (b) This graph explains how the LS power approximations at, i.e., depend on. We see that is discontinuous at.
Figure A2. (a) The continuous lines represent the functions for different values of, and the dotted lines give their LS approximations within the operating interval. The blue and green colors correspond to the values, while the red and black colors describe the case of. (b) This graph explains how the LS power approximations at, i.e., depend on. We see that is discontinuous at.
Going back to the variable x, the above function becomes, , ,
the discontinuity in being preserved.
This example shows that the criterion (9) may produce an LS power approximation that is not stable under small perturbations of the parameter, and thus under small perturbations of the target function, which causes ill-posedness of the minimization problem. We stress also that this effect is generic, i.e. independent of the number of the involved parameters, as the comparison of Figure A1(a) (resp. Figure A1(b)) and Figure A2(a) (resp. Figure A2(b)) clearly demonstrates.