Single Molecule Thermodynamics Hypothesis of Protein Folding and Drug Design

1. Introduction: A Single Molecule View of Protein Folding

To resolve the protein folding problem, that is, to predict the native structure and describe the folding dynamics, we must work with the fundamental physical law that directly governs the protein folding process. That law is the Thermodynamic Principle of Protein Folding [1]. It is just the Second Law of Thermodynamics, since Anfinsen and others have already shown that the folding process is spontaneous. In the protein folding case, the second law states that the Gibbs free energy achieves a minimum at the native structure.

Therefore, we have to figure out what the Gibbs free energy is. The question is: is there a Gibbs free energy function whose variable ranges over all possible conformations of a given protein molecule? Or is there only a Gibbs free energy difference between the folded ensemble of protein molecules and its counterpart, the unfolded ensemble? The former is a single molecule view, which comes from contemplating a protein molecule that comes out of the ribosome and changes its conformation until it achieves its native structure. The latter is an ensemble view, which comes from staring at a tube of purified protein solution and trying to figure out the collective behaviour of the protein molecules while the solution goes towards equilibrium.

Via quantum statistics applied to a tiny thermodynamic system ${\mathfrak{S}}_{X}$ which is tailor made for the conformation $X$ and its immediate physiological environment ${\mathfrak{E}}_{\mathfrak{U}}$, we have derived the conformational Gibbs free energy (CGFE) function $G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$ for globular proteins $\mathfrak{U}$ [2] [3] [4].

Applying the CGFE function, we translate the thermodynamic principle of protein folding into the Single Molecule Thermodynamic Hypothesis (SMTH) of Protein Folding: when a protein molecule $\mathfrak{U}$ is put in an environment $\mathfrak{E}$, a stable conformation ${X}_{\mathfrak{E}}$ (which may not be unique) of $\mathfrak{U}$ must be a minimizer (local or global) of the CGFE function $G\mathrm{(}X\mathrm{;}\mathfrak{E}\mathrm{,}\mathfrak{U}\mathrm{)}$. In particular, the gradient vanishes at ${X}_{\mathfrak{E}}$: $\nabla G\mathrm{(}{X}_{\mathfrak{E}}\mathrm{;}\mathfrak{E}\mathrm{,}\mathfrak{U}\mathrm{)}=0$.

Another big question in protein folding is whether there is a folding force. Levinthal showed in 1969 [5], by contradiction, that there must be one: if the folding process were purely random, it would take a time span longer than the Earth’s age.

The CGFE function also gives us the deterministic part of the folding force ${F}_{i}$ acting on an atom ${a}_{i}$ of $\mathfrak{U}$. It is ${F}_{i}\mathrm{(}X\mathrm{)}=-{\nabla}_{{x}_{i}}G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$.
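The relation between the CGFE function and the deterministic force can be illustrated numerically. The following is only a minimal sketch: `toy_G` is a hypothetical smooth stand-in for $G$ (it is not the CGFE function), and the gradient is approximated by central finite differences. It shows that ${F}_{i}=-{\nabla}_{{x}_{i}}G$ pulls atoms towards a minimizer and vanishes at the minimizer itself.

```python
import numpy as np

def toy_G(X):
    """Hypothetical smooth stand-in for G(X), NOT the CGFE function:
    penalizes consecutive atoms whose distance differs from 1.
    X has shape (n, 3), one row per nuclear position x_i."""
    d = np.linalg.norm(X[1:] - X[:-1], axis=1)
    return float(np.sum((d - 1.0) ** 2))

def forces(G, X, h=1e-6):
    """Approximate F_i = -grad_{x_i} G(X) by central finite differences."""
    F = np.zeros_like(X)
    for i in range(X.shape[0]):
        for k in range(3):
            Xp, Xm = X.copy(), X.copy()
            Xp[i, k] += h
            Xm[i, k] -= h
            F[i, k] = -(G(Xp) - G(Xm)) / (2.0 * h)
    return F

# A stretched three-atom chain: the first "bond" is too long (1.5 instead of 1).
X = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.5, 0.0, 0.0]])
F = forces(toy_G, X)

# A minimizer of toy_G: both distances equal 1, so every force vanishes.
X_min = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
F_min = forces(toy_G, X_min)

print(F)
print(F_min)
```

In the stretched chain the first two atoms are pulled towards each other while the third feels no force; at `X_min` all forces are zero, which is the gradient condition stated for a stable conformation.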

A scientific hypothesis has to give verifiable predictions to let people confirm or refute it. We suggest two verifiable predictions of the single molecule thermodynamic hypothesis.

1) ab initio predictions of native structures of globular proteins: the native structure ${X}_{\mathfrak{U}}$ of the protein $\mathfrak{U}$ is a (local or global) minimizer of the CGFE function:

$G\mathrm{(}{X}_{\mathfrak{U}}\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}=\underset{X\in U\subset {\mathfrak{X}}_{\mathfrak{U}}}{\mathrm{min}}G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{),}\text{\hspace{1em}}\text{for some neighbourhood }U\text{ of }{X}_{\mathfrak{U}}\mathrm{.}$ (1)

In particular, since $G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$ is smooth, $\nabla G\mathrm{(}{X}_{\mathfrak{U}}\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}=0$. If ${X}_{\mathfrak{U}}$ is a global minimizer, then

$G\mathrm{(}{X}_{\mathfrak{U}}\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}=\underset{X\in {\mathfrak{X}}_{\mathfrak{U}}}{\mathrm{min}}G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{).}$ (2)

Here $X$ denotes a conformation and ${\mathfrak{X}}_{\mathfrak{U}}$ is the set of all possible conformations of $\mathfrak{U}$.

Therefore, for a globular protein $\mathfrak{U}$ (for which we know the formula of $G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$), the prediction of the native structure is reduced to a purely mathematical problem: the minimization of a known smooth function. A mathematical theorem guarantees that this problem has a solution, i.e., minimizers always exist. The real task is to find them and to determine which one is the native structure, which is a hard programming problem.
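The distinction between local and global minimizers in (1) and (2) can be seen in a one-dimensional toy example. This is a sketch under stated assumptions: the tilted double-well function `G` below is hypothetical and only stands in for a smooth function with more than one minimizer, and plain gradient descent is used as the simplest possible search, not as the method proposed here.

```python
def G(x):
    """Hypothetical smooth function with two minimizers (a tilted double well)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def dG(x):
    """Derivative of G."""
    return 4.0 * x ** 3 - 4.0 * x + 0.3

def descend(x0, lr=0.01, steps=5000):
    """Plain gradient descent started at x0; returns the point it settles at."""
    x = float(x0)
    for _ in range(steps):
        x -= lr * dG(x)
    return x

# Several starting points; each one settles into the minimizer of its own basin.
starts = [-2.0, -0.5, 0.5, 2.0]
minimizers = sorted({round(descend(s), 6) for s in starts})

# Among the minimizers found, the one with the lowest value of G is the global one.
global_min = min(minimizers, key=G)

for m in minimizers:
    print(f"minimizer x = {m:+.4f}, G = {G(m):+.4f}, dG = {dG(m):+.1e}")
print("global minimizer:", global_min)
```

Both points returned have a vanishing derivative, but only one of them attains the minimum over the whole domain. Deciding which minimizer corresponds to the native structure is the part that a search of this kind does not answer by itself.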

2) For a globular protein $\mathfrak{U}$, starting from any initial conformation ${X}_{0}$, there is a folding path $X\mathrm{(}t\mathrm{)}=X\mathrm{(}t\mathrm{;}{X}_{0}\mathrm{)}$ satisfying $X\mathrm{(}{t}_{0}\mathrm{;}{X}_{0}\mathrm{)}={X}_{0}$, and for $t\ge {t}_{0}$, the following Langevin equation:

${m}_{i}\frac{{\text{d}}^{2}{x}_{i}\mathrm{(}t\mathrm{)}}{\text{d}{t}^{2}}={F}_{\text{total}}=-{\nabla}_{{x}_{i}}G\mathrm{(}X\mathrm{(}t\mathrm{);}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}-{\eta}_{i}\frac{\text{d}{x}_{i}\mathrm{(}t\mathrm{)}}{\text{d}t}+{F}_{i}\mathrm{(}t\mathrm{),}\text{\hspace{1em}}i=\mathrm{1,}\cdots \mathrm{,}n\mathrm{.}$ (3)

Here ${\eta}_{i}$ is the solvent friction coefficient. The random force ${F}_{i}\mathrm{(}t\mathrm{)}$ is caused by occasional collisions with other non-solvent molecules. Because of it, the folding path is not completely deterministic. Again, mathematical theorems guarantee that such a folding path $X\mathrm{(}t\mathrm{;}{X}_{0}\mathrm{)}$ exists. Moreover, mathematics also tells us that the path depends strongly on its initial conformation ${X}_{0}$, so an important issue is to know the protein’s initial conformation as it comes out of the ribosome.
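A Langevin equation of the form (3) can be integrated numerically. The sketch below makes several assumptions that are not part of the text: a single coordinate instead of $3n$, a hypothetical double-well function in place of $G$, a Gaussian white-noise model for the random force with an arbitrary strength `sigma`, and a simple Euler-type time stepping. It illustrates how the end point depends on the initial conformation when the random force is switched off.

```python
import numpy as np

def grad_G(x):
    """Derivative of a hypothetical double-well stand-in for G."""
    return 4.0 * x ** 3 - 4.0 * x + 0.3

def langevin_path(x0, steps=20_000, dt=1e-3, m=1.0, eta=10.0, sigma=0.0, seed=0):
    """Integrate m x'' = -grad G(x) - eta x' + F(t) with an Euler-type scheme.
    The random force is modelled as Gaussian white noise of strength sigma
    (an assumption); sigma = 0 gives the purely deterministic path."""
    rng = np.random.default_rng(seed)
    x, v = float(x0), 0.0
    path = np.empty(steps + 1)
    path[0] = x
    for k in range(steps):
        kick = sigma * np.sqrt(dt) * rng.standard_normal()
        v += (dt * (-grad_G(x) - eta * v) + kick) / m
        x += dt * v
        path[k + 1] = x
    return path

# Deterministic paths from two initial conformations end in different minimizers.
end_right = langevin_path(1.5)[-1]
end_left = langevin_path(-1.5)[-1]

# With the random force switched on, the path is no longer deterministic.
noisy = langevin_path(1.5, sigma=1.0, seed=1)

print(end_right, end_left, noisy[-1])
```

With `sigma=0` each path settles where the gradient vanishes, in the basin that contains its starting point. With `sigma>0` the same starting point can give different paths for different seeds.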

If the two predictions are positively verified, then we can say that theoretically the protein folding problem is resolved, at least for globular proteins.

2. The CGFE Function for Globular Proteins

To explain our CGFE function, we start with a conformation $X$. A protein $\mathfrak{U}$ consists of $n$ atoms $\mathrm{(}{a}_{1}\mathrm{,}\cdots \mathrm{,}{a}_{n}\mathrm{)}$. A conformation of $\mathfrak{U}$ can be expressed as a point of the $3n$-dimensional space ${\mathbb{R}}^{3n}$, $X=\mathrm{(}{x}_{1}\mathrm{,}\cdots \mathrm{,}{x}_{n}\mathrm{)}$, where ${x}_{i}\in {\mathbb{R}}^{3}$ is the nuclear position of ${a}_{i}$ in the 3-dimensional space ${\mathbb{R}}^{3}$. Not all points in ${\mathbb{R}}^{3n}$ are conformations of $\mathfrak{U}$: bond lengths, bond angles, and van der Waals distances in general are natural constraints. Denote by ${\mathfrak{X}}_{\mathfrak{U}}$ the set of all possible conformations of $\mathfrak{U}$. A conformational function of $\mathfrak{U}$ is a function $f\mathrm{:}{\mathfrak{X}}_{\mathfrak{U}}\to \mathbb{R}$. For example, all force fields used in molecular dynamics simulations are conformational functions.

The 3-dimensional conformation ${P}_{X}$ of $\mathfrak{U}$ is ${P}_{X}={\cup}_{i=1}^{n}{C}_{i}\mathrm{(}{x}_{i}\mathrm{)}\subset {\mathbb{R}}^{3}$, where ${C}_{i}\subset {\mathbb{R}}^{3}$ is the shape of the atom ${a}_{i}$ in $\mathfrak{U}$ and ${C}_{i}\mathrm{(}{x}_{i}\mathrm{)}$ indicates that it has been congruently moved to the nuclear position ${x}_{i}$. The shape ${C}_{i}$ exists and changes with $X$; see [6]. A very good approximation to ${C}_{i}\mathrm{(}{x}_{i}\mathrm{)}$ is a solid ball $B\mathrm{(}{x}_{i}\mathrm{,}{r}_{i}\mathrm{)}$, centred at ${x}_{i}\in {\mathbb{R}}^{3}$ with radius ${r}_{i}$ equal to the van der Waals radius of ${a}_{i}$. For simplicity, we adopt ${C}_{i}\mathrm{(}{x}_{i}\mathrm{)}=B\mathrm{(}{x}_{i}\mathrm{,}{r}_{i}\mathrm{)}$.
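With the ball approximation, ${P}_{X}$ is a union of balls, and simple geometric quantities of it can be estimated numerically. The sketch below is illustrative only: it estimates the volume of a union of balls by Monte Carlo sampling in a bounding box, which is a generic technique chosen here for brevity and not a method taken from this paper. The function names, sample count, and the two-atom example are assumptions.

```python
import numpy as np

def in_union(points, centers, radii):
    """For each point, True if it lies in at least one ball B(x_i, r_i)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.any(d <= radii[None, :], axis=1)

def union_volume(centers, radii, samples=200_000, seed=0):
    """Monte Carlo estimate of the volume of the union of balls:
    sample uniformly in the bounding box and count the hits."""
    rng = np.random.default_rng(seed)
    lo = np.min(centers - radii[:, None], axis=0)
    hi = np.max(centers + radii[:, None], axis=0)
    pts = rng.uniform(lo, hi, size=(samples, 3))
    hit_fraction = np.mean(in_union(pts, centers, radii))
    return float(np.prod(hi - lo) * hit_fraction)

# Two overlapping unit balls whose centres are one unit apart.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
radii = np.array([1.0, 1.0])
v_union = union_volume(centers, radii)

# A single unit ball, for comparison with the exact value 4*pi/3.
v_single = union_volume(centers[:1], radii[:1])

print(v_union, v_single, 4.0 * np.pi / 3.0)
```

The union volume is smaller than the sum of the two ball volumes because the overlap is counted once. For a real protein the same idea applies with $n$ balls, although dedicated analytical algorithms are far more efficient.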

In natural, and even in most artificial, environments of protein folding, the immediate environment of a globular protein $\mathfrak{U}$ is just one layer of water molecules surrounding the conformation ${P}_{X}={\cup}_{i=1}^{n}B\mathrm{(}{x}_{i}\mathrm{,}{r}_{i}\mathrm{)}$. This is true even when the protein molecule is inside a crystal [7].

${P}_{X}$ plus the one layer of water molecules constitutes a tiny thermodynamic system ${\mathfrak{S}}_{X}$, tailor made for ${P}_{X}$. As an open thermodynamic system, ${\mathfrak{S}}_{X}$ has a Gibbs free energy $G\mathrm{(}{\mathfrak{S}}_{X}\mathrm{)}$. The CGFE function $G\mathrm{(}\cdot \text{\hspace{0.17em}}\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{):}{\mathfrak{X}}_{\mathfrak{U}}\to \mathbb{R}$ is then defined as $G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}=G\mathrm{(}{\mathfrak{S}}_{X}\mathrm{)}$.

Between ${P}_{X}$ and the layer of water molecules is an interface ${M}_{X}$, for example, the solvent accessible surface $\partial {P}_{X}$. The expression of $G\mathrm{(}{\mathfrak{S}}_{X}\mathrm{)}$ is given via global geometric features of ${M}_{X}$ and its surface chemical potentials. A protein molecule has many moieties or atom groups: some are charged, some are polar, and others are non-polar. They can be classified into hydrophobicity classes ${H}_{i}$, $1\le i\le H$, $H>1$, from the most hydrophobic (non-polar) to the most hydrophilic (polar or charged). An atom ${a}_{i}\in {H}_{j}$ if it belongs to a moiety of class ${H}_{j}$. Define ${P}_{X\mathrm{,}i}={\cup}_{{a}_{j}\in {H}_{i}}B\mathrm{(}{x}_{j}\mathrm{,}{r}_{j}\mathrm{)}$, then ${P}_{X}={\cup}_{i=1}^{H}{P}_{X\mathrm{,}i}$. The space containing water molecules in ${\mathfrak{S}}_{X}$, the ring ${\Re}_{X}=\overline{{\mathfrak{S}}_{X}\backslash {P}_{X}}$, is decomposed into $H$ parts (not necessarily connected) via the distance function $dist\mathrm{(}x\mathrm{,}{P}_{X\mathrm{,}i}\mathrm{)}={\mathrm{min}}_{y\in {P}_{X\mathrm{,}i}}\mathrm{|}x-y\mathrm{|}$, see Figure 1.

${\Re}_{X\mathrm{,}i}=\mathrm{\{}x\in {\Re}_{X}\mathrm{:}\text{\hspace{0.17em}}dist\mathrm{(}x\mathrm{,}{P}_{X\mathrm{,}i}\mathrm{)}\le dist\mathrm{(}x\mathrm{,}{P}_{X\mathrm{,}j}\mathrm{)}\text{ for any }j\ne i\mathrm{\},}\text{\hspace{1em}}i=\mathrm{1,}\cdots \mathrm{,}H\mathrm{.}$ (4)

The interface ${M}_{X}$ then is decomposed accordingly

${M}_{X}={\cup}_{i=1}^{H}{M}_{X\mathrm{,}i}\mathrm{,}\text{\hspace{1em}}{M}_{X\mathrm{,}i}={M}_{X}\cap {\Re}_{X\mathrm{,}i}\mathrm{.}$ (5)
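The decomposition in (4) assigns each point of the water ring to the hydrophobicity class whose atoms are nearest. The sketch below shows one way this rule could be evaluated for balls; it is an illustration, not code from this work. Classes are numbered from 0 for convenience, every class is assumed to contain at least one atom, and the tolerance used for ties is an arbitrary choice.

```python
import numpy as np

def dist_to_class(x, centers, radii):
    """dist(x, P_{X,i}) when P_{X,i} is a union of balls:
    zero inside the union, otherwise the gap to the nearest ball surface."""
    gaps = np.linalg.norm(centers - x, axis=1) - radii
    return max(0.0, float(gaps.min()))

def nearest_classes(x, centers, radii, classes, H, tol=1e-12):
    """Indices i (0-based) with dist(x, P_{X,i}) <= dist(x, P_{X,j}) for all j.
    A point on a boundary between regions belongs to several classes."""
    d = [dist_to_class(x, centers[classes == i], radii[classes == i])
         for i in range(H)]
    d_min = min(d)
    return [i for i in range(H) if d[i] <= d_min + tol]

# Two atoms on the x-axis: class 0 at the origin, class 1 at distance 4.
centers = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
radii = np.array([1.0, 1.0])
classes = np.array([0, 1])

near_first = nearest_classes(np.array([1.5, 0.0, 0.0]), centers, radii, classes, H=2)
near_second = nearest_classes(np.array([2.9, 0.0, 0.0]), centers, radii, classes, H=2)
on_boundary = nearest_classes(np.array([2.0, 0.0, 0.0]), centers, radii, classes, H=2)

print(near_first, near_second, on_boundary)
```

Because (4) uses a non-strict inequality, a point equidistant from two classes is reported in both regions, which is why the function returns a list rather than a single index.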

A water molecule in ${\Re}_{X\mathrm{,}i}$ will touch ${M}_{X\mathrm{,}i}$, so it will be attracted (if ${H}_{i}$ is charged or polar) or repelled (if ${H}_{i}$ is non-polar) by ${P}_{X}$. So the same water molecule has different chemical potentials ${\mu}_{i}$ in different ${\Re}_{X\mathrm{,}i}$. For non-polar ${H}_{i}$, ${\mu}_{i}>0$; for charged or polar ${H}_{i}$, ${\mu}_{i}<0$. Thus there is a $1<k<H$, such that

${\mu}_{1}>{\mu}_{2}>\cdots >{\mu}_{k}>0>{\mu}_{k+1}>\cdots >{\mu}_{H}.$ (6)

${M}_{X\mathrm{,}i}$ with $1\le i\le k$ is called a hydrophobic surface, and ${M}_{X\mathrm{,}i}$ with $k<i\le H$ is called a hydrophilic surface.

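Since water molecules and electrons can enter or leave, ${\mathfrak{S}}_{X}$ is an open thermodynamic system. Hence, the ensemble used here is the grand canonical ensemble. Applying quantum statistics to the open system ${\mathfrak{S}}_{X}$, the number of water molecules in ${\Re}_{X\mathrm{,}i}$ turns out to be a Hermitian operator with mean value ${N}_{i}\mathrm{(}X\mathrm{)}$. The same is true of the number of electrons in ${\mathfrak{S}}_{X}$, with mean value ${N}_{e}\mathrm{(}X\mathrm{)}$ and chemical potential ${\mu}_{e}$. The Gibbs free energy of ${\mathfrak{S}}_{X}$ then is

$G\mathrm{(}{\mathfrak{S}}_{X}\mathrm{)}={\mu}_{e}{N}_{e}\mathrm{(}X\mathrm{)}+\sum_{i=1}^{H}{\mu}_{i}{N}_{i}\mathrm{(}X\mathrm{)}\mathrm{.}$ (7)

Since $X$ is fixed, here the kinetic energy operator vanishes.

Figure 1. Water molecules are contained inside ${\Re}_{X}$; note that it is not necessarily connected. ${M}_{X}$ is an interface.

Since every water molecule in ${\Re}_{X\mathrm{,}i}$ has contact with the surface ${M}_{X\mathrm{,}i}$, ${N}_{i}\mathrm{(}X\mathrm{)}$ is proportional to the area $A\mathrm{(}{M}_{X\mathrm{,}i}\mathrm{)}$. Therefore, there are ${\nu}_{i}>0$, $1\le i\le H$, such that

${N}_{i}\mathrm{(}X\mathrm{)}={\nu}_{i}A\mathrm{(}{M}_{X\mathrm{,}i}\mathrm{),}\text{\hspace{1em}}1\le i\le H\mathrm{.}$ (8)

Let $r$ be the coordinates of one electron in ${\mathfrak{S}}_{X}$, and $W$ the coordinates of the water molecules in ${\Re}_{X}$. Then the electronic density distribution function $\rho$ ((1.3) of ([6], page 6)) gives

${N}_{e}\mathrm{(}X\mathrm{)}={\int}_{{\mathfrak{S}}_{X}}\rho \mathrm{(}r\mathrm{;}X\mathrm{,}W\mathrm{)}\text{d}r=a\mathrm{(}X\mathrm{,}W\mathrm{)}V\mathrm{(}{\mathfrak{S}}_{X}\mathrm{).}$ (9)

The factor $a\mathrm{(}X\mathrm{,}W\mathrm{)}$ is obtained by the mean value theorem of integrals.

Let ${\Omega}_{X}$ be the domain enclosed by ${M}_{X}$ ($\partial {\Omega}_{X}={M}_{X}$). We have roughly the volume $V\mathrm{(}{\mathfrak{S}}_{X}\mathrm{)}\approx V\mathrm{(}{\Omega}_{X}\mathrm{)}+{d}_{w}A\mathrm{(}{M}_{X}\mathrm{)}$, where ${d}_{w}$ is the diameter of a water molecule and $A\mathrm{(}{M}_{X}\mathrm{)}$ is the area of ${M}_{X}$. Then, taking the mean $a$ in (9), we have

${N}_{e}\mathrm{(}X\mathrm{)}=aV\mathrm{(}{\Omega}_{X}\mathrm{)}+a{d}_{w}A\mathrm{(}{M}_{X}\mathrm{).}$ (10)

Substituting (8) and (10) into (7), the conformational Gibbs free energy function of a globular protein $\mathfrak{U}$ in its physiological environment ${\mathfrak{E}}_{\mathfrak{U}}$ is

$G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}=a{\mu}_{e}V\mathrm{(}{\Omega}_{X}\mathrm{)}+a{d}_{w}{\mu}_{e}A\mathrm{(}{M}_{X}\mathrm{)}+\sum_{i=1}^{H}{\nu}_{i}{\mu}_{i}A\mathrm{(}{M}_{X\mathrm{,}i}\mathrm{).}$ (11)

The function $G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$ is smooth, i.e., the first and second derivatives exist in ${\mathbb{R}}^{3n}$ except at points $X$ such that ${x}_{i}={x}_{j}$ for some $i\ne j$. But such a case cannot happen for a conformation, because the van der Waals distances must be positive. Therefore, on ${\mathfrak{X}}_{\mathfrak{U}}$, $G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$ is smooth.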

3. Explanations of Protein Folding

A scientific hypothesis has to be able to explain natural and artificial phenomena. We will explain several phenomena of protein folding, unfolding, and docking, and suggest an application to drug design, according to the SMTH.

3.1. What Does the CGFE Function in (11) Reveal?

It is well known ([8] and [9]) that native structures of globular proteins have three important global geometric features. Compared with unfolded conformations, they have: 1) Smaller volume; 2) Smaller surface area; and 3) Compactly packed hydrophobic cores.

Hence, in folding towards the native structure, the volume and the area shrink. The first two terms in (11), the volume term and the area term, must then have positive coefficients; otherwise shrinking the volume and the area would enlarge the Gibbs free energy.

As for the hydrophobic core, look at (11): the hydrophobic surfaces ${M}_{X\mathrm{,}i}$, $1\le i\le k$, have positive chemical potentials ${\mu}_{i}>0$, thus shrinking their areas will reduce the Gibbs free energy. On the other hand, for the hydrophilic surfaces ${M}_{X\mathrm{,}i}$, $k<i\le H$, ${\mu}_{i}<0$, therefore enlarging their areas will reduce the Gibbs free energy. Enlarging the hydrophilic surface area is equivalent to shrinking the hydrophobic surface area, since the total area of ${M}_{X}$ is also shrinking towards the native structure.
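The sign argument above can be checked with a small numerical sketch. It assumes only what this subsection states about (11): a volume term and an area term with positive coefficients, plus surface terms weighted by the chemical potentials ${\mu}_{i}$. The coefficient names `c_V`, `c_A`, `nu`, and all numbers below are hypothetical and chosen only to illustrate the signs.

```python
def cgfe_sketch(volume, area, class_areas, mu, nu, c_V, c_A):
    """Sketch of a function with the structure described for (11):
    c_V * volume + c_A * area + sum_i nu_i * mu_i * area_i.
    c_V, c_A and nu_i are positive; mu_i > 0 for hydrophobic classes
    and mu_i < 0 for hydrophilic classes. All names are hypothetical."""
    assert len(class_areas) == len(mu) == len(nu)
    surface = sum(n * m * a for n, m, a in zip(nu, mu, class_areas))
    return c_V * volume + c_A * area + surface

# Two hydrophobicity classes: class 1 hydrophobic (mu > 0), class 2 hydrophilic (mu < 0).
mu = [0.5, -0.3]
nu = [1.0, 1.0]
c_V, c_A = 0.1, 0.05

# A made-up "unfolded" state: large volume, large area, much hydrophobic surface exposed.
G_unfolded = cgfe_sketch(120.0, 100.0, [60.0, 40.0], mu, nu, c_V, c_A)

# A made-up "folded" state: smaller volume and area, hydrophobic surface mostly buried.
G_folded = cgfe_sketch(100.0, 80.0, [20.0, 60.0], mu, nu, c_V, c_A)

print(G_unfolded, G_folded)
```

With these illustrative numbers the compact state with less exposed hydrophobic surface has the lower value, in line with the three geometric features listed at the start of this subsection.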

According to the SMTH, predicting the native structure of a globular protein means minimizing the CGFE function (11) as in (1) and (2). As analysed above, this essentially makes an ever better hydrophobic core by shrinking the volume, the area, and the hydrophobic area simultaneously and cohesively. This is the “cooperativity” sought in [10], of “the concurrent participation of different regions of the biomolecule to promote and sustain intramolecular or intermolecular interactions”.

One may ask: where are the hydrogen bonds in (11)? The answer is that secondary structures and hydrogen bonds are products of minimizing the CGFE function to find the native structures. A judicious examination of an exhaustive PDB sample of small soluble globular proteins of moderate size (residues) showed that the hydrophobic collapse is coupled with backbone hydrogen-bond formation [11]. In [12], we neglected the volume and the area and only shrank the hydrophobic surface area (equivalent to making a better hydrophobic core); hydrogen bonds and secondary structures such as helices, strands, and turns duly appeared with statistical significance.

The explanation is that proteins in their physiological environment are special among polymers. Polymers in general do not have specific structures, whereas proteins have native structures in their physiological environment. Why? The peptide chains of globular proteins are special: when they fold in their physiological environment, while collapsing into hydrophobic cores, the residues are put in just the right places to form secondary structures and hydrogen bonds simultaneously. Evolution selected the very few peptide chains that are foldable globular proteins. In fact, if a 400-residue peptide chain is picked at random, the probability that it is a protein’s peptide chain is at most $10^{-460}$ [2]. In standard double-precision floating point arithmetic, $10^{-460}$ is rounded to zero.
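The remark about computers can be made concrete. The sketch below shows that $10^{-460}$ underflows to zero as a standard 64-bit float, while exact rational arithmetic still represents it. The count of $20^{400}$ possible sequences assumes the 20 standard amino acids and is added here only for scale; it is not a figure taken from [2].

```python
import math
import sys
from fractions import Fraction

# As a 64-bit float, 10^-460 is far below the smallest positive value and becomes 0.0.
p_float = float("1e-460")
smallest_normal = sys.float_info.min      # about 2.2e-308
smallest_subnormal = 5e-324               # smallest positive 64-bit float

# Exact rational arithmetic keeps the value, so bounds like this need exact
# numbers or logarithms rather than ordinary floats.
p_exact = Fraction(1, 10 ** 460)
log10_p = -460

# For scale: the number of 400-residue sequences over 20 amino acids is 20^400.
log10_sequences = 400 * math.log10(20)

print(p_float, p_exact > 0, round(log10_sequences, 1))
```

Working with `log10_p` (or with `Fraction`) avoids the underflow; any comparison done directly on `p_float` would treat the probability as exactly zero.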

3.2. Explanation of Denaturation

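According to the pioneering research on denaturation [13], all known denaturation phenomena are caused by a change of environment from the physiological environment ${\mathfrak{E}}_{\mathfrak{U}}$ to some other environment $\mathfrak{E}$. Denaturation, or unfolding, is the same as folding, only in a different environment and with a different CGFE function $G\mathrm{(}X\mathrm{;}\mathfrak{E}\mathrm{,}\mathfrak{U}\mathrm{)}$. According to the SMTH, the unfolding will end at minimizers ${X}_{\mathfrak{E}}$ (local or global, and possibly not unique) of $G\mathrm{(}\cdot \text{\hspace{0.17em}}\mathrm{;}\mathfrak{E}\mathrm{,}\mathfrak{U}\mathrm{)}$.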

Experiments show that the difference between folding and unfolding is that folding leads to a unique native structure, unfolding leads to many different stable conformations [1].

Either in folding or unfolding, a conformation moves along a folding (unfolding) path which satisfies an equation of motion, the Langevin equation (3), with deterministic forces $-\nabla G\mathrm{(}X\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$ and $-\nabla G\mathrm{(}X\mathrm{;}\mathfrak{E}\mathrm{,}\mathfrak{U}\mathrm{)}$ respectively, where $\mathfrak{E}$ is the denaturation environment. This may explain why folding leads to a unique native structure and unfolding leads to many stable conformations. The random forces in the Langevin equation for folding and unfolding may also be different: in a denatured environment, random collisions with other molecules will happen more often.

Moreover, the initial conformation of the folding or unfolding path also determines where the path ends. Any local minimizer has a basin of attraction (the $U$ in (1)) such that any initial conformation in the basin will fold to that minimizer. Because evolutionary selection has made a protein’s peptide chain fit the protein’s physiological environment, the basin of attraction of the native structure ${X}_{\mathfrak{U}}$ is large enough to contain all the initial conformations freshly coming out of the ribosome. Thus, even if $G\mathrm{(}\cdot \text{\hspace{0.17em}}\mathrm{;}{\mathfrak{E}}_{\mathfrak{U}}\mathrm{,}\mathfrak{U}\mathrm{)}$ has more than one minimizer, the native structure is the unique folding result.

But in denaturation, is the initial conformation not the unique native structure? Why does it unfold to many stable denatured conformations? The reason is catastrophe, a phenomenon that often happens in nature. A description of it is given in [14]: “Catastrophe theory is concerned with the mathematical modelling of sudden changes, so called ‘catastrophes’, in the behaviour of natural systems, which can appear as a consequence of continuous changes of the system parameters”.

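Actually, from the physiological environment ${\mathfrak{E}}_{\mathfrak{U}}$ to the denaturation environment $\mathfrak{E}$, there must be a family of environments ${\mathfrak{E}}_{t}$ connecting them. Except for ${\mathfrak{E}}_{\mathfrak{U}}$ and $\mathfrak{E}$, these environments are not in equilibrium or quasi-static, so the function $G\mathrm{(}\cdot \text{\hspace{0.17em}}\mathrm{;}{\mathfrak{E}}_{t}\mathrm{,}\mathfrak{U}\mathrm{)}$ is not well defined. Hence, although the parameter $t$ varies continuously, catastrophe does happen: various copies of the same native structure ${X}_{\mathfrak{U}}$ suddenly change to different structures, and when the environment finally becomes the denaturation environment $\mathfrak{E}$, these changed structures become different initial conformations of denaturation paths under $G\mathrm{(}\cdot \text{\hspace{0.17em}}\mathrm{;}\mathfrak{E}\mathrm{,}\mathfrak{U}\mathrm{)}$.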

3.3. Explanation of Docking

Docking is the attempt to bind two molecules into a stable complex.

Let and be two molecules (proteins or others), and and. The 3-dimensional conformations and are contained in their tailor made thermodynamic systems and respectively in their common physiological environment. In Figure 2, in the beginning,. Now suppose that there is a congruence brings to, i.e.,. Van der Waals repulse tells us that it is always 1) and 2). Thus, we define and

Figure 2. (a) Two independent molecules; (b) Invading each other; (c) Forming new minimiser.

binding if and only if 1) and 2).

If and binding, there is a binding energy depending on the net effect of the way of bring to close to. Neglecting, consider the Gibbs free energy for the conformation of the “molecule”), then

(12)

If is just water, then is given by (11) for the conformation of the “molecule”. If and are not binding, then, thus (12) is true for any.

In general, even and are minimizers of and respectively, will not be a minimizer of .

3.4. Drug Design

We address a question asked in [10], “the rational drug designer faces a many-body problem: the interactions between the protein target and the drug/ligand involve more than groups matched up in a pairwise fashion at the target-ligand interface... what sort of many-body problem is the drug designer facing and how can this knowledge play advantageously to address the major therapeutic imperatives of today and tomorrow”?

In fact, a drug is much smaller than its target, a globular protein. Thus, the binding energy will be easier to figure out than that between two large proteins. The stable structures of the drug/target complex will then be (local or global) minimizers of the Gibbs free energy in (12). Thus the drug’s efficacy and safety depend on the properties of these minimizers. This might be an answer to what the “many-body problem” in [10] is, and it also suggests a particular solution.

References

[1] Anfinsen, C.B. (1973) Principles That Govern the Folding of Protein Chains. Science, 181, 223-230. https://doi.org/10.1126/science.181.4096.223

[2] Fang, Y. Gibbs Free Energy Formula for Protein Folding. In: Morales-Rodriguez, R., Ed., Thermodynamics—Fundamentals and Its Application in Science, 47-82.
http://www.intechopen.com/books/thermodynamics-fundamentals-and-its-application-in-science

[3] Fang, Y. (2014) A Gibbs Free Energy Formula for Protein Folding Derived from Quantum Statistics. Science China, Physics, Mechanics & Astronomy, 57, 1547-1551. https://doi.org/10.1007/s11433-013-5288-x

[4] Fang, Y. (2015) Thermodynamic Principle Revisited: Theory of Protein Folding. Advances in Bioscience and Biotechnology, 6, 37-48.
https://doi.org/10.4236/abb.2015.61005

[5] Levinthal, C. (1969) How to Fold Graciously. Mössbauer Spectroscopy in Biological Systems Proceedings. Proceedings of a Meeting Held at Allerton House, Monticello, Illinois. University of Illinois Bulletin, 67, 22-26.
https://web.archive.org/web/20110523080407/http://www-miller.ch.cam.ac.uk/levinthal/levinthal.html

[6] Bader, R.F.W. (1990) Atoms in Molecules: A Quantum Theory. Clarendon Press, Oxford.

[7] Lattman, E.E. and Loll, P.J. (2008) Protein Crystallography: A Concise Guide. The Johns Hopkins University Press, Baltimore.

[8] Novotny, J., Bruccoleri, R. and Karplus, M. (1984) An Analysis of Incorrectly Folded Protein Models. Implications for Structure Predictions. J. Mol. Biol., 177, 787-818.
https://doi.org/10.1016/0022-2836(84)90049-4

[9] Novotny, J., Rashin, A.A. and Bruccoleri, R. (1988) Criteria That Discriminate between Native Proteins and Incorrectly Folded Models. Proteins, 4, 19-30.
https://doi.org/10.1002/prot.340040105

[10] Fernández Stigliano, A. (2015) Biomolecular Interfaces: Interactions, Functions and Drug Design. Springer International Publishing.

[11] Fernandez, A., Kardos, J. and Goto, Y. (2003) Protein Folding: Could Hydrophobic Collapse Be Coupled with Hydrogen-Bond Formation? FEBS Letters, 536, 187-192.
https://doi.org/10.1016/S0014-5793(03)00056-5

[12] Fang, Y. and Jing, J. (2010) Geometry, Thermodynamics, and Protein. Journal of Theoretical Biology, 262, 382-390. https://doi.org/10.1016/j.jtbi.2009.09.013

[13] Wu, H. (1931) Studies of Denaturation of Proteins XIII. A Theory of Denaturation. Chinese J. Physiol., 5, 321-344.

[14] Sanns, W. (2009) Catastrophe Theory. In: Meyers, R.A., Ed., Encyclopedia of Complexity and System Science, Springer, Vol. 4, 703-719.
https://doi.org/10.1007/978-0-387-30440-3_47