JBiSE Vol.12 No.8, August 2019
Predictors for Predicting Temperature Optimum in Beta-Glucosidases
Author(s) Shaomin Yan, Guang Wu
This is a continuation of our studies on beta-glucosidase, which plays an important role in biological processes and has recently attracted strong interest for its potential role in biofuel production. To develop simple methods for predicting the optimal working conditions of beta-glucosidase, we used a 20-1 feedforward backpropagation neural network to screen 25 amino-acid properties related to the primary structure of beta-glucosidases as possible predictors of the temperature optimum. The results show that the normalized polarizability index and the amino-acid distribution probability can predict the temperature optimum of beta-glucosidase, which highlights a cost-effective way to predict various enzymatic parameters of beta-glucosidase.

1. Introduction

β-Glucosidase (EC 3.2.1.21) plays an important role in biological processes because it cleaves β-glucosidic linkages to release glucose molecules [1]. For example, mutations in the gene of the lysosomal enzyme acid beta-glucosidase can lead to the human metabolic disorder Gaucher disease, which is characterized by deficient activity of the enzyme [2,3]. β-Glucosidase can deglycosylate isoflavones to their aglycone forms, which gives it wide applications in the food and pharmaceutical industries [4]. Recently, there has been growing interest in its potential role in biofuel production, because cellulose is a linear biopolymer of glucose molecules connected by β-1,4-glycosidic bonds, and its enzymatic hydrolysis requires mixtures of hydrolytic enzymes including endoglucanases, exoglucanases (cellobiohydrolases), and β-glucosidases [5]. Therefore, great efforts have been made to develop renewable biofuel by enzymatically hydrolyzing carbohydrate polymers in biomass to sugars and fermenting them to ethanol [6].

Generally speaking, the optimal working conditions of enzymes are determined experimentally, which is costly and time-consuming. Experimental characterization now lags behind the growth of enzyme databases: in 2002 only 789 enzymes were documented in the Comprehensive Enzyme Information System BRENDA, whereas it currently covers enzymes from 33,721 organisms [7,8]. As a result, many enzymes have sequence information but lack documented optimal working conditions. It is therefore attractive to develop methods that predict the optimal working conditions of enzymes from their primary structure, and we have recently conducted several studies on predicting functional parameters of enzymes using amino-acid properties, including pH optimum [9 - 12], temperature optimum [11 - 15], Michaelis-Menten constant [16 - 18] and turnover number [19]. However, more studies are needed before solid conclusions can be drawn. The aim of this study is to identify predictors that are useful for predicting the temperature optimum of β-glucosidase.


2. Materials and Methods

2.1. Data

In the Comprehensive Enzyme Information System BRENDA, 37 β-glucosidases (EC 3.2.1.21) have sequence information under the category of temperature optimum, one of which is documented together with its mutant [20,21]. In addition, two temperature values are documented for each of two β-glucosidases: B5TWK3 at 22°C and 37°C [22], and Q12715 at 65°C and 70°C [23]. In total, this databank provides 40 matched pairs of sequence and temperature value for β-glucosidases. The amino-acid sequences of the β-glucosidases were obtained from the Universal Protein Resource (UniProt) [24].

2.2. Possible Predictors

Table 1 lists the amino-acid properties to be screened, which cover charge, hydrophilicity or hydrophobicity, size and functional groups, all of which are crucial for protein structure and protein–protein interactions [25]. The properties related to the primary structure of enzymes include the spatial properties [26,27] listed in rows 2 - 5 of Table 1, the hydrophobic properties [28 - 30] in rows 6 - 10, the electronic properties [31] in rows 11 - 17, and the secondary structure predictions [32] in rows 18 - 24. Each property assigns one fixed value to each type of amino acid, so the property values alone cannot distinguish one β-glucosidase from another. Because each β-glucosidase has its own amino-acid composition, we multiply the values listed in Table 1 by the amino-acid composition of each β-glucosidase.
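As an illustration of this weighting step, the following Python sketch builds the 20 inputs for one sequence. It rests on two assumptions that the text does not settle: the composition is taken as the fraction of each amino acid in the sequence (a percentage or a raw count would only rescale the inputs), and the Kyte-Doolittle hydropathy values [28] stand in for one property row, since the contents of Table 1 are not reproduced here. The function names are hypothetical.

```python
from collections import Counter

# One-letter codes in the order used in the caption of Table 1.
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy index, used here only as a stand-in property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "E": -3.5, "Q": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Fraction of each of the 20 amino acids in a sequence (assumed definition)."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def weighted_inputs(sequence, scale):
    """Property value multiplied by composition, one input per amino-acid type."""
    comp = composition(sequence)
    return [scale[a] * comp[a] for a in AMINO_ACIDS]


# Toy sequence for illustration only, not a real beta-glucosidase.
inputs = weighted_inputs("AAVV", KYTE_DOOLITTLE)
print(len(inputs))  # 20
```

Two enzymes with different compositions then yield different 20-element input vectors even though the underlying property scale is the same for both.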

Based on occupancy of subpopulations and partitions [33], we have developed a measure to calculate amino acid distribution probability according to the following equation:

$$\frac{r!}{q_0! \times q_1! \times \cdots \times q_n!} \times \frac{r!}{r_1! \times r_2! \times \cdots \times r_n!} \times n^{-r}$$

where ! is the factorial function, r is the number of amino acids of a given type, r_1, r_2, ..., r_n are the numbers of these amino acids in each partition, q_0, q_1, ..., q_n are the numbers of partitions containing the same number (0, 1, ..., n) of amino acids, and n is the number of partitions in the protein for that type of amino acid. An online calculator is available at http://www.gxas.cn/dp.htm. Each type of amino acid has its own distribution probability, as shown by the example in Table 2. Unlike a fixed property value, the distribution probability of the same type of amino acid can differ between proteins, because it depends on the actual distribution pattern along the protein sequence [34 - 38].
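The equation can be evaluated directly once the number of amino acids in each partition is known. The sketch below is a minimal Python illustration under the assumption that these occupancy counts are supplied by the caller; how a protein is cut into partitions is not covered here, and the function name is hypothetical. Exact fractions are used so that the factorials do not lose precision.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def distribution_probability(occupancy):
    """Evaluate the distribution probability for one type of amino acid.

    occupancy[i] is the number of amino acids of that type in partition i
    (r_1 ... r_n in the equation); supplying it directly is an assumption
    of this sketch.
    """
    if not occupancy or any(k < 0 for k in occupancy):
        raise ValueError("occupancy must be a non-empty list of counts >= 0")
    n = len(occupancy)            # number of partitions
    r = sum(occupancy)            # number of amino acids of this type
    q = Counter(occupancy)        # q[j] = number of partitions holding j amino acids
    first = Fraction(factorial(r), prod(factorial(c) for c in q.values()))
    second = Fraction(factorial(r), prod(factorial(k) for k in occupancy))
    return first * second / n ** r


# Three amino acids of one type spread over three partitions.
print(distribution_probability([1, 1, 1]))  # 2/9
print(distribution_probability([2, 1, 0]))  # 2/3
print(distribution_probability([3, 0, 0]))  # 1/9
```

With three amino acids and three partitions, the three possible patterns sum to one (2/9 + 2/3 + 1/9), which is a quick consistency check on the reconstruction of the equation for the case n = r.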

2.3. Predictive Model

To find possible predictors of the temperature optimum of β-glucosidases, a 20-1 feedforward backpropagation neural network was used as the predictive model [39]; its structure is shown in Figure 1. The first layer contains 20 neurons corresponding to 20 inputs (or 20 elements of input in neural network terminology), which can be any measure related to the 20 types of amino acids. The second layer contains a single neuron corresponding to the single output, the temperature optimum. The transfer functions are tan-sigmoid for the first layer and linear for the second. The training algorithm is resilient backpropagation, which is the fastest algorithm for pattern recognition in MatLab [40].
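The forward pass of such a 20-1 network can be sketched in a few lines of Python with NumPy. This is not the MatLab code used in the study; the dictionary keys follow the labels in Figure 1 (IW, LW, b1, b2), while the function names and the weight range are assumptions made for illustration.

```python
import numpy as np


def init_network(rng, n_inputs=20, n_hidden=20):
    """Random weights and biases; the range (-0.5, 0.5) is an arbitrary choice."""
    return {
        "IW": rng.uniform(-0.5, 0.5, (n_hidden, n_inputs)),  # input weights
        "b1": rng.uniform(-0.5, 0.5, n_hidden),               # first-layer biases
        "LW": rng.uniform(-0.5, 0.5, (1, n_hidden)),          # layer weights
        "b2": rng.uniform(-0.5, 0.5, 1),                      # second-layer bias
    }


def predict(net, X):
    """Tan-sigmoid first layer followed by a single linear output neuron."""
    hidden = np.tanh(X @ net["IW"].T + net["b1"])
    return (hidden @ net["LW"].T + net["b2"]).ravel()


rng = np.random.default_rng(0)
net = init_network(rng)
X = rng.random((5, 20))          # 5 hypothetical enzymes, 20 inputs each
print(predict(net, X).shape)     # (5,)
```

Each row of X holds the 20 inputs of one enzyme, and the single output per row is the predicted temperature optimum once the weights have been trained.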

Table 1. Features of amino acids used as predictors. A, alanine; R, arginine; N, asparagine; D, aspartic acid; C, cysteine; E, glutamic acid; Q, glutamine; G, glycine; H, histidine; I, isoleucine; L, leucine; K, lysine; M, methionine; F, phenylalanine; P, proline; S, serine; T, threonine; W, tryptophan; Y, tyrosine; V, valine. σI: Inductive effect scale; HMΔPH: Normalized Mulliken population data for the amino-acid side chains in the context of phenol; σR: Resonance effect scale; σα: Normalized polarizability index; σF: Field effect index; AI: Additional scale; f(i): Frequency of the 1st residue in turn; f(i + 1): Frequency of the 2nd residue in turn; f(i + 2): Frequency of the 3rd residue in turn; f(i + 3): Frequency of the 4th residue in turn.

Figure 1. 20-1 feedforward backpropagation neural network to model the relationship between 20 pieces of information on the primary structure of β-glucosidase, which are labeled using the symbols of the 20 types of amino acids, and its temperature optimum. Each diamond represents a neuron. IW{1} is the input weights, LW{2,1} is the layer weights to the second layer from the first layer. b{1} and b{2} are the biases related to each neuron at the first and second layers.

Table 2. Difference between normalized polarizability index (σα) and amino acid distribution probability in β-glucosidases A9UIG0 and Q4U4W7.

2.4. Validation of Predictions

Each predictor went through the predictive model with the same procedure so that the outputs could be compared statistically. Table 3 lists the 40 β-glucosidases analyzed, of which 25 served as the training group to generate the weights and biases of the neural network, and 15 served as the validation group to validate the network with the trained weights and biases. This is the traditional approach in neural network studies. Then the delete-1 jackknife was used, in which one observation at a time is left out of the sample set for validation, because the jackknife is the most effective in comparison with the independent dataset test and the subsampling test, and is widely used [41]. Finally, cross-validation was used: the data were split into 10 or 4 subsets of 4 or 10 cases each, and each subset was held out in turn as the validation set [42].
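The delete-1, delete-4 and delete-10 schemes differ only in how many cases are held out at a time, so they can be generated by one routine. The Python sketch below is a hypothetical helper, not the procedure used in the study; in particular, shuffling the cases before splitting is an assumption, since the text does not say how cases were assigned to subsets.

```python
import random


def leave_k_out_splits(n_samples, k, seed=0):
    """Yield (training, held_out) index lists, holding out k cases at a time."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # assumed: random assignment to subsets
    for start in range(0, n_samples, k):
        held_out = indices[start:start + k]
        held = set(held_out)
        training = [i for i in indices if i not in held]
        yield training, held_out


# 40 enzymes: delete-1 gives 40 rounds, delete-4 gives 10, delete-10 gives 4.
for k in (1, 4, 10):
    print(k, len(list(leave_k_out_splits(40, k))))
```

In every scheme each of the 40 cases is held out exactly once, so the schemes can be compared on the same set of predictions.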

2.5. Statistics

One hundred trainings were conducted for each predictor in the predictive model, and their weights and biases were used to predict the temperature optimum 100 times. The mean and standard deviation of predicted values were compared with the recorded temperature optimum for each β-glucosidase [43], and linear regression was also used to evaluate the predicted temperature values with their recorded ones.
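The summary statistics described here can be sketched as follows in Python with NumPy. The sketch covers only the mean, the standard deviation and the linear regression; the statistical test used to compare predicted with recorded values is not specified in the text and is therefore left out. The function names are hypothetical, and the direction of the regression (predicted on recorded) is an assumption.

```python
import numpy as np


def summarize_predictions(pred_runs):
    """Mean and sample SD per enzyme from an array of shape (n_runs, n_enzymes)."""
    pred_runs = np.asarray(pred_runs, dtype=float)
    return pred_runs.mean(axis=0), pred_runs.std(axis=0, ddof=1)


def regress(recorded, predicted):
    """Least-squares line predicted = slope * recorded + intercept, plus Pearson r."""
    slope, intercept = np.polyfit(recorded, predicted, deg=1)
    r = np.corrcoef(recorded, predicted)[0, 1]
    return slope, intercept, r


# Toy numbers for illustration only: 2 runs for 3 enzymes.
recorded = np.array([30.0, 50.0, 70.0])
runs = np.array([recorded + 1.0, recorded - 1.0])
mean, sd = summarize_predictions(runs)
print(mean, sd)
print(regress(recorded, mean))
```

In the study each enzyme would have 100 predictions rather than the two toy runs used above.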


3. Results and Discussion

Theoretically, the neural network displayed in Figure 1 can account for various linear and nonlinear relationships between the amino-acid properties of the primary structure and the temperature optimum of β-glucosidases, so predictors can be screened regardless of whether their relationship with temperature is linear or nonlinear [39].

Technically, the initialization of weights and biases and the number of training epochs govern whether the neural network converges during training. Here the weights and biases were initialized by a random initialization function, and 250 training epochs were conducted. Only 4 of the 25 amino-acid properties converged; they are shown in Figure 2, where each line represents one training process of 250 epochs starting from a random initialization of weights and biases. As seen, convergence was reached within 250 training epochs for any random initialization, which provides the foundation for the training process and indicates that these 4 properties can serve as predictors of the temperature optimum of β-glucosidases. However, different predictors have different convergence profiles, and the curves for the amino-acid distribution probability (bottom panel) converged within a narrower band than the others.
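The experiment behind Figure 2, repeated training from random initializations with the mean squared error recorded at every epoch, can be sketched in Python with NumPy as below. This is a simplified resilient backpropagation variant written for illustration, not the MatLab routine used in the study; the step-size parameters, the weight range and all function names are assumptions.

```python
import numpy as np


def init_params(rng, n_inputs, n_hidden=20):
    """Random initial weights and biases (range chosen arbitrarily)."""
    return {"IW": rng.uniform(-0.5, 0.5, (n_hidden, n_inputs)),
            "b1": rng.uniform(-0.5, 0.5, n_hidden),
            "LW": rng.uniform(-0.5, 0.5, (1, n_hidden)),
            "b2": rng.uniform(-0.5, 0.5, 1)}


def loss_and_grads(p, X, y):
    """Mean squared error of the tanh-linear network and its gradients."""
    hidden = np.tanh(X @ p["IW"].T + p["b1"])
    pred = (hidden @ p["LW"].T + p["b2"]).ravel()
    err = pred - y
    d_out = (2.0 / len(y)) * err[:, None]
    d_hidden = (d_out @ p["LW"]) * (1.0 - hidden ** 2)
    grads = {"IW": d_hidden.T @ X, "b1": d_hidden.sum(axis=0),
             "LW": d_out.T @ hidden, "b2": d_out.sum(axis=0)}
    return float(np.mean(err ** 2)), grads


def train_rprop(p, X, y, epochs=250, inc=1.2, dec=0.5,
                step0=0.07, step_min=1e-6, step_max=50.0):
    """Sign-based updates with per-weight step sizes; returns the MSE curve."""
    steps = {k: np.full_like(v, step0) for k, v in p.items()}
    prev = {k: np.zeros_like(v) for k, v in p.items()}
    curve = []
    for _ in range(epochs):
        loss, grads = loss_and_grads(p, X, y)
        curve.append(loss)
        for k in p:
            agree = grads[k] * prev[k]
            # Grow the step while the gradient keeps its sign, shrink it on a flip.
            steps[k] = np.where(agree > 0, np.minimum(steps[k] * inc, step_max), steps[k])
            steps[k] = np.where(agree < 0, np.maximum(steps[k] * dec, step_min), steps[k])
            g = np.where(agree < 0, 0.0, grads[k])   # skip the update after a flip
            p[k] = p[k] - np.sign(g) * steps[k]
            prev[k] = g
    return curve


def convergence_curves(X, y, n_restarts=100, epochs=250, seed=0):
    """One MSE curve per random initialization, shape (n_restarts, epochs)."""
    rng = np.random.default_rng(seed)
    return np.array([train_rprop(init_params(rng, X.shape[1]), X, y, epochs)
                     for _ in range(n_restarts)])
```

Plotting each row of the returned array against the epoch number gives one line per initialization, in the manner of Figure 2; how tightly the lines bunch together at the final epochs is what the comparison between predictors refers to.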

Table 3 compares the recorded and predicted temperature optima for the 40 β-glucosidases. A predictor is considered workable when there is no statistical difference between the recorded and predicted temperature optimum, and such predicted values are marked with an asterisk. The last row of Table 3 summarizes the overall performance and shows that the normalized polarizability index (σα) and the amino-acid distribution probability work better than the other two predictors.

Figure 3 displays the percentage of β-glucosidases whose temperature optimum was correctly predicted in the training and validation groups. In the training group, the amino-acid distribution probability worked best, with the temperature optimum of all β-glucosidases correctly predicted, followed by the normalized polarizability index (88%). In the validation group, however, only the normalized polarizability index reached 60% correct predictions. Figure 4 shows the regression between recorded and predicted temperature optimum using these four amino-acid properties as predictors. Figure 5 shows the results of the delete-1, delete-4 and delete-10 jackknife validations, where both the normalized polarizability index and the amino-acid distribution probability performed better than the other two predictors, and there was generally no significant difference between the different deletions.

In conclusion, many studies have focused on revealing the structure-function relationship of enzymes [44 - 46]. This study is consistent with our previous studies [9 - 19], demonstrating that some predictors do have promising prospects for predicting the optimal working conditions of enzymes from information related to their primary structure. Further efforts are still needed to explore cost-effective ways to predict various enzymatic parameters of β-glucosidases.

Figure 2. Convergence of mean squared error performance function with 100 different initial weights and biases generated by random initialization function.

Table 3. Comparison between recorded and predicted temperature optimum in 40 β-glucosidases. The predicted temperature optimum is presented as mean ± SD of 100 predictions. AA, the amino-acid composition; AA DP, amino-acid distribution probability. *, no statistical difference with the recorded temperature optimum.

Figure 3. Percentage of β-glucosidases with correctly predicted temperature. The training and validation groups contained 25 and 15 β-glucosidases.

Figure 4. Linear regression between recorded and predicted temperature optimum in training and validation groups, respectively. Linear regressions for training groups are: (1) Temperature Optimum = 8.2944 × (σα × AA composition) + 0.8352, P < 0.0001; (2) Temperature Optimum = 7.1604 × (AI × AA composition) + 0.8563, P < 0.0001; (3) Temperature Optimum = 13.3125 × (f(i) × AA composition) + 0.7368, P < 0.0001; (4) Temperature Optimum = 0.0166 × AA distribution probability + 0.9997, P < 0.0001. Linear regressions for validation groups are: (1) Temperature Optimum = 0.4935 × (σα × AA composition) + 36.8632, P = 0.0783; (2) Temperature Optimum = −0.0079 × (AI × AA composition) + 59.0869, P = 0.9726; (3) Temperature Optimum = 0.1182 × (f(i) × AA composition) + 52.6179, P = 0.6118; (4) Temperature Optimum = 0.4216 × AA distribution probability + 43.9512, P = 0.0071.

Figure 5. Percentage of β-glucosidases with correctly predicted temperature. The validation among 40 β-glucosidases was conducted using MatLab by means of delete-1, delete-4 and delete-10 jackknifing. AA, amino-acid.


This study was supported by National Natural Science Foundation of China (31560315), and Key Project of Guangxi Scientific Research and Technology Development Plan (AB17190534).

Cite this paper
Yan, S. and Wu, G. (2019) Predictors for Predicting Temperature Optimum in Beta-Glucosidases. Journal of Biomedical Science and Engineering, 12, 414-426. doi: 10.4236/jbise.2019.128033.
[1]   Jeng, W.Y., Wang, N.C., Lin, M.H., Lin, C.T., Liaw, Y.C., Chang, W.J., Liu, C.I., Liang, P.H. and Wang, A.H. (2011) Structural and Functional Analysis of Three β-Glucosidases from Bacterium Clostridium cellulovorans, Fungus Trichoderma reesei and Termite Neotermes koshunensis. Journal of Structural Biology, 173, 46-56.

[2]   Kacher, Y., Brumshtein, B., Boldin-Adamsky, S., Toker, L., Shainskaya, A., Silman, I., Sussman, J.L. and Futerman, A.H. (2008) Acid β-Glucosidase: Insights from Structural Analysis and Relevance to Gaucher Disease Therapy. Biological Chemistry, 389, 1361-1369.

[3]   Granovsky-Grisaru, S., Belmatoug, N., vom Dahl, S., Mengel, E., Morris, E. and Zimran, A. (2011) The Management of Pregnancy in Gaucher Disease. European Journal of Obstetrics & Gynecology and Reproductive Biology, 156, 3-8.

[4]   Chen, K.I., Erh, M.H., Su, N.W., Liu, W.H., Chou, C.C. and Cheng, K.C. (2012) Soyfoods and Soybean Products: from Traditional Use to Modern Applications. Applied Microbiology & Biotechnology, 96, 9-22.

[5]   Dashtban, M., Maki, M., Leung, K.T., Mao, C. and Qin, W. (2010) Cellulase Activities in Biomass Conversion: Measurement Methods and Comparison. Critical Reviews in Biotechnology, 30, 302-309.

[6]   Wilson, D.B. (2009) Cellulases and Biofuels. Current Opinion in Biotechnology, 20, 295-299.

[7]   Schomburg, I., Chang, A., Hofmann, O., Ebeling, C., Ehrentreich, F. and Schomburg, D. (2002) BRENDA: A Resource for Enzyme Data and Metabolic Information. Trends in Biochemical Sciences, 27, 54-56.

[8]   Placzek, S., Schomburg, I., Chang, A., Jeske, L., Ulbrich, M., Tillack, J. and Schomburg, D. (2017) BRENDA in 2017: New Perspectives and New Tools in BRENDA. Nucleic Acids Research, 45, D380-D388.

[9]   Yan, S. and Wu, G. (2011) Searching of Predictors to Predict pH of Cellulases. Applied Biochemistry and Biotechnology, 165, 856-869.

[10]   Yan, S. and Wu, G. (2013) Prediction of Optimal pH in Hydrolytic Reaction of Beta-Glucosidase. Applied Biochemistry and Biotechnology, 169, 1884-1894.

[11]   Yan, S. and Wu, G. (2012) Prediction of Optimal pH and Temperature of Cellulases Using Neural Network. Protein & Peptide Letters, 19, 29-39.

[12]   Yan, S., Shi, D., Nong, H. and Wu, G. (2011) Simultaneously Predicting pH and Temperature Optimum in Catalytic Reaction of Beta-Glucosidase. Guangxi Sciences, 18, 253-260.

[13]   Yan, S. and Wu, G. (2019) Predicting pH Optimum for Activity of Beta-Glucosidases. Journal of Biomedical Science and Engineering, 12, 354-367.

[14]   Yan, S. and Wu, G. (2012) Exhausted Jackknife Validation Exemplified by Prediction of Temperature Optimum in Enzymatic Reaction of Cellulases. Applied Biochemistry and Biotechnology, 166, 997-1107.

[15]   Yan, S. and Wu, G. (2013) Prediction of Temperature Optimum in Enzymatic Reaction of Beta-Cellobiosidases with Exhausted Jackknife Validation. Life Science Journal, 10, 1673-1678.

[16]   Yan, S. and Wu, G. (2011) Prediction of Michaelis-Menten Constant in Beta-Cellobiosidase’s Reaction with Lactoside as Substrate. Enzyme Engineering, 1, 102.

[17]   Yan, S. and Wu, G. (2011) Prediction of Michaelis-Menten Constant of Beta-Glucosidases Using Nitrophenyl-Beta-D-Glucopyranoside as Substrate. Protein & Peptide Letters, 18, 1053-1057.

[18]   Yan, S., Shi, D., Nong, H. and Wu, G. (2012) Predicting Km Values of Beta-Glucosidases Using Cellobiose as Substrate. Interdisciplinary Sciences: Computational Life Sciences, 4, 46-53.

[19]   Yan, S. and Wu, G. (2013) Prediction of Turnover Number of Cellulose 1,4-Beta-Cellobiosidase. Protein & Peptide Letters, 20, 255-264.

[20]   Berrin, J.G., Czjzek, M., Kroon, P.A., McLauchlan, W.R., Puigserver, A., Williamson, G. and Juge, N. (2003) Substrate (aglycone) Specificity of Human Cytosolic Beta-Glucosidase. Biochemical Journal, 373, 41-48.

[21]   Tsukada, T., Igarashi, K., Fushinobu, S. and Samejima, M. (2008) Role of Subsite +1 Residues in Temperature Dependence and Catalytic Activity of the Glycoside Hydrolase Family 1 Beta-Glucosidase BGL1A from the Basidiomycete Phanerochaete chrysosporium. Biotechnology and Bioengineering, 99, 1295-1302.

[22]   Gundllapalli, S.B., Pretorius, I.S. and Cordero Otero, R.R. (2007) Effect of the Cellulose-Binding Domain on the Catalytic Activity of a Beta-Glucosidase from Saccharomycopsis fibuligera. Journal of Industrial Microbiology & Biotechnology, 34, 413-421.

[23]   Chen, H., Hayn, M. and Esterbauer, H. (1992) Purification and Characterization of Two Extracellular Beta-Glucosidases from Trichoderma reesei. Biochimica et Biophysica Acta, 1121, 54-60.

[24]   UniProt Consortium (2019) UniProt: A Worldwide Hub of Protein Knowledge. Nucleic Acids Research, 47, D506-D515.

[25]   Burlingame, A.L. and Carr, S.A. (1996) Mass Spectrometry in the Biological Sciences. Humana Press, Totowa, NJ.

[26]   Zamyatin, A.A. (1972) Protein Volume in Solution. Progress in Biophysics & Molecular Biology, 24, 107-123.

[27]   Darby, N.J. and Creighton, T.E. (1993) Dissecting the Disulphide-Coupled Folding Pathway of Bovine Pancreatic Trypsin Inhibitor. Forming the First Disulphide Bonds in Analogues of the Reduced Protein. Journal of Molecular Biology, 232, 873-896.

[28]   Kyte, J. and Doolittle, R.F. (1982) A Simple Method for Displaying the Hydropathic Character of a Protein. Journal of Molecular Biology, 157, 105-132.

[29]   Trinquier, G., Sanejouand, Y.H. and Hausman, R.E. (1998) Which Effective Property of Amino Acids is Best Preserved by the Genetic Code? Protein Engineering, Design and Selection, 11, 153-169.

[30]   Cooper, G.M. (2004) The Cell: A Molecular Approach. ASM Press, Washington DC, 51.

[31]   Dwyer, D.S. (2005) Electronic Properties of Amino Acid Side Chains: Quantum Mechanics Calculation of Substituent Effects. BMC Chemical Biology, 5, 2.

[32]   Chou, P.Y. and Fasman, G.D. (1978) Prediction of Secondary Structure of Proteins from Amino Acid Sequence. Advances in Enzymology and Related Subjects of Biochemistry, 47, 45-148.

[33]   Feller, W. (1968) An Introduction to Probability Theory and Its Applications. 3rd Edition, Wiley, New York.

[34]   Wu, G. and Yan, S. (2008) Prediction of Mutations Engineered by Randomness in H5N1 Hemagglutinins of Influenza A Virus. Amino Acids, 35, 365-373.

[35]   Wu, G. and Yan, S. (2008) Lecture Notes on Computational Mutation. Nova Science Publishers, New York.

[36]   Yan, S. and Wu, G. (2009) Descriptively Quantitative Relationship between Mutated N-Acetylgalactosamine-6-Sulfatase and Mucopolysaccharidosis IVA. Peptide Science, 92, 399-404.

[37]   Yan, S. and Wu, G. (2010) Prediction of Mutation Positions in H5N1 Neuraminidases by Means of Neural Network. Annals of Biomedical Engineering, 38, 984-992.

[38]   Yan, S. and Wu, G. (2010) Linking Mutated Structure of Adrenoleukodystrophy Protein with X-Linked Adrenoleukodystrophy. Computer Methods in Biomechanics and Biomedical Engineering, 13, 403-411.

[39]   Demuth, H. and Beale, M. (2001) Neural Network Toolbox for Use with MatLab. User’s Guide, Version 4, MathWorks Inc., Natick, MA.

[40]   MathWorks Inc (1984-2001) MatLab-The Language of Technical Computing (Version, Release 12.1). MathWorks Inc., Natick, MA.

[41]   Chou, K.C. and Zhang, C.T. (1995) Prediction of Protein Structural Classes. Critical Reviews in Biochemistry and Molecular Biology, 30, 275-349.

[42]   Chou, K.C. and Shen, H.B. (2010) Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization. PLoS One, 5, e11335.

[43]   Sokal, R.R. and Rohlf, F.J. (1995) Biometry: The Principles and Practices of Statistics in Biological Research. 3rd Edition, W. H. Freeman, New York, 203-218.

[44]   Campbell, R.L. and Davies, P.L. (2012) Structure-Function Relationships in Calpains. Biochemical Journal, 447, 335-351.

[45]   Sacchi, S., Caldinelli, L., Cappelletti, P., Pollegioni, L. and Molla, G. (2012) Structure-Function Relationships in Human D-Amino Acid Oxidase. Amino Acids, 43, 1833-1850.

[46]   Silavi, R., Divsalar, A. and Saboury, A.A. (2012) A Short Review on the Structure-Function Relationship of Artificial Catecholase/Tyrosinase and Nuclease Activities of Cu-Complexes. Journal of Biomolecular Structure and Dynamics, 30, 752-772.