Industrial advancement has led to large amounts of chemicals entering the human life cycle. According to the American Chemistry Council (2020), the business of chemistry is a large user of natural gas and petroleum products. Petroleum products include the saturated hydrocarbons, or alkanes, which fall into three broad classes: linear (n-), branched, and cyclic alkanes. By physical phase, the alkane series consists of gases (methane, ethane, propane, and the butanes), liquids (from the pentanes to hexadecane), and longer-chain solids. These saturated hydrocarbons have received extensive attention because they form one of the foundations of petrochemistry, where alkanes are the major constituents of natural gas and crude oil. For example, pentane is mostly used in the formulation of gasoline, whereas hexane is well known as a solvent in glues, varnishes, cements, and other products. Meanwhile, 2-methylpropane is the principal feedstock of alkylation units in refineries. 2-Methylpropane, 2-methylbutane, n-butane, and propane are widely used in the cosmetic industry as aerosol propellants. The day-to-day demand for alkane products may affect the environment, especially in terms of toxicity to aquatic life and, subsequently, to biodiversity. Furthermore, when inhaled, these hydrocarbons can cause central nervous system depression. Liquid hydrocarbons such as pentane, hexane, and octane are also solvents for fats: on contact with the skin they can remove fat, resulting in dryness, scaling, and skin inflammation. Notably, exposure to n-hexane, or to solvents containing a high concentration of n-hexane, can result in polyneuropathy.
In view of this issue, methods for assessing the biological activity and toxicity of alkane-based compounds from their molecular structure need to be investigated. According to Phillips and co-workers, toxicity prediction with the QSAR method must use chemicals from the same compound group; otherwise, the different toxic actions involved will limit the prediction of toxicity. Obtaining toxicants from the same compound group is very challenging in QSAR because experimental data are difficult to retrieve. The mechanisms involved in biological action are also complex, and a model alone is inadequate to describe them. Therefore, this study investigates the biological activity and toxicity of the alkane group. The relationship between molecular structure and biological activity and toxicity is investigated on the basis of molecular descriptors, which include chemical hardness, chemical potential, dipole moment, non-linear polarizability, and the graph energy index. These molecular descriptors are known as non-linear molecular properties. The molecular descriptors were calculated with a semi-empirical method because it is fast compared with ab initio methods and has a low cost in computer resources. The major limitation of semi-empirical methods is that some parameters may not be available, especially for transition metals. However, this study only considers alkanes, which are within the range of the parameterization. The quantum molecular descriptor relationships are employed to predict unmeasured values of the considered properties of compounds. Furthermore, the relationships can be extended to design not-yet-existing structures possessing desirable properties. The volatile alkane components identified by chromatography in biological experiments are also investigated.
To the best of our knowledge, there are no articles on 3D-QSARs examining the biological effects and toxicology of alkanes using semi-empirical molecular descriptors, especially non-linear molecular properties.
2. Methods and Calculation
2.1. Calculation of the Quantum Molecular Descriptor
The semi-empirical quantum chemical method is a computational method for calculating electronic and molecular orbital properties using self-consistent field molecular orbital theory. The method employed here uses the Parametric Method 6 (PM6) parameters. The quantum molecular descriptors were calculated using the semi-empirical program MOPAC2016, Version 21.002 (James J. P. Stewart). The input structures were built and their geometries optimized using Avogadro version 1.2.0. The geometry optimization used the MMFF94s force field with four steps per update. The molecular structures were generated in the three-dimensional Avogadro interface, and the molecular descriptors involve all three coordinates; this work therefore follows the 3D-QSAR approach.
2.2. Definitions of the Quantum Molecular Descriptors Used
αij: the dynamic linear polarizability constant at ω = 0.25 eV. The indices i and j denote the tensor components along x, y, or z.
βi: the average value of the hyperpolarizability at ω = 0.25 eV. The quantity of interest is defined as βi = [βiii + βijj + βikk].
γijkl: the second-order hyperpolarizability at ω = 0.25 eV. The indices i, j, k, and l denote the tensor components along x, y, or z.
εHOMO: the energy of the highest occupied molecular orbital.
εLUMO: the energy of the lowest unoccupied molecular orbital.
dp: the molecular dipole moment contributed by the net charge density.
dhyb: the molecular dipole moment contributed by the hybridization of molecular orbitals.
COS: the conductor-like screening model (COSMO) surface area.
μ: the chemical potential. The chemical potential can be defined as μ = (εHOMO + εLUMO)/2.
η: the molecular chemical hardness. The hardness can be defined as η = (εLUMO – εHOMO)/2.
W: the electrophilicity index. The electrophilicity is defined by Parr et al. as W = μ²/(2η).
W–: the electrodonating power. This index describes the chemical response to the donation of electrons and is given by (3εHOMO + εLUMO)²/[16(εLUMO – εHOMO)].
GE1: the graph energy index with negative eigenvalues. The graph energy index is a sum of eigenvalues, defined as GE1 = Σi λi, where λi are the negative eigenvalues.
GE2: the graph energy index with positive eigenvalues, defined as GE2 = Σi λi, where λi are the positive eigenvalues.
Est: the Estrada index. The Estrada index is defined as Est = Σi exp(λi), where λi are the eigenvalues.
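As an illustration of how the scalar descriptors in this list can be evaluated, the following minimal sketch computes μ, η, W, and W– from a pair of frontier orbital energies, and GE1, GE2, and Est from the eigenvalues of a symmetric matrix. It assumes the conventional frontier-orbital approximations μ = (εHOMO + εLUMO)/2 and η = (εLUMO – εHOMO)/2. The function names, the orbital energies, and the use of the hydrogen-suppressed adjacency matrix of butane are hypothetical choices made for this example only; the eigenvalues used in the study itself may instead come from the SCF molecular orbital calculation.

```python
import numpy as np


def reactivity_descriptors(e_homo, e_lumo):
    """Frontier-orbital descriptors (same energy unit as the inputs, e.g. eV)."""
    mu = (e_homo + e_lumo) / 2.0        # chemical potential
    eta = (e_lumo - e_homo) / 2.0       # chemical hardness
    w = mu ** 2 / (2.0 * eta)           # electrophilicity index
    # Electrodonating power, written with I = -e_homo and A = -e_lumo.
    w_minus = (3.0 * e_homo + e_lumo) ** 2 / (16.0 * (e_lumo - e_homo))
    return {"mu": mu, "eta": eta, "W": w, "W_minus": w_minus}


def spectral_indices(matrix):
    """Sums over the eigenvalues of a real symmetric matrix."""
    eig = np.linalg.eigvalsh(np.asarray(matrix, dtype=float))
    return {
        "GE1": float(eig[eig < 0].sum()),   # sum of negative eigenvalues
        "GE2": float(eig[eig > 0].sum()),   # sum of positive eigenvalues
        "Est": float(np.exp(eig).sum()),    # Estrada index
    }


# Hypothetical example: carbon skeleton of n-butane as a 4-vertex path graph.
BUTANE_SKELETON = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

if __name__ == "__main__":
    # Placeholder orbital energies in eV, not values from this study.
    print(reactivity_descriptors(e_homo=-11.0, e_lumo=3.0))
    print(spectral_indices(BUTANE_SKELETON))
```

Because the adjacency matrix of an alkane skeleton has zero trace, GE1 and GE2 are equal in magnitude in this illustrative case; this need not hold for eigenvalues taken from an SCF calculation.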
2.3. Data Preparation and Analysis
The chemical data were collected from the Reaxys database. Reaxys is a well-known curated database for finding and comparing relevant chemical information from different sources, bringing together chemical, biological activity, and toxicogenomic data in one central location. In this study, all biological activity and toxicology values were extracted from a single reference. This was done to ensure that the values are mutually compatible. The data used in this study are presented in the appendix. Inferential statistical analyses were performed, and multiple linear regression was applied to generate the prediction models. Multiple linear regression is a relatively simple approach that models the linear relationship between a dependent variable and several explanatory (independent) variables by fitting a linear equation. The variables in a block were entered and calculated in a single step. The regressions were performed using SPSS statistical software. The regressions were chosen on the basis of a good fit, with the standard deviation of the regression equation not less than 0.0001. The plots of observed versus calculated toxicity, with 95% confidence, were produced using Minitab. Descriptors with very small standard deviations were considered inadequate to represent the changes in electronic structure from molecule to molecule. Principal component analysis (PCA) was used to determine the inherent dimensionality of groups of molecular descriptor properties. The Promax rotation, which is an oblique rotation, was used during the PCA.
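The regression step described above can be sketched with ordinary least squares. The example below is a minimal illustration, not the SPSS procedure used in this study: the function name `fit_mlr` and the synthetic data are assumptions introduced here. It fits all descriptors in a single step and reports r, r2, and the standard deviation s of the regression.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit of y = a0 + a1*x1 + ... + ap*xp."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])       # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    s = float(np.sqrt(ss_res / (n - p - 1)))        # residual standard deviation
    return {"coef": coef, "r": float(np.sqrt(max(r2, 0.0))),
            "r2": r2, "s": s, "fitted": fitted}


if __name__ == "__main__":
    # Synthetic descriptors and response, for illustration only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 2))
    y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.05, size=12)
    result = fit_mlr(X, y)
    print(result["coef"], result["r"], result["r2"], result["s"])
```

The `fitted` values returned here are what an observed-versus-calculated plot would place on the calculated axis.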
3. Results and Discussion
3.1. Quantum Molecular Descriptor
Alkanes are hydrocarbons derived from petroleum processing streams and contain only carbon and hydrogen atoms. Their carbon numbers range from approximately C5 to C20, with three types of constituents: normal paraffins, isoparaffins, and cycloparaffins. These constituents result in different physical and chemical properties. Tables 1-3 show the electronic and molecular orbital properties of the alkanes generated from the semi-empirical calculation. To understand the contribution of each descriptor, a component analysis based on Promax rotation was plotted, as shown in Figure 1. Five principal component factors are identified. The first component comprises the dynamic linear polarizability constants (αxx, αyy and αzz), the conductor-like screening model (COSMO) area, GE1, GE2, and the second-order hyperpolarizabilities γxxxx and γxxyy. The second component comprises the average hyperpolarizability βz and the hybridization and net-charge dipole moments. The third component comprises the second-order hyperpolarizabilities γyyyy and γzzzz. The fourth component comprises the average hyperpolarizabilities βx and βy. The last component comprises the electrophilicity index and the second-order hyperpolarizabilities γyyyy and γyyzz.
Table 1. The value of quantum descriptor of alkanes.
Table 2. The value of quantum descriptor of alkanes.
Table 3. The value of quantum descriptor of alkanes.
Figure 1. Component plot for quantum molecular descriptor of alkanes.
Inside each cluster, the molecular properties differ in their dimension, polarizability, and isomerization. The descriptor components are clustered according to different classes of molecular electronic property descriptors, based on the calculation in the molecular spherical coordinate system, the localization of electrons, the oscillating electric field, and the electron-electron interaction. The cumulative total variance of all five components is 85.136%, of which most comes from component 1 (40.434%). This contribution might be classified according to the position of the carbon atoms in the molecule in the spherical coordinate system based on self-consistent molecular orbital theory. The second principal contribution is 19.363%, which is plausibly related to the orientation of the dipole moment in the molecule. The third contribution to the cumulative total variance is 13.115%. The fourth and fifth contributions are 6.242% and 5.982%, respectively.
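The per-component and cumulative variance percentages quoted above are the kind of output produced by an eigen-decomposition of the descriptor correlation matrix. The sketch below illustrates this under stated assumptions: the function name and the random test matrix are hypothetical, no Promax rotation is applied (only the unrotated components are computed), and every descriptor column is assumed to have nonzero variance. The last lines simply re-add the five percentages reported in this section.

```python
import numpy as np

# Per-component variance contributions reported in this section (percent).
REPORTED_PERCENT = np.array([40.434, 19.363, 13.115, 6.242, 5.982])


def pca_explained_variance(X):
    """Unrotated PCA on the correlation matrix of a samples-by-descriptors array."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each column
    corr = (Z.T @ Z) / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100.0 * eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, np.cumsum(explained), loadings


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 5))       # synthetic data, not the study's descriptors
    X[:, 1] += 2.0 * X[:, 0]           # make two columns correlated
    explained, cumulative, _ = pca_explained_variance(X)
    print(np.round(explained, 3), np.round(cumulative, 3))
    print(np.cumsum(REPORTED_PERCENT))  # running total of the reported values
```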
3.2. Quantitative Structure Retention Relationship (QSRR) Analysis
Biological activity is normally related to the molecular interactions that a compound undergoes during transport through biological membranes or in reaction with the active site. These interactions are strongly related to the molecular chemical structure. A change in the structure can result in a change in biological response. One way to measure the biological response is through the corresponding change in chromatographic retention data. Chromatographic retention indexes can also be predicted using quantum chemical descriptors or parameters. The volatile fraction of Scorzonera hispanica L. has been shown to contain a wide variety of molecules. In the statistical results below, n is the number of samples, r is the correlation coefficient, r2 is the coefficient of determination, and s is the standard deviation. Only the three best regression equations (those with the highest values of r and r2) are discussed; they are given in Equations (1a)-(1c). The chromatographic retention indexes for the alkanes in this series were computed. The three best statistical analyses yield the following results:
Linear alkanes were also used as references to calculate the retention indexes of the volatile components of a marjoram oil sample. The isothermal retention times for the alkanes in this series were analyzed statistically, yielding the following results:
The series of correlations with the chromatographic retention indexes of the branched alkanes produced by the cyanobacterium Microcoleus vaginatus was also computed. The correlation with the quantum molecular descriptors yields the following results:
The results show that the descriptors giving the best r2 differ between experiments. This might be due to the different numbers of samples in each data set. The molecular structure plays an important role in modeling the experiments with quantum chemical descriptors. In Equations (1a) and (2a), the retention index can be fitted with the COSMO area and dipole moment descriptors, which also give the best r and r2 values in both sets of results. In chromatography, the retention data are proportional to the free-energy change of the solute-stationary phase interactions, as modified by the mobile phase. This shows that intermolecular interactions affect the regression model. Molecules such as butane, 3,4-dimethylhexane, and hexane, whose total dipole moment is zero, tend to interact through London dispersion forces, whereas molecules with a nonzero total dipole moment are also influenced by van der Waals forces involving permanent dipoles. The linear polarizability constant (α) and the second hyperpolarizability (γ) are physical quantities derived from the interaction of the dipole moment with static and fluctuating electric fields. Both quantities show a good correlation with the retention index. Therefore, the effects of charge screening in the molecule and of the dipole moment are correlated with the retention index.
The electrophilicity index and the molecular chemical hardness (in Equation (3a)) are related to the molecular electron density distribution of the HOMO and LUMO, which are localized on the atoms. Chemical hardness is a property that measures the stability and reactivity of a molecule. Therefore, a change in molecular structure slightly changes the deformation of the electron cloud, which affects the retention constant. The Estrada index also gives a good regression relation in the quantitative structure-retention analysis. The Estrada index is a topological molecular descriptor based on the eigenvalues of an adjacency matrix and is comparable with molecular topology indexes obtained using the graph theory approach. This index provides a numerical characterization of the molecules, especially of isomeric structures, that fits the retention constant.
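For readers unfamiliar with retention indexes referenced to linear alkanes, the sketch below shows one common definition, the isothermal Kováts index, in which an analyte is bracketed by the n-alkanes eluting just before and just after it. This is an assumption about the general technique and may differ from the exact procedure used in the cited experiments; the function name and the retention times are hypothetical.

```python
import math


def kovats_index(t_x, t_n, t_n1, n):
    """Isothermal Kovats retention index.

    t_x  : adjusted retention time of the analyte
    t_n  : adjusted retention time of the n-alkane with n carbons (elutes before)
    t_n1 : adjusted retention time of the n-alkane with n + 1 carbons (elutes after)
    n    : carbon number of the earlier-eluting n-alkane
    """
    if not (0.0 < t_n <= t_x <= t_n1) or t_n == t_n1:
        raise ValueError("expected 0 < t_n <= t_x <= t_n1 with t_n < t_n1")
    fraction = (math.log10(t_x) - math.log10(t_n)) / (
        math.log10(t_n1) - math.log10(t_n)
    )
    return 100.0 * (n + fraction)


if __name__ == "__main__":
    # Hypothetical adjusted retention times in minutes.
    print(kovats_index(t_x=8.0, t_n=4.0, t_n1=16.0, n=8))
```

By construction each n-alkane receives an index of 100 times its carbon number, which is why linear alkanes serve as the reference scale.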
3.3. Toxicology Analysis
Hydrocarbons include a vast number of chemicals present in the environment and in consumer products, and their toxicological impact on both the environment and humans needs to be studied. Hence, an alternative approach is needed to seek the generalities of structure-activity relationships from the available toxicological data. Such relationships can be used to predict the effects of other chemical isomers with similar behavior. The in vitro oxidation by a reconstituted enzyme system using alkanes as substrates has been demonstrated. This report provides additional information on the toxicity of the substrates to whole cells. The best three statistical analyses yield the following results:
The acute toxicity of aqueous solutions of hydrocarbons to Daphnia magna has been reported. The correlation with the quantum molecular descriptors yields the following results. Figure 2 shows the correlation between the experimental LC50 values and those predicted from the dynamic linear polarizability descriptor with Equation (5a):
Figure 2. Plot of observed LC50 vs calculated LC50 using dynamic linear polarizability descriptor.
A number of aliphatic hydrocarbons were examined after intravenous injection of emulsion formulations into mice. The correlation with the quantum molecular descriptors yields the following results. Figure 3 shows the correlation between the experimental LD100 values and those predicted from the second-order hyperpolarizability descriptor with Equation (6a).
The toxicity of hydrocarbons to aquatic organisms has been reported. The correlation with the quantum molecular descriptors yields the following results. Figure 4 shows the correlation between the experimental EC50 values and those predicted from the dynamic linear polarizability descriptor with Equation (7a).
Figure 3. Plot of observed LD100 vs calculated LD100 using second order hyperpolarizability.
Figure 4. Plot of observed EC50 vs calculated EC50 using second order hyperpolarizability.
The toxicology analysis was carried out with computational techniques, and the relation between the molecular structure and the interacting subject can be represented by this relation:
The physicochemical properties include chemical properties, electronic properties, molecular topology, and thermodynamic and optical properties. A modification of the molecular structure will change the biological activity. In linear form, the activity can be described by:
where xn is a molecular descriptor and an is the corresponding constant. The investigation of quantitative relationships between chemical structures, characterized by their electronic properties, and biological activity is one of the most important tools in areas such as toxicology analysis. The regression Equations (4)-(7) show that electronic properties such as the electrophilicity index, chemical hardness, dynamic linear polarizability, second-order hyperpolarizability, graph energy, dipole moment, and conductor-like screening model area show a good relationship with the biological activity.
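The two relations introduced before this paragraph are not reproduced in this text. A plausible generic form, consistent with the symbols xn and an defined above, is sketched below; the symbol A for the activity, the function f, and the intercept a_0 are assumptions of this sketch rather than the notation of the original equations.

```latex
\begin{align}
  A &= f(\text{physicochemical properties}), \\
  A &= a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n .
\end{align}
```

In this form each regression equation of the study corresponds to a particular choice of descriptors x_1, ..., x_n and fitted constants.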
The electrophilicity index and chemical hardness give the best regression for the in vitro oxidation by the reconstituted enzyme system. The electrophilicity index measures the stabilization energy when a molecule receives charge from the environment. The electrophilicity is also related to the chemical potential, which can be related to binding with the environment in the biological system. The chemical hardness describes the transfer of charge in a system with high charge density. These two descriptors are useful indicators for estimating enzyme activities on alkane substrates. The regression results show that both descriptors give the best fit to the enzyme activity. This also agrees with the results of Grillo and co-workers.
The GE index is a new molecular descriptor related to the eigenvalues of the eigenfunctions assigned to the linear combination of atomic orbitals self-consistent field (LCAO-SCF) molecular orbitals. This index corresponds to the orbital energies of the molecular structure. The GE index is capable of describing the binding energy interaction, the total electron exchange energy, the electrostatic energy within the molecule, and the resonance energy. Therefore, the GE index can provide a chemical interpretation based on the electronic properties of the molecular structure involved in the biochemical interaction. As a quantum molecular descriptor for biological activity and toxicity, the GE index gives a good fit in the regression equations.
The non-linear polarizability of a molecule is an important physical property in chemical-biological interactions. It measures the distortion of a molecule in an electric field and the strength of molecular interactions such as long-range intermolecular induction, dispersion forces, scattering, and electron interactions. The linear polarizability (α) shows a good regression fit for most biological activities and toxicities. Hansch and Kurup also reported that the linear polarizability gives a good regression result in their toxicity study. The hyperpolarizability descriptor is related to the non-linear polarizability constant. This descriptor shows a moderate regression fit (with r between 0.788 and 0.901). The second-order hyperpolarizability shows the best regression fit for most biological activities and toxicities. This might be due to the interaction of the induced non-linear dipole moment with the applied field, which gives a fine description of the electronic properties of the molecular structure. The electron-correlation effect in the microscopic polarizability calculation produced a good correlation in predicting the qualitative trends of structure-property relationships.
The conductor-like screening model (COSMO) area combined with the molecular dipole moment gives a surprisingly good fit for enzyme activity and toxicity. The COSMO area is related to the surface charge densities on neighboring segments and is the effective area of the screening surface. The screening surface is related to the perturbation of the Coulomb interaction in the molecule. The screening depends on the localization of charge and on the molecular polarizability. The molecular polarization has electronic, vibrational, and rotational contributions and is also related to the molecular dipole moment. In this work, the COSMO area and the molecular dipole moment are important molecular properties for relating structure to biological and toxicological activities.
This study has demonstrated that quantum molecular descriptors obtained from semi-empirical calculations are applicable to the development of quantitative structure-activity and structure-retention relationships. These descriptors include the electrophilicity index, chemical hardness, dynamic linear polarizability, conductor-like screening model area, graph energy index, second-order hyperpolarizability, Estrada index, and molecular dipole moment. The descriptors are classified into five principal component factors. The quantum molecular descriptors generated from semi-empirical calculations give a good correlation with the retention index, biological activity, and toxicity. This shows that they are sufficiently reliable to serve as a new approach in QSAR and QSRR.
In future studies, the molecular descriptors can be calculated using density functional theory (DFT) for comparison. DFT accounts for electron correlation. The non-linear molecular properties can also be calculated using B3LYP, Møller-Plesset perturbation theory, configuration interaction, and coupled-cluster theory. These methods can be used to compare the QSAR accuracy with that of the semi-empirical method.
The authors would like to thank Dr. James J. P. Stewart from MOPAC Inc. for his permission to use the MOPAC software, and UiTM’s Department of Infostructure for permission to use the SPSS and Minitab software.
Table A1. Table of the retention index (Log Rt1, Log Rt2 and Log Rt3), biological activity (Log (Ea)) and toxicity (Log LC50, Log LD50, Log LD100 and Log EC50) used in this study.
1 QSAR is a mathematical modelling approach that uses machine learning as a tool in model development.
 Wang, Y., Hu, P., Yang, J., Zhu, Y.-A. and Chen, D. (2021) C-H Bond Activation in Light Alkanes: A Theoretical Perspective. Chemical Society Reviews, 50, 4299-4358.
 Vora, B.V., Kocal, J.A., Barger, P.T., Schmidt, R.J. and Johnson, J.A. (2003) Alkylation. In: Othmer, K., Ed., Kirk-Othmer Encyclopedia of Chemical Technology, John Wiley & Sons, Inc., Hoboken, 169-203.
 Moore, A.F. (1982) Final Report of the Safety Assessment of Isobutane, Isopentane, N-Butane, and Propane. Journal of the American College of Toxicology, 1, 127-142.
 Christian, W.C., Butler, T.M., Ghannam, R.B., Webb, P.N. and Techtmann, S.M. (2020) Phylogeny and Diversity of Alkane-Degrading Enzyme Gene Variants in the Laurentian Great Lakes and Western Atlantic. FEMS Microbiology Letters, 367, Article No. fnaa182.
 Phillips, J.C., Gibson, W.B., Yam, J., Alden, C.L. and Hard, G.C. (1990) Survey of the QSAR and in Vitro Approaches for Developing Non-Animal Methods to Supersede the in Vivo LD50 Test. Food and Chemical Toxicology, 28, 375-394.
 Zerner, M.C. (1991) Semiempirical Molecular Orbital Methods. In: Lipkowitz, K.B. and Boyd, D.B., Eds., Reviews in Computational Chemistry, Vol. 2, Wiley-VCH, Inc., New York, 313-365.
 Kurtz, H.A., Stewart, J.J.P. and Dieter, K.M. (1990) Calculation of the Nonlinear Optical Properties of Molecules. Journal of Computational Chemistry, 11, 82-87.
 Pople, J.A., Santry, D.P. and Segal, G.A. (1965) Approximate Self-Consistent Molecular Orbital Theory. I. Invariant Procedures. The Journal of Chemical Physics, 43, S129-S135.
 Akhondi, S.A., Rey, H., Schwörer, M., Maier, M., Toomey, J., Nau, H., et al. (2019) Automatic Identification of Relevant Chemical Compounds from Patents. Database, 2019, Article No. baz001.
 Dembitsky, V.M., Dor, I., Shkrob, I. and Aki, M. (2001) Branched Alkanes and Other Apolar Compounds Produced by the Cyanobacterium Microcoleus vaginatus from the Negev Desert. Russian Journal of Bioorganic Chemistry, 27, 110-119.
 Bobra, A.M., Shiu, W.Y. and Mackay, D. (1983) A Predictive Correlation for the Acute Toxicity of Hydrocarbons and Chlorinated Hydrocarbons to the Water Flea (Daphnia magna). Chemosphere, 12, 1121-1129.
 van Beilen, J.B., Kingma, J. and Witholt, B. (1994) Substrate Specificity of the Alkane Hydroxylase System of Pseudomonas oleovorans GPo1. Enzyme and Microbial Technology, 16, 904-911.
 Jeppsson, R. (1975) Parabolic Relationship between Lipophilicity and Biological Activity of Aliphatic Hydrocarbons, Ethers and Ketones after Intravenous Injections of Emulsion Formulations into Mice. Acta Pharmacologica et Toxicologica, 37, 56-64.
 Ashraf, C., Joshi, N., Beck, D.A.C. and Pfaendtner, J. (2021) Data Science in Chemical Engineering: Applications to Molecular Science. Annual Review of Chemical and Biomolecular Engineering, 12, 15-37.
 Khan, M.F., Nahar, N., Rashid, R.B., Chowdhury, A. and Rashid, M.A. (2018) Computational Investigations of Physicochemical, Pharmacokinetic, Toxicological Properties and Molecular Docking of Betulinic Acid, a Constituent of Corypha taliera (Roxb.) with Phospholipase A2 (PLA2). BMC Complementary and Alternative Medicine, 18, Article No. 48.
 Jana, G., Pal, R., Sural, S. and Chattaraj, P.K. (2020) Quantitative Structure-Toxicity Relationship Models Based on Hydrophobicity and Electrophilicity. In: Roy, K., Ed., Ecotoxicological QSARs, Humana, New York, 661-679.
 Grillo, I.B., Urquiza-Carvalho, G.A., Fernando Ruggiero Bachega, J. and Bruno Rocha, G. (2020) Elucidating Enzymatic Catalysis Using Fast Quantum Chemical Descriptors. Journal of Chemical Information and Modeling, 60, 578-591.
 Perrin, E. and Prasad, P.N. (1989) Ab Initio Calculations of Polarizability and Second Hyperpolarizability in Benzene Including Electron Correlation Treated by Møller-Plesset Theory. The Journal of Chemical Physics, 91, 4728-4732.
 Barca, G.M.J., Bertoni, C., Carrington, L., Datta, D., De Silva, N., Emiliano Deustua, J., et al. (2020) Recent Developments in the General Atomic and Molecular Electronic Structure System. The Journal of Chemical Physics, 152, Article ID: 154102.