Sunflower oil is obtained from sound, mature seeds of Helianthus annuus L. (family Compositae) by expression or solvent extraction. Oil obtained by either process exists in two grades, raw or refined. The main analytical methods currently used for most vegetable oils are wet chemistry methods such as titration, or chromatographic methods, which are time consuming and laborious. These methods analyze only one specific parameter at a time, tend to be tedious and expensive, and often require hazardous solvents and reagents. For these reasons, fast and relatively cheap methods such as Near Infrared (NIR) spectroscopy have been developed.
NIR spectroscopy offers the following advantages over conventional methods. It is a non-invasive and non-destructive technique. It requires minimal or no sample preparation. It is environmentally friendly, as it does not use organic solvents. Measurement and result delivery are fast (within one minute). The technique can be automated, which increases throughput, and a single spectrum allows several analytes to be determined simultaneously. The instrument is portable, so it can be taken to the field for routine analysis of products. Despite these advantages, NIR measurements are poorly selective, so chemometric techniques have to be used to model the data and extract relevant information. Constructing NIR models also requires a substantial investment of time, resources and energy.
The analytical information in NIR spectra is hardly selective and is influenced by a number of physical, chemical and structural variables. Interpretation of near infrared spectra therefore always requires mathematical processing, known as chemometrics. The multivariate techniques most frequently used allow samples with similar characteristics to be grouped, either to establish classification methods for unknown samples (qualitative analysis) or to determine some property of unknown samples (quantitative analysis). Among qualitative methods, the Mahalanobis distance method is used for classifying samples in near infrared analysis. With Mahalanobis distance (M-distance), samples at a distance of less than 3 standard deviations are considered members of the same group as those used to develop the model, while those with an M-distance greater than 3 standard deviations are considered non-members. An advantage of using the Mahalanobis measurement for discrimination is that the distances are calculated in units of standard deviation from the group mean. Therefore, the calculated circumscribing ellipse formed around the cluster defines the one sigma (one standard deviation) boundary of that group.
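The membership rule described above can be illustrated with a minimal sketch. It assumes each sample has already been reduced to two values (for example two principal component scores); the function names and the two-dimensional simplification are illustrative assumptions, not the routine used by the software in this study.

```python
import math
from statistics import mean


def mahalanobis_2d(point, group):
    """Mahalanobis distance of a 2-D `point` from the cluster `group`."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    mx, my = mean(xs), mean(ys)
    n = len(group)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the group
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in group) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


def is_member(point, group, threshold=3.0):
    """Accept a sample when its distance is below the threshold (3 SD here)."""
    return mahalanobis_2d(point, group) < threshold
```

Because the distance is scaled by the group covariance, the same threshold of 3 applies regardless of the units or spread of the individual scores.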
In developing a NIR model, it is also important to pretreat the spectra obtained. Pretreatment eliminates, reduces or standardizes interfering spectral effects, such as light scattering, path length variations and random noise, that result from variable physical sample properties or instrumental effects. Common spectral pretreatments include mean centering, baseline correction, auto scaling, Standard Normal Variate (SNV) with and without de-trending, derivatization and others.
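As an illustration of two of the pretreatments named above, the following minimal sketch applies SNV to a single spectrum and removes a baseline trend. It is an assumption-laden simplification: de-trending is commonly done with a second-order polynomial, whereas this sketch fits a straight line for brevity, and the function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]


def detrend_linear(wavelengths, spectrum):
    """Subtract a least-squares straight line fitted across the wavelength axis.

    Simplification: a first-order fit is used here instead of the more
    common second-order polynomial.
    """
    mx, my = mean(wavelengths), mean(spectrum)
    sxx = sum((x - mx) ** 2 for x in wavelengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(wavelengths, spectrum))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(wavelengths, spectrum)]
```

SNV with de-trending would then be `detrend_linear(wavelengths, snv(spectrum))`, applied to each spectrum independently.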
Adulteration of edible oils with other chemicals is a common and harmful practice. In one reported case, the adulteration of olive oil with aniline-denatured rapeseed oil affected about 20,000 people with toxic oil syndrome. Recently in Tanzania, the Tanzania Bureau of Standards (TBS) suspended the public sale of the OKI and VIKING brands of palm olein oil, imported into the country from Malaysia, after they were found to be unfit for human consumption.
Near infrared analysis has been used in various studies of vegetable oils and seeds, for example to determine moisture content in soybean. It has also been used in various studies of sunflower oil and sunflower seeds. A near infrared method for estimating quality parameters of ground and intact high-oleic sunflower achenes has been developed. NIR analysis has also been applied to the detection and quantification of sunflower oil in adulterated extra virgin oil from the eastern Mediterranean. An NIR method has been successfully applied to the authentication of pure camellia oil in China. In that study, four different pattern recognition chemometric techniques were applied to discriminate pure camellia oil from non-pure ones. Principal Component Analysis (PCA) methods have also been applied to classify vegetable oils by using second-derivative NIR spectra. The feasibility of NIR spectroscopy for determining the fatty acid composition of soy flour has also been examined.
Discriminant analysis based on Mahalanobis distance principles has been applied to the second-derivative near infrared reflectance spectra of four different vegetable oils. Other methods, such as multivariate modeling, have been applied to predict the butter fat content of spreads in cases where butter fat is adulterated with cheaper vegetable oils. Methods for the discrimination and quantification of adulterated olive oils by using the Partial Least Squares (PLS) algorithm and Discriminant PLS have also been reported. Near infrared spectroscopy together with PCR and PLS multivariate calibration models has also been used for the qualitative and quantitative determination of adulterants in sandalwood oil.
2. Materials and Methods
A near infrared spectrophotometer (LabSpec model ASD5000 from ASD Inc., Boulder, Colorado, USA) with a wavelength range of about 350 - 3000 nm and 0.01 nm resolution was used to capture the spectra. A quartz window cell of 1 mm path length was used as the sample holder. Serial port communication over an Ethernet cable was used to capture the raw spectral data and store them on a computer. Data were acquired with Indico Pro software from ASD Inc., USA, and analyzed with Grams Suite Version 9.0 software from Thermo Fisher Scientific Inc., USA.
Virgin sunflower oil pressed from sunflower seeds grown in the Iringa, Dodoma, Singida and Morogoro regions of Tanzania was used as the reference standard to develop the spectral library for qualitative identification and for validation of the method. Sunflower oils bought from various outlets were used to assess quality consistency by checking whether or not they were adulterated. Virgin sunflower oil deliberately adulterated with Korie oil (palm olein oil) was used to assess the quality of the model, together with other oil samples such as corn oil, soybean oil and cotton seed oil.
2.3. Samples Preparations
Samples were taken from their original containers and put into clean 60 ml plastic containers, which were labeled and properly closed ready for scanning. Samples requiring adulteration were adulterated at levels of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90%. The samples were labeled separately as calibration set and validation set. All measurements were carried out in a closed, air-conditioned pharmaceutical analysis laboratory at a temperature of 23 °C ± 2 °C.
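The adulteration series can be planned with simple arithmetic. The sketch below assumes, purely for illustration, that the percentages are volume/volume and that each blend has a total volume of 50 ml; neither detail is stated in the text, and the function name is hypothetical.

```python
def blend_volumes(total_ml, adulterant_percent):
    """Return (sunflower_ml, adulterant_ml) for one blend, assuming v/v percentages."""
    adulterant_ml = total_ml * adulterant_percent / 100.0
    return total_ml - adulterant_ml, adulterant_ml


# Assumed total of 50 ml per blend; levels run from 10% to 90% in steps of 10%
plan = {level: blend_volumes(50.0, level) for level in range(10, 100, 10)}
```

For example, `plan[30]` gives the sunflower oil and adulterant volumes for the 30% blend under these assumptions.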
2.4. Spectral Acquisition and Data Pretreatment
One hundred (100) samples of virgin sunflower oil from different regions of Tanzania were collected and scanned to generate a spectral library. The samples were placed in sample cuvettes and scanned in data acquisition mode with Indico Pro software. A baseline (white) reference spectrum was taken before the sample spectra. Scanning covered both the visible and near infrared regions, at wavelengths of 350 - 2500 nm. Between scans, the sample cell components were washed with n-hexane and then cleaned with warm water. The cell was then dried with soft tissue paper.
2.5. Library (Calibration File) Development
A total of 100 samples were each scanned three times, giving 300 spectra, each obtained as an average of 32 scans and saved in .asd format. The .asd files were then converted into .spc format as the logarithm of reciprocal reflectance (log 1/R) with zero derivative. The spectral files were exported to the chemometric software, Grams AI. In Grams IQ, a calibration file was created with cross validation as the validation type, zero derivative, Savitzky-Golay gaps and Multiplicative Scatter Correction. The important classifying parameter incorporated in the calibration file was the Mahalanobis distance (M-distance), with a limit of 1 for accept and a limit of 3 for reject.
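The conversion and scatter correction steps can be sketched as follows. This is a minimal illustration rather than the Grams procedure: it assumes base-10 logarithms for log 1/R, uses the mean of the supplied spectra as the MSC reference, and all function names are hypothetical.

```python
import math
from statistics import mean


def to_absorbance(reflectance):
    """Convert reflectance values R to apparent absorbance log10(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


def average_spectra(spectra):
    """Point-by-point average of replicate spectra (e.g. triplicate scans)."""
    return [mean(column) for column in zip(*spectra)]


def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum.

    Each spectrum is regressed on the reference (spectrum = offset + slope * ref)
    and corrected as (spectrum - offset) / slope.
    """
    ref = average_spectra(spectra)
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for spectrum in spectra:
        s_mean = mean(spectrum)
        slope = sum((r - ref_mean) * (v - s_mean)
                    for r, v in zip(ref, spectrum)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in spectrum])
    return corrected
```

Spectra that differ only by an additive offset and a multiplicative scale collapse onto the reference after `msc`, which is the scatter effect the correction is meant to remove.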
The NIR spectra for the calibration file were subjected to various data pretreatment conditions. The aim of data pretreatment in NIR is to reduce noise and interferences that are unrelated to the property of interest and can lead to unwanted variability. Mean centering was applied to all calibration spectra. The other pretreatment methods applied were de-trending (baseline correction), Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), derivatives (zero, first and second) and reciprocal of reflectance (log 1/R). These pretreatments were applied in various models, and the best calibration model was chosen on the basis of the results.
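Mean centering and derivative pretreatment can be sketched as below. The derivative uses the standard 5-point quadratic Savitzky-Golay convolution coefficients; the window size, the uniform wavelength step and the function names are assumptions for illustration, not the settings used in this study.

```python
def mean_center(spectra):
    """Subtract the mean calibration spectrum from every spectrum."""
    col_means = [sum(col) / len(col) for col in zip(*spectra)]
    centered = [[v - m for v, m in zip(s, col_means)] for s in spectra]
    return centered, col_means


# 5-point quadratic Savitzky-Golay coefficients (before normalisation)
SG_FIRST = (-2, -1, 0, 1, 2)    # divide by 10 * step
SG_SECOND = (2, -1, -2, -1, 2)  # divide by 7 * step**2


def sg_derivative(spectrum, order=1, step=1.0):
    """First or second derivative; two points are lost at each end of the spectrum."""
    if order == 1:
        coeffs, norm = SG_FIRST, 10.0 * step
    elif order == 2:
        coeffs, norm = SG_SECOND, 7.0 * step ** 2
    else:
        raise ValueError("order must be 1 or 2")
    result = []
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        result.append(sum(c * v for c, v in zip(coeffs, window)) / norm)
    return result
```

The column means returned by `mean_center` would need to be kept so that the same centering can be applied to any later validation or market spectrum.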
2.6. Positive Controls
Forty (40) positive control samples of virgin sunflower oil were taken from the same areas that provided samples for creation of the calibration file. The samples were obtained from pure sunflower oil that was pressed and filtered under supervision to ensure that the oil was of good standard. For each location, 10 samples were analyzed and each sample was scanned in triplicate, making a total of 30 scans per location.
2.7. Negative Controls
Negative control samples included oils other than virgin sunflower oil, such as soybean oil, cotton seed oil, corn oil and palm olein oil (Korie®). They also included sunflower oil refined by industries in Tanzania, to test the ability of the model to identify refined (non-virgin) sunflower oil. Adulterated samples were made by deliberately mixing virgin sunflower oil with Korie® oil, because Korie®, a cheaper cooking oil, has reportedly been used to adulterate sunflower oil. Sunflower oil was mixed with Korie® in proportions ranging from 10% Korie® (90% sunflower oil) to 90% Korie®.
2.8. Library Validation
The library was validated by testing the spectral reference library against the positive control samples and the negative control samples, using percentage correct classification. Correct classification refers to the percentage of spectra of similar material (positive control samples) in a validation set that are matched with spectra in the calibration set, or the percentage of spectra of dissimilar material (negative control samples) in a validation set that are not matched with spectra in the calibration set. Models were validated by using 30 samples that had not been used to create the reference library.
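The percentage correct classification defined above can be computed as in the following minimal sketch. It assumes each validation spectrum has already been assigned a Mahalanobis distance and that a distance below 3 counts as a match; the function name and the example distances used in any call are hypothetical.

```python
def percent_correct(distances, is_positive_control, threshold=3.0):
    """Percentage of validation spectra classified correctly.

    A positive control is correct when it is matched (distance < threshold);
    a negative control is correct when it is not matched.
    """
    correct = 0
    for distance, positive in zip(distances, is_positive_control):
        matched = distance < threshold
        if matched == positive:
            correct += 1
    return 100.0 * correct / len(distances)
```

A call such as `percent_correct([0.5, 1.2, 4.0, 7.5], [True, True, False, False])` treats the first two spectra as positive controls and the last two as negative controls.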
3. Results and Discussion
3.1. Development of the NIR Qualitative Model
The 300 absorbance spectra of sunflower oil used to develop the qualitative model covered the range of 350 nm to 2500 nm. The samples showed a similar pattern of absorbance, although the level of absorbance differed between samples. These differences may have arisen because the samples were collected from different sources with different geographic conditions, which may have influenced the quality of the sunflower seeds and thus of the sunflower oil.
From Figures 1-3, it can be seen that the main absorptions are observed at 913, 1005, 1183, 1387, 1699, 1850 and 2297 nm. Absorbance at 913 nm is due to 3rd overtones of the CH and CH2 groups present in fatty acids and also to 2nd overtones of OH groups. Absorptions at 1183 nm are due to 2nd overtones of CH and CH2 stretching, while absorptions at 1387 nm are due to first overtone combinations of the CH band, and absorptions in the region around 1700 nm are due to first overtones. Absorptions in the region around 1850 - 1900 nm are due to C=O second overtones, and the region around 2400 nm is due to CH-CH and C-C combinations (NIR absorption bands, 2013).
Figure 1. Overlaid spectra of virgin sunflower oil (calibrator) and other oils (corn, cotton and soy oil), showing absorbance (log 1/R) against wavelength in nanometers after pretreatment.
Figure 2. Overlaid spectra of virgin sunflower oil (calibrator) and other oils (Sunola and Sundrop oil), showing absorbance (log 1/R) against wavelength in nanometers after pretreatment.
Figure 3. Overlaid spectra of virgin sunflower oil (calibrator) and other oils (Sunola, Sundrop, corn, cotton and soy oil), showing absorbance (log 1/R) against wavelength in nanometers after pretreatment.
Figure 1 and Figure 2 compare the spectra of different vegetable oils. The spectra are broad and overlapping and show no significant differences to the naked eye; without chemometrics, they conceal specific details, including the broad categorisation of major groups. Differences in absorbance between types of oil can be attributed to differences in refractive index caused by differences in free fatty acid composition, which in turn affect the amount of light that penetrates the oil sample and interacts with its molecules.
The calibration with Mean Center Correction only, a pretreatment method known for enhancing subtle differences between spectra, gave the best classification of all models tested. It classified 100% of the samples tested correctly, whether they were genuine virgin sunflower oil, other oils, or virgin sunflower oil adulterated with other oils. Models with a higher number of factors or with different pretreatments classified poorly: they were overfitted and over-discriminating, to the extent of not recognizing even the virgin sunflower oil positive control samples.
After assessing the performance and characteristics of the different models, the best model was chosen to be the one prepared with Mean Center Correction only, with no derivative.
3.2. Model Validation
As shown in Table 1, all positive control samples were recognized by the model at a 100% prediction rate, indicating that the model can recognize samples that are truly virgin sunflower oil. In addition, calibration files created from each location that contributed to the main calibration file were tested against samples from the other sources, and all passed with 100% correct classification.
Table 1. Percentage of correct classification of vegetable oils samples based on principal component analysis of NIR spectra (log 1/R) from 350 to 2500 nm using different models.
3.3. Negative Control Results
Negative control analysis involved samples that were either purposely adulterated sunflower oil or vegetable oils other than sunflower oil, such as palm oil, corn oil, soybean oil and cotton oil. Industrially refined sunflower oils were also included among the negative control samples. Table 2 shows that, with our main model (MCC), none of these samples was recognized by the MCC calibration file: all gave Mahalanobis distance values greater than 3.
It is clear that Discriminant Principal Component Analysis coupled with Mahalanobis distance is a powerful and useful method for distinguishing virgin sunflower oil samples from those that are not sunflower oil. The method also distinguished virgin sunflower oil from industrially refined sunflower oils such as the Sunola® and Sundrop® brands, as evidenced in Table 1 and Table 2.
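The link between principal component analysis and the Mahalanobis distance can be written compactly. The following is the standard textbook formulation, given as a sketch; the symbols are introduced here for illustration, and the chemometric software may apply additional scaling or normalisation that is not described in the text.

```latex
% x        : pretreated spectrum of an unknown sample
% \bar{x}  : mean spectrum of the calibration set
% P_A      : loadings of the A retained principal components
% \lambda_a: variance of the calibration scores on component a
t = P_A^{\mathsf{T}} (x - \bar{x}), \qquad
D^2 = t^{\mathsf{T}} \Lambda_A^{-1} t = \sum_{a=1}^{A} \frac{t_a^2}{\lambda_a}, \qquad
\Lambda_A = \operatorname{diag}(\lambda_1, \dots, \lambda_A)
```

Because the calibration scores on different principal components are uncorrelated, their covariance matrix is diagonal and the distance reduces to a sum of squared scores, each scaled by its own variance. A sample is then accepted as a member of the calibration group when D is below the chosen limit (3 in this work).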
3.4. Market Sample Analysis
Market sample analysis involved 22 samples collected from the Dodoma, Dar es Salaam and Morogoro regions, which were analyzed using the best model, the MCC calibration file. When these samples were matched against the calibration samples, 4 of the 22 samples (about 18.2%) were not matched by the model. The samples that failed analysis came from the Dodoma and Singida regions.
It can be observed from Table 3 that market samples exceeding a Mahalanobis distance of 3 standard deviations were not recognized by our model. The locations that provided failing samples were Singida (2 samples), Kibaigwa (1 sample) and Pandambili (1 sample). As all these samples were procured from vendors selling sunflower oil by the roadside, it is possible that the oils had been adulterated with other materials for financial gain. During sample collection, we interviewed the retailers about the existence of adulteration practices and the possible adulterants. Interviewees admitted that adulteration existed and that the main adulterant was palm olein oil (Korie® oil), which is cheaper than sunflower oil. Another adulterant mentioned was pumpkin seeds, which are dried and then mixed with sunflower seeds during expression of the oil. Apart from intentional adulteration by these vendors, another reason that might have contributed to the failure of some of these market samples is prolonged exposure to the sun. When vegetable oils are exposed to the sun for at least five days, beta-sitosterol oxides may form, which may change the physico-chemical properties of the oil and lead to its non-identification.
Table 2. Percentage of correct classification of vegetable oils samples based on principal component analysis of NIR spectra (log 1/R) from 350 to 2500 nm using different pre-treatments.
Where MCC = Mean Center Correction, MSC = Multiplicative Scatter Correction, SNV = Standard Normal Variate, VS = Variance Scaling. *Other vegetable oils are cotton, corn and soybean oil, and Sundrop® and Sunola® oils. Sundrop and Sunola are refined sunflower oils from some of the oil industries in Tanzania.
Table 3. Mahalanobis distance values of some selected samples and their sources in market sample analysis.
The application of Principal Component Analysis together with the Mahalanobis distance method in near infrared analysis results in a powerful analytical tool for discriminant analysis of virgin sunflower oils. The models created were applied successfully to distinguish virgin sunflower oils from samples that are not sunflower oil or are adulterated. Models created without derivative correction could also correctly distinguish virgin sunflower oils from those which are not. When the best model was used to analyse market samples, it rejected four samples and recognized 18. This means that some market samples claimed to be virgin sunflower oil are in reality not. Interpretation of the classification models showed how the discriminant information is linked to the absorption and reflection bands of the most characteristic moieties present in virgin sunflower oils. The validation results, negative control results and market sample results have shown that NIR spectroscopy combined with chemometrics is a powerful tool for qualitative analysis of virgin sunflower oil.
Further research should be conducted to evaluate the potential of the calibration models in the analysis of sunflower oil samples. Further studies could address quantitative analysis of the fatty acid profile or of other parameters in sunflower oil samples.
Conflicts of Interest Statement
The authors declare no conflict of interest.
The authors wish to acknowledge the members of staff of MUHAS Pharm R&D Laboratory, Mr. Maro Mhando, Mr. Prosper Tibalinda, Edson Lutta, Ruth Ng’wananogu and Bertha Francis for their support and their contribution to make this work a success.