In recent decades, spectroscopy has developed rapidly and has been applied in many fields, especially the life sciences. For example, it is widely used to diagnose diseases of biological tissue, and it has also been used to determine the species of origin of blood samples. Identifying the species of origin of blood is an important task in forensic investigation, wildlife investigation and import/export detection. The methods currently used for this purpose mainly include HPLC (High Performance Liquid Chromatography) and LC-MS (Liquid Chromatography-Mass Spectrometry). These methods involve complex operating procedures, have low recognition efficiency, and can easily contaminate the blood samples.
In 2008, De Wael et al. measured the Raman and near-infrared spectra of human, canine, and feline blood and found that the blood samples of different species are very difficult to distinguish visually. In 2009, however, Virkler et al. applied advanced statistical methods to produce a three-dimensional score plot that shows a separation among the Raman spectra of human, canine, and feline blood samples. In 2017, Lu et al. classified blood species using fluorescence spectroscopy and a support vector machine (SVM) with notable success. In 2018, Dong et al. classified blood samples of humans, dogs and rabbits using machine learning, with over 90% accuracy.
Although great progress has been made in classifying blood spectra over the past decade, developing a fast and highly accurate technique for this task is still important. In recent years, deep learning has become a powerful tool in many fields. In particular, deep learning performs conspicuously well at automatic feature extraction, which gives it broad and efficient application prospects in the analysis of spectral data.
In this work, fluorescence spectroscopy is combined with deep learning to classify blood samples from doves, chickens, mice and sheep. These samples are so similar in composition that their spectra look almost identical on visual comparison alone. A deep belief network (DBN) is developed to extract features from the pretreated data, and the cross-validation results show that the deep learning method classifies blood species more precisely than previous methods.
The paper is organized as follows. In Section 2, the fluorescence spectra of whole blood and red blood cell solutions at different concentrations from doves, chickens, mice and sheep are presented and pretreated. In Section 3, a Deep Belief Network (DBN) model built by stacking Restricted Boltzmann Machines (RBMs) is established to extract deep features from the blood spectra. A label layer is then added to the DBN to fine-tune its parameters, so that training yields a classifier that can classify blood species more precisely than traditional methods. Section 4 concludes the paper with a discussion of the method and future work.
2. Experiment and Data Pretreatment
Fluorescence spectroscopy can detect the chemical structures of molecules in an organism and the interactions between molecules and their surroundings, which allows researchers to study the cellular composition and metabolism of organisms. In this work, whole blood and red blood cell samples of doves, chickens, mice and sheep are collected. The samples are then diluted with 0.85% NaCl normal saline to concentrations of 1% and 3%. Finally, a Cary Eclipse fluorescence spectrophotometer (see Table 1 for the parameter settings) is used to scan the prepared blood samples at ambient temperature and record the spectral data (see Table 2; 50 groups of data are collected from the whole blood and red blood cell solutions, at both concentrations, of the four animals).
Because the collected fluorescence spectra contain considerable noise, the raw spectral data are de-noised using the “db3” mother wavelet, the “heursure” threshold selection rule and the “sln” threshold rescaling approach. The de-noised and normalized spectral curves are shown in Figure 1.
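The threshold rules named above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' code: it assumes the detail coefficients of a db3 wavelet decomposition have already been computed by a wavelet library (decomposition and reconstruction are not shown), and it assumes soft thresholding, which the text does not specify. The rule names appear to follow MATLAB's `wden` conventions: “heursure” uses the universal threshold sqrt(2 ln N) when the coefficients look like pure noise and otherwise the smaller of that and the SURE threshold, while “sln” rescales every level by a single noise estimate, median(|d1|)/0.6745, taken from the finest detail level. All function names are hypothetical.

```python
import math
import statistics


def sqtwolog(n):
    """Universal threshold sqrt(2 ln n) for unit-variance noise."""
    return math.sqrt(2.0 * math.log(n))


def rigrsure(x):
    """SURE threshold for unit-variance noise (Stein's unbiased risk estimate)."""
    n = len(x)
    sx2 = sorted(v * v for v in x)  # squared magnitudes, ascending
    best_risk, best_thr = math.inf, 0.0
    cumsum = 0.0
    for k, s in enumerate(sx2, start=1):
        cumsum += s
        # estimated risk when the threshold equals the k-th smallest |x|
        risk = (n - 2 * k + cumsum + (n - k) * s) / n
        if risk < best_risk:
            best_risk, best_thr = risk, math.sqrt(s)
    return best_thr


def heursure(x):
    """Heuristic SURE: use the universal threshold when the signal is weak."""
    n = len(x)
    eta = (sum(v * v for v in x) - n) / n
    crit = math.log2(n) ** 1.5 / math.sqrt(n)
    if eta < crit:
        return sqtwolog(n)
    return min(rigrsure(x), sqtwolog(n))


def soft_threshold(x, thr):
    """Shrink each coefficient toward zero by thr (values below thr become 0)."""
    return [math.copysign(max(abs(v) - thr, 0.0), v) for v in x]


def denoise_details(details):
    """Threshold every detail level with one noise estimate ('sln' rescaling).

    details[0] is assumed to be the finest (first-level) detail band.
    """
    sigma = statistics.median(abs(v) for v in details[0]) / 0.6745
    if sigma == 0:
        return [list(d) for d in details]
    out = []
    for d in details:
        # choose the threshold on noise-normalized coefficients, then rescale
        thr = heursure([v / sigma for v in d]) * sigma
        out.append(soft_threshold(d, thr))
    return out
```

Only the detail bands are thresholded in this sketch; the approximation band would be passed through unchanged before the inverse wavelet transform.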
3. Methodology and Classification Results
Methodologically, after the raw fluorescence spectra are pretreated by wavelet de-noising and normalization, RBMs are used to extract abstract features from the training dataset, and a classifier layer that uses the label information fine-tunes the whole DBN to complete its training. Finally, the trained DBN model is used to recognize and classify the testing dataset (see Figure 2).
3.2. DBN Model Formulation and Hyper-Parameter Adjustment
The Deep Belief Network (DBN), a generative model proposed by Geoffrey Hinton in 2006, is built by stacking RBMs (Restricted Boltzmann Machines).
An RBM is an energy-based neural network with a two-layer structure: a visible layer and a hidden layer. Every neuron in the visible layer is connected to every neuron in the hidden layer, and there are no connections between neurons within the same layer.
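Because there are no connections within a layer, the hidden units are conditionally independent given the visible states, and vice versa. For the binary RBM whose energy function is defined next, each unit's activation probability is then a logistic sigmoid of its total input: P(h_j = 1 | v) = σ(b_j + Σ_i v_i W_ij) and P(v_i = 1 | h) = σ(a_i + Σ_j W_ij h_j). The sketch below computes these with the offsets a, b and weights W of that energy function; storing W as a nested list with W[i][j] linking visible unit i to hidden unit j is an assumption, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]
```

Because each conditional factorizes over units, a whole layer can be sampled in one pass, which is what makes the alternating (Gibbs) sampling used in RBM training cheap.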
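The neurons of an RBM are binary (Boolean) units whose values are 0 or 1 (1 = activated; 0 = not activated). The energy function of the RBM is defined as:

$$E(\mathbf{v},\mathbf{h}\mid\theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} v_i W_{ij} h_j$$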
Here, $\mathbf{v}$ denotes the state of the visible layer and $n$ is the number of neurons in the visible layer; $\mathbf{h}$ denotes the state of the hidden layer and $m$ is the number of neurons in the hidden layer; $a_i$ is the offset of visible neuron $i$; $b_j$ is the offset of hidden neuron $j$; and $W_{ij}$ is the weight of the connection between visible neuron $i$ and hidden neuron $j$. Let $\theta = \{W_{ij}, a_i, b_j\}$ denote the parameters to be learned by the RBM. Once $\theta$ is fixed, the joint probability distribution of $(\mathbf{v}, \mathbf{h})$ based on the energy function is:

$$P(\mathbf{v},\mathbf{h}\mid\theta) = \frac{e^{-E(\mathbf{v},\mathbf{h}\mid\theta)}}{Z(\theta)}$$

where $Z(\theta) = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h}\mid\theta)}$ is the normalization factor. The marginal distribution of $P(\mathbf{v},\mathbf{h}\mid\theta)$ over $\mathbf{v}$ (also known as the likelihood function) can be expressed as:

$$P(\mathbf{v}\mid\theta) = \frac{1}{Z(\theta)}\sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h}\mid\theta)}$$

During RBM learning, given $T$ training samples $\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(T)}$, the parameters $\theta$ are determined by maximizing the log-likelihood function:

$$\theta^{*} = \arg\max_{\theta} \mathcal{L}(\theta) = \arg\max_{\theta} \sum_{t=1}^{T} \ln P(\mathbf{v}^{(t)}\mid\theta)$$

Table 1. Parameters of Cary Eclipse fluorescence spectrophotometer.
Table 2. Data description (Note: see the original data in the appendix file).
Figure 1. De-noised and normalized spectral curve. (a) 1% concentration whole blood solution; (b) 3% concentration whole blood solution; (c) 1% concentration red blood cell solution; (d) 3% concentration red blood cell solution.
Figure 2. Flow chart of the classification method.
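To make the RBM energy function, its joint distribution and the log-likelihood concrete, the sketch below evaluates them exactly for a toy RBM by enumerating every binary state. This is only feasible for a handful of units, because Z(θ) sums over 2^(n+m) configurations; that cost is why practical RBM training approximates the likelihood gradient (for example with contrastive divergence) instead of computing Z(θ). The toy parameters and function names are hypothetical.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """E(v, h | theta) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j."""
    e = -sum(ai * vi for ai, vi in zip(a, v))
    e -= sum(bj * hj for bj, hj in zip(b, h))
    e -= sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


def states(k):
    """All binary vectors of length k."""
    return itertools.product((0, 1), repeat=k)


def partition_function(W, a, b):
    """Z(theta): sum of exp(-E) over every joint (v, h) configuration."""
    return sum(math.exp(-energy(v, h, W, a, b))
               for v in states(len(a)) for h in states(len(b)))


def marginal_p_v(v, W, a, b, Z=None):
    """P(v | theta) = (1 / Z) * sum over h of exp(-E(v, h | theta))."""
    if Z is None:
        Z = partition_function(W, a, b)
    return sum(math.exp(-energy(v, h, W, a, b)) for h in states(len(b))) / Z


def log_likelihood(samples, W, a, b):
    """L(theta) = sum over t of ln P(v^(t) | theta)."""
    Z = partition_function(W, a, b)
    return sum(math.log(marginal_p_v(v, W, a, b, Z)) for v in samples)


# Toy RBM with 2 visible and 2 hidden units (illustrative values only).
W = [[0.5, -0.3], [0.2, 0.8]]
a = [0.1, -0.2]
b = [0.0, 0.3]
total = sum(marginal_p_v(v, W, a, b) for v in states(2))  # sums to 1 up to rounding
```

As a sanity check, the marginal probabilities of all visible states sum to one, and with all parameters set to zero every visible vector has probability 1/2^n.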
The training process of the DBN has two steps: 1) the RBMs are first trained without supervision, layer by layer, so that as much feature information as possible is retained when the feature vectors are mapped into different feature spaces; 2) a classifier layer with labels is then added, and the whole DBN is fine-tuned with the back-propagation algorithm.
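The text does not say how each RBM is trained in the unsupervised step. The sketch below assumes one-step contrastive divergence (CD-1), the usual choice for Hinton-style DBNs, and shows the greedy layer-wise stacking in which each trained RBM's hidden-unit probabilities become the training data of the next RBM. It is plain Python with a fixed random seed, treats inputs in [0, 1] as probabilities, and omits momentum and the supervised back-propagation fine-tuning; all names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary RBM with visible offsets a, hidden offsets b and weights W[i][j]."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.a = [0.0] * n_visible
        self.b = [0.0] * n_hidden
        # small random weights break the symmetry between hidden units
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]

    def hidden_probs(self, v):
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def visible_probs(self, h):
        return [sigmoid(self.a[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.a))]

    def cd1_update(self, v0, lr):
        """One CD-1 step: data statistics minus one-step reconstruction statistics."""
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample h ~ P(h|v0)
        v1 = self.visible_probs(h0)                                # reconstruction
        ph1 = self.hidden_probs(v1)                                # negative phase
        for i in range(len(self.a)):
            for j in range(len(self.b)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b)):
            self.b[j] += lr * (ph0[j] - ph1[j])


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training of a stack of RBMs.

    Each RBM is trained on the hidden-unit probabilities produced by the
    previous one; returns the RBMs and the top-layer features of `data`.
    """
    rng = random.Random(seed)
    rbms, layer_input = [], data
    for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
        rbm = RBM(n_vis, n_hid, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms, layer_input
```

For the network in this paper, `layer_sizes` would be `[681, 50, 50, 200]`; loops like these only illustrate the data flow and would be far too slow for real spectra, where a vectorized array library is the practical choice.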
The network structure and the hyper-parameter settings are the two most important aspects of DBN construction. The DBN designed in this paper stacks 3 RBMs, with 681-50-50-200 nodes in its layers (see Figure 3). In our comparative experiments, increasing the DBN's depth (the number of stacked RBMs) or width (the number of nodes in each RBM layer) did not improve the recognition accuracy but significantly increased the recognition time.
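As a quick arithmetic check on the size of the 681-50-50-200 architecture, each RBM between consecutive layers has a visible-by-hidden weight matrix plus one offset per visible and per hidden neuron. The sketch below counts these for the three stacked RBMs; it assumes the standard RBM parameterization and leaves out the label layer, whose exact form the text does not give. The function name is hypothetical.

```python
def rbm_parameter_counts(layer_sizes):
    """(visible, hidden, weights, offsets) for each RBM in a stack."""
    counts = []
    for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = n_vis * n_hid   # one W_ij per visible-hidden pair
        offsets = n_vis + n_hid   # visible offsets a_i and hidden offsets b_j
        counts.append((n_vis, n_hid, weights, offsets))
    return counts


for n_vis, n_hid, w, off in rbm_parameter_counts([681, 50, 50, 200]):
    print(f"RBM {n_vis}-{n_hid}: {w} weights, {off} offsets")
```

This gives 34,050 + 2,500 + 10,000 weights and 731 + 100 + 250 offsets, 47,631 pre-training parameters in total, so the first RBM, which sees all 681 input nodes, holds about 73% of them.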
The hyper-parameters of the DBN model in Figure 3 are also tuned. For the learning rate and the momentum factor, multiple rounds of cross-validation experiments and adjustments on the 1% whole blood training set yield a learning rate of 0.1 and a momentum factor of 0.5 (see Table 3).
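The momentum factor smooths the parameter updates by carrying over a fraction of the previous update. The sketch below assumes the classical momentum rule Δθ_t = α Δθ_(t−1) + η g_t, where g_t is the current gradient estimate of the log-likelihood (for example from contrastive divergence), with the reported values η = 0.1 and α = 0.5; the text does not state which momentum variant was used, and the function name is hypothetical.

```python
def momentum_step(theta, velocity, grad, lr=0.1, momentum=0.5):
    """Classical momentum for gradient ascent on the log-likelihood.

    velocity <- momentum * velocity + lr * grad;  theta <- theta + velocity
    """
    velocity = [momentum * u + lr * g for u, g in zip(velocity, grad)]
    theta = [t + u for t, u in zip(theta, velocity)]
    return theta, velocity


# With a constant gradient of 1, the step grows from 0.1 toward 0.1 / (1 - 0.5) = 0.2.
theta, velocity = [0.0], [0.0]
for step in range(4):
    theta, velocity = momentum_step(theta, velocity, grad=[1.0])
    print(step + 1, round(velocity[0], 4), round(theta[0], 4))
```

Under a steady gradient the effective step size approaches η/(1 − α), so a momentum factor of 0.5 at most doubles the plain learning rate while damping oscillations between successive updates.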
3.3. Classification Results and Analysis
In this paper, for each sample type (whole blood or red blood cells) and concentration, half of the spectral data collected from each animal are randomly selected as training data, and the remaining half are used as test data. The DBN model and its label layer (i.e., the classifier) are first trained on the training data. The test data of doves, chickens, mice and sheep are then recognized by the trained classifier. Finally, the training set and the test set are interchanged to perform cross-validation, and the average classification accuracy is obtained (see Table 4).
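This protocol amounts to a stratified two-fold cross-validation: within one sample type and concentration, each animal's spectra are split at random into two halves, the classifier is trained on one half and tested on the other, then the halves are swapped and the two accuracies averaged. The sketch below assumes a caller-supplied `train_and_score` function standing in for DBN training and testing; the nearest-centroid classifier used as that stand-in, and all other names, are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def split_halves(labels, seed=0):
    """Randomly split sample indices into two halves, separately within each class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    fold_a, fold_b = [], []
    for label in sorted(by_label):
        idx = list(by_label[label])
        rng.shuffle(idx)
        half = len(idx) // 2
        fold_a += idx[:half]
        fold_b += idx[half:]
    return fold_a, fold_b


def two_fold_accuracy(spectra, labels, train_and_score, seed=0):
    """Train on one half and test on the other, swap the halves, average the accuracies."""
    fold_a, fold_b = split_halves(labels, seed)
    scores = []
    for train_idx, test_idx in ((fold_a, fold_b), (fold_b, fold_a)):
        train = ([spectra[i] for i in train_idx], [labels[i] for i in train_idx])
        test = ([spectra[i] for i in test_idx], [labels[i] for i in test_idx])
        scores.append(train_and_score(train, test))
    return mean(scores), scores


def nearest_centroid(train, test):
    """Hypothetical stand-in for DBN training/testing; returns test accuracy."""
    xs, ys = train
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(x):
        return min(centroids,
                   key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centroids[c])))

    correct = sum(predict(x) == y for x, y in zip(*test))
    return correct / len(test[1])
```

Run once per sample type and concentration, `labels` would hold the four species names for that condition, and the two per-fold accuracies give the average reported for it.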
Figure 3. Structure of the deep belief network model.
Table 3. Hyper-parametric adjustment.
The classification accuracy is 97.5% for 1% whole blood, with a standard deviation of 2.5, and 96.5% for 3% whole blood, with a standard deviation of 2.09. The classification accuracies for 1% and 3% red blood cells are 85% and 82.5%, with standard deviations of 8.67 and 2.85, respectively. As Table 4 shows, whole blood solutions are classified more accurately than red blood cell solutions, which indicates that whole blood solutions should be preferred for identifying the blood species of animals in practice.
All mouse and sheep samples are recognized correctly for both sample types and both concentrations. Among the misclassified samples, doves are most often mistaken for mice, followed by sheep. Because the blood spectra of chickens are quite similar to those of mice and sheep, a small fraction of the chicken samples is misclassified as mice or sheep. Even so, in the case of 1% whole blood most samples of doves, chickens, mice and sheep are recognized correctly (see Figure 4 and Table 4).
Figure 4. Classification results based on the deep belief network. (a) 1% concentration whole blood solution; (b) 3% concentration whole blood solution; (c) 1% concentration red blood cell solution; (d) 3% concentration red blood cell solution.
Table 4. Cross validation results of the deep belief network.
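Misclassification patterns such as these are usually read off a confusion matrix, whose rows are the true species and whose columns are the predicted species; the recall for each species is its diagonal entry divided by its row total. The sketch below builds one from lists of true and predicted labels. The example predictions are made up for illustration and are not the results of this study; the function names are hypothetical.

```python
from collections import Counter

SPECIES = ["dove", "chicken", "mouse", "sheep"]


def confusion_matrix(y_true, y_pred, classes=SPECIES):
    """counts[i][j] = number of samples of class i predicted as class j."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in classes] for t in classes]


def per_class_recall(matrix, classes=SPECIES):
    """Fraction of each class's samples that were classified correctly."""
    return {c: (row[i] / sum(row) if sum(row) else float("nan"))
            for i, (c, row) in enumerate(zip(classes, matrix))}


# Illustrative labels only (not data from this study).
y_true = ["dove"] * 4 + ["chicken"] * 4 + ["mouse"] * 4 + ["sheep"] * 4
y_pred = (["dove", "dove", "dove", "mouse"] + ["chicken", "chicken", "chicken", "sheep"]
          + ["mouse"] * 4 + ["sheep"] * 4)
cm = confusion_matrix(y_true, y_pred)
recall = per_class_recall(cm)
```

In this toy example one dove is predicted as a mouse and one chicken as a sheep, so the off-diagonal entries in the dove and chicken rows show exactly which species absorb the errors.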
In recent years, deep learning has become a powerful tool in many fields. With the rapid development of spectroscopy, spectral data analysis using advanced machine learning methods is likely to become a new direction in spectral analysis. In this work, a novel analytical approach that combines fluorescence spectroscopy with deep learning was developed to classify blood samples from doves, chickens, mice and sheep. A deep belief network stacked from three RBMs is used to extract features from the fluorescence spectral data, and the cross-validation results of the trained DBN classifier show that the deep learning method classifies blood species more precisely than previous methods. In particular, the classification accuracy for whole blood at 1% concentration reaches 97.5%. Compared with other blood spectrum classification models, the DBN extracts more effective feature information and thus achieves higher classification accuracy. Consequently, the proposed approach has great potential for classifying blood species in forensic investigation, wildlife investigation and import/export detection.
However, the current work classifies only four kinds of animals using fluorescence spectroscopy combined with a DBN model. In future work, we will collect spectral data from more kinds of animals at different blood concentrations and try to formulate more useful machine learning models, so as to provide faster and more effective identification and classification methods for forensic investigation, wildlife investigation, import/export detection and other applications.
This work was partially supported by the National Natural Science Foundation of P. R. China (No. 11401092, 11426045), Scientific and Technological Planning Project of Jilin Province of P. R. China (No. 20180101229JC), Foundation of Jilin Educational Committee (No. JJKH20181100KJ).
The original fluorescence spectra of the four kinds of animals at different concentrations are attached.