Poverty is not only a common challenge for humankind but also an important issue for China’s economic and social development. Sen (1976) argues that poverty entails a lack and deprivation of capability, which reduces the opportunities of the poor to raise their income and escape poverty. In addition, the World Bank (2001) points out that poverty in China is accompanied by deprivation in education, health, and nutrition, which hinders the development of Chinese society. Since the launch of the “Reform and Opening Up” policy in 1978, China has made significant progress in poverty alleviation. However, the marginal benefit of poverty alleviation has decreased year by year. By 2018, there were still 16.6 million poor people in rural China, and many difficulties in poverty alleviation remained. In the context of big data, an appropriate poverty recognition method can provide methodological support for poverty alleviation in poor areas.
Poverty has traditionally been seen as a lack of income (or consumption). However, over the past 40 years, this one-dimensional concept of poverty has been questioned, and scholars now tend to analyze poverty from a multidimensional perspective, following the seminal work of Sen (1976). From this perspective, income is not the only indicator of poverty, for two main reasons. On the one hand, existing studies have shown that there are often high inclusion and exclusion errors between people with low incomes and those deprived in other aspects of human well-being (Baulch and Masset 2003; Ruggeri Laderchi et al. 2003). On the other hand, a drawback of the money-metric income approach is that not all non-monetary characteristics can be measured directly in monetary terms, because markets do not work well in many developing countries (Bourguignon and Chakravarty 2003). Therefore, although income is a significant dimension for evaluating human development, other dimensions such as education, living standards, health, and assets should also be considered when measuring human deprivation. Multidimensional poverty measurement and recognition have been a main research direction since Sen put forward the conceptual foundation, and a variety of multidimensional poverty measurement methods have been proposed. For example, Hagenaars et al. (1987) constructed a multidimensional poverty index system from the two dimensions of leisure and income. Lugo et al. (2009) built a multidimensional poverty measurement model based on information theory. Tsui (2002) and Bourguignon (2003) used axiomatic methods to measure the multidimensional poverty index. Alkire (2011) put forward the AF method based on axiomatization. The core of the AF method is the “double critical value” method.
First, the critical value of each index is used to judge whether the object is deprived in that index; then, a critical value across all dimensions is used to judge whether the individual is multidimensionally poor. However, this method is a univariate statistical method, which is essentially an extension of one-dimensional poverty analysis. It can only examine the contribution of a single dimension to multidimensional poverty and cannot recognize poor households from a genuinely multidimensional perspective.
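The double critical value logic described above can be sketched in a few lines. This is a minimal illustration, not the AF method in full: it assumes equal weights for all indexes, and the index names, cutoff values, and the second cutoff `k` are hypothetical and chosen only for the example.

```python
def af_identify(households, cutoffs, k):
    """Dual-cutoff identification with equal weights (illustrative only).

    households: list of dicts mapping index name -> observed value
    cutoffs:    dict mapping index name -> (direction, threshold), where
                direction "<=" means deprived when value <= threshold and
                ">=" means deprived when value >= threshold
    k:          second cutoff, the minimum number of deprived indexes for a
                household to count as multidimensionally poor
    """
    flags = []
    for household in households:
        deprived = 0
        for name, (direction, threshold) in cutoffs.items():
            value = household[name]
            # First cutoff: is the household deprived in this index?
            if direction == "<=" and value <= threshold:
                deprived += 1
            elif direction == ">=" and value >= threshold:
                deprived += 1
        # Second cutoff: is the household deprived in enough indexes?
        flags.append(deprived >= k)
    return flags


# Hypothetical cutoffs and households, made up for illustration.
cutoffs = {
    "income": ("<=", 2300),
    "schooling_years": ("<=", 6),
    "engel": (">=", 0.6),
}
households = [
    {"income": 2000, "schooling_years": 5, "engel": 0.5},
    {"income": 5000, "schooling_years": 9, "engel": 0.7},
]
flags = af_identify(households, cutoffs, k=2)
```

Each index is judged on its own against its own cutoff, and the correlation between indexes never enters the decision, which is the limitation discussed above.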
To solve the above problems, this paper constructs a recognition method of multidimensional poverty by using the Mahalanobis-Taguchi system (MTS). MTS separates normal samples from abnormal samples by using the Mahalanobis distance (MD), which takes the correlation between features into account, and selects the features with larger information gain through signal-to-noise ratios (SNR) and orthogonal arrays (OAs). As a data-driven pattern recognition method, MTS has been widely used in manufacturing cost accounting (Abu, 2018), automobile motor-head machining (Reyes-Carlos, 2017), rolling bearing fault diagnosis and health assessment (Chen Junxun, 2016), and management decision-making (Chang Zhipeng, 2016).
2. Mahalanobis-Taguchi System
MTS is a pattern recognition method developed by Dr. Taguchi that is used to classify data and select useful features. It combines MD with Taguchi’s Robust Engineering. MD is a covariance-based distance that measures the similarity between an unknown sample and a known sample set, and it is used to construct a measurement scale for recognizing samples in multidimensional systems. The Taguchi method is a statistical method for improving engineering quality (Taguchi, 2007) and enhancing system robustness (Taguchi, 2001). This study uses MTS to select the useful features of multidimensional poverty and to recognize poor households. MTS consists of four steps, as shown in Figure 1.
Step 1: Construct a “Full Model Measurement Scale” with MS as the Reference
In this stage, we collect sample data from non-poor families to construct the normal sample dataset. The MDs of these samples, which are around one, define the Mahalanobis Space (MS). The MS can be considered a database for the normal dataset that combines its mean vector, standard deviation vector, and correlation matrix. Ordinarily, the samples in the normal group should be similar and share common characteristics. We use the mean point and the average MD of the normal group as the reference point and the base of the measurement scale.
Figure 1. Four steps in Mahalanobis–Taguchi system.
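We assume that p features are measured to recognize multidimensional poverty. We first collect n normal samples with p features to construct an MS as a reference, where $x_{it}$ is the original value of the t-th feature of the i-th normal sample ($i = 1, \dots, n$; $t = 1, \dots, p$). Because the features have different measurement scales, each feature is standardized using its mean $\bar{x}_t$ and standard deviation $s_t$:

$$\bar{x}_t = \frac{1}{n}\sum_{i=1}^{n} x_{it} \tag{1}$$

$$s_t = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{it}-\bar{x}_t\right)^2} \tag{2}$$

$$z_{it} = \frac{x_{it}-\bar{x}_t}{s_t} \tag{3}$$

After standardization, the mean and standard deviation of each feature are 0 and 1, respectively. The correlation matrix C is computed as:

$$C = \frac{1}{n-1}\sum_{i=1}^{n} z_i z_i^{T} \tag{4}$$

The MD of the i-th sample is calculated as follows:

$$MD_i = \frac{1}{p}\, z_i^{T} C^{-1} z_i \tag{5}$$

where $z_i = (z_{i1}, \dots, z_{ip})^{T}$ is the standardized feature vector of the i-th sample and $C^{-1}$ is the inverse of the correlation matrix.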
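Step 1 can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the usual MTS conventions (standardization with the normal group's mean and sample standard deviation, and an MD divided by the number of features), the function names `build_scale` and `scaled_md` are hypothetical, and the data are synthetic.

```python
import numpy as np


def build_scale(normal):
    """Return mean vector, standard-deviation vector and inverse correlation
    matrix of the normal group (rows = samples, columns = features)."""
    normal = np.asarray(normal, dtype=float)
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)        # sample standard deviation
    z = (normal - mean) / std               # standardized features
    corr = z.T @ z / (len(normal) - 1)      # correlation matrix C
    return mean, std, np.linalg.inv(corr)


def scaled_md(samples, mean, std, corr_inv):
    """Scaled MD: z C^-1 z^T divided by the number of features."""
    z = (np.asarray(samples, dtype=float) - mean) / std
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / z.shape[1]


# Synthetic data for illustration only (not CLDS data).
rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 4))             # reference (non-poor) group
abnormal = rng.normal(loc=3.0, size=(50, 4))   # shifted (poor) group

scale = build_scale(normal)
md_normal = scaled_md(normal, *scale)
md_abnormal = scaled_md(abnormal, *scale)
print(md_normal.mean(), md_abnormal.mean())
```

With these conventions the average MD of the normal group equals (n - 1)/n, which is why the MS is described as lying around one. Samples outside the reference group are always standardized with the normal group's mean and standard deviation, never with their own.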
Step 2: Validate the Measurement Scale
To validate the scale, poor households are used as known “abnormal” samples. The abnormal samples are selected first, and their feature data are normalized using the mean and standard deviation of the normal dataset. Their MDs are then computed from the normalized feature data and the correlation matrix of the normal samples. If the MS is appropriately constructed, the MDs of the abnormal samples will fall outside the MS. Otherwise, the MS has to be reconstructed.
Step 3: Identify the Useful Features
In this step, the important features are selected by means of orthogonal arrays (OAs) and signal-to-noise ratios (SNR). We use OAs to identify the critical features while minimizing the number of feature combinations that have to be tested. The number of columns in the OA is determined by the number of features. Two-level factors are used: level 1 means that the feature is included, and level 2 means that it is excluded. A suitable orthogonal array is then selected, and the features are assigned to its columns. Within the orthogonal array, every row (run) represents a different combination of feature levels. In MTS, the abnormal samples are used to measure, through the SNR, how accurately the MS predicts. The SNR $\eta_q$ corresponding to the q-th run of the OA is calculated as the larger-the-better SNR and is defined by the following formula:

$$\eta_q = -10\log_{10}\left[\frac{1}{k}\sum_{j=1}^{k}\frac{1}{MD_j}\right] \tag{6}$$

where k is the number of abnormal samples and $MD_j$ is the MD of the j-th abnormal sample in that run.

The useful features are obtained by evaluating the “Gain” in SNR. The Gain of each feature is calculated using (7):

$$\text{Gain} = \overline{\eta}^{+} - \overline{\eta}^{-} \tag{7}$$

where $\overline{\eta}^{+}$ is the average SNR of all runs including the feature, and $\overline{\eta}^{-}$ is the average SNR of all runs excluding the feature. A feature with a positive Gain is regarded as useful and worth keeping, whereas a feature with a negative Gain should be removed.
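The feature screening in Step 3 can be sketched as below. Several choices here are assumptions made for illustration: the two-level OA is generated from a Sylvester Hadamard matrix (the paper does not say which array was used), the larger-the-better SNR is taken as -10·log10 of the mean of 1/MD over the abnormal samples, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def two_level_oa(n_features):
    """Two-level OA built from a Sylvester Hadamard matrix.
    True = level 1 (feature included), False = level 2 (excluded)."""
    h = np.array([[1]])
    while h.shape[0] <= n_features:
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    # Drop the constant first column and keep one column per feature.
    return h[:, 1:n_features + 1] == 1


def scaled_md(normal, samples):
    """Scaled MD of `samples` measured against the `normal` group."""
    mean, std = normal.mean(axis=0), normal.std(axis=0, ddof=1)
    z_ref = (normal - mean) / std
    corr_inv = np.linalg.inv(z_ref.T @ z_ref / (len(normal) - 1))
    z = (samples - mean) / std
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / z.shape[1]


def feature_gains(normal, abnormal):
    """Gain per feature: mean SNR of runs including it minus mean SNR of
    runs excluding it."""
    oa = two_level_oa(normal.shape[1])
    snr = np.empty(len(oa))
    for q, included in enumerate(oa):
        if not included.any():
            raise ValueError("run without any included feature")
        md = scaled_md(normal[:, included], abnormal[:, included])
        snr[q] = -10.0 * np.log10(np.mean(1.0 / md))  # larger-the-better SNR
    return np.array([snr[oa[:, t]].mean() - snr[~oa[:, t]].mean()
                     for t in range(oa.shape[1])])


# Synthetic data: only the first two of seven features separate the groups.
rng = np.random.default_rng(1)
normal = rng.normal(size=(200, 7))
abnormal = rng.normal(size=(60, 7))
abnormal[:, :2] += 5.0

gains = feature_gains(normal, abnormal)
useful = np.flatnonzero(gains > 0)  # indices of features worth keeping
print(np.round(gains, 2), useful)
```

In this toy setup the two shifted features should receive clearly positive gains, because runs that include them push the abnormal MDs further from the reference group. Features that carry no signal dilute the scaled MD and tend toward negative gains, although their sign can vary with the random draw.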
Step 4: Future Diagnosis with Useful Features
In the final stage, the MS is reconstructed using the useful features and validated. If the MD of a household is within the MS, the household is non-poor (normal); if it is outside the MS, the household is poor (abnormal). The higher the MD, the greater the deviation of the household from the non-poor families. To separate poor from non-poor families by their MDs, we calculate the threshold using the following equation proposed by Chao-Ton Su (2007):
where $\overline{MD}$ is the average of the MDs of the normal sample,
$s_{MD}$ is the standard deviation of the MDs of the normal sample,
ω is the percentage of the normal sample whose MDs are smaller than the minimum MD of the abnormal sample, and
λ is a small parameter, usually set subjectively.
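The quantities entering the threshold, and the decision rule that follows from it, can be sketched as below. The threshold equation itself is not reproduced, so the threshold is simply passed in as a number. The function names and the MD values are hypothetical, and ω is returned as a fraction rather than a percentage.

```python
from statistics import mean, stdev


def threshold_inputs(md_normal, md_abnormal):
    """Return the average MD and standard deviation of the normal sample,
    and omega, the share of normal MDs below the smallest abnormal MD."""
    md_bar = mean(md_normal)
    s_md = stdev(md_normal)                 # sample standard deviation
    min_abnormal = min(md_abnormal)
    omega = sum(md < min_abnormal for md in md_normal) / len(md_normal)
    return md_bar, s_md, omega


def accuracy(md_normal, md_abnormal, threshold):
    """Share of households classified correctly: normal if MD <= threshold,
    abnormal if MD > threshold."""
    correct = sum(md <= threshold for md in md_normal)
    correct += sum(md > threshold for md in md_abnormal)
    return correct / (len(md_normal) + len(md_abnormal))


# Made-up MD values for illustration only.
md_normal = [0.5, 0.8, 1.0, 1.2, 1.5]
md_abnormal = [1.4, 3.0, 5.0]

md_bar, s_md, omega = threshold_inputs(md_normal, md_abnormal)
acc = accuracy(md_normal, md_abnormal, threshold=1.45)  # hypothetical threshold
print(md_bar, s_md, omega, acc)
```

When the two groups overlap, as they do in this toy example, ω is below one and no threshold classifies every household correctly, so the threshold has to trade off the two kinds of error.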
3. Case Study
3.1. Case Description
Screening the dimensions, indexes, and cutoffs is usually tricky and inevitably involves value judgments. Therefore, this study adopts a set of dimensions and indexes based on existing research and the availability of data. In particular, six dimensions and twenty-three indexes were selected; they are shown with their deprivation cutoffs in Table 1.
The income dimension is considered first because income is recognized as a vital means of achieving valuable ends (Stiglitz et al., 2009). We also consider the health and education dimensions, which are widely acknowledged as essential valuable ends (Sen, 1985).
Table 1. Multidimensional poverty index system of rural families.
Note: “≤” or “≥” means that a rural household is regarded as deprived in an index when its value is at or below (≤) or at or above (≥) the threshold.
Good health and education are also pivotal aspects of human capability, understood in Sen’s theory as the freedom of a person to lead different types of life. The three education indexes are educational attainment, expenditure on education, and accessibility of public education. Four indexes measure deprivation in health: health conditions, sanitation facilities, expenditure on health, and accessibility of healthcare. It is noteworthy that improving sanitation facilities has important positive effects on reducing the spread of diseases such as hepatitis, cholera, and diarrhea. These dimensions are included because, in rural areas, income does not guarantee access to education and health services. The living dimension is acknowledged as a standard for measuring access to basic services and includes four indexes: clean water, improved cooking fuel, electricity, and Engel’s coefficient. Unsafe water can cause many diseases in rural areas, and access to safe water is a fundamental human right. Reliable electricity helps people improve their access to information through a wide range of appliances such as televisions, refrigerators, telephones, and computers. In China, indoor air pollution caused by the use of solid fuel is the primary reason for more than 40,000 premature deaths annually. Finally, we consider the asset and housing dimensions, which are also essential to the quality of life. They reflect the rate of asset accumulation of rural families and provide a buffer that helps people withstand the negative effects of social and economic risks. Four asset indexes evaluate household capital accumulation: means of production, assisted living assets, quantity of cultivated land, and type of current housing. In addition, seven indexes reflect housing conditions: per capita housing, congestion, hygiene conditions, lighting conditions, ventilation conditions, air conditions, and noise conditions.
The dataset used in this paper comes from the 2014 China Labor-force Dynamics Survey (CLDS), which surveyed the working population aged 15-64. It is an interdisciplinary large-scale survey covering labor education, employment, household property and income, household consumption, and the production and land of rural households. CLDS 2014 uses a multi-stage, multi-level probability sampling method, which can better reflect the real situation of Chinese society. This paper selects the rural household data of the central provinces of China in CLDS 2014 for analysis, including Anhui, Henan, Hunan, Hubei, and Jiangxi. The central provinces are characterized by a large agricultural population, a wide distribution of poor people, and complex and diverse causes of poverty. Therefore, this sample is selected for recognition in the hope of providing a reference for poverty alleviation.
The household is taken as the basic unit of analysis, and family-level data are the primary data source of this study. We turned to the individual-level data for more detail when the description of family members’ conditions was ambiguous in the family-level data. After considering the research purposes and data quality, we screened the original data strictly. First, we separated rural households from urban households according to where they lived: under village committees (Cunweihui) or neighborhood committees (Juweihui). Second, we deleted families that refused to answer questions essential to the study. Third, we retained only families whose total income/expenditure equaled the sum of income/expenditure from the sub-component sources. In the end, we obtained 425 households.
Following the suggestion of the MPI, families with poverty indexes greater than 5 are defined as the abnormal group. The data were randomly split into a training set and a test set: 163 normal and 83 abnormal samples are used as the training set to construct the measurement scale, and 115 normal and 64 abnormal samples are used as the test set to validate the capability of the scale.
Step 1: Construct a “Full Model Measurement Scale” with MS as the Reference
The 163 normal samples in the training set are set as the reference (normal) group. First, the mean vector, standard deviation vector, and correlation matrix of the normal group are computed. Then, we calculate the inverse of the correlation matrix of the normal group. Finally, the MDs of the normal group are calculated using (5) to define an MS that serves as the reference for the measurement scale, as shown in Figure 2.
Step 2: Validate the Measurement Scale
The MDs corresponding to the 83 abnormal samples in the training set are also computed using (5) to validate the accuracy of the MS. If the MS constructed in Step 1 is good, the MDs of the normal group will be smaller than those of the abnormal group. In the training set, the MDs of non-poor households are smaller overall, with an average value of 1, while the MDs of poor households are higher on the whole, with an average of 4.879, as shown in Figure 2. This indicates that the measurement scale is valid.
Step 3: Recognize the Useful Features
Figure 2. The distribution of MDs of the rural households inspection training set (full model).
In this step, the 23 indexes are regarded as the features of multidimensional poverty. Each feature is assigned two levels: level 1 means including the feature, and level 2 means excluding it. We assign the 23 features to the first 23 columns of an OA. For each run of the OA, the features at level 1 are used to construct an MS, and the MDs of the 83 abnormal samples are calculated on the basis of this MS. The larger-the-better SNR is then computed for each run using (6) with the MDs of the abnormal samples. The assignment of features in the OA and the SNRs are shown in Table 2. After the SNR of each run is obtained, the gain of each index is computed and plotted, as shown in Figure 3. According to the gain values, we keep the features with positive gain and remove those with negative gain, as shown in Table 3.
The number of indexes of multidimensional poverty is reduced from 23 to 17 by keeping those with positive gains. With gain > 0 as the criterion, the preserved features are X4, X5, X6, X7, X8, X9, X10, X11, X14, X15, X17, X18, X20, X21, X22 and X23. The normal group with these 17 indexes is used to reconstruct a reduced model measurement scale. In the same way, we apply the 83 abnormal samples to validate the MS. Figure 4 depicts the MD distributions under the reduced model and indicates that the new MS is good. After confirming the effectiveness of the reduced model measurement scale, a threshold is determined using (8) as follows:
Step 4: Future Diagnosis with Useful Features
Figure 3. The effect gains of the rural households inspection attributes.
Table 2. OAs.
Table 3. Indexes after reduced model.
Figure 4. The distribution of MDs of the rural households inspection training set (reduced model with 17 features).
Figure 5. The distribution of MDs of the rural households inspection test set (reduced model with 17 features).
For this reduced MS with 17 indexes, using 2.182 as the threshold leads to a classification accuracy of 90.244% on the training set, as shown in Figure 4. Finally, we use the test set to validate the classification performance of the reduced model. The MDs of the test set are calculated using (5), and their distribution is shown in Figure 5. For this reduced model with 17 attributes, using 2.724 as the threshold results in a classification accuracy of 90.503% on the test set. These results verify the effectiveness of the proposed method for the recognition of multidimensional poverty.
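As a quick consistency check, and assuming that accuracy means the share of correctly classified households, the two reported percentages line up with the split sizes given in the case description (163 + 83 = 246 training households and 115 + 64 = 179 test households). The numbers of correctly classified households below are back-calculated from the percentages and are not reported in the paper:

```latex
\frac{222}{246} \approx 0.90244, \qquad \frac{162}{179} \approx 0.90503
```

Under this reading, 24 of the 246 training households and 17 of the 179 test households are misclassified.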
In this paper, 425 families are classified as poor or non-poor by using the MTS. The MTS can recognize poor and non-poor households and select the main indexes for measuring multidimensional poverty, which mainly concern income, health, and housing conditions. Rural families earn a relatively low income because they lack capability and are locked into low-efficiency agricultural production. For rural families, it is therefore important to raise income or enhance sustainable livelihood capacity. Improving farming efficiency is the most remunerative strategy, but it requires relatively high technical support and financial input. However, because of the shortage of investment in public services and infrastructure, a set of social problems has emerged among the poor population, such as problems of medical treatment, housing, and education. Poor health conditions are a barrier to escaping the “disease-poverty-disease” cycle and to breaking the intergenerational poverty caused by poor sanitation. At the same time, families with housing problems need transfer income to achieve poverty reduction because of their low level of asset possession. Thus, the living and health conditions of poor households need to be improved to complete poverty alleviation.
This paper uses MTS to construct a model that recognizes poverty from a multidimensional perspective. MTS can be used not only to classify rural households but also to identify the useful features in the system of multidimensional poverty. Besides, when MTS is used in practice, setting a threshold makes it possible to distinguish between the two types of samples and to avoid overfitting problems. Finally, this study uses the MTS to recognize poverty with the CLDS 2014 dataset. The case study aims to eliminate the redundant features and improve the accuracy of recognition. The results show that the number of features is effectively reduced from 23 to 17 without losing accuracy. Therefore, poverty recognition based on MTS is easy to apply and popularize in poverty alleviation work.
This paper makes three contributions. First, the concepts, principles, and computational process of MTS are described in detail, which supports further research on this diagnostic method. Second, a multidimensional poverty index system is constructed from six dimensions: income, education, health, living, asset, and housing. Finally, we successfully introduce MTS to recognize multidimensional poverty. The case study indicates that MTS is robust and practical for poverty recognition and provides methodological support for it. In addition, health, cultivated land, income, and electricity are the main reasons for the difference between the poor and the non-poor in the case study. Therefore, income should be raised and farming efficiency improved. For policy-making, improving rural living and health conditions by strengthening public services and infrastructure should also be a concern in poverty alleviation.
However, some significant issues remain to be addressed in our future work. In the multidimensional poverty index system, multicollinearity is unavoidable. It leads to errors in establishing the correlation matrix and influences the construction of the MS. This is one of the research subjects that will be tackled in the near future.
This work was supported by the General Project of the National Natural Science Foundation of China [grant number: 71673001] and the Key Project of the Excellent Young Talents Support Program for Colleges and Universities of Anhui Province [grant number: gxyqZD2017040].