In the field of traffic and transport, road safety, efficiency and environmental impact are major issues. One approach to these issues is to implement suitable advanced driver assistance systems (ADAS), which aim to help the driver maintain a safe, efficient and comfortable driving state. Since the control actions that the driver performs are assisted by sensors and computers, driving behavior influences the perception and judgment of these systems. From this perspective, driving behavior is very important in the development of ADAS, and driver behavior classification is necessary for both their design and their evaluation. Different control algorithms can then be applied to different driver groups, so that the systems perform much better.
Classifying human drivers is a very complex task because of the various nuances and peculiarities of human behavior. According to current research in China and abroad, the data used for driver behavior classification fall into two main types: ① objective experimental data, including vehicle maneuver data (accelerator, brake pedal, steering wheel, etc.) and motion data (speed, acceleration, distance headway, headway, etc.); ② subjective questionnaire data (such as the driver behavior questionnaire (DBQ)).
In early research, driver behavior classification was based almost entirely on objective experimental data from actual driving experiments or driving simulators. SangJo Choi used hidden Markov models (HMMs) to model driving characteristic data gathered from the CAN-bus information of a vehicle. The emphasis of that paper was on identifying some of the actions taken by a driver, such as turning or braking. Ma used a fuzzy clustering algorithm to analyze human driving behavior with respect to car following and lane change maneuvering, based on longitudinal and lateral acceleration, applied brake pressure, engine speed and some GPS data. Van explored the possibility of using the vehicle’s inertial sensors from the CAN bus to classify driving behavior. That study found that braking and turning events characterized an individual better than acceleration events. Zhang developed a model capable of classifying drivers from their driving behavior as sensed through the car’s on-board diagnostic (OBD) outlet and smartphone sensors. Aljaafreh proposed a driving performance inference system based on the signature of two-dimensional acceleration and speed, in which driving style was categorized as below normal, normal, aggressive, or very aggressive. Vaitkus presented a pattern recognition approach that automatically classifies driving style as aggressive or normal, without expert evaluation or knowledge, using statistical features of a 3-axis accelerometer signal recorded while the same route was driven in different driving styles.
Driver self-reporting is an effective approach to studying driver characteristics and has been widely applied to obtain subjective questionnaire data. Especially in psychology and traffic accident analysis, driving behavior can be predicted from drivers’ preferences and their subjective assessment of their own driving style.
The DBQ developed by Reason et al. has been commonly used as a standardized questionnaire, and many scholars have conducted DBQ-based research on national driver groups. Winter investigated the relation of errors and violations from the DBQ to accident involvement. The meta-analysis showed that errors and violations correlated negatively with age and positively with exposure, and that males reported fewer errors and more violations than females. Warner studied drivers’ tendency to commit different aberrant driving behaviors (violations, errors and lapses) in Finland, Sweden, Greece and Turkey. This study showed that different countries have different problems with regard to aberrant driving behaviors, which need to be taken into account when promoting traffic safety interventions. Özkan examined the changes in self-reported driving patterns three years after the first responses and showed that males and young drivers reported more violations than females and older drivers, whereas female drivers reported more errors and lapses. Rimmö investigated the fit of the Swedish DBQ (DBQ-SWE) across different driver subgroups: new drivers, inexperienced drivers, young drivers and experienced drivers.
Previous DBQ-based research focused primarily on macro-statistical analysis, such as comparing driving behavior across cultures, regions, age and gender, or searching for the psychological and social factors that influence driving behavior. These studies were usually based on the theory of planned behavior: the driver’s reactions and attitudes in different driving situations were obtained through the DBQ and used to predict driver behavior. So far, research has focused on how driver behavior varies with influencing factors such as age, gender, driving experience and culture. Few studies have addressed the quantification of driver behavior patterns or driver classification analysis.
In this paper, a self-reported survey of nonprofessional drivers’ driving behavior was designed based on the DBQ, and 225 samples were obtained through the Internet. Four latent factors were derived by confirmatory factor analysis. Candidate numbers of driver classes were examined with the FCM algorithm, and the number of classes was then determined by statistical indices. The classification results were compared with the survey item on whether the driver had been involved in a traffic accident within the past five years, in order to verify that the classification was reasonable. Finally, the correlation between demographic variables and driving behavior types was analyzed.
2. Method and Data Collection
A self-reported survey was designed based on the DBQ, with some modifications to adapt the items to Chinese traffic and driver conditions (e.g. competitive driving is more common in China than in other countries, so jumping the queue is included in this survey).
The questionnaire used in this research included two sections.
(1) The driver behavior self-reported survey consisted of 17 items (see Table 2). These questions concerning traffic matters were selected to cover violations of the traffic code, vehicle interaction and pedestrian-vehicle interaction. Drivers were asked to indicate how often they carried out each of the activities. The five-point Likert scale ranged from “very often (mark 1)” to “never (mark 5)”.
(2) Information about the respondents was also gathered, including gender, age, driving experience, driving time per week, whether the driver had been involved in a traffic accident within the past five years, and so on.
The questionnaire survey was carried out on the Internet in May 2014, and 225 valid questionnaires were received. All respondents were required to hold a valid driver’s license in Beijing and to be nonprofessional drivers. Table 1 shows the socio-demographic information about the respondents.
2.3. Data Statistical Analysis
Driver self-reporting is a common way to obtain information on driving psychology, but its disadvantage is that respondents may not fully understand the items in a short time, which may bias the statistical results.
Table 1. Demographic distribution of individuals in the sample.
In this paper, the reliability test was performed with SPSS 15.0. Generally, the alpha reliability obtained for a scale should equal or exceed 0.70. The value of Cronbach’s α for the self-reported survey was 0.937, indicating that the data had high reliability.
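The reliability coefficient reported above can be illustrated with a minimal sketch of the standard Cronbach’s α formula. The function name `cronbach_alpha` and the layout of `scores` (one list of item ratings per respondent) are assumptions made for illustration; the paper itself used SPSS, and no survey data are reproduced here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    using sample variances (denominator n - 1).
    """
    k = len(scores[0])  # number of questionnaire items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values at or above 0.70 would meet the threshold quoted in the text.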
3. Principal Component Analysis
The research used Bartlett’s Test of Sphericity (BTS), which tests the hypothesis that the correlation matrix is a unit (identity) matrix. Rejection of the hypothesis shows that the variables are correlated with one another, so factor analysis is appropriate for them. Both the Bartlett test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (KMO = 0.905) indicated that there were sufficient inter-item correlations within the data for performing factor analysis.
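As an illustration of the sphericity test, the sketch below computes the usual Bartlett statistic, χ² = −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom, from a correlation matrix R of p variables and n respondents. The function names `log_det` and `bartlett_sphericity` are hypothetical helpers, and the sketch stops at the statistic rather than the p-value; it is not the SPSS procedure used in the paper.

```python
import math


def log_det(matrix):
    """Natural log of |det| via Gaussian elimination with partial pivoting.

    Assumes a non-singular matrix; a valid non-singular correlation
    matrix has a positive determinant, so log|det| equals log(det).
    """
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    result = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        result += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def bartlett_sphericity(corr, n_samples):
    """Return (chi-square statistic, degrees of freedom) for H0: corr = identity."""
    p = len(corr)
    chi2 = -(n_samples - 1 - (2 * p + 5) / 6) * log_det(corr)
    dof = p * (p - 1) // 2
    return chi2, dof
```

A large statistic relative to the chi-square distribution with the returned degrees of freedom leads to rejecting the identity hypothesis.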
Table 2 shows that the reliability coefficients were acceptable for most of the factors. The alpha values measuring the internal consistency of the four dimensions were 0.653, 0.747, 0.824 and 0.874, respectively. The first factor (α = 0.653) had an α value slightly lower than 0.7. The other factors were greater than 0.7 and were satisfactory.
Table 2. Means, SD, factor loading and reliability analysis for driving behavior.
Principal component analysis with varimax rotation was used to extract factors, and eigenvalues were used to determine the number of factors extracted. The initial factor analysis revealed a four-factor solution, which accounted for 66% of the variance.
As shown in Table 2, the first factor was named speed advantage. It covered attempting to overtake when travelling at the same speed as other vehicles, attempting to overtake in the absence of proper conditions, and frequently changing lanes in order to secure a speed advantage. The second factor, space occupation, means that the driver occupies a favorable position and prevents other vehicles from passing. It related to driving for a long time in the passing lane, not yielding when other vehicles try to overtake, etc. The third factor, contending for the right of way, indicates that the driver competes for the right of way with other road users while driving. It included not yielding, honking at and attempting to drive past pedestrians at a crossing, and driving in the emergency, bus or bicycle lane. The last factor, contending for space advantage, means that the driver grabs a favorable traffic position in traffic jams. It covered items related to jumping the queue in traffic jams, driving without enough safety margin, shortening the headway in order to prevent other vehicles from jumping the queue, and so on. Space occupation and contending for space advantage have different connotations: the former is to maintain one’s own favorable position; the latter is to grab a favorable traffic position.
4. Driver Classification Based on the FCM
Advanced driver assistance systems need to adopt different strategies for different groups of drivers. This paper quantified driver characteristics by confirmatory factor analysis. The FCM algorithm, which iteratively determines the membership degree of each object in every cluster through a membership function, was selected to classify the drivers.
In this section, the FCM algorithm is briefly described. Consider a set of unlabeled patterns $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^{f}$, where n is the number of patterns and f is the dimension of the pattern vectors (features). The FCM algorithm minimizes an objective function that measures the quality of the partitioning that divides the dataset into c clusters. The objective function is the weighted within-group sum of squared errors (WGSS):
$$J_m(U, C) = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \, d_{ij}^{2} \tag{1}$$
where n is the number of patterns in X, c is the number of clusters, $U = [u_{ij}]$ is the membership function matrix, $u_{ij}$ is the value of the membership function of the i-th pattern belonging to the j-th cluster, $d_{ij} = \lVert x_i - c_j \rVert$ is the distance from $x_i$ to $c_j$, $c_j$ denotes the cluster center of the j-th cluster, and m is the exponent on $u_{ij}$ that controls the fuzziness, or amount of overlap, of the clusters.
The FCM algorithm is subject to the following constraints on U:
$$u_{ij} \in [0, 1], \qquad \sum_{j=1}^{c} u_{ij} = 1, \quad i = 1, \ldots, n \tag{2}$$
$$0 < \sum_{i=1}^{n} u_{ij} < n, \quad j = 1, \ldots, c \tag{3}$$
Function (1) describes a constrained optimization problem, which can be converted to an unconstrained optimization problem by using the Lagrange multiplier technique. This yields the two updating functions
$$c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} x_i}{\sum_{i=1}^{n} u_{ij}^{m}} \tag{4}$$
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \tag{5}$$
The FCM algorithm starts with a set of initial cluster centers (or arbitrary membership values). Then it iterates the two updating functions (4) and (5) until the cluster centers are stable or the objective function in (1) converges to a local minimum. The complete algorithm consists of the following steps:
Step 1: Given a fixed number of clusters c, initialize the cluster center matrix $C^{0}$ by drawing centers at random from the original dataset. Record the cluster centers, set k = 0 and m = 2, and choose ε, where ε is a small positive constant.
Step 2: Initialize the membership matrix $U^{0}$ by function (5).
Step 3: Set k = k + 1 and compute the new (candidate) cluster centers $c_j$ by (4).
Step 4: Compute the new membership matrix $U^{k}$ by (5).
Step 5: If the change between iterations k − 1 and k (in the cluster centers or in the objective function) is smaller than ε, stop; otherwise go to Step 3.
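The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function name `fcm`, the `seed` and `max_iter` parameters, the Euclidean distance, and the stopping test on the largest center shift are all choices made here for the example.

```python
import math
import random


def fcm(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means on a list of feature vectors; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Step 1: draw the initial centers at random from the dataset.
    centers = [list(p) for p in rng.sample(points, c)]
    dims = len(points[0])

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def memberships(cs):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        rows = []
        for x in points:
            d = [dist(x, cj) for cj in cs]
            if any(dj == 0.0 for dj in d):
                # A point lying exactly on a center belongs fully to it.
                hits = [dj == 0.0 for dj in d]
                rows.append([1.0 / sum(hits) if h else 0.0 for h in hits])
            else:
                rows.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                             for dj in d])
        return rows

    # Step 2: initial membership matrix from the initial centers.
    u = memberships(centers)
    for _ in range(max_iter):
        # Step 3: center update, a mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * x[f] for wi, x in zip(w, points)) / total
                                for f in range(dims)])
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step 4: membership update with the new centers.
        u = memberships(centers)
        # Step 5: stop once the centers are stable.
        if shift < eps:
            break
    return centers, u
```

Each row of the returned membership matrix sums to 1, and a crisp driver class can be read off as the cluster with the highest membership.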
The classification centers were obtained for driver classifications with 2, 3, 4, 5 and 6 classes (see Table 3).
4.2. Determination of Classifications Number
In cluster analysis, drivers can be categorized according to different standards and from different angles. Determining the number of classifications is difficult but necessary. The most appropriate number of classifications was decided using reason-based likelihood information.
The larger the extra-cluster sum of squares of deviations and the smaller the intra-cluster sum of squares of deviations, the better the classification. The first index is the ratio of the intra-cluster sum of squares of deviations for k clusters to the sum of squares of deviations for all the samples. The intra-cluster sum of squares for k clusters is the sum, over the clusters, of each cluster’s own sum of squares of deviations, which is computed from the level of likelihood of driving behavior on each factor for each respondent and the centroid of that cluster on the same factor. Here k is the number of clusters and m is the number of factors.
Table 3. Values of significant influence factors classification centers
SPRSQ is defined as the difference between the value of the index for k + 1 clusters and its value for k clusters. The greater the value of SPRSQ, the better the clustering effect.
The larger the pseudo F statistic, the more reasonable the clustering.
These statistical indices were calculated, and the cluster results were then judged with discriminant analysis (see Table 4).
As shown in Table 4, the first index dropped sharply from 2 to 4 classifications, which means that splitting 2 classifications into 3, and 3 into 4, removes much of the dissimilarity within the driver classifications. Only a slight drop occurred from 4 to 5 classifications, which means that splitting into 5 does not remove as much dissimilarity. The second index varied in the opposite direction. For the third index, the value decreased considerably once the number of classifications exceeded 4. Therefore, the recommended number of classifications was 4.
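The kind of comparison described above can be sketched as follows. The sketch assumes crisp labels (for example, each driver assigned to the cluster with the highest FCM membership), cluster means rather than FCM centers as centroids, and the common form of the pseudo F statistic, F = ((T − W)/(k − 1)) / (W/(n − k)), where T is the total and W the intra-cluster sum of squares. The function names are hypothetical, and the paper’s exact index definitions may differ.

```python
def sum_of_squares(points):
    """Sum of squared deviations of the points from their own mean vector."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[f] for p in points) / n for f in range(dims)]
    return sum((p[f] - mean[f]) ** 2 for p in points for f in range(dims))


def cluster_indices(points, labels):
    """Return (intra-cluster / total sum-of-squares ratio, pseudo F) for crisp labels."""
    total = sum_of_squares(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    within = sum(sum_of_squares(members) for members in clusters.values())
    k, n = len(clusters), len(points)
    ratio = within / total
    pseudo_f = ((total - within) / (k - 1)) / (within / (n - k))
    return ratio, pseudo_f
```

Running this for k = 2 to 6 and differencing the ratio between successive k gives the kind of step-by-step comparison summarized in Table 4.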
Using the driving behavior of the four factors obtained together with the above FCM cluster analysis, the respondents can be categorized into four groups defined as Type A, B, C and D. The categorization, with the corresponding observation of characteristics of the respondents, is summarized in Table 5.
4.3. Driver Classification Result Validation
Using the driver accident information, the proportion of drivers in each classification who had been involved in a traffic accident within the past five years was calculated (see Figure 1).
As shown in Figure 1, Type A drivers (conservative driving style) had the lowest percentage of drivers involved in a traffic accident, at 5%. The percentages for Type B and Type C were 6% and 12%, respectively. For Type D drivers (aggressive driving style), however, the proportion was as high as 26%. This shows that there was a correlation between driving style and traffic accident rates, so the driver behavior classifications in this paper were consistent with actual driving conditions.
Table 4. Statistical test for FCM Clustering.
Table 5. Driver characteristic of each classification.
Figure 1. Accident driver rate of each classification.
5. The Influence of Demographic Information
The membership grade of each sample was calculated by FCM clustering, and the driver classification of each sample was determined. The correlation between demographic factors (such as age, gender, driving experience and driving time per week) and types of driving behavior was then analyzed.
5.1. Gender
The proportion of each driver type within each gender is shown in Figure 2. Among female drivers, Type A and Type B had the largest proportions, at 40.4% and 32.6% respectively, while the proportion of Type D was only 5.6%. Among male drivers, Type D had a higher proportion (30.1%) and Type A a lower proportion (21.3%). Female drivers were therefore more likely than male drivers to drive carefully.
5.2. Age
The relationship between age and driver type is shown in Figure 3. The proportion of Type D was 37.8% among drivers aged 18 - 29, but only 7% among drivers over 50 years old. Among drivers over 50, Type A had the highest proportion, at 50%. With increasing age, the driver has more experience, and driving behavior becomes careful and moderate.
5.3. Driving Experience
The relationship between driving experience and driver type is shown in Figure 4. For drivers with less than one year of experience, Type A had the highest proportion, close to 50%. For drivers with more than ten years of experience, however, the proportion of Type A was only 4.3%. With increasing driving experience, the proportions of Type B, Type C and Type D drivers increased gradually and the proportion of Type A decreased gradually. A competitive driving environment on the road might be a key factor in explaining this phenomenon. Novice drivers were little influenced by the competitive driving environment, so their driving behavior was modest. As driving experience increased, the influence of the competitive driving environment grew, increasing their competitive driving.
Figure 2. Relationship between gender and driver type.
Figure 3. Relationship between age and driver type.
Figure 4. Relationship between driving experience and driver type.
In part 5.2, older drivers, who have more experience of life, were found to drive carefully and moderately. In part 5.3, drivers with more driving experience were found to be more likely to drive competitively. This is not a paradox, because there is no strong correlation between driving experience and age. For example, a 50-year-old man might have only 2 years of driving experience, while a 30-year-old man might have 10 years.
5.4. Driving Time Per Week
The relationship between driving time per week and driver type is shown in Figure 5. For drivers with less than 5 hours of driving time a week, Type A had the highest proportion (36%) and Type D the smallest (15.2%). For drivers with 5 - 10 hours of driving time a week, Type B and Type C had the highest proportions, at 33.9% and 30.4% respectively, and Type D had the smallest. For drivers with 10 - 15 hours of driving time a week, Type B had the highest proportion, at 50%. For drivers with 15 - 20 hours of driving time a week, Type A had the highest proportion, at 33.3%. These results show that differences in driving time per week have no significant effect on the proportion of each driver type.
Figure 5. Relationship between driving time per week and driver type.
6. Conclusions
(1) A driver behavior self-reported survey based on the standardized DBQ was completed by 225 nonprofessional drivers. The questionnaire’s reliability was verified by statistical analysis. CFA was used to analyze the underlying factor structure. Four factors (speed advantage, space occupation, contending for the right of way and contending for space advantage) were extracted from the questionnaire results to quantify driver characteristics.
(2) Based on the FCM algorithm, the number of driver clusters was examined with the four factors as pattern features. Using the statistical indices, the respondents were finally categorized into four groups.
(3) The consistency of the clustering results with actual driving conditions was verified using the item “whether the driver had been involved in a traffic accident within the past five years”. The results can provide a basis for designing intelligent driver assistance systems.
This work was supported by the National Basic Research Program of China (2012CB723303) and the Beijing Municipal Education Commission Key Project (KZ20151005007).