Connected vehicle data is opening new frontiers for agencies to evaluate the performance of their road networks. Every month, hundreds of billions of passenger vehicle trajectory waypoints, each consisting of latitude, longitude, heading, speed, and timestamp, are collected in the United States. The resulting data sets can provide agencies with a plethora of historically difficult-to-collect data, such as traffic signal performance measures, hard-braking events, interstate congestion, and common detours around road closures.
However, many agencies are concerned about the representativeness of this data. This paper builds upon an earlier paper that focused on 24 sites in Indiana over three months and characterizes the connected vehicle penetration levels over 7 months at 54 sites in Indiana, Ohio, and Pennsylvania (Figure 1). As this paper is one of the first papers on connected vehicle penetration, it presents a preliminary methodology for calculating percent penetration that compares two data sets: Department of Transportation (DOT) collected traffic count data and connected vehicle trajectory data. The main objective of this paper is to report the percent penetration of connected vehicle data observed in the states of Indiana (IN), Ohio (OH), and Pennsylvania (PA).
2. Literature Review
Connected vehicle data is just the latest in the evolution of vehicle data. As early as 1999, GPS-based travel time data was used to evaluate agency infrastructure in Louisiana. By the early 2010s, crowdsourced vehicle probe data became available to both drivers and agencies through many providers and smartphone applications. While data gathered from smartphones was the main component of this crowdsourced data, some providers incorporated GPS-enabled vehicles as well. In the following years, many studies were conducted to understand the accuracy of these datasets. These include a study conducted on 2500 miles of roadway on and around I-95 evaluating commercially provided travel time and speed data, a two-month study comparing probe data speeds to speeds obtained from loop detectors, studies comparing probe data to Bluetooth sensors with a focus on arterials and surface streets, and a multi-year study comparing probe data to radar sensors.
Figure 1. Location of DOT count stations used in this study.
These past iterations of vehicle data have been well tested and validated for many years. Connected vehicle trajectory data, which contains individual vehicle locations, timestamps, speeds, and headings from onboard sensors, is, however, still in the pilot phase for many agencies. Over the past several years, many studies have focused on creating methodologies for evaluating road networks at low penetration. One study presented a method, tested against simulations and real-world data, for estimating queue length and traffic volumes without needing to explicitly know the market penetration. A study conducted by Zhang et al. found that a 4% penetration was sufficient to improve ramp metering performance. However, studies by Day et al. found that aggregated data at penetration levels as low as 0.09% to 0.8% would provide acceptable levels of representation for corridor retiming given a large enough aggregation period.
While connected vehicle data has led to the creation of new techniques to evaluate road networks, there are few studies looking at connected vehicle penetration rates. In 2016, Li et al. compared loop detector counts to vehicle trajectory counts and found an average percent penetration of 1.1% with a range of 0.2% to 2.0% depending on the time of day. This paper is an updated version of a previous paper that analyzed the percent penetration for 3 months in 2020 in Indiana and found interstates to have an average percent penetration of 4.3% and non-interstates to have an average percent penetration of 5.0%. Using the same methodology, this paper reports updated percent penetration values for Indiana and extends the geographic analysis area to include Ohio and Pennsylvania.
3.1. State Departments of Transportation
For this study, the 54 continuous count stations were selected to be geographically distributed, represent both interstate (Int) and non-interstate (Non-Int) roadways, have a variety of traffic volumes, and be in both rural and urban environments (Figure 1). Table 1 provides information on the number of locations by road type.
The traffic counts for the 54 count stations were obtained from their respective state DOTs and are, for the purposes of this study, considered the ground truth vehicle counts. Many different technologies are utilized at continuous count stations, such as inductive loops, piezoelectric sensors, and magnetic sensors. An example count station, IN-12, located on I-465 in Indianapolis, IN utilizes inductive loops, as shown in Figure 2, and the location of inductive loop sensors is identified with callout i. It is also possible to see the piezoelectric sensor between the two loops identified by callout i.
Table 1. Count station attributes.
The traffic volume data (aggregated by hour) used in this study are publicly available online at the Indiana Department of Transportation’s (INDOT) Traffic Count Database system, the Ohio Department of Transportation’s (ODOT) Traffic Monitoring Management System, and the Pennsylvania Department of Transportation’s (PennDOT) Traffic Information Repository.
3.2. Vehicle Trajectory Data
The vehicle trajectory data used in this study consists of anonymized individual waypoints that are collected every three seconds along with an anonymized trajectory identifier and GPS, timestamp, and heading information. This data was obtained through a third-party provider. This provider receives its data directly from the original equipment manufacturers (OEMs).
The vehicle trajectory counts were obtained by defining quarter-mile geofence regions centered at the count station for both travel directions. The vehicle trajectory waypoints located inside each geofence region were selected, and the number of unique trajectories was counted.
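The geofence counting step can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the circular geofence shape, the one-eighth-mile radius (so that the region spans a quarter mile), the 45-degree heading tolerance used to separate travel directions, and the waypoint field names (`trajectory_id`, `lat`, `lon`, `heading`, `hour`) are all assumptions made for the example.

```python
import math
from collections import defaultdict

EIGHTH_MILE_M = 201.168  # assumed geofence radius: half of a quarter mile, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def heading_diff(a_deg, b_deg):
    """Smallest absolute angle in degrees between two compass headings."""
    return abs((a_deg - b_deg + 180.0) % 360.0 - 180.0)


def hourly_unique_trajectories(waypoints, station_lat, station_lon, bearing_deg,
                               radius_m=EIGHTH_MILE_M, heading_tol_deg=45.0):
    """Count unique trajectory identifiers per hour inside the geofence.

    Each waypoint is a dict with the hypothetical keys
    'trajectory_id', 'lat', 'lon', 'heading', and 'hour'.
    """
    ids_by_hour = defaultdict(set)
    for wp in waypoints:
        inside = haversine_m(wp["lat"], wp["lon"], station_lat, station_lon) <= radius_m
        aligned = heading_diff(wp["heading"], bearing_deg) <= heading_tol_deg
        if inside and aligned:
            # A set keeps one entry per trajectory even when several of its
            # waypoints fall inside the geofence.
            ids_by_hour[wp["hour"]].add(wp["trajectory_id"])
    return {hour: len(ids) for hour, ids in sorted(ids_by_hour.items())}
```

Counting identifiers in a set rather than counting waypoints matters here because a vehicle reporting every three seconds typically leaves several waypoints inside a geofence of this size.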
Figure 2. Inductive loops (i) on an Indiana roadway (IN-12).
Fourteen days, seven Wednesdays and seven Saturdays, over seven months between January 2020 and June 2021 were analyzed. The dates include:
Wednesday, January 15, 2020;
Saturday, January 11, 2020;
Wednesday, August 26, 2020;
Saturday, August 22, 2020;
Wednesday, September 23, 2020;
Saturday, September 26, 2020;
Wednesday, December 9, 2020;
Saturday, December 12, 2020;
Wednesday, January 13, 2021;
Saturday, January 9, 2021;
Wednesday, May 26, 2021;
Saturday, May 22, 2021;
Wednesday, June 30, 2021;
Saturday, June 26, 2021.
However, Ohio and Pennsylvania were limited to the two days in August 2020 due to data availability.
To calculate the hourly, directional percent penetration, the DOT and vehicle trajectory counts were aggregated by hour and by direction. This was calculated by

$$H_p = \frac{V_h}{C_h} \times 100\%$$

where H_p is the hourly percent penetration per direction, V_h is the hourly count of unique vehicle trajectories, and C_h is the hourly count of vehicles passing the count station. The hourly ODOT counts, hourly vehicle trajectory counts, and resulting hourly percent penetration for the northbound (NB) direction of location OH-6, located along I-75 near Toledo, OH, for August 26, 2020 are shown in Figure 3.
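As a minimal sketch of the hourly calculation, the function below reads the hourly percent penetration as the unique trajectory count divided by the DOT count for the same hour and direction, expressed as a percentage. The function name, the dictionary layout keyed by hour, the choice to skip hours with a zero DOT count, and the counts in the usage note are illustrative assumptions, not values from the study.

```python
def hourly_penetration(trajectory_counts, dot_counts):
    """Return {hour: percent penetration} for one direction at one station.

    trajectory_counts: {hour: unique connected vehicle trajectories (V_h)}
    dot_counts:        {hour: DOT count station vehicles (C_h)}
    Hours with a zero DOT count are skipped (an assumption of this sketch)
    because the ratio is undefined there.
    """
    result = {}
    for hour, c_h in dot_counts.items():
        if c_h > 0:
            v_h = trajectory_counts.get(hour, 0)
            result[hour] = 100.0 * v_h / c_h
    return result
```

For example, with made-up counts, `hourly_penetration({8: 45, 9: 50}, {8: 1000, 9: 1250})` gives 4.5% for hour 8 and 4.0% for hour 9.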
The daily, directional percent penetration was determined by

$$D_p = \frac{\sum_{h} V_h}{\sum_{h} C_h} \times 100\%$$

where D_p is the daily percent penetration per direction, V_h is the hourly count of unique vehicle trajectories, C_h is the hourly count of vehicles crossing the count station, and the sums run over the hours of the day. Table 2 contains the daily, directional counts and resulting daily penetration for location OH-6.
The monthly, bi-directional percent penetration is calculated using counts from both directions and both the Wednesday and Saturday of each month using

$$M_p = \frac{\sum_{d} V_d}{\sum_{d} C_d} \times 100\%$$

where M_p is the monthly, bi-directional percent penetration, V_d is the daily count of vehicle trajectories for both directions, C_d is the daily count of vehicles crossing the count station for both directions, and the sums run over the analyzed days of the month. Table 3 contains the number of ODOT counts and vehicle trajectory counts for both northbound (NB) and southbound (SB) directions for Wednesday and Saturday in August 2020. The resulting monthly penetration is shown at the bottom.
Figure 3. ODOT and vehicle trajectory hourly counts and percent penetration for OH-6 for the NB direction on August 26, 2020. (a) ODOT hourly counts; (b) Vehicle trajectory hourly counts; (c) Hourly penetration.
A weighted average approach of aggregating raw counts, instead of percentages, was chosen to eliminate the effects of outlier hourly or daily percent penetrations.
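The difference between the two aggregation choices can be illustrated with a short sketch. The function names and the three hourly counts used in the usage note are made up for illustration; the point is only that a low-volume hour with an unusual ratio pulls a simple mean of percentages much further than it pulls the ratio of summed counts.

```python
def pooled_penetration(trajectory_counts, dot_counts):
    """Percent penetration from summed raw counts (the weighted approach)."""
    total_dot = sum(dot_counts)
    if total_dot == 0:
        raise ValueError("DOT counts sum to zero; penetration is undefined")
    return 100.0 * sum(trajectory_counts) / total_dot


def mean_of_percentages(trajectory_counts, dot_counts):
    """Unweighted mean of per-period percentages, shown only for contrast."""
    percentages = [100.0 * v / c for v, c in zip(trajectory_counts, dot_counts) if c > 0]
    if not percentages:
        raise ValueError("no period has a nonzero DOT count")
    return sum(percentages) / len(percentages)
```

With hypothetical trajectory counts of 1, 40, and 50 against DOT counts of 10, 1000, and 1000, the first hour alone has a 10% penetration. `mean_of_percentages` returns about 6.3%, while `pooled_penetration` returns about 4.5%, close to the two high-volume hours. The same pooled form applies whether the periods are hours within a day or days within a month.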
Tables 4-6 contain the average monthly penetration for Indiana, Ohio, and Pennsylvania, respectively. Although the data was collected from continuous count stations, some days did not contain data; asterisks or blank boxes indicate that one or both days were missing data. Indiana had the widest range of percent penetration, between 2.5% and 9.8%, while percent penetration was between 2.4% and 8.3% for Ohio and between 2.3% and 5.9% for Pennsylvania. Table 7 presents the summary statistics for August 2020 for interstates, non-interstates, Indiana roadways, Ohio roadways, and Pennsylvania roadways. Across all time periods, road types, and states, the connected vehicle percent penetration ranged from 1.8% to 9.8% with an average of 4.6%, a median of 4.5%, and a standard deviation of 1.3%.
Table 2. ODOT and vehicle trajectory hourly counts and percent penetration for OH-6 for the NB direction on August 26, 2020.
Table 3. Monthly summary for OH-6 for August 2020.
Table 4. Average monthly penetration for Indiana roadways.
*Count station data only available for one day of the two days. Note: Blank boxes indicate that INDOT counts were unavailable.
Table 5. Average August penetration for Ohio.
Table 6. Average August penetration for Pennsylvania.
*Count station data only available for one day of the two days.
Table 7. Summary statistics for August 2020.
While this study did not delve into the factors that affect percent penetration, a few possible factors did stand out. Location IN-21 is unique because its percent penetration is consistently three to four standard deviations above the average. This count station is located on a non-interstate road near Anderson, IN. Non-interstate roadways typically have higher percent penetrations than interstates, likely because interstate routes carry higher volumes of truck traffic while the connected vehicle data used in this study is predominantly obtained from passenger vehicles. However, non-interstate percent penetrations are, on average, only roughly one percentage point higher than interstate percent penetrations; therefore, road type alone does not account for the high percent penetration at IN-21. Average Annual Daily Traffic (AADT), on the other hand, did not seem to affect the percent penetration. Of the Indiana non-interstate roads, this location had the median AADT; therefore, AADT is likely not a factor in this location’s high percent penetration. A possible explanation is the close proximity of a facility of an OEM that is one of the significant contributors to the connected vehicle data used in this study.
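The "standard deviations above the average" comparison can be expressed as a standard score. The sketch below is illustrative only: the function name is hypothetical, and the usage note plugs in the overall average (4.6%), standard deviation (1.3%), and maximum observed penetration (9.8%) reported in this paper, without implying that the maximum belongs to any particular station or month.

```python
def standard_score(value, mean, std_dev):
    """Number of standard deviations by which value exceeds the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev
```

For instance, `standard_score(9.8, 4.6, 1.3)` evaluates to approximately 4, meaning a penetration of 9.8% sits about four standard deviations above the overall average.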
Temporal and seasonal variations also affected the percent penetration. A longitudinal study was conducted on Indiana’s 24 count stations between January 2020 and June 2021 (Figure 4). Between January 2020 and December 2020, the percent penetration decreased by less than 0.8%. Indiana began implementing coronavirus (COVID-19) pandemic restrictions in March 2020 (29). This caused a decrease in passenger vehicle traffic which likely led to a reduction in percent penetration. Between December 2020 and June 2021, the percent penetration increased modestly. This is possibly due to the post-pandemic rebound which resulted in a growth of passenger vehicle traffic. Figure 5 shows the variation in average percent penetration through the day. For both interstate and non-interstate roads, percent penetration is at its highest during the day, especially during evening peak periods with a high of 4.4% for interstates and 5.4% for non-interstates. Nighttime, especially during the early hours of the day, saw average percent penetration go below 2%. Daylight hours typically have a higher volume of passenger vehicles compared to truck volumes, while nighttime hours see a reduction in passenger vehicle volumes.
Figure 4. Average monthly penetration over time by road type and state.
Figure 5. Aggregated average percent penetration by time of day over all count stations for August 2020.
All states maintain a highway performance monitoring system and collect vehicle counts on their roadways, so the methodology described in this paper could be easily scaled to other locations. As connected vehicle data enables new techniques for analyzing road networks, the percent penetration of connected vehicles could be an important metric for understanding the representativeness of connected vehicle data along different roadways and in different areas.
The objective of this paper was to present a preliminary methodology based on DOT vehicle count data and anonymized connected vehicle trajectory data to report the penetration of connected vehicles. This paper analyzed 54 locations across Indiana, Ohio, and Pennsylvania on select Wednesdays and Saturdays between January 2020 and June 2021. Data from permanent and continuous traffic count stations were compared with unique connected vehicle trips in the same region to generate hourly, daily, and monthly penetration estimates (Table 2 and Table 3). The 54 locations analyzed had percent penetration values between 1.8% and 9.8% with an average percent penetration of around 4.4% and a median penetration of 4.5% (Tables 4-6). Indiana, Ohio, and Pennsylvania had similar monthly percent penetration for August 2020 with average percent penetrations of 4.6%, 4.5%, and 3.9% and median percent penetrations of 4.6%, 4.3%, and 4.0%, respectively (Table 7).
In addition to comparing percent penetration by state, a longitudinal study of connected vehicle penetration in Indiana was conducted. Following January 2020, Indiana’s percent penetration dipped by less than 1%, possibly because passenger vehicle travel decreased during the COVID-19 pandemic. The percent penetration saw a slight increase between December 2020 and June 2021, possibly indicating that passenger vehicle travel is increasing as COVID restrictions are lifted (Figure 4). The percent penetration was highest during the daylight hours, especially during evening peak periods (Figure 5).
Since all states have highway performance monitoring systems, this paper concludes by recommending that connected vehicle penetration monitoring be added so that states can track the growth of connected vehicle penetration over time. Transportation professionals would then have region-specific information on the relative penetration of connected vehicles, which they can use to determine whether connected vehicle data can meet their performance measure needs and what level of aggregation is required to obtain statistically robust performance measures.
Trajectory data for seven Wednesdays and seven Saturdays between January 2020 and June 2021 used in this study was provided by Wejo Data Services, Inc. Traffic volumes were retrieved from web count station portals hosted by Indiana DOT, Ohio DOT, and Pennsylvania DOT. This work was supported in part by the Joint Transportation Research Program and Pooled Fund Study (TPF-5(377)) led by the Indiana Department of Transportation (INDOT) and supported by the state transportation agencies of California, Connecticut, Georgia, Minnesota, North Carolina, Ohio, Pennsylvania, Texas, Utah, Wisconsin, plus the City of College Station, Texas, and the FHWA Operations Technical Services Team. The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein, and do not necessarily reflect the official views or policies of the sponsoring organizations. These contents do not constitute a standard, specification, or regulation.
 Saldivar-Carranza, E., Li, H., Mathew, J., Hunter, M., Sturdevant, J. and Bullock, D.M. (2020) Deriving Operational Traffic Signal Performance Measures from Vehicle Trajectory Data. Transportation Research Record: Journal of the Transportation Research Board, 1-15.
 Hunter, M., Saldivar-Carranza, E., Desai, J., Mathew, J., Li, H. and Bullock, D.M. (2021) A Proactive Approach to Evaluating Intersection Safety Using Hard-Braking Data. Springer, Berlin.
 Desai, J., Li, H., Mathew, J.K., Cheng, Y.-T., Habib, A. and Bullock, D.M. (2020) Correlating Hard-Braking Activity with Crash Occurrences on Interstate Construction Projects in Indiana. Journal of Big Data Analytics in Transportation, 3, 27-41.
 Hunter, M., Mathew, J.K., Cox, E., Blackwell, M. and Bullock, D.M. (2021) Estimation of Connected Vehicle Penetration Rate on Indiana Roadways. JTRP Affiliated Reports, Paper 37, 1-6.
 Quiroga, C.A. and Bullock, D. (1999) Travel Time Information Using Global Positioning System and Dynamic Segmentation Techniques. Transportation Research Record, 1660, 48-57.
 Hoseinzadeh, N., Liu, Y., Han, L.D., Brakewood, C. and Mohammadnazar, A. (2020) Quality of Location-Based Crowdsourced Speed Data on Surface Streets: A Case Study of Waze and Bluetooth Speed Data in Sevierville, TN. Computers, Environment and Urban Systems, 83, Article ID: 101518.
 Kim, S. and Coifman, B. (2014) Comparing INRIX Speed Data against Concurrent Loop Detector Stations over Several Months. Transportation Research Part C: Emerging Technologies, 49, 59-72.
 Haghani, A., Hamedi, M. and Sababadi, K.F. (2009) I-95 Corridor Coalition Vehicle Probe Project: Validation of INRIX Data July-September 2008 Final Report. I-95 Corridor Coalition.
 Zhang, X., Hamedi, M. and Haghani, A. (2015) Arterial Travel Time Validation and Augmentation with Two Independent Data Sources. Transportation Research Record, 2526, 79-89.
 Ahsani, V., Amin-Naseri, M., Knickerbocker, S. and Sharma, A. (2019) Quantitative Analysis of Probe Data Characteristics: Coverage, Speed Bias and Congestion Detection Precision. Journal of Intelligent Transportation Systems Technology Planning and Operations, 23, 103-119.
 Zhao, Y., Zheng, J., Wong, W., Wang, X., Meng, Y. and Liu, H.X. (2019) Estimation of Queue Lengths, Probe Vehicle Penetration Rates, and Traffic Volumes at Signalized Intersections Using Probe Vehicle Trajectories. Transportation Research Record, 2673, 660-670.
 Zhang, C., Wang, J., Lai, J., Yang, X., Su, Y. and Dong, Z. (2019) Extracting Origin-Destination with Vehicle Trajectory Data and Applying to Coordinated Ramp Metering. Journal of Advanced Transportation, 2019, Article ID: 8469316.
 Day, C.M. and Bullock, D.M. (2016) Detector-Free Signal Offset Optimization with Limited Connected Vehicle Market Penetration: Proof-of-Concept Study. Transportation Research Record: Journal of the Transportation Research Board, 2558, 54-65.
 Day, C.M., et al. (2017) Detector-Free Optimization of Traffic Signal Offsets with Connected Vehicle Data. Transportation Research Record: Journal of the Transportation Research Board, 2620, 54-68.
 Li, H., Mackey, J., Luker, M., Taylor, M. and Bullock, D.M. (2019) Application of High-Resolution Trip Trace Stitching to Evaluate Traffic Signal System Changes. Transportation Research Record: Journal of the Transportation Research Board, 2673, 188-201.
 Waddell, J.M., Remias, S.M., Kirsch, J.N. and Trepanier, T. (2020) Utilizing Low-Ping Frequency Vehicle Trajectory Data to Characterize Delay at Traffic Signals. Journal of Transportation Engineering, Part A: Systems, 146, Article ID: 04020069.
 Ma, W., Wan, L., Yu, C., Zou, L. and Zheng, J. (2020) Multi-Objective Optimization of Traffic Signals Based on Vehicle Trajectory Data at Isolated Intersections. Transportation Research Part C: Emerging Technologies, 120, Article ID: 102821.
 Li, H., Day, C.M. and Bullock, D.M. (2016) Virtual Detection at Intersections Using Connected Vehicle Trajectory Data. IEEE Conference on Intelligent Transportation Systems, Proceedings, Rio de Janeiro, 1-4 November 2016, 2571-2576.