Data Modernization

What is Data Modernization?

Data Modernization (DataMod) is the multi-year project that NHTSA undertook to redesign, modernize, and improve its crash data collection systems. In FY 2012, Congress appropriated $25 million to the agency to modernize the National Automotive Sampling System (NASS). NHTSA identified three major areas for improvement: redesigning the survey sample, modernizing the information technology infrastructure, and revamping its data collection protocols and technology. The goal of DataMod is to affirm NHTSA's position as the leader in motor vehicle crash data collection and analysis by collecting quality data that keeps pace with emerging technologies and policy needs.

NASS was composed of two nested probability sampling systems, the General Estimates System (GES) and the Crashworthiness Data System (CDS). The GES collected general information on traffic crashes from police crash reports. The CDS collected more detailed information on passenger vehicles and occupants. NHTSA developed and implemented the GES in the 1970s. It was based on a three-stage stratified probability sample of primary sampling units (PSUs), police jurisdictions (PJs), and police crash reports. The CDS 24-PSU sample was a subsample of the GES 60-PSU sample. The same PSU and PJ samples had been used for GES data collection since the 1980s.
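The three-stage structure described above (PSUs, then police jurisdictions, then crash reports) can be illustrated with a small sketch. This is not NHTSA's procedure: it assumes simple random sampling at every stage, whereas the actual designs are stratified, and the frame, the function name `three_stage_sample`, and the sample sizes are all hypothetical. It only shows how a case's weight arises as the inverse of the product of the stage-level selection probabilities.

```python
import random

def three_stage_sample(frame, n_psu, n_pj, n_reports, seed=0):
    """Draw a toy three-stage sample.

    frame maps PSU -> police jurisdiction (PJ) -> list of crash report IDs.
    Returns (psu, pj, report_id, weight) tuples. Simple random sampling at
    each stage is an illustrative assumption, not the NASS/GES design.
    """
    rng = random.Random(seed)
    sample = []
    psus = rng.sample(sorted(frame), min(n_psu, len(frame)))
    p_psu = len(psus) / len(frame)              # stage 1 selection probability
    for psu in psus:
        all_pjs = sorted(frame[psu])
        pjs = rng.sample(all_pjs, min(n_pj, len(all_pjs)))
        p_pj = len(pjs) / len(all_pjs)          # stage 2, conditional on the PSU
        for pj in pjs:
            all_reports = frame[psu][pj]
            if not all_reports:
                continue                        # nothing to sample in this PJ
            reports = rng.sample(all_reports, min(n_reports, len(all_reports)))
            p_rep = len(reports) / len(all_reports)  # stage 3, conditional on the PJ
            weight = 1.0 / (p_psu * p_pj * p_rep)    # inverse overall probability
            sample.extend((psu, pj, r, weight) for r in reports)
    return sample

# Hypothetical frame: 6 PSUs x 4 PJs x 10 crash reports = 240 reports.
frame = {
    f"PSU{i}": {f"PJ{i}-{j}": [f"R{i}-{j}-{k}" for k in range(10)] for j in range(4)}
    for i in range(6)
}
sample = three_stage_sample(frame, n_psu=3, n_pj=2, n_reports=5)
estimated_total = sum(weight for *_, weight in sample)
print(len(sample), estimated_total)
```

Because every unit in this toy frame is the same size, the weights sum exactly to the number of reports in the frame. With unequal units the weighted total is an estimate that varies from one sample to the next.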

Over the past two decades, however, the general population, vehicles, and crash characteristics have changed dramatically. In addition, the transportation community’s research interests have expanded to topics such as driver performance, crash avoidance, and the effects of new technologies on crash amelioration.

    Crash Sampling Systems

    What was developed?

    NHTSA launched the redesign in January 2012, with most of the effort focused on forming conceptual research designs, establishing sampling frames, selecting data collection locations and sources, and documenting the protocols and results for the new systems. During this process, two new national probability-based crash sampling systems were designed to replace GES and CDS: the Crash Report Sampling System (CRSS) and the Crash Investigation Sampling System (CISS). Based on its assessment of research objectives and operational considerations, NHTSA decided to design the CRSS independently from CISS so that each system could be optimized for its own purpose. Therefore, unlike in NASS, the formation and selection of the CRSS PSUs were independent of the CISS PSU formation and selection.

    CRSS is a nationally representative sample of police-reported crashes involving all types of motor vehicles, pedestrians, and cyclists. These crashes include those that result in a fatality, injury, or property damage. Crash reports are chosen from 60 selected areas across the United States that reflect the geography, population, miles driven, and crash distribution in the nation. The annual file of uniformly coded crash report information can be used to track crash trends on the number and severity of crash-related non-fatal injuries.

    CISS is a probability-based sampling system that produces nationally representative samples of police-reported motor vehicle crashes in which at least one passenger vehicle was towed from the scene for any reason. Trained crash technicians obtain data from 32 crash sites by documenting scene evidence, inspecting the vehicles involved, interviewing crash victims, and reviewing crash victims’ medical records to determine the nature and severity of the crash-related injuries. CISS collects detailed vehicle crash and injury outcome data used to identify problems and evaluate the life-saving potential of new technologies.

    What is the current status of DataMod?

    NHTSA has completed the multi-year data modernization effort by releasing nationally representative data from CISS and CRSS. Two years (2016 and 2017) of CRSS data have been released, and data from the 2017 CISS were released in September 2019. In addition, a significant amount of documentation has been released, covering the sample design, key enhancements, key estimates, analytical users’ manuals, and analytic guidance.

    What improvements/enhancements were made as a result of DataMod?

    DataMod provided the opportunity to make many improvements and enhancements to the sample design, data collection processes, and consumer-focused outputs. The two samples were designed to be independent, scalable, and flexible. Splitting CRSS and CISS allowed NHTSA to optimize each sample for the specific goals of each survey. The ability to scale the sample up or down easily provides the flexibility to accommodate changes in data collection sites and police jurisdictions, and to adjust for budgetary fluctuations or administrative changes in the police jurisdictions, while still enabling NHTSA to achieve desired sample allocations. The CISS sample also allows for replacement cases in situations where the vehicles in the original sampled case are not available. This has resulted in a significant increase in the number of CISS cases available for analysis.

    Data modernization also enabled NHTSA to improve the data collection processes. In CRSS, the agency is using electronic transfer of crash data in some locations to list, sample, and code certain data automatically and more efficiently. In CISS, the crash technicians now use electronic distance measuring instruments to collect scene diagrams and vehicle crush measurements more efficiently and accurately. Also in CISS, the agency has improved the level of injury detail available for crash victims by adding 10 data elements that describe injury causation scenarios for seriously injured occupants.

    In addition, NHTSA is improving the accessibility of the data provided to the public for analysis and use. CRSS and CISS data are now available in multiple file formats that allow the public to easily access and use the data. The agency is also providing more useful data in CISS, with scalable scene diagrams and automated Event Data Recorder (EDR) data. NHTSA has also invested in an analytic platform that has enabled it to deploy analytic tools to the general public. These tools, some already public and others about to be released, enable end users to query NHTSA crash data and generate results in the form of trend tables, charts, or Geographic Information System (GIS) maps. These products fill long-standing gaps by providing user-friendly analytic tools that unlock the complexity of the crash data and offer more interactivity in constructing queries. The improvements to sample design, data collection processes, and published data provide two new and improved crash data resources to NHTSA and its customers.