National Survey of Speeding and Other
Unsafe Driving Actions: Methodology Report
The survey was conducted by telephone by the national survey research organization Schulman, Ronca & Bucuvalas, Inc. (SRBI). A national household sample was constructed using random digit dialing. Each household was screened to determine the number of adult (age 16 or older) drivers in the household. Interviewers systematically selected one eligible driver in each eligible household. Interviews were administered using computer-assisted telephone interviewing (CATI) to reduce interview length and minimize recording errors. A Spanish-language translation and bilingual interviewers were used to minimize language barriers to participation. The survey was conducted between February 20 and April 11, 1997. The telephone interviews averaged 30 minutes in length. A total of 6,000 interviews were completed, with a participation rate of 73.5 percent.
Since this was the first national survey of speeding and unsafe driving practices the number of issues to be covered was extensive. In order to accommodate the number of questions required without unduly burdening the public, two versions of the questionnaire were developed. One questionnaire (Version 1) focused primarily on speeding issues. The other questionnaire (Version 2) focused primarily on other forms of unsafe driving. Each version was fielded as an independent national sample, constructed in an identical fashion. Hence, for some questions we have national estimates based on sample sizes of 3,000, while estimates for core questions about speeding and unsafe driving behavior, as well as driver and driving characteristics shared by both versions, are based on sample sizes of 6,000.
In addition to these component sample sizes, in a few instances a specific question was asked of a cross sample: Version 1, Split A together with Version 2, Split B, for an unweighted sample size of 3,022. The complement, made up of Version 1, Split B together with Version 2, Split A, has an unweighted sample size of 2,978 (see Table 1).
Most of the statistical formulas associated with sampling theory are based upon the assumption of simple random sampling. Specifically, the statistical formulas for specifying the sampling precision (estimates of sampling variance), given particular sample sizes, are premised on simple random sampling. Unfortunately, simple random sampling requires that all of the elements in the population have an equal chance of being selected. Since no enumeration of the total population of the United States (or its subdivisions) is available, all surveys of the general public are based upon an approximation of the actual population, and survey samples are generated by a process closely resembling true random sampling.
The survey sample was based on a modified stratified random digit dialing method (RDD), using an area probability/RDD sample rather than a single-stage/RDD sample. There are several important advantages to using an area probability base: (1) it draws the sample proportionate to the geographic distribution of the target population rather than the geographic distribution of telephone households, which is vital to constructing unbiased population estimates from telephone surveys; (2) it allows greater geographic stratification of the sample to control for known geographic differences in non-response rates; and (3) it facilitates the use of Census estimates of population characteristics to weight the completed sample to correct for other forms of sampling bias. Moreover, the precision of sample estimates is generally improved by stratification.
Hence, as specified for the study design for the survey, the adult household population of the United States was stratified by the 10 NHTSA regions. The estimated distribution of the population by stratum was calculated on the basis of the Bureau of the Census, Resident Population of the United States, Regions and States by Selected Age Groups and Sex: April 1, 1990 Census and July 1, 1990 to July 1, 1995 Estimates (release date, August 1996). At the time of the survey, these were the most recent projections of the distribution of adult population by state. Based on these Census data on the geographic distribution of the target population, the total sample was proportionately allocated by stratum. The geographic allocation of the cross-sectional sample for the survey is presented in Table 2 (next page).
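The proportionate allocation described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run for the survey: the function name `allocate_proportionally`, the largest-remainder rounding rule, and the three-stratum populations are all assumptions made for the example (the survey used the 10 NHTSA regions and Census estimates).

```python
def allocate_proportionally(total_n, population_by_stratum):
    """Allocate total_n interviews across strata in proportion to population.

    Largest-remainder rounding (an assumption, not taken from the report)
    is used so that the allocations sum exactly to total_n.
    """
    total_pop = sum(population_by_stratum.values())
    exact = {s: total_n * pop / total_pop for s, pop in population_by_stratum.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    shortfall = total_n - sum(alloc.values())
    # Give the leftover interviews to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


# Hypothetical stratum populations, not Census figures.
example = allocate_proportionally(6000, {"A": 50, "B": 30, "C": 20})
print(example)
```

Because allocation is proportionate, each stratum's share of the sample matches its share of the target population, up to rounding.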
Once the sample had been geographically stratified with sample allocation proportionate to population distribution, a sample of assigned telephone banks was randomly selected from an enumeration of the Working Residential Hundreds Blocks of the active telephone exchanges within the region. A Working Residential Hundreds Block was defined as a block of 100 potential telephone numbers within an exchange that included 3 or more residential listings. (Blocks with one or two listings were excluded because in most cases such listings represent errors in the published listings.)
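The block definition above can be illustrated with a short sketch. It assumes 10-digit North American numbers, so that a hundreds block is identified by the first 8 digits (area code, exchange, and the first two digits of the line number). The function names and the sample listings (fictional 555 numbers) are hypothetical and are not drawn from the survey's sampling frame.

```python
from collections import Counter


def hundreds_block(phone_number):
    """Return the 8-digit prefix identifying a block of 100 potential numbers."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit telephone number")
    return digits[:8]


def working_blocks(residential_listings, min_listings=3):
    """Keep only blocks containing at least min_listings residential listings."""
    counts = Counter(hundreds_block(n) for n in residential_listings)
    return sorted(block for block, count in counts.items() if count >= min_listings)


# Fictional listings: three fall in block 21255501, one in block 21255502.
listings = ["212-555-0101", "212-555-0102", "212-555-0199", "212-555-0201"]
print(working_blocks(listings))
```

Only the block with three listings is retained; the block with a single listing is dropped, mirroring the exclusion rule described in the text.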
Total does not add to 100% due to rounding.
Source: Population Projections for States, by Age, Sex, Race, and Hispanic Origin: 1993 to 2020 (Current Population Reports, P25-1111), U.S. Bureau of the Census
The total driving population (see Table 3) was estimated using data from the Bureau of the Census (U.S. Population Estimates by Age, Sex, Race and Hispanic Origin: 1990 to 1997) and the 1996 Motor Vehicle Occupant Safety Survey. The single-year-of-age population estimates for November 1997 were aggregated to the categories used in the current study and then multiplied by the proportion of each age cohort who reported driving in the 1996 study. It should be kept in mind that this is an estimate of the driving population rather than the result of an enumeration of the population.
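The arithmetic behind this estimate is a simple product by age group. The sketch below shows the shape of the calculation only; the function name, the age categories, and every number in it are placeholders chosen for illustration, not values from the Census estimates or the 1996 survey.

```python
def estimate_driving_population(population_by_age, share_driving_by_age):
    """Multiply each age group's population by the share of that group who drive."""
    missing = set(population_by_age) - set(share_driving_by_age)
    if missing:
        raise KeyError(f"no driving share for age groups: {sorted(missing)}")
    by_age = {
        age: population_by_age[age] * share_driving_by_age[age]
        for age in population_by_age
    }
    return by_age, sum(by_age.values())


# Placeholder inputs for illustration only.
population = {"16-20": 1000, "21-64": 8000, "65+": 2000}
share_driving = {"16-20": 0.5, "21-64": 0.75, "65+": 0.25}
by_age, total = estimate_driving_population(population, share_driving)
print(by_age, total)
```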
The use of residential listings to identify working residential exchanges is generally described as "list-assisted" or "truncated" RDD sampling. In a series of empirical studies, Brick et al. demonstrated that only about four percent of all telephone households are excluded in national samples using this method. In addition, these studies indicate that the differences between covered and uncovered samples are trivial in most instances, although no direct study of the relationship between driving and having a telephone at home has been made. The principal advantage of list-assisted sampling is that an equal probability systematic sample of telephone numbers can be selected under this procedure, and the variances of estimates from the list-assisted sample are usually lower than those from a clustered design like the Mitofsky-Waksberg RDD method.
In the third stage sample, a two-digit number was randomly generated by computer for each Working Residential Hundreds Block selected in the second stage sample. This third stage sampling process is the random digit dialing (RDD) component. Every telephone number within the Hundreds Block has an equal probability of selection, regardless of whether it is listed or unlisted.
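A minimal sketch of this third stage follows. It assumes a hundreds block is represented as an 8-digit string, to which a random two-digit suffix (00 through 99) is appended. The function name `rdd_number` and the fictional block value are illustrative assumptions, not SRBI's actual sample generator.

```python
import random


def rdd_number(block, rng=random):
    """Append a uniformly random two-digit suffix to an 8-digit hundreds block."""
    if len(block) != 8 or not block.isdigit():
        raise ValueError("block must be an 8-digit string")
    suffix = rng.randrange(100)  # each of the 100 suffixes is equally likely
    return f"{block}{suffix:02d}"


# Seeded generator so the example is repeatable; the block is fictional.
number = rdd_number("21255501", random.Random(0))
print(number)
```

Because the suffix is drawn uniformly without reference to any directory, listed and unlisted numbers within the block have the same chance of selection.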
The use of RDD sampling eliminates the otherwise serious problem of unlisted telephone numbers. Nationwide, approximately 20% of all phone subscribers have unlisted phones. Moreover, significant variation occurs among demographic groups, with the number of unlisted phones reaching a high of 26% in the West, 29% in large metropolitan areas, 25% among those earning $5,000-$10,000, and 32% among nonwhites.
The third stage RDD sample of telephone numbers was then dialed by SRBI interviewers to determine which were currently working residential household phone numbers. Non-working numbers and non-residential numbers were immediately replaced by other RDD numbers selected within the same stratum in the same fashion as the initial number. Ineligible households (e.g., no adult in the household, language barriers) were also immediately replaced. Non-answering numbers were not replaced until the research protocol (in this study, a five-call protocol) was exceeded. However, one or more open numbers per case (e.g., for every case yet to be completed, there may be one or more numbers in working categories such as no answer or callback) may have been permitted so that the replicate could be completed within a reasonable period.
SCREENING TO DETERMINE HOUSEHOLD ELIGIBILITY
The sample construction process yielded a population-based, random-digit dialing sample of telephone numbers. The systematic dialing of those numbers to obtain a residential contact yielded an unbiased sample of telephone households. The next step was to select eligible households within the total sample of working numbers.
An adult respondent at each number drawn into the sampling frame was asked about the composition of the household. Telephone numbers that yielded non-residential contacts, such as businesses, churches, and college dormitories, were screened out. Only households, i.e., residences occupied by any number of related individuals or by no more than five unrelated persons living together, were eligible for inclusion in the sample. This minimal screening served only to ascertain that the telephone numbers reached by interviewers were residential households.
SELECTION OF RESPONDENT WITHIN HOUSEHOLD
The multi-stage sampling process described in the previous sections yielded an unbiased national sample of households with telephones, drawn proportionate to the population distribution. The final stage required the selection of one respondent per household for the interview.
A systematic selection procedure was used to select one designated respondent for each household sampled. The "most recent/next birthday method" was used for within household selection among multiple eligibles. The Within Household Selection Procedure is presented in Figure 1. The CATI system alternated the "most recent" and "next" birthday specification for the selected respondent to avoid a temporal bias for birthdays before (or after) the field period.
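The alternating birthday rule can be sketched as below. This is an illustration under stated assumptions, not the CATI logic shown in Figure 1: the function names are hypothetical, the alternation is assumed to be by case order, birthdays are given as (month, day) pairs, and a February 29 birthday is treated as March 1 in non-leap years.

```python
from datetime import date


def _birthday_in(year, birthday):
    month, day = birthday
    try:
        return date(year, month, day)
    except ValueError:
        # Assumption: a Feb 29 birthday is observed on Mar 1 in non-leap years.
        return date(year, 3, 1)


def days_until_next(birthday, today):
    this_year = _birthday_in(today.year, birthday)
    target = this_year if this_year >= today else _birthday_in(today.year + 1, birthday)
    return (target - today).days


def days_since_last(birthday, today):
    this_year = _birthday_in(today.year, birthday)
    target = this_year if this_year <= today else _birthday_in(today.year - 1, birthday)
    return (today - target).days


def rule_for_case(case_index):
    """Alternate the two rules across successive cases (assumed scheme)."""
    return "most recent" if case_index % 2 == 0 else "next"


def select_respondent(eligibles, today, rule):
    """eligibles maps a person's name to a (month, day) birthday."""
    if rule == "next":
        distance = days_until_next
    elif rule == "most recent":
        distance = days_since_last
    else:
        raise ValueError("rule must be 'next' or 'most recent'")
    return min(eligibles, key=lambda name: distance(eligibles[name], today))


household = {"Ana": (3, 15), "Ben": (12, 1)}  # hypothetical household
interview_day = date(1997, 3, 1)
print(select_respondent(household, interview_day, "next"))
print(select_respondent(household, interview_day, "most recent"))
```

In this example the two rules pick different household members, which is why alternating them guards against favoring people whose birthdays fall just before or just after the field period.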
MONITORING OF TELEPHONE INTERVIEWERS
SRBI draws upon a staff of experienced telephone supervisors for its projects. All supervisors participated in the project training session. In addition, they underwent an additional review on interview editing instructions, refusal prevention and conversion, and study issues.
Two types of supervisors are utilized in SRBI telephone surveys: shift supervisors and monitors. A shift supervisor was on duty each of the 14 weekly shifts. They were responsible for quality control, maintaining production rates and supervising the monitors. In addition, SRBI assigned one monitor for every 10 interviewers.
Each interviewer was silently monitored by a line monitor at least twice each interviewing shift. The study monitor sat at a CRT where he or she could see what the interviewer had recorded, while audio-monitoring the interview. The audio-monitoring allowed the supervisor to determine the quality of the interviewer's performance in terms of:
The supervisor also monitored the interviewer's recording of survey responses on the CRT monitor. The supervisor's CRT emulated the interviewer's CRT. Consequently, the supervisor was able to see whether the interviewer entered the correct code, number or verbatim response to the question.
Initial telephone contact was attempted during the hours of the day and days of the week which have the greatest probability of respondent contact. The primary interviewing period was from 5:30 p.m. to 10:00 p.m. on weekdays, from 9:00 a.m. to 10:00 p.m. on Saturdays, and from 10:00 a.m. to 10:00 p.m. on Sundays (all times are local time). Since interviewing was conducted across time zones, the interviewing shift lasted until 1:00 a.m. Eastern Time (10:00 p.m. Pacific Time).
If the interview was not conducted at the time of initial contact, the interview was rescheduled at a time convenient to the respondent. Although initial contact attempts were made on evenings and weekends, daytime interviews were scheduled when necessary. If four telephone contacts on the night and weekend shifts did not elicit a respondent contact, the fifth contact was attempted on a weekday.
Interviewers attempted a minimum of five calls to each telephone number. When the household was reached, the interviewer asked to speak to an adult to screen the household for eligibility and to determine the designated respondent. When the designated respondent was reached but an interview at that time was inconvenient or inappropriate, interviewers set up appointments with respondents. When contact was made with the household, but not the designated respondent(s), interviewers probed for appropriate callback times and attempted to set up an appointment.
SPANISH LANGUAGE INTERVIEWERS
Spanish language versions of the two survey instruments were developed in order to eliminate language barriers for a small proportion of the U.S. adult population. If the interviewer encountered a language barrier at the telephone number, either with the person answering the phone or with the designated respondent, the interviewer thanked the person and terminated the call. If the case was designated as Spanish language, it was turned over to the next available Spanish-speaking interviewer. All households in which a language barrier (Spanish) was encountered were assigned to a Spanish-speaking interviewer. These bilingual interviewers re-contacted the Spanish-speaking households to screen for eligibility and conduct interviews with eligible respondents.
The process of converting terminations and refusals, once they had occurred, involved the following steps. First, there was a diagnostic period, when refusals and terminates were reported on a daily basis and the Project Director and Operations Manager reviewed them after each shift to see if anything unusual was occurring. Second, after enough time had passed to see a large enough sample of refusals and terminations, the Project Director and his staff developed a refusal conversion script. Third, the refusal conversion effort was fielded with re-interview attempts scheduled about a week after the initial refusal. Fourth, the Project Director and Operations Manager received the outcomes of the refusal conversion efforts on a daily basis. Minor revisions of the script and the procedures were made, as needed. The final refusal conversion script is shown in Figure 2 (next page).
The field interviewing for the study commenced on February 20, 1997, following training of the field interviewers, and was completed on April 11, 1997. The status of cases as of the end of the field period is reported using the categories defined below.
In total, 21,415 randomly selected telephone numbers were sampled within a geographically stratified national sampling frame, with the following results:
At the close of the field period, only 684 cases (3%) were in callback status.
The participation rate represents one of the most critical measures of potential sample bias because it indicates the degree of self-selection by potential respondents into or out of the survey. The participation rate is calculated as the number of completed interviews (a successful interview) plus respondents who screen out as ineligible (assumed to be a successful interview if an eligible person would have been found), divided by the number of contacts (cases in which the possibility of a successful interview existed: the sum of completed interviews, terminated interviews, screen outs and refusals to interview). The inclusion of screen outs in the numerator and denominator is mathematically equivalent to discounting the refusals by the estimated rate of non-eligibility among refusals; that is, it assumes that screen outs will be found in the same proportion among refusals as they were found among non-refusals. The participation rate is based on the following elements:
Based on the standard calculations of participation rate, the participation rate for this survey was 73.5 percent.
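The calculation described above can be written out as a short sketch. The function name and the disposition counts are hypothetical; the survey's actual counts are those reported in the disposition table and are not reproduced here.

```python
def participation_rate(completes, screen_outs, terminates, refusals):
    """(completes + screen outs) divided by all contacts, as described in the text."""
    contacts = completes + screen_outs + terminates + refusals
    if contacts == 0:
        raise ValueError("no contacts: participation rate is undefined")
    return (completes + screen_outs) / contacts


# Hypothetical disposition counts for illustration only.
rate = participation_rate(completes=60, screen_outs=15, terminates=5, refusals=20)
print(f"{rate:.1%}")
```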
The Final Summary Disposition sample is given in Table 4 (next page). The average interview length for the survey was 30 minutes.
The characteristics of a perfectly drawn sample of a population will vary from true population characteristics only within certain limits of sample variability (i.e., sampling error). Unfortunately, social surveys do not permit perfect samples. The sampling frames available to survey research are less than perfect. The absence of perfect cooperation from sampled units means that the completed sample will differ from the drawn sample. In order to correct these known problems of sample bias, the achieved sample is weighted to certain characteristics of the total population.
The weighting plan for the survey was a multi-stage sequential process of weighting the achieved sample to correct for sampling and non-sampling biases in the final sample. The first stage in the sample weighting procedures was designed to correct the cases in the completed sample for known selection biases in the sampling procedures. At the household selection stage, a random digit dialing process will give households with more than one telephone number an unequal likelihood of selection. Nationally, about 18 percent of households selected by random digit dialing will have more than one telephone number. This selection bias was corrected by giving each household a first stage weight equal to 0.5 if there was more than one different telephone number in the household.
The second step in the weighting process was to correct for selection procedures that yielded unequal probability of selection within sampled households. Although the survey was designed as a population survey, only one eligible person per household could be interviewed (because multiple interviews per household are burdensome and introduce additional design effects into the survey estimates). A respondent's probability of selection is inversely proportional to the number of eligible adults in the household. Hence, the second stage weight was equal to the number of eligible respondents within the household.
The final step in the weighting process was designed to correct for the fact that the total number of cases in the weighted sample was larger than the unweighted sample size because of the use of the number of eligibles weight. In order to avoid misinterpretation of sample size, the total number of cases in the unweighted sample was divided by the total number of cases in the weighted sample to yield a sample size weight. When this weight is applied, the size of the weighted sample is identical to the size of the unweighted sample.
The final weight (WEIGHT3) incorporates all of the intermediate weighting steps described above. The final weight adjusts the 6,000 completed interviews in the achieved sample to correct for known sampling and participation biases, while maintaining the unweighted sample size.
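The three steps spelled out above (the multiple-telephone weight, the number-of-eligibles weight, and the sample size weight) can be combined as in the following sketch. It covers only those steps; the field names `multiple_phone_numbers` and `n_eligible` and the three example cases are hypothetical, and this is not the code used to produce WEIGHT3.

```python
def final_weights(cases):
    """Return one weight per case, rescaled to sum to the unweighted sample size."""
    raw = []
    for case in cases:
        # Stage 1: households with more than one telephone number get 0.5.
        phone_weight = 0.5 if case["multiple_phone_numbers"] else 1.0
        # Stage 2: weight by the number of eligible respondents in the household.
        eligibles_weight = case["n_eligible"]
        raw.append(phone_weight * eligibles_weight)
    # Stage 3: sample size weight = unweighted n / weighted n.
    scale = len(cases) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical cases for illustration only.
cases = [
    {"multiple_phone_numbers": False, "n_eligible": 1},
    {"multiple_phone_numbers": True, "n_eligible": 2},
    {"multiple_phone_numbers": False, "n_eligible": 2},
]
weights = final_weights(cases)
print(weights, sum(weights))
```

After rescaling, the weights sum to the number of cases, so weighted tabulations report the same total sample size as unweighted ones.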
PRECISION OF SAMPLE ESTIMATES
The objective of the sampling procedures used on this study was to produce an unbiased sample of the target population. An unbiased sample shares the same properties and characteristics of the total population from which it is drawn, subject to a certain level of sampling error. This means that with a properly drawn sample we can make statements about the properties and characteristics of the total population within certain specified limits of certainty and sampling variability.
The confidence interval for sample estimates of population proportions, using simple random sampling without replacement, is calculated by the following formula:
var(x) = the expected sampling error of the mean of some variable, expressed as a proportion
p = some proportion of the sample displaying a certain characteristic or attribute
q = (1 - p)
z = the standardized normal variable, given a specified confidence level (1.96 for samples of this size)
n = the size of the sample
Using this formula, we can estimate that the maximum expected sampling error at the 95% confidence level (i.e., in 95 out of 100 repeated samples) for a total sample of 6,000 is ±1.3 percentage points. It should be noted that the maximum sampling error is based upon the conservative estimate that p = q = 0.5.
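The 1.3 percentage point figure can be checked with a short sketch. It assumes the standard large-sample form z * sqrt(p * q / n), built from the quantities defined above. The report's own formula may differ slightly (for example, by using n - 1 in the denominator), which would not change the rounded result at this sample size. The function name is hypothetical.

```python
import math


def sampling_error(n, p=0.5, z=1.96):
    """Expected sampling error of a proportion, assuming z * sqrt(p * q / n)."""
    if n <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("n must be positive and p must lie between 0 and 1")
    q = 1.0 - p
    return z * math.sqrt(p * q / n)


full_sample = round(100 * sampling_error(6000), 1)  # 1.3 percentage points
half_sample = round(100 * sampling_error(3000), 1)
print(full_sample, half_sample)
```

Halving the sample to 3,000, the size available for questions asked in only one questionnaire version, widens the maximum error only modestly, which illustrates the declining return to additional sample size.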
The sample sizes for the surveys are large enough to permit estimates for subsamples of particular interest. Table 5 (next page) presents the expected size of the sampling error for specified sample sizes of 6,000 and less, at different response distributions on a categorical variable. As the table shows, larger samples produce smaller expected sampling variances, but there is a constantly declining marginal utility of variance reduction per sample size increase.
NOTE: Entries are expressed as percentage points (+ or -).
Given the extremely small differences between the confidence intervals for this sample and those expected for a simple random sample, the general formula for estimating confidence intervals for a simple random sample will normally be a perfectly reasonable guide for estimating sampling error for this sample. However, in order to construct a specific confidence interval for estimates from this sample, the appropriate statistical formula for calculating the allowance for sampling error (at the 95% confidence level) in a stratified sample is:
ASE = allowance for sampling error at the 95% confidence level;
sample divided by the number in the universe;
s_h^2 = the variance in stratum h; for proportions this is equal to p_h (1.0 - p_h);
n_h = the sample size for stratum h.
Although Table 5 provides a useful approximation of the magnitude of expected sampling error, precise calculation of allowances for sampling error requires the use of this formula.
ESTIMATING STATISTICAL SIGNIFICANCE
The estimates of sampling precision presented in the previous section yield confidence bands around the sample estimates, within which the true population value should lie. This type of sampling estimate is appropriate when the goal of the research is to estimate a population distribution parameter. However, the purpose of some surveys is to provide a comparison of population parameters estimated from independent samples (e.g., annual tracking surveys) or between subsets of the same sample. In such instances, the question is not simply whether or not there is any difference in the sample statistics which estimate the population parameter, but rather is the difference between the sample estimates statistically significant (i.e., beyond the expected limits of sampling error for both sample estimates).
To test whether or not a difference between two sample proportions is statistically significant, a rather simple calculation can be made. Call the total sampling error (symbolized as var(x) in the formula given in the preceding section on precision of sample estimates) of the first sample s1 and the total sampling error of the second sample s2. Then, the sampling error of the difference between these estimates is sd, which is calculated as:
Any difference between observed proportions that exceeds sd is a statistically significant difference at the specified confidence level. Note that this technique is mathematically equivalent to generating standardized tests of the difference between proportions. An illustration of the pooled sampling error between subsamples for various sizes is presented in Table 6. This table can be used to indicate the size of difference in proportions between drivers and non-drivers or other subsamples that would be statistically significant.
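The test can be sketched as follows. It assumes the usual root-sum-of-squares form for independent samples, sd = sqrt(s1^2 + s2^2), with s1 and s2 each computed as z * sqrt(p * q / n). Both forms are assumptions, since the formulas themselves are not reproduced in this text. The function names and the example proportions are hypothetical.

```python
import math


def sampling_error(p, n, z=1.96):
    """Sampling error of one proportion, assuming z * sqrt(p * q / n)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def difference_is_significant(p1, n1, p2, n2, z=1.96):
    """Compare |p1 - p2| with sd = sqrt(s1^2 + s2^2) (assumed form)."""
    s1 = sampling_error(p1, n1, z)
    s2 = sampling_error(p2, n2, z)
    sd = math.sqrt(s1 ** 2 + s2 ** 2)
    return abs(p1 - p2) > sd, sd


# Hypothetical proportions from two independent samples of 3,000 each.
significant, sd = difference_is_significant(0.50, 3000, 0.55, 3000)
print(significant, round(100 * sd, 1))
```

With two samples of 3,000, a five-point difference exceeds sd and would be judged significant, while a one-point difference would not.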
Brick, J, Waksberg, J, Kulp, D and Starer, A. Bias in List-Assisted Telephone Samples, Public Opinion Quarterly, Summer 1995, Vol. 59, No. 2, pp.218-235.
Casady, R. and Lepkowski, J. Stratified Telephone Survey Designs, Survey Methodology, June 1993, Vol 19, No 1: 103-13.
Groves, R. An Empirical Comparison of Two Telephone Sample Designs. Journal of Marketing Research, 1978 15: 622-31.
Kish, L. A Procedure for Objective Respondent Selection Within the Household. Journal of the American Statistical Association, 1949 44: 380-387.
Keeter, S. Estimating Telephone Non-coverage Bias from a Phone Survey. Public Opinion Quarterly, Summer 1995, Vol. 59, No. 2, pp. 196-217.
Lavrakas, P. Telephone Survey Methods: Sampling, Selection and Supervision. Beverly Hills: Sage Publications, 1987.
Salmon, C. and Nichols, J. Respondent Selection Techniques for Telephone Surveys. Presented to the Midwest Association for Public Opinion Research, Chicago, IL, 1980.
Statistical Characteristics of Random Digit Telephone Sample. Survey Sampling, Inc. Westport, CT 1986.
Tarnai, J., Rosa, E. and Scott, L. An Empirical Comparison of the Kish and the Most Recent Birthday Method for Selecting a Random Household Respondent in Telephone Surveys. Presented at the Annual Meeting of the American Association for Public Opinion Research, Hershey, PA, 1987.
Troldahl, V. and Carter, R. Random Selection of Respondents Within Households in Phone Surveys. Journal of Marketing Research, 1964 1:71-76.
Waksberg, J. Sampling Methods for Random Digit Dialing. Journal of the American Statistical Association, 1978 73(361): 40-46.