2.5 Tests for spurious effectiveness before MY 1985 and after MY 1986
Although the preliminary estimates are almost uniformly positive, the somewhat complex model for calculating effectiveness might raise concerns about hidden biases. It is appropriate to test that this model does not find spurious differences in rear-impact crashes among groups of vehicles that ought not to differ. The test is especially appropriate given that the "vehicle age effect" has been critiqued as an unwarranted inflator of the CHMSL effectiveness estimates. As in NHTSA's 1989 evaluation of CHMSL (pp. 19-20) and Farmer's study, the test consists of applying the model to compare cars of two different model years with the same CHMSL status, e.g., MY 1984 vs. 1985 (neither CHMSL-equipped) or MY 1986 vs. 1987 (both CHMSL-equipped).
|1993||.0243||- .0075||- .0135||.0040||.0443||.0413||.0360||.0459|
In particular, we will compute the average of the difference in the vehicle-age-adjusted log odds of a rear impact for four pairs of model years: 1983 vs. 84, 1984 vs. 85, 1986 vs. 87 and 1987 vs. 88. (Of course, we omit the 1985 vs. 86 comparison because that was the year in which cars did change their CHMSL status.) Again, Florida data from CY 1992 will be used to illustrate the computational process. Let:
DEL8384 = ADJODDS(83) - ADJODDS(84) = -1.0560 - (-1.0153) = -.0407
DEL8485 = ADJODDS(84) - ADJODDS(85) = -1.0153 - (-0.9938) = -.0215
DEL8687 = ADJODDS(86) - ADJODDS(87) = -1.0648 - (-1.0834) = +.0186
DEL8788 = ADJODDS(87) - ADJODDS(88) = -1.0834 - (-1.0940) = +.0106
DELAVG = ¼ [DEL8384 + DEL8485 + DEL8687 + DEL8788] = -.0082
In other words, after correcting for the vehicle age effect, the log-odds of a rear impact increased by 4 percent from MY 83 to 84 and by 2 percent from 84 to 85; however it decreased by 2 percent from 86 to 87 and by 1 percent from 87 to 88. On the average, after controlling for the vehicle age effect, the log odds of a rear impact increased by 0.8 percent per year in the model years where CHMSL status did not change. That is a rather trivial change relative to the 6.5 percent reduction that the model attributes to CHMSL in the 1992 Florida data (CHMSLAVG), and moreover, it is in the opposite direction. In 1992 Florida data, the computational methods of this model, including the adjustment for the vehicle age effect, do not spuriously show a positive effect when CHMSL status did not change; therefore, it may be inferred that the positive effect associated with CHMSL is not inflated with biases introduced by the computational method.
The preceding formulas need to be modified in the CY 86 and 87 analyses, since data are not available for the later model years. In CY 86, DELAVG = ½ [DEL8384 + DEL8485]; and in CY 87, DELAVG = ⅓ [DEL8384 + DEL8485 + DEL8687].
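The DELAVG computation can be sketched in a few lines of Python. This is a minimal illustration, not the report's own program: the names `ADJODDS`, `PAIRS` and `delavg` are ours, and the inputs are the 1992 Florida values quoted above. Pairs with a missing model year are simply dropped, which gives the ½ and ⅓ averages used for CY 86 and CY 87.

```python
# Vehicle-age-adjusted log odds of a rear impact by model year,
# Florida CY 1992 (values quoted in the text above).
ADJODDS = {83: -1.0560, 84: -1.0153, 85: -0.9938,
           86: -1.0648, 87: -1.0834, 88: -1.0940}

# Adjacent model-year pairs whose CHMSL status did not change
# (the 85 vs. 86 pair is deliberately omitted).
PAIRS = [(83, 84), (84, 85), (86, 87), (87, 88)]


def delavg(adjodds, pairs=PAIRS):
    """Average of ADJODDS(older) - ADJODDS(newer) over the pairs with data."""
    diffs = [adjodds[a] - adjodds[b]
             for a, b in pairs
             if a in adjodds and b in adjodds]
    return sum(diffs) / len(diffs)


# All four pairs are available in CY 1992.
print(f"DELAVG = {delavg(ADJODDS):.4f}")
```

Passing a dictionary that lacks the later model years (as in a CY 86 file) averages only the pairs that remain.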
DELAVG, the measure of spurious effectiveness generated by our model, is computed separately in every State file, in every calendar year, as shown in Table 2-5. In 45 of 79 cases, DELAVG has absolute value less than 1 percent (i.e., it is between -.01 and +.01): essentially trivial, in practical terms, because the CHMSL effectiveness is usually around +5 percent. However, even those cases where DELAVG exceeds 1 percent do not necessarily suggest that the method for computing effectiveness is biased. Every statistic in this report, including DELAVG, is derived from tabulations of finite numbers of crash records and as such is subject to "noise." Individual values can be expected to deviate randomly from what they "ought" to be (zero, in the case of DELAVG). Each individual value of DELAVG need not be close to zero, but it is important that the average value of DELAVG converge on zero.
A simple, nonparametric test of whether DELAVG diverges from zero is to note that 44 of the 79 observed values are negative and 35 are positive. This result is well within the acceptance region for a binomial distribution with p = .5 and two-sided α = .05. Only if we had seen at least 49 negatives [or positives] would we have rejected the null hypothesis that DELAVG is equally likely to be positive or negative.
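The sign test above can be checked with an exact binomial calculation. The sketch below uses only the Python standard library; the function names are ours, and it assumes the conventional two-sided p-value obtained by doubling the upper tail.

```python
from math import comb


def upper_tail(n, k):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def two_sided_p(n, k):
    """Two-sided sign-test p-value when one sign occurs k times out of n."""
    k = max(k, n - k)  # use the larger of the two counts
    return min(1.0, 2 * upper_tail(n, k))


def critical_count(n, alpha=0.05):
    """Smallest count of one sign that rejects p = .5 at two-sided alpha."""
    return next(k for k in range((n + 1) // 2, n + 1)
                if 2 * upper_tail(n, k) <= alpha)


print(two_sided_p(79, 44))   # p-value for 44 negatives out of 79
print(critical_count(79))    # count needed to reject at alpha = .05
```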
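Table 2-5: DELAVG by State and calendar year
|CY||Florida||Indiana||Maryland||Missouri||Pennsylvania||Texas||Utah||Virginia||8-State Pop-Wtd Average||t-test|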
|1986||- .0083||.0037||.0070||- .0071||- .0122||.0053||- .0222||.||- .0031||- .92|
|1987||- .0123||.0029||- .0310||- .0318||- .0275||.0154||- .0215||- .0110||- .0096||-1.43|
|1990||.0066||- .0204||.0091||.0201||.0149||.0005||- .0141||.0055||.0048||1.23|
|1991||.0039||- .0147||- .0045||.0000||- .0036||.0007||- .0201||- .0056||- .0023||-1.06|
|1992||- .0082||- .0314||.0045||- .0068||- .0074||- .0085||- .0050||- .0112||- .0093||-3.23*|
|1993||.0307||.0034||- .0081||- .0029||- .0170||- .0030||.0115||.0014||.0021||.36|
|1994||- .0044||- .0263||- .0256||- .0243||.0046||- .0020||- .0096||- .0005||- .0069||-1.68|
|1995||.0016||- .0227||- .0385||- .0157||- .0242||- .0053||- .0264||- .0151||- .0139||-3.01*|
|Average||.0017||- .0096||- .0061||- .0049||- .0065||.0022||- .0136||- .0006||- .0023|
|t-test||.67||-2.05||- .99||- .94||-1.49||.93||-3.40*||- .16||-1.59|
*Statistically significant difference from zero (two-sided p < .05)
To compute a meaningful average of the 79 values of DELAVG, States that contribute more data ought to be given a higher weight. The 1990 population is a readily available statistic, and it is intuitively well correlated with on-the-road exposure, since our eight States do not differ greatly in per capita vehicle ownership, mileage, etc. These populations were:
The population-weighted average of the 79 observations of DELAVG is -.0023, as shown in the lower right-hand corner of Table 2-5. In practical terms, this average bias is negligible, given that the average effect of CHMSL is on the order of +.0500. A standard deviation and t-test for those 79 weighted numbers can be obtained by running the General Linear Model (GLM) procedure of the Statistical Analysis System (SAS) with DELAVG as the dependent variable, no independent variables, 1990 population as the weight factor, and exercising the "INT" option to obtain statistics on the intercept. Table 2-5 shows this t-value is -1.59, and it implies that, given the noise in the 79 individual values, their average is not significantly different from zero.
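The weighted average and its t-value can be illustrated in plain Python on a single row of Table 2-5. This is a sketch under stated assumptions, not the SAS run itself: it treats the GLM fit as an intercept-only weighted least squares, it assumes the eight table columns are the States in alphabetical order, and the population figures are approximate 1990 Census counts in millions supplied by us, not taken from this report.

```python
from math import sqrt

# DELAVG by State for CY 1992 (the 1992 row of Table 2-5).
# ASSUMPTION: table columns are the eight States in alphabetical order.
delavg_1992 = {"FL": -0.0082, "IN": -0.0314, "MD": 0.0045, "MO": -0.0068,
               "PA": -0.0074, "TX": -0.0085, "UT": -0.0050, "VA": -0.0112}

# ASSUMPTION: approximate 1990 Census populations in millions
# (not taken from this report).
pop_1990 = {"FL": 12.94, "IN": 5.54, "MD": 4.78, "MO": 5.12,
            "PA": 11.88, "TX": 16.99, "UT": 1.72, "VA": 6.19}


def weighted_intercept(values, weights):
    """Weighted mean, its standard error and t for an intercept-only fit."""
    keys = list(values)
    total_w = sum(weights[k] for k in keys)
    mean = sum(weights[k] * values[k] for k in keys) / total_w
    # Weighted residual mean square with n - 1 degrees of freedom.
    mse = sum(weights[k] * (values[k] - mean) ** 2 for k in keys) / (len(keys) - 1)
    se = sqrt(mse / total_w)
    return mean, se, mean / se


mean, se, t = weighted_intercept(delavg_1992, pop_1990)
print(f"mean = {mean:.4f}, se = {se:.4f}, t = {t:.2f}")
```

Under these assumptions the result should land close to the -.0093 average and -3.23 t-value reported for CY 1992 in Table 2-5; small differences would reflect rounding of the inputs.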
In addition to a nonsignificant overall average, DELAVG should ideally not be significant within any State or within any calendar year, and there ought not be significant differences between States or between calendar years. DELAVG comes close to achieving that ideal within and between States. The two last rows of Table 2-5 show the 10-year average values (in Virginia, 9-year average) in each State, and the t-test result. Only Utah has an average with absolute value greater than 1 percent, and only Utah's average is significantly different from zero (p < .05); however, one "significant" result in eight tests could easily happen by chance alone. Also, the GLM procedure can be used to run, essentially, analyses of variance on the population-weighted DELAVG values: the differences between the States are nonsignificant in a univariate analysis (F = 1.49; df = 7,71; p > .05). The State effect is likewise nonsignificant in a bivariate analysis including the CY variable (F = 1.98; df = 7,62; p > .05).
DELAVG is a bit less close to achieving that ideal within and between CY. The two right columns of Table 2-5 show the 8-State average values (in 1986, 7-State average) in each CY, and the t-test result. Four of the ten averages are significantly different from zero (p < .05), two positive and two negative. In the analyses of variance based on GLM, the CY effect achieves significance in a univariate analysis (F = 3.27; df = 9,69; p < .05) as well as in a bivariate analysis including the State variable (F = 3.59; df = 9,62; p < .05). Another indication of the model's slight misfit is that DELAVG is rather consistently negative in Indiana, Maryland, Missouri and Texas in the later years. Nevertheless, all of these biases are fairly trivial. Figure 2-7 graphs the 8-State average values of DELAVG (represented by the O's) by calendar year, on the same scale as the best estimates of CHMSL effectiveness that will be defined in the next section (plotted with a different symbol). DELAVG is always much closer to zero than the effect of CHMSL. The absolute value of the 8-State average of DELAVG is less than 1 percent in every CY except 1995, and in that year it is still only -1.4 percent. Also, unlike the results for some individual States, the 8-State average goes back and forth between positive and negative throughout 1986-95 in a sine-wave pattern of low amplitude.
2.6 Adjusting the estimates for retrofits and other factors
The preceding discussion indicates that the "vehicle age effect" has been properly accounted for and factored into the model. We now return to the computation of the effectiveness of CHMSL. The preliminary estimates in Table 2-4 are average reductions in the age-adjusted log odds of a rear impact for an MY 1986 or later car relative to a pre-1986 car. However, the customary definition of "effectiveness," as discussed, for example, in Section 2.2 ("The Basic Contingency Table"), is a reduction E₁ in the probability of a rear impact, not the reduction E₀ in the log-odds. The formula to derive E₁ from E₀ is:
E₁ = 1 - exp(-E₀)
For example, in the 1992 Florida data, E₀ = .0646, as shown in Table 2-4, and E₁ = .0626. Whenever E₀ is relatively small, E₀ and E₁ are practically identical, but as E₀ increases the shortfall of E₁ relative to E₀ gradually escalates. When E₀ = .1, which is about as high as it gets in Table 2-4, E₁ = .0952.
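The conversion from a log-odds reduction to a proportional reduction can be checked numerically. The sketch below assumes the relation E₁ = 1 - exp(-E₀), which reproduces both worked values quoted above; the function names are ours.

```python
from math import exp, log


def reduction_from_log_odds(e0):
    """Proportional reduction E1 implied by a log-odds reduction E0."""
    return 1.0 - exp(-e0)


def log_odds_from_reduction(e1):
    """Inverse conversion: log-odds reduction E0 implied by E1."""
    return -log(1.0 - e1)


for e0 in (0.0646, 0.1):
    e1 = reduction_from_log_odds(e0)
    print(f"E0 = {e0:.4f}  ->  E1 = {e1:.4f}  (shortfall {e0 - e1:.4f})")
```

The shortfall column shows how the gap between the two measures widens as E₀ grows.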
Section 2.1 pointed out that a vehicle's model year is occasionally miscoded in State data files. Approximately 1.1 percent of the cars coded MY 82-85 are in fact MY 86 and they are CHMSL equipped. Conversely, 1.1 percent of the cars coded MY 86-89 are in fact MY 85 and they are not CHMSL equipped. Thus, E₁ measures the reduction in rear impacts for a fleet that is 98.9 percent CHMSL-equipped relative to a fleet that is 1.1 percent CHMSL equipped. What we really want is E₂, the reduction for a 100 percent CHMSL fleet relative to a fleet with no CHMSL at all. Because 1 - E₁ = [.011 + .989 (1 - E₂)] / [.989 + .011 (1 - E₂)],
E₂ = E₁ / (.978 + .011 E₁)
In the 1992 Florida data, E₁ = .0626 and E₂ = .0639. This is a small upward adjustment that more or less compensates for the earlier one.
Section 2.1 also pointed out that the vehicle type is occasionally miscoded. Approximately 3.7 percent of the MY 82-89 vehicles coded as "passenger cars" are in fact light trucks or other vehicles, and did not have CHMSL even if MY 86. Thus, E₂ in fact only measures the reduction in rear impacts for a fleet that is 96.3 percent CHMSL-equipped relative to a fleet with no CHMSL. What we really want is E₃, the reduction for a 100 percent CHMSL fleet relative to a fleet with no CHMSL. Because 1 - E₂ = .037 + .963 (1 - E₃),
E₃ = E₂ / .963
In the 1992 Florida data, E₂ = .0639 and E₃ = .0664. This is a slightly larger upward adjustment.
Finally, Section 2.1 mentioned that approximately 10 percent of MY 82-85 cars were retrofitted with aftermarket CHMSL. Thus, E₃ in fact only measures the reduction in rear impacts for a fleet that is 100 percent CHMSL-equipped relative to a fleet that is 10 percent CHMSL-equipped. What we really want is E₄, the reduction for a 100 percent CHMSL fleet relative to a fleet with no CHMSL. Because 1 - E₃ = [1 - E₄] / [.9 + .1 (1 - E₄)],
E₄ = E₃ / (.9 + .1 E₃)
In the 1992 Florida data, E₃ = .0664 and E₄ = .0732. This is the largest of the upward adjustments, but even this one does not result in a dramatic change from the preliminary estimate.
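The three corrections can be chained in a short Python sketch. The closed forms below were obtained by solving each "Because ..." identity above for the unknown; the function and parameter names are ours, and the starting value assumes the log-odds conversion 1 - exp(-.0646) for the 1992 Florida data.

```python
from math import exp


def correct_my_miscoding(e1, miscoded=0.011):
    """Solve 1-E1 = [m + (1-m)(1-E2)] / [(1-m) + m(1-E2)] for E2."""
    return e1 / (1.0 - 2.0 * miscoded + miscoded * e1)


def correct_type_miscoding(e2, non_cars=0.037):
    """Solve 1-E2 = n + (1-n)(1-E3) for E3."""
    return e2 / (1.0 - non_cars)


def correct_retrofits(e3, retrofit=0.10):
    """Solve 1-E3 = (1-E4) / [(1-f) + f(1-E4)] for E4."""
    return e3 / (1.0 - retrofit + retrofit * e3)


# 1992 Florida example: start from the preliminary log-odds reduction.
e0 = 0.0646
e1 = 1.0 - exp(-e0)
e2 = correct_my_miscoding(e1)
e3 = correct_type_miscoding(e2)
e4 = correct_retrofits(e3)
print(f"E1 = {e1:.4f}, E2 = {e2:.4f}, E3 = {e3:.4f}, E4 = {e4:.4f}")
```

Keeping the miscoding and retrofit rates as parameters makes it easy to see how sensitive the final estimate is to each assumed rate.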
E₄, the "best" estimate of the reduction in rear impacts attributed by our model to CHMSL, is computed separately in every State file, in every calendar year, as shown in Table 2-6. Unlike the numbers in Table 2-5, these estimates are positive in 77 of 79 cases. The population-weighted average of the 79 estimates is 5.11 percent. Ten-year averages were calculated in each of the eight States, and they are remarkably consistent, ranging from 4.22 percent in Indiana to 6.78 percent in Utah; the averages for the other six States vary only from 4.84 (Pennsylvania) to 6.12 percent (Virginia).
A population-weighted 8-State average is computed for each calendar year and shown on the right side of Table 2-6. Effectiveness is positive in every year, but there is substantial variation, from 3.18 percent in CY 1993 to 8.53 percent in CY 1987. Figure 2-7 graphs the 8-State average effectiveness by calendar year. Several phenomena are immediately visible from the graph: (1) Effectiveness is always well above zero. (2) The effectiveness is clearly higher in CY 1987 and 1988 than in 1986 or 1989-95. (3) Effectiveness is more or less the same throughout 1989-95, with no clear pattern during those years. By the naked eye alone, it is hard to discern whether effectiveness was essentially constant during 1989-95 or if there was a slight decreasing trend within those years. According to the lower section of Table 2-6, the population-weighted average effectiveness for 1989-95 was 4.33 percent, and it was positive in each individual State, ranging from 3.30 (Indiana) to 6.65 percent (Utah).
2.7 Confidence bounds and statistical tests - based on State-to-State and CY-to-CY variation of the effectiveness estimates
Even a cursory perusal of Table 2-6 suggests that CHMSL have had a significant benefit in every year since their 1986 introduction. Computation of variances enables us to provide confidence bounds for the estimates and test the extent to which effectiveness has changed over time. The computation method in this section relies directly on the 79 individual effectiveness estimates by State and CY in Table 2-6, treating them as if they were simply repeated observations of a single variate. In fact, however, these estimates are statistics derived from fairly complex estimation formulas including an adjustment for vehicle age. Additionally, the vehicle age adjustment formula is derived just once in each State, combining data from all the CY for that State; thus, the CHMSL effectiveness estimates within each State are not fully independent from one CY to another, to the extent that they employ shared data in the computation of the age adjustment. The customary critical values of t, intended for use with simple, fully independent variates, may not be entirely appropriate for these statistics, and as a result the confidence bounds and levels of significance calculated in this section might understate the uncertainty in the results.
Table 2-6: CHMSL effectiveness by State and calendar year
|CY||Florida||Indiana||Maryland||Missouri||Pennsylvania||Texas||Utah||Virginia||8-State Pop-Wtd Average||t-test|
|1993||.0282||- .0089||- .0161||.0047||.0509||.0475||.0416||.0526||.0318||3.61*|
|10 yr. avg.||.0517||.0422||.0492||.0513||.0484||.0507||.0678||.0612||.0511|
*Statistically significant difference from zero (two-sided p < .05)
The average of the 79 effectiveness estimates, weighted by the States' 1990 populations, is .0511. A standard deviation and t-test for those 79 weighted numbers can be obtained by running the GLM procedure of SAS with no independent variables and 1990 population as the weight factor, as explained in Section 2.5. The standard deviation is .00282; the t-value is 18.08, as shown in Table 2-6. Effectiveness is significantly higher than zero (p < .05); in fact, p < .0001 - the t-value is very large compared to significance levels normally seen in NHTSA evaluations of crash data.
Moreover, as shown in Table 2-6:
The population-weighted 8-State average effectiveness is statistically significant in each individual calendar year, with t-values ranging from 3.61 (1993) to 21.64 (1987).
The average effectiveness for CY 1989-95, 4.33 percent, is statistically significant, with a t-value of 15.06.
The 10-year average effectiveness is statistically significant in each of the eight individual States, with t-values ranging from 4.52 (Utah) to 10.71 (Virginia).
The 1989-95 average is positive and statistically significant in each of the eight individual States, with t-values ranging from 2.78 (Maryland) to 14.21 (Virginia).
The GLM procedure can be used to run, essentially, analyses of variance on the population-weighted effectiveness estimates to check for significant differences between States or between calendar years. A glance at Table 2-6 shows few differences between States. Analyses of variance confirm that impression. The differences between the States are nonsignificant in a univariate analysis (F = 0.42; df = 7,71; p > .05). The State effect is likewise nonsignificant in a bivariate analysis including the CY variable (F = 0.63; df = 7,62; p > .05).
Table 2-6 and Figure 2-7 show strong differences between calendar years 1987, 1988 and 1989-95, but relatively little change from 1989 to 1995. Again, analyses of variance confirm the visible differences and clarify what is going on during 1989-95.
Let us look first at the effectiveness estimates from CY 1989-95 only: 56 effectiveness estimates from eight States. We have a choice of entering CY in the GLM procedure as a categorical or a linear independent variable. With CY as a categorical variable, the analysis merely asks if the average differences between calendar years are larger than expected given the "noise" in the 56 estimates. They are not: the categorical CY effect is nonsignificant in a univariate analysis (F = 1.26; df = 6,49; p > .05) as well as in a bivariate analysis including the State variable (F = 1.20; df = 6,42; p > .05).
The "eyeball" inspection of Figure 2-7 suggested that just maybe, but not necessarily, effectiveness was declining toward the end of the 1989-95 period. With CY as a linear variable, the analysis tests whether that trend is significant. In fact, the linear CY effect falls short of significance in a univariate analysis (F = 2.82; df = 1,54; p > .05) as well as in a bivariate analysis including the State variable (F = 2.89; df = 1,47; p > .05).
With these results, we may accept the null hypothesis that CHMSL effectiveness stayed about the same throughout CY 1989-95. The conclusion is more tentative than any other in this report, partly because acceptance of the null hypothesis is by definition more tentative than rejection, partly because the linear CY effects, although short of significance, were "not that far away from it" (an F value of 4 would have been significant with two-sided α = .05). It is concluded that the 1989-95 average effectiveness of 4 percent is the "long-term" value that will persist into the future, but it cannot be ruled out, based on the results of this chapter alone, that effectiveness could continue to decline quite slowly (the nonsignificant regression coefficient for CY was -.0024, i.e., a continuing decline of ¼ percent per calendar year). Additional evidence that CHMSL effectiveness is not declining comes from Chapter 4, which will show a 4.2 percent effectiveness for light truck CHMSL in CY 1994-96: actually higher than the car CHMSL effect in 1993, 1994 or 1995. The issue will be discussed in more detail in Chapter 4.
If we accept that CHMSL effectiveness was essentially unchanged throughout CY 1989-95, and that the long-term reduction of rear impacts is 4 percent, we may proceed to compare the 1989-95 effect with the results for earlier calendar years. The question, "Did CHMSL effectiveness change significantly over time?" is addressed by defining a variable CY_group with four categories: 1986, 1987, 1988 and 1989-95. When GLM analyses of variance are performed on the 79 effectiveness estimates in Table 2-6, the effect of CY_group is statistically significant in a univariate analysis (F = 12.18; df = 3,75; p < .05) as well as in a bivariate analysis including the State variable (F = 11.74; df = 3,68; p < .05). Effectiveness definitely changed over time.
In addition, we may compare the effectiveness in any two CY_groups, by doing the GLM univariate analysis of variance on the effectiveness estimates from those two groups:
In CY 1987, the average effectiveness of CHMSL was 8.53 percent and in 1989-95, 4.33 percent. Is that a significant decline? Definitely: in the analysis of the 64 data points with CY_group = 87 or 89-95, (F = 29.14; df = 1,62; p < .05).
In CY 1988, the average effectiveness of CHMSL was 7.16 percent and in 1989-95, 4.33 percent. This is also a significant decline: (F = 12.91; df = 1,62; p < .05).
The 8.53 percent effectiveness in CY 1987 is borderline-significantly higher than the 7.16 percent in CY 1988: (F = 4.43; df = 1,14; one-sided p < .05).
The 8.53 percent effectiveness in CY 1987 is significantly higher than the 5.06 percent in CY 1986: (F = 9.60; df = 1,13; p < .05).
The final step of the analysis is to develop confidence bounds (two-sided α = .05) for the four principal effectiveness estimates: for CY 1986, 1987, 1988 and 1989-95. The empirical approach is to obtain them directly from the 79 individual effectiveness estimates by State and CY in Table 2-6. The GLM procedure of SAS, run with no independent variables, allows calculation of the standard deviation of the population-weighted average effectiveness in any calendar year or group of calendar years. Although there are some departures from homoscedasticity and normality in the Table 2-6 estimates (different-size States, slightly nonlinear adjustment factors) it still appears safe to use the customary critical values of the t distribution, especially if we extend to a two-sided 95 percent confidence interval rather than the one-sided 95 percent interval generally used in NHTSA evaluations. For example, in CY 1987, there are estimates from all eight States. The population-weighted average effectiveness is .0853 and its standard deviation is .00394. Since there are 7 degrees of freedom, the 95 percent confidence interval includes 2.365 standard deviations on either side of the point estimate: a range from .0760 to .0946. The point and interval estimates of CHMSL effectiveness are:
The exceptionally tight confidence bounds for the effectiveness estimates (compared to other NHTSA evaluations, where the standard deviation has typically been 15 to 30 percent as large as the point estimate) are a consequence of the vast numbers of crash records used in the analyses and the remarkable State-to-State consistency of the results. However, as stated above, there are reasons for suspecting that these confidence bounds may understate the uncertainty in the estimates. The next section develops confidence bounds based on a simpler estimation procedure; although the bounds will be wider, there will not be the same questions about their validity. Ultimately, this report will rely on the confidence bounds in the next section.
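The CY 1987 interval worked out in this section can be reproduced with a few lines of Python. This is only an arithmetic check: the critical value 2.365 (7 degrees of freedom, two-sided α = .05) is taken from the text and not recomputed, and the function name is ours.

```python
def t_interval(estimate, std_dev, t_crit):
    """Two-sided confidence interval: estimate +/- t_crit * std_dev."""
    half_width = t_crit * std_dev
    return estimate - half_width, estimate + half_width


# CY 1987: eight State estimates, hence 7 degrees of freedom.
low, high = t_interval(estimate=0.0853, std_dev=0.00394, t_crit=2.365)
print(f"95% confidence interval: {low:.4f} to {high:.4f}")
```

The same function applies to the other calendar-year groups once the matching standard deviation and critical value (for the appropriate degrees of freedom) are supplied.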