The final step in the process of determining changes in the behaviors of drivers after the implementation of the TACT program involved rating the “goodness/badness” of behaviors observed in the videos. As discussed earlier, analyses of the rates of violations showed that a reduction in the number of violations occurred at intervention sites. It was also of interest to determine if the nature of the violations observed after the TACT enforcement and messages (residual violations) had changed at the intervention sites. The characteristics of the residual violations were important to shed light on the reductions obtained. For example, the lowered numbers of violations could have been a result of the elimination of the least egregious behaviors or a general reduction of all violations regardless of their seriousness. Also, it is possible that people in the TACT areas responded in the desired direction but did not change sufficiently to avoid a violation altogether. These questions were addressed by having raters review violation sequences on video to assess their characteristics.
Violation Rating Data Collection. The same video data used to assess the rates of violations were used to determine if any changes occurred in the nature of violations before and after the TACT campaign. Ninety-nine video segments containing a driver violation were randomly selected out of the pool of all possible violations in which a semi truck and another vehicle interacted.6 Violations from which the random sample of 99 was drawn included those where the driver cut off a semi truck, followed too closely, failed to signal, improperly changed lanes, drove negligently7 or drove recklessly.
The interest was in changes in the nature of violations at the intervention sites only because the DOL survey showed little or no penetration of the TACT intervention into the comparison sites, and there was no change in their violation rates. Thus, any observed change in the nature of their violations could not have been associated with the TACT activities.
The 99 video segments comprised 50 segments randomly selected from the post period at the intervention sites and 49 segments taken from the pre period, each containing a violation in which a semi truck and another vehicle interacted. Because the intervention sites had too few non-speeding violations in the pre period to supply all 49 segments, the pre-period sample drew on violations from both the intervention and comparison sites. Combining the two initially seemed reasonable, since neither set of sites could have been influenced by the TACT program before it began. This assumption proved invalid when analyses showed that pre-TACT ratings of violations at the intervention sites were quite different from ratings of violations at the comparison sites (see results below for a discussion of this point).
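The sample composition described above can be sketched in a few lines. This is only an illustration: the pool sizes, the segment identifiers, the function name `draw_rating_sample`, and the seed are all hypothetical, and the report does not describe the tool actually used to draw the sample.

```python
import random

def draw_rating_sample(post_intervention_ids, pre_pool_ids,
                       n_post=50, n_pre=49, seed=0):
    """Draw segments without replacement and return them in shuffled order."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    post = rng.sample(post_intervention_ids, n_post)
    pre = rng.sample(pre_pool_ids, n_pre)
    segments = [(seg_id, "post") for seg_id in post]
    segments += [(seg_id, "pre") for seg_id in pre]
    # Mix pre and post segments so presentation order does not reveal the period.
    rng.shuffle(segments)
    return segments

# Hypothetical pools of violation segments (sizes invented for illustration).
post_pool = [f"post-int-{i:03d}" for i in range(120)]
pre_pool = ([f"pre-int-{i:03d}" for i in range(30)]
            + [f"pre-cmp-{i:03d}" for i in range(60)])

sample = draw_rating_sample(post_pool, pre_pool)
```

Drawing the pre-period segments from a single combined pool mirrors the original design, in which intervention and comparison site violations were pooled before the rating data showed they were not comparable.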
In each video segment, a single interacting vehicle was designated by an arrow superimposed on the video presentation. Raters were instructed to rate each segment with respect to both the behaviors of the driver of the vehicle designated by the arrow and those of the semi truck driver. The video segments were rated on the crash risk, intent, legality, intimidation, and aggressiveness of the driver of the interacting vehicle using four 5-point scales (see Appendix D for the rating instructions and rating form). Raters also answered a summary question characterizing the designated vehicle driver’s behavior as being not a problem, a lapse, an error, or an intentional violation, and indicated whether or not a police officer should stop the driver of the designated vehicle. The truck driver’s behaviors were rated only on aggressiveness, the same summary question, and whether or not an officer should stop the semi truck.
Three groups of raters provided the data—six WSP officers, five semi truck drivers, and six members of the WTSC staff. Raters were given a three-ring binder that contained a DVD, instructions, and 99 rating forms—one for each violation. The order of the violation scenes was randomized between pre and post periods, and the timing of the violation was unknown to the raters. They independently rated all 99 segments.
Video Rating Results. The rating data were analyzed with respect to reliability and differences between the pre and post intervention periods. Reliability is discussed first because it is a prerequisite to reaching valid conclusions about any changes in violation characteristics across periods.
Reliability analyses demonstrated that all raters, and groups of raters, were using the scales similarly. Reliability analyses also showed that items were highly inter-correlated for the designated vehicle and semi truck ratings, respectively. Items regarding the designated vehicle’s behavior were initially created with the intent to measure different dimensions of behavior. However, factor analysis8 revealed that all of the items for the driver of the designated vehicle were actually measuring one dimension. This dimension appears to be related to the overall “goodness” or “desirability” of the behavior. As expected, a separate dimension was found for the three items relating to behaviors of the semi truck drivers.
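The report does not name the statistic used to judge how strongly the items were inter-correlated. One common index of internal consistency is Cronbach's alpha, sketched below as an assumption rather than as the authors' method. The item scores are invented for illustration, and `cronbach_alpha` is a hypothetical helper name.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per rating item."""
    k = len(items)
    n = len(items[0])
    # Total score for each rated case across all items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical 5-point ratings of six segments on three items.
crash_risk = [4, 3, 5, 2, 4, 1]
intent = [4, 3, 4, 2, 5, 1]
legality = [5, 3, 4, 2, 4, 2]

alpha = cronbach_alpha([crash_risk, intent, legality])
```

Values of alpha close to 1 indicate that the items rise and fall together, which is the pattern expected when a set of items measures a single underlying dimension.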
Rating data were screened further to test the assumption that the pre-period video segments from the intervention and comparison sites were rated comparably and could therefore be combined in the analyses of pre versus post effects. Results indicated that the ratings of the pre period video segments for the intervention and comparison sites were significantly different on all of the items. For example, the mean rating of crash risk during the pre period for the intervention sites was significantly higher than the mean rating of crash risk for the comparison sites (t = 8.771, p < 0.001). Crash risk was calculated from the opinions of all of the scorers of a particular truck and passenger vehicle interaction, on a scale from 1 (low risk of a crash resulting from the maneuver) to 5 (high risk of a crash). Mean ratings followed a similar pattern for all of the items, with violations at the intervention sites being rated “worse,” suggesting that there were systematic differences between the intervention and comparison site violations observed on the videos for the pre period. Due to these findings, all of the remaining analyses included only ratings of the intervention site video segments for both the pre and post periods. The net effect of the elimination of the video segments from the comparison sites in the pre period was a reduction in the sample size of violations entering the analysis. This in turn meant that a larger pre-to-post difference was needed for any effect to be deemed statistically significant.
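The site comparison above can be illustrated with a two-sample t statistic. The report does not state which form of the t-test was used or give its degrees of freedom, so the pooled-variance (Student) form below is an assumption. The segment-level mean ratings are invented for illustration, and `pooled_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Return (t, df) for a pooled-variance two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance of the two samples.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Hypothetical mean crash-risk ratings per pre-period segment (1 = low, 5 = high).
intervention_pre = [4.1, 3.8, 4.4, 3.9, 4.2]
comparison_pre = [2.9, 3.1, 2.6, 3.0, 2.8]

t_stat, df = pooled_t(intervention_pre, comparison_pre)
```

A large positive t in this arrangement corresponds to the reported pattern, in which intervention site violations were rated as carrying higher crash risk than comparison site violations.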
After the screening process, the next step was to determine if any differences in ratings from the pre to post periods for the intervention site video segments occurred. Any differences in ratings would indicate a change in the nature of the violations that were occurring. As discussed earlier, if a difference were found, the nature of the residual violations would provide further information on the effects of the TACT program.
A Repeated Measures ANOVA was conducted for each survey item separately to determine if pre/post or group effects occurred in the ratings.9 Ratings of the designated vehicle drivers’ behaviors indicated significant improvements between the pre and post periods as follows:
In addition, the summary rating question indicated that behaviors were “better” and less likely to be a deliberate violation or major error in the post period (F[1, 14] = 8.970, p = 0.01). The question relating to whether the police should stop the driver also showed a positive effect indicating that the raters thought it was significantly less necessary for an officer to stop the designated driver in the post period (F[1, 14] = 24.570, p < 0.001).
No significant pre/post effect was found for ratings of aggression of the driver of the designated vehicle or any of the three ratings of the semi truck drivers’ behaviors. No effects were expected for the ratings of the semi truck drivers’ behaviors since the video sequences were selected to demonstrate violations by vehicles interacting with the semi trucks.
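The error degrees of freedom in the reported tests (F[1, 14]) are consistent with 17 raters nested in 3 rater groups (17 - 3 = 14). With only two periods, the pre/post effect in such a design can be computed from each rater's post-minus-pre difference in mean rating. The sketch below uses one common formulation, based on unweighted group means. The report does not specify the software or sums-of-squares method used, so this choice is an assumption, the difference scores are invented, and `prepost_f` is a hypothetical helper name.

```python
from statistics import mean, variance

def prepost_f(diffs_by_group):
    """diffs_by_group maps group name -> per-rater (post - pre) differences.

    Returns (F, df_effect, df_error) for the pre/post main effect.
    """
    groups = list(diffs_by_group.values())
    g = len(groups)
    n_total = sum(len(d) for d in groups)
    df_error = n_total - g
    # Pooled within-group variance of the difference scores.
    ms_error = sum((len(d) - 1) * variance(d) for d in groups) / df_error
    # Unweighted mean of the group mean differences.
    estimate = mean([mean(d) for d in groups])
    var_estimate = ms_error * sum(1 / len(d) for d in groups) / g ** 2
    return estimate ** 2 / var_estimate, 1, df_error

# Hypothetical per-rater differences in mean rating (post minus pre).
diffs = {
    "WSP officers": [-0.4, -0.6, -0.3, -0.5, -0.2, -0.7],
    "truck drivers": [-0.5, -0.1, -0.6, -0.4, -0.3],
    "WTSC staff": [-0.2, -0.5, -0.4, -0.6, -0.3, -0.1],
}

f_value, df_effect, df_error = prepost_f(diffs)
```

Because each rater contributes a single difference score, variability among raters in how harshly they use the scale drops out of the pre/post comparison.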
Some significant between-subjects effects were found as a function of rater group. Truckers rated all video segments as significantly more intimidating and aggressive than did the WSP troopers and WTSC staff. Also, the WSP troopers identified significantly more drivers of the designated vehicles as needing to be stopped by a police officer than did the WTSC staff. Neither finding is surprising, since truckers are more likely to be sensitive to driving behaviors that are intimidating and aggressive around semi trucks, and the patrol officers are more likely to be sensitive to which vehicles a police officer should stop.
Summary of Video Rating Results. Overall, the video rating task was successful in identifying differences in behaviors that were likely due to the TACT media and enforcement campaigns. Results indicated that violations were “not as bad” in the post period as they were in the pre period, suggesting another way in which the TACT intervention was successful. The combination of fewer violations and less severe residual violations indicates that the public was receiving and acting on the messages the TACT program was publicizing. If violation rates are reduced and the residual violations are less severe, as indicated by the results, an additional safety benefit should be realized.
8 Factor analysis is a statistical technique that examines the extent to which a group of questions or scales actually consists of “clusters” or “factors” measuring the same or similar things, rather than a set of discrete items as scored by the raters.
9 Repeated Measures ANOVA was used because each rater rated all of the video segments. Repeated Measures ANOVA considers any differences between the mean ratings of the pre and post period video segments for each rater and takes into consideration that these ratings came from the same individual. Essentially, each rater acts as his/her own comparison, and effects can therefore be attributed to the time period when the video segments were recorded without being confounded by variability among the raters. Between groups and interaction effects are also obtained; however, these effects are not of particular interest in the present study.