Statistical Model Selection – Poisson Regression
The use of Poisson regression to analyze changes in violations as a function of site and time was based on both structural and distributional considerations. The violation data can be viewed as an array in which the counts are cross-tabulated by site and wave.
Since the first wave represents the pre-intervention period, the wave dimension of the array can be collapsed into a binary pre vs. post factor. A variety of log-linear models are available for analyzing these types of data, but when the cell entries are simple counts and the probability of a violation at any given time and place is low, the probability distribution of the data can usually be closely approximated by the Poisson. Using SPSS and the Poisson regression (log link) procedure described in Agresti (2002), the magnitude and statistical significance of the effects of period (pre vs. post), site (intervention vs. comparison) and their interaction (site x period) were computed. The resultant parameter estimates take the form of regression coefficients and, once exponentiated, rate ratios for the main effects of site and period and for the site x period interaction.
The approach also uses an “offset” to adjust for the effects of exposure. In the present analysis, hours of exposure entered the model as the offset (on the log scale, given the log link), so that the model describes violations per hour of observation rather than raw counts. Without this adjustment, differences in the cell counts could simply reflect differences in the number of observational hours.
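For orientation, a model of the kind described above can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the original analysis: \mu_{sp} is the expected violation count and t_{sp} the hours of observation for site s and period p, S is an indicator for the intervention site, and P is an indicator for the post period.

```latex
\[
\log \mu_{sp} \;=\; \log t_{sp} \;+\; \beta_0 \;+\; \beta_1 S \;+\; \beta_2 P \;+\; \beta_3 (S \times P)
\]
\[
\text{equivalently}\qquad
\log\!\left(\frac{\mu_{sp}}{t_{sp}}\right) \;=\; \beta_0 + \beta_1 S + \beta_2 P + \beta_3 (S \times P)
\]
```

The offset term \log t_{sp} has its coefficient fixed at 1, which is why the remaining terms describe the log of a rate per hour, and \exp(\beta_3) is the rate ratio for the site x period interaction.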
The effect of central interest in the analysis is the site x period interaction. This interaction tests whether the difference between the intervention and comparison sites in the post-period departs from what would be expected given their difference in the pre-period.
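The interaction can be illustrated with a small sketch. With two sites and two periods, the model containing site, period and their interaction is saturated, so the interaction coefficient equals the log of the ratio of the two post/pre rate ratios, and its Wald standard error is the square root of the summed reciprocal counts. The counts and hours below are hypothetical values chosen for illustration, not the study's data, and the function name is likewise an assumption.

```python
import math

# Hypothetical violation counts and observation hours (illustration only).
# Keys are (site, period).
violations = {
    ("intervention", "pre"): 120,
    ("intervention", "post"): 60,
    ("comparison", "pre"): 100,
    ("comparison", "post"): 90,
}
hours = {
    ("intervention", "pre"): 40.0,
    ("intervention", "post"): 50.0,
    ("comparison", "pre"): 40.0,
    ("comparison", "post"): 45.0,
}


def interaction_estimate(violations, hours):
    """Closed-form site x period interaction for a saturated 2x2 Poisson model
    with log(hours) as the offset."""
    # Dividing by hours plays the role of the offset: counts become rates.
    rate = {cell: violations[cell] / hours[cell] for cell in violations}

    # Post/pre rate ratio within each site.
    rr_intervention = rate[("intervention", "post")] / rate[("intervention", "pre")]
    rr_comparison = rate[("comparison", "post")] / rate[("comparison", "pre")]

    # Interaction: how much the intervention site's change differs from
    # the comparison site's change, as a ratio of rate ratios.
    ratio_of_ratios = rr_intervention / rr_comparison
    beta = math.log(ratio_of_ratios)

    # Wald standard error of the log ratio of rate ratios.
    se = math.sqrt(sum(1.0 / n for n in violations.values()))
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    return {
        "rate_ratio_intervention": rr_intervention,
        "rate_ratio_comparison": rr_comparison,
        "interaction_rate_ratio": ratio_of_ratios,
        "beta": beta,
        "se": se,
        "z": z,
        "p_value": p_value,
    }


result = interaction_estimate(violations, hours)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

An interaction rate ratio below 1 in this sketch means the violation rate fell more (or rose less) at the intervention site than at the comparison site. With more than two sites or waves the closed form no longer applies and the model would be fitted iteratively by statistical software.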
Statistical Model Selection – Repeated Measures Analysis
These analyses were designed to evaluate qualitative differences in mean scores on several behavioral dimensions measured prior to and following the interventions. The factors were period (pre vs. post) and the background of the rater (rater group). The analyses involved three sources of variance: the main effect of period, the main effect of rater group and the rater group x period interaction. Since the ratings in the pre and post periods involved the same raters, the analysis was formulated as a mixed model involving a within-subjects repeated measures factor (period, measured on the same individual raters) and a between-groups factor (rater background). By comparing pre vs. post scores within raters, the design controls for differences in rater scoring tendencies and rater bias. At the same time, the inclusion of a rater group factor permitted an evaluation of rater background and the rater group x period interaction.
The model is considered mixed since the error terms for the variance components are different. The period and period x group interaction components are within-subjects effects, whereas group is a between-subjects effect. The within-subjects effects are measured with much greater precision than the between-subjects effect, as evidenced by the smaller mean square error of the within-subjects error term.
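The separate error terms can be made concrete with a minimal sketch of the sums-of-squares partition for a design with one between-groups factor and a two-level repeated factor. The ratings, group labels and function name below are hypothetical and for illustration only. The sketch assumes equal group sizes and complete pre/post pairs, and it stops at F ratios rather than p-values.

```python
from statistics import mean

# Hypothetical ratings (illustration only): one (pre, post) pair per rater.
scores = {
    "group_a": [(3, 5), (4, 5), (2, 4), (3, 4)],
    "group_b": [(2, 3), (4, 4), (2, 3), (4, 4)],
}


def split_plot_anova(scores):
    """Partition variance for a group (between) x period (within) design."""
    groups = list(scores)
    n_per_group = len(scores[groups[0]])
    # This sketch only handles equal group sizes.
    assert all(len(scores[g]) == n_per_group for g in groups)
    n_groups, n_periods = len(groups), 2
    n_raters = n_groups * n_per_group

    all_values = [y for g in groups for pair in scores[g] for y in pair]
    grand = mean(all_values)
    ss_total = sum((y - grand) ** 2 for y in all_values)

    # Between-subjects part: variation among rater means.
    rater_means = [mean(pair) for g in groups for pair in scores[g]]
    ss_between_raters = n_periods * sum((m - grand) ** 2 for m in rater_means)
    group_means = [mean(y for pair in scores[g] for y in pair) for g in groups]
    ss_group = n_periods * n_per_group * sum((m - grand) ** 2 for m in group_means)
    ss_between_error = ss_between_raters - ss_group  # raters within groups

    # Within-subjects part: period, group x period, and their error term.
    period_means = [
        mean(pair[p] for g in groups for pair in scores[g]) for p in range(n_periods)
    ]
    ss_period = n_raters * sum((m - grand) ** 2 for m in period_means)
    ss_cells = n_per_group * sum(
        (mean(pair[p] for pair in scores[g]) - grand) ** 2
        for g in groups
        for p in range(n_periods)
    )
    ss_interaction = ss_cells - ss_group - ss_period
    ss_within_error = ss_total - ss_between_raters - ss_period - ss_interaction

    df_group = n_groups - 1
    df_between_error = n_raters - n_groups
    df_period = n_periods - 1
    df_interaction = df_group * df_period
    df_within_error = df_between_error * df_period

    ms_between_error = ss_between_error / df_between_error
    ms_within_error = ss_within_error / df_within_error

    return {
        "ss_total": ss_total,
        "ms_between_error": ms_between_error,
        "ms_within_error": ms_within_error,
        # Group is tested against the between-subjects error term.
        "F_group": (ss_group / df_group) / ms_between_error,
        # Period and interaction are tested against the within-subjects error term.
        "F_period": (ss_period / df_period) / ms_within_error,
        "F_interaction": (ss_interaction / df_interaction) / ms_within_error,
    }


anova = split_plot_anova(scores)
for name, value in anova.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example the within-subjects error mean square is smaller than the between-subjects one, which is the pattern the text describes. Unequal group sizes or missing ratings would call for a general linear model routine rather than this hand computation.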
Repeated measures and multivariate analysis of variance designs make certain assumptions (sphericity) regarding the structure of the variance-covariance matrices. There was no evidence that the sphericity assumptions were violated, as indicated by the Greenhouse-Geisser and Huynh-Feldt statistics.