Project VIABLE: Behavioral Specificity and Wording Impact on DBR Accuracy
Teresa J. LeBel¹, Amy M. Briesch¹, Stephen P. Kilgus¹, T. Chris Riley-Tillman², Sandra M. Chafouleas¹, & Theodore J. Christ³
¹University of Connecticut, ²East Carolina University, ³University of Minnesota

Introduction
Increased attention has been directed toward the development and evaluation of formative measures of social behavior. The term Direct Behavior Rating (DBR) has been introduced in the literature to refer to a form of assessment designed to incorporate features of systematic direct observation (SDO) and behavior rating scales (Chafouleas, Christ, Riley-Tillman, Briesch, & Chanese, 2007; Chafouleas, Riley-Tillman, & Sugai, 2007). Behaviors are rated on some variation of a visual rating scale immediately following the specified observation period. Potentially, DBR may be more feasible than other behavioral assessment methods, as behaviors can typically be rated by the child's teacher in less than a minute. Considering the importance of implementing feasible systems of data collection for use within school-based problem-solving models (e.g., Response to Intervention), DBR holds considerable promise. However, further research is needed to establish guidelines for recommended use, particularly with regard to the psychometric foundation related to instrumentation and procedures.

The purpose of this study was to establish preliminary information regarding whether the wording (positive vs. negative) and specificity (global vs. specific) of target behaviors that are typically observed in classroom settings influence the data obtained through DBR. Specifically, the study examined the impact of different operational definitions for behaviors indicative of disruption, compliance, and engagement on the accuracy of ratings. It was hypothesized that directionality would not differentially impact ratings, and that ratings of more specifically defined behaviors would be more accurate.

Method
Participants included 145 undergraduate students enrolled in an introductory psychology course at a large university located in the Southeast. In order to permit many raters to observe an identical sample of behavior, video footage of 2nd-grade students was recorded during a period of simulated classroom instruction. This ensured that specific behaviors of interest were exhibited (e.g., academic engagement, raising hand, calling out). The video footage was edited into four 3-minute clips. Participants were asked to focus on the same female target student while viewing each clip, and to rate the child's behavior immediately after each clip using a DBR form that displayed variations of the following general behaviors: academic engagement, disruptive behavior, and compliance. The behaviors were displayed globally or specifically (e.g., academically engaged vs. raising hand) and positively or negatively (e.g., compliance vs. non-compliance; raising hand vs. calling out). See Figure 1 for more detail.

The outcome variable of interest was the degree of accuracy with which participants rated the target student's behavior using DBR. DBR scores were compared to SDO scores from the same clips. To establish the true SDO score, two advanced graduate students coded each video clip, one behavior at a time, using the real-time event coding software Multi Option Observation System for Experimental Studies (MOOSES; Tapp, 2004). Interrater reliability ranged from 84% to 100% for these true scores. Next, to create a difference score, the SDO estimate of behavior was subtracted from each participant's rating of that same behavior using DBR. These difference scores were then averaged within each rater to generate a mean difference score for each participant. The mean difference score served as the dependent variable for the purposes of analysis.
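The difference-score procedure described above (SDO estimate subtracted from each DBR rating, then averaged within each rater) can be illustrated with a minimal sketch. All names and numbers here are hypothetical and are not the study's data. The sketch uses signed differences exactly as the text describes; whether absolute values were taken at any stage is not stated in the poster.

```python
from statistics import mean

def mean_difference_scores(dbr_ratings, sdo_scores):
    """Return {rater: mean of (DBR rating - SDO score)} across rated clips.

    dbr_ratings: {rater: {clip: rating}}  (hypothetical layout)
    sdo_scores:  {clip: true score from direct observation}
    """
    result = {}
    for rater, ratings in dbr_ratings.items():
        # Difference score per clip: DBR rating minus the SDO estimate.
        diffs = [ratings[clip] - sdo_scores[clip] for clip in ratings]
        # Average the difference scores within the rater.
        result[rater] = mean(diffs)
    return result

# Made-up values for illustration only.
sdo_scores = {"clip1": 80.0, "clip2": 60.0, "clip3": 90.0, "clip4": 70.0}
dbr_ratings = {
    "rater_a": {"clip1": 85.0, "clip2": 50.0, "clip3": 90.0, "clip4": 75.0},
    "rater_b": {"clip1": 70.0, "clip2": 70.0, "clip3": 70.0, "clip4": 70.0},
}

scores = mean_difference_scores(dbr_ratings, sdo_scores)
print(scores)
```

In this made-up example the first rater's over- and under-estimates cancel, so a signed mean near zero does not by itself imply accurate clip-by-clip ratings.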
Summary and Conclusions
Results were generally unexpected in that (a) wording directionality did impact ratings of some behaviors and (b) globally defined behaviors produced more accurate estimates of behavior. Findings suggest the following about DBR instrumentation: Whether the behavior should be phrased in a positive or negative manner may depend on the particular category of behavior. Accurate estimates of Academic Engagement may be best achieved when raters are presented with a global, positively worded definition. A global definition of Disruptive Behavior also appeared to increase the likelihood of rating accuracy, but wording the behavior positively or negatively had no significant effect. Although DBR data of Academic Engagement and Disruptive Behavior were somewhat consistent with true score ratings, ratings of Compliance were generally inaccurate. These findings may be welcomed in that global behaviors may be more usable across a variety of situations and settings. Overall, however, results call into question traditional guidelines for defining a behavior in that a single recommendation may not result in the most accurate estimate for all target behaviors when using DBR. These preliminary results suggest that each behavior and associated definition be carefully examined prior to use in DBR.

Results
Data Analysis. Although moderate correlations were observed between the outcome variables of academic engagement and disruptive behavior, the fundamental assumptions regarding multivariate normality were not met. Therefore, two separate 2 x 2 between-subjects ANOVAs were conducted in order to analyze the impact of wording and level of specificity on ratings of each targeted behavior.

Academic Engagement. Results indicate that for academic engagement, the effect of wording on rating accuracy was significantly dependent on the level of specificity used [Wording x Specificity interaction, F(1, 145) = 14.85, p < .001, η² = .09].
Non-significant main effects were identified for both wording and specificity. When presented with a global indicator of a positively worded definition of behavior indicative of academic engagement, participants were better able to approximate the true score. However, accuracy significantly diminished when participants were asked to rate specific indicators of the same positively worded behavior (i.e., raising hand). Conversely, raters demonstrated greater accuracy when rating negatively worded specific indicators (i.e., calling out) than negatively worded global indicators (i.e., academic un-engagement). However, although the mean difference between SDO scores and mean DBR scores for calling out appeared to be quite small (see Table 2), there was much variability in ratings (i.e., SD = 19.99), which may have been masked by use of the measure of central tendency.

Disruptive Behavior. Results indicate that for disruptive behavior, there was a significant main effect of specificity, F(1, 127) = 9.56, p = .002, η² = .07; however, non-significant results were found for the main effect of wording as well as the interaction between wording and level of specificity. There was significantly less variance between participant scores and true scores when rating global indicators of disruptive behavior (i.e., disruptive behavior; well-behaved). However, accuracy of ratings was not differentially impacted by the positive or negative wording of the behavior.

Descriptives. When comparing the overall mean DBR score across raters to the true score, no mean DBR scores were exactly the same as the pre-established direct observation true scores, with a substantive range in differences across conditions and behaviors (see Table 1). In particular, ratings of the compliance behavior showed generally larger differences for all conditions than either academic engagement or disruptive behavior.
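The 2 x 2 between-subjects ANOVA (wording x specificity) reported in the results can be sketched as follows. This is a simplified illustration, not the analysis the authors ran: it assumes equal cell sizes, uses made-up scores, and computes η² as the effect sum of squares over the total sum of squares. The poster does not state the software, the sums-of-squares type, or the cell sizes, and the function name is hypothetical.

```python
from statistics import mean

def balanced_two_way_anova(cells):
    """F ratios and eta squared for a balanced two-factor between-subjects design.

    cells maps (wording, specificity) -> list of scores.
    Assumption: every cell holds the same number of scores.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(scores) != n for scores in cells.values()):
        raise ValueError("this sketch handles equal cell sizes only")

    grand = mean(x for scores in cells.values() for x in scores)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {key: mean(scores) for key, scores in cells.items()}

    # Sums of squares for the two main effects and the interaction.
    ss = {
        "wording": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "specificity": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "wording x specificity": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels for b in b_levels
        ),
    }
    ss_error = sum((x - cell_mean[key]) ** 2
                   for key, scores in cells.items() for x in scores)
    ss_total = sum(ss.values()) + ss_error

    df = {
        "wording": len(a_levels) - 1,
        "specificity": len(b_levels) - 1,
        "wording x specificity": (len(a_levels) - 1) * (len(b_levels) - 1),
    }
    df_error = len(cells) * (n - 1)
    ms_error = ss_error / df_error

    return {
        effect: {
            "F": (ss[effect] / df[effect]) / ms_error,
            "df": (df[effect], df_error),
            "eta_sq": ss[effect] / ss_total,
        }
        for effect in ss
    }

# Made-up scores showing a crossover interaction with no main effects.
cells = {
    ("positive", "global"): [2.0, 4.0],
    ("positive", "specific"): [8.0, 10.0],
    ("negative", "global"): [8.0, 10.0],
    ("negative", "specific"): [2.0, 4.0],
}
results = balanced_two_way_anova(cells)
for effect, stats in results.items():
    print(effect, stats)
```

The toy data mirror the pattern described for academic engagement: the effect of wording reverses across levels of specificity, so the interaction carries the variance while both main effects are zero. For unbalanced data such as a real sample, a statistics package would be the more appropriate tool.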
As expected given that general classroom behavior tends not to be highly variable, initial data screening revealed distributions of raw scores that exhibited significant levels of both kurtosis and skewness. Therefore, in order to meet the assumption of normality necessary to conduct analyses, a logarithmic transformation was applied to each of the outcome variables (Tabachnick & Fidell, 2001). The compliance behavior was dropped from further statistical analysis because multiple transformations did not result in these data meeting the assumptions of normality. Means and standard deviations for the mean difference scores are presented in Table 2. Higher values indicate a larger discrepancy between participant ratings and true scores, whereas smaller values indicate participant ratings that are more consistent with true score estimates.

Table 1. Overall Mean DBR Rating and True Score for Each Behavior
Figure 1. Target Behaviors by Condition
Table 2. Mean Difference Scores and Standard Deviations by Condition and Behavior

Preparation of this poster was supported by a grant from the Institute of Education Sciences (IES), U.S. Department of Education (R324B060014). For additional information, please direct all correspondence to Sandra Chafouleas at email@example.com
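The data-screening step (checking skewness and kurtosis, then applying a logarithmic transformation) can be illustrated with a short sketch. The poster does not report the log base, any added constant, or how non-positive values were handled, so the base-10 log with a constant of 1 used here is an assumption, as are the function names and the made-up, non-negative scores.

```python
import math
from statistics import mean

def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = mean(xs)
    m2 = mean((x - m) ** 2 for x in xs)
    m3 = mean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = mean(xs)
    m2 = mean((x - m) ** 2 for x in xs)
    m4 = mean((x - m) ** 4 for x in xs)
    return m4 / m2 ** 2 - 3

def log_transform(xs, constant=1.0):
    """log10(x + constant); the constant keeps zero scores defined (assumption)."""
    return [math.log10(x + constant) for x in xs]

# Made-up, right-skewed, non-negative scores for illustration only.
raw = [0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 5.0, 9.0, 20.0, 45.0]
transformed = log_transform(raw)

print("raw:         skew", skewness(raw), "kurtosis", excess_kurtosis(raw))
print("transformed: skew", skewness(transformed), "kurtosis", excess_kurtosis(transformed))
```

A transformation of this kind pulls in a long right tail, which is why it is a common first remedy for positively skewed scores; it does not guarantee normality, consistent with the report that the compliance data still failed the assumption after multiple transformations.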