## Interval-By-Interval Interobserver Agreement

Agreement analysis based on frequency counts and event records is common to all event-based IOA algorithms. These algorithms are (a) total count, (b) partial agreement-within-intervals, (c) exact agreement, and (d) trial-by-trial IOA. After a brief overview of each event-based algorithm, Table 1 summarizes the strengths of each of the four for behavior-analytic reliability considerations. Suppose a research team collects frequency data for a target response during 15 1-min observations (see Figure 1).

The idea that practicing behavior analysts should collect and report reliability or interobserver agreement (IOA) in behavioral assessments is evident in the Behavior Analyst Certification Board's (BACB) assertion that behavior analysts are competent in applying “various methods of evaluating the results of measurement procedures such as inter-observer agreement, accuracy, and reliability” (BACB, 2005). In addition, Vollmer, Sloman, and St. Peter Pipkin (2008) argue that excluding such data significantly limits any interpretation of the effectiveness of a behavior change process. The inclusion of reliability data should therefore be a prerequisite for claims of validity in any study involving behavioral assessment (Friman, 2009). Given these considerations, it is not surprising that a recent review of articles published in the Journal of Applied Behavior Analysis (JABA) from 1995 to 2005 (Mudford, Taylor, & Martin, 2009) found that 100% of the articles reporting continuous dependent variables included IOA calculations. These data, along with previously published reports on reliability practices in JABA (Kelly, 1977), suggest that the inclusion of IOA is in fact a hallmark, if not a standard, of behavioral assessment.

Scored-interval IOA. One approach to making the estimate of agreement between two observers' interval records more accurate is simply to restrict the agreement analysis to intervals in which at least one of the observers recorded a target response.
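The restriction described above can be sketched in a few lines of Python. This is a minimal illustration, not the report's spreadsheet method: the function name `scored_interval_ioa` and the two seven-interval records are hypothetical, and intervals in which neither observer scored a response are dropped before agreement is counted.

```python
def scored_interval_ioa(obs_a, obs_b):
    """Percent agreement over intervals scored by at least one observer.

    obs_a, obs_b: equal-length sequences of booleans, True meaning the
    observer recorded the target response in that interval.
    Returns None when no interval was scored by either observer.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must record the same number of intervals")

    # Keep only intervals in which at least one observer scored a response.
    scored = [(a, b) for a, b in zip(obs_a, obs_b) if a or b]
    if not scored:
        return None

    # Within the retained intervals, an agreement means both scored a response.
    agreements = sum(1 for a, b in scored if a and b)
    return 100.0 * agreements / len(scored)


# Hypothetical records (assumption, for illustration only): the observers
# disagree in intervals 1 and 7, both score intervals 5 and 6, and neither
# scores intervals 2 through 4.
observer_a = [True, False, False, False, True, True, False]
observer_b = [False, False, False, False, True, True, True]

print(scored_interval_ioa(observer_a, observer_b))  # 50.0
```

With these hypothetical records, four intervals are retained and the observers agree on two of them, so the sketch returns 50%.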

Intervals in which neither observer recorded a target response are excluded from the calculation to yield a more stringent agreement statistic. Cooper et al. (2007) suggest that scored-interval IOA (also referred to as “occurrence agreement” in the research literature) is most advantageous when target responses occur at low rates. In the sample data in Figure 2, the second, third, and fourth intervals are ignored for computational purposes because neither observer scored a response in them. The IOA statistic is therefore calculated from the first, fifth, sixth, and seventh intervals only. Because the observers agreed on only half of these intervals (the fifth and sixth), the agreement score is 50% (2/4).

Mean duration-per-occurrence IOA. As the number of timings increases, it becomes important to limit the aggregation of data so that possible discrepancies between the two observers' continuous records can be detected. The mean duration-per-occurrence IOA algorithm achieves this by determining an IOA score for each timing, summing these scores, and dividing the sum by the total number of timings in which both observers collected data. Note that this approach is similar to the partial agreement-within-intervals approach described above.
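A short Python sketch may make the averaging step concrete. It assumes that the IOA score for a single timing is the shorter recorded duration divided by the longer one, expressed as a percentage; the text above does not spell out that per-timing formula, so treat it as an assumption. The function name and the durations are hypothetical.

```python
def mean_duration_per_occurrence_ioa(durations_a, durations_b):
    """Average the per-timing agreement scores of two observers.

    durations_a, durations_b: equal-length sequences of durations (for
    example, in seconds), one entry per timing recorded by both observers.
    """
    if len(durations_a) != len(durations_b):
        raise ValueError("both observers must record the same number of timings")
    if not durations_a:
        raise ValueError("at least one timing is required")

    scores = []
    for a, b in zip(durations_a, durations_b):
        shorter, longer = min(a, b), max(a, b)
        # Assumption: two zero-length durations count as full agreement.
        score = 100.0 if longer == 0 else 100.0 * shorter / longer
        scores.append(score)

    # Sum the per-timing scores and divide by the number of timings.
    return sum(scores) / len(scores)


# Hypothetical durations in seconds for four timings (illustration only).
observer_a = [10, 20, 30, 40]
observer_b = [10, 10, 30, 20]

print(mean_duration_per_occurrence_ioa(observer_a, observer_b))  # 75.0
```

Here the per-timing scores are 100%, 50%, 100%, and 50%, so the mean is 75%. Summing all durations first would give 70/100 = 70%, which shows how aggregating can mask disagreement on individual timings.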

In the example in Figure 3, the agreement scores for timings 1 through 4 were 99.7%, 2.3%, 69.2%, and 92.7%, respectively. The average of these four agreement scores yields a mean duration-per-occurrence IOA of 66%, a much more conservative estimate than the total duration IOA statistic.

Unscored-interval IOA. The unscored-interval IOA algorithm (also called “nonoccurrence agreement” in the research literature) is likewise more stringent than simple interval-by-interval approaches, because it considers only those intervals in which at least one observer recorded the absence of the target response. The rationale for unscored-interval IOA is similar to that for scored-interval IOA, except that this measure is best suited to high response rates (Cooper et al., 2007). In the sample data in Figure 2, the fifth and sixth intervals are ignored for computational purposes because both observers scored a response in them. The IOA statistic is thus calculated from the remaining five intervals. Because the observers agreed on only three of these five intervals (the second, third, and fourth), the agreement score is 60% (3/5).

This technical report provides detailed information about the reasons for using a common computer spreadsheet (Microsoft Excel®) to calculate various forms of interobserver agreement for continuous and discontinuous records. In addition, we offer a short tutorial on how to use an Excel spreadsheet to automatically calculate traditional total count, partial agreement-within-intervals, exact agreement, trial-by-trial, interval-by-interval, scored-interval, unscored-interval, total duration, and mean duration-per-interval interobserver agreement….
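As a rough illustration of the unscored-interval calculation described above, the following Python sketch drops intervals in which both observers scored a response and counts agreement over what remains. It is an independent sketch rather than the spreadsheet procedure the report describes; the function name and the seven-interval records are hypothetical.

```python
def unscored_interval_ioa(obs_a, obs_b):
    """Percent agreement over intervals left unscored by at least one observer.

    obs_a, obs_b: equal-length sequences of booleans, True meaning the
    observer recorded the target response in that interval.
    Returns None when both observers scored every interval.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must record the same number of intervals")

    # Keep only intervals in which at least one observer recorded no response.
    unscored = [(a, b) for a, b in zip(obs_a, obs_b) if not (a and b)]
    if not unscored:
        return None

    # Within the retained intervals, an agreement means neither scored a response.
    agreements = sum(1 for a, b in unscored if not a and not b)
    return 100.0 * agreements / len(unscored)


# Hypothetical records (assumption, for illustration only): both observers
# score intervals 5 and 6, neither scores intervals 2 through 4, and they
# disagree in intervals 1 and 7.
observer_a = [True, False, False, False, True, True, False]
observer_b = [False, False, False, False, True, True, True]

print(unscored_interval_ioa(observer_a, observer_b))  # 60.0
```

With these hypothetical records, five intervals are retained and the observers agree on three of them, giving 60%.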