
RESULTS

In 2007, there were 243 candidates. A clearly discriminating station is shown in Figure 1. ANOVAR revealed eighteen stations in which there was failure to discriminate statistically between one or more grades. Eleven stations had one failure of discrimination; six stations had two non-significant pairwise differences among the three groups (see Figure 2); three stations had no fail grades allocated by the examiner, although the algorithm would have failed some candidates. In 2008, fifty single OSCE stations were completed by 255 students; of these, sixteen were repeated stations from 2007. Twenty of the fifty stations showed some statistically evident failure of discrimination: three were ‘no fail’ stations according to examiner grading (Figure 2); thirteen had a single non-discriminatory overlap; two had two statistically indistinguishable comparisons (Figure 3); and two had no detectable differences among all three groups (see Figure 4).

Determination of the ability of a station to discriminate

Pooling the two years’ results, stations with no significant differences between the Pass, Borderline and Fail groups were compared with stations that had shown clear distinctions between the groups. Problem stations had poorer correlations (Kendall’s tau b): median difference = -0.16598; CI = -0.20175 to -0.12899; p < 0.0001. The number of problem stations increased with higher overall marks (p = 0.0138). See Figures 2 and 3.

Logistic regression analysis

Likelihood of a problem station:

$$\operatorname{logit}(Y) = 1.04 + n_{\mathrm{Pass}}^{2} - \sqrt{n_{\mathrm{Borderline}}} - 1.16\sqrt{n_{\mathrm{Fail}}}$$

where Y is the probability that a station is a problem station and n_Pass, n_Borderline and n_Fail are the numbers of candidates given each grade. Sensitivity 69%; specificity 83%; area under the ROC curve 83%. This suggests an immediate method of identifying, and so eliminating, a problematic station.

Comparison of repeat stations

All performance variables were compared between each station and its repeat in the following year. Total scores are compared in Table 3: scores increased in seven stations, decreased in four and did not change in four; the overall median was 15 [14-17] for 2007 vs. 16 [12-17] for 2008 (p = 0.21; Wilcoxon). Further examination of the results for each matching station showed no significant differences in pass, borderline or fail scores. One station that had been non-discriminatory in 2007 repeated this performance in 2008. Six stations that had shown lack of discrimination in 2007 did not show this problem in 2008. In contrast, four apparently adequate stations became defective.

Figure 1 OSCE station (D9_07). Kruskal-Wallis ANOVAR (Groups = 3, df = 2, total observations = 243). Adjusted for ties: T = 157.216415, p < 0.0001. All pairwise comparisons (Dwass-Steel-Critchlow-Fligner):
Pass vs. Fail: q = -13.627489 (|q| > 3.314493), p < 0.0001
Pass vs. Borderline: q = -13.31476 (|q| > 3.314493), p < 0.0001
Borderline vs. Fail: q = -9.667414 (|q| > 3.314493), p < 0.0001

Figure 2 Thick line indicates the pass mark for Station B9_08. Difference between pass and borderline medians: p < 0.0001.

Figure 4 Kruskal-Wallis ANOVAR (Groups = 3, df = 2, total observations = 241). Adjusted for ties: T = 0.313181, p = 0.8551. All pairwise comparisons (Dwass-Steel-Critchlow-Fligner):
Pass vs. Borderline: q = 0.570995 (|q| < 3.314493), p = 0.9141
Pass vs. Fail: q = -0.300373 (|q| < 3.314493), p = 0.9754
Borderline vs. Fail: q = -0.827908 (|q| < 3.314493), p = 0.8279

CONCLUSIONS

Non-discriminatory OSCE stations are a major problem and need to be detected before marks are issued. Cut-off scores can disguise the absence of clear differences in marks and may spuriously improve the pass rate of poor candidates. As high-scoring stations are associated with problems, stations where all candidates are expected to pass by showing an absolute level of competence may not be suitable for this assessment method.

Problems with the ‘Borderline Method’ and OSCEs
Philip R Belcher, Faculty of Medicine Quality Assurance Officer

BACKGROUND

This medical school has run fifty OSCE stations in its final qualifying examination for the last two years.
The borderline method for determining the pass mark for OSCEs uses a three-point ordinal scale (pass, borderline or fail), assigned by the examiners on the spot, and so relies on the discriminatory ability of the questions posed by each OSCE station. The computer-generated pass mark for the station lies just above the mean score of the borderline group (Figure 1); therefore, a student graded as a failure whose score nevertheless exceeds this cut-off passes the station. Further difficulties arise when the examiners have graded no one as a failure, or when the fail and borderline groups are only just statistically distinguishable and overlap considerably: poor candidates may then be rescued and better ones sacrificed. Because OSCE examiners vary, it is not immediately apparent whether the borderline method adds certainty or uncertainty when determining a cut-off score. Some stations generated no failures or an excess of borderline results. It is crucial that the spread of marks allows discrimination and that candidates in whom the examiners lack confidence are not spuriously advantaged. Where stations were repeated in a second year, their performance in the two years was compared. We therefore set out to determine which features of a station might cause difficulty and whether these could be detected during the examination.

METHODS

In 2007, all candidates passed through 46 five-minute stations; two ten-minute stations were excluded. In 2008, all OSCE stations were single. All mark sheets had standardised instructions and a maximum score of 20; computer-generated scores for each station and candidate were recorded. Each candidate’s performance at each OSCE station was also graded pass, borderline or fail by the OSCE examiners. The pass mark for each station was set at:

pass mark = mean borderline score + 1 SEM

An example is shown in Figure 1.
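A minimal Python sketch of the cut-off rule above, assuming that “SEM” is the sample standard deviation of the borderline group’s scores divided by the square root of the group size, and that a candidate passes the station when the score is at or above the cut-off. The function names and example marks are hypothetical. The second function counts the two failure modes named in the Background: examiner fails rescued by the cut-off and examiner passes sacrificed by it.

```python
from math import sqrt
from statistics import mean, stdev


def borderline_cut_score(scores, grades):
    """Pass mark = mean borderline score + 1 SEM (SEM assumed to be SD / sqrt(n))."""
    borderline = [s for s, g in zip(scores, grades) if g == "borderline"]
    if len(borderline) < 2:
        raise ValueError("need at least two borderline candidates to estimate the SEM")
    return mean(borderline) + stdev(borderline) / sqrt(len(borderline))


def rescued_and_sacrificed(scores, grades, cut):
    """Count examiner fails that pass on score and examiner passes that fail on score."""
    rescued = sum(1 for s, g in zip(scores, grades) if g == "fail" and s >= cut)
    sacrificed = sum(1 for s, g in zip(scores, grades) if g == "pass" and s < cut)
    return rescued, sacrificed


# Hypothetical marks out of 20 for one station, with the examiner's grade.
scores = [18, 16, 13, 12, 13, 14, 14, 9]
grades = ["pass", "pass", "pass", "borderline", "borderline", "borderline", "fail", "fail"]
cut = borderline_cut_score(scores, grades)
print(round(cut, 2), rescued_and_sacrificed(scores, grades, cut))  # 13.58 (1, 1)
```

With these illustrative numbers the cut-off is about 13.58, so one examiner “fail” (score 14) passes the station and one examiner “pass” (score 13) fails it.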
Data Handling and Statistical Methods

As these were count data, summary statistics are presented as medians [interquartile range]. The pass, borderline and fail groups were examined with a point-triserial correlation using Kendall’s tau b, which does not assume a linear relation. Unpaired and paired comparisons were made with Mann-Whitney and Wilcoxon tests, respectively, and the three grade groups were compared by Kruskal-Wallis one-way ANOVAR, corrected for ties, with multiple comparisons by the Dwass-Steel-Critchlow-Fligner method, which generates the Studentized range statistic q (StatsDirect Ltd, Altrincham, WA14 4QA, UK). We attempted to relate the derived parameters to the problems perceived with the OSCE stations. The influence of the measured and derived variables on non-discriminatory stations was also assessed by logistic regression and ROC curve analysis.

Figure 3 Kruskal-Wallis ANOVAR (Groups = 3, df = 2, total observations = 253). Adjusted for ties: T = 80.793955, p < 0.0001. All pairwise comparisons (Dwass-Steel-Critchlow-Fligner):
Pass vs. Fail: q = -11.703169 (|q| > 3.314493), p < 0.0001
Pass vs. Borderline: q = -5.985335 (|q| > 3.314493), p < 0.0001
Borderline vs. Fail: q = -1.432188 (|q| < 3.314493), p = 0.5688
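The pairwise screening reported in the figure legends can be sketched in Python as below. This is an illustrative re-implementation, not the StatsDirect code. It assumes the textbook forms of the tie-adjusted Kruskal-Wallis statistic and of the Dwass-Steel-Critchlow-Fligner statistic (each pair of groups re-ranked on its own, scaled by sqrt(2) so it can be compared with the Studentized range). It also assumes that a station is flagged when any pair has |q| below the 3.314493 critical value quoted in the legends, or when a grade group such as Fail is empty. All names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Critical Studentized range value for 3 groups, as quoted in the figure legends.
Q_CRIT_3_GROUPS = 3.314493


def midranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t**3 - t over each set of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kruskal_wallis_t(groups):
    """Tie-adjusted Kruskal-Wallis statistic (reported as T in the legends)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term(pooled) / (n ** 3 - n))


def dscf_q(a, b):
    """Dwass-Steel-Critchlow-Fligner statistic for one pair of grade groups."""
    na, nb = len(a), len(b)
    n = na + nb
    pooled = list(a) + list(b)
    ranks = midranks(pooled)  # ranks within this pair only, not all three groups
    w = sum(ranks[na:])       # rank sum of group b
    expected = nb * (n + 1) / 2
    var = na * nb / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    return sqrt(2) * (w - expected) / sqrt(var)


def is_problem_station(groups_by_grade, q_crit=Q_CRIT_3_GROUPS):
    """True if a grade group is empty or any pair of groups is not separated."""
    if any(len(g) == 0 for g in groups_by_grade.values()):
        return True
    return any(
        abs(dscf_q(groups_by_grade[x], groups_by_grade[y])) < q_crit
        for x, y in combinations(groups_by_grade, 2)
    )


pass_g, bord_g, fail_g = [16, 17, 18, 19, 20], [11, 12, 13, 14, 15], [6, 7, 8, 9, 10]
print(round(kruskal_wallis_t([pass_g, bord_g, fail_g]), 3))  # 12.5
print(is_problem_station({"pass": pass_g, "borderline": bord_g, "fail": fail_g}))  # False
```

The sketch does not compute p values, and it fails with a division by zero if every candidate at a station has the same score.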

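The point-triserial Kendall’s tau b analysis in the statistical methods can be illustrated with a direct pair-counting implementation in Python. The coding of the grades as fail = 0, borderline = 1, pass = 2 is an assumption, as are the function names and data. Tau b is computed as (concordant - discordant) / sqrt(pairs untied on grade × pairs untied on score), which corrects for the heavy ties produced by a three-level grade.

```python
from itertools import combinations
from math import sqrt

# Assumed numeric coding of the examiner's ordinal grade.
GRADE_ORDER = {"fail": 0, "borderline": 1, "pass": 2}


def sign(v):
    return (v > 0) - (v < 0)


def kendall_tau_b(x, y):
    """Kendall's tau b by direct pair counting (O(n^2), adequate for one station)."""
    s = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = sign(x[i] - x[j]), sign(y[i] - y[j])
        s += dx * dy              # +1 concordant, -1 discordant, 0 if tied on either
        untied_x += int(dx != 0)
        untied_y += int(dy != 0)
    return s / sqrt(untied_x * untied_y)


def station_tau_b(scores, grades):
    """Correlation between the examiner's grade and the station score."""
    return kendall_tau_b([GRADE_ORDER[g] for g in grades], scores)


# Hypothetical station: scores rise strictly with the examiner's grade.
grades = ["fail", "fail", "borderline", "pass", "pass"]
scores = [8, 9, 12, 16, 18]
print(round(station_tau_b(scores, grades), 3))  # 0.894
```

A lower tau b means the examiners’ grades track the station scores less closely, which is the pattern the Results report for problem stations. For real data, `scipy.stats.kendalltau`, which computes tau-b by default, should give the same value.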