RESULTS

In 2007, there were 243 candidates. A clearly discriminating station is shown in Figure 1. ANOVAR revealed eighteen stations in which there was failure to discriminate statistically between one or more grades. Eleven stations had one failure of discrimination; six stations had two statistically non-significant differences between the three groups (see Figure 2); three stations had no fail grades allocated by the examiner, although some candidates would have been failed by the algorithm.

In 2008, fifty single OSCE stations were completed by 255 students; of these, sixteen were repeated stations from 2007. Twenty of fifty stations had some statistically evident failure of discrimination: three were ‘no fail’ stations according to examiner grading (Figure 2); thirteen had single non-discriminatory overlaps; two stations had two statistically indistinguishable comparisons (Figure 3); two stations had no detectable differences between any of the three groups (see Figure 4).

Determination of ability of station to discriminate

Stations with no significant differences between the Pass, Borderline and Fail groups were compared, by pooling the two years’ results, with stations that had shown clear distinctions between the groups. Problem stations had poorer correlations (Tau b): median difference = ; CI = to ; p< . The number of problem stations increased with higher overall marks (p=0.0138). See Figures 2 and 3.

Logistic regression analysis

Likelihood of problem (Logit Y) = 1.04 + nPasses^2 – SQRTnBorderline – 1.16 SQRTnFail. Sensitivity 69%; specificity 83%; area under the curve 83%, which suggests an immediate method of eliminating a problematic station.

Comparison of repeat stations

All performance variables were compared for each station and its repeat between the two years. Total scores were compared in Table 3 and showed increased scores in seven, decreased scores in four and no change in four; the overall median for 2007 was 15 [14-17] vs. 16 [12-17] for 2008 (p = 0.21; Wilcoxon).
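The sensitivity, specificity and area under the curve quoted for the logistic model can be illustrated with a minimal sketch. It assumes each station already has a predicted probability of being a problem station and a known outcome; the function names, the 0.5 threshold and the example data are all hypothetical and are not taken from the poster.

```python
def sensitivity_specificity(predicted_prob, is_problem, threshold=0.5):
    """Sensitivity and specificity when stations with prob >= threshold are flagged."""
    tp = fn = tn = fp = 0
    for prob, truth in zip(predicted_prob, is_problem):
        flagged = prob >= threshold
        if truth and flagged:
            tp += 1          # problem station correctly flagged
        elif truth:
            fn += 1          # problem station missed
        elif flagged:
            fp += 1          # adequate station wrongly flagged
        else:
            tn += 1          # adequate station correctly left alone
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one problem and one adequate station")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(predicted_prob, is_problem):
    """Area under the ROC curve as the share of (problem, adequate) pairs
    in which the problem station has the higher probability (ties count 0.5)."""
    pos = [p for p, t in zip(predicted_prob, is_problem) if t]
    neg = [p for p, t in zip(predicted_prob, is_problem) if not t]
    if not pos or not neg:
        raise ValueError("need at least one problem and one adequate station")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model output for eight stations (1 = problem station).
probs = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
truth = [1, 1, 1, 0, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(probs, truth)
auc = roc_auc(probs, truth)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f}")
```

Changing the threshold trades sensitivity against specificity, whereas the area under the curve summarises performance across all thresholds.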
Further examination of the results for each matching station showed no significant differences in pass, borderline or fail scores. One station, which had been non-discriminatory in 2007, repeated its performance in 2008. Six, which had shown lack of discrimination in 2007, did not show this problem in 2008. In contrast, four apparently adequate stations became defective.

Figure 1 OSCE station (D9_07). Kruskal-Wallis ANOVAR (Groups = 3, df = 2, total observations = 243). Adjusted for ties: T = p < All pairwise comparisons (Dwass-Steel-Critchlow-Fligner): Pass vs. Fail q = > p < Pass vs. Borderline q = > p < Borderline vs. Fail q = > p <

Figure 2 Thick line indicates pass mark for Station B9_08. Difference between pass and borderline medians: p<

Figure 4 Kruskal-Wallis ANOVAR (Groups = 3, df = 2, total observations = 241). Adjusted for ties: T = p = All pairwise comparisons (Dwass-Steel-Critchlow-Fligner): Pass vs. Borderline q = > p = Pass vs. Fail q = > p = Borderline vs. Fail q = > p =

CONCLUSIONS

Non-discriminatory OSCE stations are a major problem and need to be detected before marks are issued. Cut-off scores can disguise the absence of clear differences in marks and may spuriously improve the pass rate of poor candidates. As high-scoring stations are associated with problems, stations where all candidates are expected to pass by showing an absolute level of competence may not be suitable for this assessment method.

Problems with the ‘Borderline Method’ and OSCEs
Philip R Belcher
Faculty of Medicine Quality Assurance Officer

BACKGROUND

This medical school has run fifty OSCE stations for the last two years’ final qualifying examination. The borderline method for determination of the pass mark for OSCEs produces a three-point ordinal scale (pass, borderline or fail), determined by the examiners on the spot, which relies on the discriminatory ability of the questions posed by the OSCE station.
The computer-generated pass mark for the station is just above the mean of the borderline result (Figure 1); therefore, a student graded as a failure by the examiner would still have passed the station if their score exceeded this cut-off. Further difficulties arise when the examiners have graded no one as a failure, or when the fail and borderline groups are only just statistically distinguishable with considerable overlap: thus poor candidates may be rescued and better ones sacrificed. It is therefore not immediately apparent whether the borderline method adds certainty or uncertainty when determining a cut-off score, as OSCE examiners vary. Some stations generated no failures or an excess of borderline results. It is crucial that the spread of marks allows discrimination and that candidates in whom the examiners lack confidence are not advantaged spuriously. Where the stations were repeated for a second year, their performance was contrasted. We therefore set out to determine the features of the station that might cause difficulty and whether we could pick this up during the examination.

METHODS

In 2007 all candidates passed through 46 five-minute stations; two were ten-minute stations and were excluded. In 2008, all OSCE stations were single. All mark sheets had standardised instructions and a maximum score of 20; computer-generated scores for each station and candidate were recorded. The data from each OSCE station were graded pass, borderline and fail (as determined by the OSCE examiners). The pass point for the station was determined at the limit: Mean Borderline score + 1 SEM. An example is shown in Figure 1.

Data Handling and Statistical Methods

As these were count data, summary statistics are presented as medians [interquartile range]. Pass-, borderline- and fail-group data were examined using point triserial regression using Kendall’s Tau b, which dealt with any non-linear relations.
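The pass-point rule stated in METHODS (mean borderline score + 1 SEM) can be sketched as follows. This is an illustration only: it assumes the SEM is the sample standard deviation divided by the square root of the number of borderline candidates, and the function names and marks are hypothetical rather than taken from the examination software.

```python
import math
import statistics


def borderline_pass_mark(borderline_scores):
    """Mean of the scores graded 'borderline' plus one standard error of that mean."""
    n = len(borderline_scores)
    if n < 2:
        raise ValueError("need at least two borderline scores to estimate the SEM")
    mean = statistics.mean(borderline_scores)
    # Assumption: SEM = sample standard deviation / sqrt(n).
    sem = statistics.stdev(borderline_scores) / math.sqrt(n)
    return mean + sem


def algorithm_outcome(score, pass_mark):
    """Outcome from the score alone; the examiner's on-the-spot grade is ignored."""
    return "pass" if score > pass_mark else "fail"


# Hypothetical station marked out of 20: scores of candidates graded 'borderline'.
borderline = [10, 11, 12, 13]
cut = borderline_pass_mark(borderline)

# A candidate whom the examiner graded 'fail' but who scored 13 still passes.
print(round(cut, 2), algorithm_outcome(13, cut))
```

The last line shows the difficulty described in BACKGROUND: the examiner's grade only influences the cut-off, so an individual candidate graded as a failure can still exceed it.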
Unpaired and paired comparisons were made using Mann-Whitney or Wilcoxon tests respectively, and Kruskal-Wallis one-way ANOVAR with multiple comparisons, corrected for ties, by the Dwass-Steel-Critchlow-Fligner method, which generates the Studentized range statistic q (StatsDirect Ltd, Altrincham, WA14 4QA, UK). Attempts were made to associate the derived parameters with problems that were perceived with the OSCE stations. The influences of the measured and derived variables upon non-discriminatory stations were also assessed by logistic regression and ROC curve analysis.

Figure 3 Kruskal-Wallis ANOVAR: Groups = 3, df = 2, total observations = 253. Adjusted for ties: T = , p < All pairwise comparisons (Dwass-Steel-Critchlow-Fligner): Pass vs. Fail q = > p < Pass vs. Borderline q = > p < Borderline vs. Fail q = > p =
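The Kruskal-Wallis statistic "adjusted for ties" that appears in the figure legends can be sketched in plain Python. This is not the StatsDirect implementation and it does not cover the Dwass-Steel-Critchlow-Fligner pairwise comparisons; the function names and marks are hypothetical. With three groups (df = 2) the large-sample chi-square p-value reduces to exp(-H/2), which the example uses.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H corrected for ties, degrees of freedom) for a list of groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied sets.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical station marks (out of 20) grouped by examiner grade.
fail = [6, 7, 8]
borderline = [10, 11, 12]
passed = [14, 15, 16]
h, df = kruskal_wallis_h([fail, borderline, passed])
p = math.exp(-h / 2)  # chi-square survival function; valid only when df == 2
print(f"H={h:.2f} df={df} p={p:.4f}")
```

The chi-square approximation is crude for groups as small as those in this toy example; the stations in the poster had more than 240 observations each.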