6/25/2016 TEST CONSTRUCTION Workshop: IN THE NAME OF GOD. TEST CONSTRUCTION WORKSHOP. J. KOOHPAYEHZADEH, M.D., MPH, Education Development Center, Iran University.


1 6/25/2016. TEST CONSTRUCTION Workshop. IN THE NAME OF GOD. TEST CONSTRUCTION WORKSHOP. J. KOOHPAYEHZADEH, M.D., MPH, Education Development Center, Iran University of Medical Sciences

2 "Tell me, I forget. Ask me, I remember. Involve me, I understand."

3 Why Test? Testing is 50% of teaching.

4 Well-defined educational objectives are a prerequisite for assessment. Example for this session: at the end of this session, participants will be able to name at least three differences between summative and formative assessment; to list at least three written assessment methods (AM); to name the most effective AM for assessing clinical skills; and to describe the most effective AM for assessing attitudes.

5 The education cycle: education; formulating objectives; designing the evaluation system; teaching methods; strategies for implementing the program; evaluation.

6 "Lesson plan" form. Course title: ; student level: ; number of credits: ; prerequisite: ; general educational objective: . Table columns: specific observable behaviors; learning domains (cognitive, psychomotor, attitudinal); outline of content; teaching method; assessment method.

7 Evaluating Students: Tests Are Not the Only Way! Tests; projects; performance; participation.

8 Measurement: a process that determines how much of a characteristic a person or object possesses. A systematic process for quantifying a continuous variable.

9 Assessment: collecting information in order to reach an estimate.

10 Evaluation: assigning a value to something, or making a value judgment. A value judgment about the desirability or undesirability of a characteristic or subject. A systematic process for collecting, analyzing, and interpreting information in order to determine the extent to which objectives have been achieved. The process of collecting information and comparing it with standards in order to judge, or reach a decision about, an educational activity or any other process.

11 Test: an instrument for measuring a characteristic of a person or object.

12 Assessment: Why? How? When? What?

13 When do we assess? At the end of instruction: summative. During instruction: formative. Before instruction: pre-test.

14 Why do we assess? 1. To encourage learning. 2. To inform the student. 3. To inform the teacher. 4. To correct learning activities. 5. To select students. 6. To certify. 7. To gain readiness for promotion.

15 Why Evaluate Students? To help students improve; to assess student learning; to determine if the teacher is teaching; as a motivation tool; to communicate with others, such as parents.

16 Why Assess? To certify competence (S. A.); to assess the progress of learning; to aid learning (F. A.); to diagnose learning problems (D. A.); to assess the effectiveness of faculty teaching; to assess the effectiveness of the educational program.

17 Important point: If you want students to read the textbook, tell them in advance that a compulsory question on the book's content will be set as part of the end-of-term exam. If you want students to be able to operate a device, tell them that they will be examined on that specific ability.

18 Important point. Therefore: what you assess determines what students learn. This means that before worrying about which type of assessment instrument to use, you must decide what students should learn, and why.

19 What? Knowledge; skill; attitude.

20 What to Assess? Knowledge; attitude; skill.

21 What to Assess? Knowledge. Relevant knowledge: objectives according to "need to know", based on common clinical practice. Test knowledge application and problem solving (interpretation, analysis, synthesis), not just facts.

22 What to Assess? Skills. Clinical: Hx, PE, procedural skills. Communication skills. Critical reasoning skills: data interpretation and decision-making.

23 What to Assess? Attitude and behaviors: honest, has integrity, not rigid; responsible, punctual and regular, completes tasks; team player or leader; empathetic, patient advocate; effective communication skills; uses best current evidence.

24 What to Assess? Competence: problem solving.

25 What can be assessed by different AM? Factual knowledge; interpretations; problem-solving skills; ethical; clinical skills; emotional reactions; communication.

26 Who Should Assess? Faculty; self; peers; tutors; other team members; standardized patients, patients; external and internal examiners; public, society, ... (360°).

27 Where? Levels: knows; knows how; shows how; does. Settings: examination hall; test center/skill lab; workplace assessment.

28 What do we evaluate (WHAT?) and how do we evaluate it (HOW?). Knowledge: written and oral tests. Performance (practice): observation using a checklist and rating scale. Attitude: observation using a checklist and rating scale.

29 How to use assessment? Summative: usually undertaken at the end of a training programme; it determines whether the educational objectives have been successfully achieved. With summative assessment the student usually receives a grade or a mark (exam). Formative: testing that is part of the developmental or ongoing teaching/learning process. It should include delivery of feedback to the student.

30 Summative: Examination. What are exams? For students: a difficult and unpleasant steeplechase to run on the way to a diploma. For teachers: a less desirable teaching activity. For the public: an important protection from incompetent doctors.

31 Summative assessment. The reasons: a statement of achievement (university degree, diploma); an entrance requirement to an educational institution; a guide as to the wisdom of continuing with further study; a certification of competence as a public responsibility (licence); a determinant of programme effectiveness.

32 Formative assessment. The reasons: information for the student about his/her achievement of the educational objectives; repetitive progress measurement; discovering "weak points" so that the teacher can give support; helping the teacher to correct the programme. The results should not be used in summative assessment.

33 Formative assessment: feedback.

34 Types of learning domains: 1. Cognitive domain (knowledge). 2. Attitude domain. 3. Psychomotor domain (skills).

35 Levels of the domains. In the cognitive domain, the learning process and learner activities comprise: recall and comprehension; application and analysis; comparison and discrimination (problem solving). In the attitude domain they comprise: attending to a stimulus and responding to it; valuing the response; forming a conviction. In the practical (psychomotor) domain they comprise: observing and performing the task with help; coordination and performing the task without help; performing the task automatically (habit).

36 General instructional objectives (GIO): all the knowledge and abilities that the learner lacked before the start of the course and is expected to attain by the end of instruction. Characteristics: a general statement, not measurable within a limited period of time. Examples: acquiring knowledge of internal diseases; becoming familiar with the principles and techniques of radiology and working in the ward.

37 Specific objectives are obtained by breaking down the general objectives. They comprise the skills and abilities that learners attain in the course of instruction. They are usually stated as behavioral objectives.

38 Specific observable behaviors (S.O.B.): measurable, observable behaviors that the learner has acquired in the course of instruction and is able to exhibit during the training period. Characteristics of behavioral objectives: content, conditions, criterion, behavioral verb. Example: measure (verb) the length of a newborn (content) in the supine position (condition) with 1% error (standard or criterion).

39 Components of specific observable objectives (criterion, action). Example: on a frontal chest radiograph (condition), detect in 80% of cases the presence or absence of dense images of the pulmonary parenchyma (content) with a diameter greater than 1.2 cm.

40 ABCD model: A (Audience), B (Behavior), C (Condition), D (Degree). Performance agreement.

41 THANK YOU. ANY QUESTIONS?

42 Stages of test development: conceptualization; construction; tryout; item analysis; revision.

43 Conceptualization: an idea ...

44 Conceptualization: What will it measure? What is the objective? Is there a need? Who will use it? Etc.

45 Test Construction Principles: Adequate provision should be made for evaluating all the teacher's objectives for the instruction. The test should reflect the approximate proportion of emphasis in the course.

46 Preparing the test: The preliminary draft of the test should be prepared as early as possible. As a rule, the test should include more than one type of item.

47 Preparing the test, continued: The content of the test should range from very easy to very difficult for the group being measured. The items in the test should be arranged in order of difficulty. The items should be phrased so that the content, rather than the form of the statement, determines the answer.

48 Preparing the test, continued: A regular sequence in the pattern of responses should be avoided. The directions to the pupils should be as clear, complete, and concise as possible. One question should not provide the answer to another question.

49 Tryout: Try the test on a population similar to the one of interest, under standardized conditions. Use 5-10 people for each item on the test, but "the more the better". "Good items" are determined by item analysis.

50 Item Analysis: the process of determining which items are "good". Tools in item analysis: item difficulty index; item reliability index; item validity index; item discrimination index.

51 Item Difficulty Index. Underlying assumption: every item should be passed or failed based on the testtaker's level of knowledge about the material. The index is the proportion of the total number of testtakers who got the item correct, written $p_n$ for item $n$. The average item difficulty on a test can also be calculated. The optimal average item difficulty is the midpoint between 1.00 and chance success. For true/false: $(0.50 + 1.00)/2 = 1.5/2 = 0.75$. For four-option multiple choice: ???
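The midpoint rule on slide 51 can be sketched in a few lines of Python. This is a minimal sketch, assuming that chance success on an item with k equally attractive options is 1/k; the function name `optimal_average_difficulty` is hypothetical and does not come from the workshop.

```python
def optimal_average_difficulty(n_options: int) -> float:
    """Midpoint between chance success (assumed to be 1/n_options) and 1.00."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    chance = 1.0 / n_options          # assumption: pure guessing among equal options
    return (chance + 1.0) / 2.0


if __name__ == "__main__":
    print(optimal_average_difficulty(2))  # true/false, as on the slide: 0.75
    print(optimal_average_difficulty(4))  # four-option multiple choice: 0.625
```

Under that assumption, the slide's open question about four-option multiple choice would work out to (0.25 + 1.00)/2 = 0.625.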

52 Item Reliability Index: indicates the internal consistency of a test. The higher this index, the greater the internal consistency. Use factor analysis. We want to maximize internal consistency, so choose the items that contribute to it.

53 Item Validity Index: an indication of the degree to which a test measures what it is supposed to measure. The higher the item validity index, the higher the criterion-related validity. We want to maximize criterion-related validity, so choose the items that contribute to it.

54 Item Discrimination Index ($d$): indicates how well an item discriminates between high scorers and low scorers. We want high scorers to answer correctly and low scorers to answer incorrectly; otherwise throw out the item. The higher the value of $d$, the greater the number of high scorers answering correctly. With a negative $d$, low scorers are more likely than high scorers to answer correctly.

55 Characteristics of assessment tools

56 Reliability. What is it? Given the same test on the same person at the same time, the test result should be the same. It should differentiate between well and ill. Importance: if the test result changes, the test is not reliable; if the test is unreliable, one cannot say whether a student passes or fails.

57 Reliability: if an assessment is repeated with the same trainees, they should get the same results.

58 Validity. What is it? The degree to which a measurement instrument truly measures what it is intended to measure. Importance: if the assessment does not test what it is meant to test, the test is useless. Reliability is a prerequisite for validity but is not sufficient by itself.

59 Validity: the degree to which the inferences based on scores are correct.

60 Standardization. What is it? All students are tested on the same test items, patients, and tasks, and according to the same criteria. Importance: no one gets easier or more difficult questions (fairness).

61 Feasibility. What is it? Importance.

62 Objectivity. What is it? The level of agreement among independent assessors (experts) about the right answer to a given question. Importance: decreases intra-rater and inter-rater bias.

63 Characteristics of a test. Validity: how accurately a measurement instrument measures the intended subject. Reliability: how stable a measurement instrument is in measuring a variable. Objectivity: the degree of agreement among the independent judgments of several expert examiners about the good answers for each component of the measurement instrument. Practicability: the overall ease of using a test, both for the test constructor and for the students.

64 The relationship between validity and reliability: validity +, reliability +; validity -, reliability +; validity -, reliability -.

65 Characteristics of a test. Factors affecting validity: the test being very difficult or very easy; unintended cues in the test; no relation between the question and the instructional content; insufficient time; the exam being too short; the order of the questions (from easy to difficult). Factors affecting reliability: examiner agreement on ambiguous questions; a suitable and favorable environment; selection of a heterogeneous sample.

66 Characteristics of a test. Factors affecting objectivity: two scores given to the same test by two examiners should be identical; assessment results and scoring should be uniform and in proportion to the content examined. Factors affecting practicability: the test should suit the number of students; suitable facilities; availability of the equipment and space required by the assessment method.

67 Metric characteristics of AM. Validity: the degree to which a measurement instrument truly measures what it is intended to measure. Reliability: an expression of precision, consistency, and reproducibility; ideally, measurements should be the same when repeated with the same student or made by different assessors. Relevance: the degree to which the assessment questions and the educational objectives are in concordance. Objectivity: the level of agreement among independent assessors (experts) about the right answer to a given question.

68 Components of a Good Test: validity; reliability; objectivity; discrimination; comprehensiveness; score-ability.

69 Table of specifications: a two-dimensional table. 1. Horizontal dimension: the intended instructional content. 2. Vertical dimension: the levels of the cognitive domain (knowledge, comprehension, application, analysis, ...).

70 Example table of specifications. Cognitive levels: analysis; application; comprehension; knowledge. Instructional content and cell entries as extracted: heart failure: 0 questions, 1 question, 2 questions; shock: 1 question, 2 questions; digoxin poisoning: 0 questions, 1 question.

71 Table of specifications (blank template). Content dimension: 1., 2., 3. Objective dimension: knowledge (1., 2., 3.); understanding (1., 2.); analysis; synthesis; evaluation. Margins: total number of questions; percentage of questions.

72 Proportion of teaching hours for each topic (section) = (number of hours spent teaching the topic) / (total number of teaching hours in the course or credit unit). Percentage of questions for each section = 100 × proportion of teaching hours for that topic. Example table for a 2-credit course (36 hours), with columns: course topics; teaching hours; percentage of questions; number of questions. Topic 1: 4 hours, 11%, 6 questions; topics 2 and 3: entries lost in extraction; total: 36 hours, 100%, 50 questions. Worked example for section one: proportion of teaching hours = 4/36 = 0.11, so the percentage of questions = 0.11 × 100 = 11%. If a 50-question test is to be prepared for this course, the number of questions for section one is (11/100) × 50 ≈ 6.
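The hours-to-questions rule on slide 72 can be sketched as follows. This is a minimal sketch rather than the workshop's own procedure: the function name `allocate_questions` is hypothetical, the hours for sections 2 and 3 are made-up values that merely sum to 36 with the slide's 4 hours for section one, and the largest-remainder rounding step is an added assumption so that the counts add up to the test length.

```python
def allocate_questions(hours_by_topic: dict[str, float], total_questions: int) -> dict[str, int]:
    """Share out questions in proportion to teaching hours per topic."""
    total_hours = sum(hours_by_topic.values())
    if total_hours <= 0:
        raise ValueError("total teaching hours must be positive")

    # Exact (fractional) share of each topic: hours / total hours * test length.
    exact = {t: h / total_hours * total_questions for t, h in hours_by_topic.items()}

    # Start from the whole-number part, then hand out the leftover questions
    # to the topics with the largest fractional remainders (an assumption).
    counts = {t: int(x) for t, x in exact.items()}
    leftover = total_questions - sum(counts.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for topic in by_remainder[:leftover]:
        counts[topic] += 1
    return counts


if __name__ == "__main__":
    # 4 of 36 hours for section 1 is from the slide; the other rows are hypothetical.
    hours = {"section 1": 4, "section 2": 28, "section 3": 4}
    print(allocate_questions(hours, 50))
```

Plain rounding of each share can overshoot or undershoot the intended test length, which is why the sketch distributes the remainder explicitly.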

73 Thank you for your time. Any questions or comments?

74 Types of tests. 1. Written: objective (MCQ); non-objective (essay). 2. Oral. 3. Practical: MSF, OSCE, DOPS, log book, portfolio, Mini-CEX.

75 What are assessment tools? Written tests are either open or closed. Open: essay (restricted response, extended response) and short answer. Closed: true-false, matching, multiple-choice. Also: assignments.

76 Student Assessment. Direct methods: 1. Real: group dynamics assessments, ward observations, lab observations, ward evaluation. 2. Simulated: OSCE, OSPE, GOSPE, ICE, SCOPE. Indirect methods: 1. Written tests: MCQs, SEQs, MEQs, PMPs, long essay questions, questionnaires. 2. Oral tests: unstructured and structured oral exams. 3. Practical tests: PETs, portfolios.

77 Types of essay tests. Extended response: for the synthesis and evaluation levels. Restricted response: for the comprehension, application, and analysis levels.

78 Types of short-answer tests, for the lower levels of the cognitive domain (at most up to the application level): question form; completion; identification (association).

79 Types of objective tests: true-false; matching; multiple-choice.

80 ASSESSMENT TOOLS by level of Miller's Pyramid (Miller 1990). Does (action): 1. Professionalism evaluation form; 2. End-of-rotation evaluation; 3. 360° evaluations; 4. Mini-CEX; 5. Critical incident reports; 6. Record reviews. Shows how (decision making): 1. OSCE; 2. SP exam; 3. Computer-simulated patient. Knows how (reasoning): 1. Oral exam; 2. Essay; 3. MCQ. Knows (awareness): 1. Oral exam; 2. Essay; 3. MCQ.

81 How to assess knowledge, skills, attitudes. Columns: written exams; clinical exams; viva. Knowledge: +++++++. Psychomotor skills: -++++-. Attitude: -++.

82 True and False Items: Make approximately half of the items true and half false. Do not lift statements directly from books. Use direct statements. Avoid words with general meanings such as large, great, many, and few.

83 True and False Items, continued: Whenever you use words such as no, never, always, may, should, all, and only, be sure that they do not make the correct answers obvious. The question is usually false when all, always, none, never, and other all-inclusive terms are used. The question is usually true when usually or sometimes is used.

84 True and False Items, continued: Do not make the true statements consistently longer than the false statements. Avoid negative statements.

85 Matching: The number of possible responses should exceed the number of questions. Have 5-7 items to be matched. The directions should say whether a response can be used more than once.

86 Recall Tests (Completion, Listing): Use direct questions whenever possible. Make sentence-completion items as specific as possible. In simple recall items, place the blanks near or at the end of the statement. Construct the item so that there is only one correct response.

87 Recall Tests, continued: Design enumeration items to call for specific facts. In fill-in-the-blank items, make all the blanks the same length. Do not leave too many blanks in the statements.

88 Essay Tests: Before writing the question, know exactly what mental process of the student you want to bring out. Start essay questions with: 1. compare; 2. contrast; 3. give the reasons for; 4. present the arguments for and against; 5. give original examples of; 6. explain how or why.

89 Essay Tests, continued: Use clear, precise questions. Do not set too many questions for the time available. Make a list of all pertinent points that should be covered in the student's answer to each question, and use this list when grading.

90 Points on constructing written tests. Arrange the questions in the following order: 1. true-false; 2. matching; 3. multiple-choice; 4. short answer; 5. essay. Order the questions from easy to difficult. Arrange the questions to follow the original organization of the material.

91 MCQ: stem; options or responses; correct answer (key); distractor.

92 Types of multiple-choice tests: the only correct option; the best correct option; negative.

93 Millman's rules for MCQs

94 Millman's 21 rules for MCQs. 1. The stem should cover the main problems and quantities. 2. Each item should be as short as possible (while keeping the sentences clear). 3. Avoid negative questions in the stem as far as possible. If one is used, underline the negative phrase or write it in bold.

95 Millman's 21 rules for MCQs, continued. 4. The stem should be written so that it states the main problem without any help from the options. The options should also be as independent of one another as possible. 5. Ask for the best answer, or use wording such as "most" or "first" (when there is more than one relatively correct answer). 6. In stems with a blank, the omitted part to be filled in should preferably not be placed at the beginning of the sentence.

96 Millman's 21 rules for MCQs, continued. 7. The linguistic difficulty of the options should be low. 8. Each option should address a single point. 9. Avoid repeating words in the options as far as possible, unless there is a logical sequence.

97 Millman's 21 rules for MCQs, continued. 10. Distractors should be plausible and attractive (provided that the stem measures genuine understanding). 11. All options should agree with the stem in grammar; that is, if the stem is plural, all the options should be plural too. 12. Options should be equal in sentence length, technical difficulty, and usage.

98 Millman's 21 rules for MCQs, continued. 13. The stem and the options should be uniform and homogeneous in grammar, subject content, and form. 14. Avoid a pattern in the position of the correct answer across the exam questions (the correct answers should not run a, b, c, d in order, and option c should not be correct most of the time).

99 Millman's 21 rules for MCQs, continued. 15. Have at least 4 options for each item. 16. Avoid wording that creates a resemblance between the stem and the question. 17. Avoid using the exact wording of the textbook. 18. Avoid stems that give away the answer to a later question.

100 Millman's 21 rules for MCQs, continued. 19. Options should not include one another or in effect mean the same thing. 20. Avoid specific determiners such as "always" and "never". 21. When asking about the understanding of a term or concept, present the term first and then offer a set of characteristics and definitions as the options.

101 In addition to these 21 rules: 22. "None of the above" is useful for mathematical questions. 23. Avoid putting too much content in a question. 24. Avoid writing the questions in the same sequence as the text.

102 Thank you for your time. Any questions or comments?

103 Bloom's Taxonomy. Cognitive: knowledge/recall; comprehension; application; analysis; synthesis; evaluation. Affective: receive/attend; respond; valuing; synthesizing; characterized by internal values. Psychomotor: perception of sense; preparatory adjustment; guided response; complex overt response; adaptation; origination.

104 Diagram of the relationship among the categories of the cognitive domain, from highest to lowest: evaluation level; synthesis level; analysis level; application level; comprehension level; knowledge level.

105 "Levels of student evaluation": Tax-1, Recognition and Recall. Typical verbs: name, list, define, identify a phenomenon, state, draw, count. This corresponds to the first step of the cognitive domain (knowledge).

106 Diagram of the relationship among the categories of the affective domain, from highest to lowest: internalization level; organization level; valuing level; response level; readiness level.

107 Tax-2, Interpretation for Application: interpreting information in order to apply it; the ability to use and apply prior knowledge in a new situation. Example verbs: applies, operates, prepares, calculates, relates, demonstrates, examines, tests, interprets, shows, produces. This corresponds to the third step of the cognitive domain.

108 Diagram of the relationship among the categories of the practical (psychomotor) domain, from highest to lowest: habituation; coordination of movements; performing the action without help; readiness and imitation.

109 Tax-3, Evaluation: the ability to judge on the basis of defined criteria (problem solving). Example verbs: assess, estimate, diagnose, select, compare, criticize, approximate, rank, evaluate, judge, measure, revise, separate, classify. This corresponds to the last step of the cognitive domain.

110 Thank you for your time. Any questions or comments?

111 Calculating the minimum pass level (M.P.L.). Pass level for each question = (value assigned to the correct option) / (sum of the points given to all the options). Pass level for the exam = (sum of the pass levels of the exam questions) / (number of questions).
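The two ratios on slide 111 can be written out directly. This is a minimal sketch that only transcribes the slide's formulas: the function names `item_mpl` and `exam_mpl` are hypothetical, and the point values in the usage example are invented for illustration, since the slide does not say how points are assigned to options.

```python
def item_mpl(correct_option_value: float, option_values: list[float]) -> float:
    """Pass level for one question: value of the correct option / sum over all options."""
    total = sum(option_values)
    if total <= 0:
        raise ValueError("the option values must sum to a positive number")
    return correct_option_value / total


def exam_mpl(item_mpls: list[float]) -> float:
    """Pass level for the exam: mean of the per-question pass levels."""
    if not item_mpls:
        raise ValueError("at least one question is required")
    return sum(item_mpls) / len(item_mpls)


if __name__ == "__main__":
    # Hypothetical point values for three four-option questions.
    levels = [
        item_mpl(1.0, [1.0, 1.0, 0.0, 0.0]),
        item_mpl(1.0, [1.0, 1.0, 1.0, 1.0]),
        item_mpl(1.0, [1.0, 0.0, 0.0, 0.0]),
    ]
    print(levels, exam_mpl(levels))
```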

112 Item Analysis: The main purpose of item analysis is to improve the test. Analyze items to identify: potential mistakes in scoring; ambiguous or tricky items; alternatives that do not work well; problems with time limits.

113 Criterion-Referenced and Norm-Referenced TESTS. Types of tests: criterion-referenced tests; norm-referenced (competitive) tests.

114 TYPES OF TESTS BY PURPOSE. 1. Norm-referenced tests: a. discrimination is the most important aspect; b. easy items are eliminated. 2. Criterion-referenced tests: a. discrimination is not of critical importance; b. items are not altered or eliminated because of difficulty.

115 Criterion-Referenced: Before the test is administered, specific criteria are set to ensure that a minimum of knowledge and specific abilities has been attained, and the student's success or failure on the test is judged by comparing his or her performance with those criteria. This method is used mostly for final examinations and for granting certificates. Examples: the entrance examination of a pilot school; the specialty board examination.

116 Norm-Referenced: The results obtained by all the students are compared with one another. The pass mark is set by convention or according to the scores obtained by the students. This method is used mostly for entrance and diagnostic examinations. Example: university entrance examinations.

117 Analytical review of questions in norm-referenced tests

118 ITEM ANALYSIS of an assessment tool has 3 parts: 1. Item difficulty. 2. Item discrimination. 3. Distraction analysis.

119 Steps of item analysis: 1. Determine the score of each student. 2. Rank the students by merit. 3. Determine the upper and lower groups. 4. Calculate the difficulty coefficient (index) for each question. 5. Calculate the discrimination coefficient (index) for each question. 6. Critically appraise the questions.

120 Item analysis card. Test title: inferential statistics. Date of administration: 2/11/73. Topic of the question: correlation coefficient. Question: which of the following figures represents the greater correlation coefficient? a) 0.55; *b) 0.61; c) 0.49; d) 0.23. Responses by group. Upper 25% (10 students): a 0, b 5, c 3, d 0, no answer 2. Lower 25% (10 students): a 5, b 2, c 3, d 0, no answer 0. Difficulty coefficient = 35; discrimination coefficient = 0.3.

121 Tests of individual differences. Two groups of individuals: U, the upper group (the 27% highest scorers), and L, the lower group (the 27% lowest scorers), with U = L. Item difficulty index: $p = (U_p + L_p)/(U + L)$. Item discrimination index: $d = (U_p - L_p)/U$. Here $U_p$ is the number of upper-group individuals who got the item right and $L_p$ is the number of lower-group individuals who got the item right.

122 Example, continued: 60 students took the test. Item 14: among the 16 upper scorers, 5 got the item right; among the 16 lower scorers, only 1 got the item right.
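The example for item 14 can be worked through in code. This is a minimal sketch, assuming the standard upper/lower-group formulas (difficulty as the proportion correct across both groups, discrimination as the difference in correct answers divided by the size of one group); the function names are hypothetical, and rounding 27% of 60 to 16 is an assumption that matches the slide's group size.

```python
def group_size(n_examinees: int, fraction: float = 0.27) -> int:
    """Size of the upper (or lower) group, rounded to a whole number of examinees."""
    return round(n_examinees * fraction)


def upper_lower_indices(upper_correct: int, lower_correct: int, size: int) -> tuple[float, float]:
    """Return (difficulty p, discrimination d) from equal-sized upper and lower groups."""
    if size <= 0:
        raise ValueError("group size must be positive")
    p = (upper_correct + lower_correct) / (2 * size)   # proportion correct in both groups
    d = (upper_correct - lower_correct) / size         # upper minus lower, per group member
    return p, d


if __name__ == "__main__":
    size = group_size(60)                       # 16 examinees per group
    p, d = upper_lower_indices(5, 1, size)
    print(size, p, d)                           # 16 0.1875 0.25
```

Under these assumptions item 14 is hard (p is about 0.19) and discriminates only modestly (d = 0.25).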

123 Guidelines for p: Consider the purpose of the test. p should be low for selection tests that will select a small percentage of examinees (e.g., scholarships). p should be high if the test is assessing the need for remedial education. p should be around 0.5 if testing a broad range of abilities. In a multiple-choice test, p should depend on the number of options.

124 Guidelines for d: d is ideally 1, but it never really is 1 in practice. d = 0.30 is usually assumed acceptable. d and p are interdependent, so if p is extreme, d may be lower than 0.30.

125 Item Validity: the validity of an item with respect to an external criterion $y$, measured by the point-biserial correlation $r_{pb} = \frac{\bar{y}_p - \bar{y}}{s_y}\sqrt{\frac{n_p}{n - n_p}}$, where $\bar{y}_p$ is the mean criterion score for those who got the item right, $\bar{y}$ is the mean criterion score for everyone, $n$ is the total number of examinees, $n_p$ is the number of examinees who got the item right, and $s_y$ is the SD of the criterion score.

126 Example: A new translation of the S-B IQ test is administered to 60 Turkish students. The Turkish version of the WISC-R is also administered to check validity. Item 12: 18 got it right. The mean WISC-R score for those who got it right is 106. The mean WISC-R score for all is 97. The SD of the WISC-R is 14. What is the validity of item 12?
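The question on slide 126 can be answered numerically. This is a minimal sketch, assuming the usual point-biserial form r = ((mean of those who got the item right - overall mean) / SD) × sqrt(n_right / (n - n_right)); the function name `item_validity` is hypothetical.

```python
import math


def item_validity(mean_right: float, mean_all: float, sd: float,
                  n_total: int, n_right: int) -> float:
    """Point-biserial correlation between an item (right/wrong) and a criterion score."""
    if sd <= 0:
        raise ValueError("SD of the criterion must be positive")
    if not 0 < n_right < n_total:
        raise ValueError("need at least one examinee right and one wrong")
    # Standardized mean difference, scaled by the odds of getting the item right.
    return (mean_right - mean_all) / sd * math.sqrt(n_right / (n_total - n_right))


if __name__ == "__main__":
    # Values from the slide: 60 students, 18 right, means 106 and 97, SD 14.
    r = item_validity(mean_right=106, mean_all=97, sd=14, n_total=60, n_right=18)
    print(round(r, 2))  # 0.42
```

Under that formula, item 12 correlates about 0.42 with the WISC-R criterion.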

127 ITEM ANALYSIS. Difficulty Index: the level of difficulty of an exam or a question; 0 = difficult, 1 = easy. Discrimination Index (AKA Discriminant Index): the ability of a question to discriminate between students who know the information and students who do NOT know the information.

128 ITEM ANALYSIS. Difficulty Index (D), range 0 to 1, computed from the top 1/3 and bottom 1/3 of scores: $D = \frac{N_{\text{correct, top}} + N_{\text{correct, bottom}}}{N_{\text{top}} + N_{\text{bottom}}}$.

129 ITEM ANALYSIS. Difficulty (D), range 0 to 1: 0 = hard; 0.5 = moderate; 1.0 = easy.

130 ITEM ANALYSIS. Example: 30 students in the class. 5 of the top 10 scorers got the question correct; 3 of the bottom 10 scorers got the question correct. $D = (5 + 3)/(10 + 10) = 8/20 = 0.4$ (moderate difficulty).

131 Item Difficulty: defined as the proportion of people who get the item correct, symbolized by "p". p = (number who were correct) / (number who responded). Difficulty should be greater than the percentage who could get the item correct by chance.

132 6/25/2016TEST CONSTRUCTION Workshop132 محاسبه ضريب دشواري سئوال انتخابهاي درست گروه پايين + انتخابهاي درست گروه بالا تعداد افراد گروه بالا + تعداد افراد گروه پايين براي محاسبه ضريب دشواري سئوال : 2+5 10+10 7 20 35= هر اندازه ضريب دشواري يك سئوال بزرگتر ( به 100 نزديكتر ) باشد آن سئوال آسانتر است 100  =ضريب دشواري سئوال= P 100  =ضريب دشواري سئوال= P 100 

133 6/25/2016TEST CONSTRUCTION Workshop133 Interpreting the item difficulty coefficient. The larger the variance of the scores obtained from a norm-referenced test, the better the test. Item variance = P × (1 - P): for P = 0, 0 × (1 - 0) = 0; for P = 1, 1 × (1 - 1) = 0; for P = 0.5, 0.5 × (1 - 0.5) = 0.5 × 0.5 = 0.25. Consequently, when selecting items for the final form of a test, the better items are those whose difficulty coefficient is less than 1, greater than 0, and close to 0.5. Suitable difficulty coefficient: 0.3-0.7.
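
The claim that item variance peaks at a difficulty of 0.5 can be checked directly. A minimal sketch, assuming the difficulty coefficient is expressed as a proportion between 0 and 1; the names are illustrative.

```python
def item_variance(p):
    """Variance of a right/wrong item with proportion correct p: p * (1 - p)."""
    return p * (1 - p)


# Variance at the extremes, at the edges of the suggested 0.3-0.7 band, and at 0.5.
variances = {p: item_variance(p) for p in (0.0, 0.3, 0.5, 0.7, 1.0)}
for p, v in variances.items():
    print(p, round(v, 2))
```

Items at 0 or 1 contribute no variance at all, while the 0.3-0.7 band keeps the variance at or above 0.21.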

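134 6/25/2016TEST CONSTRUCTION Workshop134 ITEM ANALYSIS Discrimination Index (P): 0 - 1 (AKA Discriminant Index), computed from the top 1/3 and bottom 1/3 of scores: $P = \frac{N_{\text{correct, top}} - N_{\text{correct, bottom}}}{\tfrac{1}{2} N}$, where $N$ is the combined number of examinees in the two groups.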

135 6/25/2016TEST CONSTRUCTION Workshop135 ITEM ANALYSIS Discrimination Index scale: 0 = no discrimination, 0.5 = moderate, 1.0 = excellent. A negative (-) value means something is wrong.

136 6/25/2016TEST CONSTRUCTION Workshop136 ITEM ANALYSIS Example: 30 students in class. 10 of the top 10 scorers got the question correct; 2 of the bottom 10 scorers got the question correct. Discrimination = (10 - 2) / ((10 + 10)/2) = 8/10 = 0.8 (good discrimination).
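
A minimal sketch of the same calculation, dividing the difference in correct answers by half the combined group size; the function and argument names are hypothetical.

```python
def discrimination_index(correct_top, correct_bottom, n_top, n_bottom):
    """Difference in correct answers between groups, over half the combined size."""
    return (correct_top - correct_bottom) / ((n_top + n_bottom) / 2)


# Slide example: 10 of the top 10 and 2 of the bottom 10 answered correctly.
disc_example = discrimination_index(correct_top=10, correct_bottom=2, n_top=10, n_bottom=10)
print(disc_example)
```

A negative result would mean the bottom group outperformed the top group on the item, which signals a problem with the item.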

137 6/25/2016TEST CONSTRUCTION Workshop137 Calculating the item discrimination coefficient. It indicates the power of an item to differentiate between the strong and weak groups of examinees. Item discrimination coefficient = d = (correct choices in the upper group - correct choices in the lower group) / (number of people in one group, upper or lower). Example: d = (5 - 2) / 10 = 3/10 = 0.3.

138 6/25/2016TEST CONSTRUCTION Workshop138 Interpreting the item discrimination coefficient. The larger the discrimination coefficient, the greater the item's discriminating power; the smaller the coefficient, the lower its discriminating power. Consequently, the good items of a test are those with a moderate difficulty coefficient and a high discrimination coefficient.

139 6/25/2016TEST CONSTRUCTION Workshop139 Things to Remember about D Index
D Index: Interpretation
Maximum D (100%): all students in upper group got item right and none in lower group got it right
Zero D (0%): equal numbers in both groups got item right
Negative D (e.g. -75%): more students in lower group than upper group got item right
Zero or Negative D: discard or vastly improve item before using again on a test

140 6/25/2016TEST CONSTRUCTION Workshop140 D Index Rule of Thumb for Classroom Tests
D Index: Interpretation
> 40%: excellent discrimination
25% to 39%: acceptable discrimination
< 25%: poor discrimination

141 6/25/2016TEST CONSTRUCTION Workshop141 Summary of Standards of Acceptance Item Difficulty (P) 30% - 90% Item Discrimination (by D) 25% and above

142 6/25/2016TEST CONSTRUCTION Workshop142 Difficulty Index scale: below 0.3 = too difficult; 0.3 to 0.7 = acceptable; 0.5 to 0.6 = recommended; above 0.7 = too easy.

143 6/25/2016TEST CONSTRUCTION Workshop143
Format: Ideal Difficulty
Five-response multiple-choice: 70
Four-response multiple-choice: 74
Three-response multiple-choice: 77
True-false (two-response multiple-choice): 85

144 6/25/2016TEST CONSTRUCTION Workshop144 Discrimination Index scale: below 0.15 = throw out; 0.15 to 0.25 = check; 0.25 to 0.35 = good; above 0.35 = excellent.

145 6/25/2016TEST CONSTRUCTION Workshop145 Be aware: very easy or very difficult test items have little discrimination. Items of moderate difficulty (60% to 80% answering correctly) are generally more discriminating.

146 6/25/2016TEST CONSTRUCTION Workshop146 Point-biserial correlation: used to correlate a dichotomous variable with a continuous variable. In testing, it is used to correlate a person's performance on an item (correct, incorrect) with their total test score, and serves as an index of item discrimination. The point-biserial ranges from -1.00 to +1.00; the higher, the better. As a general rule, > +0.20 is desirable.

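147 6/25/2016TEST CONSTRUCTION Workshop147 Point-biserial formula: $r_{pb} = \frac{M_{\text{correct}} - M_{\text{incorrect}}}{SD}\sqrt{IF\,(1 - IF)}$, where $M_{\text{correct}}$ is the mean on the test for people who got the item correct, $M_{\text{incorrect}}$ is the mean on the test for people who got the item incorrect, $SD$ is the standard deviation for the test, and $IF$ is the IF for the item (the proportion of examinees answering it correctly).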
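
A minimal sketch of the point-biserial calculation from raw data, assuming the form ((M_correct - M_incorrect) / SD) * sqrt(IF * (1 - IF)) with the population standard deviation of the total scores. The five examinees below are made-up illustration data, not from the workshop.

```python
import math
import statistics


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores."""
    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    item_facility = len(right) / len(item_correct)  # IF: proportion correct
    sd = statistics.pstdev(total_scores)  # population SD of the test scores
    mean_gap = statistics.mean(right) - statistics.mean(wrong)
    return mean_gap / sd * math.sqrt(item_facility * (1 - item_facility))


# Hypothetical data: 1 = item answered correctly, 0 = incorrectly.
item = [1, 1, 1, 0, 0]
totals = [10, 8, 6, 4, 2]
r_pb = point_biserial(item, totals)
print(round(r_pb, 3))
```

With the population SD this value equals the ordinary Pearson correlation between the 0/1 item scores and the totals. The function needs at least one correct and one incorrect response and a non-zero SD.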

148 6/25/2016TEST CONSTRUCTION Workshop148 What is the reliability of the exam 1. Kuder- Richardson 20 2. Kuder-Richardson 21 3. Cronbach alpha

149 6/25/2016TEST CONSTRUCTION Workshop149 What is the reliability of the exam Range 0-1 Higher value indicates a strong relationship between items and test Lower value indicates a weaker relationship between test item and test

150 6/25/2016TEST CONSTRUCTION Workshop150 Guided Practice
Student  Raw score  Item 1  Item 2  Item 3  Item 4  Item 5
A        8          a       b       a       d       e
B        6          c       b       e       c       e
C        6          a       c       e       c       b
D        4          a       b       e       a       c
E        2          c       a       b       d       c
F        8          a       b       c       c       e
G        10         a       b       a       c       e
H        6          a       b       c       d       e
I        8          a       c       a       c       e
J        4          a       c       a       d       b

151 6/25/2016TEST CONSTRUCTION Workshop151 Difficulty Factor: Item # 1 = .8, Item # 2 = .6, Item # 3 = .4. What does it mean? Item # 1 = .8: may be too easy. Item # 2 = .6: good. Item # 3 = .4: good.

152 6/25/2016TEST CONSTRUCTION Workshop152 What does it mean? Kuder 20: Item # 1 = .88, Item # 2 = .63, Item # 3 = .40, Item # 4 = .76, Item # 5 = .89. Item 3 may not relate as well. Overall the test is reliable.

153 6/25/2016TEST CONSTRUCTION Workshop153 More Practice …
Item  Difficulty  Discrimination  Reliability
# 1   .28         .40             .80
# 2   .30         .68             .76
# 3   .80         .78             .70
# 4   .10         .20

154 6/25/2016TEST CONSTRUCTION Workshop154 Calculating the difficulty index. Difficulty Index = P = (RU + RL) / T × 100, where RU = number of strong students who answered correctly, RL = number of students in the lower (weaker) group who answered correctly, and T = total number of students in the upper (strong) and lower (weaker) groups who answered the item. The lower the difficulty coefficient, the harder the item. For a norm-referenced test the best difficulty coefficient is 50%. For a criterion-referenced test the best difficulty coefficient is obtained from a separate formula (not preserved in the slide text).

155 6/25/2016TEST CONSTRUCTION Workshop155 Calculating the discrimination index. It shows how well the item distinguishes strong students from weaker students. DI = (RU - RL) / (1/2 T). Discrimination close to 50%. Negative index = more weaker students than strong students answered the item correctly. Zero index = equal numbers from both groups answered correctly.

156 6/25/2016TEST CONSTRUCTION Workshop156 Application of the indices. Purpose: reviewing the items. Difficulty index = (upper + lower) / total × 100: 30 to 70 = acceptable; 50 to 60 = recommended. Discrimination index = (upper - lower) / (1/2 total): above 0.35 = excellent; 0.25 to 0.35 = good; 0.15 to 0.25 = should be revised; below 0.15 = should be discarded (with high probability).

157 6/25/2016TEST CONSTRUCTION Workshop157 Analytical review of items in criterion-referenced tests (Criterion Reference)

158 6/25/2016TEST CONSTRUCTION Workshop158 Criterion referenced tests. Two groups of individuals: U = upper group (above criterion), L = lower group. The item difficulty index and the item discrimination index are both computed from the number of upper group individuals who got the item right and the number of lower group individuals who got the item right.

159 6/25/2016TEST CONSTRUCTION Workshop159 Example: a test of mastery of Istanbul geography. The outcome is that 60 individuals are “masters” and 20 failed the test. Item 3: 45 “masters” and 10 who failed got the item right. What are the item difficulty and item discrimination indices?
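
One way to work the Istanbul example is sketched below. The slide text does not preserve the formulas, so two common forms are assumed here: difficulty as the overall proportion correct, and discrimination as the difference between the proportions correct in the upper and lower groups (a form that tolerates unequal group sizes). All names are illustrative.

```python
def crt_difficulty(upper_right, lower_right, n_upper, n_lower):
    """Assumed form: overall proportion of examinees answering correctly."""
    return (upper_right + lower_right) / (n_upper + n_lower)


def crt_discrimination(upper_right, lower_right, n_upper, n_lower):
    """Assumed form: proportion correct among masters minus among non-masters."""
    return upper_right / n_upper - lower_right / n_lower


# Item 3: 45 of 60 "masters" and 10 of the 20 who failed answered correctly.
p_item3 = crt_difficulty(45, 10, 60, 20)
d_item3 = crt_discrimination(45, 10, 60, 20)
print(p_item3, d_item3)
```

Under these assumed formulas the item difficulty is 0.6875 and the item discrimination is 0.25.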

160 6/25/2016TEST CONSTRUCTION Workshop160 Analytical review of items in criterion-referenced tests (Criterion Reference). Purpose: to determine how far individuals have attained the intended knowledge after completing the course.
- Depending on the educational objective, an item may be difficult or easy.
- The difficulty index carries a different meaning in this kind of exam.
- Very easy or very difficult items do not necessarily need to be changed or removed (if they have sufficient validity).
- To review the items in these tests, a pretest and a post-test are used and their results are compared.

161 6/25/2016TEST CONSTRUCTION Workshop161 Sensitivity to Instructional Effects
Item numbers 1 to 5, each with two columns: A = Post test, B = Pre test
Examinee  Responses
H. D.     +--+--+++-
S. N.     ++-+--+++-
Kh. P.    +--+--+++-
Sh. F.    +--+--+++-
D. H.     ++----+++-
F. P.     ---+--+++-
$S = \frac{R_a - R_b}{T}$, where S = Sensitivity to Instructional Effects, $R_a$ = number of people who answered the item correctly after instruction, $R_b$ = number of people who answered the item correctly before instruction, and $T$ = number of people who answered the item both times (pretest and post-test).

162 6/25/2016TEST CONSTRUCTION Workshop162 For the best item in a criterion-referenced test, the S coefficient equals one. Items whose S coefficient is zero or negative cannot measure the effect of instruction.
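
The sensitivity index S = (Ra - Rb) / T can be sketched as follows. The pretest and post-test results below are made-up illustration data for one item, and the function name is hypothetical.

```python
def sensitivity_index(pre_correct, post_correct):
    """S = (Ra - Rb) / T for examinees who answered the item both times.

    pre_correct and post_correct are parallel lists of booleans, one entry
    per examinee who took both the pretest and the post-test.
    """
    t = len(pre_correct)
    r_after = sum(post_correct)   # Ra: correct after instruction
    r_before = sum(pre_correct)   # Rb: correct before instruction
    return (r_after - r_before) / t


# Hypothetical item: 1 of 6 examinees correct before instruction, 5 of 6 after.
pre = [True, False, False, False, False, False]
post = [True, True, True, True, True, False]
s_item = sensitivity_index(pre, post)
print(round(s_item, 2))
```

An item nobody could answer before instruction and everybody could answer afterwards would give S = 1. An item answered equally well before and after gives S = 0.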

163 6/25/2016TEST CONSTRUCTION Workshop163 Analysis of essay and performance tests. Difficulty coefficient = mean score on the item / possible range of item scores = 4.2 / (6 - 1) = 0.84. Discrimination coefficient = difference between the mean item scores of the upper and lower groups / possible range of item scores = (5.3 - 2.8) / (6 - 1) = 0.5.
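
The two ratios for an essay or performance item can be sketched like this, taking the slide's numbers at face value: a mean item score of 4.2, upper and lower group means of 5.3 and 2.8, and a possible score range of 1 to 6. Function and argument names are illustrative.

```python
def essay_difficulty(mean_score, min_score, max_score):
    """Mean item score divided by the possible score range, as on the slide."""
    return mean_score / (max_score - min_score)


def essay_discrimination(mean_upper, mean_lower, min_score, max_score):
    """Gap between upper and lower group means over the possible score range."""
    return (mean_upper - mean_lower) / (max_score - min_score)


essay_p = essay_difficulty(4.2, min_score=1, max_score=6)
essay_d = essay_discrimination(5.3, 2.8, min_score=1, max_score=6)
print(round(essay_p, 2), round(essay_d, 2))
```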

164 6/25/2016TEST CONSTRUCTION Workshop164 Distractor analysis. Each distractor should attract at least one person from the weak group. A distractor should attract weak examinees more than strong ones.

165 6/25/2016TEST CONSTRUCTION Workshop165 Thank you for your Time Any Questions or Comments?

166 6/25/2016TEST CONSTRUCTION Workshop166 Two issues in using instruments... 1. Validity: the degree to which the instrument measures what it purports to measure. 2. Reliability: the degree to which the instrument consistently measures what it purports to measure.

167 6/25/2016TEST CONSTRUCTION Workshop167 Types of reliability... 1. Stability 2. Equivalence 3. Internal consistency

168 6/25/2016TEST CONSTRUCTION Workshop168 1. Stability (“test-retest”): the degree to which two scores on the same instrument are consistent over time

169 6/25/2016TEST CONSTRUCTION Workshop169 2. Equivalence (“equivalent forms”): the degree to which identical instruments (except for the actual items included) yield identical scores

170 6/25/2016TEST CONSTRUCTION Workshop170 3. Internal consistency (“split-half” reliability with Spearman-Brown correction formula, Kuder-Richardson and Cronbach’s Alpha reliabilities, scorer/rater reliability): the degree to which one instrument yields consistent results

171 6/25/2016TEST CONSTRUCTION Workshop171 RELIABILITY: TEST-RETEST (COEFFICIENT OF STABILITY), PARALLEL FORM (COEFFICIENT OF EQUIVALENCE), INTERNAL CONSISTENCY

172 6/25/2016TEST CONSTRUCTION Workshop172 INTERNAL CONSISTENCY: SPLIT-HALF METHOD, SPEARMAN-BROWN PROPHECY FORMULA, KUDER-RICHARDSON METHOD, COEFFICIENT ALPHA

173 6/25/2016TEST CONSTRUCTION Workshop173 KR 20: $KR_{20} = \frac{K}{K-1} \times \frac{S_x^2 - \sum pq}{S_x^2}$, where $K$ = # of trials or items, $S_x^2$ = variance of scores, $p$ = proportion answering the item right, $q$ = proportion answering the item wrong, and $\sum pq$ = sum of the $pq$ products for all $K$ items.

174 6/25/2016TEST CONSTRUCTION Workshop174 KR 20 Example
Item  p    q    pq
1     .50  .50  .25
2     .25  .75  .1875
3     .80  .20  .16
4     .90  .10  .09
If Mean = 2.45 and SD = 1.2, what is KR 20? $\sum pq$ = 0.6875, so KR 20 = (4/3) × (1.44 - 0.6875)/1.44 = .70
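
The worked example can be reproduced with a short function. This is a minimal sketch of KR 20 = [K/(K-1)] × [(S²x - Σpq)/S²x] that takes the item proportions and the score variance as given; the function name is hypothetical.

```python
def kr20(p_values, score_variance):
    """Kuder-Richardson 20 from per-item proportions correct and score variance."""
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)  # q = 1 - p for each item
    return (k / (k - 1)) * (score_variance - sum_pq) / score_variance


# Slide example: four items, SD = 1.2 so the variance is 1.2 squared = 1.44.
kr20_example = kr20([0.50, 0.25, 0.80, 0.90], score_variance=1.2 ** 2)
print(round(kr20_example, 2))
```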

175 6/25/2016TEST CONSTRUCTION Workshop175 KR 21: if we assume all test items are equally difficult, KR 20 can be simplified to KR 21: $KR_{21} = \frac{K S^2 - \text{Mean}\,(K - \text{Mean})}{(K-1)\,S^2}$, where $K$ = # of trials or items, $S^2$ = variance of test, and Mean = mean of test.
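
A minimal sketch of KR 21 = [(K × S²) - Mean × (K - Mean)] ÷ [(K-1) × S²]. For illustration only, it reuses the figures from the KR 20 example (K = 4, Mean = 2.45, SD = 1.2); the deck itself does not work a KR 21 example, and the function name is hypothetical.

```python
def kr21(k, mean, variance):
    """Kuder-Richardson 21, which assumes all items are equally difficult."""
    return (k * variance - mean * (k - mean)) / ((k - 1) * variance)


kr21_example = kr21(k=4, mean=2.45, variance=1.2 ** 2)
print(round(kr21_example, 2))
```

With these figures KR 21 comes out lower than the KR 20 value of .70, which is expected here because the four items are far from equally difficult.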

176 6/25/2016TEST CONSTRUCTION Workshop176 RELIABILITY OF ORAL TESTS: careful attention to the design of the oral questions; building model answers for each question before the exam is held; establishing INTERRATER RELIABILITY (coefficient = .6); thinking carefully first and then answering; recording the answers and having them re-evaluated by other examiners.

177 6/25/2016TEST CONSTRUCTION Workshop177 RELIABILITY OF CRITERION – REFERENCED LINDMAN AND MERENDA

178 6/25/2016TEST CONSTRUCTION Workshop178 Rule of Thumb for Acceptable Reliability Coefficients for Classroom Tests
Reliability Coefficient: Interpretation
.70 or higher: acceptable reliability

179 6/25/2016TEST CONSTRUCTION Workshop179 Types of Validity: Face, Content, Predictive, Concurrent, Construct. Characteristics of the assessment method: 1. Item validity 2. Sampling validity. Determined by expert judgment. Blueprinting.

180 6/25/2016TEST CONSTRUCTION Workshop180 Types of validity... 1. Content validity 2. Criterion-related validity 3. Construct validity

181 6/25/2016TEST CONSTRUCTION Workshop181 1. Content validity: the degree to which an instrument measures an intended content area

182 6/25/2016TEST CONSTRUCTION Workshop182 3. Construct validity: a series of studies validate that the instrument really measures what it purports to measure

183 6/25/2016TEST CONSTRUCTION Workshop183 Forms of content validity… Sampling validity: does the instrument reflect the total content area? Item validity: are the items included on the instrument relevant to the measurement of the intended content area?

184 6/25/2016TEST CONSTRUCTION Workshop184 2. Criterion-related validity: an individual takes two forms of an instrument, which are then correlated to discriminate between those individuals who possess a certain characteristic and those who do not

185 6/25/2016TEST CONSTRUCTION Workshop185 Forms of criterion-related validity… Concurrent validity: the degree to which scores on one test correlate with scores on another test when both tests are administered in the same time frame. Predictive validity: the degree to which a test can predict how well an individual will do in a future situation.

186 6/25/2016TEST CONSTRUCTION Workshop186 Types of Validity: 1. Content Validity (Face Validity; Sampling Validity (content validity)) 2. Empirical Validity (Concurrent Validity; Predictive Validity) 3. Construct Validity


188 6/25/2016TEST CONSTRUCTION Workshop188 Item discrimination: how well does the item separate those who know the material from those who do not? In LXR, it is measured by the Point-Biserial (rpb) correlation (ranges from -1 to 1). rpb is the correlation between item and exam performance.

189 6/25/2016TEST CONSTRUCTION Workshop189 Item discrimination + rpb means that those scoring higher on the exam were more likely to answer the item correctly. (better discrimination) - rpb means that high scorers on the exam answered the item wrong more frequently than low scorers. (poor discrimination) A desirable rpb correlation is +0.20 or higher.

190 6/25/2016TEST CONSTRUCTION Workshop190 Evaluation of Distractors: distractors are designed to fool those who do not know the material. Those who do not know the answer guess among the choices. Distractors should be equally popular (# expected = # who answered the item wrong / # of distractors). Distractors ideally have a low or negative rpb.
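
The "# expected" rule can be sketched as below. The response counts are read from LXR Example 4 later in the deck (A = 11, B = 43 as the keyed answer, C = 3, D = 22, E = 8); the dictionary layout and function name are assumptions for illustration.

```python
def expected_per_distractor(option_counts, correct_option):
    """Expected count per distractor if wrong answers were spread evenly."""
    wrong_total = sum(n for opt, n in option_counts.items() if opt != correct_option)
    n_distractors = len(option_counts) - 1
    return wrong_total / n_distractors


counts = {"A": 11, "B": 43, "C": 3, "D": 22, "E": 8}
expected = expected_per_distractor(counts, correct_option="B")

# Gap between observed and expected count for each distractor.
gaps = {opt: n - expected for opt, n in counts.items() if opt != "B"}
print(expected, gaps)
```

Here 44 wrong answers over 4 distractors gives an expected 11 per distractor, so C (3 responses) draws far fewer than expected and D (22) draws twice as many.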

191 6/25/2016TEST CONSTRUCTION Workshop191 LXR Example 1 (* correct answer)
Option                   A*      B     C     D      E
N                        86      0     0     1      0
%                        99%     0%    0%    1%     0%
Avg % Correct on Exam    85.3%   0%    0%    82.0%  0%
rpb                      +.06    ---   ---   -.06   ---
Very easy item; would probably review the alternatives to make sure they are not ambiguous and/or do not provide clues that they are wrong.

192 6/25/2016TEST CONSTRUCTION Workshop192 LXR Example 2 (* correct answer)
Option                   A     B      C*     D      E
N                        0     21     65     2      0
%                        0%    24%    74%    2%     0%
Avg % Correct on Exam    0%    80.7%  87.2%  78.7%  0%
rpb                      ---   -.33   +.36   -.13   ---
Three of the alternatives are not functioning well; would review them.

193 6/25/2016TEST CONSTRUCTION Workshop193 LXR Example 3 (* correct answer)
Option                   A      B      C*     D      E
N                        3      1      15     5      66
%                        3%     1%     17%    6%     76%
Avg % Correct on Exam    83.0%  80.0%  83.4%  82.2%  86.8%
rpb                      -.07   -.09   -.15   -.12   +.23
Probably a miskeyed item. The correct answer is likely option E.

194 6/25/2016TEST CONSTRUCTION Workshop194 LXR Example 4 (* correct answer)
Option                   A      B*     C      D      E
N                        11     43     3      22     8
%                        13%    49%    3%     25%    9%
Avg % Correct on Exam    81.5%  87.4%  82.3%  84.5%  82.4%
rpb                      -.24   +.35   -.09   -.08   -.15
Relatively hard item with good discrimination. Would review alternatives C & D to see why they attract a relatively low & high number of students.

195 6/25/2016TEST CONSTRUCTION Workshop195 LXR Example 5 (* correct answer)
Option                   A      B*     C      D      E
N                        3      60     1      5      18
%                        3%     69%    1%     6%     21%
Avg % Correct on Exam    83.0%  85.3%  80.0%  82.2%  86.8%
rpb                      -.07   +.002  -.09   -.12   +.13
Poor discrimination for correct choice “B”. Choice “E” actually does a better job discriminating. Would review item for proper keying, ambiguous wording, proper wording of alternatives, etc. This item needs revision.


