Presentation transcript: "Session Objectives: Explain the principles of good assessment"

1 Session Objectives Explain the principles of good assessment
Evaluate the standards for good assessment
Analyse the components of a test item
Conduct qualitative item analysis
Compare and contrast a range of assessment methods
Evaluate a range of test items
Compare and contrast marking systems
Compare and contrast marking systems for open response/performance-based items
Identify common pitfalls in assessing student work

2 The need for good assessment
“In assessing students, we are making claims to know them in certain important ways” – Derek Rowntree. Our view of self – our self-concept – is greatly influenced by the feedback we receive from others. The grades we give students provide data for their interpretation of how intelligent or capable they are.

3 The Purposes of Assessment
There are many purposes of assessment:
Selection and grading
Maintaining standards
Diagnosing learning difficulties
Providing feedback to support the learning process
Serving as a source of information for evaluating the effectiveness of the teaching/learning strategy
You may note that these purposes are not necessarily complementary. For example, while grades and standards may be of keen interest to employers, they are unlikely to help students learn more effectively.

4 Summative and Formative Assessment
Summative Assessment This refers to any assessment where final marks or grades are allocated to a learner’s performance. Typically, this is related to end-of-course examinations. However, all assessments that contribute to the overall assessment mark/grade are at some stage summatively assessed, in that the assessment decision is final – at least for that course. Formative Assessment This refers to assessment that is focused on supporting the learning process and providing clear and supportive feedback to learners – both in terms of identifying competency gaps and providing guidance for future learning. To understand these differences in assessment focus, contrast having a driving lesson with taking the final test. During the lesson, the instructor will be assessing your performance and helping you to improve – there is no pass or fail – this is Formative Assessment. In the actual test, however, you either pass or fail – this is Summative Assessment.

5 Good Assessment Unlike teaching, good assessment is a far less contested set of practices. While assessment may not be an ‘exact science’, there are well-constituted processes and procedures to ensure that the assessment system is as ‘good as it can be’.

6 Principles of Good Assessment
Valid
Reliable
Sufficient
Authentic
Fair
Flexible
Current
Efficient
These Principles of Assessment are key criteria to apply in the design and conduct of the assessment process, as well as in the development of assessment items and instruments.

7 Valid This refers to a test’s capability to measure accurately what we intend to measure. For example, a valid driving test is one in which driving skills (Performance) are measured in typical traffic (Conditions) against the criteria established by the Motoring Authority (Standard).

8 Reliable This refers to the capability of a test to produce the same scores with different examiners (persons scoring the test). It must be noted that a reliable test may not be a valid one. It is possible to reliably assess the wrong things. However, reliability is important in assessment as it deals with the consistency of performance and rating. A reliable test should consistently produce the same results (grades) with different assessors doing the marking.
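The slides do not give a calculation, but one rough way to check the consistency described above is simple percentage agreement between two examiners marking the same scripts. The sketch below uses invented grades and a made-up helper name (percent_agreement); a chance-corrected statistic such as Cohen's kappa would be a stronger check in practice.

```python
# Sketch: quantifying inter-examiner consistency as percentage agreement.
# The grades below are invented purely for illustration.

def percent_agreement(grades_a, grades_b):
    """Share of scripts on which two examiners award the same grade."""
    assert len(grades_a) == len(grades_b)
    matches = sum(1 for a, b in zip(grades_a, grades_b) if a == b)
    return matches / len(grades_a)

examiner_1 = ["A", "B", "B", "C", "A"]
examiner_2 = ["A", "B", "C", "C", "A"]

print(f"Agreement: {percent_agreement(examiner_1, examiner_2):.0%}")  # 80%
```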

9 Sufficient This refers to the important question of ‘how much assessment evidence do we need in order to feel confident that a student is competent in the area assessed?’ Sufficiency can be one of the toughest challenges when planning and conducting assessment. If the assessment is insufficient, we may be accrediting students with competence that they don’t actually possess. This would not be well received by employers, who trust us to deliver competent graduates who can do what our courses claim to develop. However, assessment is time consuming for both teachers and students, and it serves no one’s interest to over-assess and take up time that could be better utilized elsewhere. A key point to consider in dealing with the question of sufficiency is the importance of the learning outcomes being assessed. For example, if the knowledge and skills are key to effective performance or to safety, then more focus must be directed to ensuring competence. When the knowledge is of a more general or contextual nature, sufficiency that has not been fully met is less likely to present a problem.

10 Authentic Quite simply, this refers to how sure we are that the work produced has been done by the student. In an examination, we can be more confident of authenticity. However, with assignments done by students in their own time, authenticity becomes a concern. Authenticity refers to how sure we are that the work done by students for assessment is really their own work and not copied from other sources. Ensuring authenticity is one of the main reasons for formal examinations. Unless the student is exceptionally clever at cheating, we can be confident of the authenticity of a student’s work. However, when students are doing homework (take-away, do-in-your-own-time assignments), authenticity becomes a serious concern – especially now with internet access, where students can download essays and assignments from students in other countries. Authenticity is also a concern with group activities, where some students may contribute little but benefit from the work done by the more hardworking students. In non-examination assessment situations, there is a need to use other methods to authenticate students’ work. This can be done by interviewing students and asking them key questions about the work to explore their understanding of it – but this does take time and may not guarantee authenticity.

11 Fairness Fairness relates to a number of considerations in assessment. However, they are all concerned with ensuring that learners, when being assessed, are provided with appropriate access to the assessment activities and are not unfairly discriminated against in the assessment process. Unfair discrimination typically means discrimination based on criteria unrelated to the assessment activity itself, for example gender or racial characteristics. Fairness is a general concern throughout assessment, relating as much to providing learners with sufficient knowledge and time for assessment as to non-discriminatory processes in marking their work.

12 Flexible Flexibility is concerned with the process of assessment, not the standard of the assessment. Learners can display their learning in a range of ways (e.g., orally, written, demonstration, etc), provided the evidence is validly demonstrated. Flexibility typically becomes a consideration for learners with special needs (e.g., visual/auditory impairment, second language, etc) or untypical situations (e.g., sickness on exam day, etc). The arrangements for flexibility are usually specified by exam boards.

13 Current This refers to how recently the evidence was generated and whether it fully relates to the most up-to-date knowledge, skills and practices for the work function being assessed. This consideration mainly comes into play when prior learning or achievement is part of the assessment evidence. It may need to be checked against the industry standard and any specific policy guidelines stated.

14 Efficient Assessment can be time consuming. It is important to:
use methods (where possible) that enable assessment of a wider range of learning outcomes
avoid over-assessing
produce marking systems that reduce marking time
encourage peer assessment – at the formative level
Assessment takes up valuable time and resources – both for teachers and students. It is essential, therefore, that the methods used are valid and make good use of the assessment opportunity (e.g., assessment activity, time, resources available) to assess as many of the learning outcomes as possible. Making assessment more efficient will, over the duration of the assessment process, enable better sufficiency of assessment. Balancing sufficiency with efficiency while ensuring validity is a big challenge and the mark of good assessment practice.

15 Standards for Good Assessment
Assessment standards and criteria relate to 3 interrelated areas:
A well-constituted scheme of assessment (incorporating the principles of good assessment)
An effective and efficient approach to conducting assessment (to ensure accurate judgement of learner performance)
A means of providing feedback on assessment decisions (to support future learning)
Assessing thinking shares most things in common with assessing any other skill or process. Assessment must be valid, reliable, fair and cost/resource-effective. However, there are certain distinct challenges that thinking poses for assessment – as identified in the following slides.

16 Produce and review a scheme of assessment (Assessment Plan)
The scheme specifies the assessment methods to be used, their purpose, the marks to be allocated, and the timing of assessments
The selected assessment methods are designed to incorporate the principles of good assessment (Validity, Reliability, Sufficiency, Authenticity, Fairness, etc.)
The assessment methods are well constructed and sufficiently varied
The key aspects of the assessment scheme are explained to learners
Opportunities are provided for learners to seek clarification on assessment requirements
The scheme is reviewed at agreed times and updated as necessary.

17 Conduct Assessment (Judge and make decisions relating to the assessment evidence presented by learners)
Learners are provided with clear access to assessment
The assessment evidence is judged accurately against the agreed assessment criteria
Only the criteria specified for the assessment are used to judge assessment evidence
The assessment decisions are based on all relevant assessment evidence available
Inconsistencies in assessment evidence are clarified and resolved
The requirements to ensure authenticity are maintained

18 Providing feedback on assessment decisions
The assessment decisions are promptly communicated to learners
Feedback to learners is clear, constructive and seeks to promote future learning
Learners are encouraged to seek clarification and advice
The assessment decisions are appropriately recorded to meet verification requirements
Records are legible, accurate, stored securely and promptly passed to the next stage of the recording/certification process.

19 Planning the overall assessment framework
There are certain key decisions that need to be borne in mind when planning the assessment framework:
What is to be assessed and what marks weighting is to be allocated?
What assessment methods are to be used?
When is assessment to be conducted?
Where is assessment to be conducted and what resources are needed?
How are assessment decisions communicated to students?

20 What is to be assessed and what marks weighting is to be allocated?
What is assessed, and the marks weighting, must directly reflect the learning outcomes for the module or unit. A Table of Specifications is often used to document the main subject areas, the general learning objectives and the weighting attached. From the table it is possible to directly identify what to assess and the weighting to be allocated.

21 Preparing A Table of Specifications
A Table of Specifications is a two-way chart that identifies the subject content, the type of learning outcomes and their relative weighting in the module. Preparing a Table of Specifications involves the following steps:
Identify the learning outcomes and content areas to be measured
Weight the learning outcomes and content areas in terms of their relative importance
Build the table in accordance with these relative weights by distributing the test items proportionately among the relevant cells of the table (a simple illustration of this step follows below)
The completed table provides a framework for systematically planning the amount and types of items to use
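As a rough illustration of the final step (distributing items in proportion to the weights), the sketch below allocates a fixed number of test items across content-by-outcome cells. The topic names, weights and item total are invented, not taken from the workshop's own table.

```python
# Sketch: distribute a fixed number of test items across the cells of a
# Table of Specifications in proportion to their relative weights.
# Topic names and weights are invented for illustration only.

weights = {                      # (content area, outcome level): relative weight (%)
    ("Topic A", "Recall"): 10,
    ("Topic A", "Application"): 15,
    ("Topic B", "Recall"): 25,
    ("Topic B", "Application"): 50,
}

total_items = 40
total_weight = sum(weights.values())

items_per_cell = {
    cell: round(total_items * w / total_weight) for cell, w in weights.items()
}

for cell, n in items_per_cell.items():
    print(cell, "->", n, "items")
```

Simple rounding can leave the cell counts one or two items off the intended total, so the final allocation usually still needs a manual check against the table.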

22 What goes into an objective?
As an objective describes some performance aspect of learning, it must contain both knowledge and cognitive processes. For example, 'State the year that England won the soccer World Cup' contains specific factual knowledge and the cognitive process of memory. A successful performance would be recall (e.g., in written or oral form) of the correct year.

23 What goes into a test item?
Any type of test item must include:
The subject content on which the item is based: Facts, Concepts, Principles and Procedures
The type of cognitive behaviour needed to respond appropriately: Memory; Types of Thinking (e.g., analysis, comparison & contrast, inference, evaluation, metacognition)
It may also include:
Other generic skills and affective components: Communication, Teamwork, Attitudes

24 Using a Taxonomy for writing objectives at different levels of cognitive complexity
Many educational institutions use Bloom’s taxonomy. Potential uses:
Clarifying what the desired learning entails
Understanding the integral relationship between the knowledge and cognitive processes inherent in the objectives
Planning teaching strategies and assessments calibrated to the objectives

25 Designing the Objective
The completed learning outcome will combine...
Cognitive Processes: Memorize; Analyze; Compare & contrast; Infer and interpret; Evaluate; Create
Knowledge Dimension: Facts; Concepts; Principles; Procedures
Example: Compare and contrast values & ethics

26 Characteristics of useful specific learning outcomes
Performance (what the learner is to be able to do)
Conditions (important conditions under which the performance is expected)
Criterion (the quality or level of performance that will be considered acceptable)
Though it is not always necessary to include the second characteristic, and not always practical to include the third, the more you can say about them, the better the objective will communicate its intent

27 Performance
“The most important and indispensable characteristic of a useful objective is that it describes the kind of performance that will be accepted as evidence that the learner has mastered the objective” (p.24) – R. F. Mager, 1984, ‘Preparing Instructional Objectives’. Note: A performance may be either Overt or Covert. Overt refers to any kind of performance that can be observed directly (e.g., operate a software programme to produce 3-D drawings from a set of given specifications). Covert performances cannot be observed directly as they are cognitive, internal and invisible (e.g., thinking, adding, etc.). It is important to be able to identify the behavioural indicators that enable a valid inference about this performance (e.g., thinking can be inferred from what a learner writes, speaks or does in a specific problem-solving activity).

28 Conditions Some key questions in identifying conditions:
What will the learner be allowed to use?
What will the learner be denied?
Under what conditions will you expect the desired performance to occur?
Are there any skills that you are NOT trying to develop? Does the objective exclude such skills?
NB: Don’t add conditions if you don’t need them (e.g., when the desired performance is already perfectly clear)

29 Criterion The criterion is the standard by which performance is evaluated – the yardstick by which achievement is assessed. For example: Initiate a fire alarm at SP according to the standard operating procedures in less than 1 minute. NB: Occasionally conditions and criterion blend together, but this is not a problem provided the intent is clear. Also, as with conditions, if a criterion is not useful, don’t include it.

30 Adding Conditions & Criteria to Performance
For:
Adds clarity and precision to the intended learning
Enables more accuracy and reliability in assessment
Against:
Can become cumbersome
May lead to a focus on basic knowledge and skills that are easy to measure
Reduces flexibility – without the conditions and standard, an objective can be used in many contexts

31 The range of student performances that confirm outcomes have been met?
Typically these performances demonstrate that the student can actually do what is clearly identified in the outcomes. This may refer to:
Accurately recalling specific knowledge that has been acquired (e.g., effectively memorized)
Displaying understanding of concepts, principles, and procedures by being able to explain their connectedness and applications in a range of situations (e.g., transfer). This typically results from the application of good thinking to the various knowledge components involved
Showing competence in specific skills that apply knowledge bases and skill sets in real-world applications (e.g., testing a circuit, producing a report, displaying teamwork, etc.)

32 Table of Specifications for this workshop
Topics / Abilities: M, U, D, Total
A Key planning considerations in conducting assessment: 5 10
B Preparing a Table of Specifications: 3 7
C Types of test items: 20 30 55
D Preparing a marking scheme: 15 25
Total: 18 47 35 100

33 Qualitative Item Analysis
A non-numerical method for analysing test items that checks for: Content Validity (the degree to which a test item measures an intended content area) Construct Validity (the degree to which a test measures intended mental operations, e.g., recall, types of thinking, application) Item Design Quality (the degree to which a test has technically well designed items)

34 What assessment methods are to be used?
Assessment is not an exact science. All methods have limitations in terms of the measure of human capability they provide. The following are key questions to ask in designing and using methods of assessment:
Do they accurately measure the identified learning outcomes?
Is a sufficient range employed to encourage learner motivation and enable learners to display competence in different ways?
Are they fostering (wherever possible) an understanding of the key concepts, principles and procedures of the subject matter?
Do they make cost-effective use of time in generating sufficient evidence to infer competence?
Do they provide fair assessment situations for learners?
Are they systematically organized into an effective and balanced assessment scheme?

35 When is assessment to be conducted?
The major considerations typically revolve around how much assessment should be conducted at the end of the programme (terminal assessment) and how much over the duration of the programme (continuous assessment). Continuous assessment captures a more representative picture of student performance and spreads the assessment load. Terminal assessment creates more assessment pressure and perhaps pushes the student to learn more at one given point in time. Other important questions are: Have students had sufficient opportunity to internalise this learning and be able to demonstrate the necessary competence? What other commitments do students have at the time of assessment – do these create unnecessary or unrealistic burdens?

36 Where and what resources are needed for conducting assessment?
These are ‘nuts and bolts’ planning decisions, but very important ones. It is important to book the necessary rooms and ensure that appropriate resources are available for the type of assessment to be conducted. This is especially the case when laboratory equipment needs to be prepared, etc. It may also be necessary to arrange sufficient supervising staff to ensure the smooth running of the assessment.

37 How are assessment decisions communicated to students?
Assessment decisions are not simply grades on a piece of paper, but represent judgements of worth. Many students are likely to internalise the assessments we make of them. In giving feedback to learners, it is important that learners are provided with: A clear explanation for the assessment decision made. Students need positive reinforcement for what has been achieved, but they also need to know what they have not demonstrated in the assessment and why it is important Constructive guidance on what learning needs to be developed, and how this might be achieved, in order to develop necessary competencies presently lacking or not sufficiently established

38 Types of Assessment Item
Assessment items are the ‘nuts and bolts’ of any assessment strategy. They are what we get learners to do in order for them to show us that they are competent in the areas assessed. Basically, assessment items can be seen in terms of two broad categories:
Fixed response (Objective Tests): where the student chooses an answer from the limited options provided (True-False Items; Multiple-Choice Items; Matching Items; Completion Items; Interpretive Exercises)
Open response (Essay-Type): where the student, to varying degrees, constructs the answer rather than selecting it from options provided

39 Selecting Items In selecting the type of assessment items, the guiding principle is: use the item types that provide the most direct measures of the student performance specified by the intended learning outcomes. From this principle, the following rules typically apply:
Skills are best tested by performance tests (the learner performs the tasks described in the objective under real or simulated conditions)
Knowledge is best tested by written or oral tests
Attitudes are best tested by observations of performance in a range of situations and over time.

40 True-False Items The true-false question consists of a statement with which the student can register agreement or disagreement. Examples:
Formative assessment is primarily concerned with allocating grades. T / F
Objective tests are more reliable than essay-type questions. T / F
Assessment is an exact science. T / F
True-false test items can validly assess types of thinking. T / F
Moderation increases the reliability of assessment. T / F

41 Uses and Limitations of True-False Items
Uses
Easy to construct, administer and score
Learner response is simple – requiring only the identification of the statements as true or false
Validly assesses learning outcomes at the level of knowledge acquisition and basic comprehension
Limitations
Learners may get one half of the questions correct purely by chance (see the sketch below)
Not valid for assessing deeper understanding of the content or its application
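To put a number on the guessing limitation noted above, the sketch below computes the probability that a learner who guesses every answer reaches a given pass mark on a true-false test. The 20-item length and the 50%/70% pass marks are illustrative assumptions, not figures from the slides.

```python
# Sketch: chance of passing a true-false test by guessing alone.
# Each guess is correct with probability 0.5; the item count and pass marks
# below are illustrative, not taken from the slides.
from math import comb

def p_at_least(n_items, n_correct, p=0.5):
    """Probability of getting at least n_correct items right by guessing."""
    return sum(comb(n_items, k) * p**k * (1 - p)**(n_items - k)
               for k in range(n_correct, n_items + 1))

n = 20
print(f"P(score >= 50%) = {p_at_least(n, 10):.2f}")   # about 0.59
print(f"P(score >= 70%) = {p_at_least(n, 14):.3f}")   # about 0.058
```

Guessing alone gives better-than-even odds of reaching 50%, which is why true-false scores need cautious interpretation, although the chance of reaching a higher cut-off falls away quickly.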

42 Multiple-Choice Items (MCQs)
MCQs often vary in format and structure, but essentially provide the student with a question presented in the stem and a choice of response answers (one correct answer – the key – and typically 3 wrong answers, or distracters). The learner must select his/her answer from the options provided. There are 5 basic formats for the construction of MCQs:
A premise (p) presented in the stem is followed by a particular consequence (c) in one of the options
Two or more premises in combination lead to a particular conclusion
Two propositions are presented in the stem. The learner must decide whether both are true; neither is true; (a) but not (b) is true; or (b) but not (a) is true
Classification of terms, names, statements
A set of information is presented as a stimulus (e.g., a written scenario, graph, table, article, etc.). MCQs are then based on the information presented

43 If test scores for a group of students remain constant over time, we can conclude that:
a) validity is increasing
b) reliability is increasing
c) verification is decreasing
d) representativeness is decreasing.
If pass rates for a module have progressively deteriorated over the past 3 years, and there is no evidence of change in terms of student or staff cohort, we are most likely to conclude that:
a) student attitudes to work had deteriorated
b) lecturers are assessing more stringently
c) examinations had increased in difficulty
d) teaching quality had deteriorated.
Structured questions are best classified as:
a) student response items
b) objective test items
c) multiple choice items
d) criteria referenced items

44 The NVQ National Framework is composed of elements with performance criteria.
Performance criteria are examples of (a) norm-referencing; (b) competency-based assessment?
a) (a) but not (b)
b) (b) but not (a)
c) both (a) and (b)
d) neither (a) nor (b)
Table 1 shows the number of students’ responses to each test item in an examination paper and the number scoring 60% or over. (Table 1 columns: Question no., No. of responses, 60%+)
From the data presented in Table 1, the most likely inference is:
a) students had done well overall
b) some questions were more confusing than others
c) certain topic areas had been studied in more detail
d) students had done poorly overall.

45 Uses and Limitations of MCQs
Uses
If well designed, they provide an effective method for assessing a wide range of learning outcomes, from knowledge acquisition to types of thinking
They are easy to administer and score
They are more reliable than true-false items
Limitations
They can be difficult and time-consuming to design well, particularly at the application level
They are not particularly valid for assessing:
skill applications, whether technical or human communications
complex activities that require a number of interrelated abilities and skills
attitudes, dispositions
creativity

46 Matching Items Matching items could be considered as a group of multiple-choice items combined, but having a single set of responses. Typically, a range of options is presented in one column, which need to be correctly matched with options in a second column. Example below:
Column A contains a list of characteristics of test items. On the line to the left of each statement, write the letter of the test item in Column B that best fits the statement. Each response in Column B may be used once, more than once, or not at all.
Column A
__ 1. Least useful for educational diagnosis
__ 2. Measures greatest variety of learning outcomes
__ 3. Most difficult to score objectively
__ 4. Provides the highest score by guessing
Column B
A. Multiple-choice items
B. True-false items
C. Short-answer items

47 Uses and Limitations of Matching Items
It is possible to measure a large amount of related factual material in a relatively short time Limitations Restricted to the measurement of factual information based on rote learning Susceptible to the presence of irrelevant clues

48 Completion Items This form of test is commonly used to measure familiarity with rules, procedures, formulas, etc., by requiring the learner to complete a statement in which a critical or unique element is missing. Examples below:
Q.1 When a test item truly measures what it intended to measure, we can say that the item is __________
Q.2 When assessment focuses on the development of learning, we are likely to refer to such assessment as __________
Q.3 The process for ensuring the quality of assessment practice is referred to as __________

49 Uses and Limitations of Completion Items
Uses
Easy to construct
Very effective for assessing recall of information, basic understanding and certain mathematical skills, such as formula application and computation
Reduces the chances of guessing as compared to other objective tests
Limitations
Can be difficult to mark: answers may not be exactly the words expected, though they may be partially correct
Not particularly valid for assessing application and types of thinking

50 Interpretive Exercises
An interpretive exercise consists of a series of objective items (MCQs and/or True-False) based on a common set of data (e.g., written materials, graphs, charts, tables, maps or pictures). This format allows for greater range and flexibility in measuring more complex learning outcomes. These include the abilities to:
Analyse relationships between parts and systems
Make inferences and interpretations from various sources of information
Compare and contrast different options and scenarios
Evaluate alternatives and make decisions

51 Choosing the fixed-response item to use
The Multiple-Choice Item provides the most generally useful format for measuring achievement at various levels of learning. However, there are occasions when other formats are more suitable, for example:
When there are only two possible alternatives, a shift can be made to a true-false item
When there are a number of similar factors to be related, a shift can be made to a matching item
When the items are to measure types of thinking and other complex outcomes, a shift can be made to the interpretive exercise

52 Open-Response (essay type) items
The main feature of open-response items is that the student supplies, rather than selects, the correct answer. In broad terms there are two main types of open-response item:
Restricted Response Items These are shorter and more focused essay items. They require the student to apply less content knowledge and are more specific about the cognitive abilities involved.
Extended Response Items In extended response items the student has a high degree of freedom in responding to the essay item. This type of essay item is typically used to assess a range, as well as a high level, of cognitive abilities.

53 Restricted Response and Extended Response Items
Restricted Response
1. List the major similarities and differences between multiple-choice and performance tests. Focus specifically on validity and reliability. Limit your answer to one page. (8 marks)
2. Explain what is meant by verification. Identify 3 ways in which it can promote quality in assessment practice. (10 marks)
Extended Response
1. Compare and contrast 3 different assessment methods that you think are appropriate for assessing thinking in your module. Explain and illustrate the advantages and limitations of each method. (20 marks)
2. There are many purposes for assessment. Evaluate the extent to which these different purposes support the learning process. (25 marks)
The distinction between these two types of essay item is one of degree, and their use depends on the main purpose of the assessment in terms of the identified learning outcomes.

54 Uses and Limitations of Essay Type Items
Uses
Relatively easy to construct, and provide a means to assess a wide range of higher-level cognitive abilities:
analyze relationships
compare and contrast options
identify assumptions in a position taken
explain cause-and-effect relations
make predictions
organize data to support a viewpoint
point out advantages and disadvantages of an option
integrate data from several sources
evaluate the quality or worth of an item, product, or action
Limitations
There are four main limitations to essay-type items:
It is difficult to establish marking criteria that can be applied consistently by markers. Subjectivity is a major concern with essay-type items – hence reliability is lessened
Writing skill influences the scoring. Skilful bluffing can raise scores; errors in grammar and spelling typically lower scores
Fewer essay items can be used compared with objective tests – therefore there is more limited sampling across the range of learning outcomes
They take a long time to mark.

55 General Design Considerations
Clearly identify the key knowledge areas (concepts, principles and procedures) and cognitive abilities (e.g., analysis, inference and interpretation, evaluation) you want the learner to be able to demonstrate before you write the question
Write the question in a way that clearly and unambiguously defines the task to the student
Cue the main cognitive abilities you want students to demonstrate in the wording of the essay, for example: compare and contrast, give reasons for, predict, etc.
Consider these guidelines in relation to the following examples:
Poor item: Discuss the value of performance-based tests
Better version: Evaluate the usefulness and limitations of using performance tests in a module you teach. Identify one topic area where you think such tests provide the most valid assessment of student learning. Explain and illustrate why this topic area is most validly assessed by performance tests.

56 What is Performance-Based Assessment?
Performance tests are the most authentic form of assessment as they measure direct competence in real world situations. A performance test is one that directly measures identified learning, focusing on the actual competence displayed in the performance. A driving test is a typical example of a performance test, where the examinee is tested on real driving performance in context, i.e. on the road.

57 Examples of Performance – Based Assessment
In educational testing, authenticity or realism is usually a matter of degree. However, there are many ways in which assessment tasks can achieve a reasonable degree of realism – for example:
Real work projects and tasks
Simulations
Problem solving through case studies
Presentations
Any activity that largely models what would be done by professionals in the world of work

58 Example 1: Design A Food Package
Select a food product and design the packaging that you think will give it the best marketability. You must be able to identify the product attributes, protection and enhancement needed to satisfy the functional and marketing requirements, and use suitable packaging material(s) and package type. The work produced should reflect the quality of your thinking in the following areas:
identify the criteria for evaluating the marketability of a product
analyze the components of a product that constitute an effective design
generate new ways of viewing a product design beyond existing standard forms
predict potential clients’ responses to the product given the information you have
monitor the group’s progress and revise strategy where necessary
This is an example of a task scenario from an engineering module. The key types of thinking have been cued for the students. It is important to note that cuing the types of thinking for students does not do the work for them. This is no different from knowing what you have to do in a driving test – it does not make the test any easier; you must still have the competence to meet the driving standards. Once students become familiar with the types of thinking and develop competence in using them, it becomes less necessary to provide these cues.

59 Example 2: Design and conduct a small experiment to test the Halo Effect
In groups of 3-4, design and conduct a small experiment to test the Halo Effect in person perception. You may choose the particular focus for the experiment, but it must:
Clearly test the Halo Effect in person perception
Be viable in terms of accessing relevant data
Meet ethical standards in conducting experiments with persons
Follow an established method and procedure
Produce results that support or refute the hypothesis
Once completed, the experiment should be written up in an appropriate format of approximately 2000 words. It should document the important stages of the experiment and compare and contrast the data found with existing findings on the Halo Effect. This is the learning task/performance test item used to assess the learning outcomes identified in the previous slide.

60 Steps in designing performance-based items
Step 1: Identify the knowledge, skills (and attitudes if relevant) to be incorporated into the task. For this step it is important to:
Choose specific topic areas in your curriculum that contain knowledge essential for key understanding of the subject (e.g., key concepts, principles, etc.)
Identify the types of thinking that are important for promoting student understanding and subsequent competence in these topic areas (e.g., analysis, comparison and contrast, inference and interpretation, evaluation, generating possibilities, metacognition)
Identify other attributes (e.g., attitudes/habits of mind) that may be relevant to effective learning in the task (e.g., perseverance, time-management, etc.)
In a Thinking Curriculum, a lot of focus is placed on students being actively and collaboratively involved in real-world problem-solving. The use of real-life performance tasks, projects, case studies and other simulated real-world activity is essential. However, it is important that these tasks are carefully designed to effectively promote the learning outcomes of the curriculum. In this and the following slide, the key steps and notes of guidance are provided to help you design these tasks. Remember, from the first set of slides, “Underpinning model of learning…”: competent performance involves the dynamic use of knowledge, thinking, doing and desire. You will probably need to do a fair bit of thinking and doing in order to produce good learning tasks. However, the effort put in will be worth the benefits gained in terms of supporting instruction and helping students to learn effectively. Also, interesting and challenging tasks usually motivate students much more than traditional classroom learning activities.

61 Steps in designing performance-based items
Step 2: Produce the learning task. It is important that the task:
Clearly involves the application of the knowledge, skills and attitudes identified in Step 1
Is sufficiently challenging, but realistically achievable in terms of students’ prior competence, access to resources, and the time frames allocated
Has more than one correct answer, or more than one correct way of achieving the correct answer
Clear notes of guidance are provided, which:
Identify the products of the task and what formats of presentation are acceptable (e.g., written report, oral presentation, portfolio, etc.)
Specify the parameters of the activity (e.g., time, length, areas to incorporate, individual/collaborative, how much choice is permitted, support provided, etc.)
Cue the types of thinking and other desired process skills
Spell out all aspects of the assessment process and criteria.

62 Performance-Based Assessment: Pluses and Minuses
PLUS
Measures a range of complex skills and processes in real-world or authentically simulated contexts
Enables assessment of both the effectiveness of the process and the product resulting from performance of a task
Links clearly with learning and instruction in a planned, developmental manner
Motivates students through meaningful and challenging activities
MINUS
More time consuming than paper-and-pencil assessment
Where courses focus on underpinning knowledge, there is less opportunity for performance-based assessment
As these items often involve professional judgement, there is always the problem of subjectivity in marking

63 Evaluating test items
Has each item’s subject content been verified (matched to learning outcomes)? Yes / No
Have you identified the cognitive response behaviour (types of thinking involved)? Yes / No
Has the correct answer been identified or an appropriate marking format created? Yes / No
Have you followed the item-writing advice for each type of item? Yes / No
Have you edited the items for clarity, bias or insensitivity? Yes / No
Have you piloted the items? Yes / No

64 The importance of a valid marking scheme
Having a well-designed and accurate marking scheme and scoring system for the assessments that learners complete is essential to the assessment process. For fixed-response items this is a simple process of tabulating the number of correct answers on the test items; these can then be converted into grades if necessary. However, for essay-type/open-response items, there needs to be a clear and well-constructed marking scheme – especially for extended response items.
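A minimal sketch of that tabulation step for fixed-response items: score the responses against an answer key and convert the percentage into a grade. The key, the student responses and the grade boundaries below are invented for illustration; real boundaries would come from the institution's grading scheme.

```python
# Sketch: tabulate correct responses for a fixed-response test and convert
# the percentage to a grade. Key, responses, and boundaries are invented.

answer_key = ["B", "D", "A", "C", "B", "A", "D", "C", "A", "B"]
student    = ["B", "D", "A", "A", "B", "A", "D", "C", "C", "B"]

score = sum(1 for given, correct in zip(student, answer_key) if given == correct)
percent = 100 * score / len(answer_key)

grade_boundaries = [(80, "A"), (70, "B"), (60, "C"), (50, "D")]
grade = next((g for cutoff, g in grade_boundaries if percent >= cutoff), "F")

print(f"{score}/{len(answer_key)} correct ({percent:.0f}%), grade {grade}")
```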

65 Key planning considerations in producing a marking scheme
Decide on what exactly is to be assessed from the item – the Performance Areas. These must reflect the learning objectives for the module
Decide on the Performance Criteria for each of the performance areas. These are the key operations or elements that underpin competence in each of the performance areas
Decide on the Marks Weighting for each performance area. This must reflect the Table of Specifications in the module document
Decide on the sources of Performance Evidence to be used in assessing the item (e.g., written, oral, products, observation, questioning)
Decide on the Format for the marking scheme – typically a Checklist or a Rating Scale/Scoring Rubric
For MCQs, there is no need to develop a marking system, as correct scores can simply be collated and presented accordingly. However, as assessment items become more complex, involving a range of content, types of thinking and other process skills, there is a need to carefully construct the marking scheme. In doing this it is first necessary to decide on the scope of the assessment item – as documented in this slide.

66 Decide format on the basis of whether the item involves High or Low Inference
Low inference items are those where the performances being tested are clearly visible and there is a widely established correct answer (e.g., conducting a fire drill, setting up an experiment) Here a Checklist is most appropriate High inference items involve performances that are less directly visible and/or more open to subjective judgement (e.g., creative writing, managing a team) Here a rating scale/scoring rubric is most appropriate A major challenge to test design is to produce tasks that require low inference scoring systems. Unfortunately, many worthwhile student outcomes reflecting higher order thinking lend themselves more to high inference scoring.

67 Developing a checklist
Identify the important components - procedures, processes or operations - in an assessment activity. For example, in conducting an experiment one important operation is likely to be the generation of a viable hypothesis
For each component, write a statement that identifies competent performance for this procedure, process or operation. In the above example, the following may be pertinent: 'A clear, viable hypothesis is described'
Allocate a mark distribution for each component - where appropriate this is likely to reflect its importance or level of complexity (see the sketch below)
If you follow this process carefully, you will produce a valid and user-friendly checklist. Checklists are most useful in situations where the assessment decision seeks to identify a competent/not competent judgement on performance. Note: Checklists are most useful for low-inference items - where the performance evidence is clearly agreed and there is little disagreement about what constitutes effective or ineffective performance (e.g., observable steps)
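The steps above could be captured in a simple structure like the one sketched below, following the experiment scenario. The criterion statements, mark allocations and helper name (score_checklist) are illustrative assumptions rather than a prescribed format.

```python
# Sketch: a checklist as a list of (criterion statement, marks) entries,
# scored as met / not met. Statements and marks are illustrative only.

checklist = [
    ("A clear, viable hypothesis is described", 4),
    ("The method/procedure is appropriate", 6),
    ("Findings are clearly collated and presented", 5),
]

def score_checklist(checklist, met_flags):
    """Total the marks for every criterion judged as met."""
    return sum(marks for (_, marks), met in zip(checklist, met_flags) if met)

print(score_checklist(checklist, [True, True, False]))  # 10 of 15 marks
```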

68 Table of Specifications
Assessment checklist for Assignment 1: Design and conduct a small experiment to test the Halo Effect
Performance Areas:
1. The context of the experiment is accurately described
2. A clear, viable hypothesis is presented
3. The method/procedure is appropriate
4. There is no infringement on persons
5. Findings are clearly collated and presented
6. Valid inferences and interpretations are drawn from the data and comparison is made with existing data
7. The write-up of the experiment meets required conventions
These are the main operations and processes to be assessed from the item. Note that the performance criteria for each of the seven areas are not provided here. The allocation of marks for each performance area will reflect the weighting allocated in the Table of Specifications

69 Developing a rating scale/scoring rubric
Define the performance areas for an assessment. For example: ‘Valid inferences and interpretations are drawn from the data and comparison is made with existing data’
Identify the key constructs/elements that underpin competence for each performance area. Using the above example: inference and interpretation; comparison and contrast
Write a concise description of performance at a range of levels, from very good to very poor. For example: 5 = very good; 1 = very poor
The rating scale/scoring rubric is an adaptation of the checklist system in that it produces a range of descriptions of performance, typically 5 levels from very poor to very good. This enables levels of qualitative decision making in the assessment process. The following slides show the relationship between a set of learning outcomes, a performance-based learning task/assessment item and these two formats for a marking scheme. Note: Rating Scales/Scoring Rubrics are most useful for high-inference items – where the performance evidence requires considerable professional judgement in making an assessment decision

70 Scoring Rubric: Valid inferences and interpretations are drawn from the data and comparison is made with existing data
Score 5: All valid inferences have been derived from the data. Interpretations are consistently logical given the data obtained. All essential similarities and differences are identified between this data and existing data. The significance of these similarities and differences is fully emphasized.
Score 4: Most of the valid inferences have been derived from the data. Interpretations are mainly logical given the data obtained. Most of the essential similarities and differences are identified between this data and existing data. The main significance of these similarities and differences is emphasized.
Score 3: Some valid inferences have been derived from the data. Some logical interpretations are made from the data obtained. Some essential similarities and differences are identified between this data and existing data. The significance of these similarities and differences is only partly established.
Score 2: Few valid inferences are derived and interpretation of the findings is limited. Comparison and contrast with existing data is partial and the significance is not established.
Score 1: Failure to make valid inferences and interpretations.
This is an example of a scoring rubric for the area ‘Valid inferences and interpretations are drawn from the data and comparison is made with existing data’. It is important to remember that the rubric does not make the assessment decision; it is a guide for focusing the professional judgement of the assessor.
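One possible way to encode such a rubric is a mapping from score level to descriptor, together with the weighting of the performance area. The abbreviated descriptors, the 20% weight and the 25-mark total below are illustrative assumptions, not the full wording or marks from the slide, and the final level awarded remains a professional judgement.

```python
# Sketch: a scoring rubric for one performance area, encoded as a mapping
# from score level (5 = very good ... 1 = very poor) to a short descriptor.
# Descriptors are abbreviated and the 20% weight / 25-mark total are invented.

rubric = {
    5: "All valid inferences drawn; thorough comparison with existing data",
    4: "Most valid inferences drawn; main comparisons made",
    3: "Some valid inferences; partial comparison",
    2: "Few valid inferences; limited comparison",
    1: "No valid inferences or interpretations",
}

area_weight = 0.20          # share of total marks for this performance area
max_marks = 25              # marks available for the whole assignment

def area_marks(level, rubric, weight, total):
    """Convert an awarded rubric level into marks for this performance area."""
    assert level in rubric
    return (level / max(rubric)) * weight * total

print(f"Level 4 earns {area_marks(4, rubric, area_weight, max_marks):.1f} marks")
```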

71 Key checks for your scoring system
Irrespective of the format chosen, the system must:
incorporate the most important performance criteria that underpin competence in each performance area identified
allocate marks that reflect the cognitive activities and skills which the assessment activity requires the learner to demonstrate
make adequate provision for acceptable alternative answers
be sufficiently broken down and organized to allow the marking to be as objective as possible
This slide highlights the important criteria to bear in mind in constructing your marking scheme. Note that these criteria apply to constructing any marking scheme – not solely in relation to assessing thinking.

72 What grading and recording system is to be employed?
Unless you are using a pass-fail system, it is likely that marks from various assessments will need to be collated and translated into a final mark and/or grade. Furthermore, summative grades need to be carefully recorded and secured. Ensure that you carefully follow the grading and recording system employed by the institution/department.

73 Pitfalls in Assessment
This final set of slides focuses on some common pitfalls in conducting assessment – be careful to avoid them:
The Halo Effect
The Contrast Effect
Assessing effort and progress rather than achievement
Lack of clarity with the marking scheme
Discriminatory practices

74 The Halo Effect This is where our existing conceptions of a learner’s work affect subsequent marking. For example, if we are used to a high standard of work from a student, we may develop a tendency to over-mark future poorer work. The converse is also true in the case of students who are generally perceived as less able.

75 The Contrast Effect This arises when the outcomes of an assessment are affected by comparing a particular learner with the preceding one, whether the work is good or bad. For example, if we have just assessed several weak assignments and are then presented with a quite well presented one, there is a danger of giving it more marks than it perhaps really deserves.

76 Assessing effort and progress rather than achievement
This occurs when an assessor is distracted by the efforts and progress a learner has made, rather than focusing on the actual attainments in relation to the learning outcomes and assessment criteria. Remember, you must assess in relation to the performance areas identified in the marking scheme. Only if there is an allocation of marks for effort and progress can you then give marks accordingly.

77 Lack of clarity with the marking scheme
This is a common problem, resulting from not being sure about what to assess and the allocation of appropriate marks to parts of the assessment activity. Ensure that you are familiar with the learning outcomes, the marking scheme and what constitutes appropriate standards for different levels of performance.

78 Discriminatory practices
“I really think that people who are not purple and lack antennae cannot master engineering” – A Lien, 4007. In assessment, as in other situations, this occurs when the assessor discriminates – either positively or negatively – in relation to a learner because of race, gender, creed, sexual preference or special needs. Care needs to be taken to ensure that learners receive fair and equal opportunities during their assessments.

