VALIDITY, RELIABILITY & PRACTICALITY Prof. Rosynella Cardozo Prof. Jonathan Magdalena

QUALITIES OF MEASUREMENT DEVICES
- Validity: Does it measure what it is supposed to measure?
- Reliability: How consistent is the measurement?
- Practicality: Is it easy to construct, administer, score and interpret?
- Backwash: What is the impact of the test on the teaching/learning process?

VALIDITY The term validity refers to whether or not a test measures what it intends to measure. On a test with high validity the items will be closely linked to the test's intended focus. For many certification and licensure tests this means that the items will be highly related to a specific job or occupation. If a test has poor validity then it does not measure the job-related content and competencies it ought to. There are several ways to estimate the validity of a test, including content validity, construct validity, criterion-related validity (concurrent & predictive) and face validity.

VALIDITY
- "Content": related to objectives and their sampling.
- "Construct": referring to the theory underlying the target.
- "Criterion": related to concrete criteria in the real world. It can be concurrent or predictive.
- "Concurrent": correlating highly with another measure already validated.
- "Predictive": capable of anticipating some later measure.
- "Face": related to the test's overall appearance.

1. CONTENT VALIDITY Content validity refers to the connection between the test items and the subject-related tasks. The test should evaluate only content related to the field of study, sampled in a manner that is sufficiently representative, relevant, and comprehensible.

2. CONSTRUCT VALIDITY It implies using the construct correctly (concepts, ideas, notions). Construct validity seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a test of intelligence nowadays must include measures of multiple intelligences, rather than just logical-mathematical and linguistic ability measures.

3. CRITERION-RELATED VALIDITY Also referred to as instrumental validity, it states that the criteria should be clearly defined by the teacher in advance. It has to take into account other teachers' criteria to be standardized, and it also needs to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure that has already been demonstrated to be valid.

4. CONCURRENT VALIDITY Concurrent validity is a statistical method using correlation, rather than a logical method. Examinees who are known to be either masters or non-masters of the content measured by the test are identified before the test is administered. Once the tests have been scored, the relationship between the examinees' known status as masters or non-masters and their performance on the test (i.e., pass or fail) is estimated. This type of validity provides evidence that the test is classifying examinees correctly. The stronger the correlation, the greater the concurrent validity of the test.
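As a rough illustration of how such a correlation might be estimated, here is a minimal sketch in Python with invented data. For two binary variables (known master status and pass/fail outcome), the Pearson correlation reduces to the phi coefficient, which can be computed from a 2x2 contingency table:

```python
import math

# Hypothetical data: 1 = known master / passed, 0 = non-master / failed.
status = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]   # known status, identified before testing
result = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # pass/fail decision from the test

# Build the 2x2 contingency table.
a = sum(1 for s, r in zip(status, result) if s == 1 and r == 1)  # master & pass
b = sum(1 for s, r in zip(status, result) if s == 1 and r == 0)  # master & fail
c = sum(1 for s, r in zip(status, result) if s == 0 and r == 1)  # non-master & pass
d = sum(1 for s, r in zip(status, result) if s == 0 and r == 0)  # non-master & fail

# Phi coefficient: the Pearson correlation for two binary variables.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(f"phi = {phi:.2f}")  # closer to 1.0 = stronger concurrent-validity evidence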

5. PREDICTIVE VALIDITY This is another statistical approach to validity that estimates the relationship of test scores to an examinee's future performance as a master or non-master. Predictive validity considers the question, "How well does the test predict examinees' future status as masters or non-masters?" For this type of validity, the correlation that is computed is based on the test results and the examinees' later performance. This type of validity is especially useful for test purposes such as selection or admissions.
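A minimal sketch of that computation, assuming SciPy is available; the admission scores and later performance figures below are invented for illustration:

```python
from scipy.stats import pearsonr

# Hypothetical data: admission-test scores and a later outcome measure
# (e.g., first-year grade average) for the same ten examinees.
test_scores = [52, 61, 58, 75, 80, 67, 90, 45, 70, 85]
later_perf = [2.1, 2.6, 2.4, 3.0, 3.3, 2.7, 3.8, 1.9, 2.9, 3.5]

r, p = pearsonr(test_scores, later_perf)
print(f"predictive validity coefficient r = {r:.2f} (p = {p:.3f})")
```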

6. FACE VALIDITY Like content validity, face validity is determined by a review of the items and not through the use of statistical analyses. Unlike content validity, face validity is not investigated through formal procedures. Instead, anyone who looks over the test, including examinees, may develop an informal opinion as to whether or not the test is measuring what it is supposed to measure. While it is clearly of some value to have the test appear to be valid, face validity alone is insufficient for establishing that the test is measuring what it claims to measure.

QUALITIES OF MEASUREMENT DEVICES
- Validity: Does it measure what it is supposed to measure?
- Reliability: How consistent is the measurement?
- Practicality: Is it easy to construct, administer, score and interpret?
- Backwash: What is the impact of the test on the teaching/learning process?

RELIABILITY Reliability is the extent to which an experiment, test, or any measuring procedure shows the same result on repeated trials. Without the agreement of independent observers able to replicate research procedures, or the ability to use research tools and procedures that produce consistent measurements, researchers would be unable to satisfactorily draw conclusions, formulate theories, or make claims about the generalizability of their research. For researchers, five key types of reliability are:

RELIABILITY
- "Equivalency": related to the co-occurrence of two items
- "Stability": related to consistency over time
- "Internal": related to the consistency of the instrument itself
- "Inter-rater": related to agreement across examiners
- "Intra-rater": related to a single examiner's consistency across occasions

1. EQUIVALENCY RELIABILITY Equivalency reliability is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency reliability is determined by relating two sets of test scores to one another to highlight the degree of relationship or association. For example, a researcher studying university English students happened to notice that when some students were studying for finals, they got sick. Intrigued by this, the researcher attempted to observe how often, or to what degree, these two behaviors co-occurred throughout the academic year. The researcher used the results of the observations to assess the correlation between "studying throughout the academic year" and "getting sick". The researcher concluded there was poor equivalency reliability between the two actions. In other words, studying was not a reliable predictor of getting sick.
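To make "relating two sets of test scores to one another" concrete, here is a small self-contained sketch of the Pearson product-moment correlation, using invented scores for the same students on two hypothetical parallel forms of a test:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: eight students sit two parallel forms of the same test.
form_a = [14, 18, 11, 20, 16, 9, 17, 13]
form_b = [15, 17, 12, 19, 15, 10, 18, 12]

print(f"equivalency estimate r = {pearson_r(form_a, form_b):.2f}")
```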

2. STABILITY RELIABILITY Stability reliability (sometimes called test-retest reliability) is the agreement of measuring instruments over time. To determine stability, a measure or test is repeated on the same subjects at a future date. Results are compared and correlated with the initial test to give a measure of stability. This method of evaluating reliability is appropriate only if the phenomenon that the test measures is known to be stable over the interval between assessments. The possibility of practice effects should also be taken into account.
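A short sketch of the test-retest computation with invented scores, using NumPy's correlation matrix; the off-diagonal entry is the stability coefficient:

```python
import numpy as np

# Hypothetical data: the same test given to eight students two weeks apart.
test = np.array([55, 62, 70, 48, 81, 66, 73, 59])
retest = np.array([58, 60, 72, 50, 79, 68, 70, 61])

# np.corrcoef returns a 2x2 correlation matrix; [0, 1] is the test-retest r.
r = np.corrcoef(test, retest)[0, 1]
print(f"stability (test-retest) r = {r:.2f}")
```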

3. INTERNAL CONSISTENCY Internal consistency is the extent to which tests or procedures assess the same characteristic, skill or quality. It is a measure of the precision between the measuring instruments used in a study. This type of reliability often helps researchers interpret data and predict the value of scores and the limits of the relationship among variables. For example, analyzing the internal reliability of the items on a vocabulary quiz will reveal the extent to which the quiz focuses on the examinee’s knowledge of words.
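One standard statistic for internal consistency is Cronbach's alpha; the slide does not name a particular coefficient, so treat this as an illustrative choice rather than the deck's prescribed method. The item matrix below is invented:

```python
import numpy as np

# Hypothetical data: five examinees' scores on a four-item vocabulary quiz
# (rows = examinees, columns = items; 1 = correct, 0 = incorrect).
items = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])

k = items.shape[1]                          # number of items
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of examinees' total scores

# Cronbach's alpha: high values mean the items vary together,
# i.e., they appear to assess the same characteristic.
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```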

4. INTER-RATER RELIABILITY Inter-rater reliability is the extent to which two or more individuals (coders or raters) agree. It assesses the consistency of how a measuring system is implemented. For example, suppose two or more teachers use the same rating scale to rate students' oral responses in an interview (1 being most negative, 5 being most positive). If one rater gives a "1" to a student response while another gives a "5," the inter-rater reliability would clearly be poor. Inter-rater reliability depends on the ability of two or more individuals to be consistent. Training, education and monitoring skills can enhance inter-rater reliability.
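One common way to quantify such agreement, beyond raw percent agreement, is Cohen's kappa, which corrects for the agreement two raters would reach by chance. This is an illustrative sketch with invented ratings, not a statistic the slide itself prescribes:

```python
from collections import Counter

# Hypothetical data: two teachers rate the same ten oral responses on a 1-5 scale.
rater1 = [5, 4, 4, 2, 3, 5, 1, 2, 4, 3]
rater2 = [5, 4, 3, 2, 3, 4, 1, 2, 4, 3]

n = len(rater1)
observed = sum(1 for a, b in zip(rater1, rater2) if a == b) / n

# Expected chance agreement, from each rater's marginal rating frequencies.
c1, c2 = Counter(rater1), Counter(rater2)
expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2

# Kappa = 1 means perfect agreement; 0 means no better than chance.
kappa = (observed - expected) / (1 - expected)
print(f"agreement = {observed:.0%}, Cohen's kappa = {kappa:.2f}")
```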

5. INTRA-RATER RELIABILITY Intra-rater reliability is a type of reliability assessment in which the same assessment is completed by the same rater on two or more occasions. These different ratings are then compared, generally by means of correlation. Since the same individual completes both assessments, the rater's later ratings may be contaminated by memory of the earlier ones, which should be taken into account when interpreting the correlation.

SOURCES OF ERROR
- Examinee (is a human being)
- Examiner (is a human being)
- Examination (is designed by and for human beings)

RELATIONSHIP BETWEEN VALIDITY & RELIABILITY Validity and reliability are closely related. A test cannot be considered valid unless the measurements resulting from it are reliable. However, results from a test can be reliable without necessarily being valid.

QUALITIES OF MEASUREMENT DEVICES
- Validity: Does it measure what it is supposed to measure?
- Reliability: How consistent is the measurement?
- Practicality: Is it easy to construct, administer, score and interpret?
- Backwash: What is the impact of the test on the teaching/learning process?

PRACTICALITY It refers to the economy of time, effort and money in testing. In other words, a test should be…
- Easy to design
- Easy to administer
- Easy to mark
- Easy to interpret (the results)

QUALITIES OF MEASUREMENT DEVICES
- Validity: Does it measure what it is supposed to measure?
- Reliability: How consistent is the measurement?
- Practicality: Is it easy to construct, administer, score and interpret?
- Backwash: What is the impact of the test on the teaching/learning process?

BACKWASH EFFECT Backwash effect (also known as washback) is the influence of testing on teaching and learning. It is also the potential impact that the form and content of a test may have on learners' conception of what is being assessed (language proficiency) and what it involves. Therefore, test designers, deliverers and raters have a particular responsibility, considering that the testing process may have a substantial impact, either positive or negative.

LEVELS OF BACKWASH It is believed that backwash is a subset of a test's impact on society, educational systems and individuals. Thus, test impact operates at two levels:
- The micro level (the effect of the test on individual students and teachers)
- The macro level (the impact of the test on society and the educational system)
Bachman and Palmer (1996)

THANKS