CH. 9 MEASUREMENT: SCALING, RELIABILITY, VALIDITY


1 CH. 9 MEASUREMENT: SCALING, RELIABILITY, VALIDITY

2 Scaling
Scaling is a procedure for the assignment of numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the properties in question.

3 Methods of Scaling
Rating scales: Have several response categories and are used to elicit responses with regard to the object, event, or person studied.
Ranking scales: Make comparisons between or among objects, events, or persons and elicit the preferred choices and ranking among them.

4 Rating Scales
Dichotomous scale: Is used to elicit a Yes or No answer.
Nominal scale

5 Dichotomous Scale
Do you own a car? Yes / No
A dichotomous scale offers two mutually exclusive response choices. In the example shown in the slide, the response choices are yes and no, but other pairs, such as agree and disagree, could be used instead.

6 Rating Scales (Cont’d)
Category scale: Uses multiple items to elicit a single response.
Nominal scale

7 Category Scale
Where in northern California do you reside?
North Bay
South Bay
East Bay
Peninsula
Other (specify: _____________)
When there are multiple options for the rater but only one answer is sought, the multiple-choice, single-response scale is appropriate. The "Other" response may be omitted when exhaustiveness of categories is not critical or when no other response is possible. This scale produces nominal data.

8 Rating Scales (Cont’d)
Likert scale: Is designed to examine how strongly subjects agree or disagree with statements on a 5-point scale.
Interval scale

9 Likert Scale
My work is very interesting.
Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree
The Likert scale was developed by Rensis Likert and is the most frequently used variation of the summated rating scale. Summated rating scales consist of statements that express either a favorable or unfavorable attitude toward the object of interest. The participant is asked to agree or disagree with each statement. Each response is given a numerical score to reflect its degree of attitudinal favorableness, and the scores may be summed to measure the participant’s overall attitude. Likert scales may use 5, 7, or 9 scale points. They are quick and easy to construct. The scale produces interval data.
Originally, creating a Likert scale involved a procedure known as item analysis. Item analysis assesses each item based on how well it discriminates between those people whose total score is high and those whose total score is low. It involves calculating the mean scores for each scale item among the low scorers and the high scorers. The difference between the high-score and low-score group means is then tested for statistical significance by computing t values. Once the t values for each statement are found, the statements are rank-ordered, and those with the highest t values are selected. Researchers have found that a larger number of items for each attitude object improves the reliability of the scale.
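The summation and item-analysis steps described on this slide can be sketched roughly as follows. The responses are hypothetical, coded 1 (strongly disagree) to 5 (strongly agree), and the split into high and low scorers by halves is an assumption. The slide only says that t values are computed, so the use of Welch's form of the t statistic is also an assumption.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: one row per participant, one column per statement,
# each response coded 1 (strongly disagree) to 5 (strongly agree).
responses = [
    [5, 4, 5, 3],
    [4, 5, 4, 2],
    [5, 5, 4, 4],
    [2, 1, 2, 3],
    [1, 2, 1, 4],
    [2, 2, 1, 2],
]

def summated_scores(rows):
    """Sum each participant's item scores into one overall attitude score."""
    return [sum(row) for row in rows]

def welch_t(high, low):
    """t value for the difference between two group means (Welch's form)."""
    standard_error = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / standard_error

def item_t_values(rows):
    """Split participants into low and high total scorers, then test each item."""
    ranked = sorted(rows, key=sum)          # lowest total score first
    half = len(ranked) // 2
    low_group, high_group = ranked[:half], ranked[-half:]
    return [
        welch_t([row[i] for row in high_group], [row[i] for row in low_group])
        for i in range(len(rows[0]))
    ]

totals = summated_scores(responses)
t_values = item_t_values(responses)
# Rank items from most to least discriminating (index 0 is the first statement).
items_by_discrimination = sorted(range(len(t_values)), key=lambda i: -t_values[i])
```

In this made-up data the fourth statement has the same mean among high and low scorers, so its t value is zero and it would be the first candidate to drop.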

10 Rating Scales (Cont’d)
Semantic differential scale: Several bipolar attributes are identified at the extremes of the scale, and respondents are asked to indicate their attitudes.
Interval scale

11 Semantic Differential
The semantic differential scale measures the psychological meanings of an attitude object using bipolar adjectives. Researchers use this scale for studies of brand and institutional image. The method consists of a set of bipolar rating scales, usually with 7 points, by which one or more participants rate one or more concepts on each scale item. The scale is based on the proposition that an object can have several dimensions of connotative meaning. The meanings are located in a multidimensional property space, called semantic space.
The scale is an efficient and easy way to secure attitudes from a large sample. Attitudes may be measured in both direction and intensity. The total set of responses provides a comprehensive picture of the meaning of an object and a measure of the person doing the rating. The scale is standardized and produces interval data. Exhibit 13-6 provides basic instructions for constructing an SD scale.

12 Rating Scales (Cont’d)
Numerical scale: Similar to the semantic differential scale, with the difference that numbers on a 5-point or 7-point scale are provided, with bipolar adjectives at both ends.
Interval scale

13 Numerical Scale
How pleased are you with your new real estate agent?
Extremely Pleased [numbered scale points] Extremely Displeased

14 Rating Scales (Cont’d)
Itemized rating scale: A 5-point or 7-point scale with anchors, as needed, is provided for each item, and the respondent states the appropriate number beside each item or circles the relevant number against each item.
Interval scale

15 Itemized Rating Scale
Very Unlikely / Unlikely / Neither Unlikely Nor Likely / Likely / Very Likely
1. I will be changing my job within the next 12 months.

16 Rating Scales (Cont’d)
Fixed or constant sum scale: Respondents are asked to distribute a given number of points across various items.
Ordinal scale

17 Fixed or Constant-Sum Scales
The constant-sum scale helps researchers to discover proportions. The participant allocates points to more than one attribute or property indicant, such that they total a constant sum, usually 100 or 10. Participant precision and patience suffer when too many stimuli are proportioned and summed. A participant’s ability to add may also be taxed. The scale's advantages are its compatibility with percentages and the fact that alternatives perceived to be equal can be scored as equal. This scale produces interval data.
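A minimal sketch of how a constant-sum response might be checked and turned into proportions. The attribute names, the point values, and the function name `constant_sum_proportions` are all hypothetical; only the idea of points totalling a constant such as 100 comes from the slide.

```python
def constant_sum_proportions(allocation, total=100):
    """Check that the points add up to the constant sum, then return proportions."""
    points = sum(allocation.values())
    if points != total:
        raise ValueError(f"points sum to {points}, expected {total}")
    return {attribute: value / total for attribute, value in allocation.items()}

# Hypothetical allocation of 100 points across three attributes.
allocation = {"price": 50, "quality": 30, "service": 20}
proportions = constant_sum_proportions(allocation)
```

Rejecting responses that do not add up to the constant is one way to handle the arithmetic slips the slide warns about; rescaling them instead would be another design choice.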

18 Rating Scales (Cont’d)
Stapel scale: This scale simultaneously measures both the direction and intensity of the attitude toward the items under study.
Interval data

19 Stapel Scales
The Stapel scale is used as an alternative to the semantic differential, especially when it is difficult to find bipolar adjectives that match the investigative question. In the example, there are three attributes of corporate image. The scale is composed of the word identifying the image dimension and a set of 10 response categories for each of the three attributes. Stapel scales produce interval data.

20 Rating Scales (Cont’d)
Graphic rating scale: A graphical representation helps respondents indicate their answer to a particular question by placing a mark at the appropriate point on the line.
Ordinal scale

21 Graphic Rating Scales
The graphic rating scale was originally created to enable researchers to discern fine differences. Theoretically, an infinite number of ratings is possible if participants are sophisticated enough to differentiate and record them. They are instructed to mark their response at any point along a continuum. Usually, the score is a measure of length from either endpoint. The results are treated as interval data. The difficulty is in coding and analysis. Other graphic rating scales use pictures, icons, or other visuals to communicate with the rater and represent a variety of data types. Graphic scales are often used with children.

22 Ranking Scales
Paired comparison: Used when, among a small number of objects, respondents are asked to choose between two objects at a time.

23 Paired-Comparison Scale
Using the paired-comparison scale, the participant can express attitudes unambiguously by choosing between two objects. The number of judgments required in a paired comparison is n(n - 1)/2, where n is the number of stimuli or objects to be judged. Paired comparisons run the risk that participants will tire to the point that they give ill-considered answers or refuse to continue. Paired comparisons provide ordinal data.
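The n(n - 1)/2 count can be checked with a short sketch, which also shows one simple way of turning paired choices into an ordinal ranking by counting wins. The brand labels, the `utility` table standing in for a participant's preferences, and the win-count ranking rule are assumptions for illustration, not part of the slide.

```python
from itertools import combinations

def n_paired_judgments(n):
    """Number of judgments needed to compare n objects two at a time."""
    return n * (n - 1) // 2

def rank_from_pairs(objects, prefer):
    """Rank objects by how many paired comparisons each one wins."""
    wins = {obj: 0 for obj in objects}
    for a, b in combinations(objects, 2):
        wins[prefer(a, b)] += 1
    return sorted(objects, key=lambda obj: wins[obj], reverse=True)

# Hypothetical participant: a hidden preference strength for each brand.
brands = ["A", "B", "C", "D", "E"]
utility = {"A": 3, "B": 5, "C": 1, "D": 4, "E": 2}

def prefer(a, b):
    """Return whichever of the two brands this participant likes more."""
    return a if utility[a] > utility[b] else b

judgments_needed = n_paired_judgments(len(brands))   # 10 for five brands
ranking = rank_from_pairs(brands, prefer)            # ['B', 'D', 'A', 'E', 'C']
```

The count grows quadratically (20 objects already need 190 judgments), which is why the method is reserved for a small number of objects.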

24 Ranking Scales (Cont’d)
Forced choice: Enables respondents to rank objects relative to one another, among the alternatives provided.

25 Forced Choice
The forced ranking scale lists attributes that are ranked relative to each other. This method is faster than paired comparisons and is usually easier and more motivating to the participant. With five items, it takes ten paired comparisons to complete the task, but the simple forced ranking of five is easier. A drawback of this scale is the limited number of stimuli that can be handled by the participant. This scale produces ordinal data.

26 Ranking Scales (Cont’d)
Comparative scale: Provides a benchmark or a point of reference to assess attitudes toward the current object, event, or situation under study.

27 Comparative Scale
When using a comparative scale, the participant compares an object against a standard. The comparative scale is ideal for such comparisons if the participants are familiar with the standard. Some researchers treat the data produced by comparative scales as interval data, since the scoring reflects an interval between the standard and what is being compared, but the text recommends treating the data as ordinal unless the linearity of the variables in question can be supported.

28 Goodness of Measures
Reliability: Indicates the extent to which a measure is without bias (error free) and hence ensures consistent measurement across time and across the various items in the instrument.

29 Reliability
Stability of measures:
Test-retest reliability
Parallel-form reliability
Correlation
Internal consistency of measures:
Interitem consistency reliability
Cronbach’s alpha
Split-half reliability
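As a rough illustration of the internal-consistency entry, Cronbach's alpha can be computed from the standard formula alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response data below are hypothetical, and the slide itself does not give the formula, so treat this as a sketch rather than the deck's own procedure.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])                                   # number of items
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: five respondents, three items on a 5-point scale.
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [2, 2, 1],
    [5, 4, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(scores)
```

Values closer to 1 indicate that the items move together. Sample variances are used throughout, and the same divisor must be used for the item and total variances so that it cancels in the ratio.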

30 Goodness of Measures (Cont’d)
Validity: Ensures the ability of a scale to measure the intended concept.
Content validity
Criterion-related validity
Construct validity

31 Validity
Content validity: Ensures that the measure includes an adequate and representative set of items that tap the concept.
A panel of judges

32 Validity (Cont’d)
Criterion-related validity: Is established when the measure differentiates individuals on a criterion it is expected to predict.
Concurrent validity: established when the scale differentiates individuals who are known to be different.
Predictive validity: indicates the ability of the measuring instrument to differentiate among individuals with reference to a future criterion.
Correlation

33 Validity (Cont’d)
Construct validity: Testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed.
Convergent validity: established when the scores obtained with two different instruments measuring the same concept are highly correlated.
Discriminant validity: established when, based on theory, two variables are predicted to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
Correlation, factor analysis, convergent-discriminant techniques, multitrait-multimethod analysis
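The correlation check behind convergent and discriminant validity can be sketched as below. All three score lists are hypothetical, as are the variable names; Pearson's r is used as the correlation measure, which the slide does not specify.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same six respondents.
instrument_a = [13, 10, 5, 14, 5, 9]      # first measure of the concept
instrument_b = [12, 11, 6, 13, 5, 10]     # different instrument, same concept
unrelated_trait = [4, 7, 6, 5, 3, 5]      # theory predicts no relationship

convergent_r = pearson_r(instrument_a, instrument_b)       # expected to be high
discriminant_r = pearson_r(instrument_a, unrelated_trait)  # expected near zero
```

A high `convergent_r` together with a near-zero `discriminant_r` is the pattern the slide describes; how high or low is enough remains a judgment call that this sketch does not settle.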

34 Understanding Validity and Reliability
Exhibit 12-6 illustrates reliability and validity by using an archer’s bow and target as an analogy. High reliability means that repeated arrows shot from the same bow would hit the target in essentially the same place. If we had a bow with high validity as well, then every arrow would hit the bull’s eye. If reliability is low, arrows would be more scattered. High validity means that the bow would shoot true every time. It would not pull right or send an arrow careening into the woods. Arrows shot from a high-validity bow will be clustered around a central point even when they are dispersed by reduced reliability.
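One way to make the archery analogy concrete is to treat each arrow as a repeated measurement of a known true value, reading validity as how far the average lands from the target and reliability as how tightly the shots cluster. That statistical reading, the target value, and the three sets of shots are assumptions made for illustration only.

```python
from statistics import mean, stdev

TRUE_VALUE = 50.0  # hypothetical bull's-eye

def describe(shots, target=TRUE_VALUE):
    """Offset from the target (validity) and scatter of the shots (reliability)."""
    return {"offset": mean(shots) - target, "spread": stdev(shots)}

# Hypothetical repeated measurements from three instruments.
reliable_and_valid = describe([49.8, 50.1, 50.0, 50.2, 49.9])   # tight, on target
reliable_not_valid = describe([57.8, 58.1, 58.0, 58.2, 57.9])   # tight, off target
valid_not_reliable = describe([42.0, 58.0, 50.0, 45.0, 55.0])   # scattered, centred
```

The second instrument is just as consistent as the first but pulls steadily to one side, while the third is centred on the target yet scattered, matching the slide's point that a high-validity bow clusters around the central point even when reliability is reduced.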

