SOURCES OF ERROR IN ………. PHOTO: smno.kampus.ub.febr2012.


1 SOURCES OF ERROR IN ………. PHOTO: smno.kampus.ub.febr2012

2 SOURCES OF ERROR
Social desirability – Giving politically correct answers
Response sets – All "yes" or all "no" responses
Acquiescence – Telling you what you want to hear
Personal bias – The respondent wants to send a message
Response order
– Primacy: the initial choices are remembered better, and a respondent may stop reading once s/he reaches a response s/he likes
– Recency: the choices presented last are remembered better and so are favored
– Fatigue
Item order
– Answers to later items may be affected by earlier items (put simple, factual items first)
– Respondent may not know how to answer earlier questions
Downloaded from: ………….. 23/8/2012
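
Response sets (all "yes" or all "no") can be screened for before analysis. The sketch below is only an illustration: the function name `is_response_set` and the sample respondents are hypothetical, and a flagged respondent is a candidate for review, not proof of careless answering.

```python
def is_response_set(answers):
    """Return True when every answered item carries the same value."""
    given = [a for a in answers if a is not None]  # skip unanswered items
    # A single answered item cannot form a pattern, so require at least two.
    return len(given) > 1 and len(set(given)) == 1


# Hypothetical yes/no answers from three respondents (None = item skipped).
respondents = {
    "r1": ["yes", "yes", "yes", "yes"],
    "r2": ["yes", "no", "yes", "no"],
    "r3": ["no", None, "no", "no"],
}

flagged = [rid for rid, answers in respondents.items() if is_response_set(answers)]
print(flagged)  # ['r1', 'r3']
```

Mixing positively and negatively worded items in the questionnaire makes such a check more informative, because a uniform pattern then contradicts itself.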

3 EVALUATING THE INSTRUMENT
Three issues to consider:
– Validity: Does the instrument measure what it is supposed to measure?
– Reliability: Does it consistently repeat the same measurement?
– Practicality: Is this a practical instrument?
Source: Dr. Ir. Pudji Muljono, M.Si. Presented at the Lokakarya Peningkatan Suasana Akademik (Workshop on Improving the Academic Atmosphere), Jurusan Ekonomi FIS-UNJ, 5 to 9 August 2002.
Downloaded from: https://docs.google.com/viewer?a=v&q=cache:k1SsN7H88fAJ:repository.ipb.ac.id/bitstream/handle/

4 EVALUATING THE INSTRUMENT
Concept validation through a panel
1. Examine the instrument, from the construct through to the writing of the items. Points to check include:
   1. Are the dimensions an accurate elaboration of the construct as defined, and suitable for measuring the construct of the variable to be measured?
   2. Are the indicators an accurate elaboration of the dimensions as defined, and suitable for measuring the construct of the variable to be measured?
   3. Are the instrument items suitable for measuring the indicators of the variable to be measured?
2. Rate the items. The finished items are given to a panel to be rated, still against the benchmarks above. Items can be rated in several ways, for example with the Thurstone method and pair comparison.
Source: Dr. Ir. Pudji Muljono, M.Si. Presented at the Lokakarya Peningkatan Suasana Akademik (Workshop on Improving the Academic Atmosphere), Jurusan Ekonomi FIS-UNJ, 5 to 9 August 2002.
Downloaded from: https://docs.google.com/viewer?a=v&q=cache:k1SsN7H88fAJ:repository.ipb.ac.id/bitstream/handle/

5 TYPES OF VALIDITY
Face validity
– Does the instrument, on its face, appear to measure what it is supposed to measure?
Content validity
– Degree to which the content of the items adequately represents the universe of all relevant items under study
– Generally arrived at through a panel of experts

6 TYPES OF VALIDITY
Content validity
– “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA/APA/NCME, 1999).
– Content validity refers to the degree to which the content of the items reflects the content domain of interest (APA, 1954).
– Content validity addresses the adequacy and representativeness of the items to the domain of testing purposes.
– Content validity is not usually quantified, possibly due to: 1) subsuming it within construct validity; 2) ignoring it as important; and/or 3) relying on accepted expert agreement procedures.
Downloaded from: plaza.ufl.edu/.../CONTENT%20VALIDITY.p... 25/8/2012

7 TYPES OF VALIDITY
Criterion-related validity
– Degree to which the predictor is adequate in capturing the relevant aspects of the criterion
– Uses correlation analysis
– Concurrent validity: criterion data are available at the same time as the predictor score; requires a high correlation between the two
– Predictive validity: the criterion is measured after the passage of time
– Retrospective look at the validity of the measurement
– Known-groups
Downloaded from: ………….. 23/8/2012
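
Since criterion-related validity rests on correlation analysis, a minimal sketch of the computation may help. It assumes the Pearson product-moment coefficient is the correlation meant; the slide does not name one. The function `pearson_r` and both score lists are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: instrument (predictor) scores and a criterion
# measured for the same six respondents.
predictor = [52, 61, 70, 45, 66, 58]
criterion = [3.1, 3.4, 3.9, 2.8, 3.6, 3.2]

r = pearson_r(predictor, criterion)
print(round(r, 3))
```

For concurrent validity both lists would be collected at the same time; for predictive validity the criterion list would be collected later. The calculation is the same in both cases.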

8 TYPES OF RELIABILITY
Stability
– Test-retest: the same test is administered twice to the same subjects over a short interval (3 weeks to 6 months)
– Look for a high correlation between the test and the retest
– Situational factors must be minimized
Equivalence
– Degree to which alternative forms of the same measure produce the same or similar results
– Give parallel forms of the same test to the same group, with a short delay to avoid fatigue
– Look for a high correlation between the scores of the two forms of the test
– Inter-rater reliability
Internal consistency
– Degree to which instrument items are homogeneous and reflect the same underlying constructs
– Split-half testing, where the test is split into two halves that contain the same types of questions
– Uses Cronbach’s alpha to determine internal consistency; only one administration of the test is required
– Kuder-Richardson (KR-20) for items with right and wrong answers
Downloaded from: ………….. 23/8/2012
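
The slide names Cronbach's alpha without giving its formula. The sketch below assumes the usual textbook form, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the sample scores are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per respondent, each row holding k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical answers of five respondents to three Likert items.
scores = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 5],
]

alpha = cronbach_alpha(scores)
print(round(alpha, 4))  # 0.8857
```

Population variances are used for both the items and the totals, so the ratio is the same as it would be with sample variances. With items scored 0/1 the item variance equals p × q, so the same function returns the KR-20 value.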

9 PRACTICALITY
Is the survey economical?
– Cost of producing and administering the survey
– Time requirement
– Common sense!
Convenience
– Adequacy of instructions
– Easy to administer
Can the measurement be interpreted by others?
– Scoring keys
– Evidence of validity and reliability
– Established norms
Downloaded from: ………….. 23/8/2012
A comparison of Likert scale and traditional measures of self-efficacy. By Maurer, Todd J.; Pierce, Heather R. Journal of Applied Psychology, Vol 83(2), Apr 1998.
This study addressed whether a Likert-type measurement format can be used as an alternative to the traditional format for measuring self-efficacy. Classical reliability, observed correlations with relevant criteria, and confirmatory factor analyses were used to assess the similarity of the two formats in a sample of 128 college students. The results indicated that Likert-type and traditional measures of self-efficacy have similar reliability–error variance, provide equivalent levels of prediction, and have similar factor structure and similar discriminability. Overall, considering both practicality and the apparent similarity of empirical results from the two methods, a Likert scale seems to offer an acceptable alternative method of measuring self-efficacy. Limitations and suggestions for future research are discussed.

10 Development of a Multi-item Scale
1. Develop Theory
2. Generate Initial Pool of Items: Theory, Secondary Data, and Qualitative Research
3. Select a Reduced Set of Items Based on Qualitative Judgement
4. Collect Data from a Large Pretest Sample
5. Statistical Analysis
6. Develop Purified Scale
7. Collect More Data from a Different Sample
8. Evaluate Scale Reliability, Validity, and Generalizability
9. Final Scale
Downloaded from: ………….. 23/8/2012

11 SCALE EVALUATION
Reliability
– Test/Retest
– Alternative Forms
– Internal Consistency
Validity
– Content
– Criterion
– Construct: Convergent, Discriminant, Nomological
Generalizability
Downloaded from: ………….. 23/8/2012

12 Transforming ordinal data to interval with the Method of Successive Interval (MSI)
Downloaded from: ………….. 24/8/2012
Before ordinal data, usually obtained with a Likert scale and the like (questionnaire scores), can be used in regression analysis, they must first be transformed into interval data. One way to do this is the Method of Successive Interval (MSI). At first glance it looks quite difficult, because we have to tabulate frequencies, then determine proportions, build cumulative proportions, and so on.
The steps of the Method of Successive Interval (MSI) are:
1. Tabulate the frequency of each response option for each question.
2. Compute proportions by dividing the frequency of each response option by the total number of respondents.
3. Compute the cumulative proportions.
4. Determine the z-value for each response option from the cumulative proportions obtained, with the help of the z table.
5. Compute the scale value with the formula: [formula missing in the source]
6. Convert the scale values (penyetaraan nilai skala). The converted values are what is called the interval scale, and they can be used in regression analysis.
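
The six steps can be sketched in code. The slide's scale-value formula did not survive, so this sketch assumes the commonly cited one: scale value = (normal density at the lower limit − density at the upper limit) / (cumulative proportion at the upper limit − cumulative proportion at the lower limit). It also assumes the final conversion shifts all values so that the smallest becomes 1. The function name and the sample data are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def successive_interval(responses, categories):
    """Map ordered categories to interval-scale values for one item."""
    if not set(responses) <= set(categories):
        raise ValueError("response outside the listed categories")
    std = NormalDist()                      # standard normal: mean 0, sd 1
    n = len(responses)
    counts = Counter(responses)             # step 1: frequencies
    scale_values = {}
    cum_prev, dens_prev, running = 0.0, 0.0, 0
    for cat in categories:
        if counts[cat] == 0:
            raise ValueError(f"category {cat!r} has no responses")
        running += counts[cat]
        cum = running / n                   # steps 2-3: cumulative proportion
        if running < n:
            z = std.inv_cdf(cum)            # step 4: z at the upper limit
            dens = std.pdf(z)
        else:
            dens = 0.0                      # density vanishes as z -> infinity
        # Step 5 (assumed formula): density difference over area difference.
        scale_values[cat] = (dens_prev - dens) / (cum - cum_prev)
        cum_prev, dens_prev = cum, dens
    # Step 6 (assumed convention): shift so the lowest category maps to 1.
    shift = 1 - min(scale_values.values())
    return {cat: sv + shift for cat, sv in scale_values.items()}


# Hypothetical answers of 100 respondents to one 5-point Likert item.
responses = [1] * 5 + [2] * 10 + [3] * 20 + [4] * 40 + [5] * 25
transformed = successive_interval(responses, categories=[1, 2, 3, 4, 5])
for category, value in transformed.items():
    print(category, round(value, 3))
```

The transformation is applied per item, so each questionnaire item gets its own mapping from the codes 1 to 5 to interval values. `NormalDist.inv_cdf` stands in for the z table lookup.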

13 TRANSFORMING ORDINAL DATA INTO INTERVAL DATA
Downloaded from: myunanto.staff.gunadarma.ac.id/.../Transformasi+Data+Ordinal+Men... ………….. 24/8/2012
Primary data are data given directly by respondents through interviews or a list of questions that is designed, arranged, and presented in the form of a scale, whether nominal, ordinal, interval, or ratio. This data collection technique is common because, besides fixing the measurement scale directly, it can also complement the results of interviews with respondents.
Manipulating the data by transforming the “scale” from ordinal to interval is intended not only to stay within accepted practice but also to help meet the normal-distribution requirement when parametric statistics are used. According to Sambas Ali Muhidin and Maman Abdurahman, “one of the transformation methods often used is the method of successive interval (MSI)”.
There are two different views on the scores assigned to the response alternatives of a Likert scale. The first holds that the scores 1, 2, 3, 4, and 5 are interval data. The second holds that the Likert scale is ordinal. The argument for treating the Likert scale as interval is that an attitude scale places a person's attitude on a continuum of feeling that ranges from “very positive”, meaning supportive of the psychological object under study, to “very negative”, meaning not supportive of it at all.
The specific feature of data obtained with an ordinal scale is that they are qualitative rather than numeric: words or sentences such as strongly agree, somewhat agree, and disagree, when the question asks about agreement with some event. Another example is the response to the presence of a bank “PQR” in an area, which can run from strongly disagree, disagree, undecided, and agree to strongly agree.
Interval data are quantitative: they are numeric and do not consist of words or sentences. When a researcher uses a quantitative approach, including interval data, the data collected can be processed directly with a statistical model. Data obtained on an ordinal scale, however, take the form of words, sentences, or statements, and each answer must be given a numeric code before the data are processed.

14 DO ORDINAL DATA NEED TO BE TRANSFORMED TO INTERVAL WITH MSI?
Posted by: Muji Gunarto on: 25 December 2008
Downloaded from: msi/………….. 24/8/2012
When ordinal data on a Likert scale, STS (1, strongly disagree), TS (2, disagree), R (3, undecided), S (4, agree), SS (5, strongly agree), are rescaled to interval, the interval scores keep nearly the same order as the original ordinal scores and correlate with them at about 99%. The original ordinal data are therefore practically the same as the interval data and can be treated as interval.
What differs is how the model from the analysis is interpreted for ordinal data versus interval data. Suppose there is a regression model Y = a + b1X1 + b2X2 (the fitted coefficients are missing in the source, which reads "Y = X X2").
If the data are interval, for example Y = rice production (ton/ha), X1 = urea fertilizer (kg/ha), and X2 = seed (kg/ha), the interpretation is that if fertilizer is increased by 10%, rice production rises by 2.5%, and if seed is increased by 10%, rice production rises by 3%.
If the data are ordinal (qualitative), for example Y = job satisfaction, X1 = commitment, and X2 = motivation, we cannot say that satisfaction rises by 2.5% when commitment rises by 10%, because the data are qualitative. We can only say that commitment has a significant effect on job satisfaction; how large the effect is remains unknown. Even after the ordinal data have been converted to interval, we still cannot interpret them like quantitative data, because the original data are qualitative.

15 Questionnaire design
Downloaded from: ………….. 25/8/2012
For a questionnaire to fulfill a researcher’s purposes, the questions must meet the basic criteria of relevance and accuracy. To achieve these ends, a researcher who is systematically planning a questionnaire’s design will be required to make several decisions, typically, but not necessarily, in the following order:
1. What should be asked?
2. How should questions be phrased?
3. In what sequence should the questions be arranged?
4. What questionnaire layout will best serve the research objectives?
5. How should the questionnaire be pretested? Does the questionnaire need to be revised?

16 Questionnaire design
What Should Be Asked?
Certain decisions made during the early stages of the research process will influence the questionnaire design. The preceding chapters stressed good problem definition and clear research questions. This leads to specific research hypotheses that, in turn, clearly indicate what must be measured. Different types of questions may be better at measuring certain things than are others. In addition, the communication medium used for data collection (telephone interview, personal interview, or self-administered questionnaire) must be determined. This decision is another forward linkage that influences the structure and content of the questionnaire. Therefore, the specific questions to be asked will be a function of previous decisions made in the research process. At the same time, the latter stages of the research process will also have an important impact on questionnaire wording and measurement. For example, when designing the questionnaire, the researcher should consider the types of statistical analysis that will be conducted.
Downloaded from: ………….. 25/8/2012

17 A survey is only as good as the questions it asks
Downloaded from: 24/8/2012
Steps in constructing a questionnaire:
Step 1: Determine the hypothesis; determine the type of survey to be used; determine the survey questions; determine the answer categories; design the survey layout.
Step 2: Plan how the data will be collected; pretest the measurement instrument.
Step 3: Determine the target population; determine the sampling technique (random sampling, non-random sampling); determine the sample size; select the sample.
Step 4: Locate the respondents; conduct the interviews; collect the data carefully.
Step 5: Enter the data into the computer; recheck all the data; run the statistical analysis on the data obtained.
Step 6: Describe the methods and findings in the research report; present them to obtain feedback and evaluation.

18 What should you ask?
– The questions asked are a function of previous decisions
– The questions asked are a function of future decisions (such as statistical analysis)
Downloaded from: ………….. 25/8/2012
Ecosystem services (also called environmental services or nature’s services) are benefits provided by ecosystems to humans that contribute to making human life both possible and worth living. Many of these goods and services are traditionally viewed as free benefits to society, or "public goods": wildlife habitat and diversity, watershed services, carbon storage, and scenic landscapes, for example. Lacking a formal market, these natural assets are traditionally absent from society’s balance sheet; their critical contributions are often overlooked in public, corporate, and individual decision-making.

19 Key criteria
Questionnaire relevancy
– No unnecessary information is collected, and only information needed to solve the problem is obtained. Be specific about your data needs; tie each question to an objective.
Questionnaire accuracy
– Information is both reliable and valid
Downloaded from: ………….. 25/8/2012
What is LCA?
In the context of environmental challenges and the need for more sustainable production modes, Life Cycle Assessment (LCA) has been brought forward as an important and comprehensive method for analyzing the environmental impact of products and services. While it has long been used in industry, LCA has only been applied to agricultural systems for the last 10 years. (http://lca-rice.cirad.fr/what_is_lca)
LCA is defined and framed by ISO standards. It involves 4 typical phases:
1. Goal and scope definition (where the system is delineated, indicators are chosen, the functional unit is selected, ways of presenting results are decided upon, etc.)
2. Inventory analysis (where all inputs and resources used are inventoried and quantified, related to the given functional unit; it is a kind of mass and energy balance, focused on environmentally relevant flows)
3. Impact assessment (where environmental impact indicators are calculated, involving classification and characterization stages)
4. Interpretation and presentation of results (with necessary caution regarding indicators: uncertainty and errors should be considered, and sensitivity analysis should be carried out)

20 Key criteria
Downloaded from: ………….. 25/8/2012

21 Questionnaire Relevancy
A questionnaire is relevant to the extent that all information collected addresses a research question that will help the decision maker address the current business problem. Asking a wrong question or an irrelevant question is a common pitfall. If the task is to pinpoint store image problems, questions asking for political opinions are likely irrelevant. The researcher should be specific about data needs and have a rationale for each item requesting information.
Irrelevant questions are more than a nuisance because they make the survey needlessly long. In a study where two samples of the same group of businesses received either a one-page or a three-page questionnaire, the response rate was nearly twice as high for the one-page survey.
Conversely, many researchers, after conducting surveys, find that they omitted some important questions. Therefore, when planning the questionnaire design, researchers must think about possible omissions. Is information on the relevant demographic and psychographic variables being collected? Would certain questions help clarify the answers to other questions? Will the results of the study provide the answer to the manager’s problem?
Downloaded from: ………….. 25/8/2012

22 Questionnaire Accuracy
Once a researcher decides what should be asked, the criterion of accuracy becomes the primary concern. Accuracy means that the information is reliable and valid. While experienced researchers generally believe that questionnaires should use simple, understandable, unbiased, unambiguous, and nonirritating words, no step-by-step procedure for ensuring accuracy in question writing can be generalized across projects.
Obtaining accurate answers from respondents depends strongly on the researcher’s ability to design a questionnaire that will facilitate recall and motivate respondents to cooperate. Respondents tend to be more cooperative when the subject of the research interests them. When questions are not lengthy, difficult to answer, or ego threatening, there is a higher probability of obtaining unbiased answers.
Question wording and sequence also substantially influence accuracy, which can be particularly challenging when designing a survey for technical audiences. The Department of Treasury commissioned a survey of insurance companies to evaluate their offering of terrorism insurance as required by the government’s terrorism reinsurance program. But industry members complained that the survey misused terms such as “contract” and “high risk,” which have precise meanings for insurers, and asked for policy information “to date,” without specifying which date. These questions caused confusion and left room for interpretation, calling the survey results into question.
Downloaded from: ………….. 25/8/2012

23 Phrasing Questions
Open-ended response versus fixed-alternative questions
Decision criteria: type of research; time; method of delivery; budget; concerns regarding researcher bias
Downloaded from: ………….. 25/8/2012
Open-ended response questions pose some problem or topic and ask respondents to answer in their own words. If the question is asked in a personal interview, the interviewer may probe for more information, as in the following examples:
1. What names of local banks can you think of?
2. What comes to mind when you look at this advertisement?
3. In what way, if any, could this product be changed or improved? I’d like you to tell me anything you can think of, no matter how minor it seems.
4. What things do you like most about working for Federal Express? What do you like least?
5. Why do you buy more of your clothing in Nordstrom than in other stores?
6. How would you describe your supervisor’s management style?
7. Please tell us how our stores can better serve your needs.
Open-ended response questions are free-answer questions. Fixed-alternative questions, sometimes called closed-ended questions, give respondents specific limited-alternative responses and ask them to choose the one closest to their own viewpoints. For example:
Did you use any commercial feed or supplement for livestock or poultry in 2010? [ ] Yes [ ] No
Would you say that the labor quality in Japan is higher, about the same, or not as good as it was 10 years ago? [ ] Higher [ ] About the same [ ] Not as good

24 Avoid
– Leading questions (questions that “steer” the respondent)
– Overly complex questions
– Use of jargon
– Loaded questions (can use a counterbiasing statement)
– Ambiguity
– Double-barreled questions
– Making assumptions
Downloaded from: ………….. 25/8/2012
Avoid Leading and Loaded Questions
Leading and loaded questions are a major source of bias in question wording.
leading question = A question that suggests or implies certain answers.
A study of the dry cleaning industry asked this question:
Many people are using dry cleaning less because of improved wash-and-wear clothes. How do you feel wash-and-wear clothes have affected your use of dry cleaning facilities in the past 4 years? [ ] Use less [ ] No change [ ] Use more
It should be clear that this question leads the respondent to report lower usage of dry cleaning. The potential “bandwagon effect” implied in this question threatens the study’s validity.
loaded question = A question that suggests a socially desirable answer or is emotionally charged.
Consider the following question from a survey about media influence on politics:
What most influences your vote in major elections?
1. My own informed opinion
2. Major media outlets such as CNN
3. Newspaper endorsements
4. Popular celebrity opinions
5. Candidate’s physical attractiveness
6. Family or friends
7. Video advertising (television or Web video)
8. Other

25 Order?
Order bias results from an alternative answer’s position in a set of answers or from the sequencing of questions.
– Funneling technique: moving from general to specific questions helps establish the frame of reference first
– Anchoring effect: the first concept measured tends to become a comparison point from which subsequent evaluations are made
COUNTERBIASING STATEMENT
An introductory statement or preamble to a potentially embarrassing question that reduces a respondent’s reluctance to answer by suggesting that certain behavior is not unusual.
A counterbiasing statement that reassures respondents that their “embarrassing” behavior is not abnormal may yield truthful responses:
Some people have time to brush three times daily but others do not. How often did you brush your teeth yesterday?
If a question embarrasses the respondent, it may elicit no answer or a biased response. This is particularly true with respect to personal or classification data such as income or education. The problem may be mitigated by introducing the section of the questionnaire with a statement such as this:
To help classify your answers, we’d like to ask you a few questions. Again, your answers will be kept in strict confidence.
Downloaded from: ………….. 25/8/2012

26 AVOID AMBIGUITY: BE AS SPECIFIC AS POSSIBLE
Items on questionnaires often are ambiguous because they are too general. Consider such indefinite words as often, occasionally, regularly, frequently, many, good, and poor. Each of these words has many different meanings.
Downloaded from: ………….. 25/8/2012
For one consumer, frequent reading of Fortune magazine may be reading all 25 issues in a year, while another might think 12, or even 6, issues a year is frequent.
Earlier, we used the following question as an example of a checklist question:
Please check which, if any, of the following sources of information about investments you regularly use.
What exactly does regularly mean? It can certainly vary from respondent to respondent. How exactly does hardly any differ from occasionally? Where is the cutoff? It is much better to use specific time periods whenever possible.
A brewing industry study on point-of-purchase advertising (store displays) asked their distributors:
How often does the company shut down production for sanitary maintenance?
1. Annually (once a year)
2. Semiannually (once every six months)
3. Quarterly (about every three months)
4. At least once monthly
5. Less frequently (less often than once a year)
Here the researchers clarified the terms permanent, semipermanent, and temporary by defining them for the respondent. However, the question remained somewhat ambiguous. Beer marketers often use a variety of point-of-purchase devices to serve different purposes. In this case, what is the purpose? In addition, analysis was difficult because respondents were merely asked to indicate a preference rather than a degree of preference.
Thus, the meaning of a question may not be clear because the frame of reference is inadequate for interpreting the context of the question. A student research group asked this question:
What media do you rely on most?
1. Television
2. Radio
3. Internet
4. Newspapers
This question is ambiguous because it does not provide information about the context. “Rely on most” for what: news, sports, entertainment? When: while getting dressed in the morning, driving to work, at home in the evening? Knowing the specific circumstance can affect the choice made.

27 Decisions
– Ranking, sorting, rating or choice?
– How many categories or response positions?
– Balanced or unbalanced?
– Forced choice or nonforced choice?
– Single measure or index?
Downloaded from: ………….. 25/8/2012
The Air Quality Index (AQI) is an index for reporting daily air quality. The Environmental Protection Agency calculates the AQI for five major air pollutants regulated by the Clean Air Act: ground-level ozone, particle pollution (also known as particulate matter), carbon monoxide, sulfur dioxide and nitrogen dioxide. The higher the AQI value, the greater the level of air pollution and the greater the health concern.

28 Types of fixed alternative questions…
Single dichotomy or dichotomous-alternative questions
– Respondent chooses one of two alternatives (yes/no; male/female)
– “Are you currently registered in a course at the University of Lethbridge? Yes____ No____”
– What scale would these data create?
Multi-choice alternative
– Respondent chooses from several alternatives
– Many types…
Downloaded from: ………….. 23/8/2012

29 Multi-choice alternative questions…
Determinant choice
– Choose only one from several possible responses
– “Which faculty are you currently registered in at the University of Lethbridge? Management ___ Education ____ Arts/Science____ Health sciences____ Combined degree____”
– What type of scale would these data create?
Frequency determination
– Asks for an answer about frequency of occurrence
– “In a typical week, how often do you purchase chocolate chip cookies? __ never __ once __ 2 or more times”
– What type of scale would these data create?
Downloaded from: ………….. 23/8/2012

30 CHECK LIST
– Provide multiple answers to a single question
– Should be mutually exclusive and exhaustive
– “What brands of chocolate chip cookies have you, to the best of your memory, purchased in the past month (check all that apply)?” __ Dare __ Chips Ahoy! __ President’s Choice Decadent etc.
– What type of scale would these data create?
Downloaded from: ………….. 23/8/2012

31 CHECK LIST
Downloaded from: ………….. 23/8/2012
The checklist question allows the respondent to provide multiple answers to a single question. The respondent indicates past experience, preference, and the like merely by checking off items. In many cases the choices are adjectives that describe a particular object. A typical checklist question might ask the following:
Please check which, if any, of the following sources of information about investments you regularly use.
1. Personal advice of your broker(s)
2. Brokerage newsletters
3. Brokerage research reports
4. Investment advisory service(s)
5. Conversations with other investors
6. Web page(s)
7. None of these
8. Other (please specify) __________

32 ATTITUDE RATING SCALES
Attitude: An enduring disposition to consistently respond to various aspects of the world, including persons, events and objects
Typically seen as having three components:
– Cognitive
– Affective
– Behavioural
Downloaded from: 24/8/2012
Scaling Techniques for Measuring Data Gathered from Respondents
The term scaling is applied to attempts to measure attitude objectively. Attitude is the result of a number of external and internal factors. Depending upon the attitude to be measured, appropriate scales are designed. Scaling is a technique used for measuring qualitative responses of respondents, such as those related to their feelings, perception, likes, dislikes, interests and preferences.
Nominal Scale
This is a very simple scale. It consists of the assignment of facts or choices to various alternative categories, which are usually exhaustive as well as mutually exclusive. These scales are just numerical and are the least restrictive of all the scales. Examples of nominal scales are credit card numbers, bank account numbers, employee ID numbers, etc. The scale is simple and widely used when the relationship between two variables is to be studied. In a nominal scale, numbers are no more than labels and are used specifically to identify different categories of responses.
How do you stock items at present? [ ] By product category [ ] At a centralized store [ ] Department wise [ ] Single warehouse
Ordinal Scale
Ordinal scales are the simplest attitude-measuring scales used in marketing research. An ordinal scale is more powerful than a nominal scale in that the numbers possess the property of rank order. The ranking of certain product attributes or benefits, as deemed important by the respondents, is obtained through the scale.
Rank the following attributes (1 - 5) on their importance in a microwave oven.
a. Company Name
b. Functions
c. Price
d. Comfort
e. Design

33 Components of attitude
– Affective: the feelings or emotions toward an object
– Cognitive: knowledge and beliefs
– Behavioral: predisposition to action; intentions; behavioral expectations
Downloaded from: ………….. 23/8/2012

34 Attitude Scales: Scaling Defined
The term scaling refers to procedures for attempting to determine quantitative measures of subjective and sometimes abstract concepts. It is defined as a procedure for the assignment of numbers to a property of objects in order to impart some of the characteristics of numbers to the properties in question.
– Unidimensional scaling: procedures designed to measure only one attribute of a respondent or object
– Multidimensional scaling: procedures designed to measure several dimensions of a respondent or object
Downloaded from: ………….. 23/8/2012

35 PROSES MENGUKUR ATTITUDE Ranking Rating Sorting Choice Diunduh dari: ………….. 24/8/2012 A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second. In mathematics, this is known as a weak order or total preorder of objects. It is not necessarily a total order of objects because two different objects can have the same ranking. The rankings themselves are totally ordered. For example, materials are totally preordered by hardness, while degrees of hardness are totally ordered. By reducing detailed measures to a sequence of ordinal numbers, rankings make it possible to evaluate complex information according to certain criteria. Thus, for example, an Internet search engine may rank the pages it finds according to an estimation of their relevance, making it possible for the user quickly to select the pages they are likely to want to see. Analysis of data obtained by ranking commonly requires non-parametric statistics. In statistics, "ranking" refers to the data transformation in which numerical or ordinal values are replaced by their rank when the data are sorted. For example, the numerical data 3.4, 5.1, 2.6, 7.3 are observed, the ranks of these data items would be 2, 3, 1 and 4 respectively. For example, the ordinal data hot, cold, warm would be replaced by 3, 1, 2. In these examples, the ranks are assigned to values in ascending order. (In some other cases, descending ranks are used.) Ranks are related to the indexed list of order statistics, which consists of the original dataset rearranged into ascending order. Some kinds of statistical tests employ calculations based on ranks: Friedman test Kruskal-Wallis test Rank products Spearman's rank correlation coefficient Wilcoxon rank-sum test Wilcoxon signed-rank test. Ranks can sometimes have non-integer values for tied data values. 
Thus, in one way of treating tied data values, when there is an even number of copies of the same data value, the statistical rank (being the median rank of the tied data) can end in ½.
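
The rank transformation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `average_ranks` is hypothetical. Tied values share the mean of the positions they occupy, which for a run of consecutive positions equals the median rank mentioned above.

```python
def average_ranks(values):
    """Return 1-based ascending ranks; tied values share the mean of their positions."""
    # Indices of the values, sorted by value (ascending).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values equal to the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 1-based, so the run covers ranks pos+1 .. end+1.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


print(average_ranks([3.4, 5.1, 2.6, 7.3]))  # the slide's example data
print(average_ranks([10, 20, 20, 30]))      # two tied values share rank 2.5
```

With an even number of tied copies, the shared rank ends in ½, as in the second call.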

36 Types of attitude scales Simple attitude scales Most basic form – respondent responds to a single question Do not allow for fine distinctions or placement on continua – You are at a company party and are feeling nervous, but you are obligated to be there. Do you: __ find someone you know to buddy up with __ take it as an opportunity to meet new people What type of scale would these data create? Downloaded from: ………….. 24/8/2012 Attitude Scales An attitude scale is a special type of questionnaire designed to produce scores indicating the intensity and direction (for or against) of a person's feelings about an object or event. There are several types of scales that can be constructed, but the most common is the Likert-type. The scale is constructed so that all its questions concern a single issue. Attitude scales are often used in attitude change experiments. One group of people is asked to fill out the scale twice, once before some event, such as reading a persuasive argument, and again afterward. A control group fills out the scale twice without reading the argument. The control group is used to measure exposure or practice effects. The change in the scores of the experimental group relative to the control group, whether their attitudes have become more or less favorable, indicates the effects of the argument. Likert-type Scale A Likert-type scale, named for Rensis Likert (1932), who developed this type of attitude measurement, presents a list of statements on an issue to which the respondent indicates degree of agreement using categories such as: Strongly Agree, Agree, Undecided, Disagree, and Strongly Disagree.

37 Category scales – More sensitive; provides more information – Overall, how satisfied are you with the high speed performance of your Mercedes: __ very satisfied __ somewhat satisfied __ neither satisfied nor dissatisfied __ somewhat dissatisfied __ very dissatisfied If you could choose, how long would each term be? __ 26 weeks __ 13 weeks __ 6 weeks __ 4 weeks What type of scale would these data create? Downloaded from: 24/8/2012 CATEGORY SCALES

38 Downloaded from: 24/8/2012 CATEGORY SCALES RATIO SCALES AND CATEGORY SCALES OF ODOUR INTENSITY J. R. PIGGOTT and R. HARPER. Chem. Senses (1975) 1 (3): doi: /chemse/ The relation between a ratio scale obtained by magnitude estimation and a category scale of the odour intensity of 1-butanol was studied, together with individual variations in the ratio scale. Series of solutions of butanol in water in small bottles were presented to a panel for judgement, half using the method of magnitude estimation, the other half a category scale. Plots were made of the category scale against the ratio scale, and the ratio scales of individual members of the panel were analysed. A power function exponent of 0.48 was found for the panel's ratio scale, with individual values ranging from 0.25 to The category scale was curved relative to the ratio scale; variability of the magnitude estimates was approximately proportional to the magnitude estimates; and a small time-order error was found. Odour intensity exhibits the three tested characteristics of a prothetic continuum, and the variability of individual exponents was not as great as sometimes suggested.
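
The reported exponent can be read through the power-function form commonly fitted to magnitude estimates (Stevens' power law). The symbols below are notation assumed here for illustration, not taken from the abstract: \psi is perceived intensity, \varphi is stimulus concentration, and k is a scaling constant.

```latex
\psi = k\,\varphi^{\,n}, \qquad n = 0.48
\quad\Longrightarrow\quad
\frac{\psi(2\varphi)}{\psi(\varphi)} = 2^{0.48} \approx 1.39
```

Under this reading, doubling the butanol concentration would raise the judged odour intensity by roughly 39 percent rather than doubling it.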

39 Summated rating scales – the Likert scale – Respondents indicate their attitudes by checking how strongly they agree or disagree with statements – Chocolate chip cookies are my preferred variety of cookie: Strongly Disagree (1), Disagree (2), Uncertain (3), Agree (4), Strongly Agree (5) What type of scale would these data create? Downloaded from: 24/8/2012 Summated rating scales – the Likert scale Ratio scales, category scales, and variability in the production of loudness and softness. Bruce Schneider and Harlan Lane. J. Acoust. Soc. Am. Volume 35, Issue 12, pp (December 1963). Several studies have shown that category scales are nonlinearly related to ratio scales of subjective magnitude. A variability model has been proposed previously to account for this departure from linearity. This article examines the model in the light of the empirical relations that enter into it: the ratio scale of subjective magnitude, the corresponding category scale, and the variability of judgments in both physical and psychological units. These relations are determined, through repeated measurement with a single observer, for the psychological continuum, loudness, and its inverse, softness. The ratio scales are shown to be reciprocals, and the category scales complements. The category scale of softness is more concave downward, relative to its magnitude scale, than is the category scale of loudness. This outcome is also derived mathematically from the empirical equations relating the four scales to physical magnitude. Variability is found to increase with increasing stimulus magnitude at the same rate for both loudness and softness productions, expressed either in physical units or in psychological units. Hence, the variability model is found not to accord with the observed difference in concavity between softness and loudness category scales relative to their respective psychological magnitude scales.

40 Semantic Differential Rating scale – An attitude measure consisting of a series of seven-point bipolar rating scales allowing response to a “concept” Think of your favorite type of cookie. Rate it on each of the following continua: Hard Soft Lots of chips Fewer chips Crispy Chewy What type of scale would these data create? Downloaded from: 24/8/2012 SEMANTIC DIFFERENTIAL RATING SCALE Journal of Marketing Management, Vol. 9:3, Winter 1999, ©1999 RATING THE RATING SCALES Hershey H. Friedman and Taiwo Amoo Rating scales are used quite frequently in research, especially in surveys. Typically, an itemized rating scale asks subjects to choose one response category from several arranged in hierarchical order. Dishonest researchers can, of course, purposefully manipulate the outcome of their research, if they wish, but such biasing may also be totally unintentional. This paper examines issues involved in creating a relatively unbiased rating scale. These include: (1) Connotations of category labels; (2) Response alternative effects; (3) Implicit assumptions of the question; (4) Forced-choice vs. non-forced-choice rating scales; (5) Unbalanced and balanced rating scales; (6) Order effects; (7) Direction of comparison; (8) Optimal number of points; (9) Context effects; (10) Rating approach, e.g., improvement needed, performance, comparison to expectations, comparison to ideal, etc.

41 Numerical Rating scale – Similar to a semantic differential except that it uses numbers as response options to identify response positions instead of verbal descriptions Think of your favorite type of cookie. Rate it on each of the following continua: Hard Soft This scale is called an 8 point numerical scale, why? What type of scale would these data create? Downloaded from: ………….. 24/8/2012 NUMERICAL RATING SCALE Numerical rating scale “A scale used for the subjective measurement of a clinical sign/syndrome, in which numerical scores are given (e.g. 0-4). A description is given for each score. The observer chooses, for each individual observed, the number on the scale which they consider most closely matches that individual.” This system groups information in discrete units, which may place a constraint on the observer. The NRS can also be used without a descriptor for each score, but is improved by the addition of the descriptions.

42 Downloaded from: ………….. 24/8/2012 NUMERICAL RATING SCALE Validation of the numerical rating scale for pain intensity and unpleasantness in pediatric acute postoperative pain: sensitivity to change over time Pagé, M. Gabrielle; Katz, Joel; Stinson, Jennifer; Isaac, Lisa; Martin-Pichora, Andrea L.; Campbell, Fiona. Journal of Pain, 13(4), (2012). This study evaluates the construct validity (including sensitivity to change) of the numerical rating scale (NRS) for pain intensity (I) and unpleasantness (U) and participant pain scale preferences in children/adolescents with acute postoperative pain. Eighty-three children aged 8 to 18 years (mean = 13.8, SD = 2.4) completed 3 pain scales including NRS, Verbal Rating Scale (VRS), and faces scales (Faces Pain Scale-Revised [FPS-R] and Facial Affective Scale [FAS], respectively) for pain intensity (I) and unpleasantness (U) 48 to 72 hours after major surgery, and the NRS, VRS and Functional Disability Index (FDI) 2 weeks after surgery. As predicted, the NRSI correlated highly with the VRSI and FPS-R and the NRSU correlated highly with the VRSU and FAS 48 to 72 hours after surgery. The FDI correlated moderately with the NRS at both time points. Scores on the NRSI and NRSU at 48 to 72 hours were significantly higher than at 2 weeks after surgery. Children found the faces scales the easiest to use while the VRS was liked the least and was the hardest to use. The NRS has adequate evidence of construct validity including sensitivity for both pain intensity and unpleasantness. This study further supports the validity of the NRS as a tool to measure both intensity and unpleasantness of acute pain in children.

43 Constant Sum Scales – Attributes are rated based on their importance to the person. Respondents are asked to divide a constant sum to indicate the relative importance of attributes Example: Suppose the photocopy budget per professor was $100 per month. How much should be allocated to the following? Divide the $100 according to your preference: ____ photocopying for student needs; ____ photocopying for research needs; ____ photocopying for committee needs. ==== $100 TOTAL Downloaded from: ………….. 24/8/2012 CONSTANT SUM SCALES Constant-Sum Scales A scale that helps the researcher discover proportions is the constant-sum scale. 1. With this scale, the participant allocates points to more than one attribute or property indicant, such that they total a constant sum, usually 100 or In the Exhibit 13-2 example, two categories are presented that must sum to 100. In the restaurant example, the participant distributes 100 points among four categories to indicate the relative importance of each attribute: _____ Food Quality _____ Atmosphere _____ Service _____ Price 100 TOTAL 3. Up to 10 categories may be used, but both participant precision and patience suffer when too many stimuli are proportioned and summed. 1. A participant’s ability to add is also taxed in some situations; this is not a response strategy that can be effectively used with children or the uneducated. 2. The advantage of the scale is its compatibility with percent (100 percent) and the fact that alternatives that are perceived to be equal can be so scored, unlike most ranking scales. 3. The scale is used to record attitudes, behavior, and behavioral intent. 4. The constant-sum scale produces interval data.
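
A constant-sum response is only usable if the allocations actually add up to the constant. The sketch below, which is an illustration and not part of the original slides, checks that condition and converts the points to proportions. The function name `constant_sum_shares` and the example point values are hypothetical; only the four restaurant attributes come from the slide.

```python
def constant_sum_shares(allocations, total=100):
    """Validate one respondent's constant-sum answer and return each attribute's share."""
    if any(points < 0 for points in allocations.values()):
        raise ValueError("allocations must be non-negative")
    allocated = sum(allocations.values())
    if allocated != total:
        raise ValueError(f"allocations sum to {allocated}, expected {total}")
    # Dividing by the constant turns points into proportions of the whole.
    return {attribute: points / total for attribute, points in allocations.items()}


# Hypothetical answer to the restaurant example (the point values are made up).
answer = {"Food Quality": 40, "Atmosphere": 20, "Service": 25, "Price": 15}
print(constant_sum_shares(answer))
```

Rejecting answers that do not sum to the constant, rather than silently rescaling them, keeps arithmetic slips by respondents visible to the researcher.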

44 Graphic Rating Scales – An attitude measure consisting of a graphic continuum that allows respondents to rate an object by choosing any point on the continuum Downloaded from: 24/8/2012 GRAPHIC RATING SCALES 1. The graphic rating scale was originally created to enable researchers to discern fine differences. Theoretically, an infinite number of ratings are possible if participants are sophisticated enough to differentiate and record them. 2. They are instructed to mark their response at any point along a continuum. Usually, the score is a measure of length (millimeters) from either endpoint. The results are treated as interval data. 3. The difficulty is in coding and analysis; this scale requires more time than scales with predetermined categories. Never __X___________ Always 4. Other graphic rating scales use pictures, icons, or other visuals to communicate with the rater and represent a variety of data types. 5. Graphic scales are often used with children, whose more limited vocabulary prevents the use of scales anchored with words.

45 Rank-Order Scales Scales in which the respondent compares one item with another or a group of items against each other and ranks them. Downloaded from: ………….. 23/8/2012 A Rank Order scale gives the respondent a set of items and asks them to put the items in some form of order. The measure of 'order' can include preference, importance, liking, effectiveness and so on. The order is often a simple ordinal structure (A is higher than B). It can also be done by relative position (A scores 10 whilst B scores 6). Example Please write a letter next to the four evening activities below to show your preference. Use A for your most preferred activity, B for the next preferred, then C for the next and then D for the least preferred. __ Staying in and watching television __ Going bowling __ Going out for a meal __ Going to a bar with a friend Discussion Sorting of ordinal data can be done in several ways: 1. Priority sorting looks for the most important first, then the next most important and so on. 2. Block sorting sorts items into sub-groups and then sorts the sub-groups (this is more important, that is less important, then sort the 'more important' group). 3. Score sorting gives an absolute score to each item. 4. Pairwise sorting compares pairs of items, moving the more important item higher or giving it a higher score. 5. Q-Sorting is done by writing items on cards (Q-cards) and asking the subject to place these in order. 6. Swap-sorting uses pairwise comparison on cards or Post-It Notes in a vertical column, swapping each pair in turn until the whole column is in order. Rank order items are analyzed using Spearman or Kendall correlation. The Rank Order scale is also known as the Ranking scale.
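
As an illustration of the Spearman analysis mentioned above, the sketch below compares two respondents' rankings of the four evening activities. It is not part of the original slides: the function name `spearman_rho` and both sets of rankings are hypothetical, the letters A to D are coded as 1 to 4, and the shortcut formula used here assumes there are no tied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank correlation for two rankings of the same items (no ties assumed)."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("need at least two items")
    # Sum of squared differences between the paired ranks.
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings (A=1 ... D=4) of: television, bowling, meal, bar.
respondent_1 = [1, 2, 3, 4]
respondent_2 = [2, 1, 3, 4]
print(spearman_rho(respondent_1, respondent_2))
```

A value near 1 means the two respondents order the activities almost identically; a value near -1 means their orders are reversed.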

46 LIKERT SCALE Downloaded from: ………….. 24/8/2012 The Likert scale is the most frequently used variation of the summated rating scale. Summated rating scales consist of statements that express either a favorable or an unfavorable attitude toward the object of interest. 1. The participant is asked to agree or disagree with each statement. 2. Each response is given a numerical score to reflect its degree of attitudinal favorableness, and the scores may be summed to measure the participant’s overall attitude. 3. Summation is not necessary and in some instances may actually be misleading. The participant chooses one of five levels of agreement. 1. The numbers indicate the value to be assigned to each possible answer, with 1 the least favorable impression of Internet superiority and 5 the most favorable. 2. Likert scales also use 7 and 9 scale points. The Likert scale has many advantages that account for its popularity. 1. It is easy and quick to construct. 2. It is more reliable and provides more data than many other scales. 3. It produces interval data.

47 LIKERT SCALE Downloaded from: ………….. 24/8/2012 Originally, creating a Likert scale involved a procedure known as item analysis. In the first step, a large number of statements were collected that met two criteria: Each statement was relevant to the attitude being studied; Each reflected a favorable or unfavorable position on that attitude. People similar to those who are going to be studied were asked to read each statement and to state the level of their agreement with it, using a 5-point scale. A scale value of 1 indicated a strongly unfavorable attitude (strongly disagree). The other intensities were 2 (disagree), 3 (neither agree nor disagree), 4 (agree), and 5 (strongly agree), a strongly favorable attitude. To ensure consistent results, the assigned numerical values are reversed if the statement is worded negatively (1 is always strongly unfavorable and 5 is always strongly favorable). Each person’s responses are then added to secure a total score. The next step is to array these total scores and select some portion representing the highest and lowest total scores (generally the top and bottom 10 to 25 percent). The middle group (50 to 80 percent of participants) is excluded from the subsequent analysis.
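
The scoring steps above (reverse negatively worded items, sum each person's responses, then keep only the extreme groups) can be sketched as follows. This is a minimal illustration and not part of the original slides: the function names, the response data, and the choice of a 25 percent cut-off are all hypothetical.

```python
def reverse_score(response, points=5):
    """Flip a response on a 1..points scale, so that 1 <-> 5, 2 <-> 4, and so on."""
    return points + 1 - response


def total_scores(responses, negative_items, points=5):
    """Sum each respondent's answers after reversing the negatively worded items."""
    totals = []
    for row in responses:
        totals.append(sum(
            reverse_score(answer, points) if item in negative_items else answer
            for item, answer in enumerate(row)
        ))
    return totals


def extreme_groups(totals, fraction=0.25):
    """Return (lowest, highest) respondent indices; the middle group is left out."""
    size = max(1, int(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    return order[:size], order[-size:]


# Hypothetical data: 8 respondents, 3 statements; statement index 2 is worded negatively.
responses = [
    [5, 4, 1], [1, 2, 5], [3, 3, 3], [4, 4, 2],
    [2, 1, 4], [4, 5, 1], [2, 3, 3], [3, 4, 2],
]
totals = total_scores(responses, negative_items={2})
low, high = extreme_groups(totals, fraction=0.25)
print(totals, low, high)
```

The subsequent comparison of item means between the two extreme groups is not shown here, since the slide does not describe it.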

48 Using Angler Characteristics and Attitudinal Data to Identify Environmental Preference Classes: A Latent-Class Model EDWARD MOREY, JENNIFER THACHER, and WILLIAM BREFFLE Environmental & Resource Economics (2006) 34: 91–115 Downloaded from: ………….. 25/8/2012 A latent-class model of environmental preference groups is developed and estimated with only the answers to a set of attitudinal questions. Economists do not typically use this type of data in estimation. Group membership is latent/unobserved. The intent is to identify and characterize heterogeneity in the preferences for environmental amenities in terms of a small number of preference groups. The application is to preferences over the fishing characteristics of Green Bay. Anglers answered a number of attitudinal questions, including the importance of boat fees, species catch rates, and fish consumption advisories on site choice. The results suggest that Green Bay anglers separate into a small number of distinct classes with varying preferences and willingness to pay for a PCB-free Green Bay. The probability that an angler belongs to each class is estimated as a function of observable characteristics of the individual. Estimation is with the expectation–maximization (E–M) algorithm, a technique new to environmental economics that can be used to do maximum-likelihood estimation with incomplete information. As explained, a latent-class model estimated with attitudinal data can be melded with a latent-class choice model.

49 Relating Environmental Ethical Attitudes and Contingent Valuation Responses Using Cluster Analysis, Latent Class Analysis, and the NEP: A Comparison G. Aldrich, K. Grimsrud, J. THACHER, and M. Kotchen September 1, 2005 Downloaded from: ………….. 25/8/2012 Environmental ethics and attitudes may be an important source of heterogeneity when considering the welfare effects and equity implications of policy changes dealing with environment and natural resources. The New Ecological Paradigm (NEP) Scale is a set of 15 Likert questions and is intended to indicate whether an individual holds pro-environmental or anti-environmental beliefs. This paper provides an overview and comparison of three methodologies that may be applied to NEP survey data to detect environmental ethics groups: total NEP score, latent class analysis, and cluster analysis methods. We find that while environmental attitudes do not significantly affect average willingness to pay measures, there are significant differences in willingness to pay across environmental attitude groups. The willingness to pay estimates for each attitudinal group are consistent across the different analytical measures.

50 Environmental and Resource Economics 14: 95–117, The Validity of Environmental Benefits Transfer: Further Empirical Testing ROY BROUWER and FRANK A. SPANINKS. Downloaded from: ………….. 25/8/2012 This paper provides further empirical evidence of the validity of environmental benefits transfer based on CV studies by expanding the analysis to include control factors which have not been accounted for in previous studies. These factors refer to differences in respondent attitudes. Questionnaires complying with Dillman’s (1978) ‘total design method’ for mail surveys were sent to randomly selected households. Since management agreements in peat meadow areas usually concentrate on the protection of meadow birds and ditch-side vegetation, these elements received most attention in the questionnaires. Except for some minor differences in wording, both studies used the same valuation scenarios. Traditional population characteristics were taken into account, but these variables do not explain why respondents from the same socio-economic group may still hold different beliefs, norms or values and hence have different attitudes and consequently state different WTP amounts. The test results are mixed. The function transfer approach is valid in one case, but is rejected in the 3 other cases investigated in this paper. We provide further evidence that in the case of statistically valid benefits transfer, the function approach results in a more robust benefits transfer than the unit value approach. We also show that the equality of coefficient estimates is a necessary, but insufficient condition for valid benefit function transfer and discuss the implications for previous and future validity testing.

