Warsaw Summer School 2011, OSU Study Abroad Program Fundamentals of Research Design Data organization.


1 Warsaw Summer School 2011, OSU Study Abroad Program Fundamentals of Research Design Data organization

2 Why the Social Researcher Uses Statistics The Nature of Social Research Why Test Hypotheses? The Stages of Social Research Using Series of Numbers to Do Social Research Functions of Statistics

3 The Nature of Social Research. Social researchers test hypotheses. A hypothesis is a prediction about the relationship between variables. It is usually based upon theoretical expectations about how things work. At minimum, any hypothesis involves two variables: an independent variable (IV) and a dependent variable (DV). X → Y. What are independent and dependent variables? The IV is the presumed cause and the DV is the presumed effect.

4 Why Do We Test Hypotheses? Hypothesis testing is a foundation of science. In statistical inference, hypotheses generally take one of two forms: substantive and null. A substantive hypothesis represents an actual expectation. (E.g.: higher education increases the likelihood of upward mobility.) To decide whether a substantive hypothesis is supported by the evidence, it is necessary to test a related hypothesis called the null hypothesis. (E.g.: education has no effect on upward mobility.)

5 Hypothesis Testing Testing hypotheses involving: (a) Standard-comparisons: Is the mean value in our sample the same as in the population? Is the observed distribution of X a normal distribution? IV: standard/non-standard (b) Group-comparisons: Do men earn (on average) more than women? Is the variance in GPA greater among sociology students than among mathematics students? IV: one group vs. another (c) Assessment of correlation: Are diet preferences (X1) associated with political preferences (X2)? IV: X1 or X2, undetermined (d) Impact assessment: To what extent does education (X) influence earnings (Y)? IV: X
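As a rough illustration of the group comparison in (b), the sketch below computes the difference between two group means and Welch's t statistic for it. The earnings figures are hypothetical values made up for the example, and Welch's statistic is one common choice, not necessarily the test the course uses.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    # Standard error of the difference, allowing unequal group variances.
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical earnings (arbitrary units), for illustration only.
men = [10, 12, 14]
women = [9, 11, 13]

print("difference in means:", mean(men) - mean(women))
print("Welch t statistic:", welch_t(men, women))
```

Under the null hypothesis of no difference between the groups, a t statistic far from zero counts as evidence against the null.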

6 The Stages of Social Research 1) Specify research goals. What do you want to investigate and why? 2) Review the literature. Place your question in context. 3) Formulate hypotheses. Provide a theoretical model (a set of propositions). Choose variables and specify hypotheses. 4) Measure and record. A) Define population and select sample. B) Develop instruments. C) Collect data. 5) Analyze the data. Test hypotheses. Draw conclusions. 6) Invite scrutiny. Make decisions about the fit of data and theory. Results are communicated to an audience. (Confirm or reject your initial theory.)

7 Statistics Quantitative methods in sociology involve statistics - a set of mathematical techniques used to organize, analyze, and interpret information that takes the form of numbers. Full measurement procedure specifies values for each variable across all observation units. Data (plural) are specific measurements across all observation units. A datum (singular) is a single measurement (= variable value).

8 A Framework for Statistical Work Units of observation/analysis (cases): - individuals, households, families, groups - communities, cities - countries, regions Variables: data characterizing units of observation

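9 Matrix form of data: X_ij, where i = unit of analysis (1,…,N) and j = variable (1,…,K)

| Cases | Age (j = 1) | Gender (j = 2) | Education (j = 3) | … | Political Party (j = K) |
|---|---|---|---|---|---|
| i = 1 (john) | 21 | 0 | 15 | … | 1 |
| i = 2 (jil) | 27 | 1 | 16 | … | 2 |
| i = 3 (beth) | 18 | 1 | 12 | … | 0 |
| i = 4 (alicia) | 23 | 1 | 16 | … | 1 |
| i = 5 (josh) | 34 | 0 | 21 | … | 2 |
| … | … | … | … | … | … |
| i = N - 1 | 17 | 0 | 11 | … | 2 |
| i = N (last person) | 36 | 1 | 17 | … | 3 |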
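A minimal sketch of the same cases-by-variables layout as a list of rows. The values for the first five cases are copied from the slide; the helper names `value` and `column` are made up for this example, and the columns the slide elides with "…" are simply left out, so Political Party sits in column 4 here.

```python
# Rows are cases (i), columns are variables (j).
variables = ["age", "gender", "education", "party"]
cases = ["john", "jil", "beth", "alicia", "josh"]
X = [
    [21, 0, 15, 1],
    [27, 1, 16, 2],
    [18, 1, 12, 0],
    [23, 1, 16, 1],
    [34, 0, 21, 2],
]


def value(i, j):
    """Return X_ij using the slide's 1-based indices."""
    return X[i - 1][j - 1]


def column(name):
    """Return one variable's values across all cases."""
    j = variables.index(name)
    return [row[j] for row in X]


print(value(2, 3))    # education of case i = 2 (jil)
print(column("age"))  # the age variable for every case
```

Reading along a row gives everything recorded about one case; reading down a column gives one variable across all cases.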

10 Labels Variable names vs. variable labels. The variable name is a mnemonic. The variable label is a descriptive phrase, usually only a few words long, that captures the essence of what the variable is about.

11 Labels Assigning value labels. Continuous variables usually do not need value labels. Examples: income, results of complicated tests, age, years in the labor force. STATA researchers' advice: if a variable has a limited number of values k (k < 10), it is better to label them all, or at least a subset, independently of the level of measurement.

12 Missing Values Meaning: Lack of information. Erroneous information. Non-interpretable information. "Don't know" as a special category: is this a missing datum?
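One common way to handle missing values in practice is to reserve special codes for them and exclude those codes before computing anything. The sketch below assumes hypothetical codes (-9, 98, 99); real codebooks define their own, and whether "don't know" belongs in this set is the analyst's decision.

```python
# Hypothetical missing-value codes; a real survey's codebook defines its own.
MISSING_CODES = {-9, 98, 99}


def clean(values, missing_codes=MISSING_CODES):
    """Replace missing-value codes with None so they cannot enter calculations."""
    return [None if v in missing_codes else v for v in values]


def mean_valid(values):
    """Mean over non-missing values; None if nothing valid remains."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


ages = [21, 27, 99, 18, -9]  # 99 and -9 stand for missing answers
cleaned = clean(ages)
print(cleaned)
print(mean_valid(cleaned))
```

Leaving the codes in place would treat 99 and -9 as real ages and distort the mean.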

13 Surveys as Data Sources for Hypothesis Testing A survey is a method of gathering information from a sample of a population, through: - written questionnaires that respondents fill out themselves - face-to-face interviews - telephone interviews - e-mail

14 Functions of Statistics - to describe the data - to infer from a sample about the population
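Both functions can be sketched with Python's standard library, using the five ages listed in the data matrix on slide 9. The function names are made up for this example, and the standard error of the mean is shown as only one simple building block of inference.

```python
from math import sqrt
from statistics import mean, median, stdev

ages = [21, 27, 18, 23, 34]  # ages of the first five cases on slide 9


def describe(values):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
    }


def standard_error(values):
    """Inferential building block: estimated sampling variability of the mean."""
    return stdev(values) / sqrt(len(values))


print(describe(ages))
print(standard_error(ages))
```

The description refers only to these five cases; the standard error is what lets a statement about the sample mean be extended, with stated uncertainty, to the population.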

15 Levels of Measurement The level of measurement of a variable refers to the type of information that the numbers assigned to units of observation contain. Four levels of measurement: nominal (categorical; discrete) ordinal (rank-order) interval (distance) ratio (zero-reference)

16 Level of Measurement: Nominal Names are assigned to units of observation as labels. The name in the set is the "value". For practical data processing the names are numerals, but in that case the numerical value is irrelevant. Nominal variables (qualitative): observations consist of separate categories that are labeled (we cannot order them).

17 Level of Measurement: Nominal Ex: “Dummy” (dichotomous) variables; Religious affiliation (1, 2, 3, 4, 5 where 1 = Catholic, 2 = Protestant, 3 = Jewish, 4= Muslim, 5 = Other) Party affiliation Social Class Gender (0, 1 – where 0 = female; 1 = male)
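The codings on this slide can be stored as value labels and used for counting, which is the kind of operation that makes sense for nominal data. This is a minimal sketch; the list of observed codes is hypothetical and the function name is made up for the example.

```python
from collections import Counter

# Value labels taken from the slide.
RELIGION_LABELS = {1: "Catholic", 2: "Protestant", 3: "Jewish", 4: "Muslim", 5: "Other"}
GENDER_LABELS = {0: "female", 1: "male"}


def label_counts(codes, labels):
    """Frequency of each observed category, reported by label rather than code."""
    counts = Counter(codes)
    return {labels[code]: counts[code] for code in sorted(counts)}


religion_codes = [1, 2, 1, 5, 4, 1]  # hypothetical observations
print(label_counts(religion_codes, RELIGION_LABELS))
```

Averaging these codes would give a number with no interpretation, since the numerals only stand in for names.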

18 Level of Measurement: Ordinal The numbers assigned to objects represent the rank order (1st, 2nd, 3rd, n-th) of the entities measured. The numbers are called ordinals. The variables are called ordinal variables or rank variables. Comparisons of greater and less can be made, in addition to equality and inequality. However, operations such as conventional addition and subtraction are still meaningless.

19 Level of Measurement: Interval The numbers assigned to objects have all the features of ordinal measurements, and the differences between arbitrary pairs of values can be meaningfully compared. Operations such as addition and subtraction are therefore meaningful. The zero point on the scale is arbitrary; negative values can be used. Ratios between numbers on the scale are not meaningful, so operations such as multiplication and division cannot be carried out directly.

20 Levels of Measurement: Ratio The numbers assigned to objects have all the features of interval measurement and also have meaningful ratios between arbitrary pairs of numbers. Operations such as multiplication and division are therefore meaningful. The zero value on a ratio scale is non-arbitrary. Variables measured at the ratio level are called ratio variables.
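Temperature is a standard illustration of the difference between interval and ratio scales (it is not an example from the slides). Celsius has an arbitrary zero, so differences are meaningful but ratios are not; Kelvin has a non-arbitrary zero, so ratios are meaningful too.

```python
def celsius_to_kelvin(celsius):
    """Shift the arbitrary Celsius zero to the absolute zero of the Kelvin scale."""
    return celsius + 273.15


a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are the same on both scales (interval property).
diff_c = a_c - b_c
diff_k = a_k - b_k

# Ratios depend on where zero sits, so only the Kelvin ratio is meaningful.
ratio_c = a_c / b_c
ratio_k = a_k / b_k

print(diff_c, diff_k)
print(ratio_c, ratio_k)
```

The Celsius ratio of 2 does not mean "twice as hot"; on the Kelvin scale the same two temperatures differ by only a few percent.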


22 Level of measurement and the purpose of the study Example 2: Education 1. Elementary, 2. Some High School, 3. High School Completed, 4. Community College, 5. Liberal Arts College Incomplete, 6. College Completed, 7. Above College Nominal Scale (Labels? See 4, 5) Ordinal Scale? Interval? After recoding into years of schooling: 1=8, 2=10, 3=12, 4=14, 5=14, 6=16, 7=18
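The recoding into years of schooling can be written as a lookup table. The mapping below is the one given on the slide; the list of observed codes is hypothetical, and the function name is made up for the example.

```python
# Category code -> years of schooling, as listed on the slide.
YEARS_OF_SCHOOLING = {1: 8, 2: 10, 3: 12, 4: 14, 5: 14, 6: 16, 7: 18}


def recode_education(codes):
    """Translate education category codes into years of schooling."""
    return [YEARS_OF_SCHOOLING[code] for code in codes]


education_codes = [3, 6, 4, 5, 1]  # hypothetical observations
print(recode_education(education_codes))
```

Categories 4 and 5 both map to 14 years, which is why the original seven codes do not form a clean rank order.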

23 Specifying Levels of Measurement STATA distinguishes the scale level (that is, numerical: interval and ratio) from the ordinal and nominal levels.

24 Recoding Recoding into "metric variables". Any nominal variable can be recoded into a set of 0/1 variables, also called dummies. Ordinal variables can be recoded into interval variables if (a) the ranks are interpretable as having equal distances between them (e.g. Likert scale); (b) we can assign some known values to the ranks (e.g. years of schooling); or (c) we can derive the values from the properties of the distribution (e.g. mid-points of the cumulative distribution).
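Recoding a nominal variable into dummies can be sketched as follows: one 0/1 variable per category, with exactly one of them equal to 1 for each case. The category labels reuse the religious-affiliation coding from slide 17; the observed codes and the function name are hypothetical.

```python
RELIGION_LABELS = {1: "Catholic", 2: "Protestant", 3: "Jewish", 4: "Muslim", 5: "Other"}


def to_dummies(codes, labels):
    """Build one 0/1 variable per category of a nominal variable."""
    return {
        labels[category]: [1 if code == category else 0 for code in codes]
        for category in sorted(labels)
    }


religion_codes = [1, 2, 1, 5]  # hypothetical observations
dummies = to_dummies(religion_codes, RELIGION_LABELS)
for name, column in dummies.items():
    print(name, column)
```

In regression models one of the dummies is usually dropped and serves as the reference category.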

25 Measurement The measure should be: Valid Reliable Exhaustive Mutually Exclusive Validity refers to the extent to which an empirical measure adequately reflects the real meaning of the concept under consideration. Reliability refers to the likelihood that a given measurement procedure will yield the same description of a given phenomenon if that measurement is repeated. Reliability is the consistency of measurement.

26 Measurement A good measuring device must meet the condition of exhausting the possibilities of what it is intended to measure. Mutually exclusive means that each observation fits one and only one of the scale values (categories). __________________________________________ Missing values. Lack of information. Erroneous information. Non-interpretable information __________________________________________
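Whether a set of categories is exhaustive and mutually exclusive can be checked mechanically. The sketch below assumes hypothetical age brackets written as half-open intervals (lower bound included, upper bound excluded) and reports the values that fit no category or more than one.

```python
# Hypothetical age scales as {label: (low, high)}, meaning low <= age < high.
GOOD_SCALE = {
    "under 30": (0, 30),
    "30 to 59": (30, 60),
    "60 and over": (60, float("inf")),
}
FLAWED_SCALE = {
    "18 to 30": (18, 31),  # includes age 30
    "30 to 60": (30, 61),  # also includes age 30; nothing covers 61 and over
}


def matching_categories(value, scale):
    """Labels of all categories that the value falls into."""
    return [label for label, (low, high) in scale.items() if low <= value < high]


def check_scale(values, scale):
    """Return (uncovered, overlapping): values in no category / in several."""
    uncovered = [v for v in values if len(matching_categories(v, scale)) == 0]
    overlapping = [v for v in values if len(matching_categories(v, scale)) > 1]
    return uncovered, overlapping


ages = [17, 30, 45, 72]
print(check_scale(ages, GOOD_SCALE))
print(check_scale(ages, FLAWED_SCALE))
```

An empty first list means the scale is exhaustive for these values; an empty second list means it is mutually exclusive.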

