# Quantifying Data.


Data Entry
- Define variables, enter case data, conduct runs

Coding and Recoding
- If numeric values are not pre-assigned, decide on a coding system
- If there is open-ended data, decide how to deal with the responses
- Define your variables

Data Cleaning
- Read each set of responses back immediately after entry to confirm accuracy
- “Possible-code cleaning”: the easiest check is to run a frequency distribution and look for codes that are not part of the coding scheme
- Contingency cleaning: applies to the “if” questions
  - Sort by the response to “do you recycle…”, then check the “what do you recycle” variable
  - Can also run cross tabs and make sure the cells that should be empty are empty
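The two cleaning checks can be sketched in a few lines of Python. This is only an illustration: the records, the variable names `recycles` and `what_recycled`, and the coding scheme (1 = yes, 2 = no) are all assumed for the example and do not come from the slides.

```python
from collections import Counter

# Hypothetical survey records: recycles is coded 1 = yes, 2 = no;
# what_recycled should be blank (None) for anyone who answered "no".
records = [
    {"id": 1, "recycles": 1, "what_recycled": "paper"},
    {"id": 2, "recycles": 2, "what_recycled": None},
    {"id": 3, "recycles": 7, "what_recycled": None},     # 7 is not a legal code
    {"id": 4, "recycles": 2, "what_recycled": "glass"},  # contingency error
    {"id": 5, "recycles": 1, "what_recycled": "cans"},
]

VALID_RECYCLES_CODES = {1, 2}  # assumed coding scheme

# Possible-code cleaning: a frequency distribution exposes illegal codes.
frequencies = Counter(r["recycles"] for r in records)
illegal_codes = sorted(c for c in frequencies if c not in VALID_RECYCLES_CODES)

# Contingency cleaning: cross-tabulate the filter question against whether
# the follow-up was answered; the (2, True) cell should be empty.
crosstab = Counter((r["recycles"], r["what_recycled"] is not None) for r in records)
contingency_errors = [
    r["id"] for r in records
    if r["recycles"] == 2 and r["what_recycled"] is not None
]

print(dict(frequencies))
print(illegal_codes)        # codes to look up on the original questionnaire
print(contingency_errors)   # case ids with an answer that should be blank
```

Flagged cases would normally be checked against the original questionnaire rather than corrected automatically.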

Basic Analysis – Measures of Central Tendency
- Mean: sum of values divided by the number of cases; the simple average
- Median: middle attribute in a ranked list of observed attributes; the influence of extreme cases is eliminated
- Mode: most frequently occurring attribute
  - Used with nominal variables, e.g., sex: most respondents were women
  - Usually reported with a percentage: 60% were women
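As a small illustration of the three measures, the sketch below uses Python's standard `statistics` module on made-up data. The ages and the sex variable are hypothetical; the extreme value of 90 is included to show why the median is less sensitive to extreme cases than the mean.

```python
from statistics import mean, median, mode

# Hypothetical ages for seven respondents; 90 is an extreme case.
ages = [21, 22, 22, 23, 24, 25, 90]

print(mean(ages))    # pulled upward by the extreme case
print(median(ages))  # middle value of the sorted list -> 23
print(mode(ages))    # most frequent value -> 22

# The mode is the natural summary for a nominal variable such as sex.
sex = ["F", "F", "M", "F", "M"]
most_common = mode(sex)
share = 100 * sex.count(most_common) / len(sex)
print(f"{share:.0f}% were {most_common}")
```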

Cross Tabs
- Used often with bivariate data
- Convention usually places “independent variables” across the top in columns and “dependent variables” in rows below

Coding and data entry options
- Transfer sheets are special forms ruled off in 80 columns
- Edge coding involves recording code numbers in the margins of questionnaires
- Direct data entry involves entering data directly into the computer, eliminating transfer sheets
- Data entry by interviewer (CATI)
- Optical scan sheets

Coding
- What is it? The assignment of numerical values to information or responses gathered by a research instrument
- Codebook: describes the locations of variables and lists the codes assigned to the attributes of the variables
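One way to picture a codebook is as a lookup table. The sketch below is a minimal, assumed example: the variables, column positions, numeric codes, and the missing-value code 9 are all invented for illustration.

```python
# Hypothetical codebook: for each variable, its column position in the data
# file and the numeric code assigned to each attribute.
CODEBOOK = {
    "sex": {"column": 1, "codes": {"male": 1, "female": 2}},
    "recycles": {"column": 2, "codes": {"yes": 1, "no": 2}},
}
MISSING = 9  # assumed code for blank or unrecognised answers


def code_response(variable: str, answer: str) -> int:
    """Return the numeric code for a raw answer, or MISSING if none applies."""
    codes = CODEBOOK[variable]["codes"]
    return codes.get(answer.strip().lower(), MISSING)


def code_case(raw_case: dict) -> list:
    """Turn one questionnaire into a row of codes ordered by column position."""
    ordered = sorted(CODEBOOK, key=lambda v: CODEBOOK[v]["column"])
    return [code_response(v, raw_case.get(v, "")) for v in ordered]


print(code_case({"sex": "Female", "recycles": "yes"}))  # [2, 1]
print(code_case({"sex": "male"}))                       # [1, 9]
```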

Data Management Process
- Concerned with the process by which raw data gathered by some instrument are converted into numbers for analysis purposes

1. Collect information with a data-gathering instrument
2. Use the codebook to transfer this information to a transfer sheet or code sheet (optional)
3. Create a data file from the information on the code sheet by entering data from a computer keyboard
4. Check and clean up the data file for accuracy. Data cleaning is done by:
   - Computer edit programs
   - Examining distributions
   - Contingency cleaning

- Read through the responses and create a preliminary code based on them
- If more than 10% of responses fall into the "other" category, the code needs to be revised to include many of these responses
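The 10% rule for the "other" category can be checked mechanically once a preliminary code exists. The following is a rough sketch under stated assumptions: the keyword-based code, the residual code 8, and the sample responses are hypothetical, and real open-ended coding is usually done by a person rather than by keyword matching.

```python
from collections import Counter

# Hypothetical preliminary code for an open-ended question; anything that
# matches no keyword falls into "other".
PRELIMINARY_CODE = {"cost": 1, "time": 2}
OTHER = 8  # assumed code for the residual category


def assign_code(response: str) -> int:
    """Return the first matching keyword's code, else the "other" code."""
    text = response.lower()
    for keyword, code in PRELIMINARY_CODE.items():
        if keyword in text:
            return code
    return OTHER


def other_share(responses: list) -> float:
    """Fraction of responses that land in the "other" category."""
    counts = Counter(assign_code(r) for r in responses)
    return counts[OTHER] / len(responses)


responses = [
    "It costs too much",
    "No time",
    "Takes too much time",
    "No bins nearby",
    "Too much cost",
]
share = other_share(responses)
needs_revision = share > 0.10  # the 10% threshold from the slide
print(share, needs_revision)
```

Here one response in five is uncoded, so the preliminary code would be revised, for example by adding a category for access to recycling bins.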

Elementary Quantitative Analyses
- To understand the meaning of univariate, bivariate, and multivariate analysis
- To become familiar with the meaning of several univariate and bivariate statistics

Analysis Strategies
- Why do we have to have them?
- People who read our ‘research’ are interested in the highlights
- Should try to communicate findings in an understandable and ‘painless’ fashion

Three types of analysis
- Univariate analysis: the examination of the distribution of cases on only one variable at a time (e.g., college graduation)
- Bivariate analysis: the examination of two variables simultaneously (e.g., the relation between gender and college graduation)
- Multivariate analysis: the examination of more than two variables simultaneously (e.g., the relationship between gender, race, and college graduation)

“Purpose”
- Univariate analysis: description
- Bivariate analysis: determining the empirical relationship between the two variables
- Multivariate analysis: determining the empirical relationship among the variables

Types of Statistics
- Techniques that summarize and describe characteristics of a group or make comparisons of characteristics between groups are known as descriptive statistics.
- Inferential statistics are used to make generalizations or inferences about a population based on findings from a sample.
- The choice of a type of analysis is based on the evaluation questions, the type of data collected, and the audience who will receive the results.

Univariate Analysis
- Involves examination of the distribution of cases on only ONE variable at a time
- Frequency distributions are listings of the number of cases in each attribute of a variable
  - Ungrouped frequency distribution
  - Grouped frequency distribution
- Proportions express the number of cases of the criterion variable as a part of the total population: frequency of the criterion variable divided by N
- Percentages are simply 100 × proportion, or 100 × (frequency of the criterion variable divided by N)
- Rates make comparisons more meaningful by controlling for population differences
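These univariate summaries are straightforward to compute. The sketch below uses made-up ages and an assumed 10-year grouping interval; the "per 1,000 population" base for the rate is likewise just one common choice, not something the slides specify.

```python
from collections import Counter

# Hypothetical ages of ten respondents.
ages = [18, 19, 19, 22, 25, 25, 25, 31, 34, 40]
n = len(ages)

# Ungrouped frequency distribution: one row per observed value.
ungrouped = Counter(ages)


def interval(age: int) -> str:
    """Label the 10-year interval an age falls into, e.g. 25 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Grouped frequency distribution: values collapsed into intervals.
grouped = Counter(interval(a) for a in ages)

# Proportion and percentage for one criterion category (ages 20-29).
proportion = grouped["20-29"] / n
percentage = 100 * proportion


def rate_per_1000(events: int, population: int) -> float:
    """Events per 1,000 population, so unequal populations compare fairly."""
    return 1000 * events / population


print(dict(ungrouped))
print(dict(grouped))
print(proportion, percentage)
# 200 events looks larger than 50, but the rate is lower in the bigger population.
print(rate_per_1000(50, 10_000), rate_per_1000(200, 100_000))
```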

Measures of Central Tendency
- Measures of central tendency reflect the central tendencies of a distribution
- Mode reflects the attribute with the greatest frequency
- Median reflects the attribute that cuts the distribution in half
- Mean reflects the average: sum of attributes divided by the number of cases

Measures of Dispersion
- Measures of dispersion reflect the spread of the distribution
- Range is the difference between the largest and smallest scores: high – low
- Variance is the average of the squared differences between each observation and the mean
- Standard deviation is the square root of the variance
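A short worked example may help make these definitions concrete. The scores are hypothetical. Note that the definition above divides by the number of cases, which corresponds to `pvariance` and `pstdev` in Python's `statistics` module; the sample versions (`variance`, `stdev`) divide by N - 1 instead and would give slightly larger values.

```python
from statistics import mean, pstdev, pvariance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

# Range: high - low.
score_range = max(scores) - min(scores)

# Variance: the average squared difference between each score and the mean.
m = mean(scores)
variance = sum((x - m) ** 2 for x in scores) / len(scores)

# Standard deviation: the square root of the variance.
std_dev = variance ** 0.5

print(score_range)                          # 7
print(variance, std_dev)                    # 4.0 2.0
print(pvariance(scores), pstdev(scores))    # same values from the library
```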

Types of Variables
- Continuous: increases steadily in tiny fractions
- Discrete: jumps from category to category

Subgroup Comparisons
- Somewhere between univariate and bivariate analysis are subgroup comparisons
- Present descriptive univariate data for each of several subgroups
- Ratios: compare the number of cases in one category with the number in another
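As a minimal sketch of a subgroup comparison, the code below reports the same univariate statistic (the mean) separately for each subgroup and then computes a ratio of category sizes. The respondents and the years-of-schooling variable are assumed for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondents: (sex, years of schooling).
cases = [("F", 16), ("F", 12), ("F", 14), ("M", 12), ("M", 18)]

# Descriptive univariate data (here, the mean) for each subgroup.
by_group = defaultdict(list)
for sex, years in cases:
    by_group[sex].append(years)
subgroup_means = {sex: mean(values) for sex, values in by_group.items()}

# Ratio: number of cases in one category per case in another.
ratio_f_to_m = len(by_group["F"]) / len(by_group["M"])

print(subgroup_means)
print(ratio_f_to_m)  # 1.5 women for every man in this sample
```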

Bivariate Analysis
- Bivariate analysis focuses on the relationship between two variables

Contingency Tables
- Format: attributes of the independent variable are used as column headings and attributes of the dependent variable are used as row headings
- Guidelines for presenting and interpreting contingency tables:
  - Contents of the table described in the title
  - Attributes of each variable clearly described
  - Base on which percentages are computed should be shown
  - Norm is to percentage down and compare across
  - Table should indicate the number of cases omitted from analysis
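The guidelines above can be followed in a short script. This is a sketch with invented cases, assuming sex as the independent variable and college graduation as the dependent variable; it percentages down each column, prints the base for each column, and reports the cases omitted for missing data.

```python
from collections import Counter

# Hypothetical cases: (sex, graduated); sex is the independent variable.
cases = [
    ("female", "yes"), ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("male", "yes"), ("male", "no"), ("male", "no"), ("male", "no"),
    ("male", None),  # missing answer: left out of the table but reported
]

valid = [(iv, dv) for iv, dv in cases if dv is not None]
omitted = len(cases) - len(valid)

cell_counts = Counter(valid)
column_bases = Counter(iv for iv, _ in valid)

columns = ["female", "male"]  # attributes of the independent variable
rows = ["yes", "no"]          # attributes of the dependent variable

# Percentage down: each cell as a percentage of its column base.
table = {
    dv: {iv: 100 * cell_counts[(iv, dv)] / column_bases[iv] for iv in columns}
    for dv in rows
}

print("College graduation by sex (column percentages)")
print(f"{'':10}" + "".join(f"{c:>10}" for c in columns))
for dv in rows:
    print(f"{dv:10}" + "".join(f"{table[dv][c]:>9.0f}%" for c in columns))
print(f"{'Base (N)':10}" + "".join(f"{column_bases[c]:>10}" for c in columns))
print(f"Cases omitted: {omitted}")
```

Reading across the "yes" row compares the graduation percentage for women with that for men, which is the "compare across" half of the norm.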

Multivariate Analysis
- Multivariate analysis allows the separate and combined effects of the independent variables to be examined