
# Quantitative Tools for Qualitative Data

Richard Bell, University of Melbourne


For copies of this presentation (130 slides, about 500 KB in a zipped file), email: rcb@unimelb.edu.au

What kind of Qualitative Data can be Analysed?

- Not raw continuous text data
- Discrete text units that are replicated
- Any kind of coding that has been made

What does the data have to look like?

- It must be able to be represented by a table
- Not necessarily a two-way table, for example:
  - Giegler & Klein coding of personal advertisements
  - a four-way table: magazine, sex, concept, category

Giegler & Klein data as a four-way table

Columns: Magazine, Sex, Concept, then the categorizations Fitness, Compassion, Figure, Values, Erotic.

    Z  F Self         44995011101
    Z  F Seeking      411291185
    Z  F Relationship 601253
    Z  M Self         67976718207
    Z  M Seeking      80911937
    Z  M Relationship 10341
    WN F Self         8141718107
    WN F Seeking      19143859
    WN F Relationship 200030
    WN M Self         974342
    WN M Seeking      1126319
    WN M Relationship 10100

Here, the data are stored as a table: the first 4 columns define the cells, and the last column gives the frequency in each cell. To analyse these data at a case level, use the SPSS WEIGHT BY function, i.e. WEIGHT BY FREQ.
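As a rough illustration outside SPSS, the effect of weighting by a frequency column can be sketched in Python. The cell values below are made up for the example (only the four-way layout comes from the slides), and the function names are hypothetical. The sketch shows that tallying a variable with the frequency as a weight gives the same answer as expanding each cell into one row per case.

```python
from collections import Counter

# Hypothetical long-format cells: (magazine, sex, concept, category, freq).
# The layout follows the four-way table in the slides; the counts are invented.
cells = [
    ("Z", "F", "Self", "Fitness", 3),
    ("Z", "F", "Self", "Erotic", 2),
    ("Z", "M", "Seeking", "Fitness", 1),
    ("WN", "F", "Self", "Erotic", 4),
]


def expand_to_cases(rows):
    """Repeat each cell's defining values freq times: one row per case."""
    cases = []
    for *keys, freq in rows:
        cases.extend([tuple(keys)] * freq)
    return cases


def weighted_counts(rows, position):
    """Tally one classifying variable, counting each cell freq times."""
    totals = Counter()
    for row in rows:
        totals[row[position]] += row[-1]
    return totals


cases = expand_to_cases(cells)
by_magazine = weighted_counts(cells, 0)
print(len(cases), dict(by_magazine))
```

The weighted tally avoids building the expanded case list, which is the practical reason for keeping the data as cells plus a frequency.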

Kinds of tables

- Rows are participants, columns are categories
- Rows are categories, columns are participants
- Rows are one set of categories, columns are another set of categories

Data in cells of table

- Indicator of presence/absence of a relationship between rows and columns
- Frequencies or counts of indicators
- Values of categories

Indicator of presence/absence of relationship between rows and columns

Data from Huber (1997): Proportions of Activities by Site (Frequencies)

| Site | A | B | C | D | E | F | G | H | I |
|------|------|------|------|------|------|------|------|------|------|
| SIN | 0.11 | 0.07 | 0.09 | 0.13 | 0.15 | 0.16 | 0.15 | 0.12 | 0.12 |
| SGR | 0.07 | 0.05 | 0.09 | 0.13 | 0.18 | 0.16 | 0.2 | 0.18 | 0.2 |
| TCO | 0.5 | 0.46 | 0.43 | 0.34 | 0.33 | 0.28 | 0.26 | 0.25 | 0.2 |
| TIN | 0.17 | 0.21 | 0.19 | 0.09 | 0.13 | 0.2 | 0.13 | 0.16 | 0.16 |
| TGR | 0.07 | 0.09 | 0.11 | 0.14 | 0.1 | 0.07 | 0.13 | 0.16 | 0.13 |
| TOP | 0.09 | 0.12 | 0.09 | 0.17 | 0.11 | 0.13 | 0.12 | 0.14 | 0.18 |

- SIN: Students learn as self-regulated individuals
- SGR: Students learn in autonomous groups
- TCO: Teacher is in control
- TIN: Teacher dominates, but allows some individual autonomy
- TGR: Teacher dominates, but allows some small group autonomy
- TOP: Teacher dominates, but is open to students' initiatives

Values of categories

Getting data into statistical packages such as SPSS

- Transfer data directly from qualitative packages such as NVivo
- Use the SPSS text-import wizard (best with precoded data, i.e. numbers)
- Enter data by hand

Transfer data from qualitative packages

- Need to be able to export tables
- Should only be done for tables where rows (or columns) are units of analysis (i.e. documents or respondents)
- The table should be saved as a text file (i.e. one with the extension .txt, as in table1.txt)

Transfer data from printed table

- Type into SPSS
- Transfer table from Word document
  - Word document to Excel spreadsheet
  - Excel spreadsheet to SPSS spreadsheet

The Table in Word 1. Remove the heading row

1. Remove the headings Move subheadings into a column

Insert a new column into the table

Copy subheading into empty column cells that subheading applied to

Shorten text and insert headings in columns that will become SPSS variable names (i.e. fewer than 9 characters, no spaces). Select the table and copy it to the clipboard.

Open Excel Paste from clipboard

Save spreadsheet

Open SPSS. Under the File menu, pull down to Open, then Data.

Change the file type to Excel files [.xls] And open the saved Excel spreadsheet

If you have names as column headings in the first row of the Excel spreadsheet, SPSS can read them as its variable names.

SPSS opens the file (the variable view)

The data view. Notice there are dud lines in this file; they need to be edited out.

The file fixed up

Now we need to change (a) our repeated phrases (the variable type) and (b) our symbols (variables p1 to p8) into numbers. Do this through Automatic Recode under the Transform menu.

Need to create a numeric variable into which the values of the alphanumeric variable are transformed (the alphanumeric values are saved as value labels).
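The idea behind an automatic recode can be sketched in a few lines of Python. This is an illustration of the principle, not SPSS's own implementation: the function name and the sample phrases are hypothetical, and assigning codes in sorted order is an assumption made for the example. Each distinct text value gets an integer code, and the original text is kept as the label for that code.

```python
def autorecode(values):
    """Map each distinct string to 1..k in sorted order.

    Returns the list of numeric codes and a {code: original text} label map.
    """
    labels = {code: text for code, text in enumerate(sorted(set(values)), start=1)}
    lookup = {text: code for code, text in labels.items()}
    return [lookup[v] for v in values], labels


# Hypothetical column of repeated phrases, standing in for the "type" variable.
phrases = ["teacher", "student", "teacher", "parent"]
codes, value_labels = autorecode(phrases)
print(codes)
print(value_labels)
```

Keeping the label map alongside the codes is what allows later output to be labelled with the original phrases rather than bare numbers.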

Transferring Cross-Category tables into SPSS [where rows are one set of categories, columns are another set of categories]

Three types of table:

- Cells of the table contain frequencies
- Cells of the table contain other data
- Cells of the table contain a binary indicator (yes/no, true/false, present/absent etc)

Transferring Frequency Tables: 1

If there are only two dimensions to the table (rows are categories of one variable, columns are categories of another):

- can feed the table straight in as a table: easy, but won't have labelled output
- can feed the table in cell by cell (as for more complex tables): more complex, but allows for labelled output and other possibilities

Feeding table in as table

- Only have cells of table as data
- Can only run one procedure (correspondence analysis) via syntax

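Feeding table in cell by cell

Have to use syntax (data list function):

    * One classifying variable per table dimension, then the cell frequency.
    * Each data line below carries four values, so four variables are named.
    data list free / slice row column frequency.
    begin data.
    1 1 1 287
    1 1 2 143
    1 2 1 94
    1 2 2 23
    end data.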

    Data list FREE / EMS PMS GENDER MARSTAT FREQ.
    Weight by freq.
    Begin data.
    1 1 1 1 17
    1 1 1 2 4
    1 1 2 1 28
    1 1 2 2 11
    1 2 1 1 36
    1 2 1 2 4
    1 2 2 1 17
    1 2 2 2 4
    2 1 1 1 54
    2 1 1 2 25
    2 1 2 1 60
    2 1 2 2 42
    2 2 1 1 214
    2 2 1 2 322
    2 2 2 1 68
    2 2 2 2 130
    end data.
    Var labels EMS, 'Extramarital Sex' / PMS, 'Premarital Sex' / GENDER, 'Gender' / MARSTAT, 'Marital Status'.
    Value labels EMS, PMS, 1 'Yes' 2 'No' / GENDER, 1 'Women' 2 'Men' / MARSTAT, 1 'Divorced' 2 'Still Married'.
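One of the "other possibilities" of cell-by-cell entry is that any lower-way table can be recovered by summing over the variables that are not of interest. The Python sketch below re-enters the same 16 cells from the EMS/PMS/GENDER/MARSTAT listing and collapses them to the two-way EMS by PMS table. The helper name `margin` is hypothetical; the numbers are those in the syntax above.

```python
from collections import defaultdict

# The 16 cells from the SPSS listing: EMS PMS GENDER MARSTAT FREQ.
RAW = """
1 1 1 1 17
1 1 1 2 4
1 1 2 1 28
1 1 2 2 11
1 2 1 1 36
1 2 1 2 4
1 2 2 1 17
1 2 2 2 4
2 1 1 1 54
2 1 1 2 25
2 1 2 1 60
2 1 2 2 42
2 2 1 1 214
2 2 1 2 322
2 2 2 1 68
2 2 2 2 130
"""
NAMES = ("EMS", "PMS", "GENDER", "MARSTAT", "FREQ")
cells = [dict(zip(NAMES, map(int, line.split()))) for line in RAW.strip().splitlines()]


def margin(cells, *keep):
    """Sum frequencies over every variable not named in keep."""
    table = defaultdict(int)
    for cell in cells:
        table[tuple(cell[name] for name in keep)] += cell["FREQ"]
    return dict(table)


ems_by_pms = margin(cells, "EMS", "PMS")
total = sum(cell["FREQ"] for cell in cells)
print(ems_by_pms, total)
```

Calling `margin(cells, "GENDER", "MARSTAT")` in the same way would give the gender by marital status table from the same cell listing.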

Traditional Quantitative Methods for Qualitative Data

- Miles & Huberman (1994)
  - hierarchical cluster analysis
- Giegler & Klein (1994)
  - correspondence analysis
- Bazeley (2002)
  - cluster analysis
  - correspondence analysis

Cluster Analysis

Figure 9.11 (p.203) from Graham Gibbs (2002), Qualitative Data Analysis: Explorations with NVivo, as an SPSS data file

Cluster Analysis: Solution I

[Dendrogram using Average Linkage (Between Groups), chi-square measure; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. Cases in plotted order (label, case number): Worklink 10; Youth Training 11; Adult training 1; Redundancy Counselli 6; Start Up Business un 7; Training Access Poin 8; Workers Coops 9; Business Access Sche 3; Careers & Education 4; BCETA 2; Careers Information 5.]

Cluster Analysis: Solution II

[Dendrogram using Average Linkage (Between Groups), Anderberg's D measure; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. Cases in plotted order (label, case number): Careers & Education 4; Training Access Poin 8; BCETA 2; Start Up Business un 7; Careers Information 5; Adult training 1; Worklink 10; Youth Training 11; Workers Coops 9; Redundancy Counselli 6; Business Access Sche 3.]

Cluster Analysis: Solution III

[Dendrogram using Single Linkage; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. Cases in plotted order (label, case number): Careers & Education 4; Training Access Poin 8; Worklink 10; Youth Training 11; Workers Coops 9; Business Access Sche 3; BCETA 2; Start Up Business un 7; Careers Information 5; Adult training 1; Redundancy Counselli 6.]

Cluster Analysis

- The solution varies according to the coefficient chosen as the measure of association between rows (or columns)
- The solution varies according to the method of clustering
- Use with extreme caution
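The dependence on the coefficient can be seen even before any clustering is done. The Python sketch below uses two common coefficients for binary data, simple matching and Jaccard. These are chosen for illustration and are not the chi-square or Anderberg's D measures used in the three solutions, and the three rows of data are invented. With the same data, the two coefficients disagree about which row is most similar to row A, so any clustering built on them would start from different neighbours.

```python
def simple_matching(x, y):
    """Share of positions where two binary rows agree (joint absences count)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def jaccard(x, y):
    """Joint presences divided by positions where either row has a presence."""
    both = sum(a and b for a, b in zip(x, y))
    either = sum(a or b for a, b in zip(x, y))
    return both / either


# Hypothetical presence/absence codings for three documents over ten categories.
rows = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "B": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
}


def nearest_to_a(measure):
    """Return whichever of B or C is more similar to A under the given measure."""
    return max(("B", "C"), key=lambda name: measure(rows["A"], rows[name]))


print(nearest_to_a(simple_matching), nearest_to_a(jaccard))
```

Simple matching rewards the many categories that A and B both lack, while Jaccard ignores joint absences, which is why the two disagree here.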

Other Quantitative Methods

- Find weights for categories of a variable that maximize relationships between variables
- Correspondence analysis
  - finds weights for categories of row and categories of column
- Also traditional least-squares procedures
  - e.g. regression, principal components and others

Correspondence Analysis

- Similar to principal components
- Originally derived for tables of frequencies
  - [for statistics to apply need one respondent per cell, but can be used with multiple responses across cells]
- But can be used with indicator data
- Can produce separate maps of relationships between categories of rows or columns
- Can produce a joint map of categories of rows or columns

Giegler & Klein

Examined personal advertisements in a number of German magazines, e.g. "Young man, 35 y, 176cm, slim with car, good income, looks for a lovely high-bosomed and well-developed partner for a common future."

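| Ad | CI | IM | AP | HEC | FB | CLC | SEX | BA | HIP | IV | SBE | SB | FO | HT | NAT | 30Y | 45Y | 60Y | OLD | PO |
|------|----|----|----|-----|----|-----|-----|----|-----|----|-----|----|----|----|-----|-----|-----|-----|-----|----|
| 1001 | 2 | 2 | 1 | 0 | 1 | 3 | 0 | 1 | 0 | 1 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 2 |
| 1002 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |

Data:

- One row per ad
- Each column contains the number of instances for each coding category
- i.e. each ad will appear a number of times in the cells of any table: the total frequency of the table is the number of codings, not the number of ads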
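The point about total frequency can be checked directly on the two ads shown. The short Python sketch below sums the coding counts for ads 1001 and 1002; the variable names are illustrative, the numbers are those on the slide.

```python
CATEGORIES = ("CI IM AP HEC FB CLC SEX BA HIP IV SBE SB FO HT NAT "
              "30Y 45Y 60Y OLD PO").split()

# One row per ad: the number of codings in each category (values from the slide).
ads = {
    1001: [2, 2, 1, 0, 1, 3, 0, 1, 0, 1, 2, 0, 1, 1, 0, 1, 0, 0, 0, 2],
    1002: [2, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1],
}

# Codings contributed by each ad, and codings received by each category.
codings_per_ad = {ad: sum(counts) for ad, counts in ads.items()}
codings_per_category = {
    name: sum(counts[i] for counts in ads.values())
    for i, name in enumerate(CATEGORIES)
}

n_ads = len(ads)
n_codings = sum(codings_per_ad.values())
print(n_ads, n_codings, codings_per_ad)
```

A table built from these counts has a total equal to `n_codings`, which is why significance tests that assume one observation per respondent need caution here.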

Cut-down version of Giegler & Klein example

Category by Magazine (magazine columns: Z, WN, WAZ, TIP, EXP, H&W)

    High SES   2343927142927
    Fitness    2396858445544
    Compassion 2172443193513
    Sex        499525471127
    Figure     1523257498546
    Image      434125303690
    Values     58651614645
    Erotic     434227303313268374
    Friend     303104132182197224
    Family     515197291282344353
    Travel     2601491119890130
    30yo       20857135116143283
    45yo       13220585254116
    60yo       37101132931
    Old        3610885689744
    Hedonist   165124187146156127
    Wowser     701099134113160
    Social     141324529148
    Single     56132391218
    Separated  541526221681

Correspondence Analysis in SPSS

- One of the data reduction options (like factor analysis), listed as Correspondence Analysis [can be run as syntax or point-and-click]
- Also a syntax-only option called ANACOR, which is more limited but can analyse a table directly when the only data in the SPSS spreadsheet are the table frequencies

ANACOR syntax example: Huber (1997) proportions table shown earlier

    data list free / A B C D E F G H J.
    begin data.
    0.11 0.07 0.09 0.13 0.15 0.16 0.15 0.12 0.12
    0.07 0.05 0.09 0.13 0.18 0.16 0.2 0.18 0.2
    0.5 0.46 0.43 0.34 0.33 0.28 0.26 0.25 0.2
    0.17 0.21 0.19 0.09 0.13 0.2 0.13 0.16 0.16
    0.07 0.09 0.11 0.14 0.1 0.07 0.13 0.16 0.13
    0.09 0.12 0.09 0.17 0.11 0.13 0.12 0.14 0.18
    end data.
    do repeat xs = A to J.
    compute xs = xs * 100.
    end repeat.
    ANACOR TABLE = ALL (6,9).

- data list free: indicates data values separated by spaces
- A B C D E F G H J: identifies the columns
- do repeat ... end repeat: changes data values from proportions to percentages
- ANACOR TABLE = ALL (6,9): simplest ANACOR syntax (just identifies the numbers of rows and columns)

Correspondence Analysis The point-and-click way

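| Dimension | Singular Value | Inertia | Chi Square | Sig. | Proportion of Inertia: Accounted for | Proportion of Inertia: Cumulative | Confidence Singular Value: Standard Deviation | Confidence Singular Value: Correlation 2 |
|-----------|------|------|----------|---------|-------|-------|------|------|
| 1 | .299 | .089 | | | .576 | .576 | .008 | .161 |
| 2 | .198 | .039 | | | .252 | .828 | .009 | |
| 3 | .141 | .020 | | | .129 | .956 | | |
| 4 | .062 | .004 | | | .024 | .981 | | |
| 5 | .055 | .003 | | | .019 | 1.000 | | |
| Total | | .155 | 1948.580 | .000(a) | 1.000 | | | |

Notes on the table:

- a. Five possible dimensions
- b. Singular value: square root of eigenvalue
- c. Inertia: eigenvalues (variance)
- d. Chi-square: could be partitioned between dimensions (only valid if cells in table are independent)

Questions the table answers:

- How many dimensions?
- Fit of solution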
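The relationships noted for this summary table (singular value as the square root of the eigenvalue, inertia as the eigenvalue) can be checked with a little arithmetic. The Python sketch below starts from the five printed singular values and recomputes inertia, proportion and cumulative proportion. It also uses the standard identity that total chi-square equals N times total inertia to back out an implied total N; that last step is an inference from the printed figures, not something stated on the slide. Because the inputs are rounded to three decimals, results agree with the table only to about the third decimal.

```python
from itertools import accumulate

# Printed values from the summary table (rounded to three decimals).
singular_values = [0.299, 0.198, 0.141, 0.062, 0.055]
chi_square = 1948.580

# Inertia of a dimension is the squared singular value (the eigenvalue).
inertia = [s ** 2 for s in singular_values]
total_inertia = sum(inertia)

# Share of total inertia accounted for by each dimension, and running total.
proportion = [value / total_inertia for value in inertia]
cumulative = list(accumulate(proportion))

# Total chi-square = N * total inertia, so N can be backed out (approximately).
implied_n = chi_square / total_inertia

for dim, (i, p, c) in enumerate(zip(inertia, proportion, cumulative), start=1):
    print(dim, round(i, 3), round(p, 3), round(c, 3))
print(round(total_inertia, 3), round(implied_n))
```

Read this way, the first two dimensions together account for a little over four fifths of the inertia, which is the usual basis for settling on a two-dimensional map.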

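Details for Magazines

| Magazine | Mass | Score in Dimension 1 | Score in Dimension 2 | Inertia | Contribution of Point to Inertia of Dimension 1 | Contribution of Point to Inertia of Dimension 2 | Contribution of Dimension 1 to Inertia of Point | Contribution of Dimension 2 to Inertia of Point | Total |
|--------------|-------|-------|-------|------|-------|------|------|------|------|
| Z | .264 | -.779 | .380 | .056 | .537 | .193 | .863 | .136 | .999 |
| WN | .106 | -.349 | -.939 | .030 | .043 | .473 | .130 | .625 | .755 |
| WAZ | .143 | .137 | -.276 | .007 | .009 | .055 | .123 | .331 | .454 |
| TIP | .138 | .391 | -.166 | .011 | .071 | .019 | .559 | .067 | .625 |
| EXP | .149 | .213 | -.218 | .010 | .023 | .036 | .198 | .137 | .335 |
| H&W | .199 | .691 | .472 | .042 | .318 | .224 | .676 | .208 | .884 |
| Active Total | 1.000 | | | .155 | 1.000 | | | | |

- Scores in dimensions: location in spatial representation
- Contribution columns: different ways of describing fit of each magazine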
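The "contribution of point to inertia of dimension" column can be reproduced from the mass and score columns. The printed figures are consistent with contribution = mass × score² / singular value, which is the relation that holds when scores are scaled so that the mass-weighted sum of squared scores on a dimension equals its singular value. That scaling is an assumption inferred from the numbers, not something stated on the slide. The Python sketch below applies the relation to the six magazines, using the singular values .299 and .198 from the summary table.

```python
# Singular values for dimensions 1 and 2, from the summary table.
SINGULAR = (0.299, 0.198)

# Magazine: (mass, score on dimension 1, score on dimension 2), as printed.
magazines = {
    "Z": (0.264, -0.779, 0.380),
    "WN": (0.106, -0.349, -0.939),
    "WAZ": (0.143, 0.137, -0.276),
    "TIP": (0.138, 0.391, -0.166),
    "EXP": (0.149, 0.213, -0.218),
    "H&W": (0.199, 0.691, 0.472),
}


def contribution(mass, score, singular_value):
    """Share of a dimension's inertia due to one point (assumed scaling)."""
    return mass * score ** 2 / singular_value


contrib = {
    name: tuple(contribution(mass, score, sv)
                for score, sv in zip(scores, SINGULAR))
    for name, (mass, *scores) in magazines.items()
}

for name, (c1, c2) in contrib.items():
    print(name, round(c1, 3), round(c2, 3))
```

On these figures Z and H&W dominate the first dimension while WN dominates the second, which matches where those magazines sit at the extremes of each axis.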

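Similar fit information for ad categorizations

| Content | Mass | Score in Dimension 1 | Score in Dimension 2 | Inertia | Contribution of Point to Inertia of Dimension 1 | Contribution of Point to Inertia of Dimension 2 | Contribution of Dimension 1 to Inertia of Point | Contribution of Dimension 2 to Inertia of Point | Total |
|--------------|-------|--------|--------|------|-------|------|------|------|------|
| High SES | .029 | -1.463 | .670 | .022 | .211 | .067 | .873 | .121 | .995 |
| Fitness | .040 | -.939 | .125 | .011 | .119 | .003 | .985 | .011 | .996 |
| Compassion | .028 | -1.407 | .626 | .019 | .185 | .055 | .859 | .113 | .971 |
| Sex | .029 | .830 | .438 | .007 | .066 | .028 | .801 | .147 | .949 |
| Figure | .034 | -.418 | .085 | .004 | .020 | .001 | .487 | .013 | .501 |
| Image | .021 | .470 | .012 | .004 | .016 | .000 | .360 | .000 | .360 |
| Values | .016 | -.456 | -.639 | .010 | .011 | .034 | .105 | .137 | .242 |
| Erotic | .153 | .109 | -.172 | .002 | .006 | .023 | .262 | .437 | .699 |
| Friend | .091 | .040 | .061 | .001 | .000 | .002 | .034 | .050 | .084 |
| Family | .158 | -.004 | -.063 | .001 | .000 | .003 | .001 | .109 | .110 |
| Travel | .067 | -.367 | -.278 | .005 | .030 | .026 | .493 | .188 | .681 |
| 30yo | .075 | .384 | | .006 | .037 | .056 | .548 | .363 | .911 |
| 45yo | .034 | .079 | .582 | .002 | .001 | .059 | .026 | .944 | .970 |
| 60yo | .010 | .130 | .350 | .002 | .001 | .006 | .030 | .145 | .175 |
| Old | .035 | .181 | -1.417 | .015 | .004 | .354 | .023 | .955 | .978 |
| Hedonist | .072 | .118 | -.578 | .006 | .003 | .122 | .048 | .756 | .804 |
| Wowser | .047 | .814 | .160 | .012 | .103 | .006 | .767 | .020 | .786 |
| Social | .021 | 1.482 | .970 | .020 | .157 | .102 | .721 | .204 | .926 |
| Single | .010 | -.676 | .275 | .002 | .016 | .004 | .742 | .081 | .823 |
| Separated | .017 | .379 | .717 | .004 | .008 | .044 | .192 | .455 | .647 |
| Active Total | 1.000 | | | .155 | 1.000 | | | | |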

More Complex versions…

- Sometimes known as Multiple Correspondence Analysis
- HOMALS: HOMogeneity analysis by Alternating Least Squares
- For example: the complete data structure of Giegler & Klein

Giegler & Klein data as a four-way table (magazine, sex, concept, categorization): the same table shown earlier in the presentation.

Some other questions

How well could we predict magazine usage from the other factors? Could use:

- multinomial regression, if cells are independent (and sample size is very large)
- categorical regression, if we just want to look at effects

A new issue: The kind of transformation to be chosen

Kinds of transformations

- Depends on what we want to assume
- Not inherent in the data
- Basic kinds
  - Nominal, or categorical (unordered categories)
  - Ordinal (assumes data are ordered)
  - Numeric, or interval (assumes data are on a scale with equal intervals)
- Recent advance
  - Spline (smooths ordinal and nominal transformations)

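Model Summary

| Multiple R | R Square | Adjusted R Square |
|------------|----------|-------------------|
| .338 | .115 | .113 |

Dependent Variable: MAGAZINE. Predictors: SEX, CONCEPT, CATEGORY.

| Predictor | Standardized Coefficient (Beta) | Std. Error | df | F-ratio | Prob |
|-----------|-------|------|----|----------|------|
| SEX | -.195 | .008 | 2 | 535.707 | .000 |
| CONCEPT | -.034 | .008 | 3 | 16.377 | .000 |
| CATEGORY | .273 | .008 | 20 | 1052.267 | .000 |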

Another example: How do characteristics distinguish among groups? Famous example (Not real)

Summary of a qualitative analysis of the characteristics of groups as postulated by Guttman from Bell & Sirjamaki (1962)

| GROUP | Interaction Intensity | Interaction Frequency | Feeling of Belonging | Physical Proximity | Relationship Formality |
|-----------|----------|--------------|----------|---------|-----------------|
| Crowd | slight | | none | close | formal |
| Audience | low | nonrecurring | slight | close | formal |
| Public | slight | | | distant | no relationship |
| Mob | high | nonrecurring | high | close | informal |
| Family | high | frequent | high | close | informal |
| Relatives | moderate | infrequent | variable | distant | formal |
| Community | low | infrequent | variable | close | formal |

Category Quantifications

- Here the data were all treated as nominal
- Dimensions were quantification values
- Different quantifications for different dimensions
  - Only possible for nominal data
  - Other levels (ordinal, numeric) must have the same quantification on each dimension
  - Nominal can also be similarly restricted

For example: Using regression

- Make the group the dependent variable
- Other nominal variables cannot be multiple-nominal, because regression coefficients are unidimensional
- Use other variables to predict group
  - Artificial example: few cases and relatively many variables will give perfect prediction
  - Can still compare prediction and evaluate categories

Predictors of Group

| Predictor | Standardized Coefficient (Beta) |
|------------------------|--------|
| Interaction Intensity | -1.084 |
| Interaction Frequency | .689 |
| Feeling of Belonging | 1.219 |
| Physical Proximity | -.209 |
| Relationship Formality | .060 |

Principal Components: Demographics

- Age Group [treat as ordinal]
- Education Level [treat as ordinal]
- Marital Status [nominal]
- Work Status [nominal; allow different quantifications for different dimensions]

Combining Qualitative & Quantitative Data The availability of numeric and other transformations makes the combining of quantitative & qualitative data simple

Combining Qualitative & Quantitative Data

- Use Categorical Regression, setting measurement levels appropriately
- Use Categorical Principal Components, setting measurement levels appropriately
- Save transformed variables and use ordinary regression or factor analysis for better options (e.g. hierarchical regression or factor rotation)

Combining Qualitative & Quantitative Data

- Preserve independence of sets of data
- Generalized (more than two sets) non-linear canonical variate analysis: OVERALS

A tool for relating sets of variables

- A variant that is a common statistical model is canonical variate analysis (producing a canonical correlation between two sets of variables)
- OVERALS
  - Allows for more than two sets
  - Allows variables to be numeric, categorical or ordinal

A current data set

PhD project by Simone Pica: people with psychosis featuring social withdrawal

- 19 young people suffering from psychosis with symptoms of social withdrawal
- Unstructured interviews
- Standard psychiatric measures also completed

Data

- Interviews transcribed, categories formed from content, coding made
- Diagnosis (DSM-III-R)
- Scores on quantitative measures
  - Premorbid Adjustment Scale (PAS)
  - Scale for the Assessment of Negative Symptoms (SANS)

Raw material

"Um, when I got home I thought it was probably a good thing I didn't go because um, it sort of relates to motivation as well, I wasn't really that motivated to go out and deal with people and stuff. If more of my friends were there, I'd probably would have gone, if it was a party and all my friends were there I would have thought cool you know, I'd have to go even if I only had a few dollars, that's cool, I can go without drinks, cigarettes, I'd just want to be there you know but probably because there would have been only a couple of people I would have known there and the rest of them I wouldn't have known. I sort of thought no, I wouldn't have a good time because if I wanted to meet people, I like meeting people, but when I meet people I always have to talk about my psychosis, and whenever I have to talk about my psychosis, it's like everyone is listening you know, and they all just stop what they are doing and they listen, psychosis, what is that? and then I have to explain everything about it and they are all listening type of thing, honing in type of thing."

Classified material

3. EXPERIENCED DIFFICULTY COMMUNICATING

- He couldn't talk because he became jumbled, he couldn't focus on one thing, he kept thinking about whether his ex-friend was going to mention the letter to other people there
- He stayed in small groups of people throughout the evening in order to avoid saying something inappropriate that would draw attention to him
- When he felt comfortable he found it easier to talk
- He found that the comfortable feeling didn't last, it wore off when the wall came and he found it difficult to think of things to talk about
- When he was with the group of people he didn't know what to talk to people about so he remained silent
- He didn't know what to talk about because he couldn't think of anything intelligent to say
- When he was with people and he didn't know what to talk about his mind was blank, he didn't think anything

Qualitative Data: e.g. presence of categories in interview transcripts

Categories (columns): felt different; stressed; uncomfortable; difficulty communicating; concern about others' views of them

    1 Absent Present Absent
    2 Present
    3 Absent Present
    4 Absent Present
    5
    6 Absent Present

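Qualitative measures: e.g. DSM diagnosis

| DSM-IIIR diagnosis | Frequency | Percent | Cumulative Percent |
|--------------------|-----------|---------|--------------------|
| Schizophrenic | 11 | 55.0 | 57.9 |
| Schizophreniform | 3 | 15.0 | 73.7 |
| Schizoaffective | 2 | 10.0 | 84.2 |
| Delusional | 2 | 10.0 | 94.7 |
| Bipolar | 1 | 5.0 | 100.0 |

Quantitative Measures: e.g. Premorbid Adjustment Scales

| Case | PAS Child | PAS Adolesc | PAS Adult |
|------|-----------|-------------|-----------|
| 1 | 4 | 6 | 9 |
| 2 | 1 | 6 | 8 |
| 3 | 5 | 8 | 11 |
| 4 | 6 | 5 | 4 |
| 5 | 4 | 5 | 5 |
| 6 | 4 | 6 | 7 |
| 7 | 4 | 4 | 5 |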

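Fit of Solution: Summary of Analysis

| | | Dimension 1 | Dimension 2 | Sum |
|------------|-------|------|------|-------|
| Loss | Set 1 | .220 | .545 | .764 |
| | Set 2 | .359 | .267 | .626 |
| | Set 3 | .284 | .302 | .585 |
| | Set 4 | .119 | .326 | .445 |
| | Mean | .245 | .360 | .605 |
| Eigenvalue | | .755 | .640 | |
| Fit | | | | 1.395 |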

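Fit of Solution: Summary of Analysis (sets labelled)

| | | Dimension 1 | Dimension 2 | Sum | Share |
|------------|------|-------|-------|--------------|-------|
| Loss | PAS | .220 | .545 | .764 | 31.6% |
| | SANS | .359 | .267 | .626 | 25.9% |
| | Text | .284 | .302 | .585 | 24.1% |
| | DSM | .119 | .326 | .445 | 18.4% |
| | Mean | .245 | .360 | .605 (Loss) | 30% |
| Eigenvalue | | .755 | .640 | 1.395 (Fit) | 70% |
| Total | | 1.000 | 1.000 | 2.000 | 100% |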
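The arithmetic linking loss, eigenvalue and fit in this summary can be retraced from the per-set losses alone. In the Python sketch below the eigenvalue of each dimension is taken as 1 minus the mean loss across sets, and fit as the sum of the eigenvalues. This reading is inferred from the printed totals (each dimension sums to 1.000) and is not spelled out on the slide. Each set's share of the total loss is also recomputed. Inputs are the rounded slide values, so results agree to about the third decimal.

```python
# Loss per set on dimensions 1 and 2, as printed in the summary.
loss = {
    "PAS": (0.220, 0.545),
    "SANS": (0.359, 0.267),
    "Text": (0.284, 0.302),
    "DSM": (0.119, 0.326),
}
n_dims = 2

# Mean loss per dimension across the four sets.
mean_loss = [sum(v[d] for v in loss.values()) / len(loss) for d in range(n_dims)]

# Assumed relation: eigenvalue = 1 - mean loss; fit = sum of eigenvalues.
eigenvalues = [1 - m for m in mean_loss]
fit = sum(eigenvalues)

# Each set's share of the total loss summed over sets and dimensions.
set_totals = {name: sum(v) for name, v in loss.items()}
grand_total = sum(set_totals.values())
share = {name: total / grand_total for name, total in set_totals.items()}

print([round(m, 3) for m in mean_loss], [round(e, 3) for e in eigenvalues])
print(round(fit, 3), round(fit / n_dims, 2))
print({name: round(100 * s, 1) for name, s in share.items()})
```

On this reading the PAS set is the least well fitted (largest share of the loss) and the DSM set the best, with about 70% of the maximum possible fit achieved in two dimensions.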

Some pointers for Optimal Scaling

For SPSS optimal scaling:

- CATREG & CATPCA have the most sophisticated options
- CATREG produces standard regression output
- Both CATREG & CATPCA can save transformed variables (for repeating the analysis in ordinary mode, e.g. for rotating components)
- Both eliminate the need to specify a range (unlike HOMALS & OVERALS, which must have the range 1 to n specified)

Some pointers for Optimal Scaling: Cautions

- In general, category quantifications only hold for the set of variables in the analysis
- (Incredibly) there is little published experience with these techniques
- Remember to use in exploratory mode
  - Change transformations and see what happens
  - Delete outlying variables/categories

The end. For more information, email rcb@unimelb.edu.au
