Multiple Sequence Analysis: a contextualized narrative approach to longitudinal data University of Stirling, September 2007 Gary Pollock Department of.


1 Multiple Sequence Analysis: a contextualized narrative approach to longitudinal data University of Stirling, September 2007 Gary Pollock Department of Sociology Manchester Metropolitan University g.pollock@mmu.ac.uk

2 Longitudinal processes
  start and end times: event history analysis (EHA)
  competing risk, multi-episode (EHA)
  contiguous states as a single dependent variable (DV): sequence analysis (SA)
  i.e. SA offers an alternative (complementary) approach to EHA

3 Sequence analysis using OMA
  1. Sequences of statuses are processed by...
  2. Optimal Matching Analysis (OMA), which results in...
  3. A distance matrix representing the closeness (proximity) of each sequence to all others, which can then be processed by...
  4. Cluster analysis, which leads to the construction of...
  5. A typology of sequence categories
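Steps 3 to 5 above can be sketched in a few lines. The slides do not say which clustering algorithm was used, so average-linkage agglomerative clustering is an assumption here, and the function name `average_linkage` and the toy distance matrix are purely illustrative.

```python
def average_linkage(dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise sequence distances.
    Average linkage is an assumption; the slides do not name a method.
    """
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(dist))]

    def between(c1, c2):
        # Mean distance over all cross-cluster pairs of cases
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                gap = between(clusters[x], clusters[y])
                if best is None or gap < best[0]:
                    best = (gap, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy distance matrix for four hypothetical cases (not taken from the BHPS data)
toy = [
    [0, 2, 9, 10],
    [2, 0, 8, 9],
    [9, 8, 0, 3],
    [10, 9, 3, 0],
]
typology = average_linkage(toy, 2)
print(typology)  # [[0, 1], [2, 3]]
```

Each resulting group of case indices corresponds to one category of the typology; in practice the number of clusters is a judgement call made by the analyst.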

4 Single Sequences
  social class (S/N/M), e.g.
    case 1: SSSSSSSSSS
    case 2: NNNNNSSSSS
    case 3: NNNNNMMMMM
    etc.
  Case Analysis: the resulting typology is an end in itself
  Variable Analysis: the typology is used as a predictor or a dependent variable
  Class, employment status, qualifications, housing, marital status... can all be analysed in this way, giving a range of typologies
  ...but these don't account for interactions, as each is arrived at independently
  Why not combine sequence data prior to analysis in order to capture interactions?

5 Analysis: process
  Create sequence data file
  Determine what to do with internal gaps (fill, delete or skip)
  Determine the costs to be used in the OMA (indel and substitution). These are the parameters which define the distances between the sequences: similar sequences get low distance scores and dissimilar sequences get high scores
  Perform the OMA (though there are other SA techniques)
  Weight the distance scores to account for different sequence lengths
  Perform cluster analysis
  Analyse clusters (i. sequence progression, ii. covariates)
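The slides do not say how the distance scores are weighted. One common convention, assumed here, is to divide the raw distance by the length of the longer of the two sequences. The sketch also assumes that a trailing run of -1 codes marks the end of a case's observed sequence; both function names are hypothetical.

```python
MISSING = -1  # assumed code for "no observation"


def observed_length(seq):
    """Number of states up to and including the last non-missing one."""
    length = len(seq)
    while length > 0 and seq[length - 1] == MISSING:
        length -= 1
    return length


def weight_distance(raw, len_a, len_b):
    """Divide a raw distance by the longer sequence length (an assumed convention)."""
    longest = max(len_a, len_b)
    return raw / longest if longest else 0.0


short_case = [3, 3, 3, 3, 3, -1, -1, -1, -1]  # observed for 5 years
long_case = [3, 3, 3, 3, 3, 3, 2, 3, 3]       # observed for 9 years

len_short = observed_length(short_case)
len_long = observed_length(long_case)
print(len_short, len_long)                        # 5 9
print(weight_distance(9.0, len_short, len_long))  # 1.0
```

Without some weighting of this kind, a short sequence is far from a long one simply because of the insertions needed to make up the difference in length.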

6 Indel and substitution costs
  case 1: SSSSSSSSSS
  case 2: NNNNNSSSSS
  case 3: NNNNNMMMMM
  If INDEL = 1 and SUBS = 2 (often a default setting):
    1,2 = 10
    1,3 = 20
    2,3 = 10
  If INDEL = 1 and SUBS = 2 (NM, MN, SM, MS) and 1.5 (NS, SN):
    1,2 = 7.5
    1,3 = 17.5
    2,3 = 10
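The distances on this slide can be reproduced with the standard dynamic-programming form of optimal matching. This is a minimal sketch rather than the software used for the presentation; the names `om_distance`, `flat_cost` and `graded_cost` are illustrative.

```python
from typing import Callable, Sequence


def om_distance(a: Sequence[str], b: Sequence[str],
                sub_cost: Callable[[str, str], float],
                indel: float = 1.0) -> float:
    """Cheapest total cost of turning sequence a into sequence b."""
    n, m = len(a), len(b)
    # d[i][j] = cost of aligning the first i states of a with the first j of b
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                             # delete from a
                d[i][j - 1] + indel,                             # insert into a
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitute or match
            )
    return d[n][m]


def flat_cost(x: str, y: str) -> float:
    """SUBS = 2 for every pair of different states."""
    return 0.0 if x == y else 2.0


def graded_cost(x: str, y: str) -> float:
    """SUBS = 1.5 between N and S, 2 for the other pairs."""
    if x == y:
        return 0.0
    return 1.5 if {x, y} == {"N", "S"} else 2.0


case1, case2, case3 = "SSSSSSSSSS", "NNNNNSSSSS", "NNNNNMMMMM"
for cost in (flat_cost, graded_cost):
    print(om_distance(case1, case2, cost),
          om_distance(case1, case3, cost),
          om_distance(case2, case3, cost))
# 10.0 20.0 10.0
# 7.5 17.5 10.0
```

With INDEL = 1, deleting one state and inserting another costs 2, so a substitution priced below 2 is always preferred; this is why lowering the N/S cost to 1.5 reduces the 1,2 and 1,3 distances but leaves 2,3 unchanged.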

7 Data: BHPS 1991-2007
  born 1970-1975
  tracked from age 21 to 29
  data shifted to a common time axis
  class and qualifications examined here (housing, marriage, employment status and fertility status also processed)
  All internal gaps filled
  All sequence lengths included
  Year-on-year transitions used to inform substitution costs

8 Sequence gaps over waves A to N

9 Data: BHPS 1991-2007

10 Single Sequences: class
   C   21  22  23  24  25  26  27  28  29
   1    3   3   3   3   3  -1  -1  -1  -1
   2    3   3   3   3   3   3   2   3   3
   3    4   3   3   3   3   3   3   2   3
   4    3   4   4   4   3   3   2   1  -1
   5    6   6   4   4   6  -1  -1  -1  -1
   6    5   5   5   5   5   6   6   6   6
   7    0   0   3   3   3   3   3   3   3
   8    0   0   0   2   4   2   2   2   2
   9    3   3   3   3   3   3   3   1   3
  10    2   2   4   4   2   2   5   6   1
  0 = no job yet
  1 = Service class Higher
  2 = Service class Lower
  3 = Non-manual
  4 = Self
  5 = Skilled
  6 = Unskilled

11 Proportions of time spent in a particular class

12 Year on year class transitions

13 Year on year class transitions: off diagonal proportions (N = 1512)
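The transition tables on these slides did not survive in the transcript, but the counting behind them can be sketched. The sketch assumes that -1 marks a missing year and that an "off diagonal proportion" is a change's share of all year-on-year changes of state; neither is stated in the slides, and the function names are illustrative.

```python
from collections import Counter

MISSING = -1  # assumed code for a year with no observation


def transition_counts(sequences):
    """Count year-on-year transitions, skipping pairs that involve a missing year."""
    counts = Counter()
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            if prev != MISSING and curr != MISSING:
                counts[(prev, curr)] += 1
    return counts


def off_diagonal_proportions(counts):
    """Share of each change of state among all changes (stayers excluded)."""
    changes = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    total = sum(changes.values())
    return {pair: n / total for pair, n in changes.items()} if total else {}


# Cases 2 and 6 from the class sequences on slide 10
sample = [
    [3, 3, 3, 3, 3, 3, 2, 3, 3],
    [5, 5, 5, 5, 5, 6, 6, 6, 6],
]
counts = transition_counts(sample)
props = off_diagonal_proportions(counts)
print(counts[(3, 3)], counts[(3, 2)], counts[(5, 6)])  # 6 1 1
print(sorted(props))  # [(2, 3), (3, 2), (5, 6)]
```

Frequent transitions between two states suggest that the states are close, which is the rationale for giving that pair a lower substitution cost.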

14 Class substitution costs
         None   sch   scl    nm  self  skil  unsk
  None    0.0   1.8   1.8   1.8   1.8   1.8   1.8
  sch     1.8   0.0   1.2   1.3   1.8   1.7   1.7
  scl     1.8   1.2   0.0   1.1   1.7   1.3   1.3
  nm      1.8   1.3   1.1   0.0   1.7   1.6   1.3
  self    1.8   1.8   1.7   1.7   0.0   1.6   1.6
  skil    1.8   1.7   1.3   1.6   1.6   0.0   1.2
  unsk    1.8   1.7   1.3   1.3   1.6   1.2   0.0

15 Cluster analysis of class sequences
  An eight-cluster solution produces the following:
  Clus  % cases  description
  1     17       non-manual, little if any mobility
  2     12       service class, lower, little mobility
  3     13       unskilled, little mobility
  4     12       moving from unskilled to skilled work
  5     15       mixed
  6      6       skilled, little mobility
  7     19       upwards mobility, NM, SCL, SCH
  8      5       self-employed, little mobility

16 Single Sequences: highest qualification
   C   21  22  23  24  25  26  27  28  29
   1    2   2   2   2   2  -1  -1  -1  -1
   2    2   2   2   2   2   2   2   1   1
   3    2   2   2   2   2   2   2   2   2
   4    2   1   1   1   1   1   1  -1  -1
   5    5   5   5   5
   6    3   3   3   3   3   3   3   3   3
   7    2   2   2   2   2   2   2   2   2
   8    3   3   2   2   2   2   2   2   2
   9    3   3   3   3   3   3   3   3   2
  10    2   2   2   2   2   2   2   1   1
  1 = HE
  2 = Post GCSE/O grade
  3 = GCSE/O grade
  4 = Other
  5 = None/at school

17 Proportions of time spent in highest qualification statuses

18 Year on year changes in HEQ

19 Year on year changes in HEQ: off diagonal proportions (N = 248)

20 HEQ substitution costs
          None    HE     A     O   oth  none
  None     0.0   2.0   2.0   2.0   2.0   2.0
  HE       1.8   0.0   2.0   2.0   2.0   2.0
  A        1.8   1.1   0.0   2.0   2.0   2.0
  O        1.8   1.8   1.2   0.0   2.0   2.0
  oth      1.8   1.7   1.6   1.7   0.0   1.8
  none     1.8   1.8   1.6   1.7   1.7   0.0

21 Cluster analysis of HEQ sequences
  A seven-cluster solution produces the following:
  Clus  % cases  description
  1     17       from GCSE to post-GCSE
  2      7       late post-GCSE to HE
  3     30       post-GCSE, stable
  4     13       early post-GCSE to HE
  5      6       no qualifications
  6     14       GCSE, stable
  7     11       other, stable

22 Multiple Sequence Analysis (MSA)
  combine different sequences prior to OMA processing
  e.g. class, qualifications (housing, marital and fertility statuses) are combined in a single measure
  the sequences represent a narrative of change (or stability) on the measured dimensions
  the resulting typology can be analysed using case and variable methods as before, but is in itself a representation of complex, time-embedded associations between the source variables

23 Multiple Sequences: class and highest qualification
   C   21  22  23  24  25  26  27  28  29
   1   23  23  23  23  23  -1  -1  -1  -1
   2   23  23  23  23  23  23  22  13  13
   3   24  23  23  23  23  23  23  22  23
   4   23  14  14  14  13  13  12  -1  -1
   5   56  56  54  54  56  -1  -1  -1  -1
   6   35  35  35  35  35  36  36  36  36
   7   20  20  23  23  23  23  23  23  23
   8   30  30  20  22  24  22  22  22  22
   9   33  33  33  33  33  33  33  31  23
  10   22  22  24  24  22  22  25  16  11
  1st digit (highest qualification):
    1 = HE
    2 = Post GCSE/O grade
    3 = GCSE/O grade
    4 = Other
    5 = None/at school
  2nd digit (class):
    0 = no job yet
    1 = Service class Higher
    2 = Service class Lower
    3 = Non-manual
    4 = Self
    5 = Skilled
    6 = Unskilled
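The two-digit coding above can be sketched as follows. The rule that a year is coded -1 whenever either source sequence is missing is inferred from case 4 on the slides rather than stated there, and the function name `combine` is illustrative.

```python
MISSING = -1


def combine(qual_seq, class_seq):
    """Build two-digit states: first digit qualification, second digit class."""
    if len(qual_seq) != len(class_seq):
        raise ValueError("sequences must cover the same years")
    combined = []
    for q, c in zip(qual_seq, class_seq):
        if q == MISSING or c == MISSING:
            # Inferred rule: a gap in either source makes the combined state missing
            combined.append(MISSING)
        else:
            combined.append(10 * q + c)
    return combined


# Cases 4 and 10 from the single-sequence slides
qual_4 = [2, 1, 1, 1, 1, 1, 1, -1, -1]
class_4 = [3, 4, 4, 4, 3, 3, 2, 1, -1]
qual_10 = [2, 2, 2, 2, 2, 2, 2, 1, 1]
class_10 = [2, 2, 4, 4, 2, 2, 5, 6, 1]

print(combine(qual_4, class_4))    # [23, 14, 14, 14, 13, 13, 12, -1, -1]
print(combine(qual_10, class_10))  # [22, 22, 24, 24, 22, 22, 25, 16, 11]
```

Five qualification codes and seven class codes give 5 x 7 = 35 possible combined states, so each further variable multiplies the size of the state space.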

24 Year on year changes
  This is a large (35 by 35) matrix
  Substitution costs are calculated as for the single sequence structure
  Frequent transitions:
    12 to 11 (2.9%)
    13 to 12 (2.3%)
    21 to 22 (2.6%)
    22 to 21 (2.6%)
    22 to 23 (4.5%)
    23 to 22 (5.9%)
    26 to 25 (2.4%)

25 Sequence analysis of class-HEQ data
  Clus  %   description
   1    11  post GCSE, NM, stable
   2     8  post GCSE HE, NM SCL
   3     5  no quals, self emp unsk
   4    10  GCSE, mixed emp (self, sk, unsk)
   5     7  post GCSE, NM SCL
   6     7  GCSE, NM both stable
   7     4  post GCSE skilled, both stable
   8     6  from unsk and sk SCH, HE
   9    15  mixed
  10     4  other quals and SCL, SCH
  11     3  post GCSE, SCH/SCL switching
  12     6  other and sk/unsk, stable
  13     8  post GCSE, unsk stable
  14     2  post GCSE, self, stable

26 Advantages of MSA
  Is not limited to a single sequence measure
  Is not limited to a single event type
  Articulates the full scope of related sequences together

27 Issues
  Increasing complexity of the measure as new variables are drawn in
  computing time / software switching
  Lack of formal rules in executing the OMA and clustering processes
  Largely exploratory: scope to develop in relation to EHA

