Leeds, 22 March 2007 A SECONDARY ANALYSIS OF DATA MID CAREER FELLOWSHIP Gopalakrishnan Netuveli Imperial College London 1 Jan 2007 – 31 March 2008.


Treating longitudinal data longitudinally
The objective of this fellowship is to gain experience and proficiency in the secondary analysis of longitudinal data sets.
Motivation:
A large amount of resources is spent on collecting longitudinal data.
The “longitudinality” of the data is inadequately utilised.

The substantive research question
The objectives of the fellowship will be pursued by investigating the interrelationship of trajectories of employment status and health in the British Household Panel Survey (BHPS). The English Longitudinal Study of Ageing will also be used.
This presentation reports results from the first two months of work.

Complexity of longitudinal data
Structure of longitudinal data: like an arrow in flight, which ‘is paradoxically at one stage while also pursuing its path to the target’.
Challenge: resolve that paradox by using all the information contained in both the stage and the track of the longitudinal variable.

Importance of trajectories
In the data, trajectories are represented as an array of time-indexed variables.
Every point in the trajectory contains information about: the current value at the point of measurement; the direction; how that point was reached; where it might go.
Longitudinal data are underused if only the magnitudes of the time-indexed variables are used, without taking into account the possible interrelationships between them.

Examples of research meeting the challenge
Mainland European demographers used linked information from Norwegian national decennial censuses to construct social trajectories through the life course. They used state-order-time models to predict mortality (Wunsch et al. 1996):
State model: $\operatorname{logit} P(Y = 1) = d + \sum_i (a_i A_i + b_i B_i + c_i C_i)$
Order model: $\operatorname{logit} P(Y = 1) = a + \sum_i b_i O_i$

Combining states and orders: sequences
The order of states can be expressed as sequences, which can represent longitudinal trajectories.
Sequences are clustered according to their similarity, either in all-to-all comparisons or in comparisons against ideal types.
Cluster membership is used as a dependent or independent variable in analyses.
A method becoming popular for this purpose is optimal matching.

Optimal matching: a short primer
A measure of dissimilarity between two sequences is
$d = (L_1 + L_2) - 2 L_{CS}$
where $L_1$ and $L_2$ are the total lengths of the first and second sequences and $L_{CS}$ is the length of the longest common subsequence they share.
Example: LONDON and LEEDS. $L_1 = 6$; $L_2 = 5$; $L_{CS} = 2$ (LD); $d = 11 - 4 = 7$.
The matching is optimal when d has the smallest possible value, which depends on identifying the highest possible $L_{CS}$.
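
The dissimilarity measure on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the software used in the fellowship; the function names `lcs_length` and `lcs_distance` are hypothetical, and the common sequence is assumed to be a subsequence (letters in the same order, not necessarily adjacent), as the LONDON/LEEDS example implies.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] is the LCS length of the part of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Dissimilarity d = (L1 + L2) - 2 * L_CS from the slide."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("LONDON", "LEEDS"))    # 2  (the shared subsequence is "LD")
print(lcs_distance("LONDON", "LEEDS"))  # 7
```

The dynamic-programming table is what makes the match "optimal": it finds the largest possible common subsequence, and therefore the smallest possible d.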

A special case
If the sequences are made of two states and are of equal length (L):
1. AAAAAA = 111111, $\Sigma_1 = 6 = L$
2. ABAABA = 101101, $\Sigma_2 = 4$
3. BBAABB = 001100, $\Sigma_3 = 2$
Σ, the sum of the coded states, equals $L_{CS}$ with sequence 1.
$d_{1 \to 2} = (6 + 6) - 2 \times 4 = 4 = 2(L - \Sigma_2)$
$d_{1 \to 3} = (6 + 6) - 2 \times 2 = 8 = 2(L - \Sigma_3)$
$d_{2 \to 3}$ cannot be extrapolated from these relations.
d is often standardised by dividing by L (or by the longer length when the lengths are unequal).
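
As a check on the special case, the sketch below compares the shortcut 2(L − Σ) with the full longest-common-subsequence calculation for the three sequences on the slide. The helper names are hypothetical and the code is only an illustration of the arithmetic, not the author's implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Full calculation: d = (L1 + L2) - 2 * L_CS."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def shortcut_distance(seq, ref_state="A"):
    """Shortcut against a reference made only of ref_state: d = 2 * (L - sigma)."""
    sigma = sum(1 for state in seq if state == ref_state)
    return 2 * (len(seq) - sigma)


reference = "AAAAAA"
for seq in ("ABAABA", "BBAABB"):
    d_full = lcs_distance(reference, seq)
    d_short = shortcut_distance(seq)
    standardised = d_full / len(seq)  # divide by L, as on the slide
    print(seq, d_full, d_short, round(standardised, 3))

# The distance between sequences 2 and 3 needs the full comparison.
print(lcs_distance("ABAABA", "BBAABB"))
```

The shortcut only works because one of the two sequences is a constant reference; any other pair has to go through the full comparison.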

Developing methods to compare trajectories of two different variables: progress to date
I used BHPS waves 1 to 14. Two trajectories were selected:
People in the labour force (1 = in LF)
People with limiting illness in the previous 12 months (0 = no limiting illness)
The reference sequences were being in the labour force in all waves, and having no limiting illness in all waves.
The hypothesis was that people who are ill will not be working.
The research question is: how do trajectories of labour force participation vary within a given pattern of illness?

Methods
Data were restricted to respondents who had information on both variables in all waves.
Stata commands were used to match sequences and produce the standardised distances against the reference sequences.
As there are only 14 waves, there were only 14 discrete values for the distances (few enough to look at each value separately, but enough to treat as continuous).
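
The two steps described here (complete-case restriction, then a standardised distance from each reference sequence) might look roughly as follows. The original analysis used Stata; this Python version is only a sketch. The toy records, the `people` dictionary and the function names are all hypothetical, and the distance uses the two-state shortcut d = 2(L − Σ) divided by L.

```python
WAVES = 14  # BHPS waves 1 to 14

# Hypothetical toy records: id -> (labour force states, limiting illness states).
# 1 = in labour force / has limiting illness; None = no information for that wave.
people = {
    "p1": ([1] * 14, [0] * 14),
    "p2": ([1] * 10 + [0] * 4, [0] * 9 + [1] * 5),
    "p3": ([1] * 13 + [None], [0] * 14),  # incomplete, so it is dropped
}


def is_complete(states):
    """True when every wave has a recorded state."""
    return len(states) == WAVES and all(s is not None for s in states)


def standardised_distance(states, reference_state):
    """2 * (number of waves differing from the reference) / L."""
    mismatches = sum(1 for s in states if s != reference_state)
    return 2 * mismatches / len(states)


results = {}
for pid, (labour, illness) in people.items():
    if is_complete(labour) and is_complete(illness):
        results[pid] = (
            standardised_distance(labour, reference_state=1),   # always in LF
            standardised_distance(illness, reference_state=0),  # never ill
        )

for pid, (d_labour, d_illness) in results.items():
    print(pid, round(d_labour, 3), round(d_illness, 3))
```

Each respondent ends up with one distance per variable, which is what allows the two trajectories to be compared on a common scale.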

Method to describe a pattern graphically
Traditionally, area charts are used to describe patterns. Disadvantage: the uncertainty at each time point in the pattern is not reflected.
The information content of a distribution of states at a time point can be calculated using Shannon’s information measure.
Information at a position, R = [formula not preserved in the transcript], where a is the state (0, 1) and $f_a$ is the frequency of state a.
This method is based on the ‘sequence logos’ used to describe genetic sequences (Schneider, 1999).
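
A minimal sketch of the per-position information calculation, assuming the usual sequence-logo definition R = log2(number of states) − H with H = −Σ f_a log2 f_a, and ignoring the small-sample correction. The exact formula used in the talk is not shown in the transcript, so this form is an assumption; the toy sequences and function names are hypothetical.

```python
import math


def position_information(column, n_states=2):
    """Assumed form: R = log2(n_states) - H, where H = -sum(f_a * log2(f_a))."""
    n = len(column)
    entropy = 0.0
    for state in set(column):
        f = column.count(state) / n  # frequency of this state at the position
        entropy -= f * math.log2(f)
    return math.log2(n_states) - entropy


def logo_heights(sequences):
    """For each wave, letter heights f_a * R as drawn in a sequence logo."""
    heights = []
    for column in zip(*sequences):
        r = position_information(column)
        n = len(column)
        heights.append({s: column.count(s) / n * r for s in sorted(set(column))})
    return heights


# Hypothetical toy data: L = limiting illness, N = no limitations, 4 waves.
sequences = ["NNNN", "NNLN", "NLLN", "LLLN"]
for wave, h in enumerate(logo_heights(sequences), start=1):
    print(wave, {state: round(height, 3) for state, height in h.items()})
```

A wave where everyone is in the same state carries the full 1 bit; a wave split half and half carries none, which is the uncertainty an area chart does not show.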

Method to analyse variations in patterns
To study variations in the distribution of patterns I used Gini coefficients.
The Gini coefficient can be decomposed into between-group, within-group and overlap components.
It makes no distributional assumptions, except that the variable should be monotonically increasing.
The similarity of this procedure to ANOVA has led to it being called ANoGi (Frick et al. 2004).
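
For orientation, the overall Gini coefficient can be computed as the mean absolute difference between all pairs of values divided by twice the mean. The sketch below covers only that overall coefficient; it does not implement the between-group, within-group and overlap decomposition, and the input values are hypothetical.

```python
def gini(values):
    """Gini coefficient: mean absolute pairwise difference / (2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all values are zero, so there is no inequality to measure
    total_abs_diff = sum(abs(x - y) for x in values for y in values)
    return total_abs_diff / (2 * n * n * mean)


# Hypothetical distances from a reference sequence.
print(gini([0, 0, 0, 1]))  # 0.75
print(gini([2, 2, 2, 2]))  # 0.0
```

A value of 0 means every respondent has the same distance; values approaching 1 mean the distances are concentrated in a few respondents.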

Results
Sample size: 4796
Sequences with only one state:
Limiting illness: 2924 (60.9%)
In labour force: 2477 (51.7%)
Number of episodes:

[Figure] Limiting illness: distribution of patterns according to distance from the reference pattern (no illness in all waves)

[Figure] Limiting illness patterns at distance 1: traditional graphic representation

[Figure] Limiting illness patterns at distance 1: information-theoretic (sequence logo) representation. Legend: L = limiting illness; N = no limitations.

[Figure] Pattern for labour force participation: whole sample. Legend: E = in labour force; N = not in labour force.

[Figure] Pattern for labour force participation: those with no limiting illness. Legend: E = in labour force; N = not in labour force.

[Figure] Pattern for labour force participation: those with limiting illness in half the waves. Legend: E = in labour force; N = not in labour force.

[Figure] Pattern for labour force participation: those with limiting illness in all waves. Legend: E = in labour force; N = not in labour force.

[Figure] Analysis of Gini: patterns of employment grouped by patterns of limiting ill health

[Figure] Relationship between patterns of labour force participation and patterns of limiting illness: r = 0.43

In conclusion…
Work in progress.
Need to explore using more complex patterns and full optimal matching
Other methods
Fellowship mentored by:
Professor David Blane, Imperial College
Professor Mel Bartley, UCL
Professor Richard Wiggins, City University
Professor Nicky Best, Imperial College
