Matching PLASC and ALSPAC
PLASC/NPD User Group Workshop, 13th September 2006
Andy Boyd (firstname.lastname@example.org), David Herrick (email@example.com)
What is ALSPAC?
The Avon Longitudinal Study of Parents and Children is a cohort study of children and their parents, based in south-west England. It was designed to determine the ways in which an individual's genotype combines with environmental pressures to influence health and development.
Study design
Eligibility criteria: mothers had to be resident in Avon and have an expected date of delivery between 1st April 1991 and 31st December 1992. Avon was broadly representative of the UK as a whole and has a relatively stable population. The enrolled sample comprised 14,541 pregnancies, resulting in 14,062 live-born children.
Data
Self-completion questionnaires; hands-on measurements; biological samples; health records; education records; direct school contact.
Educational Data - Primary
Contact with ~350 primary schools in the four local LEAs: Bristol, South Gloucestershire, North Somerset, and Bath and North East Somerset. Private and special schools were included. Parents were contacted for out-of-area cases.
Educational Data - Primary
Questionnaires in Year 3 and Year 6: school (completed by the head teacher), class (class teacher) and child (class teacher). Year 4 test: maths. Year 6 tests: maths, spelling, science.
Educational Data - Secondary
Questionnaires for maths teachers in 2002/3 (Year 7) and 2004/5 (Years 7, 8 and 9), with the associated class lists. The Year 6 maths test was repeated in Year 8. We are moving away from direct school contact.
Educational Data - SATs
Entry Assessment and KS1 data on eligible children at local schools were acquired directly from the LEAs. Linkage to the NPD offers increased coverage, easier linking (via the UPN), and PLASC data as well.
Study Approval & Cohort Matching
Ethics and study approval; The Fischer Trust; validating the cohort match; anonymizing the data set; issues encountered.
Ethics & Study Approval
Committees: the ALSPAC Ethics & Law Committee and the LREC (NHS research ethics committee). Eligible vs. enrolled cohort. The final research file had to be anonymous. The DfES commissioned a third party, The Fischer Trust, to conduct the cohort/data match.
The Fischer Trust
An intermediary between ALSPAC and the DfES. FT received both the ALSPAC and NPD datasets and conducted the cohort match. FT created its own ID (however, we were also provided with the UPN).
Cohort match variables
Details were provided for 20,551 children: child surname; child forename; child date of birth; home postcode; school indicator (name and address) from the ALSPAC schools data collection.
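The slide does not say how FT carried out the match. For reference only, here is a minimal deterministic matching sketch on the listed variables. It normalises names and postcodes and accepts a link only when the combined key identifies exactly one census record. The `MatchRecord` type, the field names and the normalisation rules are assumptions, and the school indicator is left out for brevity.

```python
import re
import unicodedata
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MatchRecord:
    """Identifying details for one child (hypothetical field names)."""
    surname: str
    forename: str
    dob: date
    postcode: str


def normalise_name(name: str) -> str:
    """Drop accents, upper-case, and keep letters only: "O'Brien" -> "OBRIEN"."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Z]", "", no_accents.upper())


def normalise_postcode(postcode: str) -> str:
    """Remove all whitespace and upper-case the postcode."""
    return re.sub(r"\s+", "", postcode.upper())


def match_key(rec: MatchRecord) -> tuple:
    return (normalise_name(rec.surname), normalise_name(rec.forename),
            rec.dob, normalise_postcode(rec.postcode))


def exact_match(cohort: list, census: list) -> dict:
    """Return {cohort index: census index} where the key identifies exactly one census record."""
    by_key = {}
    for j, rec in enumerate(census):
        by_key.setdefault(match_key(rec), []).append(j)
    matches = {}
    for i, rec in enumerate(cohort):
        hits = by_key.get(match_key(rec), [])
        if len(hits) == 1:  # zero or several hits are left for clerical review
            matches[i] = hits[0]
    return matches
```

Rejecting keys that hit several census records trades coverage for precision. Those cases are left for manual review rather than guessed.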
Validating the cohort match
Our methodology and study requirements meant we wanted to reverse-check the match. FT matched 86% of the cases provided (17,671 of 20,551). Very few errors were found (<0.5%).
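This sketch reproduces the two figures on this slide, assuming the reverse check compares each accepted pair on date of birth and names. The `is_suspect` rule is a hypothetical choice, not FT's or ALSPAC's documented procedure.

```python
from datetime import date


def match_rate(n_matched: int, n_provided: int) -> float:
    """Share of the supplied cohort that was linked to a census record."""
    return n_matched / n_provided


def is_suspect(ours: dict, theirs: dict) -> bool:
    """Flag a linked pair for clerical review (assumed rule).

    A date-of-birth disagreement is treated as a likely error. Surnames may
    legitimately change, so names only flag a pair when both differ.
    """
    if ours["dob"] != theirs["dob"]:
        return True
    surname_differs = ours["surname"].upper() != theirs["surname"].upper()
    forename_differs = ours["forename"].upper() != theirs["forename"].upper()
    return surname_differs and forename_differs


def reverse_check(pairs: list) -> tuple:
    """Return (error rate, suspect pairs) for a list of (our record, their record)."""
    suspects = [p for p in pairs if is_suspect(*p)]
    return len(suspects) / len(pairs), suspects


print(f"{match_rate(17_671, 20_551):.0%}")  # 86%
```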
Problems with the match variables
Child surname (changes over time); child forename (familiar names used); child date of birth; home postcode (out of date, and lost cases); school indicator (name and address) from the ALSPAC schools data collection (dependent on school participation, and the information could be out of date).
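One common way to tolerate these problems is to score field-by-field agreement instead of requiring an exact key. This is a hypothetical sketch, not the method actually used. A familiar-name table handles forenames, a changed surname costs one point rather than the whole match, and the school indicator only counts when both sides have one. The name table, the rule that date of birth must agree, and the threshold of 4 are all assumptions.

```python
from datetime import date

# Hypothetical familiar-name table; a real one would be far larger.
FAMILIAR = {"TOM": "THOMAS", "TOMMY": "THOMAS", "BILL": "WILLIAM",
            "WILL": "WILLIAM", "KATIE": "KATHERINE", "KATE": "KATHERINE"}


def canon_forename(name: str) -> str:
    name = name.strip().upper()
    return FAMILIAR.get(name, name)


def clean(text) -> str:
    """Upper-case and remove all whitespace; None becomes an empty string."""
    return "" if text is None else "".join(str(text).split()).upper()


def agreement_score(a: dict, b: dict) -> int:
    """Count agreeing match variables; DOB must agree (an assumed rule)."""
    if a["dob"] != b["dob"]:
        return 0
    score = 1
    score += (canon_forename(a["forename"]) == canon_forename(b["forename"]))
    score += (clean(a["surname"]) == clean(b["surname"]))
    score += (clean(a["postcode"]) == clean(b["postcode"]))
    # The school indicator only counts when both records actually have one.
    if a.get("school") and b.get("school"):
        score += (clean(a["school"]) == clean(b["school"]))
    return score


def accept(a: dict, b: dict, threshold: int = 4) -> bool:
    """Hypothetical threshold: DOB plus at least three other agreements."""
    return agreement_score(a, b) >= threshold
```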
Anonymizing the data set
The UPN was transferred to a new internal ID and then to a new collaborator ID. Personal variables dropped (DoB, names, postcode, age at census). Identifying variables dropped (care authority). Variables recoded (ethnicity, SEN). LEA and establishment IDs recoded into our own unique ALSPSCHL_ID.
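The two-step ID replacement and the dropping of variables can be sketched as follows. The sketch assumes records arrive as dictionaries with the hypothetical field names shown. Random IDs from Python's `secrets` module are used so that no new ID can be derived from the UPN, and the lookup tables would stay inside ALSPAC. Recoding of ethnicity and SEN is omitted.

```python
import secrets

# Variables removed before release (field names are hypothetical).
PERSONAL = {"dob", "surname", "forename", "postcode", "age_at_census"}
IDENTIFYING = {"care_authority"}
REPLACED = {"upn", "lea", "estab"}  # swapped for pseudonymous IDs


class Pseudonymiser:
    """Assign a random, non-derivable ID to each distinct key."""

    def __init__(self, nbytes: int = 8):
        self._nbytes = nbytes
        self._ids = {}
        self._used = set()

    def id_for(self, key) -> str:
        if key not in self._ids:
            new_id = secrets.token_hex(self._nbytes)
            while new_id in self._used:  # guard against an (unlikely) collision
                new_id = secrets.token_hex(self._nbytes)
            self._used.add(new_id)
            self._ids[key] = new_id
        return self._ids[key]


class Anonymiser:
    """UPN -> internal ID -> collaborator ID; (LEA, establishment) -> school ID."""

    def __init__(self):
        self.internal = Pseudonymiser()
        self.collab = Pseudonymiser()
        self.school = Pseudonymiser()

    def __call__(self, record: dict) -> dict:
        internal_id = self.internal.id_for(record["upn"])
        drop = PERSONAL | IDENTIFYING | REPLACED
        out = {k: v for k, v in record.items() if k not in drop}
        out["collab_id"] = self.collab.id_for(internal_id)
        out["alspschl_id"] = self.school.id_for((record["lea"], record["estab"]))
        return out
```

Keeping the internal ID as a separate layer means a collaborator ID can be reissued for a new project without touching the UPN mapping.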
Issues encountered
Cases not covered by the NPD. The REE does not include old schools. Primary to junior succession. Children who resit years or are not in their natural school year. Historical records of school movement.
Issues - UPN
We discovered that the U in UPN isn't that unique! 215 ALSPAC cases have multiple UPNs (with no clear pattern as to why). PLASC 2004 has two ALSPAC children with the same UPN.
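Here is a minimal check for both anomalies. It assumes the linked data can be reduced to (ALSPAC ID, census year, UPN) tuples, and that tuple layout is an assumption.

```python
from collections import defaultdict


def upn_anomalies(links):
    """links: iterable of (alspac_id, census_year, upn) tuples.

    Returns two dicts: children linked to more than one UPN, and
    (year, UPN) pairs that were linked to more than one child.
    """
    upns_by_child = defaultdict(set)
    children_by_upn = defaultdict(set)
    for child, year, upn in links:
        upns_by_child[child].add(upn)
        children_by_upn[(year, upn)].add(child)
    multi_upn = {c: u for c, u in upns_by_child.items() if len(u) > 1}
    shared_upn = {k: c for k, c in children_by_upn.items() if len(c) > 1}
    return multi_upn, shared_upn
```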
Sample
At least one PLASC return was identified for 11,997 (85%) of the 14,062 enrolled live births:
2002: 11,850 (84%)
2003: 11,731 (83%)
2004: 11,473 (82%)
Balance (no PLASC return identified): private schools; home educated; outside England; not identified.
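The percentages on this slide follow directly from the counts. This short check uses illustrative variable names.

```python
ENROLLED_LIVE_BIRTHS = 14_062
found = {"any year": 11_997, "2002": 11_850, "2003": 11_731, "2004": 11_473}


def coverage_pct(n: int, total: int = ENROLLED_LIVE_BIRTHS) -> int:
    """Percentage of enrolled live births with a PLASC return, rounded."""
    return round(100 * n / total)


for label, n in found.items():
    print(f"{label}: {n:,} ({coverage_pct(n)}%)")

balance = ENROLLED_LIVE_BIRTHS - found["any year"]
print(f"balance: {balance:,}")
```

This gives 85%, 84%, 83% and 82%, and a balance of 2,065 children with no PLASC return.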
Editing (1)
Convert string variables to numeric, label and sort out missing values, and write documentation. Calculate age at census. From the date of entry, derive age on starting at the current school and length of time at the current school. Derive the expected NCYG (National Curriculum Year Group).
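This sketch of the date derivations assumes a school year that starts on 1 September, a January census, and children in their natural year group. The census date in the example is hypothetical, and returning Reception as 0 is a convention chosen here, not a PLASC code.

```python
from datetime import date


def age_on(dob: date, when: date) -> int:
    """Completed years of age on a given date."""
    return when.year - dob.year - ((when.month, when.day) < (dob.month, dob.day))


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return months - (end.day < start.day)


def expected_ncyg(dob: date, census: date) -> int:
    """Expected year group: Year N children are aged N + 4 on the 31 August
    before their school year begins (so Reception comes out as 0)."""
    school_year_start = census.year if census.month >= 9 else census.year - 1
    return age_on(dob, date(school_year_start, 8, 31)) - 4


def derive(dob: date, entry: date, census: date) -> dict:
    return {
        "age_at_census": age_on(dob, census),
        "age_at_entry": age_on(dob, entry),
        "months_at_school": months_between(entry, census),
        "expected_ncyg": expected_ncyg(dob, census),
    }


# Hypothetical child and January census date.
print(derive(date(1991, 10, 15), date(1996, 9, 4), date(2003, 1, 16)))
# {'age_at_census': 11, 'age_at_entry': 4, 'months_at_school': 76, 'expected_ncyg': 6}
```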
Editing (2)
Ethnicity: 39 cases had new ethnicity codes in 2002. These were mapped back to the old codes, and an equivalent main category was derived. White/non-white indicators were also derived. Care: in 2003, 17 of the 34 cases marked as currently in care were marked N for "ever in care". This did not occur in 2004.
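Here is a hedged sketch of both edits. The ethnicity lookup tables are passed in because the actual code lists are not reproduced on the slide, so any codes used with it are placeholders. The care check assumes the currently-in-care flag uses Y, in line with the N used for ever in care.

```python
def recode_ethnicity(code: str, new_to_old: dict, old_to_main: dict) -> tuple:
    """Map a newer code back to the older scheme, then to a main category.

    Both lookup tables are supplied by the caller; unknown codes give None.
    """
    old = new_to_old.get(code, code)  # codes already in the old scheme pass through
    main = old_to_main.get(old)
    white = None if main is None else (main == "White")
    return old, main, white


def care_inconsistent(in_care_now: str, ever_in_care: str) -> bool:
    """A child currently in care must also have been in care at some time."""
    return in_care_now == "Y" and ever_in_care == "N"


def count_care_inconsistencies(rows: list) -> tuple:
    """Return (inconsistent, currently in care) counts over (now, ever) flag pairs."""
    in_care = [r for r in rows if r[0] == "Y"]
    bad = [r for r in in_care if care_inconsistent(*r)]
    return len(bad), len(in_care)
```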
Unanswered questions
6.6% of children were not in the expected NCYG in 2002, compared with 0.7% in 2003 and 2004. There was a large increase in the use of code T for ethnicity source between 2003 and 2004, even when restricted to Year 7 only.
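Both figures are per-year shares, so one helper covers them. The field names (`year`, `ncyg`, `expected_ncyg`, `ethnicity_source`) are assumptions, and the expected NCYG is taken to be precomputed.

```python
from collections import Counter


def share_by_year(rows: list, flag) -> dict:
    """Percentage of rows per census year for which flag(row) is true."""
    total, hits = Counter(), Counter()
    for row in rows:
        total[row["year"]] += 1
        hits[row["year"]] += bool(flag(row))
    return {year: 100 * hits[year] / total[year] for year in sorted(total)}


def off_expected_year(row) -> bool:
    return row["ncyg"] != row["expected_ncyg"]


def source_t(row) -> bool:
    return row["ethnicity_source"] == "T"
```

For the Year 7 restriction, filter first, e.g. `share_by_year([r for r in rows if r["ncyg"] == 7], source_t)`.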
Illegal Values (1)
Numeric codes in the Boarder field (which should contain only B or N): 2 cases in 2002, 7 in 2003 and 13 in 2004. Code 1 for NCYG in 2003 for a child in secondary school who was expected to be in Year 7, and who was recorded as in Year 6 in 2002 and Year 8 in 2004.
Illegal Values (2)
X in NCYG in 2004: 2 cases. A small number of cases are missing important fields such as date of entry and NCYG. 3 cases had the same code for primary and secondary SEN types.
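The illegal values listed under Illegal Values (1) and (2) translate into simple per-record checks. This sketch assumes dictionary records with hypothetical field names. It treats any non-integer NCYG as illegal for this age group and flags an NCYG more than one year from the expected value for review. The one-year tolerance is an assumption.

```python
REQUIRED = ("date_of_entry", "ncyg")  # hypothetical field names


def check_record(rec: dict) -> list:
    """Return a list of problems found in one PLASC record."""
    problems = []
    for field in REQUIRED:
        if rec.get(field) in (None, ""):
            problems.append(f"missing {field}")
    if rec.get("boarder") not in ("B", "N"):
        problems.append("boarder not B/N")
    ncyg = rec.get("ncyg")
    if ncyg not in (None, ""):
        try:
            year = int(ncyg)
        except ValueError:  # e.g. "X"
            problems.append(f"non-numeric ncyg {ncyg!r}")
        else:
            expected = rec.get("expected_ncyg")
            if expected is not None and abs(year - expected) > 1:
                problems.append(f"ncyg {year} far from expected {expected}")
    primary, secondary = rec.get("sen_primary"), rec.get("sen_secondary")
    if primary and primary == secondary:
        problems.append("primary and secondary SEN type identical")
    return problems
```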
Uses
Identifying developmental impairments: investigating the use of early-life parental questionnaires to predict later problems. SEN types were used to identify autism, speech/language problems and possible learning difficulties. Twin approach with medical database searches. Autism project. Ethnicity.
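Flagging children from their SEN types could look like the following. The code values (ASD, SLCN, MLD, SLD, PMLD, SpLD) are assumed to follow the DfES SEN type list, and the grouping into the three study flags is illustrative.

```python
# Assumed SEN type codes, grouped into illustrative study flags.
SEN_GROUPS = {
    "autism": {"ASD"},                # autistic spectrum disorder
    "speech_language": {"SLCN"},      # speech, language and communication needs
    "learning_difficulty": {"MLD", "SLD", "PMLD", "SpLD"},
}


def sen_flags(primary, secondary=None) -> dict:
    """True for each group that matches either the primary or secondary SEN type."""
    types = {t.strip() for t in (primary, secondary) if t}
    return {group: bool(types & codes) for group, codes in SEN_GROUPS.items()}
```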
Wish List
Detailed documentation describing how different fields relate (especially for SATs). Numeric fields supplied as numeric rather than string.