Chris Dibben University of Edinburgh Linking historical administrative data.

Presentation transcript:

1 Chris Dibben University of Edinburgh Linking historical administrative data

2 Context
History of very important contributions:
–Dutch Famine Birth Cohort Study – epigenetics, thrifty phenotype
–Överkalix study – epigenetics, sex differences
–UK Longitudinal Study – health inequalities

3 Two new developmental projects
–Scottish Mental Surveys 1932 and 1947
–Scottish civil registration data
New cohorts for people now in old age

4 The ‘Scottish Mental Survey’

5 1947 Scottish Mental Survey (linkage diagram)
–Birth 1936
–1939 register: ED code, address, household members: marital status, occupation
–1939 books recorded the date of death (up to 1980)
–Linkage to the death database (1974 onwards)
–The Scottish Longitudinal Study
–Scottish morbidity records
–Education
–Employment

6 (Figure: timeline by year and age)
Year 1936, age 0: Birth; early life environment
Year 1947, age 11: Mental ability
School achievement (time estimated)
Occupation (estimated)
Year 1970, age 34: Hospitalisation; mortality
Years 1991, 2001, 2011, ages 55, 65, 75: Detailed household/individual information

7 Background – Scottish vital events
–Civil registration of births, deaths and marriages in Scotland began on 1 January 1855
–All historical vital events records have been converted into digital image format with a supporting index
–Modern vital events data (from 1974 onwards) are available electronically

8 Digitising Scotland
–Approximately 50 million occupation strings, 8 million causes of death
–Classify occupations to the Historical International Standard Classification of Occupations (HISCO)
–Classify causes of death to a modified ICD-10
–Each with a location

9 Historical Geocoding
(Figure: geocoding tool + geometry features for 1710, 1810, 1910 and 2010)
Year | Historical address
2010 | Ladywell House, Ladywell Road, Edinburgh, EH12 7T
1910 | Ladywell House, Ladywell Street, Edinburgh
1810 | Ladywell House, Ladywell Street, Edinburgh
1710 | Ladywell House, Lady[vv]ell Street, Edinburgh
Issues flagged on the addresses: postcode change; without postcode; interpretation error
–Change of road networks (new roads replace old) over time
–Change of road names over time
–Interpretation errors from the address digitisation
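The address problems on this slide (a postcode present only in the modern form, a street renamed over time, and a transcription error such as `[vv]` for `w`) can be illustrated with a small fuzzy-matching sketch. This is not the project's geocoding tool: the gazetteer, its placeholder coordinates, the `normalise` and `geocode` names, the "house, street, city" address layout, and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical gazetteer of modern streets; coordinates are placeholders, not real data.
GAZETTEER = {
    "ladywell road, edinburgh": (0.0, 0.0),
    "george square, edinburgh": (1.0, 1.0),
}

# Trailing postcode-like token, including a truncated one such as "EH12 7T".
POSTCODE = re.compile(r",?\s*[A-Z]{1,2}\d{1,2}[A-Z]?\s*\d[A-Z]{0,2}\s*$", re.IGNORECASE)


def normalise(address: str) -> str:
    """Strip a trailing postcode, repair one known transcription artefact, lower-case."""
    text = POSTCODE.sub("", address.strip())
    text = text.replace("[vv]", "w")  # artefact shown on the slide
    return re.sub(r"\s+", " ", text).lower()


def geocode(address: str, cutoff: float = 0.8):
    """Return (matched street, coordinates) or None if nothing is close enough."""
    # Assumes "house, street, city": drop the house name before matching.
    street = normalise(address).split(", ", 1)[-1]
    match = difflib.get_close_matches(street, list(GAZETTEER), n=1, cutoff=cutoff)
    return (match[0], GAZETTEER[match[0]]) if match else None


for year, addr in [
    (2010, "Ladywell House, Ladywell Road, Edinburgh, EH12 7T"),
    (1710, "Ladywell House, Lady[vv]ell Street, Edinburgh"),
]:
    print(year, geocode(addr))
```

String similarity alone will not recover a street that was renamed entirely or replaced by a new road, which is presumably why the slide pairs the tool with period-specific geometry features.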

12 Challenges
Significant methodological issues:
–How can we consistently code occupational data so that researchers can explore changing patterns and trends?
–How can we automate this process so that the majority of records do not need to be manually coded?
digitisingscotland@lscs.ac.uk

13 Digitising Scotland
Records of births, marriages and deaths recorded in Scotland from 1855 to the present day.

19 Experimental Dataset
–Use a dataset with similar content for experiments
–60,000 records from the Cambridge Family History Study (records from 1800-1990)
–Occupation descriptions and associated HISCO codes
–HISCO coding done by historians
–Dataset contains 330 different HISCO codes

20 HISCO Hierarchy Example

21 Classification Example
String from record | Gold Standard Classification | Automatic Classification Output
Farm horseman | 62460 Horse Worker |
Shoe maker | 80110 Shoemaker, General |
Fireman (railway) | 98330 Railway Steam-Engine Fireman |
Fireman | 58100 Fire-Fighter |
Stationer | 41000 Working Proprietors (Wholesale and Retail Trade) | 91000 Paper and Paperboard product makers
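Because HISCO codes are hierarchical, a gold-standard code and an automatic code can be compared level by level rather than only for exact equality. The sketch below assumes the usual reading of a five-digit HISCO code by prefix (1 digit major group, 2 digits minor group, 3 digits unit group, 5 digits occupation); it ignores the fact that some major groups span more than one leading digit, and the function names are hypothetical.

```python
def hisco_levels(code: str) -> dict:
    """Split a five-digit HISCO code into its prefix levels (simplified reading)."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a five-digit code, got {code!r}")
    return {
        "major": code[:1],
        "minor": code[:2],
        "unit": code[:3],
        "occupation": code,
    }


def deepest_agreement(gold: str, predicted: str):
    """Return the deepest level at which two codes agree, or None if they differ at the top."""
    g, p = hisco_levels(gold), hisco_levels(predicted)
    agreed = None
    for level in ("major", "minor", "unit", "occupation"):
        if g[level] != p[level]:
            break
        agreed = level
    return agreed


# The Stationer row from the slide: the two codes differ at the first digit.
print(deepest_agreement("41000", "91000"))
```

Scoring agreement per level gives partial credit to an automatic code that lands in the right group but the wrong leaf, which a plain exact-match accuracy would hide.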

23 Approach
–Text analysis
–Supervised machine learning (Apache Mahout framework)
–Combination of these techniques
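One way to read "combination of these techniques" is a two-stage pipeline: normalise the occupation string and look it up among strings that have already been coded, then fall back to a learned model only when the lookup misses. The sketch below is an assumption about how such a combination could work, not the project's pipeline; the `KNOWN` table is filled from the examples in this transcript, and `normalise`, `classify`, and the `model` callable are hypothetical names.

```python
import re

# Already-coded strings, taken from the classification examples in this transcript.
KNOWN = {
    "farm horseman": "62460",
    "shoe maker": "80110",
    "fireman railway": "98330",
    "fireman": "58100",
    "stationer": "41000",
}


def normalise(text: str) -> str:
    """Lower-case, turn punctuation into spaces, collapse whitespace.

    Qualifiers in brackets are kept, because "Fireman (railway)" and
    "Fireman" carry different codes.
    """
    text = re.sub(r"[^a-z ]", " ", text.lower())
    return " ".join(text.split())


def classify(text: str, model=None):
    """Return (code, source): exact lookup first, then the optional model."""
    key = normalise(text)
    if key in KNOWN:
        return KNOWN[key], "lookup"
    if model is not None:
        return model(key), "model"
    return None, "unclassified"


print(classify("Fireman (railway)"))
print(classify("  SHOE-MAKER "))
print(classify("Painter"))
```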

24 Supervised Machine Learning
(Diagram) Training Data → Machine Learning → Prediction Model; Unseen Data → Prediction Model → Predicted Classification

25 Supervised Machine Learning
Training data: Farm horseman 62460; Shoe maker 80110; Fireman 58100; Stationer 41000

26 Supervised Machine Learning
Unseen data: Farm horseman; Boot maker; Fireman; Painter

27 Supervised Machine Learning
Predicted classification: ?
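The training and prediction flow on these slides can be sketched with a tiny word-level naive Bayes classifier. The slides name Apache Mahout but not the algorithm used, so naive Bayes, the `NaiveBayes` class, and the choice to return `None` when no word of the input was seen in training are all assumptions; the training and unseen strings are the ones shown on the slides.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Keep only tokens seen in training; with none, there is no evidence.
        tokens = [t for t in text.lower().split() if t in self.vocab]
        if not tokens:
            return None
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens:
                score += math.log((self.token_counts[label][t] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


training = [
    ("Farm horseman", "62460"),
    ("Shoe maker", "80110"),
    ("Fireman", "58100"),
    ("Stationer", "41000"),
]
model = NaiveBayes().fit(training)
for unseen in ["Farm horseman", "Boot maker", "Fireman", "Painter"]:
    print(unseen, model.predict(unseen))
```

In this toy setting "Boot maker" is pulled towards the shoemaker code through the shared word "maker", while "Painter" has no overlap with the training data, matching the "?" on the final build of the slide.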

28 (Figure) Asthma, 100%: Miners asthma; spasmodic; collier's; miner's; miners asthma; dropsy; bronchial

30 Creation of a fully-linked vital events database for the whole of Scotland back to 1855
(Diagram) Vital events (24 million births, deaths and marriages):
–1855 to 1974: digital images + index
–1974 to present: vital events database
–Result: fully-linked vital events database
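The slides do not say how records are linked, so the following is only a generic sketch of pairwise record linkage between birth and death records: candidates are first filtered by the birth year implied by the death record, then scored by name similarity. The record fields, the sample records, the 0.85 threshold, and the one-year tolerance are all hypothetical.

```python
import difflib


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_births_to_deaths(births, deaths, threshold=0.85, max_year_gap=1):
    """Link each birth to its best-scoring death record, if any passes the threshold."""
    links = []
    for b in births:
        best = None
        for d in deaths:
            # Blocking step: the death record must imply a compatible birth year.
            implied_birth_year = d["death_year"] - d["age_at_death"]
            if abs(implied_birth_year - b["birth_year"]) > max_year_gap:
                continue
            score = name_similarity(b["name"], d["name"])
            if score >= threshold and (best is None or score > best[1]):
                best = (d["id"], score)
        if best is not None:
            links.append((b["id"], best[0]))
    return links


# Hypothetical records for illustration only.
births = [
    {"id": "B1", "name": "Jean McDonald", "birth_year": 1860},
    {"id": "B2", "name": "Robert Burns", "birth_year": 1870},
]
deaths = [
    {"id": "D1", "name": "Jean MacDonald", "death_year": 1930, "age_at_death": 70},
    {"id": "D2", "name": "Robert Burns", "death_year": 1950, "age_at_death": 50},
]
print(link_births_to_deaths(births, deaths))
```

The second birth stays unlinked even though the names are identical, because the implied birth year is decades off; at national scale a real system would also need blocking on more fields and a way to resolve competing links.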

31 Large scale family reconstruction studies and Pedigrees

32 Gottfredsson, Magnús, et al. "Lessons from the past: familial aggregation analysis of fatal pandemic influenza (Spanish flu) in Iceland in 1918." Proceedings of the National Academy of Sciences 105.4 (2008): 1303-1308.

34 Acknowledgments
The Digitising Scotland project is funded by the ESRC. The support of National Records of Scotland is also gratefully acknowledged.

