
1 Working with EU-SILC using the hierarchical data structure, matching & aggregating data
Practical computing session I – Part 2
Heike Wirth, GESIS – Leibniz Institut für Sozialwissenschaften
DwB Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology, University of Bucharest, Romania

2 Introduction
EU-SILC data has a hierarchical structure:
- more than one level of analysis is possible
- household & individual levels are represented by separate files
- data are stored in multiple data files

3 Example of household level data

4 Example of individual level data

5 Working with this kind of data requires
- a decision on the appropriate unit of analysis for your research question, e.g. is the research interest in households or in persons?
  - % of households/persons/men/women/children who live in poverty?
  - % of households with only 1 person, or % of persons who live alone?
- knowledge of procedures for manipulating the data

6 Types of Matching
- One-to-one matching: Household Register to Household Data; Personal Register to Personal Data
- One-to-many matching: household variables to individual data
- Many-to-one matching (‘aggregation’): e.g. adding information from the individual data to the household data

7 EU-SILC – Types of matching
[Diagram: the four files, Household Register File (D), Household Data File (H), Personal Register File (R) and Personal Data File (P), connected by 1:1, 1:n and n:1 links]

8 Linking EU-SILC files (cross-sectional)
Key variables provide links between the related records
- between household files
- between individual files
- between household and individual files
Key variables (depending on the files) are
- household id (DB030; HB030; RX030; PX030)
- personal id (RB030; PB030)
To be on the safe side: always use the key variables together with ‘year of survey’ (DB010; HB010; RB010; PB010) & ‘country’ (DB020; HB020; RB020; PB020)
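Why the household id should be combined with year and country can be checked directly in the data. The following is a minimal sketch in SPSS syntax, assuming the training D-File and the file handle used in the exercises; the variable names n_id_only and n_full_key are illustrative and not part of EU-SILC.

```spss
* Sketch only: path and file name are taken from the exercises, adjust as needed.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
GET FILE='data_path/udb_c10d_silc_course.sav'.

* Number of records that share the same household id alone.
AGGREGATE
  /OUTFILE = * MODE = ADDVARIABLES
  /BREAK = DB030
  /n_id_only = NU.

* Number of records that share year + country + household id.
AGGREGATE
  /OUTFILE = * MODE = ADDVARIABLES
  /BREAK = DB010 DB020 DB030
  /n_full_key = NU.

* Compare the two counts.
FREQUENCIES VARIABLES = n_id_only n_full_key.
```

If n_full_key equals 1 for every record, the three-part key identifies households uniquely. Values above 1 in n_id_only would indicate that the household id on its own repeats, for example across countries.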

9 Example 1: one-to-one
Attach household register information (D-File) to the household data file (H-File), e.g. ‘Degree of urbanisation’ (DB100) is only included in the household register; it might be useful to have this information in the household data, too.

10 One-to-One Match, e.g. household information
Household Register (separate file)
DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     13     3      (…)  thinly populated area
2010   AT     19     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area
2010   AT     59     4      (…)  densely populated area
Household Data (separate file)
HB010  HB020  HB030  HS090               HS120                  (…)  HX060
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household
2010   AT     12     yes                 with difficulty        (…)  Other hhlds without dep. children
2010   AT     13     no - other reason   fairly easily          (…)  One person household
2010   AT     19     yes                 fairly easily          (…)  Other hhlds without dep. children
2010   AT     26     yes                 easily                 (…)  Other hhlds without dep. children
2010   AT     59     yes                 with some difficulty   (…)  One person household

11 Result: Combined Household File
Household Data (combined file)
HB010  HB020  HB030  HS090               HS120                  (…)  HX060                                        DB100
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household                         intermediate area
2010   AT     12     yes                 with difficulty        (…)  Other households without dependent children  thinly populated area
2010   AT     13     no - other reason   fairly easily          (…)  One person household                         thinly populated area
2010   AT     19     yes                 fairly easily          (…)  Other households without dependent children  thinly populated area
2010   AT     26     yes                 easily                 (…)  Other households without dependent children  thinly populated area
2010   AT     59     yes                 with some difficulty   (…)  One person household                         densely populated area

12 Example 2: one-to-many
Attach household register information (D-File) to the personal data file (P-File): attach ‘Degree of urbanisation’ (again), this time to the personal data file.

13 Attaching household data to personal data (1:n)
Personal Data (combined)
PB010  PB020  PX030  PB030  PH010  PH020  PH030            PX020  DB100
2010   AT     2      201    fair   yes    yes, limited     71     intermediate area
2010   AT     12     1201   fair   no     no, not limited  32     thinly populated area
2010   AT     12     1202   fair   yes    yes, limited     31     thinly populated area
2010   AT     12     1203   good   no     no, not limited  30     thinly populated area
2010   AT     12     1204   fair   no     no, not limited  26     thinly populated area
(…)
Household Register (separate file)
DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area

14 Example 3: many-to-one
e.g. the number of persons in a household who are unemployed, full-time employed, or self-employed?
Such information is not included in the data => own computation

15 Matching: many-to-one (summarizing information)
Personal Data with summarized variables
PB010  PB020  PX030  PB030  PL031                 # unempl  # employed full time  # self employed
2010   AT     2      201    Unemployed (5)        1         0                     0
2010   AT     12     1201   Empl. full time (1)   0         2                     1
2010   AT     12     1202   Empl. full time (1)   0         2                     1
2010   AT     12     1203   Empl. part time (2)   0         2                     1
2010   AT     12     1204   Self-employed (3)     0         2                     1
(…)
Household Data (combined file)
HB010  HB020  HB030  # unempl  # employed  # self employed
2010   AT     2      1         0           0
2010   AT     12     0         2           1
2010   AT     26     .         .           …

16 Hands on – matching 1:1
Attach ‘Degree of Urbanisation’ (DB100) to the household data file (H-File):
- Open the EU-SILC training dataset (D-File).
- Check the variables you are interested in.
- Sort your data according to the key variables used for linkage.
- Names of key variables in the files to be matched must be identical => create new key variables (ID010, ID020, ID_HH) such that DB010 = ID010, DB020 = ID020, DB030 = ID_HH.
- Create a new file with only the key variables & the variable(s) you are interested in; name the new file DB100.sav.

17 SPSS–Matching: one-to-one
**** Before you start ************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset – D-File *.
GET FILE='data_path/udb_c10d_silc_course.sav'.
* check the variables you are interested in.
crosstabs /tables = DB020 by DB100.

18 SPSS–Matching: one-to-one
* open the EU-SILC training dataset – D-File *.
GET FILE='data_path/udb_c10d_silc_course.sav'.
* check the variables you are interested in.
crosstabs /tables = DB020 by DB100.
* Step 1 - Sort your data according to the key variables used for linkage *.
sort cases by DB010 DB020 DB030.
* Step 2 - Names of key variables in the files to be matched must be identical *.
rename variables (DB010 DB020 DB030 = ID010 ID020 ID_HH).
* create a new file with the key variables & the variable(s) you are interested in *.
save outfile = 'mydata_path/DB100.sav'
  /keep ID010 ID020 ID_HH DB100.

19 SPSS–Matching: one-to-one
GET FILE='data_path/udb_c10H_silc_course.sav'.
sort cases by HB010 HB020 HB030.
* Key – Variables *.
* either rename (like before) or better generate a new variable *.
* HB020 is a string variable, so ID020 must be declared as a string first.
STRING ID020 (A2).
compute ID010 = HB010.
compute ID020 = HB020.
compute ID_HH = HB030.
MATCH FILES FILE = *
  /file = 'mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.
* check whether it worked.
crosstabs /tables = HB020 by DB100.
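A crosstab shows whether DB100 arrived in the household file, but not whether every household found a partner record. One way to make that visible is the /IN subcommand of MATCH FILES, which adds a 0/1 flag per input file. This is a hedged sketch rather than part of the original exercise: the flag names in_h and in_d are illustrative, the key variables are renamed instead of computed for brevity, and the file handles and DB100.sav are assumed to exist as created in the previous steps.

```spss
* Sketch only: in_h and in_d are illustrative flag names.
* Assumes data_path, mydata_path and DB100.sav from the previous steps.
GET FILE='data_path/udb_c10H_silc_course.sav'.
rename variables (HB010 HB020 HB030 = ID010 ID020 ID_HH).
sort cases by ID010 ID020 ID_HH.

* /IN creates a flag that is 1 if the case came from the preceding file.
MATCH FILES
  /FILE = * /IN = in_h
  /FILE = 'mydata_path/DB100.sav' /IN = in_d
  /BY ID010 ID020 ID_HH.
execute.

* Cases with in_h = 1 and in_d = 1 were found in both files.
crosstabs /tables = in_h by in_d.
```

In a clean one-to-one match all cases fall into the cell where both flags are 1; any other cell points to households present in only one of the two files.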

20 SPSS–Matching: One-to-many Match (1:n)
Example 2: Combining household and personal data
E.g. attach ‘Degree of Urbanisation’ (DB100) to personal data.
GET FILE='data_path/udb_c10p_silc_course.sav'.
* Sort by the key variables used for linkage *.
sort cases by PB010 PB020 PX030.
* PB020 = string variable - create a new string variable ID020 / or use the rename command *.
STRING ID020 (A2).
compute ID010 = PB010.
compute ID020 = PB020.
compute ID_HH = PX030.

21 SPSS–Matching: One-to-many Match (1:n)
MATCH FILES FILE = *
  /table = 'mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.
* Check whether it worked *.
crosstabs /tables = PB020 by DB100.
save outfile = 'mydata_path/personal_data.sav'.

22 Matching: many-to-one (n : 1)
Create new summary variables for personal data (P-File):
- number of persons living in the same household
- number of unemployed persons living in a household
- number of full-time employed persons living in a household
- number of part-time employed persons living in a household
- number of self-employed persons living in a household
- sum of ‘pensions from individual private plans’ (PY080G)

23
*********************************************************.
* many-to-one (n:1).
* Personal Data.
* example 1.
* number of persons living in the same household.
* number of unemployed persons living in a household.
*********************************************************.
* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.
* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.
* open the EU-SILC training dataset.
GET FILE='data_path/udb_c10p_silc_course.sav'.
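The transcript ends before the summary variables are actually computed. The following sketch shows one possible way to continue, using AGGREGATE; it is not the original exercise code. The flag and result names (unempl, n_unempl, and so on) and the output file name hh_summary.sav are illustrative, and the PL031 codes are assumed from the labels shown on slide 15 (1 = employed full time, 2 = employed part time, 3 = self-employed, 5 = unemployed).

```spss
* Sketch only: assumes the P-File opened above is the active dataset.
* PL031 codes are taken from the labels on slide 15 and should be checked
  against the codebook of the dataset in use.

* 0/1 flags per person (system-missing if PL031 is missing).
compute unempl   = (PL031 = 5).
compute empl_ft  = (PL031 = 1).
compute empl_pt  = (PL031 = 2).
compute selfempl = (PL031 = 3).
execute.

* Variant A: add the household totals to every person record.
AGGREGATE
  /OUTFILE = * MODE = ADDVARIABLES
  /BREAK = PB010 PB020 PX030
  /n_persons  = NU
  /n_unempl   = SUM(unempl)
  /n_empl_ft  = SUM(empl_ft)
  /n_empl_pt  = SUM(empl_pt)
  /n_selfempl = SUM(selfempl)
  /sum_py080g = SUM(PY080G).

* Variant B: write one record per household to a new file.
AGGREGATE
  /OUTFILE = 'mydata_path/hh_summary.sav'
  /BREAK = PB010 PB020 PX030
  /n_persons  = NU
  /n_unempl   = SUM(unempl)
  /n_empl_ft  = SUM(empl_ft)
  /n_empl_pt  = SUM(empl_pt)
  /n_selfempl = SUM(selfempl)
  /sum_py080g = SUM(PY080G).
```

Variant A corresponds to the personal data table with summarized variables on slide 15. Variant B produces a household-level file that could then be attached to the H-File with a one-to-one match after renaming PB010, PB020 and PX030 to the common key names. Note that n_persons counts only the records present in the P-File, which may differ from the full household size.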

