CMGPD-LN Methodological Lecture Day 4

Households

Outline
Existing household variables
    Identifiers
    Characteristics
    Dynamics
    Household relationship
Creation of new variables
    Use of bysort/egen

Identifiers
HOUSEHOLD_ID
    Identifies records associated with a household in the current register
HOUSEHOLD_SEQ
    The order of the current household (linghu) within the current household group (yihu)
UNIQUE_HH_ID
    Identifies records associated with the same household across different registers
    New value assigned at time of household division
    Each of the resulting households gets a new, different UNIQUE_HH_ID

Characteristics
HH_SIZE
    Number of living members of the household
    Set to missing before 1789
HH_DIVIDE_NEXT
    Number of households in the next register that the members of the current household are associated with
    1 if no division
    0 if extinction
    2 or more if division

histogram HH_SIZE if PRESENT & HH_SIZE > 0, width(2) scheme(s1mono) fraction ytitle("Proportion of individuals") xtitle("Number of members")

This isn’t particularly appealing. A log scale on the x axis would help.
In Stata, histogram forces fixed-width bins, even when the x scale is set to log.
We can collapse the data and plot using twoway bar or scatter:

    table HH_SIZE, replace
    twoway bar table1 HH_SIZE if HH_SIZE > 0, xscale(log) scheme(s1mono) xlabel( )

What if we would like to convert to fractions?
Compute the total number of households by summing table1, then divide each value of table1 by the total.
sum(table1) returns the sum of table1 up to the current observation.
total[_N] returns the value of total in the last observation.

    drop if HH_SIZE <= 0
    generate total = sum(table1)
    generate hh_fraction = table1/total[_N]
    twoway bar hh_fraction HH_SIZE if HH_SIZE > 0, xscale(log) scheme(s1mono) ///
        xlabel( ) ytitle("Proportion of households")
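Newer Stata releases (17 and later) reworked table, and the replace option used above may not be available there. A minimal sketch of the same collapse-and-plot step using contract, which is a standard Stata command; the variable name hh_percent is our own choice, not part of the dataset:

```stata
* Sketch only. contract replaces the data in memory with one row per
* value of HH_SIZE, so we wrap it in preserve/restore.
preserve
contract HH_SIZE if HH_SIZE > 0, freq(table1) percent(hh_percent)
* percent() is on a 0-100 scale; convert to a fraction
generate hh_fraction = hh_percent/100
twoway bar hh_fraction HH_SIZE, xscale(log) scheme(s1mono)
restore
```

Because contract computes the percentages itself, the running sum and the total[_N] step are not needed in this variant.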

Households as units of analysis
The previous figures all treated individuals as the units of analysis.
Every household was represented as many times as it had members.
A household with 100 members would contribute 100 observations.
In effect, the figures represent household size as experienced by individuals.
Sometimes we would like to treat households as units of analysis, so that each household contributes only one observation per register.

Households as units of analysis
One easy way is to create a flag variable that is set to 1 only for the first observation in each household, then select on that flag variable for tabulations etc.
This leaves the original individual-level data intact.

    bysort HOUSEHOLD_ID: generate hh_first_record = _n == 1
    histogram HH_SIZE if hh_first_record & HH_SIZE > 0, width(2) scheme(s1mono) ///
        fraction ytitle("Proportion of households") xtitle("Number of members")
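An equivalent flag can be built with egen's tag() function, which marks exactly one observation per distinct value of the grouping variable. A small sketch, where hh_tag is a name we made up for illustration:

```stata
* tag() sets hh_tag to 1 for one record per HOUSEHOLD_ID and 0 otherwise
egen byte hh_tag = tag(HOUSEHOLD_ID)

* Sanity check: both flags should select one record per household,
* so the two counts should agree
count if hh_first_record
count if hh_tag
```

The two flags need not mark the same record within a household, but for household-level variables such as HH_SIZE that does not matter.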

Another approach to plotting trends
We can plot average household size by register year without ‘destroying’ the data with TABLE, REPLACE or COLLAPSE.

    bysort YEAR: egen mean_hh_size = mean(HH_SIZE) if HH_SIZE > 0
    bysort YEAR: generate first_in_year = _n == 1
    twoway scatter mean_hh_size YEAR if first_in_year & YEAR >= 1775, scheme(s1mono) ///
        ytitle("Mean household size of individuals") xlabel(1775(25)1900)
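One thing to watch: when egen is given an if condition, observations that fail the condition receive a missing result. If the first record of a year happens to have HH_SIZE <= 0, that year's point would drop out of the plot. A hedged sketch that moves the condition inside mean() so every record in the year carries the mean; mean_hh_size_all and year_first are names invented here:

```stata
* cond() passes HH_SIZE through when it is positive and missing otherwise;
* mean() ignores the missing values but still fills in every record
bysort YEAR: egen mean_hh_size_all = mean(cond(HH_SIZE > 0, HH_SIZE, .))

* One plotting record per year
bysort YEAR: generate byte year_first = _n == 1

twoway scatter mean_hh_size_all YEAR if year_first & YEAR >= 1775, scheme(s1mono)
```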

Mean household size of individuals by age

    keep if AGE_IN_SUI > 0 & SEX == 2 & YEAR >= 1789 & HH_SIZE > 0
    bysort AGE_IN_SUI: egen mean_hh_size = mean(HH_SIZE)
    bysort AGE_IN_SUI: generate first_in_age = _n == 1
    twoway scatter mean_hh_size AGE_IN_SUI if first_in_age & AGE_IN_SUI <= 80, ///
        scheme(s1mono) ytitle("Mean household size of individuals") ///
        xlabel(1(5)85) xtitle("Age in sui")
    lowess mean_hh_size AGE_IN_SUI if first_in_age & AGE_IN_SUI <= 80, ///
        scheme(s1mono) ytitle("Mean household size of individuals") ///
        xlabel(1(5)85) xtitle("Age in sui") msize(small)
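Note that keep if permanently drops records from the data in memory. If the full dataset is needed afterwards, one option is to wrap the subsetting in preserve and restore. A sketch under that assumption, here using collapse for the means since the data are restored anyway:

```stata
preserve
keep if AGE_IN_SUI > 0 & SEX == 2 & YEAR >= 1789 & HH_SIZE > 0
* One row per age, holding the mean household size at that age
collapse (mean) mean_hh_size = HH_SIZE, by(AGE_IN_SUI)
twoway scatter mean_hh_size AGE_IN_SUI if AGE_IN_SUI <= 80, scheme(s1mono)
* Bring back the full individual-level dataset
restore
```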

Household division
Individuals by next register

    . tab HH_DIVIDE_NEXT if PRESENT & NEXT_3 & HH_DIVIDE_NEXT >= 0

[Output: frequency table of "Number of household in the next available register", values 1 to 10 plus Total, with Freq., Percent and Cum. columns. The figures are not preserved in this transcript.]

Household division
Example of a simple analysis

    * first_in_hh flags one record per household (same construction as hh_first_record)
    bysort HOUSEHOLD_ID: generate byte first_in_hh = _n == 1
    generate byte DIVISION = HH_DIVIDE_NEXT > 1
    generate l_HH_SIZE = ln(HH_SIZE)/ln(1.1)
    logit DIVISION HH_SIZE YEAR if HH_SIZE > 0 & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
    logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh
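Dividing by ln(1.1) makes l_HH_SIZE the logarithm of household size to base 1.1, so a one-unit increase in l_HH_SIZE corresponds to a household that is 10% larger. The coefficient can then be read as the change in the log-odds of division per 10% increase in size. A short sketch of how one might display that as an odds ratio after fitting the second model:

```stata
logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

* Odds ratio associated with a 10% larger household
display exp(_b[l_HH_SIZE])

* Or redisplay the whole table of estimates as odds ratios
logit, or
```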

    . logit DIVISION HH_SIZE YEAR if HH_SIZE > 0 & NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

[Output: logistic regression of DIVISION on HH_SIZE and YEAR, with _cons; the iteration log runs from iteration 0 to iteration 4. The fit statistics (Number of obs, LR chi2(2), Prob > chi2, Log likelihood, Pseudo R2) and the coefficient table are not preserved in this transcript.]

    . logit DIVISION l_HH_SIZE YEAR if NEXT_3 & HH_DIVIDE_NEXT >= 0 & first_in_hh

[Output: logistic regression of DIVISION on l_HH_SIZE and YEAR, with _cons; the iteration log runs from iteration 0 to iteration 5. The fit statistics and the coefficient table are not preserved in this transcript.]

Creating household variables
bysort and egen are your friends.
Use HOUSEHOLD_ID to group observations of the same household in the same register.
Let’s start with a count of the number of live individuals in the household.

    bysort HOUSEHOLD_ID: egen new_hh_size = total(PRESENT)

    . corr HH_SIZE new_hh_size if YEAR >= 1789

[Output: correlation matrix of HH_SIZE and new_hh_size. The number of observations and the correlations are not preserved in this transcript.]
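A correlation close to 1 can still hide records where the two measures differ. As a rough follow-up check, one could count and summarize the disagreements directly; both count and compare are standard Stata commands:

```stata
* Number of records where the recomputed size differs from HH_SIZE
count if YEAR >= 1789 & HH_SIZE > 0 & new_hh_size != HH_SIZE

* Summary of where one variable is below, equal to, or above the other
compare HH_SIZE new_hh_size if YEAR >= 1789 & HH_SIZE > 0
```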

Creating measures of age and sex composition of the household

    * Counts of present household members by sex (2 = male, 1 = female) and age group
    bysort HOUSEHOLD_ID: egen males_1_15 = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 1 & AGE_IN_SUI <= 15)
    bysort HOUSEHOLD_ID: egen males_16_55 = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55)
    bysort HOUSEHOLD_ID: egen males_56_up = total(PRESENT & SEX == 2 & AGE_IN_SUI >= 56)
    bysort HOUSEHOLD_ID: egen females_1_15 = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 1 & AGE_IN_SUI <= 15)
    bysort HOUSEHOLD_ID: egen females_16_55 = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55)
    bysort HOUSEHOLD_ID: egen females_56_up = total(PRESENT & SEX == 1 & AGE_IN_SUI >= 56)

    * Share of household members aged 1-15 or 56 and over
    generate hh_dependency_ratio = (males_1_15 + males_56_up + females_1_15 + females_56_up)/HH_SIZE

    bysort AGE_IN_SUI: generate first_in_age = _n == 1
    bysort AGE_IN_SUI: egen mean_hh_dependency_ratio = mean(hh_dependency_ratio)
    twoway line mean_hh_dependency_ratio AGE_IN_SUI ///
        if first_in_age & AGE_IN_SUI >= 16 & AGE_IN_SUI <= 55, ///
        scheme(s1mono) ylabel(0(0.1)0.5) xlabel(16(5)55) ///
        ytitle("Household dependency ratio (Prop. <= 15 or >= 56 sui)") xtitle("Age in sui")
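The six egen lines differ only in sex code and age bounds, so they could also be generated in a loop. This is a sketch, not the lecture's own code: the variable names (n_sex2_age1_15 and so on) are invented here, and the upper bound of 200 for the open-ended oldest group is an arbitrary assumption. The /// continuation requires running this from a do-file.

```stata
* Lower and upper age bounds for the three groups (200 = assumed cap)
local lows  1 16 56
local highs 15 55 200

forvalues s = 1/2 {
    forvalues i = 1/3 {
        local lo : word `i' of `lows'
        local hi : word `i' of `highs'
        * inrange(x, a, b) is 1 when a <= x <= b
        bysort HOUSEHOLD_ID: egen n_sex`s'_age`lo'_`hi' = ///
            total(PRESENT & SEX == `s' & inrange(AGE_IN_SUI, `lo', `hi'))
    }
}
```

The explicit version above is easier to read on a slide; the loop becomes more attractive once the number of age groups grows.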

Numbers of individuals who co-reside with someone who holds a position

    . bysort HOUSEHOLD_ID: egen position_in_hh = total(PRESENT & HAS_POSITION > 0)
    . tab position_in_hh if PRESENT & YEAR >= 1789

[Output: frequency table of position_in_hh, values 0 to 9 plus Total. The figures are truncated in this transcript; the surviving fragments are "1,177," for value 0 and "1,305," for the total.]

    . replace position_in_hh = position_in_hh > 0
    (49183 real changes made)
    . tab position_in_hh if PRESENT & YEAR >= 1789

[Output: frequency table of position_in_hh, values 0 and 1 plus Total. The surviving fragments are again "1,177," for value 0 and "1,305," for the total.]
