Published by Belen Goldsmith. Modified over 9 years ago.
1
REDI 3x3 Presentation: Data projects, Wage Inequality and Top Incomes
Martin Wittenberg
DataFirst
4 November 2014
2
Overview
DataFirst data projects
Wage and Wage Inequality Trends
Top earnings
3
DATAFIRST DATA PROJECTS REDI 3x3 Presentation
4
Data Projects
What is DataFirst?
A data service based at UCT
Data dissemination
  – DataFirst portal (www.datafirst.uct.ac.za)
      Survey data
      Metadata
      Searchable
  – Secure Data Research Centre
      Data that is confidential/sensitive
      NIDS geospatial data, UCT admissions data, CT RSC levy data…
Training
Research
  – Data quality
  – Harmonising data
5
Data Projects
REDI 3x3 data projects
Secure data projects
  – Tax data
  – QES data
  – Key issues for both: how to do this within the current legal framework; trust; concern that the secure facility is based in CT
Harmonisation/data creation projects
  – SESE: Survey of Employers and the Self-employed, 4 surveys: 2001, 2005, 2009 and 2013
  – PALMS: Post-Apartheid Labour Market Series, v2
      Contains employment, wages, some infrastructure
      OHS: annual, 1994-1999
      LFS: biannual, 2000-2007
      QLFS: quarterly, 2008 to 2012 Q1
      39 surveys, almost 3.8 million records
6
Data Projects
PALMS: What did we add?
Rename/redefine variables to be as consistent across time as possible
A set of harmonised weights
Real earnings series across time:
  – Changes in measurement
  – Dealing with outliers
  – Dealing with brackets/missing incomes
7
Data Projects
Harmonising weights
Why do we need to do this?
Problems with Stats SA weights
  – Branson & Wittenberg (2014)
8
Data Projects Harmonising weights
9
Data Projects
Measurement changes
Lots of changes
Biggest: the break between OHSs and LFSs
  – Two questions in OHSs (wages and earnings from self-employment; could answer both)
  – Only one question in LFSs
Coverage change between OHSs and LFSs
  – Big increase in low income earners
      Mainly self-employed agricultural workers
10
Data Projects
Outliers: millionaires (real terms)

          Unweighted          Weighted
Survey      n   prop          total   prop
1994        0                     0
1995        2   0.000097      1 865   0.000211
1997        0                     0
1998       10   0.001089      8 990   0.001048
1999       43   0.003576     27 570   0.003235
00:1        1   0.000174        614   0.000071
00:2       20   0.001049     14 357   0.001526
01:1        1   0.000059         86   9.70E-06
01:2        4   0.000247      2 466   0.000276
02:1        0                     0
02:2        1   0.000068      2 441   0.000276
03:1        0                     0
03:2        0                     0
04:1        0                     0
04:2        0                     0
05:1        0                     0
05:2        4   0.000262      3 052   0.00031
06:1        0                     0
06:2        0                     0
07:1        2   0.000117        824   0.000079
07:2        2   0.000125      2 794   0.000259
10:1        6   0.000334      3 678   0.000318
10:2       10   0.00056       6 277   0.000548
10:3       11   0.000644      7 511   0.000664
10:4        6   0.000358      3 611   0.000315
11:1        1   0.000061      1 041   0.000091
11:2        4   0.000243      3 737   0.000327
11:3        3   0.000173      1 937   0.000166
11:4        6   0.000335      2 647   0.000224
11
Data Projects
How do we deal with this?
Run (“Mincerian”) wage regression
  – Generate residuals (i.e. deviations from the predicted wage)
  – “Studentize” these
  – Flag residuals that are bigger than 5 in absolute value
  – With normally distributed errors we should have seen about 0.3 such cases on a dataset as big as PALMS
Actually flagged 476
Outlier variable included with PALMS public release
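The flagging rule on this slide can be sketched as follows. This is a minimal illustration on synthetic data, not the PALMS code: the covariates (schooling, experience), the use of internally studentized OLS residuals, and all variable names are assumptions made for the example. The helper `expected_flags_under_normality` shows where a figure like 0.3 can come from: for a standard normal, P(|z| > 5) is about 5.7e-7, so an expectation of roughly 0.3 would correspond to about half a million observations.

```python
import math
import numpy as np

def studentized_residuals(X, y):
    """Internally studentized OLS residuals; X must include a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    # Leverage h_ii is the diagonal of the hat matrix, obtained from a reduced QR.
    q, _ = np.linalg.qr(X)
    leverage = np.sum(q**2, axis=1)
    sigma2 = resid @ resid / (n - p)
    return resid / np.sqrt(sigma2 * (1.0 - leverage))

def expected_flags_under_normality(n, threshold=5.0):
    """Expected count of |z| > threshold among n standard-normal draws."""
    return n * math.erfc(threshold / math.sqrt(2.0))

# Synthetic (hypothetical) data: log wage on schooling and experience.
rng = np.random.default_rng(0)
n = 2000
schooling = rng.integers(0, 16, size=n).astype(float)
experience = rng.uniform(0, 40, size=n)
log_wage = (6.0 + 0.10 * schooling + 0.03 * experience
            - 0.0005 * experience**2 + rng.normal(0, 0.5, size=n))
log_wage[0] += 10.0  # plant one implausibly high earner

X = np.column_stack([np.ones(n), schooling, experience, experience**2])
flags = np.abs(studentized_residuals(X, log_wage)) > 5.0
print(int(flags.sum()), "flagged;",
      round(expected_flags_under_normality(n), 5), "expected under normality")
```

A threshold of 5 is deliberately conservative: it only removes observations that are essentially impossible under the fitted model, which is why finding 476 of them points to data problems rather than chance.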
12
Data Projects
Brackets (LFS case)

Salary category        00:1    00:2    01:1    01:2    02:1    02:2    03:1    03:2
R 1 - R 200            0.890   0.939   0.867   0.892   0.880   0.862   0.890   0.846
R 201 - R 500          0.877   0.922   0.889   0.855   0.864   0.872   0.873   0.857
R 501 - R 1 000        0.808   0.913   0.838   0.845   0.835   0.821   0.829   0.815
R 1 001 - R 1 500      0.703   0.845   0.765   0.733   0.717   0.710   0.737   0.680
R 1 501 - R 2 500      0.625   0.849   0.741   0.750   0.704   0.695   0.712   0.697
R 2 501 - R 3 500      0.526   0.849   0.662   0.655   0.594   0.600   0.609   0.577
R 3 501 - R 4 500      0.499   0.773   0.562   0.607   0.507   0.493   0.482   0.474
R 4 501 - R 6 000      0.513   0.777   0.580   0.611   0.518   0.523   0.492   0.455
R 6 001 - R 8 000      0.463   0.762   0.500   0.562   0.501   0.449   0.444   0.429
R 8 001 - R 11 000     0.473   0.661   0.464   0.448   0.398   0.383   0.372   0.336
R 11 001 - R 16 000    0.452   0.646   0.458   0.436   0.383   0.294   0.341   0.279
R 16 001 - R 30 000    0.336   0.668   0.398   0.338   0.401   0.272   0.303   0.297
R 30 000 or more       0.704   0.918   0.712   0.649   0.519   0.535   0.610   0.465

None: 0.001, 0.000 (only two values appear for this row; their columns are not identified)
13
Data Projects
How does one deal with this?
Four approaches:
  – Reweighting: let those giving Rand amounts “represent” missing incomes in the same bracket
  – Deterministic imputations
      Midpoint, mean, conditional mean
  – Stochastic imputations
      Hot deck: match individuals to “similar” individuals (on covariates like gender, education etc.) and copy their income
  – Multiple stochastic imputation
      The problem with stochastic imputation is that the imputed value is not actually measured: it is the true value plus some error
      We need to take the variability associated with this into account
      Doing the stochastic imputation multiple times lets us take the uncertainty arising from the imputation into account
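To make the last point concrete, here is a minimal sketch of how estimates from several imputed datasets are commonly combined. It uses Rubin's combining rules, which the slides do not name, so treat that choice as an assumption; the function name and the numbers in the usage lines are purely illustrative.

```python
import math
from statistics import mean, variance

def combine_imputations(estimates, std_errors):
    """Combine one estimate and standard error per imputed dataset (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se**2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates (divisor m - 1).
    between = variance(estimates)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical mean-earnings estimates from three imputed versions of a survey.
pooled_mean, pooled_se = combine_imputations([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(pooled_mean, round(pooled_se, 4))
```

The pooled standard error is larger than any single-imputation standard error whenever the imputed datasets disagree, which is exactly the extra variability a single stochastic imputation hides.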
14
Data Projects
How does PALMS deal with this?
“Bracket weights”
  – Reweight the point values to take the brackets into account
Multiple stochastic imputation
  – Released a dataset with 10 versions of real earnings
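One way such a reweighting could work is sketched below. This is an assumption about the general technique, not a description of how the PALMS bracket weights are actually constructed: within each bracket, respondents who gave a Rand amount have their weights scaled up so that they also stand in for those who reported only the bracket. The record layout and field names are hypothetical.

```python
from collections import defaultdict

def bracket_weights(records):
    """records: dicts with 'weight', 'bracket' and 'amount' (None if only a bracket was given)."""
    total = defaultdict(float)   # weight of everyone in the bracket
    point = defaultdict(float)   # weight of those who gave a Rand amount
    for r in records:
        total[r["bracket"]] += r["weight"]
        if r["amount"] is not None:
            point[r["bracket"]] += r["weight"]
    adjusted = []
    for r in records:
        if r["amount"] is None or point[r["bracket"]] == 0:
            # Bracket-only respondents get zero weight; a bracket with no
            # point values at all cannot be represented this way.
            adjusted.append(0.0)
        else:
            adjusted.append(r["weight"] * total[r["bracket"]] / point[r["bracket"]])
    return adjusted

sample = [
    {"weight": 100.0, "bracket": "R 1 - R 200", "amount": 150.0},
    {"weight": 100.0, "bracket": "R 1 - R 200", "amount": None},
    {"weight": 50.0, "bracket": "R 201 - R 500", "amount": 400.0},
    {"weight": 50.0, "bracket": "R 201 - R 500", "amount": 450.0},
    {"weight": 100.0, "bracket": "R 201 - R 500", "amount": None},
]
new_weights = bracket_weights(sample)
print(new_weights)
```

The adjusted weights sum to the same total as the original weights within each bracket, so weighted earnings statistics computed on point values alone still describe everyone who reported an income.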
15
Data Projects
What do the adjustments do?

          Point values only       Reweighted              Imputations (no outliers)
          outliers    removed     outliers    removed     mean        midpt       hotdeck     multiple
          (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
1995      2620        2620.3      2793.6      2793.9                  2880.3      2815.6      3028.1
          (54.73)     (54.74)     (59.33)     (59.34)     (53.15)     (57.47)     (54.32)     (66.63)
1997      2049.2      2050.1      2660        2660.9      2660.8      2653.7      2664.1      2867.5
          (42.5)      (42.51)     (95.37)     (95.39)     (52.77)     (60.29)     (55.41)     (70.15)
1998      2174.5      2044.8      2826.8      2667.8      2684.8      2575        2675.3      2817.9
          (90)        (75.37)     (111.01)    (96.57)     (68.33)     (67.95)     (72.03)     (79.7)
1999      3150.7      1984.3      3614        2663.2      2698.7      2747.2      2689.6      3093.7
          (327.01)    (77.62)     (259.53)    (84.85)     (66.26)     (74.57)     (68.73)     (111.25)
2000:1    1904.3      1878        2355.7      2332.2      2331.8      2391.5      2474.2      2446.7
          (80.22)     (73.01)     (90.96)     (85.78)     (69.45)     (84.94)     (74.63)     (72.67)
2000:2    5095.1      2400.8      5105.1      2593.6      2594        2748.1      2640.1      2699.1
          (1062.69)   (74.85)     (990.97)    (78.26)     (72.71)     (85.54)     (74.65)     (79.74)
2001:1    1989.7      1980.1      2451        2442                    2461.7      2538.9      2513.6
          (43.67)     (42.25)     (61.42)     (60.53)     (51.24)     (55.77)     (54.46)     (61.7)
2001:2    2137.3      2101.4      2586        2543.7      2544.5      2624        2625        2683.8
          (59.3)      (50.3)      (77.94)     (69.3)      (55.21)     (65.37)     (57.25)     (60.77)

Estimated standard errors in parentheses, correcting for clustering, but not correcting for imputations (except in the multiple imputations case)
16
USING THE DATA: WAGE AND WAGE INEQUALITY TRENDS REDI 3x3
17
Wage and Wage Inequality Trends Real wage trends
18
Wage and Wage Inequality Trends Looking at the wage distribution
19
USING THE DATA: TOP EARNINGS REDI 3x3
20
Top Earnings
Preview
Preliminary work done on PALMS v1
Core idea: fit a Pareto distribution to the top tail
Estimation strategy
  – Nonparametric
  – Parametric
Results
21
Top Earnings
Why Pareto distribution?
Seems to fit the top tail reasonably well
Cowell & Flachaire (2007) suggest that in the presence of data quality issues, inequality might be estimated better by a hybrid approach:
  – Standard nonparametric estimates on the bulk of the distribution, combined with estimation of the Pareto coefficient at the top
Pareto coefficient is a measure of how “heavy” the tail at the top is (a smaller coefficient means a heavier tail)
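As an illustration of the parametric step, the sketch below estimates a Pareto coefficient above a chosen cut-off by maximum likelihood (the Hill estimator). It is an assumption that something like this underlies the estimates in the talk; the version here ignores survey weights and clustering, and its standard error is the simple i.i.d. approximation alpha / sqrt(n). The data are simulated, and the function and variable names are hypothetical.

```python
import math
import random

def pareto_alpha_mle(values, cutoff):
    """Maximum-likelihood (Hill) estimate of the Pareto coefficient above `cutoff`."""
    tail = [v for v in values if v >= cutoff]
    n = len(tail)
    log_sum = sum(math.log(v / cutoff) for v in tail)
    if n == 0 or log_sum == 0.0:
        raise ValueError("need observations strictly above the cutoff")
    alpha = n / log_sum
    # Simple i.i.d. standard error; unweighted, no design correction.
    return alpha, alpha / math.sqrt(n), n

# Simulated earnings: Pareto with coefficient 1.8 above a cut-off of 4501.
random.seed(1)
simulated = [4501.0 * random.paretovariate(1.8) for _ in range(20000)]
alpha_hat, se_hat, n_tail = pareto_alpha_mle(simulated, 4501.0)
print(round(alpha_hat, 3), round(se_hat, 4), n_tail)
```

Repeating the estimate for several cut-offs and checking whether the coefficient is stable is a quick diagnostic of whether the tail really is Pareto above the chosen point.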
22
Top Earnings Pareto distribution
23
Top Earnings Position of the top tail
24
Top Earnings Distribution within the top tail
25
Top Earnings
Estimated Pareto coefficients

        Cutoff: R4501 (1996)       Cutoff: R6001 (1996)       Cutoff: R8001 (1996)       Cutoff: R2501 (1996)
        alpha            n         alpha            n         alpha            n         alpha            n
95Oct   1.950 (0.0376)   4,236     2.003 (0.0527)   2,587     2.088 (0.0788)   1,345     1.659 (0.0180)   9,536
96Oct   1.873 (0.0639)   1,490     1.783 (0.0841)   814       1.739 (0.114)    475       1.557 (0.0284)   3,781
97Oct   1.712 (0.0451)   2,456     1.619 (0.0556)   1,396     1.520 (0.0671)   831       1.511 (0.0224)   5,999
98Oct   1.471 (0.0451)   1,763     1.373 (0.0510)   1,075     1.373 (0.0631)   703       1.535 (0.0297)   4,175
99Oct   1.728 (0.0540)   2,156     1.608 (0.0657)   1,264     1.567 (0.0850)   751       1.608 (0.0282)   4,990
00Sep   1.805 (0.0686)   2,299     1.818 (0.0959)   1,352     1.625 (0.124)    776       1.439 (0.0282)   5,048
01Sep   2.138 (0.0621)   2,664     2.163 (0.0818)   1,512     1.893 (0.0897)   853       1.600 (0.0248)   5,614
02Sep   1.914 (0.0584)   2,191     2.056 (0.0871)   1,313     2.064 (0.122)    718       1.576 (0.0265)   5,079
03Sep   2.054 (0.0549)   2,569     1.993 (0.0706)   1,474     1.903 (0.0911)   785       1.584 (0.0240)   5,442
04Sep   2.097 (0.0709)   2,490     2.099 (0.0926)   1,404     2.050 (0.126)    716       1.550 (0.0306)   5,088
05Sep   1.808 (0.0621)   2,496     2.004 (0.0920)   1,549     1.850 (0.109)    782       1.350 (0.0271)   5,024
06Sep   1.857 (0.0651)   2,725     1.776 (0.0793)   1,599     2.002 (0.117)    869       1.351 (0.0282)   5,354
07Sep   1.628 (0.0918)   2,357     1.687 (0.119)    1,850     1.772 (0.155)    1,009     1.334 (0.0453)   5,166
Pooled  1.823 (0.0140)   53,154    1.846 (0.0186)   31,528    1.792 (0.0238)   17,472    1.475 (0.0064)   117,647
26
Top Earnings
Summary
No evidence in the graphs or table of a systematic trend for the distribution to flatten out or steepen
Above a cut-off of R4500 the parameter estimates are not that sensitive to the particular cut-off chosen
27
Top Earnings Implications
28
Top Earnings
Example
Illustrative probabilities in the tail

Cut-off (monthly)   Prob        Numbers
8000                1           1500000
16000               0.287175    430762
30000               0.092628    138942
100000              0.010606    15909
300000              0.001468    2202
1000000             0.000168    252
3000000             2.33E-05    35
10000000            2.66E-06    4
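The illustrative figures on this slide can be reproduced from the Pareto survival function P(X > x) = (x_min / x)^alpha. The values alpha = 1.8, x_min = 8000 and a base of 1,500,000 earners are inferred from the numbers in the table (they are not stated explicitly), so treat them as assumptions of this sketch.

```python
def pareto_tail_prob(x, x_min, alpha):
    """P(X > x) for a Pareto distribution with lower bound x_min, for x >= x_min."""
    return (x_min / x) ** alpha

X_MIN = 8000          # monthly cut-off at which the tail starts (from the table)
ALPHA = 1.8           # inferred: 0.287175 is 2 ** -1.8
N_BASE = 1_500_000    # earners above the first cut-off (from the table)

cutoffs = [8000, 16000, 30000, 100000, 300000, 1000000, 3000000, 10000000]
for x in cutoffs:
    p = pareto_tail_prob(x, X_MIN, ALPHA)
    print(f"{x:>10d}  {p:<12.6g}  {round(N_BASE * p):>9d}")
```

Because the probability depends only on the ratio x / x_min, doubling the cut-off always multiplies the expected number of earners by the same factor (about 0.29 here), whatever the starting level.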
29
Top Earnings
Tax statistics

Cutoff   100 001   200 001   300 001   400 001   500 001
2003     1.584     1.138     1.303     1.411     1.111
2004     1.552     1.145     1.320     1.434     1.129
2005     1.519     1.134     1.314     1.424     1.113
2006     1.469     1.108     1.286     1.391     1.096
2007     1.381     1.086     1.268     1.372     1.077
2008     1.301     1.066     1.249     1.359     1.072
2009     1.235     1.070     1.266     1.390     1.096
30
Top Earnings
Discussion
Results in this case are somewhat sensitive to the choice of the cut-off
  – For some choices there seems to be evidence that the tail is getting “fatter”
  – Change in coverage?
The Pareto estimates (ranging from 1.5 to 1.1) are noticeably smaller than in the case of labour earnings
  – Impact of returns on investments? Other forms of compensation?
Some comparative figures for other countries (Levy & Levy): US 1.35, UK 1.06, France 1.83
31
WHERE TO NOW? REDI 3x3
32
Top Earnings
PALMS
We will update PALMS next year
There seems to be a need for more extensive training
  – Use of the “bracket weights”
  – Use of the multiple imputation dataset
Further work on data quality adjustments
33
Top Earnings
TAX DATA
Hopefully we’ll be able to redo the “top tails” analyses on unit record data
Make a “synthetic” version available