On the Use of Latent Variables in Education and Health Systems Evaluation Giorgio Vittadini and Pietro Giorgio Lovaglio University of Bicocca-Milan. NTTS 2009 (New Techniques and Technologies for Statistics) Brussels, February 2009
2 1 SYNTHETIC COMPOSITE INDICATORS FOR THE EU
Goal: transforming the EU into the most competitive and dynamic knowledge-based economy in the world by 2010.
European Commission (2001): development of composite indicators for certain purposes within the Structural Indicators Exercise.
15 quality indicators in four areas: skills, competencies and attitudes; access to participation; resources for lifelong learning; and strategies and system development.
European Commission (2007): The European Lifelong Learning Index.
Methodology for the construction of synthetic composite indicators, in particular the use of latent variables within a causal model framework, e.g. 1) the Human Capital of a population of workers; 2) the economic performance of health structures.
Empirical results based on Lombardy Region administrative archives are provided.
3 A. HUMAN CAPITAL
The Lisbon strategy places strong emphasis on knowledge, innovation, and the optimization of Human Capital (HC), which is closely linked to education, employment, and the health sector (Petty, 1690; Cantillon, 1755; A. Smith, 1776; Marshall; I. Fisher).
Quantitative estimation of the earnings function: the amount of abilities possessed by a human being, generated by investment in education (Mincer, 1958, 1970) and job training (Becker, 1962, 1964).
Quantification approaches (aggregate results):
Retrospective method: the monetary cost of production of a human being, i.e. the cost of rearing a child until working age (Cantillon, 1755; Engel, 1883; Kendrick, 1976; Eisner, 1985).
Actuarial-prospective method: the monetary value of the expected flow of earned income over the life cycle, generated by investment in education and job training (Farr, 1853; Dublin and Lotka, 1900; Jorgenson and Fraumeni, 1989).
4 A.2 A NEW METHOD FOR ESTIMATING THE LV HC
We propose a new definition of HC, a compromise between the retrospective and prospective definitions. HC is defined as a disaggregate LV (defined on individuals or families) that is simultaneously:
1) an UNOBSERVABLE COMPOSITE VARIABLE with respect to its formative indicators (a combination of HC investment indicators);
2) a TRUE LV with respect to its reflective indicators (describing the effects of Human Capital).
The LV HC, consistent with the economic definition, is then transformed by an actuarial mathematical approach to estimate HC at the disaggregated level in monetary units.
5 A.3 STATISTICAL APPROACH
[Diagram: CONCOMITANT SET (career experience; opportunities), FORMATIVE SET (schooling years, ...), HC, REFLECTIVE SET (income, ...); variables listed: age, sex, region, work experience, occupation, contract type, number of children]
HC is an LV directly linked with a set of indicators that contribute to its formation (formative indicators, Z) and that simultaneously has a causal impact on a set of dependent variables (reflective indicators, Y).
Y may also be affected by other factors, a set of concomitant indicators W.
HC has to be estimated as the linear combination of Z that best fits Y, net of the spurious effect of W, by means of a Reduced Rank Regression (RRR; van der Leeden, 1990), whose first aim is to explain the generalized variance of the responses by linear components of the predictor variables within a statistical model (structured errors).
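The estimation idea on this slide can be illustrated with a minimal sketch. It is not the estimator of van der Leeden (1990): it assumes an identity weighting of the errors (no structured error model), removes W from both Y and Z by ordinary least squares, and then extracts a single component. The function names `residualize` and `rank1_rrr` are hypothetical.

```python
import numpy as np

def residualize(A, W):
    """Return the part of A that is not linearly explained by W (plus intercept)."""
    W1 = np.column_stack([np.ones(len(W)), W])
    coef, *_ = np.linalg.lstsq(W1, A, rcond=None)
    return A - W1 @ coef

def rank1_rrr(Y, Z, W):
    """Rank-1 reduced rank regression of Y on Z, net of W.

    Assumption: identity error weighting, so the rank-1 solution is the
    leading singular direction of the OLS fitted values.
    Returns weights a (HC score = Z_res @ a), loadings b, and unit-variance scores.
    """
    Yr = residualize(Y, W)
    Zr = residualize(Z, W)
    B_ols, *_ = np.linalg.lstsq(Zr, Yr, rcond=None)   # full-rank coefficients (p x q)
    fitted = Zr @ B_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    v = Vt[0]                                         # leading direction in Y space
    a = B_ols @ v                                     # weights on formative indicators
    scores = Zr @ a
    scale = scores.std()
    return a / scale, v * scale, scores / scale
```

With a single reflective indicator, as in the static analysis that follows, this sketch reduces to an ordinary regression of the residualized Y on the residualized Z.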
6 A.4 REGIONAL ADMINISTRATIVE DATA
INTEGRATION OF TWO DATABASES
Dataset of the Employment Centers of the Province of Milan: every change in employment position for workers (subordinate contract) in the private sector between 2000 and 2005, plus workers' educational, training and occupational characteristics.
Individual income tax returns filed from 2001 to 2005 (gross earned income; being before tax, it provides better discrimination among incomes). Available only for workers resident in the City of Milan.
POPULATION
The population is composed of 95,896 workers resident in the City of Milan, belonging to the labour force in the private sector, with vocational experiences recorded in the database of the employment offices of the Province of Milan in the period …
7 A.5 STATIC ANALYSIS: HC INDICATORS
Reflective indicator: 2004 gross earned income (income filed in 2005).
Investment indicators: years of schooling (last certification awarded), days of full-time work in 2000-2004, days of training in the period 2000-2003 (FSE courses, lifelong learning, NOT training on the job).
Concomitant indicators: gender, age, number of children, nationality, marital status; type of contract, industry, type of occupation (those with the longest duration in days in 2004).
Limitation: lack of information about the worker's career prior to 2000 (in particular, years of work experience) and about the parents' socio-economic status. Information about the wealth of the household of origin is available only for a very limited number of workers living with their parents, and thus it is not considered.
8 A.6 MEASUREMENT MODEL: RESULTS
Formative indicators of HC
HC indicator                 Std coeff   Sig.   % weight
Years of schooling           0.756       < …    … %
Days of FT work (00-04)      0.564       < …    … %
Days of training (00-03)     0.333       < …    … %
Indicators affecting earned income
Covariate            F test    Sig.
Human Capital        4641.5    <.0001
Gender               1232.8    <.0001
Age                  1141.5    <.0001
Occupation           932.27    <.0001
Type of contract     420.06    <.0001
N of children        119.22    <.0001
Industry             73.93     <.0001
Marital status       69.84     <.0001
Nationality          3.29      <.0001
Estimated HC and the concomitant indicators explain 56% of the variability of earned income (only 1/3 of this is attributable to HC).
Years of schooling contributes about 60% of the HC scores, amount of training only 10%.
The QQ-plot correlation test supports normality of the residuals (r = 0.986).
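The percentage weights are missing from the transcript. As an illustration only, one reading that roughly reproduces the stated "60%" and "10%" is that each weight is the squared standardized coefficient as a share of the sum of squares. This is an assumption, not something the slide states. Under it, the three coefficients give shares of roughly 57%, 32% and 11%.

```python
# Standardized coefficients as reported on the slide.
std_coeff = {
    "Years of schooling": 0.756,
    "Days of FT work (00-04)": 0.564,
    "Days of training (00-03)": 0.333,
}

# Assumption: weight = squared coefficient / sum of squared coefficients.
squared = {name: c ** 2 for name, c in std_coeff.items()}
total = sum(squared.values())
shares = {name: s / total for name, s in squared.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.0f}%")
```

The squared coefficients sum to almost exactly 1, which is consistent with HC scores being standardized to unit variance, although the slide does not confirm this.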
9 A.7 HC DISTRIBUTION IN MONETARY VALUE
The monetary distribution is obtained by rescaling the standardized HC distribution to have a monetary mean, applying the actuarial approach of Dagum and Slottje (2000): this monetary mean is the mean (over ages) of the amount of earned income expected by workers of age x over the life cycle, discounted at a given discount rate, capitalized by rates of productivity (changing with age) and weighted by survival probability.
              Monetary HC   Earned income
Median        79,757        13,907
Average       129,089       16,190
Gini ratio    …             …
Validation: correlations
[Correlation matrix: monetary HC, earned income, years of school, days of full-time work, days of training]
Mean HC is about eight times the mean income.
HC inequality is higher than income inequality, confirming results in previous studies.
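The actuarial quantity described in words on this slide can be sketched as a discounted, survival-weighted sum of expected earnings. This is a hedged illustration of the general idea, not the exact Dagum-Slottje formula: the function name, the input structures and the terminal age `last_age` are all hypothetical.

```python
def expected_lifetime_income(mean_income, survival, x, discount, productivity, last_age):
    """Expected flow of earned income for a worker of age x.

    mean_income:  dict age -> mean earned income observed at that age (hypothetical input)
    survival:     function (x, age) -> probability that a person aged x is alive at `age`
    discount:     constant discount rate
    productivity: dict age -> productivity growth rate at that age (changes with age)
    last_age:     last working age included in the sum (assumed, not given on the slide)
    """
    total = 0.0
    growth = 1.0  # cumulative productivity factor from age x up to the current age
    for age in range(x, last_age + 1):
        t = age - x
        total += mean_income[age] * survival(x, age) * growth / (1 + discount) ** t
        growth *= 1 + productivity[age]
    return total
```

Averaging this quantity over the ages x present in the population gives a monetary mean of the kind the slide refers to.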
10 A.8 RESULTS
Empirical results have shown that, although HC is the most significant factor of income variability, it explains only a small share of labour income (20%), whereas a large part is driven by dimensions linked to discrimination and career progression (gender, age, type of occupation) rather than by education and training.
This presentation has focused on a consistent technique for estimating the latent variable HC, specified in a realistic measurement model, using routinely collected administrative archives.
11 B. HEALTH: BALANCED SCORECARD (BSC)
[Figure: conceptual model of the BSC for assessing corporate structures and nursing homes]
The four dimensions are defined as latent variables, measured net of measurement error on the observed indicators.
13 B.1 MULTILEVEL SIMULTANEOUS COMPONENT ANALYSIS (MSCA; Timmerman, 2006)
The latent outcomes of the BSC are obtained as principal components of their observed indicators, splitting the overall variability into the contribution due to groups (variability among ASL, the local health authorities) and the contribution due to individual units (variability among hospitals in the same ASL).
B.2 PARTIAL LEAST SQUARES PATH MODELING (PLSPM)
The MSCA scores are integrated within PLSPM in order to estimate the BSC model in a causal framework. The PLSPM algorithm is applied separately to the within structure and to the between structure, yielding latent scores and causal parameters in two different domains: that involving hospitals and that involving ASL.
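The between/within split behind MSCA can be sketched for a single block of indicators as follows. This is a simplified illustration, assuming one component per level and plain SVD-based principal components. It omits the model variants and constraints discussed by Timmerman (2006), and the function name is hypothetical.

```python
import numpy as np

def msca_first_components(X, groups):
    """Split X into between-group and within-group parts and take the
    first principal component of each.

    X:      (n_units x n_indicators) array, e.g. hospitals by indicators
    groups: length-n array of group labels, e.g. the ASL of each hospital
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    Xc = X - X.mean(axis=0)                  # remove the overall mean
    between = np.zeros_like(Xc)
    for g in np.unique(groups):
        mask = groups == g
        between[mask] = Xc[mask].mean(axis=0)  # group mean repeated for each unit
    within = Xc - between                    # deviations from the group mean

    def first_pc(M):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U[:, 0] * s[0], Vt[0]         # component scores, loadings

    return first_pc(between), first_pc(within), between, within
```

Repeating each group mean once per unit weights the between part by group size, and the two parts are orthogonal, so their sums of squares add up to the total.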
14 The application of MSCA in each block yields the scores of the four LVs (as principal components). These constitute the first step of the PLSPM algorithm (estimate of each LV as a combination of its indicators), in a way better suited to the nature of the BSC structure.
The final product of a PLSPM model from an MSCA perspective in the BSC framework is the estimation of two structural models, one for the between segment (ASL) and one for the within segment (hospitals), revealing possibly different causal structures.
15 B.3 THE PLSPM-MSCA FOR THE LOMBARDY HEALTH SYSTEM
The research, involving 163 hospitals in the Lombardy Region, selected 24 indicators (key performance indicators, KPIs) representative of the four dimensions in the BSC structure.
Eliminating the units with one or more missing values left 129 hospitals.
To make the indicators comparable, they have been normalized on a common scale (for indicators concerning costs, a value of 100 denotes the lowest cost).
The two models to be estimated are the one referring to the between structures (ASL, macro-level units) and the one referring to the within structures (hospitals, micro-level units) in the BSC framework.
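The two data preparation steps on this slide (dropping units with missing values, then normalizing) can be sketched as below. The slide does not state the range of the scale or the normalization formula. The sketch assumes min-max scaling to 0-100, with cost indicators reversed so that 100 corresponds to the lowest cost. The function name and arguments are hypothetical.

```python
import numpy as np

def normalize_kpis(X, is_cost, upper=100.0):
    """Listwise deletion followed by min-max scaling to [0, upper].

    X:       (n_units x n_indicators) array with np.nan for missing values
    is_cost: boolean array marking cost indicators, which are reversed
             so that `upper` corresponds to the lowest cost (assumption)
    Note: an indicator that is constant across units would divide by zero.
    """
    X = np.asarray(X, dtype=float)
    is_cost = np.asarray(is_cost, dtype=bool)
    complete = ~np.isnan(X).any(axis=1)      # keep units with no missing value
    Xc = X[complete]
    lo, hi = Xc.min(axis=0), Xc.max(axis=0)
    scaled = (Xc - lo) / (hi - lo) * upper
    scaled[:, is_cost] = upper - scaled[:, is_cost]
    return scaled, complete
```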
16 Apart from Economy (where ISST and ISAT have been eliminated), all the between and within blocks show a high degree of:
unidimensionality: for each LV, all the MVs of a block must reflect a single concept;
mono-factorial validity: the MVs of a block must be more highly correlated with their own LV than with the other LVs;
discriminant validity: the share of variability that each LV shares with its own block of indicators must be greater than the share it shares with the other LVs.
This justifies the estimation of causal relations between LVs and their MVs.
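The discriminant validity condition can be checked numerically. The sketch below assumes the usual operationalization, in which the variability shared between an LV and its block is the mean squared correlation between the LV scores and its indicators, compared with the squared correlations among LVs (a Fornell-Larcker style comparison). The slide does not name the specific criterion used, and the function name is hypothetical.

```python
import numpy as np

def discriminant_validity(blocks, scores):
    """Compare each LV's shared variance with its own block against its
    largest squared correlation with any other LV.

    blocks: list of (n x p_k) arrays of manifest variables, one per LV
    scores: (n x K) array of LV scores, with K >= 2
    Returns the per-LV shared variance and a boolean pass/fail vector.
    """
    K = scores.shape[1]
    shared = np.empty(K)
    for k, Xk in enumerate(blocks):
        loadings = np.array([
            np.corrcoef(Xk[:, j], scores[:, k])[0, 1] for j in range(Xk.shape[1])
        ])
        shared[k] = np.mean(loadings ** 2)   # mean squared loading of the block
    r2 = np.corrcoef(scores, rowvar=False) ** 2
    np.fill_diagonal(r2, 0.0)                # ignore each LV's correlation with itself
    return shared, shared > r2.max(axis=1)
```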
18 The BSC theoretical framework, specifying three causal relationships among four key performance areas, is not supported by enough empirical evidence to quantify the performance of the Lombardy Region Health System.
This may be caused by the lack of useful indicators: Human Capital scores, built on only two manifest indicators, do not contribute significantly to the causal relationships specified in the BSC structure.
However, the results concerning the measurement models and the Economy area structural equation suggest that a simplified version of the theoretical model appears very promising in the presence of more complete data (above all in a longitudinal perspective).
19 C. THE PERSPECTIVES FOR EU ARCHIVES
What methodology will ensure that the composite of a particular set of indicators has the strongest possible relationship with the broader outcomes?
It appears fundamental to specify proper statistical models involving non-observable and multidimensional constructs, especially in the education and health fields.
20 The availability of official statistics at the EU level, or of administrative databases, makes it possible to use these models for building composite evaluation indicators on particular aspects of educational and health systems.
1 Analyses concerning HC are performed only in a cross-sectional framework (census and survey data), while the availability of longitudinal information could enable the estimation of statistical earnings functions and of the rate of return to education in a more consistent way.
21 The limitations of statistical information drawn from a census performed every ten years stimulated countries such as Denmark, France, Switzerland and Canada to use administrative registers as the basis for reorganizing their statistical systems for industries during the nineties (Statistics Canada, 1988; Eurostat, 1997).
In Italy, in 1998, Istat was already integrating identifying micro data from the Statistical Archive of Active Firms (ASIA) with the partial data originating from the main statistical surveys conducted on industries (Martini, 2000).
22 2 In the Health Sector, the objective of developing national evaluation systems has stimulated various experimental efforts by national statistical institutions.
In this perspective, a first European attempt was made within the European Community Health Indicators (ECHI) project, an international research effort carried out in the framework of the Health Monitoring Programme and the Community Public Health Programme.
In the ECHI project, an initial set of 40 indicators used by WHO-Europe, OECD and Eurostat in their international databases was proposed in the period …
After February 2004 ECHI proposed a longer list, containing a few items for which regular and comparable data collection is still possible within a very short period.
Nevertheless, the main limitation of these data concerns the level of disaggregation.