# Data Envelopment Analysis with Unbalanced Data Timo Kuosmanen (Wageningen University, The Netherlands) INFORMS Annual Meeting, Atlanta 19-22 October 2003.

Unbalanced data? Suppose output j of DMU k is missing (unavailable). The usual approach is to restore a balanced output matrix by either excluding DMU k or excluding output j.

Problems

- Both approaches involve a loss of information about production possibilities: excluding DMU k discards its observed outputs, and excluding output j discards the observed values of that output.
- The choice to exclude either the DMU or the output influences the results.
- Criteria for excluding rows/columns are typically not explicitly reported.

Proposition: Why don't we simply tolerate the missing piece of data and denote the missing output value by zero (0)? Zero is the theoretical lower bound for output values, and there is no technical reason against including zero outputs in DEA.
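The three ways of handling a missing output can be sketched in a few lines. This is only an illustration: the data, the use of `None` as the missing-value marker, and the helper names `drop_dmus`, `drop_outputs` and `zero_fill` are assumptions made for the example, not part of the talk.

```python
# Hypothetical output data: one row per DMU, one column per output.
# None marks a missing (unavailable) value; all numbers are made up.
outputs = {
    "A": [2.0, 8.0],
    "B": [5.0, 6.0],
    "C": [7.0, 3.0],
    "D": [3.0, 3.0],
    "E": [10.0, None],  # output 2 of DMU E is missing
}


def drop_dmus(data):
    """Restore balance by excluding every DMU that has a missing value."""
    return {name: row for name, row in data.items() if None not in row}


def drop_outputs(data):
    """Restore balance by excluding every output that has a missing value."""
    n_outputs = len(next(iter(data.values())))
    keep = [j for j in range(n_outputs)
            if all(row[j] is not None for row in data.values())]
    return {name: [row[j] for j in keep] for name, row in data.items()}


def zero_fill(data):
    """Unbalanced approach: keep all rows and columns, write 0 for missing outputs."""
    return {name: [0.0 if y is None else y for y in row]
            for name, row in data.items()}


by_dmu = drop_dmus(outputs)        # 4 DMUs x 2 outputs
by_output = drop_outputs(outputs)  # 5 DMUs x 1 output
unbalanced = zero_fill(outputs)    # 5 DMUs x 2 outputs, one entry set to 0
```

Only `zero_fill` keeps every observed number in the data set; the other two discard either a whole row or a whole column.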

Notation: Define the following production possibility sets:

- $T^{DMU}$: exclude the DMU with the missing value
- $T^{Y}$: exclude the output with the missing value
- $T^{UB}$: denote the missing output by 0
- $T^{IDEAL}$: ideal case where all data are available

Main Theorem: Production possibility sets $T^{UB}$, $T^{IDEAL}$, $T^{DMU}$, and $T^{Y}$ are nested. [The inclusion chain was shown as a formula on the slide and is not preserved in this transcript.]
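Two inclusions follow directly from the definitions of the sets, assuming convexity and free disposability of outputs (standard DEA assumptions that the slide does not state explicitly). The sketch below covers only these two; it is not a reconstruction of the slide's full statement and leaves out $T^{Y}$, which is defined in a lower-dimensional output space. The symbols $y_k$ and $y_k^{0}$ are introduced here for illustration.

```latex
% Assumed notation: y_k is the true (partly unobserved) output vector of DMU k,
% and y_k^0 is the same vector with the missing entry j replaced by 0,
% so that y_k^0 \le y_k componentwise.
\begin{align*}
T^{DMU} &\subseteq T^{UB}
  && \text{adding the observation } (x_k, y_k^{0}) \text{ can only enlarge the envelope;} \\
T^{UB} &\subseteq T^{IDEAL}
  && \text{since } y_k^{0} \le y_k \text{ and outputs are freely disposable, }
     (x_k, y_k^{0}) \in T^{IDEAL}.
\end{align*}
```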

Example (2 outputs, 5 DMUs)

Influence on efficiency scores

Theorem 2: For DMU k with a missing value of output j, using unbalanced data and eliminating output j yield equal DEA efficiency scores.

Theorem 3: For a DMU l with complete data, using unbalanced data can only yield a worse (or equal) efficiency score than excluding DMU k, which has missing data, from the reference set.
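Both theorems can be checked numerically on a toy data set. The sketch below is not the author's model: it assumes that every DMU uses the same amount of a single input, that there are exactly two outputs, and that the technology is convex with freely disposable outputs. Under those assumptions the output-oriented score can be found by enumerating single DMUs and pairs of DMUs, so no LP solver is needed. The data and the function name `output_efficiency` are hypothetical. Scores are expansion factors: 1 means efficient, and larger means worse.

```python
from itertools import combinations


def output_efficiency(target, reference):
    """Largest factor by which `target` outputs can be scaled within the envelope.

    Assumptions: identical single input for all DMUs, at most two outputs,
    convexity, free disposability. Zero outputs of the target are ignored,
    because scaling 0 by any factor still gives 0.
    """
    active = [r for r in range(len(target)) if target[r] > 0]

    def theta(point):
        # Expansion factor supported by one point of the envelope.
        return min(point[r] / target[r] for r in active)

    best = max(theta(p) for p in reference)
    if len(active) == 2:
        # With two outputs the optimum mixes at most two reference DMUs, and on
        # a segment the best mix is an endpoint or the point where both output
        # ratios are equal.
        for a, b in combinations(reference, 2):
            da = a[0] / target[0] - a[1] / target[1]
            db = b[0] / target[0] - b[1] / target[1]
            if da * db < 0:  # the ratios cross strictly inside the segment
                lam = db / (db - da)  # weight on DMU a at the crossing
                mix = [lam * a[r] + (1 - lam) * b[r] for r in range(2)]
                best = max(best, theta(mix))
    return best


# Hypothetical data: DMUs A-D are complete, DMU E lacks output 2.
complete = {"A": [2.0, 8.0], "B": [5.0, 6.0], "C": [7.0, 3.0], "D": [3.0, 3.0]}
dmu_k = [10.0, 0.0]  # DMU E with the missing output denoted by 0

ref_excl = list(complete.values())       # reference set without DMU E
ref_unbal = ref_excl + [dmu_k]           # unbalanced reference set
ref_no_y2 = [[y[0]] for y in ref_unbal]  # output 2 eliminated for all DMUs

# Theorem 2: DMU E scores the same with unbalanced data and without output 2.
score_k_unbal = output_efficiency(dmu_k, ref_unbal)
score_k_no_y2 = output_efficiency([dmu_k[0]], ref_no_y2)

# Theorem 3: complete DMUs can only look worse once DMU E joins the reference set.
scores_excl = {n: output_efficiency(y, ref_excl) for n, y in complete.items()}
scores_unbal = {n: output_efficiency(y, ref_unbal) for n, y in complete.items()}
```

With these made-up numbers, DMU C is efficient when E is excluded but gets a score of 20/19 (about 1.053) against the unbalanced reference set, because a mix of B and E now dominates it. A and B are unaffected.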

Equity issues

- The unbalanced DEA model imposes more stringent efficiency criteria on DMUs with missing outputs. This might be viewed as unfair, but it also creates incentives for collecting and reporting data.
- Even if we exclude DMUs with missing outputs from efficiency comparisons and rankings, there is no harm in including them in the reference technology.
- Could the efficiency scores be adjusted to take into account differences in dimensionality across DMUs?

Extensions: Missing inputs can be handled analogously by replacing blank entries with some large number M ("big M"). Weight restrictions can interfere with the results in unintended ways. We may relax weight restrictions by rewriting them [formula not preserved in this transcript].
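The big-M idea for inputs mirrors the zero-fill for outputs: a very large input is a pessimistic stand-in for the unknown value. A minimal sketch follows. The slide does not say how M should be chosen, so taking a multiple of the largest observed input is purely an assumption, as are the function name `fill_missing_inputs`, the `factor` parameter and the data.

```python
def fill_missing_inputs(inputs, factor=1000.0):
    """Replace missing inputs (None) by big M = factor * largest observed input.

    The choice of M is an assumption for illustration; it only needs to be at
    least as large as the true, unobserved input value.
    """
    observed = [x for row in inputs.values() for x in row if x is not None]
    big_m = factor * max(observed)
    filled = {name: [big_m if x is None else x for x in row]
              for name, row in inputs.items()}
    return filled, big_m


# Hypothetical input data: DMU B lacks its second input.
inputs = {"A": [4.0, 2.0], "B": [3.0, None], "C": [5.0, 1.0]}
filled_inputs, big_m = fill_missing_inputs(inputs)
```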

Case study: Sustainable Development indices. Cherchye & Kuosmanen (2002) use DEA to construct a meta-index of Sustainable Development (SD) from 14 SD indicators for 154 countries. The 14x154 data matrix contains 2156 elements, of which about 18% (395 elements) were missing. Complete data were available for only 14 countries.

Comparison of approaches

Conclusions

- This is a first systematic attempt to analyze the effects of eliminating missing values.
- Keeping blank entries in the output data can only improve estimation of the production frontier.
- Differences in dimensionality across DMUs can be unfair to DMUs that perform well in the missing outputs.
- Research question: Can a fair handicap system be constructed to make efficiency scores more comparable when dimensionality differs across DMUs?

Want to read more? Full paper can be downloaded from my homepage: http://www.sls.wau.nl/enr/staff/kuosmanen/ Or send e-mail to: Timo.Kuosmanen@wur.nl
