# Data Envelopment Analysis

Charnes, Cooper, and Rhodes [CharnesCooperRhodes1978] described data envelopment analysis (DEA) as a mathematical programming model applied to observational data, providing a new way of obtaining empirical estimates of relationships among decision making units (DMUs) that take multiple inputs and produce multiple outputs. They were inspired by “relative efficiency” in combustion engineering. The definition of a DMU is generic and very flexible: any object to be ranked can be a DMU, from individuals to government ministries. DEA has been formally defined as a methodology directed to frontier analysis rather than to central tendencies. The technique is a “data input-output driven” approach for evaluating the relative performance of DMUs.

DEA has been used to evaluate the performance or efficiency of hospitals, schools, departments, university faculty, US Air Force wings, armed forces recruiting agencies, universities, cities, courts, businesses, banking facilities, countries, regions, SOF airbases, and key nodes in networks; the list goes on. A Google Scholar search on “data envelopment analysis” returns over 330,000 results in under 0.1 seconds. According to Cooper ([CooperLSTTZ2001], the first item returned by the search), DEA has been used to gain insights into activities that could not be obtained by other quantitative or qualitative methods.

## Data Envelopment Analysis as a Linear Program

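A DEA model, in simplest terms, may be formulated and solved as a linear programming problem (Winston [Winston2002], Callen [Callen1991]). Although there are several representations for DEA, we’ll use the most straightforward formulation for maximizing the efficiency of the $k$th DMU as constrained by inputs and outputs (shown in (8.1)). As an option, we may wish to normalize the metric inputs and outputs for the alternatives if the values are poorly scaled within the data. We will call this data matrix $X$ with entries $x_{ij}$. Define each DMU or efficiency unit as $E_i$ for $i = 1, 2, \ldots, n$ for $n$ DMUs. Let $w_j$ be the weights or coefficients for the linear combinations. Further, restrict any efficiency value from being larger than one (100%). Thus, the most efficient DMU will have efficiency value 1. These requirements give the following linear programming formulation for DMUs with multiple inputs yielding a single output.

$$
\begin{aligned}
\text{maximize}\quad & E_k \\
\text{subject to}\quad & \sum_{j} w_j x_{ij} - E_i = 0, \quad i = 1, 2, \ldots, n, \\
& E_i \le 1, \quad i = 1, 2, \ldots, n, \\
& w_j \ge 0 \ \text{for all } j.
\end{aligned}
\tag{8.1}
$$

For multiple inputs and outputs, we recommend using (8.2), the formulation provided by Winston ([Winston2002]) and Trick ([Trick1996]). Let $X_i$ be the inputs array and $Y_i$ be the outputs array for $\text{DMU}_i$. Let $X_0$ and $Y_0$ be for $\text{DMU}_0$, the DMU being modeled; then

$$
\begin{aligned}
\text{minimize}\quad & \theta \\
\text{subject to}\quad & \sum_{i=1}^{n} \lambda_i X_i \le \theta X_0, \\
& \sum_{i=1}^{n} \lambda_i Y_i \ge Y_0, \\
& \lambda_i \ge 0, \quad i = 1, 2, \ldots, n,
\end{aligned}
\tag{8.2}
$$

where $\theta$ is the efficiency of $\text{DMU}_0$ and the $\lambda_i$ are nonnegative weights placed on the DMUs.

## Strengths and Limitations of DEA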

DEA is a very useful tool when used wisely [Trick1996]. Strengths that make DEA very useful include:

1. DEA can handle multiple input and multiple output models;
2. DEA doesn’t require an assumption of a functional form relating inputs to outputs;
3. DMUs are directly compared against a peer or combination of peers; and
4. inputs and outputs can have very different units.

For example, $X_1$ could be in units of lives saved, while $X_2$ could be in units of dollars spent, without requiring any a priori tradeoff between the two.

The same characteristics that make DEA a powerful tool can also create limitations. An analyst should keep these limitations in mind when choosing whether or not to use DEA. Limitations include:

1. DEA is an extreme point technique, thus noise in the data, such as measurement error, can cause significant problems.
2. DEA is good at estimating relative efficiency of a DMU, but it does not directly measure absolute efficiency. In other words, DEA can show how well a DMU is doing compared to its peers, but not compared to a theoretical maximum.
3. Since DEA is a nonparametric technique, statistical hypothesis tests are difficult; they are the focus of ongoing research.
4. Since a standard formulation of DEA with multiple inputs and outputs creates a separate linear program for each DMU, large problems can be computationally intensive.
5. Linear programming does not ensure that every criterion receives a meaningful weight: the solution assigns weight only to the criteria that optimally determine the efficiency rating. If having all criteria (all inputs and outputs) weighted is essential to the decision maker, then DEA is not appropriate.

## Sensitivity Analysis

Sensitivity analysis is an important element of every modeling project. According to Neralić ([Neralic1998]), an increase in any output cannot worsen an efficiency rating, nor can a decrease in inputs alone worsen an already achieved efficiency rating. As a result, in our examples we only decrease outputs and increase inputs. We will briefly illustrate sensitivity analysis, as applicable, in the examples.

Example 8.1. Manufacturing Units.

A manufacturing process involves three DMUs each having two inputs and three outputs. Management wishes to assess the efficiency of each DMU in order to target resources to improve performance. The data appears in Table 8.2.

TABLE 8.2: Manufacturing DMU Data

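| DMU | Input 1 | Input 2 | Output 1 | Output 2 | Output 3 |
|-----|---------|---------|----------|----------|----------|
| I   | 5       | 14      | 9        | 4        | 16       |
| II  | 8       | 15      | 5        | 7        | 10       |
| III | 7       | 12      | 4        | 9        | 13       |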

Since no units are given and the values have similar scales, the data doesn’t have to be normalized.

Define the variables

$t_j$ = value of a single unit of output $j$, for $j = 1, 2, 3$

$w_j$ = cost or weight for one unit of input $j$, for $j = 1, 2$

$X$ = matrix of input data

$Y$ = matrix of output data

$\text{DMU}_i$ = objective function for DMU $i$’s linear program

$\text{Eff}_i$ = relative efficiency of DMU $i$, computed with its vector of weights

with $i = 1, 2, 3$ indexing the DMUs.

Assume that

• No DMU can have an efficiency of more than 100%.
• If any efficiency is less than 100%, then that DMU is inefficient.
• The costs are scaled so that the cost of the inputs equals 1 for each linear program. For example, use $5w_1 + 14w_2 = 1$ in the LP for DMU1.
• All values and weights must be strictly positive. (We may have to use a constant such as 0.0001 in lieu of 0 in inequalities to help numeric routines converge.)
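Taken together with Table 8.2, these assumptions determine one linear program per DMU. The block below is a reconstruction from the data and the four assumptions, not a copy of the original display; the lower bound 0.0001 is the illustrative constant mentioned in the last assumption.

```latex
% Constraints shared by all three LPs:
% for every DMU, weighted outputs may not exceed weighted inputs.
\begin{aligned}
9t_1 + 4t_2 + 16t_3 &\le 5w_1 + 14w_2 && \text{(DMU1)}\\
5t_1 + 7t_2 + 10t_3 &\le 8w_1 + 15w_2 && \text{(DMU2)}\\
4t_1 + 9t_2 + 13t_3 &\le 7w_1 + 12w_2 && \text{(DMU3)}\\
t_1, t_2, t_3, w_1, w_2 &\ge 0.0001
\end{aligned}

% Each LP adds its own objective and input normalization.
\begin{aligned}
\text{DMU1:}\quad & \max\ 9t_1 + 4t_2 + 16t_3 && \text{with } 5w_1 + 14w_2 = 1\\
\text{DMU2:}\quad & \max\ 5t_1 + 7t_2 + 10t_3 && \text{with } 8w_1 + 15w_2 = 1\\
\text{DMU3:}\quad & \max\ 4t_1 + 9t_2 + 13t_3 && \text{with } 7w_1 + 12w_2 = 1
\end{aligned}
```

Because the inputs of the DMU under study are normalized to cost 1, the optimal objective value is that DMU's efficiency, and its own constraint caps the value at 1.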

To calculate the efficiency of DMU1, solve the linear program that maximizes DMU1's weighted output, $9t_1 + 4t_2 + 16t_3$, subject to these assumptions. The linear programs for DMU2 and DMU3 are built the same way from their rows of Table 8.2. Use Maple to solve the three linear programs.

`> with(Optimization):`

Define the input and output data matrices. (Using the Matrix Palette makes entry easier and less error prone: set the size, then click Insert Matrix.) Define the decision variables and weights, compute the LPs’ objective functions, set up the constraints, and then solve the three linear programs.

The linear program solutions show the relative efficiencies of DMU1 and DMU3 are 100%, while DMU2’s is 77.3%.
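The Maple input for these steps is not reproduced in the text. One way the sequence might look is sketched below; the names `outs`, `ins`, `common`, and `eps` are illustrative choices rather than the authors', and `LPSolve` from the `Optimization` package is assumed as the solver.

```maple
with(Optimization):

# Input and output data from Table 8.2 (one row per DMU)
X := Matrix([[5, 14], [8, 15], [7, 12]]):
Y := Matrix([[9, 4, 16], [5, 7, 10], [4, 9, 13]]):

# Decision variables: output values t and input costs w
t := Vector([t1, t2, t3]):
w := Vector([w1, w2]):

# Weighted outputs and weighted inputs, one entry per DMU
outs := Y . t:
ins  := X . w:

# Constraints shared by all three LPs; eps stands in for "strictly positive"
eps := 0.0001:
common := {seq(outs[i] <= ins[i], i = 1 .. 3),
           seq(t[j] >= eps, j = 1 .. 3),
           seq(w[j] >= eps, j = 1 .. 2)}:

# One LP per DMU: maximize its weighted output with its input cost fixed at 1
for k from 1 to 3 do
    sol := LPSolve(outs[k], common union {ins[k] = 1}, maximize):
    printf("DMU %d efficiency: %.4f\n", k, sol[1]);
end do:
```

`LPSolve` returns a list whose first entry is the optimal objective value and whose second entry holds the variable values, so `sol[2]` shows which weights drive each rating.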

Interpretation. DMU2 is operating at 77.3% of the efficiency of DMU1 and DMU3. Management could concentrate on improvements for DMU2 by taking best practices from DMU1 or DMU3.

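To compute the shadow prices for the linear programs, paying special attention to those of DMU2, solve the dual LPs. Maple's dual command in the simplex package does not handle equality constraints, so replace each equality with the two inequalities ≤ and ≥. Examining the shadow prices from the dual linear program for DMU2 shows $\lambda_1 \approx 0.26$ on the DMU1 constraint, $\lambda_3 \approx 0.66$ on the DMU3 constraint, and $\lambda_2 = 0$ on the DMU2 constraint. The average output vector for DMU2 can be written as

$$
\lambda_1 \begin{bmatrix} 9 \\ 4 \\ 16 \end{bmatrix} + \lambda_3 \begin{bmatrix} 4 \\ 9 \\ 13 \end{bmatrix} \approx \begin{bmatrix} 5 \\ 7 \\ 12.785 \end{bmatrix}
$$

and its average input vector is

$$
\lambda_1 \begin{bmatrix} 5 \\ 14 \end{bmatrix} + \lambda_3 \begin{bmatrix} 7 \\ 12 \end{bmatrix} \approx \begin{bmatrix} 5.938 \\ 11.6 \end{bmatrix}.
$$

Output 3 for DMU2 in Table 8.2 is 10 units. Thus, the inefficiency is in Output 3, where 12.785 units are required. We find that DMU2 is short 2.785 units (= 12.785 − 10). This calculation helps focus on treating the inefficiency found for Output 3.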
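To see where these shadow prices come from, it can help to write the dual for DMU2 in the envelopment form attributed earlier to Winston and Trick. The following is a sketch built from Table 8.2; it uses plain nonnegativity rather than the 0.0001 bounds, and the exact fractions were obtained by hand under that assumption.

```latex
\begin{aligned}
\text{minimize}\quad & \theta \\
\text{subject to}\quad
& 9\lambda_1 + 5\lambda_2 + 4\lambda_3 \ge 5   && \text{(Output 1)}\\
& 4\lambda_1 + 7\lambda_2 + 9\lambda_3 \ge 7   && \text{(Output 2)}\\
& 16\lambda_1 + 10\lambda_2 + 13\lambda_3 \ge 10 && \text{(Output 3)}\\
& 5\lambda_1 + 8\lambda_2 + 7\lambda_3 \le 8\theta  && \text{(Input 1)}\\
& 14\lambda_1 + 15\lambda_2 + 12\lambda_3 \le 15\theta && \text{(Input 2)}\\
& \lambda_1, \lambda_2, \lambda_3 \ge 0
\end{aligned}

% Candidate solution: lambda_1 = 17/65, lambda_2 = 0, lambda_3 = 43/65, theta = 58/75
% Output 1: (9*17 + 4*43)/65  = 325/65 = 5
% Output 2: (4*17 + 9*43)/65  = 455/65 = 7
% Output 3: (16*17 + 13*43)/65 = 831/65 \approx 12.785
% Input 1:  (5*17 + 7*43)/65  = 386/65 \approx 5.938 \le 8\theta \approx 6.187
% Input 2:  (14*17 + 12*43)/65 = 754/65 = 11.6 = 15\theta
```

The value $\theta = 58/75 \approx 0.773$ matches the 77.3% efficiency found for DMU2, and the composite of DMU1 and DMU3 produces 12.785 units of Output 3 against DMU2's 10.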

Sensitivity Analysis. In linear programming, sensitivity analysis is sometimes referred to as “what if” analysis. Assume that without management providing some additional training, DMU2’s Output 3 value dips from 10 to 9 units, while Input 2 increases from 15 to 16. We find that these changes in the technology coefficients are easily handled when re-solving the LPs. Since only DMU2 is affected, we might modify and solve only the LP for DMU2. With these changes, DMU2’s efficiency is now only 74% of DMU1 or DMU3.
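The re-solve can be sketched in Maple by editing the two affected entries and solving only the LP for DMU2. As before, the variable names are illustrative and `LPSolve` from the `Optimization` package is assumed; the optimal value should come out near the 74% quoted above.

```maple
with(Optimization):

# Table 8.2 with the two "what if" changes applied to DMU2
X := Matrix([[5, 14], [8, 16], [7, 12]]):          # Input 2 of DMU2: 15 -> 16
Y := Matrix([[9, 4, 16], [5, 7, 9], [4, 9, 13]]):  # Output 3 of DMU2: 10 -> 9

t := Vector([t1, t2, t3]):
w := Vector([w1, w2]):
outs := Y . t:
ins  := X . w:

# Same constraint structure as before, with DMU2's input cost fixed at 1
cons := {seq(outs[i] <= ins[i], i = 1 .. 3),
         ins[2] = 1,
         seq(t[j] >= 0.0001, j = 1 .. 3),
         seq(w[j] >= 0.0001, j = 1 .. 2)}:

# Efficiency of DMU2 under the changed data
LPSolve(outs[2], cons, maximize);
```

Because DMU2's row also appears as a constraint in the LPs for DMU1 and DMU3, re-solving those two as well is a cheap way to confirm their ratings are unchanged.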

Example 8.2. Ranking Five Departments in a College.

Five science departments in the College of Arts & Sciences are scheduled for review. The dean has provided the data in Table 8.3 and asked for relative efficiency ratings.

TABLE 8.3: Arts & Sciences Department’s Data

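| Department | No. Faculty (input) | Student Cr. Hr. (output) | No. Students (output) | Total Degrees (output) |
|------------|---------------------|--------------------------|-----------------------|------------------------|
| Biology    | 25                  | 18,341                   | 9,086                 | 63                     |
| Chemistry  | 15                  | 8,190                    | 4,049                 | 23                     |
| Comp. Sci. | 10                  | 2,857                    | 1,255                 | 31                     |
| Math.      | 33                  | 22,277                   | 6,102                 | 31                     |
| Physics    | 12                  | 6,830                    | 2,910                 | 19                     |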

Since the data values differ by orders of magnitude, divide both student credit hours and number of students by 1,000.
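As an illustration of the scaled data in use, the LP for Chemistry might be written as follows. This is a sketch only: it assumes Number of Faculty is the single input (weight $w_1$) and the remaining three columns are outputs (weights $t_1, t_2, t_3$), and it reuses the 0.0001 lower bound from the previous example.

```latex
\begin{aligned}
\text{maximize}\quad & 8.190\,t_1 + 4.049\,t_2 + 23\,t_3 \\
\text{subject to}\quad
& 18.341\,t_1 + 9.086\,t_2 + 63\,t_3 \le 25\,w_1 && \text{(Biology)}\\
& 8.190\,t_1 + 4.049\,t_2 + 23\,t_3 \le 15\,w_1 && \text{(Chemistry)}\\
& 2.857\,t_1 + 1.255\,t_2 + 31\,t_3 \le 10\,w_1 && \text{(Comp. Sci.)}\\
& 22.277\,t_1 + 6.102\,t_2 + 31\,t_3 \le 33\,w_1 && \text{(Math.)}\\
& 6.830\,t_1 + 2.910\,t_2 + 19\,t_3 \le 12\,w_1 && \text{(Physics)}\\
& 15\,w_1 = 1 \\
& t_1, t_2, t_3, w_1 \ge 0.0001
\end{aligned}
```

The LPs for the other four departments keep the same five inequalities and change only the objective row and the normalization constraint.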

Follow the same sequence of Maple commands as in the previous example.

We see the DMUs are ranked as: Biology and Computer Science: 100%; Mathematics: 92%; Chemistry: 74%; and Physics: 71%.

Examine the results from the dual LPs. Comparing the values of the λs, improving Output 2, Number of Students, will provide the largest gains in efficiency for both Chemistry and Physics.

## Exercises

1. Table 8.4 lists data for three hospitals where inputs are number of beds and labor hours in thousands per month, and outputs, all measured in hundreds, are patient-days for patients under 14, between 14 and 65, and over 65. Determine the relative efficiency of the three hospitals.

TABLE 8.4: Three Hospitals’ Data

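| Hospital | No. Beds (input) | Labor Hr. (input) | < 14 (output) | 14-65 (output) | > 65 (output) |
|----------|------------------|-------------------|---------------|----------------|---------------|
| I        | 5                | 14                | 9             | 4              | 16            |
| II       | 8                | 15                | 5             | 7              | 10            |
| III      | 7                | 12                | 4             | 9              | 13            |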

2. The three hospitals of Exercise 1 have revised procedures. Reanalyze their relative efficiencies using the new data of Table 8.5.

TABLE 8.5: Three Hospitals’ Revised Data

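| Hospital | No. Beds (input) | Labor Hr. (input) | < 14 (output) | 14-65 (output) | > 65 (output) |
|----------|------------------|-------------------|---------------|----------------|---------------|
| I        | 4                | 16                | 6             | 5              | 15            |
| II       | 9                | 13                | 10            | 6              | 9             |
| III      | 5                | 11                | 5             | 10             | 12            |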

3. The First National Bank of Spruce Pine, NC, has four branches in the greater Spruce Pine metropolitan area. The CEO directed an efficiency study be undertaken. The data to be collected is:

INPUT 1: labor hours (hundred per month)

INPUT 2: space used for tellers (hundreds of square feet)

INPUT 3: supplies used (dollars per month)

OUTPUT 1: loan applications per month

OUTPUT 2: deposits (thousands of dollars per month)

OUTPUT 3: checks processed (thousands of dollars per month)

The data for the bank branches appears in Table 8.6.

TABLE 8.6: Bank Branch Data

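| Branch | Labor Hr. (input) | Space (input) | Supplies (input) | Loans (output) | Deposits (output) | Checks (output) |
|--------|-------------------|---------------|------------------|----------------|-------------------|-----------------|
| I      | 15                | 20            | 50               | 200            | 15                | 35              |
| II     | 14                | 23            | 51               | 220            | 18                | 45              |
| III    | 16                | 19            | 51               | 210            | 17                | 20              |
| IV     | 13                | 18            | 49               | 199            | 21                | 35              |

   (a) Determine the branches’ relative efficiencies.
   (b) What “best practices” might you suggest to the branches that are less efficient?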