# Integer Programming

Integer Programming (IP; Dantzig et al. 1954; Gomory 1958) is a widely used and powerful optimization technique to solve a variety of decision-making problems (Wolsey 1998; Jünger et al. 2010). IP methods maximize or minimize a linear objective function subject to linear constraints (equalities or inequalities) when one or more variables are restricted to be integers. Theoretically solving IP is NP-hard. However, thanks to powerful solvers like CPLEX (2012) and GUROBI (Gurobi Optimization Inc. 2013), problems with thousands of variables and constraints can be solved optimally within reasonable time (Jünger et al. 2010; and references therein).

The first application of IP in conservation problems goes back to (Cocks and Baird 1989), who solved the reserve selection (Problem 4) under species richness. Such IP formulations have been extended to more realistic scenarios (Underhill 1994; Church et al. 1996; Possingham et al. 2000), to maximize PD (Rodrigues and Gaston 2002; Rodrigues et al. 2005), and to maximize SD (Minh et al. 2010).

Here, we show how to model biodiversity optimization problems 1–5 in IP parlance, which allows available IP software packages to solve the problem. We first introduce some notations and definitions and further exemplify IP formulations for Problems 1–5 using the pheasant data set.

#### IP for Taxon Selection Problems

Given a set of *n* taxa, we encode a subset *S* by an *n*-element binary vector with entries of 0 and 1 indicating the absence and presence of the corresponding taxa in

*S*. The elements of this vector are called taxon variables. For the pheasant data set there are ten taxon variables *xPB*, *xPC*, *xPE*, *xPG*, *xPI*, *xPM*, *xGG*, *xGL*, *xGS*, *xGV* (indices follow initials of species names). We say that a split *σ* = *A*|*B* is preserved in *S* if *A* and *B* each contain at least one taxon from *S*. For each split *σ* we introduce a binary split variable *yσ*, where *yσ* = 1 if *σ* is *preserved* in *S* and 0 otherwise.

Each *yσ* is fully identified from taxon variables by two split constraints as follows. *σ*1 is a trivial split that separates *P. bicalcaratum* from the remaining taxa. *σ*1 is preserved (i.e. *y*1 = 1) if *P. bicalcaratum* and at least another taxon are preserved (see Fig. 2a for the definition of the splits). This condition is expressed by two inequalities:

*y*1 £ *xPB* , *y*1 £ *xPC* + *xPE* + *xPG* + *xPI* + *xPM* + *xGG* + *xGL* + *xGS* + *xGV*

In fact, the second inequality always holds because *k* ≥ 2 and thus is ignored. Now consider the non-trivial split *σ*17, which separates *G. gallus*, *G. lafayetii*, and *G. varius* from the remaining taxa. *σ*17 is preserved if at least one of *G. gallus*, *G. lafayetii*, and *G. varius* and one of the remaining taxa are preserved. Therefore,

*y*17 £ *xGG* + *xGL* + *xGV* , *y*17 £ *xPB* + *xPC* + *xPI* + *xPG* + *xPE* + *xPM* + *xGS*

The remaining split constraints are listed in Table 3.

Based on split variables one can rewrite SD of *S* as:

where *λσ* is the weight of split *σ*. This is the objective function that we want to maxi-

mize for all problems (1–5).

In the taxon selection Problem 1 the size of an optimal subset is constrained by a

predefined number *k*, meaning that:

*xPB* + *xPC* + *xPE* + *xPG* + *xPI* + *xPM* + *xGG* + *xGL* + *xGS* + *xGV* £ *k* (2)

We also require that taxon and split variables are binary

Suppose we are given a total budget *B*. Let *ci* denote conservation costs for taxon

*i*. We can then substitute constraint (2) by the budget constraint

Together with previous constraints we have the IP formulation of Problem 2 by:

We now model viability constraints that operate on taxon variables as follows.

*G. sonneratii* depends on *P.malacense* and *P.germaini* (Fig. 4). Therefore, the viability constraint for *G. sonneratii* is simply

*xPM* + *xPG* ³ *xGS*

This ensures that *xGS* is 1 (i.e., *G. sonneratii* is selected for conservation) only if at least one of *xPM* and *xPG* is also 1. Viability constraints for all the other taxa are listed in Table 3. Now, the IP formulation for viable taxon selection can be obtained by simply including viability constraints to Problem 1:

#### IP for Reserve Selection Problems

For reserve selection we encode a subset *W* of *m* areas by a binary vector (*z*1,*z*2,…,*zm*), where *zr* is 1 if area *r* is present in *W*, and 0 otherwise. We call *zr* area variables. For the pheasant habitat (Table 1) we have eight area variables *zID*, *zLK*, *zBT*, *zIN*, *zPH*, *zMY*, *zTH*, *zVN* (indices follow two-letter country codes). We now redefine split constraints

in terms of area variables instead of taxon variables as follows.

Split *σ*18, which separates *G. lafayetii* and *G. varius* from the others, is preserved if (1) *G. lafayetii* or *G. varius* is preserved and (2) at least one of the remaining taxa is preserved. Because *G. lafayetii* or *G. varius* occur in Indonesia and Sri Lanka, condition 1 is equivalent to: *y*18 £ *zID* + *zLK*

Similarly condition 2 is equivalent to:

*y*18 £ *zBT* + *zID* + *zIN* + *zPH* + *zMY* + *zTH* + *zVN*

since the remaining taxa are found in all areas except Sri Lanka. Such area-split

constraints for all other splits are listed in Table 4.

The subset size constraint has to be rewritten for countries:

*zID* + *zLK* + *zBT* + *zIN* + *zPH* + *zMY* + *zTH* + *zVN* £ *k* (8)

We keep binary constraints for split variables and also include such for area variables

Reserve selection problem is then formulated as follows:

For budgeted reserve selection we are given a total budget *B*. Let *cID*, *cLK*, *cBT*, *cIN*, *cPH*, *cMY*, *cTH*, *cVN* denote conservation costs for each country. Then a budget constraint for areas is

To obtain the IP formulation for Problem 5 we simply substitute subset size constraint (8) by the budget constraint (11).

# Other Algorithms

While greedy algorithms and IP are general strategies for all Problems 1–5, other algorithms have been applied to solve special cases. For example, simulated annealing algorithms (Possingham et al. 2000) were introduced to solve the reserve selection Problems 4 and 5 under species richness with an opportunity to minimize the connectivity between the areas such as the boundary lengths. Dynamic programming algorithms (DPA) have been applied to solve Problem 2 under PD (Pardi and Goldman 2007). DPA was further extended to maximize SD on circular split networks (Minh et al. 2009a, b). Other special types of split networks were exploited to solve Problem 1 (Spillner et al. 2008; Bordewich et al. 2009).