The Measure of Split Diversity

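Given a split set Σ, the SD of a taxon subset Y is defined as the sum of the weights λ of all splits separating taxa in Y. Here, a split A | B ∈ Σ separates Y if Y ∩ A and Y ∩ B are both non-empty. Thus, we get

$$\mathrm{SD}(Y) = \sum_{\substack{A|B \,\in\, \Sigma \\ Y \cap A \neq \emptyset,\ Y \cap B \neq \emptyset}} \lambda_{A|B}.$$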

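To illustrate, given Σ in Fig. 2, for Y = {P. malacense, P. germaini, P. emphanum, G. lafayetii} we have $\mathrm{SD}(Y) = \lambda_3 + \lambda_4 + \lambda_6 + \lambda_8 + \lambda_{13} + \lambda_{14} + \dots + \lambda_{19}$, where $\lambda_i = \lambda_{\sigma_i}$ is defined as the average of the corresponding branch lengths in $T_{\mathrm{CYB}}$ and $T_{\mathrm{DCoH3}}$.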

Here, contradicting splits such as σ17 and σ19 are considered in the SD computation.

If the split set Σ corresponds to a tree (i.e. no conflicting splits exist in Σ), then SD is equivalent to PD. The definition of SD therefore generalizes PD. For this reason we focus on SD for the remainder of the chapter.
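The definition of SD can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the representation of a split as a `(side_a, side_b, weight)` tuple, the helper `make_split`, and the four-taxon toy split set are all assumptions made for the example, and the weights are invented rather than taken from Fig. 2.

```python
from typing import FrozenSet, Iterable, List, Tuple

# A weighted split A|B is stored as (side_a, side_b, weight).
Split = Tuple[FrozenSet[str], FrozenSet[str], float]


def split_diversity(splits: Iterable[Split], subset: Iterable[str]) -> float:
    """Sum the weights of all splits A|B for which Y meets both A and B."""
    y = set(subset)
    return sum((w for a, b, w in splits if y & a and y & b), 0.0)


def make_split(side_a: str, weight: float, taxa: FrozenSet[str]) -> Split:
    """Build A|B from one side; B is the complement of A within `taxa`."""
    a = frozenset(side_a)
    return (a, taxa - a, weight)


# Hypothetical toy data on four taxa; NOT the pheasant splits of Fig. 2.
TAXA = frozenset("abcd")
TOY_SPLITS: List[Split] = [
    make_split("a", 1.0, TAXA),
    make_split("b", 2.0, TAXA),
    make_split("c", 1.5, TAXA),
    make_split("d", 0.5, TAXA),
    make_split("ab", 3.0, TAXA),
    make_split("ac", 1.0, TAXA),  # conflicts with ab|cd, so this is not a tree
]

# Y = {a, c} is separated by a|bcd, c|abd and ab|cd.
print(split_diversity(TOY_SPLITS, {"a", "c"}))  # 1.0 + 1.5 + 3.0 = 5.5
```

Because the toy set contains the two conflicting splits ab|cd and ac|bd, it cannot be drawn as a tree. Dropping either of them would leave a tree-compatible split set, for which the same function returns PD.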

Biodiversity Optimization Problems

Conservation problems mainly fall into two categories: taxon selection and reserve selection (Fig. 3), where the conservation targets are either taxa or geographical areas, respectively. Under PD, the simplest taxon selection problem (Faith 1992) is to identify a subset of k taxa that maximizes PD on a phylogenetic tree of n taxa (2 ≤ k < n). For reserve selection we define PD on a subset of areas as the PD of the union taxon set of the areas. The simplest reserve selection problem is, analogously, to identify a subset of k areas that maximizes PD over all subsets of k areas. In the following, we reformulate these problems using SD and further integrate economic and ecological constraints into the extensions.

Fig. 3 The “network” of biodiversity optimization problems

Taxon Selection Problems

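We start with the simplest taxon selection problem, formally defined as:

Problem 1: given a split set Σ on n taxa and an integer k (2 ≤ k < n), find a subset Y of k taxa that maximizes SD(Y).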

As an illustration, given the split set for ten pheasants (Fig. 2), we want to select four taxa maximizing SD. Doing so yields an optimal subset (highlighted in bold-face; Fig. 2b), which shares three taxa (P. emphanum, P. malacense, and G. lafayetii) with the CYB-based subset (left panel of Fig. 1) and only two taxa (P. emphanum and P. germaini) with the DCoH3-based subset (right panel of Fig. 1). The SD approach therefore provides a “consensus” solution over the two independent PD analyses. Problem 1 is known to be NP-hard (Spillner et al. 2008), which means that, to find an optimal set, it may in the worst case be necessary to compute the SD for exponentially many subsets of the n taxa.
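The exhaustive search that the NP-hardness remark alludes to can be sketched as follows. This is a brute-force illustration only, under the assumption that Problem 1 asks for a k-subset of taxa with maximal SD. The function names and the toy split set are hypothetical, and the weights are invented rather than taken from Fig. 2.

```python
from itertools import combinations
from math import comb


def split_diversity(splits, subset):
    """Sum the weights of all splits A|B for which the subset meets A and B."""
    y = set(subset)
    return sum((w for a, b, w in splits if y & a and y & b), 0.0)


def best_k_subset(splits, taxa, k):
    """Score every k-subset of taxa; only feasible for small n."""
    best_subset, best_score = None, float("-inf")
    for candidate in combinations(sorted(taxa), k):
        score = split_diversity(splits, candidate)
        if score > best_score:  # ties keep the first subset found
            best_subset, best_score = candidate, score
    return best_subset, best_score


# Hypothetical toy split set on four taxa (weights are made up).
TAXA = frozenset("abcd")
TOY_SPLITS = [
    (frozenset(side), TAXA - frozenset(side), weight)
    for side, weight in [("a", 1.0), ("b", 2.0), ("c", 1.5),
                         ("d", 0.5), ("ab", 3.0), ("ac", 1.0)]
]

print(best_k_subset(TOY_SPLITS, TAXA, 2))  # (('b', 'c'), 7.5)
print(comb(10, 4))  # 210 candidate subsets for ten taxa and k = 4
```

For the pheasant example (n = 10, k = 4) the search space is still tiny, but the number of candidates grows combinatorially with n, which is why exhaustive enumeration does not scale.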

Problem 1 implicitly assumes that each taxon requires the same amount of resources for conservation. If the preservation costs for each taxon are known and a finite budget is provided, then a more realistic scenario is to allocate this budget among the taxa so as to obtain the highest diversity. This process is known as conservation triage (Bottrill et al. 2008) and is formally defined as:

Problem 2: given a split set Σ on n taxa, a preservation cost for each taxon, and a total budget, find a subset Y of taxa whose total cost does not exceed the budget and that maximizes SD(Y).
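One way to make the budgeted setting concrete is a brute-force search over all subsets whose total cost fits the budget. The sketch below assumes additive per-taxon costs and a single budget. The function names, costs, budget, and split weights are all hypothetical values chosen for illustration.

```python
from itertools import combinations


def split_diversity(splits, subset):
    """Sum the weights of all splits A|B for which the subset meets A and B."""
    y = set(subset)
    return sum((w for a, b, w in splits if y & a and y & b), 0.0)


def best_within_budget(splits, costs, budget):
    """Try every affordable subset of taxa and keep the one with maximal SD."""
    taxa = sorted(costs)
    best_subset, best_score = (), 0.0
    for size in range(1, len(taxa) + 1):
        for candidate in combinations(taxa, size):
            if sum(costs[t] for t in candidate) > budget:
                continue  # too expensive, skip
            score = split_diversity(splits, candidate)
            if score > best_score:
                best_subset, best_score = candidate, score
    return best_subset, best_score


# Hypothetical toy data: split weights and preservation costs are made up.
TAXA = frozenset("abcd")
TOY_SPLITS = [
    (frozenset(side), TAXA - frozenset(side), weight)
    for side, weight in [("a", 1.0), ("b", 2.0), ("c", 1.5),
                         ("d", 0.5), ("ab", 3.0), ("ac", 1.0)]
]
TOY_COSTS = {"a": 1, "b": 3, "c": 3, "d": 1}

print(best_within_budget(TOY_SPLITS, TOY_COSTS, budget=5))
# (('a', 'b', 'd'), 7.5)
```

In this toy instance the pair {b, c} has the same SD of 7.5 but costs 6 units, so it is not affordable. The budget steers the choice towards three cheaper taxa instead.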

Problems 1 and 2 ignore ecological relationships between taxa. In real life, species interact with each other within a dependency network, for example through predator-prey relationships (Witting et al. 2000; van der Heide et al. 2005; Moulton et al. 2007). A dependency network is typically an acyclic directed graph, where nodes represent taxa and edges represent dependencies between them. Figure 4 shows an artificial example of such a network for the pheasants. Here, G. sonneratii depends on P. malacense and P. germaini, depicted by two edges connecting G. sonneratii with these two taxa. We note that this is a purely fictional example, but it illustrates the major principles of including a dependency structure in conservation decisions.

A taxon is called viable in a subset of taxa if it does not depend on any other taxon or, if it does depend on some taxa, at least one of them is also present in the subset. For example, G. sonneratii is viable in a subset if this subset also contains P. malacense or P. germaini. P. emphanum and G. gallus are viable in any (sub)set since they do not depend on any other species.

A subset is called viable if all its taxa are viable in this set. For example, {P. emphanum, P. bicalcaratum, P. germaini, G. sonneratii} is a viable subset, whereas {P. emphanum, P. bicalcaratum, G. lafayetii, G. sonneratii} is not viable, since it contains G. sonneratii but neither P. malacense nor P. germaini.
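The two viability definitions translate directly into a small check. The sketch below encodes only the dependency edges named in the text (G. sonneratii depends on P. malacense and P. germaini). The rest of the network in Fig. 4 is not reproduced here, so every other taxon is treated as having no dependencies, which is an assumption of this example. The function names and the dictionary layout are likewise hypothetical.

```python
def is_viable_taxon(taxon, subset, depends_on):
    """A taxon is viable if it has no dependencies or at least one is present."""
    prerequisites = depends_on.get(taxon, set())
    return not prerequisites or bool(prerequisites & set(subset))


def is_viable_subset(subset, depends_on):
    """A subset is viable if every taxon in it is viable within the subset."""
    return all(is_viable_taxon(t, subset, depends_on) for t in subset)


# Only the edges stated in the text; all other taxa are ASSUMED independent.
DEPENDS_ON = {
    "G. sonneratii": {"P. malacense", "P. germaini"},
}

first = {"P. emphanum", "P. bicalcaratum", "P. germaini", "G. sonneratii"}
second = {"P. emphanum", "P. bicalcaratum", "G. lafayetii", "G. sonneratii"}

print(is_viable_subset(first, DEPENDS_ON))   # True
print(is_viable_subset(second, DEPENDS_ON))  # False
```

A search for a diverse viable subset could use `is_viable_subset` as a filter on candidate subsets before scoring them with SD.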

We now formally define the viable taxon selection problem as

Fig. 4 Artificial example of dependency network for the pheasant data set

Reserve Selection Problems

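For reserve selection we define the SD of a subset of areas as the SD of the union set of taxa present in these areas. The reserve selection problem is formalized as: given a split set Σ, a set of areas with the taxa present in each area, and an integer k, find a subset of k areas that maximizes SD.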

To illustrate the problem, consider the geographical distribution of the ten pheasants (Table 1). The data were obtained from the Global Biodiversity Information Facility (accessed on December 1st, 2013), where a country is listed as habitat only if there are at least three observations for the species. Table 1 shows that these pheasants occur in eight countries in South Asia. G. gallus and P. bicalcaratum occur in seven and two countries, respectively, whereas the remaining species are endemic to one country. Indonesia and Malaysia each host three species, Sri Lanka only one species, and the remaining five countries are home to two species each.

Table 1 Presence/absence of ten pheasants in eight countries obtained from the Global Biodiversity Information Facility

Table 2 Four countries maximizing PD on the CYB tree (first column), PD on the DCoH3 tree (second column), and SD on the split network (third column)

PD – CYB          PD – DCoH3        SD
Malaysia (3)      Indonesia (3)     Malaysia (3)
Philippines (2)   Malaysia (3)      Philippines (2)
Sri Lanka (1)     Philippines (2)   Indonesia (3)
India (2)         Vietnam (2)       India (2)

Highlighted in boldface are the countries present in all optimal sets. The number of species present in the country is given in brackets

If one wants to select four countries with maximal diversity, then the decision heavily depends on the trees or network (Figs. 1 and 2b). Table 2 shows that using the CYB and DCoH3 regions, the optimal sets only overlap in two countries: Malaysia and the Philippines. If we now maximize SD instead, then the optimal set includes these two countries, the third one preferred by the PD-DCoH3 set (Indonesia), and the fourth one by the PD-CYB set (India). The union of the species sets for the selected areas contains seven species.
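The area-based definition can be sketched by taking the union of the taxa in the chosen areas and scoring that union with SD. The example below is a brute-force illustration on invented data. The area names, their taxon lists, and the split weights are hypothetical and do not reproduce Table 1 or Fig. 2.

```python
from itertools import combinations


def split_diversity(splits, subset):
    """Sum the weights of all splits A|B for which the subset meets A and B."""
    y = set(subset)
    return sum((w for a, b, w in splits if y & a and y & b), 0.0)


def area_diversity(splits, areas, chosen):
    """SD of a set of areas = SD of the union of the taxa they contain."""
    union = set().union(*(areas[name] for name in chosen))
    return split_diversity(splits, union)


def best_k_areas(splits, areas, k):
    """Score every k-subset of areas; only feasible for few areas."""
    best_choice, best_score = None, float("-inf")
    for chosen in combinations(sorted(areas), k):
        score = area_diversity(splits, areas, chosen)
        if score > best_score:
            best_choice, best_score = chosen, score
    return best_choice, best_score


# Hypothetical toy data: four taxa, four areas, made-up weights.
TAXA = frozenset("abcd")
TOY_SPLITS = [
    (frozenset(side), TAXA - frozenset(side), weight)
    for side, weight in [("a", 1.0), ("b", 2.0), ("c", 1.5),
                         ("d", 0.5), ("ab", 3.0), ("ac", 1.0)]
]
TOY_AREAS = {"R1": {"a", "b"}, "R2": {"b", "c"}, "R3": {"d"}, "R4": {"a", "d"}}

print(best_k_areas(TOY_SPLITS, TOY_AREAS, 2))  # (('R2', 'R4'), 9.0)
```

Here R2 and R4 together cover all four taxa, so their union attains the total weight of the split set. Areas are scored jointly because taxa shared between areas contribute only once.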

If budget data are available, then we have a budgeted reserve selection problem. Here, preserving the species in each country comes at a cost, and we need to select those countries that maximize SD within an allocated budget.
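The budgeted variant differs from the k-area search only in its feasibility test: a set of areas is admissible if its total cost stays within the budget. The following brute-force sketch assumes additive per-area costs. All names, costs, taxon lists, and weights are hypothetical.

```python
from itertools import combinations


def split_diversity(splits, subset):
    """Sum the weights of all splits A|B for which the subset meets A and B."""
    y = set(subset)
    return sum((w for a, b, w in splits if y & a and y & b), 0.0)


def best_areas_within_budget(splits, areas, costs, budget):
    """Try every affordable set of areas; score the union of their taxa."""
    names = sorted(areas)
    best_choice, best_score = (), 0.0
    for size in range(1, len(names) + 1):
        for chosen in combinations(names, size):
            if sum(costs[name] for name in chosen) > budget:
                continue  # exceeds the allocated budget
            union = set().union(*(areas[name] for name in chosen))
            score = split_diversity(splits, union)
            if score > best_score:
                best_choice, best_score = chosen, score
    return best_choice, best_score


# Hypothetical toy data: made-up weights, area contents and area costs.
TAXA = frozenset("abcd")
TOY_SPLITS = [
    (frozenset(side), TAXA - frozenset(side), weight)
    for side, weight in [("a", 1.0), ("b", 2.0), ("c", 1.5),
                         ("d", 0.5), ("ab", 3.0), ("ac", 1.0)]
]
TOY_AREAS = {"R1": {"a", "b"}, "R2": {"b", "c"}, "R3": {"d"}, "R4": {"a", "d"}}
TOY_AREA_COSTS = {"R1": 4, "R2": 3, "R3": 1, "R4": 3}

print(best_areas_within_budget(TOY_SPLITS, TOY_AREAS, TOY_AREA_COSTS, budget=5))
# (('R2', 'R3'), 8.0)
```

With a budget of 5 the pair R2 and R4, which covers all four taxa, costs 6 units and is excluded, so the search settles for R2 and R3.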
