Using Phylogenetic Dissimilarities Among Sites for Biodiversity Assessments and Conservation
Daniel P. Faith
Abstract The PD phylogenetic diversity measure provides a measure of biodiversity that reflects variety at the level of features, among species or other taxa. PD is based on a simple model which assumes that shared ancestry explains shared features. PD provides a family of calculations that operate as if we were directly counting up features of taxa. PD-dissimilarity or phylogenetic beta diversity compares the branches/features represented by two different areas. We also can consider a companion model, which shifts the focus to shared habitat/environment among taxa as the explanation of shared features, including those features not explained by shared ancestry and PD. That model means that PD-dissimilarities, among sampled and unsampled sites, can be predicted using a regression method applied to distances in an environmental-gradients space. However, PD-based conservation planning requires more than the dissimilarities among all sites, in order to make decisions informed by gains and losses of branches/features. The companion model also suggests how to transform dissimilarities to provide these needed estimates. This ED (“Environmental Diversity”) method out-performs other suggested strategies for analysis of dissimilarities, including the Ferrier et al. method and the Arponen et al. method. The global biodiversity observation network (GEO BON) can use the ED method for inferences of biodiversity change that include loss of phylogenetic diversity.
Keywords Environmental diversity • Phylogenetic beta diversity • ED complementarity • Conservation planning • Biodiversity monitoring
This book addresses important concepts, methods, and applications related to the role of evolutionary history in biodiversity conservation. In the chapter “The PD Phylogenetic Diversity Framework: Linking Evolutionary History to Feature Diversity for Biodiversity Conservation” (Faith 2015a), I reviewed the reasons why we want to conserve evolutionary history. An important rationale is that the tree of life is a storehouse of variation among taxa, and so provides possible future benefits for humans (for discussion, see Faith et al. 2010). I also reviewed the justifications for a specific biodiversity measure. It interprets the degree of representation of evolutionary history as a phylogenetic measure of biodiversity, or “phylogenetic diversity”. This measure of phylogenetic diversity, called “PD” (Faith 1992a, b) is justified as a useful biodiversity measure through its link to “feature diversity”. Feature diversity represents biodiversity “option values” – the term we use to refer to all those potential future benefits for humans – and so is well-justified as a target for biodiversity conservation. Forest et al. (2007) provide a good exemplar study, illustrating how PD links to feature diversity and to food, medicine, and other benefits to humans.
Faith (2002) summarised the link between evolutionary history, PD, and features as follows: “representation of “evolutionary history” (Faith 1994) encompassing processes of cladogenesis and anagenesis is assumed to provide representation of the feature diversity of organisms. Specifically, the phylogenetic diversity (PD) measure estimates the relative feature diversity of any nominated set of species by the sum of the lengths of all those phylogenetic branches spanned by the set…”
The calculation of the PD for a given subset of species (sampled from a phylogenetic tree) is quite simple. It is given by the minimum total length of all the phylogenetic branches required to connect all those species on the tree. However, calculation of PD is attempting something that is not all that simple – an inference of the relative feature diversity of that subset of species. The basis for this inference is an evolutionary model in which branch lengths reflect evolutionary changes, and shared ancestry accounts for shared features (Faith 1992a, b). The model implies that PD, in effect, counts up the relative number of features represented by a given subset of species (or other taxa, including populations within a species); any subset of species that has greater PD will be expected to have greater feature diversity.
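The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the five-taxon tree, its branch lengths, and the names `TREE`, `branches_to_root` and `pd` are all hypothetical. The `include_root` switch is likewise an assumption of the sketch; it additionally counts the branches between the set's common ancestor and the root.

```python
# Hypothetical rooted tree: each node maps to (parent, length of the branch above it).
# Branches are named by the node at their lower end.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("n2", 2.0),
    "C": ("n2", 3.0),
    "n2": ("root", 1.0),
    "D": ("n3", 2.0),
    "E": ("n3", 2.0),
    "n3": ("root", 2.0),
}


def branches_to_root(tree, taxon):
    """Return the set of branches on the path from a taxon up to the root."""
    path = set()
    node = taxon
    while node in tree:
        path.add(node)
        node = tree[node][0]
    return path


def pd(tree, taxa, include_root=False):
    """Total length of the branches connecting the taxa on the tree."""
    paths = [branches_to_root(tree, t) for t in taxa]
    if not paths:
        return 0.0
    spanned = set().union(*paths)
    if not include_root:
        # Branches ancestral to every selected taxon lie above their common
        # ancestor, so they are not needed to connect the taxa to each other.
        spanned -= set.intersection(*paths)
    return sum(tree[b][1] for b in spanned)


print(pd(TREE, ["A", "B"]))  # two close relatives
print(pd(TREE, ["A", "D"]))  # two distant relatives span more branch length
```

Under the PD model, the second pair would be expected to represent more feature diversity than the first, because it spans a greater total branch length.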
In chapter “The PD Phylogenetic Diversity Framework: Linking Evolutionary History to Feature Diversity for Biodiversity Conservation”, I described another important implication of the link to feature diversity: PD provides, not one single measure, but a set of calculations interpretable at the level of features of taxa. This helps guide the assessment of the phylogenetic diversity gains and losses from changing probabilities of extinction of species (or other taxa). This PD “calculus” also can help with the conservation problem addressed in this paper: assessing PD gains and losses when we gain or lose geographic areas. PD has long been integrated into conservation planning for areas (Walker and Faith 1994). However, the work so far has largely ignored the problem of geographic knowledge gaps; we do not know about the phylogenetic diversity represented in every area in a given region. Consequently, for conservation planning, we have to estimate or model these missing quantities, using spatial models incorporating predictive environmental variables.
One pathway for such predictions can take advantage of a part of the PD calculus called “PD-dissimilarities” or “phylogenetic beta diversity” (Fig. 1a; see also Lozupone and Knight 2005; Ferrier et al. 2007; Nipperess et al. 2010; Swenson 2011). PD-dissimilarities can be interpreted as compositional dissimilarities, based on the branches/features represented at the different sites (a site represents all branches that are ancestral to any of its member species). These calculations are “community-based” approaches in that they compare areas based on the set of elements (the community) found in each area. We can think of the standard compositional dissimilarity measures conventionally applied at the species level as simply recast at the level of features, through the PD model (Fig. 1a; for discussion, see Faith 2013).
Fig. 1 (a) A hypothetical phylogenetic tree with 5 taxa. Along the top, the presence of the taxa in two sites, j and k, is shown by + marks. The dashed-line branches indicate features only represented in j; hatched branches indicate features only represented in k; bold branches indicate features represented in both; the thin branch indicates features in neither. The presence-absence version of the Bray-Curtis type PD-dissimilarity between sites j and k counts the number of features in j, not k (length of dashed branches) plus the number of features in k, not j (length of hatched branches), divided by the sum of the total number of features found in each (length of dashed plus length of bold branches, plus length of hatched plus length of bold branches). Other PD-dissimilarity measures combine these counts in other ways. (b) A hypothetical environmental gradient (hollow line) with positions of sites, j, k, and l. Suppose that positions of sites along this gradient reflect their features. Sites with a given feature are found in a corresponding part of the gradient. This clumping is called a “unimodal” response. Above the gradient are the hypothetical unimodal distributions of the branches/features from Fig. 1a. Under the unimodal response model, the features in both j and k, for example, form the bold line segment. This unimodal relationship means that the Bray-Curtis type PD-dissimilarity has the most robust link to distances along environmental gradients (or in environmental space; for discussion, see Faith et al. 1987). For further information, also see Faith et al. (2009)
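The presence-absence Bray-Curtis type PD-dissimilarity described in the caption of Fig. 1a can be sketched as follows. This is an illustrative sketch only: the tree, branch lengths, site compositions, and function names are hypothetical and are not taken from the figure. Following the text, a site is taken to represent every branch ancestral to any of its member taxa.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("n2", 2.0),
    "C": ("n2", 3.0),
    "n2": ("root", 1.0),
    "D": ("n3", 2.0),
    "E": ("n3", 2.0),
    "n3": ("root", 2.0),
}


def site_branches(tree, taxa):
    """All branches ancestral to any taxon present at the site."""
    branches = set()
    for taxon in taxa:
        node = taxon
        while node in tree:
            branches.add(node)
            node = tree[node][0]
    return branches


def pd_bray_curtis(tree, taxa_j, taxa_k):
    """(length only in j + length only in k) / (total length in j + total length in k)."""
    bj = site_branches(tree, taxa_j)
    bk = site_branches(tree, taxa_k)

    def length(branches):
        return sum(tree[b][1] for b in branches)

    only_j = length(bj - bk)
    only_k = length(bk - bj)
    shared = length(bj & bk)
    total = only_j + only_k + 2.0 * shared
    return (only_j + only_k) / total if total else 0.0


# Site j holds taxa A, B, C; site k holds C, D; taxon E occurs in neither.
print(pd_bray_curtis(TREE, ["A", "B", "C"], ["C", "D"]))
```

The shared branch length is counted twice in the denominator because it contributes to the total for each site. Other PD-dissimilarity measures would combine the same three quantities differently.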
Spatial predictions can use a form of regression in which PD-dissimilarities between sites are explained and predicted by the known environmental distances between sites. Thus, we can predict the PD-dissimilarity between two un-sampled sites, given their environmental difference. Generalized dissimilarity modelling (GDM; Ferrier 2002; Ferrier et al. 2004, 2007; see also Faith and Ferrier 2002), an extension of matrix regression, is useful for these predictions. GDM realistically allows for a very general monotonic, curvilinear, relationship between increasing environmental distance and compositional dissimilarity. It is also robust in allowing for variation in the rate of compositional change at different positions along environmental gradients. GDM was developed for species-level dissimilarities, but has been extended to the prediction of PD-dissimilarities (Ferrier et al. 2007; Faith et al. 2009; Rosauer et al. 2013).
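The general idea of predicting dissimilarity from environmental distance can be illustrated with a deliberately simplified stand-in. This sketch is not the GDM algorithm, which fits flexible monotonic transformations of each environmental variable. It assumes a single environmental distance and a saturating curve of the form d = 1 − exp(−(a + b·x)), fitted by ordinary least squares after transforming the dissimilarities. The function names and the data values are hypothetical.

```python
import math


def fit_saturating_curve(env_distances, dissimilarities):
    """Fit d = 1 - exp(-(a + b * x)) by least squares on y = -ln(1 - d)."""
    xs, ys = [], []
    for x, d in zip(env_distances, dissimilarities):
        if 0.0 <= d < 1.0:  # d == 1 has no finite transform; skipped in this sketch
            xs.append(x)
            ys.append(-math.log(1.0 - d))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_dissimilarity(a, b, env_distance):
    """Predicted dissimilarity for a pair of sites, sampled or unsampled."""
    eta = max(0.0, a + b * env_distance)  # keep the prediction within [0, 1)
    return 1.0 - math.exp(-eta)


# Hypothetical sampled site pairs: environmental distance and observed PD-dissimilarity.
env = [0.5, 1.0, 2.0, 3.0]
obs = [1.0 - math.exp(-(0.1 + 0.5 * x)) for x in env]

a, b = fit_saturating_curve(env, obs)
print(predict_dissimilarity(a, b, 1.5))  # prediction for an unsampled pair
```

The curve rises steeply at small environmental distances and flattens as dissimilarity approaches 1, which is the kind of monotonic, curvilinear relationship described above.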
There are several ways to calculate a PD-dissimilarity (see Fig. 1a, b). The choice of the PD-dissimilarity measure for such analyses can be guided by another critical model, which makes additional assumptions about how features link to environmental variables. To understand the nature of this model, it is important to note that Faith (1992a, b; see also Faith 1996) was careful to point out that PD's shared-ancestry/shared-features model provides a general prediction about feature diversity, but naturally does not apply to all possible features. This early work proposed that a companion model also can account for shared features, including those that are not explained by shared ancestry (e.g. those features that are convergent, arising independently on the phylogenetic tree). Here, a pattern among species describing shared habitat or environment explains shared branches/features (Fig. 1b; Faith 1989, 1996, 2015b; Faith et al. 2009). Figure 1b illustrates how shared habitat or environment explains shared features: the sites sharing particular branches or features form clumps or clusters in the environmental space (see also Fig. 2). I will refer to this as unimodal response (analogous to the well-known unimodal response of species to environmental gradients; see e.g. Faith et al. 1987). This unimodal relationship (Fig. 1b) means that the Bray-Curtis type PD-dissimilarity has the most robust link to distances along environmental gradients (or in environmental space; for discussion, see Faith et al. 1987).
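A toy simulation can illustrate how a unimodal response ties Bray-Curtis type dissimilarity to gradient distance. The setup is an assumption made purely for illustration: a single gradient, features of equal weight, and each feature occupying an interval of the same width around its own centre. It demonstrates the monotonic link in this idealised case only, not the comparative robustness of the measure. All names and values are hypothetical.

```python
def features_at(position, centres, half_width):
    """Features present at a site: those whose interval on the gradient covers it."""
    return {c for c in centres if abs(position - c) <= half_width}


def bray_curtis(features_j, features_k):
    """Presence-absence Bray-Curtis dissimilarity, each feature counted once."""
    only_j = len(features_j - features_k)
    only_k = len(features_k - features_j)
    total = len(features_j) + len(features_k)
    return (only_j + only_k) / total if total else 0.0


# Hypothetical gradient: one feature centred at each integer position 0..100,
# each present within 5 units of its centre (a simple unimodal response).
CENTRES = range(0, 101)
HALF_WIDTH = 5


def dissimilarity(pos_j, pos_k):
    return bray_curtis(
        features_at(pos_j, CENTRES, HALF_WIDTH),
        features_at(pos_k, CENTRES, HALF_WIDTH),
    )


for other in (20, 23, 26, 40):
    print(other - 20, round(dissimilarity(20, other), 3))
```

In this idealised case the dissimilarity grows with gradient distance and then saturates at 1 once the two sites share no features, which is one reason a curvilinear relationship between environmental distance and dissimilarity is expected.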
This simple model arguably deserves to make a greater contribution towards our understanding of biodiversity methods. For example, an under-appreciation of this companion model has meant that some workers (Kelly et al. 2014) still naively characterise PD as intended to account for all features, including those convergently derived. Similarly, the role of this model in explaining habitat-driven feature diversity has been neglected in the development of functional trait diversity measures (discussed in Faith 2015b). In this paper, I discuss another good reason to consider this shared-habitat/shared-features model: it can fill a critical gap in our attempts to effectively use PD-dissimilarities for biodiversity assessments.
Fig. 2 Bray-Curtis type PD-dissimilarities can be used in robust ordination methods to recover key gradients. A re-drawing of the gradient space from Rintala et al. (2008; see also Faith et al. 2009) for microbial communities in house dust and a microbial phylogenetic tree. Dots versus squares correspond to samples from two different buildings (for details of sampling see Rintala et al.). Arrows at the right side indicate major gradients revealed by the ordination. A sample locality represents the branch corresponding to a given family if the locality has one or more descendants of that branch. The two-dimensional space shows unimodal response for four branches (Acidaminococcaceae, Aerococcaceae, Enterobacteriaceae, Acetobacteraceae). For further information, see Faith et al. (2009)
We can predict the Bray-Curtis type PD-dissimilarities from environmental distances using a GDM regression. However, this is a mixed blessing. We produce PD-dissimilarities for all pairs of sites, but a difficulty is that these dissimilarities do not directly tell us what we want to know for conservation planning – the total phylogenetic diversity represented by a given subset of areas, or the gain or loss in PD if a site is gained or lost. To fill this gap, we need to convert the pairwise dissimilarities into inferences about PD representation and/or gains and losses. I will show how the shared-habitat/shared-features model can guide this analysis.
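To make the target quantities concrete, the following sketch computes PD representation and PD gain for the easy case in which the taxa at every site are known. It is not the ED method and it does not use dissimilarities; it only shows what an estimate based on dissimilarities would need to approximate when such composition data are missing. The tree, the site compositions, and the function names are hypothetical, and a site is taken to represent every branch ancestral to any of its taxa.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch above the node).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("n2", 2.0),
    "C": ("n2", 3.0),
    "n2": ("root", 1.0),
    "D": ("n3", 2.0),
    "E": ("n3", 2.0),
    "n3": ("root", 2.0),
}


def site_branches(tree, taxa):
    """All branches ancestral to any taxon present at the site."""
    branches = set()
    for taxon in taxa:
        node = taxon
        while node in tree:
            branches.add(node)
            node = tree[node][0]
    return branches


def represented_pd(tree, sites):
    """Total length of the branches represented by a set of sites."""
    branches = set()
    for taxa in sites:
        branches |= site_branches(tree, taxa)
    return sum(tree[b][1] for b in branches)


def pd_gain(tree, selected_sites, candidate):
    """Branch length the candidate site adds beyond the already selected sites."""
    return represented_pd(tree, selected_sites + [candidate]) - represented_pd(
        tree, selected_sites
    )


site1, site2, site3 = ["A", "B"], ["C", "D"], ["E"]
print(represented_pd(TREE, [site1]))
print(pd_gain(TREE, [site1], site2))
print(pd_gain(TREE, [site1, site2], site3))
```

The gain from a site depends on which other sites are already selected: here the third site adds less once the second site has already contributed the branch it shares with it. This dependence on the whole set is what pairwise dissimilarities do not report directly.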
While there are several natural candidate approaches for taking this extra analysis step (each extends methods applied to species-level dissimilarities), surprisingly, there is no established, accepted method. One proposed approach, based on the unimodal response model, is the ED (“environmental diversity”) method (defined below; see also Faith and Walker 1996a, b, c), which has for some time been linked to GDM and species-level dissimilarities (Faith and Ferrier 2002). Faith et al. (2009) proposed the application of ED to the predicted dissimilarities from phylogenetic GDM analyses, but there are no worked examples exploring this approach. Another attractive method, linked strongly to the GDM approach, is the Ferrier et al. (2004) index. This measure modifies the ED approach and has been applied for species-level dissimilarities. A closely related method is that of Arponen et al. (2008).
Both of these have commonalities with ED, but the similarities and differences – and the strengths and weaknesses – among these alternative candidate measures have not been explored and documented (for related discussion, see Ferrier and Drielsma 2010). Given this fundamental gap in building the complete toolbox of PD calculations for conservation, and given the lack of synthesis among candidate methods, this chapter will proceed as follows. I first show how the same model of shared-environment/shared-features that justifies the choice among possible PD-dissimilarity measures (Fig. 1a, b) also justifies the choice of the ED method. I then present a sample application of ED to PD-dissimilarities. I also present a simple graphical description of ED in the one-dimensional case, which clarifies how ED estimates representation and gains and losses. I then use this graphical representation to reveal key properties of the alternative methods, suggesting critical weaknesses of the Ferrier et al. and Arponen et al. methods. I finish on a positive note, pointing to future work, including expanding the range of calculations useful for conservation assessment based on ED.