Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers
Anne Chao, Chun-Huo Chiu, and Lou Jost
Abstract Conservation biologists need robust, intuitive mathematical tools to quantify and assess patterns and changes in biodiversity. Here we review some commonly used abundance-based species diversity measures and their phylogenetic generalizations. Most of the previous abundance-sensitive measures and their phylogenetic generalizations lack an essential property, the replication principle or doubling property. This often leads to inconsistent or counter-intuitive interpretations, especially in conservation applications. Hill numbers or the “effective number of species” obey the replication principle and thus resolve many of the interpretational problems. Hill numbers were recently extended to incorporate phylogeny; the resulting measures take into account phylogenetic differences between species while still satisfying the replication principle. We review the framework of phylogenetic diversity measures based on Hill numbers and their decomposition into independent alpha and beta components. Both additive and multiplicative decompositions lead to the same classes of normalized phylogenetic similarity or differentiation measures. These classes include multiple-assemblage phylogenetic generalizations of the Jaccard, Sørensen, Horn and Morisita-Horn measures. For two assemblages, these classes also include the commonly used UniFrac and PhyloSør indices as special cases. Our approach provides a mathematically rigorous, self-consistent, ecologically meaningful set of tools for conservationists who must assess the phylogenetic diversity and complementarity of potential protected areas. Our framework is applied to a real dataset to illustrate (i) how to use phylogenetic diversity profiles to completely convey species abundances and phylogenetic information among species in an assemblage; and (ii) how to use phylogenetic similarity (or differentiation) profiles to assess phylogenetic resemblance or difference among multiple assemblages.
Keywords Diversity • Diversity decomposition • Hill numbers • Phylogenetic diversity • Replication principle • Species diversity
Many of the most pressing and fundamental questions in biodiversity conservation require robust and sensible measures for quantifying and assessing changes in biodiversity. Many environmental and monitoring projects also require objective and meaningful similarity (or differentiation) measures to compare the diversities of multiple assemblages and their degree of complementarity in order to best conserve genetic, species, and ecosystem diversity. An enormous number of diversity measures and related similarity (or differentiation) indices have been proposed, not only in ecology but also in genetics, economics, information science, linguistics, physics, and social sciences, among others. See Magurran (2004) and Magurran and McGill (2011) for overviews.
In traditional species diversity measures, all species are considered to be equally different from each other; only species richness and abundances are involved. There are two general approaches: parametric and non-parametric (Magurran 2004). Parametric approaches assume a particular species abundance distribution (such as the log-normal or gamma) or a species rank abundance distribution (such as the negative binomial or log-series), and then use the parameters (e.g., Fisher's alpha) of the distribution to quantify diversity. However, these methods often do not perform well and the results are uninterpretable unless the “true” species abundance distribution is known (Colwell and Coddington 1994; Chao 2005). The parametric approach also does not permit meaningful comparison of assemblages with different abundance distributions. For example, an assemblage whose abundances follow a log-normal distribution cannot be compared to one whose abundances follow a gamma distribution. Non-parametric methods make no assumptions about the distributional form of the underlying species abundance distribution. The most widely used abundance-sensitive non-parametric measures have been the Shannon entropy and the Gini-Simpson index. These two measures, along with species richness, were integrated into a class of measures called generalized entropies (Havrda and Charvát 1967; Daróczy 1970; Patil and Taillie 1979; Tsallis 1988; Keylock 2005), which will be briefly reviewed in this chapter.
How to quantify abundance-based species diversity in an assemblage has been one of the most controversial issues in community ecology (e.g. Hurlbert 1971; Routledge 1979; Patil and Taillie 1982; Purvis and Hector 2000; Jost 2006, 2007; Jost et al. 2010). There have also been intense debates on the choice of diversity partitioning schemes; see Ellison (2010) and the Forum that follows it. Surprisingly, all authors in that forum achieved a consensus on the use of Hill numbers, also called “effective number of species”, as the best choice to quantify abundance-based species diversity. Hill numbers are a mathematically unified family of diversity indices (differing among themselves only by a parameter q) that incorporate species richness and species relative abundances. They were first used in ecology by MacArthur (1965, 1972), developed by Hill (1973), and recently reintroduced to ecologists by Jost (2006, 2007).
Hill numbers obey the replication principle or doubling property, an essential mathematical property that captures biologists' notion of diversity (MacArthur 1965; Hill 1973). This property requires that if we have N equally diverse, equally large assemblages with no species in common, the diversity of the pooled assemblage must be N times the diversity of a single assemblage. In other words, Hill numbers are linear with respect to the addition of equally common species. We will review different versions of this property later. Classical diversity measures, such as Shannon entropy and the Gini-Simpson index, do not obey this principle and can lead to inconsistent or counter-intuitive interpretations, especially in conservation applications (Jost 2006, 2007). Hill numbers resolve many of the interpretational problems caused by classical diversity indices. Diversity measures that obey the replication principle yield self-consistent assessments in conservation applications, have intuitively interpretable magnitudes, and can be meaningfully decomposed. In this chapter, Hill numbers are adopted as a general framework for quantifying and partitioning diversities.
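The doubling property can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the standard definition of the Hill number of order q, (Σ p_i^q)^(1/(1−q)), with the q = 1 case taken as the exponential of Shannon entropy, and the function names and the toy abundance vector are purely illustrative.

```python
import math

def hill_number(p, q):
    """Hill number of order q for relative abundances p (assumed to sum to 1)."""
    p = [x for x in p if x > 0]
    if q == 1:
        # Limit as q -> 1: exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def shannon_entropy(p):
    """Shannon entropy (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def gini_simpson(p):
    """Gini-Simpson index: one minus the sum of squared relative abundances."""
    return 1.0 - sum(x * x for x in p)

def pool_equal_weight(assemblages):
    """Pool equally large assemblages that share no species.

    Each input is a vector of relative abundances; because no species are
    shared, the pooled vector is the concatenation rescaled by 1/N.
    """
    n = len(assemblages)
    return [x / n for p in assemblages for x in p]

# Hypothetical assemblage with three species.
single = [0.5, 0.3, 0.2]
# Two equally diverse, equally large assemblages with no species in common.
pooled = pool_equal_weight([single, single])

for q in (0, 1, 2):
    print(q, hill_number(single, q), hill_number(pooled, q))

print("Shannon:", shannon_entropy(single), shannon_entropy(pooled))
print("Gini-Simpson:", gini_simpson(single), gini_simpson(pooled))
```

Under these assumptions, the Hill number of the pooled assemblage is twice that of a single assemblage for every order q, whereas Shannon entropy increases only by log 2 and the Gini-Simpson index, which is bounded above by 1, cannot double at all.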
Pielou (1975, p. 17) was the first to notice that traditional abundance-based species diversity measures could be broadened to include phylogenetic, functional, or other differences between species. We here concentrate on phylogenetic differences, though our framework can also be extended to functional traits (Tilman 2001; Petchey and Gaston 2002; Weiher 2011). For conservation purposes, an assemblage of phylogenetically divergent species is more diverse than an assemblage consisting of closely related species, all else being equal. Phylogenetic differences among species can be based directly on their evolutionary histories, either in the form of taxonomic classification or well-supported phylogenetic trees (Faith 1992; Warwick and Clarke 1995; McPeek and Miller 1996; Crozier 1997; Helmus et al. 2007; Webb 2000; Webb et al. 2002; Pavoine et al. 2010; Ives and Helmus 2010, 2011; Vellend et al. 2011; Cavender-Bares et al. 2009, 2012 among others). Three special issues in Ecology were devoted to integrating ecology and phylogenetics; see McPeek and Miller (1996), Webb et al. (2006), and Cavender-Bares et al. (2012) and papers in each issue. Phylogenetic diversity measures are especially relevant for conservation applications, since they quantify the amount of evolutionary history preserved by the assemblage; see Lean and MacLaurin (chapter “The Value of Phylogenetic Diversity”).
The most widely used phylogenetic metric is Faith's phylogenetic diversity (PD) (Faith 1992), which is defined as the sum of the branch lengths of a phylogenetic tree connecting all species in the target assemblage. As shown in Chao et al. (2010), Faith's PD can be regarded as a phylogenetic generalization of species richness. The rarefaction formula for Faith's PD was developed by Nipperess and Matsen (2013) and Nipperess (chapter “The Rarefaction of Phylogenetic Diversity: Formulation, Extension and Application”). Recently, Chao et al. (2015) derived an integrated sampling, rarefaction, and extrapolation methodology to compare Faith's PD of a set of assemblages. Like species richness, Faith's PD does not consider species abundances. For some conservation applications, the mere presence or absence of a species is all that matters, or all that can be determined from the available data. In those cases, Faith's PD is a good measure of phylogenetic diversity. However, there are important advantages to incorporating abundance information into phylogenetic diversity measures for conservation. For example, some human impacts can result in the phylogenetic simplification of an ecosystem, reducing the population shares of phylogenetically distinct species relative to typical species. An abundance-based measure can catch this effect before it leads to actual extinctions.
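The definition of Faith's PD as a sum of branch lengths can be illustrated with a small sketch. It is an illustration only and rests on assumptions not stated in the text: the tree is rooted and stored as a hypothetical parent-pointer dictionary, the branches counted are those on the paths from the species present back to the root, and the toy tree and its branch lengths are invented.

```python
def faith_pd(tree, species):
    """Sum of branch lengths connecting the given species to the root.

    `tree` maps each node to (parent, length of the branch above the node);
    the root has parent None and branch length 0. Each branch is counted
    once, no matter how many of the species descend from it.
    """
    visited = set()
    total = 0.0
    for tip in species:
        node = tip
        # Walk towards the root, stopping at the first branch already counted.
        while node is not None and node not in visited:
            parent, length = tree[node]
            visited.add(node)
            total += length
            node = parent
    return total

# Hypothetical rooted tree: A and B are close relatives, C is divergent.
toy_tree = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "C": ("root", 2.0),
}

print(faith_pd(toy_tree, ["A", "B"]))       # two closely related species
print(faith_pd(toy_tree, ["A", "C"]))       # two divergent species
print(faith_pd(toy_tree, ["A", "B", "C"]))  # all three species
```

In this toy tree the two assemblages {A, B} and {A, C} have the same species richness, but the assemblage of divergent species has the larger PD. Because only presence enters the calculation, changing the abundances of A, B, or C leaves the value unchanged, which is the limitation discussed above.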
Ecosystem simplification may be worthy of conservation concern even if it does not lead to extinctions of focal organisms. Often, the focal organisms for conservation represent a tiny fraction of the ecosystem's biomass or richness. Each focal species will be tied to a web of non-focal species whose abundances are not usually monitored (e.g., insects). All else being equal, a more equitable distribution of the abundances of focal organisms will be able to support a more diverse, robust and stable set of non-focal species. Faith (chapter “Using Phylogenetic Dissimilarities Among Sites for Biodiversity Assessments and Conservation”) rightly argues that phylogenetic diversity is a good proxy for functional diversity. Therefore, an ecosystem with a more equitable distribution of abundance across phylogenetic lineages should also exhibit greater functional complexity (per interaction between individuals) than an ecosystem whose phylogenetically unusual elements are rare. If we have to prioritize such ecosystems, the more phylogenetically equitable one, which thoroughly integrates diverse lineages, should be preferred. In addition to being more resistant to lineage extinctions, a complex, well-integrated ecosystem may be worth preserving in and of itself, above and beyond its component species; conservation is not just about species. Evolution may take a different course in ecosystems whose members are constantly surprised by their interactions compared with an ecosystem whose interactors are highly predictable. These conservation goals – robustness against extinction of distinctive lineages, and preservation of well-integrated ecosystems with unique future option values – require phylogenetic diversity measures that incorporate species importance values.
Rao's quadratic entropy Q (Rao 1982), a generalization of the Gini-Simpson index, was the first diversity measure that accounts for both phylogeny and species abundances. The phylogenetic entropy HP (Allen et al. 2009) extends Shannon entropy to incorporate phylogenetic distances among species. Since Shannon entropy and the Gini-Simpson index do not obey the replication principle, neither do their phylogenetic generalizations. These generalizations will therefore have the same interpretational problems as their parent measures; see Chao et al. (2010, their Supplementary Material) for examples.
Chao et al. (2010) extended Hill numbers and related similarity measures to incorporate phylogeny. The new phylogenetic Hill numbers obey a generalized replication principle. Their measures were subsequently extended by Faith and Richards (2012) and Faith (2013). Both the original Hill numbers and their phylogenetic generalizations facilitate diversity decomposition (Jost 2007; Chiu et al. 2014). As with the original Hill numbers, both additive and multiplicative decompositions of phylogenetic Hill numbers lead to the same classes of similarity (or differentiation) measures. Hill numbers therefore provide a unified framework to quantify both abundance-based and phylogenetic diversity.
In this chapter, we first briefly review the classic abundance-based species diversity measures (section “Generalized Entropies”) and their phylogenetic generalizations (section “Phylogenetic generalized entropies”) for an assemblage. Then we focus on the framework of Hill numbers (section “Hill numbers and the replication principle”), phylogenetic Hill numbers (section “Phylogenetic Hill numbers and related measures”) and related phylogenetic diversity measures. We also discuss the replication principle and its phylogenetic generalization (section “Replication principle for phylogenetic diversity measures”). For multiple assemblages, we review the diversity decomposition based on phylogenetic diversity measures (section “Decomposition of phylogenetic diversity measures”). The associated phylogenetic similarity and differentiation measures are then presented (section “Normalized phylogenetic similarity measures”). We use a real example for illustration (section “An example”). Our practical recommendations are provided in section “Conclusion”.