Phylogenetic Hill Numbers and Related Measures
When branch lengths are proportional to divergence time, all branch tips are the same distance from the root (the first node). Such trees are called “ultrametric” trees. We first discuss the phylogenetic diversity measures for ultrametric trees. The phylogenetic Hill numbers developed by Chao et al. (2010) for an ultrametric tree can be intuitively explained as the Hill number of a time-average of a tree's generalized entropy over some evolutionary time interval of interest. Suppose the phylogenetic tree for an assemblage is calibrated to some relative or absolute timescale. We can slice this phylogenetic tree at any time t in the past; see the left panel of Fig. 1 (reproduced from Chao et al. 2010) for illustration and details about how to deal with shared lineages. The number of lineages at that time is the number of branch cuts, and the relative importance of each of these lineages for the present-day assemblage is the sum of the relative abundances of the branch's descendants in the present-day assemblage. Using these relative importance values, we can calculate the generalized entropy of order q for the slice. The mean of these entropies, beginning at time −T (i.e., T years before present) and continuing until the present, is converted to a Hill number using Eq. (3c). This is the phylogenetic Hill number, which conveys information about the shape of the tree over the time interval of interest. Chao et al. (2010) symbolize it as ${}^{q}\bar{D}(T)$, and also refer to it as the mean phylogenetic diversity of order q over T years (or simply the mean diversity for the interval [−T, 0]):
$${}^{q}\bar{D}(T) = \left\{ \sum_{i \in B_T} \frac{L_i}{T}\, a_i^{q} \right\}^{1/(1-q)}, \quad q \ge 0,\ q \ne 1, \tag{4a}$$
$${}^{1}\bar{D}(T) = \lim_{q \to 1} {}^{q}\bar{D}(T) = \exp\left[ -\sum_{i \in B_T} \frac{L_i}{T}\, a_i \log a_i \right], \tag{4b}$$
where $B_T$ is the set of all branches in the time interval [−T, 0], $L_i$ is the length of branch i in the set $B_T$, and $a_i$ is the total relative abundance descended from branch i. The mean diversity ${}^{q}\bar{D}(T)$ is interpreted as “the effective number of equally abundant and equally distinct lineages all with branch lengths T during the time interval from T years ago to the present”. Here “equally distinct” also implies that the phylogenetic distance between any two species is T, so lineages are completely distinct (i.e., there are no shared branches).
Fig. 1 (a) A hypothetical ultrametric rooted phylogenetic tree with four species. Three different slices corresponding to three different times are shown. For a fixed T (not restricted to the age of the root), the nodes divide the phylogenetic tree into segments 1, 2 and 3 with duration (length) $T_1$, $T_2$ and $T_3$, respectively. In any moment of segment 1, there are four species (i.e. four branches cut); in segment 2, there are three species; and in segment 3, there are two species. The mean species richness over the time interval [−T, 0] is $(T_1/T) \times 4 + (T_2/T) \times 3 + (T_3/T) \times 2$. In any moment of segment 1, the species relative abundances (i.e. the node abundances corresponding to the four branches) are $\{p_1, p_2, p_3, p_4\}$; in segment 2, the species relative abundances are $\{g_1, g_2, g_3\} = \{p_1, p_2 + p_3, p_4\}$; in segment 3, the species relative abundances are $\{h_1, h_2\} = \{p_1 + p_2 + p_3, p_4\}$. (b) A hypothetical non-ultrametric tree. Let $\bar{T}$ be the weighted (by species abundance) mean of the distances from the root node to each of the terminal branch tips: $\bar{T} = 4 \times 0.5 + (3.5 + 2) \times 0.2 + (1 + 2) \times 0.3 = 4$. Note that $\bar{T}$ is also the weighted (by branch length) total node abundance because $\bar{T} = 0.5 \times 4 + 0.2 \times 3.5 + 0.3 \times 1 + 0.5 \times 2 = 4$. Conceptually, the 'branch diversity' is defined for an assemblage of four branches: each has, respectively, relative abundance $0.5/\bar{T} = 0.125$, $0.2/\bar{T} = 0.05$, $0.3/\bar{T} = 0.075$ and $0.5/\bar{T} = 0.125$; and each has, respectively, weight (i.e. branch length) 4, 3.5, 1 and 2. This is equivalent to an assemblage with 10.5 equally weighted 'branches': there are four branches with relative abundance $0.5/\bar{T} = 0.125$; 3.5 branches with relative abundance $0.2/\bar{T} = 0.05$; one branch with relative abundance $0.3/\bar{T} = 0.075$; and two branches with relative abundance $0.5/\bar{T} = 0.125$ (This figure is reproduced from Fig. 1 of Chao et al. 2010)
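The definition of the mean phylogenetic diversity can be checked numerically. The following Python sketch assumes the branch-based form given by Chao et al. (2010), i.e. the sum of $(L_i/T)\,a_i^q$ over branches raised to the power 1/(1−q), and compares it with the slice-by-slice average described in the Fig. 1a caption. The function names, the segment durations and the abundances are hypothetical values chosen for illustration; they are not taken from the original figure.

```python
import math

def mean_phylo_diversity(branches, T, q):
    """Mean phylogenetic diversity of order q over [-T, 0] (assumed form).

    branches: iterable of (L_i, a_i) pairs, L_i being the branch length inside
    the interval and a_i the relative abundance descended from branch i.
    """
    if math.isclose(q, 1.0):
        # q -> 1 limit: exponential of the branch-weighted Shannon entropy.
        return math.exp(-sum(L / T * a * math.log(a) for L, a in branches))
    return sum(L / T * a ** q for L, a in branches) ** (1.0 / (1.0 - q))

def mean_diversity_from_slices(slices, T, q):
    """Same quantity, obtained by averaging over time segments.

    slices: iterable of (duration, abundances) pairs. Averaging the sum of
    a**q is equivalent to averaging the generalized entropy of order q,
    because the entropy is a linear function of that sum.
    """
    if math.isclose(q, 1.0):
        mean_entropy = sum(
            t / T * -sum(a * math.log(a) for a in ab) for t, ab in slices
        )
        return math.exp(mean_entropy)
    mean_sum = sum(t / T * sum(a ** q for a in ab) for t, ab in slices)
    return mean_sum ** (1.0 / (1.0 - q))

# Hypothetical four-species ultrametric tree shaped like Fig. 1a.
p1, p2, p3, p4 = 0.4, 0.3, 0.2, 0.1   # present-day relative abundances
T1, T2, T3 = 1.0, 2.0, 1.0            # durations of segments 1, 2 and 3
T = T1 + T2 + T3                      # here T is the age of the root

branches = [
    (T1 + T2, p1),        # species 1
    (T1, p2),             # species 2
    (T1, p3),             # species 3
    (T, p4),              # species 4
    (T2, p2 + p3),        # ancestor of species 2 and 3
    (T3, p1 + p2 + p3),   # ancestor of species 1, 2 and 3
]
slices = [
    (T1, [p1, p2, p3, p4]),
    (T2, [p1, p2 + p3, p4]),
    (T3, [p1 + p2 + p3, p4]),
]

for q in (0, 1, 2):
    by_branch = mean_phylo_diversity(branches, T, q)
    by_slice = mean_diversity_from_slices(slices, T, q)
    print(f"q={q}: branches {by_branch:.4f}, slices {by_slice:.4f}")
```

For q = 0 both routes return the mean species richness $(T_1/T) \times 4 + (T_2/T) \times 3 + (T_3/T) \times 2$, which equals 3 for these illustrative values.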
The phylogenetic Hill numbers are invariant to the units used to measure branch lengths. When all lineages are completely distinct, the measure ${}^{q}\bar{D}(T)$ reduces to the Hill numbers ${}^{q}D = \left( \sum_i a_i^{q} \right)^{1/(1-q)}$. This includes the special case that T tends to zero, i.e., the case that we ignore phylogeny and only consider the present-day community. This shows that the framework based on Hill numbers provides a unified approach to integrate abundances and phylogeny. Also, here we have a simple idealized reference tree to understand the value of ${}^{q}\bar{D}(T) = z$ for an arbitrary tree: the mean phylogenetic diversity of the tree over the time period [−T, 0] is the same as the diversity of an idealized assemblage consisting of z equally abundant and equally distinct lineages all with branch length T.
For q = 0, when T is chosen as the age of the root node, we have ${}^{0}\bar{D}(T)$ = Faith's PD / T, which can be interpreted as lineage richness. Faith's PD can thus be regarded as a phylogenetic generalization of species richness. We can roughly interpret ${}^{1}\bar{D}(T)$ as the effective number of common lineages, and ${}^{2}\bar{D}(T)$ as the effective number of dominant lineages in the time period [−T, 0]. When T is chosen as the age of the root node, a simple relationship exists between phylogenetic entropy $H_P$ (Allen et al. 2009) and the measure ${}^{1}\bar{D}(T)$:
$${}^{1}\bar{D}(T) = \exp(H_P / T). \tag{4c}$$
For q = 2, when T is chosen as the age of the root node, there is a simple relationship between our measures and the widely used Rao's quadratic entropy Q (Chao et al. 2010):
$${}^{2}\bar{D}(T) = \frac{1}{1 - Q/T}. \tag{4d}$$
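The three special cases (q = 0, 1, 2 with T equal to the root age) can be verified on a small example. The sketch below rests on several assumptions that should be read as such: the tree and abundances are hypothetical; phylogenetic entropy is taken as $H_P = -\sum_i L_i a_i \log a_i$; and Rao's Q is computed as $\sum_{i \ne j} d_{ij} p_i p_j$ with $d_{ij}$ the age of the most recent common ancestor of species i and j (half the tip-to-tip path length in an ultrametric tree). Under these conventions the relationships read Faith's PD / T, $\exp(H_P/T)$ and $1/(1 - Q/T)$.

```python
import math

T = 4.0                      # age of the root node (hypothetical)
p = [0.4, 0.3, 0.2, 0.1]     # hypothetical relative abundances, species 0..3

# (L_i, a_i) for the six branches of the hypothetical ultrametric tree.
branches = [
    (3.0, 0.4),  # species 0
    (1.0, 0.3),  # species 1
    (1.0, 0.2),  # species 2
    (4.0, 0.1),  # species 3
    (2.0, 0.5),  # ancestor of species 1 and 2
    (1.0, 0.9),  # ancestor of species 0, 1 and 2
]

# Age of the most recent common ancestor for each unordered species pair.
mrca_age = {
    (0, 1): 3.0, (0, 2): 3.0, (0, 3): 4.0,
    (1, 2): 1.0, (1, 3): 4.0, (2, 3): 4.0,
}

def mean_phylo_diversity(branches, T, q):
    """Mean phylogenetic diversity of order q over [-T, 0] (assumed form)."""
    if math.isclose(q, 1.0):
        return math.exp(-sum(L / T * a * math.log(a) for L, a in branches))
    return sum(L / T * a ** q for L, a in branches) ** (1.0 / (1.0 - q))

faith_pd = sum(L for L, _ in branches)                         # total branch length
phylo_entropy = -sum(L * a * math.log(a) for L, a in branches)  # H_P
# Each unordered pair appears twice in the double sum over i != j.
rao_q = sum(2.0 * d * p[i] * p[j] for (i, j), d in mrca_age.items())

print(mean_phylo_diversity(branches, T, 0), faith_pd / T)
print(mean_phylo_diversity(branches, T, 1), math.exp(phylo_entropy / T))
print(mean_phylo_diversity(branches, T, 2), 1.0 / (1.0 - rao_q / T))
```

Each printed pair should agree up to floating-point rounding. If a different distance convention is used for Q (for example full tip-to-tip path length), the q = 2 relationship has to be rescaled accordingly.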
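The branch or phylogenetic diversity ${}^{q}PD(T)$ of order q during the time interval from T years ago to the present is defined as the product of ${}^{q}\bar{D}(T)$ and T. It quantifies the amount of evolutionary history in the system over the interval [−T, 0], or “the effective total branch-length” (Chao et al. 2010):
$${}^{q}PD(T) = T \times {}^{q}\bar{D}(T) = \left\{ \sum_{i \in B_T} L_i \left( \frac{a_i}{T} \right)^{q} \right\}^{1/(1-q)}, \quad q \ge 0,\ q \ne 1, \tag{5a}$$
$${}^{1}PD(T) = T \times {}^{1}\bar{D}(T) = T \exp\left[ -\sum_{i \in B_T} \frac{L_i}{T}\, a_i \log a_i \right]. \tag{5b}$$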
If q = 0 and T is the age of the root node, then ${}^{0}PD(T)$ reduces to Faith's PD, regardless of branching pattern or abundances. As explained by Chao et al. (2010), we could imagine that all the branch segments in the interval [−T, 0] form a single assemblage with relative abundance set $\{a_i/T;\ i \in B_T\}$. In this assemblage, for each i there are $L_i$ “branches” with relative abundance $a_i/T$. Then the Hill number of order q for this assemblage is exactly the branch diversity ${}^{q}PD(T)$ given in Eq. (5a). Dividing this Hill number by T, we obtain ${}^{q}\bar{D}(T)$ given in Eq. (4a). Note in our framework that ${}^{q}PD(T)$ is truly a class of Hill numbers (“the effective number of lineage-years”), whereas ${}^{q}\bar{D}(T)$ (“the effective number of lineages”) denotes a (generalized) mean of Hill numbers. See Faith and Richards (2012) and Faith (2013) for extensions of the measure ${}^{q}PD(T)$.
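The step from the branch assemblage to the two measures can be written out explicitly. The following is a sketch for q ≠ 1, using only the description above: the assemblage contains $L_i$ “branches” of relative abundance $a_i/T$ for each i, and these abundances sum to one because $\sum_{i \in B_T} L_i a_i = T$ for an ultrametric tree of depth T.

```latex
\begin{align*}
{}^{q}PD(T)
  &= \Bigl[\sum_{i \in B_T} L_i \Bigl(\frac{a_i}{T}\Bigr)^{q}\Bigr]^{1/(1-q)}
   = T^{-q/(1-q)} \Bigl[\sum_{i \in B_T} L_i\, a_i^{q}\Bigr]^{1/(1-q)}, \\[4pt]
\frac{{}^{q}PD(T)}{T}
  &= T^{-1/(1-q)} \Bigl[\sum_{i \in B_T} L_i\, a_i^{q}\Bigr]^{1/(1-q)}
   = \Bigl[\sum_{i \in B_T} \frac{L_i}{T}\, a_i^{q}\Bigr]^{1/(1-q)}
   = {}^{q}\bar{D}(T).
\end{align*}
```

The second line uses $-q/(1-q) - 1 = -1/(1-q)$, so the factor of T moves inside the bracket as $1/T$. Setting q = 0 in the first line gives $\sum_{i \in B_T} L_i$, the total branch length.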
Unlike previous phylogenetic diversity measures developed in the literature, ${}^{q}\bar{D}(T)$ and ${}^{q}PD(T)$ depend explicitly on two parameters, the abundance sensitivity parameter q and the time perspective (or time-depth) parameter T. The reasons we need this time-depth parameter and our suggestions for choosing a perspective time are given as follows.
1. When we compare the phylogenetic diversities of several assemblages based on the measures ${}^{q}\bar{D}(T)$ and ${}^{q}PD(T)$, all measures should refer to the same time period to make meaningful comparisons. That is, the time-depth T should be kept the same for all assemblages. Therefore, a parameter is required to specify the time-depth.
2. The choice of time perspective should reflect an investigator's aims and facilitate comparisons with other studies. We suggest that at least two selected time perspectives should be included: T = 0, and T = the age of the root node of a phylogenetic tree connecting all species in the study. For the case of T = 0, the phylogeny is ignored and the diversity profile reduces to the profile in the present-day assemblage based on the ordinary Hill numbers. If we choose T to be the age of the oldest node in the tree, we recover some of the standard measures of phylogenetic diversity (see Eqs. (4c) and (4d)).
3. As suggested in Chiu et al. (2014), other time perspectives can be selected, such as T = the age of the node at which the group of interest diverges from the rest of the species. This choice of T is independent of the species actually sampled, so it allows statistically robust comparisons across investigations and regions (unlike the conventional choice of T as the root node of the tree containing the species actually observed). This choice also provides an accurate measure of the proportion of a taxonomic group's evolutionary history preserved in a given assemblage. Another choice is the time of the most recent common ancestor of all taxa alive today. Other choices may be made, depending on the purpose of an investigation. The formula in Chiu et al. (2014, p. 42) can be used to convert phylogenetic diversity from one temporal perspective to another.
To see how the measures vary with q and time perspective T, we recommend using two types of profiles to completely characterize phylogenetic tree information and species abundances as described below. See section “An example” for examples. (1) The first type of diversity profile is obtained by plotting ${}^{q}PD(T)$ or ${}^{q}\bar{D}(T)$ as a function of order q as q varies from 0 to about 3 or 4 (beyond which there is usually little change), for some selected values of temporal perspective T. For this type of profile, ${}^{q}PD(T)$ and ${}^{q}\bar{D}(T)$ have similar patterns when T is fixed, so it is sufficient to plot the profile for only one measure. (2) The second type of diversity profile is obtained by plotting ${}^{q}PD(T)$ and ${}^{q}\bar{D}(T)$ as functions of T separately for q = 0, 1, and 2. This profile shows the effect of time-depth or evolutionary change on our diversity measures.
For the second type of profile, ${}^{q}PD(T)$ and ${}^{q}\bar{D}(T)$ generally exhibit different patterns (the profile of ${}^{q}\bar{D}(T)$ is decreasing with T, whereas the profile of ${}^{q}PD(T)$ for q = 0 (Faith's PD) is always increasing, and for q > 0 is generally increasing up to a certain point), so the profiles for both measures are informative. The parameter q gives the sensitivity of the two measures to present-day species relative abundances. As in the ordinary Hill numbers, the measures with q = 2 favor more abundant species, so they are useful in ecological studies that examine the phylogenetic relationships of the dominant species in a set of assemblages, or in studies of functional diversity. The measures with q = 0 emphasize rare species, so they are useful when abundance information is not necessarily relevant (e.g., when ecologists try to identify past episodes of differentiation, or for some conservation biology applications). The measures with q = 1 weigh species according to their frequencies and can be used in most applications when neither dominant nor rare species should be favored.
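Both profile types can be tabulated with a few lines of Python before plotting. The sketch below is illustrative only: the tree, its node ages and the abundances are hypothetical, the function names are ours, and the time-depth is restricted to 0 < T ≤ root age so that no assumption about the lineage above the root is needed.

```python
import math

# Hypothetical ultrametric tree. Each branch is (younger_age, older_age, a_i),
# with ages in time units before present and a_i the abundance descended from it.
BRANCHES = [
    (0.0, 3.0, 0.4),  # species 1
    (0.0, 1.0, 0.3),  # species 2
    (0.0, 1.0, 0.2),  # species 3
    (0.0, 4.0, 0.1),  # species 4
    (1.0, 3.0, 0.5),  # ancestor of species 2 and 3
    (3.0, 4.0, 0.9),  # ancestor of species 1, 2 and 3
]
ROOT_AGE = 4.0

def clip_branches(branches, T):
    """Return (L_i, a_i) pairs for the branch portions lying inside [-T, 0]."""
    if not 0.0 < T <= ROOT_AGE:
        raise ValueError("this sketch only handles 0 < T <= root age")
    clipped = []
    for young, old, a in branches:
        length = min(old, T) - young
        if length > 0:
            clipped.append((length, a))
    return clipped

def mean_phylo_diversity(pairs, T, q):
    """Mean phylogenetic diversity of order q over [-T, 0] (assumed form)."""
    if math.isclose(q, 1.0):
        return math.exp(-sum(L / T * a * math.log(a) for L, a in pairs))
    return sum(L / T * a ** q for L, a in pairs) ** (1.0 / (1.0 - q))

def branch_diversity(pairs, T, q):
    """Branch diversity: T times the mean phylogenetic diversity."""
    return T * mean_phylo_diversity(pairs, T, q)

# Profile type 1: diversity as a function of q at a fixed time-depth.
q_profile = [
    (q, mean_phylo_diversity(clip_branches(BRANCHES, ROOT_AGE), ROOT_AGE, q))
    for q in (0, 0.5, 1, 1.5, 2, 3)
]

# Profile type 2: both measures as functions of T for q = 0, 1, 2.
t_profile = {
    q: [
        (T,
         mean_phylo_diversity(clip_branches(BRANCHES, T), T, q),
         branch_diversity(clip_branches(BRANCHES, T), T, q))
        for T in (1.0, 2.0, 3.0, 4.0)
    ]
    for q in (0, 1, 2)
}

for q, rows in t_profile.items():
    for T, mean_div, branch_div in rows:
        print(f"q={q} T={T}: mean diversity {mean_div:.3f}, branch diversity {branch_div:.3f}")
```

On this toy tree the mean diversity falls as T increases for each q, while the q = 0 branch diversity (total branch length within the interval) rises, in line with the patterns described above. The lists can be passed to any plotting library to draw the profiles.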
When evolutionary change is measured by the number of nucleotide base changes at a selected locus, or by the amount of functional or morphological differentiation from a common ancestor, the branches of the resulting tree will typically be uneven, so the tree is non-ultrametric. In this case, Chao et al. (2010) showed that the time parameter T in all formulas should be replaced by the mean base change or mean branch length $\bar{T}$, the abundance-weighted mean of the distances from the tree base to each of the terminal branch tips (i.e., the mean evolutionary change per species over the interval of interest). See the right panel of Fig. 1 for an illustrative example. Let $B_{\bar{T}}$ denote the set of branches connecting all focal species, with mean branch length $\bar{T}$. Then we can express $\bar{T}$ as $\bar{T} = \sum_{i \in B_{\bar{T}}} L_i a_i$. The diversity of a non-ultrametric tree with mean evolutionary change $\bar{T}$ is the same as that of an ultrametric tree with time parameter $\bar{T}$. Therefore, the diversity formulas for a non-ultrametric tree are obtained by replacing T by $\bar{T}$ in Eqs. (4a), (4b), (5a), and (5b). The resulting measures are denoted respectively as ${}^{q}\bar{D}(\bar{T})$, ${}^{1}\bar{D}(\bar{T})$, ${}^{q}PD(\bar{T})$ and ${}^{1}PD(\bar{T})$; see Chao et al. (2010) for details. When we compare the phylogenetic diversity based on the measures ${}^{q}\bar{D}(\bar{T})$ and ${}^{q}PD(\bar{T})$ for several non-ultrametric trees, all measures should refer to the same mean base change $\bar{T}$ to make meaningful comparisons.
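The non-ultrametric case can be illustrated with the branch lengths and node abundances quoted in the caption of Fig. 1b. The Python sketch below computes the mean branch length in the two ways described there and then applies the diversity formula with T replaced by that mean. The variable and function names are ours, and the formula is the assumed branch-based form of Chao et al. (2010).

```python
import math

# Branches of the non-ultrametric tree in Fig. 1b as (L_i, a_i):
# three terminal branches and one internal branch shared by two species.
branches = [
    (4.0, 0.5),  # terminal branch, species with abundance 0.5
    (3.5, 0.2),  # terminal branch, species with abundance 0.2
    (1.0, 0.3),  # terminal branch, species with abundance 0.3
    (2.0, 0.5),  # internal branch, node abundance 0.2 + 0.3
]

# Mean branch length as the branch-length-weighted total node abundance.
t_bar = sum(L * a for L, a in branches)

# The same quantity as the abundance-weighted mean root-to-tip distance.
root_to_tip = [(4.0, 0.5), (3.5 + 2.0, 0.2), (1.0 + 2.0, 0.3)]
t_bar_from_tips = sum(dist * p for dist, p in root_to_tip)

def mean_phylo_diversity(branches, T, q):
    """Mean phylogenetic diversity of order q, with T the (mean) tree depth."""
    if math.isclose(q, 1.0):
        return math.exp(-sum(L / T * a * math.log(a) for L, a in branches))
    return sum(L / T * a ** q for L, a in branches) ** (1.0 / (1.0 - q))

for q in (0, 1, 2):
    mean_div = mean_phylo_diversity(branches, t_bar, q)
    print(f"q={q}: mean diversity {mean_div:.4f}, branch diversity {t_bar * mean_div:.4f}")
```

For q = 0 the branch diversity equals the total branch length 4 + 3.5 + 1 + 2 = 10.5, matching the “10.5 equally weighted branches” of the caption, and the mean diversity is 10.5 / 4.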