Phylogenetic Split Networks

Rooted phylogenetic trees, such as those shown in Fig. 1, are well understood. Here, both trees show that the common ancestor of the taxa considered has the ancestors of the two genera as direct descendants. In general, interior nodes represent ancestral taxa of the leaf nodes, and the edge lengths estimate the amount of change observed between nodes. However, if one wishes to combine the information in both trees, it becomes difficult to identify clear ancestors. For example, TCYB and TDCoH3 disagree on whether G. sonneratii or G. varius is the basal Gallus species. Phylogenetic split networks have been devised to visualize such conflicts.

We start by describing splits. A split, denoted by A|B, is a bipartition of the taxon set X into two disjoint subsets A and B, indicating that there is an observable amount of divergence between the two subsets. Every edge in a tree generates a split: removing the edge decomposes the tree into two subtrees, each of which connects its own set of leaves. TCYB has 17 splits (edges), while TDCoH3 has 15 splits (two splits in TDCoH3 have zero length and are collapsed, as they do not influence subsequent computations). Figure 2a shows the union set Σ of the 20 distinct splits occurring in the pheasant trees (Fig. 1). TCYB and TDCoH3 share the ten trivial splits σ1, σ2, …, σ10, which correspond to the external edges of the trees. The trees also share two non-trivial splits, σ13 and σ16, where σ16 corresponds to the internal edges separating the Gallus from the Polyplectron species. The remaining eight splits each occur in only one of the two trees.
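The edge-removal construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `splits_from_tree`, the edge-list representation, and the four-taxon toy tree (leaves `a` to `d`, interior nodes `x` and `y`) are all assumptions made for the example and are not taken from the pheasant data.

```python
from collections import defaultdict


def splits_from_tree(edges, leaves):
    """Return the set of splits A|B induced by the edges of an unrooted tree.

    `edges` is a list of (node, node) pairs and `leaves` the taxon set X.
    Each split is stored as a frozenset holding the two sides A and B,
    so that A|B and B|A count as the same split.
    """
    leaves = frozenset(leaves)
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    splits = set()
    for u, v in edges:
        # "Remove" the edge (u, v): search from u and never step onto v.
        # In a tree there is no other route to v, so this isolates u's side.
        seen = {u, v}
        stack = [u]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        side_a = frozenset(seen - {v}) & leaves
        side_b = leaves - side_a
        splits.add(frozenset((side_a, side_b)))
    return splits


# Hypothetical toy tree: ((a,b),(c,d)) with interior nodes x and y.
toy_edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("d", "y")]
toy_splits = splits_from_tree(toy_edges, "abcd")
```

On this toy tree the four external edges yield the four trivial splits, and the single internal edge yields the non-trivial split ab|cd. A fully resolved unrooted tree on n leaves has 2n − 3 edges, which agrees with the 17 splits reported for TCYB on ten taxa.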

Fig. 2 (a) Set of all splits extracted from the trees in Fig. 1. Each split σ is a bipartition A|B, where '*' and '.' represent taxa in A and B, respectively. Conflicting splits are colored. (b) Visualization of this split set as a phylogenetic split network. Conflicting splits are colored accordingly and depicted by parallelograms. Here, split weights are assigned as the mean of the weights of the corresponding edges in the two trees. The four species maximizing split diversity are highlighted in boldface.

This split set is visualized in a phylogenetic split network (Fig. 2b). The major difference from trees is that the interior nodes of a split network cannot be regarded as representing ancestral taxa. Instead, the weight of a split A|B indicates the amount of difference between the taxon sets A and B. A split is visualized by a single edge or a set of parallel edges. The former indicates that the split does not conflict with any other split, while the latter indicates at least one conflict. Accordingly, two conflicting splits are visualized by a parallelogram. For example, σ14 (in cyan, Fig. 2) and σ15 (in pink) contradict each other on the placement of P. emphanum and P. malacense. This disagreement generates a narrow parallelogram at the basal Polyplectron.

If more than two splits are in disagreement, the split network will show multiple connected parallelograms. For example, σ17 (in red, Fig. 2) conflicts with σ19 (in green) and σ20 (in yellow). σ19 also contradicts σ18 (in blue). Therefore, σ17, σ18, σ19 and σ20 are visualized by three red, two blue, three green, and two yellow parallel edges, respectively. This generates three parallelograms within Gallus (Fig. 2b).
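The text does not spell out when two splits conflict. A common formalization, assumed here, is that splits A|B and C|D are compatible if at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty, and conflict otherwise. The sketch below applies this test to a hypothetical toy split set on five taxa `a` to `e`; the names `compatible`, `conflicting_pairs`, and `toy` are illustrative, and the toy splits are not the pheasant splits of Fig. 2a.

```python
from itertools import combinations


def split(side_a, side_b):
    """Build a split A|B as a pair of frozensets."""
    return (frozenset(side_a), frozenset(side_b))


def compatible(s, t):
    """Two splits are compatible if some pair of sides is disjoint."""
    (a, b), (c, d) = s, t
    return any(not (p & q) for p in (a, b) for q in (c, d))


def conflicting_pairs(splits):
    """List the name pairs of splits that conflict, in sorted order."""
    return [
        (m, n)
        for m, n in combinations(sorted(splits), 2)
        if not compatible(splits[m], splits[n])
    ]


# Hypothetical toy split set on the taxa a..e (assumption, not Fig. 2a).
toy = {
    "s1": split("ab", "cde"),
    "s2": split("bc", "ade"),
    "s3": split("cd", "abe"),
    "s4": split("a", "bcde"),  # trivial split, compatible with every other split
}
pairs = conflicting_pairs(toy)
```

Here `s2` conflicts with both `s1` and `s3`, while `s1` and `s3` are compatible with each other. This mirrors the chain-like situation described above, in which one split takes part in several connected parallelograms.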

Not every split set can be visualized in two dimensions. For example, assume that we had a third tree that places G. gallus at the basal Gallus lineage. This would introduce one split conflicting with both σ17 and σ19. Such a triple of mutually conflicting splits is depicted by a three-dimensional parallelepiped, and the resulting split network is no longer easy to visualize. However, for the following it suffices to work directly on the split set (Fig. 2a).
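Triples of mutually conflicting splits can be found directly from the split set, without drawing anything. The following sketch assumes the usual compatibility criterion (two splits are compatible if some side of one is disjoint from some side of the other) and uses a hypothetical four-taxon example; `conflicting_triples` and `toy` are illustrative names. The check only flags such triples; it does not by itself decide how a network should be laid out.

```python
from itertools import combinations


def split(side_a, side_b):
    """Build a split A|B as a pair of frozensets."""
    return (frozenset(side_a), frozenset(side_b))


def compatible(s, t):
    """Two splits are compatible if some pair of sides is disjoint."""
    (a, b), (c, d) = s, t
    return any(not (p & q) for p in (a, b) for q in (c, d))


def conflicting_triples(splits):
    """List name triples in which all three splits conflict pairwise."""
    return [
        triple
        for triple in combinations(sorted(splits), 3)
        if all(
            not compatible(splits[m], splits[n])
            for m, n in combinations(triple, 2)
        )
    ]


# Hypothetical toy data: the three non-trivial splits on four taxa
# conflict pairwise; the trivial split t1 conflicts with none of them.
toy = {
    "s1": split("ab", "cd"),
    "s2": split("ac", "bd"),
    "s3": split("ad", "bc"),
    "t1": split("a", "bcd"),
}
triples = conflicting_triples(toy)
```

In this toy set the only triple reported is `("s1", "s2", "s3")`, the kind of configuration that the text says requires a parallelepiped.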

 