# Phylogenetic Split Networks

Rooted phylogenetic trees such as those shown in Fig. 1 are well understood. Here, both trees show that the common ancestor of the taxa considered has the ancestors of the two genera as direct descendants. In general, interior nodes represent ancestral taxa of the leaf nodes, and the edge lengths estimate the amount of change observed between nodes. However, if one wishes to combine the information in both trees, it becomes difficult to identify clear ancestors. For example, *TCYB* and *TDCoH*3 disagree on whether *G. sonneratii* or *G. varius* is the basal Gallus species. Phylogenetic split networks have been devised to visualize such conflicts.

We start by describing splits. A *split*, denoted by *A*|*B*, is a bipartition of the taxon set *X* into two disjoint subsets *A* and *B*, indicating that there is an observable amount of divergence between the two subsets. Every edge in a tree generates a split: removing the edge decomposes the tree into two subtrees, and the leaf sets of these subtrees form the two sides of the split. *TCYB* has 17 splits (edges), while *TDCoH*3 has 15 splits (two splits in *TDCoH*3 have zero length and are collapsed, as they do not influence subsequent computations). Figure 2a shows the union *Σ* of the 20 distinct splits occurring in the pheasant trees (Fig. 1). *TCYB* and *TDCoH*3 share the ten *trivial splits* *σ*1, *σ*2, …, *σ*10, which correspond to the external edges of the trees. The trees also share two non-trivial splits, *σ*13 and *σ*16, where *σ*16 corresponds to the internal edges separating Gallus from Polyplectron species. The remaining splits are unique to one of the two trees.
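The edge-removal construction can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used for Fig. 2a: the function names `tree_splits` and `is_trivial`, the edge-list representation, and the four-leaf toy tree are all assumptions made for the example and do not come from the pheasant data.

```python
from collections import defaultdict


def tree_splits(edges):
    """Return the set of splits induced by the edges of a tree.

    `edges` is a list of (u, v) node pairs; leaves are the nodes of degree 1.
    Each split is a frozenset containing its two sides as frozensets of leaves.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = frozenset(n for n in adj if len(adj[n]) == 1)

    def leaves_on_side(start, blocked):
        # Depth-first search from `start` that never crosses the edge
        # (start, blocked), i.e. the search sees the tree with that edge removed.
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt in seen or (node == start and nxt == blocked):
                    continue
                seen.add(nxt)
                stack.append(nxt)
        return frozenset(seen & leaves)

    splits = set()
    for u, v in edges:
        side_a = leaves_on_side(u, v)
        side_b = leaves - side_a
        splits.add(frozenset({side_a, side_b}))
    return splits


def is_trivial(split):
    """A split is trivial if one side consists of a single taxon."""
    return any(len(side) == 1 for side in split)


# Hypothetical toy tree: leaves a, b, c, d and interior nodes x, y.
toy_edges = [("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"), ("d", "y")]
toy_splits = tree_splits(toy_edges)
toy_nontrivial = [s for s in toy_splits if not is_trivial(s)]
```

In the toy tree, the four external edges yield four trivial splits and the interior edge yields the single non-trivial split {a, b} | {c, d}. Collecting splits in a set also means that two edges inducing the same bipartition are counted once.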

**Fig. 2** (**a**) Set of all splits extracted from the trees in Fig. 1. Each split *σ* is a bipartition *A*|*B*, where '\*' and '.' represent taxa in *A* and *B*, respectively. Conflicting splits are colored. (**b**) Visualization of this split set as a phylogenetic split network. Conflicting splits are colored accordingly and depicted by parallelograms. Here, split weights are assigned as the mean of the weights of the corresponding edges in the two trees. Highlighted in boldface are the four species maximizing split diversity

This split set is visualized as a phylogenetic split network (Fig. 2b). The major difference from trees is that the interior nodes of a split network cannot be regarded as representing ancestral taxa. Instead, the weight of a split *A*|*B* indicates the amount of difference between the taxon sets *A* and *B*. A split is drawn as a single edge or as a set of parallel edges. The former indicates that the split does not conflict with any other split, while the latter indicates at least one conflict. Two conflicting splits are therefore visualized by a parallelogram. For example, *σ*14 (in cyan, Fig. 2) and *σ*15 (in pink) contradict each other on the placement of *P. emphanum* and *P. malacense*. This disagreement generates a narrow parallelogram at the basal Polyplectron.
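The text does not spell out when two splits conflict. A minimal sketch, assuming the standard compatibility criterion for splits (two splits *A*|*B* and *C*|*D* conflict exactly when all four intersections of their sides are non-empty), is given below. The function name `conflicts` and the toy taxa are hypothetical.

```python
def conflicts(split1, split2):
    """Return True if the two splits cannot both occur in one tree.

    Each split is a pair (A, B) of sets. Assumed criterion: the splits
    conflict when A&C, A&D, B&C and B&D are all non-empty.
    """
    (a, b), (c, d) = split1, split2
    return all(x & y for x in (a, b) for y in (c, d))


# Hypothetical toy splits on the taxa p, q, r, s.
s1 = ({"p", "q"}, {"r", "s"})
s2 = ({"p", "r"}, {"q", "s"})
s3 = ({"p"}, {"q", "r", "s"})  # trivial split

s1_vs_s2 = conflicts(s1, s2)  # every side of s1 meets every side of s2
s1_vs_s3 = conflicts(s1, s3)  # {"r", "s"} & {"p"} is empty
```

Under this criterion a trivial split never conflicts with another split, since its single-taxon side can intersect only one side of the other split.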

If more than two splits are in disagreement, the split network will show multiple connected parallelograms. For example, *σ*17 (in red, Fig. 2) conflicts with *σ*19 (in green) and *σ*20 (in yellow). *σ*19 also contradicts *σ*18 (in blue). Therefore, *σ*17, *σ*18, *σ*19 and *σ*20 are visualized by three red, two blue, three green, and two yellow parallel edges, respectively. This generates three parallelograms within Gallus (Fig. 2b).
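The edge and parallelogram counts in this example can be reproduced from the list of conflicting pairs alone. The sketch below assumes that a split is drawn with one more parallel edge than it has conflicts, and that each conflicting pair contributes one parallelogram. Both rules match the numbers quoted above but are not claimed to hold for arbitrary split networks. The ASCII names such as `sigma17` stand in for *σ*17.

```python
from collections import Counter

# Conflicting pairs exactly as listed in the text.
conflict_pairs = [
    ("sigma17", "sigma19"),
    ("sigma17", "sigma20"),
    ("sigma18", "sigma19"),
]

# Count how many conflicts each split takes part in.
conflicts_per_split = Counter()
for s, t in conflict_pairs:
    conflicts_per_split[s] += 1
    conflicts_per_split[t] += 1

# Assumed rule for this example: parallel edges = conflicts + 1.
parallel_edges = {s: n + 1 for s, n in conflicts_per_split.items()}

# Assumed rule for this example: one parallelogram per conflicting pair.
n_parallelograms = len(conflict_pairs)

for name in sorted(parallel_edges):
    print(name, parallel_edges[name])
print("parallelograms:", n_parallelograms)
```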

Not every split set can be visualized in two dimensions. For example, suppose we had a third tree that places *G. gallus* as the basal Gallus lineage. This would introduce a split that conflicts with both *σ*17 and *σ*19. Because *σ*17 and *σ*19 also conflict with each other, the three splits conflict pairwise and are depicted by a three-dimensional parallelepiped. The resulting split network is no longer easy to visualize. For what follows, however, it suffices to work directly on the split set (Fig. 2a).
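Working directly on the split set, such triples can be found without drawing anything. The following sketch searches a list of conflicting pairs for three splits that all conflict with one another. The helper name `mutually_conflicting_triples` and the label `sigma_new` for the split of the hypothetical third tree are assumptions made for illustration. The remaining pairs are those named in the text.

```python
from itertools import combinations


def mutually_conflicting_triples(conflict_pairs):
    """Return all triples of splits in which every pair is in conflict."""
    pairs = {frozenset(p) for p in conflict_pairs}
    names = sorted(set().union(*pairs))
    return [
        triple
        for triple in combinations(names, 3)
        if all(frozenset(q) in pairs for q in combinations(triple, 2))
    ]


# Conflicts among the Gallus splits as described in the text.
observed = [
    ("sigma17", "sigma19"),
    ("sigma17", "sigma20"),
    ("sigma18", "sigma19"),
]

# Hypothetical split from a third tree, conflicting with sigma17 and sigma19.
with_third_tree = observed + [
    ("sigma_new", "sigma17"),
    ("sigma_new", "sigma19"),
]

triples_before = mutually_conflicting_triples(observed)
triples_after = mutually_conflicting_triples(with_third_tree)
```

With only the two pheasant trees no such triple exists. Adding the hypothetical split yields exactly one triple, the one the text associates with a parallelepiped.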