# An agglomerative method for relational constraints

An agglomerative clustering method proceeds by joining (fusing) clusters successively. Finer-grained clusters (and partitions) are contained within coarser-grained clusters (and partitions) to form a hierarchy of partitions. Both hierarchical and local optimization methods can be used for solving some types of problems with a relational constraint (Ferligoj and Batagelj 1983).

The pseudo-code of an *agglomerative algorithm* for doing this can be expressed in the following form:

For a selected strategy, in step 2.5, it is necessary to specify how to *adjust the relation after joining Cp* and *Cq* to form *C = CpV Cq,* where *Cs* is any other cluster from *C(k).* The rules for the strict, leader, and tolerant strategies are presented in Figure 9.8.

Figure 9.8 **Relation adjustment rules for three clustering strategies.**

For *p* g *Cp, q* g *Cq,* and *s* g *Cs,* there are four configurations to consider when there is only an unreciprocated arc, *p ->• q,* when *p* and *q* are both placed in *C.* These configurations are shown on the left in Figure 9.8. The arc *(p, q)* is always present with the four configurations distinguished by having only one of arcs *(p, s), (s, p), (p, s),* or *(s, q).* The adjusted ties between *s* and *(p, q)* for the three methods are shown on the right in three columns with each set of adjustments in a column.

The adjustment rules are much simpler for the two-way strategy with *p <-" q:* the cluster *Cs* is linked with *C* only in the two cases presented in Figure 9.9.

Figure 9.9 **Relation adjustment rules for the two-way strategy.**

The adjustment rules shown in Figures 9.8 and 9.9 work in the matched pairs: *)Ф1* and the tolerant method; 2) Ф2 and the leader method; 3) Ф4 and the two-way method; and 4) Ф5 with the strict method.

The condition *y/(Cf, Cf)* is equivalent to *CtRCj* for the tolerant, leader, and strict methods; and to *CjRCj A CjRCj* for two-way method.

Also, in step 2.4, it is necessary to update the *dissimilarities between clusters.* The foundations for doing this are described in Section 9.3.1. For the minimum, maximum, and average measures, in ordinary clustering cases, it is straightforward to derive the update formulas allowing us to determine the dissimilarity *D(C, Cs),* where *C = Cp* U *Cq,* from the dissimilarities *D(Cp,Cs)* and *D(Cq,Cs).* Note that Г1пГ2 = 0.

# Hierarchies

The agglomerative clustering procedure produces a series of feasible clusterings C(n), C(n -1),..., C(m) with C(m) є МахФ (the maximal elements for E).

Their union, T= Ujt=m C(^), is called a *hierarchy* and has the property

The set inclusion, c, is a *tree* or *hierarchical* order on 77 The hierarchy, 77 is *complete* iff *V* e 77

For *W* c *V* we define the *smallest cluster Cj(W)* from Tcontaining by two conditions:

CI. *W* c Cr(T*0 and C2. VCeT: (FCC^> Cr(T*0 c C) C^is a *closure* on Twith a special property

We can assign the levels to the clusters of a hierarchy. A mapping *h* : T-"- is a fevd *function* on 7" iff LI. VXeT/ : ft({X}) = 0and L2. C,CC^ ft(C?) < *h(Cq).*

A simple example of level function is *h(C) =* card(C) - 1.

Every hierarchy/level function determines an ultrametric dissimilarity on *V:*

The converse is also true (see Dieudonne (I960)): Let *d* be an ultrametric on *V.* Denote *B(X, r) =* {Y є *V : d(X,* Y) < *r}.* Then for any given set AcR+ the set

is a complete hierarchy, and *h(C) =* diam(C) is a level function.

The pair *(T, h)* is called a *dendrogram* or a *clustering tree* because it can be visualized as a tree.

**Theorem 9.1. (Monotonicity) ***If a dissimilarity, D, has the reducibility property, then hD is a level function.*

The minimum, maximum and average dissimilarities have the reducible property.

# Fast agglomerative clustering algorithms

When the reducibility property is satisfied, the nearest neighbors network for a given network is preserved after joining the nearest clusters. This permits the development of very fast agglomerative hierarchical clustering procedures (Batagelj et al. 2008).

## Nearest neighbors graphs

For a given dissimilarity, *d,* on the set of units, *V,* and relational constraint, *R,* the *k nearest neighbors graph GNN = (V, A)* is defined as

(X, Y) e *A* Y is selected among the *k* nearest neighbors of X, and X.RY.

For (X, Y) e *A* its value can be set to *w((X,* Y)) = *d(X,* Y) to obtain a nearest neighbor network *AfNN* = *W,* **A ***w).*

When there are equidistant pairs of units, a choice is required about selecting from them. There are two options. One is to include all equidistant pairs of units in a nearest neighbor graph. The other option is to specify an additional selection rule for including only some of them. We denote the graph where all equidistant pairs are included by When the choice is to include only a single nearest neighbor, we denote this graph by gjyjy**.**

## The structure and some properties of nearest neighbor graphs

Let *AfNN* = C^"" A *w)* be a nearest neighbor network. A pair of units X, Y e *V* are *reciprocal nearest neighbors,* or RNNs, iff (X, Y) e *A* and (Y, X) e *A.*

Suppose cardCIO > 1 and *R* has no isolated units. Then in A" (Murtagh 1985):

• every unit/vertex X e *V* has outdeg(X) > 1 (there are no isolated units), and

• along every walk the values of *w* are not increasing.

Using these two observations, some additional properties in are:

• all the values of *w* on a closed walk are the same and all its arcs are reciprocal (all arcs between units in a non-trivial strong component are reciprocal);

• every maximal (cannot be extended) elementary (no arc is repeated) walk ends in an RNNs pair;

• there exists at least one RNNs pair - corresponding to minx>Ye?/,x^Y

*d(X,* Y).

## The algorithm

Any network *AfNN* is a subnetwork of A"^. Its connected components are directed (acyclic) trees with a single RNNs pair in the root. It can be shown that if the clustering method has the reducibility property then the NN-chain remains a NN-chain also after the agglomeration of an RNNs pair.

Therefore, based on the nearest neighbor network, very efficient algorithms for agglomerative clustering for methods with the reducibility property can be built. In pseudo-code:

Its implementation for the methods minimum, maximum, and average is available in Pajek using the command

Network/Create Hierarchy/Clustering with Relational Constraint/ When Clustering with Relational Constraint has been reached there are three options:

Run

Make Partition

Extract Subtree as a Hierarchy