# Conditions for hierarchical clustering methods

The set of all partitions of $V$ is denoted by $\Pi(V)$, and the set of feasible clusterings by $\Phi$, $\Phi \subseteq \Pi(V)$. $\Phi$ determines *the feasibility predicate* $\Phi(\mathbf{C}) \equiv \mathbf{C} \in \Phi$ defined on $\mathcal{P}(\mathcal{P}(V) \setminus \{\emptyset\})$; and, conversely, $\Phi = \{\mathbf{C} \in \mathcal{P}(\mathcal{P}(V) \setminus \{\emptyset\}) : \Phi(\mathbf{C})\}$.

**Table 9.1 Some types of relational constraints.**

**Figure 9.7 Examples of different types of connectedness.**

In the set $\Pi(V)$, the relation of *clustering inclusion*, $\sqsubseteq$, can be introduced by

$$\mathbf{C}_1 \sqsubseteq \mathbf{C}_2 \equiv \forall C_1 \in \mathbf{C}_1 \; \exists C_2 \in \mathbf{C}_2 : C_1 \subseteq C_2$$

When the clustering inclusion $\mathbf{C}_1 \sqsubseteq \mathbf{C}_2$ holds, the clustering $\mathbf{C}_1$ is a *refinement* of the clustering $\mathbf{C}_2$.

It is well known that $(\Pi(V), \sqsubseteq)$ is a partially ordered set. Because any subset of a partially ordered set is also partially ordered, from $\Phi \subseteq \Pi(V)$ it follows that $(\Phi, \sqsubseteq)$ is a partially ordered set.

The clustering inclusion determines two further relations on $\Pi(V)$:

1. $\mathbf{C}_1 \sqsubset \mathbf{C}_2 \equiv \mathbf{C}_1 \sqsubseteq \mathbf{C}_2 \wedge \mathbf{C}_1 \neq \mathbf{C}_2$ (this defines $\sqsubset$ as a *strict* inclusion)

2. $\mathbf{C}_1 \lessdot \mathbf{C}_2 \equiv \mathbf{C}_1 \sqsubset \mathbf{C}_2 \wedge \neg\exists \mathbf{C} \in \Phi : (\mathbf{C}_1 \sqsubset \mathbf{C} \sqsubset \mathbf{C}_2)$ (this defines $\lessdot$ as a *predecessor* relation).
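As a concrete illustration, clustering inclusion and the strict inclusion derived from it can be sketched in Python. This is our own minimal sketch, not part of the original text: it assumes clusterings are represented as lists of frozensets of units, and the function names are hypothetical.

```python
def is_refinement(C1, C2):
    """Clustering inclusion C1 ⊑ C2: every cluster of C1 is
    contained in some cluster of C2 (C1 refines C2).
    Clusterings are represented as lists of frozensets of units."""
    return all(any(c1 <= c2 for c2 in C2) for c1 in C1)

def is_strict_refinement(C1, C2):
    """Strict inclusion C1 ⊏ C2: inclusion holds and the two
    clusterings are not equal."""
    return is_refinement(C1, C2) and set(C1) != set(C2)
```

For example, the partition into singletons is a (strict) refinement of every coarser partition of the same units.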

Henceforth, we assume that the set of feasible clusterings, $\Phi \subseteq \Pi(V)$, satisfies all of the following conditions:

F1. $\mathbf{0} = \{\{X\} : X \in V\} \in \Phi$. Intuitively, the partition in which every element of $V$ is a singleton in its own cluster is a feasible clustering.

F2. The feasibility predicate $\Phi$ is *local*, having the form $\Phi(\mathbf{C}) = \bigwedge_{C \in \mathbf{C}} \varphi(C)$, where $\varphi(C)$ is a predicate defined on $\mathcal{P}(V) \setminus \{\emptyset\}$ (clusters).

The intuitive meaning of $\varphi(C)$ is: $\varphi(C) \equiv$ the cluster $C$ is good. Therefore, the locality condition can be read as stating that a good clustering $\mathbf{C} \in \Phi$ consists of good clusters.

F3. The predicate $\Phi$ has the property of *binary heredity* with respect to the *fusibility* predicate $\psi(C_1, C_2)$, that is,

$$\mathbf{C} \in \Phi \wedge C_1, C_2 \in \mathbf{C} \wedge \psi(C_1, C_2) \Rightarrow (\mathbf{C} \setminus \{C_1, C_2\}) \cup \{C_1 \cup C_2\} \in \Phi$$

This condition means that, in a good clustering, fusing two fusible clusters produces a good clustering.

F4. The predicate $\psi$ is *compatible* with the clustering inclusion $\sqsubseteq$.

F5. The *interpolation* property holds in $\Phi$, that is,

$$\forall \mathbf{C}_1, \mathbf{C}_2 \in \Phi : (\mathbf{C}_1 \sqsubset \mathbf{C}_2 \Rightarrow \exists \mathbf{C} \in \Phi : \mathbf{C}_1 \lessdot \mathbf{C} \sqsubseteq \mathbf{C}_2)$$

These conditions provide a framework in which hierarchical methods can also be applied to constrained clustering problems. The number of clusters in a partition is denoted by $k$; its value is either fixed ahead of time or selected during the clustering analysis. Partitions with higher values of $k$ have lower values of the criterion function but may be too detailed to be useful summaries. There is always a compromise between the quality of a solution and having an efficient and more useful summary. The set of feasible partitions with $k$ clusters is a subset of all partitions with $k$ clusters: $\Phi_k(V) \subseteq \Pi_k(V)$.

In the case of ordinary clustering problems, both predicates $\varphi(C)$ and $\psi(C_p, C_q)$ are always true: all of the conditions F1-F5 are satisfied.
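The locality condition F2 and the fusion step underlying binary heredity F3 can be sketched in code. This is a minimal illustration of ours, assuming clusterings are collections of frozensets; the cluster predicate `phi_small` is purely hypothetical, not a constraint from the text.

```python
def is_feasible(clustering, phi):
    """Locality (F2): the clustering is feasible iff every
    cluster C satisfies the cluster-level predicate phi(C)
    ("the cluster C is good")."""
    return all(phi(C) for C in clustering)

def fuse(clustering, C1, C2):
    """The fusion used in binary heredity (F3): remove C1 and C2
    from the clustering and add their union C1 ∪ C2."""
    rest = [C for C in clustering if C != C1 and C != C2]
    return rest + [C1 | C2]

def phi_small(C):
    # Hypothetical cluster predicate: a cluster is "good" when
    # it has at most three members (illustrative only).
    return len(C) <= 3
```

With a fusibility predicate $\psi$, an agglomerative step would fuse only pairs for which $\psi(C_1, C_2)$ holds, so that, by F3, feasibility is preserved from one level of the hierarchy to the next.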

# Clustering with a relational constraint

We return to the motivating problem of clustering spatial units in terms of their properties while taking into account the spatial adjacency relation, $R$. In general, the units are described by attribute data, and each unit $X$ is mapped by a mapping $a$ to its data values - its description $[X]$. Formally, $a : V \to [V]$. Further, the units are related by a binary relation $R \subseteq V \times V$. Together, these determine the *relational data* $(V, R, a)$.
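A minimal encoding of relational data $(V, R, a)$ can be sketched as plain Python containers. All values below are hypothetical, chosen only to make the structure concrete.

```python
# Hypothetical relational data (V, R, a): four spatial units,
# a symmetric adjacency relation R (as a set of ordered pairs),
# and a description a(X) = [X] for each unit X.
V = ["a", "b", "c", "d"]
R = {("a", "b"), ("b", "a"),
     ("b", "c"), ("c", "b"),
     ("c", "d"), ("d", "c")}
a = {"a": [1.0], "b": [1.2], "c": [5.0], "d": [5.1]}
```

Here the attribute data suggest that units "a" and "b" are similar to each other, as are "c" and "d", while the relation $R$ records which units are spatially adjacent.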

Clustering the units according to the similarity of their descriptions, while also considering the relation $R$, imposes *constraints* on the set of feasible clusterings as defined in Section 9.3.1. This relational constraint can be stated formally in a general fashion: the units in each cluster must also form a 'good' subgraph in $(V, R)$.

The property 'good' is specified formally in the next section by a predicate expressing the specific constraint.

In the case of regional units, the units are clustered in terms of their attributes, but the clustering is constrained so that the units in each cluster also form a region: a contiguous part of the territory covered by all of the units in the cluster.

Table 9.1 listed five types of relational constraints. For each of them, a corresponding fusibility predicate $\psi_i$ can be stated.

For $\psi_3$, the property F5 does not hold if $R$ is non-symmetric.

It is also necessary to consider the details of the clustering method used for clustering attributes in relation to fusing clusters. The fusibility condition $\psi(C_i, C_j)$ is equivalent to $C_i R C_j$ for the tolerant, leader, and strict clustering methods, and $\psi(C_i, C_j)$ is equivalent to $C_i R C_j \wedge C_j R C_i$ for the two-way clustering method.
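These fusibility conditions can be sketched as predicates on pairs of clusters. A minimal sketch of ours, assuming $C_i R C_j$ means there is at least one arc from a unit of $C_i$ to a unit of $C_j$; function names are hypothetical.

```python
def related(Ci, Cj, R):
    """Ci R Cj: at least one arc leads from a unit of Ci
    to a unit of Cj."""
    return any((u, v) in R for u in Ci for v in Cj)

def psi_one_way(Ci, Cj, R):
    """Fusibility for the tolerant, leader, and strict
    methods (as stated above): Ci R Cj."""
    return related(Ci, Cj, R)

def psi_two_way(Ci, Cj, R):
    """Fusibility for the two-way method:
    Ci R Cj and Cj R Ci."""
    return related(Ci, Cj, R) and related(Cj, Ci, R)
```

For a non-symmetric relation $R$, the two predicates can disagree: a pair of clusters joined by arcs in only one direction is fusible for the one-way methods but not for the two-way method.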

In the original approach of Ferligoj and Batagelj (1983), a complete dissimilarity matrix (with dissimilarities for all pairs of units) was required. This worked well for small problems, but for a system of the size considered here, a much faster algorithm is necessary. One way of obtaining one is to consider *only* the dissimilarities between *adjacent* units in $R$. Let $(V, R)$, where $R \subseteq V \times V$, be a graph, and let $S$ and $T$ be clusters with $\emptyset \subset S, T \subseteq V$ and $S \cap T = \emptyset$. A *block* (cf. the set of ties between units in a pair of positions in generalized blockmodeling, Doreian et al. (2005)) of the relation $R$ for $S$ and $T$ is $R(S, T) = R \cap (S \times T)$. We denote the *symmetric closure* of the relation $R$ by $\bar{R}$. It follows that $\bar{R} = R \cup R^{-1}$. Further, $\bar{R}(S, T) = \bar{R}(T, S)$. For the dissimilarities $d$ between adjacent units $s \in S$ and $t \in T$, we define a dissimilarity between clusters, $D(S, T)$, as

Here, we consider three options for extending $D$ to clusters: the minimum, the maximum, and the average values, which are defined as follows, with the updating formula following each definition.

**Minimum**

**Maximum**

**Average**

Let $w : V \to \mathbb{R}$ be a weight on the units; for example, $w(v) = 1$ for all $v \in V$.
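The idea of computing $D(S, T)$ only over adjacent pairs can be sketched as follows. This is our own illustration, not the book's implementation: it assumes the dissimilarity $d$ is stored as a dictionary over unordered pairs of units and that at least one pair in $S \times T$ is adjacent in the symmetric closure $\bar{R}$.

```python
def cluster_dissim(S, T, d, R, mode="min"):
    """D(S, T) computed only over adjacent pairs: (s, t) with
    s in S, t in T, and (s, t) in the symmetric closure of R.
    `d` maps unordered pairs (frozensets) to dissimilarities.
    Assumes at least one adjacent pair exists."""
    Rbar = R | {(v, u) for (u, v) in R}        # symmetric closure
    vals = [d[frozenset({s, t})]
            for s in S for t in T if (s, t) in Rbar]
    if mode == "min":
        return min(vals)
    if mode == "max":
        return max(vals)
    return sum(vals) / len(vals)               # average
```

Note that dissimilarities between non-adjacent units never enter the computation, which is what makes this approach fast for large sparse relations.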

Let $C_p$, $C_q$, and $C_s$ be three clusters, $C_p, C_q, C_s \subseteq V$, and let $t$ be a threshold. The dissimilarity $D$ has the *reducibility* property (Bruynooghe 1977) iff

$$D(C_p, C_q) \leq t \wedge D(C_p, C_s) \geq t \wedge D(C_q, C_s) \geq t \Rightarrow D(C_p \cup C_q, C_s) \geq t$$

The minimum, maximum, and average dissimilarities have the reducibility property.
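This claim can be checked empirically for the minimum and maximum methods. A sketch of ours: the update rules below are the standard single-linkage and complete-linkage updates, and the property is tested in one common threshold formulation, namely that $D(C_p, C_q) \leq t$ together with $D(C_p, C_s) \geq t$ and $D(C_q, C_s) \geq t$ must imply $D(C_p \cup C_q, C_s) \geq t$.

```python
def update_min(Dps, Dqs):
    """D(Cp ∪ Cq, Cs) for the minimum method (single linkage)."""
    return min(Dps, Dqs)

def update_max(Dps, Dqs):
    """D(Cp ∪ Cq, Cs) for the maximum method (complete linkage)."""
    return max(Dps, Dqs)

def reducibility_holds(update, Dpq, Dps, Dqs, t):
    """Threshold form of reducibility: if D(Cp,Cq) <= t while
    D(Cp,Cs) >= t and D(Cq,Cs) >= t, then the updated
    D(Cp ∪ Cq, Cs) must be >= t."""
    if Dpq <= t and Dps >= t and Dqs >= t:
        return update(Dps, Dqs) >= t
    return True  # premise not satisfied; nothing to check

# Brute-force check over a small grid of dissimilarity values.
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
ok = all(reducibility_holds(u, Dpq, Dps, Dqs, t)
         for u in (update_min, update_max)
         for Dpq in grid for Dps in grid
         for Dqs in grid for t in grid)
```

Intuitively, once both $D(C_p, C_s)$ and $D(C_q, C_s)$ are at least $t$, any value between their minimum and maximum, including a weighted average, is also at least $t$, which is why all three methods have the reducibility property.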