# Two-mode networks from data tables

Thus far, we have treated bibliographic networks only in terms of citations from one document to other documents. However, there is far more information potentially available about these documents. This can include the author(s) of publications, the content of the publication (usually expressed in terms of keywords provided by their authors or extracted from titles and abstracts), the organizational affiliations of the authors, the years that productions appeared, and the journals in which articles are published.^{[1]} Coupling these attributes to scientific productions is to mobilize two-mode networks (Batagelj and Cerinsek 2013).

The sources for the additional attribute data for scientific documents are secondary sources. Two-mode networks were introduced in Section 2.2.3. They can be used effectively to couple the additional information to documents. Most often, these primary data are available in the form of tables. Every such table can be transformed in a collection of compatible two-mode networks from which a variety of *derived networks* can be obtained.

Formally, a *data table* Tis a set of *records T= {Tk* : *k* e *fC],* where *fC* is the set of *keys.* A record has the form *Tk = (k, qx (k), q2(k),..., qr(k))* where *qt(k)* is the value of the *property* (attribute) q, for the key *k.*

Suppose the property, q, has, as values, subsets of the set *Q.* In the case of ordinary variables, the subset contains a single element. For example, describing bibliographies usually the properties include Authors, Keywords, and publication year (PubYear). For example, Wasserman and Faust (1994) is a book for which the set of keywords includes:

Authors(SNA) = { S. Wasserman, K. Faust},

PubYear(SNA) = { 1994},

Keywords(SNA) = { network, centrality, matrix,...},...

**Table 3.2 An example of bibliography descriptions.**

The list of publications can be expanded to incorporate a broader literature. Table 3.2 contains seven publications as an initial part of such an expansion. Here, Work is a key, with Authors and Pub Year as properties.^{[2]}

If *Q* is finite, we can assign to the property q a two-mode network *fC* X q = *(fC, Q, A, w)* where *(k, v)* e *A* iff *v* e *q(k),* and *w(k, v) =* 1. Note that the set *Q* can always be transformed into a finite set by partitioning it and recoding the values.

## Multiplication of two-mode networks

The different two-mode networks are obtained from bibliographic data, for example, include works by authors, works by journals, authors by institutional locations, and institutional locations by countries. Combining these discrete two-mode arrays is straightforward with matrix multiplication. 'Multiplying networks' is done by multiplying their matrices. Expressed more precisely, the product of two *compatible* networks is the network corresponding to the product of matrices corresponding to the given networks. Formally, for a simple (having no parallel arcs) two-mode *network J1/ =* ((I, *J), A, w);* where I and *J* are sets of *vertices, A* is a set of *arcs* linking I and *J,* and *w : A* -"• *R* is a *weight;* a *network matrix* W = is assigned with elements: *witj = w(i,j)* for e *A* and *witj =* 0 otherwise.

Two two-mode networks are compatible (for multiplication) iff the second set of vertices in the first network is equal to the first set of vertices in the second network. Given a pair of compatible two-mode networks, *MA =* ((I, *fC), AA, wA)* and *MB = ((fC, J), AB, wB)* with corresponding matrices Alxjc and B^j, the *product of networks MA* and *MB* is a network *Afc =* ((I, *J), Ac, wc),* where *Ac = : i* e *I,j* e *J, cUj* ^ 0} and *wc(i,j) = cUj* for e *Ac.* The product matrix C = [c(j]ixj = A * B is defined in the standard way:

The general scheme for multiplying two-mode networks is shown in Figure 3.7 where **A **represents the matrix for *MA* and **B **is the matrix for *MB.* (When I = *fC = J,* the multiplication is of ordinary one-mode networks with square matrices.)

**Figure 3.7 A visualization of two-mode network multiplication.**

In the expression for *ctj*, the only terms *aik • bkj* contributing to its value occur when both *aik* and *bkJ* are non-zero. The set *NA(i)* is made up of the *successors* of vertex / in network *JfA,* and is the set of *predecessors* of vertex *j* in network *JTB.* For *NA(i)* U *N~(J)* ^ 0,

Therefore, if all weights in networks *JfA* and *JfB* equal 1, then the product *aik* • Z>fc • = 1 and the value of *ctj* counts the number of ways we can go from / e *1* to *j* G *J* passing through *JC.* As shown in Figure 3.7, the element *ctj* has the value of 2.

The standard matrix multiplication has complexity *0(I* • *JC* • *J).* As noted in Section 2.3, it is too slow to be practical for large networks. Most of large networks are *sparse:* their matrices contain many more zero elements than non-zero elements. For sparse large networks, a much faster multiplication is possible by considering only the non-zero elements. In pseudo-code:

The multiplication of large sparse networks can be a 'dangerous' operation since the result *need not* be a sparse network. When this occurs, the resulting matrix 'explodes'. And when a sequence of compatible networks is multiplied, having one of them not be sparse raises a serious computational problem that must be addressed. Doing this takes the form of establishing conditions under which this explosion does not occur.

For a two-mode network, let *XT* and *V* be the two sets of vertices for the two modes. Further, let *nx = | XT* | and *n2 = | V.* Let £ denote the set of edges. The corresponding complete two-mode network, *Kn ^* is defined by: Vm e *V* and Vu e V, (k, u) e £. In words, there are arcs between all pairs of vertices in the two modes.

From the network multiplication algorithm, each intermediate vertex *k* e *fC* adds to a product network a complete two-mode subgraph *KN-^N ^* (or, in the case I = *J,* a complete subgraph *KN^).* If both degrees *degA(k) =* |AT~(&)| and *degB(k) = NB(k)* are large then already the computation of this complete subgraph has a quadratic (time and space) complexity. This multiplication ceases to be a practical option.

However, if at least one of the sparse networks *MA* and *MB* has small maximal degree on *fC,* the resulting product network *Mc* is sparse. Indeed, there is a stronger result: if for the sparse networks *MA* and *MB,* there are, in *fC,* only some vertices with large degree and none of them have large degree in both networks then also the resulting product network *Mc* is sparse. A proof follows:

Let

Then

Define also *Amin =* max^ *dmin(k)* and

let J* = argmind(|£(rf)| < *d)* and *1Z* = JC(d*).* Then *JC* < d** and the number of non-zero elements in the product:

Therefore: if, for the sparse networks *MA* and *MB,* the quantities *Amin* and *d** are small then also the resulting product network, *Mc,* is sparse. This result is equivalent to the above claim. If these conditions are satisfied, the matrix multiplication yields a matrix that does not explode. These conditions are satisfied for all of the citation networks in Table 3.1 and all of the bibliographic networks studied in Chapters 4-6.

A nice application of network multiplication is the computation of other kinship relations (is sister of, is uncle of, is cousin of, etc.) from the basic genealogical relations (is child of, is married to, is female) (Batagelj and Mrvar 2008).