Two-mode networks from data tables

Thus far, we have treated bibliographic networks only in terms of citations from one document to other documents. However, there is far more information potentially available about these documents. This can include the author(s) of publications, the content of the publication (usually expressed in terms of keywords provided by their authors or extracted from titles and abstracts), the organizational affiliations of the authors, the years that productions appeared, and the journals in which articles are published.[1] Coupling these attributes to scientific productions is to mobilize two-mode networks (Batagelj and Cerinsek 2013).

The sources for the additional attribute data for scientific documents are secondary sources. Two-mode networks were introduced in Section 2.2.3. They can be used effectively to couple the additional information to documents. Most often, these primary data are available in the form of tables. Every such table can be transformed in a collection of compatible two-mode networks from which a variety of derived networks can be obtained.

Formally, a data table Tis a set of records T= {Tk : k e fC], where fC is the set of keys. A record has the form Tk = (k, qx (k), q2(k),..., qr(k)) where qt(k) is the value of the property (attribute) q, for the key k.

Suppose the property, q, has, as values, subsets of the set Q. In the case of ordinary variables, the subset contains a single element. For example, describing bibliographies usually the properties include Authors, Keywords, and publication year (PubYear). For example, Wasserman and Faust (1994) is a book for which the set of keywords includes:

Authors(SNA) = { S. Wasserman, K. Faust},

PubYear(SNA) = { 1994},

Keywords(SNA) = { network, centrality, matrix,...},...

Table 3.2 An example of bibliography descriptions.

An example of bibliography descriptions.

The list of publications can be expanded to incorporate a broader literature. Table 3.2 contains seven publications as an initial part of such an expansion. Here, Work is a key, with Authors and Pub Year as properties.[2]

If Q is finite, we can assign to the property q a two-mode network fC X q = (fC, Q, A, w) where (k, v) e A iff v e q(k), and w(k, v) = 1. Note that the set Q can always be transformed into a finite set by partitioning it and recoding the values.

Multiplication of two-mode networks

The different two-mode networks are obtained from bibliographic data, for example, include works by authors, works by journals, authors by institutional locations, and institutional locations by countries. Combining these discrete two-mode arrays is straightforward with matrix multiplication. 'Multiplying networks' is done by multiplying their matrices. Expressed more precisely, the product of two compatible networks is the network corresponding to the product of matrices corresponding to the given networks. Formally, for a simple (having no parallel arcs) two-mode network J1/ = ((I, J), A, w); where I and J are sets of vertices, A is a set of arcs linking I and J, and w : A -"• R is a weight; a network matrix W = is assigned with elements: witj = w(i,j) for e A and witj = 0 otherwise.

Two two-mode networks are compatible (for multiplication) iff the second set of vertices in the first network is equal to the first set of vertices in the second network. Given a pair of compatible two-mode networks, MA = ((I, fC), AA, wA) and MB = ((fC, J), AB, wB) with corresponding matrices Alxjc and B^j, the product of networks MA and MB is a network Afc = ((I, J), Ac, wc), where Ac = : i e I,j e J, cUj ^ 0} and wc(i,j) = cUj for e Ac. The product matrix C = [c(j]ixj = A * B is defined in the standard way:

The general scheme for multiplying two-mode networks is shown in Figure 3.7 where A represents the matrix for MA and B is the matrix for MB. (When I = fC = J, the multiplication is of ordinary one-mode networks with square matrices.)

A visualization of two-mode network multiplication.

Figure 3.7 A visualization of two-mode network multiplication.

In the expression for ctj, the only terms aik • bkj contributing to its value occur when both aik and bkJ are non-zero. The set NA(i) is made up of the successors of vertex / in network JfA, and is the set of predecessors of vertex j in network JTB. For NA(i) U N~(J) ^ 0,

Therefore, if all weights in networks JfA and JfB equal 1, then the product aik • Z>fc • = 1 and the value of ctj counts the number of ways we can go from / e 1 to j G J passing through JC. As shown in Figure 3.7, the element ctj has the value of 2.

The standard matrix multiplication has complexity 0(IJCJ). As noted in Section 2.3, it is too slow to be practical for large networks. Most of large networks are sparse: their matrices contain many more zero elements than non-zero elements. For sparse large networks, a much faster multiplication is possible by considering only the non-zero elements. In pseudo-code:

The multiplication of large sparse networks can be a 'dangerous' operation since the result need not be a sparse network. When this occurs, the resulting matrix 'explodes'. And when a sequence of compatible networks is multiplied, having one of them not be sparse raises a serious computational problem that must be addressed. Doing this takes the form of establishing conditions under which this explosion does not occur.

For a two-mode network, let XT and V be the two sets of vertices for the two modes. Further, let nx = | XT | and n2 = | V. Let £ denote the set of edges. The corresponding complete two-mode network, Kn ^ is defined by: Vm e V and Vu e V, (k, u) e £. In words, there are arcs between all pairs of vertices in the two modes.

From the network multiplication algorithm, each intermediate vertex k e fC adds to a product network a complete two-mode subgraph KN-^N ^ (or, in the case I = J, a complete subgraph KN^). If both degrees degA(k) = |AT~(&)| and degB(k) = NB(k) are large then already the computation of this complete subgraph has a quadratic (time and space) complexity. This multiplication ceases to be a practical option.

However, if at least one of the sparse networks MA and MB has small maximal degree on fC, the resulting product network Mc is sparse. Indeed, there is a stronger result: if for the sparse networks MA and MB, there are, in fC, only some vertices with large degree and none of them have large degree in both networks then also the resulting product network Mc is sparse. A proof follows:

Let

Then

Define also Amin = max^ dmin(k) and

let J* = argmind(|£(rf)| < d) and 1Z* = JC(d*). Then JC* < d* and the number of non-zero elements in the product:

Therefore: if, for the sparse networks MA and MB, the quantities Amin and d* are small then also the resulting product network, Mc, is sparse. This result is equivalent to the above claim. If these conditions are satisfied, the matrix multiplication yields a matrix that does not explode. These conditions are satisfied for all of the citation networks in Table 3.1 and all of the bibliographic networks studied in Chapters 4-6.

A nice application of network multiplication is the computation of other kinship relations (is sister of, is uncle of, is cousin of, etc.) from the basic genealogical relations (is child of, is married to, is female) (Batagelj and Mrvar 2008).

  • [1] Overwhelmingly, the publications in scientific citation networks are journal articles, with journals being a primary institution of science.
  • [2] Single-valued properties can be represented more compactly by a partition. This is more easily done for publication years.
 
< Prev   CONTENTS   Next >