Sparse matrix test collections
Comparisons between sparse matrix strategies and computer programs are difficult because of the enormous dependence on implementation details and because the various ordering methods (introduced in Section 5.3 and discussed in detail in Chapters 7-9 are heuristic. This means that comparisons between them will be problem dependent. These concerns led to the development of a set of test matrices by Duff and Reid (1979) extended by Duff, Grimes, and Lewis (1989). A nice interface for accessing both the problems and associated statistics has been designed at NIST (MatrixMarket 2000) and the original Harwell-Boeing format was extended to the Rutherford-Boeing format by Duff, Grimes, and Lewis (1997). The GRID-TLSE project in Toulouse offers a web-friendly interface to direct sparse solvers and sparse matrix test problems (Amestoy, Duff, Giraud, L’Excellent, and Puglisi 2004).
Currently, the most extensive collection of sparse matrices is the University of Florida Sparse Matrix Collection (Davis and Hu 2011). Not only are the matrices available in several formats, but there is a substantial discussion of each with several pictures, not just of the matrix. Additionally, there is software to enable the extraction of matrices with various characteristics including the application domain.
A major objective of the test collections has been to represent important features of practical problems. Sparse matrix characteristics (such as matrix size, number of entries, and matrix pattern including closeness to symmetry) can differ among matrices arising from, for example, structural analysis, circuit design, or linear programming. The test problems from these different application areas vary widely in their characteristics and often have very distinctive patterns. Pictures of some of these patterns are included in Appendix B. While their patterns are often not regular, they are certainly not random.
Throughout this book, we will use matrices to illustrate our discussion, to demonstrate the performance of our algorithms, and to show the effect of varying parameters. The default is that these will be matrices from the Florida Collection. Those not from this collection will be flagged by a footnote and the accompanying text will indicate their origin. Where we show the numbers of entries in a table, we count explicit zeros because the analysis performed may be required later for another matrix having the same pattern where some of these entries are now nonzero. We count the entries in both the upper and lower triangular parts of a symmetric matrix, although some algorithms require only one of the triangles. These test matrices will be symmetric, unsymmetric, or even rectangular determined by the algorithm or code being considered. As some approaches exploit patterns that are nearly symmetric, we will sometimes display the symmetry index, as defined by Erisman, Grimes, Lewis, Poole Jr., and Simon (1987), which is the proportion of off-diagonal entries for which there is a corresponding entry in the transpose, so that a symmetric matrix has a symmetry index of 1.0.
Another group of matrices that we use arises from solving partial differential equations on a square q x q grid, pictured for q = 5 in Figure 1.7.1. The simplest finite-element problem has square bilinear elements. Here, there is a one-to-one correspondence between nodes in the grid and variables in the system of equations. The picture of the grid is also a picture of the graph. The resulting symmetric matrix A has order n = (q + 1)2, and each row has entries in nine columns corresponding to a node and its eight nearest neighbours. Such a pattern can also arise from a 9-point finite-difference discretization on the same grid. Another test case arises from the 5-point finite-difference discretization, in which case each row has off-diagonal entries in columns corresponding to nodes connected directly to the corresponding node by a grid line. Further regular problems arise from the discretization of three-dimensional problems using 7-point, 11-point, or 27-point approximations. These matrices are important because they typify matrices that occur when solving partial differential equations and because very large test problems can be generated easily.
Fig. 1.7.1. A 5x5 discretization.
Another way of generating large sets of matrices is to use random number generators to create both the pattern and the nonzero values. Early testing of sparse matrix algorithms was done in this way. While small random test cases are useful for testing sparse matrix codes, we do not recommend their use for performance testing because of their lack of correlation to real problems.
When using an existing sparse matrix code for particular applications, the user will be confronted with a variety of choices as will be discussed in the rest of the book. Since many of the algorithms discussed here are based on heuristics, we recommend that users experiment with these choices for their own applications. The test problems provide an invaluable source for this purpose. Other readers may be more interested in extending research: which algorithms work best depending on size of matrix, application area, computer architecture, and many other factors? The test problems are useful for this purpose as well.
The Florida Collection will also be used in several of the Research Exercises that we have included in many of the chapters.