Hypersparsity and linear programming
In the previous section, we assumed that the right-hand side or the solution was full or nearly so. We exploited the occasional zero by the test on zero in the code of Figure 10.5.1. However, if the right-hand side is very sparse and the matrix is reducible, more tests for zero may be performed than actual arithmetic operations. This situation often occurs in linear programming and was considered by Hall and McKinnon (2005), who call it hypersparsity and report speedups by a factor of about 5 from exploiting it.
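To see why the test on zero alone is not enough, consider forward substitution with L held by columns, sketched here in Python (a hypothetical illustration with our own names and data layout; the book's codes are in Fortran, and this is not Figure 10.5.1 itself). Every column index is still visited, so a very sparse right-hand side incurs n zero tests but few arithmetic operations:

```python
def forward_solve_zero_test(Lcol, n, b):
    """Solve L*x = b for unit lower triangular L held by columns.
    Lcol[j] is a list of pairs (i, L[i,j]) with i > j.
    b is a full-length list of right-hand side values."""
    x = list(b)
    for j in range(n):          # every column is visited ...
        xj = x[j]
        if xj == 0.0:           # ... even if only to test for zero
            continue
        for i, lij in Lcol.get(j, []):
            x[i] -= lij * xj    # arithmetic only where x[j] != 0
    return x
```

If b has a single entry, the loop still performs n tests; a hypersparse code must instead work only with the nonzero pattern, which is what the Gilbert-Peierls algorithm discussed next achieves.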
One of the mechanisms that they use is the Gilbert-Peierls algorithm, which we met in Section 10.4. There, we were applying a sequence of elementary Gaussian elimination operations to a column of the matrix that usually begins and ends as sparse. Exactly the same process is applicable to solving
    Lz = b
when L is held by columns and both L and b are sparse. The total computational effort needed is proportional to the number of nonzero arithmetic operations. For the back-substitution
    Ux = z,
it is important for hypersparsity to hold U by columns, for then the Gilbert-Peierls algorithm can again be applied.
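A minimal sketch of this process in Python (our own naming and data layout, not taken from the book): a depth-first search over the pattern of L finds the solution entries reachable from the nonzeros of b, in topological order, and the numeric phase then visits only those columns, so the total work is proportional to the nonzero operations:

```python
def reach(Lcol, b_pattern):
    """Indices reachable in the graph of L (edge j -> i for each
    entry L[i,j]) from the nonzeros of b, in topological order."""
    visited, post = set(), []
    for s in b_pattern:
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(Lcol.get(s, ())))]
        while stack:
            j, it = stack[-1]
            for i, _ in it:
                if i not in visited:
                    visited.add(i)
                    stack.append((i, iter(Lcol.get(i, ()))))
                    break
            else:               # column j fully explored
                post.append(j)
                stack.pop()
    return post[::-1]           # reverse postorder

def gp_lower_solve(Lcol, b):
    """Solve L*z = b, unit lower triangular L held by columns
    (Lcol[j] = list of (i, L[i,j])), sparse b given as a dict
    {index: value}.  Returns the sparse solution as a dict."""
    x = dict(b)
    for j in reach(Lcol, b):
        xj = x.get(j, 0.0)
        if xj == 0.0:           # exact cancellation: nothing to do
            continue
        for i, lij in Lcol.get(j, []):
            x[i] = x.get(i, 0.0) - lij * xj
    return x
```

The same routine serves for the back-substitution once U is held by columns, since only triangularity of the pattern, not the ordering of the indices, is used.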
If it is desired to solve several systems
    A^T x = b
for the transposed matrix with hypersparsity, which is the case for linear programming where A is the basis matrix, it is worthwhile to hold copies of L and U by rows so that the factorization
    A^T = U^T L^T
is held by columns and the Gilbert-Peierls algorithm is applicable.
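Forming the row-held copies is a single pass over the column-held factors. In the sparse-dict layout of the earlier sketches (our own, hypothetical, representation), a row-held L is exactly a column-held L^T, so the same solve can be reused unchanged for the transposed system:

```python
def held_by_rows(Acol):
    """Convert a column-held sparse matrix {j: [(i, v), ...]} into a
    row-held copy {i: [(j, v), ...]}.  The result, read as columns,
    is the transpose of the input."""
    Arow = {}
    for j, entries in Acol.items():
        for i, v in entries:
            Arow.setdefault(i, []).append((j, v))
    return Arow
```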
Hall and McKinnon (2005) pay detailed attention to hypersparsity for every part of the linear programming calculation. It is outside the scope of this book to describe it all and we refer the reader to their paper. However, we will mention one simple device that may be applicable elsewhere. There is a need to calculate the product p^T N of a sparse vector p (found by a hypersparse solution) with the sparse matrix N that is the nonbasic part of the constraint matrix. Usually, N is held by columns. In this case, each inner product p^T N_{:j} for a single column may be computed with effort proportional to the number of entries in N_{:j}. However, many of the inner products p^T N_{:j} may be zero although N_{:j} is not, so much of this effort is wasted. The solution is to hold a copy of N by rows and form p^T N as the sum over j of p_j N_{j:}. Now only nonzero products are calculated.
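The device can be sketched in a few lines (again with our own hypothetical layout): with N held by rows, only the rows matching nonzeros of p are touched, and zero inner products cost nothing:

```python
def sparse_vT_times_N(p, Nrow):
    """Form p^T N for sparse p ({j: value}) and N held by rows
    ({j: [(k, N[j,k]), ...]}).  Only nonzero products are computed;
    rows of N not matched by an entry of p are never visited."""
    out = {}
    for j, pj in p.items():
        if pj == 0.0:
            continue
        for k, njk in Nrow.get(j, []):
            out[k] = out.get(k, 0.0) + pj * njk
    return out
```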
Table 10.7.1 Speeds (GFlops) for vector operations on a dual-socket 3.10 GHz Xeon E5-2687W, using the Intel compiler with -O2 optimization. x is addressed directly or indirectly; y is addressed directly.

                 x = x + ay                    a = x^T y
         Direct            Indirect    Direct            Indirect
  Size   BLAS   In line    In line     BLAS   In line    In line
     5    0.5    1.6        1.2         0.6    1.2        1.1
    10    0.8    1.7        1.2         1.0    1.8        1.6
    20    1.4    1.8        1.4         1.6    2.0        1.6
    50    1.6    1.7        1.4         1.9    2.1        1.7
   100    1.5    1.7        1.4         2.0    2.0        1.7
   200    1.5    1.5        1.4         2.1    1.9        1.5
   500    1.6    1.8        1.4         2.2    2.3        1.7
  1000    1.7    1.8        1.4         2.2    2.3        1.8