We now consider the use of the factors generated by the algorithm of Section 10.3 or 10.4 for the solution of a set of linear equations by forward substitution and back-substitution. This is the simplest and fastest phase of the computation, although it is important to do it efficiently since there are several applications where many solutions (tens or hundreds) are performed for each FACTORIZE.
Fig. 10.5.1. Code for solving a system with lower triangular matrix stored by columns.
Examples occur when using extrapolation methods to solve stiff systems of ordinary differential equations (Duff and Nowak 1987) or in the solution of problems in optimization, such as the Schur-complement updating method (Gill, Murray, Saunders, and Wright 1990) where Gould and Toint (2002) report that there are often over a thousand solves required for each factorization. We also consider taking advantage of sparsity in the right-hand side vector.
If the storage pattern for the factors is different from that used by the algorithms of this chapter, the SOLVE phase will also be different. Examples are the frontal and multifrontal schemes of Sections 11.9-11.12. Appropriate variations of SOLVE are discussed in Chapter 14.
If every entry of the right-hand side is nonzero, the amount of arithmetic needed to solve a triangular system is clearly proportional to the number of entries in the triangular matrix, each being used in a single computation. So long as we can get ready access to the entries by rows or by columns, the SOLVE phase is easy to implement. We show in Figure 10.5.1 code for the solution of a lower triangular system where the factors are stored as in Figure 10.3.2. The computation involves subtracting a sequence of packed vectors from the working vector initialized to the right-hand side. The code for this is quite simple (see Figure 10.5.1). Observe that this is simpler than adding two packed vectors (see Section 2.4) since x is in dense form. This formulation can take advantage of sparsity in the right-hand side and the whole inner loop may be avoided for zero multipliers. The only defect with this scheme is that a component of the right-hand side is updated for every multiplication in the inner loop.
If access to L is by rows, the inner loop consists of subtracting a sparse inner product from each component of x in turn. The code for this is also quite simple (see Exercise 10.7). It has the merit of writing to each component of x just once, so better use of the cache will be made. The disadvantage is that it is not possible to take advantage of any zeros in the right-hand side since any test would have to be performed within the inner loop and, on many computing systems, would be as expensive as the actual numerical calculation.
In the back-substitution phase, the solution will be full unless the matrix is reducible, see Section 7.9. Hence there is often no sparsity in the solution vector to exploit in this phase, but there are occasions when only a partial solution is wanted. In this case, a substantial saving can be made if U is held by rows and the required components are all near the bottom of the solution vector. Therefore, to take advantage of both zeros in the right-hand side and partial solutions, it is advantageous to hold L by columns and U by rows. In the symmetric case, if we store L by columns, we implicitly store U = LT by rows.
Note that, although the storage needed for the factors of a symmetric matrix is about half that needed if the symmetry is ignored, the work needed to solve the system by forward and back-substitution is about the same since the factor is used twice.
So far in this chapter, we have not considered the details of handling permutations because this would have obscured the main points of our discussion. However, in most cases permutations will be required and we examine their role here since their implementation can have a significant effect on the efficiency of the solution. We consider the solution of the equation
followed by the solution of the equation
and the calculation
where P and Q are permutation matrices, held as ordering vectors (see Section 2.17). The solution proceeds as follows:
- (i) Permute b to Pb and store it in the full-length vector w.
- (ii) Solve Ly = w and Uz = y.
- (iii) Permute z to Qz and store the result in x.
Although it appears that five arrays (for b, w, y, z, and x) are required, it is normal to hold w, y, and z in the same array. Therefore, three arrays are needed if b is to be kept and only two are needed if the solution vector x overwrites b.
If we hold the permutations as sequences of interchanges (see Exercise 10.8), it is possible to use a single array for the solution process, since the sequence of interchanges can be used to reorder b and z in place.
The scheme of the last two paragraphs requires that permuted indices are stored for L and U. An alternative is to store the original indices. In this case, the working vector is not permuted initially but is permuted between the forward substitution and the back-substitution.
Whatever scheme is used internally by SOLVE, it is of course important that the user of the software is not involved in the level of detail that we have discussed in this section. A user should have to supply only the vector b and should receive the solution x, rather than a permutation of it.
-  This form is very convenient for factorization, since the columns are updated one by one,but cache considerations may make it more efficient to hold L and U separately in the solutionphase.