Section I: Background and Motivation
Introduction and Background
Optimization is a critical step in a wide range of engineering disciplines, including structural design optimization, topology optimization, space trajectory planning, product design, proxy model optimization, and the inverse problems found in several applications. Engineering Optimization, sometimes referred to as Design Optimization, utilizes optimization methods and algorithms to achieve desired design goals in engineering. The engineer or designer uses mathematical optimization to support decision making or to improve a given design. In particular, real-valued function minimization is one of the most common optimization problems, in which a parameter space is searched for the minimum value of the objective function and the corresponding parameter values.
In searching for an optimal design, a first step is to mathematically formulate the engineering problem. An optimization problem usually has the following mathematical format:
$$\begin{aligned}
\min_{x} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le b_i, \quad i = 1, \cdots, m, \\
& h_j(x) = 0, \quad j = 1, \cdots, l.
\end{aligned} \tag{1.1}$$
The function $f: \mathbb{R}^n \to \mathbb{R}$ is a scalar objective function that is constructed based on the goal of the optimization. The vector $x = [x_1, \cdots, x_n]^T$ contains the $n$ design variables, which the designer can vary to achieve the optimization goal. The functions $g_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, \cdots, m$, are the inequality constraint functions, which are set up based on the problem constraints and limits. The scalars $b_i$, $i = 1, \cdots, m$, are the limits, which are also derived from the problem constraints. The functions $h_j: \mathbb{R}^n \to \mathbb{R}$, $j = 1, \cdots, l$, are the equality constraint functions. The set of all feasible points $x$ (feasible solutions) is called the design space.
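For instance, a small problem with two design variables, one inequality constraint, and one equality constraint (an illustrative example only, not drawn from a particular application) can be written in the form of (1.1) as
$$\begin{aligned}
\min_{x_1, x_2} \quad & (x_1 - 2)^2 + (x_2 - 1)^2 \\
\text{subject to} \quad & x_1 + x_2 \le 2, \\
& x_1 - 2x_2 = 0.
\end{aligned}$$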
Once the optimization problem is formulated in the form of (1.1), the engineer then needs to decide how to solve this specific mathematical optimization problem. Solving this problem means finding the optimal value $x^*$ for the variable vector $x$ that makes the function value $f(x^*)$ minimum among all the feasible vectors $x$. The solution method or algorithm depends on several factors. There are a few classes of optimization problems, and for each class there are specific methods that can solve optimization problems in that class. Optimization problems may be classified based on the existence of constraints, the nature of the design variables, and the nature of the equations involved. For example, if the objective function $f$ and all the constraint functions $g_i$ and $h_j$ are linear, then this optimization problem is called a linear programming problem. Linear programming problems arise naturally in operations research. There are methods that can solve this type of problem very effectively, such as the simplex method. If any of the objective or constraint functions is nonlinear, then the problem is called a nonlinear programming problem. There are a few special cases that can be solved very effectively, such as least-squares problems. Otherwise, in general, there is no effective method for solving the general nonlinear programming problem without compromise.
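The following sketch shows how such problems might be handed to off-the-shelf solvers, assuming NumPy and SciPy are available; the linear program is made up for illustration, and the nonlinear problem is the illustrative one written above in the form of (1.1).

```python
import numpy as np
from scipy.optimize import linprog, minimize

# A small, made-up linear program:  minimize -x1 - 2*x2
# subject to x1 + x2 <= 4, x1 + 3*x2 <= 6, x1, x2 >= 0.
# linprog's default HiGHS backend includes a simplex implementation.
lp = linprog(c=[-1.0, -2.0],
             A_ub=[[1.0, 1.0], [1.0, 3.0]],
             b_ub=[4.0, 6.0],
             bounds=[(0, None), (0, None)])
print("LP  solution:", lp.x, lp.fun)

# The illustrative nonlinear problem in the form of (1.1):
# minimize (x1 - 2)^2 + (x2 - 1)^2  s.t.  x1 + x2 <= 2,  x1 - 2*x2 = 0.
f = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2
cons = [{'type': 'ineq', 'fun': lambda x: 2.0 - (x[0] + x[1])},  # SciPy 'ineq' means >= 0
        {'type': 'eq',   'fun': lambda x: x[0] - 2.0*x[1]}]
nlp = minimize(f, x0=np.zeros(2), method='SLSQP', constraints=cons)
print("NLP solution:", nlp.x, nlp.fun)
```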
Engineering optimization then involves the formulation of the engineering problem as a mathematical optimization problem, and then solving this problem. There can be multiple mathematical formulations for the same engineering problem, and different formulations may differ in how difficult they are to solve. Problem formulation is therefore a crucial step, and this book presents case studies on how to formulate engineering optimization problems.
Mathematical Background
Definitions
A vector $x \in \mathbb{R}^n$ has $n$ scalar components,
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = [x_1 \ x_2 \ \cdots \ x_n]^T,$$
where the superscript $T$ denotes the transpose.
The norm of a vector, denoted $\|\cdot\|$, is a real-valued function $\|x\| : \mathbb{R}^n \to \mathbb{R}$. There are various norms:

$l_1$-norm: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$

$l_2$-norm: $\|x\|_2 = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}$ (the distance or length of $x$)

$l_p$-norm: $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$

$l_\infty$-norm: $\|x\|_\infty = \max_i |x_i|$
The norm of a vector satisfies the following properties:
1. $\|x\| \ge 0$, $\forall x$,
2. $\|x\| = 0$ if and only if $x = 0$,
3. $\|x + y\| \le \|x\| + \|y\|$ (the triangle inequality).
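These definitions and properties can be checked numerically; the following is a minimal sketch using NumPy with arbitrary vectors.

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l1   = np.sum(np.abs(x))             # l1-norm  -> 7.0
l2   = np.sqrt(np.sum(x**2))         # l2-norm  -> 5.0
p    = 3
lp   = np.sum(np.abs(x)**p)**(1/p)   # lp-norm with p = 3
linf = np.max(np.abs(x))             # l-infinity norm -> 4.0

# The same values via numpy.linalg.norm:
assert np.isclose(l1,   np.linalg.norm(x, 1))
assert np.isclose(l2,   np.linalg.norm(x, 2))
assert np.isclose(linf, np.linalg.norm(x, np.inf))

# Property 3 (triangle inequality) for a second arbitrary vector y:
y = np.array([1.0, 2.0, -2.0])
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y)
```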
Differentiability and Taylor’s Theorem
The real-valued function $f(x): \mathbb{R} \to \mathbb{R}$ is said to be differentiable at a point $a$ if the limit
$$\lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$$
exists. If the limit exists, it is called the derivative of $f$ at $a$, and is denoted $f'(a)$. The function $f$ is a differentiable function if it is differentiable at each point in its domain.
For functions of multiple variables, the function $f(x): \mathbb{R}^n \to \mathbb{R}$ is said to be differentiable at a point $a$ if there exists a linear transformation $L: \mathbb{R}^n \to \mathbb{R}$ that satisfies the condition
$$\lim_{h \to 0} \frac{\left| f(a + h) - f(a) - L(h) \right|}{\|h\|} = 0.$$
The vector associated with the linear transformation $L$ is the vector of partial derivatives (the gradient); that is, $L(h) = \nabla f(a) \cdot h$.
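These two definitions can be seen in action by approximating the limit with a small step $h$; the sketch below uses arbitrary test functions whose derivatives are known in closed form.

```python
import numpy as np

# Single-variable case: f(x) = x**3, so f'(a) = 3*a**2.
f = lambda t: t**3
a, h = 2.0, 1e-6
fd = (f(a + h) - f(a)) / h             # difference quotient from the limit definition
print(fd, 3*a**2)                      # ~12.000006 versus the exact 12.0

# Multivariable case: g(x) = x1**2 + 3*x1*x2, gradient = [2*x1 + 3*x2, 3*x1].
def g(x):
    return x[0]**2 + 3*x[0]*x[1]

a = np.array([1.0, 2.0])
grad = np.array([2*a[0] + 3*a[1], 3*a[0]])
d = np.array([1e-6, -2e-6])            # a small displacement h
# The linear map L(h) = grad(a) . h approximates g(a + h) - g(a):
print(g(a + d) - g(a), grad @ d)
```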
Let $f(x)$ be a function of a single variable, and let $f(x)$ be infinitely differentiable in some open interval around $x = x_0$. The Taylor series of $f(x)$ around $x_0$ is
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots$$
Hence, a quadratic approximation for $f(x)$ would be
$$f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2.$$
For functions of multiple variables, assume a function $f(x): \mathbb{R}^n \to \mathbb{R}$ is infinitely differentiable in some open region around $x = x_0$. The Taylor series expansion can be written as
$$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2}(x - x_0)^T [H(x_0)] (x - x_0) + \cdots$$
where $H$ is the matrix of second derivatives, called the Hessian matrix. For the case when $n = 2$, the Hessian matrix is
$$H = \begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix}$$
where $f_{ij}$ is the second-order partial derivative with respect to the variables $x_i$ and $x_j$. The quadratic approximation for the function $f(x)$ is
$$f(x) \approx f(x_0) + \nabla f(x_0)^T (x - x_0) + \frac{1}{2}(x - x_0)^T [H(x_0)] (x - x_0).$$
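The quadratic approximation can be made concrete with a short numerical sketch; the two-variable test function below, together with its analytic gradient and Hessian, is illustrative only.

```python
import numpy as np

# Test function f(x) = x1^2 * x2 + exp(x2)
def f(x):
    return x[0]**2 * x[1] + np.exp(x[1])

def grad(x):
    return np.array([2*x[0]*x[1], x[0]**2 + np.exp(x[1])])

def hess(x):
    return np.array([[2*x[1], 2*x[0]],
                     [2*x[0], np.exp(x[1])]])

x0 = np.array([1.0, 0.5])
dx = np.array([0.1, -0.05])            # small step away from x0

# Second-order (quadratic) Taylor approximation about x0:
quad = f(x0) + grad(x0) @ dx + 0.5 * dx @ hess(x0) @ dx
print(f(x0 + dx), quad)                # the two values should be close
```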
Orthogonal Vectors
For any two vectors $x$ and $y$ in the $n$-dimensional Euclidean space $\mathbb{R}^n$, the scalar (inner) product is defined as:
$$(x, y) = x^T y = \sum_{i=1}^{n} x_i y_i$$
where each of the $x_i$ and $y_i$ is the $i$th component of the vectors $x$ and $y$, respectively, and $x^T$ is the transpose of the vector $x$.
The projection of a vector $x$ on a vector $y$ is defined as:
$$\mathrm{proj}_y\, x = \frac{(x, y)}{(y, y)}\, y = \frac{x^T y}{y^T y}\, y.$$
Two vectors are said to be orthogonal if they are perpendicular to each other; that is, the scalar product of the two vectors is zero. The vectors $\{x^1, x^2, \cdots, x^n\}$ are said to be mutually orthogonal if every pair of vectors is orthogonal. A set of vectors is said to be orthonormal if every vector is of unit magnitude and the vectors are mutually orthogonal. A set of orthogonal (nonzero) vectors is also linearly independent.
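A brief numerical illustration of the scalar product, the projection, and an orthogonality check follows; the vectors are arbitrary.

```python
import numpy as np

x = np.array([2.0, 1.0, 0.0])
y = np.array([1.0, 3.0, 2.0])

inner = x @ y                          # scalar product x^T y
proj  = (x @ y) / (y @ y) * y          # projection of x onto y

# The residual x - proj is orthogonal to y by construction.
r = x - proj
print(inner, proj, r @ y)              # r @ y should be ~0
```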
Consider for example the two vectors $x^T = [8 \times 10^{-10},\ 1.2 \times 10^6]$ and $y^T = [2 \times 10^{-9},\ 1.5 \times 10^5]$ (these vectors could be measurements of some process). If we try to measure the size of each vector using the Euclidean norm, then we get $\|x\|_2 \approx |x(2)|$ and $\|y\|_2 \approx |y(2)|$; this is because the first component is very small compared to the second component. This means we lose information if we use the Euclidean norm. If we select a diagonally weighted norm $\|x\|_Q = \sqrt{x^T [Q]\, x}$, where $[Q]$ is a diagonal matrix with positive elements, chosen to normalize each entry, then we will get a better representation. For instance, if
we let
$$[Q] = \begin{bmatrix} 10^{18} & 0 \\ 0 & 10^{-12} \end{bmatrix},$$
then
$$[Q]^{1/2} x = \begin{bmatrix} 0.8 \\ 1.2 \end{bmatrix} \quad \text{and} \quad [Q]^{1/2} y = \begin{bmatrix} 2 \\ 0.15 \end{bmatrix},$$
in which the two components are comparable. Note that
$$\|x\|_Q = \sqrt{x^T [Q]\, x} = \left\| [Q]^{1/2} x \right\|_2.$$
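The scaling in this example can be reproduced numerically; the following is a minimal sketch, assuming NumPy is available, using the vectors and the diagonal matrix $[Q]$ given above.

```python
import numpy as np

x = np.array([8e-10, 1.2e6])
y = np.array([2e-9, 1.5e5])

# Euclidean norms: dominated entirely by the second component.
print(np.linalg.norm(x), np.linalg.norm(y))

Q  = np.diag([1e18, 1e-12])            # diagonal weighting matrix
Qh = np.sqrt(Q)                        # [Q]^(1/2), elementwise since Q is diagonal

print(Qh @ x, Qh @ y)                  # scaled vectors with comparable components
xQ = np.sqrt(x @ Q @ x)                # weighted norm ||x||_Q = sqrt(x^T Q x)
print(xQ, np.linalg.norm(Qh @ x))      # the two values agree
```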

The general form of the scalar product is known as the Hermitian form, which is defined as:
$$(x, y)_Q = y^\dagger [Q]\, x$$
where $y^\dagger$ is the conjugate transpose of $y$ and $[Q]$ is any Hermitian positive definite matrix. A Hermitian matrix is a complex square matrix that is equal to its own conjugate transpose. In the case of real numbers, this scalar product can be thought of as the scalar product of the directionally scaled vectors, using positive scaling. So, if $x$ and $y$ are two vectors in $\mathbb{R}^n$ and $[Q]$ is a symmetric positive definite $n \times n$ matrix, then $(x, y)_Q = x^T [Q]\, y$ is a scalar product. If $(x, y)_Q = 0$, then the vectors $x$ and $y$ are said to be Q-orthogonal.
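As a small check of Q-orthogonality with a non-diagonal weighting (the symmetric positive definite $[Q]$ and the vectors are chosen arbitrarily):

```python
import numpy as np

Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # symmetric positive definite

x = np.array([1.0, 0.0])
y = np.array([1.0, -2.0])              # chosen so that x^T Q y = 0

print(x @ Q @ y)                       # Q-scalar product: 0, so x and y are Q-orthogonal
print(x @ y)                           # the ordinary scalar product need not be zero
```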
Gram-Schmidt Orthogonalization
Orthogonalization is the process of finding a set of orthogonal vectors that span a particular subspace. This section presents a systematic process that can be used to construct a set of $n$ orthogonal vectors given a set of $n$ linearly independent vectors in $\mathbb{R}^n$. The process can be illustrated by a simple example. Consider two linearly independent vectors $x^1 = [0\ 1]^T$ and $x^2 = [1\ 1]^T$. These two vectors are not orthogonal. It is possible, however, to construct two orthogonal vectors given $x^1$ and $x^2$ as follows. Let the two orthogonal vectors be $y^1$ and $y^2$. We start the process by selecting $y^1 = x^1$. To compute $y^2$, we follow this process:
$$y^2 = x^2 - \frac{(x^2, y^1)}{(y^1, y^1)}\, y^1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \frac{1}{1} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
The two vectors $y^1$ and $y^2$ are orthogonal. This process can be generalized for $n$ vectors as follows. Given a set of linearly independent vectors $x^1, x^2, \cdots, x^n$ in $\mathbb{R}^n$, we can construct a set of $n$ orthogonal vectors using the process:
$$y^1 = x^1, \qquad y^k = x^k - \sum_{i=1}^{k-1} \frac{(x^k, y^i)}{(y^i, y^i)}\, y^i, \quad k = 2, \cdots, n.$$
Figure 1.1 illustrates the geometrical meaning of this process in three-dimensional space.

Figure 1.1: Gram-Schmidt Orthogonalization in 3-dimensional space.
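The process can also be written compactly in code; the sketch below is a direct (classical) Gram-Schmidt implementation of the formula above, applied to arbitrary linearly independent vectors.

```python
import numpy as np

def gram_schmidt(X):
    """Orthogonalize the columns of X (assumed linearly independent)."""
    Y = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        y = X[:, k].astype(float)
        for i in range(k):
            # subtract the projection of x^k onto each previous y^i
            y -= (X[:, k] @ Y[:, i]) / (Y[:, i] @ Y[:, i]) * Y[:, i]
        Y[:, k] = y
    return Y

# Example: three linearly independent vectors in R^3 (the columns of X)
X = np.array([[0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
Y = gram_schmidt(X)
print(np.round(Y.T @ Y, 10))           # off-diagonal entries should be ~0
```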
Q-Orthogonal (Q-Conjugate) Directions
Similar to the development presented in Section 1.1.3.1, it is possible to construct a set of $n$ Q-orthogonal (Q-conjugate) directions given a set of $n$ linearly independent vectors $x^1, x^2, \cdots, x^n$ in $\mathbb{R}^n$ using the following process:
$$y^1 = x^1, \qquad y^k = x^k - \sum_{i=1}^{k-1} \frac{(x^k, y^i)_Q}{(y^i, y^i)_Q}\, y^i, \quad k = 2, \cdots, n,$$
where $(x, y)_Q = x^T [Q]\, y$.
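The same construction with the Q-scalar product is sketched below; $[Q]$ and the starting vectors are arbitrary.

```python
import numpy as np

def q_conjugate(X, Q):
    """Build Q-orthogonal (Q-conjugate) directions from the columns of X."""
    Y = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        y = X[:, k].astype(float)
        for i in range(k):
            # subtract the Q-projection onto each previous direction
            y -= (X[:, k] @ Q @ Y[:, i]) / (Y[:, i] @ Q @ Y[:, i]) * Y[:, i]
        Y[:, k] = y
    return Y

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])         # symmetric positive definite
X = np.eye(3)                           # standard basis vectors as the starting set
Y = q_conjugate(X, Q)
print(np.round(Y.T @ Q @ Y, 10))        # should be diagonal: (y^i, y^j)_Q = 0 for i != j
```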

Convergence Rates
In most of the numerical optimization methods that are presented in this book, the search for the optimal solution is carried out iteratively. That is, at each iteration $k$, the algorithm finds a current guess (iterate) $x^k$ for the optimal solution. The optimal solution is denoted as $x^*$. Different algorithms converge to the optimal solution at different rates. Let $(x^k)$ be a sequence of iterates converging to a local optimal solution $x^*$. The sequence is said to converge:
Linearly: if there exists $c$, where $0 < c < 1$, and $k_{max} > 0$ such that $\forall k > k_{max}$:
$$\|x^{k+1} - x^*\| \le c\, \|x^k - x^*\|.$$
Superlinearly: if there exists a null sequence $c_k$ (a sequence converging to zero) of positive numbers, and $k_{max} > 0$, such that $\forall k > k_{max}$:
$$\|x^{k+1} - x^*\| \le c_k\, \|x^k - x^*\|.$$
Quadratically: if there exists $c > 0$ and $k_{max} > 0$, such that $\forall k > k_{max}$:
$$\|x^{k+1} - x^*\| \le c\, \|x^k - x^*\|^2.$$
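The difference between these rates can be observed empirically; the sketch below compares a linearly convergent fixed-point iteration with Newton's method (which converges quadratically) for computing $\sqrt{2}$. The particular iterations are illustrative only.

```python
import math

x_star = math.sqrt(2.0)

# Linearly convergent fixed-point iteration: x <- x - 0.1*(x^2 - 2)
x = 1.0
print("linear rate:")
for k in range(8):
    x = x - 0.1 * (x**2 - 2.0)
    print(k, abs(x - x_star))           # the error shrinks by a roughly constant factor

# Quadratically convergent Newton iteration: x <- x - (x^2 - 2) / (2x)
x = 1.0
print("quadratic rate:")
for k in range(5):
    x = x - (x**2 - 2.0) / (2.0 * x)
    print(k, abs(x - x_star))           # the number of correct digits roughly doubles
```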
