Unconstrained Optimization: Numerical Methods

We’ve been investigating analytical techniques to solve unconstrained multivariable optimization problems

In many problems (most real problems!), it is quite difficult to find the stationary or critical points, and then use them to determine the nature of the stationary point. In this section, we’ll discuss numerical techniques to maximize or minimize a multivariable function.

Gradient Search Methods

We want to solve the unconstrained nonlinear programming problem (NLP) given in Equation 4.1 above. Calculus tells us that if a function f is concave, then the optimal solution, if there is one, will occur at a stationary point x*; i.e., at a point x* where ∇f(x*) = 0.

Finding the stationary points in many problems is quite difficult. The methods of Steepest Ascent (maximization) and Steepest Descent (minimization) offer an alternative by finding an approximation to a stationary point. We’ll focus on the gradient method, the Steepest Ascent.

Examine the function shown in Figure 4.9.


FIGURE 4.9: A Surface Defined by a Function

We want to find the maximum point on the surface. If we started at the bottom of the hill, we might proceed by finding the gradient vector, since the gradient is the vector that points "up the hill" in the direction of maximum increase. The gradient vector is defined as

∇f(x) = (∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ)

(The symbol ∇ is called "del" or "nabla".[1])

If we were lucky, the gradient would point all the way to the maximum of the function, but the contours of functions do not always cooperate; actually, they rarely do. The gradient "points uphill," but for how far? We need to find the distance to travel along the line given by the gradient that maximizes the height of the function in that direction. From that new point, we compute a new gradient vector to find a new direction that "points uphill" in the direction of maximum increase. Continue this method until reaching the "top of the hill," the maximum of f.

To summarize: from a given starting point, we move in the direction of the gradient as long as f's value continues to increase. At that point, recalculate the gradient and move in the new direction as far as f continues to improve. This process continues until we achieve a maximum value within some specific tolerance (margin of acceptable error). The algorithm for the Method of Steepest Ascent using the gradient is:

Method of Steepest Ascent Algorithm

Find the unconstrained maximum of the function f : ℝⁿ → ℝ.

INPUTS: Function: f

Starting point: x₀
Tolerance: ε

OUTPUTS: Maximal point x* and maximum value f(x*)

Step 1. Initialize: Set x = x₀

Step 2. Calculate the gradient g = ∇f(x)

Step 3. Compute the maximum t* of the 1-variable function

φ(t) = f(x + t·g)

Step 4. Find the new point x_new = x + t*·g = x + t*·∇f(x)

Step 5. If ||x - x_new|| < ε OR ||∇f(x)|| < ε, then STOP and return estimates x* = x_new and f(x_new). Otherwise, set x = x_new and return to Step 2.

Remember that ||y|| = ||(y₁, y₂, ..., yₙ)|| = √(y₁² + y₂² + ... + yₙ²).
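The book's Maple SteepestAscent routine is not reproduced here; as a sketch of the same algorithm, here is a self-contained Python version (all names are illustrative) that replaces the calculus step in Step 3 with a golden-section search:

```python
# A minimal Python sketch of the Method of Steepest Ascent. The one-variable
# maximization of phi(t) = f(x + t*g) in Step 3 is done with a simple
# golden-section search instead of calculus.
import math

def golden_max(phi, a=0.0, b=1.0, tol=1e-8):
    """Maximize the one-variable function phi on [a, b] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2          # inverse golden ratio, about 0.618
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) >= phi(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2

def steepest_ascent(f, grad, x0, eps=0.01, max_iter=100):
    """Steps 1-5 of the algorithm: follow the gradient until the step
    or the gradient norm falls below the tolerance eps."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)                                          # Step 2
        if math.hypot(*g) < eps:                             # part of Step 5
            break
        t_star = golden_max(lambda t: f([xi + t * gi for xi, gi in zip(x, g)]))
        x_new = [xi + t_star * gi for xi, gi in zip(x, g)]   # Step 4
        if math.hypot(*[a - b for a, b in zip(x, x_new)]) < eps:
            x = x_new
            break
        x = x_new
    return x, f(x)

# Example 4.13's function: f(x) = 2*x1*x2 + 2*x2 - x1^2 - 2*x2^2, maximum 1 at (1, 1)
f = lambda x: 2*x[0]*x[1] + 2*x[1] - x[0]**2 - 2*x[1]**2
grad = lambda x: [2*x[1] - 2*x[0], 2*x[0] + 2 - 4*x[1]]
x_star, f_star = steepest_ascent(f, grad, [0.0, 0.0])
```

Starting from (0, 0) with ε = 0.01, this sketch lands near (1, 1) with f close to 1, in line with the worked example that follows.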

It’s time for several examples using the method of steepest ascent.

Example 4.13. Steepest Ascent Example, I.

Maximize f(x) = 2x₁x₂ + 2x₂ - x₁² - 2x₂² to within ε = 0.01.

We start with x₀ = (0, 0).

Iteration 1.

The gradient of f(x₁, x₂), ∇f, is found using the partial derivatives; f's gradient is the vector ∇f(x₁, x₂) = (2x₂ - 2x₁, 2x₁ + 2 - 4x₂). Then ∇f(0, 0) = (0, 2). From (0, 0), we move along (up) the x₂-axis in the direction of (0, 2); that is, along the line L(t) = x₀ + ∇f(x₀)·t = (0, 0) + t(0, 2) = (0, 2t). How far do we go?

We need to maximize the function φ(t) = f(0, 2t) = 4t - 8t², starting at t = 0, to find how far along the line L(t) to step. This function can be maximized by using any of the one-dimensional search techniques for single-variable optimization from Chapter 3 or by simple calculus: φ′(t) = 4 - 16t = 0 gives t* = 0.25.

Then L(t*) gives the new point x₁ = L(0.25) = (0, 0.5).

The magnitude of the difference ||x₀ - x₁|| is 0.5, which is not less than our desired tolerance of 0.01 (chosen arbitrarily at the beginning of the solution). Since we are not optimal, we continue, repeating the calculations from the new point (0, 0.5).

Iteration 2.

The gradient vector g = ∇f((0, 0.5)) = (1, 0). Now L(t) = (0, 0.5) + t(1, 0). Again, how far do we travel along L to maximize φ(t) = f((t, 0.5)) = -t² + t + 0.5? Simple calculus gives φ′(t) = -2t + 1 = 0, so t* = 0.5.

Then, the new x is x₂ = L(0.5) = (0.5, 0.5).

The magnitude of the difference ||x₁ - x₂|| is 0.5, which is still not less than our desired tolerance of 0.01. The magnitude of ∇f(x₁) is 1, which is also not within our tolerance 0.01. We continue to iterate.
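Before handing the iterations to Maple, the arithmetic of the first two steps can be checked with a few lines of plain Python (an illustrative check, not part of the book's package). Each φ(t) is a downward parabola, so t* comes from φ′(t) = 0:

```python
# Checking the first two iterations of Example 4.13 by hand.
f = lambda x1, x2: 2*x1*x2 + 2*x2 - x1**2 - 2*x2**2

# Iteration 1: x0 = (0, 0), g = (0, 2), phi(t) = f(0, 2t) = 4t - 8t^2
# phi'(t) = 4 - 16t = 0  =>  t* = 0.25, so x1 = (0, 0.5)
t1 = 4 / 16
x1 = (0 + t1 * 0, 0 + t1 * 2)

# Iteration 2: g = (1, 0), phi(t) = f(t, 0.5) = -t^2 + t + 0.5
# phi'(t) = -2t + 1 = 0  =>  t* = 0.5, so x2 = (0.5, 0.5)
t2 = 1 / 2
x2 = (x1[0] + t2 * 1, x1[1] + t2 * 0)
```

The function value at x₂ is f(0.5, 0.5) = 0.75, already well above f(x₀) = 0 but still short of the maximum value 1.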

Maple will continue the iterations for us using the function SteepestAscent, which is in the book's PSMv2 package. Load the package via with(PSMv2). Define the function using vectors.

The syntax of our SteepestAscent function is

Adding a third argument tells SteepestAscent to produce a graph of the path x₀, x₁, ..., xₙ. The output of SteepestAscent is a DataFrame containing the generated points, gradients, function values, and step-sizes.

To see the list of points generated, use SAf['Pts']; to see just the last point and its function value, use SAf['Pts'][-1], SAf['Fcn'][-1].

The multivariable calculus solution is straightforward to compute by solving the system {∂f/∂x₁ = 0, ∂f/∂x₂ = 0} and checking f's Hessian at the critical point.

The Hessian

H = [ -2   2 ]
    [  2  -4 ]

is negative definite, so the point x* = (1, 1) is a maximum with f(x*) = 1.
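The definiteness claim is quick to confirm: a symmetric 2×2 matrix is negative definite exactly when its (1,1) entry is negative and its determinant is positive (alternating leading principal minors). A short Python check (illustrative):

```python
# Verifying that the Hessian of f = 2*x1*x2 + 2*x2 - x1^2 - 2*x2^2 is
# negative definite. Here the Hessian is constant: H = [[-2, 2], [2, -4]].
H = [[-2.0, 2.0], [2.0, -4.0]]
det_H = H[0][0]*H[1][1] - H[0][1]*H[1][0]   # 8 - 4 = 4 > 0
is_neg_def = H[0][0] < 0 and det_H > 0      # -2 < 0 and 4 > 0
```

Both leading-minor conditions hold, so (1, 1) is indeed a maximum.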

To get a closer approximation with the steepest ascent method, we would make our tolerance smaller. A look at f’s contour plot confirms a hill at approximately (1,1) in Figure 4.10.


FIGURE 4.10: Contour Plot of f

Example 4.14. Steepest Ascent Example, II.

Maximize g(x) = 55x₁ - 4x₁² + 135x₂ - 15x₂² - 100 using the Steepest Ascent method starting at the point (1, 1) to within a tolerance of ε = 0.01.

Figure 4.11 provides a visual reference for the maximum.


FIGURE 4.11: Surface Plot of g

The gradient is ∇g(x) = (55 - 8x₁, 135 - 30x₂).

Iteration 1.

We begin with x₀ = (1, 1). Then ∇g(x₀) = (47, 105). From (1, 1), we move in the direction of (47, 105). How far do we go along the line L(t) = (1, 1) + (47, 105)t? We need to maximize the function φ(t) = g(1 + 47t, 1 + 105t).

This function can also be maximized by simple single-variable calculus: φ′(t) = 13234 - 348422t = 0 gives t* ≈ 0.0380.

The new point x₁ is found by evaluating L(t*) = x₀ + t*·∇g(x₀) = (2.785, 4.988).

Since ||x₁ - x₀|| > 0.01, iterate again.
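The numbers in Iteration 1 can be verified in a few lines of Python (an illustrative check; since φ is a downward parabola, the exact line search solves φ′(t) = 0 in closed form):

```python
# Reproducing Iteration 1 of Example 4.14. Along the ray
# L(t) = (1,1) + t*(47,105), phi(t) = g(L(t)) is a downward parabola.
g = lambda x1, x2: 55*x1 - 4*x1**2 + 135*x2 - 15*x2**2 - 100
grad = lambda x1, x2: (55 - 8*x1, 135 - 30*x2)

x0 = (1.0, 1.0)
d = grad(*x0)                      # the search direction (47, 105)
# phi'(t) = d . grad(L(t)) = 47*(47 - 376t) + 105*(105 - 3150t)
#         = 13234 - 348422t  =>  t* = 13234/348422
t_star = 13234 / 348422
x1 = (x0[0] + t_star*d[0], x0[1] + t_star*d[1])
```

Evaluating gives t* ≈ 0.0380 and x₁ ≈ (2.785, 4.988), matching the hand computation above.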

Iteration 2.

Now compute ∇g(x₁) = (32.719, -14.645). We move from (2.785, 4.988) in the direction of (32.719, -14.645). How far do we go along the line L(t) = (2.785, 4.988) + (32.719, -14.645)t? We need to maximize the function φ(t) = g(2.785 + 32.719t, 4.988 - 14.645t).

This function can also be maximized by simple single-variable calculus.

The new point x₂ is found by evaluating L(t*) = x₁ + t*·∇g(x₁).

Since ||x₂ - x₁|| > 0.01, we must iterate again.

Use our SteepestAscent function to have Maple finish the iterations. Define g as a function of the vector x.


The final point xₙ and g's maximum value are:

This time, the solution process zig-zagged and converged more slowly to an optimal solution as illustrated in Figure 4.12.


FIGURE 4.12: Zig-Zagging to a Solution

We’ll now look at maximizing a transcendental multivariable function where the critical points cannot be found analytically in a closed form, but must be approximated numerically.

Example 4.15. Steepest Ascent Example, III.

Maximize h(x) = 2x₁x₂ + 2x₂ - e^(x₁) - e^(x₂) + 10 using the Steepest Ascent method starting at the point (0.5, 0.5) to within a tolerance of ε = 0.01.

Verify that h has a local maximum near (1, 1.25) by looking at a graph. The gradient of h is

∇h(x) = (2x₂ - e^(x₁), 2x₁ + 2 - e^(x₂)).

Iteration 1.

We begin with x₀ = (0.5, 0.5). Then ∇h(x₀) = (-0.6487, 1.3513). From (0.5, 0.5), we move in the direction of (-0.6487, 1.3513). How far do we go along the line L(t) = (0.5, 0.5) + (-0.6487, 1.3513)t? We need to maximize the function φ(t) = h(0.5 - 0.6487t, 0.5 + 1.3513t).

Maple’s fsolve gives t* = 0.2875. Therefore x₁ = L(t*) ≈ (0.3135, 0.8885).

Since ||x₁ - x₀|| > 0.01, we continue. But let Maple do the work.
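Because φ(t) is transcendental here, the line search has no closed form. Where the book calls Maple's fsolve, a Python sketch (illustrative) can bisect on φ′(t) = 0 instead; φ is strictly concave along this ray, so the root is unique:

```python
# Iteration 1 of Example 4.15 checked numerically: bisect on phi'(t) = 0,
# where phi(t) = h(L(t)) and L(t) = x0 + t*d.
import math

h = lambda x1, x2: 2*x1*x2 + 2*x2 - math.exp(x1) - math.exp(x2) + 10
x0 = (0.5, 0.5)
d = (2*x0[1] - math.exp(x0[0]), 2*x0[0] + 2 - math.exp(x0[1]))  # gradient at x0

def dphi(t):
    """phi'(t): directional derivative of h along L(t) = x0 + t*d."""
    p = (x0[0] + t*d[0], x0[1] + t*d[1])
    gp = (2*p[1] - math.exp(p[0]), 2*p[0] + 2 - math.exp(p[1]))
    return d[0]*gp[0] + d[1]*gp[1]

a, b = 0.0, 1.0              # dphi(0) > 0 and dphi(1) < 0 bracket the root
for _ in range(60):          # bisection on phi'(t) = 0
    m = (a + b) / 2
    if dphi(m) > 0:
        a = m
    else:
        b = m
t_star = (a + b) / 2
x1 = (x0[0] + t_star*d[0], x0[1] + t_star*d[1])
```

The bisection reproduces t* ≈ 0.2875 and x₁ ≈ (0.3135, 0.8885), agreeing with the fsolve result above.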

Modified Newton’s Method

The zig-zag pattern we saw in Figure 4.12 shows that steepest ascent doesn’t always go directly to an optimum value. The Newton-Raphson[2] iterative root-finding technique, using the partial derivatives of the function, provides an alternative numerical search method for an optimum value when modified appropriately. Given the right conditions, this numerical method is more efficient and converges faster to an approximate optimal solution: quadratic convergence versus the linear convergence of steepest ascent.

Newton’s Method for multivariable optimization searches is based on the single-variable root-finding algorithm. Modify the procedure to look for roots of the first derivative rather than roots of the original function:

xₙ₊₁ = xₙ - f′(xₙ)/f″(xₙ),

and iterate until |xₙ₊₁ - xₙ| < ε for our chosen tolerance ε > 0.

Extend the Modified Newton’s Method to several variables by using gradients and the Hessian, the matrix of second partial derivatives:

xₙ₊₁ = xₙ - H(xₙ)⁻¹ ∇f(xₙ),

now iterating until ||xₙ₊₁ - xₙ|| < ε for our chosen tolerance ε > 0. Applying a little bit of linear algebra (Cramer's rule) gives us the method as an algorithm.
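Written out for the 2×2 case, a single Newton step xₙ₊₁ = xₙ - H(xₙ)⁻¹∇f(xₙ) looks like this in Python (a sketch with illustrative names):

```python
# One multivariable Newton step for a 2-variable function, using the
# explicit 2x2 inverse of the Hessian applied to the gradient.
def newton_step(grad, hess, x):
    g1, g2 = grad(x)
    (h11, h12), (h21, h22) = hess(x)
    det = h11*h22 - h12*h21            # must be nonzero
    dx1 = ( h22*g1 - h12*g2) / det     # first component of H^{-1} g
    dx2 = (-h21*g1 + h11*g2) / det     # second component of H^{-1} g
    return (x[0] - dx1, x[1] - dx2)

# For the quadratic g of Example 4.14, one step lands exactly on the
# stationary point (6.875, 4.5) from any starting point:
grad_g = lambda x: (55 - 8*x[0], 135 - 30*x[1])
hess_g = lambda x: ((-8, 0), (0, -30))
x1 = newton_step(grad_g, hess_g, (1.0, 1.0))
```

This one-step exactness on quadratics is why Newton's method needs so few iterations in the comparisons below.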

Modified Newton’s Method Algorithm

Find the unconstrained maximum of the function f : ℝⁿ → ℝ.

INPUTS: Function: f

Starting point: x₀

Tolerance and Maximum Iterations: ε, N

OUTPUTS: Maximal point x* and maximum value f(x*)

Step 1. Initialize: Set x = x₀

Step 2. Calculate the gradient g = ∇f(x) and H = Hessian(x)

Step 3. Compute d = det(H) and create new matrices

H₁ = substitute g for column 1 of H
H₂ = substitute g for column 2 of H

Step 4. Compute: Δx₁ = det(H₁)/d and Δx₂ = det(H₂)/d

Step 5. Find the new point x_new = (x₁ - Δx₁, x₂ - Δx₂)

Step 6. If ||(Δx₁, Δx₂)|| < ε, then STOP and return estimates x* = x_new and f(x_new). Otherwise, set x = x_new and return to Step 2.

Again, remember that ||y|| = ||(y₁, y₂)|| = √(y₁² + y₂²).
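The algorithm above translates almost line for line into Python. This sketch (illustrative names; the book's Maple ModifiedNewtonMethod is not reproduced) uses Cramer's rule on the 2×2 system exactly as in Steps 3 and 4:

```python
# A Python sketch of the Modified Newton's Method algorithm, solving
# H * delta = g by Cramer's rule and stepping x - delta.
import math

def modified_newton(grad, hess, x0, eps=0.01, N=20):
    x = tuple(x0)
    for _ in range(N):
        g1, g2 = grad(x)                             # Step 2
        (h11, h12), (h21, h22) = hess(x)
        d = h11*h22 - h12*h21                        # Step 3: d = det(H)
        dx1 = (g1*h22 - h12*g2) / d                  # Step 4: det(H1)/d
        dx2 = (h11*g2 - g1*h21) / d                  #         det(H2)/d
        x = (x[0] - dx1, x[1] - dx2)                 # Step 5
        if math.hypot(dx1, dx2) < eps:               # Step 6
            break
    return x

# Example 4.16's function: f = 2*x1*x2 + 2*x2 - x1^2 - 2*x2^2 from (0, 0)
grad_f = lambda x: (2*x[1] - 2*x[0], 2*x[0] + 2 - 4*x[1])
hess_f = lambda x: ((-2, 2), (2, -4))
x_star = modified_newton(grad_f, hess_f, (0.0, 0.0))
```

For this quadratic f, the first step lands exactly on (1, 1); the second step has Δx = 0 and triggers the stopping rule, matching the two iterations reported in Table 4.4.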

Let’s repeat our examples for the steepest ascent method using Newton’s method. We’ll use the function ModifiedNewtonMethod which is in the book’s PSMv2 package. Load the package via with(PSMv2). The syntax of the ModifiedNewtonMethod function is

ModifiedNewtonMethod(f, (x₀, y₀), ε, N)

where f is the function, (x₀, y₀) is the starting vector, ε is the tolerance, and N is the maximum number of iterations allowed. The output of ModifiedNewtonMethod is a DataFrame containing the generated points, function values, Hessians, and the definiteness of the Hessian.

Example 4.16. Modified Newton’s Method Example, I.

Maximize f(x) = 2x₁x₂ + 2x₂ - x₁² - 2x₂² to within ε = 0.01 starting at x₀ = (0, 0).

Limit the number of iterations to N = 20.

We have a maximum of f of 1 at (1,1) since the Hessian is negative definite there.

On to the second example.

Example 4.17. Modified Newton’s Method Example, II.

Maximize g(x) = 55x₁ - 4x₁² + 135x₂ - 15x₂² - 100 starting at the point (1, 1) to within a tolerance of ε = 0.01, limiting iterations to N = 20.

We have a maximum of g of 392.8 at (6.875, 4.500) since the Hessian is negative definite there.

Now, for the third example.

Example 4.18. Modified Newton’s Method Example, III.

Maximize h(x) = 2x₁x₂ + 2x₂ - e^(x₁) - e^(x₂) + 10 starting at the point (0.8, 0.8) to within a tolerance of ε = 0.01 with no more than N = 20 iterations.

We have a maximum of h of 8.827 at (1.031,1.402) since the Hessian is negative definite there.

Even though Newton’s method is faster and more direct, we must be cautious. The method requires a relatively good starting point. Try searching for a maximum of h starting at (0.5, 0.5) and looking at just the last entry in the DataFrame report.

We started at (0.5, 0.5), not far from our original (0.8, 0.8). But the Hessian being indefinite tells us that we’ve found an approximate saddle point, not a maximum. Modified Newton’s method is very sensitive to the initial value chosen.
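The sensitivity warning can be reproduced numerically. This self-contained Python sketch (illustrative names; not the book's ModifiedNewtonMethod or its DataFrame report) runs the Cramer's-rule Newton update on h from both starting points and tests the Hessian's definiteness at the result via leading minors:

```python
# Newton's update on h from two nearby starts. The Hessian of h is
# [[-e^x1, 2], [2, -e^x2]]; negative definite iff h11 < 0 and det > 0.
import math

grad_h = lambda x: (2*x[1] - math.exp(x[0]), 2*x[0] + 2 - math.exp(x[1]))
hess_h = lambda x: ((-math.exp(x[0]), 2.0), (2.0, -math.exp(x[1])))

def newton(x, steps=20, eps=0.01):
    for _ in range(steps):
        g1, g2 = grad_h(x)
        (a, b), (c, d) = hess_h(x)
        det = a*d - b*c
        dx1, dx2 = (g1*d - b*g2)/det, (a*g2 - g1*c)/det   # Cramer's rule
        x = (x[0] - dx1, x[1] - dx2)
        if math.hypot(dx1, dx2) < eps:
            break
    return x

x_good = newton((0.8, 0.8))     # converges near (1.031, 1.402)
x_bad  = newton((0.5, 0.5))     # converges to a different stationary point

def neg_definite(x):
    (a, b), (c, d) = hess_h(x)
    return a < 0 and a*d - b*c > 0
```

From (0.8, 0.8) the iteration settles near (1.031, 1.402), where the Hessian is negative definite (a maximum); from (0.5, 0.5) it settles near (-0.269, 0.381), where the Hessian is indefinite (a saddle point).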

Comparisons of Methods

We compared these two routines, finding that Modified Newton’s Method converges faster than the gradient method. Table 4.4 shows the comparison.

TABLE 4.4: Comparison of the Steepest Ascent and Modified Newton’s Methods

f               | Initial Value | Iterations | Solution         | max f(x)
Steepest Ascent | (0, 0)        | 16         | (0.9922, 0.9961) | 1.0
Modified Newton | (0, 0)        | 2          | (1.0, 1.0)       | 1.0

h               | Initial Value | Iterations | Solution         | max
Steepest Ascent | (0, 0)        | 4          | (-0.266, 0.385)  | 8.3291
Modified Newton | (0, 0)        | 2          | (-0.269, 0.381)  | 8.3291

h               | Initial Value | Iterations | Solution         | max h(x)
Steepest Ascent | (0.5, 0.5)    | 6          | (1.025, 1.397)   | 8.828
Modified Newton | (0.8, 0.8)    | 7          | (1.031, 1.402)   | 8.827

As a final comparison, add the modified Newton’s method points to the contour map showing the steepest ascent points for g of Figure 4.12 to obtain Figure 4.13 below.


FIGURE 4.13: Steepest Ascent and Modified Newton’s Method Solutions

When given a good starting value, a Modified Newton’s method search is much faster and more direct than a steepest ascent gradient method.

Exercises

1. Maximize f(x) = 2x₁x₂ - 2x₁² - 2x₂² to within a tolerance of 0.1.

a. Start at the point x = (1, 1). Perform 2 complete iterations of the steepest ascent gradient search. For each iteration, clearly show xₙ, xₙ₊₁, ∇f(xₙ), and t*. Justify that the process will eventually find the approximate maximum.

b. Use Newton’s method to find the maximum, starting at x = (1, 1). Clearly show xₙ, xₙ₊₁, ∇f(xₙ), and the Hessian for each iteration. Indicate when the stopping criterion is achieved.

2. Maximize f(x) = 3x₁x₂ - 4x₁² - 4x₂² to within a tolerance of 0.1.

a. Start at the point x = (1, 1). Perform 2 complete iterations of the steepest ascent gradient search. For each iteration, clearly show xₙ, xₙ₊₁, ∇f(xₙ), and t*. Justify that the process will eventually find the approximate maximum.

b. Use Newton’s method to find the maximum, starting at x = (1, 1). Clearly show xₙ, xₙ₊₁, ∇f(xₙ), and the Hessian for each iteration. Indicate when the stopping criterion is achieved.

3. Apply the modified Newton’s method to find the following:

a. Maximize f(x, y) = -x³ + 3x + 84y - 6y² starting with the initial value (1, 1). Why can’t we start at (0, 0)?

b. Minimize f(x, y) = -4x + 4x² - 3y + y² starting at (0, 0).

c. Perform 3 modified Newton’s method iterations to minimize f(x, y) = (x - 2)⁴ + (x - 2y)² starting at (0, 0). Why is this problem not converging as quickly as b?

4. Use a gradient search to approximate the minimum of f(x) = (x₁ - 2)² + x₁ + x₂². Start at (2.5, 1.5).

Projects

Project 4.1. Modify the Steepest Ascent Algorithm (pg 147) to approximate the minimum of a function. This technique is called the Steepest Descent Algorithm.
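As a starting-point hint for the project, Steepest Descent only reverses the direction: step along -∇f and accept steps that decrease f. A minimal self-contained Python sketch (the test function, names, and backtracking rule are illustrative, not the book's):

```python
# Steepest Descent sketch: move along the negative gradient, with a simple
# backtracking line search in place of the calculus step.
def steepest_descent(f, grad, x, eps=1e-6, max_iter=500):
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi*gi for gi in g) ** 0.5 < eps:    # gradient norm small: done
            break
        t = 1.0
        # backtracking: shrink t until the move along -g actually decreases f
        while f([xi - t*gi for xi, gi in zip(x, g)]) >= f(x):
            t /= 2
            if t < 1e-12:
                return x
        x = [xi - t*gi for xi, gi in zip(x, g)]
    return x

# Illustrative convex quadratic with minimum at (2, 0):
f = lambda x: (x[0] - 2)**2 + x[1]**2
grad = lambda x: [2*(x[0] - 2), 2*x[1]]
x_min = steepest_descent(f, grad, [2.5, 1.5])
```

Replacing the backtracking loop with an exact one-dimensional minimization of φ(t) = f(x - t·g) recovers the mirror image of the Steepest Ascent algorithm.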

Project 4.2. Write a program in Maple that uses the one-dimensional Golden Section search algorithm instead of calculus to perform iterations of a gradient search. Use your code to find the maximum of

Project 4.3. Write a program in Maple that uses the one-dimensional Fibonacci search algorithm instead of calculus to perform iterations of a gradient search. Use your code to find the maximum of


  • [1] Wikipedia (https://en.wikipedia.org/wiki/Nabla_symbol) has a humorous history of ∇.
  • [2] Newton invented the method in 1671, but it wasn’t published until 1736; Raphson independently discovered the method and published it in 1690.
 