Artificial Neural Network
An ANN is a computational model composed of interconnected neurons arranged in layers. This study uses a 3-layer neural network. Figure 11.1 illustrates the basic 3-layer ANN topology: the first layer consists of input neurons that send data through synaptic connections to the second layer of hidden neurons, from which a single output is generated through a non-linear activation function in the third layer of output neurons.
The fundamental processing unit of the ANN model, a neuron k, is described by:
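The equation referred to here is missing from the extracted text. In the standard formulation, consistent with the symbol definitions that follow, the linear combiner output and the neuron output are:

```latex
u_k = \sum_{j=1}^{m} w_{kj}\, x_j , \qquad y_k = \Phi\left(u_k + b_k\right)
```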
Fig. 11.1 The topological structure of the 3-layer Artificial Neural Network used for prediction of the Standardized Precipitation and Evaporation Index (SPEI). The first layer consists of 4 inputs, where X denotes precipitation (X0), maximum temperature (X1), minimum temperature (X2) and potential evapotranspiration (X3). The second layer consists of 50 hidden neurons (n1:n50). The third layer is the output layer, the predicted SPEI (t+1)
where x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the synaptic weights of neuron k; uk is the linear combiner output due to the input signals; bk is the bias; Φ(·) is the activation function; and yk is the output signal of the neuron.
The bias bk has the effect of transforming the output uk of the linear combiner by shifting the activation function to the right or to the left; it is therefore important for successful learning of the model, as shown by:
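The equation is absent from the extracted text; the standard relation expressing the bias as an affine shift of the combiner output (the induced local field vk) is:

```latex
v_k = u_k + b_k , \qquad y_k = \Phi\left(v_k\right)
```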
Thus, the combination of Eqs. (12) and (13) may be formulated as:
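The combined equation is also missing from the extracted text; substituting the linear combiner into the activation gives the standard single-expression form of the neuron output:

```latex
y_k = \Phi\!\left(\sum_{j=1}^{m} w_{kj}\, x_j + b_k\right)
```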
The three transfer functions, the tangent sigmoid φ(x), the logarithmic sigmoid Ψ(x) and the linear υ(x), are described as follows (Vogl et al. 1988):
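The three definitions (Eqs. 17–19) are missing from the extracted text; their conventional forms are:

```latex
\varphi(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} , \qquad
\Psi(x) = \frac{1}{1 + e^{-x}} , \qquad
\upsilon(x) = x
```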
where Eqs. (17)–(19) may be trialled in different combinations to determine the best predictive model (Şahin et al. 2013).
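The forward pass of the 3-layer network in Fig. 11.1 can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the weights are randomly initialised (a trained model would obtain them from LM or BFGS optimisation), and the input values are hypothetical standardised predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes follow Fig. 11.1: 4 inputs, 50 hidden neurons, 1 output.
n_in, n_hidden, n_out = 4, 50, 1

# Illustrative random weights and biases (stand-ins for trained values).
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_out, n_hidden)), np.zeros(n_out)

def tansig(x):
    # Tangent-sigmoid transfer function
    return np.tanh(x)

def logsig(x):
    # Logarithmic-sigmoid transfer function
    return 1.0 / (1.0 + np.exp(-x))

def purelin(x):
    # Linear transfer function
    return x

def forward(x, hidden_fn=tansig, output_fn=purelin):
    """One forward pass: hidden layer, then the single output neuron."""
    h = hidden_fn(W1 @ x + b1)
    return output_fn(W2 @ h + b2)

# Hypothetical standardised inputs: precipitation, Tmax, Tmin, PET.
x = np.array([0.2, -0.5, 0.1, 0.8])
spei_pred = forward(x)
```

Different hidden/output transfer-function pairings, as the text notes, can be trialled simply by passing `logsig` or `purelin` in place of `tansig`.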
In order to develop a computationally efficient ANN, a second-order training algorithm, such as the Levenberg-Marquardt (LM) or the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton backpropagation learning algorithm, can be employed (Dennis and Schnabel 1983; Marquardt 1963). These training algorithms minimize the mean squared error between the predicted and the observed variable (Tiwari and Adamowski 2013). The LM method uses an approximation to the Hessian matrix, given as:
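The LM expressions are missing from the extracted text; the standard Hessian approximation and the resulting weight update (Marquardt 1963) are:

```latex
\mathbf{H} \approx \mathbf{J}^{T}\mathbf{J} , \qquad
\mathbf{w}_{k+1} = \mathbf{w}_{k} - \left[\mathbf{J}^{T}\mathbf{J} + \mu \mathbf{I}\right]^{-1} \mathbf{J}^{T}\mathbf{e}
```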
where J is the Jacobian matrix, calculated using standard backpropagation techniques, which contains the first derivatives of the network errors with respect to the weights and biases (Hagan and Menhaj 1994). The computation of the Jacobian matrix is simpler than that of the Hessian matrix (Marquardt 1963). The term e is a vector of network errors.
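A single LM iteration can be sketched in a few lines. This is a generic illustration of the update rule, not the chapter's training code; the exponential curve-fitting problem below is an assumed toy example, and the damping factor mu is held fixed rather than adapted as in full LM implementations.

```python
import numpy as np

def lm_step(w, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: w - (J^T J + mu*I)^-1 J^T e."""
    e = residual_fn(w)           # vector of errors
    J = jacobian_fn(w)           # first derivatives of errors w.r.t. parameters
    H = J.T @ J                  # Gauss-Newton approximation to the Hessian
    g = J.T @ e                  # gradient of the squared-error objective
    return w - np.linalg.solve(H + mu * np.eye(len(w)), g)

# Toy problem (illustrative only): fit y = a * exp(b * x).
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x)

def residual(w):
    a, b = w
    return a * np.exp(b * x) - y

def jacobian(w):
    a, b = w
    # Columns: d(error)/da and d(error)/db for each data point.
    return np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])

w = np.array([1.0, 0.0])
for _ in range(50):
    w = lm_step(w, residual, jacobian, mu=1e-3)
# w converges toward the true parameters (a, b) = (2.0, 0.5).
```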
An alternative to the conjugate gradient methods for fast optimization is the BFGS quasi-Newton algorithm. It uses the following equation:
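The equation is missing from the extracted text; the (quasi-)Newton update takes the standard form, where gk is the current gradient of the performance index:

```latex
\mathbf{x}_{k+1} = \mathbf{x}_{k} - \mathbf{A}_{k}^{-1}\, \mathbf{g}_{k}
```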
where Ak^-1 is the inverse of the Hessian matrix (second derivatives) of the performance index at the current values of the weights and biases (Dennis and Schnabel 1983).