RNNs have proved themselves in a wide range of fields, and many architectures and learning algorithms have been developed to apply them. Applications range from natural language processing and financial forecasting to plant modeling, robot control, and dynamic system identification and control. In this section, we review two case studies: one on grammatical inference and one on control.
Natural Language Processing
In recent years, there has been considerable progress in developing RNN architectures for natural language processing. Harigopal and Chen proposed a network that recognizes strings much longer than those on which it was trained; they used a second-order recurrent network model for the problem of grammatical inference. Zheng et al. proposed a discrete recurrent network for learning deterministic context-free grammars. Elman addressed three challenges of natural language processing: the nature of the linguistic representations, the representation of complex structural relationships, and the ability of a fixed-resource system to accommodate the open-ended nature of a language.
Grammatical inference is the problem of extracting the grammar of a language from example strings of that language. For a regular grammar, there exists a finite state automaton (FSA) that generates and recognizes the corresponding language. To give the reader a clear idea of how RNNs are applied to grammatical inference, we review the method of Chen et al., who proposed an adaptive RNN that learns a regular grammar and extracts the underlying grammatical rules. They called their model the adaptive discrete recurrent neural network finite state automata (ADNNFSA). The model is based on two recurrent network models: the neural network finite state automata (NNFSA) proposed by Giles et al. and the discrete neural network finite state automata (DNNFSA) proposed by Zeng et al.
Figure 23 shows the network architecture shared by NNFSA and DNNFSA, which is also used in the ADNNFSA model. The network consists of two layers. The first layer consists of N context units, which receive feedback from the second layer (the state layer), and M input units, which receive the input signals. The outputs of the first layer are connected to the inputs of the second layer via second-order weight connections. The second layer consists of N state PEs. The context units are denoted by the vector s(t - 1) and the input units by the vector x(t - 1). In the second layer, s(t) is the current-state output vector and h(t) is the current-state activation vector. The activation of the second layer can be computed as follows:
Fig. 23 The general architecture for NNFSA, DNNFSA, and ADNNFSA models
In the implementation of the NNFSA model, f(·) is a sigmoid function and g(·) is an identity function, while in the implementation of DNNFSA, g(·) is a discrete hard-limiter as follows:
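As a rough illustration of the second-order update described above, the following Python sketch computes h(t) and s(t) for a single time step. The weight layout W[j, i, k], the function names, and the hard-limiter form (threshold 0.5, outputs 0 and 1) are assumptions made for this sketch; they are not taken from the NNFSA or DNNFSA implementations.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used here as f(.)."""
    return 1.0 / (1.0 + np.exp(-z))

def hard_limiter(h, threshold=0.5):
    """Assumed discrete g(.): 1 above the threshold, 0 otherwise."""
    return (np.asarray(h) > threshold).astype(float)

def identity(h):
    """Analog g(.), as in the NNFSA description."""
    return h

def second_order_step(W, s_prev, x_prev, g=identity):
    """One state update of a second-order recurrent network.

    W      : array of shape (N, N, M); W[j, i, k] weights the product
             s_prev[i] * x_prev[k] feeding state PE j (assumed layout).
    s_prev : previous state vector s(t - 1), shape (N,).
    x_prev : input vector x(t - 1), shape (M,), e.g. a one-hot symbol.
    g      : output function applied to the activation h(t).
    Returns the activation h(t) and the state output s(t).
    """
    net = np.einsum("jik,i,k->j", W, s_prev, x_prev)
    h = sigmoid(net)
    return h, g(h)
```

Passing g=identity gives the analog behavior described for NNFSA, and passing g=hard_limiter gives a discretized state in the spirit of DNNFSA. Processing a string amounts to calling second_order_step once per symbol and feeding s(t) back in as s_prev.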
The NNFSA model is trained with the true-gradient real-time recurrent learning (RTRL) algorithm, which performs well. However, the NNFSA model of Giles et al. relies on the analog nature of the network, which does not match the discrete behavior of an FSA. Zeng et al. therefore discretized the analog NNFSA in their DNNFSA model by using the function in Equation 51. As a result, all the states are discretized and the RTRL algorithm is no longer applicable. A pseudo-gradient algorithm was used instead, which hinders training because it only approximates the true gradient. To address this, Chen et al. used analog internal states at the beginning of training and let the model change gradually to discrete internal states as training progresses. Thus, the current-state activation output h_j(t) is computed as in Equation 49, and the current-state output s_j(t) is computed as follows:
To decide whether a state PE should be switched to the discrete mode, a quantization threshold parameter β is used. If the output of state PE j satisfies s_j(t) < β or s_j(t) > 1.0 - β for all the training strings, the PE is switched to the discrete mode. In this way, the model moves over the course of training from the initial analog phase, which trains well, to the discrete phase, which matches the nature of an FSA and allows automatic rule extraction.
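The switching rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ADNNFSA implementation: the function names, the array layout (one row per recorded time step across all training strings), the choice that a PE stays discrete once switched, and the 0.5 hard-limiter threshold with 0/1 outputs are all hypothetical.

```python
import numpy as np

def update_modes(state_outputs, discrete, beta):
    """Decide which state PEs switch to the discrete mode.

    state_outputs : array of shape (T, N) holding s_j(t) for every time
                    step recorded over all training strings (assumed layout).
    discrete      : boolean array of shape (N,), current mode of each PE.
    beta          : quantization threshold parameter.
    A PE switches when every recorded output lies below beta or above
    1.0 - beta. Assumption: a PE never switches back to analog.
    """
    saturated = np.all(
        (state_outputs < beta) | (state_outputs > 1.0 - beta), axis=0
    )
    return discrete | saturated

def adaptive_output(h, discrete, threshold=0.5):
    """Per-PE output: analog h_j(t), or a hard-limited value if discrete.

    The hard-limiter form (0/1 around `threshold`) is an assumption.
    """
    h = np.asarray(h, dtype=float)
    return np.where(discrete, (h > threshold).astype(float), h)
```

In a training loop, update_modes would be called after each pass over the training strings, and adaptive_output would replace the fixed g(·) when computing s(t) from h(t).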