Deep Networks and Deep Learning Algorithms


The most advanced organ in the human body is the brain. It determines how we see, hear, smell, taste, and feel. The brain allows us to store memories, experiences, emotions, and even dreams. Without it we would be simple creatures, incapable of anything beyond the most basic tasks. It is the part of the body that lets us think before acting and that gives us intelligence. It weighs only about three pounds (roughly 1.4 kilograms), yet it can solve complex problems that even our most powerful computers cannot solve or find difficult to solve. Within a few months of birth, a baby starts recognizing the faces of its parents, identifying objects, and even producing different sounds. Early on, infants learn to locate objects even when they are partly blocked from view and to associate different things with the sounds they make. By their early school years, children begin to learn grammar and become familiar with many different words.

FIGURE 12.1 Dogs or chicken.

The following images show that we humans sometimes fail at classification tasks, despite having been trained for millions of years (Figures 12.1-12.5).

Deep Learning

Humans have long pushed to invent machines with brains similar to our own, such as robots that help with household activities and microscopes that can identify disease on their own. Building such artificially intelligent machines requires us to solve some of the most complex computational problems we have ever faced, problems that our minds solve easily in fractions of a second. To handle these tasks, we need a very different way of programming a computer, using techniques developed largely over the past decade. These techniques, generally referred to as deep learning, form a very active area of artificial intelligence (AI).

FIGURE 12.2 Dogs or donuts.

FIGURE 12.3 Dogs or mop.

FIGURE 12.4 Dogs or cookies.

Why Deep Learning

There is no doubt that machines are fast; they can do in seconds calculations that would take a human several minutes. Their speed and accuracy on such tasks are unmatched. But what if a computer is given handwritten text to read and calculate with? Take, for example, Figure 12.6.

The computer would not be able to recognize the letters or digits as efficiently as we can. This is an example of the need for deep learning. Deep learning is a subset of a more general field of AI called machine learning (ML), which is based on the principle of learning from examples. ML works very differently from conventional programming: instead of following a long list of rules for solving a problem, the machine is given a model with which to evaluate examples, along with a small set of instructions for modifying the model whenever it makes a mistake. Over time, a well-suited model is expected to solve the problem accurately and quickly (Buduma, 2017).

The Perceptron (Neuron)

The foundational unit of the brain is the neuron. A piece of brain tissue about the size of a grain of wheat contains more than 10,000 neurons, each of which forms approximately 5,000 connections with other neurons. It is this massive biological network that allows us to experience the world around us. The premise here is to use a similar structure to build ML models that solve problems in an analogous way. At its core, a neuron receives information from other neurons, processes that information in its own way, and sends the result on to other cells. Figure 12.7 illustrates the process.

FIGURE 12.6 Handwritten notes.

FIGURE 12.7 The neuron.

The neuron receives its inputs through antenna-like structures known as dendrites. Each of these incoming connections is strengthened or weakened according to how frequently it is used, and the strength of each connection determines how much that input contributes to the neuron's output. After being weighted by the strength of their respective connections, the inputs are summed together. This sum is then transformed into a new signal, which is transmitted along the cell's axon and sent on to other neurons (Buduma, 2017).

Neural Networks

An artificial neural network is a model of computation inspired by the architecture of the human brain. In simplified models of the brain, a huge number of simple computing units (neurons) work together in a communication network, through which the brain carries out highly complex computations. Artificial neural networks are computational constructs designed along this paradigm. Learning with such networks was first proposed during the twentieth century. A neural network can be described as a graph whose nodes correspond to neurons and whose edges correspond to the links between them. Every node receives as input a weighted sum of the outputs of the nodes connected to its incoming links (Apostolou and King, 1999; Ben-David, 2014).
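
The weighted-sum input of a single node can be sketched in a few lines of Python. This is only an illustration: the function name `node_input` and the example numbers are hypothetical and do not come from the cited sources.

```python
def node_input(incoming_outputs, weights):
    """Net input of one node: the weighted sum of the outputs
    arriving over its incoming links (one weight per link)."""
    if len(incoming_outputs) != len(weights):
        raise ValueError("one weight is needed per incoming link")
    return sum(w * x for w, x in zip(weights, incoming_outputs))


# Hypothetical node with three incoming links.
outputs_of_connected_nodes = [1.0, 0.5, -2.0]
link_weights = [0.2, 0.4, 0.1]

net = node_input(outputs_of_connected_nodes, link_weights)
print(net)
```

A link with a larger weight contributes more to the node's input, which mirrors the stronger and weaker connections of the biological neuron.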

Network Topologies

Having covered the elements that make up a neural network, we now turn to an overview of the usual designs (topologies) of neural networks, i.e., how networks are built from these elements. Every topology described in this text is illustrated by a map and its Hinton diagram; in the Hinton diagram, light gray fields represent dotted weights and dark gray fields represent solid weights. The input and output arrows, which are added to the maps for clarity, do not appear in the Hinton diagram. To clarify that the connections run between the row neurons and the column neurons, a small arrow is inserted in the upper-left cell.


FIGURE 12.8 Feed forward neural network.

Feedforward Neural Networks

Although single neurons are more powerful than linear perceptrons, they are not expressive enough to solve difficult learning problems. There is a reason why our brain is made up of so many neurons. Take, for example, the ability to differentiate between handwritten digits, which is practically impossible for a single neuron. To tackle complex tasks of this kind, we have to take our ML model a step further. The neurons in our brain are organized in layers. The cerebral cortex, the structure responsible for most of our intelligence, is made up of six layers. Information flows from one layer to the next until the sensory input is converted into conceptual understanding. For example, the bottommost layer of the visual cortex receives raw visual data from the eyes. Each layer processes the information and passes it on to the next layer until it reaches the last one. Figure 12.8 shows more details of these layers (Buduma, 2017).

A feedforward network consists of layers and the connections between them. Feedforward networks are the first networks we will look at (although different topologies are used later). The neurons are grouped into layers: one input layer, n hidden processing layers (invisible from the outside, which is why their neurons are called hidden neurons), and one output layer. Each neuron in one layer of a feedforward network has only directed connections to the neurons of the next layer (towards the output layer). Figure 12.9 shows the permitted connections (solid lines) for a feedforward network. To prevent naming conflicts, the output neurons are referred to as Ω (Kriesel, 2005).
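
The layer-by-layer flow of a feedforward network can be sketched as follows. This is a minimal illustration under stated assumptions: the sigmoid activation, the bias terms, the 2-3-1 layout, and all weight values are hypothetical choices made for the example and are not taken from Kriesel (2005).

```python
import math


def sigmoid(z):
    """Assumed activation function for this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: every neuron sums its weighted inputs, adds its bias,
    and applies the activation. weights[i] holds the incoming weights
    of neuron i in this layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def feedforward(inputs, layers):
    """Pass the input through the layers in order. Each layer only
    feeds the next one (towards the output layer)."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical 2-3-1 network: 2 inputs, 3 hidden neurons, 1 output neuron.
hidden_layer = ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1])
output_layer = ([[1.0, -1.0, 0.5]], [0.2])

result = feedforward([1.0, 2.0], [hidden_layer, output_layer])
print(result)
```

Because the loop visits the layers strictly in order, no neuron can receive input from its own layer or from a later one, which is the restriction that defines the feedforward topology.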

The Artificial Neuron

The biological neuron is simulated in an artificial neural network (ANN) by an activation function. In classification tasks (e.g., identifying spam e-mails) this activation function must have a “switch on” characteristic - in other words, once the input is greater than a certain value, the output should change state, i.e., from 0 to 1, from -1 to 1, or from 0 to >0. This simulates the “turning on” of a biological neuron. A commonly used activation function is the sigmoid function, f(x) = 1 / (1 + exp(-x)).

FIGURE 12.9 Permitted connections for feed forward networks.

As can be seen in Figure 10.9, the function is “activated,” i.e., it moves from 0 to 1, when the input x is greater than some specific value. The sigmoid function is not a step function, however: its edge is “soft,” and the output does not change instantly. This means that the function has a derivative, which is important for the training algorithm (Thomas, 2017).
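
The soft edge and the derivative can be checked numerically. The sketch below assumes the standard logistic form of the sigmoid, f(x) = 1 / (1 + exp(-x)), whose derivative is f(x) * (1 - f(x)); the function names are illustrative only.

```python
import math


def sigmoid(x):
    """Standard logistic sigmoid (assumed form)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Slope of the sigmoid at x, using f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The output rises smoothly from near 0 to near 1, and the slope is
# largest around x = 0, where the function "switches on".
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  slope={sigmoid_derivative(x):.4f}")
```

A true step function would have zero slope everywhere except at the jump, which would give a gradient-based training algorithm nothing to work with.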

Gradient Descent

Let’s look at how we can minimize the squared error over all training examples by simplifying the problem. Assume that our linear neuron has only two weights, w1 and w2. We can then imagine a three-dimensional space in which the horizontal dimensions correspond to the weights w1 and w2 and the vertical dimension corresponds to the value of the error function E. In this space, points in the horizontal plane correspond to different settings of the weights, and the height at those points corresponds to the error incurred. If we consider the errors made over all possible weights, we get a surface in this three-dimensional space, specifically a parabolic (bowl-shaped) surface, as shown in Figure 12.10.

One can also view this surface as a set of elliptical contours, where the minimum error is at the center of the ellipses. In this setup, we are working in a two-dimensional plane where the dimensions correspond to the two weights. Contours correspond to settings of w1 and w2 that evaluate to the same value of E. The closer the contours are to each other, the steeper the slope. In fact, it turns out that the direction of steepest descent is always perpendicular to the contours. This direction is the negative of a vector known as the gradient. Now we can develop a high-level strategy for finding the values of the weights that minimize the error function. Suppose we randomly initialize the weights of our network, so that we find ourselves somewhere on the horizontal plane. By evaluating the gradient at our current position, we can find the direction of steepest descent and take a step in that direction. We then find ourselves at a new position that is closer to the minimum than we were before. We can re-evaluate the direction of steepest descent by taking the gradient at the new position and take a step in this new direction. It is easy to see that, as shown in Figure 12.11, following this strategy will eventually bring us to the point of minimum error. This algorithm is known as gradient descent (Buduma, 2017).

FIGURE 12.10 Gradient descent.
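
The strategy can be sketched for a two-weight error surface. Everything specific here is an assumption made for illustration: the bowl-shaped error function with its minimum at (1.0, -0.5), the learning rate of 0.1, and the number of steps are hypothetical and are not taken from Buduma (2017).

```python
import random


def error(w1, w2):
    """Hypothetical bowl-shaped error surface with its minimum at (1.0, -0.5)."""
    return (w1 - 1.0) ** 2 + 2.0 * (w2 + 0.5) ** 2


def gradient(w1, w2):
    """Partial derivatives of the error with respect to w1 and w2."""
    return 2.0 * (w1 - 1.0), 4.0 * (w2 + 0.5)


def gradient_descent(w1, w2, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient (the direction of steepest descent)."""
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2


# Random starting point on the weight plane (seeded so the run is repeatable).
rng = random.Random(0)
start = (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))

w1, w2 = gradient_descent(*start)
print("start:", start, "error:", error(*start))
print("end:  ", (w1, w2), "error:", error(w1, w2))
```

The learning rate controls the step size: on this surface a much larger value would overshoot the minimum, while a much smaller one would need many more steps.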

The Backpropagation Algorithm

The backpropagation algorithm was pioneered by David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams in 1986. What is the central idea behind it? We do not know what the hidden units ought to be doing, but we can compute how fast the error changes as we change a hidden activity. From there, we can figure out how fast the error changes when we change the weight of an individual connection. The only catch is that we will be working in an extremely high-dimensional space. We begin by computing the error derivatives with respect to a single training example. Each hidden unit can affect many output units, so we have to combine many separate effects on the error in an informative way. Our strategy is one of dynamic programming. Once we have the error derivatives for one layer of hidden units, we use them to compute the error derivatives for the activities of the layer below. Once we have the error derivatives for the activities of the hidden units, it is fairly easy to obtain the error derivatives for the weights leading into a hidden unit (Buduma, 2017) (Figure 12.12).

FIGURE 12.11 Backpropagation.
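
The derivative bookkeeping described above can be sketched for one training example. This is a minimal illustration under stated assumptions, not the exact formulation in Buduma (2017): it assumes a 2-2-2 network of sigmoid units without biases, the squared error E = 0.5 * sum((y - t)^2), and hypothetical weight values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden activities and the outputs for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, outputs


def error(x, targets, w_hidden, w_out):
    """Squared error for a single training example."""
    _, outputs = forward(x, w_hidden, w_out)
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))


def backprop(x, targets, w_hidden, w_out):
    """Error derivatives for every weight, computed from the output layer back."""
    hidden, outputs = forward(x, w_hidden, w_out)

    # Derivative of the error with respect to each output unit's summed input.
    delta_out = [(y - t) * y * (1.0 - y) for y, t in zip(outputs, targets)]
    grad_out = [[d * h for h in hidden] for d in delta_out]

    # Each hidden unit affects every output unit, so its error derivative
    # combines the effects flowing back from all of them.
    delta_hidden = []
    for j, h in enumerate(hidden):
        downstream = sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        delta_hidden.append(downstream * h * (1.0 - h))

    # Derivatives for the weights leading into each hidden unit.
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


# Hypothetical training example and weights.
x = [0.5, -1.0]
targets = [1.0, 0.0]
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.3, -0.1], [0.2, 0.5]]

grad_hidden, grad_out = backprop(x, targets, w_hidden, w_out)
print("dE/dw (hidden layer):", grad_hidden)
print("dE/dw (output layer):", grad_out)
```

The output-layer derivatives are computed once and reused for every hidden unit, which is the dynamic programming aspect: results from one layer are stored and fed into the computation for the layer below. Each returned derivative can then be used in a gradient descent step on the corresponding weight.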

TensorFlow and Its Use

TensorFlow is an open source library, commonly used from Python, that allows users to express arbitrary computations as a data flow graph. Nodes in this graph represent mathematical operations, while edges represent data that is communicated from one node to another. Data in TensorFlow is represented as tensors, which are multidimensional arrays (representing vectors with a 1D tensor, matrices with a 2D tensor, etc.) (Buduma, 2017).
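
The idea of a data flow graph can be illustrated without TensorFlow itself. The sketch below is plain Python and deliberately does not use or imitate the TensorFlow API: the `Node` class and the helpers `constant`, `add`, and `dot` are hypothetical names invented for this example, and 1D tensors are represented as ordinary lists.

```python
class Node:
    """One operation in a data flow graph. The nodes passed as inputs
    are the incoming edges that carry data to this operation."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def evaluate(self):
        # Pull the data along each incoming edge, then apply the operation.
        values = [node.evaluate() for node in self.inputs]
        return self.op(*values)


def constant(value):
    """A node with no incoming edges that simply produces a value."""
    return Node(lambda: value)


def add(a, b):
    """Element-wise sum of two 1D tensors."""
    return Node(lambda u, v: [p + q for p, q in zip(u, v)], a, b)


def dot(a, b):
    """Dot product of two 1D tensors."""
    return Node(lambda u, v: sum(p * q for p, q in zip(u, v)), a, b)


# Build the graph first; nothing is computed until evaluate() is called.
x = constant([1.0, 2.0, 3.0])   # 1D tensor (a vector)
b = constant([1.0, 1.0, 1.0])
w = constant([0.5, 0.5, 0.5])
graph = dot(add(x, b), w)

result = graph.evaluate()
print(result)
```

Separating the description of the computation (building the graph) from its execution (evaluating it) is what lets a library analyze or optimize the graph before running it.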

Apart from TensorFlow, other libraries for building deep neural networks have emerged over the past decade. These include Theano, Torch, Caffe, Neon, and Keras.
