IV: Applying Neural Networks and Deep Learning Models for Cognitive Problems

Recent Issues with Machine Vision Applications for Deep Network Architectures


Department of Mathematics, Indian Institute of Technology, Roorkee, Uttarakhand 247667, India


In this chapter, we focus on the processing and storage issues that arise from updates of unstructured data in unconstrained environments. On the technological side, we discuss the existing intelligent deep network architectures alongside traditional machine learning approaches. The outcomes of the analytical discussion presented in this chapter help picture a generalized intelligent neural network for processing highly complex visual data with graph and manifold structures. The essential issues with updating the hidden layers and some fast optimization techniques are also introduced. Finally, it can be concluded that the presented work reflects the sounding challenges of processing and extracting qualitative information from tremendous amounts of densely unstructured training data. In the case of video processing, we need to frame out various deep learning aspects that can lead the research toward a highly resourceful scope for deep data analytics and the many problems that require high-performance computing in visual media.


In a real-life scenario, every small activity is inherently structured with a massive amount of visual context. The variations in visual information can be easily pointed out by human and machine vision. Recording and processing continuously generated visual information for any specific objective poses a significant challenge to the traditional storage capacity and processing capability of a machine intelligence system. Such problems are widespread in several domains, such as human activity recognition in public places, road traffic monitoring, chemical process control, and financial updates in the share market. Graphs have been a ubiquitous computational entity for many decades, used to deal with a great extent of computational work in many disciplines of applied science and engineering. A graph is a simple network of nodes that can be used collectively to process, control, and communicate the flow of information between the nodes through its edges. Therefore, a graph can be considered a computational structure, which creates the scope of a large computational framework in many disciplines of science and engineering applications. The detailed characteristics and attributes of the components of a graph vary with the specific domain of the network. In the computational view, the nodes in a graph correspond to random variables, and the edges between the nodes reflect statistical operations between the variables. Developing an efficient graphical model for the evaluation of a system in any engineering or scientific discipline requires detailed practical knowledge of graph theory and statistical mechanics. The hidden Markov model, Markov random fields, the Kalman filter, and the Bayesian network are famous examples of graphical models. In applied mathematics and engineering problems, these models are primarily used to deal with uncertainty and complexity, for example in classification scoring or categorization of objects in the spatiotemporal scenario.
All such classical multivariate probabilistic systems fall into the categories of phenomena that are frequently studied in pattern recognition, information theory, and statistical mechanics.
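As a minimal sketch of this computational view of graphs (a hypothetical toy example, not taken from the chapter), a graph can be stored as an adjacency matrix and information can flow between nodes by averaging over each node's neighbours:

```python
import numpy as np

# A tiny undirected graph of 4 nodes (a cycle); edges encode which
# variables exchange information. Purely illustrative values.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

x = np.array([1.0, 0.0, 0.0, 0.0])  # a signal placed on node 0

# One step of message passing: each node averages its neighbours' values.
deg = A.sum(axis=1)
x_next = (A @ x) / deg
print(x_next)  # node 0's value has flowed to its neighbours 1 and 3
```

Repeating this propagation step is the basic mechanism that graph-based deep networks later build on.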

Semantically, visual key information is hardly ever fixed from the perspective of perceptual vision; that is, the information related to any event caused by human actions varies with video understanding [25]. Real-world things are realized by computers through artificial intelligence (AI) from the data captured from perceptual vision. Jordan [1] proposed a general graphical model that can efficiently formulate and design a new framework for a system. AI techniques allow the design and development of complex systems in many disciplines to proceed in a very smooth fashion. The processing criteria cover logic programming, reinforcement learning, expert systems, neural networks, cognitive science, swarm intelligence, and fuzzy logic. For achieving more exceptional data analytics, AI has been advanced into machine learning and deep learning. All the building blocks of AI can be broadly exploited in sounding research domains such as natural language processing (NLP), medical imaging, game playing, computer vision, and robotics.

Regarding data generation, a report from the National Security Agency stated that the whole world generates a total of 1826 petabytes of data per day. These statistics represent the entire amount of data consumed and stored per day in the world. In 2009, it was reported that the data generation rate had become nine times greater than that of the preceding five years from 2004. From similar statistics, it was predicted that this amount would reach 35 trillion gigabytes worldwide in 2020 [6,7]. This is amazing and revolutionary information for data scientists. On the positive side, it can fulfill the dream of data scientists to develop several smart technologies such as automated healthcare diagnoses, safety and security, and intelligent systems for education and psychological training. However, from the development aspect, processing such a huge amount of data for training requires developing a very big network. This will be the biggest hurdle in resolving the issues created by big data characteristics. At the same time, it opens a very good research domain of developing an intelligent online system that can reduce the overhead due to big data issues. Here, instead of focusing on big data analytics, we have chosen the prime objective of representing the issues with real-world data analytics and framing a deep neural network model of graph-based real-world information [31-34]. The deep neural network is expected to be resourceful enough to solve many challenging issues in the visual domain. In fact, this is a very broad area and a generalized topic in computer vision. Apart from this, the graphical evolution toward developing a deep network is also depicted along with several operations in hidden layers. This analytical study will help to develop a highly optimized and fast deep network model.


Evidence from neurological and psychological studies shows that understanding of things in the real world varies with the various perspectives of human nature, such as locality, age, etc. In contrast, machines remain constant and act identically in all conditions once the instructions are set to perform a specific job related to visual media. This process requires the majority of effort to be put into achieving learning of things with the desired accuracy. Zero-shot learning or unsupervised learning becomes a necessity in very common cases of real-life problems where extracting specific ground truth for a particular research problem is extensively burdensome. This means that real-world structure creates unconstrained issues for any machine learning problem. For instance, recognizing a frontal face can give far better accuracy than recognizing a face in the wild. Therefore, several complex research problems in computer vision, such as the human-computer interface, scene understanding, and the human brain interface, are referred to as hot research problems due to unstructured real-world visual media. This calls for the development of an efficient machine intelligence system (MIS) that can provide an interface to solve many physical-world problems.


Non-Euclidean structures are vital substances in the real world that create hard issues in processing and analyzing the specified information using machines. This happens because of variations in the features of similar objects, similarity of features in different objects, or overlapping of coincident features. The example of the chair is shown in Figure 10.1 to explain this concept in static vision. In contrast, different objects may have a color similarity or may overlap one another for a particular moment in motion. This problem is referred to as occlusion.


FIGURE 10.1 Visionary issues with random features of real-world structures (e.g., chair).

In Figure 10.2, a very general overview is presented to reflect the requirements of the vision-based deep architecture of an intelligent system. This includes three components: (1) a heterogeneous system of real-world objects with random features, (2) a neural network system to be developed by training with the specific ground truth from the physical datasets, and (3) the specific library to work as a back end for the front-end network.


FIGURE 10.2 A general overview of visual media to develop an intelligent system.

It is not strictly necessary to use exactly the famous deep architectures such as AlexNet [14], GoogleNet [35], and VGGNet [36]. In addition, developing a new network is not very difficult because the concept of deep network architecture allows updating the hidden layers according to the specified objective. Several factors can affect the structure of a deep network, including the number of layers and parameters selected to process the data, the size of the stride, and the types of layers introduced.
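These structural choices interact through the standard convolution output-size relation, floor((n - k + 2p)/s) + 1 for input size n, kernel size k, padding p, and stride s. A small helper (illustrative, not from the chapter) makes the dependence on stride and layer parameters concrete:

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution layer using the
    standard formula floor((n - k + 2*p) / s) + 1."""
    return (n - k + 2 * padding) // stride + 1

# AlexNet-style first layer: 224x224 input, 11x11 kernel,
# stride 4, padding 2 -> 55x55 feature map.
print(conv_output_size(224, 11, stride=4, padding=2))  # 55

# A 3x3 kernel with stride 1 and no padding shrinks each side by 2.
print(conv_output_size(5, 3))  # 3
```

Changing the stride or kernel size in such a calculation shows immediately how the hidden layers of a new architecture must be resized.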


Acoustic data involve 1D signal processing of sound, whereas image and video processing account for 2D and 3D signal processing. Machine intelligence is characterized by supervised and unsupervised learning of deep geometrical structures of real-world objects. Geometrical deep knowledge can extract important features from which an optimized and faster model can be built. Therefore, the analysis and evaluation of all non-Euclidean structures can be performed faster. Several applications in computational science and network analysis can be found that exploit geometric deep learning for detection and tracking of occluded objects in video sequences. Furthermore, NLP, video captioning, and descriptor-based research on feature analysis in recognition phases are supposed to consider the importance of deep network architecture for real-world geometry.

A convolutional neural network (CNN) [37] has proved to be an outstanding backbone of several deep networks and is characterized by a cascade of matrix multiplications. For graphs and manifolds, a generalized CNN has been presented to exploit image and video processing with deep learning [2]. In addition, it is highlighted that data related to graphs and manifolds are most prominently used in social media networks, transportation, and the sensor-based anatomical structure of the human brain. Graphs are efficient tools to represent any complex real-world information, but learning them with a machine is quite difficult. In this case, deep learning is very useful for automating graph-based representation of real-world entities [3]. Graph-based kernels, indexing, and hashing are used for map classification and recognition. Detecting an object in a video or still image is an open challenge to the computer vision community; it requires precise and exact detection in a short amount of time. Several variations of the CNN have been reported for developing object detectors: single-shot detection [12], the real-time object detection scheme “you only look once” (YOLO), and its successors YOLOv2 and YOLOv3 [39]. All these methods outperformed region-proposal-based detectors such as Faster R-CNN [40]. Deep learning in graph clustering has outshined spectral methods in terms of computational complexity [17]. There, graph clustering with deep learning, spectral clustering, and k-means clustering are utilized to implement a graph encoder with a sparse autoencoder; the sparse encoder controls reconstruction and sparsity errors. Processing whole data as a graph is the toughest problem, but it can provide efficient analytical solutions in several disciplines such as molecular biology, pattern recognition, and astronomy with the study of geospatial satellites. Graphs combined with the CNN yield the graph convolutional network (GCN) [20,29].
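A single graph-convolution layer can be sketched with the commonly used propagation rule H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W), where A is the adjacency matrix, I adds self-loops, and W is a learned weight matrix. The graph, feature sizes, and weights below are hypothetical placeholders, not values from the cited works:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer following the symmetric-normalized
    propagation rule H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # normalized degree matrix
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(0)
A = np.array([[0, 1], [1, 0]], dtype=float)  # two connected nodes
H = rng.standard_normal((2, 4))              # 4 input features per node
W = rng.standard_normal((4, 8))              # project to 8 output features
out = gcn_layer(A, H, W)
print(out.shape)  # (2, 8)
```

Stacking such layers lets each node aggregate information from progressively larger neighbourhoods, which is the sense in which a GCN generalizes the CNN's local filtering to graphs.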
In the same vein, a graph convolution network has been used to develop graph autoencoders for unsupervised learning [21]. From a vision perspective, worldly information is updated with variations in the characteristics of nodes; that is, any specific worldly event happens due to changes in the functions of a particular object. Considering the physical world as a graph becomes an issue of high-performance computing. This approach can provide alternative solutions to many problems, such as suspicious event detection and undirected human actions, on the basis of node and edge information in the connected graph [22]. This implies that the success of such experiments can be expected only with the support of deep learning methodology. Motivated by this fact, the spatiotemporal GNN was proposed for human action recognition from depth information [18,30]. In that work, the human body is modeled as a graph with joints as nodes and bones as the edges connecting the nodes. The experiments were performed on the NTU RGB+D human action benchmark. Several such experiments ensure that deep learning is remarkably capable of filling the gaps of spatiotemporal events by developing a deep model of the action sequence. Such experiments can succeed in providing the desired results by jointly exploiting sequence learning networks such as RNNs and spatiotemporal graphs. With the help of graphs of depth information collected from the human body and the structural-RNN, important experiments on modeling human motion and interaction with machines have been performed [19]. Practically, a lack of ground truth may raise training issues with the model for many existing or new problems in the machine learning domain. To leverage the lack of exact training labels, semisupervised learning is a better option. For this case, a fast approximation of convolution is performed, and the comparison was shown on central processing units and graphics processing units (GPUs) [23]. Furthermore, label propagation jointly with semisupervised learning utilized a large number of facts from unlabeled data with neural networks [26]. Machine learning with graphs and deep networks opens a ubiquitously high range of solutions.
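The joints-as-nodes, bones-as-edges encoding used in skeleton-based action recognition can be shown with a toy skeleton (the five joints and bone list below are a simplified hypothetical example, not the actual NTU RGB+D joint layout):

```python
import numpy as np

# Hypothetical 5-joint skeleton: 0 head, 1 neck, 2 torso,
# 3 left hand, 4 right hand. Bones connect joint indices.
bones = [(0, 1), (1, 2), (1, 3), (1, 4)]
n_joints = 5

A = np.zeros((n_joints, n_joints))
for i, j in bones:
    A[i, j] = A[j, i] = 1.0  # undirected adjacency: each bone is one edge

print(int(A.sum()) // 2)  # 4 bones
```

Per-frame joint coordinates then become node features on this fixed adjacency, and a spatiotemporal graph network convolves over both the skeleton edges and the time axis.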

In the review work presented in [24], the main problem for machine learning with graphs is highlighted as the information association between nodes. Encoding and decoding schemes with graph embeddings are helpful for informatics. The recently observed state of the art presents scalability and interpretability in temporal graphs as open issues [26].


Graphs provide a mechanism of lucid representation for the processing components in a network. The computer vision literature is full of research on 1D, 2D, and 3D data representation, referred to as analysis of acoustic signals, image processing, and video processing, respectively. All these representations are termed Euclidean structures. Non-Euclidean structures include graphs and manifolds. Recently, the success of deep learning has been reported as interesting work with non-Euclidean geometry, but several techniques for graph-based image processing, such as normalized-cut-based image segmentation, remain ever sounding [41-43]. With the advent of deep learning, a deformation-invariant model of manifolds and graphs in the 3D spatial domain has been constructed. Achieving deformation-invariant features for non-Euclidean objects in the frequency domain is an open challenge to the computer vision community. On these structures, convolution is not easily applicable, since the recovery of such lower-dimensional manifold structures is referred to as nonlinear dimensionality reduction, which can be considered an instance of unsupervised deep learning. Being very specific to graph-based deep learning applications, experiments on the human skeleton have been performed for action recognition [44]. The authors utilized Lie group features on the graphs of skeleton joints, which can be easily aligned with temporal features due to rotation in joints.
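The normalized-cut idea mentioned above reduces, in its spectral form, to partitioning a graph by the sign of the second eigenvector (the Fiedler vector) of its Laplacian L = D - A. A toy example (hypothetical six-node graph of two triangles joined by one edge, not from the cited works) shows the mechanism:

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3).
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))
L = D - A                        # graph Laplacian
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = vecs[:, 1]             # second eigenvector
labels = (fiedler > 0).astype(int)
print(labels)  # the two triangles receive different labels
```

The sign pattern of the Fiedler vector separates the two weakly connected triangles, which is the same principle normalized-cut segmentation applies to pixel-affinity graphs of images.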
