Big Data Analytics for AV Inspection and Maintenance

Big Data Analytics and Cyber-Physical Systems

  • 11.1.1 Big Data Analytics
  • Big Data Analytics Definition

Big data analytics is defined as mining of pertinent knowledge and valuable insights from large amounts of stored data (Rajeshwari, 2012). The key objective of such analytics is to facilitate decision-making for researchers, for example by offering dashboards, graphics, or operational reporting to monitor thresholds and key performance indicators (KPIs). This involves using mathematical and statistical methods to understand data, simulate scenarios, validate hypotheses, and make predictive forecasts for future incidents.

Data mining is a key concept in big data analytics; it consists of applying data science techniques to analyze and explore large datasets to find meaningful and useful patterns in those data. It involves complex statistical models and sophisticated algorithms, such as machine learning algorithms, mainly to perform four categories of analytics: descriptive analytics, predictive analytics, prescriptive analytics, and discovery (exploratory) analytics.

Descriptive analytics turns collected data into meaningful information for interpreting, reporting, monitoring, and visualization purposes via statistical graphical tools, such as pie charts, graphs, bar charts, and dashboards. Predictive analytics is commonly defined as data extrapolation based on available data to ensure better decision-making. Prescriptive analytics builds on descriptive and predictive analytics: based on the present situation, it offers options on how to benefit from future opportunities or mitigate a future risk and details the implications of each decision option. Finally, discovery (exploratory) analytics illustrates unexpected relationships between parameters in big data (Rajeshwari, 2012). Some authors argue that the output of predictive analytics can benefit from the potential of descriptive analytics through the use of dashboards and scorecard computations (Bahri, Zoghlami, Abed, & Tavares, 2019).

Big Data Analytics

Big data analytics constitutes one of the most important arenas in big data systems, as it reveals hidden patterns, unknown correlations, and other useful information, which in turn boosts revenue for many businesses. In this section, we present an overview of techniques and tools for big data analysis (Atat et al., 2016).

A. Data mining

One of the interesting features of cyber-physical systems (CPSs) is automated decision-making. This means that CPS objects are supposed to be smart in sensing, identifying events, and interacting with others (Qu, Wang, & Yang, 2010). The massive data collected by CPSs need to be converted into useful knowledge to uncover hidden patterns, find solutions, and enhance system performance and quality of service. The process of extracting this useful information is referred to as data mining. One way to facilitate the data mining process is to reduce data complexity by allowing objects to capture only the interesting data rather than all of them. Before data mining can be applied to the data, some processing steps need to be completed, such as key feature selection, preprocessing, and transformation of data. Dimensionality reduction is one potential method to reduce the number of features of the data (Xu, Li, Shu, & Peng, 2014). Chen, Sanga, Chou, Cristini, and Edgerton (2009) used a neural network with k-means clustering via principal component analysis (PCA) to reduce the complexity and the number of dimensions of gene expression data to extract disease-related information from gene expression profiles. Knowledge discovery in databases (KDD) is also used in different CPS scenarios to find hidden patterns and unknown correlations in data so that useful information can be converted into knowledge (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). One such use of KDD is in smart infrastructures, where these systems need to answer queries and make recommendations about system operation to the facility manager (Behl & Mangharam, 2016).
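
Dimensionality reduction can take many forms; full PCA requires an eigendecomposition, so the sketch below uses a much simpler variance-based feature filter to illustrate the idea of discarding uninformative features before mining. The data values and the choice of k are invented for illustration.

```python
from statistics import pvariance

def reduce_dimensions(rows, k):
    """Keep only the k highest-variance features of each row.

    A simple filter-style stand-in for dimensionality reduction:
    near-constant features carry little information for later mining.
    """
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    keep = sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]
    keep.sort()  # preserve the original feature order
    return [[row[j] for j in keep] for row in rows], keep

data = [
    [1.0, 5.0, 0.1],
    [2.0, 5.0, 0.1],
    [3.0, 5.0, 0.2],
    [4.0, 5.0, 0.1],
]
reduced, kept = reduce_dimensions(data, k=2)
print(kept)         # feature 1 is constant, so features 0 and 2 survive -> [0, 2]
print(reduced[0])   # -> [1.0, 0.1]
```

A real pipeline would substitute PCA or another projection here, but the shape of the step is the same: many raw features in, a smaller feature set out.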

Tsai, Lai, Chiang, and Yang (2014) broke down the core of data mining into three main operations: data scanning, rules construction, and rules update. In data scanning, the operator selects the needed data. Rules construction includes creating candidate rules by using selection, construction, and perturbation. Finally, candidate rules are checked by the operator and then evaluated to determine which ones will be kept for the next iteration. The scanning, construction, and update operations are repeated until the termination criteria are met. This data mining framework works for deterministic mining algorithms such as k-means, as well as metaheuristic algorithms such as simulated annealing and genetic algorithms (Atat et al., 2016).
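
The scan-construct-update loop can be sketched as a toy hill-climbing rule search. The records, candidate rules, and scoring function below are hypothetical, and the perturbation step simply toggles one rule at a time; a metaheuristic such as simulated annealing would differ mainly in sometimes accepting worse candidates.

```python
import random

def mine_rules(records, rules, score, iterations=60, seed=0):
    """Toy scan-construct-update loop: perturb the current rule set by
    toggling one candidate rule and keep the change only if it scores better."""
    rng = random.Random(seed)
    current, best = set(), score(records, set())
    for _ in range(iterations):
        rule = rng.randrange(len(rules))   # construction: pick a rule to toggle
        trial = current ^ {rule}           # perturbation of the candidate set
        trial_score = score(records, trial)
        if trial_score > best:             # update: keep strict improvements
            current, best = trial, trial_score
    return current, best

records = [2, 4, 7, 12, 15]
rules = [lambda x: x % 2 == 0, lambda x: x > 10, lambda x: x < 0]

def score(recs, chosen):
    # Reward covered records, penalize each extra rule.
    covered = sum(1 for r in recs if any(rules[i](r) for i in chosen))
    return covered - 0.5 * len(chosen)

chosen, best_score = mine_rules(records, rules, score)
print(sorted(chosen), best_score)  # -> [0, 1] 3.0
```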

Clustering, classification, and frequent pattern mining are different mining techniques that can be used to make CPSs smarter.

Tsai, Lai, Chiang, and Yang (2014) discussed two purposes for clustering (Atat et al., 2016):

i. Clustering in Internet of Things (IoT) infrastructure;

ii. Clustering in IoT services.

Clustering in the IoT infrastructure helps enhance system performance in terms of identification, sensing, and actuation. For example, in Kardeby (2011), nodes exchanged information to identify whether they could be grouped together depending on the needs of the IoT applications. Clustering can also help provide higher quality IoT services, such as in smart homes (Rashidi, Cook, Holder, & Schmitter-Edgecombe, 2011).

Unlike clustering, which partitions objects into clusters without prior knowledge (unsupervised learning), classification assigns objects to predefined classes based on labeled training data. Classification tools include decision trees, k-nearest neighbor, naive Bayesian classification, AdaBoost, and support vector machines. Classification can be used to improve both the infrastructure and the services of IoT.
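
As a minimal illustration of one of these classification tools, the sketch below implements k-nearest neighbor voting in plain Python; the training points and the "normal"/"fault" labels are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points.

    `train` is a list of (feature_vector, label) pairs; the distance
    metric is plain Euclidean distance.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [
    ((0.0, 0.0), "normal"), ((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
    ((5.0, 5.1), "fault"), ((5.2, 4.9), "fault"), ((4.8, 5.0), "fault"),
]
print(knn_predict(train, (0.3, 0.1)))  # -> normal
print(knn_predict(train, (5.1, 5.0)))  # -> fault
```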

Finally, frequent pattern mining is about uncovering interesting patterns, such as which items will be purchased together with previously purchased items, or suggesting items for customers to purchase based on their characteristics, behavior, purchase history, and so on. Figure 11.1 illustrates the CPS big data mining process for useful information extraction (Atat et al., 2016).
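
A minimal frequent-pattern sketch, assuming small in-memory baskets: count how often item pairs co-occur and keep the pairs that meet a minimum support count. The basket data are invented, and a real miner (e.g., Apriori or FP-growth) would also handle larger itemsets and confidence.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count item pairs that co-occur in baskets; keep those meeting min_support."""
    counts = Counter()
    for basket in transactions:
        # De-duplicate and sort each basket so a pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]
print(frequent_pairs(baskets, min_support=3))  # -> {('bread', 'milk'): 3}
```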

B. Real-time analytics

Real-time analysis is another approach to producing useful information from massive raw data. Real-time data streams are converted to structured data before being analyzed by big data tools, such as Hadoop. Many application domains, such as healthcare, transportation systems, environmental monitoring, and smart cities, require real-time decision-making and control. For example, Twitter data can be analyzed in real time to enhance the prediction process and to make useful recommendations to users; terrorist incident data can be analyzed in real time to predict future incidents; and big data streams in healthcare can be analyzed to help medical staff make decisions in real time, which in turn can save patients' lives and improve healthcare services while reducing medical costs. A near real-time big data analysis architecture for vehicular networks proposed by Daniel, Paul, and Ahmad (2015) consists of a centralized data storage for data processing and a distributed data storage for streaming processed data in real-time analysis (Atat et al., 2016).
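
A minimal sketch of one real-time analytics step: a sliding-window monitor that flags readings far above the recent rolling average as they arrive. The window size, threshold factor, and stream values are invented; a production system would run such logic inside a stream processor rather than a single process.

```python
from collections import deque

class SlidingWindowMonitor:
    """Flag a reading that exceeds `threshold` times the rolling average
    of the last `size` readings -- one tiny real-time analytics step."""

    def __init__(self, size=3, threshold=2.0):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def push(self, value):
        average = sum(self.window) / len(self.window) if self.window else None
        alert = average is not None and value > self.threshold * average
        self.window.append(value)
        return alert

monitor = SlidingWindowMonitor(size=3, threshold=2.0)
alerts = [monitor.push(v) for v in [10, 11, 9, 30, 10]]
print(alerts)  # the spike of 30 is flagged on arrival -> [False, False, False, True, False]
```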


FIGURE 11.1 CPS big data mining process (Atat et al., 2016).

Wang, Mi, Xu, Shu, and Deng (2016) proposed a real-time hybrid-stream big data analytics model for big data video analysis. Zhou et al. (2017) considered online network analysis as a stream analysis issue and proposed using Spark Streaming to monitor and analyze the high-speed Internet traffic data in real time. Cao, Wachowicz, and Cha (2017) suggested mobile-edge computing nodes deployed on a transit bus, with descriptive analytics used to explore meaningful patterns of real-time data streams for transit. Trinks and Felden (2017) discussed the terms and related concepts of real-time analytics (RTA) for industry big data analytical solutions. Ali (2018) provided a framework to efficiently leverage big data technology and allow deep analysis of large and complex datasets for real-time big data warehousing (Atat et al., 2016).

Arranging the data in a representative form can provide information visualization, making information extraction and the understanding of complex large-scale systems much easier (Liu et al., 2016). Geographical Information Systems (GIS) are one important visualization tool (Chopade, Zhan, Roy, & Flurchick, 2015), as they can support real-time analysis in many applications, such as healthcare, urban and regional planning, transportation systems, emergency situations, public safety, and so on. Chopade, Zhan, Roy, and Flurchick (2015) proposed a large-scale system data visualization architecture called X-SimViz, which allows real-time dynamic data analytics and visualization. Visualization can also be a useful tool in predicting real-time cyber attacks, and computer vision is another approach to detecting security anomalies. For instance, Tan et al. (2015) used computer vision to transform network traffic data into images using a multivariate correlation analysis approach based on a dissimilarity measure called Earth Mover's Distance to help detect denial-of-service attacks. A computer vision deep learning algorithm for human activity recognition was proposed by Mo, Li, Zhu, and Huang (2016). The model is capable of recognizing 12 types of human activities with high accuracy and without the need for prior knowledge, which is useful for security monitoring applications (Atat et al., 2016).

C. Cloud-based big data analytics

Cloud-based analysis in CPS constitutes a scalable and reliable architecture to perform analytics operations on big data streams, such as extracting, aggregating, and analyzing data of different granularities. A massive amount of data is usually stored in spreadsheets or other applications, and a cloud-based analytics service, using statistical analysis and machine learning, helps reduce the big data to a manageable size so information can be extracted, hypotheses can be tested, and conclusions can be drawn from nonnumerical data, such as photos.

Data can be imported from the cloud, and users are able to run cloud data analytics algorithms on big datasets, after which data can be stored back to the cloud. For instance, Yetis et al. (2016) used cloud computing with the MapReduce algorithm to analyze crime rates in the city of Austin using different attributes, such as crime type and location, to help build a design that prevents future crimes and improves public safety.

Even though cloud computing is an attractive analytics tool for big data applications, it comes with some challenges, mainly concerning security, privacy, and data ownership. Tawalbeh, Mehmood, Benkhlifa, and Song (2016) extended the use of clouds to mobile cloud computing to help overcome the challenge of resource limitations, such as memory, battery life, and CPU power. A mobile cloud computing architecture was suggested for healthcare applications, along with the various big data analytic tools available.

Clemente-Castellety et al. (2015) suggested using a hybrid cloud consisting of public and private clouds to accelerate the analysis of massive data workloads on the MapReduce framework without requiring significant modifications to the framework. In a private cloud, cloud services delivered over the physical infrastructure are exclusively dedicated to the tenant. The hybrid cloud uses a set of virtual machines running on the private cloud, which takes advantage of data locality, while another set of virtual machines runs on a public cloud to perform the analysis at a faster rate (Atat et al., 2016).

To optimize the utilization of cloud computing resources, predicting the expected workload and the amount of resources needed is important to reduce waste. Neves, Rose, Katrinis, and Franke (2014) developed a system that predicts the resource requirements of a MapReduce application to optimize bandwidth allocation to the application, while Islam, Keung, Lee, and Liu (2012) used neural networks, along with linear regression, to predict the future need for new resources and virtual machines (VMs). When the system falls short in predicting the right amount of resources needed, it becomes incapable of accommodating a high workload demand, leading to anomalies. Anomaly detection is an essential part of big data analytics, as it helps improve the quality of service by checking whether the measurements of the observed workload and the baseline workloads diverge by a specific margin, with the baseline workloads providing a measure of how demand changes over a period of time based on historical records (Buyya et al., 2015).
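
The prediction-plus-margin idea can be sketched with a least-squares trend fit and a simple divergence check. The workload numbers and the 25% margin below are invented for illustration and are not taken from the cited systems.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept for a demand trend."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def is_anomalous(observed, baseline, margin=0.25):
    """Flag a workload diverging from the baseline by more than `margin`."""
    return abs(observed - baseline) > margin * baseline

hours = [1, 2, 3, 4, 5]
vms_used = [4, 6, 8, 10, 12]            # VMs consumed in past hours (toy data)
slope, intercept = fit_line(hours, vms_used)
forecast = slope * 6 + intercept        # predicted demand for hour 6
print(forecast)                         # -> 14.0
print(is_anomalous(30, forecast))       # 30 VMs diverges by more than 25% -> True
print(is_anomalous(15, forecast))       # within the margin -> False
```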

D. Spatial-temporal analytics

Massive data obtained from widely deployed spatiotemporal sensors have caused challenges in data storage, process scalability, and retrieval efficiency. Zhong, Fang, and Zhao (2013) proposed a distributed composite spatiotemporal index approach, VegaIndexer, to efficiently process large amounts of spatiotemporal sensor data. Zheng, Ben, and Yuan (2014) investigated the big data issues in Internet of Vehicles (IoV) applications and proposed using cloud-based big data space-time analytics to enhance analysis efficiency. Sinda and Liao (2017) proposed STAnD to determine anomaly patterns for potential malicious events within these spatial-temporal datasets.
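
At sketch level, a spatiotemporal index can be approximated by bucketing records on a grid cell and a time slot, so that a point query touches one bucket instead of scanning every record. This toy grid index is not the cited VegaIndexer, and its cell and slot sizes are arbitrary choices for the example.

```python
from collections import defaultdict

class GridSpatioTemporalIndex:
    """Bucket readings by (x cell, y cell, time slot) so a point query
    touches one bucket instead of scanning every stored record."""

    def __init__(self, cell=10.0, slot=60.0):
        self.cell, self.slot = cell, slot
        self.buckets = defaultdict(list)

    def _key(self, x, y, t):
        return (int(x // self.cell), int(y // self.cell), int(t // self.slot))

    def insert(self, x, y, t, payload):
        self.buckets[self._key(x, y, t)].append((x, y, t, payload))

    def query_bucket(self, x, y, t):
        """Return all records in the bucket covering (x, y, t)."""
        return self.buckets[self._key(x, y, t)]

index = GridSpatioTemporalIndex()
index.insert(3.0, 4.0, 30.0, "sensor-a")
index.insert(5.0, 7.0, 45.0, "sensor-b")
index.insert(95.0, 4.0, 30.0, "sensor-c")   # far away: lands in another bucket
print([record[3] for record in index.query_bucket(4.0, 5.0, 40.0)])
# -> ['sensor-a', 'sensor-b']
```

A production index would add neighbor-bucket lookups for range queries and distribute the buckets across nodes.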

The spatially distributed CPS nodes can be used to analyze location information. Ding, Tan, Wu, and Zhang (2014) proposed an efficient indoor positioning scheme based on a new empirical propagation model for fingerprinting sensors, called the regional propagation model (RPM), which builds on cluster-based propagation model theory. In another study, Ding, Tan, Wu, Zeng, and Zhang (2015) used particle swarm optimization (PSO) to estimate location information, with a Kalman filter updating the initial estimated location (Atat et al., 2016).

E. Big data analytical tools

Typical tools for big data analytics, data mining, real-time big data analytics, and cloud-based big data analytics include the following (Atat et al., 2016):

1. Tools for data mining

Hadoop is an open-source framework managed by the Apache Software Foundation. It has two main components: HDFS and MapReduce. HDFS, inspired by the Google File System (GFS), is a scalable and distributed storage system and an appropriate solution for data-intensive applications, such as those on a gigabyte and terabyte scale. Rather than being just the storage layer of Hadoop, HDFS improves system throughput and supplies efficient fault detection and automatic recovery. MapReduce is a framework used to analyze massive datasets in a distributed fashion across numerous machines. The mathematical model of MapReduce comprises two functions, Map and Reduce, both of which can be programmed by the user. R is an open-source software environment for data mining; it is a realization of the S language, which was developed at AT&T Bell Labs, and is used to explore data, implement statistical analysis, and draw plots. Compared with S, R is more popular and is supported by a large number of database manufacturers, such as Teradata and Oracle (Atat et al., 2016).
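
The Map and Reduce functions can be sketched in a few lines of Python. Here both phases run in-process on a toy word-count job, whereas a real Hadoop job would distribute the mappers, shuffle their output by key across machines, and run the reducers in parallel.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    """Reduce: sum all counts emitted under the same key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

splits = ["big data big insight", "data mining"]
# A real framework shuffles mapper output between nodes; here the two
# phases simply run back-to-back in one process.
counts = reduce_phase(chain.from_iterable(map_phase(s) for s in splits))
print(counts)  # -> {'big': 2, 'data': 2, 'insight': 1, 'mining': 1}
```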

2. Tools for real-time big data analytics

Storm is a distributed real-time computing system for big data analysis. Compared with Hadoop, Storm is easier to operate and more scalable in providing competitive and efficient services. Storm uses distinct topologies for different tasks, run on Storm clusters composed of master nodes and worker nodes. These nodes play two kinds of roles in big data analysis, nimbus and supervisor, respectively, whose functions correspond to the job tracker and task tracker of the MapReduce framework. Nimbus takes charge of distributing code across the Storm cluster, scheduling and assigning tasks to worker nodes, and monitoring the whole system, while the supervisor executes the tasks assigned to it by nimbus. Splunk is also a real-time platform designed for big data analytics. Through its Web interface, Splunk can search, monitor, and analyze machine-generated big data, with the results exhibited in different formats, including graphs, reports, and alerts. Unlike other real-time analytical tools, Splunk provides various smart services for commercial operations, system problem diagnosis, and so on (Atat et al., 2016).

3. Tools for cloud-based big data analytics

The most popular tool for cloud-based big data analytics, Google's cloud computing platform, consists of GFS (big data storage), BigTable (big data management), and MapReduce (cloud computing). GFS is a distributed file system enhanced to meet the big data storage and usage demands of Google Inc. To deal with the commodity component failure problem, GFS provides continuous surveillance, error detection, and component fault tolerance. GFS adopts a clustered approach that divides data chunks into 64-KB blocks and stores a 32-bit checksum for each block. BigTable supplies highly adaptable, reliable, applicable, and dynamic control and management for big data placement, representation, indexing, and clustering on large numbers of distributed commodity servers; its data model is built on rows, columns, tablets, and time stamps (Atat et al., 2016).
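
The per-block checksumming that GFS uses for error detection can be sketched as follows. CRC-32 stands in for GFS's 32-bit checksum, and the chunk contents are dummy bytes; this is an illustration of the scheme, not GFS's actual implementation.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64-KB blocks, as in GFS

def block_checksums(data):
    """Compute a 32-bit checksum (CRC-32 here) for every 64-KB block."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def find_corrupted(data, checksums):
    """Re-scan the blocks and return the indices that no longer match."""
    return [i for i, c in enumerate(block_checksums(data)) if c != checksums[i]]

chunk = bytes(200 * 1024)                    # a 200-KB chunk -> 4 blocks
sums = block_checksums(chunk)
print(len(sums))                             # -> 4

damaged = bytearray(chunk)
damaged[70 * 1024] = 0xFF                    # flip a byte inside the second block
print(find_corrupted(bytes(damaged), sums))  # only block 1 fails -> [1]
```

Keeping a checksum per small block, rather than per chunk, lets a reader localize corruption to one block and re-fetch only that block from a replica.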

F. Summary and insights

To better extract information from big data, it is important to enhance the cloud's analytic performance. A combination of the different techniques discussed in this section can be used to optimize cloud computing resources. If VMs and cloud resources and requirements can be predicted beforehand, workloads can be efficiently processed and analyzed by taking advantage of the cloud's analytical tools. Using a hybrid cloud can further speed up the analysis of workloads, leading to reduced latency and efficient data mining (Atat et al., 2016).

Descriptive Tasks of Big Data Analytics

The descriptive task of big data analytics is to identify the common characteristics of data with the purpose of deriving the patterns and relationships existing in the data. The descriptive functions of big data mining include classification analysis, clustering analysis, association analysis, and regression analysis (Lee, Cao, & Ng, 2017).

  • Classification analysis: Classification is a typical learning model used in big data analytics; it aims to build a model for making predictions on data features from a predefined set of classes according to certain criteria. Rule-based classification extracts IF-THEN rules to classify different categories. Examples of classification techniques are neural networks, decision trees, and support vector machines.
  • Clustering analysis: This is the process of grouping data into separate clusters of similar objects to segment data and acquire the data features. Data can be divided into subgroups according to their characteristics, and practitioners may formulate appropriate strategies for different clusters. Common examples of clustering techniques are the k-means algorithm, self-organizing maps, the hill climbing algorithm, and density-based spatial clustering.
  • Association analysis: An association model helps practitioners recognize groups of items that occur together. The association algorithm searches for frequent sets of items with a minimum specified confidence level; the support and confidence criteria help identify the most important relationships among items.
  • Regression analysis: Regression analysis determines the logical relationship within historical data. The focus in regression analysis is on measuring the dependent variable given one or several independent variables; the result is a conditional estimate of the expected outcome using the regression function. Linear regression, nonlinear regression, and exponential regression are common statistical methods to find the best fit for a set of data.
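
The clustering task above can be illustrated with a plain k-means sketch; the two well-separated point groups are invented so that the assign-and-update loop converges quickly.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=1):
    """Plain k-means: assign points to the nearest centroid, then move
    each centroid to the mean of its cluster, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialize from the data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # each tight group forms a cluster -> [3, 3]
```
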
  • 11.1.2 Cyber-Physical Systems (CPSs)
  • Introduction

Computing and communication capabilities will soon be embedded in all types of objects and structures in the physical environment, and applications with enormous societal impact and economic benefit will be created by harnessing these capabilities across both space and time. Such systems that bridge the cyber world of computing and communications with the physical world are referred to as cyber-physical systems (CPSs). CPSs are physical and engineered systems whose operations are monitored, coordinated, controlled, and integrated by a computing and communication core. This intimate coupling of the cyber and physical will be manifested from the nano-world to large-scale wide-area systems of systems. The Internet transformed how humans interact and communicate with one another, revolutionized how and where information is accessed, and even changed how people buy and sell products. Similarly, CPS will transform how humans interact with and control the physical world around them.

Examples of CPS include medical devices and systems, aerospace systems, transportation vehicles and intelligent highways, defense systems, robotic systems, process control, factory automation, building and environmental control, and smart spaces. CPSs interact with the physical world and must operate dependably, safely, securely, and efficiently in real time.

The World Wide Web can be considered a confluence of three core enabling technologies: hypertext, communication protocols like TCP/IP, and graphical interfaces. This integration enabled significant leaps in technology (e.g., graphics, networking, semantic Webs, multimedia interfaces and languages), infrastructure (e.g., global connectivity with increasing bandwidth, PCs for every desktop and laptop), and applications (e.g., ecommerce, auctions, entertainment, digital libraries, social networks, and online communities). Likewise, CPS can be considered to be a confluence of embedded systems, real-time systems, distributed sensor systems, and controls.

The promise of CPS is pushed by several recent trends: the proliferation of low-cost and increased-capability sensors in increasingly smaller forms; the availability of low-cost, low-power, high-capacity, small form-factor computing devices; the wireless communication revolution; abundant Internet bandwidth; and continuing improvements in energy capacity, alternative energy sources, and energy harvesting. The need for CPS technologies is also being pulled by CPS vendors in sectors such as aerospace, building and environmental control, critical infrastructure, process control, factory automation, and healthcare, who are increasingly finding that the technology base to build large-scale safety-critical CPS correctly, affordably, flexibly, and on schedule is seriously lacking.

CPSs bring together the discrete and powerful logic of computing to monitor and control the continuous dynamics of physical and engineered systems. The precision of computing must interface with the uncertainty and noise in the physical environment. The lack of perfect synchrony across time and space must be dealt with. The failures of components in both the cyber and physical domains must be tolerated or contained. Security and privacy requirements must be enforced. System dynamics across multiple time scales must be addressed. Scale and increasing complexity must be tamed. These needs call for the creation of innovative scientific foundations and engineering principles. Trial-and-error approaches to build computing-centric engineered systems must be replaced by rigorous methods, certified systems, and powerful tools. Analyses and mathematics must replace inefficient and testing-intensive techniques. Unexpected accidents and failures must fade, and robust system design must become an established domain. New sensors and sensor fusion technologies must be developed. Smaller and more powerful actuators must become available.

The confluence of the underlying CPS technologies enables new opportunities and poses new research challenges. CPSs will be composed of interconnected clusters of processing elements and large-scale wired and wireless networks that connect a variety of smart sensors and actuators. The coupling of the cyber and physical contexts will be driven by new demands and applications. Innovative solutions will address unprecedented security and privacy needs. New spatial-temporal constraints will be satisfied. Novel interactions among communications, computing, and control will be understood. CPS will also interface with many nontechnical users. Integration and influence across administrative boundaries will be possible.

The innovation and development of CPS will require computer scientists and network professionals to work with experts in various engineering disciplines, including control engineering, signal processing, civil engineering, mechanical engineering, and biology. This, in turn, will revolutionize how universities educate engineers and scientists. The size, composition, and competencies of industry teams that design, develop, and deploy CPS will also change dramatically. The global competitiveness of national economies that become technology leaders in CPS will improve significantly (PCAST, 2007).

The ability to interact with, and expand the capabilities of, the physical world through computation, communication, and control is a key enabler for future technology developments. Opportunities and research challenges include the design and development of next-generation airplanes and space vehicles, hybrid gas-electric vehicles, fully autonomous urban driving, and prostheses that allow brain signals to control physical objects.

Over the years, systems and control researchers have pioneered the development of powerful system science and engineering methods and tools, such as time and frequency domain methods, state space analysis, system identification, filtering, prediction, optimization, robust control, and stochastic control. At the same time, computer science researchers have made major breakthroughs in new programming languages, real-time computing techniques, visualization methods, compiler designs, embedded systems architectures and systems software, and innovative approaches to ensure computer system reliability, cyber security, and fault tolerance. Computer science researchers have also developed a variety of powerful modeling formalisms and verification tools. CPS research aims to integrate knowledge and engineering principles across the computational and engineering disciplines (networking, control, software, human interaction, learning theory, as well as electrical, mechanical, chemical, biomedical, material science, and other engineering disciplines) to develop new CPS science and supporting technology.

In industrial practice, many engineering systems have been designed by decoupling the control system design from the hardware/software implementation details. After the control system is designed and verified by extensive simulation, ad hoc tuning methods have been used to address modeling uncertainty and random disturbances. However, the integration of various subsystems, while keeping the system functional and operational, has been time consuming and costly. For example, in the automotive industry, a vehicle control system relies on system components manufactured by different vendors with their own software and hardware. A major challenge for original equipment manufacturers (OEMs) who provide parts to a supply chain is to hold down costs by developing components that can be integrated into different vehicles (Baheti & Gill, 2011).

The increasing complexity of components and the use of more advanced technologies for sensors and actuators, wireless communication, and multicore processors pose a major challenge for building next-generation vehicle control systems. Both the supplier and the integrator need new science that enables reliable and cost-effective integration of independently developed system components. In particular, theory and tools are needed for developing cost-effective methods to (Baheti & Gill, 2011):

  • 1. Design, analyze, and verify components at various levels of abstraction, including the system and software architecture levels, subject to constraints from other levels.
  • 2. Analyze and understand interactions between the vehicle control systems and other subsystems (engine, transmission, steering, wheel, brake, and suspension).
  • 3. Ensure safety, stability, and performance while minimizing vehicle cost to the consumer. New functionality and the cost of vehicle control systems are increasingly major differentiating factors for business viability in automobile manufacturing.
  • CPS Definition

A cyber-physical system is the integration of computation with physical processes. Embedded computers and networks monitor and control the physical processes, usually with feedback loops in which physical processes affect computations and vice versa. In the physical world, the passage of time is inexorable, and concurrency is intrinsic. Neither of these properties is present in today’s computing and networking abstractions.

Applications of CPS arguably have the potential to dwarf the 20th-century IT revolution. They include high confidence medical devices and systems, assisted living, traffic control and safety, advanced automotive systems, process control, energy conservation, environmental control, avionics, instrumentation, critical infrastructure control (e.g., electric power, water resources, and communications systems), distributed robotics (telepresence, telemedicine), defense systems, manufacturing, and smart structures. It is easy to envision new capabilities, such as distributed micro-power generation coupled into the power grid, where timing precision and security issues loom large. Transportation systems could benefit considerably from better embedded intelligence in automobiles, as this could improve safety and efficiency. Networked autonomous vehicles (AVs) could dramatically enhance the effectiveness of our military and offer substantially more effective disaster recovery techniques. Networked building control systems (such as heating, ventilation, air conditioning or HVAC, and lighting) could significantly improve energy efficiency and demand variability, reducing our dependence on fossil fuels and our greenhouse gas emissions. In communications, cognitive radio could benefit enormously from distributed consensus on available bandwidth and distributed control technologies. Financial networks could be dramatically changed by precision timing. Large-scale service systems, leveraging of radio frequency identification (RFID) and other technologies to track goods and services could acquire the nature of distributed realtime control systems. Distributed real-time games that integrate sensors and actuators could change the (relatively passive) nature of online social interactions.

The positive economic impact of any one of these application areas would be enormous. Today's computing and networking technologies, however, may have properties that unnecessarily impede progress towards these applications. For example, the lack of temporal semantics and adequate concurrency models in computing, and today's "best effort" networking technologies, make predictable and reliable real-time performance difficult at best. Software component technologies, including object-oriented design and service-oriented architectures, are built on abstractions that match software much better than physical systems. Many of these applications may not be achievable without substantial changes in the core abstractions (Lee, 2008).

CPS Concept

CPS is an integration of computation with physical processes; it is about the intersection, not the union, of the physical and the cyber. A comprehensive CPS definition was given by Shankar Sastry from the University of California, Berkeley, in 2008: “A cyber-physical system (CPS) integrates computing, communication and storage capabilities with monitoring and/or control of entities in the physical world, and must do so dependably, safely, securely, efficiently and in real-time” (Sanislav & Miclea, 2012).

CPSs are not traditional embedded systems, real-time systems, today’s sensor networks, or desktop applications; rather, they have certain defining characteristics, as mentioned in Huang (2008) and presented below (Sanislav & Miclea, 2012):

  • 1. Cyber capabilities in every physical component;
  • 2. Networked at multiple and extreme scales;
  • 3. Dynamically reconfiguring/reorganizing;
  • 4. High degrees of automation, with closed control loops;
  • 5. Dependable, and in some cases certified, operation;
  • 6. Cyber and physical components integrated for learning and adaptation, higher performance, self-organization, and auto-assembly.

CPSs, like all information and communication systems, are evaluated according to certain fundamental properties (Sanislav & Miclea, 2012):

  • 1. Functionality
  • 2. Performance
  • 3. Dependability and security
  • 4. Cost.

Properties that affect dependability and security are the following (Sanislav & Miclea, 2012):

  • 1. Input and feedback from/to the physical environment—secured communication channels.
  • 2. Management and distributed control—a federated approach.
  • 3. Real-time performance requirements.
  • 4. Large geographical distribution without physical security components in various locations.
  • 5. Very large-scale control systems (system of systems (SoS)).

Grand Challenges and Vision of CPS

The core science and technology required to support the CPS vision are essential for future economic competitiveness. Creating the scientific and technological basis for CPS can pay dividends across a wide variety of application domains, resulting in unprecedented breakthroughs in science and engineering. Groundbreaking innovations will occur because of the pervasive utility of the technology resulting in major societal and economic gains. Some possibilities are the following (Rajkumar, Lee, Sha, & Stankovic, 2010): [1]

  • Physical critical infrastructure that calls for preventive maintenance (PvM)
  • Self-correcting CPSs for “one-off” applications.

System Features of CPS

CPSs are a result of the emergence of faster computer processors, the miniaturization of electronic components, broader communication bandwidths, and the seamless integration of networked computing with everyday systems. They blend physical technologies, software and middleware technologies, and cyber technologies. Future systems will make more extensive use of synergic technologies, which integrate hardware and cyber technologies. Physical technologies enable the implementation of artifacts that can be recognized, located, operated, and/or controlled in the physical world. Cyber technologies are used for capturing, analyzing, and processing sensed signals and data produced in the physical world for decision-making. Synergic technologies enable not only a borderless interoperation between physical and cyber elements but also a holistic operation of the whole system. The design of the physical and computational aspects is becoming an integrated activity.

CPSs link the physical world with the cyber world through the use of multiple sensor and actuator networks integrated in an intelligent decision system. In other words, CPSs combine sensing and actuation with computation, networking, reasoning, decision-making, and the supervision of physical processes.
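The sense-compute-actuate loop described above can be illustrated with a minimal sketch. The following thermostat example is hypothetical (not from the source): a sensor reads the temperature of a toy physical process, the cyber part computes an on/off control decision, and an actuator (a heater) acts back on the physical world.

```python
import random

class Room:
    """Toy physical process: temperature drifts toward the outdoors unless heated."""
    def __init__(self, temp=15.0, outside=5.0):
        self.temp = temp
        self.outside = outside

    def step(self, heater_on):
        # Simple first-order dynamics: heat input vs. loss to the outside.
        if heater_on:
            self.temp += 2.0
        self.temp -= 0.1 * (self.temp - self.outside)

def sense(room):
    # Sensor reading with a little measurement noise.
    return room.temp + random.uniform(-0.2, 0.2)

def decide(measured, setpoint=21.0, hysteresis=0.5):
    # Cyber part: bang-bang (on/off) control with hysteresis.
    return measured < setpoint - hysteresis

room = Room()
for _ in range(100):
    heater_on = decide(sense(room))   # sense -> compute
    room.step(heater_on)              # actuate -> physical process evolves

print(f"final temperature: {room.temp:.1f}")
```

Even this toy loop exhibits the defining CPS trait: the computation (the `decide` function) and the physical process (the `Room` dynamics) affect each other on every iteration.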

Low- and high-end implementations of CPS can be distinguished based on the extensiveness and sophistication of the resulting integration. Low-end implementations are linearly complex, closed-architecture, distributed and networked, sensing- and reasoning-enabled, smart and proactive (often embedded and feedback-controlled) collaborative systems. High-end implementations are nonlinearly complex, open and decentralized, heterogeneous and multi-scale, intelligent and partly autonomous, and self-learning and context-aware systems.

The systems belonging to the latter class display organization without any predefined organizing principle and change their functionality, structure, and behavior by self-learning, self-adaptation, or self-evolution. Complicated cyber-physical systems (C-CPSs) are low-end implementations because they are not supposed to change their functionality or architecture but to optimize their behavior, for instance, for energy efficiency (e.g., when they must operate over an extended period of time), while operating under dynamically changing conditions or unforeseen circumstances. Some of these systems should operate in real-time applications and provide precisely timed behavior; they should also achieve a synergic interaction between the physical and the cyber worlds by integrating computational and physical processes.
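The precisely timed behavior mentioned above is typically realized with a fixed-period control loop that treats each period boundary as a deadline. The following sketch is hypothetical (not from the source) and uses Python's monotonic clock; a dependable CPS would additionally have to react to the missed deadlines it counts here.

```python
import time

PERIOD = 0.01  # 10 ms control period (an illustrative choice)

def control_step():
    # Placeholder for sensing, computation, and actuation.
    pass

missed_deadlines = 0
next_release = time.monotonic()
for _ in range(50):
    control_step()
    next_release += PERIOD
    remaining = next_release - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)      # wait for the next period boundary
    else:
        missed_deadlines += 1      # the step overran its deadline

print(f"missed deadlines: {missed_deadlines}")
```

Anchoring each release time to the previous one (rather than to "now") prevents timing drift from accumulating across iterations, which is the usual idiom for periodic real-time tasks.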

The cyber and physical parts of the systems are interconnected and affect each other through information flows. Due to this functional synergy, the overall system performance is greater than the sum of the individual components. This synergy is particularly important for high-end CPSs, which exhibit properties such as self-organization. In general, CPSs strive towards a natural human-machine interaction that extends to the human cognitive domain. These kinds of systems are also capable of extensive remote collaboration. Unlike linearly complex systems (LCSs), CPSs work on non-dedicated networks. CPSs are often connected in a hierarchical manner, as systems of systems, in which one system monitors, coordinates, controls, and integrates the operation of other systems. For this reason, they can be considered multidimensional complex systems. Based on their functionality and characteristics, high-end CPSs can be used in areas such as transportation, healthcare, and manufacturing.
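The hierarchical system-of-systems arrangement, in which one system monitors and coordinates others, can be sketched as follows. This is a hypothetical illustration (not from the source): a supervisor polls the status of subordinate subsystems and derives a system-wide operating mode.

```python
class Subsystem:
    """A subordinate system that reports its own health."""
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def status(self):
        return {"name": self.name, "healthy": self.healthy}

class Supervisor:
    """The coordinating system in a system-of-systems hierarchy."""
    def __init__(self, subsystems):
        self.subsystems = subsystems

    def monitor(self):
        # Collect status reports from every subordinate system.
        return [s.status() for s in self.subsystems]

    def coordinate(self):
        # System-wide decision based on aggregated status:
        # degrade gracefully if any subsystem reports a fault.
        return "nominal" if all(s.healthy for s in self.subsystems) else "degraded"

plant = Supervisor([Subsystem("sensing"), Subsystem("actuation"), Subsystem("comms")])
print(plant.coordinate())       # all healthy -> "nominal"
plant.subsystems[1].healthy = False
print(plant.coordinate())       # one fault   -> "degraded"
```

The design point is that the supervisor owns no physics of its own; its role is purely integrative, which is what makes the composite a multidimensional complex system rather than a single larger subsystem.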

Some CPSs are mission-critical systems (MCSs) because their correct functioning is critical to ensuring the success of a mission, provisioning an essential supply, or safeguarding security and wellbeing. These are the systems that ensure proper and continuous operation of, for example, nuclear plants, automated robot control systems, and automatic landing systems for aircraft. Any failure in MCSs can lead to loss of human life and damage to the environment and may cause losses in terms of supply and cost. However, their operation is always characterized by the presence of uncertainty.

This introduces challenges from the point of view of the dependability, maintenance, and repair of mission-critical nonlinear CPSs. In the long run, it is crucial to comprehensively analyze what the maintenance of these systems means theoretically, methodologically, and practically, and how it can be implemented in different systems (Ruiz-Arenas, Horvath, Mejia-Gutierrez, & Opiyo, 2014).

Application Domains of CPS

CPSs present a set of advantages: they are efficient and safe systems, and they allow individual entities to work together to form complex systems with new capabilities. Cyber-physical technology can be applied in a wide range of domains, offering numerous opportunities: critical infrastructure control, safe and efficient transport, alternative energy, environmental control, telepresence, medical devices and integrated systems, telemedicine, assisted living, social networking and gaming, manufacturing, and agriculture (Huang, 2008; Lee, 2008). Critical infrastructure, i.e., assets that are essential for the functioning of a society and economy, includes facilities for water supply (storage, treatment, transport and distribution, waste water); electricity generation, transmission, and distribution; gas production, transport, and distribution; oil and oil-product production, transport, and distribution; and telecommunication (Sanislav & Miclea, 2012).

Wan, Man, and Hughes (2010) listed some requirements that CPSs should meet according to the business sectors where they will be used, i.e., automotive, environment monitoring/protection, aviation and defense, critical infrastructure, and healthcare (see Table 11.1). The physical platforms supporting CPSs provide the following five capabilities: computing, communication, precise control, remote cooperation, and autonomy (Sanislav & Miclea, 2012).

Unlike traditional embedded systems, CPSs interface directly with the physical world, making the detection of environmental changes and the adaptation of system behavior key challenges in their design (Sanislav & Miclea, 2012).

The Past, Present, and Future of CPS

A CPS is an orchestration of computers and physical systems. Embedded computers monitor and control physical processes, usually with feedback loops, where physical processes affect computations and vice versa.

Applications of CPS include automotive systems, manufacturing, medical devices, military systems, assisted living, traffic control and safety, process control, power generation and distribution, energy conservation, HVAC, aircraft, instrumentation, water management systems, trains, physical security (access control and monitoring), asset management, and distributed robotics (telepresence, telemedicine).

TABLE 11.1

CPS Characteristics and Application Domains (Sanislav & Miclea, 2012)

Automotive

CPSs for the automotive industry require high computing power, due to complex traffic control algorithms that calculate, for example, the best route according to the traffic situation.

Environment monitoring/protection

CPSs for environment monitoring, distributed over a wide and varied geographical area (forests, rivers, mountains), must operate without human intervention for long periods with minimal energy consumption. In such an environment, accurate and in-time data collection by an ad hoc network with low power consumption represents a real research challenge.

Aviation, defense

CPSs for aviation and defense require precise control, high security, and high-power computing. The development of security protocols will be the main research challenge.

Critical infrastructure

CPSs for energy control, water resources management, etc. require precise and reliable control, leading to application software methodologies that ensure the quality of the software.

Healthcare

CPSs for healthcare and medical equipment require a new generation of analysis, synthesis, and integration technologies, leading to the development and application of interoperability algorithms.

As an intellectual challenge, CPS is about the intersection, not the union, of the physical and the cyber. It combines engineering models and methods from mechanical, environmental, civil, electrical, biomedical, chemical, aeronautical, and industrial engineering with the models and methods of computer science. These models and methods do not combine easily. Thus, CPS constitutes a new discipline that demands its own models and methods.

The term “cyber-physical systems” emerged around 2006, when it was coined by Helen Gill at the National Science Foundation in the United States. The related term “cyberspace” is attributed to William Gibson, who used it in the novel Neuromancer, but the roots of the term CPS are older and deeper. It would be more accurate to view the terms “cyberspace” and “cyber-physical systems” as stemming from the same root, “cybernetics,” coined by Norbert Wiener, an American mathematician who had a huge impact on the development of control systems theory. During World War II, Wiener pioneered technology for the automatic aiming and firing of anti-aircraft guns. Although the mechanisms he used did not involve digital computers, the principles involved are similar to those used today in computer-based feedback control systems. His control logic was effectively a computation, albeit one carried out with analog circuits and mechanical parts, and, therefore, cybernetics is the conjunction of physical processes, computation, and communication. Wiener derived the term from the Greek κυβερνήτης (kybernetes), meaning helmsman, governor, pilot, or rudder.

The term CPS is sometimes confused with “cybersecurity” which concerns the confidentiality, integrity, and availability of data and has no intrinsic connection with physical processes. The term “cybersecurity” is about the security of cyberspace and only indirectly connected to cybernetics. CPS certainly involves many challenging security and privacy concerns, but these are by no means the only concerns.

CPS connects strongly to the currently popular terms IoT, Industry 4.0, the Industrial Internet, Machine-to-Machine (M2M), the Internet of Everything, TSensors (trillion sensors), and the fog (like the cloud, but closer to the ground). All of these reflect a vision of a technology that deeply connects our physical world with our information world. The term “CPS” is more foundational and durable than all of these, however, because it does not directly reference either implementation approaches (e.g., the “Internet” in IoT) or particular applications (e.g., “Industry” in Industry 4.0). It focuses instead on the fundamental intellectual problem of conjoining the engineering traditions of the cyber and the physical worlds. We can talk about a “cyber-physical systems theory” in a manner similar to “linear systems theory.” Like linear systems theory, a CPS theory is all about models. Models play a central role in all scientific and engineering disciplines. However, since CPS conjoins distinct disciplines, which models should be used? Unfortunately, models that prevail in these distinct disciplines do not combine well (Lee, 2015).

  • [1] Blackout-free electricity generation and distribution
  • Extreme yield agriculture
  • Safe and rapid evacuation in response to natural or man-made disasters
  • Perpetual life assistants for busy, senior/disabled people
  • Location-independent access to world-class medicine
  • Near-zero automotive traffic fatalities, minimal injuries, and significantly reduced traffic congestion and delays
  • Reduced testing and integration time and costs of complex CPS systems (e.g., avionics) by one to two orders of magnitude
  • Energy-aware buildings and cities