Integration with Corporate Analytics Systems
The time-series and event-framed data available in the EIDI is extracted to machine learning tools, business intelligence tools, or a data lake, as shown in Figure 10.3. The team has been looking at business tools such as MATLAB, Python, Tableau, Microsoft's Power BI, and Amazon's QuickSight. They also use Microsoft's Azure Machine Learning and Amazon Machine Learning. They have worked on a prototype using Microsoft's analysis tools via EIDI Open Database Connectivity (ODBC). Their EIDI offers a standard integration tool that simplifies integration with offline analytics tools.
Integrating EIDI time-series data with big data analytics.
Using a Real-Time Data Infrastructure versus a Data Lake Approach
Many IT departments of major energy, process, and manufacturing companies use a real-time data infrastructure, or EIDI, at the plant operations and engineering levels. Typically, the sensor data, whether from control systems or from separate Internet of Things (IoT) sensors, is stored at its original resolution (apart from noise filtering) in a permanent historical data archive. Additionally, the sensor data is often transformed into more valuable inferred data values, summarized roll-ups, totalized data, and key performance metrics via EIDI system online analytic calculations.
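The roll-up and totalizing calculations mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the EIDI's own analytics engine; the function names, the flow-rate readings, and the choice of the trapezoidal rule for totalizing are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hourly_rollup(samples):
    """Group (timestamp, value) samples by hour and summarize each hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {
        hour: {"min": min(vals), "max": max(vals),
               "avg": mean(vals), "count": len(vals)}
        for hour, vals in sorted(buckets.items())
    }


def totalize(samples):
    """Integrate a rate signal (units per hour) over time, trapezoidal rule."""
    total = 0.0
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        total += 0.5 * (v0 + v1) * hours
    return total


start = datetime(2024, 1, 1, 8, 0)
# Hypothetical flow-rate readings in litres per hour, one every 30 minutes.
flow = [(start + timedelta(minutes=30 * i), rate)
        for i, rate in enumerate([100.0, 110.0, 120.0, 130.0, 140.0])]

rollup = hourly_rollup(flow)    # one summary per clock hour
total_litres = totalize(flow)   # volume passed over the two-hour window
```

The raw samples stay untouched; the roll-up and the total are derived values computed alongside them, which mirrors the idea of keeping original-resolution data in the archive while adding summarized data on top.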
The EIDI Asset Framework (AF) is a database model that hierarchically represents the customer's physical assets in a metadata layer. The AF database includes the sensor data points that make up an asset, along with run-time asset analytics for derived or calculated data points and unit attributes, and it provides real-time notifications when unexpected conditions occur.
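To make the idea of a hierarchical asset model concrete, here is a small sketch of a tree of assets with sensor attributes, units, derived analytics, and limit-based notifications. It is a hypothetical data structure written for illustration, not the actual AF object model or API; the `Asset` class, its fields, and the pump example are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Asset:
    """One node in a hypothetical asset hierarchy (not the real AF model)."""
    name: str
    attributes: Dict[str, float] = field(default_factory=dict)  # latest sensor values
    units: Dict[str, str] = field(default_factory=dict)         # unit per attribute
    analytics: Dict[str, Callable[[Dict[str, float]], float]] = field(default_factory=dict)
    limits: Dict[str, float] = field(default_factory=dict)      # upper limit per value
    children: List["Asset"] = field(default_factory=list)

    def evaluate(self):
        """Return sensor attributes plus derived values computed from them."""
        values = dict(self.attributes)
        for name, calc in self.analytics.items():
            values[name] = calc(self.attributes)
        return values

    def notifications(self, path=""):
        """Walk the hierarchy and report every value above its limit."""
        here = f"{path}/{self.name}"
        values = self.evaluate()
        alerts = [f"{here}: {key}={values[key]} exceeds {limit}"
                  for key, limit in self.limits.items()
                  if values.get(key, limit) > limit]
        for child in self.children:
            alerts.extend(child.notifications(here))
        return alerts


# Hypothetical pump with two sensor points and one derived data point.
pump = Asset(
    name="Pump-101",
    attributes={"inlet_temp": 20.0, "outlet_temp": 35.0},
    units={"inlet_temp": "degC", "outlet_temp": "degC", "delta_t": "degC"},
    analytics={"delta_t": lambda a: a["outlet_temp"] - a["inlet_temp"]},
    limits={"delta_t": 10.0},
)
plant = Asset(name="Plant", children=[pump])
alerts = plant.notifications()
```

Because the derived value and its limit live on the asset rather than on an individual sensor tag, the same template could be reused for every pump in a fleet, which is the main appeal of a metadata layer of this kind.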
EIDI system users can also classify specific run times as event frames, which time-slice the data into meaningful periods of time (e.g., batches, lots, startups, shifts). These event frames further contextualize the data for analysis against similar event types.
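A rough sketch of what time-slicing by event frame looks like in practice follows. The `EventFrame` type, the batch names, and the temperature readings are hypothetical; the sketch only shows how a frame's start and end times select a window of a time series so that similar events can be compared.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import NamedTuple


class EventFrame(NamedTuple):
    """A named period of time, e.g. a batch, lot, startup, or shift."""
    name: str
    kind: str
    start: datetime
    end: datetime


def slice_by_frame(samples, frame):
    """Return the (timestamp, value) samples inside [start, end)."""
    return [(ts, v) for ts, v in samples if frame.start <= ts < frame.end]


t0 = datetime(2024, 1, 1, 0, 0)
# Hypothetical hourly temperature readings spanning two batches.
temps = [(t0 + timedelta(hours=i), v)
         for i, v in enumerate([60.0, 62.0, 64.0, 70.0, 72.0, 74.0])]

frames = [
    EventFrame("Batch-A", "batch", t0, t0 + timedelta(hours=3)),
    EventFrame("Batch-B", "batch", t0 + timedelta(hours=3), t0 + timedelta(hours=6)),
]

# Compare the same statistic across frames of the same kind.
avg_by_batch = {
    f.name: mean(v for _, v in slice_by_frame(temps, f)) for f in frames
}
```

The half-open interval is a design choice made here so that a sample falling exactly on a batch boundary belongs to only one frame.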
As such, the raw sensor data input to the EIDI system goes through several iterations that transform basic sensor data into highly contextualized asset- and event-relative information, suitable for input to artificial intelligence (AI) and predictive analytics. A user selects which data or event to extract. This contextualized data is prepared and converted into a published data set, ready for input into data lakes, big data analytics, advanced data visualization systems, and predictive analytics, such as ML.
An alternate method of providing data to big data and predictive analytic systems is to stream real-time and relational data directly into a data lake for subsequent use by analytics software. This approach collects and stores data, often in its natural format, into a data warehouse or cloud-based storage, known as a data lake. Subsequently, data scientists, IT personnel, engineers, business people, or consultants clean and prepare the data so that it can be input to their analytics. Which approach to use depends on the following criteria:
- Which groups manage the contextualizing and curating of the data: OT or IT/data scientists? If operations, production, and engineering personnel (OT) are extracting, verifying, and cleansing the data, it makes sense to use an EIDI system in production environments, where there can be tremendous payback in terms of decreasing production costs, maintaining an asset fleet, real-time situational awareness, quality improvement, and unified reporting.
- If the data is simply collected for analytic purposes, the data lake may suffice. However, because the data is raw and uncontextualized, this requires personnel with high levels of specific domain knowledge to
  - Determine which data is relevant for analytic usage,
  - Clean the data and confirm that it is accurate,
  - Put the data into its proper context, and
  - Format the data so that it is easily consumable by the analytic systems.
In the long run, this approach may be costly, and it relies on personnel being extremely familiar with the data, because industrial Internet of Things (IIoT) sensor data is not validated in the same way as data from a traditional control system (distributed control system [DCS], programmable logic controller [PLC], or supervisory control and data acquisition [SCADA]) together with a PI System. As such, this approach requires substantially more effort in the following areas:
- Determining what the data actually represents
- Validating that the data is accurate and, if it is not, painstakingly cleansing it
- Determining which data should be discarded as outlying data, meaning data that does not accurately represent what you are trying to analyze or model (e.g., idle equipment, data collected when making out-of-spec product, or transitioning between products)
- Filling in missing data gaps
- Formatting and publishing large cleansed data sets so that the analytics system can easily ingest the data set
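The kind of manual preparation listed above might look like the following sketch, which fills gaps, discards unrepresentative rows, and publishes the result as CSV. It uses only the Python standard library; the row layout, the `state` field, and the decision to interpolate linearly before filtering are assumptions for illustration, and a real pipeline would need domain knowledge to choose each rule.

```python
import csv
import io


def fill_gaps(values):
    """Linearly interpolate None entries between known neighbours.

    Assumes evenly spaced samples; leading or trailing gaps stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def drop_unrepresentative(rows, is_valid):
    """Keep only rows that represent the condition being modelled."""
    return [row for row in rows if is_valid(row)]


def publish_csv(rows, fieldnames):
    """Serialize cleansed rows to CSV text that an analytics tool can ingest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical one-minute readings with a missing value and an idle period.
rows = [
    {"minute": 0, "state": "running", "temp": 70.0},
    {"minute": 1, "state": "running", "temp": None},
    {"minute": 2, "state": "running", "temp": 74.0},
    {"minute": 3, "state": "idle", "temp": 25.0},
    {"minute": 4, "state": "running", "temp": 76.0},
]

temps = fill_gaps([row["temp"] for row in rows])
for row, temp in zip(rows, temps):
    row["temp"] = temp

cleansed = drop_unrepresentative(rows, lambda row: row["state"] == "running")
published = publish_csv(cleansed, ["minute", "state", "temp"])
```

Even in this toy case, every rule encodes a judgment about the process (what counts as idle, whether interpolation is acceptable), which is why the effort grows quickly when the data arrives without context.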
We have presented two approaches to corporate or cloud-based analytics. However, it's not an either/or decision. Both can be used together: the EIDI AF system provides a workbench of sorts for preparing, accessing, evaluating, and putting advanced analytics models into operation, and companies can integrate this AF-contextualized EIDI data into their data lakes for broader analytics and reporting. This approach also spares data scientists from having to cleanse, prepare, and remove a large amount of data in a Python environment.