Staging Area and Operational Data Stores
ETL processes transfer business source data from the operational systems (e.g., the accounting system) to a staging area, usually either raw and unprocessed or transformed by means of simple business rules. The staging area is a temporary storing facility in the area before the data warehouse (see Exhibit 5.1). Source systems use different types of formats on databases (e.g., relational databases such as Oracle, DB2,
SQL Server, MySQL, SAS, or flat text files). After extraction, data is converted to a format that the ETL tools can subsequently use to transform this data. In the staging area, data is typically arranged as flat files in a simple text format or in the preferred format of the data warehouse, which could be Oracle. Normally, new data extracts or rows will be added to tables in the staging area. The purpose is to accumulate the history of the base systems.
In the staging area, many subsequent complex ETL processes may be performed which, upon completion, are scheduled for processing with an operations management tool. The tables may be transformed hundreds of times on several levels before data is ready to leave for the actual data warehouse.
If the business needs to access data with only a few minutes' delay—for example, because the contents are risks calculated on the portfolio values of the bank—it may make sense to implement an operational data store (ODS). This will enable business users to access this data instantly. Typically, it will not be a requirement that data in a data warehouse be accessible for business analyses until the following day, even though the trend of the future is real-time information. Pervasive BA, as we've mentioned earlier, requires real-time data from the data warehouse. The ETL jobs that update rows in a data warehouse and in data marts will usually run overnight, and be ready with fresh data the next morning, when business users arrive for work. In some situations, however, instant access is required, in which case an ODS is needed.
In regard to digital processes like multichannel marketing systems and apps pulling operational data, the data will typically not be provided directly by the data warehouse, but from operational data platforms that manage the real time interaction with customers. Albeit with some delay, these interactions will be written to the data warehouse, just like these operational platforms, with some delay, will be populated by the data warehouse.