First Steps in Analytics Process Design
Before going into the practice of analytics, we need to keep in mind some steps for performing the tasks described in the previous sections. This section focuses on the importance of presenting things in a way that allows people to understand ideas and feel comfortable with visual, easy-to-understand results. The three key steps in controlling the analytics process are validation of assumptions, testing of models, and contextualization of results (Figure 1.16). The first step is to decide on a methodology for working with data and mining it to discover knowledge.
In general, there are two main options for describing a data-mining process: the CRISP-DM methodology and the SEMMA methodology. In this book, the analytics process is described as a process that contains the data-mining process along with other actions; it combines KM processes with data-mining processes. In the analytics process, we have, at the same time, data to mine and data to gather. We need to design experiments and to give meaning to the knowledge discovered from data. The Cross-Industry Standard Process for Data Mining (CRISP-DM) comprises six steps:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
The second methodology is SEMMA, from SAS (the statistical software), whose name stands for its five steps: sample, explore, modify, model, and assess. In general, the analytics process, and the way to develop it toward building an analytics KMS, requires continuous feedback (Figure 1.16) and the design of appropriate methodologies, techniques, and tools to perform the process.
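The feedback idea can be made concrete with a small state machine for CRISP-DM. Below is a minimal sketch. It assumes the backward arcs are the ones usually drawn in the CRISP-DM process diagram: data understanding back to business understanding, modeling back to data preparation, and evaluation back to business understanding. The names `Phase`, `next_phase`, and `walk`, and the rule that a problem triggers exactly one step back, are hypothetical choices made for illustration.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their forward order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


ORDER = list(Phase)  # members in definition order

# Backward (feedback) arcs, as commonly drawn in the CRISP-DM diagram.
FEEDBACK = {
    Phase.DATA_UNDERSTANDING: Phase.BUSINESS_UNDERSTANDING,
    Phase.MODELING: Phase.DATA_PREPARATION,
    Phase.EVALUATION: Phase.BUSINESS_UNDERSTANDING,
}


def next_phase(current, needs_revision):
    """Go back along a feedback arc if revision is needed; otherwise go forward.

    Phases without a feedback arc always move forward, and the phase after
    deployment is business understanding again (a new cycle).
    """
    if needs_revision and current in FEEDBACK:
        return FEEDBACK[current]
    return ORDER[(ORDER.index(current) + 1) % len(ORDER)]


def walk(issues, max_steps=30):
    """Trace one project from business understanding to deployment.

    `issues` is a hypothetical set of phases that reveal a problem the first
    time they are reached; each one triggers a single step back.
    """
    pending = set(issues)
    current = Phase.BUSINESS_UNDERSTANDING
    path = [current]
    while current is not Phase.DEPLOYMENT and len(path) < max_steps:
        revise = current in pending
        pending.discard(current)
        current = next_phase(current, revise)
        path.append(current)
    return path


print([p.name for p in walk({Phase.MODELING})])
```

In this run, modeling exposes a data problem the first time it is reached. The path therefore returns once to data preparation before continuing to evaluation and deployment. A linear sequence such as SEMMA's could be encoded with the same enum by leaving `FEEDBACK` empty.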
At this level, it is important to clarify what we mean by three concepts:
Methodology: The set of steps that, in sequence, connect techniques or methods with models or tools in solving or designing a problem or system. It is the way to achieve the analytics KMS and to implement the analytics process.
Methods and techniques: The processes (subprocesses) that we need to follow and apply in the analytics process in order to obtain answers and generate actions.
Models and tools: The vehicles for creating meaning from data. They are mainly the systems-, computing-, mathematics-, and statistics-based capabilities that we use to solve problems and support decisions.
In general, we could say that we need the analytics process defined in Figures 1.17 and 1.18. To follow this process, we need methodologies to perform some of the actions; at the same time, we need methods and techniques in each action, and models and tools to create the solution.
Figure 1.18 Analytics process summary. Proposed analytics process: prediction, description, diagnosis, prescription, controlling, embedding.
In this book, the suggested analytics process requires the combination of problem solving, data management steps, and the KM process. The analytics process comprises the following (see Figure 1.18):
1. Problem definition: delimiting the problem and defining its scope according to the needs of the business. At this stage, it is important to review the possible problem categories: description, visualization, forecasting, classification, optimization, and simulation. This review connects with Step 3. It is common that, at the beginning, the models that could solve the problem have not been considered, and later, once data has been gathered and organized, data issues emerge. This stage has the following aspects to work on:
a. Answer the questions: What is the company trying to accomplish? How is corporate performance measured? What data is needed to support the measurement system? What are the analytics steps, and how can analytics projects support the planning and control of strategy design and implementation?
b. Develop a conceptual model that describes the ideal solution and the building blocks required to reach it. Identify the blueprint for building the solution block by block.
c. Start from the possible solutions to the problem: assume a potential solution and identify its relationships to the organization's strategic and tactical metrics. Review the measurement process (in case the metrics are not well defined), identify the priority metrics, and work backward to identify the data required for the intermediate steps.
d. Build the first model, starting with exploratory data analysis (EDA) and visualization. Review what is needed to obtain a good model, and look for issues with the data and with the models.
2. Managing the data. Participate in data fixing, data gathering, and data architecture design. These activities are not straightforward, and they require extra actions, such as the following:
a. Data transformation. In many cases, continuous data has to be transformed into categorical data. The transformation can be based on statistics (for example, quantiles) or on an understanding of the problem or variable.
b. Variable creation. Many new variables can be created from the raw (original) data, for example, a metric or a relationship between two raw variables. If we want to measure driving behavior and we have data about pressing the brake or accelerator pedal, we can define a variable that describes driving behavior.
3. Managing the models. This process involves deciding whether the approach is deterministic or stochastic and using models of multiple forms: algebraic, statistical, machine learning-based, and so on. The steps cover not only the creation but also the testing of the models and their assumptions. In particular:
a. Assumption validation. The analytics process has to be connected to the context, and the assumptions of the modeling process need to be well managed. Models should follow the principle that simple and understandable is better than extremely complex and hard to digest.
b. Modeling, testing, revising the risk model, prototyping.
c. Delivery of partial and final results at each step.
d. Validation of and feedback on results. Continuous feedback from each step; this is a back-and-forth process. An agile approach is highly recommended in the analytics process.
e. Creation of a time series of the metrics in order to perform further analysis when the same experiment or business problem solution is repeated.
4. Developing understanding and meaning. Review the context and the knowledge domain. Creating meaning is based not only on interpreting the results but also on the initial selection of data and on model construction.
5. Knowledge sharing and transfer. Analytics work is, in most cases, multidisciplinary and spans multiple areas of the organization. The analytics process does not finish when the results are obtained. The results need to create actions, and they need to be embedded in the organization through their adoption in business processes. Frame the work of the analytics process as the creation of an analytics KMS. In principle, if time allows, we should start as in systems analysis and design: prototyping and showing partial or functional results.
6. Application, actions, and business processes under a continuous plan-and-act cycle. The analytics process requires a very strong and continuous back-and-forth flow of teaching and learning. Contextualization of the results is crucial. Workflow improvement and innovation are part of the objectives of analytics knowledge creation.
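Step 1(c) suggests working backward from a priority metric to the data it requires. Below is a minimal sketch of that backward search. It assumes the dependencies among a metric, its intermediate quantities, and the raw data can be written down as a simple map. Under that assumption, the search is an iterative depth-first traversal. The metric and data names (`customer_retention_rate`, `transactions`, and so on) and the function `trace_backward` are hypothetical.

```python
# Hypothetical dependency map: each derived quantity lists the inputs it is
# computed from. Names are illustrative; anything absent from the map is raw data.
DEPENDS_ON = {
    "customer_retention_rate": ["active_customers_end", "active_customers_start"],
    "active_customers_end": ["customer_table", "transactions"],
    "active_customers_start": ["customer_table", "transactions"],
}


def trace_backward(metric, depends_on):
    """Work backward from a metric; return (intermediate quantities, raw data)."""
    intermediate, raw = set(), set()
    seen, stack = set(), [metric]
    while stack:  # iterative depth-first search
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        inputs = depends_on.get(item)
        if inputs is None:
            raw.add(item)  # nothing upstream: a raw data source to gather
        else:
            if item != metric:
                intermediate.add(item)  # a derived step between data and metric
            stack.extend(inputs)
    return intermediate, raw


steps, sources = trace_backward("customer_retention_rate", DEPENDS_ON)
print(sorted(steps))    # ['active_customers_end', 'active_customers_start']
print(sorted(sources))  # ['customer_table', 'transactions']
```

In practice, the map itself is the valuable artifact. Writing it down forces a decision about which metrics matter and exposes data that is needed but not yet gathered.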
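The two data-management actions in Step 2 can be illustrated together on the driving-behavior example. The sketch first creates a variable: the ratio of brake presses to accelerator presses per trip. It then transforms that continuous ratio into categories in two ways. One uses statistics-based tertile cut points computed with `statistics.quantiles`. The other uses fixed cut points an analyst might choose from an understanding of the problem. The trip data, the interpretation of the ratio as a driving "style", the labels, and the domain cut points are all assumptions for illustration.

```python
import statistics
from bisect import bisect_right


def to_category(value, cut_points, labels):
    """Map a continuous value to a label using ascending cut points.

    len(labels) must be len(cut_points) + 1; a value equal to a cut point
    falls into the higher category.
    """
    return labels[bisect_right(cut_points, value)]


# Hypothetical trip records (illustrative numbers, not real data).
trips = [
    {"trip": "A", "brake_presses": 12, "accel_presses": 40},
    {"trip": "B", "brake_presses": 30, "accel_presses": 35},
    {"trip": "C", "brake_presses": 8, "accel_presses": 50},
    {"trip": "D", "brake_presses": 45, "accel_presses": 30},
    {"trip": "E", "brake_presses": 20, "accel_presses": 40},
    {"trip": "F", "brake_presses": 25, "accel_presses": 25},
]

# (b) Variable creation: a ratio derived from two raw variables.
for t in trips:
    t["brake_accel_ratio"] = t["brake_presses"] / t["accel_presses"]

labels = ["smooth", "moderate", "abrupt"]  # assumed category names

# (a) Statistics-based transformation: tertile cut points taken from the data.
stat_cuts = statistics.quantiles([t["brake_accel_ratio"] for t in trips], n=3)

# (a) Understanding-based transformation: fixed cut points set by an analyst
# (assumed values for illustration).
domain_cuts = [0.5, 1.0]

for t in trips:
    t["style_stat"] = to_category(t["brake_accel_ratio"], stat_cuts, labels)
    t["style_domain"] = to_category(t["brake_accel_ratio"], domain_cuts, labels)
    print(t["trip"], round(t["brake_accel_ratio"], 2), t["style_stat"], t["style_domain"])
```

Here both approaches happen to assign the same categories. With other data, the statistics-based cut points move, while the domain-based ones stay fixed. That trade-off between adapting to the data and keeping a stable business meaning is part of the transformation decision.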
The analytics process should be considered ongoing innovation. Analytics creates new connections, like new neural pathways between previously unconnected dots in the organizational setting, continuously reconnecting data and people's minds with the organization's problems. Innovation based on analytics means many things:
- From self-driving cars to an automated system for customer service
- From crime prevention to the development of a better quality of life
The analytics process is about developing analytics thinking, not only analytics operations. The reason is that analytics is not only a set of tools but also, and mainly, a way to think about and develop solutions in organizations. Analytics thinking within organizations makes the creation of solutions the priority, studying problems across their whole spectrum. No two problems are the same: conditions, settings, data, challenges, and tools differ. But analytics thinking keeps the same approach:
- Systematic
- Rigorous
- Inductive-deductive
- Testing and prototyping
The analytics process needs KM processes because we need to develop continuous analytics learning and teaching processes in organizations: not only to learn and solve, but also to take part in implementing the solution, helping others understand what we do, our methods, and our tools. We do not need to reinvent the wheel every time, and our work makes no sense if other people cannot understand it. The guiding principle is to be good at managing economies of scope; the challenge is to do more with what we have. We do not need more people who have access to Excel's powerful features but still use it like VisiCalc from 30+ years ago. In many cases, we do not need, for now, more models, more computational capacity, or more frameworks. We need more people exploiting the great analytics arsenal that we already have.
Designing and implementing the analytics process involves some difficulties. Possibly, one of them is the management of expectations: when a management movement is in place, a "magic" touch is expected to solve various problems. Most difficulties are human in origin, for instance, in identifying problems and in the intelligence needed to solve them. They arise from a lack of human capabilities to transform data into knowledge, understanding, and action. It is appropriate to recall the quote attributed to Albert Einstein: "Any fool can know. The point is to understand." Difficulties also come from trends or, put simply, fashions, which, when misunderstood, turn into paying more for what you do not need.
At the same time, difficulties arise as a result of ignorance. This concept is highly interesting. Ignorance is a lack of knowledge that keeps the organization under uncertainty in many areas. This is different from risk management practice, in which the variation of results is the focus for developing better management practices and for controlling risk based on knowledge creation. Ignorance comes not only from a lack of knowledge in dealing with problems, data, and models but also from interpretation. The danger arises when the organization does nothing to improve and reduce ignorance or, worse, as Dilbert (Adams 2000) points out: "When did ignorance become a point of view?"
Other issues in adopting the analytics process arise from bias and the use of nonrepresentative data, a lack of appropriate steps to improve data integration, and reduced alignment among data, problem, model, and understanding of context. This reduced alignment could lead to thinking that Big Data is always better for solving problems. Yet some problems have solutions that depend not on the volume of data but on understanding the problem itself and the possible solutions in the knowledge domain.
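A small simulation illustrates why volume does not compensate for nonrepresentative data. Below is a minimal sketch built on a hypothetical population of 100,000 customers with spend values from 0 to 99. It compares a very large sample that only captures customers spending 50 or more with a small simple random sample. The population, the selection rule, and the seed are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 customers, 1,000 at each spend value 0..99.
population = [v for v in range(100) for _ in range(1_000)]
true_mean = statistics.fmean(population)  # 49.5

# "Big" but nonrepresentative data: only customers spending 50 or more are
# captured (e.g., a channel used mostly by high spenders; assumed selection rule).
big_biased = [v for v in population if v >= 50]

# Small but representative data: a simple random sample of 1,000 customers.
small_random = rng.sample(population, 1_000)

biased_error = abs(statistics.fmean(big_biased) - true_mean)
random_error = abs(statistics.fmean(small_random) - true_mean)

print(f"biased sample:  n={len(big_biased)}, error={biased_error:.1f}")  # n=50000, error=25.0
print(f"random sample:  n={len(small_random)}, error={random_error:.1f}")
```

The biased sample is 50 times larger, yet its error stays at 25 no matter how many more records of the same kind are added. The random sample's error shrinks as its size grows.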
In summary, a good practice is probably to start the analytics process by developing the art of visualizing data. The main reason is that the visualization process allows many hypotheses to be formulated and potential needs in the analytics process to be identified. Visual and report-related activities need to be analyzed within a bigger context, under the influence of more variables. This means starting with bivariate analysis and moving to multivariate analysis.
This chapter has presented the analytics process: its roots and how data and knowledge relate to it. I have presented some illustrations of the type of analytics work and of the people performing it, as well as the first steps in analytics: planning and data visualization. From there, the chapter moved to the structure of the analytics process based on the modeling process in risk management, to prepare us for the second part of the case reviews. The following chapter discusses the commonalities between the analytics process and mathematical thinking, using problems and solutions in risk management as illustrations.