Business Predictive Analytics: Tools and Technologies
University of Nebraska at Omaha
When working with data, it is best practice to explore the available data before deciding on a specific technique or method to analyze it. Similarly, it is essential to understand the tools available to the data analyst before diving into a project. Some tools or software packages are designed with beginning analysts in mind, while others require significant technical expertise. When choosing a tool, it is helpful to consider not only the goals of the project but also the intended data consumer.
This chapter is designed to provide a high-level description of several prominent classes of tools in business analytics. Each description will contain the following:
- 1. Functionality: What can they do? Major capabilities of each tool class will be discussed here, emphasizing cases in which one class is typically more useful than the others.
- 2. Weaknesses: When should this tool be avoided? Are there barriers to use?
- 3. Intended audience: What type of audience would benefit most from consuming the resulting data products? What level of technical expertise is needed to consume the product? Are there additional expectations when consuming the data product?
Each class of tool will be described generally in order to facilitate comparison between the types of tools and the needs of various products. The groups to be considered are:
- 1. Business Intelligence (BI) Software
- 2. Open-Source Analytics Tools
- 3. Proprietary Analytics Tools
Business Intelligence Software is typically designed to aid in the consumption of prepared data and does not require significant technical training. On the other hand, Open-Source Analytics Tools and Proprietary Analytics Tools typically expect higher levels of technical proficiency and can be used either to consume prepared data or to prepare data for consumption. The primary difference between these two classes is that Proprietary Analytics Tools are typically provided at a cost and include support for users, while Open-Source Analytics Tools are most often free but offer little support beyond their documentation.
While this chapter will not cover every available tool, it will endeavor to provide valuable information about some of the most prominent ones. The end of this chapter presents a case study on the use of the Microsoft Power BI platform.
Learning Objectives
- 1. Demonstrate knowledge of available data tools.
- 2. Identify the requirements of a specific business problem that bear on selecting the appropriate tool for that problem.
- 3. Identify the appropriate tool for a business problem.
- 4. Become familiar with creating and using data dashboards in Microsoft Power BI.
Business Intelligence (BI) Software
Business Intelligence (BI) Software is built to meet the needs of firms and teams as they create reports that feed directly into the decision-making workflow. These software suites are meant to be responsive, dynamic, and accessible after only elementary training and familiarization with the software itself. While some familiarity with the software is important, BI software is designed so that understanding the data and the business needs matters more than software training.
Three major players in this area, among many, are Tableau Desktop, Microsoft Power BI Desktop, and Google's Data Studio (Tableau, 2019; Microsoft, 2019; Google, 2019). These three platforms highlight the capabilities of BI software to transform data into insight. Additionally, each is designed to work with data from nearly any source and to process data for diverse business applications.
When using BI software, the user should expect its functions to focus primarily on presenting the information contained in the available data. The tools provided should enable the user to visualize, quantify, and characterize that information. This capability makes BI software valuable to users at all levels of data proficiency.
One of the most valuable characteristics of BI software is the ability to generate dynamic reports. Rather than creating static figures using data to be presented in a slide show, BI software instead encourages the user to combine visuals and statistics into a report-style format. Once the figures, tables, and values are arranged, the user can explore the data interactively by clicking on various elements in a visual or table in order to isolate that same group of observations in the other elements of the report.
A unique capability of interactive reporting is the ability to answer questions on the spot while discussing solutions to business problems. When a question is asked about an outcome or a specific demographic, the interactive report format allows the user to drill down immediately to address it (provided that the report has been thoughtfully constructed). This capability can substantially reduce the number of follow-up emails or meetings required to address specific concerns and is one of the most powerful capabilities of BI software suites.
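The idea behind this interactivity, often called cross-filtering, is that every element of a report is computed from the same set of rows, so a selection made in one visual restricts all the others. The plain-Python sketch below illustrates only that idea; it is not how Tableau, Power BI, or Data Studio work internally, and the sales table and the helper names (`apply_selection`, `revenue_by`, `render_report`) are hypothetical.

```python
from collections import Counter

# Hypothetical sales records standing in for a report's underlying data set.
SALES = [
    {"region": "East", "product": "Laptop", "revenue": 1200},
    {"region": "East", "product": "Phone", "revenue": 800},
    {"region": "West", "product": "Laptop", "revenue": 1500},
    {"region": "West", "product": "Tablet", "revenue": 600},
    {"region": "South", "product": "Phone", "revenue": 700},
]

def apply_selection(rows, selection):
    """Keep only rows matching every field/value pair in the current selection."""
    return [r for r in rows if all(r[k] == v for k, v in selection.items())]

def revenue_by(rows, field):
    """One 'visual': total revenue grouped by a field, as a bar chart would show it."""
    totals = Counter()
    for r in rows:
        totals[r[field]] += r["revenue"]
    return dict(totals)

def render_report(rows, selection=None):
    """Recompute every element of the report from the same filtered rows."""
    filtered = apply_selection(rows, selection or {})
    return {
        "total_revenue": sum(r["revenue"] for r in filtered),
        "by_region": revenue_by(filtered, "region"),
        "by_product": revenue_by(filtered, "product"),
    }

# Full report, then the same report after "clicking" the East bar,
# then a further drill-down to East laptops only.
print(render_report(SALES))
print(render_report(SALES, {"region": "East"}))
print(render_report(SALES, {"region": "East", "product": "Laptop"}))
```

Because each element reads from the same filtered rows, answering a follow-up question amounts to adding one more field/value pair to the selection.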
BI tools are not designed to handle every way in which data may need to be used. Use cases in which other tools should typically be used include:
- 1. Data cleaning: Most BI software expects the data to be processed before it is imported. Currently, Tableau Desktop is only available as part of the Tableau Creator software suite. The other program included in that suite is Tableau Prep, which is designed specifically for preparing and cleaning data prior to creating reports. Microsoft Power BI and Google Data Studio likewise expect data to be cleaned before use. In those cases, the user could consider tools such as Microsoft Excel or Google Sheets for data cleaning.
- 2. Modeling: BI software is designed to provide insights through carefully designed and presented reports. It is not designed to create powerful forecasting models or other robust statistical measures. This work is best done in other software, although BI software often provides rudimentary trending and forecasting options.
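Both limitations can be seen in a short plain-Python sketch, assuming a small hypothetical export of monthly revenue. The `clean` function performs the kind of preparation that BI software expects to have been done already (the same steps could be carried out in Excel, Google Sheets, or Tableau Prep). The `linear_trend` function fits a straight line by ordinary least squares, the simplest kind of trend line; treating this as representative of a "rudimentary" option is an assumption, not a description of how any particular BI product computes its trends. Such a line assumes a constant rate of change and ignores seasonality and uncertainty, which is why serious forecasting belongs in dedicated modeling tools.

```python
import csv
import io

# Hypothetical raw export with problems typical of unprepared data:
# stray whitespace, currency formatting, a duplicated row, and a missing value.
RAW = """month,revenue
1,"$1,000"
 2 ,"$1,100"
3,"$1,200"
3,"$1,200"
4,
5,"$1,400"
"""

def clean(text):
    """Cleaning stage: strip whitespace, parse numbers, drop blanks and duplicates."""
    seen, rows = set(), []
    for rec in csv.DictReader(io.StringIO(text)):
        month = rec["month"].strip()
        revenue = rec["revenue"].strip().replace("$", "").replace(",", "")
        # Skip incomplete or repeated records (dropping is one choice; imputing is another).
        if not month or not revenue or (month, revenue) in seen:
            continue
        seen.add((month, revenue))
        rows.append((int(month), float(revenue)))
    return rows

def linear_trend(points):
    """Rudimentary trend: ordinary least-squares line y = intercept + slope * x."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

points = clean(RAW)
intercept, slope = linear_trend(points)
print(points)
print(f"trend: {intercept:.1f} + {slope:.1f} * month")
print("month 6 projection:", intercept + slope * 6)
```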
While nearly any business or individual can benefit from the versatile reporting that is created by BI software, it is especially valuable for individuals who regularly need to provide reports to managers, or who frequently present material to non-technical audiences. By presenting visual reports that can be adapted dynamically, users can benefit from the unique and simple presentation style facilitated by BI software.
Another tremendous advantage is that almost no technical expertise is required of the user. While it is of course necessary to understand the context and scope of the data being used to create the report, very little training is needed for an individual to become a skilled user of BI software. Even individuals with little to no statistical background and no programming experience can successfully implement BI reports using any of the software suites mentioned above.
Open-Source Analytics Tools
Open-source tools are more diverse than BI software. Some tools, such as Dash (built on Python) or Shiny (built on R), are designed to function similarly to BI software and allow users to create valuable dashboards or reports that can be presented to diverse audiences (Plotly, 2019; RStudio, 2019; Python Software Foundation, 2019; The R Foundation, 2019). Others, such as TensorFlow or scikit-learn, are designed to implement highly technical statistical models that can forecast or make predictions for new observations (Abadi et al., 2015; Pedregosa et al., 2011). Still other tools focus on data cleaning, data collection, or research-oriented statistical modeling.
The greatest advantage, then, of open-source software is the versatility that it offers. For nearly any analytics project, there are tools that can improve the efficiency and effectiveness of that project. For example, an open-source data analytics pipeline to understand the way that people feel about a product sold online (sentiment analysis) using tools built on the Python programming language might look like the following:
- 1. Use Scrapy (open-source library for scraping websites) to collect customer comments or reviews (Scrapinghub, 2019).
- 2. Use Pandas (open-source library for handling data sets) to build the structure of the data (McKinney, 2010).
- 3. Use NLTK or spaCy (both open-source libraries for processing text) to identify keywords and sentiment in comments/reviews (Bird, Klein, and Loper, 2009; Honnibal and Montani, 2017).
- 4. Use one of many open-source plotting tools to visualize trends.
- 5. Use Statsmodels or scikit-learn (open-source libraries for statistical analysis and prediction) to create statistical models (Seabold and Perktold, 2010; Pedregosa et al., 2011).
- 6. Use Dash (open-source library for creating interactive web-based dashboards) to create a report on the results of the study (Plotly, 2019).
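To make the first four stages of this pipeline concrete without installing any of the libraries named in the list, the sketch below uses only Python's standard library. Every stage is a stand-in: the reviews are hypothetical (in place of Scrapy output), records are plain dictionaries (in place of a Pandas DataFrame), sentiment is scored by counting words from a tiny hand-made word list (far cruder than what NLTK or spaCy provide), and a text bar chart substitutes for a plotting library. Steps 5 and 6 are left out.

```python
import re
from collections import defaultdict

# Step 1 stand-in: reviews that a scraper might have collected (hypothetical data).
REVIEWS = [
    ("Headphones", "Great sound and very comfortable. Love them!"),
    ("Headphones", "Battery life is poor and the case broke."),
    ("Headphones", "Good value, great bass."),
    ("Speaker", "Terrible connection, keeps dropping. Poor quality."),
    ("Speaker", "Decent volume but bad battery."),
]

# Step 3 stand-in: a tiny hand-made sentiment lexicon (an assumption for illustration).
POSITIVE = {"great", "love", "good", "comfortable", "decent", "value"}
NEGATIVE = {"poor", "broke", "terrible", "bad", "dropping"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def sentiment_score(text):
    """Number of positive lexicon words minus number of negative ones."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def build_table(reviews):
    """Step 2 stand-in: structure raw reviews as records with a sentiment score."""
    return [{"product": p, "text": t, "score": sentiment_score(t)} for p, t in reviews]

def average_by_product(table):
    """Aggregate scores per product, the input a later modeling step would use."""
    groups = defaultdict(list)
    for row in table:
        groups[row["product"]].append(row["score"])
    return {p: sum(s) / len(s) for p, s in groups.items()}

def text_chart(averages):
    """Step 4 stand-in: a crude text bar chart instead of a plotting library."""
    for product, avg in sorted(averages.items()):
        bar = "+" * round(avg * 4) if avg >= 0 else "-" * round(-avg * 4)
        print(f"{product:<12}{avg:+.2f} {bar}")

table = build_table(REVIEWS)
text_chart(average_by_product(table))
```

Each function corresponds to one step in the list, which is the main appeal of the open-source approach: any single stage can later be swapped for a more capable library without redesigning the others.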
Open-source software is immensely flexible in functionality. Options exist for nearly any need, ranging from operating systems on which an entire analytics pipeline can run to libraries that implement a single statistical model or generate a single type of visual. Because these tools are developed by individuals and organizations and then licensed for use in other contexts, whether an open-source tool exists and remains available depends almost entirely on whether there is a business need for it and whether its developers are willing to collaborate with others on its use and development.
It is difficult to scope the functionality of open-source tools, but it is reasonable to assume that if there is a specific functionality that is needed in more than a single use case, there is likely an open-source tool to accommodate that need.
The weaknesses of open-source software result directly from its greatest strengths. Open-source software is typically created and maintained by individuals or organizations who built the tool for their own use. The tools that these groups design expect much more of the end user than software such as Tableau or Power BI. For example, libraries of code based on Python or R are nearly always designed with the expectation that the user is proficient in Python or R. Users who are not proficient in the programming language in which a tool is written are simply left behind. For statistical models and libraries, users are also assumed to have sufficient statistical knowledge to implement the models without assistance.
Open-source software is created to solve specific problems. When those problems are resolved, many open-source projects cease to be maintained. Users are left to either maintain the project on their own or to move on to a newer project or library. This can be a serious drawback where a data project depends on stability and low-cost maintenance.
The sheer number of open-source options is also a weakness: it can be difficult to find the right library or software platform among the many available. To find the right tool, a user needs to know what to look for. This problem is magnified when the user is not yet proficient in programming or statistics and therefore does not know which technical terms to search for to narrow the results.
While significant weaknesses exist within the open-source model, open-source software dominates data science. Firms and institutions ranging from Google and Facebook to government bodies and non-profits depend on these tools daily to drive decision-making processes.
The intended audience for Open-Source Analytics Tools is best described as technically skilled and willing to design a custom workflow tailored to specific needs. The user needs to be technically skilled because, unlike with BI software, it is typically necessary to build out the tools used on an open-source platform. For example, while Google shares the TensorFlow deep learning library under an open-source license, the user must create a learning model using a compatible programming language and integrate that model with data in order to gain any value from TensorFlow for his or her own projects.
The user must also be willing to design the workflow. In BI software, reports can be generated by dragging and dropping elements onto a canvas. In open-source software, users are expected to design each element of the interface in code. Dash and Shiny provide functionality very similar to that of BI software, yet each requires the user to describe the design of the dashboard or report programmatically.
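To illustrate what describing a report "programmatically" means, the sketch below declares a small dashboard as a tree of elements in plain Python and renders it to static HTML. It deliberately does not use Dash or Shiny, whose component libraries are far richer; the helper names (`heading`, `kpi`, `table`, `render`) and the sample figures are hypothetical. The point is only the workflow: each element that a BI user would drag onto a canvas is instead written out in code.

```python
from html import escape

# A hypothetical, minimal component vocabulary: each element is a (tag, children)
# tuple. None of these names come from Dash or Shiny.
def heading(text):
    return ("h1", [escape(text)])

def kpi(label, value):
    return ("div", [f"<strong>{escape(label)}</strong>: {escape(str(value))}"])

def table(rows):
    header = "".join(f"<th>{escape(k)}</th>" for k in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in r.values()) + "</tr>"
        for r in rows
    )
    return ("table", [f"<tr>{header}</tr>{body}"])

def render(element):
    """Recursively turn a (tag, children) tree into an HTML string."""
    tag, children = element
    inner = "".join(c if isinstance(c, str) else render(c) for c in children)
    return f"<{tag}>{inner}</{tag}>"

# The dashboard's design is spelled out element by element in code.
rows = [{"region": "East", "revenue": 2000}, {"region": "West", "revenue": 2100}]
layout = ("div", [
    heading("Quarterly Revenue"),
    kpi("Total revenue", sum(r["revenue"] for r in rows)),
    table(rows),
])

print(render(layout))
```

Changing the report means editing this code rather than rearranging a canvas, which is the extra effort, and the extra control, that the comparison in this section describes.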
Where BI software is like eating at a restaurant, using open-source software is comparable to cooking a gourmet meal at home. With open-source software, the “flavor” can be tailored precisely, but at the cost of greater effort in training and implementation.