V. Digital technologies deep dive
Data, data management, data analytics, and data science technologies
Data literacy and proficiency is the unifying theme of the concepts discussed in this chapter. It refers to the ability to leverage the vast quantities of internal and external data to improve an organization’s efficiency, effectiveness, and agility.1 During the last 30 years, the data available to businesses have increased exponentially. Technology innovation has resulted in even more information becoming available in a greater variety of formats (emails, web pages, social media, wikis, apps, etc.). This information is accessible through a greater variety of media and communication channels, resulting in an increasingly complex and rich information environment.2 Gartner expects that 80% of organizations have either rolled out internal data literacy initiatives to upskill their workforce or intend to do so in the coming year. Reaping major rewards from data has become a critical organizational issue, with some now arguing that data is an even more important resource than oil.3,4 Used effectively, the large volumes of internal and external data being created every minute provide organizations with great opportunities for breakthroughs in how they organize, operate, manage talent, create value, and scale their reach.5 Effectively capturing, storing, organizing, integrating, protecting, analyzing, and making the most of data requires an organization-wide team effort.6 There are limited benefits to managing data in silos or restricting its management to a few experts in a technical function within the organization.7 Given this, nontechnical stakeholders in every part of the organization also need to be literate and proficient with data.
It follows, then, that managers who supervise or oversee these stakeholders particularly need to be literate and proficient with data, so that data literacy and proficiency requirements are reflected in hiring, performance management, and firing decisions. But managers tend to avoid getting dragged into data issues, perhaps due to the technical terminology, the complex methodologies, the sheer scale of the data sets, and a lack of sufficient technical grounding in data management foundations.8 This makes it tempting to “leave it to the experts.”9 But this can be a major mistake, as data issues are now quintessential business issues for managers at all levels of the organization.10 It is managers who advocate for, set the objectives for, and allocate resources to data management efforts. It is managers who determine how closely data analytics, data science, and business analytics specialists work with their team and other parts of the organization.11 It is managers who have the domain expertise critical to the development of data products to optimize their part of the business.12 It is managers who are the ultimate users or nonusers of data products and insights. It is managers who most need to understand what opportunities particular data, data sets, and data products offer, how best to develop them, how to remove barriers to their adoption, and how to make the most of them.13
In this chapter, we provide an introduction to data, data management, and data issues from a managerial perspective. Our intention is to provide a starting point that enables managers to understand the value of data and data management, the key data and data management concepts and terminology, the manager's role in data management, and examples of common data management platforms and vendors.
Data as the new oil
As it relates to digital technologies, the term data refers to a collection of the smallest units of information that can be stored, processed, or transmitted by a computer (datum is the singular form of data). What constitutes data can range from numbers and letters to pictures, sounds, and videos. Although we see only the video, for example, within digital technologies data are represented as a series of binary digits, or bits. (To be more specific, we should talk about data that “are,” not data that “is,” since data is the plural of datum,14 but data used as a singular is more common in everyday speech. We use a mixture of both approaches throughout the book to balance technical accuracy with ease of comprehension.) The term “information assets” can also be used instead of data, as it is seen to encompass data, information, and knowledge. We use data throughout this book to improve understandability, although in doing so we are referring to data, information, and knowledge. Further, each binary digit is either a one or a zero, so that at the most basic level all data are a sequence of ones and zeros, referred to as binary data. This binary form is what enables data to be stored, processed, and transmitted by computers. Data can be stored on a physical or virtual computer (e.g. a virtual machine) and can be transmitted between computers via a network connection. It can also be stored on a physical storage device (e.g. a USB drive or hard drive) and manually transferred onto another storage device or computer.
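For readers who want to see the idea of binary data in practice, the short Python sketch below shows how a piece of text might be turned into ones and zeros and back again. It is only an illustration under stated assumptions: the helper names to_bits and from_bits are hypothetical, and UTF-8 is assumed as the text encoding (other encodings would produce different bits for some characters).

```python
def to_bits(data: bytes) -> str:
    """Return the bytes as a string of ones and zeros, eight bits per byte."""
    return " ".join(f"{byte:08b}" for byte in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the original bytes from a space-separated string of bits."""
    return bytes(int(chunk, 2) for chunk in bits.split())


text = "Hi"
encoded = text.encode("utf-8")  # assumption: UTF-8; each letter becomes a number 0-255
bits = to_bits(encoded)         # each number becomes eight binary digits

print(bits)                             # 01001000 01101001
print(from_bits(bits).decode("utf-8"))  # Hi
```

The same principle applies to pictures, sounds, and videos: each is stored as a (much longer) sequence of bits, and software interprets those bits according to an agreed format.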
In the early days of computing, usable data was limited to the few internal information systems or software applications within an organization (e.g. accounting systems, HR systems, procurement systems), as well as limited external statistical data. But over time, there has been a proliferation in the number of devices and software applications collecting data. These include millions of devices with sensors, cameras, and audio recording, as well as the digitization and facilitation of more and more business processes and workflows through software applications. In addition, many of these devices, processes, and workflows are increasingly connected to the internet and therefore able to interact with each other, with physical or virtual computers, and with people. This results in vast amounts of data being created, stored, and made available for use every second. The opportunities for organizations that are able to manage this data effectively are almost unlimited. For example, organizations can format, integrate, organize, analyze, and leverage insights from this data as a source of vast revenues (e.g. Google and Facebook), as a source of operational and strategic intelligence, to drive product innovation, to enhance customer experience, to form and better manage strategic partnerships, to disrupt industry offerings, and much more. Given this proliferation in available data and the vast power possible from effectively leveraging it, some experts have contended that “data is the new oil” (i.e. that data may be an even greater source of global power and prosperity than oil).15,16
Unverified data refers to data that managers cannot confirm to be true or untrue. Trusting such data and making decisions based on it can be highly dangerous. For example, imagine having unverified data about customer preferences and investing in capabilities to satisfy those preferences, only to discover after the investment that the data was inaccurate and the preferences were completely wrong.