DATA SCIENTISTS—WHERE DO THEY BELONG?
A marketer collects the data from all the IT trash boxes using “bit buckets” all over the organization and synthesizes it with external data using a set of machine-learning algorithms. The result is a well-organized understanding of the customers, the products, and the customer interfaces. It sounds like magic! Fortunately, there is a human face to this magic, which makes sense out of all this data. It is termed the “data scientist.”
As social media and big data companies went after their initial public offerings, media stories catapulted the importance of the data scientist job and the acute shortage of these workers. The data scientist grew from the unappreciated nerd in the back room to a business strategist, a quantitative genius who could consume data for lunch and dinner, and make sense of it. John Whittaker, in his blog at Dell, describes the hype about the data scientist in terms of its similarity to the webmaster in the early Internet days.
Just as there is great demand today for someone to guide companies through Big Data decisions, I recall when the No. 1 job was the almighty webmaster—the person who could ease the transitions to ecommerce and ensure the success of Internet infrastructure projects. Business leaders, in a desperate attempt to gain value that was promised by connecting their organizations to the web, paid handsomely for a webmaster with experience to get them there. Today, the same thing is occurring with the Data Scientist role. Again, a new class of technology has emerged with incredible promise and a boatload of complexities.14
According to a report published by McKinsey, there is a problem. “A significant constraint on realizing value from Big Data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from Big Data,” the report said. “There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”15
So, who is this mythical data scientist and how does he/she differ from the business analysts and statisticians currently employed by the marketing departments? Successful data scientists bring a triangulation of skills that are not easily blended together: an ability to understand business problems and strategy, a knack for numbers and statistical or qualitative analytics, and a dexterity in dealing with big data tools and techniques. As I watch my colleague, Tommy Eunice, who has done much of my location analytics number crunching described in chapter 3, I am amazed at how he brings all three skills to the table in a single conversation. I realized the enormity of the skill gap in my last round of recruiting for IBM. While there were a sizable number of people who claimed to be data scientists, most of them were software engineers who lacked business problem-solving skills. I have also come across many business analysts in the course of my work who understand marketing, but are too afraid to embrace quantitative techniques or new information technologies. So, how do we clone Eunice, who has all three? Or, do we form a team with complementary skills?
To divide and conquer, we must differentiate between data scientists and data engineers. A data engineer builds the data pipes through which the data can be collected from a variety of sources and transformed so that it can be collectively analyzed. A data scientist works on the data lake and discovers insights. However, most people do a combination of these two jobs. A data engineer is typically an IT person who may report either to the IT or the marketing department, and who has strong skills in data integration. A data scientist is more than likely a consultant or a marketing department employee who has spent a fair amount of time learning machine learning or statistics and is able to tear through massive heaps of data to find the needles in the haystacks.
The data engineer works with big data integration tools. A number of tools have been contributed to the Apache site for data sourcing, real-time data analytics, and data reorganization. In addition, big data vendors have introduced a number of proprietary tools, which work well with the open-source components, but offer the necessary secret sauce to realize the integrated architecture. The data engineer also works with internal and external data feeds, and has a good understanding of how these feeds can be used to identify and merge records based on selected identifiers and how the data quality can be improved. Unlike the structured business intelligence applications, especially in finance and revenue, the big data sources may be comparatively messy in their data quality. It is important for the data engineer to maintain a delicate balance between quality and latency. Sometimes some inaccuracy in targeting a customer may be tolerable as long as the latency criteria are met.
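As a rough illustration of merging records from internal and external feeds on a selected identifier, here is a minimal Python sketch. It is not drawn from any particular tool: the function names, the choice of e-mail as the identifier, and the rule that internal values win on conflict are all assumptions made for the example.

```python
def normalize_email(value):
    """Trim and lower-case an e-mail so trivially different spellings match."""
    if not value:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def merge_feeds(internal, external, key="email"):
    """Merge two lists of dict records on a normalized identifier.

    Hypothetical policy: external values fill gaps, internal values win
    on conflict. Records with no usable identifier are set aside.
    """
    merged = {}
    unmatched = []
    # Process external records first so internal values overwrite them.
    for record in external + internal:
        ident = normalize_email(record.get(key))
        if ident is None:
            unmatched.append(record)
            continue
        target = merged.setdefault(ident, {})
        for field, value in record.items():
            # Skip empty values so they never erase known data.
            if value not in (None, ""):
                target[field] = value
        # Store the cleaned identifier rather than the raw spelling.
        target[key] = ident
    return merged, unmatched
```

Records that cannot be matched are returned separately rather than dropped, so a team can decide whether cleaning them up is worth the added delay. That decision is one small instance of the quality-versus-latency balance described above.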
Is the data scientist the same as the statistician? Today’s big data is often unstructured and lacks the formal statistical disciplines. A data scientist must carry a fair amount of an exploratory mindset and a machine-learning background to work with unstructured data in order to glean useful structures from it. Compared to statistics, unstructured data analytics is a relatively less understood area. The data scientist offers a blend of business and quantification skills, but may have, additionally, skills in qualitative algebra. Often, it is hard to convert unstructured data to a quantified set for statistical analysis. Techniques like graph theory deal with how masses collaborate, and it is important for the data scientist to know how to combine a variety of techniques to seek patterns from data. In my discussions with successful data scientists, I have found that most of them did not have computer science degrees. People with a functional education in engineering, business, or liberal arts, along with a big dose of business experience, were more likely to be successful data scientists.
A data scientist armed with good insight can easily earn a fair amount of respect from senior management. After all, we have been data starved for decades. The data scientist brings bottom-up insight by using real data. Knowledge of the marketing function and of customers is an important prerequisite in developing meaningful insight. However, marketers do require a data- and analytics-driven culture to appreciate and cultivate data science.
“The enterprises that will achieve a competitive edge and win will have a blend of a healthy data-science culture, enterprising data scientists who can bend the ear of C-level decision makers, and the right combination of technology that will surface the data that make sense in the context of the business,” says Anjul Bhambhri, vice president of development for big data projects at IBM.16
To help educate the community, a number of universities and big data companies are offering educational and training programs. Coursera, started by two Stanford professors, offers a series of data science courses with contributions from the University of Washington.17 In addition, a number of meetups are emerging in different cities, which facilitate idea sharing among budding data scientists. As much as big data has created the demand, the community is rising to the challenge through unprecedented collaboration.