My son and I have learned analytics across two generations. I think I belong to the second generation of computer scientists, as I grew up with personal computers and did not invest any of my time in the IBM 360-class machines. My son grew up with the Internet and the open-source community. As you can well imagine, he has typical Silicon Valley gut reactions, while I am still trying to weigh in with my DNA from corporate America. We had an interesting discussion regarding his algorithms. I suggested that he file a patent for them so that he could protect his intellectual property. However, he was more interested in contributing them to an open-source community, so that he could get others to work with him.

Although open-source software has been around since the 1980s and 1990s, its popularity grew exponentially with the World Wide Web. The “open-source” label was created at a strategy session held on February 3, 1998, in Palo Alto, California, shortly after the announcement of the release of the Netscape source code. The strategy session grew from the realization that the attention around the Netscape announcement had created an opportunity to educate and advocate for the superiority of an open development process. The conferees believed that the pragmatic, business-case grounds that had motivated Netscape to release its code illustrated a valuable way to engage with potential software users and developers, and to convince them to create and improve source code through active participation in a community. The conferees also believed that it would be useful to have a single label that identified this approach and distinguished it from the philosophically and politically focused label “free software.” Brainstorming for this new label eventually converged on the term “open source,” originally suggested by Christine Peterson, co-founder of Foresight Institute.13

The Apache site owes its genesis to the HTTP daemon. In February 1995, the most popular server software on the Web was the public domain HTTP daemon, developed by Rob McCool at the National Center for Supercomputing Applications (NCSA), University of Illinois, Urbana-Champaign. However, development of that httpd stalled after McCool left NCSA in mid-1994, and many webmasters developed their own extensions and bug fixes that were in need of a common distribution. A small group of these webmasters, contacted via private email, gathered together for the purpose of coordinating their changes (in the form of “patches”). Brian Behlendorf and Cliff Skolnick put together a mailing list and shared information space and logins for the core developers on a machine, with bandwidth donated by HotWired.14 As of June 2013, Apache was estimated to serve 54.2 percent of all active websites and 53.3 percent of the top servers across all domains.15

Google released the Google File System paper in October 2003 and the MapReduce paper in December 2004. Both papers attracted the attention of Doug Cutting and Mike Cafarella at the University of Washington, who were developing Nutch, an open-source search engine. In 2006, Cutting went to work at Yahoo, which was equally impressed by the Google File System and MapReduce papers and wanted to build open-source technologies based on them. They spun out the storage and processing parts of Nutch to form Hadoop (named after Cutting’s son’s stuffed elephant) as an open-source Apache Software Foundation project. However, although Yahoo was responsible for the vast majority of development during its formative years, Hadoop did not exist in a bubble inside Yahoo’s headquarters. It was a full-on Apache project that attracted users and contributors from around the world.16

Hadoop gained the attention and mindshare of a large number of developers globally who used the Apache server to share their ideas and code. Because the entire program is open source, it is now gathering a fair amount of momentum at corporate information technology (IT) organizations and has emerged as a serious competitive threat to traditional software for data storage, integration, and analytics. It has attracted a large technical community, which is contributing to the development of the software. In addition, the academic community has created a large number of online courses.

Just as data about marketers was crowdsourced by social media organizations, many of the storage and analytics techniques are also crowdsourced. It has been a fascinating experiment in public sharing, which has attracted market leaders, such as IBM, who are contributing to and benefiting from this open-source technology development program.

Unlike a patent, which protects ideas but creates pockets of innovation requiring expensive integration (some of which is also patented), open-source experimentation allows a developer to share initial ideas and use a community to improve on them. Each developer has access to the source code and can use it to create commercial products, which, through the very nature of their development, are far better integrated.
