Big Data Analytics

Sujni Paul

Higher Colleges of Technology


Characteristics of Big Data

Big Data in Fighting COVID-19

Big Data in Artificial Intelligence

Big Data in Social Media and Internet of Things

Big Data in Customer Interactions

Big Data in Data Science


The following well-known slogan sheds more light on big data:

“Big Data: The next frontier for innovation, competition, and productivity” (McKinsey Global Institute)

Characteristics of Big Data

Every day, 2.5 quintillion bytes of data are produced. Big data refers to data sets that are so large, complex, and unorganized that traditional tools cannot easily handle them. These large data sets are analyzed computationally to reveal patterns and trends related to human behavior and interactions. The three types of data are structured, semi-structured, and unstructured.

Structured data are organized and labeled in a formatted repository, typically a database, so that their elements can be addressed for more effective analysis. An example is a table in a relational database or an Excel spreadsheet.

Unstructured data have no predefined form, which makes them difficult to organize and very hard to classify. Typical examples of unstructured data come from heterogeneous sources: the words in a text, emails, pictures, videos, and the output produced by a Google search. Natural Language Processing (NLP) algorithms are used to extract meaning from such text. Semi-structured data are a combination of the two: they may look structured, but they do not conform to a fixed schema such as a table definition in an RDBMS; XML data is a typical example. Similarly, on Twitter the number of followers and the number of tweets are structured, whereas the content or images shared are unstructured (Figure 7.1).
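The Twitter example can be illustrated with a short Python sketch. It assumes a simplified, hypothetical tweet-like JSON record; the field names only loosely follow the shape of Twitter's data and are illustrative. JSON itself is semi-structured: every field is labeled, but no fixed table schema is enforced, and the record mixes structured counts with unstructured text and media.

```python
import json

# A simplified, hypothetical tweet-like record (semi-structured JSON).
raw = """
{
  "user": {"screen_name": "example_user", "followers_count": 1200, "statuses_count": 5400},
  "text": "Loving the new campus library! #study",
  "media": ["photo_001.jpg"]
}
"""

record = json.loads(raw)

# Structured part: numeric fields with a fixed meaning that fit neatly into table columns.
structured = {
    "followers": record["user"]["followers_count"],
    "tweets": record["user"]["statuses_count"],
}

# Unstructured part: free text and media that need NLP or image analysis to interpret.
unstructured = {
    "text": record["text"],
    "media": record.get("media", []),
}

print(structured)  # {'followers': 1200, 'tweets': 5400}
print(unstructured["text"])
```

The structured fields could go straight into a relational table, while the text and images would need further processing before they could be analyzed.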

Big data was first characterized by five Vs, and it is now often described in terms of seven Vs (Figure 7.2).

Figure 7.1 The 5 Vs of big data [1].

Figure 7.2 The 7 Vs of big data [2].

Big Data in Fighting COVID-19

Big data has played a vital role in fighting COVID-19. Many countries used big data to reduce cross-infections, which allowed them to get back to work and stabilize their economies. Big data had also been used earlier to collect information during other types of health crises [3]. Algorithms were used to estimate the probability that a given individual would contract COVID-19 by matching the user's mobile location against known infection hotspots. Various apps let people see the latest reported cases on a map, giving them the ability to avoid potential infection. In spite of privacy concerns, people tended to embrace big data because of its high-tech approach to fighting COVID-19: it helped them understand which areas were affected and which steps were necessary to slow the spread of the infection (Figure 7.3).

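import folium
import numpy as np

# Global spread on the world map using Folium.
# Assumes confirmed_df is a pandas DataFrame of confirmed cases, loaded elsewhere,
# with 'lat' and 'long' columns and the latest cumulative count in its last column.

# create the world map using the Map class
world_map = folium.Map(location=[11, 0], tiles="cartodbpositron",
                       zoom_start=2, max_zoom=6, min_zoom=2)

# iterate over all the rows of confirmed_df to get the lat/long
for i in range(0, len(confirmed_df)):
    folium.Circle(
        location=[confirmed_df.iloc[i]['lat'], confirmed_df.iloc[i]['long']],
        fill=True,
        # assumed: radius (metres) grows with the log of the latest confirmed count
        radius=(int(np.log(confirmed_df.iloc[i, -1] + 1.00001)) + 0.2) * 50000,
        color='red',
        fill_color='indigo',
    ).add_to(world_map)

world_map  # renders the map in a Jupyter notebook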


Figure 7.3 Simple data analysis done in Python with big data [4].
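The location-matching idea described in this section can also be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm of any actual contact-tracing app: the hotspot names and coordinates are hypothetical, the 1 km radius is an arbitrary choice, and instead of a probability it simply reports which known hotspots a user is near, using the haversine great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearby_hotspots(user_location, hotspots, radius_km=1.0):
    """Return the names of known hotspots within radius_km of the user."""
    lat, lon = user_location
    return [name for name, (h_lat, h_lon) in hotspots.items()
            if haversine_km(lat, lon, h_lat, h_lon) <= radius_km]

# Hypothetical hotspot coordinates, for illustration only
hotspots = {
    "market": (25.2048, 55.2708),
    "station": (25.2760, 55.2962),
}

user = (25.2050, 55.2710)  # about 30 m from "market", about 8 km from "station"
print(nearby_hotspots(user, hotspots))  # ['market']
```

A real system would combine many such matches over time and weight them by case counts; this sketch only shows the distance check at its core.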

Big Data in Artificial Intelligence

A decade ago, we were deep in the big data revolution, when the volume, velocity, and variety of data completely overwhelmed the systems used to store, manipulate, and analyze that data. Now we're in the midst of an artificial intelligence (AI) revolution, but it's important to remember that big data hasn't gone away. Instead, big data has become the new normal. It's everywhere, and in fact, it's big data that makes AI possible. When people think about advanced data work these days, the mind is immediately drawn toward AI and machine learning, the way it was automatically drawn toward big data a few years ago. AI is a field in computer science that focuses on techniques allowing computers to do things typically done by humans, such as playing a game of Go, classifying photos, or analyzing MRIs, usually in a way that shows adaptability to new circumstances.

Machine learning, on the other hand, is a collection of algorithms that find patterns in data in order to predict outcomes. For example: whose face is this, and is it the person who owns the phone? Machine learning improves over time as new data comes in, especially when there is more labeled data, where you know what the particular outcome is. Machine learning algorithms range from relatively simple techniques such as linear regression to amazingly complex ones such as deep neural networks. The connection between these fields can be seen clearly in Google Trends search data, which shows the relative popularity of these search terms over the eight years from 2011 to 2019. The most critical stage is the one where a data scientist's needs create a demand for changes in data architecture, because this is where big data projects tend to fail: when algorithms are computationally expensive, or when the infrastructure is not ready for ML algorithms. For instance, big banks in Brazil have recently been hiring mainframe specialists to deal with this issue [5].
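As an illustration of the simple end of that range, the following sketch fits an ordinary least-squares linear regression to a small set of labeled examples, where each input x comes with its known outcome y. The data values are hypothetical, and the code uses only the closed-form formulas for slope and intercept rather than any particular machine learning library.

```python
def fit_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict the outcome for a new input using the fitted line."""
    return slope * x + intercept

# Hypothetical labeled data: x is the input feature, y the known outcome
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_linear_regression(xs, ys)
print(round(slope, 2), round(intercept, 2))       # 1.99 0.05
print(round(predict(slope, intercept, 6.0), 2))   # 11.99
```

Refitting with additional labeled pairs is the simplest sense in which such a model can improve as new data arrives; deep neural networks follow the same learn-from-labeled-data idea at vastly larger scale.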


Figure 7.4 Schematic view of AI, big data, machine learning [6].

Here is what you need to know about the relationship between AI, machine learning, and big data: if AI and machine learning have thrived over the last few years, and they have, it is because they stand on the shoulders of big data. Machine learning and AI come in addition to big data, not instead of it. They rely on big data; they can't do their work without it (Figure 7.4).

To elaborate a little further: AI requires massive amounts of data. The theory of neural networks has existed for decades, but it is only in the last few years that there has finally been enough data to make it work well. That has to do with volume, the first V of big data. Second, data is streaming in constantly, especially social media data and sensor data, and that velocity of big data makes so much of machine learning possible. Finally, AI and machine learning involve many other types of novel data, like images, movies, and audio, that don't fit into standard relational databases; that is the variety of big data, and it is where AI has been so useful [7]. AI and machine learning have thrived because of the contributions of big data: its volume, velocity, and variety have made extraordinary developments possible in these fields.


The Intelligent Piece of Paper

I am a highly intelligent piece of paper. Let's play tic-tac-toe.

Turing Test

This activity aims to get students thinking critically about what makes humans intelligent, and how computer scientists are designing computers to act more like us.
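The "intelligent" paper works because it carries a fixed list of rules that a person follows mechanically. As a hedged illustration (the rule list below is a common simple strategy, not necessarily the one printed on the activity's sheet), a rule-following tic-tac-toe player can be written in a few lines: win if possible, otherwise block the opponent, otherwise take the center, then a corner, then a side.

```python
# Board cells are indexed 0..8, row by row; each cell is "X", "O" or " ".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winning_move(board, player):
    """Return a cell that completes a line for player, or None."""
    for a, b, c in LINES:
        cells = [board[a], board[b], board[c]]
        if cells.count(player) == 2 and cells.count(" ") == 1:
            return (a, b, c)[cells.index(" ")]
    return None

def paper_move(board, me="X", opponent="O"):
    """Choose a move by applying fixed rules in priority order."""
    move = winning_move(board, me)             # rule 1: win if you can
    if move is None:
        move = winning_move(board, opponent)   # rule 2: block the opponent
    if move is None and board[4] == " ":
        move = 4                               # rule 3: take the center
    if move is None:
        for cell in (0, 2, 6, 8, 1, 3, 5, 7):  # rule 4: a corner, then a side
            if board[cell] == " ":
                move = cell
                break
    return move                                # None only if the board is full

board = ["X", "O", " ",
         " ", "X", " ",
         "O", " ", " "]
print(paper_move(board))  # 8 (completes the diagonal 0-4-8)
```

The program never "thinks"; it only applies its rules in order, which is exactly the question the activity invites students to ask about intelligence.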

Big Data in Social Media and Internet of Things

The data revolution is growing very fast, leading to a barely controlled explosion. The first major cause of this explosive growth is social media, and the second is the Internet of Things. Though there are many others, these two play a starring role in the extraordinary growth of the big data world. To understand this better, consider the ouroboros, the snake that feeds on itself: social media is just the same, because social media begets more social media. Imagine a simple scenario. First, a person puts a post online, and that post gets liked. Each like is an additional piece of data with a fair amount of metadata attached to it. Another person puts an image online. That image gets tagged, it gets shared with anybody who is in it, and it gets put into forums that use the same tags. Or somebody with followers puts something online, and those followers share that content, multiplying the reach of the post. More and more people are online and more and more people have social media profiles, so there is incredible growth. It's not just doubling; it's multiplying across all dimensions. With social media, it goes absolutely through the roof.
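The multiplying effect can be made concrete with a toy calculation. The model and its numbers are purely hypothetical assumptions: every post is seen by a fixed number of people, receives a fixed number of likes, and is reshared a fixed number of times, and every post, view, and like is stored as a data record. Each reshare becomes a new post in the next round.

```python
def records_per_round(reach=100, likes_per_post=10, shares_per_post=5, rounds=4):
    """Data records generated in each round of a toy sharing cascade.

    Hypothetical model: each active post yields `reach` view records,
    `likes_per_post` like records, and `shares_per_post` reshares,
    and every reshare becomes an active post in the next round.
    """
    posts = 1  # the original post
    counts = []
    for _ in range(rounds):
        views = posts * reach
        likes = posts * likes_per_post
        counts.append(posts + views + likes)  # one record per post, view, and like
        posts *= shares_per_post              # reshares become next round's posts
    return counts

per_round = records_per_round()
print(per_round)       # [111, 555, 2775, 13875]
print(sum(per_round))  # 17316
```

Under these assumptions each round produces five times as many records as the one before, so a single post grows into thousands of records after only a few rounds of sharing.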

The second cause is the Internet of Things (IoT), and classic examples of IoT include smart homes. Suppose I've got a smart thermostat that knows when we're home; it talks to our phones and can tell what's going on. Maybe you've got lights, a security system, a smart lock, and all sorts of other systems that are connected with each other and communicating constantly. Outside the home, you can have a smart grid, where the city knows about traffic levels, how much electricity is flowing between the generators and the various buildings and houses, and what's going on with the water system. So much information is exchanged through these sensors and networks that it again leads to explosive growth.

Self-driving cars also fall into this category: they gather an extraordinary amount of data from the sensors all around them, they communicate with each other, and the time will soon come when they communicate directly with the road and with traffic signals. There are many different ways to gather all this data, and it gets even more complicated because the kind of data has changed over time. Text came first. You started with a little flip phone and sent tiny text messages that really were just text; text can be measured in kilobytes. Soon you learned to send audio, such as a voice message, or a home sensor could send you audio about what was happening; that is usually measured in megabytes. Eventually you get to video: for instance, a doorbell that gives you HD video on your phone of whoever is at the door. Video can be measured in gigabytes. So we are not only increasing the number of things that provide data, but also climbing the ladder to kinds, or qualities, of data that are much, much larger, moving from structured formats to large, unstructured ones, again from text to audio to video. These correspond to the three Vs of big data: volume, velocity, and variety. The social media revolution, which is still very active, and the IoT, which is still just at the beginning, have contributed massively to the explosive growth of data, which is both the challenge of big data and the fertile ground for its promise.
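The jump from kilobytes to megabytes to gigabytes can be checked with rough arithmetic: for a stream with a constant bit rate, size in bytes equals bit rate times duration divided by 8. The figures below are assumed round numbers for illustration (one byte per plain ASCII character, 32 kbit/s for compressed voice, 5 Mbit/s for HD video), not measurements from any particular phone or doorbell.

```python
def size_bytes(bit_rate_bps, seconds):
    """Size of a constant-bit-rate stream: bits per second x seconds / 8."""
    return bit_rate_bps * seconds // 8

# 100 text messages of about 100 ASCII characters each (1 byte per character)
texts = 100 * 100
# 1 minute of voice at an assumed 32 kbit/s
voice_message = size_bytes(32_000, 60)
# 1 hour of doorbell HD video at an assumed 5 Mbit/s
doorbell_video = size_bytes(5_000_000, 3600)

print(texts / 1e3)           # 10.0  (kilobytes)
print(voice_message / 1e6)   # 0.24  (megabytes)
print(doorbell_video / 1e9)  # 2.25  (gigabytes)
```

Even with these modest assumptions, an hour of video is more than 200,000 times the size of a hundred text messages.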

Big Data in Customer Interactions

In customer interactions, you look at what other people who are similar to a customer in some ways have done previously. You can also look at, for example, what they have searched for online, even when they weren't on your own e-commerce site. When you put these profiles together to identify what a person is likely to buy, you need to be concerned about the accuracy of your predictions. If a person is likely to respond to an advertisement and buy something, and is actually in your target audience, make sure to show it to them: when you have a real candidate, make sure you trigger the response by showing them what they want to see and want to get. Using big data with a platform such as Datameer can reduce customer acquisition costs. For example, a company correlated data on customer purchase histories, customer profiles, and customer behavior collected from social media sites indicating their personal interests. This data was then correlated with transaction histories and with data on what customers “liked” on Facebook to identify hidden patterns. These patterns enabled management to see that a large percentage of their high-value customers regularly watched the Food Network and shopped at Whole Foods [9].
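The kind of correlation described in this example can be sketched in a few lines of Python. Everything here is hypothetical: the customer IDs, the spending threshold that defines a "high-value" customer, and the interests each customer liked. The sketch joins purchase totals with liked interests and reports what share of high-value customers have each interest; it is not Datameer's method or API.

```python
from collections import Counter

# Hypothetical data: total spend per customer, and interests they "liked" online
purchases = {"c1": 1200, "c2": 80, "c3": 950, "c4": 1500, "c5": 40}
likes = {
    "c1": {"Food Network", "Whole Foods"},
    "c2": {"Gaming"},
    "c3": {"Food Network"},
    "c4": {"Food Network", "Whole Foods", "Travel"},
    "c5": {"Whole Foods"},
}

def interest_share(purchases, likes, threshold=900):
    """Share of high-value customers (spend >= threshold) who like each interest."""
    high_value = [c for c, spend in purchases.items() if spend >= threshold]
    counts = Counter(i for c in high_value for i in likes.get(c, set()))
    return {interest: n / len(high_value) for interest, n in counts.items()}

shares = interest_share(purchases, likes)
for interest, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{interest}: {share:.0%}")
# Food Network: 100%
# Whole Foods: 67%
# Travel: 33%
```

At real scale the same join-and-count pattern would run over millions of customers and many data sources, which is where big data platforms come in.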

To balance things, you want to show advertisements only to people who are likely to respond and buy something. You're wasting time and money if you show advertisements and pursue engagements with people who have no interest in what you're doing. Truthfully, anybody who has had an ad follow them around on the Internet has experienced a false positive: the advertiser thinks you're a target and you're not. That makes customers feel like they're being stalked by some organization, and it's a creepy feeling. So you don't want to do that. You need to be more precise in your targeting and in your categorization of potential customers.
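Precision is the standard measure of this: of the people you targeted, what fraction were real prospects? Every false positive lowers precision and costs money. The minimal sketch below compares two hypothetical campaigns; the counts and the cost per ad impression are invented for illustration.

```python
def precision(true_positives, false_positives):
    """Fraction of targeted people who were actually interested."""
    return true_positives / (true_positives + false_positives)

def wasted_spend(false_positives, cost_per_ad):
    """Money spent showing ads to people with no interest."""
    return false_positives * cost_per_ad

# Hypothetical campaigns: broad targeting vs. more precise targeting
broad = precision(true_positives=200, false_positives=1800)
focused = precision(true_positives=180, false_positives=120)

print(broad, focused)                                   # 0.1 0.6
print(wasted_spend(1800, 0.5), wasted_spend(120, 0.5))  # 900.0 60.0
```

The focused campaign reaches slightly fewer interested people, but it wastes far less money and annoys far fewer uninterested ones.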

This is one of the main things that big data lets you do, because big data gives you more data sources. It's not just the database of who purchased what, or the record that they visited your site; you also get unstructured data. Maybe you get some information about the other sites they visit, or information they make public through comments on websites or posts they share with other people. You get a much wider range of data sources, which you can't analyze with conventional methods. When you have more data, you can identify things with greater precision and accuracy. You can look at the edge cases and still have enough information to make a good decision. You can also be more sensitive to changes in a person's interests, the things they've already bought, the things they've said online, and the things happening all around them, and you can adapt to those changes in a more agile manner.

You can then apply your big data algorithms and results to particular situations, and there are a few major benefits of using big data in these ways. First, your customers are more likely to feel heard, because they know that you have seen what they've said about you and that you're paying attention to what they want. Second, they feel understood: not only are you monitoring what's going on, it makes sense to you and you're able to respond appropriately. Finally, you can help your customers feel respected by offering them what they want, in the manner they want, without deluging them with things that are irrelevant to them. So by using big data carefully in your customer interactions, your customers can feel heard, understood, and respected. Together, those make you a company that they trust and are more likely to do business with in the future, which, after all, is your goal.

Big Data in Data Science

First, you must know the problem you need to solve. You may have a lot of data and think you will get some valuable insight from it. Patterns will certainly emerge from that much data, but without a clear problem in mind you cannot tell which of them matter.

The amount of digital data in existence is growing rapidly, doubling every two years, and changing the way we live. An article by Forbes states that data is growing faster than ever before, and that by the year 2020, about 1.7 megabytes of new information would be created every second for every human being on the planet. This makes it extremely important to know at least the basics of the field [10].
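The two figures in this paragraph can be turned into quick arithmetic. Doubling every two years means the amount of data grows by a factor of 2^(t/2) after t years, so a decade brings a 32-fold increase; and 1.7 MB per second per person works out to roughly 147 GB per person per day. The sketch below simply evaluates these two formulas and assumes the growth rate stays constant.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def growth_factor(years, doubling_period_years=2):
    """How many times larger the data becomes after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)

def daily_data_per_person_mb(mb_per_second=1.7):
    """Megabytes of new data created per person per day at a constant rate."""
    return mb_per_second * SECONDS_PER_DAY

print(growth_factor(10))                  # 32.0
print(round(daily_data_per_person_mb()))  # 146880 (MB, about 147 GB)
```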

Data science deals with both structured and unstructured data and covers everything related to data cleaning, preparation, and analysis (Figure 7.5).
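A minimal sketch of the cleaning and preparation steps mentioned here, applied to a few hypothetical raw records: trimming whitespace, normalizing case, dropping records with missing values, converting types, and removing duplicates, followed by one simple analysis step. Real pipelines would use dedicated tools; this only shows the shape of the work.

```python
# Hypothetical raw records with messy whitespace, case, missing values, and duplicates
raw_records = [
    {"name": " Alice ", "city": "dubai", "age": "34"},
    {"name": "Bob", "city": "Abu Dhabi", "age": ""},
    {"name": "alice", "city": "Dubai ", "age": "34"},
    {"name": "Chen", "city": "Sharjah", "age": "29"},
]

def clean(records):
    """Trim, normalize case, drop incomplete rows, convert types, de-duplicate."""
    seen = set()
    cleaned = []
    for r in records:
        name = r["name"].strip().title()
        city = r["city"].strip().title()
        age = r["age"].strip()
        if not age:              # drop records with a missing age
            continue
        row = (name, city, int(age))
        if row in seen:          # skip duplicates that only differed in formatting
            continue
        seen.add(row)
        cleaned.append({"name": name, "city": city, "age": int(age)})
    return cleaned

cleaned = clean(raw_records)
print(cleaned)
# [{'name': 'Alice', 'city': 'Dubai', 'age': 34}, {'name': 'Chen', 'city': 'Sharjah', 'age': 29}]

# Analysis step: average age of the cleaned records
print(sum(r["age"] for r in cleaned) / len(cleaned))  # 31.5
```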

There are many data science tools to help with analysis. Microsoft Azure (formerly Windows Azure) offers Spark and Hadoop as a service in the cloud. It offers good enterprise-grade security, and it can be integrated with other productivity applications. It makes it easy to deploy Hadoop in the cloud without purchasing new hardware or paying other up-front costs.
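Hadoop's original processing model is MapReduce: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step combines each group. The sketch below imitates those three phases in plain Python on a single machine for a word count. It illustrates the programming model only; it is not the Hadoop, Spark, or Azure API, and a real cluster would run each phase in parallel across many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Emit (word, 1) for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data needs big tools", "data science uses data"]
pairs = (pair for doc in documents for pair in map_phase(doc))
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts["data"], word_counts["big"])  # 3 2
```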


Figure 7.5 Data science vs. big data [10].







  • 1. Ravi Kiran, “Big Data Characteristics: Know the 5 Vs of Big Data,” August 20, 2019. Available at [Accessed 10-10-2020].
  • 2. Prathap Kudupu, Web Snippets, “7 Vs of Big Data,” December 2018. Available at http:// [Accessed 20-10-2020].
  • 3. Bernard Marr, “The Vital Role of Big Data in the Fight against COVID-19 (Coronavirus),” April 19, 2020. Available at https://www.linkedin.com/pulse/vital-role-big-data-fight-against-covid-19-coronavirus-bernard-marr/ [Accessed 10-10-2020].
  • 4. Fred N. Kiwanuka, “Data Analysis in Python using Big data”, July 2020.
  • 5. Matthew Mayo, “Machine Learning with Big Data Is, in Many Ways, Different than ‘Regular’ Machine Learning,” KDnuggets, April 2020. Available at https://www.kdnuggets.com/2017/07/machine-learning-big-data-explained.html [Accessed 10-10-2020].
  • 6. Okiriza Wibisono, Hidayah Dhini Ari, Anggraini Widjanarti, Alvin Andhika Zulen and Bruno Tissot, “The Use of Big Data Analytics and Artificial Intelligence in Central Banking,” 2019. Available at [Accessed 20-10-2020].
  • 7. Randy Bean, “How Big Data Is Empowering AI and Machine Learning at Scale?” May 08, 2017. Available at Empowering-AI-and-Machine-Learning-at-Scalepdf/ [Accessed 10-08-2020].
  • 8. Fun Activity on IoT. Available at with-iot [Accessed 20-10-2020].
  • 9. 5 Big Data Use Cases to Understand Your Customer Journey, Datameer, Customer Analytics E-book. Available at to-understand-your-customer-journey-customer-analytics-ebook.html [Accessed 10-10-2020].
  • 10. Avantika Monnappa, “Data Science vs. Big Data vs. Data Analytics,” June 12, 2020. Available at article [Accessed 20-10-2020].