The Emergence of Big Data
Today, large amounts of complex and heterogeneous digital data are generated daily. By some estimates, 2.5 quintillion bytes (i.e., 2.5 × 10^18 bytes) of data are produced in a single day. This figure is likely to grow as the number of data-capturing devices and digital activities continues to increase. The digital universe, the amount of data created and copied each year, is estimated to reach almost 44 zettabytes (i.e., 44 × 10^21 bytes) by 2020.¹ While we may associate the availability of more data with more detailed analyses, new skills are also needed to organize, process, and analyze data that are orders of magnitude larger than was previously the case (Kitchin, 2013).
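The scale of these estimates is easier to grasp once the word-based quantities are mapped onto decimal (SI) prefixes: a quintillion is 10^18, the same as the prefix exa-, and zetta- is 10^21. The short sketch below simply restates the two estimates above in exabytes; it adds no new figures.

```python
# Decimal (SI) storage prefixes, expressed in bytes.
EXA = 10**18    # 1 exabyte  = 10^18 bytes (a "quintillion" bytes)
ZETTA = 10**21  # 1 zettabyte = 10^21 bytes

daily_bytes = 25 * 10**17               # 2.5 quintillion bytes produced per day
digital_universe_2020 = 44 * ZETTA      # estimated size of the digital universe by 2020

print(f"{daily_bytes / EXA:.1f} EB per day")               # 2.5 EB per day
print(f"{digital_universe_2020 // EXA:,} EB by 2020")      # 44,000 EB by 2020
```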
Definitions of big data vary. One definition, provided by Batty (2013), is that big data are quantities that do not fit into an Excel sheet (approximately 1 million rows). In a 2012 survey conducted by IBM, most respondents considered data larger than one terabyte to be big data (Schroeck et al., 2012). Eric Scharnhorst, a data scientist at Redfin, considers data big if it cannot be stored on a single hard drive (Barkham et al., 2018). Note, however, that storage capacities of several terabytes are now common on both laptops and desktop computers.
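To see how far these informal definitions can disagree, the sketch below tests a single dataset against all three. It assumes Batty's threshold corresponds to the worksheet row limit of Excel 2007 and later (1,048,576 rows), reads "one terabyte" as a decimal terabyte, and treats the hard-drive capacity as a parameter, since no single drive size is given. The function name is hypothetical.

```python
EXCEL_MAX_ROWS = 1_048_576   # worksheet row limit in Excel 2007 and later (assumed proxy for Batty)
TERABYTE = 10**12            # decimal terabyte, in bytes

def big_data_definitions_met(n_rows, size_bytes, drive_capacity_bytes):
    """Return the informal big data definitions that a dataset satisfies.

    Hypothetical helper: each test mirrors one definition in the text.
    """
    met = []
    if n_rows > EXCEL_MAX_ROWS:
        met.append("Batty: does not fit in an Excel sheet")
    if size_bytes > TERABYTE:
        met.append("IBM survey: larger than one terabyte")
    if size_bytes > drive_capacity_bytes:
        met.append("Scharnhorst: does not fit on a single hard drive")
    return met

# A 5-million-row, 2 GB table checked against an assumed 4 TB laptop drive.
print(big_data_definitions_met(5_000_000, 2 * 10**9, 4 * TERABYTE))
```

Under these assumptions the table counts as big data by the first definition only, which illustrates why no single definition has prevailed.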
The type of data matters as well: in the big data discourse, two datasets of the same size might require different data mining techniques and data management technologies. For instance, one minute of ultra-high-definition (Ultra-HD) video might occupy as much storage as millions of comments posted on Facebook and Twitter. These factors make it impractical to define a universal threshold for big data.
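A back-of-the-envelope calculation shows how the video-versus-comments comparison can work out. Both inputs below are assumptions chosen for illustration, not figures from the text: a streaming bitrate of 25 megabits per second for Ultra-HD video, and about 200 bytes per comment including minimal metadata.

```python
ASSUMED_UHD_BITRATE_MBPS = 25   # assumption: a typical compressed Ultra-HD streaming bitrate
ASSUMED_COMMENT_BYTES = 200     # assumption: short text post plus minimal metadata

def video_bytes(bitrate_mbps, seconds):
    """Size of a compressed video stream: bits per second x seconds / 8 bits per byte."""
    return bitrate_mbps * 1_000_000 * seconds // 8

one_minute = video_bytes(ASSUMED_UHD_BITRATE_MBPS, 60)
equivalent_comments = one_minute // ASSUMED_COMMENT_BYTES
print(one_minute, equivalent_comments)   # 187500000 937500
```

Under these assumptions, one minute of video (about 188 MB) is roughly the size of a million comments, yet the two call for entirely different processing: video analysis on one hand, text mining on the other.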
Planning organizations and academic researchers may categorize the main sources of big data differently. In planning, one of the best classifications is that of Thakuriah, Tilahun, and Zellner (2017), who divide the primary sources of urban big data into six categories: (1) sensor systems, (2) user-generated content, (3) administrative data in both open (e.g., data on transactions, taxes, and revenue) and confidential micro-data (e.g., data on employment, health, and education) formats, (4) private sector transactions data (e.g., customer transactions from store cards and business records), (5) data from arts and human collections (e.g., repositories of text, images, sound recordings, linguistic data, film, art, and material culture), and (6) hybrid data sources (e.g., linked survey-sensor data and census-administrative records). These categories illustrate the wide range of data that will be increasingly available to planners.
Sensors are an important source of planning data and can be embedded in both inanimate objects (e.g., building structures, infrastructure) and animate objects and agents (e.g., cars, people, animals). In general, sensor data can give planners real-time situational awareness, allowing them to make adjustments accordingly.
For instance, sensors in parking lots or on-street parking spaces can speed up the search for a parking space and, as a result, reduce vehicle emissions. The sensors report in real time which spaces are empty, and drivers can access this information through applications developed for this purpose, such as Parker, SpotHero, and ParkMe. Smart street lamps in Glasgow, Scotland, can adjust their brightness according to the number of people in the area. Internet-connected sensors in trash containers in Barcelona, Spain, detect how full the bins are and inform truck drivers so that they collect only the full ones. Sensors are also an example of the emerging Internet of Things (IoT), in which data collection devices will stream data within urban areas.
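The decision logic behind the bin and parking examples can be quite simple once sensor readings arrive. The sketch below is a hypothetical illustration, not the design of the Barcelona system or of any named parking app: the reading format, the function names, and the 80 percent "full" threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class BinReading:
    bin_id: str
    fill_fraction: float  # 0.0 = empty, 1.0 = full, as reported by the bin's sensor

def bins_to_collect(readings, threshold=0.8):
    """Return ids of bins at or above the (assumed) fill threshold, for the truck route."""
    return sorted(r.bin_id for r in readings if r.fill_fraction >= threshold)

def free_parking_bays(occupancy):
    """occupancy maps a bay id to True when its sensor detects a vehicle."""
    return sorted(bay for bay, occupied in occupancy.items() if not occupied)

readings = [BinReading("B1", 0.95), BinReading("B2", 0.30), BinReading("B3", 0.82)]
print(bins_to_collect(readings))                                   # ['B1', 'B3']
print(free_parking_bays({"P1": True, "P2": False, "P3": False}))   # ['P2', 'P3']
```

Raising or lowering the threshold trades fewer truck trips against a greater risk of overflowing bins; in practice that choice is a planning decision rather than a technical one.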
Challenges in Using Big Data
Big data’s benefits to planning and planners are not yet clear. In addition, a number of challenges are associated with its use. These challenges include data preparation and data quality, data analytics, data confidentiality and security, data access, and privacy. All of these challenges will need attention to address both the technical and the policy aspects involved.
Open Data
The Internet vastly increased our capacity to share data, and as governments began to generate more and more digital data, calls for transparency and openness also increased (Tauberer, 2014). “Open government” and “open data” initiatives have created opportunities for civic access to and analysis of government information. Data sharing by private organizations has also created opportunities for innovation and commercialization (Thakuriah et al., 2017). For instance, granular data generated by ridesourcing services (e.g., Uber, Lyft) can be incorporated into the travel demand modeling conducted by metropolitan planning organizations. While the U.S. Census Bureau has been releasing digital data for many years in the form of count data and GIS (TIGER/Line) files, open government and open data initiatives have focused more on information related to organizational operations, decision-making, and oversight. This includes data on budgeting, citizen complaints, and internal performance metrics. One hope is that citizen activists will perform their own analyses, tailored to their own interests, which could also save governments the labor costs of data handling and analysis.
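One common way granular trip records feed a travel demand model is by aggregating them into an origin-destination (OD) matrix of trip counts between zones. The sketch below assumes each trip has already been assigned to an origin and a destination zone; the record fields and zone labels are hypothetical and do not reflect the format of any particular provider's data release.

```python
from collections import Counter

def od_matrix(trips):
    """Count trips for each (origin_zone, destination_zone) pair."""
    return Counter((t["origin_zone"], t["destination_zone"]) for t in trips)

# Hypothetical trip records, already geocoded to planning zones.
trips = [
    {"origin_zone": "A", "destination_zone": "B"},
    {"origin_zone": "A", "destination_zone": "B"},
    {"origin_zone": "B", "destination_zone": "C"},
]
matrix = od_matrix(trips)
print(matrix[("A", "B")], matrix[("B", "C")], matrix[("C", "A")])  # 2 1 0
```

A `Counter` returns zero for zone pairs with no observed trips, so the result behaves like a sparse matrix. Aggregating to zones in this way also coarsens the individual pickup and drop-off locations, which is one reason released ridesourcing data are often published at this level.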