The Promise and Security Challenges of Open Access Big Data
The term “big data” increasingly refers to the use of advanced data analytics methods that extract value from data. According to the 2018 Thales Data Threat Report, compared to traditional relational databases, the data generated and stored within big data environments can be orders of magnitude larger, less homogeneous, and can change rapidly. There are a number of concepts associated with big data, including the three top attributes that are often referred to as the “Three Vs”: Volume, Variety, and Velocity. Some experts (including Jain 2019; Van Rijmenam 2018) go on to add two more Vs to the list: variability and value.
It would be difficult to define what these 5Vs mean in ways that can work in various contexts. When it comes to handling big data, different disciplines or organizations might use the same tools for collecting and manipulating the data at their disposal, but there are significant differences in how they use technologies to organize, analyze, interpret, and put the output data to work in general. The following brief description provides some points about the five Vs and their impacts on information professionals:
- 1. Volume: Data is being produced at astronomical rates, and size in this case is measured as volume. As Cano (2014) noted, with the Internet of Things (IoT) and all kinds of smart devices that feed smart living, the sheer volume of the data continues to grow every second. No wonder 90% of all data ever created was created in the past two years.
- 2. Velocity: In the context of big data, velocity refers to the speed at which huge amounts of new data are being created, collected, and analyzed in near real time using various technological tools. Big data technology helps to cope with the enormous speed at which the data is created and used.
- 3. Variety: With increasing volume and velocity comes increasing variety. Big data technology allows structured and unstructured diverse data to be harvested, stored, and used simultaneously (George 2017).
- 4. Variability: This refers to inconsistency in the data, which affects its quality or trustworthiness. According to Van Rijmenam (2018), variability is variance in meaning, that is, meaning that changes (rapidly). In indexing, the same term or word can have different meanings. In the same way, to perform proper sentiment analysis, algorithms need to understand the context and decipher the exact meaning of a word in that context.
- 5. Value: This refers to the worth of the data being extracted. Big data can create enormous value for the global economy, driving innovation, productivity, efficiency, and growth. Despite the size, unless big data can be turned into value, it is useless (cost-benefit). In other words, the value is in the transformation and how the data is turned into information and then into knowledge.
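The variability point in the list above (the same word carrying different meanings in different contexts) can be illustrated with a minimal sketch. Everything in it is a made-up assumption for illustration: the tiny lexicon, the context override table, and the sample sentences are hypothetical and do not come from any real sentiment analysis tool. The sketch only shows why a context-free word score can misread a sentence that a context-aware score reads correctly.

```python
# Toy illustration only: the lexicon, overrides, and sentences are invented.
# A context-free lexicon assigns one fixed score per word.
NAIVE_LEXICON = {"sick": -1, "great": 1, "terrible": -1}

# Hypothetical context overrides: (word, cue word in the same sentence) -> score.
CONTEXT_OVERRIDES = {("sick", "beat"): 1, ("sick", "trick"): 1}


def naive_score(sentence):
    """Sum fixed word scores, ignoring context."""
    words = sentence.lower().split()
    return sum(NAIVE_LEXICON.get(word, 0) for word in words)


def context_score(sentence):
    """Sum word scores, letting a cue word in the sentence change a word's score."""
    words = sentence.lower().split()
    total = 0
    for word in words:
        score = NAIVE_LEXICON.get(word, 0)
        for cue in words:
            if (word, cue) in CONTEXT_OVERRIDES:
                score = CONTEXT_OVERRIDES[(word, cue)]
        total += score
    return total


if __name__ == "__main__":
    for text in ("i feel sick today", "that beat is sick"):
        print(text, naive_score(text), context_score(text))
```

In this toy setup both functions agree on "i feel sick today", but only the context-aware version treats "sick" as positive in "that beat is sick". Real systems handle context in far more sophisticated ways; the override table here is simply the smallest device that makes the difference visible.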
Firican (2019) emphasized the importance of understanding the characteristics and properties of big data to prepare for both the challenges and advantages of big data initiatives. Some authors use the term complexity to refer to the process in which large volumes of data from multiple sources are collected, linked, connected, and correlated so that they are reliable enough to convey the information contained in the original data.
Unintended Consequences of Open Access
The good intentions of open access may result in deleterious consequences. As mentioned earlier, most modern information institutions attempt to offer hosted data, information, and knowledge as openly as possible. Yet, such open access can result in data and information falling into the hands of sinister individuals. For example, many libraries now offer data repositories for their researchers, faculty, and students (University Libraries, Data Repository Services 2018). Faculty place their raw data, both quantitative and qualitative, into such a repository. Uploading their data benefits faculty by assuring that it will be accessible to colleagues who may comment on, utilize, question, and otherwise implement their raw data; their data will be preserved and safe from corrupt jump drives or personal drives; and their data will be harvested and visible globally. However, since these researchers’ data is globally accessible, there is also a danger that a data parasite will obtain and misuse this data.
Data parasites are individuals who through little or no achievement of their own obtain other people’s data and use it maliciously to ultimately publish articles or other written documents, and fail to give attribution to the original data gatherers or creators. For example, data parasites will troll several different data repositories from universities, colleges, and other research institutions in hopes of gathering specific data about new technologies that could promote cleaner forms of energy for automobiles. They will then piece this data together and attempt to publish a paper or offer a conference presentation using the fragmentary data, while offering no credit to the original data gatherers (Longo and Drazen 2016). Thus, they take credit for proposing some form of this new technology and convey it as their idea and research - a form of plagiarism (Helge and McKinnon 2013). Such parasitic pseudo-research harms the original creators of the data and scientific research as a whole.
Plagiarism of others’ data, information, and knowledge occurs frequently and for various reasons. Sometimes, researchers accidentally use another’s research and data without giving proper attribution. Other times, such as with data parasites, plagiarism is intentional. Such malevolent intent can occur because a student researcher simply believes he or she will not get caught in such a malicious act. Other reasons for plagiarism include not taking an academic course seriously; not understanding self-plagiarism; having an improper conceptualization of what common knowledge is; and not knowing how to accurately cite scholarship, research, and data (Helge 2017). Dissertations and theses also often become the target of cybercriminals preying on academic informational institutions.
Intellectual Property Rights and Pirated Theses and Dissertations
Many benefits arise when students and faculty place their dissertations or theses into an open access scholarship repository. Their research is instantly accessible to anyone around the globe; sharing their research globally results in personal and professional benefits; they have a permanent and convenient hyperlink with which to refer prospective employers, research collaborators, and other research entities; they may receive invaluable constructive criticism from many researchers globally; and other altruistic researchers have perpetual and efficient access to this invaluable scholarship (Abrizah et al. 2015). Despite these benefits, as with open data, negative ramifications may manifest with open access to dissertations and theses as well. Serving as a scholarly communications librarian at the University of North Texas, I was approached by a faculty member who had just obtained her Ph.D. from a North American university. She deposited her recently completed dissertation into her university’s digital scholarship repository and was excited about the potential benefits of such a deposit. However, she discovered her dissertation had been pirated and was being sold in China. She queried whether anything could be done to stop the scholarship bootlegging. The response given to her explained that, unfortunately, legally not much could be offered. In the United States, one may be sued for copyright infringement and other intellectual property crimes when a dissertation is improperly reproduced, distributed, or displayed, or when illegal derivatives are created within the borders of the United States of America (17 U.S.C. 2018). However, when such intellectual property crimes occur outside of the United States of America, such as in China, the US courts do not have legal personal jurisdiction to allow a prosecution to proceed without proper extradition (Pennoyer v. Neff, 95 U.S. 714 (1878)).
Obtaining proper extradition from China is very cumbersome, especially for a stolen dissertation or thesis. So, at best for faculty or students whose dissertations or theses are stolen and sold, they should be happy someone is actually reading their scholarship.
Internet Crime Complaint Center (IC3) Roles in Protecting IP Rights
Besides being elated someone is actually reading and paying money for their dissertation or thesis, victims of scholarship piracy may also file a complaint with the Internet Crime Complaint Center (IC3) at www.ic3.gov/default.aspx. IC3 is a branch of the Federal Bureau of Investigation and examines Internet-facilitated criminal activity (Federal Bureau of Investigation, Internet Crime Complaint Center 2018). There is no guarantee victims of scholarship theft will receive any equitable or monetary relief from filing a complaint with the IC3; however, filing a complaint with this federal entity could help in such recovery. Although it is difficult to legally punish cybercriminals who steal and misuse intellectual property outside of specific legal jurisdictions, some jurisdictions, such as the European Union (EU), have formed alliances and passed legislation that protects the use of certain individual data.