Big Data Management

The complex nature of Big Data resources requires not only a variety of processing techniques but also an organized process for managing the data as a whole. Agrawal et al. (2011) identify five phases in a Big Data analysis system: (a) acquisition/recording; (b) extraction/cleaning/annotation; (c) integration/aggregation/representation; (d) analysis/modeling; and (e) interpretation.

The data acquisition stage is particularly important because the quality of the analysis results depends on how well it is performed. The main difficulty at this stage is reaching relevant and reliable data sources. NoSQL databases are frequently used to acquire and store Big Data. Such systems simply ingest all the data without categorizing it or parsing it against a predefined schema. A major challenge is generating the right metadata to describe what data are recorded and how they are recorded and measured.

The second phase refers to cleaning the data that have already been acquired and extracting information from them. The distributed data must be converted into a format suitable for further analysis. The information that can be extracted depends on the quality of the data: poor-quality data will almost always lead to poor results (“garbage in, garbage out”). Data cleaning (or scrubbing) is therefore highlighted as one of the most important steps that should be taken before data analysis is conducted. It often involves significant costs, as the whole process can take from 50 to 80% of a data analyst’s time, together with the actual data collection costs (Reimsbach-Kounatze, 2015).
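The cleaning step can be illustrated with a minimal Python sketch. The record layout (the fields "name" and "value"), the helper names, and the specific cleaning rules are assumptions made for illustration only; real pipelines apply rules that depend on the data source.

```python
from typing import Optional


def parse_number(text) -> Optional[float]:
    """Convert a raw field to float; return None when it cannot be parsed."""
    try:
        # Accept a decimal comma as well as a decimal point (assumed convention).
        return float(str(text).replace(",", "."))
    except ValueError:
        return None


def clean_records(raw_records: list) -> list:
    """Normalize, validate, and de-duplicate a list of raw record dicts."""
    cleaned = []
    seen = set()
    for record in raw_records:
        # Normalize text: trim whitespace and unify letter case.
        name = (record.get("name") or "").strip().title()
        value = parse_number(record.get("value"))
        # Drop records that lack the fields the analysis depends on.
        if not name or value is None:
            continue
        # Skip exact duplicates of an already accepted record.
        key = (name, value)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "value": value})
    return cleaned


raw = [
    {"name": "  alice ", "value": "3,5"},
    {"name": "Alice", "value": "3.5"},   # duplicate after normalization
    {"name": "bob", "value": "n/a"},     # unparseable measurement
    {"name": "", "value": "2"},          # missing name
    {"name": "carol", "value": 7},
]
cleaned = clean_records(raw)
```

Of the five raw records only two survive, which shows on a small scale why poor-quality input limits what can be extracted later.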

The next step involves preparing and processing the data with specific programs and programming languages; in other words, organizing the data. All data must be machine-readable. It should be noted that there is more than one way to store the same information, so the data can be represented differently depending on the purpose, and some representations are more effective than others for a given task.
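As a small illustration of how the same information can be represented in more than one way, the following Python sketch pivots row-oriented records into a column-oriented layout. The sensor data and the function name to_columns are hypothetical, and the sketch assumes every record carries the same fields.

```python
# Row-oriented representation: one dict per observation.
rows = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": 19.0},
    {"sensor": "s3", "temp_c": 22.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column-oriented representation: one list per field.
columns = to_columns(rows)

# Aggregating a single field only touches that field's list.
mean_temp = sum(columns["temp_c"]) / len(columns["temp_c"])
```

The row layout is convenient when whole records are read or written one at a time, while the column layout suits aggregations over a single field; which one is more effective depends on the purpose.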

The analysis/modeling step refers to the use of various data mining techniques (Schmarzo, 2013; Zhao, 2015). These mainly include (Chen et al., 2012) clustering, classification and prediction, outlier detection, association rules, sequence analysis, time series analysis, and text mining, as well as newer techniques such as social network analysis and sentiment analysis. Every data mining model relies on machine learning, either supervised or unsupervised.
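The difference between unsupervised and supervised learning can be sketched in a few lines of Python. The example below is a toy illustration on one-dimensional data, not a method taken from the cited sources: kmeans_1d groups unlabeled values (clustering), while train_nearest_mean and predict learn from labeled examples (classification). All function names and data are assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Unsupervised: group values around k centroids without any labels."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: attach each value to its nearest centroid.
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


def train_nearest_mean(labeled):
    """Supervised: learn one mean per class from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(means, value):
    """Assign the label whose class mean is closest to the value."""
    return min(means, key=lambda label: abs(value - means[label]))


# Unsupervised: no labels are given, only the data and starting centroids.
centers, groups = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[1.0, 12.0])

# Supervised: labels are part of the training data.
model = train_nearest_mean([(1.0, "low"), (2.0, "low"), (11.0, "high"), (12.0, "high")])
label = predict(model, 9.0)
```

In practice, library implementations would normally be used instead of hand-written routines; the sketch only shows where labels enter the process.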

At the last stage, the results obtained should be critically assessed. First of all, it should be decided whether the results can be considered reliable, taking into account the scope of the sources analyzed. If the results raise no doubts, they can be described and conclusions can be drawn, which is the basic goal of the whole process.

Big Data management would not succeed without an appropriate environment that supports storage, analytics, reporting, and applications. The environment must cover hardware, infrastructure software, operational software, management software, well-defined application programming interfaces, and even software developer tools (Hurwitz et al., 2013). Applying Big Data algorithms appropriately to data of sufficient quality can provide numerous opportunities for improvement across society as a whole. In addition to market-wide benefits, such as more effective matching of products and services to consumers, Big Data can also create opportunities for low-income and underserved communities (Ramirez et al., 2016).


Agrawal, D. et al. (2011). Challenges and opportunities with Big data 2011-1. Purdue University Libraries: Purdue e-Pubs. Retrieved from viewcontent.cgi?article=1000&context=cctech.

Aridhi, S., & Nguifo, E. (2016). Big graph mining: Frameworks and techniques. Big Data Research, 6, 1-10.

Barber, D. (2008). Clique matrices for statistical graph decomposition and parameterising restricted positive definite matrices. Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 26-33.

Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: An open source software for exploring and manipulating networks. Proceedings of the Third International Conference on Weblogs and Social Media, ICWSM, 361-362.

Bamman, D., & Smith, N. A. (2015). Contextualized sarcasm detection on Twitter. International AAAI Conference on Web and Social Media, North America, 574-577. Retrieved from 5/paper/view/10538/10445.

Bean, R. (2017). How Companies Say They’re Using Big Data. Retrieved from https://hbr.org/2017/04/how-companies-say-theyre-using-big-data.

Bezerianos, A., Chevalier, F., Elmqvist, N., & Fekete, J.D. (2010). GraphDice: A system for exploring multivariate social networks. Computer Graphics Forum, 29, 863-872.

Blake, C. (2011). Text mining. Annual Review of Information Science and Technology, 45(1), 121-155.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.

Buchholtz, S., Bukowski, M., & Sniegocki, A. (2014). Worldwide Big Data Technology and Services 2013-2017 Forecast, report, IDC, December 2013, Big and open data in Europe. A growth engine or a missed opportunity? Warsaw Institute for Economic Studies, report commissioned by demosEUROPA, 2014.

Callut, J., Francoisse, K., Saerens, M., & Dupont, P. (2008). Semi-supervised classification from discriminative random walks. Lecture Notes in Artificial Intelligence, 5211, 162-177. Berlin Heidelberg: Springer.

Capgemini (2013). Consulting. Technology. Outsourcing. Retrieved from http://www.cap- default/files/ resource/ pdf/Search-Based_BI .pdf.

Cattaneo, G. (2014). The European Data Market, IDC report, presentation given at the NESSI summit in Brussels on 27 May 2014. Retrieved from http://www.nessi-europe.eu/?Page=nessi_summit_2014.

Chang, G., Healey, M., McHugh, J.A.M., & Wang, T.L. (2001). Mining the World Wide Web: An information search approach. Springer.

Chen, C., & Zhang, C. Y. (2014). Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences, 275, 314-347.

Chen, H., & Chau, M. (2004). Web mining: Machine learning for web applications. Annual Review of Information Science and Technology, 38, 289-329.

Chen, H., Chiang, R. H. L., & Storey, V. C. (2012). Business Intelligence and analytics: From Big Data to big impact. MIS Quarterly, 36(4), 1-24.

Chen, Y., & Fonseca, F. (2004). A Bipartite Graph Co-Clustering Approach to Ontology Mapping. Retrieved from ISWC03.pdf.

Cios, K. J., Pedrycz, W., & Swiniarski, R. W. (2000). Data mining methods for knowledge discovery. New York: Springer.

Connick, H. (2017). Turning Big Data into Big Insights. Retrieved from https://www.ama.org/marketing-news/turning-big-data-into-big-insights.

Cox, M., & Ellsworth, D. (1997). Managing Big Data for Scientific Visualization. Retrieved from https://www.


Davenport, T. H., Barth, P., & Bean, R. (2012). How big data is different. MIT Sloan Management Review, 54(1), 22-24.

Davenport, T. H., & Harris, J. G. (2007). Competing on analytics: The new science of winning. Boston, MA: Harvard Business School Press.

Deb, K. (1999). An introduction to genetic algorithms. Sadhana, 24(4-5), 293-315.

Dewey, J.P. (2014). Big Data. Salem Press Encyclopedia.

Dias, C. R., & Ochi, L. S. (2003). Efficient evolutionary algorithms for the clustering problem in directed graphs. Proceedings of the 2003 IEEE Congress on Evolutionary Computation, 1, 983-988.

Eichinger, F., & Boehm, K. (2010). Software-bug localization with graph mining. In C. C. Aggarwal, H. Wang (Eds.), Managing and Mining Graph Data, Advances in Database Systems, vol. 40, 515-546. New York Dordrecht Heidelberg London: Springer.

Erl, T., Khattak, W., & Buhler, P. (2015). Big Data fundamentals: Concepts, drivers & techniques. Boston: Prentice Hall.

Fatta, G. D., & Berthold, M. R. (2005). High performance subgraph mining in molecular compounds. HPCC, 866-877.

Firican, G. (2017). The 10 Vs of Big Data. TDWI. Retrieved from articles/2017/02/08/10-vs-of-big-data.aspx.

Furnkranz, J. (2010). Web mining. In: O. Maimon, L. Rokach (Eds.), Data Mining and Knowledge Discovery Handbook. Springer.

Ghemawat, S., Gobioff, H., & Leung, S.T. (2003). The Google File System, SOSP’03, October 19-22, Bolton Landing, New York, USA. Retrieved from https://storage. I4bc94cffl527999.pdf.

Ghemawat, S., Gobioff, H., & Leung, S.T. (2003). Proceedings of the 19th ACM symposium on operating systems principles. Bolton Landing, NY: ACM, 20-43.

Grossner, K. E., Goodchild, M. F., & Clarke, K. C. (2008). Defining a digital earth system. Transactions in GIS, 12(1), 145-160.

Halaweh, M., & Massry, A. E. (2015). Conceptual model for successful implementation of Big Data in organizations. Journal of International Technology and Information Management, 24(2), 21-29.

Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and techniques. New York: Morgan Kaufmann.

Hassan, A.K. (2014). Big Data: Techniques and technologies in geoinformatics.

Himmi, K., Arcondara, J., Guan, R, & Zhou, W. (2017). Value oriented Big data strategy: Analysis & case study. Proceedings of 50th Hawaii International Conference on System Sciences. Hawaii.

Hopkins, B. (2016). Think you want to be “data-driven”? Insight is the new data. Retrieved from insight_is_the_new_data/.

Hsu, H. H., Chang, C. Y., & Hsu, C. H. (2017). Big data analytics for sensor-network collected intelligence. Academic Press.

Hurwitz, J., Nugent, A., Halper, F., & Kaufman, M. (2013). Big data for dummies. Hoboken, New Jersey: John Wiley & Sons, Inc.

Jacomy, M., Venturini, T., Heymann, S., & Bastian, M. (2014). ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS ONE, 9(6). Retrieved from journal.pone.0098679.

Jensen, T. R. (2010). Graphs. In: C. Sammut, G. I. Webb (Eds.), Encyclopedia of Machine Learning, 479-482. Berlin Heidelberg: Springer.

Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 881-892.

Kashima, H., & Inokuchi, A. (2002). Kernels for graph classification. In: Proceedings of the ICDM Workshop on Active Mining, 31-36. Maebashi, Japan.

Ketkar, N.S., Holder, L.B., & Cook, D.J. (2009). Empirical Comparison of graph classification algorithms. IEEE. Symposium on Computational Intelligence and Data Mining. Washington State University. Retrieved from—cook/pubs/ cidm09.2.pdf.

Koga, H., Tomokazu, T., Yokoyama, T., & Watanabe, T. (2010). New application of graph mining to video analysis. In: C. Fyfe, P. Tino, D. Charles, C. Garcia-Osorio, H. Yin (Eds.), Intelligent Data Engineering and Automated Learning - IDEAL 2010. Lecture Notes in Computer Science, 6283, 86-93. Berlin-Heidelberg: Springer.

Konak, A., Coit, D. W., & Smith, A. E. (2006). Multi-objective optimization using genetic algorithms: A tutorial. Reliability Engineering & System Safety, 91(9), 992-1007.

Kordon, A. (2010). Applying computational intelligence: How to create value. Berlin-Heidelberg: Springer.

Kraus, J. M., Palm, G., & Kestler, H. A. (2007). On the robustness of semi-supervised hierarchical graph clustering in functional genomics. 5th International Workshop on Mining and Learning with Graphs, 147-150. Florence, Italy.

Kumar, G.D., & Gosul, M. (2011). Web mining research and future directions. In: D. C. Wyld, M. Wozniak, N. Chaki, N. Meghanathan, D. Nagamalai (Eds.), Advances in Network Security and Applications. CNSA 2011. Communications in Computer and Information Science, 196. Berlin, Heidelberg: Springer.

Lapkin, A. (2012). Hype cycle for big data. Gartner, Inc. Retrieved from https://www.gartner.com/en/documents/2100215/hype-cycle-for-big-data-2012.

Larose, D. T. (2005). Discovering knowledge in data: An introduction to data mining. New York: John Wiley & Sons, Inc.

LaValle, S., Lesser, E., Shockley, R., Hopkins, M., & Kruschwitz, N. (2011). Big data, analytics and the path from insights to value. MIT Sloan Management Review, 52(2), 21-31.

Le, T. V., Kulikowski, C. A., & Muchnik, I. B. (2008). Coring methods for clustering a graph. In: 19th International Conference on Pattern Recognition (ICPR 2008), December 2008, 1-4. New York: IEEE.

Lorek, P. (2017). Metody i narzedzia do projektowania systemow wspomagania tworczosci organizacyjnej. In: C. M. Olszak (Ed.), Tworcza organizacja. Komputerowe wspomaganie tworczosci organizacyjnej. Warszawa: C.H. Beck.

Lu, Q., Li, S., & Zhang, W. (2015). Genetic algorithm based job scheduling for Big data analytics. 2015 International Conference on Identification, Information, and Knowledge in the Internet of Things. Beijing, China: IEEE, 33-38. Retrieved from https://www. Job_Scheduling_for_Big_Data_Analytics.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. KY: McKinsey Global Institute.

Maybury, M. (2004). New directions in question answering. Cambridge, US: MIT Press.

Mayer-Schonberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt.

McAfee, A., & Brynjolfsson, E. (2012). Big data: The management revolution. Harvard Business Review, October, 59-69.

Mills, M. P., & Ottino, J. M. (2012). The coming tech-led boom. Retrieved from www.wsj.com.

Mitchell, M. (1998). An introduction to genetic algorithms. MIT Press.

Newman, M. E. J. (2010). Networks: An introduction. Oxford: Oxford University Press.

O’Connor, B., Bamman, D., & Smith, N. A. (2011). Computational Text Analysis for Social Sciences: Model Complexity and Assumptions. Second Workshop on Computational Social Science and Wisdom of the Crowds (NIPS 2011). Retrieved from http://breno- lcss.text_analysis.pdf.

O’Driscoll, T. (2014). Can Big Data deliver added value? Training, 51(2), 51.

Office of Science and Technology Policy, Executive Office of the President (2012a). Fact sheet: Big data across the federal government. Retrieved from


Office of Science and Technology Policy, Executive Office of the President (2012b). Obama Administration Unveils “Big Data” Initiative: Announces $200 Million in New R&D Investments. Retrieved from

Olszak, C. M. (2016). Toward better understanding and use of Business Intelligence in organizations. Information Systems Management, 33(2), 105-123.

Olszak, C. M., Bartus, T., & Lorek, P. (2017). A comprehensive framework of information system design to provide organizational creativity support. Information & Management, 55, 94-108.

Olszak, C. M., & Zurada, J. (2019). Big Data-driven Value Creation for Organizations. Proceedings of Hawaii International Conference on System Sciences (HICSS-52). University of Hawai'i at Manoa, Scholar Space for the University of Hawai'i at Manoa, January, 8-11, pp. 164-173. DOI:

Omar, A. H., & Salleh, M. N. M. (2012). Integrating spatial decision support system with graph mining technique. In: V. Khachidze, T. Wang, S. Siddiqui, V. Liu, S. Cappucio, & A. Lim (Eds.), Contemporary Research on E-business Technology and Strategy. Communications in Computer and Information Science, 332, 15-24. Berlin-Heidelberg: Springer.

Oracle (2014). Information Management and Big Data. A Reference Architecture. Retrieved from bigdatarefarchitecture-2297765.pdf.

Osowski, S. (2006). Sieci neuronowe do przetwarzania informacji. Warszawa: Oficyna Wydawnicza Politechniki Warszawskiej.

Ozaki, T., & Ohkawa, T. (2008). Mining correlated subgraphs in graph databases. Proceedings of 12th Pacific-Asia Conference. Osaka, Japan: PAKDD.

Paharia, R. (2014). Lojalnosc 3.0: jak zrewolucjonizowac zaangazowanie klientow i pracownikow dzieki big data i rywalizacji. Tlum. D. Gasper, Warszawa: MT biznes Ltd.

Parise, S., Iyer, B., & Vesset, D. (2012). Four strategies to capture and create value from big data. Ivey Business Journal, Issue July 2012. Retrieved from http://iveybusiness- value-from-big-data/.

Pence, H. E. (2014). What is Big data and why is it important? Journal of Educational Technology Systems, 43(2), 159-171.

Poul, S., Gautman, N., & Balint, R. (2003). Preparing and data mining with Microsoft SQL Server 2000 and Analysis Services. New York: Addison-Wesley.

Ramirez, E., Brill, J., Ohlhausen, M., & McSweeny, T. (2016). Big Data: A tool for inclusion or exclusion? Understanding the issues. Federal Trade Commission. Retrieved from: exclusion-understanding-issues/160106big-data-rpt.pdf.

Raymond, B., & Belbin, L. (2006). Visualization and exploration of scientific data using graphs. In: G. J. Williams, S. J. Simoff (Eds.), Data Mining, Lecture Notes in Computer Science, 3755, 14-27. Berlin-Heidelberg: Springer.

Rehman, S.U., Khan, A. Ul., & Fong, S. (2012). Graph mining: A survey of graph mining techniques. Seventh International Conference on Digital Information Management (ICDIM), IEEE, 88-92. Retrieved from publication/233801707_Graph_mining_A_survey_of_graph_mining_techniques.

Reimsbach-Kounatze, C. (2015). The proliferation of Big data and implications for official statistics and statistical agencies. OECD Digital Economy Papers, 245, Paris: OECD Publishing.

Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., & Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science, 334(6062), 1518-1524.

Riad, A., & Hassan, Q. (2008). Service oriented-architecture: A new alternative to traditional integration methods in b2b applications. Journal of Convergence Information Technology, 3(1), 41.

Schenker, A., Last, M., Bunke, H., & Kandel, A. (2003). Classification of Web Documents Using a Graph Model. Proceedings of the Seventh International Conference on Document Analysis and Recognition, 1-5. IEEE.

Schmarzo, B. (2013). Big data: Understanding how data powers big business. Indianapolis: John Wiley and Sons.

Senthilnath, J., Omkar, S. N., & Mani, V. (2011). Clustering using firefly algorithm: Performance study. Swarm and Evolutionary Computation, 1(3), 164-171.

Shekhar, S., Zhang, P., Huang, Y., & Vatsavai, R.R. (2005). Spatial data mining. In: Maimon, O., Rokach, L. (Eds.), The Data Mining and Knowledge Discovery Handbook. Heidelberg: Springer.

Tan, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. New York: Addison-Wesley.

Thede, S. M. (2004). An introduction to genetic algorithms. Journal of Computing Sciences in Colleges, 20(1), 115-123.

United Nations Global Pulse. (2012). Big data for development: Challenges & opportunities. UN Global Pulse. Retrieved from big-data-for-development-opportunities-and-challenges-white-paper/.

Vercellis, C. (2009). Business Intelligence. Chichester: Wiley.

Wang, K., Yu, W., Yang, S., Wu, M., Hu, Y.H., & Li, S.J. (2015). A method of estimating online social media location in Big data environment. Journal of Software, 26(11), 2951-2963.

Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques. San Francisco, US: Morgan Kaufmann.

Xu, Z., Frankwick, G. L., & Ramirez, E. (2016). Effects of big data analytics and traditional marketing analytics on new product success: A knowledge fusion perspective. Journal of Business Research, 69(5), 1562-1566.

Yada, K., Motoda, H., Washio, T., & Miyawaki, A. (2004). Consumer behavior analysis by graph mining technique. In: M. G. Negoita, J. R. Howlett, L. C. Jain (Eds.), Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, 3214, 800-806. Berlin-Heidelberg: Springer.

Zhang, K., Bhattacharyya, S., & Ram, S. (2016). Large-scale network analysis for online social brand advertising. MIS Quarterly, 40(4), 849-868.

Zhao, P., & Yu, J.X. (2007). Mining closed frequent free trees in graph databases. Proceedings of Database Systems for Advanced Applications, 91-102.

Zhao, Y. (2015). R and data mining: Examples and case studies. Academic Press, Elsevier. Retrieved from:
