Distributed Data Mining

At present, most data mining techniques are essentially centralized. Centralized approaches, however, may be inappropriate or inefficient for many distributed data applications because of long response times and poor use of distributed resources (Park and Kargupta 2003; Denham et al. 2019). Transmitting data from distributed sensors or microgrids to a central processing unit can consume much of the available communication bandwidth and increase latency. With the advent of distributed renewable generation and more complex system configurations, there has been a surge of interest in distributed data mining and data fusion architectures for monitoring system behavior.

Modern distributed architectures and algorithms make it possible to incorporate more complex information about global system behavior, while taking into account that the data to be processed may be inherently distributed geographically across the power system.

A conceptual illustration of a distributed data mining architecture is shown in Figure 9.4. Typically, in this architecture, data mining is performed at a local level resulting in two or more local models that need to be aggregated. Several issues make the analysis of distributed data difficult:

  • Various constraints exist, such as limited bandwidth, geographical separation, and data privacy (proprietary data), among others (da Silva et al. 2005).
  • Geographically distributed data may exhibit local trends and be inherently heterogeneous.
  • Local data may be unrelated to global system behavior; techniques are therefore needed for feature selection prior to data mining and data fusion.

These problems can be efficiently addressed using distributed data mining techniques. As discussed in Denham et al. (2019), however, this requires the development of specialized algorithms for true distributed data mining. In this regard, support vector machines (SVMs) and other classifiers can be employed for feature selection, feature extraction, and data classification prior to the application of data mining techniques. Other associated problems include distributed clustering and distributed data fusion.

The application of distributed data mining architectures has recently been investigated using multiblock analysis techniques. Other approaches based on tensor representations hold promise for the analysis of more complex data sets, as discussed in Sections 9.3 and 9.4.
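The local-model-then-aggregate pattern described above can be illustrated with a minimal sketch. It assumes each site shares only compact summary statistics (sample count, sum, and sum of outer products) rather than raw measurements, and that the global model of interest is a principal component analysis. The function names `local_summary`, `aggregate`, and `global_pca`, as well as the synthetic site data, are hypothetical and are not taken from the cited works.

```python
import numpy as np

def local_summary(X):
    """Summarize one site's data (rows = samples) without sharing raw records."""
    X = np.asarray(X, dtype=float)
    return X.shape[0], X.sum(axis=0), X.T @ X

def aggregate(summaries):
    """Combine local summaries into the global mean and covariance matrix."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    scatter = sum(s[2] for s in summaries)
    mean = total / n
    # sum_i (x_i - mean)(x_i - mean)^T = X^T X - n * mean mean^T
    cov = (scatter - n * np.outer(mean, mean)) / (n - 1)
    return mean, cov

def global_pca(summaries, k):
    """Global principal directions computed from aggregated statistics only."""
    mean, cov = aggregate(summaries)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order], eigvals[order]

# Three hypothetical measurement sites with different amounts of data.
rng = np.random.default_rng(0)
scales = np.diag([3.0, 1.0, 0.5, 0.1])
sites = [rng.normal(size=(n, 4)) @ scales for n in (200, 150, 300)]
summaries = [local_summary(X) for X in sites]
mean, components, variances = global_pca(summaries, k=2)
```

For this particular model the aggregated result coincides with what a centralized computation on the pooled data would give, while each site transmits only a count, a vector, and a small matrix. Models without such additive summaries generally need the specialized algorithms mentioned in the text.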

Dimensionality Reduction

One important application of data fusion techniques is dimensionality reduction (Engel et al. 2011; Li et al. 2012). Major research issues are briefly summarized in Subsections 9.5.1 and 9.5.2.

Limitations of Existing Dimensionality Reduction Methods

Nonlinear dimensionality reduction methods discussed in Chapter 6 are constructed based on a specific time window. While these approaches can be easily modified to capture local behavior, they are based on matrix representations, which makes them unsuitable for the study of high-dimensional data. In addition, these methods tend to incur a higher computational cost.

In this regard, the use of sparse methods for data fusion with the ability to learn from the data is especially interesting. Methods include locally linear embedding (LLE) and other sparse representations (Van der Maaten et al. 2009).

Deep Learning Multidimensional Projections

The application of dimensionality reduction methods to large, high-dimensional data sets is computationally challenging and suffers from various problems. Recent alternatives based on deep learning projections can handle out-of-sample data efficiently and can be used to learn any projection technique (Espadoto et al. 2019; Gisbrecht et al. 2012). Figure 9.5 schematically illustrates this approach.


Figure 9.4: Distributed data mining architecture. (Based on Park, B., Kargupta, H., Distributed data mining: Algorithms, systems, and applications, in Ye, N. (ed.), The Handbook of Data Mining, pp. 341-358, Lawrence Erlbaum Associates, Mahwah, NJ, 2003.)


Figure 9.5: A graphical depiction of a deep learning projection model. (Adapted from Espadoto, M. et al., arXiv:1902.07958, February 2019.)

A property shared by these methods is that they allow learning from the data itself (Worden et al. 2011), and they can be more efficient than their counterparts based on spectral projection methods, as illustrated in Figure 9.6.

Other contributions include methods to derive low-dimensional models for data visualization based on deep neural networks (Becker et al. 2017) that extend previous work based on the application of neural networks to reduce data dimensionality (Hinton and Salakhutdinov 2006).
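The idea of learning a projection from a sample and then applying it to out-of-sample data can be sketched as follows. This is only an illustration under stated assumptions: the reference projection is plain PCA (standing in for any projection technique), the network is a small one-hidden-layer model trained by full-batch gradient descent rather than a deep architecture, and the data are synthetic. The names `pca_projection` and `train_projector` are hypothetical and do not reproduce the implementation of Espadoto et al. (2019).

```python
import numpy as np

def pca_projection(X, k=2):
    """Reference 2-D projection to be imitated (PCA is an assumed stand-in)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def train_projector(X, Y, hidden=32, epochs=1000, lr=0.05, seed=0):
    """Fit a one-hidden-layer network that maps samples X to their projection Y."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0 / np.sqrt(X.shape[1]), size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # hidden activations
        err = H @ W2 + b2 - Y               # prediction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size        # gradient of the mean squared error
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)

    def project(X_new):
        """Apply the learned mapping to samples not seen during training."""
        return np.tanh(X_new @ W1 + b1) @ W2 + b2

    return project, losses

# Synthetic data with a two-dimensional latent structure, standardized per feature.
rng = np.random.default_rng(1)
latent = rng.normal(size=(400, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(400, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
X_train, X_new = X[:300], X[300:]

Y_train = pca_projection(X_train)
Y_train /= Y_train.std(axis=0)              # scale targets for stable training
project, losses = train_projector(X_train, Y_train)
Y_new = project(X_new)                      # out-of-sample projection
```

Once trained, projecting new samples costs only two matrix products, which is the practical appeal of this family of methods compared with recomputing a spectral decomposition whenever new data arrive.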

Bio-Inspired Data Mining and Data Fusion

Many recent computational algorithms for data mining and data fusion have been motivated by biological processes. These include neural-network-based approaches, evolutionary approaches, and artificial intelligence and machine learning techniques (Worden et al. 2011). Interest in this subject is reflected in recent special publications (Olario and Zomaya 2006).

Unlike more conventional methods, deep-learning techniques such as convolutional neural networks, deep belief networks, and recurrent neural networks can learn from the data itself. Therefore, these deep-learning techniques can, in some situations, outperform more traditional techniques.

Table 9.2 summarizes some analytical models used in recent applications. Each category may include several subcategories.


Table 9.2: Overview of Bio-Inspired Data Mining and Data Fusion Techniques

Data fusion
Diffusion maps
Spatiotemporal clustering
Bio-inspired clustering application; Nonlinear PCA; Data visualization; Decision support algorithms; Information visualization
Artificial intelligence and machine learning
Anomaly detection and prediction; Health monitoring systems; Predictive and real time analytics; Data mining
Situation and threat assessment
Deep learning
Anomaly and change detection
Data fusion
Fuzzy Kalman filter
Multisensory data fusion architectures, clustering and classification

Other Emerging Issues

Other emerging issues include data fusion under imprecise or unknown environments (Fouratti 2015), predictive learning, visual analytics, data fusion via intrinsic dynamic variables (Williams et al. 2015), and anomaly detection using data mining techniques (Agrawal and Agrawal 2015).

Envisaged applications include:

  • Clustering-based anomaly detection,
  • Classification-based anomaly detection, and
  • Time scale separation in dimensionality reduction.
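As an illustration of the first item, the following minimal sketch fits clusters to data recorded under normal conditions and then flags new samples that lie far from every cluster center. The choice of k-means, the threshold rule (mean plus three standard deviations of the training distances), the function names, and the synthetic data are all assumptions made for this example and are not drawn from the cited survey.

```python
import numpy as np

def nearest(X, centers):
    """Return the index of, and distance to, the closest center for each sample."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd iterations started from k randomly chosen samples."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = nearest(X, centers)[0]
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_detector(X_normal, k, n_sigma=3.0):
    """Learn cluster centers and a distance threshold from normal data."""
    centers = kmeans(X_normal, k)
    dist = nearest(X_normal, centers)[1]
    return centers, dist.mean() + n_sigma * dist.std()

def is_anomalous(X_new, centers, threshold):
    """Flag samples whose distance to every center exceeds the threshold."""
    return nearest(X_new, centers)[1] > threshold

# Two hypothetical operating conditions observed during normal operation.
rng = np.random.default_rng(0)
normal = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(100, 2)),
                    rng.normal([5.0, 5.0], 0.3, size=(100, 2))])
centers, threshold = fit_detector(normal, k=2)

new_points = np.array([[0.1, -0.1], [5.2, 4.9], [10.0, -10.0], [2.5, 2.5]])
flags = is_anomalous(new_points, centers, threshold)
```

Fitting the clusters on a clean reference window keeps anomalous samples from becoming cluster centers themselves. In practice, the number of clusters and the threshold would need to be tuned to the data.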

Application to Power System Data

One of the major application areas for data mining and data fusion techniques is power system monitoring. Fusco et al. (2017) proposed a computational framework for power systems data fusion that is based on probabilistic graphical models and is capable of combining heterogeneous data sources with classical state estimation. Arvizu and Messina (2016) explored the use of diffusion maps to characterize the collective dynamics of transient processes in power systems.
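To give a sense of how diffusion maps can separate groups of trajectories with similar behavior, a minimal sketch of the standard construction is shown below. It assumes a Gaussian kernel with a median-based bandwidth and uses two synthetic groups of oscillating signals. The function name `diffusion_map` and all parameter choices are hypothetical, and the sketch is not the procedure of Arvizu and Messina (2016).

```python
import numpy as np

def diffusion_map(X, n_coords=2, epsilon=None, steps=1):
    """Diffusion coordinates of the rows of X (one row per trajectory)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    if epsilon is None:
        epsilon = np.median(sq)             # heuristic kernel bandwidth (assumption)
    K = np.exp(-sq / epsilon)               # Gaussian affinity matrix
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))         # symmetric matrix similar to D^{-1} K
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]        # right eigenvectors of D^{-1} K
    # Skip the trivial first eigenpair (eigenvalue 1, constant eigenvector).
    coords = (vals[1:n_coords + 1] ** steps) * psi[:, 1:n_coords + 1]
    return coords, vals

# Two hypothetical groups of signals swinging in opposite directions.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
group_a = np.sin(t) + 0.1 * rng.normal(size=(10, 50))
group_b = -np.sin(t) + 0.1 * rng.normal(size=(10, 50))
coords, eigvals = diffusion_map(np.vstack([group_a, group_b]), n_coords=2)
```

In this example the sign of the first nontrivial diffusion coordinate, `coords[:, 0]`, separates the two groups, which is the kind of collective grouping the method is used to reveal.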

Efforts have also been made to develop techniques to jointly analyze multitype, multisource data. The joint use of frequency and voltage signals has already been investigated for diverse applications. Work such as that shown in Dutta and Overbye (2014) is seminal in this context.

Wind and Renewable Energy Generation Forecasting

With the fast development of highly variable and uncertain renewable generation, accurate forecasting is growing in importance. Figure 9.7 shows one week of wind generation at a 115 kV point of common coupling of a large wind farm. Developing forecasting techniques for such measurements poses major conceptual and practical challenges due to the natural uncertainty and variability of the wind itself.

Within the last decade, several attempts have been made to develop forecasting techniques for renewable generation. In Mohan et al. (2018), a data-driven strategy for short-term electric load forecasting using dynamic mode decomposition (DMD) was proposed. Zavala and Messina (2014) and Messina et al. (2017) discuss the application of dynamic harmonic regression to predict wind generation in large-scale power systems (Figure 9.8).
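A minimal sketch of how dynamic mode decomposition can be used for short-term forecasting is given below. It applies the standard exact DMD algorithm to a delay-embedded (Hankel) matrix built from a scalar series and advances the last observed state using the identified eigenvalues. The synthetic two-tone signal, the number of delays, the truncation rank, and the function names are assumptions chosen for illustration. This is not the forecasting strategy of Mohan et al. (2018), nor dynamic harmonic regression.

```python
import numpy as np

def hankel(y, delays):
    """Delay-embed a scalar series: entry (i, j) of the result is y[i + j]."""
    cols = len(y) - delays + 1
    return np.array([y[i:i + cols] for i in range(delays)])

def dmd_fit(snapshots, rank):
    """Exact DMD of a snapshot matrix of shape (n_states, n_times)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    A_tilde = U.conj().T @ Y @ Vh.conj().T / s      # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = (Y @ Vh.conj().T / s) @ W               # exact DMD modes
    return eigvals, modes

def dmd_forecast(eigvals, modes, x_last, steps):
    """Advance the state x_last by 1..steps samples using the DMD model."""
    b = np.linalg.lstsq(modes, x_last.astype(complex), rcond=None)[0]
    powers = eigvals[None, :] ** np.arange(1, steps + 1)[:, None]
    return ((powers * b) @ modes.T).real.T          # shape (n_states, steps)

# Synthetic noise-free series with two oscillatory components.
t = np.arange(220)
signal = np.sin(0.3 * t) + 0.5 * np.cos(0.1 * t)
train, future = signal[:200], signal[200:]

H = hankel(train, delays=12)
eigvals, modes = dmd_fit(H, rank=4)
# The last row of each forecast column is the newest sample of the series.
forecast = dmd_forecast(eigvals, modes, H[:, -1], steps=20)[-1]
```

For a noise-free signal of this kind the forecast follows the true continuation closely. Measured wind or load data are noisy and nonstationary, so the rank, the embedding length, and the training window would all need careful selection.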


Figure 9.7: Weekly data for a 115 kV wind farm.


Figure 9.8: Illustration of a forecasting technique.

Application to Distribution Systems

Another significant area of continuing research is the application of data mining and data fusion techniques to distribution systems. Because of the increased availability of smart sensors and distribution PMUs (DPMUs), the development of fusion techniques is of great interest (Donde and Mohamed 2016).

Representative applications in power system modal analysis appear in several recent publications (Joseph and Jasmin 2017; Pinte et al. 2015; von Meier et al. 2017; Roberts et al. 2016; Donde and Mohamed 2016; Zhang et al. 2018; Wang et al. 2016; Guikema et al. 2010).

Other recent applications include:

  • Identification of phase connectivity using data mining and the Open Distribution System Simulator,
  • Power quality monitoring,
  • Distribution system monitoring,
  • Characterization of distributed generation, and
  • Phasor-based control, among other issues.

Other emerging applications include monitoring, protection, and control of distribution networks and estimation of power outage risks.

References


Abdulhafiz, A. A., Khamis, A., Handling data uncertainty and inconsistency using multisensory data fusion, Advances in Artificial Intelligence, Article ID 241260, 2013.

Acar, E., Kolda, T. G., Dunlavy, D. M., All-at-once optimization for coupled matrix and tensor factorizations, arXiv:1105.3422, e-print, May 2011.

Agrawal, S., Agrawal, J., Survey on anomaly detection using data mining techniques, Procedia Computer Science, 60, 708-713, 2015.

Arvizu, C. M. C., Messina, A. R., Dimensionality reduction in transient simulations: A diffusion maps approach, IEEE Transactions on Power Delivery, 31(5), 2379-2389, 2016.

Atluri, G., Karpatne, A., Kumar, V., Spatio-temporal data mining: A survey of problems and methods, arXiv:1711.04710, 2017.

Becker, M., Lippel, J., Stuhlsatz, A., Regularized nonlinear discriminant analysis: An approach to robust dimensionality reduction for data visualization, 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 3, Porto, Portugal, 2017.

Bisantz, A. M., Finger, R., Seong, Y., Llinas, J., Human performance and data fusion based decision aids, Proceedings of the 2nd International Conference on Information Fusion based Decision Aids - Fusion'99. International Society of Information Fusion, Chicago, IL, 1999.

Chen, X. C., Faghmous, J. H., Khandelwal, A., Kumar, V., Clustering dynamic spatio-temporal patterns in the presence of noise and missing data, Proceedings of the Twenty-fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), Buenos Aires, Argentina, pp. 2575-2581, 2015.

Da Silva, J. C., Giannella, C., Bhargava, R., Kargupta, H., Klusch, M., Distributed data mining and agents, Engineering Applications of Artificial Intelligence, 18, 791-805, 2005.

Della Mura, M. D., Prasad, S., Pacifici, F., Gamba, P., Benediktsson, J. A., Challenges and opportunities of multimodality and data fusion in remote sensing, Proceedings of the IEEE, 103(9), 1585-1601, 2015.

Denham, B., Pears, R., Asif Naee, M., HDSM: A distributed data mining approach to classifying vertically distributed data streams, Knowledge-Based Systems, 189, 105114, 2019.

Donde, V., Mohamed, S., Data fusion and analytics applications for PG&E's power distribution systems, i-PCGRID workshop, https://ipcgrid.ece.msstate.edu/presentations/2016/, Mississippi State University, 2016.

Dutta, S., Overbye, T., Feature extraction and visualization of power system transient stability results, IEEE Transactions on Power Systems, 29(2), 966-973, 2014.

Engel, D., Huttenberger, L., Hamann, B., A survey of dimension reduction methods for high-dimensional data analysis and visualization, Visualization of Large and Unstructured Data Sets: Applications in Geospatial Planning, Modeling and Engineering - Proceedings of IRTG 1131 Workshop 2011, Dagstuhl Publishing, Germany, pp. 135-149.

Esling, P., Agon, C., Time-series data mining, ACM Computing Surveys, Association for Computing Machinery, 45(1), A:1-A:31, 2012.

Espadoto, M., Hirata, N. S. T., Telea, A. C., Deep learning multidimensional projections, arXiv:1902.07958, February 2019.

Fouratti, H. (Ed.), Multisensor Data Fusion - From Algorithms and Architectural Design to Applications, CRC Press, Boca Raton, FL, 2015.

Fusco, F., Tirupathi, S., Gormally, R., Power systems data fusion based on belief propagation, 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Torino, Italy, September 2017.

Ghamisi, P., Rasti, B., Yokoya, N., Wang, Q., Hofle, B., Bruzzone, L., Bovolo, F. et al., Multisource and multitemporal data fusion in remote sensing, IEEE Geoscience and Remote Sensing Magazine, pp. 6-39, March 2019.

Gisbrecht, A., Lueks, W., Mokbel, B., Hammer, B., Out-of-sample extensions for nonparametric dimensionality reduction, ESANN 2012 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, April 2012.

Guikema, S. D., Quiring, S. M., Han, S. R., Prestorm estimation of hurricane damage to electric power distribution systems, Risk Analysis, 30(12), 1744-1752, 2010.

Guyon, I., Elisseeff, A., An introduction to variable and feature selection, Journal of Machine Learning Research, 3, 1157-1182, 2003.

Hinton, G. E., Salakhutdinov, R. R., Reducing the dimensionality of data with neural networks, Science, 313, 504-507, 2006.

Joseph, S., Jasmin, E. A., Big data analytics for distribution system monitoring in smart grid, International Journal of Smart Home, 11(5), 21-32, 2017.

Kisilevich, S., Mansmann, F., Nanni, M., Rinzivillo, S., Spatio-temporal clustering: A survey, in Maimon, O., Rokach, L. (Eds.), Data Mining and Knowledge Discovery Handbook, Springer Science, New York, 2010.

Kolda, T. G., Bader, B. W., Tensor decompositions and their applications, SIAM Review, 51(3), 455-500, 2009.

Kolda, T. G., Sun, J., Scalable tensor decompositions for multi-aspect data mining, 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, December 2008.

Korycinski, D., Crawford, M. M., Barnes, J. W., Adaptive feature selection for hyperspectral data analysis, IEEE International Geoscience and Remote Sensing Symposium, Toulouse, France, July 2003.

Lahat, D., Adali, T., Jutten, C., Multimodal data fusion: An overview of methods, challenges and prospects, Proceedings of the IEEE, 103(9), 1449-1477, September 2015.

Li, W., Prasad, S., Fowler, J. E., Bruce, L. M., Locality-preserving dimensionality reduction and classification for hyperspectral image analysis, IEEE Transactions on Geoscience and Remote Sensing, 50(4), 1185-1198, 2012.

Lunga, D., Prasad, S., Crawford, M. M., Ersoy, O., Manifold-learning based feature extraction for classification of hyperspectral data, IEEE Signal Processing Magazine, 31(1), 55-66, 2014.

Messina, A. R., Castellanos, R., Castro, C. M., Barocio, E., Jimenez Zavala, A., Large-scale wind generation development in the Mexican power grid: Impact studies, in Handbook of Distributed Generation, Vol. 13, pp. 109-148, Springer International Publishing, Cham, Switzerland, 2017.

Messina, A. R., Wide-Area Monitoring of Interconnected Power Systems, IET Power and Energy Series 77, London, UK, 2015.

Mohan, N., Soman, K. P., Kumar, S. S., A data-driven strategy for short-term electric load forecasting using dynamic mode decomposition model, Applied Energy, 232, 229-244, 2018.

Olario, S., Zomaya, A. Y. (Eds.), Handbook of Bioinspired Algorithms and Applications, Chapman & Hall/CRC Computer and Information Science Series, Boca Raton, FL, 2006.

Papalexakis, E. E., Faloutsos, C., Sidiropoulos, N. D., Tensors for data mining and data fusion: Models, applications, and scalable algorithms, ACM Transactions on Intelligent Systems and Technology, 8(2), 16:1-16:44, 2016.

Park, B., Kargupta, H., Distributed data mining: Algorithms, systems, and applications, in Ye, N. (ed.), The Handbook of Data Mining, pp. 341-358, Lawrence Erlbaum Associates, Mahwah, NJ, 2003.

Pinte, B., Quinlan, M., Reinhard, K., Low voltage micro-phasor measurement unit (μPMU), 2015 IEEE Power and Energy Conference at Illinois (PECI), Champaign, IL, February 2015.

Poslad, S., Middleton, S. E., Chaves, F., Tao, R., Necmioglu, O., Bugel, A. R., A semantic IoT early warning system for natural environment crisis management, IEEE Transactions on Emerging Topics in Computing, 3(2), 246-257, 2015.

Roberts, C. M., Shand, C. M., Brady, K. W., Stewart, E. M., McMorran, A. W., Taylor, G. A., Improving distribution network model accuracy using impedance estimation from micro-synchrophasor data, 2016 IEEE Power Engineering Society General Meeting, Boston, MA, July 2016.

Shekhar, S., Jiang, Z., Ali, R. Y., Eftelioglu, E., Tang, X., Gunturi, V. M. V., Zhou, X., Spatiotemporal data mining: A computational perspective, ISPRS International Journal of Geo-Information, 4, 2306-2338, 2015.

Sun, J., Tao, D., Faloutsos, C., Beyond streams and graphs: Dynamic tensor analysis, Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, August 20-23, 2006.

Taniar, D., Data Mining and Knowledge Discovery Technologies, IGI Publishing, Hershey, PA, 2008.

Tipping, M. E., Bishop, C. M., Probabilistic principal component analysis, Journal of the Royal Statistical Society B, 61(3), 611-622, 1999.

Treinish, L. A., Visual data fusion for decision support applications of numerical weather prediction, Proceedings of the Conference on Visualization'00, pp. 477-480, Los Alamitos, CA, 2000.

Van der Maaten, L. J. P., Postma, E. O., Van den Herik, H. J., Dimensionality reduction: A comparative review, Tilburg University Technical Report, Holland, TiCC-TR 2009-005, 2009.

von Meier, A., Stewart, E., McEachern, A., Andersen, M., McEachern, L., Precision micro-synchrophasors for distribution systems: A summary of applications, IEEE Transactions on Smart Grid, 8(6), 2926-2936, 2017.

Wang, W., Yu, N., Foggo, B., Davis, J., Phase identification in electric power distribution systems by clustering of smart meter data, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, December 2016.

Williams, M. O., Rowley, C. W., Mezic, I., Kevrekidis, I. G., Data fusion via intrinsic dynamic variables: An application of data-driven Koopman spectral analysis, Europhysics Letters (EPL), 109(4), 40007p1-p6, 2015.

Worden, K., Staszewski, W. J., Hensman, J. J., Neural computing for mechanical systems research: A tutorial overview, Mechanical Systems and Signal Processing, 25, 4-111, 2011.

Wu, J., Lin, Z., Zha, H., Essential tensor learning for multi-view spectral clustering, IEEE Transactions on Image Processing, 28(12), 5910-5922, December 2019.

Zavala, A. J., Messina, A. R., A dynamic harmonic regression approach to power system modal identification and prediction, Electric Power Components and Systems, 42(13), 1474-1483, 2014.

Zhang, J., Multi-source remote sensing data fusion: Status and trends, International Journal of Image and Data Fusion, 1(1), 5-24, 2010.

Zhang, Y., Huang, T., Bompard, E. E., Big data analytics in smart grids: A review, Energy Informatics, 1, 1-8, 2018.
