Processing Big Data by Means of Artificial Intelligence

Part of the power of big data comes from the potential to use artificial intelligence approaches, in particular machine learning and deep learning, to identify coherent patterns in the data. Importantly, human observers are still needed to monitor the automatically generated results and to determine which patterns are meaningful and which are random (Woo, Tay, Jebb et al., 2020).

Machine Learning

Machine learning methods are a collection of statistical methods that aim to create reliable and replicable prediction models from datasets that may contain a large number of variables. With so many variables, traditional approaches such as multiple regression may not yield models whose predictions hold up reliably. Machine learning algorithms handle the large number of variables efficiently by searching the predictor space and highlighting variables with explanatory power (see Chapters 5 and 6). These methods do not guarantee that an ideal model is found; instead, they aim to find a model that performs well under a variety of conditions. Given the exploratory nature of these approaches, internal cross-validation methods are needed to prevent overfitting. Overfitting occurs when a model fits a specific dataset too closely, so that it accounts for features unique to that dataset and becomes less generalizable when applied to a new dataset (Grimm, Stegmann, Jacobucci, & Serang, 2020). Moreover, the large number of participants and variables makes it possible to split the data randomly into smaller samples, so that some subsets are used to train models and others to test them, which allows findings to be replicated within the same study. The accuracy of automatically generated models can also be compared with human annotations of a subset of the data, providing an additional indication of accuracy (Woo, Tay, Jebb et al., 2020). Machine learning approaches can generally be categorized as supervised or unsupervised learning methods (Alpaydin, 2009).
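The logic of splitting data into training and test subsets to detect overfitting can be illustrated with a minimal sketch. The example below is not drawn from the cited studies: the simulated dataset, the function names (make_data, fit_linear, fit_memorizer, k_fold_cv), and the choice of five folds are all assumptions made for illustration. It compares a simple linear model with a model that memorizes its training data (a one-nearest-neighbour predictor), using only the Python standard library.

```python
import random
from statistics import mean


def make_data(n, seed=0):
    """Simulate (x, y) pairs: a linear trend plus random noise (hypothetical data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 2.0)) for x in xs]


def fit_linear(train):
    """Ordinary least squares with one predictor; returns a prediction function."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_memorizer(train):
    """One-nearest-neighbour: predicts the y of the closest training point.
    It reproduces the training data exactly, noise included."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def mse(model, data):
    """Mean squared prediction error of a model on a dataset."""
    return mean((model(x) - y) ** 2 for x, y in data)


def k_fold_cv(data, fit, k=5, seed=0):
    """Randomly split data into k folds; each fold serves once as the test set.
    Returns (mean training error, mean test error) across the k folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    train_errs, test_errs = [], []
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = fit(train)
        train_errs.append(mse(model, train))
        test_errs.append(mse(model, test))
    return mean(train_errs), mean(test_errs)


data = make_data(200)
lin_train, lin_test = k_fold_cv(data, fit_linear)
mem_train, mem_test = k_fold_cv(data, fit_memorizer)

print(f"linear:    train MSE = {lin_train:.2f}, test MSE = {lin_test:.2f}")
print(f"memorizer: train MSE = {mem_train:.2f}, test MSE = {mem_test:.2f}")
```

In this sketch the memorizing model has no error on the data it was trained on, because it has absorbed the noise along with the signal, yet its error on held-out folds is typically larger than that of the simpler linear model. A gap of this kind between training and test error is the signature of overfitting that cross-validation is meant to reveal.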
