MERGING MACHINE LEARNING WITH BLOCKCHAIN

Machine learning relies on vast quantities of data to build models for accurate prediction. Much of the overhead incurred in getting this data lies in collecting, organizing, and auditing it for accuracy. This is an area that can be significantly improved by using blockchain technology. By using smart contracts, data can be transferred directly and reliably from its place of origin. For example, a machine learning model for self-driving trucks would require several hundred terabytes of actual truck driving data. Traditionally, all of the data, such as driving speeds, fuel consumption, braking, and so on, would first be collected using different trackers. It would then be sent to a processing facility where auditors would sift through the data to make sure it was authentic before sending it to be processed by data scientists. Smart contracts could, however, improve the whole process significantly by using digital signatures. By using blockchains to ensure the security and ownership of the collected data, we could program smart contracts to send the data directly from the truck driver to the data scientists who would use it for building machine learning models.
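To make the digital-signature step above concrete, here is a minimal sketch of signing and verifying a telemetry record. All field names and keys are hypothetical, and an HMAC is used as a stand-in for the asymmetric signatures a real blockchain system would employ; the point is only that tampered data fails verification before it ever reaches the data scientists.

```python
import hashlib
import hmac
import json

def sign_record(record: dict, key: bytes) -> str:
    # Serialize deterministically so signer and verifier hash identical bytes
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_record(record: dict, signature: str, key: bytes) -> bool:
    # Constant-time comparison guards against timing attacks
    return hmac.compare_digest(sign_record(record, key), signature)

# Telemetry emitted by an on-truck tracker (hypothetical fields)
reading = {"truck_id": "T-042", "speed_kmh": 87.5, "fuel_l_per_100km": 31.2}
key = b"shared-device-key"  # placeholder; a real system would use asymmetric keys

sig = sign_record(reading, key)
print(verify_record(reading, sig, key))                        # authentic data passes
print(verify_record(dict(reading, speed_kmh=60.0), sig, key))  # altered data is rejected
```

In a deployed system the verification step would run inside the smart contract itself, so that only records with valid signatures are ever released to the buyers of the data.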

This fusion of blockchain technology and machine learning could be a game changer for self-driving research, as it can help create a marketplace for research data. The finance and insurance industries have a lot to gain as well, because together the two technologies can be used to design tools that identify and prevent fraud. Using machine learning to improve supply chain solutions can help corporations around the world save billions of dollars every year by reducing wastage and theft.

BLOCKCHAIN + MACHINE LEARNING: DEMOCRATIZING DATA ACCESS

Having access to superior models over those of your competitors can provide great competitive advantages when using these models as either services or as backend components to various applications. For example, with something like image recognition services, the market is effectively won by the company with the best performance. These models require little to no person-to-person contact to use, and are simple to hook into programmatically, so there would seem to be little need for loyalty if a competitor's product were more effective. Therefore, since performance is the key indicator of success in this arena, it is in the best interest of these entities to ensure that their competitors cannot match their performance.

One may assume that superior machine learning ability is a function of mathematical prowess. However, it is widely understood that this is not typically the case. Most technical progress in the field is publicly available and is presented at conferences open to everyone. Instead, advantages in machine learning primarily come from having more, or better, data to train models with. A model can be extremely sophisticated, but if it is trained with low-quality data or not enough data, it will nonetheless be limited in its effectiveness. Conversely, a relatively simple model, given very high-quality data, can often outperform a more complex one that was trained with bad data. Therefore, the ones who will hold the power in the field of machine learning are the ones who have control over, and access to, large amounts of data. Not coincidentally, the entities that tend to have the data, such as large tech companies like Google or Facebook, also tend to have the best researchers and modelers available. They keep private, centralized data repositories built from user data (much of which is voluntarily entered by users), and can then use these large data-sets to train their cutting-edge models (Figure 2.34).

Thus enters the potential of blockchain technology in machine learning, primarily in the context of data ownership, collection, and access. If we were to decentralize data collection and allow everyone to access useful data-sets, the competitive moats that these large corporations have would be erased. Given that these data-sets are made from user-contributed activity logs and content, it is only fair that the data is made available to the users who effectively created them.

Synapse AI

This is what the Synapse AI project aims to accomplish with its platform. It is creating a platform in which data contributors are fully aware of the data that they are contributing, and ensures that they are compensated for their contributions (Figure 2.35). For example, users will be able to knowingly contribute their social photos and their tags, or their GPS data, in exchange for compensation. Users of the platform can then pay to access these curated data-sets or trained models in the form of micro-services. The platform aims to create a cyclical economy in which: (1) agents contribute data, (2) data is pooled, (3) models are created using this data, and then (4) agents consume the models. The Synapse AI team hopes that this enables agents in the world to exponentially increase their capabilities by compounding their knowledge of the world through this cyclical process. You can think of this as a sort of automated active learning in which the agent itself autonomously queries for additional information or modeling capabilities. The tokens themselves are used for payments in the platform, for bonding to ensure quality is maintained, and for staking in order to support services (Figure 2.36).
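The four-step cycle above can be sketched as a toy token ledger. This is purely illustrative and not Synapse AI's actual implementation: the class name, the token amounts, and the stand-in "model" (a simple average of pooled values) are all invented for the example.

```python
from collections import defaultdict

class DataMarket:
    """Toy sketch of the contribute/pool/train/consume cycle (names hypothetical)."""

    def __init__(self, reward: int = 1, query_fee: int = 2):
        self.balances = defaultdict(int)  # token balance per agent
        self.pool = []                    # pooled contributions
        self.reward = reward
        self.query_fee = query_fee
        self.model = None

    def contribute(self, agent: str, value: float) -> None:
        # Steps (1) and (2): agent contributes data, data is pooled,
        # and the contributor is compensated in tokens
        self.pool.append(value)
        self.balances[agent] += self.reward

    def train(self) -> None:
        # Step (3): a stand-in "model" is built from the pooled data
        self.model = sum(self.pool) / len(self.pool)

    def query(self, agent: str) -> float:
        # Step (4): agents spend tokens to consume the model
        if self.balances[agent] < self.query_fee:
            raise ValueError("insufficient tokens")
        self.balances[agent] -= self.query_fee
        return self.model

market = DataMarket()
for value in (10, 20, 30):
    market.contribute("alice", value)  # alice earns 3 tokens
market.train()
print(market.query("alice"))  # alice spends 2 tokens to consume the model
```

Because contributing earns tokens and querying spends them, agents that feed the pool can afford to consume the resulting models, which is the compounding loop the cyclical economy is meant to sustain.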


FIGURE 2.36 AI tools.
