SOLUTIONS TO BIG DATA USING BLOCKCHAIN
The popularity of blockchain technology and its applications has prompted a great deal of ongoing research in many practical and scientific areas. Although it is still new and in an experimental phase, blockchain is seen as a revolutionary solution to modern technology concerns such as decentralization, trust, identity, data ownership, and data-driven decisions. At the same time, the world is facing rapid growth in the quantity and diversity of digital data generated by both users and machines. As researchers search for the best ways to store, organize, and process big data, blockchain technology is making a significant contribution. Its solutions for the decentralized management of private data, digital property, Internet of Things-based communication, and the reform of public institutions are having a considerable impact on how big data is evolving.
Implementation of SHA-256 with Blockchain for Big Data Technology
Hashing is an important technique in blockchain technology. It is a mathematical process used to record new transactions in a blockchain and to link each block to the one before it. Hashing is performed by a hash function that implements a hashing algorithm such as SHA-256. A hash function takes an input of any length, whether data or a string, and produces an output of fixed length, called a hash. The size of the input does not matter: whether it is 2 or 2,000 characters long, the output has the same length. For example, SHA-256 always produces a 256-bit output, usually written as 64 hexadecimal characters.
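The fixed-length property can be checked with Python's standard-library `hashlib` module. In this minimal sketch, a 2-character input and a 2,000-character input both hash to 64 hexadecimal characters, which is 256 bits. The helper name `sha256_hex` is our own.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


short_input = "ab"       # 2 characters
long_input = "a" * 2000  # 2,000 characters

for text in (short_input, long_input):
    digest = sha256_hex(text)
    # Each hex character encodes 4 bits, so 64 hex characters = 256 bits.
    print(len(text), len(digest), len(digest) * 4)
```

Both lines report a 64-character (256-bit) digest, even though the inputs differ in length by a factor of 1,000.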
For a hash function to be secure, it must have the following features:
- Deterministic: the same input must always produce the same hash, no matter how many times it is hashed.
- Fast to compute: the hash function should produce the hash of any input quickly.
- Pre-image resistance: given a hash H(f), it should be infeasible to recover the input f. Suppose we use a 128-bit hash and brute force is the only way to find the original input. In a brute-force search, a candidate input is chosen, hashed, and compared with the target hash, and this is repeated until the hashes match. This is not practical because the search space is enormous: a 128-bit hash has $2^{128}$ possible outputs, so on the order of $2^{128}$ attempts are expected before a match is found.
- Avalanche effect: even a small change in the input should produce a very different hash.
- Collision resistance: it should be infeasible to find two different inputs A and B such that H(A) = H(B). Collisions must exist, because there are infinitely many possible inputs but only a fixed number of outputs, so the requirement is that nobody can find one in practice.
- Puzzle friendliness: for every output Y, if κ is chosen at random from a distribution with high min-entropy, it should be infeasible to find an input X such that $H(\kappa \,\|\, X) = Y$ in time significantly less than $2^{n}$, where n is the output length in bits and $\|$ denotes concatenation.
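Several of these properties can be observed directly. The Python sketch below is our own illustration, and the helper names are hypothetical. It checks determinism and counts how many of the 256 output bits change when one input character changes (the avalanche effect). It also runs a brute-force pre-image search, but only against a deliberately weakened "toy" hash that keeps just the leading 12 bits of SHA-256. The search usually returns a different input from the original, which shows that short hashes fail collision resistance as well.

```python
import hashlib


def sha256_int(data: bytes) -> int:
    """Return the SHA-256 digest of `data` as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between H(a) and H(b)."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


def truncated_hash(data: bytes, bits: int) -> int:
    """Toy weak hash: keep only the leading `bits` bits of SHA-256."""
    return sha256_int(data) >> (256 - bits)


def brute_force_preimage(target: int, bits: int, max_tries: int = 1_000_000):
    """Hash candidates b"0", b"1", b"2", ... until one matches `target`.

    Returns (matching_input, attempts), or (None, max_tries) if nothing matched.
    """
    for i in range(max_tries):
        candidate = str(i).encode()
        if truncated_hash(candidate, bits) == target:
            return candidate, i + 1
    return None, max_tries


# Deterministic: the same input always gives the same hash.
assert sha256_int(b"block data") == sha256_int(b"block data")

# Avalanche effect: changing one character typically flips about half of the 256 bits.
print(bit_difference(b"block data", b"block datb"))

# A 12-bit toy hash needs about 2**12 = 4,096 attempts on average. Every extra
# output bit doubles the expected work, so a 128-bit hash is far out of reach.
target = truncated_hash(b"secret input", 12)
found, attempts = brute_force_preimage(target, 12)
print(found, attempts)
```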
A number of hash functions are used in blockchain technology. The most commonly used ones are:
Because a blockchain stores data, it also needs a data structure. Two are mainly used in blockchain: pointers and linked lists. Pointers are variables that store the address of other variables, as illustrated in Figure 5.8. In a blockchain, these pointers (called hash pointers) store the hash value of the previous block as well as its address.
At the data-structure level, a blockchain is essentially a linked list in which each node stores a hash pointer and a data header. The data header stores the data of that block, while the hash pointer holds both the address of the preceding block and that block's hash value.
FIGURE 5.8 Simplified blockchain structure.
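Here is a minimal Python sketch of this linked-list view. Each block stores only the hash part of its hash pointer. A Python list stands in for memory addresses, so position in the list plays the role of the address. The `Block` class and its field names are illustrative and are not taken from any real blockchain implementation. If an earlier block is altered, its recomputed hash no longer matches the pointer stored in the next block, so the change is detected.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    data: str                 # the block's payload
    prev_hash: Optional[str]  # hash part of the hash pointer (None for the first block)

    def block_hash(self) -> str:
        """Hash the whole block, including the pointer to its predecessor."""
        payload = json.dumps({"data": self.data, "prev_hash": self.prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(payloads: List[str]) -> List[Block]:
    """Create blocks in order, each storing the hash of the block before it."""
    chain: List[Block] = []
    prev_hash: Optional[str] = None
    for data in payloads:
        block = Block(data=data, prev_hash=prev_hash)
        chain.append(block)
        prev_hash = block.block_hash()
    return chain


def is_valid(chain: List[Block]) -> bool:
    """Recompute each block's hash and compare it with the next block's stored pointer."""
    return all(later.prev_hash == earlier.block_hash()
               for earlier, later in zip(chain, chain[1:]))


chain = build_chain(["genesis", "Alice pays Bob 5", "Bob pays Carol 2"])
print(is_valid(chain))                # True
chain[1].data = "Alice pays Bob 500"  # tamper with a block in the middle
print(is_valid(chain))                # False
```

Nothing in this sketch points to the newest block, so a change to that block would go unnoticed. To protect the head of the chain, its hash must be kept separately.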
Mining is the process by which new blocks are added to the blockchain. Miners verify the transactions that have been broadcast to the network and add them to the blockchain only if they are valid.
Each blockchain network targets an average time for creating a block (in Bitcoin, about 10 minutes). If blocks were created faster, several valid blocks could be produced at nearly the same time, and some of them would not become part of the main chain because competing blocks are being added simultaneously. To avoid this problem, the network sets a difficulty level that a block's hash must meet. In Bitcoin, for example, the contents of a new block are first hashed. That hash is then concatenated with a nonce (a number the miner is free to vary), and the concatenated string is hashed again and compared against the difficulty level. In the actual protocol, Bitcoin applies SHA-256 twice to an 80-byte block header containing, among other fields, the previous block's hash, a Merkle root summarizing the block's transactions, and the nonce.
If the new hash, read as a number, is less than the target set by the difficulty level, the block is added to the chain. Otherwise, the nonce is changed and the process is repeated until the hash meets the target.
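The mining loop described above can be sketched in Python as a toy model. It follows the simplified two-step hashing given in the text rather than Bitcoin's real header format. It also represents the difficulty level with a hypothetical `difficulty_bits` parameter, which is the number of leading zero bits the hash must have. The names `pow_hash`, `mine`, and `verify` are our own.

```python
import hashlib
from typing import Tuple


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def pow_hash(contents_hash: str, nonce: int) -> str:
    """Second hashing step from the text: hash(contents hash || nonce)."""
    return sha256_hex(contents_hash + str(nonce))


def mine(block_contents: str, difficulty_bits: int,
         max_nonce: int = 10_000_000) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hash, read as a number, is below the target."""
    target = 2 ** (256 - difficulty_bits)       # more difficulty bits -> smaller target
    contents_hash = sha256_hex(block_contents)  # first step: hash the block contents
    for nonce in range(max_nonce):
        candidate = pow_hash(contents_hash, nonce)
        if int(candidate, 16) < target:
            return nonce, candidate
    raise RuntimeError("no valid nonce found; raise max_nonce or lower difficulty_bits")


def verify(block_contents: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed nonce costs just two hashes, however long mining took."""
    candidate = pow_hash(sha256_hex(block_contents), nonce)
    return int(candidate, 16) < 2 ** (256 - difficulty_bits)


contents = "prev=00ab12;tx=Alice pays Bob 5"  # hypothetical block contents
nonce, block_hash = mine(contents, difficulty_bits=16)
print(nonce, block_hash)            # block_hash begins with at least "0000"
print(verify(contents, nonce, 16))  # True
```

With 16 difficulty bits, finding the nonce takes about 65,536 attempts on average, while `verify` needs only two hashes. This asymmetry is why other nodes can check a miner's work cheaply.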
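In Bitcoin, for every output W, it is infeasible to find an input X, other than by brute-force search, such that:
$$H(\kappa \,\|\, X) = W$$
where κ = nonce, X = hash of the previous block, $\|$ denotes concatenation, and W = the target set by the difficulty level.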
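A rough calculation shows why the difficulty level controls the block rate. It rests on two simplifying assumptions of our own. First, SHA-256 outputs behave like uniformly random 256-bit integers. Second, a block is accepted when its hash is below a numeric target T, as in the mining step described earlier, and each nonce κ is treated as an independent trial. T and d (the number of required leading zero bits) are our notation.

```latex
p = \Pr\!\left[H(\kappa \,\|\, X) < T\right] = \frac{T}{2^{256}},
\qquad
\mathbb{E}[\text{attempts}] = \frac{1}{p} = \frac{2^{256}}{T}

T = 2^{256-d} \;\Longrightarrow\; \mathbb{E}[\text{attempts}] = 2^{d},
\qquad \text{e.g. } d = 16 \;\Rightarrow\; 2^{16} = 65{,}536
```

Each extra leading zero bit halves the target and doubles the expected work. This is how a network can hold its average block time steady as total hashing power changes. Bitcoin recalculates its target every 2,016 blocks.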