IoT Data Management Versus Traditional Database Management Systems

Based on the IoT data lifecycle discussed earlier, an IoT data management system is divided into an online, real-time frontend that interacts directly with the interconnected IoT objects and sensors, and an associated offline backend that handles the mass storage and in-depth analysis of IoT information. The data management frontend is communication-intensive, involving the propagation of query requests and results to and from sensors and smart objects. The backend is storage-intensive, involving the mass storage of generated data for later processing and analysis, as well as additional in-depth queries.

Although the storage elements reside on the backend, they interact with the frontend on a frequent basis through continuous updates, and are therefore referred to as online. The autonomous edges in the lifecycle can be considered more communication-intensive than storage-intensive because they provide real-time data in response to certain queries.

This envisioned data management architecture is considerably different from existing DBMSs, which are mainly storage-centric. In traditional database systems, the bulk of data is collected from predefined and finite sources, and then stored in scalar form in relations according to strict normalization rules. Queries are used to retrieve specific “summary” views of the system or to update specific items within the database. New data is inserted into the database when required via insertion queries. Query operations are usually local, with execution costs bound to processing and intermediate storage.

Transaction management methods guarantee the ACID properties to enforce overall data integrity. Even if the database is distributed over multiple sites, query processing and distributed transaction management are enforced. The execution of distributed queries is based on the transparency principle, which dictates that the database is viewed logically as one centralized unit, and the ACID properties are guaranteed via the two-phase commit protocol.
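The two-phase commit protocol mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names introduced here, and real systems also handle timeouts, logging, and coordinator recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """One database site in a distributed transaction (hypothetical model)."""
    name: str
    can_commit: bool = True   # assumption: simulates whether local work can be made durable
    state: str = "idle"       # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Phase 1: the site votes yes only if it can guarantee a later commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit at every site only if every site voted yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    if all(votes):
        for p in participants:                    # phase 2: global commit
            p.commit()
        return True
    for p in participants:                        # phase 2: global abort
        p.abort()
    return False
```

The all-or-nothing outcome is what provides atomicity across sites: a single "no" vote in the first phase forces every participant to abort.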

In IoT systems, the picture is dramatically different, with a massive and growing number of data sources: sensors, RFIDs, embedded systems, and mobile devices. Contrary to occasional updates and queries submitted to traditional DBMSs, data streams constantly from a multitude of “Things” to IoT data stores, and queries are more frequent with more versatile needs.

Hierarchical data reporting and aggregation may be needed for scalability guarantees as well as to enable more prompt processing functionality. The strict relational database schema and the relational normalization practice may be relaxed in favor of more unstructured and flexible forms that adapt to diverse data types and sophisticated queries. Although distributed DBMSs optimize queries based on communication considerations, optimizers base their decisions on fixed and well-defined schemas. This may not be the case in IoT, where new data sources and streaming, localized data create a highly dynamic environment for query optimizers. Striving to guarantee the transparency requirements imposed in distributed DBMSs on IoT data management systems is challenging, if not impossible. Furthermore, transparency may not even be required in IoT because innovative applications and services may require location and context awareness. Although maintaining ACID properties in bounded IoT spaces (subsystems) while executing transactions can be managed, challenges exist for a more globalized space. Moreover, how mobile data sources and the data they generate can be incorporated into the already established data space is a novel challenge that is yet to be addressed by IoT data management systems [5,9].
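The idea of hierarchical reporting and aggregation can be illustrated with a short sketch. It assumes a hypothetical three-level layout (sensors, gateways, backend) in which each level forwards only a compact summary instead of raw readings; the function names `summarize` and `merge` are illustrative and do not come from any particular IoT platform.

```python
from statistics import mean


def summarize(readings):
    """Gateway level: reduce raw sensor readings to a compact summary."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


def merge(summaries):
    """Backend level: combine child summaries without seeing the raw data."""
    total = sum(s["count"] for s in summaries)
    return {
        "count": total,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        # Weight each child mean by its count so the result equals the
        # mean over all underlying readings.
        "mean": sum(s["mean"] * s["count"] for s in summaries) / total,
    }


# Two gateways summarize their local sensors; the backend merges the summaries.
gateway_1 = summarize([20.0, 22.0])
gateway_2 = summarize([30.0])
site_summary = merge([gateway_1, gateway_2])
```

Because only fixed-size summaries travel upward, communication cost grows with the number of gateways rather than the number of readings, which is the scalability benefit the paragraph refers to.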

Common Problems in IoT Data Management

In IoT, information must be processed within a shorter time span than information collected from humans, which raises the following issues:

Scalability and Agility: The sheer size of IoT data traffic and its immediacy make this the most pressing data management issue.

Security: Security is a significant challenge for organizations planning and implementing IoT solutions. According to estimates, through 2022, half of all security budgets for IoT will go to fault remediation. Preventing unauthorized access has moved to the forefront.

Applications of Database Management

Various databases are used in IoT data management [9]. HBase, Cassandra, CouchDB, DynamoDB, and MongoDB, which IoT devices use to store large amounts of data and access them in a random manner, are discussed below.

i. HBase: HBase is an open-source, horizontally scalable, distributed column-oriented database built on top of the Hadoop Distributed File System (HDFS). As part of the Hadoop ecosystem, it leverages the fault tolerance of HDFS and provides quick, random, real-time read/write access to huge amounts of structured data. Data can be stored in HDFS either directly or through HBase, and data consumers read and access the data in HDFS randomly using HBase. HBase databases are used in devices with COIB frameworks because they help provide the scalability and consistency that the data requires. Table 10.2 lists the differences between HBase and RDBMS.

Table 10.2 Differences between HBase and RDBMS

| HBase | RDBMS |
| --- | --- |
| Is schema-less and doesn't have the concept of a fixed column schema | Is governed by its schema, which describes the entire table structure |
| Built for wide tables, and is horizontally scalable | Thin and built for small tables, and is hard to scale |
| No transactions | Transactional |
| Has de-normalized data | Has normalized data |
| Good for both semi-structured and structured data | Good for structured data |

Features of HBase

  • - Scalability: It supports scalability in both linear and modular form.
  • - Sharding: It supports automatic sharding of tables, and is also configurable.
  • - Distributed Storage: It supports distributed storage like HDFS.
  • - Consistency: It supports consistent read and write operations.
  • - Failover Support: It supports automatic failover.
  • - API Support: It supports Java APIs so clients can access it easily.
  • - MapReduce Support: It supports MapReduce for parallel processing of large data volumes.
  • - Back Up Support: It supports back up of Hadoop MapReduce jobs in HBase tables.
  • - Real-Time Processing: It supports both block cache and Bloom filters, which makes real-time query processing easy.
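To show why Bloom filters help with fast lookups, here is a minimal, generic Bloom filter sketch. It is not HBase's implementation: the class name, the default sizes, and the use of salted SHA-256 hashes are assumptions chosen for readability. The key property is that a negative answer is always correct, so a store can skip reading a file that definitely does not contain the requested row key.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit position keeps the sketch simple (not space-optimal).
        self.bits = bytearray(size_bits)

    def _positions(self, key):
        # Derive several positions per key by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


row_keys = BloomFilter()
row_keys.add("sensor-17#2024-01-01")
```

A lookup first asks `might_contain`; only when it returns True does the more expensive read from storage need to happen.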

ii. Apache Cassandra: Apache Cassandra is a powerful open-source distributed database system that handles huge volumes of records spread across multiple commodity servers. It can be scaled easily to meet sudden increases in demand by deploying multinode Cassandra clusters, it meets high availability requirements, and it has no single point of failure. It is regarded as one of the most efficient NoSQL databases available today. Table 10.3 lists the differences between Cassandra and RDBMS.

Features of Cassandra

  • - Elastic scalability
  • - Always on architecture
  • - Fast linear-scale performance
  • - Flexible data storage
  • - Easy data distribution
  • - Transaction support
  • - Fast writes
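Elastic scalability and easy data distribution are commonly achieved by placing nodes and keys on a hash ring. The sketch below illustrates that general technique (consistent hashing with virtual nodes); it is a simplified model, not Cassandra's actual partitioner, and the `HashRing` class, the `vnodes` parameter, and the node names are hypothetical.

```python
import bisect
import hashlib


def _token(value):
    """Map a string to a position on the ring (64-bit integer)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Assigns each key to the first node token found clockwise on the ring."""

    def __init__(self, nodes, vnodes=8):
        # Each node owns several virtual positions to even out the load.
        self._ring = sorted(
            (_token(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key):
        # Find the first token greater than the key's token, wrapping around.
        idx = bisect.bisect_right(self._tokens, _token(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
responsible_node = ring.owner("sensor-42")
```

With this scheme, adding a node moves only the keys that now fall to the new node; all other keys stay where they were, which is what keeps scaling out cheap.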

Table 10.3 Differences between Cassandra and RDBMS

| Cassandra | RDBMS |
| --- | --- |
| Used to deal with unstructured data | Used to deal with structured data |
| Has a flexible schema | Has a fixed schema |
| A table is a list of nested key-value pairs | A table is an array of arrays |
| Keyspace is the outermost container comprising data corresponding to an application | Database is the outermost container comprising data corresponding to an application |
| Tables or column families are the entities of a keyspace | Tables are the entities of a database |
| Row is a unit of replication | Row is an individual record |
| Column is a unit of storage | Column represents the attributes of a relation |
| Relationships are represented using collections | There is a concept of foreign keys, joins, etc. |

iii. CouchDB: CouchDB is an open-source NoSQL database based on common standards to facilitate web accessibility and compatibility with a variety of devices. NoSQL databases are useful for very large sets of distributed data, especially for the large amounts of unstructured data in various formats that characterize big data. Data in CouchDB is stored in JavaScript Object Notation (JSON) format and is organized as key-value pairs. The key is a unique data identifier, and the value is either the data itself or a pointer to the data’s location. Standard database functions are performed with JavaScript. Table 10.4 lists the differences between CouchDB and RDBMS.

Features of CouchDB

  • - Easy cross-server replication through instances.
  • - Support for conflict resolution and master set-up.
  • - Quick indexing and search and retrieval.
  • - Documents are accessed through browsers, and indices can be queried through HTTP.
  • - Index, combine, and transform operations are performed with JavaScript.
  • - Advanced MapReduce.
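The combination of JSON documents and MapReduce-style indexing can be sketched as below. CouchDB itself expresses these operations in JavaScript, as noted above; this Python version only mirrors the idea, and the document IDs, field names, and the `run_view` helper are hypothetical.

```python
from collections import defaultdict

# Documents keyed by a unique identifier; each value is a JSON-like dict.
documents = {
    "reading:001": {"type": "reading", "sensor": "sensor-1", "temp_c": 21.5},
    "reading:002": {"type": "reading", "sensor": "sensor-1", "temp_c": 22.0},
    "reading:003": {"type": "reading", "sensor": "sensor-2", "temp_c": 19.0},
    "device:sensor-1": {"type": "device", "location": "roof"},
}


def map_reading(doc):
    """Map step: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "reading":
        yield doc["sensor"], doc["temp_c"]


def reduce_max(values):
    """Reduce step: collapse all values emitted for one key."""
    return max(values)


def run_view(docs, map_fn, reduce_fn):
    """Group mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs.values():
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


max_temp_by_sensor = run_view(documents, map_reading, reduce_max)
```

Note that documents of different shapes (readings and device records) live side by side; the map step simply ignores the ones it does not care about.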

Table 10.4 Differences between CouchDB and RDBMS

| CouchDB | RDBMS |
| --- | --- |
| Data is stored in documents | Data is stored in tables |
| Replication is easy | Replication is difficult |

iv. DynamoDB: DynamoDB is a hosted NoSQL (nonrelational) database offered by Amazon Web Services (AWS). It offers reliable performance even as it scales; a managed experience, so you won’t be SSH-ing into servers to upgrade the crypto libraries; and a small, simple API that allows simple key-value access along with more advanced query patterns. Table 10.5 lists the differences between DynamoDB and RDBMS.

Features of DynamoDB

  • - DynamoDB spreads data and request traffic across multiple servers to provide better throughput and storage.
  • - Data is stored on solid-state drives and is replicated over multiple availability zones to provide high availability and fault tolerance.
  • - You can decide the expiry time of items by setting a time-to-live parameter. An item is deleted after this time expires, which makes storage management more efficient.
  • - You pay only for throughput, which makes it cost-effective.
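The time-to-live behavior can be modeled with a small in-memory sketch. This is a toy model, not the DynamoDB API: the `TTLTable` class, its method names, and the `expires_at` attribute are assumptions made for illustration. The model stores the expiry as an epoch timestamp on the item and removes expired items lazily on read, whereas a managed service would typically delete them in the background.

```python
import time


class TTLTable:
    """Toy key-value table whose items expire at a per-item timestamp."""

    def __init__(self, ttl_attribute="expires_at"):
        self.ttl_attribute = ttl_attribute   # hypothetical attribute name
        self._items = {}

    def put_item(self, key, item):
        self._items[key] = dict(item)

    def get_item(self, key, now=None):
        """Return the item, or None if it is missing or has expired."""
        now = time.time() if now is None else now
        item = self._items.get(key)
        if item is None:
            return None
        expires_at = item.get(self.ttl_attribute)
        if expires_at is not None and expires_at <= now:
            del self._items[key]             # lazy deletion on access
            return None
        return item

    def __len__(self):
        return len(self._items)


table = TTLTable()
# Keep this reading for one hour from the time of insertion.
table.put_item("sensor-1", {"temp_c": 21.5, "expires_at": time.time() + 3600})
```

Items without the expiry attribute never expire in this model, so short-lived telemetry and long-lived configuration can share one table.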

Table 10.5 Differences between DynamoDB and RDBMS

| DynamoDB | RDBMS (SQL) |
| --- | --- |
| Uses HTTP/HTTPS requests and API operations | Uses a persistent connection and SQL commands |
| Uses the primary key, and a schema is not required to be defined in advance. Uses various data sources | Fundamental structure is a table, and its schema must be defined in advance before any operation |
| Only the primary key is available for querying. For more flexibility in querying data, one must use secondary indexes | All table information is accessible, and any of the data can be queried. SQL is rich in query processing |
| Information is stored as items in a table, and the item structure can vary as it is schema-less | Information is stored in rows of tables |

v. MongoDB: MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling. MongoDB is used in many modern applications that require big data, fast feature development, and flexible deployment. Table 10.6 lists the differences between MongoDB and RDBMS.

Table 10.6 Differences between MongoDB and RDBMS

| MongoDB (NoSQL) | RDBMS (SQL) |
| --- | --- |
| Non-relational and document-oriented database | Relational database |
| Coding starts without worrying about tables. Objects can be modified later at a low development cost | Need to design tables, data structure, and relations before coding |
| Provides JavaScript client for querying | Does not provide JavaScript client for querying |
| Collection-based and key-value pair | Table-based |
| Does not support foreign key, joins, and triggers | Supports foreign key, joins, and triggers |
| Provides only one level of locking | Provides very fine granularity of locking |
| Contains dynamic schema | Contains a predefined schema |

Features of MongoDB

  • - Supports ad hoc queries.
  • - Supports map reduce and aggregation tools.
  • - Uses JavaScript instead of stored procedures.
  • - It is a schema-less database written in C++.
  • - Provides high performance.
  • - Stores files of any size easily without complicating your stack.
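Ad hoc queries over schema-less documents can be illustrated with a small matcher. This is a sketch in plain Python, not the MongoDB driver: `find` and `matches` are hypothetical helpers, and only equality plus the comparison operators `$gt`, `$gte`, `$lt`, and `$lte` are modeled.

```python
import operator

_OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op not in _OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                # A document that lacks the field cannot satisfy a comparison.
                if value is None or not _OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Filter a list of dict documents by an ad hoc query."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
readings = [
    {"device": "t1", "temp_c": 21.5},
    {"device": "t2", "temp_c": 30.1, "battery": 0.4},
    {"device": "cam", "frames": 12},
]
hot = find(readings, {"temp_c": {"$gt": 25}})
```

The query is just data, so it can be built at run time without a predefined schema, which is the flexibility that Table 10.6 contrasts with table-based designs.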
 