Theoretically, a microservices-based architecture should be less reliable than other architectural approaches. Microservices are, after all, distributed systems, so there is an inherent risk of network failures adding to the usual sources of errors. Also, microservices run on several servers, increasing the likelihood of hardware failures.

To ensure high availability, a microservices-based architecture has to be correctly designed. The communication between microservices has to form a kind of firewall: The failure of a microservice should not propagate. This prevents a problem that arises in an individual microservice from leading to a failure of the entire system.

To achieve this, a microservice which is calling another microservice has to somehow keep working when a failure occurs. One way to do this might be to assume some default values. Alternatively, the failure might lead to a graceful degradation such as some sort of reduced service.
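The default-value approach can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names call_with_fallback, fetch_recommendations, and BESTSELLERS are hypothetical, and the failing call merely stands in for a request to an unavailable microservice.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_fallback(remote_call: Callable[[], T], default: T) -> T:
    """Return the result of remote_call, or default if the call fails."""
    try:
        return remote_call()
    except Exception:
        # Assumption: any exception counts as a failure of the callee.
        # A real system would log it and catch narrower exception types.
        return default


def fetch_recommendations() -> List[str]:
    # Hypothetical stand-in for a call to a recommendation microservice
    # that is currently down.
    raise ConnectionError("recommendation service unreachable")


# Assumed static default: a fixed list shown when no personalized
# recommendations are available (a reduced but still working service).
BESTSELLERS = ["book-1", "book-2"]

recommendations = call_with_fallback(fetch_recommendations, BESTSELLERS)
```

The caller keeps working with the default list, so the failure of the called microservice shows up only as a reduced service rather than as an error.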

How a failure is dealt with technically can be critical: the operating-system-level timeout for TCP/IP connections is often set to five minutes, for example. If, due to the failure of a microservice, requests run into this timeout, each calling thread is blocked for five minutes. As more requests arrive, more threads block, until at some point all threads are blocked. If that happens, the calling system might fail as well, as it cannot do anything apart from wait for timeouts. This can be avoided by specifying shorter timeouts for the calls.
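A shorter timeout can be set explicitly on each call. The following sketch uses the timeout parameter of urlopen from Python's standard library; the function name fetch_with_timeout and the two-second default are assumptions chosen for illustration.

```python
import urllib.request
from typing import Optional


def fetch_with_timeout(url: str, timeout_seconds: float = 2.0) -> Optional[bytes]:
    """Fetch url, giving up early instead of waiting for the OS-level timeout.

    Returns the response body, or None if the call fails or times out.
    """
    try:
        # The timeout applies to each blocking socket operation
        # (connecting, reading), not to the request as a whole.
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError, refused connections, and socket timeouts
        # are all subclasses of OSError.
        return None
```

With a timeout of this kind, a thread that calls an unresponsive microservice is released after a few seconds and can serve other requests, for example by returning a default value.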

These concepts have been around much longer than the concept of microservices. The book Release It! [1] describes these sorts of challenges, and approaches for solving them, in detail. When these approaches are implemented, microservice-based systems can tolerate the failure of entire microservices and therefore become more robust than a deployment monolith.

When compared to deployment monoliths, microservices have the additional benefit that they distribute the system into multiple processes. These processes are better isolated from each other. A deployment monolith only starts one process, and therefore a memory leak or a piece of functionality using up a lot of computing resources can make the whole system fail. Often, these sorts of errors are simple programming mistakes or slips. The distribution into microservices contains such errors, as only a single microservice would fail in such a scenario.

  • [1] Michael T. Nygard. 2007. Release It!: Design and Deploy Production-Ready Software. Raleigh, N.C.: Pragmatic Programmers.