Experiences with JVM-Based Microservices in the Amazon Cloud (Sascha Möllering)
By Sascha Möllering, zanox AG
During the last months zanox has implemented a lightweight microservices architecture in Amazon Web Services (AWS), which runs in several AWS regions. Regions divide the Amazon cloud into sections like US-East or EU-West, each with its own data centers. They work completely independently of each other and do not exchange any data directly. Different AWS regions are used because latency is very important for this type of application and is minimized by latency-based routing. In addition, a fundamental aim was to design the architecture in an event-driven manner: the individual services were intended not to communicate directly but to be decoupled by message queues and bus systems. An Apache Kafka cluster serving as message bus in the zanox data center acts as the central point of synchronization for the different regions. Each service is implemented as a stateless application; state is stored in external systems such as the bus systems, Amazon ElastiCache (based on the NoSQL database Redis), the data stream processing technology Amazon Kinesis, and the NoSQL database Amazon DynamoDB. The JVM serves as the basis for the implementation of the individual services. We chose Vert.x and the embedded web server Jetty as frameworks. We developed all applications as self-contained services, so that the build process produces a fat JAR that can simply be started via java -jar.
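The self-contained pattern can be sketched with a minimal example. The real services use Vert.x and embedded Jetty; the JDK's built-in `com.sun.net.httpserver.HttpServer` stands in here so the sketch stays dependency-free, and the class name, port, and endpoint path are illustrative.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Sketch of a self-contained service: everything needed to serve HTTP lives
// in the JAR itself, so "java -jar service.jar" is the whole deployment --
// no application server, no additional components on the host.
public class SelfContainedService {

    // Kept as a pure method: the service holds no state of its own,
    // so the response logic is trivially testable.
    static String heartbeatBody() {
        return "OK";
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/heartbeat", exchange -> {
            byte[] body = heartbeatBody().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start(); // serves requests on background threads
    }
}
```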
There is no need to install any additional components or an application server. Vert.x serves as the base framework for the HTTP part of the architecture; within these applications, work is performed almost completely asynchronously to achieve high performance. The remaining components use Jetty as their framework: they act either as Kafka/Kinesis consumers or update the Redis cache for the HTTP layer. All applications are delivered in Docker containers, which enables a uniform deployment mechanism independent of the technology used. To be able to deliver the services independently in the different regions, a separate Docker Registry was set up in each region, storing the Docker images in an S3 bucket. S3 is a service that enables the storage of large files on Amazon servers.
If you intend to use cloud services, you have to address the question of whether you want to use the managed services of a cloud provider or develop and run the infrastructure yourself. zanox decided to use the managed services of a cloud provider because building and administrating proprietary infrastructure components does not provide any business value. The EC2 instances of the Amazon portfolio are pure infrastructure; IAM, on the other hand, offers comprehensive security mechanisms. The deployed services use the AWS Java SDK, which, in combination with IAM roles for EC2, makes it possible to build applications that access the managed services of AWS without explicit credentials. During initial bootstrapping, an IAM role containing the necessary permissions is assigned to an EC2 instance. Via the Metadata Service, the AWS SDK obtains the necessary credentials, which enables the application to access the managed services defined in the role. Thus, an application can be built that sends metrics to the monitoring system Amazon CloudWatch and events to the data stream processing solution Amazon Kinesis without explicit credentials having to be rolled out together with the application.
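A minimal sketch of this credentials mechanism with the AWS SDK for Java: `InstanceProfileCredentialsProvider` fetches temporary credentials from the EC2 Metadata Service, so no access keys ship with the application. The namespace and metric name are illustrative, and the code only works on an EC2 instance whose IAM role grants `cloudwatch:PutMetricData`.

```java
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.services.cloudwatch.model.MetricDatum;
import com.amazonaws.services.cloudwatch.model.PutMetricDataRequest;
import com.amazonaws.services.cloudwatch.model.StandardUnit;

// Sketch: the provider transparently pulls (and refreshes) temporary
// credentials from the EC2 Metadata Service, so the application carries
// no explicit access keys.
public class MetricsSender {
    public static void main(String[] args) {
        AmazonCloudWatchClient cloudWatch =
                new AmazonCloudWatchClient(new InstanceProfileCredentialsProvider());
        cloudWatch.putMetricData(new PutMetricDataRequest()
                .withNamespace("zanox/microservices")        // illustrative namespace
                .withMetricData(new MetricDatum()
                        .withMetricName("EventsProcessed")   // illustrative metric
                        .withUnit(StandardUnit.Count)
                        .withValue(1.0)));
    }
}
```

Outside EC2 (or if the role lacks the permission), the same call fails with a credentials or authorization error, which is exactly the point: access is bound to the instance's role, not to keys in the deployment artifact.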
All applications are equipped with REST interfaces for heartbeats and health checks so that the application itself as well as the infrastructure necessary for its availability can be monitored at all times: each application uses health checks to monitor the infrastructure components it depends on. Application scaling is implemented via Elastic Load Balancing (ELB) and Auto Scaling to achieve fine-grained scaling depending on the concrete load: Auto Scaling starts additional EC2 instances if needed, and ELB distributes the load between the instances. The AWS ELB service is not only suitable for web applications using HTTP but for all types of applications, as a health check can also be implemented on plain TCP without HTTP, which is even simpler than an HTTP health check.
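The health-check side of this can be sketched as follows (again with the JDK's built-in HttpServer instead of Vert.x/Jetty, and with illustrative names): the endpoint probes each infrastructure dependency and returns 503 as soon as any check fails, so the load balancer can take the instance out of rotation.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of a health-check endpoint: 200 only if every registered
// dependency check passes, 503 otherwise.
public class HealthCheckService {

    // Pure decision logic, separated out so it can be tested without a server.
    static int healthStatus(Map<String, Supplier<Boolean>> checks) {
        boolean allUp = checks.values().stream().allMatch(Supplier::get);
        return allUp ? 200 : 503;
    }

    public static void main(String[] args) throws Exception {
        // Illustrative checks; a real service would ping Redis, Kafka, etc.
        Map<String, Supplier<Boolean>> checks =
                Map.of("redis", () -> true, "kafka", () -> true);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/health", exchange -> {
            int status = healthStatus(checks);
            byte[] body = (status == 200 ? "UP" : "DOWN")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```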
Still, the development team decided to implement the ELB health checks via HTTP for all services so that they all behave exactly the same, independent of the implemented logic, the frameworks used, and the language. It is also quite possible that in the future applications that do not run on the JVM and, for instance, use Go or Python as programming languages will be deployed in AWS.
For the ELB health check zanox uses the application's heartbeat URL. As a result, traffic is only directed to the application, and any necessary infrastructure scaling operations are only performed, once the EC2 instance with the application has properly started and the heartbeat has been checked successfully.
For application monitoring Amazon CloudWatch is a good choice, as CloudWatch alarms can be used to define scaling events for the Auto Scaling policies; that is, the infrastructure scales automatically based on metrics. For this purpose, basic EC2 metrics like CPU utilization can be used. Alternatively, it is possible to send your own metrics to CloudWatch: this project uses a fork of the project jmxtrans-agent, which uses the CloudWatch API to send JMX metrics to the monitoring system. JMX (Java Management Extensions) is the standard for monitoring and metrics in the Java world. In addition, metrics are sent from within the application (i.e., from within the business logic) using the Coda Hale Metrics library and a module for CloudWatch integration by Blacklocus.
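In-application metrics with Coda Hale (Dropwizard) Metrics might look roughly like this; the metric names are illustrative. In the setup described above, the Blacklocus CloudWatch module would be registered as a reporter against the same registry to forward these values to CloudWatch.

```java
import com.codahale.metrics.Meter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

// Sketch: business-logic metrics collected in a shared registry.
public class BusinessMetrics {
    private static final MetricRegistry registry = new MetricRegistry();
    private static final Meter events = registry.meter("events.processed");
    private static final Timer handling = registry.timer("events.handling-time");

    static void handleEvent(Runnable businessLogic) {
        try (Timer.Context ignored = handling.time()) { // measures duration
            businessLogic.run();
        }
        events.mark(); // counts processed events
    }

    public static void main(String[] args) {
        handleEvent(() -> { /* business logic goes here */ });
        System.out.println(events.getCount());
    }
}
```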
A slightly different approach is chosen for logging: in a cloud environment it can never be ruled out that a server instance is terminated abruptly, which often means the sudden loss of data stored on that server, log files being a typical example. For this reason, a logstash-forwarder runs on each server in parallel to the core application, sending the log entries to our ELK service running in our own data center. ELK is an acronym for Elasticsearch, Logstash, and Kibana: the stack consists of Elasticsearch for storage, Logstash for parsing the log data, and Kibana for UI-based analysis. In addition, a UUID is calculated for each request and each event in our HTTP layer so that log entries can still be assigned to events after EC2 instances have ceased to exist.
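The correlation idea can be sketched with nothing but the JDK (the log-line format below is illustrative): one UUID per request or event, stamped onto every log line it produces, so the entries can still be tied together in Elasticsearch after the originating EC2 instance is gone.

```java
import java.util.UUID;

// Sketch of per-request log correlation via UUIDs.
public class RequestCorrelation {

    // One fresh id per incoming request/event.
    static String newRequestId() {
        return UUID.randomUUID().toString();
    }

    // Every log statement for a request carries its id.
    static String logLine(String requestId, String message) {
        return "[request=" + requestId + "] " + message;
    }

    public static void main(String[] args) {
        String id = newRequestId();
        System.out.println(logLine(id, "received event"));
        System.out.println(logLine(id, "cache updated"));
    }
}
```

With this prefix in place, filtering on the UUID in Kibana reconstructs the full trace of a request even when the log lines arrived from an instance that no longer exists.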
The microservices architecture pattern fits well with the dynamic approach of the Amazon cloud if the architecture is well designed and implemented. The clear advantage over an implementation in your own data center is the flexibility of the infrastructure, which makes it possible to build a nearly endlessly scalable and, in addition, very cost-efficient architecture.
-  https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
-  https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
-  https://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-add-elb-healthcheck.html
-  https://github.com/SaschaMoellering/jmxtrans-agent
-  https://dropwizard.github.io/metrics/
-  https://github.com/blacklocus/metrics-cloudwatch
-  https://github.com/elastic/logstash-forwarder