In this assignment, we will build the autoscaling functionality that many cloud management systems provide to their users. Real-life systems and cloud applications today (e.g., web servers, database servers) are built to be scalable, so that system performance can keep up with increasing load. A cloud application deployed on VMs or containers can be scaled either horizontally (by adding more replica VMs) or vertically (by increasing the resources allocated to the existing VMs). We will focus on horizontal scaling in this assignment. The scaling can be triggered in many ways, e.g., when the CPU utilization of the VMs running the application crosses a threshold. Most cloud management systems today come with some autoscaling support. When you run an application on a VM in a cloud and enable autoscaling, the cloud management system monitors the utilization of various hardware resources (CPU, memory, disk, and so on), and spawns a new instance/replica of your application node if the utilization of some resource crosses a threshold. We will build a simplified version of this feature in this assignment.
In the first part of this assignment, you will build a simple autoscaling client-server application. You must set up a client that talks to "N" server replicas and sends multiple "requests" to these servers. The servers perform some computation for each request and send suitable responses back to the client. You may use any simple multi-threaded, socket-based client-server application that you may have developed in previous courses (e.g., a simple key-value store), or you may choose to use a more realistic application (e.g., a web client and server). The choice of the application, and of which requests/responses to handle, is left completely to you. We are aiming to do autoscaling by monitoring the CPU utilization of server VMs, so it is best if you pick an application where the request processing is CPU-intensive.
You must design your server application to be horizontally scalable. That is, when requests get distributed across replicas, you must ensure that a server replica is capable of handling all requests coming to it. For example, if you are building a key value store, you must ensure that a server replica is able to handle all get/put requests coming to it, say, by sharing the key-value database across server replicas. You may make any simplifying assumptions as required to enable easy sharing of distributed state across replicas. Or, to make your life simple, you can use a stateless application server, where the server performs some CPU-intensive computation on the request to generate a response, without requiring any stored state to generate the response.
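As one possible illustration of the stateless option, the sketch below shows a request handler whose response depends only on the request payload, so any replica can serve any request. The function name `handle_request` and the `rounds` parameter are hypothetical, and repeated SHA-256 hashing is just one convenient way to consume CPU; you may pick any other computation.

```python
import hashlib

def handle_request(payload: bytes, rounds: int = 100_000) -> str:
    """Stateless, CPU-bound handler: hash the payload `rounds` times.

    No stored state is read or written, so the result is the same on
    every replica. `rounds` is a hypothetical knob for per-request CPU cost.
    """
    digest = payload
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()
```

Raising `rounds` makes each request more expensive, which lets you saturate a server CPU with fewer requests per second.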
Your client must be capable of sending multiple concurrent requests to multiple server replicas in parallel, in order to fully load the servers. That is, your client should be able to saturate the CPUs of the N servers without becoming a bottleneck itself. You may use a multi-threaded architecture at your client to efficiently generate load. The client can use any policy to distribute requests to servers, e.g., round robin, or divert specific requests to specific servers using some intelligent logic. Further, your client must be capable of generating variable amounts of traffic. For example, in a "low load" mode, your client must generate a low amount of traffic such that none of the N servers are overloaded. In a "high load" mode, the client must be able to saturate the CPUs of all N servers. The client can shift between these modes at fixed time intervals, or on receiving external triggers. The client can run directly on the host or inside a VM.
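A minimal sketch of the client-side bookkeeping described above, assuming a round-robin policy and two load modes expressed as target requests per second. The class name `RoundRobinBalancer`, the `LOAD_RPS` values, and `per_thread_delay` are all hypothetical; the actual sending of requests is left out.

```python
import threading

class RoundRobinBalancer:
    """Thread-safe round-robin choice over a server list that can grow."""

    def __init__(self, servers):
        self._lock = threading.Lock()
        self._servers = list(servers)
        self._next = 0

    def add_server(self, server):
        # Called when the autoscaler announces a new replica.
        with self._lock:
            if server not in self._servers:
                self._servers.append(server)

    def pick(self):
        # Return the next server in rotation.
        with self._lock:
            server = self._servers[self._next % len(self._servers)]
            self._next += 1
            return server

# Hypothetical target request rates (requests per second) for each mode.
LOAD_RPS = {"low": 20, "high": 200}

def per_thread_delay(mode, num_threads):
    """Seconds each worker thread waits between requests to hit the target rate.

    Ignores the time the request itself takes, so the real rate is a bit lower.
    """
    return num_threads / LOAD_RPS[mode]
```

Each worker thread would loop over `pick()`, send one request, and sleep for `per_thread_delay(mode, num_threads)`; switching `mode` changes the offered load without restarting the client.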
The crux of this assignment is building a monitoring/autoscaling program. This program monitors the CPU utilization of all server VMs in order to detect overload and perform autoscaling. For this part of the assignment, we are expecting you to demonstrate autoscaling in the following manner. Start your server application on N replicas, and your client in a low load mode. The monitoring program must monitor the CPU utilization of the server replica VMs. When the client shifts to a high load mode, the monitoring program must detect the overload, spawn a new server replica VM, and notify the client. The client must then start using the N+1 servers to serve requests, which should relieve the overload at the existing servers.
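The overload detection step can be sketched as follows, assuming the libvirt Python bindings, where `virDomain.info()` returns a list whose last element (index 4) is the cumulative CPU time of the domain in nanoseconds. Utilization is then the CPU time consumed between two samples divided by the wall-clock time between them. The names `cpu_utilization` and `OverloadDetector`, the 80% threshold, and the "three consecutive samples" rule are illustrative assumptions, not requirements of the assignment.

```python
def cpu_time_ns(domain):
    """Cumulative CPU time of a libvirt domain, in nanoseconds.

    Assumes domain.info() -> [state, maxMem, memory, nrVirtCpu, cpuTime].
    """
    return domain.info()[4]

def cpu_utilization(prev_ns, curr_ns, interval_s, vcpus=1):
    """Percent CPU used between two cumulative CPU-time samples."""
    busy_s = (curr_ns - prev_ns) / 1e9
    return 100.0 * busy_s / (interval_s * vcpus)

class OverloadDetector:
    """Reports overload after `required` consecutive samples above `threshold`."""

    def __init__(self, threshold=80.0, required=3):
        self.threshold = threshold
        self.required = required
        self.streak = 0

    def update(self, utilization):
        # A single sample below the threshold resets the streak,
        # which filters out short spikes.
        if utilization > self.threshold:
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.required
```

A monitoring loop would call `cpu_time_ns` on each server domain at a fixed interval, feed the resulting utilization into `update`, and trigger scaling the first time it returns `True`.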
You are free to decide the communication/notification mechanism between the autoscaling program and your client, in order to let the client know that a new server replica is available. You can also implement the autoscaling program in any programming language of your choice that has support for the libvirt APIs. The only constraint we impose on the monitoring and autoscaling program is that it must perform the monitoring and autoscaling by invoking the libvirt APIs, and not by any other means (e.g., by using the system command to run other command-line tools).
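Since the notification mechanism is left open, here is one hypothetical message format: a single newline-terminated JSON object that the autoscaling program could write to a socket the client listens on. The field names (`event`, `host`, `port`) and both function names are assumptions made for this sketch.

```python
import json

def encode_notification(host, port):
    """Build the bytes announcing a new server replica (one JSON line)."""
    message = {"event": "server_added", "host": host, "port": port}
    return (json.dumps(message) + "\n").encode("utf-8")

def decode_notification(line):
    """Parse one notification line; return (host, port) of the new replica."""
    message = json.loads(line.decode("utf-8"))
    if message.get("event") != "server_added":
        raise ValueError("unexpected notification: %r" % (message,))
    return message["host"], message["port"]
```

On receiving such a line, the client would add the decoded address to its list of servers. A shared file or a signal would work equally well; the point is only that the client learns the new replica's address.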
You must choose the load levels in the low and high modes of your client carefully. If your high load is much larger than your low load, the autoscaling application may continue to send out overload triggers even after spawning a new replica. For the sake of simplicity, you may want to make your high load level just slightly higher than your low load level, so that the overload can be mitigated by spawning just one extra server replica.
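To make this reasoning concrete, the sketch below checks a pair of load levels under a simplifying assumption: CPU utilization grows linearly with request rate, and the client splits load evenly across replicas. Here `capacity_rps` is the (measured or estimated) request rate that drives one server to 100% CPU, and `threshold` is the utilization fraction at which you trigger scaling. All names and numbers are hypothetical.

```python
def loads_allow_single_scale_up(low_rps, high_rps, capacity_rps,
                                threshold=0.8, n=1):
    """True if going from n to n+1 servers is needed and sufficient.

    Assumes linear CPU cost per request and an even split across replicas.
    """
    limit = threshold * capacity_rps          # per-server rate at the threshold
    low_ok = low_rps / n <= limit             # no overload in low mode
    high_overloads = high_rps / n > limit     # overload in high mode on n servers
    fixed_by_one = high_rps / (n + 1) <= limit  # one extra replica is enough
    return low_ok and high_overloads and fixed_by_one
```

For example, with a single server that saturates at 100 requests/s and an 80% threshold, a low load of 50 requests/s and a high load of 120 requests/s satisfy all three conditions, whereas a high load of 200 requests/s would still overload both servers after scaling.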
In this assignment, we are expecting you to demonstrate a simple scale up from N to N+1 servers, where N=1. That is, you must start your application with 1 server. When under overload, your autoscaling program should spawn the second server, which should mitigate the overload situation. We will not be evaluating your assignment for higher values of N, though you are welcome to build a generic autoscaling framework that works for all values of N, subject to the constraints of your system to host VMs.
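For the N=1 case, spawning the second server can be as simple as starting a VM that is already defined in libvirt but not running. The sketch below assumes the libvirt Python bindings and a pre-defined standby domain (for example, a clone of the first server VM whose server application starts at boot); the domain name passed in is hypothetical. The connection is taken as a parameter, so the block itself does not import `libvirt`; in a real program it would come from something like `libvirt.open("qemu:///system")`.

```python
def scale_up(conn, standby_name):
    """Start a defined-but-inactive domain; return True if it was started.

    `conn` is a libvirt connection object; `standby_name` is the
    (hypothetical) name of the pre-defined second server VM.
    """
    domain = conn.lookupByName(standby_name)  # raises if no such domain exists
    if domain.isActive():
        return False                          # already running, nothing to do
    domain.create()                           # boots the defined domain
    return True
```

After `scale_up` returns `True`, the autoscaling program still has to wait until the server inside the new VM is reachable before notifying the client.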
You are also responsible for designing a "demo" to showcase your autoscaling logic. You must think about how you will convince the instructor/TA that your code works correctly. You may want to plot some metrics before and after the autoscaling event to demonstrate that your solution has worked. For example, you can measure the CPU utilizations of the server replicas, and show that the overload has reduced. You may also measure the average throughput or latency of request processing at the servers, and show that the performance has improved after autoscaling.
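One way to produce the before/after numbers for the demo is to log, for every completed request, its completion time and latency, and then summarize a window on each side of the autoscaling event. The sketch below assumes such a log as a list of `(completion_time_s, latency_s)` pairs; the function name and the record layout are hypothetical.

```python
from statistics import mean

def window_stats(records, start, end):
    """Throughput (requests/s) and mean latency (s) for start <= time < end.

    `records` is a list of (completion_time_s, latency_s) pairs.
    Mean latency is None if no request completed in the window.
    """
    latencies = [lat for t, lat in records if start <= t < end]
    throughput = len(latencies) / (end - start)
    avg_latency = mean(latencies) if latencies else None
    return throughput, avg_latency
```

Calling `window_stats` once for an interval under high load before the new replica comes up, and once for an equally long interval after it, gives two pairs of numbers that can be reported directly or plotted alongside the CPU utilization of each replica.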
Some helpful hints:
You can think of your monitoring and autoscaling application as a very simple cloud management system that is managing the various VMs of an application. You can extend this assignment in many ways to make it more realistic. Below are a few suggestions: