Example 2: Capacity Planning of a Three-Tier Web Application

It is strongly recommended that you read Example 1 before reading this example.

Now that we have understood the basic workings of PerfCenter, let us apply it to a simple webmail application. We will model a typical webmail application and analyze its performance. We will then carry out capacity planning of the same webmail system with PerfCenter, through multiple iterations of simulation. Finally, we will arrive at a scaled-up hardware architecture, along with a software configuration, that allows the webmail system to serve a very large user population.

The input file for the current example is available here for download.

The various software components required for this application are the Web server, the IMAP server, the authentication server, and the SMTP server. The application is hosted on one or more hosts. The hosts may be connected by a LAN, or may even be separated by WANs. The application can be used in various ways: users can log in to the system, read messages, send messages, and delete messages.

 

Figure 1: Message sequence chart for login scenario

If such an application is to be deployed in a data center, we would want to arrive at a satisfactory deployment and configuration that would maximize the performance seen by the users, while minimizing the data center resources used. Specifically, in such an application, the scenario response times would be a primary measure of interest for the users. From the point of view of the application owner, the capacity of the system in terms of throughput or number of users would be of most interest. From the point of view of the data center architect, optimal utilization of resources is the top priority. All these performance measures depend on various factors: how each scenario uses the software components at various tiers, the resources consumed at each step of execution of the scenario, how the hardware resources are shared, and how the network resources are shared. PerfCenter allows for easy specification of all these factors.

1   Scenario Descriptions

Figure 2: Activity diagram depicting the processing when a user logs in

 

Figure 3: Activity diagram depicting the processing when a user reads a message

 

Figure 4: Activity diagram depicting the processing when a user deletes a message

 

Figure 5: Activity diagram depicting the processing when a user sends a message

Note that if, for a moment, we ignore contention for hardware, the information required to estimate, say, the scenario response times would be: the probability that an arriving scenario is of a certain type (e.g., login or read); the execution times of the tasks (on all the devices that a task may use); the branching probabilities in the activity graph; the server characteristics, such as the number of threads and the buffer size (if any); and lastly, the load on the system. The load may be either open (specified using scenario arrival rates) or closed (specified using the number of users and the think time).
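As a quick illustration of this contention-free estimate, the following Python sketch computes the mean scenario response time from the scenario mix and per-scenario service demands. All numbers here are made up for illustration; they are not taken from this example.

# Hypothetical numbers for illustration only. Without hardware contention,
# a scenario's response time is just the sum of its task execution times
# weighted by the branching probabilities, and the overall mean weighs
# each scenario by its arrival probability.
scenario_mix = {"login": 0.2, "read": 0.5, "send": 0.2, "delete": 0.1}
scenario_demand = {"login": 0.40, "read": 0.25, "send": 0.35, "delete": 0.20}

mean_response = sum(p * scenario_demand[s] for s, p in scenario_mix.items())
print(f"Contention-free mean response time: {mean_response:.3f} s")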

Now, let us consider the impact of contention for hardware resources, or equivalently, the problem of “sizing” the hosts and deploying the software components on them. Suppose we have four machines available for this application, each possibly with different hardware. For example, one machine may have two or four CPUs, while another may have CPUs of a different speed. How we deploy the Web, IMAP, SMTP and authentication servers on these machines will affect the end-user performance, as well as the utilization of the machines.

Suppose that there are two data centers, on two LANs separated by a WAN, and suppose further that the authentication server must be housed in “Data Center 2”. In this case, we have to answer the additional question of whether the WAN link between the two LANs has enough capacity for the new application, and how this affects the scenario response times. PerfCenter abstracts the WAN as a point-to-point link, for which parameters such as the MTU (Maximum Transmission Unit), transmission rate, and propagation time need to be defined. The MTU is specified so that the packet queue at the link can be modeled. The network delay then depends on the size of the message sent from one server to another when a remote call is made. The figures above show the average message size in bytes when a call is made from one server to another for processing a request. We assume that the network delay within a LAN is negligible.
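To make the link model concrete, here is a minimal Python sketch (not part of PerfCenter; all values are assumed for illustration) that computes the queueing-free delay of a single message over the point-to-point link, and the number of MTU-sized packets that would join the link's packet queue:

import math

MTU_BYTES = 1500        # assumed MTU of the WAN link
RATE_BPS = 256_000      # assumed transmission rate (256 kbps)
PROP_S = 0.001          # assumed propagation time (1 ms)
msg_bytes = 4000        # hypothetical server-to-server message size

packets = math.ceil(msg_bytes / MTU_BYTES)     # packets joining the link queue
delay_s = (msg_bytes * 8) / RATE_BPS + PROP_S  # serialization + propagation
print(f"{packets} packets, {delay_s * 1000:.1f} ms ignoring queueing")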

 

Figure 6: Sample deployment of the email system

2   Deployment and Configuration Using PerfCenter

For the Webmail application of Section 1, the data center architect's decision involves choosing appropriate host configurations, an appropriate deployment of the application servers on the hosts, sizing the network link, and finally, configuring the four servers.

We present two scenarios in the following. The first is that of sizing for a small user group and the second is that of scaling to an extremely large user group. For both these scenarios, we show PerfCenter’s usefulness in arriving at a satisfactory deployment and configuration design.

2.1   Small User Group Scenario

Consider the example of a company that needs to set up an internal webmail system for its 500 employees. We assume that at any given time no more than 100 employees access the system. Each user sends a request to the application an average of 3 seconds after receiving a response. An average scenario response time of less than one second is required of this application.
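Before running any simulation, a back-of-the-envelope check with Little's law tells us roughly what throughput the deployment must sustain: in a closed system with N users and think time Z, throughput X and response time R are related by X = N / (R + Z). A minimal sketch:

# Little's law for a closed system: X = N / (R + Z).
N = 100          # concurrent users
Z = 3.0          # think time (seconds)
R_target = 1.0   # required average response time (seconds)

X_needed = N / (R_target + Z)
print(f"Deployment must sustain about {X_needed:.0f} requests/sec")

This bound of about 25 requests/sec is consistent with the maximum throughput of roughly 30 requests/sec that the final small-group configuration achieves in Figure 14.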

An initial system deployment (DPLY1) to support this system is shown in Figure 6. PerfCenter can be run on the input file (Figure 7) corresponding to this deployment to estimate the performance measures of the system. Note that this is a “closed arrivals” model. Table 1 shows the predicted utilization of host CPUs and disks for this deployment, and Figure 10 shows the response time as a function of the number of users. As can be seen, the response time exceeds 1 second at just 30 users, which is unacceptable. Note also that host H1 is over-utilized (98%). This suggests that the initial deployment is not suitable, and that other deployments need to be tried. Initially, we assume that software resources such as the number of threads and the buffer size are not a bottleneck (they are set to a very high number in the input file). Similarly, we assume the WAN link capacity to be 100 Mbps, which is high enough not to be a bottleneck.

2.2   Identifying Server Deployment

Since a machine was over-utilized in DPLY1, in the next deployment we add two new machines, hosts H3 and H4, to LAN1. Assuming that the Web server will require the most CPU capacity, we host the Web server on two machines, H1 and H4. IMAP and SMTP are moved from host H1 to host H3. Figure 8 shows the new deployment. The input file is appended with the following lines to reflect it.

undeploy imap H1
undeploy smtp H1
deploy imap H3
deploy smtp H3
deploy web H4

Figure 8: DPLY2: Relieving host H1 bottleneck

Table 1 shows the performance predicted for this deployment (DPLY2). We observe that the disk of H3 (which hosts IMAP) is over-utilized (80.6%). We therefore replace H3 with a faster machine that doubles the CPU speed and the disk access rate. Further, for the purpose of consolidation, we host the Web server on one machine, H1, after adding one CPU to it. Figure 9 shows the new deployment (DPLY3). We analyze this deployment by appending the following lines to the input file:

undeploy web H4
set H1:cpu:count 2
diskspeedupfactor3 = 2
cpuspeedupfactor3 = 2

Figure 9: DPLY3: Disk upgraded, web server consolidated

PerfCenter now predicts the Web server host H1 to be the most utilized, at 77.6%. If we wish to leave some headroom, one more processor can be added to H1, resulting in deployment configuration DPLY4 (Figure 13). The utilizations for all four configurations are summarized in Table 1.

Table 1: Host Device Utilizations

Deployment   H1 CPU %   H2 CPU %   H3 CPU %   H4 CPU %   IMAP Host Disk %   H2 Disk %
DPLY1        98.1        8.2       NA         NA         41.2                8.8
DPLY2        67.5       15.9       48.6       75.0       80.6               17.0
DPLY3        77.6       17.7       23.9       NA         44.2               18.7
DPLY4        53.8       18.4       27.7       NA         47.1               19.5

 

Figure 10: Average scenario response time

Figure 10 shows a plot of the estimated average scenario response times for all four deployments. As we are designing for 100 users, the response time is acceptable for deployment alternatives DPLY2 through DPLY4. However, DPLY4 best balances adequate utilization of resources (while still leaving some headroom) with the users' scenario response time requirement, and is hence chosen as the final deployment for this application.

2.3   Identifying Network Link Capacity

Now that the deployment is finalized, we move on to estimating the network link capacity required for the Webmail application (so far assumed to be 100 Mbps).

Table 2 shows the link utilizations predicted by PerfCenter for transmission rates of 256 kbps and 1 Mbps. In both cases, the propagation delay is assumed to remain 1 ms. The table shows that a 256 kbps link should be sufficient for this application.

Table 2: Network Utilization

Link           256 kbps   1 Mbps
lan1 to lan2   20.1%      5.1%
lan2 to lan1   18.7%      4.8%
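Since link utilization is simply the offered traffic divided by the link rate, the two columns of Table 2 should scale inversely with the rate. A quick check of the lan1-to-lan2 row reproduces the 1 Mbps column:

# Utilization = offered load / link capacity, so it scales inversely
# with the transmission rate.
util_256k = 0.201                   # lan1 to lan2 at 256 kbps (Table 2)
offered_bps = util_256k * 256_000   # implied offered load: ~51.5 kbps
print(f"Predicted at 1 Mbps: {offered_bps / 1_000_000:.1%}")  # ~5.1%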

2.4   Selecting Server Thread Count

The number of threads configured for a server is important for the performance it delivers. If this number is set too low, hardware resources remain underutilized while requests queue up waiting for a free thread. If it is set too high, memory consumption increases and thread-management overhead can degrade performance. Our goal is to determine the minimum number of server threads that makes full use of the hardware resources while still delivering acceptable response time, for a given load. Here we do this sizing for 100 users (with a think time of 3 seconds).

We focus on thread sizing for the Web server, because it is the front-end server: the thread pool sizes of the “downstream” servers can be determined easily once the Web server threads are set. The average scenario response time, the Web server (thread pool) utilization, and the Web server host CPU utilization, for thread counts varying from 1 to 15, are plotted in Figure 11 and Figure 12. As the thread count increases, the utilization of the Web server threads decreases while that of the CPU increases.

Figure 11: Utilization vs web server thread count

 

Figure 12: Response time vs web server thread count

From Figure 12 we note that once the thread count reaches seven, the response time becomes constant; increasing the thread count further has no effect on the average response time. Also, from Figure 11, the CPU utilization flattens out at a thread count of seven. Hence we can conclude that, for this configuration, seven is a suitable thread count.
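This behavior is what the utilization law predicts: the average number of busy threads equals the throughput times the mean time a request occupies a thread, so the pool stops being a bottleneck once its size exceeds that product. A minimal sketch, where the per-request thread residency is an assumed value (it is not given in this example):

import math

X = 25.0      # sustained throughput at 100 users (Little's law bound above)
S_web = 0.25  # assumed mean time a request holds a Web server thread (s)

busy_threads = X * S_web          # utilization law: mean busy threads
c_min = math.ceil(busy_threads)   # smallest pool that does not saturate
print(f"~{busy_threads:.2f} threads busy on average; need at least {c_min}")

With these assumed numbers, 6.25 threads are busy on average, so seven is the first pool size that leaves slack. In practice the residency would be measured, or taken from the simulation outputs.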

The final deployment and configuration for the Webmail application is shown in Figure 13. PerfCenter can be used to determine the maximum capacity of this configuration. As shown in Figure 14, the maximum throughput achieved by the system is around 30 requests/sec. Figure 15 shows the response time behavior of this system. Note that the 95% confidence intervals for these two measures are shown in the corresponding plots.

 

Figure 13: DPLY4: Recommended configuration for small user group

 

Figure 14: Maximum throughput with 95% confidence interval

 

Figure 15: Response time behavior with increasing load, with 95% confidence interval

2.5   Webmail Scaling

Suppose that we now want to upgrade the Webmail system to support requests arriving at a rate of 2000 requests/sec (specified as open arrivals). The underlying hardware must be upgraded to handle this much heavier load.

To perform the hardware upgrade, the host with the highest utilization is identified and upgraded. If the new configuration does not support the required load of 2000 requests/sec, we again upgrade the host with the highest utilization. These steps are repeated until a configuration is found that is predicted to support an arrival rate of 2000 requests/sec, as sketched below. For the hardware upgrade we again assume that there are no software bottlenecks.
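The procedure amounts to a simple loop. In the following Python sketch, run_perfcenter and upgrade_host are hypothetical stand-ins, not PerfCenter APIs: the first would run the simulator on the current input file and parse the predicted throughput and host utilizations, and the second would append "set <host>:cpu:count" or speedup lines to the input file.

TARGET_RATE = 2000.0  # required arrival rate (requests/sec)

def upgrade_until_capacity(config):
    while True:
        throughput, host_utils = run_perfcenter(config)  # hypothetical helper
        if throughput >= TARGET_RATE:
            return config                                # target load supported
        # upgrade the host whose devices are most utilized
        bottleneck = max(host_utils, key=host_utils.get)
        config = upgrade_host(config, bottleneck)        # hypothetical helper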

Step 1: In the earlier configuration, shown in Figure 13, the maximum throughput achieved by the system was 30 requests/sec, and the Web server host H1 was the bottleneck resource.

The scaled system should support almost 70 times the load of the earlier system (2000/30 ≈ 67). Thus, significantly higher CPU and disk capacity is required. We assume that two new disk assemblies are available to us, which speed up disk access by a factor of 80 on host H3 and by a factor of 20 on host H2. Similarly, we upgrade host H1 to a 32-processor machine, and assume that another machine of this type, H4, is available. We also upgrade H2 and H3 to 12 processors each. The CPU speeds are no longer all the same; the speedup factors are specified in the input by modifying the “speedup” attribute of the CPUs. A speedup factor can be based on raw CPU speed, or on the speedup achieved on some benchmark application.

diskspeedupfactor2=20
diskspeedupfactor3=80
deploy web H4
set H1:cpu:count 32
set H4:cpu:count 32
set H3:cpu:count 12
set H2:cpu:count 12
cpuspeedupfactor1=2
cpuspeedupfactor3=4
cpuspeedupfactor4=2

PerfCenter predicts that this configuration fails at a load of 2000 requests/sec, with H2 becoming over-utilized (100%). The throughput achieved by this configuration is 1755 requests/sec. The resource utilizations predicted by PerfCenter for this configuration are shown in Table 3.

Table 3: Resource Utilization for Scaled-up Deployment

Step 1            H1     H2     H3     H4     H5
CPU count         32     12     12     32     NA
CPU Speedup       2      1      4      2      NA
CPU Util. %       88.0   100    75.8   87.7   NA
Disk Speedup      NA     20     80     NA     NA
Disk Util. %      NA     51.6   46.5   NA     NA

Step 2            H1     H2     H3     H4     H5
CPU count         32     32     18     32     32
CPU Speedup       2      1      4      2      2
CPU Util. %       64.5   55.9   57.0   63.1   58.7
Disk Speedup      NA     20     80     NA     NA
Disk Util. %      NA     58.0   52.6   NA     NA

Step 2: Since host H2's CPUs are the bottleneck (saturated at 100% while the system delivers only 1755 of the required 2000 requests/sec), we upgrade H2. We also upgrade H3, and add a 32-processor machine, H5, on which we deploy another instance of the Web server:

host H5
cpu count 32
cpu buffer 99999
cpu schedP fcfs
cpu speedup 2
end

deploy web H5
set H2:cpu:count 32
set H3:cpu:count 18

For this deployment, PerfCenter estimates the utilizations shown in Table 3. In this configuration, the utilization of every host is below 70%. The configuration supports the specified arrival rate of 2000 requests/sec while keeping processor and disk utilizations fairly uniform across the hosts.

Following the approach discussed in Section 2.4, we can estimate the new thread count values for all the servers, shown in Table 4.

Table 4: Server Thread Sizing

Web-H1   Web-H4   Web-H5   IMAP   SMTP   Auth
45       45       45       135    135    135
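These counts can be sanity-checked with the same utilization-law reasoning as in Section 2.4. Assuming the 2000 requests/sec are balanced evenly across the three identical Web server hosts (an assumption; the example does not state the balancing policy), inverting busy threads = X × S gives the largest per-request thread residency each pool can sustain without saturating:

X_total = 2000.0           # required arrival rate (requests/sec)
web_rate = X_total / 3     # assumed even balancing over Web-H1/H4/H5

print(f"Web pools:  {45 / web_rate * 1000:.1f} ms residency per request")
print(f"Back ends:  {135 / X_total * 1000:.1f} ms residency per request")

Both pool sizes correspond to a sustainable residency of roughly 67.5 ms per request; larger residencies would require proportionally larger pools.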

2.6   Final Deployment

The final recommended deployment and configuration, arrived at using PerfCenter, is shown in Figure 16.

Figure 16: Recommended configuration for scaled up system