Example 1: Modeling a Simple Two-Tier Web Application

The first example is aimed at acquainting you with the basic terminology of PerfCenter. To do that, we consider a simple two-tier web application and build a PerfCenter input file for it.

1   Webcalendar

Figure 1: The Webcalendar system

The networked application considered in this example is a web-calendar system (see Figure 1). The application has two servers, web and database. The two servers are deployed on separate, but identical, physical machines termed hosts. Both hosts are kept in the same datacenter.

Webcalendar has various usage scenarios such as login, view, add, edit, and delete. As with any networked multi-tier application, scenarios are fulfilled by the web server and database server carrying out tasks while communicating with each other.

Figure 2: Successful Login

Let us first consider the successful login scenario. Figure 2 shows the message sequence chart (MSC) corresponding to this scenario. In the MSC there is one vertical line for each server in the system, and time flows from top to bottom. A rectangle shows processing being done locally by a server, and horizontal arrows show message flow.

As the figure shows, when a user clicks the login button, the username and password are sent to the Web server. We assume here that the Web server itself verifies the login credentials. We call this processing a “task” (verify_credentials in this case). If the credentials are valid, the Web server calls the database server, which sends the details of the user’s default calendar view back to the Web server (send_calendar). The Web server then does some processing (prepare_calendar_page) to prepare the page that it sends back through the network to the client browser.

The time required for such a login request to complete includes the waiting time of the HTTP request at the web server, the processing time at the web server, and the waiting and processing times at the database server. It also includes queueing delays experienced by the server threads at the hosts’ devices, such as the CPU, disk, and network. PerfCenter accounts for all of these delays in its estimation of end-to-end scenario response times.

With this background, we now show how the software and hardware details are to be specified in PerfCenter.

1.1   Hardware Architecture

PerfCenter offers a host and a device abstraction to the modeler. Thus a data center is made up of hosts, and hosts are made up of devices. There is no pre-conceived notion of devices: the user declares devices in a device block, and then uses them inside a host block.

device
intel_xeon_cpu
raid_disk
disk_array
end

The devices are then used in a host, along with additional configuration details such as the number of instances of the device, the buffer size available for requests waiting to use the device, and the scheduling policy. We can also specify the amount of RAM that the host machine has.

host server_host[2]
intel_xeon_cpu count 1
intel_xeon_cpu buffer 99999
intel_xeon_cpu schedP fcfs
raid_disk count 1
raid_disk buffer 99999
raid_disk schedP fcfs
ram 1000000000
end

The number in the square brackets (“[2]”) declares the number of hosts of this configuration that are available in the data center.

1.2   Software Architecture

To describe an application’s software architecture in PerfCenter, we must know the various software servers involved, and message sequence charts must have been drawn for all relevant usage scenarios. In PerfCenter terminology, a “software server” is the server application itself, as distinguished from the physical machine that runs it, which we term a “host”. Thus a Web server is a software server that runs as a server process on a specific physical host.

Once the MSCs are drawn, we can list the various “tasks” that each server does.

Figure 3: Display Calendar Event

Figure 4: Add Event

Figure 5: Delete Event

For example, Figures 2 to 5 show the MSCs for all the scenarios on which we choose to focus for Webcalendar. The MSCs yield the following set of tasks for the Web server:

  1. verify_credentials
  2. prepare_calendar_page
  3. verify_session
  4. display_success_message
  5. display_error_message

The following set of tasks is identified for the database server:

  1. send_calendar
  2. send_calendar_event
  3. save_event
  4. delete_event

Apart from the tasks that it performs, a server is also described by the following attributes:

  1. Maximum number of threads configured
  2. Maximum number of requests that can be waiting for a thread to be free
  3. Scheduling policy by which the next request is picked by a thread
  4. The static footprint of the server in RAM
  5. Memory footprint of a request that is serviced by the server

Suppose the web server has a static footprint of 256 MB, with each request occupying 50 KB, and the db server has a static footprint of 384 MB, with each request occupying 60 KB. This information is specified as follows in PerfCenter:

server web
thread count 150
thread buffer 0
thread schedP fcfs
task verify_credentials
task prepare_calendar_page
task verify_session
task display_success_message
task display_error_message
staticsize 256000000
requestsize 50000
end

server db
thread count 150
thread buffer 0
thread schedP fcfs
task send_calendar
task send_calendar_event
task delete_event
task save_event
staticsize 384000000
requestsize 60000
end

Now we can specify the task details. To predict metrics such as response times and utilizations, PerfCenter requires the service time of each task on each hardware device. E.g., if the task verify_session requires 10 ms of processing time on the CPU of the machine on which the Web server runs, we specify this detail as follows:

task verify_session
intel_xeon_cpu servt 0.010
end

We can also specify how much service time a task requires on other devices. E.g., if the verify_credentials task writes data to a disk and needs 40 ms to do this writing, we specify this as follows:

task verify_credentials
intel_xeon_cpu servt 0.020
raid_disk servt 0.040
end
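
The remaining tasks are specified in the same manner. The following is only a sketch for two more tasks; the service-time values here are illustrative assumptions, not measurements from the actual input file:

task send_calendar
intel_xeon_cpu servt 0.015
raid_disk servt 0.030
end

task prepare_calendar_page
intel_xeon_cpu servt 0.025
end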

2   Scenario Specification

A scenario is captured with a scenario block, in which the MSC is specified in text form. Roughly speaking, the format interprets the MSC as a directed graph whose vertices are the tasks, with an edge drawn from each calling task to the task it calls. The following block specifies the always-successful login scenario:

scenario Successful_Login prob 0.02
    user verify_credentials 100 SYNC
    verify_credentials send_calendar 100 SYNC
    send_calendar prepare_calendar_page 30000
end

The keyword user represents the source of the first call. The number next to an edge declaration (which is a call) denotes the number of bytes sent in that call. The keyword “SYNC” indicates that the call is synchronous: the caller thread waits (idly) for a response from the called server before moving on to the next task.

Scenarios can have probabilistic branching. This is useful for capturing random events such as the acceptance or rejection of login credentials. Thus a login scenario which allows for login failure is specified as follows:

scenario Login prob 0.05
    user verify_credentials 100 SYNC
    branch prob 0.05
        verify_credentials display_error_message 100
    end
    branch prob 0.95
        verify_credentials send_calendar 100 SYNC
        send_calendar prepare_calendar_page 30000
    end
end

The scenario mix is specified by the scenario probability (next to the scenario name). This is the probability that an arriving request belongs to that scenario; the probabilities of all the scenarios in the input file must therefore sum to 1.
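
For example, the display-calendar-event scenario of Figure 3 could be specified alongside the Login scenario above, so that the two probabilities sum to 1. The following is only a sketch: the task sequence, message sizes, and probability value are illustrative assumptions, not values from the actual input file.

scenario Display_Event prob 0.95
    user verify_session 100 SYNC
    verify_session send_calendar_event 200 SYNC
    send_calendar_event prepare_calendar_page 20000
end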

3   Server Deployment Architecture

Once the hardware and software details have been specified separately, we must tell PerfCenter how the servers are deployed on the host machines. This is done using the deploy statement as follows:

deploy web server_host[1]
deploy db server_host[2]

More than one software server can also be deployed on the same host.
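
For example, to co-locate both servers on the first host (the configuration we evaluate at the end of this example), we would write:

deploy web server_host[1]
deploy db server_host[1]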

4   Network Architecture

PerfCenter has a simple model for capturing network delays. We can declare LANs, and declare WAN links between LANs. Links are characterized by transmission rate, maximum transmission unit (MTU, the maximum packet size), and propagation delay. These are declared as follows:

lan
    lan1
    lan2
end

link link_name lan1 lan2
trans 100 Mbps
mtu 1495 bytes
prop 3 us
headersize 40 bytes
end

Once the network architecture is declared, we describe which hosts are on which LAN. In this example, both hosts are in the same datacenter, so we place them on the same LAN, lan1, and the WAN link is not exercised:

deploy server_host[1] lan1
deploy server_host[2] lan1

Note that PerfCenter does not model network contention on a LAN. Thus, if two hosts are on the same LAN, no network delays are predicted; network delays are simulated only when hosts are separated by WAN links.
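
For example, if the database host were instead placed at a remote site on lan2, traffic between the two servers would traverse the WAN link declared above, and its delays would be simulated:

deploy server_host[1] lan1
deploy server_host[2] lan2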

5   Load Parameters

Finally, we must specify the load parameters at which we want to predict the performance of this system. Load on the system can be specified in two ways. If the system is being modeled with open arrivals, an arrival rate value indicates the load. E.g.:

loadparams
arate 20
end

Otherwise, if the system is being modeled with closed arrivals, the number of users is specified using the keyword noofusers, and the average time between a user receiving a response and sending the next request is specified using the keyword thinktime. The keyword exp indicates that the think time is exponentially distributed, with the number in parentheses denoting the mean of this distribution.

loadparams
noofusers 20
thinktime exp(2)
end

6   Model Parameters

Each simulation or analytical modeling method has a set of parameters that determine how the model itself behaves. For detailed information on all these parameters, refer to the reference manual.

modelparams
method simulation
type open
noofrequests 10000
end

As the keywords suggest, the above block specifies that the modeling method is “simulation”, the arrivals are “open”, and the number of requests to be generated for determining the metrics is 10000.
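
The closed-arrival runs in the next section would change the type accordingly. The following is a sketch, assuming that the type keyword also accepts the value closed:

modelparams
method simulation
type closed
noofrequests 10000
end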

7   Output Script

Each PerfCenter script can have print statements. Standard performance metrics of queueing systems, such as response time, throughput, and utilization, can be printed for the whole system, for scenarios, for servers, or for devices. Here, we just print the basic throughput and response time of the whole Webcalendar system. To vary the user load on the system from 5 to 95, we declare a variable named nusr and run the simulation over this range, with a step increase of 5 users per iteration.

loadparams
noofusers nusr
thinktime exp(2)
end

print "Users tput respt"
while(nusr<100)
print nusr+" "+tput()+" "+respt()
nusr=nusr+5
end

The entire input file for the above example is available here. When this is run as

$> ./PerfCenter input_example_1.perfcenter

the following output is produced:

8   Output

Input file input_example_1.perfcenter
Started at 2013-05-17 22:10:03
Method:Simulation
Users,respt,tput,util(server_host[1]:intel_xeon_cpu),util(server_host[2]:intel_xeon_cpu)
5.0,0.022581175485908076,0.11735637231079098,0.025029936441565672,0.020686034200702284
10.0,0.02272896494412552,0.22570817600342558,0.04472194853149975,0.03606095810490662
15.0,0.022487368448079574,0.321303834813409,0.06126913478926138,0.04936618032844422
20.0,0.02538911485390682,0.390278232139879,0.07630507707219691,0.06004891434677898
25.0,0.024986574796469808,0.45944890510605846,0.0872124970627657,0.06729065895806925
30.0,0.026641714521422676,0.5403470636219021,0.09771693577258239,0.07630081720285668
35.0,0.02877589498456244,0.6254634315145378,0.10656791955268094,0.0823934831092261
40.0,0.02878210145629887,0.7198412748328787,0.11544599526292654,0.08956691033324857
45.0,0.029019602750791698,0.8076035146668954,0.12247947120230626,0.09621975787926634
50.0,0.030950626622235234,0.8984730573339728,0.12796207713458882,0.1049721230550668
55.0,0.03812285129824021,0.9509061817752228,0.13331367155156165,0.11083717610096372
60.0,0.032805421466158516,0.9077479006788731,0.13720421525775955,0.11448412665099689
65.0,0.04454218277264946,0.9611699540550014,0.14153190177959976,0.1207920452455721
70.0,0.061860126002421106,0.9243967972043229,0.14416084979928823,0.13125483506958716
75.0,0.05450260823944052,0.9933893667677629,0.14910274543992827,0.13463460074582578
80.0,0.07608143334386598,0.9749274142395621,0.14981865201861905,0.14042617056117065
85.0,0.061564759729020375,0.9867724329737934,0.15153071076140415,0.14557179901187653
90.0,0.14127484914369498,1.0015694437576357,0.15464017310746073,0.15116351777026285
95.0,0.1387350628347294,0.9974741734523689,0.15678296361551355,0.15414895979148183

This model makes it easy to play with various “what-if” scenarios of deployment and configuration. E.g., let us see what happens if both software servers are deployed on one host (as sketched in Section 3). Here is the output:

With both servers on the same host:
Users,respt,tput,util(server_host[1]:intel_xeon_cpu)
5.0,0.019322623979013992,0.10797411307311244,0.04517829617058776
10.0,0.020773942377372862,0.19529898461327982,0.08076946597735711
15.0,0.022354204090509512,0.25304005654379225,0.11087360538001063
20.0,0.024962494441261644,0.3344248199058927,0.13291005386389534
25.0,0.026548411090889575,0.41460188212670457,0.1564875173655372
30.0,0.030919329838513643,0.47932395673730355,0.17413483366294655
35.0,0.03372012716005911,0.5128809815055085,0.19013010820707765
40.0,0.030334692073826646,0.6379842798121199,0.204855589053652
45.0,0.03767609678304071,0.6903692159766088,0.21696733437343504
50.0,0.0482591101846096,0.7432090196064101,0.231252193310386
55.0,0.05412046388419441,0.8455126316472297,0.2445190154573196
60.0,0.04947348312729072,0.8644173925692559,0.2556003177737833
65.0,0.1065512694637281,0.8890083311038941,0.267118371072928
70.0,0.12864680788906002,0.952096965698719,0.2741219575700587
75.0,0.08444823380181458,1.0536868107071202,0.28624231536651673
80.0,0.15133100816462866,0.9899099668157659,0.2934815858382934
85.0,0.14139234759672906,1.0852894553190962,0.30115921831838033
90.0,0.16167300235412568,0.9764669094345559,0.30633364688492687
95.0,0.16685385782948373,1.0407200122213707,0.3145909625850369
Completed at 2013-05-17 22:10:58

We can see that the utilization of server_host[1]’s CPU is now approximately the sum of the two hosts’ utilizations in the earlier scenario (at 40 users, 0.115 + 0.090 ≈ 0.205). Response time increases, however, are harder to predict, and PerfCenter does that for you: the response time at 40 users increases by about 5%, while at 95 users it increases by about 20%.