Example 4: Timeouts and Retries

In the previous tutorials, we saw how to model a simple two-tier application as a closed or an open system. In this tutorial, we will use the request timeout and user retry features of PerfCenter.

Often in software servers, such as (but not limited to) web servers, requests from the user end time out. These timeouts occur for technical reasons, such as TCP connection timeouts, or because of human behavior, such as the user simply getting impatient. An impatient user may issue a fresh request with a certain probability, or may leave the system without issuing any further request. Users also tend to re-issue a request only up to a certain maximum number of times; once they have done the maximum number of retries, they give up on the system and leave.

When the user issues a new request while the old one is still being processed by the system, there is no need to process the old request, since its response will never be read by anyone on the client side. Unfortunately, the fact that the user has in fact issued a fresh request is not visible from within the system. The application has no way to differentiate between normal good requests and such timed-out bad requests, so it wastes time processing the bad requests, which are nevertheless counted towards throughput. Since the requests are now of two types, bad and good, the throughput can also be divided into two parts: badput and goodput. Intuitively, badput is the average number of timed-out "bad" requests processed by the system per unit time, and goodput is the average number of normal "good" requests processed per unit time.
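For instance, with purely illustrative numbers: if the system completes 100 requests per second, of which 15 had already timed out at the client, then the throughput is 100 requests/s, the badput is 15 requests/s, and the goodput is the remaining 85 requests/s.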

This whole phenomenon can be modeled in PerfCenter through its timeout and retry facility. These parameters are part of the load characteristics, and are therefore specified in the loadparams block. The following block specifies a timeout value of 900 ms (0.9 s), a user retry probability of 0.4, and a maximum of 3 retries.

loadparam
timeout 0.9
retryprob 0.4
maxretry 3
end

The timeout value specifies the time (in seconds) after which a user request times out. Generally speaking, this should be greater than twice the average response time of the system under the given load conditions; for example, if the average response time is around 0.3 s, the 0.9 s timeout above satisfies this guideline. The retryprob parameter specifies the probability with which the customer issues a new request once the current request has timed out, and maxretry specifies the maximum number of times the customer re-issues a timed-out request.

The function bput() gives the badput. It accepts either a scenario name as argument, or no argument: the no-argument form gives the badput of the whole system, while the one-argument form gives the badput of the scenario passed to it. The function gput() works the same way as bput(), except that it gives the goodput of the scenario or of the whole system, depending on the argument.
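As a sketch of how these metrics might be requested, assuming that PerfCenter's print(...) output statement is available and that a scenario named scenario1 (a placeholder name) is defined elsewhere in the model; the exact output syntax may differ in your version of PerfCenter:

print(gput())
print(bput())
print(gput(scenario1))
print(bput(scenario1))

The first two statements report the system-wide goodput and badput; the last two restrict the metrics to scenario1.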

There are two more functions that give valuable insight into the software behavior. Once the system buffers are full, the system cannot accept any more requests, and further incoming requests are dropped silently. This is the drop rate of the system, and its average value is given by the droprate() function, which takes a scenario name as argument or no argument. Another metric that can be calculated is the fraction of requests that time out while still in the buffer. These requests join the queue when they arrive in the system, but they never get to see actual service, because the requests that arrived before them are still being serviced; they end up timing out in the buffer. A high value here is a clear indication that a longer buffer is needed. This fraction of requests timed out in the buffer is given by the bufftimeout() function, which also accepts a scenario name as argument or no argument.
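Continuing the same hedged sketch as above (again assuming the print(...) output statement and the placeholder scenario name scenario1):

print(droprate())
print(bufftimeout())
print(droprate(scenario1))
print(bufftimeout(scenario1))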