Transport Layer: TCP analysis
==================================

** Outline **

- Ideal window size and bandwidth delay product
- Buffer sizing for TCP
- Simple model for TCP throughput
- Understanding TCP fairness
- RED gateways

* Ideally, the window size of the transport protocol should be set to the BDP of the path, where bandwidth = rate of the bottleneck link, and RTT = end-to-end RTT from sending data to receiving the ACK. Understand why. Suppose BDP = N packets. Then, just after you have sent N packets, the ACK for the first packet comes back, enabling you to send the next packet, and so on. So in the ideal case, a window size of exactly N gives you just enough pipelining to fully utilize the bottleneck. With fewer than N packets in flight, the bottleneck may sometimes sit idle, leading to lower throughput. With more than N packets in flight, the extra packets get queued at the bottleneck router: you don't increase throughput, but you do increase per-packet delay. So, in an ideal world, maintain a BDP's worth of window, and no buffer is needed at the bottleneck router.

* Understand the graph of throughput vs. number of packets in flight (increases up to the BDP, then flattens out), and the graph of observed per-packet delay vs. number of packets in flight (flat initially, since no packet sees queueing; increases linearly once the number of packets in flight crosses the BDP, due to queueing delays). A small sketch of both curves follows below.

* In practice, the BDP is hard to measure, so TCP uses a self-learning mechanism: it increases the window size when all is going well, and reduces it when it observes congestion (as indicated by packet loss, for example). This ensures the window hovers around the ideal size, undershooting and overshooting it. A buffer at the bottleneck link is needed to accommodate these fluctuations.

* The BDP is also variable due to cross traffic (other flows in the network). So the bottleneck buffer is also needed to absorb fluctuations caused by other traffic, ensuring the bottleneck stays utilized and packets are not dropped when the available bandwidth changes suddenly.

* The bottleneck buffer size is a very important parameter for TCP performance. What is the ideal bottleneck buffer size? In an ideal world with no cross traffic, it is fine to have no buffer at all (as explained above, a buffer only adds delay with no throughput benefit in the ideal case). However, given that the BDP is unknown and variable, and that TCP uses heuristics to discover it and the ideal window size, how should buffer size be set in the real world?
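* The sketch of the two curves promised above, in Python. The link parameters (10 Mbit/s bottleneck, 100 ms base RTT, 1500-byte packets) are made-up values for illustration, not measurements:

    MSS = 1500 * 8      # packet size in bits
    R = 10e6            # bottleneck rate in bits/sec (assumed)
    D = 0.100           # base RTT in seconds, with no queueing (assumed)

    bdp = R * D / MSS   # bandwidth-delay product in packets (~83 here)

    for w in range(10, 251, 20):
        # With w packets in flight, the window can deliver at most w
        # packets per RTT, and the bottleneck serves at most R bits/sec.
        throughput = min(w * MSS / D, R)
        # Packets beyond the BDP sit in the bottleneck queue; each queued
        # packet ahead of you adds MSS/R seconds of queueing delay.
        rtt = D + max(0, w - bdp) * MSS / R
        print(f"w={w:3d}  throughput={throughput / 1e6:5.2f} Mbit/s  rtt={rtt * 1000:6.1f} ms")

The printout shows the knee at w = BDP: throughput grows linearly up to it and then flattens at R, while the RTT stays flat at D up to it and then grows linearly.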
* Consider a very simple model of TCP. The TCP sender is in congestion-avoidance steady state: it increases its window up to W packets, a packet loss happens, and it reduces the window to W/2. Let's think in terms of packets for ease of analysis.

* Packet loss happens when the window reaches W packets. At this point, the bottleneck buffer and the pipe have both filled up, so the bottleneck router drops a packet. Let the size of the bottleneck buffer be B packets, the rate of the bottleneck link be R, and the round-trip delay on the path (excluding queueing) be D. Therefore, W = R*D + B.

* Now, the sender's window reduces to W/2. Until the sender receives at least W/2 ACKs, it cannot send the next packet. During this time that the sender is slowed down, the bottleneck buffer should hold enough packets to sustain the link. Since the sender receives ACKs at the bottleneck rate (ACK clocking), the time for the sender to receive W/2 ACKs = the time for the bottleneck link to send W/2 packets. That is, the buffer should be able to feed the bottleneck link at least W/2 packets before new packets start arriving. Therefore, B should be at least W/2. From these two points, we conclude W/2 = R*D, i.e., W = 2*R*D, and hence B = R*D.

* If the buffer is set to this value of R*D, it empties exactly when the sender has received W/2 ACKs and starts sending new data. So the buffer occupancy goes from R*D down to 0, and starts building up again as the sender ramps up. When the queue is empty, queueing delay = 0; when the queue is full, queueing delay = B/R = D. That is, the RTT of a TCP segment ranges from D to 2D due to queueing, so the average RTT = 1.5*D.

* If the buffer is too large, it adds extra queueing delay. This is called the buffer bloat problem; it exists in the Internet today to some extent, especially in wireless networks, home cable networks, etc. If the buffer is too small, it leads to buffer underflow: the buffer cannot keep the bottleneck link busy when TCP slows down. Buffer underflow is a potential cause of low TCP throughput.

* When multiple flows share a link of rate R, their throughputs add up to R, so their BDPs add up to R*D (assuming equal delays D). So the conventional wisdom was that links should be provisioned with bandwidth * average-delay worth of buffering. However, analysis ["Sizing Router Buffers" reference] shows that smaller buffers are enough (on the order of R*D/sqrt(n) for n flows), because the peaks of the flows won't be synchronized.

* Average window size = 3/4 * W, so average throughput = 3/4 * W * MSS / RTT. Clearly, the throughput TCP achieves has an inverse relationship with the RTT. This is called the RTT unfairness of TCP: if two flows share a bottleneck link, the flow with the higher RTT gets a smaller share of it.

* What is the relationship between TCP throughput and loss rate? Consider the sawtooth diagram, and one cycle where the window goes from W/2 to W. Since the window increases by 1 segment every RTT, it takes W/2 RTTs to grow by W/2 segments. The number of packets transmitted in one cycle = the area under one "tooth" = (W/2)^2 + 1/2 * (W/2)^2 = 3/8 * W^2. The expected number of packets in one cycle is also 1/p, where p is the probability of packet loss. Equating the two gives W = sqrt(8/(3p)). Substituting into the throughput formula above: throughput = sqrt(3/2) * MSS / (RTT * sqrt(p)).
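* A quick numeric check of this formula, with illustrative values plugged in (MSS = 1500 bytes and RTT = 100 ms are assumptions, not measurements):

    from math import sqrt

    MSS = 1500 * 8    # bits
    RTT = 0.100       # seconds
    for p in (0.0001, 0.001, 0.01):
        # throughput = sqrt(3/2) * MSS / (RTT * sqrt(p))
        bw = sqrt(3 / 2) * MSS / (RTT * sqrt(p))
        print(f"p = {p}: throughput = {bw / 1e6:.2f} Mbit/s")

This prints roughly 14.7, 4.6, and 1.5 Mbit/s: a 100x increase in loss rate costs only a factor of 10 in throughput, because of the square root.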
* Now, let's discuss why the AIMD rule makes sense. The following is an intuitive explanation of why the AIMD algorithm ensures both efficiency and fairness. Consider two users sharing a link of capacity C. Let (x, y) be their throughputs; we can represent their achieved rates as the point (x, y) on a graph. For both flows to utilize the link capacity fully, we want x + y = C. For fairness, we want x = y. We want the congestion control algorithm to converge to the intersection of these two lines. We can show via graphical reasoning that Additive Increase Multiplicative Decrease (AIMD) converges to this ideal point, no matter what values of x and y we start with: additive increase moves the point parallel to the x = y line (preserving the gap between the flows), while multiplicative decrease moves it towards the origin (shrinking the gap). A small simulation sketch of this argument is included after the reading list below.

* Is dropping packets the correct congestion signal? Should we wait until a packet drop to learn about congestion? Another idea: Random Early Detection (RED) gateways. A router implementing RED notices when the queue length is building up (meaning congestion is about to happen), and randomly drops packets so that TCP adjusts itself early. Alternatively, it can "mark" packets; the TCP sender should then look for this Explicit Congestion Notification (ECN) bit and react to it like a packet loss.

* RED should be careful to mark/drop only in response to persistent congestion, not transient congestion, so it should use an average queue size. It should not be biased against bursty traffic, and should not cause synchronization of all flows; hence RED drops/marks packets randomly.

* How does RED work? RED uses two thresholds on the queue size. If the average queue length is below the lower threshold, do nothing. If it is between the lower and higher thresholds, drop/mark with a probability that increases linearly with the queue length. If it is above the higher threshold, always drop. Setting these thresholds well is a tricky problem. (A toy sketch of this decision is included after the reading list below.)

* Further reading: the following references cover most of the analysis described above.
- "Sizing Router Buffers", Appenzeller et al. Describes the reasoning behind the BDP rule for buffer sizing, and how buffers can be smaller in practice.
- "Bufferbloat: Dark Buffers in the Internet", Gettys and Nichols. Describes the buffer bloat problem caused by excessive buffering.
- "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", Mathis et al. Describes the TCP throughput model.
- "Random Early Detection Gateways for Congestion Avoidance", Floyd and Jacobson.
- "Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks", Chiu and Jain. Discusses the fairness of AIMD. The graphical explanation is particularly interesting.
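* The AIMD simulation sketch promised above: two flows share a link of capacity C; in each round both increase additively, and when their sum exceeds C both decrease multiplicatively. All values (C, the increase step, the decrease factor, the starting rates) are arbitrary choices for illustration:

    C, ALPHA, BETA = 100.0, 1.0, 0.5

    def aimd(x, y, rounds=200):
        for _ in range(rounds):
            if x + y > C:
                # Congestion: multiplicative decrease shrinks the gap x - y.
                x, y = x * BETA, y * BETA
            else:
                # No congestion: additive increase preserves the gap x - y.
                x, y = x + ALPHA, y + ALPHA
        return x, y

    print(aimd(90.0, 5.0))     # very unfair starting rates
    print(aimd(10.0, 60.0))    # another unfair start

Both starting points end with x ≈ y oscillating around the fair share, which is exactly the convergence the graphical argument predicts.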
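* The toy RED sketch promised above. The thresholds, maximum drop probability, and averaging weight are illustrative values; a real RED implementation (see Floyd and Jacobson) also spreads drops out with a count-based correction between drops, omitted here:

    import random

    MIN_TH, MAX_TH, MAX_P = 20.0, 60.0, 0.1   # illustrative settings
    W_Q = 0.002                               # EWMA weight for averaging

    def update_avg(avg, queue_len, w_q=W_Q):
        # Average the instantaneous queue length so RED reacts to
        # persistent congestion rather than transient bursts.
        return (1 - w_q) * avg + w_q * queue_len

    def red_drop(avg_queue_len):
        if avg_queue_len < MIN_TH:
            return False      # light load: never drop/mark
        if avg_queue_len >= MAX_TH:
            return True       # heavy load: always drop/mark
        # In between: probability rises linearly from 0 to MAX_P.
        p = MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p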