Transport Layer: TCP analysis
==================================

** Outline **

- Ideal window size and bandwidth delay product
- Buffer sizing for TCP
- Simple model for TCP throughput
- Understanding TCP fairness
- RED gateways

* Ideally, the window size of the transport protocol should be set to the BDP of the path, where bandwidth = rate of the bottleneck link, and RTT = end-to-end RTT from sending data to receiving the ACK. Understand why. Suppose BDP = N packets. Then, just after you have sent N packets, the ACK for the first packet comes back, enabling you to send the next packet, and so on. So in the ideal case, a window size of exactly N gives you just enough pipelining to fully utilize the bottleneck. With fewer than N packets in flight, the bottleneck may sometimes sit idle, leading to lower throughput. With more than N packets in flight, the extra packets get queued at the bottleneck router: you don't increase throughput, but you do increase per-packet delay. So, in an ideal world, maintain a BDP's worth of window, and no buffer is needed at the bottleneck router.

* Understand the graph of throughput vs. number of packets in flight (increases up to the BDP, then flattens out), and the graph of observed per-packet delay vs. number of packets in flight (flat initially, since no packet sees queueing; increases linearly once the number of packets in flight crosses the BDP, due to queueing delays). A small sketch of both curves follows below.

* In practice, the BDP is hard to measure, so TCP uses a self-learning mechanism: it increases the window size when all is going well, and reduces it when it observes congestion (as indicated by packet loss, for example). This ensures the window hovers around the ideal size, undershooting and overshooting it. A buffer at the bottleneck link is needed to accommodate these fluctuations.

* The BDP is also variable due to cross traffic (other flows in the network). So the bottleneck buffer is also needed to absorb fluctuations caused by other traffic, ensuring the bottleneck stays utilized and packets are not dropped when the available bandwidth changes suddenly.

* The bottleneck buffer size is a very important parameter for TCP performance. What is the ideal bottleneck buffer size? In an ideal world with no cross traffic, it is fine to have no buffer at all (as explained above, a buffer only adds delay with no throughput benefit in the ideal case). However, given that the BDP is unknown and variable, and that TCP uses heuristics to discover it and the ideal window size, how should buffer size be set in the real world?
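* The sketch of the two curves promised above, in Python. The link parameters (10 Mbit/s bottleneck, 100 ms base RTT, 1500-byte packets) are made-up values for illustration, not measurements:

    MSS = 1500 * 8      # packet size in bits
    R = 10e6            # bottleneck rate in bits/sec (assumed)
    D = 0.100           # base RTT in seconds, with no queueing (assumed)

    bdp = R * D / MSS   # bandwidth-delay product in packets (~83 here)

    for w in range(10, 251, 20):
        # With w packets in flight, the window can deliver at most w
        # packets per RTT, and the bottleneck serves at most R bits/sec.
        throughput = min(w * MSS / D, R)
        # Packets beyond the BDP sit in the bottleneck queue; each queued
        # packet ahead of you adds MSS/R seconds of queueing delay.
        rtt = D + max(0, w - bdp) * MSS / R
        print(f"w={w:3d}  throughput={throughput / 1e6:5.2f} Mbit/s  rtt={rtt * 1000:6.1f} ms")

The printout shows the knee at w = BDP: throughput grows linearly up to it and then flattens at R, while the RTT stays flat at D up to it and then grows linearly.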
* Consider a very simple model of TCP. The TCP sender is in congestion-avoidance steady state: it increases its window up to W packets, a packet loss happens, and it reduces the window to W/2. Let's think in terms of packets for ease of analysis.

* Packet loss happens when the window reaches W packets. At this point, the bottleneck buffer and the pipe have both filled up, so the bottleneck router drops a packet. Let the size of the bottleneck buffer be B packets, the rate of the bottleneck link be R, and the round-trip delay on the path (excluding queueing) be D. Therefore, W = R*D + B.

* Now, the sender's window reduces to W/2. Until the sender receives at least W/2 ACKs, it cannot send the next packet. During this time that the sender is slowed down, the bottleneck buffer should hold enough packets to sustain the link. Since the sender receives ACKs at the bottleneck rate (ACK clocking), the time for the sender to receive W/2 ACKs = the time for the bottleneck link to send W/2 packets. That is, the buffer should be able to feed the bottleneck link at least W/2 packets before new packets start arriving. Therefore, B should be at least W/2. From these two points, we conclude W/2 = R*D, i.e., W = 2*R*D, and hence B = R*D.

* If the buffer is set to this value of R*D, it empties exactly when the sender has received W/2 ACKs and starts sending new data. So the buffer occupancy goes from R*D down to 0, and starts building up again as the sender ramps up. When the queue is empty, queueing delay = 0; when the queue is full, queueing delay = B/R = D. That is, the RTT of a TCP segment ranges from D to 2D due to queueing, so the average RTT = 1.5*D.

* If the buffer is too large, it adds extra queueing delay. This is called the buffer bloat problem; it exists in the Internet today to some extent, especially in wireless networks, home cable networks, etc. If the buffer is too small, it leads to buffer underflow: the buffer cannot keep the bottleneck link busy when TCP slows down. Buffer underflow is a potential cause of low TCP throughput.

* When multiple flows share a link of rate R, their throughputs add up to R, so their BDPs add up to R*D (assuming equal delays D). So the conventional wisdom was that links should be provisioned with bandwidth * average-delay worth of buffering. However, analysis ["Sizing Router Buffers" reference] shows that smaller buffers are enough (on the order of R*D/sqrt(n) for n flows), because the peaks of the flows won't be synchronized.

* Average window size = 3/4 * W, so average throughput = 3/4 * W * MSS / RTT. Clearly, the throughput TCP achieves has an inverse relationship with the RTT. This is called the RTT unfairness of TCP: if two flows share a bottleneck link, the flow with the higher RTT gets a smaller share of it.

* What is the relationship between TCP throughput and loss rate? Consider the sawtooth diagram, and one cycle where the window goes from W/2 to W. Since the window increases by 1 segment every RTT, it takes W/2 RTTs to grow by W/2 segments. The number of packets transmitted in one cycle = the area under one "tooth" = (W/2)^2 + 1/2 * (W/2)^2 = 3/8 * W^2. The expected number of packets in one cycle is also 1/p, where p is the probability of packet loss. Equating the two gives W = sqrt(8/(3p)). Substituting into the throughput formula above: throughput = sqrt(3/2) * MSS / (RTT * sqrt(p)).
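* A quick numeric check of this formula, with illustrative values plugged in (MSS = 1500 bytes and RTT = 100 ms are assumptions, not measurements):

    from math import sqrt

    MSS = 1500 * 8    # bits
    RTT = 0.100       # seconds
    for p in (0.0001, 0.001, 0.01):
        # throughput = sqrt(3/2) * MSS / (RTT * sqrt(p))
        bw = sqrt(3 / 2) * MSS / (RTT * sqrt(p))
        print(f"p = {p}: throughput = {bw / 1e6:.2f} Mbit/s")

This prints roughly 14.7, 4.6, and 1.5 Mbit/s: a 100x increase in loss rate costs only a factor of 10 in throughput, because of the square root.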
* Now, let's discuss why the AIMD rule makes sense. The following is an intuitive explanation of why the AIMD algorithm ensures both efficiency and fairness. Consider two users sharing a link of capacity C. Let (x, y) be their throughputs; we can represent their achieved rates as the point (x, y) on a graph. For both flows to utilize the link capacity fully, we want x + y = C. For fairness, we want x = y. We want the congestion control algorithm to converge to the intersection of these two lines. We can show via graphical reasoning that Additive Increase Multiplicative Decrease (AIMD) converges to this ideal point, no matter what values of x and y we start with: additive increase moves the point parallel to the x = y line (preserving the gap between the flows), while multiplicative decrease moves it towards the origin (shrinking the gap). A small simulation sketch of this argument is included after the reading list below.

* Is dropping packets the correct congestion signal? Should we wait until a packet drop to learn about congestion? Another idea: Random Early Detection (RED) gateways. A router implementing RED notices when the queue length is building up (meaning congestion is about to happen), and randomly drops packets so that TCP adjusts itself early. Alternatively, it can "mark" packets; the TCP sender should then look for this Explicit Congestion Notification (ECN) bit and react to it like a packet loss.

* RED should be careful to mark/drop only in response to persistent congestion, not transient congestion, so it should use an average queue size. It should not be biased against bursty traffic, and should not cause synchronization of all flows; hence RED drops/marks packets randomly.

* How does RED work? RED uses two thresholds on the queue size. If the average queue length is below the lower threshold, do nothing. If it is between the lower and higher thresholds, drop/mark with a probability that increases linearly with the queue length. If it is above the higher threshold, always drop. Setting these thresholds well is a tricky problem. (A toy sketch of this decision is included after the reading list below.)

* Further reading: the following references cover most of the analysis described above.
- "Sizing Router Buffers", Appenzeller et al. Describes the reasoning behind the BDP rule for buffer sizing, and how buffers can be smaller in practice.
- "Bufferbloat: Dark Buffers in the Internet", Gettys and Nichols. Describes the buffer bloat problem caused by excessive buffering.
- "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm", Mathis et al. Describes the TCP throughput model.
- "Random Early Detection Gateways for Congestion Avoidance", Floyd and Jacobson.
- "Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks", Chiu and Jain. Discusses the fairness of AIMD. The graphical explanation is particularly interesting.
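* The AIMD simulation sketch promised above: two flows share a link of capacity C; in each round both increase additively, and when their sum exceeds C both decrease multiplicatively. All values (C, the increase step, the decrease factor, the starting rates) are arbitrary choices for illustration:

    C, ALPHA, BETA = 100.0, 1.0, 0.5

    def aimd(x, y, rounds=200):
        for _ in range(rounds):
            if x + y > C:
                # Congestion: multiplicative decrease shrinks the gap x - y.
                x, y = x * BETA, y * BETA
            else:
                # No congestion: additive increase preserves the gap x - y.
                x, y = x + ALPHA, y + ALPHA
        return x, y

    print(aimd(90.0, 5.0))     # very unfair starting rates
    print(aimd(10.0, 60.0))    # another unfair start

Both starting points end with x ≈ y oscillating around the fair share, which is exactly the convergence the graphical argument predicts.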
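* The toy RED sketch promised above. The thresholds, maximum drop probability, and averaging weight are illustrative values; a real RED implementation (see Floyd and Jacobson) also spreads drops out with a count-based correction between drops, omitted here:

    import random

    MIN_TH, MAX_TH, MAX_P = 20.0, 60.0, 0.1   # illustrative settings
    W_Q = 0.002                               # EWMA weight for averaging

    def update_avg(avg, queue_len, w_q=W_Q):
        # Average the instantaneous queue length so RED reacts to
        # persistent congestion rather than transient bursts.
        return (1 - w_q) * avg + w_q * queue_len

    def red_drop(avg_queue_len):
        if avg_queue_len < MIN_TH:
            return False      # light load: never drop/mark
        if avg_queue_len >= MAX_TH:
            return True       # heavy load: always drop/mark
        # In between: probability rises linearly from 0 to MAX_P.
        p = MAX_P * (avg_queue_len - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p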