Socket Programming and Server Architectures
===========================================

*** Outline ***
- Socket programming API
- Techniques for concurrency (I/O multiplexing) at the server
- High performance servers

-------------------------------------------------------------------------

* The application layer interfaces with the lower layers via the socket interface. For example, a web server opens a socket, a browser opens a socket, and both sockets are linked to provide a data-pipe abstraction. That is, whatever data is pushed in at one end comes out the other, and vice versa. All the HTTP request and response messages seen in the last lecture are sent and received through sockets on browsers and web servers. In this lecture, we will examine the socket interface in more detail. In PA1, you will build a simple client-server application using sockets.

* Any user-space process can invoke the socket system call to create a socket. Sockets belong to two families: Unix domain sockets (for process-to-process communication on the same machine) and Internet sockets (what we will use). Internet sockets are further of three main types: stream sockets (which use TCP to send the message given to the socket), datagram sockets (which use UDP), and raw sockets (which send the given packet as-is, without any additional processing). We will focus on TCP sockets in this lecture, and in the programming assignment.

* Every socket has a socket address: an IP address and a port number. The port number is a 16-bit number used to distinguish sockets on the same machine. Servers open sockets on well-known ports, so that clients know how to contact them. For example, web servers listen on port 80; FTP uses 20/21 for data/control, SSH 22, Telnet 23, SMTP (email) 25, etc. Ports 0-1023 are reserved for these well-known services.

* Inside a process/application, open sockets are referenced by a number called the socket file descriptor (much like a regular file descriptor). Whenever you refer to a socket in a system call, you must quote this number. The socket file descriptor/handle is obtained when a process opens a socket using the "socket" system call.

* After a server creates a socket (by specifying its type, family, etc.), it "binds" the socket to a particular well-known IP address and port number. After the server binds to a particular port, it issues the "listen" system call to tell the lower layers to start listening for incoming requests. To pick up an incoming connection request, the server makes an "accept" system call; accept blocks until a request arrives.

* Clients create sockets too, but don't need to pick an IP address or port number. Client sockets are automatically assigned a random unused port number that is not reserved by the system. Once a client socket is created, the client "connects" to the server socket by specifying the server IP and port. This connection involves the three-way TCP handshake for TCP sockets, and nothing for UDP sockets.

* When the client connects and the TCP server accepts the connection, a new socket is created at the server for communication with this client. The original listening socket continues to listen for new requests, and the new socket is used to send to and receive from that particular client. Note that a TCP server with N active clients will have N sockets, one to read/write for each client, in addition to the main listening socket. All these N+1 sockets are on the same port number (the server's well-known port number), but they have different socket file descriptors inside the server code. The two sketches below put these calls together.
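To make the call sequence concrete, here is a minimal sketch of a TCP echo server in C. The port number 8080 is an arbitrary choice for the example (any port not reserved by the system would do), and error handling is kept to a bare minimum:

    /* Minimal TCP echo server: socket -> bind -> listen -> accept -> read/write. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void) {
        /* socket(): create a stream (TCP) socket in the Internet family. */
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        if (listen_fd < 0) { perror("socket"); exit(1); }

        /* bind(): attach the socket to a well-known address and port. */
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY); /* any local interface */
        addr.sin_port = htons(8080);              /* demo port */
        if (bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind"); exit(1);
        }

        /* listen(): tell the kernel to start queueing incoming requests. */
        if (listen(listen_fd, 10) < 0) { perror("listen"); exit(1); }

        for (;;) {
            /* accept(): blocks until a client connects; returns a NEW
             * descriptor dedicated to that client. */
            int conn_fd = accept(listen_fd, NULL, NULL);
            if (conn_fd < 0) continue;

            /* read()/write(): used just like on a file. */
            char buf[1024];
            ssize_t n = read(conn_fd, buf, sizeof(buf));
            if (n > 0) write(conn_fd, buf, n); /* echo the data back */
            close(conn_fd);
        }
    }

And a matching client sketch, assuming the server above is running on the same machine (127.0.0.1):

    /* Minimal TCP client: socket -> connect -> write/read. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); exit(1); }

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_port = htons(8080);
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);

        /* connect(): the kernel picks a random unused local port and
         * performs the three-way TCP handshake with the server. */
        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect"); exit(1);
        }

        write(fd, "hello", 5);
        char buf[16];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) printf("server echoed: %.*s\n", (int)n, buf);
        close(fd);
        return 0;
    }

Note that this server handles only one client at a time; the concurrency techniques below address exactly that limitation.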
* Once the client and server sockets have been connected, both endpoints can read and write on their sockets, much like they do on files. For example, you call read(socket descriptor, ...), and this call returns when there is some data to be read on the socket. Similarly, the write call writes data into a socket.

* Summary of the main system calls: socket (for creation), bind, listen, accept (at the server), connect (at the client), read and write (at both).

* Note that many of these system calls (especially accept and read) block. That is, when you make the system call, the process regains control and can proceed only once the call has something to return. For example, accept returns only when a new connection request arrives. Connect returns only after it has attempted the TCP handshake. Read returns only when data arrives or some error occurs.

* Alternatively, you can set socket options to make a socket non-blocking. For example, a read on a non-blocking socket returns immediately, even if there is no data. By contrast, a read on a regular blocking socket stalls the execution of the program until some data to read arrives.

* How does a server handle multiple clients? Suppose the server has only one process/thread. Initially the server blocks on accept. Accept returns with a new client request. Now, while the server is handling that client and waiting on a read of that client's socket, it cannot accept any new connections. Similarly, while the server is waiting in accept, it cannot read from or write to the client socket.

* Simple solution for concurrency: whenever a new client connection arrives, the server forks a new process. That child process handles the client for as long as the client is talking to the server, while the main server process continues to listen for new connections. You can do something similar with multiple threads instead of processes (threads are more lightweight, but care must be taken to synchronize them). The popular Apache server works this way.

* Another solution for concurrency: non-blocking sockets. The server maintains a set of non-blocking sockets corresponding to different clients (plus one main socket for new connections), and periodically "polls" (i.e., does read/accept, etc. on) all the sockets for any incoming requests/data. This polling can be done in several ways. The naive way, checking all sockets all the time, is wasteful, because the server stays busy checking even when nothing is happening.

* A better idea than polling: the select system call. You give the select system call a set of sockets and ask it to monitor them. The system call returns when one or more sockets have data to report. When select completes, we check which socket(s) had data and act accordingly (accept the new connection, read data from the socket, etc.). This is called an "event-driven" architecture; a sketch of such an event loop appears after this list. There is another system call called "poll" that lets you do something similar. Note that select can work with blocking or non-blocking sockets, though non-blocking is preferred (can you think of a reason why?).

* Several advanced techniques that perform better than select exist today for what is called Input/Output multiplexing in applications. For example, look up the "epoll" system call. These system calls are more efficient versions of select, and enable servers to handle many concurrent clients with little overhead.
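To make the event-driven approach concrete, here is a sketch of a select-based server loop in C. It assumes listen_fd is a listening socket already set up with socket/bind/listen as in the earlier sketch, and the per-client "work" is again just an echo:

    /* Event-driven server loop using select(): one process multiplexes
     * the listening socket and all connected clients. */
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    void serve_forever(int listen_fd) {
        int client_fds[FD_SETSIZE];
        int nclients = 0;

        for (;;) {
            /* Build the set of sockets to monitor: the listening socket
             * plus one socket per connected client. */
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(listen_fd, &readable);
            int maxfd = listen_fd;
            for (int i = 0; i < nclients; i++) {
                FD_SET(client_fds[i], &readable);
                if (client_fds[i] > maxfd) maxfd = client_fds[i];
            }

            /* Block until at least one monitored socket has an event. */
            if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0) continue;

            /* New connection pending? accept() will not block now. */
            if (FD_ISSET(listen_fd, &readable) && nclients < FD_SETSIZE - 1) {
                int fd = accept(listen_fd, NULL, NULL);
                if (fd >= 0) client_fds[nclients++] = fd;
            }

            /* Data (or a closed connection) on an existing client socket? */
            for (int i = 0; i < nclients; i++) {
                if (!FD_ISSET(client_fds[i], &readable)) continue;
                char buf[1024];
                ssize_t n = read(client_fds[i], buf, sizeof(buf));
                if (n <= 0) {                 /* client closed or error */
                    close(client_fds[i]);
                    client_fds[i--] = client_fds[--nclients]; /* swap-remove */
                } else {
                    write(client_fds[i], buf, n); /* echo the data back */
                }
            }
        }
    }

A fork-based server would instead call fork() right after accept() and let the child handle that one client. With epoll, the set of monitored sockets is registered with the kernel once, instead of being rebuilt and rescanned on every iteration as here, which is what makes it cheaper at large client counts.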
* In general, designing servers to handle a large number of clients is a challenging problem. I/O multiplexing (avoiding blocking on the socket calls of a single client) is only one part of the puzzle. Typical web servers today do extensive processing on each request, e.g., read/write files or database tables, perform computation, etc. For example, consider the web server of an online travel portal. The web server has to receive the user's request, check a backend database for information like ticket availability, run some computations for the cheapest travel options, and return the response to the user. Each client request involves multiple steps. Therefore, in addition to accept and read on sockets, a client request can block at several places, like disk access or database access. So a good web server needs to be able to multiplex multiple requests in an efficient manner without blocking on any one of them, and serve as many clients as it can for a given capacity.

* However, no matter how much multiplexing you do, at some point some system resources of the server are bound to run out. That is, after the number of clients exceeds a certain capacity, a single server cannot handle the load anymore, because, say, the CPU is too busy, or the disk is the bottleneck, or no more free sockets are available. This is called server overload. When the server is overloaded, its performance suffers: requests fail and response times shoot up, which users experience as a "server crash".

* Several techniques exist to scale web servers. The most common idea is to have several "replicas" of the server in a server farm, and a "load balancer" to distribute load between the replicas. The load balancer can do redirection at the DNS level (return multiple IP addresses for a DNS name), at the application layer by looking at HTTP requests and sending certain kinds of requests to certain replicas (an L7 load balancer), or based on the source or destination IP (an L3 load balancer), etc. In some cases, load balancers also need to ensure that the same user goes to the same server replica ("stickiness").

* A Content Distribution Network (CDN) manages several replica servers that hold content, across several websites/content providers. Each user is directed to the closest replica of the CDN via DNS.

* Further reading:

- "Flash: An Efficient and Portable Web Server", Pai et al. This paper describes a new architecture for multiplexing requests at a web server. Please skim through the paper to appreciate the problem it is trying to solve. The paper addresses the fact that while the select system call can check for I/O readiness on multiple sockets, it cannot check whether multiple disk operations have completed. Therefore, even if you use select to avoid blocking on socket system calls, you may still end up blocking on disk operations. This paper therefore proposes a new architecture for servers that need to access the disk: the Flash web server has one main process with a select loop that handles all clients, and when serving a client requires disk operations, those operations are handed off to a separate helper process so that the main loop never blocks.

- "Handling Flash Crowds from your Garage", Elson and Howell. This paper describes several techniques you can use to easily build a scalable web server that can handle high volumes of client requests.

- "A Scalable and Explicit Event Delivery Mechanism for UNIX", Banga et al. This paper introduced the idea behind the "epoll" system call, which is a much more efficient version of select.