Network stack design for high performance.

* We will consider two examples of high-performance network stacks: Megapipe and mTCP.

* Megapipe is a modification to the Linux network stack. What are the problems it solves, and how does it solve them?
  - Contention on the accept queue and lack of cache locality: both problems are solved by partitioning the accept queue into per-core queues (as seen before in the Mosbench paper).
  - VFS and filesystem overheads: sockets are handled in the same way as files, so they incur filesystem-related overheads (locking the global open file table, finding the lowest-numbered free file descriptor, etc.). However, sockets are shared less often than files and, unlike files, are ephemeral, so the filesystem mechanisms (though convenient) are overkill. Megapipe proposes lightweight sockets (lwsockets), a new type of socket that points directly to the TCP control block (TCB) without going through the filesystem.
  - System call overheads: Megapipe solves this problem by batching system calls using the concept of a channel.

* A multithreaded application using Megapipe opens one channel on each core it runs on. A socket/file/lwsocket is associated with the channel on its core. All read/write requests on a channel are batched by the Megapipe user-level library and sent as a batch to the kernel part of Megapipe. Once a request completes, a completion notification is sent back to the process. Note that this completion-notification model is slightly different from the readiness model of epoll, and it can handle disk I/O as well (a sketch of this model appears at the end of these notes).

* API: see the code example in the paper.

* Disadvantages of Megapipe: it requires changes to both the kernel and application code. Also, the kernel has shared data structure bottlenecks in other places that Megapipe does not address.

* The next paper is multicore TCP (mTCP): a userspace TCP/IP stack designed to run over a kernel-bypass packet I/O mechanism such as netmap or DPDK. It incorporates all the optimizations of Megapipe, and a few more, without requiring changes to the kernel. Its API is also very similar to sockets/epoll, so the porting effort is minimal.

* Architecture of mTCP: one application thread per core, and mTCP is invoked either as a library from the application, or runs as a separate per-core thread that exchanges buffers with the application thread. NICs with multi-queue support distribute incoming packets across cores using RSS, and mTCP on each core accepts and processes connections in parallel.

* mTCP uses all the standard optimizations: per-core data structures, cache-friendly and lock-free designs, preallocated buffers, batched packet processing, and so on.

* Why does mTCP support the epoll model rather than blocking I/O? The goal is to limit locking: if multiple threads on a core had to share TCP processing state, that would impose a synchronization cost. Hence the model of one application thread per core, which in turn means the thread cannot block and must use event-driven I/O (see the event-loop sketch at the end of these notes).

* The mTCP paper compares against Megapipe and shows an improvement over it, since mTCP avoids the in-kernel overheads that Megapipe does not eliminate.

* Overview of Arrakis design.
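
To make the channel/completion-notification idea concrete, here is a minimal sketch of a per-core loop over a batched, completion-based interface. The names (`ch_create`, `ch_submit_*`, `ch_wait_completions`) and types are hypothetical, invented for illustration only; the real Megapipe API is given in the paper. The point is the structure: requests are submitted asynchronously, flushed as a batch across the user/kernel boundary, and the application reacts to completions rather than readiness events.

```c
/* Hypothetical completion-based channel API, for illustration only;
 * the actual Megapipe API is described in the paper. */
#include <stddef.h>
#include <sys/types.h>

enum { OP_ACCEPT, OP_READ, OP_WRITE };

struct completion {
    int      op;       /* which operation finished: OP_ACCEPT, OP_READ, ... */
    int      lwsocket; /* lwsocket the completion refers to */
    ssize_t  result;   /* bytes read/written, or the new lwsocket for accept */
    void    *cookie;   /* opaque per-connection state supplied at submit time */
};

int ch_create(void);                                    /* one channel per core */
int ch_submit_accept(int ch, int listen_sock, void *cookie);
int ch_submit_read(int ch, int lws, void *buf, size_t len, void *cookie);
int ch_submit_write(int ch, int lws, const void *buf, size_t len, void *cookie);
int ch_wait_completions(int ch, struct completion *out, int max); /* flushes the batch */

void per_core_server_loop(int listen_sock)
{
    int ch = ch_create();
    ch_submit_accept(ch, listen_sock, NULL);

    struct completion comps[64];
    static char buf[4096];

    for (;;) {
        /* One boundary crossing flushes all pending submissions and
         * returns a whole batch of completions, amortizing syscall cost. */
        int n = ch_wait_completions(ch, comps, 64);
        for (int i = 0; i < n; i++) {
            if (comps[i].op == OP_ACCEPT) {
                int lws = (int)comps[i].result;          /* new lwsocket */
                ch_submit_read(ch, lws, buf, sizeof(buf), NULL);
                ch_submit_accept(ch, listen_sock, NULL); /* re-arm accept */
            } else if (comps[i].op == OP_READ && comps[i].result > 0) {
                /* echo the data back on the same lwsocket */
                ch_submit_write(ch, comps[i].lwsocket, buf,
                                (size_t)comps[i].result, NULL);
            }
        }
    }
}
```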
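
For the one-thread-per-core, event-driven model that mTCP assumes, the sketch below uses the standard Linux socket/epoll API, since mTCP's interface closely mirrors it (mTCP's calls are prefixed and take a per-core context; that mapping is a simplification here, so treat the correspondence as approximate). The key property is that the single thread on a core never blocks on any one connection; it only waits for readiness events and processes them non-blockingly.

```c
/* A per-core, non-blocking event loop using the standard socket/epoll API.
 * An mTCP application follows the same structure; in mTCP, RSS steers each
 * connection's packets to one core, so each core accepts and processes its
 * own connections independently. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>

#define MAX_EVENTS 1024

static void set_nonblocking(int fd)
{
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
}

void per_core_event_loop(uint16_t port)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    struct sockaddr_in addr = { .sin_family      = AF_INET,
                                .sin_port        = htons(port),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, SOMAXCONN);
    set_nonblocking(lfd);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    struct epoll_event events[MAX_EVENTS];
    char buf[4096];

    for (;;) {
        /* The only place this thread waits: readiness events, not per-socket blocking. */
        int n = epoll_wait(ep, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == lfd) {
                int c;
                while ((c = accept(lfd, NULL, NULL)) >= 0) {
                    set_nonblocking(c);
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = c };
                    epoll_ctl(ep, EPOLL_CTL_ADD, c, &cev);
                }
            } else {
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { close(fd); continue; } /* peer closed or error */
                write(fd, buf, (size_t)r);           /* echo back */
            }
        }
    }
}
```

In a deployment matching the architecture described above, one such loop would run per core, pinned to that core, with the NIC's RSS (or a per-core listener) ensuring that each core sees only its own connections, so no locks are needed across cores.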