Threads and Concurrency.

* Threads are lightweight processes. Multiple threads of a process share most of the memory image (code, data), so a system can support many concurrent threads in the same amount of memory. Threads of a process can execute independently over the same memory image. Each thread has a separate user stack (to execute independently) and a separate kernel stack (to save context and registers during user/kernel mode transitions). Threads can also have optional thread-specific data.
* Introduction to the pthreads library: the most common way of creating threads in Linux is to use the pthreads library. The function pthread_create takes the thread function to run as an argument, and the newly created thread starts execution in that function. A structure of arguments can also be passed to the thread at creation. A parent process/thread can wait for threads to exit and "join" them, or the threads can be detached and run independently. (A minimal example appears after this list.)
* In Linux, when a process creates a thread (say, using the pthreads library), a separate executable entity is created in the kernel. That is, a new PCB-like structure (thread control block, or TCB) is created, and the thread is scheduled independently by the scheduler. Such threads are called kernel threads.
* Some other libraries allow creating threads that do not map to separate schedulable entities at the kernel level. Such threads are called user threads. The threading library manages the mapping of multiple user threads onto one or more kernel threads. User-level threads do not give the concurrency benefits of kernel threads, because the kernel sees and schedules only the underlying kernel threads. Then why are they used? (Ease of programming: this point will become clear when we study the alternative to multithreaded programming, event-driven I/O.)
* Concurrency and parallelism. Multiple threads of an application can execute concurrently even on a single CPU, interleaving their executions with each other. When multiple CPUs are available, the executions can happen in parallel. Note the subtle difference between the two concepts: concurrency is executions overlapping in time (possibly interleaved on one CPU), while parallelism is executions running at the same instant on different CPUs.
* When multiple threads of an application access shared data concurrently (even if not in parallel), race conditions can result. Classic example: two threads increment the same shared variable, but their executions interleave, resulting in an incorrect update. (See the counter sketch after this list.)
* Critical section: a part of the program that must be executed in an atomic, mutually exclusive manner by multiple threads. Locks must be used to protect critical sections and ensure mutual exclusion.
* All threading libraries provide lock variables that can be locked and unlocked. How are these lock/unlock functions implemented? Using hardware atomic instructions.
* The atomic exchange instruction in the Intel x86 architecture behaves like oldvalue = xchg(&lock, newvalue). The following piece of code acquires a lock: while(xchg(&lock, 1) != 0); (a runnable version appears after this list).
* A thread that is waiting for a lock can either spin busily (spinlock, as in the code above) or be context switched out by the OS and woken up when the lock becomes available (mutex). The pthreads library has both flavors of locks.
* Other types of locks, such as read-write locks and recursive locks, are also provided by the pthreads library, though the simple mutex should suffice for all practical purposes. (A brief read-write lock sketch follows the list.)
* Deadlocks can occur when dealing with multiple locks (e.g., one thread locks A and waits for lock B, while another thread holds lock B and waits for A). How to avoid such deadlocks? Have all threads acquire multiple locks in the same order. (See the lock-ordering sketch after this list.)
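A minimal sketch of pthread_create and pthread_join, for the pthreads bullet above. The struct thread_args type and the worker function are illustrative names, not part of the pthreads API; compile with gcc -pthread.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative argument structure; any struct can be passed this way. */
struct thread_args {
    int id;
    int iterations;
};

/* Thread start function: pthread_create begins execution here. */
void *worker(void *arg)
{
    struct thread_args *targs = arg;
    for (int i = 0; i < targs->iterations; i++)
        printf("thread %d: iteration %d\n", targs->id, i);
    return NULL;
}

int main(void)
{
    pthread_t tids[2];
    struct thread_args args[2];

    for (int i = 0; i < 2; i++) {
        args[i].id = i;
        args[i].iterations = 3;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            exit(1);
        }
    }

    /* The parent waits for ("joins") both threads; alternatively,
     * pthread_detach(tids[i]) would let them run independently. */
    for (int i = 0; i < 2; i++)
        pthread_join(tids[i], NULL);

    return 0;
}
```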
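A sketch of the classic race described above, together with the mutex fix: two threads increment a shared counter, first without a lock (the load/add/store sequences interleave and updates get lost), then with a pthread mutex protecting the critical section. The function names are illustrative.

```c
#include <pthread.h>
#include <stdio.h>

#define NITER 1000000

long counter = 0;   /* shared variable */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Racy version: counter++ is a load, an add, and a store, and the two
 * threads' sequences can interleave, so some increments are lost. */
void *racy_increment(void *arg)
{
    for (int i = 0; i < NITER; i++)
        counter++;  /* unprotected critical section */
    return NULL;
}

/* Safe version: the mutex makes each increment mutually exclusive. */
void *safe_increment(void *arg)
{
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, racy_increment, NULL);
    pthread_create(&t2, NULL, racy_increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("racy result: %ld (expected %d)\n", counter, 2 * NITER);

    counter = 0;
    pthread_create(&t1, NULL, safe_increment, NULL);
    pthread_create(&t2, NULL, safe_increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("safe result: %ld (expected %d)\n", counter, 2 * NITER);
    return 0;
}
```

On most runs the racy result falls well short of 2000000, while the safe result is always exact.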
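A runnable version of the xchg-based acquisition loop above. As a stand-in for raw xchg, this sketch uses the GCC builtin __atomic_exchange_n, which on x86 compiles down to an xchg or equivalent locked instruction. Note that pthreads also exposes both flavors directly: pthread_spin_lock busy-waits like this, while pthread_mutex_lock may block the calling thread.

```c
#include <pthread.h>
#include <stdio.h>

int lock = 0;       /* 0 = free, 1 = held */
long counter = 0;

/* Atomically write 1 into *l and get the old value back; if the old
 * value was already 1, another thread holds the lock, so keep spinning. */
void spin_acquire(int *l)
{
    while (__atomic_exchange_n(l, 1, __ATOMIC_ACQUIRE) != 0)
        ;           /* busy wait (spinlock) */
}

void spin_release(int *l)
{
    __atomic_store_n(l, 0, __ATOMIC_RELEASE);
}

void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++) {
        spin_acquire(&lock);
        counter++;  /* critical section */
        spin_release(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* 200000: no lost updates */
    return 0;
}
```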
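For the read-write lock bullet, a brief pthread_rwlock sketch: many readers may hold the lock simultaneously, but a writer gets exclusive access. The shared_data variable and the reader/writer functions are illustrative names.

```c
#include <pthread.h>

pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
int shared_data;

int reader(void)
{
    int v;
    pthread_rwlock_rdlock(&rw);   /* many readers can hold this at once */
    v = shared_data;
    pthread_rwlock_unlock(&rw);
    return v;
}

void writer(int v)
{
    pthread_rwlock_wrlock(&rw);   /* a writer gets exclusive access */
    shared_data = v;
    pthread_rwlock_unlock(&rw);
}
```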
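A sketch of the deadlock scenario from the last bullet and the lock-ordering fix; the thread functions are illustrative.

```c
#include <pthread.h>

pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

/* DEADLOCK-PRONE: the two threads acquire the locks in opposite orders.
 * If thread1 holds A while thread2 holds B, each waits forever. */
void *thread1_bad(void *arg)
{
    pthread_mutex_lock(&A);
    pthread_mutex_lock(&B);   /* may block forever waiting for thread2 */
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}

void *thread2_bad(void *arg)
{
    pthread_mutex_lock(&B);
    pthread_mutex_lock(&A);   /* may block forever waiting for thread1 */
    pthread_mutex_unlock(&A);
    pthread_mutex_unlock(&B);
    return NULL;
}

/* DEADLOCK-FREE: every thread follows one global order (A before B),
 * so a cycle of threads waiting on each other can never form. */
void *thread2_fixed(void *arg)
{
    pthread_mutex_lock(&A);
    pthread_mutex_lock(&B);
    /* ... critical section touching data guarded by both locks ... */
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}
```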
* The kernel code also uses spinlocks to protect its internal data structures from concurrent access by kernel threads on other CPUs. (Note that the kernel code cannot use blocking mutexes: if the kernel blocks, what else will run?) Kernel code running on single-processor machines need not use spinlocks: there, it is enough to disable interrupts during critical sections, because nothing else can interrupt the kernel. However, user processes on single-processor machines must still use locks, because the kernel can interrupt such processes. On multiprocessor machines, both user processes and kernel code must use locks to protect critical sections.
* When a spinlock is held in the kernel, the kernel must not be interrupted, because the interrupt handler could try to acquire the same spinlock and deadlock. Therefore, interrupts are disabled on that core for the duration of holding the spinlock. (A kernel-style sketch follows.)
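To illustrate the last point, a Linux-kernel-style sketch (it only builds inside a kernel source tree): spin_lock_irqsave disables interrupts on the local core and then spins for the lock, the standard pattern when a lock may also be taken from an interrupt handler. DEFINE_SPINLOCK, spin_lock_irqsave, and spin_unlock_irqrestore are real Linux kernel primitives; update_count and shared_count are hypothetical names.

```c
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);   /* protects shared_count below */
static int shared_count;

/* If this data is also touched from an interrupt handler, the lock must
 * be taken with local interrupts disabled; otherwise the handler could
 * fire on this core and spin forever on a lock this core already holds. */
void update_count(void)
{
    unsigned long flags;

    spin_lock_irqsave(&my_lock, flags);      /* disable local IRQs, then spin */
    shared_count++;
    spin_unlock_irqrestore(&my_lock, flags); /* release, restore IRQ state */
}
```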