Lockfree Datastructures

* Locking serializes the threads that access a shared data structure, which reduces parallel speedup. Lockfree (non-blocking) data structures let multiple threads operate concurrently and correctly on shared data.

* For example, consider a simple stack made of struct nodes, each containing a pointer 'next' to the next element. The variable 'top' points to the top of the stack.

    push(node *n) {
        n->next = top;
        top = n;
    }

    node *pop() {
        node *result = top;
        if (result != NULL)
            top = result->next;
        return result;
    }

  This is not 'thread-safe' when multiple threads access the stack. Why? What happens when two threads execute the push function concurrently, so that their steps interleave instead of running atomically? (Both can read the same top, and the second write to top then overwrites the first, losing a node.)

* Of course, one can hold a lock over the entire push and pop operations. To avoid the overhead of locking, the following version uses the CAS (compare-and-swap) atomic operation to update top atomically. CAS(addr, old, new) writes new to *addr only if *addr still equals old, and reports whether it succeeded. Note that a CAS operation also serializes accesses; however, it reduces the extent of serialization compared to a coarse-grained lock over the entire function.

    push(node *n) {
        do
            n->next = top;
        while (!CAS(&top, n->next, n));
    }

  What happens when two threads push at once? Only the first CAS will succeed. The second CAS will see a mismatch with the old value (top is now the first thread's new node, not the old top), and the thread attempting the second push will have to retry.

    node *pop() {
        node *result;
        while ((result = top) != NULL)
            if (CAS(&top, result, result->next))
                break;
        return result;
    }

* What is the problem with the above code? The CAS operation only checks that top has the same value it had when the pop started. But what if top still points to the identified result node, while the rest of the list has changed? For example, some other thread could have done the following after the first thread entered the while loop of pop and read result->next, but just before it executed the CAS: x = pop(); then either push(new_element) or y = pop(); then push(x).
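  The interleaving above can be replayed deterministically in a single thread to see its effect. The following is a minimal sketch and not part of the original notes: it assumes C11 <stdatomic.h>, uses atomic_compare_exchange_strong in place of the abstract CAS, and the names cas_top and aba_replay and the 'value' field are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;              /* hypothetical payload, for illustration only */
} node;

_Atomic(node *) top = NULL;

/* CAS on top: succeeds only if top still equals 'expected'. */
bool cas_top(node *expected, node *desired) {
    return atomic_compare_exchange_strong(&top, &expected, desired);
}

void push(node *n) {
    do {
        n->next = atomic_load(&top);
    } while (!cas_top(n->next, n));
}

node *pop(void) {
    node *result;
    while ((result = atomic_load(&top)) != NULL)
        if (cas_top(result, result->next))
            break;
    return result;
}

/* Replays the ABA interleaving step by step in one thread.
 * Returns true if the stale CAS succeeded and left top pointing
 * at a node that had already been popped by the "other thread". */
bool aba_replay(void) {
    static node a = {NULL, 1}, b = {NULL, 2}, c = {NULL, 3};
    atomic_store(&top, NULL);
    push(&c); push(&b); push(&a);        /* stack: a -> b -> c */

    /* "Thread 1" begins pop(): reads top and its next, then is preempted. */
    node *result = atomic_load(&top);    /* a */
    node *next = result->next;           /* b */

    /* "Thread 2" runs to completion in the meantime. */
    node *x = pop();                     /* removes a */
    node *y = pop();                     /* removes b; thread 2 now owns b */
    push(x);                             /* stack: a -> c */

    /* "Thread 1" resumes: top is a again, so its CAS succeeds. */
    bool succeeded = cas_top(result, next);

    /* top now points at b, which is no longer part of the stack. */
    return succeeded && atomic_load(&top) == y;
}
```

  Calling aba_replay() from a test program returns true: the stale CAS is accepted although the stack changed underneath it, and a later pop() hands out b a second time.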
  In that interleaving, the old value of top that the CAS checks is still correct, but the stack as a whole is no longer the old stack. This is called the ABA problem, and it can lead to incorrect results.

* How do we fix this issue? We need a version number for the entire stack, which gets updated on every change. Further, the version number and the top pointer (together a double word) must be compared and updated atomically in a single CAS operation. Such an atomic instruction is called DCAS (double CAS) here because it operates on two adjacent memory words; elsewhere it is often called a double-width CAS. Note that this instruction is not as widely supported as the single-word CAS. The global stack variable now includes the top pointer (S.top) as well as a version number (S.version).

    struct stack {
        node *top;
        int version;
    } S;

  We now have to atomically update all of S during any push/pop operation.

    push(node *n) {
        struct stack old, new;
        do {
            old = S;
            new.top = n;
            n->next = old.top;
            new.version = old.version + 1;
        } while (!CAS(&S, old, new));
    }

    node *pop() {
        struct stack old, new;
        do {
            old = S;
            if (old.top == NULL)
                return NULL;
            new.top = old.top->next;
            new.version = old.version + 1;
        } while (!CAS(&S, old, new));
        return old.top;
    }

* Is the above code enough? Consider the issue of freeing the memory of popped elements. When can the memory be freed? What happens if it is freed immediately? Suppose two pop operations interleave. One thread has read S into its local 'old', after which another thread pops the top node. The CAS of the first thread will then fail because the version number no longer matches, so the stack itself stays consistent. However, before attempting that CAS the first thread evaluates old.top->next. If the popped node has already been freed, old.top points to reclaimed memory, and dereferencing it may lead to a segmentation fault. So, for the above code to work correctly, the memory of a popped element must not be freed while any other thread may still hold a pointer to it.

* Similar CAS-based algorithms exist for non-blocking / lockfree linked lists as well.
  For example, to insert a new node into a linked list, we can use CAS to atomically swing the next pointer of the previous node to the new node. However, such lists also suffer from correctness issues that cannot be detected by CAS alone, and need other, more elaborate fixes.

* Takeaway: there is no straightforward way to develop lockfree data structures. New techniques need to be invented for every data structure, unlike locking, which can be used to make any data structure thread-safe.

* References: [Harris Nonblocking Linked List], Section 8.1 of [Drepper].
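As an illustration of the linked-list insertion described above, here is a minimal sketch of swinging the previous node's next pointer with CAS. It is not taken from the cited references. It assumes C11 <stdatomic.h>, a sorted list behind a sentinel head node, and insertion only; the names lnode, list_insert, and list_demo are hypothetical. Deletion is deliberately left out, since concurrent deletes are where a single CAS stops being sufficient.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct lnode {
    int key;
    _Atomic(struct lnode *) next;
} lnode;

/* Inserts n into the sorted list that starts after the sentinel 'head'.
 * Returns false if a node with the same key is already present. */
bool list_insert(lnode *head, lnode *n) {
    for (;;) {
        /* Find the first node whose key is >= n->key. */
        lnode *prev = head;
        lnode *curr = atomic_load(&prev->next);
        while (curr != NULL && curr->key < n->key) {
            prev = curr;
            curr = atomic_load(&curr->next);
        }
        if (curr != NULL && curr->key == n->key)
            return false;

        /* Link n in front of curr, then swing prev->next from curr to n.
         * The CAS fails if another insert changed prev->next meanwhile,
         * in which case the search is retried from the head. */
        atomic_store(&n->next, curr);
        if (atomic_compare_exchange_strong(&prev->next, &curr, n))
            return true;
    }
}

/* Single-threaded usage: inserts keys 3, 1, 2 and a duplicate 2, then
 * returns the keys in list order packed as decimal digits. */
int list_demo(void) {
    static lnode head, a, b, c, dup;
    a.key = 3; b.key = 1; c.key = 2; dup.key = 2;
    atomic_store(&head.next, NULL);

    list_insert(&head, &a);
    list_insert(&head, &b);
    list_insert(&head, &c);
    list_insert(&head, &dup);   /* rejected: key 2 already present */

    int digits = 0;
    for (lnode *p = atomic_load(&head.next); p != NULL;
         p = atomic_load(&p->next))
        digits = digits * 10 + p->key;
    return digits;
}
```

In this single-threaded run list_demo() returns 123. With several inserting threads the retry loop plays the same role as the retry in push above.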