*** Q: What exactly is the kernel stack of a process?

Main memory (RAM) stores many things. Part of RAM holds the memory images of user processes: the code, data, stack, heap, and so on of each process. A process can access its own memory image while running in user mode. Another part of RAM holds the kernel's code and all of its data structures, and this part is not accessible to processes in user mode. In this kernel part of memory, the OS stores data structures such as the list of PCBs of all active processes. The PCB of a process collects, in one place, all the information the OS maintains about that process. We will see the PCB of xv6 next week, and we will see that one of the important things the PCB keeps track of is the kernel stack of the process (in xv6, through a pointer to it). In other words, the kernel stack is a piece of memory in the OS part of RAM, and a process cannot access it in user mode.

Q: What is the kernel stack of a process used for?

Recall that a user program uses the user stack to push arguments, stack frames, return addresses, and so on during a function call. Similarly, the kernel uses the kernel stack to store state while a process is running in kernel mode. That is, function calls made in kernel mode do not use the user stack; they use the kernel stack. The kernel stack is also used to save the CPU context (the values of the CPU registers) when a process moves from user mode to kernel mode, when the OS switches from one process to another during a context switch, and so on. In short, whenever the OS needs to save some state of a process so that it can restore it later, it saves that state on the kernel stack of the process.

*** Q: When a process goes from user mode to kernel mode due to a trap (system call, interrupt, or program fault), who saves the context of the process? What exactly happens?

Let me try to explain this in detail. Suppose a process P is running in user mode.
At some instruction in its code (say, at PC = x), the program traps, for example because it makes a system call. This PC = x must be saved somewhere, and the PC must jump to a different address "y" in the kernel code, so that OS code can run to handle the trap. [Note: "x" and "y" are memory addresses at which instructions are stored. "x" is an address in the user's code within the process's memory image; "y" is an address in the OS code.]

Q: Where does the CPU get this address "y" from?

In a normal function call, the assembly code has an instruction like "call <function address>". That is, when a user program makes a function call, the CPU knows which instruction to jump to because the address is provided as an operand of the instruction. The same cannot happen for system calls: the user cannot be trusted to provide the address of kernel code to jump to. Why? A malicious or buggy user program could jump to arbitrary addresses in the kernel. So, when the user makes a system call, how does the CPU know which PC to jump to? The address comes from the IDT (Interrupt Descriptor Table). During boot, the kernel gives the CPU a table of the kernel addresses at which the trap-handling functions reside. On a trap, the CPU looks up the address "y" in the IDT and sets the PC to "y". But before updating the PC to "y", it must store the old PC value "x" somewhere; otherwise, how would we know where to return to in the user program?

Q: Where is this old PC / CPU context saved?

The old PC (and a few other registers) are stored on the kernel stack of the currently running process. Normally, information that must be preserved across a function call (e.g., the return address) is stored on the user stack, which is part of the memory image of the process. Once again, the OS does not trust the user, so the old PC and the other registers that make up the CPU's context are stored on the kernel stack of the currently running process instead.

Q: Who saves the old PC?

The old PC must be saved before the PC switches from "x" to "y". So who saves it? Can the kernel do it? No! Why? Because kernel code cannot even start running until the PC reaches "y". So someone must save the old PC and switch the PC to "y", and only then can the kernel code run. That someone is the CPU hardware.

Q: So, finally, what happens on a trap?
On a trap, the CPU hardware switches the stack pointer from the user stack to the kernel stack of the process, saves the old PC "x" on the kernel stack, looks up the IDT to obtain the new PC "y", and jumps to the code located at "y". Now the kernel code is finally running. The kernel code can then save more context (any other CPU registers it needs, so that the user context is fully captured). Then it handles the trap and does whatever else it needs to do. Eventually, it returns from kernel mode to user mode by restoring the user context (in particular, by restoring the PC to "x").

Q: So, finally, who saves the user's CPU context when moving from user mode to kernel mode?

I hope the explanation above makes it clear that part of the context is saved by the hardware and part by the kernel. The hardware support is essential, though: without it, the context could not be saved at all when moving from user mode to kernel mode.

Now, let's continue the story. The kernel starts executing at address "y", runs some code, and decides it wants to context switch away from this process to some other process. By this point, the PC points to some other instruction "z" in the kernel's code. We must also store this address "z" somewhere on the kernel stack before switching to another process, so that we can resume at "z" later on. This is a second kind of context saving, which happens during a context switch, and this "kernel context" is different from the "user context" that was saved during the switch from user mode to kernel mode.

Q: So there are two different types of contexts saved on the kernel stack?

Yes. When the process went from user mode to kernel mode, the hardware and the OS saved the CPU register context at the point where execution stopped in user mode. For example, this "user context" holds the saved value PC = "x". Later, after running in kernel mode for some time, the process stops executing in the kernel, and the CPU switches to another process, which resumes running in kernel mode.
At this point, a second context is saved on the kernel stack. For example, this "kernel context" holds the saved value PC = "z". The two contexts are saved at different locations on the kernel stack. At some later point, the OS switches back to this process and resumes executing kernel code at address "z". Later still, it may return to the user code at address "x". All of this is possible because both contexts were saved on the kernel stack.
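The trap-entry steps described above (the kernel fills in the IDT at boot; on a trap the hardware saves the old PC on the kernel stack and jumps to the handler address taken from the IDT) can be sketched as a toy simulation in C. This is a hypothetical model, not real hardware or xv6 code: `cpu_t`, `kstack_t`, `idt_set`, `hw_trap`, and `hw_trap_return` are illustrative names, addresses are plain integers, and only the PC and stack pointer are modelled. Real x86 hardware also saves the flags and segment registers, and for a system call instruction the saved PC is that of the instruction following it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVECTORS 256      /* x86 IDTs have 256 entries */
#define KSTACK_WORDS 64   /* toy kernel stack size (hypothetical) */
#define T_SYSCALL 64      /* xv6's system call vector, used here only as an example */

/* Toy CPU state: only what this sketch needs. */
typedef struct {
    uintptr_t pc;    /* program counter */
    uintptr_t sp;    /* stack pointer */
    int in_kernel;   /* 0 = user mode, 1 = kernel mode */
} cpu_t;

/* Toy per-process kernel stack (in a real OS this lives in kernel memory). */
typedef struct {
    uintptr_t words[KSTACK_WORDS];
    size_t top;      /* number of words currently pushed */
} kstack_t;

/* Toy IDT: vector number -> kernel handler address "y". */
static uintptr_t idt[NVECTORS];

static void kpush(kstack_t *ks, uintptr_t v) {
    assert(ks->top < KSTACK_WORDS);
    ks->words[ks->top++] = v;
}

static uintptr_t kpop(kstack_t *ks) {
    assert(ks->top > 0);
    return ks->words[--ks->top];
}

/* Done once by the kernel at boot: tell the "CPU" where each handler lives. */
void idt_set(int vector, uintptr_t handler) {
    assert(vector >= 0 && vector < NVECTORS);
    idt[vector] = handler;
}

/* What the hardware does on a trap, before any kernel code runs. */
void hw_trap(cpu_t *cpu, kstack_t *ks, int vector) {
    assert(vector >= 0 && vector < NVECTORS && idt[vector] != 0);
    kpush(ks, cpu->sp);                        /* save the old (user) SP */
    kpush(ks, cpu->pc);                        /* save the old PC "x" */
    cpu->sp = (uintptr_t)&ks->words[ks->top];  /* SP now points into the kernel stack */
    cpu->in_kernel = 1;
    cpu->pc = idt[vector];                     /* "y" comes from the IDT, never from the user */
}

/* Return from the trap: restore the saved user PC and SP. */
void hw_trap_return(cpu_t *cpu, kstack_t *ks) {
    cpu->pc = kpop(ks);  /* back to "x" */
    cpu->sp = kpop(ks);
    cpu->in_kernel = 0;
}
```

For example, after `idt_set(T_SYSCALL, 0x80100000)`, calling `hw_trap` on a CPU whose PC is `0x1000` leaves the PC at `0x80100000` with `0x1000` saved on the kernel stack, and `hw_trap_return` puts it back. The jump target always comes from the table the kernel filled in, which is why a user program cannot pick an arbitrary kernel address.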
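The two saved contexts can also be sketched as a toy model in C. Again this is hypothetical: `cpu_t`, `proc_t`, `trap_enter`, `context_switch`, and `trap_return` are made-up names, the addresses are arbitrary, and each context is reduced to a single saved PC. Because the kernel stack is last-in, first-out, the kernel context "z" ends up above the user context "x", so the process first resumes in the kernel at "z" and only later returns to user code at "x". For simplicity the sketch switches directly from one process to another; xv6, for instance, switches through a per-CPU scheduler context instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KSTACK_WORDS 64   /* toy kernel stack size (hypothetical) */

/* Toy CPU state: only the PC and the current mode are modelled. */
typedef struct {
    uintptr_t pc;
    int in_kernel;        /* 0 = user mode, 1 = kernel mode */
} cpu_t;

/* Toy PCB: each process owns its own kernel stack. */
typedef struct {
    uintptr_t kstack[KSTACK_WORDS];
    size_t top;           /* number of words currently pushed */
} proc_t;

static void push(proc_t *p, uintptr_t v) {
    assert(p->top < KSTACK_WORDS);
    p->kstack[p->top++] = v;
}

static uintptr_t pop(proc_t *p) {
    assert(p->top > 0);
    return p->kstack[--p->top];
}

/* User -> kernel: save the user context (PC "x") and jump to handler "y". */
void trap_enter(cpu_t *cpu, proc_t *p, uintptr_t handler) {
    push(p, cpu->pc);
    cpu->pc = handler;
    cpu->in_kernel = 1;
}

/* Context switch: save `from`'s kernel context (PC "z") on its kernel stack,
   then resume `to` from the kernel context saved on its kernel stack. */
void context_switch(cpu_t *cpu, proc_t *from, proc_t *to) {
    assert(cpu->in_kernel);
    push(from, cpu->pc);
    cpu->pc = pop(to);
}

/* Kernel -> user: restore the user context saved by trap_enter. */
void trap_return(cpu_t *cpu, proc_t *p) {
    cpu->pc = pop(p);
    cpu->in_kernel = 0;
}
```

For example, if P traps at x = `0x1000`, runs in the kernel until z = `0x80102000`, and switches to a process Q whose kernel stack already holds a saved kernel context, then switching back to P resumes at `0x80102000`, and `trap_return` restores `0x1000`. A process that has never run has no saved kernel context to resume, which is why an OS prepares an initial one when it creates a process (xv6 does this in `allocproc`).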