The Address Space of a Process

- The virtual address space of a process has two main parts: the
  user part containing the code/data of the process itself, and the
  kernel code/data. For example, on a 32-bit x86 system, addresses
  0-3GB of the virtual address space of a process could contain user
  code and data, and addresses 3-4GB could point to the kernel. The
  page table of every process contains mappings for the user pages and
  the kernel pages. The kernel page table entries are the same in
  every process (as there is only one physical copy of the kernel in
  memory), while the user page table entries differ from process to
  process.
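
  The split described above can be sketched in a few lines of C. This
  is only an illustration of the 0-3GB / 3-4GB example: the name
  KERNEL_BASE and both helper functions are hypothetical, and real
  systems may place the boundary elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout from the example: user addresses in [0, 3GB),
 * kernel addresses in [3GB, 4GB). KERNEL_BASE is an illustrative name. */
#define KERNEL_BASE 0xC0000000u   /* 3GB */

/* True if a 32-bit virtual address falls in the kernel part. */
bool is_kernel_address(uint32_t vaddr)
{
    return vaddr >= KERNEL_BASE;
}

/* True if the whole range [vaddr, vaddr + len) lies in the user part.
 * A kernel could run a check of this kind before trusting a pointer
 * handed to it from user mode. */
bool is_user_range(uint32_t vaddr, uint32_t len)
{
    /* Written this way to avoid overflow in vaddr + len. */
    return vaddr < KERNEL_BASE && len <= KERNEL_BASE - vaddr;
}
```

  Under these assumptions, address 0x08048000 is a user address, while
  0xC0000000 and everything above it belongs to the kernel.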

- Note that every physical memory address that is in use will be
  "mapped" into the virtual address space of at least one
  process. That is, the physical memory address will correspond to a
  virtual address in the page table of some process. Physical memory
  that is not mapped into the address space of any process is by
  definition not accessible, since (almost) all accesses to memory go
  via the MMU. Some physical memory can be mapped multiple times,
  e.g., kernel code and data are mapped into the virtual address space
  of every process.

- Why is the kernel mapped into the address space of every
  process? Having the kernel in every address space makes it easy to
  execute kernel code while in kernel mode: no page table switch is
  needed, and executing kernel code is as simple as jumping to a
  memory location in the kernel part of the address space. Page table
  entries for kernel pages carry a protection bit that marks them as
  kernel-only, and the CPU must be in kernel mode to access these
  pages. This protects the kernel against rogue processes.
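
  The protection check can be pictured as follows. This is a sketch
  only: the flag names and bit positions are made up for illustration,
  and the real check is done by the MMU in hardware, not in software.
  Real architectures also differ in how they encode the bit; x86, for
  instance, uses a "user" bit that is left clear on kernel pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits in a page table entry; real layouts differ
 * between architectures. */
#define PTE_PRESENT 0x1u  /* the mapping is valid */
#define PTE_KERNEL  0x2u  /* page may only be accessed in kernel mode */

enum cpu_mode { USER_MODE, KERNEL_MODE };

/* Decide whether an access through this entry is allowed, mimicking
 * the check the MMU performs on every memory access. */
bool access_allowed(uint32_t pte, enum cpu_mode mode)
{
    if (!(pte & PTE_PRESENT))
        return false;   /* no mapping: the access faults */
    if ((pte & PTE_KERNEL) && mode != KERNEL_MODE)
        return false;   /* user-mode code touching a kernel page */
    return true;
}
```

  With this model, a user-mode access to a kernel page is refused even
  though the page is mapped in the process's own page table.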

- The user part of the address space contains the executable code
  of the process and statically allocated data. It also contains a
  heap for dynamic memory allocation, and a stack, with the heap and
  stack growing in opposite directions towards each other. Dynamically
  linked libraries, memory-mapped files, and other such things also
  form a part of the virtual address space. By assigning a part of the
  virtual address space to memory-mapped files, the data in these
  files can be accessed just like any other variable in main memory,
  rather than through explicit file reads and writes. The virtual
  address space in Linux is divided into memory areas or maps, one for
  each of the entities mentioned above.
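
  As an illustration of accessing file data like ordinary memory, here
  is a minimal sketch using the POSIX mmap() call. The function name
  sum_file_bytes is hypothetical and error handling is kept to the
  bare minimum.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum all bytes of a file by mapping it into the address space and
 * walking it like an array. Returns -1 on error (including an empty
 * file, which cannot be mapped). */
long sum_file_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    unsigned char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];         /* plain memory read, no read() call */

    munmap(data, len);
    return sum;
}
```

  The loop never issues a read on the file descriptor; the pages of
  the file are brought into memory by the kernel as they are touched.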

- The kernel part of the address space contains the kernel code
  and data. For example, it has various kernel data structures like
  the list of processes, the list of free pages available for
  allocation to processes, and so on. The virtual addresses assigned
  to kernel code and data are
  the same across all processes. 

- One important concept to understand here is that most physical
  memory will be mapped (at least) twice, once to the kernel part of
  the address space of processes, and once to the user part of some
  process. To see why, note that the kernel maintains a list of free
  frames/pages, which are subsequently allocated to store user process
  images. Suppose a free frame of size N bytes is assigned a virtual
  address, say V, by the kernel. Suppose the kernel maintains a
  4-byte pointer to this free page, whose value is simply the starting
  virtual address V of the free page. Even though the kernel only
  needs this pointer variable to track the page, note that it cannot
  assign the virtual addresses [V, V+N) to any other variable,
  because these addresses refer to the memory in that page, and will
  be used by the kernel to read/write data into that free page. That
  is, a free page blocks out a page-sized chunk of the kernel address
  space. Now, when this page is allocated to a new process, it is
  given a different virtual address range (say, [U, U+N)) from the
  user part of that process's virtual address space, which the
  process uses to read/write data in user mode. So the same physical
  frame will also have blocked out another
  chunk of virtual addresses in the process, this time from the user
  part. That is, the same physical memory is mapped twice, once into
  the kernel part of the address space (so that the kernel can refer
  to it), and once into the user part of the address space of a
  process (so that the process can refer to it in user mode).
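
  The double mapping can be simulated with a toy model. Everything in
  this sketch is an assumption made for illustration: physical memory
  is a small array, the kernel part uses a simple direct map (frame f
  is visible at KERNEL_BASE + f * PAGE_BYTES, in the style of xv6),
  and the user part is a flat single-level table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES  4096u
#define NFRAMES     8u            /* toy physical memory: 8 frames */
#define NUSER_PAGES 16u           /* toy user part: 16 pages */
#define KERNEL_BASE 0xC0000000u   /* assumed start of the kernel part */

static uint8_t phys_mem[NFRAMES * PAGE_BYTES];   /* simulated RAM */

/* Toy user page table: user page number -> frame number. */
static struct { bool present; uint32_t frame; } user_table[NUSER_PAGES];

/* Kernel virtual address V at which a frame is always visible. */
uint32_t kernel_vaddr_of_frame(uint32_t frame)
{
    return KERNEL_BASE + frame * PAGE_BYTES;
}

/* Map a frame into the user part; returns false if out of range. */
bool map_user_page(uint32_t user_page, uint32_t frame)
{
    if (user_page >= NUSER_PAGES || frame >= NFRAMES)
        return false;
    user_table[user_page].present = true;
    user_table[user_page].frame = frame;
    return true;
}

/* Translate a virtual address to a location in simulated RAM,
 * or NULL if the address is not mapped. */
uint8_t *translate(uint32_t vaddr)
{
    uint32_t offset = vaddr % PAGE_BYTES;
    if (vaddr >= KERNEL_BASE) {
        uint32_t frame = (vaddr - KERNEL_BASE) / PAGE_BYTES;
        return frame < NFRAMES ? &phys_mem[frame * PAGE_BYTES + offset] : NULL;
    }
    uint32_t page = vaddr / PAGE_BYTES;
    if (page >= NUSER_PAGES || !user_table[page].present)
        return NULL;
    return &phys_mem[user_table[page].frame * PAGE_BYTES + offset];
}
```

  After map_user_page(3, 5), the user address U = 3 * PAGE_BYTES and
  the kernel address V = kernel_vaddr_of_frame(5) translate to the
  same byte of simulated RAM, so a value written through one is
  visible through the other.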

- Is this double consumption of virtual addresses a problem? In
  architectures where virtual address spaces are much larger than the
  physical memory, this duplication is not a problem, and it is
  alright to have one byte of physical memory block out two or more
  bytes of virtual address space. However, in systems with smaller
  virtual address spaces (due to a smaller number of bits available to
  store memory addresses in registers), one of the following will
  happen: either not all of the physical memory will be used (as in
  the case of xv6), or more commonly, some part of user memory will
  not be mapped in the kernel address space all the time (as in the
  case of Linux). In the latter case, once the kernel allocates a free
  page to a
  process, it will remove its page table mappings that point to that
  physical memory, and use those freed up virtual addresses to point
  to something else. Subsequently, this physical memory will only be
  accessible from the user mode of a process, because only the user
  virtual addresses point to it in the page table. Such memory is
  called "high memory" in Linux, and high memory is mapped into the
  kernel address space (i.e., virtual addresses are allocated from the
  kernel portion of the virtual memory) only when needed.
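
  A back-of-the-envelope sketch of this trade-off is given below. The
  struct and function names are hypothetical, and the model assumes
  that the whole kernel window is available for mapping physical
  memory; real kernels reserve part of the window for other uses, so
  the permanently mapped portion is smaller in practice.

```c
#include <stdint.h>

struct mem_split {
    uint64_t direct_mapped;  /* bytes the kernel can keep mapped always */
    uint64_t high;           /* bytes that must be mapped on demand */
};

/* phys_bytes:    amount of physical memory installed.
 * kernel_window: bytes of kernel virtual address space assumed to be
 *                available for mapping physical memory. */
struct mem_split split_physical_memory(uint64_t phys_bytes,
                                       uint64_t kernel_window)
{
    struct mem_split s;
    s.direct_mapped = phys_bytes < kernel_window ? phys_bytes : kernel_window;
    s.high = phys_bytes - s.direct_mapped;
    return s;
}
```

  For example, with 4GB of physical memory and a 1GB kernel window,
  this model leaves 3GB that cannot stay mapped in the kernel part,
  while a machine with 512MB of physical memory has none.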