CS695 Topics in Virtualization and Cloud Computing          Spring 2019
Lecture 7  25.1.2019
Lecture 8  30.1.2019
Lecture 9  01.2.2019
--------------------

0. Exercise #2 due 30th January 2019
   https://www.cse.iitb.ac.in/~puru/
   https://www.cse.iitb.ac.in/~puru/courses/spring19/cs695/exercises/ex2.html

   references:
   - [memvirt-survey]
   - [vmware] [xen] [kvm]
   - [x86virt] [hw-sw-mem]

1. Recap: Linux digression
   - processes and memory management basics
     - task_struct, mm_struct
     - process memory layouts: virtual address space and physical address space
     - "every process has the kernel in it!"
   * Homework: look inside task_struct and mm_struct and understand the important
     fields.
   * Homework: how do kernel threads work with no address space? (mm_struct == NULL)
   * Homework: what does the 'current' macro do? What is its return value? How is it
     implemented?
   - kernel threads (ps -ef): special processes
     - have a task_struct and hence are schedulable
     - mm_struct == NULL (no address space of their own)
   - the protected mode of CPU operation (as opposed to real mode) enables support
     for virtual address spaces via paging (and MMU-based hardware translation)
   - PTE flags: the low 12 bits of a PTE
     - PAGE_PRESENT, PAGE_PROTNONE, PAGE_RW, PAGE_USER, PAGE_DIRTY, PAGE_ACCESSED ...
   - Note: on a virtual address issue, the MMU in hardware attempts a translation
     and access. Software intervention is possible only on exceptions!
   - page faults
     - no v2p mapping: the MMU generates a trap with a page fault error code and the
       faulting VA
     - the page fault handler "handles" the fault: find a page, add it to the page
       table, and repeat the faulting instruction

2. memory virtualization with virtual machines
   - assumption of the (guest) OS: a linear, zero-starting physical memory that it
     multiplexes over processes, with each process in turn expecting a linear,
     zero-starting memory, supported via the virtual memory abstraction.
   - physical memory cannot all be "assigned" to a single VM. why?
   - possible solutions
     1. assign a part of physical memory to each VM.
        - what about the zero-starting physical memory requirement?
        - also, the (guest) OS expects the full physical address range for its
          paging-based v2p mapping.
        - will not work.
     2. pseudo-physical memory: a logically contiguous physical memory abstraction
        for each virtual machine.
        - VMs are assigned pseudo-physical memory (zero-starting)
        - pseudo-physical memory is used to multiplex the actual physical/machine
          memory
        - guest OS paging works with pseudo-physical memory
        - two-level mapping/translation
            gva: guest virtual address
            ppa: pseudo-physical address
            mpa: machine physical address
          guest OS mapping: v2p, gva -> ppa
            (guest OS multiplexes ppa over processes: several gva)
          hypervisor mapping: p2m, ppa -> mpa
            (hypervisor multiplexes mpa over VMs: several ppa)
   - two challenges:
     (i) how to perform two translations when the MMU does a single translation
         using one page table? ... the virtualization mechanism question.
     (ii) how to multiplex memory across VMs? ... the memory management question.
          - static vs. dynamic allocation
   - the virtualization mechanisms. general idea: compose v2p and p2m, denoted v2m,
     and load the composed mapping directly into a page table for the hardware
     Memory Management Unit (MMU) to use. (a toy sketch of the composition follows
     below.)
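   - aside (not from the lecture, a minimal hypothetical sketch in C): the
     single-level arrays and frame numbers below are made-up toy values; real page
     tables are multi-level radix trees walked by the MMU. the point is only the
     composition v2m[v] = p2m[v2p[v]].

       #include <stdio.h>

       #define NPAGES  8           /* pages per (toy) address space            */
       #define INVALID (-1)        /* no mapping                               */

       static int v2p[NPAGES];     /* guest table:      gva page -> ppa frame  */
       static int p2m[NPAGES];     /* hypervisor table: ppa frame -> mpa frame */

       /* compose v2p and p2m into the v2m table that the hardware MMU
        * actually walks: v2m[v] = p2m[v2p[v]] */
       static void compose_v2m(int v2m[NPAGES])
       {
           for (int v = 0; v < NPAGES; v++)
               v2m[v] = (v2p[v] == INVALID) ? INVALID : p2m[v2p[v]];
       }

       int main(void)
       {
           int v2m[NPAGES];
           for (int i = 0; i < NPAGES; i++) { v2p[i] = INVALID; p2m[i] = i + 100; }
           v2p[0] = 3;     /* guest: virtual page 0 -> pseudo-physical frame 3 */
           compose_v2m(v2m);
           printf("vpage 0 -> mframe %d\n", v2m[0]);   /* prints 103 */
           return 0;
       }

     whenever either mapping changes (a guest page table update, or the hypervisor
     remapping a ppa), the composed v2m entry must be recomputed; keeping the
     composed table up to date is exactly what the mechanisms below arrange for.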
   (i) direct mapping (the para-virtualization approach)
       - the guest page tables store v2m mappings
       Q: how to populate the guest OS page tables with v2m mappings?
       - all page table pages of the guest are write-protected; the guest OS cannot
         directly write/update page tables.
       - all updates to page tables are via hypercalls:
         - mmu_update(ptr, value)
         - update_va_mapping(va, value)
       - the VMM can look up its p2m mapping and update the guest page table with
         the v2m mapping
       - the VM knows the current p2m mapping (it is para-virtualized) and provides
         the machine address to the hypercall for the update; the hypercall
         validates and performs the update.
       - e.g.: process P1 in the guest wants to write to virtual address X
         - the guest OS determines that X is mapped to ppa Y
         - either the VMM or the VM determines the mapping of Y to mpa Z
         - the VMM performs the page table update with Z
       - the hypervisor also stores and shares an m2p mapping so the guest OS can
         determine the usage of its ppa space; e.g., a page table walk inside the
         guest OS is over gva and ppa!

       Questions:
       1. How does the VMM know which pages are page table pages (for
          write-protection)?
          - the guest OS tells the hypervisor.
       2. How are page faults (on a gva) handled?
          - the page fault handler in the VMM checks the nature of the fault
          - no gva to ppa mapping: the VMM passes the fault to the guest OS for
            handling; the guest OS chooses a ppa and uses a hypercall to update the
            page table
          - no ppa to mpa mapping: the VMM handles the fault itself, restores the
            p2m mapping and then the v2m mapping, and restores contents from swap.

   (ii) shadow paging (for fully virtualized VMs)
        - the MMU points to a per-process/per-guest page table stored in the VMM:
          the shadow page table
        - this table stores the per-process v2m mappings
        - the guest OS is aware of neither the hypervisor nor the shadow page table
        Q: how is the shadow page table built?
        - need to identify page table pages and write-protect them.
        - all updates to page tables then trap and are handled by the hypervisor.
        - on a trap, use the gva to determine the mpa and create/update the shadow
          table with the v2m entry; also update the v2p entry in the guest table.
          e.g., to add a mapping for gva v to the page table, assuming v maps to
          ppa p:
          - determine the machine address m of v's guest page table entry and write
            the value p into it (emulating the guest's own update)
          - separately update the shadow page table with the v2m entry for v
        * how to identify page table pages without guest OS support?
          - updates to CR3 trap to the VMM (CR3 is a privileged register and the
            guest runs deprivileged).
          - on a CR3-update trap, write-protect the page pointed to by CR3; this is
            the page directory. once the page directory is protected, all updates
            to it and to the pages it points to can be identified, and those pages
            protected in turn.
        - consistency requirement: no entry in the shadow page table without a
          corresponding entry in the guest page table.

   (iii) hardware-assisted memory virtualization (EPT/NPT)
         - the MMU is virtualization aware! it knows about the two-level
           translation requirement.
         - the MMU uses two page table pointers for its walks: one for the guest's
           v2p table and one for the hypervisor's p2m table.
         - for each gva, two levels of translation in hardware: v2p and then p2m
         Q: how many page table memory accesses are required for a gva resolution?
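         - aside (a back-of-the-envelope sketch, not worked out in the lecture):
           every guest-physical address produced during the guest walk (the n table
           pointers, starting from guest CR3, plus the final data address) needs
           its own m-access nested walk, and each of the n guest levels adds one
           entry read of its own.

             #include <stdio.h>

             /* memory accesses to resolve one gva with an n-level guest page
              * table and an m-level nested (EPT/NPT) table:
              *   (n + 1) nested walks of m accesses each + n guest entry reads
              *   = n*m + n + m                                                */
             static int walk_accesses(int n, int m)
             {
                 return (n + 1) * m + n;
             }

             int main(void)
             {
                 printf("native 4-level walk:        %d accesses\n", 4);
                 printf("4-level guest, 4-level EPT: %d accesses\n",
                        walk_accesses(4, 4));   /* 24 */
                 return 0;
             }

           so a single gva resolution can touch up to 24 memory locations instead
           of 4, which is why TLBs and paging-structure caches matter even more
           with nested translation.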