CS695 Topics in Virtualization and Cloud Computing          Spring 2019
Lecture 7  25.1.2019
Lecture 8  30.1.2019
Lecture 9  01.2.2019
--------------------

0. Exercise #2 due 30th January 2019
   https://www.cse.iitb.ac.in/~puru/
   https://www.cse.iitb.ac.in/~puru/courses/spring19/cs695/exercises/ex2.html

   references:
   - [memvirt-survey]
   - [vmware] [xen] [kvm]
   - [x86virt] [hw-sw-mem]

1. Recap: Linux digression
   - processes and memory management basics
     - task_struct, mm_struct
     - process memory layouts: virtual address space and physical address space
     - "every process has the kernel in it!"
   * Homework: look inside task_struct and mm_struct and understand the important
     fields.
   * Homework: how do kernel threads work with no address space? (mm_struct == NULL)
   * Homework: what does the 'current' macro do? What is its return value? How is it
     implemented?
   - kernel threads (ps -ef): special processes
     - have a task_struct and hence are schedulable
     - mm_struct == NULL (no address space of their own)
   - the protected mode of CPU operation (as opposed to real mode) enables support
     for virtual address spaces via paging (and MMU-based hardware translation)
   - PTE flags: the low 12 bits of a PTE
     - PAGE_PRESENT, PAGE_PROTNONE, PAGE_RW, PAGE_USER, PAGE_DIRTY, PAGE_ACCESSED ...
   - Note: on a virtual address issue, the MMU in hardware attempts a translation
     and access. Software intervention is possible only on exceptions!
   - page faults
     - no v2p mapping: the MMU generates a trap with a page fault error code and the
       faulting VA
     - the page fault handler "handles" the fault: find a page, add it to the page
       table, and repeat the faulting instruction

2. memory virtualization with virtual machines
   - assumption of the (guest) OS: a linear, zero-starting physical memory that it
     multiplexes over processes, with each process in turn expecting a linear,
     zero-starting memory, supported via the virtual memory abstraction.
   - physical memory cannot all be "assigned" to a single VM. why?
   - possible solutions
     1. assign a part of physical memory to each VM.
        - what about the zero-starting physical memory requirement?
        - also, the (guest) OS expects the full physical address range for its
          paging-based v2p mapping.
        - will not work.
     2. pseudo-physical memory: a logically contiguous physical memory abstraction
        for each virtual machine.
        - VMs are assigned pseudo-physical memory (zero-starting)
        - pseudo-physical memory is used to multiplex the actual physical/machine
          memory
        - guest OS paging works with pseudo-physical memory
        - two-level mapping/translation
            gva: guest virtual address
            ppa: pseudo-physical address
            mpa: machine physical address
          guest OS mapping: v2p, gva -> ppa
            (guest OS multiplexes ppa over processes: several gva)
          hypervisor mapping: p2m, ppa -> mpa
            (hypervisor multiplexes mpa over VMs: several ppa)
   - two challenges:
     (i) how to perform two translations when the MMU does a single translation
         using one page table? ... the virtualization mechanism question.
     (ii) how to multiplex memory across VMs? ... the memory management question.
          - static vs. dynamic allocation
   - the virtualization mechanisms. general idea: compose v2p and p2m, denoted v2m,
     and load the composed mapping directly into a page table for the hardware
     Memory Management Unit (MMU) to use. (a toy sketch of the composition follows
     below.)
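   - aside (not from the lecture, a minimal hypothetical sketch in C): the
     single-level arrays and frame numbers below are made-up toy values; real page
     tables are multi-level radix trees walked by the MMU. the point is only the
     composition v2m[v] = p2m[v2p[v]].

       #include <stdio.h>

       #define NPAGES  8           /* pages per (toy) address space            */
       #define INVALID (-1)        /* no mapping                               */

       static int v2p[NPAGES];     /* guest table:      gva page -> ppa frame  */
       static int p2m[NPAGES];     /* hypervisor table: ppa frame -> mpa frame */

       /* compose v2p and p2m into the v2m table that the hardware MMU
        * actually walks: v2m[v] = p2m[v2p[v]] */
       static void compose_v2m(int v2m[NPAGES])
       {
           for (int v = 0; v < NPAGES; v++)
               v2m[v] = (v2p[v] == INVALID) ? INVALID : p2m[v2p[v]];
       }

       int main(void)
       {
           int v2m[NPAGES];
           for (int i = 0; i < NPAGES; i++) { v2p[i] = INVALID; p2m[i] = i + 100; }
           v2p[0] = 3;     /* guest: virtual page 0 -> pseudo-physical frame 3 */
           compose_v2m(v2m);
           printf("vpage 0 -> mframe %d\n", v2m[0]);   /* prints 103 */
           return 0;
       }

     whenever either mapping changes (a guest page table update, or the hypervisor
     remapping a ppa), the composed v2m entry must be recomputed; keeping the
     composed table up to date is exactly what the mechanisms below arrange for.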
   (i) direct mapping (the para-virtualization approach)
       - the guest page tables store v2m mappings
       Q: how to populate the guest OS page tables with v2m mappings?
       - all page table pages of the guest are write-protected; the guest OS cannot
         directly write/update page tables.
       - all updates to page tables are via hypercalls:
         - mmu_update(ptr, value)
         - update_va_mapping(va, value)
       - the VMM can look up its p2m mapping and update the guest page table with
         the v2m mapping
       - the VM knows the current p2m mapping (it is para-virtualized) and provides
         the machine address to the hypercall for the update; the hypercall
         validates and performs the update.
       - e.g.: process P1 in the guest wants to write to virtual address X
         - the guest OS determines that X is mapped to ppa Y
         - either the VMM or the VM determines the mapping of Y to mpa Z
         - the VMM performs the page table update with Z
       - the hypervisor also stores and shares an m2p mapping so the guest OS can
         determine the usage of its ppa space; e.g., a page table walk inside the
         guest OS is over gva and ppa!

       Questions:
       1. How does the VMM know which pages are page table pages (for
          write-protection)?
          - the guest OS tells the hypervisor.
       2. How are page faults (on a gva) handled?
          - the page fault handler in the VMM checks the nature of the fault
          - no gva to ppa mapping: the VMM passes the fault to the guest OS for
            handling; the guest OS chooses a ppa and uses a hypercall to update the
            page table
          - no ppa to mpa mapping: the VMM handles the fault itself, restores the
            p2m mapping and then the v2m mapping, and restores contents from swap.

   (ii) shadow paging (for fully virtualized VMs)
        - the MMU points to a per-process/per-guest page table stored in the VMM:
          the shadow page table
        - this table stores the per-process v2m mappings
        - the guest OS is aware of neither the hypervisor nor the shadow page table
        Q: how is the shadow page table built?
        - need to identify page table pages and write-protect them.
        - all updates to page tables then trap and are handled by the hypervisor.
        - on a trap, use the gva to determine the mpa and create/update the shadow
          table with the v2m entry; also update the v2p entry in the guest table.
          e.g., to add a mapping for gva v to the page table, assuming v maps to
          ppa p:
          - determine the machine address m of v's guest page table entry and write
            the value p into it (emulating the guest's own update)
          - separately update the shadow page table with the v2m entry for v
        * how to identify page table pages without guest OS support?
          - updates to CR3 trap to the VMM (CR3 is a privileged register and the
            guest runs deprivileged).
          - on a CR3-update trap, write-protect the page pointed to by CR3; this is
            the page directory. once the page directory is protected, all updates
            to it and to the pages it points to can be identified, and those pages
            protected in turn.
        - consistency requirement: no entry in the shadow page table without a
          corresponding entry in the guest page table.

   (iii) hardware-assisted memory virtualization (EPT/NPT)
         - the MMU is virtualization aware! it knows about the two-level
           translation requirement.
         - the MMU uses two page table pointers for its walks: one for the guest's
           v2p table and one for the hypervisor's p2m table.
         - for each gva, two levels of translation in hardware: v2p and then p2m
         Q: how many page table memory accesses are required for a gva resolution?
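         - aside (a back-of-the-envelope sketch, not worked out in the lecture):
           every guest-physical address produced during the guest walk (the n table
           pointers, starting from guest CR3, plus the final data address) needs
           its own m-access nested walk, and each of the n guest levels adds one
           entry read of its own.

             #include <stdio.h>

             /* memory accesses to resolve one gva with an n-level guest page
              * table and an m-level nested (EPT/NPT) table:
              *   (n + 1) nested walks of m accesses each + n guest entry reads
              *   = n*m + n + m                                                */
             static int walk_accesses(int n, int m)
             {
                 return (n + 1) * m + n;
             }

             int main(void)
             {
                 printf("native 4-level walk:        %d accesses\n", 4);
                 printf("4-level guest, 4-level EPT: %d accesses\n",
                        walk_accesses(4, 4));   /* 24 */
                 return 0;
             }

           so a single gva resolution can touch up to 24 memory locations instead
           of 4, which is why TLBs and paging-structure caches matter even more
           with nested translation.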