Wednesday, March 20, 2013

Intel's Hardware Assisted Virtualization Technology

Reference:
- Hardware Assisted Virtualization: Intel Virtualization Technology, by Matías Zabaljáuregui


Background
- Intel processors use four privilege levels (rings 0 to 3): ring 0 is the most privileged and ring 3 the least privileged (user-level programs).
- For an OS to control the CPU, it must run at privilege level 0.

- 0/1/3 model: the VMM runs at level 0, the guest VM kernel at level 1, and the guest VM user space at level 3. This is called ring deprivileging. However, ring deprivileging causes many challenges (e.g., every component, such as the page tables, must be aware of the additional level 1, whereas components usually only know about levels 0 and 3).
- Intel VT-x aims to solve these challenges by letting the guest run at its intended level (ring 0); guest software is constrained not by privilege level but by VMX non-root operation.
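
To make ring deprivileging concrete, here is a minimal Python sketch of the 0/1/3 model. It is a toy model, not real hardware behaviour: the `PRIVILEGED` set is an illustrative subset, and `execute`, `run_deprivileged_guest`, and `GeneralProtectionFault` are hypothetical names introduced only for this example.

```python
# Toy model of x86 privilege rings (an illustration, not a CPU emulator).
PRIVILEGED = {"LGDT", "LIDT", "HLT", "MOV_CR3"}  # illustrative subset only


class GeneralProtectionFault(Exception):
    """Raised when a privileged instruction runs outside ring 0."""


def execute(instruction: str, ring: int) -> str:
    """Run one instruction at the given ring; privileged ones need ring 0."""
    if not 0 <= ring <= 3:
        raise ValueError("ring must be between 0 and 3")
    if instruction in PRIVILEGED and ring != 0:
        raise GeneralProtectionFault(f"{instruction} at ring {ring}")
    return f"{instruction} executed at ring {ring}"


def run_deprivileged_guest(instruction: str) -> str:
    """0/1/3 model: the guest kernel runs at ring 1, the VMM at ring 0."""
    try:
        return execute(instruction, ring=1)
    except GeneralProtectionFault:
        # The VMM catches the fault and emulates the instruction for the guest.
        return f"{instruction} emulated by VMM"


print(run_deprivileged_guest("ADD"))
print(run_deprivileged_guest("LGDT"))
```

In this sketch, ordinary instructions run directly while every privileged instruction of the guest kernel detours through the VMM, which is the cost that VT-x is meant to avoid.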

- Drawback of privilege-based protection: higher overhead
IA-32 provides SYSENTER and SYSEXIT to support low-latency system calls. In a guest, however, executing SYSENTER/SYSEXIT causes a transition to the VMM, which must emulate every guest execution of these instructions. This is one of the motivations for Intel VT-x.

- Interrupt virtualization
The IA-32 architecture lets an OS mask and unmask external interrupts, so that interrupts are not delivered before the OS is ready for them. The VMM needs to control this masking itself and must deny the guest's attempts to change it. This mechanism can hurt performance, because an OS masks and unmasks interrupts frequently, and it complicates the design of the VMM.
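
The cost described above can be sketched with a small Python model in which every guest attempt to mask or unmask interrupts becomes a transition to the VMM. The class `InterruptMaskingVMM` and all of its attributes are hypothetical names for illustration; a real VMM is considerably more involved.

```python
class InterruptMaskingVMM:
    """Hypothetical VMM that keeps a virtual interrupt flag for one guest."""

    def __init__(self):
        self.virtual_if = True   # the guest's view of its interrupt flag
        self.pending = []        # interrupts held back while the guest masks
        self.delivered = []      # interrupts handed to the guest
        self.transitions = 0     # guest -> VMM transitions caused by masking

    def guest_cli(self):
        """Guest tries to mask interrupts: trapped and emulated by the VMM."""
        self.transitions += 1
        self.virtual_if = False

    def guest_sti(self):
        """Guest tries to unmask interrupts: trapped, then pending ones flow."""
        self.transitions += 1
        self.virtual_if = True
        while self.pending:
            self.delivered.append(self.pending.pop(0))

    def external_interrupt(self, vector: int):
        """Deliver an interrupt now, or queue it while the guest masks."""
        if self.virtual_if:
            self.delivered.append(vector)
        else:
            self.pending.append(vector)


vmm = InterruptMaskingVMM()
for _ in range(1000):            # 1000 short critical sections in the guest
    vmm.guest_cli()
    vmm.external_interrupt(0x20)
    vmm.guest_sti()
print(vmm.transitions)           # 2000
```

Each critical section costs two transitions in this model, which is why frequent masking and unmasking adds up quickly.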

- Address-space compression
The VMM must reserve part of each guest's virtual address space for its own control structures (these include the IDT and GDT). Guest accesses to the IDT or GDT cause transitions to the VMM, which then handles them.

Given the drawbacks described above, the two software solutions currently in use are explained below.

Paravirtualization vs. binary translation
- Paravirtualization: source-level modification of the guest OS, as in Xen. However, it does not support Microsoft Windows.
- Binary translation: modifying the guest OS binaries directly, as in VMware and Virtual PC. It supports a broader range of OSes but has higher overhead.
* VT-x is designed to remove the need for binary translation and to let a VMM support more operating systems.
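
As a rough illustration of the binary-translation idea, the Python sketch below rewrites a block of instruction mnemonics so that sensitive ones become calls into the VMM. The `SENSITIVE` set is an illustrative subset, and the `vmm_emulate_*` targets and the `binary_translate` function are hypothetical; real translators operate on machine code, not on strings.

```python
SENSITIVE = {"CLI", "STI", "POPF"}  # illustrative subset of sensitive instructions


def binary_translate(block):
    """Replace sensitive instructions with calls to hypothetical VMM helpers."""
    translated = []
    for op in block:
        if op in SENSITIVE:
            translated.append(f"CALL vmm_emulate_{op.lower()}")
        else:
            translated.append(op)  # harmless instructions are copied unchanged
    return translated


guest_block = ["MOV", "CLI", "ADD", "STI"]
print(binary_translate(guest_block))
```

Paravirtualization reaches a similar result by editing the guest's source code so that it calls the VMM explicitly, which is why it needs access to the OS source.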


Virtual Machine eXtension (VMX)
The VMM runs in VMX root operation and the guest OS runs in VMX non-root operation. A transition into VMX non-root operation is called a VM entry, and a transition back to VMX root operation is called a VM exit.

- VMX non-root: although the guest runs in ring 0, VMX non-root operation places restrictions on it so that guest software remains under the control of the VMM, which runs in VMX root operation.


- The VMM executes VMXON to enter VMX root operation.
- The VMM puts guest software into a VM through a VM entry (the VMLAUNCH or VMRESUME instruction). The VMM regains control on a VM exit.
- On a VM exit, the VMM can take appropriate action after reading the cause of the exit from the VMCS.
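
The steps above can be sketched as a small state machine in Python. This is only a model of the transitions, under the assumption that a single VMCS is in use; the class `LogicalProcessor`, the `Mode` enum, and the string exit reasons are hypothetical (real exit reasons are numeric codes stored in the VMCS).

```python
from enum import Enum


class Mode(Enum):
    NO_VMX = "outside VMX operation"
    ROOT = "VMX root"
    NON_ROOT = "VMX non-root"


class LogicalProcessor:
    """Toy model of VMXON, VM entry (VMLAUNCH/VMRESUME) and VM exit."""

    def __init__(self):
        self.mode = Mode.NO_VMX
        self.launched = False     # launch state of the (single) VMCS
        self.exit_reason = None   # stands in for the exit reason in the VMCS

    def vmxon(self):
        if self.mode is not Mode.NO_VMX:
            raise RuntimeError("already in VMX operation")
        self.mode = Mode.ROOT

    def vmlaunch(self):
        self._vm_entry(first_entry=True)

    def vmresume(self):
        self._vm_entry(first_entry=False)

    def _vm_entry(self, first_entry: bool):
        if self.mode is not Mode.ROOT:
            raise RuntimeError("VM entry is only possible from VMX root")
        if first_entry == self.launched:
            # VMLAUNCH needs a not-yet-launched VMCS, VMRESUME a launched one.
            raise RuntimeError("wrong instruction for this launch state")
        self.launched = True
        self.exit_reason = None
        self.mode = Mode.NON_ROOT

    def vm_exit(self, reason: str):
        if self.mode is not Mode.NON_ROOT:
            raise RuntimeError("VM exit is only possible from VMX non-root")
        self.exit_reason = reason
        self.mode = Mode.ROOT


cpu = LogicalProcessor()
cpu.vmxon()                 # VMM enters VMX root operation
cpu.vmlaunch()              # first VM entry: guest runs in VMX non-root
cpu.vm_exit("HLT")          # guest did something the VMM must handle
print(cpu.mode.value, "-", cpu.exit_reason)
cpu.vmresume()              # VMM handled the exit and re-enters the guest
```

The VMM's main loop is essentially this cycle: enter the guest, wait for a VM exit, read the reason, handle it, and resume.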


Virtual Machine Control Structure (VMCS)
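Each logical CPU has a corresponding VMCS region; the VMCS is the bridge through which the host and the VM communicate.
On a VM exit, the host can learn the reason for the exit from the VMCS.
On a VM entry (for example, VMRESUME), the host can also use the VMCS to inject events, such as interrupts and exceptions.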

- Each logical processor is associated with a VMCS region in memory. Software makes a VMCS active by executing VMPTRLD.
- A VMCS region consists of a header (identifier and abort indicator) and the VMCS data.
- The VMCS data includes:
1. Guest-state area
2. Host-state area
3. VM-execution control fields
4. VM-exit control fields
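
The layout listed above can be pictured with a short Python sketch. It models only the header and the four groups named in this post; the field names, the dictionary contents, and the `LogicalCpu.vmptrld` method are hypothetical and do not reflect the real binary format of a VMCS.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Toy VMCS: a header plus the four data groups listed above."""
    revision_id: int                      # header: identifier
    abort_indicator: int = 0              # header: abort indicator
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    execution_controls: dict = field(default_factory=dict)
    exit_controls: dict = field(default_factory=dict)


class LogicalCpu:
    """Each logical processor tracks which VMCS it is currently using."""

    def __init__(self):
        self.current_vmcs = None

    def vmptrld(self, vmcs: Vmcs):
        """Model of VMPTRLD: make the given VMCS the one in use on this CPU."""
        self.current_vmcs = vmcs


vmcs = Vmcs(revision_id=1)
vmcs.guest_state["RIP"] = 0x1000          # where the guest resumes (example value)
vmcs.host_state["RIP"] = 0x8000           # where the VMM resumes on a VM exit
vmcs.execution_controls["hlt_exiting"] = True

cpu0 = LogicalCpu()
cpu0.vmptrld(vmcs)
print(cpu0.current_vmcs.guest_state)
```

Keeping guest and host state side by side is what lets the processor swap between the two on every VM entry and VM exit.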

x86 instructions

x86 instruction listings: http://en.wikipedia.org/wiki/X86_instruction_listings
STI: Set interrupt flag
IRET: Return from interrupt




