Wednesday, March 20, 2013

Interrupt Virtualization

Reference:
- Hardware Assisted Virtualization: Intel Virtualization Technology, by Matías Zabaljáuregui


VMX support for handling interrupts



External interrupt virtualization



Example of handling external interrupts:

1. Guest setup:
The VMM must first configure the guest so that an external interrupt causes a VM exit
(set the "external-interrupt exiting" bit in the VMCS).
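
As a rough illustration of step 1, the sketch below models the two control bits involved. The bit positions (bit 0 of the pin-based VM-execution controls for "external-interrupt exiting", bit 15 of the VM-exit controls for "acknowledge interrupt on exit") follow the Intel SDM. The function name and the plain integers standing in for VMCS fields are assumptions made for illustration; a real VMM would write these fields with VMWRITE after consulting the VMX capability MSRs.

```python
from typing import Tuple

# Bit positions as defined in the Intel SDM (VMX control fields).
PIN_EXTERNAL_INTERRUPT_EXITING = 1 << 0   # pin-based VM-execution controls, bit 0
EXIT_ACK_INTERRUPT_ON_EXIT = 1 << 15      # VM-exit controls, bit 15


def configure_interrupt_exiting(pin_controls: int, exit_controls: int,
                                ack_on_exit: bool) -> Tuple[int, int]:
    """Return updated (pin_controls, exit_controls) values.

    Hypothetical helper: the integers only stand in for the VMCS fields.
    """
    # External interrupts arriving in the guest now cause a VM exit.
    pin_controls |= PIN_EXTERNAL_INTERRUPT_EXITING
    # Optionally let the CPU acknowledge the interrupt and record its vector.
    if ack_on_exit:
        exit_controls |= EXIT_ACK_INTERRUPT_ON_EXIT
    else:
        exit_controls &= ~EXIT_ACK_INTERRUPT_ON_EXIT
    return pin_controls, exit_controls


pin, exit_ctl = configure_interrupt_exiting(0, 0, ack_on_exit=True)
print(hex(pin), hex(exit_ctl))
```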

2. How the CPU handles the external interrupt
On the VM exit, interrupts are automatically masked by clearing RFLAGS.IF.
If the VMM uses the acknowledge-interrupt-on-exit feature, the processor acknowledges the interrupt, retrieves the host vector, and saves it in the VM-exit interruption-information field before transferring control to the VMM.
The current guest state is likewise saved into the VMCS before the VMM takes over.
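
To make the saved information concrete, here is a minimal sketch of how a VMM might decode the VM-exit interruption-information field. The layout used (vector in bits 7:0, interruption type in bits 10:8 with 0 meaning external interrupt, valid flag in bit 31) follows the Intel SDM. The class and function names are hypothetical, and the raw value is hard-coded instead of being read with VMREAD.

```python
from typing import NamedTuple

INTR_TYPE_EXTERNAL = 0  # interruption type 0 = external interrupt


class ExitInterruptInfo(NamedTuple):
    valid: bool      # bit 31: the field holds meaningful data
    vector: int      # bits 7:0: host vector of the interrupt
    intr_type: int   # bits 10:8: interruption type


def decode_exit_interrupt_info(field: int) -> ExitInterruptInfo:
    """Split a raw 32-bit VM-exit interruption-information value."""
    return ExitInterruptInfo(
        valid=bool((field >> 31) & 1),
        vector=field & 0xFF,
        intr_type=(field >> 8) & 0x7,
    )


# Example raw value: valid bit set, external interrupt, host vector 0x30.
info = decode_exit_interrupt_info(0x80000030)
if info.valid and info.intr_type == INTR_TYPE_EXTERNAL:
    print("external interrupt, host vector", hex(info.vector))
```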

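3. The VMM handles the interrupt
As described above, if acknowledge-interrupt-on-exit is set, the VMM can use the host vector directly to call the corresponding interrupt handler. At this point the handling has no direct connection to the VM.
If it is not set, the VMM must re-enable interrupts (by setting RFLAGS.IF) to allow vectoring of external interrupts through the monitor/host IDT. Two cases then need to be considered: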

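[a] Host-owned I/O devices
If the device belongs to the VMM, the corresponding ISR is called, just as for an ordinary interrupt service routine. Once the ISR finishes, however, the VMM checks whether this interrupt requires any virtual interrupts to be generated (for example, after receiving a packet the VMM may need to forward it to a VM's virtual NIC).
In that case the VMM injects a virtual external interrupt event for each "affected virtual device".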

[b] Direct pass-through I/O devices
If the device belongs to a VM, the interrupt is handled by the ISR of the driver inside that VM.
- The interrupt causes a VM exit to the VMM and is vectored through the host IDT to a registered handler (presumably a handler dedicated to pass-through devices).
- The VMM maps the host vector to the corresponding guest vector and injects a virtual interrupt into the assigned VM.
- The guest software performs an EOI write to the virtual interrupt controller.
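
The host-vector to guest-vector mapping in case [b] could be sketched as a simple lookup table. Everything here (the table contents, the VM names, and the function name) is hypothetical and only illustrates the idea; the reference does not prescribe a data structure.

```python
from typing import Dict, Optional, Tuple

# Hypothetical routing table: host vector -> (assigned VM, guest vector).
PASSTHROUGH_ROUTES: Dict[int, Tuple[str, int]] = {
    0x41: ("vm1", 0x31),
    0x42: ("vm2", 0x51),
}


def route_passthrough_interrupt(
    host_vector: int,
    routes: Dict[int, Tuple[str, int]] = PASSTHROUGH_ROUTES,
) -> Optional[Tuple[str, int]]:
    """Return (vm, guest_vector) for a pass-through device.

    Returns None when the vector is not assigned to any VM, in which
    case the VMM would treat it as a host-owned device (case [a]).
    """
    return routes.get(host_vector)


target = route_passthrough_interrupt(0x41)
if target is not None:
    vm, guest_vector = target
    print("queue virtual interrupt", hex(guest_vector), "for", vm)
```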

How is a virtual interrupt injected?


4. Generating a virtual interrupt
[a] First, check the processor interruptibility state.
[b] If the processor is "not interruptible", the VMM uses the "interrupt-window exiting" feature: as soon as the processor becomes interruptible, a VM exit notifies the VMM.
[c] Check the state of the virtual interrupt controller
- Is the Local APIC in use? Is the interrupt routed through the local vector table (LVT)? Does the I/O APIC mask the virtual interrupt?
[d] Priority:
Because virtual interrupts are queued in the VMM and delivered on VM entry, the VMM can implement its own priority scheme.
[e] Update the virtual interrupt controller state
"When the above checks have passed, before generating the virtual interrupt to the guest, the VMM updates the virtual interrupt controller state (Local-APIC, IO-APIC and/or PIC) to reflect assertion of the virtual interrupt."
[f] Inject the virtual interrupt on VM entry
The VMM generates the virtual interrupt by setting the VM-entry interruption-information field in the VMCS.
On VM entry, the processor vectors through the guest IDT to the corresponding handler, which completes the handling of the interrupt.
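
Steps [a], [b], [d] and [f] above can be sketched as follows. The bit positions (RFLAGS.IF at bit 9, "blocking by STI" and "blocking by MOV SS" at bits 0 and 1 of the guest interruptibility state, "interrupt-window exiting" at bit 2 of the primary processor-based controls, and the valid bit 31 of the VM-entry interruption-information field) follow the Intel SDM. The dictionary standing in for the VMCS, its key names, the function names, and the "highest vector first" policy are assumptions for illustration. The virtual interrupt controller checks of steps [c] and [e] are left out.

```python
from typing import Dict, List, Optional

RFLAGS_IF = 1 << 9                      # RFLAGS interrupt-enable flag
BLOCKING_BY_STI = 1 << 0                # guest interruptibility state, bit 0
BLOCKING_BY_MOV_SS = 1 << 1             # guest interruptibility state, bit 1
CPU_INTERRUPT_WINDOW_EXITING = 1 << 2   # primary processor-based controls, bit 2
ENTRY_INFO_VALID = 1 << 31              # VM-entry interruption-information, bit 31
INTR_TYPE_EXTERNAL = 0                  # bits 10:8, 0 = external interrupt


def guest_is_interruptible(rflags: int, interruptibility: int) -> bool:
    """Step [a]: IF must be set and no STI / MOV SS blocking active."""
    blocked = interruptibility & (BLOCKING_BY_STI | BLOCKING_BY_MOV_SS)
    return bool(rflags & RFLAGS_IF) and not blocked


def prepare_vm_entry(vmcs: Dict[str, int], pending: List[int]) -> Optional[int]:
    """Decide what to inject on the next VM entry; mutates vmcs and pending.

    Returns the injected guest vector, or None if nothing was injected.
    """
    vector = None
    if pending and guest_is_interruptible(vmcs["guest_rflags"],
                                          vmcs["guest_interruptibility"]):
        # Step [d]: assumed policy, deliver the highest vector first.
        vector = max(pending)
        pending.remove(vector)
        # Step [f]: program the injection for the coming VM entry.
        vmcs["entry_intr_info"] = (ENTRY_INFO_VALID
                                   | (INTR_TYPE_EXTERNAL << 8) | vector)
    # Step [b]: while interrupts remain queued, ask for a VM exit as soon
    # as the guest can accept one; otherwise stop asking.
    if pending:
        vmcs["proc_controls"] |= CPU_INTERRUPT_WINDOW_EXITING
    else:
        vmcs["proc_controls"] &= ~CPU_INTERRUPT_WINDOW_EXITING
    return vector


vmcs = {"guest_rflags": 0x2, "guest_interruptibility": 0,
        "proc_controls": 0, "entry_intr_info": 0}
queue = [0x31]
print(prepare_vm_entry(vmcs, queue))   # guest has IF clear, nothing injected
vmcs["guest_rflags"] |= RFLAGS_IF      # guest re-enabled interrupts
print(prepare_vm_entry(vmcs, queue))   # the queued vector is injected now
```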



Intel's Hardware Assisted Virtualization Technology

Reference:
- Hardware Assisted Virtualization: Intel Virtualization Technology, by Matías Zabaljáuregui


Background
- Intel processors use 4 privilege levels (rings 0 to 3): 0 is the most privileged and 3 the least privileged (user-level programs).
- For an OS to control the CPU, it must run at privilege level 0.

- 0/1/3 model: the VMM runs at level 0, the guest kernel at level 1, and guest user space at level 3. This is called ring deprivileging. However, ring deprivileging raises many challenges (e.g., every component, such as the page tables, must cope with the additional level 1, while most are only aware of levels 0 and 3).
- Intel VT-x aims to solve these challenges by letting the guest run at its intended level (ring 0); guest software is constrained not by privilege level but by VMX non-root operation.

- Drawback of privilege-based protection: higher overhead
IA-32 provides SYSENTER and SYSEXIT to support low-latency system calls. In a deprivileged guest, however, every execution of SYSENTER/SYSEXIT transitions to the VMM, which must emulate it. This is one of the motivations for Intel VT-x.

- Interrupt Virtualization
The IA-32 architecture lets an OS mask and unmask external interrupts, so that interrupts are not delivered before the OS is ready for them. The VMM needs to keep control of this masking and deny the guest's attempts to change it. Because an OS masks and unmasks interrupts frequently, intercepting every attempt can hurt performance and complicates the design of the VMM.

- Address-space compression
The VMM must control some portion of the guest's virtual address space for its control structures (these include the IDT and GDT). Guest attempts to access the IDT or GDT generate transitions to the VMM, which then does the further handling.

Given the drawbacks described above,
the two solutions currently in use are explained below.

Paravirtualization vs. binary translation
- Paravirtualization: source-level modification of the guest OS, as in Xen. However, this does not support MS Windows systems.
- Binary translation: modifying the guest-OS binaries directly, as in VMware and Virtual PC. This supports a broader range of OSes but has higher overhead.
* VT-x is designed so that binary translation is no longer needed and a VMM can support more operating systems.


Virtual Machine eXtension (VMX)
The VMM runs in VMX root operation and the guest OS runs in VMX non-root operation. Transitions into VMX non-root operation are called VM entries, while transitions back to VMX root operation are called VM exits.

- VMX non-root: although the guest runs at ring 0, VMX non-root operation places restrictions on it so that guest software remains under the control of the VMM, which runs in VMX root operation.


- The VMM executes VMXON to enter VMX root operation.
- The VMM puts guest software into a VM through VM entries (the VMLAUNCH and VMRESUME instructions). The VMM regains control on a VM exit.
- On a VM exit, the VMM can take appropriate action by reading the cause of the exit from the VMCS.
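
The VMXON / VM entry / VM exit cycle can be modeled as a small dispatch loop. This is only a simulation under stated assumptions: the list of exit reasons stands in for what the hardware would report, the function names are hypothetical, and VMCS setup (for example VMCLEAR and VMPTRLD) is omitted. The basic exit reason numbers (1 for external interrupt, 7 for interrupt window, 12 for HLT) and their position in bits 15:0 of the exit-reason field follow the Intel SDM.

```python
from typing import Iterable, List

# Basic exit reasons as listed in the Intel SDM.
EXIT_REASON_EXTERNAL_INTERRUPT = 1
EXIT_REASON_INTERRUPT_WINDOW = 7
EXIT_REASON_HLT = 12


def basic_exit_reason(exit_reason_field: int) -> int:
    """Bits 15:0 of the exit-reason field hold the basic exit reason."""
    return exit_reason_field & 0xFFFF


def run_guest(exit_reasons: Iterable[int]) -> List[str]:
    """Simulate the VMM loop and return a log of what happened.

    exit_reasons is a stand-in for the raw exit-reason values that the
    hardware would store in the VMCS on each VM exit.
    """
    log = ["VMXON"]
    entry = "VMLAUNCH"            # the first entry with a VMCS uses VMLAUNCH
    for raw in exit_reasons:
        log.append(entry)
        entry = "VMRESUME"        # later entries use VMRESUME
        reason = basic_exit_reason(raw)
        if reason == EXIT_REASON_EXTERNAL_INTERRUPT:
            log.append("handle external interrupt")
        elif reason == EXIT_REASON_INTERRUPT_WINDOW:
            log.append("inject pending interrupt")
        elif reason == EXIT_REASON_HLT:
            log.append("guest halted")
            break
        else:
            log.append("unhandled exit %d" % reason)
    return log


for line in run_guest([1, 7, 12]):
    print(line)
```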


Virtual Machine Control Structure (VMCS)
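Each logical CPU has a corresponding VMCS region; the VMCS is the bridge through which the host and the VM communicate.
On a VM exit, the host can learn the reason for the exit from the VMCS.
On a VM entry (VMLAUNCH or VMRESUME), the host can likewise use the VMCS to pass in events such as interrupts and exceptions.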

- Each logical processor is associated with a VMCS region in memory. Software makes a VMCS active by executing VMPTRLD.
- A VMCS region consists of a header (the VMCS revision identifier and the VMX-abort indicator) followed by the VMCS data.
- The VMCS data includes:
1. Guest-state area
2. Host-state area
3. VM-execution control fields
4. VM-exit control fields
5. VM-entry control fields
6. VM-exit information fields
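
Software does not access the VMCS data by offset but through field encodings passed to VMREAD and VMWRITE. As a small sketch, the function below decodes such an encoding into its parts. The bit layout (bit 0 access type, bits 9:1 index, bits 11:10 field type, bits 14:13 width) and the sample encodings follow the Intel SDM; the function name and the dictionary output format are assumptions for illustration.

```python
from typing import Dict, Union

WIDTHS = {0: "16-bit", 1: "64-bit", 2: "32-bit", 3: "natural-width"}
TYPES = {0: "control", 1: "VM-exit information",
         2: "guest state", 3: "host state"}


def decode_vmcs_encoding(encoding: int) -> Dict[str, Union[str, int, bool]]:
    """Split a VMCS field encoding into width, type, index and access type."""
    return {
        "width": WIDTHS[(encoding >> 13) & 0x3],   # bits 14:13
        "type": TYPES[(encoding >> 10) & 0x3],     # bits 11:10
        "index": (encoding >> 1) & 0x1FF,          # bits 9:1
        "high_access": bool(encoding & 1),         # bit 0
    }


# Sample encodings from the Intel SDM.
SAMPLES = {
    "pin-based VM-execution controls": 0x4000,
    "VM-entry interruption-information": 0x4016,
    "exit reason": 0x4402,
    "guest RFLAGS": 0x6820,
}

for name, enc in SAMPLES.items():
    print(name, hex(enc), decode_vmcs_encoding(enc))
```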

x86 instruction

x86 instruction listings: http://en.wikipedia.org/wiki/X86_instruction_listings
STI: Set interrupt flag
IRET: Return from interrupt