Virtualization hardware extensions

15 December 2005

Author: Havard Bjerke <hbjerkeREMOVE@cern.ch>, CERN Openlab

Because of the inherent problems with virtualizing the traditional x86 architecture in hardware, Intel and AMD have developed hardware extensions to their x86 chips: Intel VT (formerly Vanderpool) and AMD Pacifica, respectively. Pacifica is to be released in early 2006, and VT in "the second half of" 2005 for Pentium 4 and Itanium 2 platforms and "first half of 2006" for Xeon and Centrino platforms. This document compares the VT and Pacifica extensions.

Comparison

The VMX root privileged mode is an extra set of protection rings reserved for the hypervisor. It ensures that guest domains cannot access the hypervisor's state, in either registers or memory. Both ISAs provide such a mode for the hypervisor, although the instructions for entering and leaving it differ. Thus, although VT's and Pacifica's ISAs are incompatible, their basic functionality is comparable.

VMX root and nonroot modes

"Pacifica is a functional superset of Vanderpool." The three most significant extra functionalities, compared to VT, are optimisations for Nested Page Tables, the Device Exclusion Vector (DEV) and a tagged TLB.

Both architectures feature a data structure that stores register state when switching between a domain and the VMM. These structures are kept in memory and can be compared to the structures used for process switching. The structure also lets the VMM specify which interrupts, exceptions and instructions are intercepted, that is, which events cause an exit to VMX root. The VT specification calls this structure the Virtual-Machine Control Structure (VMCS), while in Pacifica terms it is called the Virtual Machine Control Block (VMCB).
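The role of this structure can be sketched as follows. This is a minimal illustration, not the real VMCS or VMCB layout: the class name ControlBlock, its fields, the function guest_event and the event names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlBlock:
    """Hypothetical stand-in for a VMCS/VMCB; not the real layout."""
    guest_regs: dict = field(default_factory=dict)  # saved guest register state
    host_regs: dict = field(default_factory=dict)   # VMM state to restore on exit
    intercepts: set = field(default_factory=set)    # events that exit to the VMM


def guest_event(cb, cpu_regs, event):
    """Return 'vmm' if the event is intercepted, otherwise 'guest'.

    On an intercept, the guest registers are saved into the control
    block and the VMM's registers are loaded in their place.
    """
    if event in cb.intercepts:
        cb.guest_regs = dict(cpu_regs)  # save guest state in memory
        cpu_regs.clear()
        cpu_regs.update(cb.host_regs)   # switch to VMM state
        return "vmm"
    return "guest"                      # not intercepted: guest keeps running
```

An event that is not in the intercept set leaves the registers untouched, which is what allows most guest instructions to run at native speed.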

Memory management (MM)

Pacifica and VT both support Shadow Page Tables (SPT). Using SPT, the domains' page tables contain translations directly from the domain's virtual address space to the physical (bare metal) address space. Changes to the domains' page tables are monitored by the VMM, but reads from the tables are unmonitored.
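A toy model may make the monitored-write, unmonitored-read asymmetry concrete. It assumes one common arrangement, in which the VMM keeps a separate shadow copy alongside the table the guest believes it is editing; the class ShadowPager and all of its names are hypothetical, and pages are plain integers.

```python
class ShadowPager:
    """Toy shadow paging model; an assumption-laden sketch, not real VMM code."""

    def __init__(self, gpa_to_machine):
        self.gpa_to_machine = dict(gpa_to_machine)  # VMM's guest-physical -> machine map
        self.guest_pt = {}    # guest's view: virtual page -> guest-physical page
        self.shadow_pt = {}   # table the hardware walks: virtual page -> machine page
        self.trapped_writes = 0

    def guest_write_pte(self, vpage, gpage):
        """A page table write is intercepted so the VMM can fix up the shadow."""
        self.trapped_writes += 1
        self.guest_pt[vpage] = gpage
        self.shadow_pt[vpage] = self.gpa_to_machine[gpage]

    def guest_read_pte(self, vpage):
        """Reads are not intercepted; the guest sees its own view."""
        return self.guest_pt[vpage]

    def translate(self, vpage):
        """The hardware translates directly to a machine page via the shadow."""
        return self.shadow_pt[vpage]
```

For example, after `p = ShadowPager({1: 3})` and `p.guest_write_pte(0x10, 1)`, the guest reads back page 1 while the hardware translates virtual page 0x10 to machine page 3.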

The virtualization extensions allow for greater flexibility in intercepting these changes. Pacifica's VMCB allows the VMM to specify whether or not updates to the domains' page tables cause an entry into VMX root. Similarly, VT can control whether writes to CR3 cause an entry into VMX root: a set of legal CR3 values can be stored in the VMCS, and if the new value matches one of them, the write is not intercepted.
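The CR3 check described above amounts to a membership test. The sketch below is illustrative only; the class Cr3Filter and its method names are hypothetical, and the set stands in for the values held in the VMCS.

```python
class Cr3Filter:
    """Hypothetical model of a CR3 write intercept with a list of legal values."""

    def __init__(self, legal_values):
        self.legal_values = set(legal_values)  # stands in for the values in the VMCS
        self.exits = 0                         # number of exits to the VMM so far

    def write_cr3(self, new_value):
        """Return True if the write exits to the VMM, False if it runs unintercepted."""
        if new_value in self.legal_values:
            return False   # matches a legal value: no exit
        self.exits += 1
        return True        # unknown value: the VMM must handle it
```

A VMM could, under this model, register the page table roots it has already validated so that switching between them costs no exit.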

Using Nested Page Tables (NPT), the domains' page tables are nested on top of the VMM process' virtual address space. The NPT approach is less efficient, since each lookup in the domain's page tables may require three or four lookups in the VMM process' page table hierarchy, unless there is a hit in the TLB. With four levels in each hierarchy, this translates in the worst case into 4 x 4 = 16 memory lookups just to find the right physical address.
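The worst-case figure can be reproduced with a small counter. The sketch follows the simplified counting used in the text (every guest-level lookup costs one full walk of the VMM's hierarchy); the function name and parameters are hypothetical, and real hardware may count additional accesses.

```python
def nested_walk_lookups(guest_levels, host_levels):
    """Count memory lookups for one nested walk with no TLB hits.

    Simplified model from the text: each lookup in the guest's page
    tables requires a full walk of the VMM's page table hierarchy.
    """
    lookups = 0
    for _ in range(guest_levels):   # one step per guest page table level
        lookups += host_levels      # each step is translated by a host walk
    return lookups
```

With four levels on both sides, `nested_walk_lookups(4, 4)` gives 16; with a three-level host hierarchy it gives 12.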

As when switching between the address spaces of processes, switching between the address spaces of the VMM and the domains traditionally requires the TLB to be flushed. Like IA-64 and many RISC architectures, Pacifica has a tagged TLB, which allows TLB entries to be invalidated on a per-address-space basis, although in Pacifica's case this applies only to domains' address spaces and not to processes' address spaces. This allows switching between the address spaces of domains, including the host domain or hypervisor, without flushing the TLB. Pacifica's TLB also stores NPT translations, that is, a guest's virtual address can be translated to a physical address without first going through the host's virtual address translation, be it page tables or TLB. This can in some cases eliminate the 16 slow memory lookups and replace them with one fast TLB lookup.
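The effect of tagging can be shown with a toy TLB whose entries are keyed by an address-space tag as well as by the virtual page. The class TaggedTLB and its method names are hypothetical, and the model ignores capacity and replacement.

```python
class TaggedTLB:
    """Toy tagged TLB; entries are keyed by (address-space tag, virtual page)."""

    def __init__(self):
        self.entries = {}      # (tag, virtual page) -> physical page
        self.current_tag = 0   # tag of the address space now running

    def switch(self, tag):
        """Switch address space; nothing is flushed."""
        self.current_tag = tag

    def insert(self, vpage, ppage):
        self.entries[(self.current_tag, vpage)] = ppage

    def lookup(self, vpage):
        """Return the cached physical page, or None on a miss."""
        return self.entries.get((self.current_tag, vpage))

    def invalidate(self, tag):
        """Drop only the entries that belong to one address space."""
        self.entries = {k: v for k, v in self.entries.items() if k[0] != tag}
```

Because the tag is part of the key, two domains can cache different translations for the same virtual page, and a domain's entries are still present when it is scheduled again.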

Pacifica also has a Paged Real Mode (PRM), which virtualizes Real Mode inside Protected Mode. The classical approach is to emulate Real Mode. In Pacifica's PRM, Real Mode addresses (segment + offset) are translated via SPT to physical addresses in hardware.

Pacifica's DEV

Multiple domains cannot control the same device: a device must either be allocated to a specific domain or be virtualized by the VMM. AMD64's on-chip MMU allows more direct control over MM virtualization, which DEV takes advantage of. The DEV specifies which devices are allowed to access which pages in memory, and thereby effectively binds devices to domains. This way a domain can control a DMA device without being monitored by the VMM, since the DEV ensures that a DMA transfer can only write to the owner domain's address space.
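The check performed by the DEV can be pictured as a per-device page permission table. The sketch below is a simplified model, not the hardware format; the class DeviceExclusionVector as written here, its methods and the device names are hypothetical.

```python
class DeviceExclusionVector:
    """Toy model of DEV: which device may access which physical pages."""

    def __init__(self):
        self.allowed = set()   # (device, physical page) pairs that are permitted

    def bind(self, device, domain_pages):
        """Bind a device to a domain by allowing only that domain's pages."""
        for page in domain_pages:
            self.allowed.add((device, page))

    def dma_permitted(self, device, page):
        """Return True if a DMA access by the device to the page is allowed."""
        return (device, page) in self.allowed
```

After `dev.bind("nic0", [8, 9])`, a transfer by the hypothetical device "nic0" to page 8 is permitted, while a transfer to a page owned by another domain is refused without any involvement of the VMM.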

Analysis

Architectural improvements such as the tagged TLB may give Pacifica performance benefits. At the same time, architectural properties that are not necessarily related to virtualization, such as the capacity or speed of the TLB, may prove to be more significant for virtualization performance.

Xen 3.0 already takes advantage of the virtualization hardware extensions and can thus run unmodified OSs virtualized. Most other virtualization software, including VMware, does not yet take advantage of the extensions.

Future directions

Intel plans to release VT2 and VT3 architectures later, which will virtualize further aspects of MM and I/O. AMD will also release a Pacifica2 architecture.

References

AMD Pacifica turns the nested tables
AMD "Pacifica" Virtualization Technology
Secure Virtual Machine Architecture Reference Manual
Intel Virtualization Technology Specification for the IA-32 Intel Architecture