分类: 虚拟化
2009-03-22 14:43:34
The core of lguest is the lg loadable module. At initialization time, this module allocates a chunk of memory and maps it into the kernel's address space just above the vmalloc area - at the top, in other words. A small hypervisor is loaded into this area; it's a bit of assembly code which mainly concerns itself with switching between the kernel and the virtualized guest. Switching involves playing with the page tables - what looks like virtual memory to the host kernel is physical memory to the guest - and managing register contents.
The hypervisor will be present in the guest systems' virtual address spaces as well. Allowing a guest to modify the hypervisor would be bad news, however, as that would enable the guest to escape its virtual sandbox. Since the guest kernel will run in ring 1, normal i386 page protection won't keep it from messing with the hypervisor code. So, instead, the venerable segmentation mechanism is used to keep that code out of reach.
The lg module also implements the basics for a virtualized I/O subsystem. At the lowest level, there is a "DMA" mechanism which really just copies memory between buffers. A DMA buffer can be bound to a given address; an attempt to perform DMA to that address then copies the memory into the buffer. The DMA areas can be in memory which is shared between guests, in which case the data will be copied from one guest to another and the receiving guest will get an interrupt; this is how inter-guest networking is implemented. If no shared DMA area is found, DMA transfers are, instead, referred to the user-space hypervisor (described below) for execution. Simple disk and console drivers exist as well.
Finally, the lg module implements a controlling interface accessed via /proc/lguest - a feature which might just have to be changed before lguest goes into the mainline. The user-space hypervisor creates a guest by writing an "initialize" command to this file, specifying the memory range to use, where to find the kernel, etc. This interface can also be used to receive and execute DMA operations and send interrupts to the guest system. Interestingly, the way to actually cause the guest to run is to read from the control file; execution will continue until the guest blocks on something requiring user-space attention.
Also on the kernel side is a implementation for working with the lguest hypervisor; it must be built into any kernel which will be run as a guest. At system initialization time, this code looks for a special signature left by the hypervisor at guest startup; if the signature is present, it means the kernel is running under lguest. In that situation, the lguest-specific paravirt_ops will be installed, enabling the kernel to run properly as a guest.
The last component of the system is the user-mode hypervisor client. Its job is to allocate a range of memory which will become the guest's "physical" memory; the guest's kernel image is then mapped into that memory range. The client code itself has been specially linked to sit high in the virtual address space, leaving room for the guest system below. Once that guest system is in place, the user-mode client performs its read on the control file, causing the guest to boot.
A file on the host system can become a disk image for the guest, with the user-mode client handling the "DMA" requests to move blocks back and forth. Network devices can be set up to perform communication between guests. The lg network driver can also work in a loopback mode, connecting an internal network device to a TAP device configured on the host; in this way, guests can bind to ports and run servers.
With sufficient imagination, how all of this comes together can be seen in the diagram to the right. The lguest client starts the process, running in user space on the host. It allocates the memory indicated by the blue box, which is to become the guest's virtualized physical memory, then maps in the guest kernel. Once the user-mode client reads from /proc/lguest, the page tables and segment descriptors are tweaked to make the blue box seem like the entire system, and control is passed to the guest kernel. The guest can request some services via the kernel-space hypervisor code; for everything else, control is returned to the user-mode client.
That is a fairly complete description of what lguest can do. There is no Xen-style live migration, no UML-style copy-on-write disk devices, no resource usage management beyond what the kernel already provides, etc. As Rusty put it at linux.conf.au, lguest eschews fancy features in favor of cute pictures of puppies. The simplicity of this code is certainly one of its most attractive qualities; it is easy to understand and to play with. It should have a rather easier path into the kernel than some of the other hypervisor implementations out there. Whether it can stay simple once people start trying to do real work with it remains to be seen.