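In x86-64's new "long mode", many mechanisms that were introduced in IA-32 but seldom used by operating systems are no longer supported.
These include segment-based address translation (FS and GS are retained), hardware task switching through task gates and the TSS, and virtual-8086 mode.
For backward compatibility, x86-64 still keeps these mechanisms in "legacy mode".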
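As a result, x86-64 in practice distinguishes only ring 0 from ring 3 (as far as page protection is concerned, rings 1 and 2 have the same privileges as ring 0). To protect Xen, which occupies ring 0, the guest kernel and the guest applications both run in ring 3.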
Para-virtualization: Xen requires guest operating systems to be ported to the Xen interface.
Full virtualization: Xen must give the guest operating system the illusion of a complete virtual platform, so that what the guest sees inside the virtual machine behaves the same as a standard PC/server platform.
Part of Xen Interface is provided by hypercalls. The other part is provided via the data structures available to domains.
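Unlike x86-64 Linux running natively, 64-bit XenLinux is already running in 64-bit mode (with paging enabled) when it initializes. The virtual address range pre-mapped for the guest kernel at initialization is small, so guests need to extend or create new translations as necessary.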
Xen is responsible for managing the allocation of physical memory to domains, and the guest physical memory is virtualized as "pseudo-physical memory".
On a real system, the E820 BIOS call typically reports the memory map,
but on Xen guests the equivalent information is provided simply by the "start info page" (start_info.nr_pages).
The pointer to the start info page is passed in the register %rsi, set by Xen (for domain 0) or by the domain builder (otherwise).
Each domain is supplied with a physical-to-machine table; start_info.mfn_list points to this table, which is indexed by pseudo-physical page number.
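The relationship between pseudo-physical and machine frames can be sketched as a plain table lookup. The following is a minimal Python model, not Xen code: the `P2MTable` class and its method names are hypothetical, and the frame numbers are made up for illustration. Only `nr_pages` and `mfn_list` correspond to fields named above.

```python
PAGE_SHIFT = 12  # 4 KiB pages on x86-64


class P2MTable:
    """Toy model of a domain's physical-to-machine table (hypothetical helper)."""

    def __init__(self, mfn_list):
        # mfn_list[pfn] is the machine frame backing pseudo-physical frame pfn.
        self.mfn_list = list(mfn_list)
        # Plays the role of start_info.nr_pages in this model.
        self.nr_pages = len(self.mfn_list)

    def pfn_to_mfn(self, pfn):
        if not 0 <= pfn < self.nr_pages:
            raise IndexError(f"pfn {pfn} is outside the domain's {self.nr_pages} pages")
        return self.mfn_list[pfn]

    def phys_to_machine(self, paddr):
        """Translate a pseudo-physical address to a machine address."""
        pfn = paddr >> PAGE_SHIFT
        offset = paddr & ((1 << PAGE_SHIFT) - 1)
        return (self.pfn_to_mfn(pfn) << PAGE_SHIFT) | offset


# The guest sees contiguous frames 0..3; the machine frames are scattered.
p2m = P2MTable([0x1A2B, 0x0042, 0x7F00, 0x1A2C])
machine = p2m.phys_to_machine(0x1234)  # pfn 1, offset 0x234
```

The guest keeps working with a contiguous pseudo-physical range, and only the table knows which machine frames actually back it.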
SWAPGS is intended for use with fast system calls when in 64-bit mode to allow immediate access to kernel structures on transition to kernel mode.
Native x86-64 Linux uses the PDA (a per-processor data structure) to maintain critical data such as
the pointer to the current process,
the top of the kernel stack for the current process,
the user %rsp for system calls,
and TLB state.
The register %gs points to this area while in kernel mode, and the SWAPGS instruction is executed when the processor enters or exits kernel mode.
SWAPGS is only available at privilege level 0, so it cannot be executed at privilege level 1 or 2 either. Although the instruction has to be removed when paravirtualizing, we want to avoid changing the way the kernel uses the PDA without good reason. This also justifies the design of running the guest kernel at privilege level 3.
To protect the guest kernel from its applications, there are two options:
1. Have two separate PML4 pages, one for the kernel and one for a user process. The kernel's page holds translations for both the kernel and the user; the user's page holds translations for the user only.
2. Have a single PML4 page for both the kernel and a user process. When we switch to user mode, we remove the translations for the kernel. When we switch back to kernel mode, we restore the kernel translations.
Since Xen must be OS-agnostic and user processes can require some kernel translations (such as vsyscall), the first option is the cleaner one. The current implementation uses the first option.
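The first option can be sketched with a toy model in which a PML4 page is a dictionary from PML4 index to a next-level table. This is an illustrative Python sketch under simplifying assumptions, not the Xen or XenLinux implementation: `build_pml4_pair`, `is_mapped`, and the table labels are hypothetical names, and the example addresses are arbitrary.

```python
def pml4_index(vaddr):
    """Bits 47..39 of a virtual address select one of 512 PML4 entries."""
    return (vaddr >> 39) & 0x1FF


def build_pml4_pair(user_entries, kernel_entries):
    """Return (kernel_pml4, user_pml4) for one process.

    The kernel table carries both user and kernel translations;
    the user table carries only the user translations.
    """
    kernel_pml4 = {**user_entries, **kernel_entries}
    user_pml4 = dict(user_entries)
    return kernel_pml4, user_pml4


def is_mapped(pml4, vaddr):
    """True if the top-level table has an entry covering vaddr."""
    return pml4_index(vaddr) in pml4


USER_VADDR = 0x0000000000400000    # arbitrary user-space address
KERNEL_VADDR = 0xFFFF880000001000  # arbitrary guest-kernel address

kernel_pml4, user_pml4 = build_pml4_pair(
    user_entries={pml4_index(USER_VADDR): "user-level-3-table"},
    kernel_entries={pml4_index(KERNEL_VADDR): "kernel-level-3-table"},
)
```

While the user table is active, kernel addresses have no translation at all, so an application cannot reach kernel memory even though both run in ring 3.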
The figure shows the address map of x86-64 Xen. As it shows, the kernel and user address spaces are separated by Xen. This is much like native x86-64 Linux, but the native page offset is 0xffff810000000000, which is below the first address available to the guest, 0xffff880000000000. Thus, the page offset of x86-64 XenLinux is set to 0xffff880000000000.
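The effect of moving the page offset can be checked with a little arithmetic. The sketch below is a simplified Python model of a direct-map conversion in the spirit of the kernel's `__va`/`__pa` macros; the function names `to_virt` and `to_phys` are hypothetical, and "physical" here means pseudo-physical in the guest.

```python
NATIVE_PAGE_OFFSET = 0xFFFF810000000000  # native x86-64 Linux, per the text
GUEST_FIRST_ADDR = 0xFFFF880000000000    # first address available to the guest
XEN_PAGE_OFFSET = GUEST_FIRST_ADDR       # page offset chosen for XenLinux


def to_virt(paddr, page_offset=XEN_PAGE_OFFSET):
    """Map a (pseudo-)physical address into the kernel's direct mapping."""
    return page_offset + paddr


def to_phys(vaddr, page_offset=XEN_PAGE_OFFSET):
    """Inverse of to_virt; rejects addresses below the direct mapping."""
    if vaddr < page_offset:
        raise ValueError(f"{vaddr:#x} is below the page offset {page_offset:#x}")
    return vaddr - page_offset


# The native offset would fall outside the range the guest may use.
native_offset_usable = NATIVE_PAGE_OFFSET >= GUEST_FIRST_ADDR
```

With the native offset, the start of the direct mapping would land below the first guest-usable address, which is why the offset is moved up.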
Xen needs to intercept system calls and bounce them back to the guest kernel. The SYSCALL and SYSRET instructions are designed for operating systems that use a flat memory model (segmentation is not used), and x86-64 Linux uses these.
SYSCALL is, however, intended for use by user code running at privilege level 3 to access operating system or executive procedures running at privilege level 0. This implies that x86-64 XenLinux cannot directly receive system calls from user processes.
Despite this extra overhead, the framework allows Xen hypercalls to be handled in the same fashion, and hypercalls from the kernel are handled in the optimal fashion.
1. Xen must be aware of which mode the guest is running in, kernel or user.
2. SWAPGS is done by Xen so that the guest kernel can access the PDA correctly without major modifications.
3. The guest requests Xen to switch to user mode via a hypercall.
4. The guest can modify GS.base via a hypercall.
When a system call completes and the guest needs to return from kernel mode to user mode, 64-bit XenLinux issues the iret hypercall, which results in toggle_guest_mode switching from kernel mode to user mode in Xen.
toggle_guest_mode is also called when entering kernel mode from user mode, for example upon an exception or external interrupt in user mode, so that the kernel can handle the event notified by Xen.
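The mode switch described here can be summarized in a small state model. This is a hedged Python sketch of the idea only, not Xen's code: `VcpuModel`, `iret_to_user`, and `deliver_event` are hypothetical names, the addresses are arbitrary, and the real toggle_guest_mode does more than this model shows. It assumes the two-PML4 design described earlier and models SWAPGS as exchanging the active GS base with an inactive one.

```python
from dataclasses import dataclass


@dataclass
class VcpuModel:
    """Toy per-vCPU state tracked by the hypervisor (hypothetical)."""
    kernel_mode: bool
    gs_base: int         # GS base currently in effect
    shadow_gs_base: int  # the inactive GS base that SWAPGS would bring in
    kernel_pml4: str
    user_pml4: str

    @property
    def active_pml4(self):
        return self.kernel_pml4 if self.kernel_mode else self.user_pml4


def toggle_guest_mode(v):
    """Flip kernel/user mode, swap GS bases, and pick the matching PML4."""
    v.kernel_mode = not v.kernel_mode
    v.gs_base, v.shadow_gs_base = v.shadow_gs_base, v.gs_base  # SWAPGS done by Xen
    return v.active_pml4


def iret_to_user(v):
    """Model of the guest kernel asking Xen to return to user mode."""
    if not v.kernel_mode:
        raise RuntimeError("only the guest kernel may request a return to user mode")
    return toggle_guest_mode(v)


def deliver_event(v):
    """Model of an exception or interrupt: make sure the kernel handles it."""
    if not v.kernel_mode:
        toggle_guest_mode(v)
    return v.active_pml4


PDA_BASE = 0xFFFF880000100000  # arbitrary kernel GS base (points at the PDA)
USER_GS = 0x00007F0000001000   # arbitrary user GS base

vcpu = VcpuModel(kernel_mode=True, gs_base=PDA_BASE, shadow_gs_base=USER_GS,
                 kernel_pml4="kernel-pml4", user_pml4="user-pml4")
```

Because Xen performs the swap on every transition, the guest kernel always finds %gs pointing at its PDA when it runs, without executing SWAPGS itself.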
The context-switch code, especially switch_mm (used for switching the address space), is simple and is done efficiently with a multicall.
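Why a multicall helps can be illustrated by counting guest-to-hypervisor transitions. The Python model below is only a sketch: `HypervisorModel` and the operation names `load_kernel_pml4` and `load_user_pml4` are hypothetical stand-ins, chosen on the assumption that an address-space switch needs more than one hypervisor operation under the two-PML4 design.

```python
class HypervisorModel:
    """Toy hypervisor that counts how often the guest traps into it."""

    def __init__(self):
        self.transitions = 0
        self.log = []

    def hypercall(self, op, *args):
        """One operation per transition."""
        self.transitions += 1
        self.log.append((op, args))

    def multicall(self, calls):
        """Many operations carried by a single transition."""
        self.transitions += 1
        for op, args in calls:
            self.log.append((op, args))


def switch_mm_unbatched(xen, next_mm):
    xen.hypercall("load_kernel_pml4", next_mm["kernel_pml4"])
    xen.hypercall("load_user_pml4", next_mm["user_pml4"])


def switch_mm_batched(xen, next_mm):
    xen.multicall([
        ("load_kernel_pml4", (next_mm["kernel_pml4"],)),
        ("load_user_pml4", (next_mm["user_pml4"],)),
    ])


next_mm = {"kernel_pml4": "kernel-pml4-B", "user_pml4": "user-pml4-B"}
slow, fast = HypervisorModel(), HypervisorModel()
switch_mm_unbatched(slow, next_mm)
switch_mm_batched(fast, next_mm)
```

Both variants perform the same operations in the same order; the batched one simply enters the hypervisor once instead of once per operation.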