In a Xen/x86 system, only the hypervisor runs with full processor privileges (ring 0 in the x86 four-ring model). It has full access to the physical memory available in the system and is responsible for allocating portions of it to running domains.
On a 32-bit x86 system, guest operating systems may use rings 1, 2 and 3 as they see fit. Segmentation is used to prevent the guest OS from accessing the portion of the address space that is reserved for Xen. We expect most guest operating systems will use ring 1 for their own operation and place applications in ring 3.
On 64-bit systems it is not possible to protect the hypervisor from untrusted guest code running in rings 1 and 2. Guests are therefore restricted to run in ring 3 only. The guest kernel is protected from its applications by context switching between the kernel and the currently running application.
The following topics are discussed: CPU state, exception and interrupt handling, and time.
1. CPU State
All privileged state must be handled by Xen.
The guest OS has no direct access to CR3 and is not permitted to update privileged bits in EFLAGS.
Guest OSes use hypercalls to invoke operations in Xen; these are analogous to system calls but occur from ring 1 to ring 0.
2. Exceptions
A virtual IDT is provided — a domain can submit a table of trap handlers to Xen via the set_trap_table hypercall. The exception stack frame presented to a virtual trap handler is identical to its native equivalent.
3. Interrupts and Events
Interrupts are virtualized by mapping them to event channels, which are delivered asynchronously to the target domain using a callback supplied via the set_callbacks hypercall.
A guest OS can map these events onto its standard interrupt dispatch mechanisms.
Xen is responsible for determining the target domain that will handle each physical interrupt source.
4. Time
Guest operating systems need to be aware of the passage of both real (or wallclock) time and their own ‘virtual time’ (the time for which they have been executing).
Furthermore, Xen has a notion of time which is used for scheduling. The following notions of time are provided:
Cycle counter time. This provides a fine-grained time reference. The cycle counter time is used to accurately extrapolate the other time references. On SMP machines it is currently assumed that the cycle counter time is synchronized between CPUs. The current x86-based implementation achieves this within inter-CPU communication latencies.
System time. This is a 64-bit counter which holds the number of nanoseconds that have elapsed since system boot.
Wall clock time. This is the time of day in a Unix-style struct timeval (seconds and microseconds since 1 January 1970, adjusted by leap seconds). An NTP client hosted by domain 0 can keep this value accurate.
Domain virtual time. This progresses at the same pace as system time, but only while a domain is executing — it stops while a domain is de-scheduled. Therefore the share of the CPU that a domain receives is indicated by the rate at which its virtual time increases.
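The relationship between virtual time and CPU share can be made concrete with a small calculation. The following is a minimal sketch, assuming the guest has sampled both its virtual time and the system time (in nanoseconds) at the start and end of an interval; the function name and parameters are hypothetical and not part of any Xen interface.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate the CPU share (in percent) a domain
 * received over an interval, as the ratio of elapsed virtual time to
 * elapsed system time. All four arguments are nanosecond timestamps.
 */
uint32_t cpu_share_percent(uint64_t virt_start, uint64_t virt_end,
                           uint64_t sys_start, uint64_t sys_end)
{
    assert(sys_end > sys_start);    /* interval must be non-empty */
    assert(virt_end >= virt_start); /* virtual time never runs backwards */

    uint64_t virt_elapsed = virt_end - virt_start;
    uint64_t sys_elapsed = sys_end - sys_start;

    /* Virtual time only advances while the domain is executing, so
     * this ratio is the fraction of the interval spent on a CPU. */
    return (uint32_t)(virt_elapsed * 100 / sys_elapsed);
}
```

For example, a domain whose virtual time advanced by 250 ns while system time advanced by 1000 ns received roughly a quarter of a CPU over that interval.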
Xen exports timestamps for system time and wall-clock time to guest operating systems through a shared page of memory. Xen also provides the cycle counter time at the instant the timestamps were calculated, and the CPU frequency in Hertz.
This allows the guest to extrapolate system and wall-clock times accurately based on the current cycle counter time.
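The extrapolation described above can be sketched as follows. This is only an illustration under stated assumptions: the struct and its field names are hypothetical stand-ins for the values Xen publishes (a system time stamp, the cycle counter value at that instant, and the CPU frequency in Hertz), not the actual layout of the shared page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the published values; names are illustrative. */
struct time_snapshot {
    uint64_t system_time_ns; /* system time when the snapshot was taken */
    uint64_t tsc_stamp;      /* cycle counter value at that same instant */
    uint64_t cpu_hz;         /* CPU frequency in Hertz */
};

/*
 * Extrapolate the current system time from a fresh cycle counter
 * reading: add the nanoseconds that correspond to the cycles elapsed
 * since the snapshot.
 */
uint64_t extrapolate_system_time(const struct time_snapshot *s,
                                 uint64_t tsc_now)
{
    assert(s->cpu_hz != 0);

    uint64_t delta = tsc_now - s->tsc_stamp;

    /* Split into whole seconds and a remainder so that the
     * multiplication by 1e9 does not overflow for large deltas. */
    uint64_t secs = delta / s->cpu_hz;
    uint64_t rem = delta % s->cpu_hz;

    return s->system_time_ns
         + secs * 1000000000ULL
         + rem * 1000000000ULL / s->cpu_hz;
}
```

With a 2 GHz CPU, a reading 1000 cycles after the snapshot yields a system time 500 ns later than the published time stamp. The same approach would apply to wall-clock time, starting from the wall-clock time stamp instead.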
Since all time stamps need to be updated and read atomically, a version number is also stored in the shared info page and is incremented before and after the timestamps are updated. A guest reads the version number before and after reading the timestamps, and can be sure that it read a consistent state by checking that the two version numbers are equal and even.
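The version-number protocol can be sketched as below. This is a simplified model, not the real shared info page: the struct layout, field names, and both functions are assumptions made for illustration, and the writer function only plays the role that Xen has in the real system. The memory fences are likewise an assumption about what a careful implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the time fields in a shared page. */
struct shared_time {
    volatile uint32_t version;        /* odd while an update is in progress */
    volatile uint64_t system_time_ns;
    volatile uint64_t tsc_stamp;
};

/* Private copy taken by the reader. */
struct time_copy {
    uint64_t system_time_ns;
    uint64_t tsc_stamp;
};

/* Writer side (Xen's role): bump the version before and after. */
void publish_time(struct shared_time *p, uint64_t sys_ns, uint64_t tsc)
{
    p->version++;                     /* now odd: update in progress */
    atomic_thread_fence(memory_order_release);
    p->system_time_ns = sys_ns;
    p->tsc_stamp = tsc;
    atomic_thread_fence(memory_order_release);
    p->version++;                     /* now even: state is consistent */
}

/* Reader side: returns 1 if a consistent copy was taken, 0 otherwise. */
int try_read_time(const struct shared_time *p, struct time_copy *out)
{
    assert(p != NULL && out != NULL);

    uint32_t before = p->version;
    if (before & 1)
        return 0;                     /* update in progress, try again */
    atomic_thread_fence(memory_order_acquire);

    out->system_time_ns = p->system_time_ns;
    out->tsc_stamp = p->tsc_stamp;

    atomic_thread_fence(memory_order_acquire);
    uint32_t after = p->version;

    return before == after;           /* equal and even: consistent */
}
```

A guest would typically call the reader in a loop, retrying until it returns 1, and only then use the copied values.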
Xen includes a periodic ticker which sends a timer event to the currently executing domain every 10ms.
The Xen scheduler also sends a timer event whenever a domain is scheduled; this allows the guest OS to adjust for the time that has passed while it has been inactive.
In addition, Xen allows each domain to request that it receive a timer event at a specified system time by using the set_timer_op hypercall. Guest OSes may use this timer to implement timeout values when they block.
5. Xen CPU Scheduling
Xen offers a uniform API for CPU schedulers. It is possible to choose from a number of schedulers at boot and it should be easy to add more. The SEDF and Credit schedulers are part of the normal Xen distribution. SEDF will be going away and its use should be avoided once the credit scheduler has stabilized and become the default.
The Credit scheduler provides proportional fair shares of the host’s CPUs to the running domains. It does this while transparently load balancing runnable VCPUs across the whole system.
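To illustrate what a proportional share means, the sketch below computes each domain's entitlement from a per-domain weight. The use of weights is an assumption made here for illustration (the text above does not describe how shares are configured), and the function is hypothetical; it does not model load balancing or the scheduler's internal accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given per-domain weights, return domain i's
 * proportional share of the host's CPU capacity in parts per thousand.
 */
uint32_t cpu_share_permille(const uint32_t *weights, size_t n, size_t i)
{
    assert(weights != NULL);
    assert(i < n);

    uint64_t total = 0;
    for (size_t k = 0; k < n; k++)
        total += weights[k];
    assert(total != 0);               /* at least one non-zero weight */

    /* A domain's share is its weight relative to the sum of weights. */
    return (uint32_t)((uint64_t)weights[i] * 1000 / total);
}
```

With weights of 256, 256 and 512, the third domain is entitled to half of the available CPU time and the other two to a quarter each.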
Note (SMP host support): Xen has always supported SMP host systems. When using the Credit scheduler, a domain’s VCPUs will be dynamically moved across physical CPUs to maximise domain and system throughput. VCPUs can also be manually restricted to a subset of the host’s physical CPUs, using the pinning mechanism.
6. Privileged Operations
Xen exports an extended interface to privileged domains (viz. Domain 0). This allows such domains to build and boot other domains on the server, and provides control interfaces for managing scheduling, memory, networking, and block devices.