Category: LINUX

2010-05-03 23:22:41

Note: the io.c code discussed here has since been removed.

 

http://lists.xensource.com/archives/html/xen-devel/2006-06/msg00166.html

[Xen-devel] [PATCH 1/9] Xen Share: Simplified I/O Mechanism, Rusty Russell, 2006/06/05

I. Overall model

From lguest.txt:

Lguest I/O model:
Lguest uses a simplified DMA model plus shared memory for I/O.  Guests can communicate with each other if they share underlying memory (usually by the lguest program mmaping the same file), but they can use any non-shared memory to communicate with the lguest process.
Guests can register DMA buffers at any key (must be a valid physical address) using the LHCALL_BIND_DMA(key, dmabufs, num<<8|irq) hypercall.  "dmabufs" is the physical address of an array of "num" "struct lguest_dma": each contains a used_len, and an array of physical addresses and lengths.  When a transfer occurs, the "used_len" field of one of the buffers which has used_len 0 will be set to the length transferred and the irq will fire.
Using an irq value of 0 unbinds the dma buffers.
To send DMA, the LHCALL_SEND_DMA(key, dma_physaddr) hypercall is used, and the bytes used is written to the used_len field.  This can be 0 if noone else has bound a DMA buffer to that key or some other error. DMA buffers bound by the same guest are ignored.
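The receive rule quoted above (a transfer lands in one of the bound buffers whose used_len is 0, and that field is then set to the transferred length) can be sketched as follows. This is only an illustration, not the io.c implementation: struct sketch_dma and sketch_deliver are hypothetical names, the structure is reduced to the single field the rule needs, and firing the irq is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for "struct lguest_dma": only the field this
 * sketch needs. The real structure also carries addr/len arrays. */
struct sketch_dma {
    uint32_t used_len;   /* 0 means the buffer is free */
};

/* Mark the first free buffer as holding `len` bytes.
 * Returns its index, or -1 if every buffer is still in use. */
static int sketch_deliver(struct sketch_dma *bufs, size_t num, uint32_t len)
{
    assert(len != 0);    /* a used_len of 0 would look free again */
    for (size_t i = 0; i < num; i++) {
        if (bufs[i].used_len == 0) {
            bufs[i].used_len = len;
            return (int)i;
        }
    }
    return -1;
}
```

The receiver hands a buffer back by resetting used_len to 0 once it has consumed the data, which is why a zero-length delivery is rejected here.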

 

From lguest_launcher.h:

/*D:200
 * Lguest I/O
 *
 * The lguest I/O mechanism is the only way Guests can talk to devices.  There
 * are two hypercalls involved: SEND_DMA for output and BIND_DMA for input.  In
 * each case, "struct lguest_dma" describes the buffer: this contains 16
 * addr/len pairs, and if there are fewer buffer elements the len array is
 * terminated with a 0.
 *
 * I/O is organized by keys: BIND_DMA attaches buffers to a particular key, and
 * SEND_DMA transfers to buffers bound to particular key.  By convention, keys
 * correspond to a physical address within the device's page.  This means that
 * devices will never accidentally end up with the same keys, and allows the
 * Host use The Futex Trick (as we'll see later in our journey).
 *
 * SEND_DMA simply indicates a key to send to, and the physical address of the
 * "struct lguest_dma" to send.  The Host will write the number of bytes
 * transferred into the "struct lguest_dma"'s used_len member.
 *
 * BIND_DMA indicates a key to bind to, a pointer to an array of "struct
 * lguest_dma"s ready for receiving, the size of that array, and an interrupt
 * to trigger when data is received.  The Host will only allow transfers into
 * buffers with a used_len of zero: it then sets used_len to the number of
 * bytes transferred and triggers the interrupt for the Guest to process the
 * new input. */

struct lguest_dma
{
 /* 0 if free to be used, filled by the Host. */
 u32 used_len;
 unsigned long addr[LGUEST_MAX_DMA_SECTIONS];
 u16 len[LGUEST_MAX_DMA_SECTIONS];
};
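To make the "terminated with a 0" convention concrete, here is a small sketch that adds up how many bytes one descriptor can hold. The names sketch_lguest_dma and sketch_dma_capacity are hypothetical, and the value 16 for the section count is taken from the comment above ("16 addr/len pairs") rather than from the real LGUEST_MAX_DMA_SECTIONS definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The comment above says the structure holds 16 addr/len pairs. */
#define SKETCH_MAX_DMA_SECTIONS 16

/* User-space mirror of "struct lguest_dma" using fixed-width types. */
struct sketch_lguest_dma {
    uint32_t used_len;
    unsigned long addr[SKETCH_MAX_DMA_SECTIONS];
    uint16_t len[SKETCH_MAX_DMA_SECTIONS];
};

/* Sum the section lengths, stopping at the first 0 length
 * (or after all 16 sections when none is 0). */
static size_t sketch_dma_capacity(const struct sketch_lguest_dma *dma)
{
    size_t total = 0;

    assert(dma != NULL);
    for (size_t i = 0; i < SKETCH_MAX_DMA_SECTIONS && dma->len[i] != 0; i++)
        total += dma->len[i];
    return total;
}
```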

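(1) Each virtual device generally embeds one or more lguest_dma structures.

(2) Each guest (virtual machine) contains an array of lguest_dma_info; the array size is the maximum number of DMA bindings the guest supports.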

struct lguest_dma_info dma[LGUEST_MAX_DMA];

struct lguest_dma_info
{
 struct list_head list;
 union futex_key key;
 unsigned long dmas; /* start address of the lguest_dma array */
 u16 next_dma;
 u16 num_dmas;  /* number of entries in the lguest_dma array */
 u16 guestid;
 u8 interrupt;  /* 0 when not registered */
};

(3) So that guests can share memory, every active lguest_dma_info must be linked into the corresponding hash table.

static struct list_head dma_hash[61];

 

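bind_dma does two things:

(1) It registers the lguest_dma array in the guest's lguest_dma_info array.

(2) It links the entry into dma_hash, so that the lguest_dma_info objects of other guests sharing the same memory can be found quickly.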
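The two steps can be sketched in user-space C as below. This is a simplified model under stated assumptions, not the kernel code: every sketch_ name is hypothetical, a plain unsigned long stands in for union futex_key, a singly linked chain replaces struct list_head, and the per-guest limit of 4 is arbitrary. Only the bucket count of 61 and the rule that a guest's own bindings are ignored on lookup come from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HASH_BUCKETS 61   /* same bucket count as dma_hash[61] */
#define SKETCH_MAX_DMA 4         /* hypothetical per-guest limit */

struct sketch_dma_info {
    struct sketch_dma_info *next; /* bucket chain */
    unsigned long key;            /* stand-in for union futex_key */
    unsigned guestid;
    unsigned char interrupt;      /* 0 when not registered */
};

struct sketch_guest {
    unsigned guestid;
    struct sketch_dma_info dma[SKETCH_MAX_DMA];
};

static struct sketch_dma_info *sketch_hash[SKETCH_HASH_BUCKETS];

/* Step (1): claim a free slot in the guest's own array.
 * Step (2): link that slot into the global hash.
 * Returns 0 on success, -1 if the guest has no free slot. */
static int sketch_bind(struct sketch_guest *g, unsigned long key,
                       unsigned char irq)
{
    assert(irq != 0);             /* irq 0 means "unbind" in lguest.txt */
    for (size_t i = 0; i < SKETCH_MAX_DMA; i++) {
        struct sketch_dma_info *d = &g->dma[i];

        if (d->interrupt != 0)
            continue;             /* slot already registered */
        d->key = key;
        d->guestid = g->guestid;
        d->interrupt = irq;
        d->next = sketch_hash[key % SKETCH_HASH_BUCKETS];
        sketch_hash[key % SKETCH_HASH_BUCKETS] = d;
        return 0;
    }
    return -1;
}

/* Find a binding for `key` owned by a guest other than `sender`. */
static struct sketch_dma_info *sketch_find(unsigned long key, unsigned sender)
{
    struct sketch_dma_info *d = sketch_hash[key % SKETCH_HASH_BUCKETS];

    for (; d != NULL; d = d->next)
        if (d->key == key && d->guestid != sender)
            return d;
    return NULL;
}
```

Keeping the per-guest array and the global hash separate means a guest's limit is enforced locally, while a sender only has to walk one bucket to find a peer bound to the same key.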

 

 

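II. Asynchronous requests from a virtual device to the guest (interrupt-like)

1. setup_waker creates a waker process. The waker listens on the input side of every virtual device (in practice, on each opened fd) using the select call in wake_parent. When a virtual device produces input, the waker issues an LHREQ_BREAK request:

write(lguest_fd, args, sizeof(args));

The waker thereby enters the hypervisor, which calls break_guest_out() and runs the following code: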

if (on) {
  lg->break_out = 1;
  /* Pop it out (may be running on different CPU) */
  wake_up_process(lg->tsk);
  /* Wait for them to reset it */
  return wait_event_interruptible(lg->break_wq, !lg->break_out);

}
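The waiting half of step 1 (blocking in select until some device fd has input) can be sketched as below. The function name sketch_wait_for_input is hypothetical and the sketch stops where the real waker would issue the LHREQ_BREAK write, so it runs without /dev/lguest.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until one of `fds` is readable and return that fd; -1 on error.
 * In the real Waker the next step would be the LHREQ_BREAK write
 * shown above; this sketch only covers the waiting part. */
static int sketch_wait_for_input(const int *fds, size_t nfds)
{
    fd_set rfds;
    int maxfd = -1;

    FD_ZERO(&rfds);
    for (size_t i = 0; i < nfds; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    /* No timeout: sleep until at least one fd has data. */
    if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
        return -1;
    for (size_t i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rfds))
            return fds[i];
    return -1;
}
```

A pipe makes a convenient stand-in for a device fd when trying this out: write one byte to the write end and the function returns the read end.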

 

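2. While the guest OS is running, various events (a real hardware interrupt, a hypercall, and so on) return control to the hypervisor's run_guest() function, which performs this check: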

/* If Waker set break_out, return to Launcher. */
  if (lg->break_out)
   return -EAGAIN;

 

3. Control returns to user space (the launcher), where run_guest calls handle_input to service the input of each virtual device and then releases the waker process.

/* Service input, then unset the BREAK which releases
   * the Waker. */
  handle_input(lguest_fd, device_list);
  if (write(lguest_fd, args, sizeof(args)) < 0)

 

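handle_input dispatches to the per-device handlers (for example handle_console_input and handle_tun_input).

They call get_dma_buffer to obtain the virtual device's DMA buffer, read the device's input into that buffer, and then call trigger_irq, which sets the corresponding bit in lg->irqs_pending.

 

After handle_input completes, the launcher re-enters the hypervisor with the following call: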

 

readval = read(lguest_fd, arr, sizeof(arr));

4. The hypervisor runs maybe_do_interrupt to inject the interrupt into the guest OS, then resumes the guest.

 

 

III. Synchronous requests from the guest to a virtual device

 

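Take the virtual block device as an example.

1. When the guest OS accesses the block device (the driver code is in lguest_blk.c), it issues LHCALL_SEND_DMA and traps into the hypervisor. The core of the hypervisor is the run_guest function shown below: do_hypercalls handles the SEND_DMA hypercall, whose handler is send_dma. That function contains the following statement: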

lg->dma_is_pending = 1;

 

Because lg->dma_is_pending is now set, run_guest switches back to the launcher:

int run_guest(struct lguest *lg, unsigned long __user *user)
{
   /* First we run any hypercalls the Guest wants done: either in
   * the hypercall ring in "struct lguest_data", or directly by
   * using int 31 (LGUEST_TRAP_ENTRY). */
  do_hypercalls(lg);
  /* It's possible the Guest did a SEND_DMA hypercall to the
   * Launcher, in which case we return from the read() now. */
  if (lg->dma_is_pending) {
   if (put_user(lg->pending_dma, user) ||
       put_user(lg->pending_key, user+1))
    return -EFAULT;
   return sizeof(unsigned long)*2;
  }

 

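2. Before step 1, the launcher was running run_guest(int lguest_fd, struct device_list *device_list). Its read of /dev/lguest is what runs the guest OS, and the launcher blocks in that read. The read handler for /dev/lguest contains the following code: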

/* If we returned from read() last time because the Guest sent DMA,
  * clear the flag. */
 if (lg->dma_is_pending)
  lg->dma_is_pending = 0;

 /* Run the Guest until something interesting happens. */
 return run_guest(lg, (unsigned long __user *)user);

 

Control now returns to the launcher's run_guest(int lguest_fd, struct device_list *device_list), which contains the following code:

 

if (readval == sizeof(arr)) {
   handle_output(lguest_fd, arr[0], arr[1], device_list);
   continue;

As the code shows, once handle_output has run, the loop continues and execution returns to the guest OS.
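Putting sections II and III together, the launcher's loop has to tell two outcomes of read() apart: a full pair of values means the guest sent DMA (handle_output), while a failure with EAGAIN means the waker broke the guest out (handle_input). The sketch below captures only that decision. The enum and function names are hypothetical, and treating every other result as a failure is an assumption of this sketch, not something the excerpts above state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

enum sketch_action {
    SKETCH_HANDLE_OUTPUT,  /* Guest sent DMA: arr[] holds two values */
    SKETCH_HANDLE_INPUT,   /* Waker broke us out: service device input */
    SKETCH_FAIL            /* anything else (assumption of this sketch) */
};

/* Decide what the launcher loop should do after
 *   readval = read(lguest_fd, arr, sizeof(arr));
 * `err` is the errno value observed right after the read,
 * `arr_size` is sizeof(arr). */
static enum sketch_action sketch_classify(ssize_t readval, int err,
                                          size_t arr_size)
{
    assert(arr_size > 0);
    if (readval >= 0 && (size_t)readval == arr_size)
        return SKETCH_HANDLE_OUTPUT;
    if (readval < 0 && err == EAGAIN)
        return SKETCH_HANDLE_INPUT;
    return SKETCH_FAIL;
}
```

The kernel side returns sizeof(unsigned long)*2 in the DMA case and -EAGAIN in the break-out case, which user space observes as a return value of -1 with errno set to EAGAIN.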

 

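3. For handle_console_output and handle_tun_output, handle_output is very simple: it is just a write. For the block device, handle_block_output is more involved.

handle_block_output performs the DMA operation (which may be a read or a write) and then raises an interrupt with trigger_irq.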

 

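IV. Network devices, part 1: sharenet (now obsolete), which implements inter-guest communication

The documentation is poor, and it is not clear how to use it.

Its replacement:

[RFC PATCH 5/5] lguest: Inter-guest networking

See the setup_net_file and dma_transfer functions.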

 

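Main flow:

Guests that want to talk to each other use the same net file. The file is one page in size, and each guest occupies one struct lguest_net within that page (its position is called a slot). setup_net_file creates an mmap of the net file,

with the following lines: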

dev->mem = (void *)(dev->desc->pfn * getpagesize());

if (mmap(dev->mem, getpagesize(), PROT_READ|PROT_WRITE,    MAP_FIXED|MAP_SHARED, netfd, 0) != dev->mem)
   err(1, "could not mmap '%s'", filename);
 

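This guarantees that the net file devices of all guests actually refer to the same page (the same file). Each guest registers a MAC address in that page. lguestnet_open contains the following code: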

 

  /* Copy our MAC address into the device page, so others on the network
  * can find us. */
 memcpy(info->peer[info->me].mac, dev->dev_addr, ETH_ALEN);
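The effect of mapping the same file MAP_SHARED and publishing a MAC address in a slot can be tried out in user space with the sketch below. All sketch_ names are hypothetical, struct sketch_net is reduced to the MAC field, and unlike setup_net_file the mapping address is left to the kernel (no MAP_FIXED), since the fixed address only matters for placing the page inside guest memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SKETCH_ETH_ALEN 6

/* One slot in the shared page; reduced to the MAC address. */
struct sketch_net {
    unsigned char mac[SKETCH_ETH_ALEN];
};

/* Map the first page of `fd` shared and read-write; NULL on failure.
 * Two calls on the same file yield two views of the same page. */
static struct sketch_net *sketch_map_net_page(int fd)
{
    void *p = mmap(NULL, (size_t)getpagesize(), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Publish `mac` in slot `me` of the shared page, as lguestnet_open
 * does with memcpy(info->peer[info->me].mac, ...). */
static void sketch_publish_mac(struct sketch_net *page, unsigned me,
                               const unsigned char *mac)
{
    assert((me + 1) * sizeof(struct sketch_net) <= (size_t)getpagesize());
    memcpy(page[me].mac, mac, SKETCH_ETH_ALEN);
}
```

Mapping a one-page temporary file twice and publishing through the first mapping makes the address visible through the second, which is what lets every guest see every other guest's MAC.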

 

Finally, lguestnet_start_xmit uses the following code to find the destination guest(s):

/* Look through all the published ethernet addresses to see if we
  * should send this packet. */
 for (i = 0; i < info->mapsize/sizeof(struct lguest_net); i++) {
  /* We don't send to ourselves (we actually can't SEND_DMA to
   * ourselves anyway), and don't send to unused slots.*/
  if (i == info->me || unused_peer(info->peer, i))
   continue;

  /* If it's broadcast we send it.  If they want every packet we
   * send it.  If the destination matches their address we send
   * it.  Otherwise we go to the next peer. */
  if (!broadcast && !promisc(info, i) && !mac_eq(dest, info, i))
   continue;

  pr_debug("lguestnet %s: sending from %i to %i\n",
    dev->name, info->me, i);
  /* Our routine which actually does the transfer. */
  transfer_packet(dev, skb, i);
 }
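The per-peer decision in that loop can be isolated into a small predicate. The sketch below is an illustration only: struct sketch_peer and its in_use and promisc flags are hypothetical stand-ins for whatever unused_peer() and promisc() inspect in the real driver, and memcmp replaces mac_eq().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

struct sketch_peer {
    unsigned char mac[SKETCH_ETH_ALEN];
    bool in_use;    /* hypothetical flag; the driver uses unused_peer() */
    bool promisc;   /* hypothetical flag; the driver uses promisc() */
};

/* Should a frame addressed to `dest` be sent to peer `i`?
 * `me` is the sender's own slot. */
static bool sketch_should_send(const struct sketch_peer *peers, unsigned i,
                               unsigned me, const unsigned char *dest,
                               bool broadcast)
{
    assert(peers != NULL && dest != NULL);
    /* Never send to ourselves or to an unused slot. */
    if (i == me || !peers[i].in_use)
        return false;
    /* Broadcast, promiscuous peer, or exact MAC match. */
    return broadcast || peers[i].promisc ||
           memcmp(dest, peers[i].mac, SKETCH_ETH_ALEN) == 0;
}
```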

 

 

+io.c:
+ lguest provides DMA-style transfer, and buffer registration.
+ The guest can dma send to a particular address, or register a
+ set of DMA buffers at a particular address.  This provides
+ inter-guest I/O (for shared addresses, such as a shared mmap)
+ or I/O out to the userspace process (lguest).
+
+ We currently use the futex infrastructure to see if a given
+ address is shared: if it is, we look for another guest which
+ has registered a DMA buffer at this address and copy the data,
+ then interrupt the recipient.  Otherwise, we notify the guest
+ userspace (which has access to all the guest memory) to handle
+ the transfer.
+
+ TODO: We could flip whole pages between guests at this point
+ if we wanted to, however it seems unlikely to be worthwhile.
+ More optimization could be gained by having servers for certain
+ devices within the host kernel itself, avoiding at
+ least two switches into the lguest binary and back.
+


* We want Guests which share memory to be able to DMA to each other: two
 * Launchers can mmap memory the same file, then the Guests can communicate.
 * Fortunately, the futex code provides us with a way to get a "union
 * futex_key" corresponding to the memory lying at a virtual address: if the
 * two processes share memory, the "union futex_key" for that memory will match
 * even if the memory is mapped at different addresses in each.  So we always
 * convert the keys to "union futex_key"s to compare them.
 *
 * Before we dive into this though, we need to look at another set of helper
 * routines used throughout the Host kernel code to access Guest memory.
 :*/

 

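V. Network devices, part 2: TUN/TAP

With sharenet, the device memory page is shared among all guests. TUN/TAP is different: the device memory holds the MAC addresses of two devices (see setup_tun_net). The key point is that the guest's network device is in slot 1 (NET_PEERNUM), while the TUN device is in slot 0. Sending a packet therefore ends up as send_dma(the TUN device's key, ).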

 

/* We create the net device with 1 page, using the features field of
  * the descriptor to tell the Guest it is in slot 1 (NET_PEERNUM), and
  * that the device has fairly random timing.  We do *not* specify
  * LGUEST_NET_F_NOCSUM: these packets can reach the real world.
  *
  * We will put our MAC address is slot 0 for the Guest to see, so
  * it will send packets to us using the key "peer_offset(0)": */
 dev = new_device(devices, LGUEST_DEVICE_T_NET, 1,
    NET_PEERNUM|LGUEST_DEVICE_F_RANDOMNESS, netfd,
    handle_tun_input, peer_offset(0), handle_tun_output);

 

/* We are peer 0, ie. first slot, so we hand dev->mem to this routine
  * to write the MAC address at the start of the device memory.  */
 configure_device(ipfd, ifr.ifr_name, ip, dev->mem);
