Category: LINUX

2010-05-09 22:42:55

1 File-backed Pages
 
In short: struct page (its mapping member) -> struct address_space (its two list members, i_mmap and i_mmap_shared) -> every vm_area_struct that maps the file -> mm_struct -> that process's page table. See:
 
The struct page structure for a given page is in the upper left corner. One of the fields of that structure is called mapping; it points to an address_space structure describing the object which backs up that page. That structure includes the inode for the file, various data structures for managing the pages belonging to the file, and two linked lists (i_mmap and i_mmap_shared) containing the vm_area_struct structures for each process which has a mapping into the file. The vm_area_struct (usually called a "VMA") describes how the mapping appears in a particular process's address space; the file /proc/pid/maps lists out the VMAs for the process with ID pid. The VMA provides the information needed to find out what a given page's virtual address is in that process's address space, and that, in turn, can be used to find the correct page table entry.
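To make the pointer chain concrete, here is a minimal user-space sketch. The toy_* types and toy_candidate_mms() are hypothetical stand-ins that keep only the fields needed to get from page->mapping to the VMAs and their mm; the real address_space has two lists (i_mmap and i_mmap_shared), which the sketch collapses into one singly linked list.

```c
#include <stddef.h>

/* Hypothetical toy structures: only the fields needed to follow the chain. */
struct toy_mm {
    int pid;
};

struct toy_vma {
    struct toy_mm *vm_mm;          /* address space this VMA belongs to */
    struct toy_vma *next_shared;   /* next VMA mapping the same file (stands in for i_mmap) */
};

struct toy_address_space {
    struct toy_vma *i_mmap;        /* all VMAs that map this file */
};

struct toy_page {
    struct toy_address_space *mapping;  /* the file this page caches */
};

/*
 * Follow page->mapping->i_mmap and collect the mm of every VMA that maps
 * the file. Returns the number of mm pointers stored (at most max).
 */
static size_t toy_candidate_mms(const struct toy_page *page,
                                struct toy_mm **out, size_t max)
{
    size_t n = 0;
    const struct toy_vma *vma;

    if (page->mapping == NULL)
        return 0;   /* not a page-cache page: nothing to follow */
    for (vma = page->mapping->i_mmap; vma != NULL && n < max;
         vma = vma->next_shared)
        out[n++] = vma->vm_mm;
    return n;
}
```

Every VMA found this way is only a candidate: it maps the file, but not necessarily the page in question, which is what the offset and page-table checks below decide.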
 
 
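Question:
 
Given (1) a VMA of some other process and (2) a page, how do we determine whether that VMA maps the same physical page frame as the page?
Look at the patch below. page->index holds the page's offset within the file (in PAGE_CACHE_SIZE units) and vma->vm_pgoff holds the VMA's starting offset within the file (in PAGE_SIZE units), so the following two lines compute the virtual address at which the other process would map the same page: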
 
loffset = (page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT));
address = vma->vm_start + ((loffset - vma->vm_pgoff) << PAGE_SHIFT);
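The same arithmetic as a standalone function, including the two range checks the patch performs before trusting the result. This is a sketch assuming 4 KiB pages and PAGE_CACHE_SHIFT equal to PAGE_SHIFT (as in kernels of this era); toy_vma, toy_page and toy_vma_address() are hypothetical names.

```c
#include <stdbool.h>

/* Assumed for the sketch: 4 KiB pages, page cache pages of the same size. */
#define PAGE_SHIFT 12
#define PAGE_CACHE_SHIFT 12

/* Hypothetical stand-ins keeping only the fields the formula uses. */
struct toy_vma {
    unsigned long vm_start;   /* first virtual address of the mapping */
    unsigned long vm_end;     /* one past the last virtual address */
    unsigned long vm_pgoff;   /* file offset of vm_start, in PAGE_SIZE units */
};

struct toy_page {
    unsigned long index;      /* offset in the file, in PAGE_CACHE_SIZE units */
};

/*
 * If vma maps page, store the page's virtual address in *address and
 * return true; otherwise return false (the patch jumps to "out").
 */
static bool toy_vma_address(const struct toy_page *page,
                            const struct toy_vma *vma, unsigned long *address)
{
    unsigned long loffset = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
    unsigned long addr;

    if (loffset < vma->vm_pgoff)
        return false;   /* page lies before the part of the file mapped here */
    addr = vma->vm_start + ((loffset - vma->vm_pgoff) << PAGE_SHIFT);
    if (addr >= vma->vm_end)
        return false;   /* page lies after the part of the file mapped here */
    *address = addr;
    return true;
}
```

For example, if file pages 10 to 19 are mapped at 0x400000, page 13 lands at 0x403000, while pages 5 and 20 fall outside the mapping.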
 
 
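Once the address is known (and checked to lie inside [vm_start, vm_end), as the patch does), the rest is simple: walk the other process's page table and check whether the PTE points to the same page frame. The address alone is not enough, because that process may not have faulted the page in yet, or may have replaced it, for example with a private copy after copy-on-write: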
if (page_to_pfn(page) != pte_pfn(*pte))
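The final comparison can be sketched with a one-level toy page table in place of the real pgd/pmd/pte walk. The PTE layout (bit 0 = present, frame number from bit 12 up) and all toy_* names are assumptions made for illustration only, not the layout of any real architecture.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define TOY_NR_PTES 64                 /* toy address space of 64 pages */

/* Hypothetical PTE layout: bit 0 = present, bits 12 and up = frame number. */
typedef uint64_t toy_pte_t;
#define TOY_PTE_PRESENT 1ULL

static bool toy_pte_present(toy_pte_t pte) { return (pte & TOY_PTE_PRESENT) != 0; }
static uint64_t toy_pte_pfn(toy_pte_t pte) { return pte >> PAGE_SHIFT; }
static toy_pte_t toy_mk_pte(uint64_t pfn) { return (pfn << PAGE_SHIFT) | TOY_PTE_PRESENT; }

/* One flat array indexed by virtual page number stands in for the page table. */
struct toy_mm {
    toy_pte_t ptes[TOY_NR_PTES];
};

/* Does address in mm currently map page frame pfn? */
static bool toy_maps_frame(const struct toy_mm *mm, unsigned long address,
                           uint64_t pfn)
{
    unsigned long vpn = address >> PAGE_SHIFT;
    toy_pte_t pte;

    if (vpn >= TOY_NR_PTES)
        return false;                  /* outside the toy address space */
    pte = mm->ptes[vpn];
    if (!toy_pte_present(pte))
        return false;                  /* like the pte_present() check */
    return toy_pte_pfn(pte) == pfn;    /* like page_to_pfn(page) == pte_pfn(*pte) */
}
```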

 
 [PATCH 2.5.62] Full updated partial object-based rmap
 
try_to_unmap_obj_one

+static inline int
+try_to_unmap_obj_one(struct vm_area_struct *vma, struct page *page)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ pgd_t *pgd;
+ pmd_t *pmd;
+ pte_t *pte;
+ pte_t pteval;
+ unsigned long loffset;
+ unsigned long address;
+ int ret = SWAP_SUCCESS;
+
+ loffset = (page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT));
+ if (loffset < vma->vm_pgoff)
+ goto out;
+
+ address = vma->vm_start + ((loffset - vma->vm_pgoff) << PAGE_SHIFT);
+
+ if (address >= vma->vm_end)
+ goto out;
+
+ if (!spin_trylock(&mm->page_table_lock)) {
+ ret = SWAP_AGAIN;
+ goto out;
+ }
+ pgd = pgd_offset(mm, address);
+ if (!pgd_present(*pgd))
+ goto out_unlock;
+
+ pmd = pmd_offset(pgd, address);
+ if (!pmd_present(*pmd))
+ goto out_unlock;
+
+ pte = pte_offset_map(pmd, address);
+ if (!pte_present(*pte))
+ goto out_unmap;
+
+ if (page_to_pfn(page) != pte_pfn(*pte))
+ goto out_unmap;
+
+ if (vma->vm_flags & VM_LOCKED) {
+ ret =  SWAP_FAIL;
+ goto out_unmap;
+ }
+
+ flush_cache_page(vma, address);
+ pteval = ptep_get_and_clear(pte);
+ flush_tlb_page(vma, address);
+
+ if (pte_dirty(pteval))
+ set_page_dirty(page);
+
+ if (atomic_read(&page->pte.mapcount) == 0)
+ BUG();
+
+ mm->rss--;
+ atomic_dec(&page->pte.mapcount);
+ page_cache_release(page);
+
+out_unmap:
+ pte_unmap(pte);
+
+out_unlock:
+ spin_unlock(&mm->page_table_lock);
+
+out:
+ return ret;
+}
 
 
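Optimization: the radix priority search tree; see Documentation/prio_tree.txt and ULK 3rd edition. The goal is to find quickly the VMAs that actually contain a given page. Many VMAs may map the same file without covering that particular page, and walking all of them on every rmap lookup is a scalability problem.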
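The query the priority search tree speeds up is an interval stabbing query: which VMAs have a file range [vm_pgoff, vm_pgoff + nr_pages - 1] that contains a given page index? The sketch below answers it with a plain linear scan so the query itself is clear; it is not a prio tree, which returns the same set in roughly logarithmic time plus the number of matches. toy_vma (with a hypothetical nr_pages field) and toy_vmas_covering() are illustrative names.

```c
#include <stddef.h>

/* Hypothetical toy VMA: only the range of file pages it maps matters here. */
struct toy_vma {
    unsigned long vm_pgoff;   /* first file page mapped */
    unsigned long nr_pages;   /* number of pages mapped */
};

/*
 * Linear O(n) stabbing query: store in out every VMA whose file range
 * contains index, and return how many were stored (at most max).
 */
static size_t toy_vmas_covering(const struct toy_vma *vmas, size_t n,
                                unsigned long index,
                                const struct toy_vma **out, size_t max)
{
    size_t i, found = 0;

    for (i = 0; i < n && found < max; i++) {
        unsigned long first = vmas[i].vm_pgoff;
        unsigned long last = first + vmas[i].nr_pages - 1;   /* inclusive */

        if (vmas[i].nr_pages != 0 && first <= index && index <= last)
            out[found++] = &vmas[i];
    }
    return found;
}
```

With three VMAs mapping file pages 0 to 3, 10 to 19 and 2 to 21, a lookup for page 3 must skip the middle VMA; the prio tree avoids even visiting it.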
 
2 Anonymous Pages
 
Anonymous pages are shared in two cases: (1) between parent and child processes after fork(); (2) "Another (quite unusual) case occurs when a process creates a memory region specifying both the MAP_ANONYMOUS and MAP_SHARED flag: the pages of such a region will be shared among the future descendants of the process." See ULK 3rd edition, 17.2.1, Reverse Mapping for Anonymous Pages.
 
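It follows that such a shared anonymous page normally sits at the same virtual address in every process that maps it (interestingly, mremap() breaks this), whereas a file-backed page can be mapped at different virtual addresses in different processes.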
 
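Approach:
Create an anon_vma object that links together all related anonymous VMAs, and point each anonymous page's mapping member at that anon_vma (with the low bit, PAGE_MAPPING_ANON, set so the pointer can be told apart from an address_space). When the page is unmapped, the anon_vma leads to every VMA that might map it.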
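Storing an anon_vma pointer in page->mapping works because both structures are aligned, which leaves bit 0 of the pointer free to act as a type tag. A minimal sketch of that tagging; the toy_* names are hypothetical, while the kernel's own counterparts are PAGE_MAPPING_ANON and PageAnon().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_MAPPING_ANON 1UL   /* low bit set: mapping is really an anon_vma */

/* The int members give both structs at least 4-byte alignment, so bit 0 is free. */
struct toy_address_space { int unused; };
struct toy_anon_vma { int unused; };

struct toy_page {
    void *mapping;   /* an address_space, or (anon_vma | TOY_PAGE_MAPPING_ANON) */
};

/* Mark the page anonymous by storing a tagged anon_vma pointer. */
static void toy_set_anon_mapping(struct toy_page *page, struct toy_anon_vma *av)
{
    page->mapping = (void *)((uintptr_t)av | TOY_PAGE_MAPPING_ANON);
}

/* Equivalent of PageAnon(): test the tag bit. */
static bool toy_page_anon(const struct toy_page *page)
{
    return ((uintptr_t)page->mapping & TOY_PAGE_MAPPING_ANON) != 0;
}

/* Strip the tag to recover the anon_vma, or NULL for a file-backed page. */
static struct toy_anon_vma *toy_page_anon_vma(const struct toy_page *page)
{
    if (!toy_page_anon(page))
        return NULL;
    return (struct toy_anon_vma *)((uintptr_t)page->mapping &
                                   ~(uintptr_t)TOY_PAGE_MAPPING_ANON);
}
```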
 
 
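Question:
Given (1) an anonymous VMA of some other process and (2) an anonymous page, how do we determine whether that VMA maps the same physical page frame as the page?
Answer: the same way as for file-backed VMAs. The page->index of an anonymous page is its linear page index computed from the VMA (linear_page_index()), so vma_address(), which uses the same formula as above, yields the candidate address, and the PTE is then checked as before.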
 
Optimization: anon_vma_chain (2.6.34)
 
 
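In short, to avoid searching irrelevant VMAs (a scalability problem), each process now gets its own anon_vma. When rmap handles a page that a child created by COW, it only needs to walk the child's own anon_vma (which also covers the child's own descendants), not the VMAs of the parent or siblings. For the child this is effectively O(1); the parent's pages, of course, still require an O(N) walk.
The question now is how to organize the data structures: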
/*
 * The copy-on-write semantics of fork mean that an anon_vma
 * can become associated with multiple processes. Furthermore,
 * each child process will have its own anon_vma, where new
 * pages for that process are instantiated.
 *
 * This structure allows us to find the anon_vmas associated
 * with a VMA, or the VMAs associated with an anon_vma.
 * The "same_vma" list contains the anon_vma_chains linking
 * all the anon_vmas associated with this VMA.
 * The "same_anon_vma" list contains the anon_vma_chains
 * which link all the VMAs associated with this anon_vma.
 */
struct anon_vma_chain {
 struct vm_area_struct *vma;
 struct anon_vma *anon_vma;
 struct list_head same_vma;   /* locked by mmap_sem & page_table_lock */
 struct list_head same_anon_vma; /* locked by anon_vma->lock */
};
 
 
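Each anon_vma_chain (AVC) connects one pair (struct anon_vma *anon_vma, struct vm_area_struct *vma). Note that the anon_vma and the vma need not belong to the same process: the anon_vma may belong to one process and the vma to one of its descendants.
Each AVC also sits on two lists at once: one headed in its anon_vma and one headed in its vma.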
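List 1:
page (mapping member) -> anon_vma (its head member heads a list of anon_vma_chains) -> each anon_vma_chain (linked through its same_anon_vma member; its vma member points to the VMA). This lets rmap start from the anon_vma and find every related VMA: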
 
static int try_to_unmap_anon(struct page *page, enum ttu_flags flags)
{
 struct anon_vma *anon_vma;
 struct anon_vma_chain *avc;
 int ret = SWAP_AGAIN;
 anon_vma = page_lock_anon_vma(page);
 if (!anon_vma)
  return ret;
 list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
  struct vm_area_struct *vma = avc->vma;
  unsigned long address = vma_address(page, vma);
  if (address == -EFAULT)
   continue;
  ret = try_to_unmap_one(page, vma, address, flags);
  if (ret != SWAP_AGAIN || !page_mapped(page))
   break;
 }
 page_unlock_anon_vma(anon_vma);
 return ret;
}
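The two-list arrangement and the List 1 walk can be modeled in user space with a minimal kernel-style intrusive list. In this sketch, all toy_* names and list_init() are hypothetical; list_add_tail() and container_of() mimic the kernel helpers of the same names. A parent VMA and a child VMA are both chained to the parent's anon_vma, and the child VMA is also chained to the child's own anon_vma.

```c
#include <stddef.h>

/* Minimal intrusive doubly linked list in the style of <linux/list.h>. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

struct toy_anon_vma {
    struct list_head head;             /* AVCs, linked through same_anon_vma */
};

struct toy_vma {
    unsigned long vm_start;
    struct list_head anon_vma_chain;   /* AVCs, linked through same_vma */
};

struct toy_anon_vma_chain {
    struct toy_vma *vma;
    struct toy_anon_vma *anon_vma;
    struct list_head same_vma;
    struct list_head same_anon_vma;
};

/* Connect one (anon_vma, vma) pair: the AVC goes on both lists at once. */
static void toy_avc_link(struct toy_anon_vma_chain *avc,
                         struct toy_vma *vma, struct toy_anon_vma *anon_vma)
{
    avc->vma = vma;
    avc->anon_vma = anon_vma;
    list_add_tail(&avc->same_vma, &vma->anon_vma_chain);
    list_add_tail(&avc->same_anon_vma, &anon_vma->head);
}

/* List 1: from an anon_vma, collect every VMA that may map its pages. */
static size_t toy_collect_vmas(struct toy_anon_vma *anon_vma,
                               struct toy_vma **out, size_t max)
{
    struct list_head *pos;
    size_t n = 0;

    for (pos = anon_vma->head.next; pos != &anon_vma->head && n < max;
         pos = pos->next) {
        struct toy_anon_vma_chain *avc =
            container_of(pos, struct toy_anon_vma_chain, same_anon_vma);
        out[n++] = avc->vma;
    }
    return n;
}
```

Walking the parent's anon_vma visits both VMAs, while walking the child's anon_vma visits only the child's VMA, which is exactly the saving anon_vma_chain was introduced for.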
 
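List 2: one VMA may have several AVCs: one connects the process's own anon_vma to the VMA, and another connects the parent's anon_vma to the same VMA. Can there be more than two? Yes: if the child forks again, the grandchild's VMA has 3 AVCs, and a great-grandchild's can have even more; see anon_vma_clone(). The vma's anon_vma_chain member is the head of this AVC list, and the AVCs are linked through their same_vma members.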
 

/*
 * Attach the anon_vmas from src to dst.
 * Returns 0 on success, -ENOMEM on failure.
 */
int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
 struct anon_vma_chain *avc, *pavc;
 list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
  avc = anon_vma_chain_alloc();
  if (!avc)
   goto enomem_failure;
  anon_vma_chain_link(dst, avc, pavc->anon_vma);
 }
 return 0;
 enomem_failure:
 unlink_anon_vmas(dst);
 return -ENOMEM;
}
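A sketch of why the AVC count grows by one per generation. Each VMA's same_vma list is reduced to an array of anon_vma pointers, and toy_fork_vma() follows the order of the 2.6.34 fork path (anon_vma_fork(): first clone the parent's chains, then add the child's own anon_vma). The toy_* names and the fixed TOY_MAX_AVC cap are assumptions of the sketch.

```c
#include <stddef.h>

#define TOY_MAX_AVC 16   /* arbitrary cap for the sketch */

struct toy_anon_vma {
    int owner_pid;
};

/* The same_vma list, reduced to the anon_vmas this VMA is chained to. */
struct toy_vma {
    struct toy_anon_vma *avc[TOY_MAX_AVC];
    size_t nr_avc;
};

/* Like anon_vma_clone(): attach every anon_vma of src to dst. */
static void toy_anon_vma_clone(struct toy_vma *dst, const struct toy_vma *src)
{
    size_t i;

    dst->nr_avc = 0;
    for (i = 0; i < src->nr_avc; i++)
        dst->avc[dst->nr_avc++] = src->avc[i];
}

/* Fork one VMA: clone the parent's chains, then add the child's own anon_vma. */
static int toy_fork_vma(struct toy_vma *child, const struct toy_vma *parent,
                        struct toy_anon_vma *child_av)
{
    toy_anon_vma_clone(child, parent);
    if (child->nr_avc >= TOY_MAX_AVC)
        return -1;   /* the sketch's cap; the kernel would allocate instead */
    child->avc[child->nr_avc++] = child_av;
    return 0;
}
```

Starting from a parent VMA with one AVC, the child ends up with 2, the grandchild with 3 and a great-grandchild with 4, matching the counts described above.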
 
 