2015-05-02 10:11:13

According to the git merge history, CMA (Contiguous Memory Allocator) was introduced in kernel 3.5. It was developed by engineers at Samsung and is used under the DMA mapping framework to improve the allocation of large blocks of contiguous memory.

The implementation works by reserving memory at boot time, marking it with the MIGRATE_CMA migrate type, and then returning it to the system. During normal allocation, only movable pages (for example page cache that is not involved in DMA mappings) may be taken from CMA-managed memory. When a large contiguous block is requested through dma_alloc_from_contiguous(), those movable pages are migrated out of the CMA area to free up enough contiguous space for the request. As a result, a large contiguous block can be obtained at any time, as long as the system still has enough free memory overall.
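
As a rough illustration of how a driver sits on top of this mechanism, here is a minimal sketch (not from the original post; the demo_* names and the 4 MiB size are invented, while dma_alloc_from_contiguous() and dma_release_from_contiguous() are the real entry points in this kernel version):

    #include <linux/device.h>
    #include <linux/dma-contiguous.h>
    #include <linux/errno.h>
    #include <linux/sizes.h>

    #define DEMO_PAGES (SZ_4M >> PAGE_SHIFT)   /* 4 MiB worth of pages */

    static struct page *demo_buf;

    static int demo_grab(struct device *dev)
    {
        /* align = 0: any page boundary is acceptable */
        demo_buf = dma_alloc_from_contiguous(dev, DEMO_PAGES, 0);
        return demo_buf ? 0 : -ENOMEM;
    }

    static void demo_put(struct device *dev)
    {
        /* hand the pages back; the area's bitmap bits are cleared again */
        dma_release_from_contiguous(dev, demo_buf, DEMO_PAGES);
    }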

Let's start the analysis from the initialization. In /drivers/base/dma-contiguous.c you can find the initialization function cma_init_reserved_areas(), which is hooked into system start-up via core_initcall().
   First, the implementation of cma_init_reserved_areas():

  1. 【file:/drivers/base/dma-contiguous.c】
  2. static int __init cma_init_reserved_areas(void)
  3. {
  4.     int i;
  5.  
  6.     for (i = 0; i < cma_area_count; i++) {
  7.         int ret = cma_activate_area(&cma_areas[i]);
  8.         if (ret)
  9.             return ret;
  10.     }
  11.  
  12.     return 0;
  13. }

    This simply walks the cma_areas array of CMA area descriptors and calls cma_activate_area() on each one. The cma_areas information is filled in earlier, during DMA initialization: start_kernel()->setup_arch()->dma_contiguous_reserve() reads the settings from the kernel command line, and dma_contiguous_reserve_area() then reserves the memory and records it in cma_areas. That path is not analyzed in depth here.
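
(As a pointer only, since the post does not cover it: in this version the default size of the global area typically comes from CONFIG_CMA_SIZE_MBYTES, and it can be overridden with a cma=<size> kernel command-line parameter parsed by early_cma() in the same file before dma_contiguous_reserve() takes the memory from memblock.)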

    Moving on to the implementation of cma_activate_area():

  1. 【file:/drivers/base/dma-contiguous.c】
  2. static int __init cma_activate_area(struct cma *cma)
  3. {
  4.     int bitmap_size = BITS_TO_LONGS(cma->count) * sizeof(long);
  5.     unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
  6.     unsigned i = cma->count >> pageblock_order;
  7.     struct zone *zone;
  8.  
  9.     cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  10.  
  11.     if (!cma->bitmap)
  12.         return -ENOMEM;
  13.  
  14.     WARN_ON_ONCE(!pfn_valid(pfn));
  15.     zone = page_zone(pfn_to_page(pfn));
  16.  
  17.     do {
  18.         unsigned j;
  19.         base_pfn = pfn;
  20.         for (j = pageblock_nr_pages; j; --j, pfn++) {
  21.             WARN_ON_ONCE(!pfn_valid(pfn));
  22.             if (page_zone(pfn_to_page(pfn)) != zone)
  23.                 return -EINVAL;
  24.         }
  25.         init_cma_reserved_pageblock(pfn_to_page(base_pfn));
  26.     } while (--i);
  27.  
  28.     return 0;
  29. }

   This function initializes one CMA area. It first allocates the area's bitmap with kzalloc(), then walks the area in units of pageblock_nr_pages (the number of pages in a pageblock_order block), checking that every page frame in each unit is valid and that all pages of a unit belong to the same memory zone. Each verified pageblock is then handed to init_cma_reserved_pageblock(); if a page turns out to sit in a different zone, the function bails out with -EINVAL.
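
To put numbers on that, here is a hypothetical back-of-the-envelope check, runnable as a plain userspace program (the 16 MiB area size and the pageblock_order of 10 are illustrative values only; real values depend on the configuration):

    #include <stdio.h>

    #define PAGE_SHIFT      12    /* 4 KiB pages */
    #define PAGEBLOCK_ORDER 10    /* e.g. MAX_ORDER - 1 on a no-hugepage config */
    #define BITS_PER_LONG   64

    int main(void)
    {
        unsigned long area_bytes = 16UL << 20;                    /* a 16 MiB CMA area */
        unsigned long count      = area_bytes >> PAGE_SHIFT;      /* 4096 pages        */
        unsigned long bitmap_sz  = (count + BITS_PER_LONG - 1)
                                   / BITS_PER_LONG * sizeof(long); /* 512 bytes        */
        unsigned long blocks     = count >> PAGEBLOCK_ORDER;      /* 4 pageblocks      */

        printf("%lu pages, %lu-byte bitmap, %lu pageblocks to activate\n",
               count, bitmap_sz, blocks);
        return 0;
    }

So for such an area the do/while loop in cma_activate_area() runs four times, each pass validating 1024 consecutive PFNs and handing one pageblock to init_cma_reserved_pageblock().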

    The function that actually initializes the pages is init_cma_reserved_pageblock():

  1. 【file:/mm/page_alloc.c】
  2. /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
  3. void __init init_cma_reserved_pageblock(struct page *page)
  4. {
  5.     unsigned i = pageblock_nr_pages;
  6.     struct page *p = page;
  7.  
  8.     do {
  9.         __ClearPageReserved(p);
  10.         set_page_count(p, 0);
  11.     } while (++p, --i);
  12.  
  13.     set_pageblock_migratetype(page, MIGRATE_CMA);
  14.  
  15.     if (pageblock_order >= MAX_ORDER) {
  16.         i = pageblock_nr_pages;
  17.         p = page;
  18.         do {
  19.             set_page_refcounted(p);
  20.             __free_pages(p, MAX_ORDER - 1);
  21.             p += MAX_ORDER_NR_PAGES;
  22.         } while (i -= MAX_ORDER_NR_PAGES);
  23.     } else {
  24.         set_page_refcounted(page);
  25.         __free_pages(page, pageblock_order);
  26.     }
  27.  
  28.     adjust_managed_page_count(page, pageblock_nr_pages);
  29. }

This function first clears each page's Reserved flag and resets its count with set_page_count(), then marks the pageblock as MIGRATE_CMA with set_pageblock_migratetype(). After that, set_page_refcounted() re-initializes the reference count and __free_pages() hands the memory over to the buddy allocator, where it ends up on zone->free_area[order].free_list[MIGRATE_CMA] (order here being pageblock_order or MAX_ORDER-1, depending on which branch is taken). Finally, adjust_managed_page_count() updates the zone's managed page count.

That is basically all there is to the initialization.

CMA memory allocation goes through dma_generic_alloc_coherent() (the x86 implementation is shown here):

  1. 【file:/arch/x86/kernel/pci-dma.c】
  2. void *dma_generic_alloc_coherent(struct device *dev, size_t size,
  3.                  dma_addr_t *dma_addr, gfp_t flag,
  4.                  struct dma_attrs *attrs)
  5. {
  6.     unsigned long dma_mask;
  7.     struct page *page;
  8.     unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  9.     dma_addr_t addr;
  10.  
  11.     dma_mask = dma_alloc_coherent_mask(dev, flag);
  12.  
  13.     flag |= __GFP_ZERO;
  14. again:
  15.     page = NULL;
  16.     /* CMA can be used only in the context which permits sleeping */
  17.     if (flag & __GFP_WAIT)
  18.         page = dma_alloc_from_contiguous(dev, count, get_order(size));
  19.     /* fallback */
  20.     if (!page)
  21.         page = alloc_pages_node(dev_to_node(dev), flag, get_order(size));
  22.     if (!page)
  23.         return NULL;
  24.  
  25.     addr = page_to_phys(page);
  26.     if (addr + size > dma_mask) {
  27.         __free_pages(page, get_order(size));
  28.  
  29.         if (dma_mask < DMA_BIT_MASK(32) && !(flag & GFP_DMA)) {
  30.             flag = (flag & ~GFP_DMA32) | GFP_DMA;
  31.             goto again;
  32.         }
  33.  
  34.         return NULL;
  35.     }
  36.  
  37.     *dma_addr = addr;
  38.     return page_address(page);
  39. }

If the memory is to come from the CMA area, the allocation flags must allow sleeping (__GFP_WAIT); only then does the function try dma_alloc_from_contiguous() to obtain the pages.
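
For instance (an illustrative sketch, not from the post; demo_gfp() and the 1 MiB size are invented, while dma_alloc_coherent()/dma_free_coherent() are the standard DMA API calls that typically land in dma_generic_alloc_coherent() on this x86 path):

    #include <linux/device.h>
    #include <linux/dma-mapping.h>
    #include <linux/sizes.h>

    static void demo_gfp(struct device *dev)
    {
        dma_addr_t handle;
        void *buf;

        /* GFP_KERNEL includes __GFP_WAIT, so this request may be served
         * from the CMA area via dma_alloc_from_contiguous(). */
        buf = dma_alloc_coherent(dev, SZ_1M, &handle, GFP_KERNEL);
        if (buf)
            dma_free_coherent(dev, SZ_1M, buf, handle);

        /* GFP_ATOMIC has no __GFP_WAIT: CMA is skipped and the allocation
         * falls straight back to alloc_pages_node(). */
        buf = dma_alloc_coherent(dev, SZ_1M, &handle, GFP_ATOMIC);
        if (buf)
            dma_free_coherent(dev, SZ_1M, buf, handle);
    }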

The implementation of dma_alloc_from_contiguous():

  1. 【file:/drivers/base/dma-contiguous.c】
  2. /**
  3.  * dma_alloc_from_contiguous() - allocate pages from contiguous area
  4.  * @dev: Pointer to device for which the allocation is performed.
  5.  * @count: Requested number of pages.
  6.  * @align: Requested alignment of pages (in PAGE_SIZE order).
  7.  *
  8.  * This function allocates memory buffer for specified device. It uses
  9.  * device specific contiguous memory area if available or the default
  10.  * global one. Requires architecture specific get_dev_cma_area() helper
  11.  * function.
  12.  */
  13. struct page *dma_alloc_from_contiguous(struct device *dev, int count,
  14.                        unsigned int align)
  15. {
  16.     unsigned long mask, pfn, pageno, start = 0;
  17.     struct cma *cma = dev_get_cma_area(dev);
  18.     struct page *page = NULL;
  19.     int ret;
  20.  
  21.     if (!cma || !cma->count)
  22.         return NULL;
  23.  
  24.     if (align > CONFIG_CMA_ALIGNMENT)
  25.         align = CONFIG_CMA_ALIGNMENT;
  26.  
  27.     pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma,
  28.          count, align);
  29.  
  30.     if (!count)
  31.         return NULL;
  32.  
  33.     mask = (1 << align) - 1;
  34.  
  35.     mutex_lock(&cma_mutex);
  36.  
  37.     for (;;) {
  38.         pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
  39.                             start, count, mask);
  40.         if (pageno >= cma->count)
  41.             break;
  42.  
  43.         pfn = cma->base_pfn + pageno;
  44.         ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
  45.         if (ret == 0) {
  46.             bitmap_set(cma->bitmap, pageno, count);
  47.             page = pfn_to_page(pfn);
  48.             break;
  49.         } else if (ret != -EBUSY) {
  50.             break;
  51.         }
  52.         pr_debug("%s(): memory range at %p is busy, retrying\n",
  53.              __func__, pfn_to_page(pfn));
  54.         /* try again with a bit different memory target */
  55.         start = pageno + mask + 1;
  56.     }
  57.  
  58.     mutex_unlock(&cma_mutex);
  59.     pr_debug("%s(): returned %p\n", __func__, page);
  60.     return page;
  61. }

This function obtains the CMA area used by the device via dev_get_cma_area(), then uses bitmap_find_next_zero_area() to locate an unallocated run of pages of the required size in the area's bitmap. It then calls alloc_contig_range() to actually allocate that range; on success it marks the range as used with bitmap_set(), converts the starting page frame number into a struct page with pfn_to_page(), and returns it. If the range turns out to be busy (-EBUSY), the search is retried starting just past the failed position.
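
As a concrete, made-up example: a 1 MiB request from dma_generic_alloc_coherent() arrives as count = 256 and align = get_order(SZ_1M) = 8 (the value is clamped to at most CONFIG_CMA_ALIGNMENT, which typically defaults to 8), so mask = (1 << 8) - 1 = 255 and bitmap_find_next_zero_area() will only return offsets that are multiples of 256 pages; in other words, the returned block is 1 MiB aligned within the CMA area.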

The implementation of bitmap_find_next_zero_area():

  1. 【file:/lib/bitmap.c】
  2. /*
  3.  * bitmap_find_next_zero_area - find a contiguous aligned zero area
  4.  * @map: The address to base the search on
  5.  * @size: The bitmap size in bits
  6.  * @start: The bitnumber to start searching at
  7.  * @nr: The number of zeroed bits we're looking for
  8.  * @align_mask: Alignment mask for zero area
  9.  *
  10.  * The @align_mask should be one less than a power of 2; the effect is that
  11.  * the bit offset of all zero areas this function finds is multiples of that
  12.  * power of 2. A @align_mask of 0 means no alignment is required.
  13.  */
  14. unsigned long bitmap_find_next_zero_area(unsigned long *map,
  15.                      unsigned long size,
  16.                      unsigned long start,
  17.                      unsigned int nr,
  18.                      unsigned long align_mask)
  19. {
  20.     unsigned long index, end, i;
  21. again:
  22.     index = find_next_zero_bit(map, size, start);
  23.  
  24.     /* Align allocation */
  25.     index = __ALIGN_MASK(index, align_mask);
  26.  
  27.     end = index + nr;
  28.     if (end > size)
  29.         return end;
  30.     i = find_next_bit(map, end, index);
  31.     if (i < end) {
  32.         start = i + 1;
  33.         goto again;
  34.     }
  35.     return index;
  36. }

This function alternates between find_next_zero_bit() and find_next_bit(): the former finds the start of a zero run, the latter checks whether a set bit interrupts that run before nr bits have been collected. The loop repeats until an aligned zero area large enough for the allocation is found (or the end of the bitmap is reached, in which case a value >= size is returned).
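
To see the search in action, here is a toy userspace re-implementation of the same loop on a single word (purely illustrative; the real function works on arbitrarily long bitmaps and uses the optimized find_next_zero_bit()/find_next_bit() helpers):

    #include <stdio.h>

    #define ALIGN_MASK(x, m) (((x) + (m)) & ~(m))

    static unsigned long find_zero_area(unsigned long map, unsigned long size,
                                        unsigned long nr, unsigned long align_mask)
    {
        unsigned long start = 0;

        for (;;) {
            unsigned long index = start, end, i;

            /* like find_next_zero_bit(): first clear bit at or after start */
            while (index < size && ((map >> index) & 1UL))
                index++;

            index = ALIGN_MASK(index, align_mask);
            end = index + nr;
            if (end > size)
                return end;              /* no room left: caller sees >= size */

            /* like find_next_bit(): any set bit inside [index, end)? */
            for (i = index; i < end && !((map >> i) & 1UL); i++)
                ;
            if (i >= end)
                return index;            /* found a large enough aligned hole */
            start = i + 1;               /* retry just past the obstacle */
        }
    }

    int main(void)
    {
        unsigned long map = 0x47;        /* bits 0, 1, 2 and 6 already used */

        /* ask for 4 zero bits aligned to 4: the first aligned candidate at
         * offset 4 fails because bit 6 is set, so the answer is offset 8 */
        printf("%lu\n", find_zero_area(map, 32, 4, 3));
        return 0;
    }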

The implementation of alloc_contig_range():

  1. 【file:/mm/page_alloc.c】
  2. /**
  3.  * alloc_contig_range() -- tries to allocate given range of pages
  4.  * @start: start PFN to allocate
  5.  * @end: one-past-the-last PFN to allocate
  6.  * @migratetype: migratetype of the underlaying pageblocks (either
  7.  * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks
  8.  * in range must have the same migratetype and it must
  9.  * be either of the two.
  10.  *
  11.  * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
  12.  * aligned, however it's the caller's responsibility to guarantee that
  13.  * we are the only thread that changes migrate type of pageblocks the
  14.  * pages fall in.
  15.  *
  16.  * The PFN range must belong to a single zone.
  17.  *
  18.  * Returns zero on success or negative error code. On success all
  19.  * pages which PFN is in [start, end) are allocated for the caller and
  20.  * need to be freed with free_contig_range().
  21.  */
  22. int alloc_contig_range(unsigned long start, unsigned long end,
  23.                unsigned migratetype)
  24. {
  25.     unsigned long outer_start, outer_end;
  26.     int ret = 0, order;
  27.  
  28.     struct compact_control cc = {
  29.         .nr_migratepages = 0,
  30.         .order = -1,
  31.         .zone = page_zone(pfn_to_page(start)),
  32.         .sync = true,
  33.         .ignore_skip_hint = true,
  34.     };
  35.     INIT_LIST_HEAD(&cc.migratepages);
  36.  
  37.     /*
  38.      * What we do here is we mark all pageblocks in range as
  39.      * MIGRATE_ISOLATE. Because pageblock and max order pages may
  40.      * have different sizes, and due to the way page allocator
  41.      * work, we align the range to biggest of the two pages so
  42.      * that page allocator won't try to merge buddies from
  43.      * different pageblocks and change MIGRATE_ISOLATE to some
  44.      * other migration type.
  45.      *
  46.      * Once the pageblocks are marked as MIGRATE_ISOLATE, we
  47.      * migrate the pages from an unaligned range (ie. pages that
  48.      * we are interested in). This will put all the pages in
  49.      * range back to page allocator as MIGRATE_ISOLATE.
  50.      *
  51.      * When this is done, we take the pages in range from page
  52.      * allocator removing them from the buddy system. This way
  53.      * page allocator will never consider using them.
  54.      *
  55.      * This lets us mark the pageblocks back as
  56.      * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the
  57.      * aligned range but not in the unaligned, original range are
  58.      * put back to page allocator so that buddy can use them.
  59.      */
  60.  
  61.     ret = start_isolate_page_range(pfn_max_align_down(start),
  62.                        pfn_max_align_up(end), migratetype,
  63.                        false);
  64.     if (ret)
  65.         return ret;
  66.  
  67.     ret = __alloc_contig_migrate_range(&cc, start, end);
  68.     if (ret)
  69.         goto done;
  70.  
  71.     /*
  72.      * Pages from [start, end) are within a MAX_ORDER_NR_PAGES
  73.      * aligned blocks that are marked as MIGRATE_ISOLATE. What's
  74.      * more, all pages in [start, end) are free in page allocator.
  75.      * What we are going to do is to allocate all pages from
  76.      * [start, end) (that is remove them from page allocator).
  77.      *
  78.      * The only problem is that pages at the beginning and at the
  79.      * end of interesting range may be not aligned with pages that
  80.      * page allocator holds, ie. they can be part of higher order
  81.      * pages. Because of this, we reserve the bigger range and
  82.      * once this is done free the pages we are not interested in.
  83.      *
  84.      * We don't have to hold zone->lock here because the pages are
  85.      * isolated thus they won't get removed from buddy.
  86.      */
  87.  
  88.     lru_add_drain_all();
  89.     drain_all_pages();
  90.  
  91.     order = 0;
  92.     outer_start = start;
  93.     while (!PageBuddy(pfn_to_page(outer_start))) {
  94.         if (++order >= MAX_ORDER) {
  95.             ret = -EBUSY;
  96.             goto done;
  97.         }
  98.         outer_start &= ~0UL << order;
  99.     }
  100.  
  101.     /* Make sure the range is really isolated. */
  102.     if (test_pages_isolated(outer_start, end, false)) {
  103.         pr_warn("alloc_contig_range test_pages_isolated(%lx, %lx) failed\n",
  104.                outer_start, end);
  105.         ret = -EBUSY;
  106.         goto done;
  107.     }
  108.  
  109.  
  110.     /* Grab isolated pages from freelists. */
  111.     outer_end = isolate_freepages_range(&cc, outer_start, end);
  112.     if (!outer_end) {
  113.         ret = -EBUSY;
  114.         goto done;
  115.     }
  116.  
  117.     /* Free head and tail (if any) */
  118.     if (start != outer_start)
  119.         free_contig_range(outer_start, start - outer_start);
  120.     if (end != outer_end)
  121.         free_contig_range(end, outer_end - end);
  122.  
  123. done:
  124.     undo_isolate_page_range(pfn_max_align_down(start),
  125.                 pfn_max_align_up(end), migratetype);
  126.     return ret;
  127. }

This function allocates a contiguous range of pages identified by page frame numbers. The requested range does not need to be pageblock- or MAX_ORDER-aligned, but the caller must guarantee that it is the only thread changing the migrate type of the pageblocks involved, which is why dma_alloc_from_contiguous() takes cma_mutex around the call; in addition, the range must not cross a memory zone boundary.
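
For example (made-up numbers): with 4 KiB pages, pageblock_nr_pages = 512 and MAX_ORDER_NR_PAGES = 1024, a request for PFNs [0x12340, 0x12380) gets its isolation window rounded out by pfn_max_align_down()/pfn_max_align_up() to [0x12000, 0x12400). The whole window is marked MIGRATE_ISOLATE, the in-use pages in the requested range are migrated away, the requested 0x40 pages are taken, and whatever extra head and tail the trimming step ends up holding is handed back to the buddy allocator via free_contig_range().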

To understand what it actually does, let's dig into the functions it calls, starting with start_isolate_page_range():

  1. 【file:/mm/page_isolation.c】
  2. /*
  3.  * start_isolate_page_range() -- make page-allocation-type of range of pages
  4.  * to be MIGRATE_ISOLATE.
  5.  * @start_pfn: The lower PFN of the range to be isolated.
  6.  * @end_pfn: The upper PFN of the range to be isolated.
  7.  * @migratetype: migrate type to set in error recovery.
  8.  *
  9.  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  10.  * the range will never be allocated. Any free pages and pages freed in the
  11.  * future will not be allocated again.
  12.  *
  13.  * start_pfn/end_pfn must be aligned to pageblock_order.
  14.  * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
  15.  */
  16. int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  17.                  unsigned migratetype, bool skip_hwpoisoned_pages)
  18. {
  19.     unsigned long pfn;
  20.     unsigned long undo_pfn;
  21.     struct page *page;
  22.  
  23.     BUG_ON((start_pfn) & (pageblock_nr_pages - 1));
  24.     BUG_ON((end_pfn) & (pageblock_nr_pages - 1));
  25.  
  26.     for (pfn = start_pfn;
  27.          pfn < end_pfn;
  28.          pfn += pageblock_nr_pages) {
  29.         page = __first_valid_page(pfn, pageblock_nr_pages);
  30.         if (page &&
  31.             set_migratetype_isolate(page, skip_hwpoisoned_pages)) {
  32.             undo_pfn = pfn;
  33.             goto undo;
  34.         }
  35.     }
  36.     return 0;
  37. undo:
  38.     for (pfn = start_pfn;
  39.          pfn < undo_pfn;
  40.          pfn += pageblock_nr_pages)
  41.         unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
  42.  
  43.     return -EBUSY;
  44. }

Setting the migrate type to MIGRATE_ISOLATE means that free pages in the given range will no longer be handed out by the allocator. Note that the migrate-type change here is consistent with the page migration analyzed earlier: it operates in units of pageblock_nr_pages pages.

The set_migratetype_isolate() it calls:

  1. 【file:/mm/page_isolation.c】
  2. int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages)
  3. {
  4.     struct zone *zone;
  5.     unsigned long flags, pfn;
  6.     struct memory_isolate_notify arg;
  7.     int notifier_ret;
  8.     int ret = -EBUSY;
  9.  
  10.     zone = page_zone(page);
  11.  
  12.     spin_lock_irqsave(&zone->lock, flags);
  13.  
  14.     pfn = page_to_pfn(page);
  15.     arg.start_pfn = pfn;
  16.     arg.nr_pages = pageblock_nr_pages;
  17.     arg.pages_found = 0;
  18.  
  19.     /*
  20.      * It may be possible to isolate a pageblock even if the
  21.      * migratetype is not MIGRATE_MOVABLE. The memory isolation
  22.      * notifier chain is used by balloon drivers to return the
  23.      * number of pages in a range that are held by the balloon
  24.      * driver to shrink memory. If all the pages are accounted for
  25.      * by balloons, are free, or on the LRU, isolation can continue.
  26.      * Later, for example, when memory hotplug notifier runs, these
  27.      * pages reported as "can be isolated" should be isolated(freed)
  28.      * by the balloon driver through the memory notifier chain.
  29.      */
  30.     notifier_ret = memory_isolate_notify(MEM_ISOLATE_COUNT, &arg);
  31.     notifier_ret = notifier_to_errno(notifier_ret);
  32.     if (notifier_ret)
  33.         goto out;
  34.     /*
  35.      * FIXME: Now, memory hotplug doesn't call shrink_slab() by itself.
  36.      * We just check MOVABLE pages.
  37.      */
  38.     if (!has_unmovable_pages(zone, page, arg.pages_found,
  39.                  skip_hwpoisoned_pages))
  40.         ret = 0;
  41.  
  42.     /*
  43.      * immobile means "not-on-lru" paes. If immobile is larger than
  44.      * removable-by-driver pages reported by notifier, we'll fail.
  45.      */
  46.  
  47. out:
  48.     if (!ret) {
  49.         unsigned long nr_pages;
  50.         int migratetype = get_pageblock_migratetype(page);
  51.  
  52.         set_pageblock_migratetype(page, MIGRATE_ISOLATE);
  53.         nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE);
  54.  
  55.         __mod_zone_freepage_state(zone, -nr_pages, migratetype);
  56.     }
  57.  
  58.     spin_unlock_irqrestore(&zone->lock, flags);
  59.     if (!ret)
  60.         drain_all_pages();
  61.     return ret;
  62. }

As this function shows, before a pageblock is marked MIGRATE_ISOLATE it is checked (via has_unmovable_pages()) to make sure it contains no unmovable pages. Once the type has been set, move_freepages_block() moves the block's free pages off their original free list and onto the MIGRATE_ISOLATE list; pages on that list are never handed out by the allocator.

If nothing goes wrong, start_isolate_page_range() returns 0, and alloc_contig_range() goes on to call __alloc_contig_migrate_range():

  1. 【file:/mm/page_alloc.c】
  2. /* [start, end) must belong to a single zone. */
  3. static int __alloc_contig_migrate_range(struct compact_control *cc,
  4.                     unsigned long start, unsigned long end)
  5. {
  6.     /* This function is based on compact_zone() from compaction.c. */
  7.     unsigned long nr_reclaimed;
  8.     unsigned long pfn = start;
  9.     unsigned int tries = 0;
  10.     int ret = 0;
  11.  
  12.     migrate_prep();
  13.  
  14.     while (pfn < end || !list_empty(&cc->migratepages)) {
  15.         if (fatal_signal_pending(current)) {
  16.             ret = -EINTR;
  17.             break;
  18.         }
  19.  
  20.         if (list_empty(&cc->migratepages)) {
  21.             cc->nr_migratepages = 0;
  22.             pfn = isolate_migratepages_range(cc->zone, cc,
  23.                              pfn, end, true);
  24.             if (!pfn) {
  25.                 ret = -EINTR;
  26.                 break;
  27.             }
  28.             tries = 0;
  29.         } else if (++tries == 5) {
  30.             ret = ret < 0 ? ret : -EBUSY;
  31.             break;
  32.         }
  33.  
  34.         nr_reclaimed = reclaim_clean_pages_from_list(cc->zone,
  35.                             &cc->migratepages);
  36.         cc->nr_migratepages -= nr_reclaimed;
  37.  
  38.         ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
  39.                     0, MIGRATE_SYNC, MR_CMA);
  40.     }
  41.     if (ret < 0) {
  42.         putback_movable_pages(&cc->migratepages);
  43.         return ret;
  44.     }
  45.     return 0;
  46. }

The migrate_prep() called here essentially drains the per-CPU LRU-add caches (it boils down to lru_add_drain_all()), so that all candidate pages are actually sitting on the LRU lists and can be isolated reliably.

The rest is the while loop that deals with the pages that are not free; the key functions involved are isolate_migratepages_range(), reclaim_clean_pages_from_list() and migrate_pages().

First, the implementation of isolate_migratepages_range():

  1. 【file:/mm/compaction.c】
  2. /**
  3.  * isolate_migratepages_range() - isolate all migrate-able pages in range.
  4.  * @zone: Zone pages are in.
  5.  * @cc: Compaction control structure.
  6.  * @low_pfn: The first PFN of the range.
  7.  * @end_pfn: The one-past-the-last PFN of the range.
  8.  * @unevictable: true if it allows to isolate unevictable pages
  9.  *
  10.  * Isolate all pages that can be migrated from the range specified by
  11.  * [low_pfn, end_pfn). Returns zero if there is a fatal signal
  12.  * pending), otherwise PFN of the first page that was not scanned
  13.  * (which may be both less, equal to or more then end_pfn).
  14.  *
  15.  * Assumes that cc->migratepages is empty and cc->nr_migratepages is
  16.  * zero.
  17.  *
  18.  * Apart from cc->migratepages and cc->nr_migratetypes this function
  19.  * does not modify any cc's fields, in particular it does not modify
  20.  * (or read for that matter) cc->migrate_pfn.
  21.  */
  22. unsigned long
  23. isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
  24.         unsigned long low_pfn, unsigned long end_pfn, bool unevictable)
  25. {
  26.     unsigned long last_pageblock_nr = 0, pageblock_nr;
  27.     unsigned long nr_scanned = 0, nr_isolated = 0;
  28.     struct list_head *migratelist = &cc->migratepages;
  29.     isolate_mode_t mode = 0;
  30.     struct lruvec *lruvec;
  31.     unsigned long flags;
  32.     bool locked = false;
  33.     struct page *page = NULL, *valid_page = NULL;
  34.     bool skipped_async_unsuitable = false;
  35.  
  36.     /*
  37.      * Ensure that there are not too many pages isolated from the LRU
  38.      * list by either parallel reclaimers or compaction. If there are,
  39.      * delay for some time until fewer pages are isolated
  40.      */
  41.     while (unlikely(too_many_isolated(zone))) {
  42.         /* async migration should just abort */
  43.         if (!cc->sync)
  44.             return 0;
  45.  
  46.         congestion_wait(BLK_RW_ASYNC, HZ/10);
  47.  
  48.         if (fatal_signal_pending(current))
  49.             return 0;
  50.     }
  51.  
  52.     /* Time to isolate some pages for migration */
  53.     cond_resched();
  54.     for (; low_pfn < end_pfn; low_pfn++) {
  55.         /* give a chance to irqs before checking need_resched() */
  56.         if (locked && !((low_pfn+1) % SWAP_CLUSTER_MAX)) {
  57.             if (should_release_lock(&zone->lru_lock)) {
  58.                 spin_unlock_irqrestore(&zone->lru_lock, flags);
  59.                 locked = false;
  60.             }
  61.         }
  62.  
  63.         /*
  64.          * migrate_pfn does not necessarily start aligned to a
  65.          * pageblock. Ensure that pfn_valid is called when moving
  66.          * into a new MAX_ORDER_NR_PAGES range in case of large
  67.          * memory holes within the zone
  68.          */
  69.         if ((low_pfn & (MAX_ORDER_NR_PAGES - 1)) == 0) {
  70.             if (!pfn_valid(low_pfn)) {
  71.                 low_pfn += MAX_ORDER_NR_PAGES - 1;
  72.                 continue;
  73.             }
  74.         }
  75.  
  76.         if (!pfn_valid_within(low_pfn))
  77.             continue;
  78.         nr_scanned++;
  79.  
  80.         /*
  81.          * Get the page and ensure the page is within the same zone.
  82.          * See the comment in isolate_freepages about overlapping
  83.          * nodes. It is deliberate that the new zone lock is not taken
  84.          * as memory compaction should not move pages between nodes.
  85.          */
  86.         page = pfn_to_page(low_pfn);
  87.         if (page_zone(page) != zone)
  88.             continue;
  89.  
  90.         if (!valid_page)
  91.             valid_page = page;
  92.  
  93.         /* If isolation recently failed, do not retry */
  94.         pageblock_nr = low_pfn >> pageblock_order;
  95.         if (!isolation_suitable(cc, page))
  96.             goto next_pageblock;
  97.  
  98.         /*
  99.          * Skip if free. page_order cannot be used without zone->lock
  100.          * as nothing prevents parallel allocations or buddy merging.
  101.          */
  102.         if (PageBuddy(page))
  103.             continue;
  104.  
  105.         /*
  106.          * For async migration, also only scan in MOVABLE blocks. Async
  107.          * migration is optimistic to see if the minimum amount of work
  108.          * satisfies the allocation
  109.          */
  110.         if (!cc->sync && last_pageblock_nr != pageblock_nr &&
  111.             !migrate_async_suitable(get_pageblock_migratetype(page))) {
  112.             cc->finished_update_migrate = true;
  113.             skipped_async_unsuitable = true;
  114.             goto next_pageblock;
  115.         }
  116.  
  117.         /*
  118.          * Check may be lockless but that's ok as we recheck later.
  119.          * It's possible to migrate LRU pages and balloon pages
  120.          * Skip any other type of page
  121.          */
  122.         if (!PageLRU(page)) {
  123.             if (unlikely(balloon_page_movable(page))) {
  124.                 if (locked && balloon_page_isolate(page)) {
  125.                     /* Successfully isolated */
  126.                     cc->finished_update_migrate = true;
  127.                     list_add(&page->lru, migratelist);
  128.                     cc->nr_migratepages++;
  129.                     nr_isolated++;
  130.                     goto check_compact_cluster;
  131.                 }
  132.             }
  133.             continue;
  134.         }
  135.  
  136.         /*
  137.          * PageLRU is set. lru_lock normally excludes isolation
  138.          * splitting and collapsing (collapsing has already happened
  139.          * if PageLRU is set) but the lock is not necessarily taken
  140.          * here and it is wasteful to take it just to check transhuge.
  141.          * Check TransHuge without lock and skip the whole pageblock if
  142.          * it's either a transhuge or hugetlbfs page, as calling
  143.          * compound_order() without preventing THP from splitting the
  144.          * page underneath us may return surprising results.
  145.          */
  146.         if (PageTransHuge(page)) {
  147.             if (!locked)
  148.                 goto next_pageblock;
  149.             low_pfn += (1 << compound_order(page)) - 1;
  150.             continue;
  151.         }
  152.  
  153.         /* Check if it is ok to still hold the lock */
  154.         locked = compact_checklock_irqsave(&zone->lru_lock, &flags,
  155.                                 locked, cc);
  156.         if (!locked || fatal_signal_pending(current))
  157.             break;
  158.  
  159.         /* Recheck PageLRU and PageTransHuge under lock */
  160.         if (!PageLRU(page))
  161.             continue;
  162.         if (PageTransHuge(page)) {
  163.             low_pfn += (1 << compound_order(page)) - 1;
  164.             continue;
  165.         }
  166.  
  167.         if (!cc->sync)
  168.             mode |= ISOLATE_ASYNC_MIGRATE;
  169.  
  170.         if (unevictable)
  171.             mode |= ISOLATE_UNEVICTABLE;
  172.  
  173.         lruvec = mem_cgroup_page_lruvec(page, zone);
  174.  
  175.         /* Try isolate the page */
  176.         if (__isolate_lru_page(page, mode) != 0)
  177.             continue;
  178.  
  179.         VM_BUG_ON_PAGE(PageTransCompound(page), page);
  180.  
  181.         /* Successfully isolated */
  182.         cc->finished_update_migrate = true;
  183.         del_page_from_lru_list(page, lruvec, page_lru(page));
  184.         list_add(&page->lru, migratelist);
  185.         cc->nr_migratepages++;
  186.         nr_isolated++;
  187.  
  188. check_compact_cluster:
  189.         /* Avoid isolating too much */
  190.         if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
  191.             ++low_pfn;
  192.             break;
  193.         }
  194.  
  195.         continue;
  196.  
  197. next_pageblock:
  198.         low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
  199.         last_pageblock_nr = pageblock_nr;
  200.     }
  201.  
  202.     acct_isolated(zone, locked, cc);
  203.  
  204.     if (locked)
  205.         spin_unlock_irqrestore(&zone->lru_lock, flags);
  206.  
  207.     /*
  208.      * Update the pageblock-skip information and cached scanner pfn,
  209.      * if the whole pageblock was scanned without isolating any page.
  210.      * This is not done when pageblock was skipped due to being unsuitable
  211.      * for async compaction, so that eventual sync compaction can try.
  212.      */
  213.     if (low_pfn == end_pfn && !skipped_async_unsuitable)
  214.         update_pageblock_skip(cc, valid_page, nr_isolated, true);
  215.  
  216.     trace_mm_compaction_isolate_migratepages(nr_scanned, nr_isolated);
  217.  
  218.     count_compact_events(COMPACTMIGRATE_SCANNED, nr_scanned);
  219.     if (nr_isolated)
  220.         count_compact_events(COMPACTISOLATED, nr_isolated);
  221.  
  222.     return low_pfn;
  223. }

This function isolates the movable pages in the range [low_pfn, end_pfn) and puts them on the cc->migratepages list, in preparation for the migration that follows.

Next, reclaim_clean_pages_from_list():

  1. 【file:/mm/vmscan.c】
  2. unsigned long reclaim_clean_pages_from_list(struct zone *zone,
  3.                         struct list_head *page_list)
  4. {
  5.     struct scan_control sc = {
  6.         .gfp_mask = GFP_KERNEL,
  7.         .priority = DEF_PRIORITY,
  8.         .may_unmap = 1,
  9.     };
  10.     unsigned long ret, dummy1, dummy2, dummy3, dummy4, dummy5;
  11.     struct page *page, *next;
  12.     LIST_HEAD(clean_pages);
  13.  
  14.     list_for_each_entry_safe(page, next, page_list, lru) {
  15.         if (page_is_file_cache(page) && !PageDirty(page) &&
  16.             !isolated_balloon_page(page)) {
  17.             ClearPageActive(page);
  18.             list_move(&page->lru, &clean_pages);
  19.         }
  20.     }
  21.  
  22.     ret = shrink_page_list(&clean_pages, zone, &sc,
  23.             TTU_UNMAP|TTU_IGNORE_ACCESS,
  24.             &dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true);
  25.     list_splice(&clean_pages, page_list);
  26.     __mod_zone_page_state(zone, NR_ISOLATED_FILE, -ret);
  27.     return ret;
  28. }

This function directly reclaims the clean file-cache pages on the list (pages that are file backed, not dirty, and not isolated balloon pages) by pushing them through shrink_page_list(), so they do not need to be migrated at all.
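
For example, a clean page-cache page backing a read-only file mapping does not need to be migrated: its contents can always be read back from disk, so it is cheaper to simply reclaim it here than to copy it into a new page; dirty or anonymous pages stay on the list and go through migrate_pages() below.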

Next, migrate_pages():

  1. 【file:/mm/migrate.c】
  2. /*
  3.  * migrate_pages - migrate the pages specified in a list, to the free pages
  4.  * supplied as the target for the page migration
  5.  *
  6.  * @from: The list of pages to be migrated.
  7.  * @get_new_page: The function used to allocate free pages to be used
  8.  * as the target of the page migration.
  9.  * @private: Private data to be passed on to get_new_page()
  10.  * @mode: The migration mode that specifies the constraints for
  11.  * page migration, if any.
  12.  * @reason: The reason for page migration.
  13.  *
  14.  * The function returns after 10 attempts or if no pages are movable any more
  15.  * because the list has become empty or no retryable pages exist any more.
  16.  * The caller should call putback_lru_pages() to return pages to the LRU
  17.  * or free list only if ret != 0.
  18.  *
  19.  * Returns the number of pages that were not migrated, or an error code.
  20.  */
  21. int migrate_pages(struct list_head *from, new_page_t get_new_page,
  22.         unsigned long private, enum migrate_mode mode, int reason)
  23. {
  24.     int retry = 1;
  25.     int nr_failed = 0;
  26.     int nr_succeeded = 0;
  27.     int pass = 0;
  28.     struct page *page;
  29.     struct page *page2;
  30.     int swapwrite = current->flags & PF_SWAPWRITE;
  31.     int rc;
  32.  
  33.     if (!swapwrite)
  34.         current->flags |= PF_SWAPWRITE;
  35.  
  36.     for(pass = 0; pass < 10 && retry; pass++) {
  37.         retry = 0;
  38.  
  39.         list_for_each_entry_safe(page, page2, from, lru) {
  40.             cond_resched();
  41.  
  42.             if (PageHuge(page))
  43.                 rc = unmap_and_move_huge_page(get_new_page,
  44.                         private, page, pass > 2, mode);
  45.             else
  46.                 rc = unmap_and_move(get_new_page, private,
  47.                         page, pass > 2, mode);
  48.  
  49.             switch(rc) {
  50.             case -ENOMEM:
  51.                 goto out;
  52.             case -EAGAIN:
  53.                 retry++;
  54.                 break;
  55.             case MIGRATEPAGE_SUCCESS:
  56.                 nr_succeeded++;
  57.                 break;
  58.             default:
  59.                 /*
  60.                  * Permanent failure (-EBUSY, -ENOSYS, etc.):
  61.                  * unlike -EAGAIN case, the failed page is
  62.                  * removed from migration page list and not
  63.                  * retried in the next outer loop.
  64.                  */
  65.                 nr_failed++;
  66.                 break;
  67.             }
  68.         }
  69.     }
  70.     rc = nr_failed + retry;
  71. out:
  72.     if (nr_succeeded)
  73.         count_vm_events(PGMIGRATE_SUCCESS, nr_succeeded);
  74.     if (nr_failed)
  75.         count_vm_events(PGMIGRATE_FAIL, nr_failed);
  76.     trace_mm_migrate_pages(nr_succeeded, nr_failed, mode, reason);
  77.  
  78.     if (!swapwrite)
  79.         current->flags &= ~PF_SWAPWRITE;
  80.  
  81.     return rc;
  82. }

This function performs the actual page migration. Its core is unmap_and_move(), which allocates a new page, unmaps the old one, copies its contents across and re-establishes the mappings, so that the old page can be freed. Putting this together, the job of __alloc_contig_migrate_range() is to isolate the in-use pages in the range and then migrate them out of the way.
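
For reference, the get_new_page callback used on this path is alloc_migrate_target() from mm/page_isolation.c. A stripped-down sketch of what such a callback looks like (demo_migrate_target() is hypothetical and omits the hugepage and NUMA handling of the real one):

    #include <linux/gfp.h>
    #include <linux/mm.h>

    static struct page *demo_migrate_target(struct page *page,
                                            unsigned long private, int **resultp)
    {
        gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;

        /* keep highmem pages in highmem so lowmem is not drained */
        if (PageHighMem(page))
            gfp_mask |= __GFP_HIGHMEM;

        /* any free movable page will do as the migration target */
        return alloc_page(gfp_mask);
    }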

Back in alloc_contig_range(): after __alloc_contig_migrate_range() returns, lru_add_drain_all() is called again, presumably to catch pages that were added to the LRU caches while __alloc_contig_migrate_range() slept. drain_all_pages() then flushes the per-CPU page caches; since the pageblocks are marked as isolated, those pages end up on the MIGRATE_ISOLATE free lists. After that, test_pages_isolated() verifies that the whole range really is isolated; isolate_freepages_range() pulls the free pages of the range out of the buddy free lists, which is the step that actually takes ownership of them; and finally undo_isolate_page_range() marks all the isolated pageblocks back as MIGRATE_CMA. At this point the required contiguous pages have been allocated, so their migrate type no longer matters and can be restored.

     Finally, CMA-managed memory is released through free_contig_range():

  1. 【file:/mm/page_alloc.c】
  2. void free_contig_range(unsigned long pfn, unsigned nr_pages)
  3. {
  4.     unsigned int count = 0;
  5.  
  6.     for (; nr_pages--; pfn++) {
  7.         struct page *page = pfn_to_page(pfn);
  8.  
  9.         count += page_count(page) != 1;
  10.         __free_page(page);
  11.     }
  12.     WARN(count != 0, "%d pages are still in use!\n", count);
  13. }

So freeing the memory comes back down to __free_page(), which will not be explored further here.
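
For completeness, the release side in dma-contiguous.c is essentially the mirror image of the allocation; a simplified sketch, under the assumption that it just clears the bitmap and frees the range (the real dma_release_from_contiguous() adds range checks around this):

    /* simplified sketch of releasing a CMA allocation back to its area */
    static void demo_cma_release(struct cma *cma, struct page *pages, int count)
    {
        unsigned long pfn = page_to_pfn(pages);

        mutex_lock(&cma_mutex);
        /* mark the pages as available again in the area's bitmap */
        bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count);
        /* give each page back to the buddy allocator's MIGRATE_CMA free lists */
        free_contig_range(pfn, count);
        mutex_unlock(&cma_mutex);
    }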

I had not originally intended to analyze CMA; curiosity got the better of me, and having dug this far I decided to write it down. Some details still puzzle me and deserve a deeper look, but the topic is broad, so refinements will have to wait for a later, more thorough pass. Corrections for any misunderstandings are very welcome.
