Category: LINUX
2013-02-18 16:35:47
Original article: Linux常用内核态内存分配方式总结 (A summary of common kernel-mode memory allocation methods in Linux), by bigluo
1. The alloc_pages family
This family mainly includes:
struct page * alloc_page(unsigned int gfp_mask)——allocates a single physical page and returns a pointer to that page's struct page.
struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)——allocates 2^order contiguous physical pages and returns a pointer to the first page's struct page.
unsigned long get_free_page(unsigned int gfp_mask)——allocates a single physical page, zeroes it, and returns a virtual (linear) address.
unsigned long __get_free_page(unsigned int gfp_mask)——allocates a single page and returns a virtual address.
unsigned long __get_free_pages(unsigned int gfp_mask, unsigned int order)——allocates 2^order pages and returns a virtual address.
unsigned long __get_dma_pages(unsigned int gfp_mask, unsigned int order)——allocates 2^order pages from the DMA zone and returns a virtual address.
These functions allocate through the buddy system and are the most fundamental memory allocation functions in the Linux kernel. The size of a single request is bounded by the constant MAX_ORDER: at most 2^(MAX_ORDER-1) contiguous pages.
The corresponding free functions are:
void __free_pages(struct page *page, unsigned int order)
void __free_page(struct page *page)
void free_page(void *addr)
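As a quick sketch (not from the original article) of how these allocation and free calls pair up in a kernel module; the function name here is hypothetical:

#include <linux/mm.h>
#include <linux/errno.h>

static int demo_alloc(void)
{
	struct page *pg;
	unsigned long vaddr;

	pg = alloc_pages(GFP_KERNEL, 2);	/* 2^2 = 4 contiguous pages */
	if (!pg)
		return -ENOMEM;

	vaddr = __get_free_page(GFP_KERNEL);	/* one page, as a virtual address */
	if (!vaddr) {
		__free_pages(pg, 2);		/* free with the same order */
		return -ENOMEM;
	}

	/* ... use the memory ... */

	free_page(vaddr);
	__free_pages(pg, 2);
	return 0;
}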
2. kmalloc
This function is built on top of the slab allocator and is mainly used to allocate small regions, roughly from 32 bytes up to 131072 bytes (the exact bounds depend on kernel version and page size). Memory returned by kmalloc is contiguous in both the linear and the physical address space. It cannot allocate from the so-called high memory zone; high memory must be obtained through dedicated interfaces.
Because allocating small blocks directly from the buddy system would cause too much fragmentation, Linux keeps two sets of caches dedicated to small allocations: one set for DMA allocations, named size-N(DMA), and one for general-purpose allocations, named size-N, where N is the size served by the cache. Each cache of size cs_size is described by the following structure:
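In the 2.4-era source (mm/slab.c) the definition reads roughly:

typedef struct cache_sizes {
	size_t		 cs_size;	/* object size served by this pair of caches */
	kmem_cache_t	*cs_cachep;	/* the general-purpose size-N cache */
	kmem_cache_t	*cs_dmacachep;	/* the size-N(DMA) cache for GFP_DMA requests */
} cache_sizes_t;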
As the structure shows, Linux describes a size-N cache and its size-N(DMA) counterpart together, keyed by size.
Because the number of these caches is fixed, Linux initializes a static array, cache_sizes, at compile time to describe them. The array is defined roughly as follows:
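Again roughly, from 2.4-era mm/slab.c (the NULL cache pointers are filled in during boot):

static cache_sizes_t cache_sizes[] = {
#if PAGE_SIZE == 4096
	{    32,	NULL, NULL},
#endif
	{    64,	NULL, NULL},
	{   128,	NULL, NULL},
	{   256,	NULL, NULL},
	{   512,	NULL, NULL},
	{  1024,	NULL, NULL},
	{  2048,	NULL, NULL},
	{  4096,	NULL, NULL},
	{  8192,	NULL, NULL},
	{ 16384,	NULL, NULL},
	{ 32768,	NULL, NULL},
	{ 65536,	NULL, NULL},
	{131072,	NULL, NULL},
	{     0,	NULL, NULL}	/* terminator */
};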
As the array shows, the description of each cache is already laid down at compile time; all that remains is to create the caches at boot and store the resulting cache pointers into the array.
With this array in place, kmalloc's job is simple: walk the array with the requested size and, if a cache of suitable size is found, call the slab allocator interface to allocate from it; if no cache is large enough, return NULL.
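A minimal sketch of that lookup, modeled on the 2.4 implementation:

void *kmalloc(size_t size, int flags)
{
	cache_sizes_t *csizep = cache_sizes;

	for (; csizep->cs_size; csizep++) {
		if (size > csizep->cs_size)
			continue;		/* cache too small, try the next one */
		/* use the DMA cache when the caller asked for GFP_DMA */
		return __kmem_cache_alloc(flags & GFP_DMA ?
				csizep->cs_dmacachep : csizep->cs_cachep, flags);
	}
	return NULL;				/* request larger than the biggest cache */
}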
The counterpart of kmalloc is kfree. When freeing, it first checks that the pointer is not NULL, then verifies that the pointed-to region lies within a slab, and only then calls the slab allocator interface to reclaim the region.
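The same steps, again modeled on the 2.4 code:

void kfree(const void *objp)
{
	kmem_cache_t *c;
	unsigned long flags;

	if (!objp)				/* kfree(NULL) is a no-op */
		return;
	local_irq_save(flags);
	CHECK_PAGE(virt_to_page(objp));		/* sanity check: must be a slab page */
	c = GET_PAGE_CACHE(virt_to_page(objp));	/* look up the owning cache */
	__kmem_cache_free(c, (void *)objp);
	local_irq_restore(flags);
}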
3. vmalloc
Unlike kmalloc, the memory vmalloc returns is contiguous only in the linear address space; it is not guaranteed to be physically contiguous. vmalloc exists precisely to allocate physically non-contiguous memory.
The main reason for using vmalloc is that external fragmentation in the buddy system makes large physically contiguous blocks hard to come by, so Linux provides vmalloc as a way around the problem.
How it works: Linux reserves the region between VMALLOC_START and VMALLOC_END in the kernel virtual address space for vmalloc. When vmalloc allocates memory, it ultimately calls the buddy allocator one page at a time (so the pages need not be contiguous with one another), then edits the page tables so that a contiguous range of virtual addresses maps onto the non-contiguous physical pages. Two consequences follow: vmalloc always hands out a whole number of physical pages, and, because the linear address range reserved for vmalloc is finite, the amount of memory obtainable through vmalloc is finite as well.
Each linear-address range handed out by vmalloc is described by a vm_struct, defined as follows:
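In the 2.4-era header (include/linux/vmalloc.h) it reads roughly:

struct vm_struct {
	unsigned long flags;
	void *addr;			/* start of the allocated virtual range */
	unsigned long size;		/* size in bytes (includes a one-page guard gap) */
	struct vm_struct *next;		/* next area in the address-sorted list */
};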
The ranges already handed out are chained together through the next field, sorted by address. To carve out a new range, vmalloc walks this list against the linear space reserved for it to find a free gap. The vm_struct itself is allocated with kmalloc.
The matching free function for vmalloc is vfree.
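A minimal usage sketch (hypothetical module code):

#include <linux/vmalloc.h>
#include <linux/errno.h>

static void *big_table;

static int demo_init(void)
{
	big_table = vmalloc(1 << 20);	/* 1 MiB: virtually contiguous,
					 * physically scattered pages */
	if (!big_table)
		return -ENOMEM;
	return 0;
}

static void demo_exit(void)
{
	vfree(big_table);
}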
Next come two functions used only with high memory; strictly speaking, they map already-allocated high-memory pages into the kernel's address space rather than allocate memory.
4. kmap
At the top of the kernel linear address space, Linux reserves a region, running from PKMAP_BASE to FIXADDR_START, that kmap uses to map high-memory pages.
The following is excerpted from Understanding the Linux Virtual Memory Manager:
Space is reserved at the top of the kernel page tables from PKMAP_BASE to FIXADDR_START for a PKMap. The size of the space reserved varies slightly. On the x86, PKMAP_BASE is at 0xFE000000, and the address of FIXADDR_START is a compile time constant that varies with configure options, but that is typically only a few pages located near the end of the linear address space. This means that there is slightly below 32MiB of page table space for mapping pages from high memory into usable space.
For mapping pages, a single page set of PTEs is stored at the beginning of the PKMap area to allow 1,024 high pages to be mapped into low memory for short periods with the function kmap() and to be unmapped with kunmap(). The pool seems very small, but the page is only mapped by kmap() for a very short time. Comments in the code indicate that there was a plan to allocate contiguous page table entries to expand this area, but it has remained just that, comments in the code, so a large portion of the PKMap is unused.
The page table entry for use with kmap() is called pkmap_page_table, which is located at PKMAP_BASE and which is set up during system initialization. On the x86, this takes place at the end of the pagetable_init() function. The pages for the PGD and PMD entries are allocated by the boot memory allocator to ensure they exist.
The current state of the page table entries is managed by a simple array called pkmap_count, which has LAST_PKMAP entries in it. On an x86 system without PAE, this is 1,024, and, with PAE, it is 512. More accurately, albeit not expressed in code, the LAST_PKMAP variable is equivalent to PTRS_PER_PTE.
Each element is not exactly a reference count, but it is very close. If the entry is 0, the page is free and has not been used since the last TLB flush. If it is 1, the slot is unused, but a page is still mapped there waiting for a TLB flush. Flushes are delayed until every slot has been used at least once because a global flush is required for all CPUs when the global page tables are modified and is extremely expensive. Any higher value is a reference count of n-1 users of the page.
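A minimal sketch of the kmap()/kunmap() pattern described above (hypothetical helper; kmap() may sleep, so this is only valid in process context):

#include <linux/highmem.h>
#include <linux/string.h>

static void copy_from_high_page(struct page *page, void *dst, size_t len)
{
	void *vaddr = kmap(page);	/* map the high page into the PKMap area */
	memcpy(dst, vaddr, len);
	kunmap(page);			/* release the slot for reuse */
}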
5. kmap_atomic
The use of kmap_atomic() is discouraged, but slots are reserved for each CPU for when they are necessary, such as when bounce buffers are used by devices from interrupt. There are a varying number of different requirements an architecture has for atomic high memory mapping, which are enumerated by km_type. The total number of uses is KM_TYPE_NR. On the x86, there are a total of six different uses for atomic kmaps.
KM_TYPE_NR entries per processor are reserved at boot time for atomic mapping at the location FIX_KMAP_BEGIN and ending at FIX_KMAP_END. Obviously, a user of an atomic kmap may not sleep or exit before calling kunmap_atomic() because the next process on the processor may try to use the same entry and fail.
The function kmap_atomic() has the very simple task of mapping the requested page to the slot set aside in the page tables for the requested type of operation and processor. The function kunmap_atomic() is interesting because it will only clear the PTE with pte_clear() if debugging is enabled. It is considered unnecessary to bother unmapping atomic pages because the next call to kmap_atomic() will simply replace it and make TLB flushes unnecessary.
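A sketch of the atomic variant with the 2.4-era API the excerpt describes (the km_type argument was removed in later kernels):

#include <linux/highmem.h>
#include <linux/string.h>

static void zero_high_page_atomic(struct page *page)
{
	char *vaddr = kmap_atomic(page, KM_USER0);	/* per-CPU slot; no sleeping allowed */
	memset(vaddr, 0, PAGE_SIZE);
	kunmap_atomic(vaddr, KM_USER0);			/* release before any sleep */
}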