全部博文(165)
分类: LINUX
2013-08-09 15:15:03
Pagemap 是用户空间的透视图。
页映射是一个新的(作为2.6.25)的接口,在内核中,允许用户空间的程序通过读取/ proc文件来以检查页表和相关信息。
三个组成部分
1、页映射:/proc/pid/pagemap。该文件允许用户空间程序找出每个虚拟页映射到物理帧。每个虚拟页面对应一个64位的值。包含以下数据(fs/proc/task_mmu.c,pagemap_read方法读取):
* Bits 0-55 page frame number(PFN) if present
* Bits 0-4 swap type if swapped
* Bits 5-55 swap offset if swapped
* Bits 55-60 page shift (page size = 1<
* Bit 61 reserved for future use
* Bit 62 page swapped
* Bit 63 page present
如果页面交换区内,PFN由文件描述符与在waps内的页偏移组成。未映射页面返回一个空的PFN。这精确地确定哪些页面被映射(或交换)和进程间的映射页面。
使用/proc/pid/maps可以高效的确定映射的内存区域、跳过未映射的区域。
/proc/kpagecount:这个文件包含64位计数 , 表示每一页被映射的次数,按照PFN值固定索引。
/proc/kpageflags:此文件包含为64位的标志集 ,表示该页的属性,按照PFN索引。
The flags are (from fs/proc/proc_misc, abovekpageflags_read):
0. LOCKED
1. ERROR
2. REFERENCED
3. UPTODATE
4. DIRTY
5. LRU
6. ACTIVE
7. SLAB
8. WRITEBACK
9. RECLAIM
10. BUDDY
使用页映射做一些有用的东西:使用页映射查看内存使用的流程:
1、读取/proc/pid/maps,以确定这部分的内存空间 映射到哪里
2。选择你感兴趣的,全部或一个特定的部分,或栈或堆等
3。打开/proc/pid/pagemap,找到你想检查的那些页。
4。读取每页对应的那个64位的值 。
5、读取/proc/kpagecount 和/or /proc/kpageflags,找到你想要的数据
另外:读取该文件必须以8字节对齐,否则返回-EINVAL。
This patch enables extraction of the pfn of a hugepage from
/proc/pid/pagemap in an architecture independent manner.
Details
-------
My test program (leak_pagemap) works as follows:
- creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,)
- read()/write() something on it,
- call page-types with option -p,
- munmap() and unlink() the file on hugetlbfs
Without my patches
------------------
$ ./leak_pagemap
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000000 1 0 __________________________________
0x0000000000000804 1 0 __R________M______________________ referenced,mmap
0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
total 101 0
The output of page-types don't show any hugepage.
With my patches
---------------
$ ./leak_pagemap
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000000 1 0 __________________________________
0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge
0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge
0x0000000000000804 1 0 __R________M______________________ referenced,mmap
0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap
0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap
0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked
0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked
0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked
total 51300 200
The output of page-types shows 51200 pages contributing to hugepages,
containing 100 head pages and 51100 tail pages as expected.
add per node hstate attributes
Add the per huge page size control/query attributes to the per node
sysdevs:
/sys/devices/system/node/node
nr_hugepages - r/w
free_huge_pages - r/o
surplus_huge_pages - r/o
The patch attempts to re-use/share as much of the existing global hstate
attribute initialization and handling, and the "nodes_allowed" constraint
processing as possible.
Calling set_max_huge_pages() with no node indicates a change to global
hstate parameters. In this case, any non-default task mempolicy will be
used to generate the nodes_allowed mask. A valid node id indicates an
update to that node's hstate parameters, and the count argument specifies
the target count for the specified node. From this info, we compute the
target global count for the hstate and construct a nodes_allowed node mask
contain only the specified node.
Setting the node specific nr_hugepages via the per node attribute
effectively ignores any task mempolicy or cpuset constraints.
With this patch:
(me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
./ ../ free_hugepages nr_hugepages surplus_hugepages
Starting from:
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
Node 2 HugePages_Total: 0
Node 2 HugePages_Free: 0
Node 2 HugePages_Surp: 0
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0
vm.nr_hugepages = 0
Allocate 16 persistent huge pages on node 2:
(me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
[Note that this is equivalent to:
numactl -m 2 hugeadmin --pool-pages-min 2M:+16
]
Yields:
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
Node 2 HugePages_Total: 16
Node 2 HugePages_Free: 16
Node 2 HugePages_Surp: 0
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0
vm.nr_hugepages = 16
Global controls work as expected--reduce pool to 8 persistent huge pages:
(me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
Node 0 HugePages_Total: 0
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 HugePages_Total: 0
Node 1 HugePages_Free: 0
Node 1 HugePages_Surp: 0
Node 2 HugePages_Total: 8
Node 2 HugePages_Free: 8
Node 2 HugePages_Surp: 0
Node 3 HugePages_Total: 0
Node 3 HugePages_Free: 0
Node 3 HugePages_Surp: 0