PostgreSQL内存管理
为描述方便,后文提到的PG都是指PostgreSQL数据库,本文将从以下方面介绍PG动态内存管理的实现:
1. 前言 1
2. 内存上下文 1
3. AllocSet 3
4. AllocBlock 4
5. AllocChunk 5
6. 分配-palloc 5
7. 释放-pfree 12
1. 前言
本文主要介绍PostgreSQL的动态内存管理机制,PostgreSQL的动态内存管理可以带来以下几方面好处:
1) 防止内存泄漏。
传统的malloc/free要保证每一个malloc都要有对应的free来释放;仅仅依赖开发人员的细心难以保证,PostgreSQL的内存管理通过预先分配一大段内存,然后自己来管理这段内存,最后只要在合适的时间点释放这段内存就行,不需要考虑内部的小块内存。
2) 分配效率。
由于堆内存是线程间共享的,所以传统的malloc必须要考虑分配时的线程安全问题,从而要用锁来控制;PostgreSQL由于是多进程架构,每个进程只有一个线程,不需要考虑并发;下面也可以看到,PostgreSQL的内存管理没有涉及到锁,因此性能会快一些。
3) 减少内存碎片。
根据1)的描述,可以看出,通过每次向OS申请一大段内存,然后自己管理,可以有效的减少内存碎片的发生。
2. 内存上下文
内存上下文(Memory Context):简称上下文,是一组内存块的控制结构,指明了该组内存的一般用途,内存上下文只定义了操作内存的一组接口,类似面向对象语言中的抽象类,要根据实际情况实现不同的管理方式,目前PG只实现了一种AllocSet。每个PG后台进程包含有多个这样的内存上下文,这些内存上下文通过树状结构组织在一起,MemoryContext结构如下所示:
-
typedef struct MemoryContextData
-
{
-
NodeTag type; /* identifies exact kind of context */
-
MemoryContextMethods *methods; /* virtual function table */
-
MemoryContext parent; /* NULL if no parent (toplevel context) */
-
MemoryContext firstchild; /* head of linked list of children */
-
MemoryContext nextchild; /* next child of same parent */
-
char *name; /* context name (just for debugging) */
-
} MemoryContextData
type:普遍应用的PG代码中,用于识别一段内存地址所对应的结构类型,以便转化为相对应的数据结构对内存进行操作。
methods:定义了一组对内存上下文进行操作的接口,目前只有AllocSet这一种实现
parent:该上下文的父节点,对于最顶端的上下文,该值为NULL
firstchild:该上下文的第一个孩子节点,如果没有孩子节点,该值为NULL
nextchild:指向和该上下文具有相同父节点的下一个上下文,如果该上下文是这一层的最后一个节点,该值为NULL。
下图根节点S,它包含3个子节点A、B、C,B包含子节点D的结构示意图。
可以看到,具有相同的父亲的节点(又叫兄弟节点)的上下文可以看做由next指针连起来的单向链表,为父节点增加孩子节点时,插入新的节点时采用单链表的前插法,即插入到相同层次节点的最前端,同时修改父节点的first指针。例如,在上面的树中,为S增加一个孩子节点X,最后变为:
3. AllocSet
AllocSet定义如下:
-
typedef struct AllocSetContext
-
{
-
MemoryContextData header; /* Standard memory-context fields */
-
/* Info about storage allocated in this context: */
-
AllocBlock blocks; /* head of list of blocks in this set */
-
AllocChunk freelist[ALLOCSET_NUM_FREELISTS]; /* free chunk lists */
-
bool isReset; /* T = no space alloced since last reset */
-
/* Allocation parameters for this context: */
-
Size initBlockSize; /* initial block size */
-
Size maxBlockSize; /* maximum block size */
-
Size nextBlockSize; /* next block size to allocate */
-
Size allocChunkLimit; /* effective chunk size limit */
-
AllocBlock keeper; /* if not NULL, keep this block over resets */
-
} AllocSetContext;
-
-
typedef AllocSetContext *AllocSet
header:抽象类型MemoryContext,通过这个字段可以区分各个上下文的层次关系
hlocks:实际分配内存的地方,AllocSet中内存的分配是以Block为单位向操作系统申请的,多个Block之间以单向链表的方式连接起来。该blocks是单向链表的头,清除上下文内存时从它开始遍历整个链表,释放所有block中的内存。AllocBlock的结构参见后面AllocBlock部分。
freelist:空闲链表表头,ALLOCSET_NUM_FREELISTS定义如下:
-
#define ALLOC_MINBITS 3 /* smallest chunk size is 8 bytes */
-
#define ALLOCSET_NUM_FREELISTS 11
-
#define ALLOC_CHUNK_LIMIT (1 << (ALLOCSET_NUM_FREELISTS-1+ALLOC_MINBITS))
-
/* Size of largest chunk that we use a fixed size for */
AllocSet的实现中,对于小的内存块(8~8192byte)来说,释放时会缓存到Freelist中,这里freelist相当于的链表头,一共11个单向链表,分别缓存大小为8,16,32,64,128,256,512,1024,2048,4096,8192的内存块,对于申请的内存块大小不在这些范围里的数据块,按向上取整的原则进行分配,例如,申请20byte的内存,AllocSet实际要申请32byte,释放时,把该内存块连接到freelist[2]对应的单链表,作为新的表头。
当有其他地方申请20byte内存时,通过计算先查看freelist[2]链表中是否有空闲块,找到后,把该内存块从freelist[2]中移除,作为可用的内存块返回给用户,同时把该块后面的空闲块作为freelist[2]的链表头。由于每次释放时,新的Chunk总是位于链表头,分配时,也是从链表头部获取,因此,可以把freelist看做一个后进先出的栈。
对于大于8192的内存块,不进行缓存,释放时直接调用free还给操作系统。
AllocChunk参见后面的介绍。
内存大小 缓存位置
1~8 Freelist[0]
9~16 Freelist[1]
17~32 Freelist[2]
33~64 Freelist[3]
65~128 Freelist[4]
129~256 Freelist[5]
257~512 Freelist[6]
513~1024 Freelist[7]
1025~2048 Freelist[8]
2049~4096 Freelist[9]
4097~8192 Freelist[10]
8193~MAX 不缓存
isReset:表示该上下文自从上次Reset后没有在上面分配过内存,可以避免一些不必要的操作。
initBlockSize:Block的初始大小
maxBlockSize:Block的最大大小
nextBlockSize:下一个Block的大小
allocChunkLimit:freelist是否缓存的分水岭。
4. AllocBlock
-
typedef struct AllocBlockData
-
{
-
AllocSet aset; /* aset that owns this block */
-
AllocBlock next; /* next block in aset's blocks list */
-
char *freeptr; /* start of free space in this block */
-
char *endptr; /* end of space in this block */
-
} AllocBlockData;
aset:该Block所属的AllocSet
next:指向后面的可用Block
freeptr:指向该Block空闲空间的初始位置
endptr:指向该Block的最后位置
Block为PG向OS申请内存分配的基本单位,一个Block可能被拆成多个Chunk,也可能只包含一个Chunk(比如较大内存),Chunk释放时放入freelist链表中以便下次分配,最后由Block统一释放归还给OS。下图描述了AllocSet中Block,Chunk,Freelist之间的关系:
5. AllocChunk
-
typedef struct AllocChunkData
-
{
-
/* aset is the owning aset if allocated, or the freelist link if free */
-
void *aset;
-
/* size is always the size of the usable space in the chunk */
-
Size size;
-
} AllocChunkData;
aset:这个字段使用了空间复用的技术,在不同场景有不一样的含义,当该chunk已经分配给用户时,该值指向该chunk所属的上下文,即指向AllocSet;当该chunk被释放时,该值指向所属的空闲链表的下一个chunk,相当于单链表的next指针。
size:该chunk的大小,不包括chunk结构体本身大小。
6. 分配-palloc
palloc内存分配过程:
1. 申请的内存是否大于缓存限制,如果是,则调用malloc申请一块block,该Block仅包含一个Chunk,新Block放到AllocSet->blocks链表的第二个位置(第一个位置永远是分配空间的活动Block),返回内存地址给用户;否则,转入2;
2. 通过申请的大小计算所属的空闲链表,取出链表头,如果链表头不为空,则说明有空闲的Chunk可用,返回该chunk,同时将该chunk从freelist中移除;如果为空,则说明没有可用空闲chunk,转入步骤3;
3. 调整申请内存的大小为2的整数次幂,取出当前Block头,通过freeptr和endptr计算该block可用空间是否满足需要(要大于调整后内存大小);如果小于所申请的大小,则把当前Block剩余部分按大小拆分(拆分方式同样按2的整数次幂拆分,例如100拆分成64和32,最后4字节将浪费掉),放到相应的空闲链表,以便其他分配小块内存时可以使用到这部分空间而不会造成浪费,然后转到步骤4;如果当前Block空间足够,则转到步骤5;如果Block头指针为空,转入步骤4;
4. 新建一个Block,新Block大小为nextBlockSize,最大为maxBlockSize,可根据实际分配情况进行调整,但不能小于用户申请大小;
5. 执行到这一步,说明已有可用的Block,并且大小满足用户所需,这里仅需要调整Block的freeptr的位置,调整Chunk的大小和所指向AllocSet即可,然后返回可用指针给用户。
PS:为了申请时不需要逐个判断chunk是否满足要求,这里采用了多分配的机制,例如申请17~32 bytes ,最终都是分配32byte,会造成一定的浪费,但为了加快分配速度在所难免。实际上大多数的内存管理程序也都是采用类似的方式,如:glibc的malloc,memcached等,只是不同的内存管理机制预分配的大小不同,比如 glibc 对小于512的采用公差为8的等差数列;对于大于512的采用有序双向链表。
Palloc对应的函数为AllocSetAlloc:
-
static void *
-
AllocSetAlloc(MemoryContext context, Size size)
-
{
-
AllocSet set = (AllocSet) context;
-
AllocBlock block;
-
AllocChunk chunk;
-
int fidx;
-
Size chunk_size;
-
Size blksize;
-
-
AssertArg(AllocSetIsValid(set));
-
-
/*
-
* If requested size exceeds maximum for chunks, allocate an entire block
-
* for this request.
-
*/
-
if (size > set->allocChunkLimit)
-
{
-
chunk_size = MAXALIGN(size);
-
blksize = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
-
block = (AllocBlock) malloc(blksize);
-
if (block == NULL)
-
{
-
MemoryContextStats(TopMemoryContext);
-
ereport(ERROR,
-
(errcode(ERRCODE_OUT_OF_MEMORY),
-
errmsg("out of memory"),
-
errdetail("Failed on request of size %lu.",
-
(unsigned long) size)));
-
}
-
block->aset = set;
-
block->freeptr = block->endptr = ((char *) block) + blksize;
-
-
chunk = (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ);
-
chunk->aset = set;
-
chunk->size = chunk_size;
-
#ifdef MEMORY_CONTEXT_CHECKING
-
chunk->requested_size = size;
-
/* set mark to catch clobber of "unused" space */
-
if (size < chunk_size)
-
((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
-
#endif
-
#ifdef RANDOMIZE_ALLOCATED_MEMORY
-
/* fill the allocated space with junk */
-
randomize_mem((char *) AllocChunkGetPointer(chunk), size);
-
#endif
-
-
/*
-
* Stick the new block underneath the active allocation block, so that
-
* we don't lose the use of the space remaining therein.
-
*/
-
if (set->blocks != NULL)
-
{
-
block->next = set->blocks->next;
-
set->blocks->next = block;
-
}
-
else
-
{
-
block->next = NULL;
-
set->blocks = block;
-
}
-
-
set->isReset = false;
-
-
AllocAllocInfo(set, chunk);
-
return AllocChunkGetPointer(chunk);
-
}
-
if语句判断申请的内存是否大于8192,如果大于8192,直接使用malloc分配一段内存,大小为(Block头 + Chunk头 + 用户请求大小按8字节对齐),然后把该Block按前插法插入到AllocSet的blocks链表中的第二个位置。
-
/*
-
* Request is small enough to be treated as a chunk. Look in the
-
* corresponding free list to see if there is a free chunk we could reuse.
-
* If one is found, remove it from the free list, make it again a member
-
* of the alloc set and return its data address.
-
*/
-
fidx = AllocSetFreeIndex(size);
-
chunk = set->freelist[fidx];
-
到这里,说明申请的内存小于8192,获取对应的空闲链表,判断是否为空。
-
if (chunk != NULL)
-
{
-
Assert(chunk->size >= size);
-
-
set->freelist[fidx] = (AllocChunk) chunk->aset;
-
-
chunk->aset = (void *) set;
-
-
#ifdef MEMORY_CONTEXT_CHECKING
-
chunk->requested_size = size;
-
/* set mark to catch clobber of "unused" space */
-
if (size < chunk->size)
-
((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
-
#endif
-
#ifdef RANDOMIZE_ALLOCATED_MEMORY
-
/* fill the allocated space with junk */
-
randomize_mem((char *) AllocChunkGetPointer(chunk), size);
-
#endif
-
-
/* isReset must be false already */
-
Assert(!set->isReset);
-
-
AllocAllocInfo(set, chunk);
-
return AllocChunkGetPointer(chunk);
-
Chunk不为NULL说明存在大小合适的空闲块,将该空闲块链表下一个空闲块作为新的链表头,保存到Freelist中,将取下来的空闲块返回给用户。
-
}
-
-
/*
-
* Choose the actual chunk size to allocate.
-
*/
-
chunk_size = (1 << ALLOC_MINBITS) << fidx;
-
Assert(chunk_size >= size);
-
-
/*
-
* If there is enough room in the active allocation block, we will put the
-
* chunk into that block. Else must start a new one.
-
*/
-
执行到这里,说明没有可用的空闲块,那么调整用户申请内存大小为:以2为底取对数后向上取整,调整为2的整数次幂,方便后面向空闲链表中放。
-
if ((block = set->blocks) != NULL)
-
{
-
Size availspace = block->endptr - block->freeptr;
-
获取当前block中的可用空间
-
if (availspace < (chunk_size + ALLOC_CHUNKHDRSZ))
-
{
-
/*
-
* The existing active (top) block does not have enough room for
-
* the requested allocation, but it might still have a useful
-
* amount of space in it. Once we push it down in the block list,
-
* we'll never try to allocate more space from it. So, before we
-
* do that, carve up its free space into chunks that we can put on
-
* the set's freelists.
-
*
-
* Because we can only get here when there's less than
-
* ALLOC_CHUNK_LIMIT left in the block, this loop cannot iterate
-
* more than ALLOCSET_NUM_FREELISTS-1 times.
-
*/
-
while (availspace >= ((1 << ALLOC_MINBITS) + ALLOC_CHUNKHDRSZ))
-
{
-
Size availchunk = availspace - ALLOC_CHUNKHDRSZ;
-
int a_fidx = AllocSetFreeIndex(availchunk);
-
-
/*
-
* In most cases, we'll get back the index of the next larger
-
* freelist than the one we need to put this chunk on. The
-
* exception is when availchunk is exactly a power of 2.
-
*/
-
if (availchunk != ((Size) 1 << (a_fidx + ALLOC_MINBITS)))
-
{
-
a_fidx--;
-
Assert(a_fidx >= 0);
-
availchunk = ((Size) 1 << (a_fidx + ALLOC_MINBITS));
-
}
-
-
chunk = (AllocChunk) (block->freeptr);
-
-
block->freeptr += (availchunk + ALLOC_CHUNKHDRSZ);
-
availspace -= (availchunk + ALLOC_CHUNKHDRSZ);
-
-
chunk->size = availchunk;
-
#ifdef MEMORY_CONTEXT_CHECKING
-
chunk->requested_size = 0; /* mark it free */
-
#endif
-
chunk->aset = (void *) set->freelist[a_fidx];
-
set->freelist[a_fidx] = chunk;
-
}
-
上面 if 语句作用:如果block中可用空间无法满足用户的请求,那么在申请新的Block之前要把当前Block的剩余空间按从大到小拆分,并链接到相对应的Freelist中,以便其他分配小块内存时能从Freelist中找到;如果不拆分,那么这部分空间将被浪费掉。
-
/* Mark that we need to create a new block */
-
block = NULL;
-
}
-
}
-
-
/*
-
* Time to create a new regular (multi-chunk) block?
-
*/
-
if (block == NULL)
-
{
-
Size required_size;
-
-
/*
-
* The first such block has size initBlockSize, and we double the
-
* space in each succeeding block, but not more than maxBlockSize.
-
*/
-
blksize = set->nextBlockSize;
-
set->nextBlockSize <<= 1;
-
if (set->nextBlockSize > set->maxBlockSize)
-
set->nextBlockSize = set->maxBlockSize;
-
-
/*
-
* If initBlockSize is less than ALLOC_CHUNK_LIMIT, we could need more
-
* space... but try to keep it a power of 2.
-
*/
-
required_size = chunk_size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ;
-
while (blksize < required_size)
-
blksize <<= 1;
-
-
/* Try to allocate it */
-
block = (AllocBlock) malloc(blksize);
-
新申请一个内存块,大小为nextblock,如果该值小于用户申请的内存大小,则将该值 *2,直到大于用户请求的大小。
-
/*
-
* We could be asking for pretty big blocks here, so cope if malloc
-
* fails. But give up if there's less than a meg or so available...
-
*/
-
while (block == NULL && blksize > 1024 * 1024)
-
{
-
blksize >>= 1;
-
if (blksize < required_size)
-
break;
-
block = (AllocBlock) malloc(blksize);
-
}
-
如果申请的内存大于1M,且分配失败时,在满足用户请求大小的前提下缩小申请block大小,直到分配成功,但前提是一定要大于用户的请求大小。
-
-
if (block == NULL)
-
{
-
MemoryContextStats(TopMemoryContext);
-
ereport(ERROR,
-
(errcode(ERRCODE_OUT_OF_MEMORY),
-
errmsg("out of memory"),
-
errdetail("Failed on request of size %lu.",
-
(unsigned long) size)));
-
}
-
-
block->aset = set;
-
block->freeptr = ((char *) block) + ALLOC_BLOCKHDRSZ;
-
block->endptr = ((char *) block) + blksize;
-
-
/*
-
* If this is the first block of the set, make it the "keeper" block.
-
* Formerly, a keeper block could only be created during context
-
* creation, but allowing it to happen here lets us have fast reset
-
* cycling even for contexts created with minContextSize = 0; that way
-
* we don't have to force space to be allocated in contexts that might
-
* never need any space. Don't mark an oversize block as a keeper,
-
* however.
-
*/
-
if (set->keeper == NULL && blksize == set->initBlockSize)
-
set->keeper = block;
-
-
block->next = set->blocks;
-
set->blocks = block;
-
block分配成功,设置block的freeptr和endptr位置
-
}
-
-
/*
-
* OK, do the allocation
-
*/
-
chunk = (AllocChunk) (block->freeptr);
-
-
block->freeptr += (chunk_size + ALLOC_CHUNKHDRSZ);
-
Assert(block->freeptr <= block->endptr);
-
-
chunk->aset = (void *) set;
-
chunk->size = chunk_size;
-
#ifdef MEMORY_CONTEXT_CHECKING
-
chunk->requested_size = size;
-
/* set mark to catch clobber of "unused" space */
-
if (size < chunk->size)
-
((char *) AllocChunkGetPointer(chunk))[size] = 0x7E;
-
#endif
-
#ifdef RANDOMIZE_ALLOCATED_MEMORY
-
/* fill the allocated space with junk */
-
randomize_mem((char *) AllocChunkGetPointer(chunk), size);
-
#endif
-
-
set->isReset = false;
-
-
AllocAllocInfo(set, chunk);
-
return AllocChunkGetPointer(chunk);
-
设置chunk头,返回可用的指针地址给用户
-
}
7. 释放-pfree
pfree的过程很简单,只需要根据内存大小,判断是需要把内存通过free还给系统,还是缓存到freelist中。
1. 如果释放的Chunk大小大于缓存的最大大小,则找到该chunk所属Block,调用free归还给系统;
2. 否则,根据大小将该chunk放入对应的空闲链表头部。
-
static void
-
AllocSetFree(MemoryContext context, void *pointer)
-
{
-
AllocSet set = (AllocSet) context;
-
AllocChunk chunk = AllocPointerGetChunk(pointer);
-
-
AllocFreeInfo(set, chunk);
-
-
#ifdef MEMORY_CONTEXT_CHECKING
-
/* Test for someone scribbling on unused space in chunk */
-
if (chunk->requested_size < chunk->size)
-
if (((char *) pointer)[chunk->requested_size] != 0x7E)
-
elog(WARNING, "detected write past chunk end in %s %p",
-
set->header.name, chunk);
-
#endif
-
大于allocChunkLimit时,直接将内存还给操作系统
-
if (chunk->size > set->allocChunkLimit)
-
{
-
/*
-
* Big chunks are certain to have been allocated as single-chunk
-
* blocks. Find the containing block and return it to malloc().
-
*/
-
AllocBlock block = set->blocks;
-
AllocBlock prevblock = NULL;
-
-
while (block != NULL)
-
{
-
if (chunk == (AllocChunk) (((char *) block) + ALLOC_BLOCKHDRSZ))
-
break;
-
prevblock = block;
-
block = block->next;
-
}
-
查找链表,找到当前Chunk所属的Block
-
if (block == NULL)
-
elog(ERROR, "could not find block containing chunk %p", chunk);
-
/* let's just make sure chunk is the only one in the block */
-
Assert(block->freeptr == ((char *) block) +
-
(chunk->size + ALLOC_BLOCKHDRSZ + ALLOC_CHUNKHDRSZ));
-
-
/* OK, remove block from aset's list and free it */
-
if (prevblock == NULL)
-
set->blocks = block->next;
-
else
-
prevblock->next = block->next;
-
从Block链表中删除一个Block
-
#ifdef CLOBBER_FREED_MEMORY
-
/* Wipe freed memory for debugging purposes */
-
memset(block, 0x7F, block->freeptr - ((char *) block));
-
#endif
-
free(block);
-
释放该Block所占内存归还给系统
-
}
-
else
-
{
-
/* Normal case, put the chunk into appropriate freelist */
-
int fidx = AllocSetFreeIndex(chunk->size);
-
找到该Chunk所属空闲链表
-
chunk->aset = (void *) set->freelist[fidx];
-
该Chunk指向链表头,这里aset可以看做链表的next指针
-
#ifdef CLOBBER_FREED_MEMORY
-
/* Wipe freed memory for debugging purposes */
-
memset(pointer, 0x7F, chunk->size);
-
#endif
-
-
#ifdef MEMORY_CONTEXT_CHECKING
-
/* Reset requested_size to 0 in chunks that are on freelist */
-
chunk->requested_size = 0;
-
#endif
-
set->freelist[fidx] = chunk;
-
链表头指向新free的Chunk,相当于链表的前插
-
}
-
}
阅读(5656) | 评论(0) | 转发(0) |