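1. Introduction
rte_mempool is one of DPDK's core libraries and one of the ways DPDK achieves its performance; almost every device-facing DPDK application uses it. Understanding it helps when tracking down performance problems and when studying DPDK in more depth. The library lives in the lib/mempool/ directory of the source tree.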
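2. rte_mempool structure overview
2.1 The rte_mempool structure
Before looking at how rte_mempool works, it helps to know struct rte_mempool, which is defined in rte_mempool.h. Its fields have the following meanings: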
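- name: the name of the mempool. Names must be unique within a process, otherwise creation fails (the check happens when the memzone is reserved). Because names are unique, rte_mempool_lookup() can find a pool's address by name in the global rte_mempool_list.
- pool_data / pool_id: an anonymous union (not an enum). For ring-based pools, pool_data points to the rte_ring that stores the pool's object pointers.
- pool_config: opaque data that the application passes to the ops functions. The generic DPDK layer does not currently use it; driver-specific code such as cnxk and mlx5 does.
- mz: the memzone that holds the mempool.
- flags: the flags the pool was created with. The multi-producer/multi-consumer mode is selected through these flags, which in turn determines the rte_mempool_ops type.
- socket_id: the socket on which the pool is allocated.
- size: the number of objects in the pool (mbufs, for a pktmbuf pool).
- cache_size: the size of each core's local cache.
- elt_size: the size of one element. For a pktmbuf pool it equals sizeof(struct rte_mbuf) + private data size + mbuf_data_room_size.
- header_size, trailer_size: the sizes of the header and trailer that surround each object.
- private_data_size: the size of the private data area placed after the rte_mempool structure. For a network device's pktmbuf pool this is sizeof(struct rte_pktmbuf_pool_private).
- ops_index: a mempool selects its rte_mempool_ops by name. An rte_mempool_ops provides callbacks for alloc and free, enqueue and dequeue, getting the number of available objects, populating the pool, getting pool information, and calculating the memory size needed to store a given number of objects. DPDK ships several ops, such as ops_mp_mc, ops_sp_sc, ops_mp_sc, ops_sp_mc, ops_mt_rts and ops_mt_hts. Users can also define their own ops and register them in the global rte_mempool_ops_table, which holds an array of ops. Once registered, an ops occupies one slot in that array, and ops_index is the index of the pool's ops in it.
- local_cache: points to the per-lcore cache memory of the pool; details follow later in this article.
- populated_size: the number of objects populated so far.
- elt_list: the list that links all objects in the pool.
- nb_mem_chunks: the number of memory chunks.
- mem_list: a list of struct rte_mempool_memhdr. Each entry records the IOVA, virtual address and length of one chunk, and a tailq links all memory chunks of the pool. The objects' memory lives in these chunks.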
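2.2 Structure of a mempool
The basic concepts of the rte_mempool library are also introduced in other references. A mempool is built from three parts:
- mempool object node: identified uniquely by name and attached at creation time to the global list headed by static struct rte_tailq_elem rte_mempool_tailq. The node can be found by name, and it stores the address of the rte_mempool.
- mempool memory region: the mz field of rte_mempool describes the contiguous memory that was actually reserved. mz->addr is the address of the rte_mempool and holds the mempool object itself, which consists of three parts: the rte_mempool structure, the private data, and the local caches (one per core).
- lock-free ring: a struct rte_ring. The ring's memory contains an array of pointers that refer to all objects of the mempool.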
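Objects move between the per-core local cache, the rte_ring and the application as follows.
The local_cache that rte_mempool introduces is a software object buffer, not a hardware cache. DPDK worker threads are normally pinned to cores, so a per-core cache reduces the contention that arises when several cores access the shared ring. Like the rte_ring, a local_cache holds an array of pointers to objects. An application running on core X uses its own local_cache first: a put goes into the local cache first, and a get is served from it first. When the cache is empty, objects are fetched from the rte_ring; when the cache fills up, the surplus objects are pushed to the rte_ring.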
3. Creating an rte_mempool
The creation flow is described below using a pktmbuf pool as the example.
3.1 Computing the pktmbuf pool private data

    /* size of each mbuf element */
    elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + (unsigned)data_room_size;
    /* data room size of each mbuf */
    mbp_priv.mbuf_data_room_size = data_room_size;
    mbp_priv.mbuf_priv_size = priv_size;

An mbuf object consists of three parts: the rte_mbuf header structure, a private area of priv_size bytes, and the data room.
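As a worked example of the size formula above, the sketch below computes elt_size for one common configuration. The 128-byte structure size and the 128-byte headroom are assumptions chosen for illustration (they match common DPDK defaults, but verify them against your build), and pktmbuf_elt_size() is a hypothetical helper, not a DPDK API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; check them against your DPDK build. */
#define ASSUMED_MBUF_STRUCT_SIZE 128u  /* stands in for sizeof(struct rte_mbuf) */
#define ASSUMED_HEADROOM         128u  /* stands in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical helper mirroring the elt_size formula shown above. */
static uint32_t
pktmbuf_elt_size(uint16_t priv_size, uint16_t data_room_size)
{
    return ASSUMED_MBUF_STRUCT_SIZE + (uint32_t)priv_size +
           (uint32_t)data_room_size;
}
```

With no private area and a data room of 2048 + 128 bytes, each element occupies 128 + 0 + 2176 = 2304 bytes under these assumptions; the object header and trailer added by the mempool come on top of that.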
3.2 Creating an empty mempool
An empty mempool is created with rte_mempool_create_empty(), which does the following:
1. Calls rte_mempool_calc_obj_size() to compute the size of one mempool object. An object is laid out in memory as header + element + trailer. The header is a struct rte_mempool_objhdr, which records the mempool the object belongs to and the object's IOVA.
2. Allocates a struct rte_tailq_entry and inserts it into the global list headed by static struct rte_tailq_elem rte_mempool_tailq.

    mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);
    struct rte_tailq_entry *te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
    te->data = mp;
    TAILQ_INSERT_TAIL(mempool_list, te, next);

3. Computes the size of the mempool: sizeof(struct rte_mempool) + sizeof(struct rte_mempool_cache) * RTE_MAX_LCORE + private_data_size, rounded up to RTE_MEMPOOL_ALIGN.

    mempool_size = RTE_MEMPOOL_HEADER_SIZE(mp, cache_size);
    mempool_size += private_data_size;
    mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);

4. Once the size is known, reserves the memory for the mempool and initializes the structure.

    mz = rte_memzone_reserve(mz_name, mempool_size, socket_id, mz_flags);
    if (mz == NULL)
        goto exit_unlock;

    /* init the mempool structure */
    mp = mz->addr;
    memset(mp, 0, RTE_MEMPOOL_HEADER_SIZE(mp, cache_size));
    ret = strlcpy(mp->name, name, sizeof(mp->name));
    if (ret < 0 || ret >= (int)sizeof(mp->name)) {
        rte_errno = ENAMETOOLONG;
        goto exit_unlock;
    }
    mp->mz = mz;
    mp->size = n;
    mp->flags = flags;
    mp->socket_id = socket_id;
    mp->elt_size = objsz.elt_size;
    mp->header_size = objsz.header_size;
    mp->trailer_size = objsz.trailer_size;
    /* Size of default caches, zero means disabled. */
    mp->cache_size = cache_size;
    mp->private_data_size = private_data_size;
    STAILQ_INIT(&mp->elt_list);
    STAILQ_INIT(&mp->mem_list);

    /*
     * local_cache pointer is set even if cache_size is zero.
     * The local_cache points to just past the elt_pa[] array.
     */
    mp->local_cache = (struct rte_mempool_cache *)
        RTE_PTR_ADD(mp, RTE_MEMPOOL_HEADER_SIZE(mp, 0));

    /* Init all default caches. */
    if (cache_size != 0) {
        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
            mempool_cache_init(&mp->local_cache[lcore_id],
                               cache_size);
    }

5. Initializing the mempool structure includes initializing every per-lcore local_cache. This is the final loop of the listing in step 4, which calls mempool_cache_init() for each lcore when cache_size is non-zero.
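The size calculation in step 3 can be sketched in plain C as follows. This is a model, not DPDK code: the function names, the 64-byte alignment and the structure sizes passed in are all hypothetical, and the rule that the cache array is counted only when cache_size is non-zero is an assumption about how RTE_MEMPOOL_HEADER_SIZE behaves that should be checked against rte_mempool.h.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ALIGN 64u   /* assumed stand-in for RTE_MEMPOOL_ALIGN */

/* Round v up to a multiple of align (align must be a power of two). */
static size_t
model_align_ceil(size_t v, size_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/* Bytes reserved for the pool: structure, per-lcore caches, private data. */
static size_t
model_mempool_size(size_t pool_struct_size, size_t cache_struct_size,
                   unsigned int max_lcore, unsigned int cache_size,
                   size_t private_data_size)
{
    size_t sz = pool_struct_size;

    /* Assumption: the cache array is only counted when caching is enabled. */
    if (cache_size != 0)
        sz += cache_struct_size * max_lcore;
    sz += private_data_size;
    return model_align_ceil(sz, MODEL_ALIGN);
}
```

The point of the sketch is that the per-lcore cache array scales with the maximum lcore count rather than with the number of lcores actually used, so it can dominate the size of this region.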
3.3 Setting the mempool ops
rte_mempool_set_ops_byname() selects the mempool ops by name.

    int rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
                                   void *pool_config)
    {
        struct rte_mempool_ops *ops = NULL;
        unsigned i;

        /* too late, the mempool is already populated. */
        if (mp->flags & RTE_MEMPOOL_F_POOL_CREATED)
            return -EEXIST;

        for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
            if (!strcmp(name,
                        rte_mempool_ops_table.ops[i].name)) {
                ops = &rte_mempool_ops_table.ops[i];
                break;
            }
        }

        if (ops == NULL)
            return -EINVAL;

        mp->ops_index = i;
        mp->pool_config = pool_config;
        rte_mempool_trace_set_ops_byname(mp, name, pool_config);
        return 0;
    }
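To make the name-to-index lookup concrete, here is a minimal stand-alone model of an ops table in plain C. All model_* names are hypothetical and only mirror the control flow of the listing above; the real table lives in rte_mempool_ops_table and stores full callback structures rather than just names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_MAX_OPS 16u

struct model_ops {
    const char *name;           /* the real ops also carry callbacks */
};

struct model_ops_table {
    unsigned int num_ops;
    struct model_ops ops[MODEL_MAX_OPS];
};

struct model_pool {
    int created;                /* stands in for RTE_MEMPOOL_F_POOL_CREATED */
    int ops_index;
};

/* Register an ops entry; returns its index or -ENOSPC when the table is full. */
static int
model_register_ops(struct model_ops_table *t, const char *name)
{
    if (t->num_ops >= MODEL_MAX_OPS)
        return -ENOSPC;
    t->ops[t->num_ops].name = name;
    return (int)t->num_ops++;
}

/* Resolve a name to a table index and store it in the pool. */
static int
model_set_ops_byname(const struct model_ops_table *t, struct model_pool *p,
                     const char *name)
{
    unsigned int i;

    if (p->created)
        return -EEXIST;         /* too late once the pool is created */
    for (i = 0; i < t->num_ops; i++) {
        if (strcmp(name, t->ops[i].name) == 0) {
            p->ops_index = (int)i;
            return 0;
        }
    }
    return -EINVAL;
}
```

Storing an index instead of a pointer keeps the pool structure meaningful in any process that has registered the same ops in the same order, which is one reason an index is a reasonable design for shared-memory pools.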
3.4 Initializing the pool private data
rte_pktmbuf_pool_init() initializes the private data structure of the pool.

    void
    rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
    {
        struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
        struct rte_pktmbuf_pool_private default_mbp_priv;
        uint16_t roomsz;

        RTE_ASSERT(mp->private_data_size >=
                   sizeof(struct rte_pktmbuf_pool_private));
        RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf));

        /* if no structure is provided, assume no mbuf private area */
        user_mbp_priv = opaque_arg;
        if (user_mbp_priv == NULL) {
            memset(&default_mbp_priv, 0, sizeof(default_mbp_priv));
            if (mp->elt_size > sizeof(struct rte_mbuf))
                roomsz = mp->elt_size - sizeof(struct rte_mbuf);
            else
                roomsz = 0;
            default_mbp_priv.mbuf_data_room_size = roomsz;
            user_mbp_priv = &default_mbp_priv;
        }

        RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf) +
            ((user_mbp_priv->flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
                sizeof(struct rte_mbuf_ext_shared_info) :
                user_mbp_priv->mbuf_data_room_size) +
            user_mbp_priv->mbuf_priv_size);
        RTE_ASSERT((user_mbp_priv->flags &
                    ~RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0);

        mbp_priv = rte_mempool_get_priv(mp);
        memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
    }
3.5 Populating the mempool
The mempool is populated by the following function:

    int
    rte_mempool_populate_default(struct rte_mempool *mp)
    {
        unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
        char mz_name[RTE_MEMZONE_NAMESIZE];
        const struct rte_memzone *mz;
        ssize_t mem_size;
        size_t align, pg_sz, pg_shift = 0;
        rte_iova_t iova;
        unsigned mz_id, n;
        int ret;
        bool need_iova_contig_obj;
        size_t max_alloc_size = SIZE_MAX;

        ret = mempool_ops_alloc_once(mp);
        if (ret != 0)
            return ret;

        /* mempool must not be populated */
        if (mp->nb_mem_chunks != 0)
            return -EEXIST;

        /*
         * the following section calculates page shift and page size values.
         *
         * these values impact the result of calc_mem_size operation, which
         * returns the amount of memory that should be allocated to store the
         * desired number of objects. when not zero, it allocates more memory
         * for the padding between objects, to ensure that an object does not
         * cross a page boundary. in other words, page size/shift are to be set
         * to zero if mempool elements won't care about page boundaries.
         * there are several considerations for page size and page shift here.
         *
         * if we don't need our mempools to have physically contiguous objects,
         * then just set page shift and page size to 0, because the user has
         * indicated that there's no need to care about anything.
         *
         * if we do need contiguous objects (if a mempool driver has its
         * own calc_size() method returning min_chunk_size = mem_size),
         * there is also an option to reserve the entire mempool memory
         * as one contiguous block of memory.
         *
         * if we require contiguous objects, but not necessarily the entire
         * mempool reserved space to be contiguous, pg_sz will be != 0,
         * and the default ops->populate() will take care of not placing
         * objects across pages.
         *
         * if our IO addresses are physical, we may get memory from bigger
         * pages, or we might get memory from smaller pages, and how much of it
         * we require depends on whether we want bigger or smaller pages.
         * However, requesting each and every memory size is too much work, so
         * what we'll do instead is walk through the page sizes available, pick
         * the smallest one and set up page shift to match that one. We will be
         * wasting some space this way, but it's much nicer than looping around
         * trying to reserve each and every page size.
         *
         * If we fail to get enough contiguous memory, then we'll go and
         * reserve space in smaller chunks.
         */

        need_iova_contig_obj = !(mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG);
        ret = rte_mempool_get_page_size(mp, &pg_sz);
        if (ret < 0)
            return ret;

        if (pg_sz != 0)
            pg_shift = rte_bsf32(pg_sz);

        for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
            size_t min_chunk_size;

            mem_size = rte_mempool_ops_calc_mem_size(
                mp, n, pg_shift, &min_chunk_size, &align);

            if (mem_size < 0) {
                ret = mem_size;
                goto fail;
            }

            ret = snprintf(mz_name, sizeof(mz_name),
                RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
            if (ret < 0 || ret >= (int)sizeof(mz_name)) {
                ret = -ENAMETOOLONG;
                goto fail;
            }

            /* if we're trying to reserve contiguous memory, add appropriate
             * memzone flag.
             */
            if (min_chunk_size == (size_t)mem_size)
                mz_flags |= RTE_MEMZONE_IOVA_CONTIG;

            /* Allocate a memzone, retrying with a smaller area on ENOMEM */
            do {
                mz = rte_memzone_reserve_aligned(mz_name,
                    RTE_MIN((size_t)mem_size, max_alloc_size),
                    mp->socket_id, mz_flags, align);

                if (mz != NULL || rte_errno != ENOMEM)
                    break;

                max_alloc_size = RTE_MIN(max_alloc_size,
                            (size_t)mem_size) / 2;
            } while (mz == NULL && max_alloc_size >= min_chunk_size);

            if (mz == NULL) {
                ret = -rte_errno;
                goto fail;
            }

            if (need_iova_contig_obj)
                iova = mz->iova;
            else
                iova = RTE_BAD_IOVA;

            if (pg_sz == 0 || (mz_flags & RTE_MEMZONE_IOVA_CONTIG))
                ret = rte_mempool_populate_iova(mp, mz->addr,
                    iova, mz->len,
                    rte_mempool_memchunk_mz_free,
                    (void *)(uintptr_t)mz);
            else
                ret = rte_mempool_populate_virt(mp, mz->addr,
                    mz->len, pg_sz,
                    rte_mempool_memchunk_mz_free,
                    (void *)(uintptr_t)mz);
            if (ret == 0) /* should not happen */
                ret = -ENOBUFS;
            if (ret < 0) {
                rte_memzone_free(mz);
                goto fail;
            }
        }

        rte_mempool_trace_populate_default(mp);
        return mp->size;

    fail:
        rte_mempool_free_memchunks(mp);
        return ret;
    }
Within the populate function above, the pool's rte_ring is created through rte_mempool_ops and its address is stored in mp->pool_data. The call chain is:

    static int
    mempool_ops_alloc_once(struct rte_mempool *mp)
    {
        int ret;

        /* create the internal ring if not already done */
        if ((mp->flags & RTE_MEMPOOL_F_POOL_CREATED) == 0) {
            ret = rte_mempool_ops_alloc(mp);
            if (ret != 0)
                return ret;
            mp->flags |= RTE_MEMPOOL_F_POOL_CREATED;
        }
        return 0;
    }

    int
    rte_mempool_ops_alloc(struct rte_mempool *mp)
    {
        struct rte_mempool_ops *ops;

        rte_mempool_trace_ops_alloc(mp);
        ops = rte_mempool_get_ops(mp->ops_index);
        return ops->alloc(mp);
    }

    static int
    ring_alloc(struct rte_mempool *mp, uint32_t rg_flags)
    {
        int ret;
        char rg_name[RTE_RING_NAMESIZE];
        struct rte_ring *r;

        ret = snprintf(rg_name, sizeof(rg_name),
            RTE_MEMPOOL_MZ_FORMAT, mp->name);
        if (ret < 0 || ret >= (int)sizeof(rg_name)) {
            rte_errno = ENAMETOOLONG;
            return -rte_errno;
        }

        /*
         * Allocate the ring that will be used to store objects.
         * Ring functions will return appropriate errors if we are
         * running as a secondary process etc., so no checks made
         * in this function for that condition.
         */
        r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
            mp->socket_id, rg_flags);
        if (r == NULL)
            return -rte_errno;

        mp->pool_data = r;

        return 0;
    }
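One more note: the pool's rte_ring is ultimately created by rte_ring_create_elem(). That function reserves the ring's memory from a memzone (laid out as the rte_ring structure followed by an array of void * slots; ring_alloc() requests rte_align32pow2(mp->size + 1) slots), records the ring together with its memzone in a struct rte_tailq_entry, and inserts that entry into the global rte_ring_tailq. See the implementation of rte_ring_create_elem() for details.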
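The rounding in ring_alloc() has a practical consequence for choosing the pool size, which the following plain-C sketch illustrates. model_align32pow2() is a hypothetical stand-in for the rounding performed by rte_align32pow2(), not the DPDK implementation itself; the "+ 1" is there because a default rte_ring with N slots stores at most N - 1 entries.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next power of two (x itself if it already is one). */
static uint32_t
model_align32pow2(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Ring slot count requested for a pool of n objects, as in ring_alloc(). */
static uint32_t
model_ring_slots(uint32_t n)
{
    return model_align32pow2(n + 1);
}
```

Under this model a pool of 8191 objects needs 8192 ring slots, while a pool of 8192 objects needs 16384, so sizes of the form 2^q - 1 waste the least ring memory.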
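Next, the page size and page shift are obtained and the memory that will hold all mbufs is set up. The size of the chunk that can currently be reserved is computed, and the chunk is reserved. Each memory chunk is described by a struct rte_mempool_memhdr that is inserted into mp->mem_list, and the number of chunks is kept in mp->nb_mem_chunks. Within a chunk's virtual memory the objects are laid out one after another: the ops populate entry point rte_mempool_ops_populate() calls mempool_add_elem() for each object to append it to mp->elt_list. The key call is:

    i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
            (char *)vaddr + off,
            (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
            len - off, mempool_add_elem, NULL);
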
The return value i is the number of objects populated in this memory chunk. mempool_add_elem() is implemented as follows:

    static void
    mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
                     void *obj, rte_iova_t iova)
    {
        struct rte_mempool_objhdr *hdr;
        struct rte_mempool_objtlr *tlr __rte_unused;

        /* set mempool ptr in header */
        hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
        hdr->mp = mp;
        hdr->iova = iova;
        STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
        mp->populated_size++;

    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
        tlr = rte_mempool_get_trailer(obj);
        tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
    #endif
    }
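The idea of carving a chunk into "header + object" slots and linking the headers can be sketched in self-contained C. Everything here is a simplified, hypothetical model: it ignores trailers, IOVA handling, alignment padding and page boundaries, all of which the real populate path has to deal with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_pool;

/* Simplified object header placed immediately before every object. */
struct model_objhdr {
    struct model_objhdr *next;   /* next header in the pool's object list */
    struct model_pool *pool;     /* owning pool */
};

struct model_pool {
    size_t elt_size;             /* usable bytes per object */
    unsigned int populated;      /* objects added so far */
    struct model_objhdr *head;   /* first object header */
    struct model_objhdr *tail;   /* last object header */
};

/* Append one object to the pool's list, in the spirit of mempool_add_elem(). */
static void
model_add_elem(struct model_pool *p, void *obj)
{
    struct model_objhdr *hdr =
        (struct model_objhdr *)((char *)obj - sizeof(*hdr));

    hdr->pool = p;
    hdr->next = NULL;
    if (p->tail != NULL)
        p->tail->next = hdr;
    else
        p->head = hdr;
    p->tail = hdr;
    p->populated++;
}

/* Carve a chunk into header+object slots; returns the number of objects added. */
static unsigned int
model_populate(struct model_pool *p, void *chunk, size_t len,
               unsigned int max_objs)
{
    size_t slot = sizeof(struct model_objhdr) + p->elt_size;
    size_t off = 0;
    unsigned int i = 0;

    while (i < max_objs && off + slot <= len) {
        model_add_elem(p, (char *)chunk + off + sizeof(struct model_objhdr));
        off += slot;
        i++;
    }
    return i;
}
```

Because the header sits at a fixed negative offset from the object, the owning pool can later be recovered from an object pointer alone, which is what makes a free call without an explicit pool argument possible.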
3.6 Initializing the pkt mbufs
rte_mempool_obj_iter() walks all objects in the rte_mempool and calls rte_pktmbuf_init() on each of them.
The function that iterates over all objects:

    uint32_t
    rte_mempool_obj_iter(struct rte_mempool *mp,
                         rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg)
    {
        struct rte_mempool_objhdr *hdr;
        void *obj;
        unsigned n = 0;

        STAILQ_FOREACH(hdr, &mp->elt_list, next) {
            obj = (char *)hdr + sizeof(*hdr);
            obj_cb(mp, obj_cb_arg, obj, n);
            n++;
        }

        return n;
    }
The function that initializes each object:

    void
    rte_pktmbuf_init(struct rte_mempool *mp,
                     __rte_unused void *opaque_arg,
                     void *_m,
                     __rte_unused unsigned i)
    {
        struct rte_mbuf *m = _m;
        uint32_t mbuf_size, buf_len, priv_size;

        RTE_ASSERT(mp->private_data_size >=
                   sizeof(struct rte_pktmbuf_pool_private));

        priv_size = rte_pktmbuf_priv_size(mp);
        mbuf_size = sizeof(struct rte_mbuf) + priv_size;
        buf_len = rte_pktmbuf_data_room_size(mp);

        RTE_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
        RTE_ASSERT(mp->elt_size >= mbuf_size);
        RTE_ASSERT(buf_len <= UINT16_MAX);

        memset(m, 0, mbuf_size);
        /* start of buffer is after mbuf structure and priv data */
        m->priv_size = priv_size;
        m->buf_addr = (char *)m + mbuf_size;
        m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
        m->buf_len = (uint16_t)buf_len;

        /* keep some headroom between start of buffer and data */
        m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);

        /* init some constant fields */
        m->pool = mp;
        m->nb_segs = 1;
        m->port = RTE_MBUF_PORT_INVALID;
        rte_mbuf_refcnt_set(m, 1);
        m->next = NULL;
    }

At this point the rte_mempool is fully set up.
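4. Using an rte_mempool
The mbufs in a pktmbuf pool are used by NIC ports to receive packets and by applications to send them.
To allocate a raw mbuf from the pool:

    static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)

Internally, the allocation path calls rte_mempool_get_bulk() to fetch n mbufs from mp (n is 1 here; the interface, listed next, supports bulk requests). Objects are taken from the calling lcore's cache; if the cache does not hold enough of them, mbufs are first pulled from the rte_ring into the local cache.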

    static __rte_always_inline int
    rte_mempool_get_bulk(struct rte_mempool *mp, void **obj_table, unsigned int n)
    {
        struct rte_mempool_cache *cache;
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
        rte_mempool_trace_get_bulk(mp, obj_table, n, cache);
        return rte_mempool_generic_get(mp, obj_table, n, cache);
    }

    static __rte_always_inline int
    rte_mempool_generic_get(struct rte_mempool *mp, void **obj_table,
                            unsigned int n, struct rte_mempool_cache *cache)
    {
        int ret;
        ret = rte_mempool_do_generic_get(mp, obj_table, n, cache);
        if (ret == 0)
            RTE_MEMPOOL_CHECK_COOKIES(mp, obj_table, n, 1);
        rte_mempool_trace_generic_get(mp, obj_table, n, cache);
        return ret;
    }

    static __rte_always_inline int
    rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
                               unsigned int n, struct rte_mempool_cache *cache)
    {
        int ret;
        uint32_t index, len;
        void **cache_objs;

        /* No cache provided or cannot be satisfied from cache */
        if (unlikely(cache == NULL || n >= cache->size))
            goto ring_dequeue;

        cache_objs = cache->objs;

        /* Can this be satisfied from the cache? */
        if (cache->len < n) {
            /* No. Backfill the cache first, and then fill from it */
            uint32_t req = n + (cache->size - cache->len);

            /* How many do we require i.e. number to fill the cache + the request */
            ret = rte_mempool_ops_dequeue_bulk(mp,
                &cache->objs[cache->len], req);
            if (unlikely(ret < 0)) {
                /*
                 * In the off chance that we are buffer constrained,
                 * where we are not able to allocate cache + n, go to
                 * the ring directly. If that fails, we are truly out of
                 * buffers.
                 */
                goto ring_dequeue;
            }

            cache->len += req;
        }

        /* Now fill in the response ... */
        for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
            *obj_table = cache_objs[len];

        cache->len -= n;

        RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
        RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);

        return 0;

    ring_dequeue:

        /* get remaining objects from ring */
        ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);

        if (ret < 0) {
            RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
            RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
        } else {
            RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
            RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
        }

        return ret;
    }
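The get path can be exercised outside DPDK with a small model. The sketch below keeps the same decisions as rte_mempool_do_generic_get() (bypass the cache for large requests, backfill on a miss, fall back to the ring when the backfill fails) but replaces the rte_ring with a trivial array; all model_* names and the sizes are hypothetical, and statistics and cookies are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHE_SIZE 4u
#define MODEL_RING_CAP   64u

/* Backing store standing in for the rte_ring: a simple stack of pointers. */
struct model_ring {
    void *objs[MODEL_RING_CAP];
    uint32_t len;
};

struct model_cache {
    uint32_t size;                      /* nominal cache size */
    uint32_t len;                       /* objects currently cached */
    void *objs[MODEL_CACHE_SIZE * 3];   /* room for backfill plus request */
};

/* All-or-nothing bulk dequeue: 0 on success, -1 if fewer than n are left. */
static int
model_ring_dequeue_bulk(struct model_ring *r, void **out, uint32_t n)
{
    uint32_t i;

    if (r->len < n)
        return -1;
    for (i = 0; i < n; i++)
        out[i] = r->objs[--r->len];
    return 0;
}

/* Same control flow as the get path above, without stats or cookies. */
static int
model_get_bulk(struct model_ring *r, struct model_cache *c,
               void **out, uint32_t n)
{
    uint32_t i, len;

    /* No cache, or a request the cache can never satisfy: go to the ring. */
    if (c == NULL || n >= c->size)
        return model_ring_dequeue_bulk(r, out, n);

    if (c->len < n) {
        /* Backfill: refill the cache to its nominal size plus the request. */
        uint32_t req = n + (c->size - c->len);

        if (model_ring_dequeue_bulk(r, &c->objs[c->len], req) < 0)
            return model_ring_dequeue_bulk(r, out, n);
        c->len += req;
    }

    /* Serve the request from the top of the cache. */
    for (i = 0, len = c->len - 1; i < n; i++, len--)
        out[i] = c->objs[len];
    c->len -= n;
    return 0;
}
```

Starting from an empty cache of size 4 and a ring holding 16 objects, a first get of 2 pulls 6 objects from the ring (4 to refill plus the 2 requested) and leaves 4 in the cache, so the next small gets touch only the cache.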
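To return an mbuf to the pool:

    void rte_mbuf_raw_free(struct rte_mbuf *m)

Internally, the free path calls rte_mempool_put_bulk() to release n mbufs to the pool (n is 1 here; the interface, listed next, supports bulk release). Objects go into the calling lcore's cache first; once the cache length reaches its flush threshold, everything above the nominal cache size is pushed to the rte_ring.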

    static __rte_always_inline void
    rte_mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
                         unsigned int n)
    {
        struct rte_mempool_cache *cache;
        cache = rte_mempool_default_cache(mp, rte_lcore_id());
        rte_mempool_trace_put_bulk(mp, obj_table, n, cache);
        rte_mempool_generic_put(mp, obj_table, n, cache);
    }

    static __rte_always_inline void
    rte_mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
                            unsigned int n, struct rte_mempool_cache *cache)
    {
        rte_mempool_trace_generic_put(mp, obj_table, n, cache);
        RTE_MEMPOOL_CHECK_COOKIES(mp, obj_table, n, 0);
        rte_mempool_do_generic_put(mp, obj_table, n, cache);
    }

    static __rte_always_inline void
    rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
                               unsigned int n, struct rte_mempool_cache *cache)
    {
        void **cache_objs;

        /* increment stat now, adding in mempool always success */
        RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
        RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);

        /* No cache provided or if put would overflow mem allocated for cache */
        if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
            goto ring_enqueue;

        cache_objs = &cache->objs[cache->len];

        /*
         * The cache follows the following algorithm
         *   1. Add the objects to the cache
         *   2. Anything greater than the cache min value (if it crosses the
         *   cache flush threshold) is flushed to the ring.
         */

        /* Add elements back into the cache */
        rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);

        cache->len += n;

        if (cache->len >= cache->flushthresh) {
            rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
                    cache->len - cache->size);
            cache->len = cache->size;
        }

        return;

    ring_enqueue:

        /* push remaining objects in ring */
    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        if (rte_mempool_ops_enqueue_bulk(mp, obj_table, n) < 0)
            rte_panic("cannot put objects in mempool\n");
    #else
        rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
    #endif
    }
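The put path can be modelled the same way. In the sketch below the flush threshold is simply a field of the cache; the value 6 for a cache of size 4 used in the walk-through (1.5 times the size) is an assumption about how DPDK of that period derived flushthresh, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_CACHE_MAX 8u
#define MODEL_RING_CAP  64u

/* Backing store standing in for the rte_ring. */
struct model_ring {
    void *objs[MODEL_RING_CAP];
    uint32_t len;
};

struct model_cache {
    uint32_t size;          /* nominal cache size */
    uint32_t flushthresh;   /* flush once len reaches this value */
    uint32_t len;           /* objects currently cached */
    void *objs[MODEL_CACHE_MAX * 3];
};

/* Append n pointers to the backing store. */
static void
model_ring_enqueue_bulk(struct model_ring *r, void * const *objs, uint32_t n)
{
    assert(r->len + n <= MODEL_RING_CAP);
    memcpy(&r->objs[r->len], objs, sizeof(void *) * n);
    r->len += n;
}

/* Same control flow as the put path above, without stats or cookies. */
static void
model_put_bulk(struct model_ring *r, struct model_cache *c,
               void * const *objs, uint32_t n)
{
    /* No cache, or a burst too large for the cache array: go to the ring. */
    if (c == NULL || n > MODEL_CACHE_MAX) {
        model_ring_enqueue_bulk(r, objs, n);
        return;
    }

    /* 1. Add the objects to the cache. */
    memcpy(&c->objs[c->len], objs, sizeof(void *) * n);
    c->len += n;

    /* 2. Past the threshold, flush everything above the nominal size. */
    if (c->len >= c->flushthresh) {
        model_ring_enqueue_bulk(r, &c->objs[c->size], c->len - c->size);
        c->len = c->size;
    }
}
```

With size 4 and threshold 6, puts of 3 and then 2 objects stay entirely in the cache (length 5); a further put of 2 raises the length to 7, which triggers a flush of 3 objects and leaves exactly 4 cached. The gap between size and threshold is what prevents a get/put pattern hovering around the cache size from hitting the ring on every call.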
5. Querying rte_mempool information
rte_mempool_dump(FILE *f, struct rte_mempool *mp) reports the state of a mempool. It dumps the following information:

    void
    rte_mempool_dump(FILE *f, struct rte_mempool *mp)
    {
    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        struct rte_mempool_info info;
        struct rte_mempool_debug_stats sum;
        unsigned lcore_id;
    #endif
        struct rte_mempool_memhdr *memhdr;
        struct rte_mempool_ops *ops;
        unsigned common_count;
        unsigned cache_count;
        size_t mem_len = 0;

        RTE_ASSERT(f != NULL);
        RTE_ASSERT(mp != NULL);

        fprintf(f, "mempool <%s>@%p\n", mp->name, mp);
        fprintf(f, " flags=%x\n", mp->flags);
        fprintf(f, " socket_id=%d\n", mp->socket_id);
        fprintf(f, " pool=%p\n", mp->pool_data);
        fprintf(f, " iova=0x%" PRIx64 "\n", mp->mz->iova);
        fprintf(f, " nb_mem_chunks=%u\n", mp->nb_mem_chunks);
        fprintf(f, " size=%"PRIu32"\n", mp->size);
        fprintf(f, " populated_size=%"PRIu32"\n", mp->populated_size);
        fprintf(f, " header_size=%"PRIu32"\n", mp->header_size);
        fprintf(f, " elt_size=%"PRIu32"\n", mp->elt_size);
        fprintf(f, " trailer_size=%"PRIu32"\n", mp->trailer_size);
        fprintf(f, " total_obj_size=%"PRIu32"\n",
            mp->header_size + mp->elt_size + mp->trailer_size);

        fprintf(f, " private_data_size=%"PRIu32"\n", mp->private_data_size);

        fprintf(f, " ops_index=%d\n", mp->ops_index);
        ops = rte_mempool_get_ops(mp->ops_index);
        fprintf(f, " ops_name: <%s>\n", (ops != NULL) ? ops->name : "NA");

        STAILQ_FOREACH(memhdr, &mp->mem_list, next)
            mem_len += memhdr->len;
        if (mem_len != 0) {
            fprintf(f, " avg bytes/object=%#Lf\n",
                (long double)mem_len / mp->size);
        }

        cache_count = rte_mempool_dump_cache(f, mp);
        common_count = rte_mempool_ops_get_count(mp);
        if ((cache_count + common_count) > mp->size)
            common_count = mp->size - cache_count;
        fprintf(f, " common_pool_count=%u\n", common_count);

        /* sum and dump statistics */
    #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
        rte_mempool_ops_get_info(mp, &info);
        memset(&sum, 0, sizeof(sum));
        for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
            sum.put_bulk += mp->stats[lcore_id].put_bulk;
            sum.put_objs += mp->stats[lcore_id].put_objs;
            sum.put_common_pool_bulk += mp->stats[lcore_id].put_common_pool_bulk;
            sum.put_common_pool_objs += mp->stats[lcore_id].put_common_pool_objs;
            sum.get_common_pool_bulk += mp->stats[lcore_id].get_common_pool_bulk;
            sum.get_common_pool_objs += mp->stats[lcore_id].get_common_pool_objs;
            sum.get_success_bulk += mp->stats[lcore_id].get_success_bulk;
            sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
            sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
            sum.get_fail_objs += mp->stats[lcore_id].get_fail_objs;
            sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
            sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
        }
        fprintf(f, " stats:\n");
        fprintf(f, " put_bulk=%"PRIu64"\n", sum.put_bulk);
        fprintf(f, " put_objs=%"PRIu64"\n", sum.put_objs);
        fprintf(f, " put_common_pool_bulk=%"PRIu64"\n", sum.put_common_pool_bulk);
        fprintf(f, " put_common_pool_objs=%"PRIu64"\n", sum.put_common_pool_objs);
        fprintf(f, " get_common_pool_bulk=%"PRIu64"\n", sum.get_common_pool_bulk);
        fprintf(f, " get_common_pool_objs=%"PRIu64"\n", sum.get_common_pool_objs);
        fprintf(f, " get_success_bulk=%"PRIu64"\n", sum.get_success_bulk);
        fprintf(f, " get_success_objs=%"PRIu64"\n", sum.get_success_objs);
        fprintf(f, " get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
        fprintf(f, " get_fail_objs=%"PRIu64"\n", sum.get_fail_objs);
        if (info.contig_block_size > 0) {
            fprintf(f, " get_success_blks=%"PRIu64"\n",
                sum.get_success_blks);
            fprintf(f, " get_fail_blks=%"PRIu64"\n", sum.get_fail_blks);
        }
    #else
        fprintf(f, " no statistics available\n");
    #endif

        rte_mempool_audit(mp);
    }
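rte_mempool also keeps statistics in mp->stats that are worth knowing about. They are compiled in only when RTE_LIBRTE_MEMPOOL_DEBUG is defined, which is normally not the case.
Two more functions in the library are very useful. rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp) returns the number of available objects held in the per-core caches of the given pool, and rte_mempool_ops_get_count(const struct rte_mempool *mp) returns the number of available objects in the rte_ring.
From mp->size and these two values, the number of mbufs currently in use by the application can be computed, a figure that some products monitor closely.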
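The in-use calculation described above amounts to the following arithmetic. model_in_use_count() is a hypothetical helper written for this sketch; the clamp mirrors the one rte_mempool_dump() applies, since the two counts are sampled without stopping the other cores and may momentarily add up to more than the pool size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: objects currently held by the application. */
static uint32_t
model_in_use_count(uint32_t pool_size, uint32_t cache_count,
                   uint32_t common_count)
{
    uint64_t avail = (uint64_t)cache_count + common_count;

    /* The counts are racy snapshots, so never report more than the pool holds. */
    if (avail > pool_size)
        avail = pool_size;
    return pool_size - (uint32_t)avail;
}
```

For a pool of 8191 mbufs with 512 objects sitting in the per-core caches and 7000 in the ring, this gives 679 mbufs in use. DPDK also exposes rte_mempool_avail_count() and rte_mempool_in_use_count(); check rte_mempool.h in your version before relying on either.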