RoadLee

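1. Introduction

rte_mempool is one of DPDK's core libraries and one of the mechanisms behind DPDK's performance; almost every device-facing DPDK application relies on it. Understanding it helps when tracking down performance problems and leads to a deeper understanding of DPDK. The core of the library lives under lib/mempool/ in the source tree.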

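2. The rte_mempool structure

2.1 struct rte_mempool

Before going further, look at the definition of struct rte_mempool, which is found in rte_mempool.h.

Meaning of each field:

name: the name of the memory pool. Names must be unique within a process; otherwise creation fails (this is checked when the memzone is reserved). Because names are unique, rte_mempool_lookup() can find a pool's address by name in the global rte_mempool_list.
pool_data or pool_id: these two fields share a union. pool_data points to the rte_ring that the mempool uses to store its objects.
pool_config: opaque data that the application passes to the ops functions. The DPDK framework layer does not currently use it; the cnxk and mlx5 drivers do.
mz: the memzone that holds the memory pool.
flags: the flags given when the pool is allocated. The multi-producer/multi-consumer mode is selected through these flags, which in turn determines the type of rte_mempool_ops.
socket_id: the socket on which the pool is allocated.
size: the number of mbufs in the pool.
cache_size: the size of each core's local cache in the pool.
elt_size: the size of one element. For a pktmbuf pool this equals sizeof(struct rte_mbuf) + the private data size + mbuf_data_room_size.
header_size and trailer_size: the sizes of the object header and the object trailer.
private_data_size: the size of the private data area that follows the rte_mempool structure. For a network device's pktmbuf pool, this is the size of struct rte_pktmbuf_pool_private.
ops_index: an rte_mempool selects its rte_mempool_ops by name. An rte_mempool_ops provides callbacks for alloc and free, enqueue and dequeue, getting the number of available objects, populating the pool, getting pool information, and calculating the memory size needed to store a given number of objects. DPDK ships several rte_mempool_ops, for example ops_mp_mc, ops_sp_sc, ops_mp_sc, ops_sp_mc, ops_mt_rts, and ops_mt_hts. Users can also define their own ops and register them in the global rte_mempool_ops_table variable, which holds an array of ops. Once registered, an ops occupies one index in that array; ops_index is the array index of the rte_mempool_ops used by this pool.
local_cache: points to the per-lcore cache memory of the rte_mempool; details follow later.
populated_size: the number of objects populated so far.
elt_list: the list that links all objects in the pool together.
nb_mem_chunks: the number of memory chunks.
mem_list: a list of struct rte_mempool_memhdr entries, each recording one chunk's IOVA, virtual address, and length; a singly linked tail queue (STAILQ) links all memory chunks of the mempool together. The memory of the objects comes from these chunks.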

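2.2 Layout of a mempool

The basic concepts of the rte_mempool library are also introduced elsewhere. A mempool is implemented in three parts:

mempool object node: identified uniquely by name and, at creation, linked into the global list behind static struct rte_tailq_elem rte_mempool_tailq. The node can be found by name, and it stores the address of the rte_mempool.
mempool memory area: mz in rte_mempool records the contiguous memory that was actually reserved, and mz->addr is the address of the rte_mempool itself. This area holds three things: the rte_mempool structure, the local caches (one per lcore), and the private data.
lock-free ring: a struct rte_ring whose memory contains an array of pointers that refer to the mempool's objects.

The relationship among the local cache, the rte_ring, and the objects in an rte_mempool works as follows.

The local_cache introduced in rte_mempool is an object buffer, not a hardware cache. DPDK worker threads are normally pinned to cores, so a per-core cache reduces contention on the shared ring when several cores access it. Like the rte_ring, a local_cache holds an array of pointers to objects. An application running on core X accesses its own local_cache first: objects are put into the local_cache first and taken from the local_cache first. When the cache is empty, objects are fetched from the rte_ring; when the cache is full, the surplus objects are pushed to the rte_ring.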
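
To make the cache-first behavior concrete, here is a minimal single-threaded sketch in plain C. It is a toy model and not DPDK code: `toy_pool`, `toy_get`, and `toy_put` are hypothetical names, the shared ring is modeled as a simple array-based stack, only one core's cache is shown, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE  8   /* total objects the toy pool can hold */
#define TOY_CACHE_SIZE 2   /* capacity of the single per-core cache */

/* Toy model: one shared store plus one local cache (hypothetical types). */
struct toy_pool {
    void *ring[TOY_POOL_SIZE];    /* stands in for the shared rte_ring */
    size_t ring_len;
    void *cache[TOY_CACHE_SIZE];  /* stands in for one core's local_cache */
    size_t cache_len;
};

/* Take one object: local cache first, shared ring only when the cache is empty. */
static void *toy_get(struct toy_pool *p)
{
    if (p->cache_len > 0)
        return p->cache[--p->cache_len];
    if (p->ring_len > 0)
        return p->ring[--p->ring_len];
    return NULL; /* pool exhausted */
}

/* Return one object: local cache first, shared ring once the cache is full. */
static void toy_put(struct toy_pool *p, void *obj)
{
    assert(obj != NULL);
    if (p->cache_len < TOY_CACHE_SIZE) {
        p->cache[p->cache_len++] = obj;
    } else {
        assert(p->ring_len < TOY_POOL_SIZE);
        p->ring[p->ring_len++] = obj;
    }
}
```

The real library moves objects between cache and ring in bulk rather than one at a time, which this sketch deliberately leaves out.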

3. Creating an rte_mempool

The creation of a pktmbuf pool is used below to illustrate how an rte_mempool is created.

3.1 Computing the pktmbuf pool private data

// size of each mbuf element
elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size + (unsigned)data_room_size;
// data room size of each mbuf
mbp_priv.mbuf_data_room_size = data_room_size;
mbp_priv.mbuf_priv_size = priv_size;

An mbuf object consists of three parts: the rte_mbuf structure header, priv_size bytes of private data, and the data room.
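
As a worked example of this sum, the sketch below computes an element size in plain C. The 128-byte header size is an assumption (two 64-byte cache lines, which is typical but should be checked against your build), `toy_elt_size` is a hypothetical helper, and the 8-byte alignment check mirrors the private-area alignment that the pktmbuf code asserts via RTE_MBUF_PRIV_ALIGN.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of struct rte_mbuf (two 64-byte cache lines); check your build. */
#define TOY_MBUF_HDR_SIZE 128u
/* Assumed alignment required for the per-mbuf private area. */
#define TOY_PRIV_ALIGN 8u

/* Element size of a pktmbuf pool: mbuf header + private area + data room. */
static uint32_t toy_elt_size(uint16_t priv_size, uint16_t data_room_size)
{
    assert(priv_size % TOY_PRIV_ALIGN == 0);
    return TOY_MBUF_HDR_SIZE + (uint32_t)priv_size + (uint32_t)data_room_size;
}
```

With no private area and a 2176-byte data room (2048 bytes of data plus 128 bytes of headroom), each element would occupy 2304 bytes under these assumptions, before the mempool adds its own per-object header and trailer.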

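3.2 Creating an empty mempool

The interface for creating an empty mempool is rte_mempool_create_empty(). It does the following:

1. Calls rte_mempool_calc_obj_size() to compute the size of a mempool object. An object is laid out in memory as header + element + trailer, where the header is a struct rte_mempool_objhdr that records the mempool the object belongs to and the object's IOVA.
2. Allocates a struct rte_tailq_entry and inserts it into the global static struct rte_tailq_elem rte_mempool_tailq.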

mempool_list = RTE_TAILQ_CAST(rte_mempool_tailq.head, rte_mempool_list);
struct rte_tailq_entry *te = rte_zmalloc("MEMPOOL_TAILQ_ENTRY", sizeof(*te), 0);
te->data = mp;
TAILQ_INSERT_TAIL(mempool_list, te, next);

3. Computes the mempool size: sizeof(struct rte_mempool) + sizeof(struct rte_mempool_cache) * RTE_MAX_LCORE + private_data_size (the cache term is included only when cache_size is nonzero).

mempool_size = RTE_MEMPOOL_HEADER_SIZE(mp, cache_size);
mempool_size += private_data_size;
mempool_size = RTE_ALIGN_CEIL(mempool_size, RTE_MEMPOOL_ALIGN);

4. Once the size is known, reserves the memory for the mempool.

mz = rte_memzone_reserve(mz_name, mempool_size, socket_id, mz_flags);
if (mz == NULL)
    goto exit_unlock;
/* init the mempool structure */
mp = mz->addr;
memset(mp, 0, RTE_MEMPOOL_HEADER_SIZE(mp, cache_size));
ret = strlcpy(mp->name, name, sizeof(mp->name));
if (ret < 0 || ret >= (int)sizeof(mp->name)) {
    rte_errno = ENAMETOOLONG;
    goto exit_unlock;
}
mp->mz = mz;
mp->size = n;
mp->flags = flags;
mp->socket_id = socket_id;
mp->elt_size = objsz.elt_size;
mp->header_size = objsz.header_size;
mp->trailer_size = objsz.trailer_size;
/* Size of default caches, zero means disabled. */
mp->cache_size = cache_size;
mp->private_data_size = private_data_size;
STAILQ_INIT(&mp->elt_list);
STAILQ_INIT(&mp->mem_list);
/*
 * local_cache pointer is set even if cache_size is zero.
 * The local_cache points to just past the elt_pa[] array.
 */
mp->local_cache = (struct rte_mempool_cache *)
    RTE_PTR_ADD(mp, RTE_MEMPOOL_HEADER_SIZE(mp, 0));
/* Init all default caches. */
if (cache_size != 0) {
    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
        mempool_cache_init(&mp->local_cache[lcore_id],
                           cache_size);
}

5. Initializes the mempool structure, including every per-lcore local_cache; see the mempool_cache_init() loop at the end of the snippet above.
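
The size calculation in step 3 can be sketched in plain C as follows. This is an illustration only: `toy_align_ceil` and `toy_mempool_size` are hypothetical helpers, and every size passed in is a caller-supplied example value rather than a real DPDK constant.

```c
#include <assert.h>
#include <stddef.h>

/* Round v up to a multiple of align; align must be a power of two. */
static size_t toy_align_ceil(size_t v, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

/*
 * Bytes reserved for the mempool itself: structure header, per-lcore
 * caches (only when caching is enabled), private data, rounded up to
 * the given alignment.
 */
static size_t toy_mempool_size(size_t hdr_size, size_t cache_struct_size,
                               size_t max_lcore, size_t cache_size,
                               size_t private_data_size, size_t align)
{
    size_t sz = hdr_size;

    if (cache_size != 0)
        sz += cache_struct_size * max_lcore;
    sz += private_data_size;
    return toy_align_ceil(sz, align);
}
```

For instance, a 192-byte header, four 1000-byte cache structures, and 8 bytes of private data add up to 4200 bytes, which rounds up to 4224 with 64-byte alignment.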

3.3 Setting the mempool ops

rte_mempool_set_ops_byname() sets the mempool ops by name.

int rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
                               void *pool_config)
{
    struct rte_mempool_ops *ops = NULL;
    unsigned i;
    /* too late, the mempool is already populated. */
    if (mp->flags & RTE_MEMPOOL_F_POOL_CREATED)
        return -EEXIST;
    for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
        if (!strcmp(name,
                    rte_mempool_ops_table.ops[i].name)) {
            ops = &rte_mempool_ops_table.ops[i];
            break;
        }
    }
    if (ops == NULL)
        return -EINVAL;
    mp->ops_index = i;
    mp->pool_config = pool_config;
    rte_mempool_trace_set_ops_byname(mp, name, pool_config);
    return 0;
}
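
The lookup at the heart of this function is a linear search of a name table that yields an index. The sketch below reproduces just that idea in plain C; `toy_ops`, `toy_ops_table`, and `toy_find_ops` are hypothetical, and the table contents and their order are illustrative rather than what a running DPDK process would register.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_OPS 8

/* Toy stand-in for an ops entry; only the name matters for the lookup. */
struct toy_ops {
    const char *name;
};

/* Toy registry; names follow the style of DPDK's ring-based ops. */
static const struct toy_ops toy_ops_table[TOY_MAX_OPS] = {
    { "ring_mp_mc" }, { "ring_sp_sc" }, { "ring_mp_sc" }, { "ring_sp_mc" },
};
static const unsigned int toy_num_ops = 4;

/* Return the table index of the ops named `name`, or -1 if not registered. */
static int toy_find_ops(const char *name)
{
    unsigned int i;

    assert(name != NULL);
    for (i = 0; i < toy_num_ops; i++) {
        if (strcmp(name, toy_ops_table[i].name) == 0)
            return (int)i;
    }
    return -1;
}
```

Storing only the index in the pool, instead of a pointer to the ops, is what lets ops_index stay meaningful when the same table is registered in every process.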

3.4 Initializing the pool private data

rte_pktmbuf_pool_init() initializes the private data structure of the pool.

void
rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
{
    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
    struct rte_pktmbuf_pool_private default_mbp_priv;
    uint16_t roomsz;
    RTE_ASSERT(mp->private_data_size >=
               sizeof(struct rte_pktmbuf_pool_private));
    RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf));
    /* if no structure is provided, assume no mbuf private area */
    user_mbp_priv = opaque_arg;
    if (user_mbp_priv == NULL) {
        memset(&default_mbp_priv, 0, sizeof(default_mbp_priv));
        if (mp->elt_size > sizeof(struct rte_mbuf))
            roomsz = mp->elt_size - sizeof(struct rte_mbuf);
        else
            roomsz = 0;
        default_mbp_priv.mbuf_data_room_size = roomsz;
        user_mbp_priv = &default_mbp_priv;
    }
    RTE_ASSERT(mp->elt_size >= sizeof(struct rte_mbuf) +
        ((user_mbp_priv->flags & RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) ?
            sizeof(struct rte_mbuf_ext_shared_info) :
            user_mbp_priv->mbuf_data_room_size) +
        user_mbp_priv->mbuf_priv_size);
    RTE_ASSERT((user_mbp_priv->flags &
        ~RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF) == 0);
    mbp_priv = rte_mempool_get_priv(mp);
    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
}

3.5 Populating the mempool

The mempool is populated as follows:

int
rte_mempool_populate_default(struct rte_mempool *mp)
{
    unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
    char mz_name[RTE_MEMZONE_NAMESIZE];
    const struct rte_memzone *mz;
    ssize_t mem_size;
    size_t align, pg_sz, pg_shift = 0;
    rte_iova_t iova;
    unsigned mz_id, n;
    int ret;
    bool need_iova_contig_obj;
    size_t max_alloc_size = SIZE_MAX;

    ret = mempool_ops_alloc_once(mp);
    if (ret != 0)
        return ret;

    /* mempool must not be populated */
    if (mp->nb_mem_chunks != 0)
        return -EEXIST;

    /*
     * the following section calculates page shift and page size values.
     *
     * these values impact the result of calc_mem_size operation, which
     * returns the amount of memory that should be allocated to store the
     * desired number of objects. when not zero, it allocates more memory
     * for the padding between objects, to ensure that an object does not
     * cross a page boundary. in other words, page size/shift are to be set
     * to zero if mempool elements won't care about page boundaries.
     * there are several considerations for page size and page shift here.
     *
     * if we don't need our mempools to have physically contiguous objects,
     * then just set page shift and page size to 0, because the user has
     * indicated that there's no need to care about anything.
     *
     * if we do need contiguous objects (if a mempool driver has its
     * own calc_size() method returning min_chunk_size = mem_size),
     * there is also an option to reserve the entire mempool memory
     * as one contiguous block of memory.
     *
     * if we require contiguous objects, but not necessarily the entire
     * mempool reserved space to be contiguous, pg_sz will be != 0,
     * and the default ops->populate() will take care of not placing
     * objects across pages.
     *
     * if our IO addresses are physical, we may get memory from bigger
     * pages, or we might get memory from smaller pages, and how much of it
     * we require depends on whether we want bigger or smaller pages.
     * However, requesting each and every memory size is too much work, so
     * what we'll do instead is walk through the page sizes available, pick
     * the smallest one and set up page shift to match that one. We will be
     * wasting some space this way, but it's much nicer than looping around
     * trying to reserve each and every page size.
     *
     * If we fail to get enough contiguous memory, then we'll go and
     * reserve space in smaller chunks.
     */
    need_iova_contig_obj = !(mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG);
    ret = rte_mempool_get_page_size(mp, &pg_sz);
    if (ret < 0)
        return ret;
    if (pg_sz != 0)
        pg_shift = rte_bsf32(pg_sz);

    for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
        size_t min_chunk_size;

        mem_size = rte_mempool_ops_calc_mem_size(
            mp, n, pg_shift, &min_chunk_size, &align);
        if (mem_size < 0) {
            ret = mem_size;
            goto fail;
        }
        ret = snprintf(mz_name, sizeof(mz_name),
            RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
        if (ret < 0 || ret >= (int)sizeof(mz_name)) {
            ret = -ENAMETOOLONG;
            goto fail;
        }
        /* if we're trying to reserve contiguous memory, add appropriate
         * memzone flag.
         */
        if (min_chunk_size == (size_t)mem_size)
            mz_flags |= RTE_MEMZONE_IOVA_CONTIG;
        /* Allocate a memzone, retrying with a smaller area on ENOMEM */
        do {
            mz = rte_memzone_reserve_aligned(mz_name,
                RTE_MIN((size_t)mem_size, max_alloc_size),
                mp->socket_id, mz_flags, align);
            if (mz != NULL || rte_errno != ENOMEM)
                break;
            max_alloc_size = RTE_MIN(max_alloc_size,
                (size_t)mem_size) / 2;
        } while (mz == NULL && max_alloc_size >= min_chunk_size);
        if (mz == NULL) {
            ret = -rte_errno;
            goto fail;
        }
        if (need_iova_contig_obj)
            iova = mz->iova;
        else
            iova = RTE_BAD_IOVA;
        if (pg_sz == 0 || (mz_flags & RTE_MEMZONE_IOVA_CONTIG))
            ret = rte_mempool_populate_iova(mp, mz->addr,
                iova, mz->len,
                rte_mempool_memchunk_mz_free,
                (void *)(uintptr_t)mz);
        else
            ret = rte_mempool_populate_virt(mp, mz->addr,
                mz->len, pg_sz,
                rte_mempool_memchunk_mz_free,
                (void *)(uintptr_t)mz);
        if (ret == 0) /* should not happen */
            ret = -ENOBUFS;
        if (ret < 0) {
            rte_memzone_free(mz);
            goto fail;
        }
    }
    rte_mempool_trace_populate_default(mp);
    return mp->size;
fail:
    rte_mempool_free_memchunks(mp);
    return ret;
}
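
The inner do/while loop is the part that copes with fragmented memory: it asks for the whole area and halves the request each time the reservation fails, as long as the request stays at or above min_chunk_size. The plain-C sketch below imitates that loop for a single chunk. `toy_reserve` is a hypothetical allocator that succeeds only up to a given size, and unlike the real function the sketch does not carry the reduced cap over to later chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator: succeeds only for requests up to largest_free bytes. */
static int toy_reserve(size_t len, size_t largest_free)
{
    return len <= largest_free;
}

/*
 * Ask for min(mem_size, max_alloc); on failure halve the cap and retry
 * while the cap is still at least min_chunk_size.
 * Returns the chunk size obtained, or 0 if no acceptable chunk fits.
 */
static size_t toy_reserve_chunk(size_t mem_size, size_t min_chunk_size,
                                size_t largest_free)
{
    size_t max_alloc = SIZE_MAX;

    assert(min_chunk_size > 0 && min_chunk_size <= mem_size);
    do {
        size_t want = mem_size < max_alloc ? mem_size : max_alloc;

        if (toy_reserve(want, largest_free))
            return want;
        max_alloc = want / 2;
    } while (max_alloc >= min_chunk_size);
    return 0;
}
```

With 1000 bytes wanted, a 100-byte minimum chunk, and at most 300 contiguous bytes free, the sketch tries 1000, then 500, and succeeds at 250; the outer loop of the real function would then come back for the objects that did not fit.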

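Creating the rte_ring

In the populate function above, the rte_ring of the memory pool is created through rte_mempool_ops and its address is assigned to mp->pool_data. The flow is shown in the following code: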

static int
mempool_ops_alloc_once(struct rte_mempool *mp)
{
    int ret;
    /* create the internal ring if not already done */
    if ((mp->flags & RTE_MEMPOOL_F_POOL_CREATED) == 0) {
        ret = rte_mempool_ops_alloc(mp);
        if (ret != 0)
            return ret;
        mp->flags |= RTE_MEMPOOL_F_POOL_CREATED;
    }
    return 0;
}

int
rte_mempool_ops_alloc(struct rte_mempool *mp)
{
    struct rte_mempool_ops *ops;
    rte_mempool_trace_ops_alloc(mp);
    ops = rte_mempool_get_ops(mp->ops_index);
    return ops->alloc(mp);
}

static int
ring_alloc(struct rte_mempool *mp, uint32_t rg_flags)
{
    int ret;
    char rg_name[RTE_RING_NAMESIZE];
    struct rte_ring *r;
    ret = snprintf(rg_name, sizeof(rg_name),
        RTE_MEMPOOL_MZ_FORMAT, mp->name);
    if (ret < 0 || ret >= (int)sizeof(rg_name)) {
        rte_errno = ENAMETOOLONG;
        return -rte_errno;
    }
    /*
     * Allocate the ring that will be used to store objects.
     * Ring functions will return appropriate errors if we are
     * running as a secondary process etc., so no checks made
     * in this function for that condition.
     */
    r = rte_ring_create(rg_name, rte_align32pow2(mp->size + 1),
        mp->socket_id, rg_flags);
    if (r == NULL)
        return -rte_errno;
    mp->pool_data = r;
    return 0;
}

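One more note: the pool's rte_ring is ultimately created by rte_ring_create_elem(). That function reserves the ring's memory from a memzone (an rte_ring structure followed by an array of pointer slots, one per ring entry), records the rte_ring address, and through it the ring's memzone, in a struct rte_tailq_entry, and inserts that entry into the global rte_ring_tailq. See the implementation of rte_ring_create_elem() for details.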
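
The ring_alloc() call above sizes the ring with rte_align32pow2(mp->size + 1). The sketch below shows what that rounding does to the slot count; `toy_align32pow2` is my own plain-C re-implementation of "round up to the next power of two" for illustration, not the DPDK function itself, and `toy_ring_slots` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next power of two (x itself if already one); x in [1, 2^31]. */
static uint32_t toy_align32pow2(uint32_t x)
{
    assert(x != 0 && x <= 0x80000000u);
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Ring slot count requested for a pool of n objects, as in ring_alloc(). */
static uint32_t toy_ring_slots(uint32_t n)
{
    return toy_align32pow2(n + 1);
}
```

A pool of 8191 objects gets an 8192-slot ring, while a pool of 8192 objects gets a 16384-slot ring. This is why the DPDK documentation recommends pool sizes of the form 2^q - 1 when memory usage matters.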

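Next, the page size and page shift are obtained and memory for all the mbufs is reserved: the usable chunk size is calculated and the chunk memory is reserved. Each memory chunk is recorded as a struct rte_mempool_memhdr and inserted into mp->mem_list, and the number of chunks is kept in mp->nb_mem_chunks. Within a chunk's virtual memory, the objects are carved out one after another: the rte_mempool_ops populate interface rte_mempool_ops_populate() calls mempool_add_elem() to insert each object into the mp->elt_list list. The key call is: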

i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
    (char *)vaddr + off,
    (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
    len - off, mempool_add_elem, NULL);

The return value i is the number of objects populated in this memory chunk. mempool_add_elem() is implemented as follows:

static void
mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
    void *obj, rte_iova_t iova)
{
    struct rte_mempool_objhdr *hdr;
    struct rte_mempool_objtlr *tlr __rte_unused;
    /* set mempool ptr in header */
    hdr = RTE_PTR_SUB(obj, sizeof(*hdr));
    hdr->mp = mp;
    hdr->iova = iova;
    STAILQ_INSERT_TAIL(&mp->elt_list, hdr, next);
    mp->populated_size++;
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
    hdr->cookie = RTE_MEMPOOL_HEADER_COOKIE2;
    tlr = rte_mempool_get_trailer(obj);
    tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
#endif
}

3.6 Initializing the pktmbufs

rte_mempool_obj_iter() walks all objects in the rte_mempool and calls rte_pktmbuf_init() to initialize each one.
The interface that iterates over all objects:

uint32_t
rte_mempool_obj_iter(struct rte_mempool *mp,
    rte_mempool_obj_cb_t *obj_cb, void *obj_cb_arg)
{
    struct rte_mempool_objhdr *hdr;
    void *obj;
    unsigned n = 0;
    STAILQ_FOREACH(hdr, &mp->elt_list, next) {
        obj = (char *)hdr + sizeof(*hdr);
        obj_cb(mp, obj_cb_arg, obj, n);
        n++;
    }
    return n;
}
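
Both mempool_add_elem() and rte_mempool_obj_iter() rely on the same layout fact: each object sits immediately after its header, so moving between the two is plain pointer arithmetic. The following self-contained sketch shows that arithmetic and a callback-driven walk. Everything prefixed `toy_` is hypothetical, and a hand-rolled `next` pointer replaces the STAILQ macros to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy object header placed immediately before each object (hypothetical). */
struct toy_objhdr {
    struct toy_objhdr *next;  /* link to the next object's header */
    uint64_t iova;            /* placeholder for the object's IO address */
};

typedef void (toy_obj_cb_t)(void *arg, void *obj, unsigned int idx);

/* Header -> object: the object starts right after its header. */
static void *toy_hdr_to_obj(struct toy_objhdr *hdr)
{
    return (char *)hdr + sizeof(*hdr);
}

/* Object -> header: step back by the header size. */
static struct toy_objhdr *toy_obj_to_hdr(void *obj)
{
    return (struct toy_objhdr *)((char *)obj - sizeof(struct toy_objhdr));
}

/* Example callback: store the iteration index in the object's first int. */
static void toy_store_index_cb(void *arg, void *obj, unsigned int idx)
{
    (void)arg;
    *(int *)obj = (int)idx;
}

/* Walk the header list and invoke cb on every object; returns the count. */
static unsigned int toy_obj_iter(struct toy_objhdr *head, toy_obj_cb_t *cb,
                                 void *arg)
{
    unsigned int n = 0;
    struct toy_objhdr *hdr;

    assert(cb != NULL);
    for (hdr = head; hdr != NULL; hdr = hdr->next)
        cb(arg, toy_hdr_to_obj(hdr), n++);
    return n;
}
```

In the pktmbuf case the callback role is played by rte_pktmbuf_init(), which receives each object pointer in turn.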

The interface that initializes each object:

void
rte_pktmbuf_init(struct rte_mempool *mp,
    __rte_unused void *opaque_arg,
    void *_m,
    __rte_unused unsigned i)
{
    struct rte_mbuf *m = _m;
    uint32_t mbuf_size, buf_len, priv_size;
    RTE_ASSERT(mp->private_data_size >=
               sizeof(struct rte_pktmbuf_pool_private));
    priv_size = rte_pktmbuf_priv_size(mp);
    mbuf_size = sizeof(struct rte_mbuf) + priv_size;
    buf_len = rte_pktmbuf_data_room_size(mp);
    RTE_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
    RTE_ASSERT(mp->elt_size >= mbuf_size);
    RTE_ASSERT(buf_len <= UINT16_MAX);
    memset(m, 0, mbuf_size);
    /* start of buffer is after mbuf structure and priv data */
    m->priv_size = priv_size;
    m->buf_addr = (char *)m + mbuf_size;
    m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
    m->buf_len = (uint16_t)buf_len;
    /* keep some headroom between start of buffer and data */
    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
    /* init some constant fields */
    m->pool = mp;
    m->nb_segs = 1;
    m->port = RTE_MBUF_PORT_INVALID;
    rte_mbuf_refcnt_set(m, 1);
    m->next = NULL;
}

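With that, an rte_mempool is fully set up.

4. Using rte_mempool

The mbufs in a pktmbuf pool are used by NIC ports for receiving packets and by applications for sending packets.
Allocating a raw mbuf from the memory pool: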

static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)

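Internally, the allocation interface calls rte_mempool_get_bulk() to get n mbufs from mp (n is 1 here; the interface supports bulk allocation and is shown below). Objects are taken from the local core's cache; if the cache does not hold enough, mbufs are first dequeued from the rte_ring into the local cache.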

static __rte_always_inline int
rte_mempool_get_bulk(struct rte_mempool *mp, void **obj_table, unsigned int n)
{
    struct rte_mempool_cache *cache;
    cache = rte_mempool_default_cache(mp, rte_lcore_id());
    rte_mempool_trace_get_bulk(mp, obj_table, n, cache);
    return rte_mempool_generic_get(mp, obj_table, n, cache);
}

static __rte_always_inline int
rte_mempool_generic_get(struct rte_mempool *mp, void **obj_table,
    unsigned int n, struct rte_mempool_cache *cache)
{
    int ret;
    ret = rte_mempool_do_generic_get(mp, obj_table, n, cache);
    if (ret == 0)
        RTE_MEMPOOL_CHECK_COOKIES(mp, obj_table, n, 1);
    rte_mempool_trace_generic_get(mp, obj_table, n, cache);
    return ret;
}

static __rte_always_inline int
rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
    unsigned int n, struct rte_mempool_cache *cache)
{
    int ret;
    uint32_t index, len;
    void **cache_objs;
    /* No cache provided or cannot be satisfied from cache */
    if (unlikely(cache == NULL || n >= cache->size))
        goto ring_dequeue;
    cache_objs = cache->objs;
    /* Can this be satisfied from the cache? */
    if (cache->len < n) {
        /* No. Backfill the cache first, and then fill from it */
        uint32_t req = n + (cache->size - cache->len);
        /* How many do we require i.e. number to fill the cache + the request */
        ret = rte_mempool_ops_dequeue_bulk(mp,
            &cache->objs[cache->len], req);
        if (unlikely(ret < 0)) {
            /*
             * In the off chance that we are buffer constrained,
             * where we are not able to allocate cache + n, go to
             * the ring directly. If that fails, we are truly out of
             * buffers.
             */
            goto ring_dequeue;
        }
        cache->len += req;
    }
    /* Now fill in the response ... */
    for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
        *obj_table = cache_objs[len];
    cache->len -= n;
    RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
    RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
    return 0;
ring_dequeue:
    /* get remaining objects from ring */
    ret = rte_mempool_ops_dequeue_bulk(mp, obj_table, n);
    if (ret < 0) {
        RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
        RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
    } else {
        RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
        RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
    }
    return ret;
}
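
The backfill arithmetic in rte_mempool_do_generic_get() is easy to misread, so here is a small plain-C sketch that only does the bookkeeping. `toy_get_plan` and `toy_plan_get` are hypothetical; the sketch assumes the ring always has enough objects for the backfill and, for simplicity, that the cache starts no fuller than cache_size.

```c
#include <assert.h>
#include <stdint.h>

struct toy_get_plan {
    int      use_cache;   /* 1 if the request is served through the cache */
    uint32_t from_ring;   /* objects dequeued from the ring for this call */
    uint32_t cache_len;   /* cache length after the call */
};

/* Plan a get of n objects, following the branch structure of the code above. */
static struct toy_get_plan toy_plan_get(uint32_t cache_size, uint32_t cache_len,
                                        uint32_t n)
{
    struct toy_get_plan p = { 0, 0, cache_len };

    /* Toy precondition: ignore temporary overfill up to the flush threshold. */
    assert(cache_len <= cache_size);
    if (n >= cache_size) {          /* too big for the cache: go to the ring */
        p.from_ring = n;
        return p;
    }
    p.use_cache = 1;
    if (cache_len < n) {            /* backfill: refill to full plus the request */
        p.from_ring = n + (cache_size - cache_len);
        cache_len += p.from_ring;
    }
    p.cache_len = cache_len - n;
    return p;
}
```

With a 32-entry cache that is empty, a request for one object pulls 33 objects from the ring and leaves the cache full, so the next 32 single-object requests on that core never touch the ring.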

Returning an mbuf to the memory pool:

void rte_mbuf_raw_free(struct rte_mbuf *m)

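Internally, the free interface calls rte_mempool_put_bulk() to release n mbufs to the memory pool (n is 1 here; the interface supports bulk release and is shown below). Objects go to the local core's cache first; once the cache length reaches its flush threshold, the objects beyond cache->size are flushed to the rte_ring.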

static __rte_always_inline void
rte_mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
    unsigned int n)
{
    struct rte_mempool_cache *cache;
    cache = rte_mempool_default_cache(mp, rte_lcore_id());
    rte_mempool_trace_put_bulk(mp, obj_table, n, cache);
    rte_mempool_generic_put(mp, obj_table, n, cache);
}

static __rte_always_inline void
rte_mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
    unsigned int n, struct rte_mempool_cache *cache)
{
    rte_mempool_trace_generic_put(mp, obj_table, n, cache);
    RTE_MEMPOOL_CHECK_COOKIES(mp, obj_table, n, 0);
    rte_mempool_do_generic_put(mp, obj_table, n, cache);
}

static __rte_always_inline void
rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
    unsigned int n, struct rte_mempool_cache *cache)
{
    void **cache_objs;
    /* increment stat now, adding in mempool always success */
    RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
    RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
    /* No cache provided or if put would overflow mem allocated for cache */
    if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
        goto ring_enqueue;
    cache_objs = &cache->objs[cache->len];
    /*
     * The cache follows the following algorithm
     * 1. Add the objects to the cache
     * 2. Anything greater than the cache min value (if it crosses the
     * cache flush threshold) is flushed to the ring.
     */
    /* Add elements back into the cache */
    rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
    cache->len += n;
    if (cache->len >= cache->flushthresh) {
        rte_mempool_ops_enqueue_bulk(mp, &cache->objs[cache->size],
            cache->len - cache->size);
        cache->len = cache->size;
    }
    return;
ring_enqueue:
    /* push remaining objects in ring */
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
    if (rte_mempool_ops_enqueue_bulk(mp, obj_table, n) < 0)
        rte_panic("cannot put objects in mempool\n");
#else
    rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
#endif
}
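
The two-step rule in the comment above (append, then flush whatever exceeds cache->size once the flush threshold is reached) can be checked with a few lines of bookkeeping. The sketch below is plain C with hypothetical names; the flush threshold and the maximum cache object count are passed in as parameters because their actual values depend on the DPDK version and configuration.

```c
#include <assert.h>
#include <stdint.h>

struct toy_put_result {
    uint32_t to_ring;     /* objects pushed to the ring by this call */
    uint32_t cache_len;   /* cache length after the call */
};

/*
 * Plan a put of n objects through a cache, following the snippet above.
 * max_cache_objs stands in for RTE_MEMPOOL_CACHE_MAX_SIZE.
 */
static struct toy_put_result toy_plan_put(uint32_t cache_size, uint32_t flushthresh,
                                          uint32_t cache_len, uint32_t n,
                                          uint32_t max_cache_objs)
{
    struct toy_put_result r = { 0, cache_len };

    assert(cache_size <= flushthresh);
    if (n > max_cache_objs) {       /* bypass the cache entirely */
        r.to_ring = n;
        return r;
    }
    cache_len += n;                 /* step 1: append to the cache */
    if (cache_len >= flushthresh) { /* step 2: flush everything above size */
        r.to_ring = cache_len - cache_size;
        cache_len = cache_size;
    }
    r.cache_len = cache_len;
    return r;
}
```

Assuming a 32-entry cache with a threshold of 48, a cache holding 40 objects absorbs 4 more without touching the ring, but putting 8 reaches the threshold and flushes 16 objects, leaving exactly 32 cached.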

5. Querying rte_mempool information

The status query interface rte_mempool_dump(FILE *f, struct rte_mempool *mp) dumps the following information:

void
rte_mempool_dump(FILE *f, struct rte_mempool *mp)
{
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
    struct rte_mempool_info info;
    struct rte_mempool_debug_stats sum;
    unsigned lcore_id;
#endif
    struct rte_mempool_memhdr *memhdr;
    struct rte_mempool_ops *ops;
    unsigned common_count;
    unsigned cache_count;
    size_t mem_len = 0;
    RTE_ASSERT(f != NULL);
    RTE_ASSERT(mp != NULL);
    fprintf(f, "mempool <%s>@%p\n", mp->name, mp);
    fprintf(f, " flags=%x\n", mp->flags);
    fprintf(f, " socket_id=%d\n", mp->socket_id);
    fprintf(f, " pool=%p\n", mp->pool_data);
    fprintf(f, " iova=0x%" PRIx64 "\n", mp->mz->iova);
    fprintf(f, " nb_mem_chunks=%u\n", mp->nb_mem_chunks);
    fprintf(f, " size=%"PRIu32"\n", mp->size);
    fprintf(f, " populated_size=%"PRIu32"\n", mp->populated_size);
    fprintf(f, " header_size=%"PRIu32"\n", mp->header_size);
    fprintf(f, " elt_size=%"PRIu32"\n", mp->elt_size);
    fprintf(f, " trailer_size=%"PRIu32"\n", mp->trailer_size);
    fprintf(f, " total_obj_size=%"PRIu32"\n",
        mp->header_size + mp->elt_size + mp->trailer_size);
    fprintf(f, " private_data_size=%"PRIu32"\n", mp->private_data_size);
    fprintf(f, " ops_index=%d\n", mp->ops_index);
    ops = rte_mempool_get_ops(mp->ops_index);
    fprintf(f, " ops_name: <%s>\n", (ops != NULL) ? ops->name : "NA");
    STAILQ_FOREACH(memhdr, &mp->mem_list, next)
        mem_len += memhdr->len;
    if (mem_len != 0) {
        fprintf(f, " avg bytes/object=%#Lf\n",
            (long double)mem_len / mp->size);
    }
    cache_count = rte_mempool_dump_cache(f, mp);
    common_count = rte_mempool_ops_get_count(mp);
    if ((cache_count + common_count) > mp->size)
        common_count = mp->size - cache_count;
    fprintf(f, " common_pool_count=%u\n", common_count);
    /* sum and dump statistics */
#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
    rte_mempool_ops_get_info(mp, &info);
    memset(&sum, 0, sizeof(sum));
    for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
        sum.put_bulk += mp->stats[lcore_id].put_bulk;
        sum.put_objs += mp->stats[lcore_id].put_objs;
        sum.put_common_pool_bulk += mp->stats[lcore_id].put_common_pool_bulk;
        sum.put_common_pool_objs += mp->stats[lcore_id].put_common_pool_objs;
        sum.get_common_pool_bulk += mp->stats[lcore_id].get_common_pool_bulk;
        sum.get_common_pool_objs += mp->stats[lcore_id].get_common_pool_objs;
        sum.get_success_bulk += mp->stats[lcore_id].get_success_bulk;
        sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
        sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
        sum.get_fail_objs += mp->stats[lcore_id].get_fail_objs;
        sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
        sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
    }
    fprintf(f, " stats:\n");
    fprintf(f, " put_bulk=%"PRIu64"\n", sum.put_bulk);
    fprintf(f, " put_objs=%"PRIu64"\n", sum.put_objs);
    fprintf(f, " put_common_pool_bulk=%"PRIu64"\n", sum.put_common_pool_bulk);
    fprintf(f, " put_common_pool_objs=%"PRIu64"\n", sum.put_common_pool_objs);
    fprintf(f, " get_common_pool_bulk=%"PRIu64"\n", sum.get_common_pool_bulk);
    fprintf(f, " get_common_pool_objs=%"PRIu64"\n", sum.get_common_pool_objs);
    fprintf(f, " get_success_bulk=%"PRIu64"\n", sum.get_success_bulk);
    fprintf(f, " get_success_objs=%"PRIu64"\n", sum.get_success_objs);
    fprintf(f, " get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
    fprintf(f, " get_fail_objs=%"PRIu64"\n", sum.get_fail_objs);
    if (info.contig_block_size > 0) {
        fprintf(f, " get_success_blks=%"PRIu64"\n",
            sum.get_success_blks);
        fprintf(f, " get_fail_blks=%"PRIu64"\n", sum.get_fail_blks);
    }
#else
    fprintf(f, " no statistics available\n");
#endif
    rte_mempool_audit(mp);
}

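rte_mempool keeps some statistics in mp->stats that are worth attention; they are controlled by RTE_LIBRTE_MEMPOOL_DEBUG, which is normally disabled.
The rte_mempool library has two more very useful interfaces: rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp) returns the number of available objects held in the per-core local caches of the given pool,
and rte_mempool_ops_get_count(const struct rte_mempool *mp) returns the number of available objects in the rte_ring.
Together with mp->size, these two values give the number of mbufs currently in use by the application, a metric that some products watch closely.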
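
The in-use calculation described here is a subtraction with one guard, sketched below in plain C. `toy_in_use` is a hypothetical helper; the clamp mirrors the adjustment rte_mempool_dump() makes when the cached count plus the ring count momentarily exceeds the pool size.

```c
#include <assert.h>

/* Objects currently held by the application = size - (cached + in ring). */
static unsigned int toy_in_use(unsigned int size, unsigned int cache_count,
                               unsigned int common_count)
{
    unsigned int avail = cache_count + common_count;

    assert(size != 0);
    /* The counts are sampled without locking, so clamp transient overshoot. */
    if (avail > size)
        avail = size;
    return size - avail;
}
```

For a 1024-object pool with 100 objects cached and 800 in the ring, 124 objects are in use. DPDK also exposes rte_mempool_avail_count() and rte_mempool_in_use_count(), which may be simpler than combining the two counts by hand.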
