Category: LINUX

2022-04-29 11:11:04

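DPDK memory management: the rte_malloc implementation

Author: lvyilong316

DPDK exposes memory management in two ways: rte_mempool, used mainly for sending and receiving NIC packets, and rte_malloc, which gives applications a general-purpose allocation interface. This article focuses on rte_malloc.

The overall flow of rte_malloc is shown in the figure below.

[Figure: overall flow of rte_malloc]

The functions involved are analyzed one by one below.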

• rte_malloc



    /*
     * Allocate memory on default heap.
     */
    void *
    rte_malloc(const char *type, size_t size, unsigned align)
    {
        return rte_malloc_socket(type, size, align, SOCKET_ID_ANY);
    }


There is little to say about this function: it simply calls rte_malloc_socket. Note, however, that the socket id argument it passes is SOCKET_ID_ANY.

• rte_malloc_socket

The checks at the top of this function show that it returns NULL if the requested size is 0, or if the alignment align is non-zero and not a power of 2.



    void *
    rte_malloc_socket(const char *type, size_t size, unsigned align, int socket_arg)
    {
        struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
        int socket, i;
        void *ret;

        /* return NULL if size is 0 or alignment is not power-of-2 */
        if (size == 0 || (align && !rte_is_power_of_2(align)))
            return NULL;

        if (!rte_eal_has_hugepages())
            socket_arg = SOCKET_ID_ANY;

        /* if the caller passed SOCKET_ID_ANY, try the current socket first */
        if (socket_arg == SOCKET_ID_ANY)
            socket = malloc_get_numa_socket(); /* id of the current socket */
        else
            socket = socket_arg;

        /* Check socket parameter */
        if (socket >= RTE_MAX_NUMA_NODES)
            return NULL;

        /* try the chosen socket; return on success, and also on failure
         * when the caller asked for a specific socket */
        ret = malloc_heap_alloc(&mcfg->malloc_heaps[socket], type,
                                size, 0, align == 0 ? 1 : align, 0);
        if (ret != NULL || socket_arg != SOCKET_ID_ANY)
            return ret;

        /* try other heaps until one succeeds or all sockets have failed */
        for (i = 0; i < RTE_MAX_NUMA_NODES; i++) {
            /* we already tried this one */
            if (i == socket)
                continue;

            ret = malloc_heap_alloc(&mcfg->malloc_heaps[i], type,
                                    size, 0, align == 0 ? 1 : align, 0);
            if (ret != NULL)
                return ret;
        }

        return NULL;
    }
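The entry check in the listing above can be tried out in isolation. The following is a minimal stand-alone sketch, not DPDK code: `is_power_of_2`, `args_are_valid` and `effective_align` are hypothetical names that mirror the condition `size == 0 || (align && !rte_is_power_of_2(align))` and the expression `align == 0 ? 1 : align` from the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when n is a non-zero power of two. */
static bool is_power_of_2(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper mirroring the entry check:
 * reject size == 0, and reject a non-zero align that is not a power of two.
 * align == 0 is accepted and means "no particular alignment". */
static bool args_are_valid(size_t size, unsigned align)
{
    if (size == 0)
        return false;
    if (align != 0 && !is_power_of_2(align))
        return false;
    return true;
}

/* Alignment actually handed to the heap: 0 is replaced by 1,
 * mirroring `align == 0 ? 1 : align`. */
static unsigned effective_align(unsigned align)
{
    assert(align == 0 || is_power_of_2(align));
    return align == 0 ? 1 : align;
}
```

Under these assumptions a request with align 48 is rejected, while align 0 is accepted and treated as an alignment of 1.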


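At this point we can draw a conclusion: when NUMA is enabled, rte_malloc first tries to allocate on the current socket and only falls back to the other sockets if that attempt fails.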
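The fallback order can be sketched with toy heaps. This is a simplified model under stated assumptions, not the DPDK implementation: `toy_heap`, `toy_heap_alloc` and `toy_alloc_socket` are hypothetical, each heap only counts free bytes, `MAX_NODES` stands in for RTE_MAX_NUMA_NODES with an assumed value, and the `current` parameter plays the role of malloc_get_numa_socket().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES  4      /* stand-in for RTE_MAX_NUMA_NODES (assumed value) */
#define ANY_SOCKET (-1)   /* stand-in for SOCKET_ID_ANY */

/* Toy heap: only tracks how many bytes are still free. */
struct toy_heap {
    size_t free_bytes;
};

/* Take `size` bytes from one heap; returns 1 on success, 0 on failure. */
static int toy_heap_alloc(struct toy_heap *heap, size_t size)
{
    if (heap->free_bytes < size)
        return 0;
    heap->free_bytes -= size;
    return 1;
}

/* Returns the index of the heap that served the request, or -1. */
static int toy_alloc_socket(struct toy_heap heaps[MAX_NODES], size_t size,
                            int socket_arg, int current)
{
    int socket = (socket_arg == ANY_SOCKET) ? current : socket_arg;
    int i;

    assert(current >= 0 && current < MAX_NODES);
    if (size == 0 || socket < 0 || socket >= MAX_NODES)
        return -1;

    /* First attempt: the preferred socket. */
    if (toy_heap_alloc(&heaps[socket], size))
        return socket;

    /* An explicitly requested socket never falls back. */
    if (socket_arg != ANY_SOCKET)
        return -1;

    /* Fallback: every other socket, in index order. */
    for (i = 0; i < MAX_NODES; i++) {
        if (i == socket)
            continue;
        if (toy_heap_alloc(&heaps[i], size))
            return i;
    }
    return -1;
}
```

The design point this illustrates is that the fallback loop is reached only for ANY_SOCKET requests; a caller that names a socket gets either memory from that socket or a failure.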

• malloc_heap_alloc

This function performs the allocation from a heap (a struct malloc_heap). Its call graph is shown below:

[Figure: call graph of malloc_heap_alloc]



    void *
    malloc_heap_alloc(struct malloc_heap *heap,
            const char *type __attribute__((unused)), size_t size, unsigned flags,
            size_t align, size_t bound)
    {
        struct malloc_elem *elem;

        /* round size and align up to a multiple of the cache line size */
        size = RTE_CACHE_LINE_ROUNDUP(size);
        align = RTE_CACHE_LINE_ROUNDUP(align);

        rte_spinlock_lock(&heap->lock);

        /* find a suitable malloc_elem */
        elem = find_suitable_element(heap, size, flags, align, bound);
        if (elem != NULL) {
            elem = malloc_elem_alloc(elem, size, align, bound);
            /* increase heap's count of allocated elements */
            heap->alloc_count++;
        }
        rte_spinlock_unlock(&heap->lock);

        return elem == NULL ? NULL : (void *)(&elem[1]);
    }

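Note the return value at the end: it is the address of elem[1], not the address of elem. What is elem[1]? It is simply elem + 1. Put plainly, rte_malloc allocates a memory block, or equivalently a malloc_elem. The malloc_elem sits at the start of the block and acts as its descriptor, and the memory the caller can actually use is the region right after the malloc_elem, as shown in the figure below.

[Figure: a memory block with the malloc_elem header followed by the usable data area]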
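The "descriptor in front of the data" layout can be reproduced with a toy allocator. This is a minimal sketch on top of the C library's malloc, not DPDK code: `toy_elem`, `toy_alloc`, `toy_elem_from_data` and `toy_free` are hypothetical, and the real struct malloc_elem carries more fields than the two shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy descriptor placed at the start of every block (hypothetical). */
struct toy_elem {
    size_t size;   /* total block size: header plus usable data */
    int    busy;   /* 1 while the block is handed out */
};

/* Allocate a block and return the address just past its header. */
static void *toy_alloc(size_t data_size)
{
    struct toy_elem *elem = malloc(sizeof(*elem) + data_size);

    if (elem == NULL)
        return NULL;
    elem->size = sizeof(*elem) + data_size;
    elem->busy = 1;
    return &elem[1];   /* same address as (void *)(elem + 1) */
}

/* Step back from a user pointer to the header in front of it. */
static struct toy_elem *toy_elem_from_data(void *data)
{
    assert(data != NULL);
    return (struct toy_elem *)data - 1;
}

/* Release a block obtained from toy_alloc(). */
static void toy_free(void *data)
{
    if (data != NULL)
        free(toy_elem_from_data(data));
}
```

Because the header sits at a fixed offset before the data, a free routine needs nothing but the user pointer to find the descriptor again, which is the main reason for this layout.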

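For reference, here again is the diagram of the data-structure relationships discussed in the memory initialization article.

[Figure: data-structure relationships from the memory initialization article]

Next, let us look at how find_suitable_element finds a suitable malloc_elem.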

• find_suitable_element


    static struct malloc_elem *
    find_suitable_element(struct malloc_heap *heap, size_t size,
            unsigned flags, size_t align, size_t bound)
    {
        size_t idx;
        struct malloc_elem *elem, *alt_elem = NULL;

        /* start from the free_head index that matches the requested size */
        for (idx = malloc_elem_free_list_index(size);
                idx < RTE_HEAP_NUM_FREELISTS; idx++) {
            /* walk the heap->free_head[idx] list looking for a suitable malloc_elem */
            for (elem = LIST_FIRST(&heap->free_head[idx]);
                    !!elem; elem = LIST_NEXT(elem, free_list)) {
                if (malloc_elem_can_hold(elem, size, align, bound)) {
                    if (check_hugepage_sz(flags, elem->ms->hugepage_sz))
                        return elem;
                    if (alt_elem == NULL)
                        alt_elem = elem;
                }
            }
        }

        if ((alt_elem != NULL) && (flags & RTE_MEMZONE_SIZE_HINT_ONLY))
            return alt_elem;

        return NULL;
    }

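As we know, malloc_elem structures are organized as a two-dimensional list (an array of linked lists), as shown in the figure below. The first step is therefore to pick the right one-dimensional list, that is, to find a suitable idx in the struct malloc_heap->free_head array.

[Figure: heap->free_head as an array of malloc_elem free lists]

As introduced earlier, the index into the struct malloc_heap->free_head array corresponds to the size of the malloc_elem entries on that list roughly as follows, so malloc_elem_free_list_index returns the smallest idx whose size range covers the requested size.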

    heap->free_head[0] - (0,    2^8]
    heap->free_head[1] - (2^8,  2^10]
    heap->free_head[2] - (2^10, 2^12]
    heap->free_head[3] - (2^12, 2^14]
    heap->free_head[4] - (2^14, MAX_SIZE]

The allocator then tries the malloc_elem entries on heap->free_head[idx]. If none of them fits, it moves on to the next larger list (idx++).
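The size-to-index mapping in the table above can be sketched as a short loop. This is an illustrative stand-alone version, not the DPDK function: `toy_free_list_index` and the three constants are assumptions chosen so that the result matches the five-list table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_FREELISTS 5   /* matches the five-list table (assumed) */
#define TOY_MINSIZE_LOG2  8   /* list 0 holds sizes up to 2^8 */
#define TOY_LOG2_STEP     2   /* each further list covers two more powers of two */

/* Return the smallest list index whose size range contains `size`. */
static size_t toy_free_list_index(size_t size)
{
    size_t idx = 0;
    size_t upper = (size_t)1 << TOY_MINSIZE_LOG2;   /* upper bound of list 0 */

    assert(size > 0);
    /* Move to the next list while size exceeds the current upper bound;
     * the last list is open-ended. */
    while (size > upper && idx < TOY_NUM_FREELISTS - 1) {
        upper <<= TOY_LOG2_STEP;
        idx++;
    }
    return idx;
}
```

With these constants a 256-byte request maps to list 0 and a 257-byte request to list 1. An element on list 1 may still be smaller than the request (for example a 300-byte element for a 1000-byte request), which is why each candidate is still checked with malloc_elem_can_hold before it is used.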

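Within heap->free_head[idx], malloc_elem_can_hold decides whether a given malloc_elem can satisfy the request. Internally it does nothing more than call elem_start_pt and check whether the result is NULL.

• elem_start_pt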


    static void *
    elem_start_pt(struct malloc_elem *elem, size_t size, unsigned align,
            size_t bound)
    {
        const size_t bmask = ~(bound - 1);
        /* MALLOC_ELEM_TRAILER_LEN is one cache line in debug builds, 0 otherwise */
        uintptr_t end_pt = (uintptr_t)elem +
                elem->size - MALLOC_ELEM_TRAILER_LEN;
        uintptr_t new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
        uintptr_t new_elem_start;

        /* check boundary */
        if ((new_data_start & bmask) != ((end_pt - 1) & bmask)) {
            end_pt = RTE_ALIGN_FLOOR(end_pt, bound);
            new_data_start = RTE_ALIGN_FLOOR((end_pt - size), align);
            if (((end_pt - 1) & bmask) != (new_data_start & bmask))
                return NULL;
        }

        new_elem_start = new_data_start - MALLOC_ELEM_HEADER_LEN;

        /* if the new start point is before the exist start, it won't fit */
        return (new_elem_start < (uintptr_t)elem) ? NULL : (void *)new_elem_start;
    }

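The pointers used in the code are shown in the figure below. In essence, the function tries to carve a new malloc_elem of the requested size out of the tail of the current malloc_elem and checks whether its start address would fall before the start of the current element. If it does not, the request fits, and find_suitable_element returns the current malloc_elem (not a new one; no new malloc_elem has actually been created at this point).

[Figure: elem, new_elem_start, new_data_start and end_pt within a malloc_elem]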
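The address arithmetic of elem_start_pt is easier to follow with concrete numbers. The sketch below works on plain integers and is a simplification under stated assumptions: `toy_start_pt` and `align_floor` are hypothetical, the header length is assumed to be 64 bytes, and both the debug trailer and the `bound` check are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADER_LEN 64u   /* assumed header size, one cache line */

/* Round addr down to a multiple of align (align must be a power of two). */
static uintptr_t align_floor(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Where would a new element start if `size` data bytes were carved from the
 * tail of a free element spanning [elem_start, elem_start + elem_size)?
 * Returns 0 when the request does not fit. */
static uintptr_t toy_start_pt(uintptr_t elem_start, size_t elem_size,
                              size_t size, uintptr_t align)
{
    uintptr_t end_pt = elem_start + elem_size;
    uintptr_t new_data_start;

    if (size > elem_size)
        return 0;

    /* Place the data as far back as possible, then align it downwards. */
    new_data_start = align_floor(end_pt - size, align);

    /* The header must still fit between elem_start and the data. */
    if (new_data_start < elem_start + TOY_HEADER_LEN)
        return 0;

    return new_data_start - TOY_HEADER_LEN;
}
```

For a 4096-byte element at 0x10000 and a 1024-byte request aligned to 2048, end_pt is 0x11000, end_pt - size is 0x10C00, rounding down to 2048 gives 0x10800, and subtracting the 64-byte header gives a new element start of 0x107C0.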

Once a suitable malloc_elem has been found, malloc_elem_alloc is called to carve a new malloc_elem of the requested size out of it.

• malloc_elem_alloc


    struct malloc_elem *
    malloc_elem_alloc(struct malloc_elem *elem, size_t size, unsigned align,
            size_t bound)
    {
        struct malloc_elem *new_elem = elem_start_pt(elem, size, align, bound);
        const size_t old_elem_size = (uintptr_t)new_elem - (uintptr_t)elem;
        /* trailer_size is the space left over after the new element at the tail
         * of the original element, caused by rounding the data start down for
         * align/bound. MALLOC_ELEM_OVERHEAD includes MALLOC_ELEM_TRAILER_LEN,
         * which is one cache line in debug builds and 0 otherwise. */
        const size_t trailer_size = elem->size - old_elem_size - size -
                MALLOC_ELEM_OVERHEAD;

        /* remove the old elem from its free list */
        elem_free_list_remove(elem);

        if (trailer_size > MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
            /* split it, too much free space after elem */
            struct malloc_elem *new_free_elem =
                    RTE_PTR_ADD(new_elem, size + MALLOC_ELEM_OVERHEAD);

            split_elem(elem, new_free_elem);
            malloc_elem_free_list_insert(new_free_elem);
        }

        /* if old_elem_size is too small to form a free element of its own,
         * mark the old elem as ELEM_BUSY and treat the leading space as padding */
        if (old_elem_size < MALLOC_ELEM_OVERHEAD + MIN_DATA_SIZE) {
            /* don't split it, pad the element instead */
            elem->state = ELEM_BUSY;
            elem->pad = old_elem_size;

            /* put a dummy header in padding, to point to real element header */
            if (elem->pad > 0){ /* pad will be at least 64-bytes, as everything
                                 * is cache-line aligned */
                new_elem->pad = elem->pad;
                new_elem->state = ELEM_PAD;
                new_elem->size = elem->size - elem->pad; /* elem->size - old_elem_size */
                set_header(new_elem);
            }

            return new_elem;
        }

        /* we are going to split the element in two. The original element
         * remains free, and the new element is the one allocated.
         * Re-insert original element, in case its new size makes it
         * belong on a different list.
         */
        /* old_elem_size is large enough: split the original elem in two and
         * set the sizes of elem and new_elem accordingly */
        split_elem(elem, new_elem);
        new_elem->state = ELEM_BUSY; /* mark new_elem as allocated */
        /* re-insert the shrunken original elem into the heap->free_head[idx]
         * list that matches its new size */
        malloc_elem_free_list_insert(elem);

        return new_elem;
    }

The figures below compare the elem before and after the split:

[Figure: before the split]

[Figure: after the split]
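The two size tests in malloc_elem_alloc above decide independently what happens to the space behind and in front of the new element. The sketch below only reproduces that decision logic on plain sizes. It is a hypothetical illustration: `toy_split`, `toy_plan_split` and both constants are assumptions (64 bytes each for the per-element overhead and the minimum data size), not values taken from the article.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OVERHEAD      64u   /* assumed header + trailer bytes per element */
#define TOY_MIN_DATA_SIZE 64u   /* assumed smallest useful data area */

struct toy_split {
    int    split_tail;   /* 1: space after the new element becomes a free element */
    int    split_head;   /* 1: space before it stays a free element */
    size_t pad;          /* padding in front of the new element when split_head == 0 */
};

/* elem_size:     size of the free element being carved up
 * old_elem_size: bytes between the element start and the new element start
 * size:          requested data size */
static struct toy_split toy_plan_split(size_t elem_size, size_t old_elem_size,
                                       size_t size)
{
    struct toy_split plan = {0, 0, 0};
    size_t trailer_size;

    assert(old_elem_size + size + TOY_OVERHEAD <= elem_size);
    trailer_size = elem_size - old_elem_size - size - TOY_OVERHEAD;

    /* Enough room behind the new element for another free element? */
    if (trailer_size > TOY_OVERHEAD + TOY_MIN_DATA_SIZE)
        plan.split_tail = 1;

    /* Too little room in front: keep it as padding instead of splitting. */
    if (old_elem_size < TOY_OVERHEAD + TOY_MIN_DATA_SIZE)
        plan.pad = old_elem_size;
    else
        plan.split_head = 1;

    return plan;
}
```

For a 4096-byte element with old_elem_size 1984 and a 1024-byte request, trailer_size is 1024, so both the front and the back are split off as free elements. With old_elem_size 64 and a 3968-byte request, nothing is split and the 64 leading bytes become padding.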

• rte_free

rte_free is the reverse of rte_malloc, that is, the reverse of the elem splitting described above, so it is not covered in detail here.
