Chinaunix首页 | 论坛 | 博客
  • 博客访问: 543723
  • 博文数量: 51
  • 博客积分: 345
  • 博客等级: 民兵
  • 技术积分: 534
  • 用 户 组: 普通用户
  • 注册时间: 2011-03-21 12:02
个人简介

文章分类

全部博文(51)

文章存档

2023年(2)

2022年(1)

2021年(7)

2020年(10)

2019年(2)

2016年(20)

2015年(5)

2014年(1)

2011年(3)

我的朋友

分类: LINUX

2022-06-16 21:26:33

本文通过libbpf-bootstrap中一个实例来展示如何在BPF中使用全局变量来控制程序流以及全局变量的实现原理。

 

先看一下大神对BPF全局变量的概括


BPF global variables look and behave exactly like a user-space variables: they can be used in expressions, updated (the non-const ones), you can even take their address and pass around into helper functions. But that is only true for the BPF code side. From user-space, they can be read and updated only through BPF skeleton:

·       skel->rodata for read-only variables;

·       skel->bss for mutable zero-initialized variables;

·       skel->data for non-zero-initialized mutable variables.

You can still read/update them from user-space and those updates will be immediately reflected on the BPF side. But they are not global variables on the user-space side, they are just members of BPF skeleton’s rodatabss, or data members, which are initialized during the skeleton load phase.


Example

通过libbpf-bootstrapminimal application来解释BPF是如何使用全局变量。

 

my_pid是在BPF prog  minimal.bpf.c中声明的一个全局变量;编译时放置在.bss段中;

BPF prog截获进入sys_enter_write的系统调用,并获取当前上下文的pid,若pidmy_pid相同,则输出日志信息。

 

Application运行流程是:

open BPF application -> config my_pid -> load & verify BPF -> attach tracepoint -> trigger BPF prog

Code

User space

int main(int argc, char **argv)

{

                  struct minimal_bpf *skel;

                  int err;

 

                  /* Open BPF application */

                  skel = minimal_bpf__open();

                  if (!skel) {

                                    fprintf(stderr, "Failed to open BPF skeleton\n");

                                    return 1;

                  }

 

                  /* ensure BPF program only handles write() syscalls from our process */

                  skel->bss->my_pid = getpid();

 

                  /* Load & verify BPF programs */

                  err = minimal_bpf__load(skel);

                  if (err) {

                                    fprintf(stderr, "Failed to load and verify BPF skeleton\n");

                                    goto cleanup;

                  }

 

                  /* Attach tracepoint handler */

                  err = minimal_bpf__attach(skel);

                  if (err) {

                                    fprintf(stderr, "Failed to attach BPF skeleton\n");

                                    goto cleanup;

                  }

 

                  printf("Successfully started! Please run `sudo cat /sys/kernel/debug/tracing/trace_pipe` "

                         "to see output of the BPF programs.\n");

 

                  for (;;) {

                                    /* trigger our BPF program */

                                    fprintf(stderr, ".");

                                    sleep(1);

                  }

BPF prog

// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause

/* Copyright (c) 2020 Facebook */

#include

#include

 

char LICENSE[] SEC("license") = "Dual BSD/GPL";

 

int my_pid = 0;

 

SEC("tp/syscalls/sys_enter_write")

int handle_tp(void *ctx)

{

                  int pid = bpf_get_current_pid_tgid() >> 32;

 

                  if (pid != my_pid)

                                    return 0;

 

                  bpf_printk("BPF triggered from PID %d.\n", pid);

 

                  return 0;

}


Relocatable symbols in ELF

参照c 代码来解读汇编指令,


minimal.bpf.o:                 file format elf64-bpf

 

Disassembly of section tp/syscalls/sys_enter_write:

 

0000000000000000 :

       0:           85 00 00 00 0e 00 00 00     call 14    /* bpf_get_current_pid_tgid */

       1:           77 00 00 00 20 00 00 00     r0 >>= 32    /* store returned pid value */

       2:           18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll

                         0000000000000010:  R_BPF_64_64       my_pid

       4:           61 11 00 00 00 00 00 00     r1 = *(u32 *)(r1 + 0)

       5:           5d 01 05 00 00 00 00 00     if r1 != r0 goto +5   /* pid != my_pid*/

       6:           18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll

                         0000000000000030:  R_BPF_64_64       .rodata

       8:           b7 02 00 00 1c 00 00 00     r2 = 28

       9:           bf 03 00 00 00 00 00 00     r3 = r0

      10:          85 00 00 00 06 00 00 00     call 6

 

0000000000000058 :

      11:          b7 00 00 00 00 00 00 00     r0 = 0

      12:          95 00 00 00 00 00 00 00     exit

int handle_tp(void *ctx)

{

                  int pid = bpf_get_current_pid_tgid() >> 32;

 

                  if (pid != my_pid)

                                    return 0;

 

                  bpf_printk("BPF triggered from PID %d.\n", pid);

 

                  return 0;

}

insn #0:  call bpf_get_current_pid_tgid();

insn #1: 返回值右移32bit, 结果存入r0;

insn #5: 比较pid my_pid;

 

那么可以推测 insn #2insn #4 全局变量my_pid有关;

insn#2是两个BPF指令的长度;它是一条LD_imm64指令,BPF loaderload之前会修改 insn #2指令;

 

 要修改insn#2指令需要借助tp/syscalls/sys_enter_write的重定位表

typedef struct

{

  Elf64_Addr    r_offset;  // Offset from the beginning of section.

  Elf64_Xword   r_info;    // Relocation type and symbol index.

} Elf64_Rel;

可以看到有两条重定位信息,OFFSET表示在段内的偏移;

my_pid重定位信息显示,在段内偏移是第3条指令, relocation type 1(R_BPF_64_64),对应符号表的index 6 (.bss); my_pid的初始值为0


Global Variable Map Relocation

 

BPF prog如何引用Global variable

 

全局变量是通过bpf map加载到kernelmap类型为BPF_MAP_TYPE_ARRAY

·      open phaseBPF backend分析 BPF object file,会发现global variables 并初始化一个map结构,但这时并不创建map

·      Load phase,创建map,接着会处理所有包含ELF relocationprog section ,修改的BPF  insnmap fd 以立即数的方式编码,并loadreg。这样内核就可以通过fd找到内核态的map指针。

open phase

创建对应 .data/.rodata/.bss sectionmap结构,填充初始化信息;

bpf_object__init_maps

->bpf_object__init_global_data_maps

->bpf_object__init_internal_map()

 

收集bpf prog中重定位信息,本文例子中对应的是 解析.reltp/syscalls/sys_enter_write section的内容,并记录在prog->reloc_desc;

 

bpf_object__collect_relos

-> bpf_object__collect_prog_relos

            -> bpf_program__record_reloc  //  记录重定位描述信息

Load phase

user space

 

create & update map

创建/更新 bss map;更新bss段内的变量的实现后面章节有介绍。

 

bpf_object__create_maps

-> bpf_object__create_map

-> bpf_object__populate_internal_map()

            -> bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);

patch bytecode

在加载bpf prog到内核之前,loader map fd编码在指令中;

 

bpf_object__relocate()

-> bpf_object__relocate_data()

            -> case RELO_DATA:

                        insn[1].imm = insn[0].imm + relo->sym_off;

                        insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;

                        insn[0].imm = obj->maps[relo->map_idx].fd;   // map fd

 

原有指令

       2:   18 01 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00     r1 = 0 ll

insn[0].code = 0x18

insn[0].dst_reg = r1

 

patch后指令:

insn[0].code = 0x18

insn[0].dst_reg = r1

insn[0].src_reg = BPF_PSEUDO_MAP_VALUE

insn[0]. imm = obj->maps[relo->map_idx].fd

insn[1].imm = insn[0].imm + relo->sym_off;   //  段内偏移


kernel space

 

create map

可以阅读array_map_alloc()代码;

需要注意的是,实际的数据是通过array->value访问,对于设置了BPF_F_MMAPABLE flagmap,要确保array->value 页面对齐。


update map

copy 用户数据到array->value


kernel patch ld_imm64 instructions

bpf verify阶段会再次修改ld_imm64指令,通过map fd找到实际访问的内存地址;并将该地址编码在指令中。

 

resolve_pseudo_ldimm64

-> map = __bpf_map_get(f);  // f.file->private_data;

-> map->ops->map_direct_value_addr(map, &addr, off);    //

            -> array_map_direct_value_addr()

                        -> struct bpf_array *array = container_of(map, struct bpf_array, map);

                        -> *imm = (unsigned long)array->value;  // 内核中全局变量所在的地址

-> addr += off;

insn[0].imm = (u32)addr;

insn[1].imm = addr >> 32;

此时insn内容为

insn[0].code = 0x18

insn[0].dst_reg = r1

insn[0].src_reg = BPF_PSEUDO_MAP_VALUE

insn[0]. imm = (u32)addr;

insn[1].imm = addr >> 32;

 

map的有效地址被loadr1

 

这样BPF prog访问全局变量就不存在因查找带来的overhead

直接访问内存。

ld_imm64指令变换


Sharing Global Variable Between User Space and Kernel Space

其本质是用户空间与内核空间共享/访问同一块内存;

通过BPF skeleton可以实现在用户空间对全局变量的操作。

struct minimal_bpf {

                  ……

                  struct minimal_bpf__bss {

                                    int my_pid;

                  } *bss;

                  struct minimal_bpf__rodata {

                  } *rodata;

};

 

Open Phase

open 阶段, skeleton loader之间共享内存来初始化 bss内的全局变量;

 

BPF loader在初始化bpf map时,会通过mmap()申请一片内存,


struct minimal_bpf  *bss 最终指向了 mmap() 返回的地址;

 

s->maps[0].mmaped = (void **)&obj->bss; 

**mmaped = s->maps[i].mmaped;

*mmaped = (*map)->mmaped;

 

此时用户空间可以重写 skel->bss->my_pid 来更新 bss段内的变量值。

注意这时还没有创建 .bss map;仅仅在对 .bss内变量做初始化。

Load Phase

load bpf成功后, bss map已经被create,可以得到其map fd

再次执行mmap();这次入参fd被设定为新创建mapfd

**mmaped = s->maps[i].mmaped;

*mmaped = mmap(map->mmaped, mmap_sz, prot, MAP_SHARED | MAP_FIXED, map_fd, 0);

 

 

struct minimal_bpf  *bss再次被更新,指向了kernel存放全局变量的内存地址。

 

由于设置了map_fdmmap()触发的内核态函数是array_map_mmap(), 该函数映射内核态array->value处的内存到用户空间,至此用户可以自由修改bpf prog的全局变量。

阅读(1201) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~