
Introduction to tracepoints

Category: LINUX

2020-08-05 13:53:42

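Each tracepoint in the kernel provides a hook for calling a probe function. A tracepoint can be turned on or off. When it is on, a probe function is attached to it. When it is off, no probe function is attached.

A tracepoint that is off has very little impact on the kernel. The time cost is tiny (one conditional branch), and so is the space cost (one function call and a few data structures). When a tracepoint is on, the user-supplied probe function is called every time the tracepoint is executed.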
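The on/off behaviour can be modelled in a few lines of user-space C. This is a minimal sketch, not kernel code. The real kernel keeps the switch in a static key (jump label) rather than a plain bool, and the names demo_tracepoint, trace_demo, tp_enable, tp_disable and demo_probe are hypothetical. It shows the point made above: while the tracepoint is off, the instrumented call costs one branch, and while it is on, the probe runs on every hit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a tracepoint: an on/off switch plus one probe. */
struct demo_tracepoint {
    bool enabled;               /* the real kernel uses a static key here */
    void (*probe)(int value);   /* probe attached while the tracepoint is on */
};

static struct demo_tracepoint demo_tp = { false, NULL };
static int probe_hits;          /* counts how often the probe ran */

static void demo_probe(int value)
{
    probe_hits++;
    printf("probe saw %d\n", value);
}

/* Turning the tracepoint on attaches a probe; turning it off detaches it. */
static void tp_enable(void (*probe)(int))
{
    demo_tp.probe = probe;
    demo_tp.enabled = true;
}

static void tp_disable(void)
{
    demo_tp.enabled = false;
    demo_tp.probe = NULL;
}

/* The instrumented code calls this; when off, it costs a single branch. */
static void trace_demo(int value)
{
    if (demo_tp.enabled)
        demo_tp.probe(value);
}
```

Calling trace_demo() before tp_enable() does nothing. After tp_enable(demo_probe), each call prints a line. tp_disable() turns the call back into a single untaken branch.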

To add a new tracepoint to the kernel, declare it in the following format:

    #include <linux/tracepoint.h>

    DECLARE_TRACE(tracepoint_name,
                  TPPROTO(trace_function_prototype),
                  TPARGS(trace_function_args));

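The macro above defines a new tracepoint named tracepoint_name. Any probe function attached to this tracepoint must match the function prototype given by TPPROTO, and its argument list must match the one given by TPARGS. Newer kernels spell these helpers TP_PROTO and TP_ARGS, as in the code later in this article.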

An example makes this easier to follow. The kernel already contains several tracepoints. One of them, sched_wakeup, is hit every time the scheduler wakes up a process. It is defined like this:

    DECLARE_TRACE(sched_wakeup,
                  TPPROTO(struct rq *rq, struct task_struct *p),
                  TPARGS(rq, p));

The tracepoint is actually placed in the kernel with a single line of code:

trace_sched_wakeup(rq, p);

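Note that the function that fires the tracepoint is named by adding the trace_ prefix to tracepoint_name. Unless a probe function is actually attached to the tracepoint, trace_sched_wakeup() does nothing beyond checking whether the tracepoint is enabled. A probe function is attached to a tracepoint like this: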

 void my_sched_wakeup_tracer(struct rq *rq, struct task_struct *p);
    register_trace_sched_wakeup(my_sched_wakeup_tracer);

The function register_trace_sched_wakeup() is generated by the DECLARE_TRACE() macro. It attaches the probe function my_sched_wakeup_tracer() to the sched_wakeup tracepoint.
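The following heavily simplified user-space sketch shows how one macro can produce both trace_sched_wakeup() and register_trace_sched_wakeup() from the declaration above. DEMO_DECLARE_TRACE, DEMO_TPPROTO, DEMO_TPARGS and the cut-down struct rq and struct task_struct are hypothetical stand-ins. The real macro keeps an array of probes in a struct tracepoint, and in newer kernels it also passes an extra void *data argument. Because the probe type is built from the prototype list, the compiler complains about a probe whose signature does not match.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel's struct rq and task_struct. */
struct rq { int cpu; };
struct task_struct { int pid; };

/* Stand-ins for TPPROTO/TPARGS: each expands to the list it wraps, so a
 * comma-separated list can travel through another macro as one argument. */
#define DEMO_TPPROTO(...) __VA_ARGS__
#define DEMO_TPARGS(...)  __VA_ARGS__

/* Hypothetical single-probe simplification of DECLARE_TRACE: ## pastes the
 * name into trace_<name>() and register_trace_<name>(), and the probe type
 * is derived from proto. */
#define DEMO_DECLARE_TRACE(name, proto, args)                    \
    static void (*name##_probe)(proto);                          \
    static int register_trace_##name(void (*probe)(proto))       \
    {                                                            \
        name##_probe = probe;                                    \
        return 0;                                                \
    }                                                            \
    static void trace_##name(proto)                              \
    {                                                            \
        if (name##_probe)       /* no probe: nothing happens */  \
            name##_probe(args);                                  \
    }

/* No trailing semicolon: the expansion ends with a function body. */
DEMO_DECLARE_TRACE(sched_wakeup,
                   DEMO_TPPROTO(struct rq *rq, struct task_struct *p),
                   DEMO_TPARGS(rq, p))

/* A probe matching the prototype; it records the last woken pid. */
static int last_woken_pid = -1;
static void my_sched_wakeup_tracer(struct rq *rq, struct task_struct *p)
{
    (void)rq;
    last_woken_pid = p->pid;
}
```

Until register_trace_sched_wakeup(my_sched_wakeup_tracer) is called, trace_sched_wakeup(&rq, &p) only tests a NULL pointer. Afterwards, each call forwards rq and p to the probe.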

When you need debug information from the kernel, you would usually print it with printk, like this:

    void trace_func(void)
    {
        /* ... */
        printk(KERN_INFO "debug message\n");
        /* ... */
    }
Drawbacks:

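printk is controlled globally in the kernel. Every module's printk output is printed, so you cannot print only the modules you care about.
To change or add messages, you must edit every affected printk statement. These statements are scattered throughout the code, and each one has to be modified.
On embedded systems, heavy printk output floods the console (if there is one). The user then cannot type commands at the console, which hurts interactive use.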

2. The kernel's solution

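The kernel captures logs by "instrumentation", and an instrumentation point is also called a trace point. Each trace point has:

a name,
an enable switch,
a set of probe functions (also called stub functions),
a function that registers probes, and
a function that unregisters them.

A probe function works much like printk, except that it does not print to the console. Instead, it writes into the kernel's ring buffer, and the buffer's contents are presented to user space through debugfs.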
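Writing records into a ring buffer instead of the console can be illustrated with a toy, single-threaded buffer. This is a sketch under stated assumptions. ftrace's real ring buffer is per-CPU, lockless and page-based, and trace_ring, rb_init, rb_write and rb_read are hypothetical names. The sketch mimics ftrace's default overwrite mode: once the buffer is full, the newest record replaces the oldest, so tracing never blocks or floods the console. A reader (debugfs, in the kernel) drains whatever is left.

```c
#include <stdio.h>
#include <string.h>

/* Toy, single-threaded, overwrite-mode ring buffer (hypothetical names). */
#define RB_SLOTS   4
#define RB_MSG_LEN 32

struct trace_ring {
    char msg[RB_SLOTS][RB_MSG_LEN];
    unsigned long head;   /* number of records ever written */
    unsigned long tail;   /* sequence number of the oldest unread record */
};

static void rb_init(struct trace_ring *rb)
{
    memset(rb, 0, sizeof(*rb));
}

/* Probe side: never blocks; when full, the oldest record is overwritten. */
static void rb_write(struct trace_ring *rb, const char *text)
{
    if (rb->head - rb->tail == RB_SLOTS)
        rb->tail++;                             /* drop the oldest record */
    snprintf(rb->msg[rb->head % RB_SLOTS], RB_MSG_LEN, "%s", text);
    rb->head++;
}

/* Reader side: copy out the oldest unread record.
 * Returns 1 if a record was copied, 0 if the buffer is empty. */
static int rb_read(struct trace_ring *rb, char *out, size_t len)
{
    if (rb->tail == rb->head)
        return 0;
    snprintf(out, len, "%s", rb->msg[rb->tail % RB_SLOTS]);
    rb->tail++;
    return 1;
}
```

Suppose you write six records, e1 through e6, into the four-slot buffer. The reader then sees e3, e4, e5 and e6, because e1 and e2 were overwritten.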

The logical architecture is as follows:

[Figure: trace point logical architecture]
Next come the kernel data structures involved. Code references:

    Data structure                                          Source file
    DEFINE_TRACE(name), DECLARE_TRACE(name, proto, args)    include/linux/tracepoint.h
    struct tracepoint                                       include/linux/tracepoint-defs.h

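A trace point runs its probe functions one after another, and each probe implements a different debugging function. The kernel adds a probe to a trace point with register_trace_##name and removes it with unregister_trace_##name. Here ## is the C preprocessor's token-pasting operator. For a trace point named sched_wakeup, these become register_trace_sched_wakeup and unregister_trace_sched_wakeup.
The kernel uses DEFINE_TRACE(name) to define the struct tracepoint variable that describes a trace point.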

    struct tracepoint {
        const char *name;               /* Tracepoint name */
        struct static_key key;
        int (*regfunc)(void);
        void (*unregfunc)(void);
        struct tracepoint_func __rcu *funcs;
    };

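@name: the name of the trace point, used to identify it. Older kernels kept all trace points in a hash table and used name to find the right entry within a hash slot. Newer kernels pass a pointer to the struct tracepoint instead, as register_trace_##name below does.
@key: a static key holding the trace point's state. It is nonzero (enabled) while at least one probe is registered and 0 (disabled) otherwise. On architectures with jump labels, static_key_false() compiles to a no-op that is patched into a jump when the key is enabled.
@regfunc: an optional callback, run when the first probe is registered (that is, when the trace point is about to be enabled).
@unregfunc: an optional callback, run after the last probe has been unregistered.
@funcs: an RCU-protected, NULL-terminated array of all the trace point's probe functions. Each struct tracepoint_func entry holds a probe pointer plus its private data pointer.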
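The interaction of these fields during registration can be sketched in user space. The sketch assumes a plain int for key, no RCU and no priorities. The names demo_tracepoint, demo_tp_func, demo_probe_register, demo_probe_unregister and the sample probes are hypothetical.

It follows the kernel's behaviour in three ways:

regfunc runs before the first probe is added,
unregfunc runs after the last probe is removed, and
every change replaces funcs with a freshly built NULL-terminated array (copy-on-write), which is what lets the kernel's readers walk it under RCU without locks.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space mirrors of struct tracepoint_func and struct
 * tracepoint. key is a plain counter instead of a struct static_key; no RCU. */
struct demo_tp_func {
    void (*func)(void);   /* probe, stored as a generic function pointer */
    void *data;           /* private data handed back to the probe */
};

struct demo_tracepoint {
    const char *name;
    int key;                    /* > 0 while at least one probe is registered */
    int (*regfunc)(void);       /* run before the first probe is added */
    void (*unregfunc)(void);    /* run after the last probe is removed */
    struct demo_tp_func *funcs; /* NULL-terminated array, or NULL if empty */
};

/* Number of entries before the NULL terminator. */
static size_t nr_funcs(const struct demo_tracepoint *tp)
{
    size_t n = 0;
    while (tp->funcs && tp->funcs[n].func)
        n++;
    return n;
}

/* Copy-on-write add: build a new array with one more entry plus a terminator. */
static int demo_probe_register(struct demo_tracepoint *tp,
                               void (*func)(void), void *data)
{
    size_t n = nr_funcs(tp);
    struct demo_tp_func *next = calloc(n + 2, sizeof(*next)); /* zeroed tail = terminator */

    if (!next)
        return -1;
    if (tp->key == 0 && tp->regfunc && tp->regfunc() != 0) {
        free(next);
        return -1;
    }
    if (n)
        memcpy(next, tp->funcs, n * sizeof(*next));
    next[n].func = func;
    next[n].data = data;
    free(tp->funcs);   /* the kernel frees the old array only after an RCU grace period */
    tp->funcs = next;
    tp->key++;
    return 0;
}

/* Copy-on-write removal of the first matching (func, data) entry. */
static int demo_probe_unregister(struct demo_tracepoint *tp,
                                 void (*func)(void), void *data)
{
    size_t n = nr_funcs(tp), found, i, j = 0;
    struct demo_tp_func *next = NULL;

    for (found = 0; found < n; found++)
        if (tp->funcs[found].func == func && tp->funcs[found].data == data)
            break;
    if (found == n)
        return -1;                          /* probe was never registered */
    if (n > 1) {
        next = calloc(n, sizeof(*next));    /* n - 1 entries plus terminator */
        if (!next)
            return -1;
        for (i = 0; i < n; i++)
            if (i != found)
                next[j++] = tp->funcs[i];
    }
    free(tp->funcs);
    tp->funcs = next;                       /* NULL once the last probe is gone */
    if (--tp->key == 0 && tp->unregfunc)
        tp->unregfunc();
    return 0;
}

/* Hypothetical hooks and probes used to exercise the sketch. */
static int reg_calls, unreg_calls;
static int count_reg(void) { reg_calls++; return 0; }
static void count_unreg(void) { unreg_calls++; }
static void probe_a(void) { }
static void probe_b(void) { }
```

Registering probe_a and then probe_b calls count_reg once and leaves funcs as {probe_a, probe_b, terminator}. Removing both runs count_unreg once and leaves funcs as NULL.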

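The kernel uses #define DECLARE_TRACE(name, proto, args) to define the functions a trace point needs. Their prototypes are listed below. These are a few excerpts from the code, and there are more than these three: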

    static inline void trace_##name(proto)
    static inline int register_trace_##name(void (*probe)(data_proto), void *data)
    static inline int unregister_trace_##name(void (*probe)(data_proto), void *data)
    #define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
        extern struct tracepoint __tracepoint_##name;                       \
        static inline void trace_##name(proto)                              \
        {                                                                   \
            if (static_key_false(&__tracepoint_##name.key))                 \
                __DO_TRACE(&__tracepoint_##name,                            \
                    TP_PROTO(data_proto),                                   \
                    TP_ARGS(data_args),                                     \
                    TP_CONDITION(cond), 0);                                 \
            if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {                     \
                rcu_read_lock_sched_notrace();                              \
                rcu_dereference_sched(__tracepoint_##name.funcs);           \
                rcu_read_unlock_sched_notrace();                            \
            }                                                               \
        }                                                                   \
        __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),              \
            PARAMS(cond), PARAMS(data_proto), PARAMS(data_args))            \
        static inline int                                                   \
        register_trace_##name(void (*probe)(data_proto), void *data)        \
        {                                                                   \
            return tracepoint_probe_register(&__tracepoint_##name,          \
                                             (void *)probe, data);          \
        }                                                                   \
        static inline int                                                   \
        register_trace_prio_##name(void (*probe)(data_proto), void *data,   \
                                   int prio)                                \
        {                                                                   \
            return tracepoint_probe_register_prio(&__tracepoint_##name,     \
                                          (void *)probe, data, prio);       \
        }                                                                   \
        static inline int                                                   \
        unregister_trace_##name(void (*probe)(data_proto), void *data)      \
        {                                                                   \
            return tracepoint_probe_unregister(&__tracepoint_##name,        \
                                               (void *)probe, data);        \
        }                                                                   \
        static inline void                                                  \
        check_trace_callback_type_##name(void (*cb)(data_proto))            \
        {                                                                   \
        }                                                                   \
        static inline bool                                                  \
        trace_##name##_enabled(void)                                        \
        {                                                                   \
            return static_key_false(&__tracepoint_##name.key);              \
        }

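Line 2 declares the external trace point variable __tracepoint_##name. The "static inline" parts define the common functions that the trace point uses.

Line 5 checks whether the trace point is enabled. If it is, __DO_TRACE walks the trace point's probe functions and calls each one through its function pointer.

The trace point framework is generic: every probe is stored as a void * pointer. So after fetching a probe pointer, each trace point must convert it back to its own function-pointer type. TP_PROTO(data_proto) supplies that type, and the conversion happens on the line marked --> below: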

    #define __DO_TRACE(tp, proto, args, cond, rcuidle)                      \
        do {                                                                \
            struct tracepoint_func *it_func_ptr;                            \
            void *it_func;                                                  \
            void *__data;                                                   \
            /* ... */                                                       \
            it_func_ptr = rcu_dereference_raw((tp)->funcs);                 \
                                                                            \
            if (it_func_ptr) {                                              \
                do {                                                        \
                    it_func = (it_func_ptr)->func;                          \
                    __data = (it_func_ptr)->data;                           \
                    ((void(*)(proto))(it_func))(args);  /* --> */           \
                } while ((++it_func_ptr)->func);                            \
            }                                                               \
            /* ... */                                                       \
        } while (0)
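The heart of __DO_TRACE can be reproduced in plain C: walk the NULL-terminated funcs array, cast each stored pointer back to the probe's real type, and call it. This sketch hard-codes one prototype, (int cpu), with the private-data pointer prepended as DECLARE_TRACE does. The names demo_tp_func, demo_probe_t, demo_do_trace, add_probe and count_probe are hypothetical. The sketch stores probes as a generic function pointer rather than void *, because ISO C only guarantees conversions between function-pointer types; the kernel relies on GCC accepting void * here.

```c
#include <stddef.h>

/* Hypothetical mirror of struct tracepoint_func. The kernel stores the probe
 * as void *; a generic function pointer is used here because ISO C guarantees
 * round-trip conversions only between function-pointer types. */
struct demo_tp_func {
    void (*func)(void);
    void *data;
};

/* Probe type for a tracepoint whose TP_PROTO is (int cpu), with the
 * private-data pointer prepended, as DECLARE_TRACE does via data_proto. */
typedef void (*demo_probe_t)(void *data, int cpu);

/* Core of __DO_TRACE: walk the NULL-terminated array, cast each entry back
 * to the real probe type, and call it with its own data pointer first. */
static void demo_do_trace(const struct demo_tp_func *it_func_ptr, int cpu)
{
    if (!it_func_ptr)
        return;                       /* no probes registered */
    do {
        demo_probe_t it_func = (demo_probe_t)it_func_ptr->func;
        it_func(it_func_ptr->data, cpu);
    } while ((++it_func_ptr)->func);
}

/* Two hypothetical probes; each uses its data pointer differently. */
static void add_probe(void *data, int cpu)
{
    *(int *)data += cpu;              /* data points at a running sum */
}

static void count_probe(void *data, int cpu)
{
    (void)cpu;
    ++*(int *)data;                   /* data points at a call counter */
}
```

Both probes share one array but receive different data pointers. This per-entry data field is what lets one probe function serve several users.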

An example of how a probe's proto is passed down through the macros:

    DEFINE_EVENT_CONDITION(f2fs__submit_page_bio, f2fs_submit_page_write,
        TP_PROTO(struct page *page, struct f2fs_io_info *fio),  /* --> */
        TP_ARGS(page, fio),
        TP_CONDITION(page->mapping)
    );
Line 2 (marked -->) declares the probe function's prototype.

    #define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
        DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
    #define DEFINE_EVENT(template, name, proto, args) \
        DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
    #define DECLARE_TRACE(name, proto, args)                        \
        __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),          \
                        cpu_online(raw_smp_processor_id()),         \
                        PARAMS(void *__data, proto),                \
                        PARAMS(__data, args))
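This brings us to the __DECLARE_TRACE macro described earlier. It passes data_proto (the probe prototype with void *__data prepended) to __DO_TRACE, where the probe pointer is cast to that type.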
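The macro chain above can be imitated in standard C. The imitation shows two things: how PARAMS keeps comma-separated lists intact, and how DECLARE_TRACE prepends the private-data pointer to both the prototype and the argument list. Everything here is a hypothetical stand-in. The DEMO_ macros use __VA_ARGS__ instead of the kernel's GNU "args..." form, and they keep only one probe. The data pointer is named tp_data instead of __data, because identifiers starting with a double underscore are reserved in user-space code.

```c
#include <stddef.h>

/* Standard-C stand-ins for PARAMS, TP_PROTO and TP_ARGS: each expands to
 * the list it wraps, so the list survives as ONE macro argument. */
#define DEMO_PARAMS(...)   __VA_ARGS__
#define DEMO_TP_PROTO(...) __VA_ARGS__
#define DEMO_TP_ARGS(...)  __VA_ARGS__

/* Innermost layer (cf. __DECLARE_TRACE): hypothetical, one probe, no RCU.
 * The probe type comes from data_proto; trace_<name>() calls it with data_args. */
#define DEMO_DECLARE_TRACE_INNER(name, proto, args, data_proto, data_args) \
    static void (*name##_probe)(data_proto);                                \
    static void *name##_data;                                               \
    static int register_trace_##name(void (*probe)(data_proto), void *data) \
    {                                                                       \
        name##_probe = probe;                                               \
        name##_data = data;                                                 \
        return 0;                                                           \
    }                                                                       \
    static void trace_##name(proto)                                         \
    {                                                                       \
        void *tp_data = name##_data;   /* plays the role of __data */       \
        if (name##_probe)                                                   \
            name##_probe(data_args);                                        \
    }

/* Middle layer (cf. DECLARE_TRACE): prepend the data pointer to both lists. */
#define DEMO_DECLARE_TRACE(name, proto, args)                              \
    DEMO_DECLARE_TRACE_INNER(name, DEMO_PARAMS(proto), DEMO_PARAMS(args),  \
                             DEMO_PARAMS(void *tp_data, proto),            \
                             DEMO_PARAMS(tp_data, args))

/* Outer layer (cf. DEFINE_EVENT): the template name is simply dropped. */
#define DEMO_DEFINE_EVENT(template, name, proto, args) \
    DEMO_DECLARE_TRACE(name, DEMO_PARAMS(proto), DEMO_PARAMS(args))

/* A hypothetical event declared in the same shape as the f2fs example. */
DEMO_DEFINE_EVENT(demo_template, demo_io,
                  DEMO_TP_PROTO(int sector, int len),
                  DEMO_TP_ARGS(sector, len))

/* The probe takes the data pointer first, then the TP_PROTO arguments. */
static int io_calls;
static long io_bytes;
static void io_probe(void *tp_data, int sector, int len)
{
    (void)sector;
    ++*(int *)tp_data;   /* private data: here, a call counter */
    io_bytes += len;
}
```

The probe therefore has to be written as void io_probe(void *tp_data, int sector, int len). The same prepending is why probes registered with register_trace_##name in current kernels take void *data as their first parameter.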

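As the code above shows, the trace point mechanism is simple. The debug function pointers are collected in a struct tracepoint variable and then called one by one. The kernel uses fairly complex macros only so that each module does not have to write this code again.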

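Note also that using a trace point requires adding a probe (the debug function we want) with register_trace_##name. This can only be done from a kernel module or by modifying kernel code, which is cumbersome for developers. The ftrace developers recognized this and provided trace events, which spare developers from registering probes themselves and are much easier to use. A later article will cover how trace events are implemented and how to use them.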
