2009-08-17 09:43:07

GNU C Extensions Used in the Linux Kernel


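GNU CC is a very powerful cross-platform C compiler. It adds many extensions to the C language, and these extensions give strong support for optimization, object-code layout, and stricter checking. This article refers to C with the GNU extensions as GNU C.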

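The Linux kernel uses GNU C extensions so heavily that, at the time of writing, GNU CC was the only compiler able to build it; at times the kernel even required a specific version of GNU CC. This article summarizes the GNU C extensions used in the kernel. When you run into syntax or semantics you do not understand while reading kernel source, it should give you a first answer; for more detail, consult the GCC manual. The examples are taken from Linux 2.4.18.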

Statement Expressions
=====================

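GNU C treats a compound statement enclosed in parentheses as an expression, called a statement expression. It may appear anywhere an expression is allowed, which lets you use loops, local variables, and other constructs that are otherwise available only inside a compound statement. For example: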

++++ include/linux/kernel.h
    #define min_t(type,x,y) \
            ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; })
++++ net/ipv4/tcp_output.c
    int full_space = min_t(int, tp->window_clamp, tcp_full_space(sk));

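The last statement in the compound statement should be an expression; its value becomes the value of the whole statement expression. The example above defines a safe minimum macro. In standard C the macro is usually written as: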

#define min(x,y) ((x) < (y) ? (x) : (y))

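This definition evaluates one of its arguments twice (once in the comparison and once more as the result), so arguments with side effects produce wrong results. The statement-expression version evaluates each argument exactly once and avoids the problem. Statement expressions are most often used in macro definitions.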
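The double evaluation can be made visible with a small experiment. The sketch below is only an illustration: MIN_STD, MIN_T, next_value, evals_std, and evals_gnu are hypothetical names, not kernel code, and it needs a compiler that accepts GNU C statement expressions.

```c
/* Standard C version: the selected argument is evaluated a second time. */
#define MIN_STD(x, y) ((x) < (y) ? (x) : (y))

/* GNU C statement-expression version: each argument is evaluated once. */
#define MIN_T(type, x, y) \
    ({ type __x = (x); type __y = (y); __x < __y ? __x : __y; })

static int calls;                 /* counts evaluations of next_value() */

/* An argument with a side effect: every call bumps the counter. */
static int next_value(void)
{
    return ++calls;
}

/* How many times does the standard macro evaluate its first argument? */
static int evals_std(void)
{
    calls = 0;
    (void)MIN_STD(next_value(), 100);
    return calls;
}

/* Same question for the statement-expression macro. */
static int evals_gnu(void)
{
    calls = 0;
    (void)MIN_T(int, next_value(), 100);
    return calls;
}
```

With these definitions evals_std() reports two evaluations and evals_gnu() reports one, which is exactly the difference the text describes.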

Typeof
======

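The macro defined in the previous section has to be told the type of its arguments. With typeof you can write a more general macro that does not need to know the type in advance. For example: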

++++ include/linux/kernel.h
141: #define min(x,y) ({ \
142:         const typeof(x) _x = (x); \
143:         const typeof(y) _y = (y); \
144:         (void) (&_x == &_y); \
145:         _x < _y ? _x : _y; })

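Here typeof(x) denotes the type of x. Line 142 defines a local variable _x with the same type as x and initializes it to x. Line 144 exists only to check that x and y have the same type: comparing pointers to different types makes the compiler issue a warning. typeof can be used anywhere a type name is allowed and is most often used in macro definitions.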
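As a further illustration of the same idea, here is a minimal sketch of two type-generic macros. SWAP, MAX, max_double, and swapped_first are hypothetical names chosen for this example; the spelling __typeof__ is used because GCC accepts it even in strict ISO modes where plain typeof may not be a keyword.

```c
/* Swap two lvalues of the same type without naming the type. */
#define SWAP(a, b) \
    do { __typeof__(a) _tmp = (a); (a) = (b); (b) = _tmp; } while (0)

/*
 * Type-generic maximum. Each argument is evaluated once, and the pointer
 * comparison triggers a warning when the two argument types differ.
 */
#define MAX(x, y) ({ \
    __typeof__(x) _x = (x); \
    __typeof__(y) _y = (y); \
    (void) (&_x == &_y); \
    _x > _y ? _x : _y; })

/* The same macro works for double ... */
static double max_double(double a, double b)
{
    return MAX(a, b);
}

/* ... and SWAP works for long, with no type mentioned at the call site. */
static long swapped_first(long a, long b)
{
    SWAP(a, b);
    return a;          /* now holds the original value of b */
}
```

Neither call site has to name a type, so the same macros serve int, long, double, or pointer arguments.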

Zero-Length Arrays
==================

GNU C allows zero-length arrays, which are very useful when defining the header structure of a variable-length object. For example:

++++ include/linux/minix_fs.h
    struct minix_dir_entry {
            __u16 inode;
            char name[0];
    };

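The last member of the structure is declared as a zero-length array and takes up no space in the structure. In standard C (before C99 added flexible array members) you would have to declare the array with length 1, which makes computing the object size at allocation time more awkward.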
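The allocation pattern this enables can be sketched as follows. struct dir_entry and dir_entry_new are hypothetical names loosely modelled on minix_dir_entry; this is user-space code that uses malloc rather than a kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record with a zero-length trailing array. */
struct dir_entry {
    unsigned short inode;
    char name[0];          /* GNU C: contributes nothing to sizeof */
};

/*
 * Allocate the header and the name in one block.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct dir_entry *dir_entry_new(unsigned short inode, const char *name)
{
    size_t len = strlen(name) + 1;                      /* include the NUL */
    struct dir_entry *de = malloc(sizeof(*de) + len);   /* header + payload */

    if (!de)
        return NULL;
    de->inode = inode;
    memcpy(de->name, name, len);
    return de;
}
```

Because name adds nothing to sizeof(struct dir_entry), the allocation size is simply the header size plus the payload length, with no "minus one" correction.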

Variadic Macros
===============

In GNU C a macro can accept a variable number of arguments, just like a function. For example:

++++ include/linux/kernel.h
    #define pr_debug(fmt,arg...) \
            printk(KERN_DEBUG fmt,##arg)

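Here arg stands for the remaining arguments, of which there may be zero or more. Those arguments, together with the commas between them, form the value of arg, which is substituted for arg when the macro is expanded. For example,

    pr_debug("%s:%d", filename, line)

expands to

    printk("<7>" "%s:%d", filename, line)

The ## is there to handle the case where arg matches no arguments and its value is therefore empty. In that special case the GNU C preprocessor drops the comma in front of ##, so that

    pr_debug("success!\n")

expands to

    printk("<7>" "success!\n")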
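The same mechanism can be tried outside the kernel. In the sketch below, LOG_TO, log_plain, and log_args are hypothetical names, and snprintf stands in for printk so that the result can be inspected; the named variadic parameter (args...) and the ##args comma handling are GNU C extensions.

```c
#include <stdio.h>

/* Hypothetical logging macro using a GNU C named variadic parameter. */
#define LOG_TO(buf, size, fmt, args...) \
    snprintf(buf, size, "<7>" fmt, ##args)

/* No variadic arguments: ## removes the comma before the empty 'args'. */
static int log_plain(char *buf, size_t size)
{
    return LOG_TO(buf, size, "success!");
}

/* Two variadic arguments: they replace 'args' together with their comma. */
static int log_args(char *buf, size_t size, const char *file, int line)
{
    return LOG_TO(buf, size, "%s:%d", file, line);
}
```

Without the ##, log_plain would expand to a call ending in a stray comma and fail to compile.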


Labeled Elements
================

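Standard C requires the initial values of an array or structure variable to appear in a fixed order. GNU C lets them appear in any order by naming the array index or structure field. To specify an array index, write '[INDEX] =' before the initial value; to specify a range, use the form '[FIRST ... LAST] ='. For example: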

++++ arch/i386/kernel/irq.c
    static unsigned long irq_affinity [NR_IRQS] = { [0 ... NR_IRQS-1] = ~0UL };

This initializes every element of the array to ~0UL and can be seen as a kind of shorthand.

To specify a structure member, write 'FIELDNAME:' before the member's value. For example:

++++ fs/ext2/file.c
    struct file_operations ext2_file_operations = {
            llseek:         generic_file_llseek,
            read:           generic_file_read,
            write:          generic_file_write,
            ioctl:          ext2_ioctl,
            mmap:           generic_file_mmap,
            open:           generic_file_open,
            release:        ext2_release_file,
            fsync:          ext2_sync_file,
    };

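This initializes the llseek member of ext2_file_operations to generic_file_llseek, the read member to generic_file_read, and so on. I think this is one of the best features among the GNU C extensions: when the structure definition changes and member offsets shift, this style of initialization still initializes the named members correctly. Members that do not appear in the initializer are set to 0.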
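Both forms can be tried in a small self-contained sketch. The names NR_SLOTS, struct ops, demo_open, demo_read, slot_mask, and demo_ops are hypothetical. The structure initializer uses the '.FIELDNAME =' spelling that C99 standardized; the GCC manual describes the 'FIELDNAME:' spelling shown above as obsolete, while the '[FIRST ... LAST] =' range form remains a GNU extension.

```c
#define NR_SLOTS 8

/* Hypothetical operations table, in the spirit of file_operations. */
struct ops {
    int (*open)(void);
    int (*read)(void);
    int (*close)(void);
};

static int demo_open(void) { return 1; }
static int demo_read(void) { return 2; }

/* GNU C range initializer: every element becomes ~0UL. */
static unsigned long slot_mask[NR_SLOTS] = { [0 ... NR_SLOTS - 1] = ~0UL };

/*
 * Members are named, so their order here need not match the declaration;
 * 'close' is not mentioned and is therefore initialized to a null pointer.
 */
static struct ops demo_ops = {
    .read = demo_read,
    .open = demo_open,
};
```

Reordering or adding members in struct ops would not require touching the initializer, which is the robustness the text praises.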

Case Ranges
===========

GNU C allows a single case label to specify a consecutive range of values. For example:

++++ arch/i386/kernel/irq.c
    case '0' ... '9': c -= '0'; break;
    case 'a' ... 'f': c -= 'a'-10; break;
    case 'A' ... 'F': c -= 'A'-10; break;

    case '0' ... '9':

is equivalent to

    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
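Wrapped in a complete function, the hex-digit conversion above might look like the following sketch. hex_value is a hypothetical helper, not the kernel's code, and it returns -1 for characters that are not hex digits.

```c
/* Convert one hex digit to its value, or return -1 if c is not a hex digit. */
static int hex_value(int c)
{
    switch (c) {
    case '0' ... '9': return c - '0';        /* GNU C case range */
    case 'a' ... 'f': return c - 'a' + 10;
    case 'A' ... 'F': return c - 'A' + 10;
    default:          return -1;
    }
}
```

Note the spaces around the "..."; they keep the range operator from being misread when the bounds are integer literals.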

Special Attributes for Declarations
===================================

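GNU C lets you declare special attributes for functions, variables, and types, which enables hand optimization and more careful checking of the code. To specify an attribute for a declaration, write the following after the declaration: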

__attribute__ (( ATTRIBUTE ))

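where ATTRIBUTE is the attribute specification; multiple attributes are separated by commas. GNU C supports more than a dozen attributes; the most commonly used are described here: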

* noreturn

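The noreturn attribute applies to functions and states that the function never returns. This lets the compiler generate slightly better code and, more importantly, suppresses unnecessary warnings such as those about uninitialized variables. For example: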

++++ include/linux/kernel.h
    # define ATTRIB_NORET  __attribute__((noreturn))
    ....
    asmlinkage NORET_TYPE void do_exit(long error_code) ATTRIB_NORET;


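* format (ARCHETYPE, STRING-INDEX, FIRST-TO-CHECK)

The format attribute applies to functions and states that the function takes printf, scanf, or strftime style arguments. The most common mistake when using such functions is a mismatch between the format string and the arguments; specifying the format attribute lets the compiler check the argument types against the format string. For example: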

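++++ include/linux/kernel.h?
    asmlinkage int printk(const char * fmt, ...)
            __attribute__ ((format (printf, 1, 2)));

This says that the first argument is the format string and that the arguments are checked against it starting from the second one.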


* unused

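The unused attribute applies to functions and variables and states that the function or variable may be unused; it stops the compiler from issuing a warning about that.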

* section ("section-name")

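The section attribute applies to functions and variables. Normally the compiler places functions in the .text section and variables in the .data or .bss section; with the section attribute you can have the compiler place a function or variable in a named section. For example: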

++++ include/linux/init.h
    #define __init          __attribute__ ((__section__ (".text.init")))
    #define __exit          __attribute__ ((unused, __section__(".text.exit")))
    #define __initdata      __attribute__ ((__section__ (".data.init")))
    #define __exitdata      __attribute__ ((unused, __section__ (".data.exit")))
    #define __initsetup     __attribute__ ((unused,__section__ (".setup.init")))
    #define __init_call     __attribute__ ((unused,__section__ (".initcall.init")))
    #define __exit_call     __attribute__ ((unused,__section__ (".exitcall.exit")))

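The linker can place code or data from the same section together. The Linux kernel makes heavy use of this technique: for example, the system initialization code is placed in a separate section so that its memory can be freed once initialization is finished.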

* aligned (ALIGNMENT)

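The aligned attribute applies to variables and to structure or union types. It specifies the alignment, in bytes, of a variable, structure field, structure, or union. For example: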

++++ include/asm-i386/processor.h
    struct i387_fxsave_struct {
            unsigned short  cwd;
            unsigned short  swd;
            unsigned short  twd;
            unsigned short  fop;
            long    fip;
            long    fcs;
            long    foo;
    ......
    } __attribute__ ((aligned (16)));

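This makes variables of the structure type 16-byte aligned. The compiler normally chooses a suitable alignment; explicit alignment is usually specified because of architectural constraints, optimization, and similar reasons.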

* packed

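The packed attribute applies to variables and types. Applied to a variable or structure field, it requests the smallest possible alignment; applied to an enumeration, structure, or union type, it requests that the type use the least possible memory. For example: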

++++ include/asm-i386/desc.h
    struct Xgt_desc_struct {
            unsigned short size;
            unsigned long address __attribute__((packed));
    };

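The address field is allocated immediately after size. packed is mostly used to define hardware-related structures so that no holes caused by alignment appear between members.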
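Several of the attributes described in this section can be observed from user space. The sketch below is an illustration only: log_msg, die, struct gdt_ptr, and struct fx_area are hypothetical names, and the section attribute is left out because valid section names depend on the object-file format.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* format: arguments from position 4 on are checked against the string at 3. */
static int log_msg(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int log_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* noreturn: control never comes back; unused: no warning if never called. */
static void die(int code) __attribute__((noreturn, unused));

static void die(int code)
{
    exit(code);
}

/* packed member: 'address' directly follows 'size', with no padding. */
struct gdt_ptr {
    unsigned short size;
    unsigned int address __attribute__((packed));
};

/* aligned type: objects start on a 16-byte boundary, size is rounded up. */
struct fx_area {
    unsigned char bytes[20];
} __attribute__((aligned(16)));
```

With these definitions, offsetof(struct gdt_ptr, address) is 2 rather than 4, and sizeof(struct fx_area) is rounded up from 20 to 32. A call such as log_msg(buf, sizeof buf, "%d", "text") would draw a format warning from the compiler.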

Current Function Name
=====================

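GNU CC predefines two identifiers that hold the name of the current function: __FUNCTION__ holds the name as it appears in the source, and __PRETTY_FUNCTION__ holds a language-specific form of the name. In C functions the two are identical; in C++ functions __PRETTY_FUNCTION__ includes extra information such as the return type. The Linux kernel uses only __FUNCTION__.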

++++ fs/ext2/super.c
    void ext2_update_dynamic_rev(struct super_block *sb)
    {
            struct ext2_super_block *es = EXT2_SB(sb)->s_es;

            if (le32_to_cpu(es->s_rev_level) > EXT2_GOOD_OLD_REV)
                    return;

            ext2_warning(sb, __FUNCTION__,
                         "updating to rev %d because of new feature flag, "
                         "running e2fsck is recommended",
                         EXT2_DYNAMIC_REV);

Here __FUNCTION__ is replaced by the string "ext2_update_dynamic_rev". Although __FUNCTION__ looks similar to __FILE__ in standard C, it is actually substituted by the compiler, unlike __FILE__, which is substituted by the preprocessor.
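A minimal sketch of the identifier in use follows. who_am_i and who_am_i_c99 are hypothetical helpers; the second one uses __func__, the equivalent identifier that C99 standardized.

```c
/* Returns the name of the enclosing function, using the GNU C spelling. */
static const char *who_am_i(void)
{
    return __FUNCTION__;
}

/* Same idea with the standard C99 identifier. */
static const char *who_am_i_c99(void)
{
    return __func__;
}
```

Both identifiers name storage with static duration, so returning a pointer to them from the function is safe.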

Built-in Functions
==================

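GNU C provides a large number of built-in functions. Many of them are built-in versions of standard C library functions, such as memcpy, and behave the same as the corresponding library functions; this article does not discuss those. The names of the other built-in functions usually begin with __builtin.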

* __builtin_return_address (LEVEL)

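The built-in function __builtin_return_address returns the return address of the current function or of one of its callers. The LEVEL argument gives the number of frames to walk up the stack: 0 yields the return address of the current function, 1 yields the return address of the caller of the current function, and so on. For example: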

++++ kernel/sched.c
    printk(KERN_ERR "schedule_timeout: wrong timeout "
           "value %lx from %p\n", timeout,
           __builtin_return_address(0));

* __builtin_constant_p(EXP)

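The built-in function __builtin_constant_p determines whether a value is a compile-time constant. It returns 1 if the value of the argument EXP is a constant and 0 otherwise. For example: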

++++ include/asm-i386/bitops.h
    #define test_bit(nr,addr) \
            (__builtin_constant_p(nr) ? \
             constant_test_bit((nr),(addr)) : \
             variable_test_bit((nr),(addr)))

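Many computations or operations have a more efficient implementation when an argument is constant. In GNU C the technique above compiles only the constant version or only the non-constant version, depending on whether the argument is a constant. This keeps the interface general while still producing the best code when the argument is constant.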

* __builtin_expect(EXP, C)

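The built-in function __builtin_expect gives the compiler branch-prediction information. Its return value is the value of the integer expression EXP, and C must be a compile-time constant. For example: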

++++ include/linux/compiler.h
13: #define likely(x)       __builtin_expect((x),1)
14: #define unlikely(x)     __builtin_expect((x),0)
++++ kernel/sched.c
564:         if (unlikely(in_interrupt())) {
565:                 printk("Scheduling in interrupt\n");
566:                 BUG();
567:         }

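The semantics of this built-in are that the expected value of EXP is C. The compiler can use this information to reorder blocks of statements so that the program runs more efficiently in the expected case. The example above says that being in interrupt context is rare, so the object code for lines 565-566 may be placed farther away to keep the frequently executed code compact.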
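The three built-ins can be exercised together in a short user-space sketch. checked_div, literal_is_constant, and my_return_address are hypothetical helpers. The !!(x) in the hint macros is an addition of this sketch that normalizes any non-zero value to 1 before it is compared with the expected constant; the 2.4.18 definitions quoted above pass (x) through unchanged.

```c
/* Branch hints in the style of the kernel's likely()/unlikely(). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Division with a rarely taken error path; returns 0 when b is 0. */
static int checked_div(int a, int b)
{
    if (unlikely(b == 0))
        return 0;              /* hinted as the cold path */
    return a / b;
}

/* A literal is a compile-time constant, so this returns 1. */
static int literal_is_constant(void)
{
    return __builtin_constant_p(42);
}

/* noinline keeps a real call frame, so there is a return address to read. */
static void *my_return_address(void) __attribute__((noinline));

static void *my_return_address(void)
{
    return __builtin_return_address(0);
}
```

The hints never change what checked_div computes; they only influence how the compiler lays out the two branches.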

Here are a few additional GCC-related notes picked up from CLF (about tools outside GCC itself) that help in understanding some kernel code :P

Q: What is __attribute__((nocast))?

What does __attribute__((nocast)) mean? I checked the gcc manual and found no explanation!

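A (by tclwp): This is not something gcc does. When building the kernel, after configuration, running make C=1 or make C=2 invokes the external program sparse (the CHECKER) to do the checking (there are several choices of checker; you install it yourself).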

Here is how Linus Torvalds explained it in a talk:

......................... Basically you can add attributes to any kind of data type. In Sparse one of the attributes on data types is which address space it belongs to. This is a define that goes away if you don't use Sparse.

So GCC, who doesn't know anything about address spaces, will never see any other code. So GCC treats the exact same code as it always used to do; but when you run it with a Sparse checker, it will notice that when you do a copy_to_user call, the first argument has to be a user pointer. Well, address_space (1), the tool itself doesn't really care about user or kernel; you can use it for anything. If it gets anything that isn't a user pointer, for example, if you switch the arguments around by mistake, which has happened, it will complain with a big fat warning saying "Hey, the address spaces don't match." ..................


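A (by tclwp): __context__(1) adds a "+1" context marker. It is used for static checking that code contexts match (the check is done by the CHECKER; see "#ifdef __CHECKER__" in the code), and it can also be used to check that spinlock lock and unlock calls are paired. Here is Linus Torvalds's own explanation:

Make context count warning be controllable with "-Wcontext" flag.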


Sparse "context" checking..
From: Linus Torvalds
Date: Sat Oct 30 2004 - 22:23:19 EST



I just committed the patches to the kernel to start supporting a new automated correctness check that I added to sparse: the counting of static "code context" information.

The sparse infrastructure is pretty agnostic, and you can count pretty much anything you want, but it's designed to test that the entry and exit contexts match, and that no path through a function is ever entered with conflicting contexts.

In particular, this is designed for doing things like matching up a "lock" with the pairing "unlock", and right now that's exactly what the code does: it makes each spinlock count as "+1" in the context, and each spinunlock count as "-1", and then hopefully it should all add up.

It doesn't always, of course. Since it's a purely static analyser, it's unhappy about code like

    int fn(arg)
    {
            if (arg)
                    spin_lock(lock);
            ...
            if (arg)
                    spin_unlock(lock);
    }

because the code is not statically deterministic, and the stuff in between can be called with or without a lock held. That said, this has long been frowned upon, and there aren't that many cases where it happens.

Right now the counting is only enabled if you use sparse, and add the "-Wcontext" flag to the sparse command line by hand - and the spinlocks have only been annotated for the SMP case, so right now it only works for CONFIG_SMP. Details, details.

Also, since sparse does purely local decisions, if you actually _intend_ to grab a lock in one function and release it in another, you need to tell sparse so, by annotating the function that acquires the lock (with "__acquires(lockname)") and the function that releases it (with, surprise surprise, "__releases(lockname)") in the declaration. That tells sparse to update the context in the callers appropriately, but it also tells sparse to expect the proper entry/exit contexts for the annotated functions themselves.

I haven't done the annotation for any functions yet, so expect warnings. If you do a checking run, the warnings will look something like:

    CHECK   kernel/resource.c
    kernel/resource.c:59:13: warning: context imbalance in 'r_start' - wrong count at exit
    kernel/resource.c:69:13: warning: context imbalance in 'r_stop' - unexpected unlock

which just shows that "r_start" acquired a lock, and sparse didn't expect it to, while "r_stop" released a lock that sparse hadn't realized it had. In this case, the cause is pretty obvious, and the annotations are equally so.

A more complicated case is

    CHECK   kernel/sys.c
    kernel/sys.c:465:2: warning: context imbalance in 'sys_reboot' - different lock contexts for basic block

where that "different lock contexts" warning means that sparse determined that some code in that function was reachable with two different lock contexts. In this case it's actually harmless, since what happens in this case is that the code after rebooting the machine is unreachable, and sparse just doesn't understand that.

But in other cases it's more fundamental, and the lock imbalance is due to dynamic data that sparse just can't understand. The warning in that case can be disabled by hand, but there doesn't seem to be that many of them. A full kernel build for me has about 200 warnings, and most of them seem to be the benign kind (ie the kind above where one function acquires the lock and another releases it, and they just haven't been annotated as such).

The sparse thing could be extended to _any_ context that wants pairing, and I just wanted to let people know about this in case they find it interesting..

Linus

--------------------------------------------------------------------------------

    Linus> In particular, this is designed for doing things like
    Linus> matching up a "lock" with the pairing "unlock", and right
    Linus> now that's exactly what the code does: it makes each
    Linus> spinlock count as "+1" in the context, and each spinunlock
    Linus> count as "-1", and then hopefully it should all add up.

Do you have a plan for how to handle functions like spin_trylock()? I notice in the current tree you just didn't annotate spin_trylock().

Thanks,
Roland

--------------------------------------------------------------------------------

On Sat, 30 Oct 2004, Roland Dreier wrote:
>
> Linus> In particular, this is designed for doing things like
> Linus> matching up a "lock" with the pairing "unlock", and right
> Linus> now that's exactly what the code does: it makes each
> Linus> spinlock count as "+1" in the context, and each spinunlock
> Linus> count as "-1", and then hopefully it should all add up.
>
> Do you have a plan for how to handle functions like spin_trylock()? I
> notice in the current tree you just didn't annotate spin_trylock().

Actually, the _current_ tree does actually annotate spin_trylock() (as of just before I sent out the email). It looks like

#define spin_trylock(lock) __cond_lock(_spin_trylock(lock))

where __cond_lock() for sparse is

include/linux/compiler.h:# define __cond_lock(x) ((x) ? ({ __context__(1); 1; }) : 0)

ie we add a "+1" context marker for the success case.

NOTE! This works with sparse only because sparse does immediate constant folding, so if you do

    if (spin_trylock(lock)) {
            ..
            spin_unlock(lock);
    }

sparse linearizes that the right way unconditionally, and even though there is a data-dependency, the data dependency is constant. However, if some code does

    success = spin_trylock(lock);
    if (success) {
            ..
            spin_unlock(lock);
    }

sparse would complain about it, because sparse doesn't do any _real_ data flow analysis.

So sparse can follow all the obvious cases, including trylock and "atomic_dec_and_lock()".
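To see why such annotations cost nothing in a normal build, here is a minimal sketch of a checker-only macro in the style of __cond_lock. Everything named toy_* and bump_if_free is hypothetical and is not the kernel's spinlock code. The assumption is that __CHECKER__ is defined only when sparse processes the file, so an ordinary compiler never sees __context__. Only the trylock side is annotated here; a real annotation scheme would also mark the unlock side.

```c
/* Checker-only annotation: it expands to the bare expression for the compiler. */
#ifdef __CHECKER__
# define __cond_lock(x) ((x) ? ({ __context__(1); 1; }) : 0)
#else
# define __cond_lock(x) (x)
#endif

/* Hypothetical toy lock, standing in for a spinlock. */
struct toy_lock {
    int held;
};

/* Returns 1 and takes the lock if it was free, 0 otherwise. */
static int toy_trylock_raw(struct toy_lock *l)
{
    if (l->held)
        return 0;
    l->held = 1;
    return 1;
}

static void toy_unlock(struct toy_lock *l)
{
    l->held = 0;
}

/* Callers use the annotated wrapper, as with spin_trylock() in the mail. */
#define toy_trylock(l) __cond_lock(toy_trylock_raw(l))

/* The shape sparse can follow: the trylock result is tested directly. */
static int bump_if_free(struct toy_lock *l, int *counter)
{
    if (toy_trylock(l)) {
        ++*counter;
        toy_unlock(l);
        return 1;
    }
    return 0;
}
```

Storing the result of toy_trylock() in a variable and testing it later would compile identically, but it is the pattern that, according to the mail above, sparse complains about.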


