http://www.linuxsmiths.com/blog/?p=253
************************************************
gcc option: -mhard-float
**********
#include
kernel_fpu_begin();
....................
kernel_fpu_end();
**********
************************************************
Linux Kernel and Floating Point
Posted on April 25, 2010 by admin
Consider the following kernel module code snippet that does a floating point divide. (The complete module code is here.)

    static noinline double dummy_float_divide(double arg1, double arg2)
    {
            return (arg1 / arg2);
    }
When we compile the given module, we get:

      CC [M]  /home/lsmiths/linux_kernel_fp_support/fp.o
      Building modules, stage 2.
      MODPOST 1 modules
    WARNING: "__divdf3" [/home/lsmiths/linux_kernel_fp_support/fp.ko] undefined!
      CC      /home/lsmiths/linux_kernel_fp_support/fp.mod.o
      LD [M]  /home/lsmiths/linux_kernel_fp_support/fp.ko

An attempt to load this module fails with the following error in dmesg:

    fp: Unknown symbol __divdf3

__divdf3 sounds like a function for dividing floating point numbers, but how did it end up in our module? We never used it!
Disassembling the module object (objdump -d fp.o) yields:

    00000000 :
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 10                sub    $0x10,%esp
       6:   8b 45 10                mov    0x10(%ebp),%eax
       9:   8b 55 14                mov    0x14(%ebp),%edx
       c:   89 44 24 08             mov    %eax,0x8(%esp)
      10:   8b 45 08                mov    0x8(%ebp),%eax
      13:   89 54 24 0c             mov    %edx,0xc(%esp)
      17:   8b 55 0c                mov    0xc(%ebp),%edx
      1a:   89 04 24                mov    %eax,(%esp)
      1d:   89 54 24 04             mov    %edx,0x4(%esp)
      21:   e8 fc ff ff ff          call   22      <--- This may be the __divdf3 call
      26:   c9                      leave
      27:   c3                      ret

Let's confirm it by looking at the relocation table. The output of readelf -r fp.o contains the following:

    Relocation section '.rel.text' at offset 0x3ee0 contains 1 entries:
     Offset     Info    Type            Sym.Value  Sym. Name
    00000022  00001902 R_386_PC32        00000000   __divdf3   <-- The call above references this function

The call target shows up as 22 only because the displacement field still holds the placeholder -4 (fc ff ff ff). The R_386_PC32 relocation at offset 0x22, i.e. the 4-byte operand of the call instruction at 0x21, is what fills in the address of __divdf3 when the module is linked and loaded.
So what is happening is that gcc replaced the expression (arg1/arg2) by a call to the __divdf3 function, which is supposed to carry out the floating point division using integer arithmetic. Why did gcc do that, instead of generating actual assembly instructions to do the floating point divide?
This is because the module code was compiled with the -msoft-float gcc option, which instructs gcc not to generate floating point assembly instructions but to generate calls to the glibc software floating point emulation functions instead. -msoft-float is useful when compiling programs for platforms that do not have hardware floating point support. Nothing is wrong with this. In fact, if you compile an equivalent user program with -msoft-float, it should work (please read the note below).
P.S. Actually, it depends on whether your glibc is compiled with software floating point emulation support. The default glibc distributions for x86 usually come without soft floating point emulation, as almost all x86 platforms have hardware floating point support. If hardware floating point support is present, it is preferred because of its speed and because it puts less load on the CPU (for applications with extensive floating point usage, e.g. some gaming or CAD applications).
You can use the following command to see whether your glibc distribution has software floating point emulation support:

    # ldd /bin/ls | grep libc | awk '{print $3}' | xargs readlink -f | xargs nm -D | grep __divdf3
I believe the default glibc does not come with soft float support enabled in order to prevent applications from accidentally using soft float. Otherwise, if some application were unintentionally compiled with -msoft-float, the user would never know, and the application would be using the inefficient soft float even though hardware float support is available :-(
So far, we know the following things:
1. The Linux kernel (and all its modules) are compiled with the -msoft-float gcc option (to find out why, read on).
2. The Linux kernel (and all its modules) are _not_ linked with glibc, so we do not have access to the soft floating point emulation functions (like __divdf3).
3. The Linux kernel itself does not provide its own implementation of __divdf3 (or of the other soft floating point functions).
The above explains why we get the error while compiling and loading the module, but the inquisitive among us will still have a few questions. Let's try to find answers to those questions.
What is floating point and how is it handled?
Before we get into the main topic of the discussion, i.e. the state of floating point support in Linux and the reasons behind it, let's take a quick look at what it takes to support floating point operations.
Floating point usage is not very common. It is so uncommon that the x86 designers did not make the floating point unit (the CPU real estate needed for floating point operations) part of the original CPU. In fact, floating point instructions were supported by a special coprocessor. For the 8086 this was called the 8087, and for the other 80x86 processors the corresponding floating point coprocessor was called the 80x87. Up to the 80386, this coprocessor came as a separate chip that sat alongside the main CPU. All floating point calculations were directed to it, and it used its floating point unit (FPU) to do the calculations and passed the result back to the main processor. Starting with the 80486, the FPU was integrated with the main CPU, but it was still a logically separate unit: it used a separate register set to load and store floating point values, and a different ALU to carry out floating point calculations.
The reason for keeping the FPU separate is twofold:
1. floating point operations are very rarely used, and
2. floating point operations are expensive.
This design has a very important impact on how floating point is handled in present-day operating systems. Had floating point support been native to the processor, just like integer support, it would not be treated any differently, and we would use floating point operations just like we use integer operations. This blog would not exist!
In this article, wherever necessary, we will take the x87 FPU as an example, but all of this should apply to any other processor and its corresponding FPU.
Other ways of handling floating point
What we just discussed above is called hardware floating point support, because the floating point operations are handled in hardware. Since the FPU is separate from the processor, a given system might not have an FPU at all. Note that this does not apply to modern x86 based systems, where the FPU comes on the same die as the main processor: if you buy the processor, you get the FPU too. Other architectures, especially those used for embedded system design, might still make the FPU an add-on for cost reasons. In such cases, where no FPU is present in the system but we still need to do a few floating point calculations, we have the following options.
1. Do not use floating point
Instead, use fixed point arithmetic built on integer operations. This works if our floating point usage is limited and we do not need very high precision. The downside is that every application does it in its own way, which leads to lots of inconsistencies and possible errors.
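A minimal sketch of option 1, assuming a hypothetical Q16.16 format (16 integer bits and 16 fraction bits stored in an int32_t). The type and helper names are ours and do not come from any kernel or library API. The sketch shows that multiply and divide need a 64-bit intermediate, and that precision is limited to 1/65536:

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: the value v is stored as v * 65536
 * in a plain 32-bit integer, so only integer instructions are needed. */
typedef int32_t fix16;

#define FIX16_ONE ((fix16)1 << 16)

/* Assumes n fits in 16 signed bits, so n * 65536 does not overflow. */
static inline fix16 fix16_from_int(int32_t n)
{
    return n * FIX16_ONE;
}

/* Multiply: widen to 64 bits so the intermediate product cannot overflow,
 * then scale back down (integer division truncates toward zero). */
static inline fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) / FIX16_ONE);
}

/* Divide: scale the dividend up first so the quotient keeps 16 fraction bits. */
static inline fix16 fix16_div(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * FIX16_ONE) / b);
}
```

For example, fix16_div(fix16_from_int(7), fix16_from_int(2)) yields 229376, which is 3.5 * 65536. 1/3, on the other hand, comes out as 21845/65536 (about 0.333328). This shows the limited precision mentioned above.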
2. Use a floating point emulation library
The application program, written in a high-level language, uses floating point operations as is. However, instead of generating floating point instructions for them, the compiler generates calls to floating point emulation functions. These emulation functions are provided by some library, and the program is then linked against that library. The GNU C Library (glibc) also comes with support for floating point emulation. Note that the default glibc distribution might not have floating point emulation (FPE) support, but glibc has a configure option with which it can be compiled with FPE support.
This needs support from the compiler, since the compiler has to identify floating point operations and generate FPE calls for them. Compilers usually provide a command-line option for this, and gcc provides the -msoft-float option for this purpose. This option is not the default: without it, gcc generates floating point instructions.
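To make option 2 concrete, here is a deliberately simplified sketch of what a soft-float division routine like __divdf3 has to do. It unpacks the IEEE 754 sign, exponent and significand, divides the significands with integer long division, and repacks the result. The name soft_divdf3 is hypothetical, and this is not the libgcc or glibc implementation. It assumes finite, nonzero, normal inputs and a normal result, and it truncates the quotient instead of rounding to nearest-even. The arguments and return value are still passed as double for convenience, but the arithmetic itself is integer-only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy soft-float divide (not libgcc's __divdf3).
 * Assumptions: inputs finite, nonzero and normal; result normal;
 * quotient truncated (round toward zero) instead of correctly rounded. */
double soft_divdf3(double x, double y)
{
    uint64_t a, b, r;
    memcpy(&a, &x, sizeof a);                  /* reinterpret bits as integers */
    memcpy(&b, &y, sizeof b);

    const uint64_t frac_mask = (1ULL << 52) - 1;
    uint64_t sign = (a ^ b) & (1ULL << 63);    /* sign of the quotient */
    int64_t e = (int64_t)((a >> 52) & 0x7FF)   /* subtract biased exponents, */
              - (int64_t)((b >> 52) & 0x7FF)   /* then re-add the bias */
              + 1023;
    uint64_t ma = (a & frac_mask) | (1ULL << 52);  /* 53-bit significands */
    uint64_t mb = (b & frac_mask) | (1ULL << 52);  /* with implicit 1 bit */

    /* Normalize so that ma / mb lies in [1, 2). */
    if (ma < mb) {
        ma <<= 1;
        e -= 1;
    }

    /* Restoring long division: one quotient bit per step, 53 bits total. */
    uint64_t q = 0;
    for (int i = 0; i < 53; i++) {
        q <<= 1;
        if (ma >= mb) {
            ma -= mb;
            q |= 1;
        }
        ma <<= 1;
    }

    /* Drop the implicit leading 1 and pack sign, exponent and fraction. */
    r = sign | ((uint64_t)e << 52) | (q & frac_mask);
    double out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

For instance, soft_divdf3(6.0, 3.0) returns 2.0 and soft_divdf3(-7.5, 2.5) returns -3.0. For quotients that are not exact, the last bit may differ from hardware division, because the rounding step is missing.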
3. Kernel floating point emulation
If we need to emulate floating point operations and want to hide this from the applications, we can have the kernel emulate them. This can be kept completely transparent to the applications. They won't even know whether the underlying processor has a hardware FPU, except through the slowdown the emulation might cause.
This is implemented by having the CPU generate an exception every time it encounters a floating point instruction. The kernel's exception handler then emulates the instruction using integer arithmetic.
For this we need support from the CPU, i.e. the CPU should generate an exception when it encounters a floating point instruction. x86 processors provide this support by means of the Emulation bit (bit #2 in the CR0 register). If the hardware FPU is not present, this bit will be set. When the Emulation bit (abbreviated EM) is set, the x86 CPU raises the Device Not Available (#NM) exception every single time it encounters a floating point instruction. A Linux kernel compiled with floating point emulation support then handles the emulation inside the exception handler, and the application runs seamlessly. If the Linux kernel is not compiled with FPE support, it raises SIGFPE to the application.
The Floating Point Context
The floating point unit uses its own set of registers for floating point arithmetic. For example, the x87 FPU (the coprocessor unit for x86 processors) uses the following registers:
* 8 data registers (ST0-ST7)
* The status register
* The control register
* The tag word register
* The last instruction pointer register
* The last data (operand) pointer register
* The opcode register
These registers are used specifically for floating point arithmetic and are completely separate from the native x86 registers used for integer arithmetic. Together they constitute the floating point context of the CPU. In addition to the native processor context, this context needs to be saved and restored on each process context switch. This seems like a big price to pay :-(
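To put a number on that price: in 32-bit protected mode, the legacy FSAVE/FNSAVE instruction writes the registers listed above to a 108-byte memory image. This image consists of 28 bytes of control, status and pointer information plus eight 80-bit data registers. By comparison, the eight 32-bit general purpose registers take 32 bytes. The struct below is a hypothetical C mirror of that layout (the field names are ours). The FXSAVE variant writes an even larger 512-byte area.

```c
#include <stdint.h>

/* Hypothetical mirror of the 108-byte image written by x87 FSAVE/FNSAVE
 * in 32-bit protected mode. Field names are illustrative only. */
struct x87_fsave_area {
    uint32_t control;        /* control word (low 16 bits used) */
    uint32_t status;         /* status word (low 16 bits used) */
    uint32_t tag;            /* tag word (low 16 bits used) */
    uint32_t fpu_ip;         /* last instruction pointer (offset) */
    uint32_t fpu_cs_opcode;  /* CS selector (bits 0-15), last opcode (bits 16-26) */
    uint32_t fpu_dp;         /* last data (operand) pointer (offset) */
    uint32_t fpu_ds;         /* operand pointer selector (low 16 bits used) */
    uint8_t  st[8][10];      /* ST0-ST7, 80-bit extended precision each */
};

/* 7 * 4 bytes of environment + 8 * 10 bytes of data registers = 108 bytes. */
_Static_assert(sizeof(struct x87_fsave_area) == 108, "FSAVE image is 108 bytes");
```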
Cheer up! We have a smart way to handle this. Read on...
Because floating point usage is not very common (in fact, a process will often not execute any floating point instruction during its whole quantum), and because floating point registers are so large and plentiful, it does not make sense to save and restore the floating point registers on every context switch.
Most of the time this save/restore effort would be wasted, since the registers would not have been dirtied. The x86 designers were smart enough to think about this beforehand. They added a bit in the CR0 register that the operating system can use to do this save/restore efficiently. With it, the floating point registers are saved at context switch-out time only if the outgoing process executed some floating point instruction in that quantum and hence modified the CPU FP registers. Similarly, the floating point registers are restored only when the process wants to execute some floating point instruction and hence needs the FP registers.
I was referring to the Task Switched (TS) bit (bit #3) in the CR0 register. As the name implies, the processor sets this bit on every task switch. Please note that Linux does not use the task switching facility provided by the CPU and instead does the task switch by hand. Linux therefore has to set the TS bit explicitly as part of the task switch. Irrespective of how the TS bit is set, its significance is that while it is set, the CPU generates a Device Not Available (#NM) exception whenever a floating point instruction is executed. For the TS bit to have an effect, the EM bit must be clear; otherwise the CPU raises the #NM exception for every floating point instruction regardless of the TS bit. The OS can use this single CPU feature to make context switches involving floating point context efficient.
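The interaction of the two CR0 bits described so far can be summarized as a small decision function. This is a simplified, hypothetical model (the enum and function names are ours). It covers only x87 instructions and ignores the MP bit and SSE instructions, which behave differently on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0 bit masks as described in the text: EM is bit 2, TS is bit 3. */
#define CR0_EM (1u << 2)
#define CR0_TS (1u << 3)

/* What happens when an x87 floating point instruction executes. */
enum fp_outcome {
    FP_RUNS_NATIVELY,  /* no exception: the FPU executes the instruction */
    FP_EMULATE,        /* #NM raised; kernel emulates it with integer code */
    FP_SIGFPE,         /* #NM raised; no emulator, so the process gets SIGFPE */
    FP_LAZY_RESTORE    /* #NM raised because of TS: restore FP state, clear TS */
};

enum fp_outcome x87_insn_outcome(uint32_t cr0, bool kernel_has_fpe)
{
    if (cr0 & CR0_EM)          /* EM set: every FP instruction traps */
        return kernel_has_fpe ? FP_EMULATE : FP_SIGFPE;
    if (cr0 & CR0_TS)          /* TS only matters when EM is clear */
        return FP_LAZY_RESTORE;
    return FP_RUNS_NATIVELY;
}
```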
How?
Let's look at how Linux uses this to save and restore the FP registers intelligently. First, let's see how and when the TS bit is set in the CR0 register. If the TS bit is not set, the Device Not Available (#NM) exception will not be generated, and we won't be notified when a floating point instruction executes.
The TS bit is initially set from cpu_init(), so that the first process that runs a floating point instruction causes the Device Not Available (#NM) exception. The TS bit is then cleared from the #NM exception handler, so that further floating point instructions executed by the current process in its current quantum do not cause the exception. The TS bit is set again when the current process is scheduled out, so that a new process executing a floating point instruction also causes the #NM exception. This is done from the context switch-out path: __switch_to()->__unlazy_fpu().
In short, the Linux kernel wants to be notified of the first (and only the first) floating point instruction that a process executes in a quantum. It then takes appropriate action to restore the floating point state of that process. This ensures that the floating point state of a process is restored (i.e. the saved FP state of the process is loaded into the CPU's FPU) only when the process executes at least one floating point instruction. If a process does not execute any floating point instruction in a certain quantum, there is no need to restore its saved floating point state. Also, since the saved FPU state of the current process did not change, we need not save the FPU state when this process is switched out. In that case the CPU's FPU state remains the same as it was before the current process started running. If that state corresponds to the FPU state of the next process to run, we need not even restore that process' FPU state, because the CPU's FPU already holds it. What this means is that if a process does not execute any floating point instruction in a certain quantum, we neither restore nor save its floating point context. So we incur the FP context save/restore overhead only when it is really required :-)
The kernel function math_state_restore() is at the heart of all this. It is called from the Device Not Available (#NM) exception handler, which, as we saw before, is invoked when the TS bit is set and a floating point instruction is executed.
    asmlinkage void math_state_restore(void)
    {
            ...
            clts(); // we do not want to be called again in this process quantum

            /*
             * Now that we are going to use the FPU, load this process' FPU state into the FPU
             */
            if (unlikely(restore_fpu_checking(tsk))) {
                    stts();
                    force_sig(SIGSEGV, tsk);
                    return;
            }

            thread->status |= TS_USEDFPU; // so that __switch_to->unlazy_fpu can save the FP state of this process
    }
    ...

clts() clears the TS bit, because we want to be called only for the first floating point instruction and not for every one. math_state_restore() then sets the TS_USEDFPU bit in the current process' thread->status field. The context switch-out code later checks this bit to decide whether to save the FP registers as part of the scheduled-out task's context. The Linux kernel thus saves the FP context of a process only if that process executed at least one floating point instruction in its last quantum and hence changed its already saved FPU state. This is the conditional save.
That covers the save optimization. The restore optimization is also present in the math_state_restore() function shown above. Note that, unlike the integer registers, the FPU state is not restored unconditionally from the context switch-in code. Instead, the FP restore is done from math_state_restore(). Being called at all signifies that the process has executed a floating point instruction, so its FP state needs to be restored. The floating point state is therefore restored not at context switch-in time, but just before the process is going to use it. This is called the lazy restore.
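The conditional save and lazy restore can be simulated in user space. In this hypothetical model, all names are ours. The FPU is a single int register, used_fpu stands in for TS_USEDFPU, and counters record how often state is actually saved and restored. The model omits the further optimization of skipping the restore when the CPU still holds the next process' state.

```c
#include <stdbool.h>

/* Hypothetical model of lazy FPU handling; not kernel code. */
struct task {
    int  saved_fp;   /* FP state saved in the task structure */
    bool used_fpu;   /* plays the role of TS_USEDFPU */
};

struct cpu {
    int  fp_reg;     /* live FPU register contents */
    bool ts;         /* CR0.TS */
    int  saves, restores;
};

/* #NM handler, in the spirit of math_state_restore(). */
static void model_math_state_restore(struct cpu *c, struct task *t)
{
    c->ts = false;              /* clts(): no more traps this quantum */
    c->fp_reg = t->saved_fp;    /* lazy restore */
    c->restores++;
    t->used_fpu = true;         /* remember to save at switch-out */
}

/* Task t executes a floating point instruction that adds delta. */
static void fp_insn(struct cpu *c, struct task *t, int delta)
{
    if (c->ts)                  /* first FP instruction this quantum traps */
        model_math_state_restore(c, t);
    c->fp_reg += delta;
}

/* Switch-out path, in the spirit of __switch_to()->__unlazy_fpu(). */
static void switch_out(struct cpu *c, struct task *prev)
{
    if (prev->used_fpu) {       /* conditional save */
        prev->saved_fp = c->fp_reg;
        c->saves++;
        prev->used_fpu = false;
    }
    c->ts = true;               /* stts(): next FP instruction traps */
}
```

A task that never executes a floating point instruction never triggers a save or a restore. A task that does use the FPU pays one restore for each quantum in which it uses it, and one save when it is switched out.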
Using floating point in the kernel
We learnt how Linux uses the conditional save and lazy restore techniques to let application programs use hardware floating point support while avoiding the overhead of saving and restoring the FP context on every context switch. The above discussion assumes that the only way the FP state of the CPU can change is through the application executing floating point instructions. In other words, it assumes that kernel code will not modify the FP state of the CPU. This effectively means that kernel code cannot use floating point instructions.
Well... to be more precise, we cannot use floating point operations in the kernel just like that.
We have to follow some discipline. The good news is that the Linux kernel developers have made it very easy to use floating point operations inside the kernel. You just need to surround the floating point code with kernel_fpu_begin() and kernel_fpu_end(), much like guarding a critical section, and you can then safely use floating point operations in kernel code.
So what magic do these two functions perform? Recall how the Linux kernel solved the problem of avoiding unneeded saves and restores of the FP context when scheduling user processes in and out. In short, the Linux kernel does the following:
* It sets the TS bit in the CR0 register before a new process can start execution. This makes the CPU raise the Device Not Available (#NM) exception when that process runs its first floating point instruction, so the kernel can then do the lazy restore of the process' floating point context.
* From the Device Not Available (#NM) exception handler, it sets the TS_USEDFPU flag in the thread->status field. The context switch-out code can then use this flag to conditionally save the floating point state of this process.
If we treat kernel mode as just another process (i.e. something capable of changing the FP state of the CPU), we can extend the above logic to let the kernel use floating point operations safely. This is exactly what kernel_fpu_begin() and kernel_fpu_end() do:

    static inline void kernel_fpu_begin(void)
    {
            struct thread_info *me = current_thread_info();

            preempt_disable();
            if (me->status & TS_USEDFPU)
                    __save_init_fpu(me->task);
            else
                    clts();
    }

    static inline void kernel_fpu_end(void)
    {
            stts();
            preempt_enable();
    }
So if the kernel wants to perform floating point operations, which can change the FP state of the CPU, it first needs to save the FP state of the current process (__save_init_fpu() does that). It does this only if the current process was doing floating point operations (me->status & TS_USEDFPU). In that case the TS bit is already clear, because math_state_restore() cleared it. Otherwise the TS bit is still set, so kernel_fpu_begin() clears it with clts() to keep the CPU from raising the Device Not Available (#NM) exception on the kernel's own floating point instructions.
Once the kernel is done with the floating point operations, it calls kernel_fpu_end(), which sets the TS bit again. As a result, the next time a process runs a floating point instruction it causes the Device Not Available (#NM) exception, and that process' floating point state gets restored. This restore is needed because the kernel modified the CPU FP state.
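A hypothetical user-space model of the two functions quoted above shows why the user's FP state survives the kernel's floating point use. The names prefixed with model_ are ours. The FPU is a single int register, and preemption is a plain counter. kernel_fpu_begin() saves the user state if needed, and the TS bit set by kernel_fpu_end() forces a lazy restore on the user's next floating point instruction.

```c
#include <stdbool.h>

/* Hypothetical model; used_fpu stands in for TS_USEDFPU. */
struct task { int saved_fp; bool used_fpu; };
struct cpu  { int fp_reg; bool ts; int preempt_count; };

static void model_kernel_fpu_begin(struct cpu *c, struct task *cur)
{
    c->preempt_count++;              /* preempt_disable() */
    if (cur->used_fpu) {             /* like __save_init_fpu(): save user */
        cur->saved_fp = c->fp_reg;   /* state and clear the used flag; */
        cur->used_fpu = false;       /* TS is already clear in this case */
    } else {
        c->ts = false;               /* clts() */
    }
}

static void model_kernel_fpu_end(struct cpu *c)
{
    c->ts = true;                    /* stts(): next user FP insn traps */
    c->preempt_count--;              /* preempt_enable() */
}

/* A user floating point instruction; traps into a lazy restore if TS is set. */
static void user_fp_insn(struct cpu *c, struct task *cur, int delta)
{
    if (c->ts) {                     /* #NM -> math_state_restore() */
        c->ts = false;
        c->fp_reg = cur->saved_fp;
        cur->used_fpu = true;
    }
    c->fp_reg += delta;
}
```

In this model, a user value of 40 left in the FPU is saved by model_kernel_fpu_begin(). The kernel can then clobber the register freely, and the user's next instruction traps, restores 40 and continues from there.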
kernel_fpu_begin() and kernel_fpu_end() make sense only if you are using hardware floating point support in the kernel. For this you will have to compile the kernel (or the module) with the -mhard-float option.
One more important thing to keep in mind is that we must not sleep while we are between kernel_fpu_begin() and kernel_fpu_end(). This is because while we are modifying the CPU FP state, we do not want any other context to use that FP state. This is also why kernel_fpu_begin() disables preemption.