Category: LINUX
2011-10-30 22:24:01
@For study and discussion only; not for commercial use.
Linux Kernel Code: 2.6.35.7
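interrupt or exception privilege check
privilege check: user programs are allowed to call into kernel code, but the kernel is not allowed to call into user code, which guards against malicious user programs. As a result, an interrupt can occur in either user mode (3) or kernel mode (0), and in both cases the interrupt handler (0) gets to run.
The biggest puzzle about the privilege check is this: interrupts can occur while the CPU is in user mode.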
From ULK3:
Makes sure the interrupt was issued by an authorized source. First, it compares the Current Privilege Level (CPL), which is stored in the two least significant bits of the cs register, with the Descriptor Privilege Level (DPL ) of the Segment Descriptor included in the GDT. Raises a "General protection" exception if the CPL is lower than the DPL, because the interrupt handler cannot have a lower privilege than the program that caused the interrupt. For programmed exceptions, makes a further security check: compares the CPL with the DPL of the gate descriptor included in the IDT and raises a "General protection" exception if the DPL is lower than the CPL. This last check makes it possible to prevent access by user applications to specific trap or interrupt gates.
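To make the two checks in the ULK3 passage concrete, here is a minimal user-space sketch. It is only a model of the rule as quoted, not kernel or CPU code; the names cpl_of() and gate_check() and the example selector values are my own assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The CPL is the two least significant bits of the cs selector. */
static unsigned cpl_of(uint16_t cs)
{
    return cs & 3u;
}

/*
 * Model of the checks quoted above (hypothetical helper, not a kernel API).
 * Returns true if the handler may run, false where the CPU would raise #GP.
 *
 *   handler_dpl: DPL of the handler's code segment descriptor
 *   gate_dpl:    DPL of the IDT gate descriptor
 *   programmed:  true for a software "int n", false for a hardware interrupt
 */
static bool gate_check(uint16_t cs, unsigned handler_dpl,
                       unsigned gate_dpl, bool programmed)
{
    unsigned cpl = cpl_of(cs);

    assert(handler_dpl <= 3 && gate_dpl <= 3);

    /* The handler must not be less privileged than the interrupted code. */
    if (cpl < handler_dpl)
        return false;

    /* Only programmed exceptions are checked against the gate DPL. */
    if (programmed && gate_dpl < cpl)
        return false;

    return true;
}
```

With a user-mode selector (low bits 3), gate_check(cs, 0, 0, false) succeeds, which is why a hardware interrupt arriving in user mode can still reach a ring 0 handler, while gate_check(cs, 0, 0, true) fails, which is how a DPL 0 gate keeps user code from invoking that vector with "int n".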
Note:
6.12.1.1 Protection of Exception- and Interrupt-Handler Procedures
The privilege-level protection for exception- and interrupt-handler procedures is similar to that used for ordinary procedure calls when called through a call gate (see Section 5.8.4, “Accessing a Code Segment Through a Call Gate”). The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL.
An attempt to violate this rule results in a general-protection exception (#GP). The protection mechanism for exception- and interrupt-handler procedures is different in the following ways:
64-ia-32-architectures-software-developer-vol-1-manual.pdf
64-ia-32-architectures-software-developer-vol-2a-2b-instruction-set-a-z-manual.pdf
64-ia-32-architectures-software-developer-vol-3a-3b-system-programming-manual.pdf
The GDT is per-CPU
In uniprocessor systems there is only one GDT, while in multiprocessor systems there is one GDT for every CPU in the system. All GDTs are stored in the cpu_gdt_table array, while the addresses and sizes of the GDTs (used when initializing the gdtr registers) are stored in the cpu_gdt_descr array.
Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: 17.1.1 Address Translation in Real-Address Mode:
When using 8086-style address translation, it is possible to specify addresses larger than 1 MByte. For example, with a segment selector value of FFFFH and an offset of FFFFH, the linear (and physical) address would be 10FFEFH (1 megabyte plus 64 KBytes).
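The arithmetic in the quoted example can be checked with a few lines of C. This is a sketch that assumes the A20 line is enabled (so addresses do not wrap at 1 MByte); the helper names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: linear = segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Number of bytes at or above 1 MByte that real-mode addressing can reach. */
static uint32_t bytes_reachable_above_1m(void)
{
    uint32_t top = real_mode_linear(0xFFFF, 0xFFFF);

    assert(top >= 0x100000u);
    return top - 0x100000u + 1;
}
```

real_mode_linear(0xFFFF, 0xFFFF) gives 0xFFFF0 + 0xFFFF = 0x10FFEF, matching the manual, and the region above 1 MByte works out to 64 KBytes minus 16 bytes.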
setup_idt();
setup_gdt();
protected_mode_jump(boot_params.hdr.code32_start, (u32)&boot_params + (ds() << 4));
Why must a temporary IDT and GDT be filled in before jumping to protected mode?
address translation
Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3 (3A & 3B):System Programming Guide:
3.1 MEMORY MANAGEMENT OVERVIEW
If paging is not used, the linear address space of the processor is mapped directly into the physical address space of processor. The physical address space is defined as the range of addresses that the processor can generate on its address bus.
Jump into Protected Mode
9.9.1 Switching to Protected Mode
Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3 (3A & 3B):System Programming Guide:
9.8 SOFTWARE INITIALIZATION FOR PROTECTED-MODE OPERATION
Where is irq_desc->handle_irq initialized?
struct irq_desc irq_desc[NR_IRQS] __cacheline_aligned_in_smp = {
Documentation/DocBook/genericirq.tmpl:
The interrupt flow handlers (either predefined or architecture specific) are assigned to specific interrupts by the architecture either during bootup or during device initialization.
Assigned during bootup:
init_IRQ()->XXX->init_ISA_irqs():
drivers/gpio/pca953x.c:pca953x_irq_setup():
for (lvl = 0; lvl < chip->gpio_chip.ngpio; lvl++) {
arch/mips/lasat/interrupt.c:arch_init_irq():
for (i = LASAT_IRQ_BASE; i <= LASAT_IRQ_END; i++)
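The call sites listed above share one pattern: every descriptor starts out with a default handler, and the architecture or driver later loops over its IRQ range and stores a flow handler in handle_irq. Below is a self-contained user-space model of that pattern. All model_* names are hypothetical stand-ins; the real kernel uses struct irq_desc and setter helpers such as set_irq_chip_and_handler(), whose exact code is not reproduced here. The "bad irq" default is my assumption about what the static initializer sets.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_IRQS 16

struct model_irq_desc;
typedef void (*model_flow_handler_t)(unsigned int irq,
                                     struct model_irq_desc *desc);

/* Stand-in for struct irq_desc: only the field discussed here. */
struct model_irq_desc {
    model_flow_handler_t handle_irq;
    unsigned int count;             /* how often the flow handler ran */
};

static struct model_irq_desc model_irq_desc[MODEL_NR_IRQS];

/* Default handler for lines nobody has claimed yet (assumed default). */
static void model_handle_bad_irq(unsigned int irq, struct model_irq_desc *desc)
{
    (void)irq;
    (void)desc;
}

/* Stand-in for a predefined flow handler such as handle_level_irq(). */
static void model_handle_level_irq(unsigned int irq, struct model_irq_desc *desc)
{
    (void)irq;
    desc->count++;
}

/* Plays the role of the static array initializer. */
static void model_early_init(void)
{
    for (unsigned int i = 0; i < MODEL_NR_IRQS; i++) {
        model_irq_desc[i].handle_irq = model_handle_bad_irq;
        model_irq_desc[i].count = 0;
    }
}

/* Plays the role of the setter helpers called by arch and driver code. */
static void model_set_handler(unsigned int irq, model_flow_handler_t handler)
{
    assert(irq < MODEL_NR_IRQS && handler != NULL);
    model_irq_desc[irq].handle_irq = handler;
}

/* Bootup assignment, in the spirit of init_ISA_irqs(). */
static void model_init_isa_irqs(void)
{
    for (unsigned int i = 0; i < MODEL_NR_IRQS; i++)
        model_set_handler(i, model_handle_level_irq);
}

/* Dispatch, in the spirit of do_IRQ()->handle_irq(). */
static void model_do_irq(unsigned int irq)
{
    struct model_irq_desc *desc = &model_irq_desc[irq];

    desc->handle_irq(irq, desc);
}
```

The point of the indirection is that do_IRQ() never needs to know whether a line is level- or edge-triggered; it just calls whatever flow handler was stored during initialization.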
system_call is registered in trap_init() and defined in arch/x86/kernel/entry_32.S.
The interrupt array is defined in the .init.rodata section, in entry_32.S:
.section .init.rodata,"a"
Data in the .init sections is freed once init has completed:
/* Init code and data - will be freed after init */
The INIT_DATA_SECTION macro is defined in include/asm-generic/vmlinux.lds.h:
#define INIT_DATA_SECTION(initsetup_align) \
INIT_DATA is defined in the same file:
/* init and exit section handling */
Call path that frees the init memory:
start_kernel()->rest_init()->new kernel thread: kernel_init()->init_post()->free_initmem();
softirq, tasklet, workqueue
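Doing so has the following effects:
Because do_softirq disables softirqs only on the local CPU, softirqs on other CPUs keep running normally. Moreover, a softirq_action holds only a reentrant function and no data structure that needs cross-CPU protection, so even softirqs of the same type can run on different CPUs at the same time. As said above, though, on any one CPU all kinds of deferrable functions run serially.
A tasklet is implemented on top of softirq, so it shares most of the properties above. The difference is that tasklet_struct contains data that needs cross-CPU protection, so tasklet_action checks the corresponding flag before running a tasklet: if another CPU is already running it, the tasklet is put back on this CPU's tasklet_head list to wait for the next round.
This gives a tasklet one property that a softirq does not have: a tasklet of a given type can run on only one CPU at a time. Tasklets of different types can, of course, run on different CPUs at the same time.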
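The "check the flag, otherwise requeue" step can be sketched in user space with a C11 atomic flag standing in for the tasklet's run bit. This is a simplified model under my own assumptions: the model_* names are hypothetical, the requeue is reduced to a counter, and the disable count and scheduled bit that the real tasklet_action also handles are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for tasklet_struct: a run flag plus the function and its data. */
struct model_tasklet {
    atomic_flag running;            /* models the "running on some CPU" bit */
    void (*func)(unsigned long);
    unsigned long data;
    unsigned int requeues;          /* demo counter, not thread-safe */
};

static void model_tasklet_init(struct model_tasklet *t,
                               void (*func)(unsigned long), unsigned long data)
{
    atomic_flag_clear(&t->running);
    t->func = func;
    t->data = data;
    t->requeues = 0;
}

/* True if this CPU won the right to run the tasklet. */
static bool model_tasklet_trylock(struct model_tasklet *t)
{
    return !atomic_flag_test_and_set(&t->running);
}

static void model_tasklet_unlock(struct model_tasklet *t)
{
    atomic_flag_clear(&t->running);
}

/*
 * What one CPU does with one queued tasklet: run it if no other CPU is
 * running it, otherwise put it back for the next round (counted here).
 * Returns true if the function was executed.
 */
static bool model_tasklet_action(struct model_tasklet *t)
{
    assert(t->func != NULL);

    if (!model_tasklet_trylock(t)) {
        t->requeues++;
        return false;
    }
    t->func(t->data);
    model_tasklet_unlock(t);
    return true;
}

/* Example payload: accumulates its argument into a global. */
static unsigned long model_sum;

static void model_work(unsigned long data)
{
    model_sum += data;
}
```

Holding the flag from "another CPU" (a plain model_tasklet_trylock() call in a test) makes model_tasklet_action() back off and count a requeue instead of running the function a second time in parallel.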
A workqueue runs in process context; it makes no assumption about interrupts while it runs, so it may sleep.
TODO: TSS…
The processor transfers execution to another task in one of four cases:
The current program, task, or procedure executes a JMP or CALL instruction to a TSS descriptor in the GDT.
The current program, task, or procedure executes a JMP or CALL instruction to a task-gate descriptor in the GDT or the current LDT.
An interrupt or exception vector points to a task-gate descriptor in the IDT.
The current task executes an IRET when the NT flag in the EFLAGS register is set.
Note that not every jmp/call causes a task switch; likewise, not every interrupt/exception/iret causes one.
All of these methods for dispatching a task identify the task to be dispatched with a segment selector that points to a task gate or the TSS for the task. When dispatching a task with a CALL or JMP instruction, the selector in the instruction may select the TSS directly or a task gate that holds the selector for the TSS. __When dispatching a task to handle an interrupt or exception, the IDT entry for the interrupt or exception must contain a task gate that holds the selector for the interrupt- or exception-handler TSS.__
The above is quoted from Intel Manual 3A, chapter 7.
TODO: TSS
Points to look at for the TSS:
How many TSSs are there?
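If the TSS descriptors are stored in the GDT, where are the TSSs themselves located?
Because the GDT is per-CPU on SMP systems, the figure above shows that each CPU has one general-purpose TSS descriptor (TSSd) and one TSSd dedicated to double faults.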
FROM ULK3: 3.3.2. Task State Segment
The 80x86 architecture includes a specific segment type called the Task State Segment (TSS), to store hardware contexts.
Although Linux doesn't use hardware context switches, it is nonetheless forced to set up a TSS for each distinct CPU in the system.
This is done for two main reasons:
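It says that Linux does not use hardware context switches!
But they are not disabled either (PS: I currently do not know of a way to disable task switching on an Intel CPU), so on a given CPU all tasks (kernel paths or user processes) share the same TSS(d). Don't get hung up on the double-fault case :)
The point of softirqs is to avoid keeping interrupts disabled for too long, so why would interrupts be disabled while they run?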
refer to: Intel Manual 3a: 6.8 ENABLING AND DISABLING INTERRUPTS
Disabling interrupts does not block non-maskable interrupts and exceptions, which is what leads to nested interrupts.
when the IF flag is set, interrupts delivered to the INTR or through the local APIC pin are processed as normal external interrupts.
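While interrupts are disabled, the interrupt is not acked and the INTR state is not cleared. So once the IF flag is set again, does the earlier INTR state still get handled?
To answer that, one needs to understand:
Softirqs do deferrable, time-consuming work, so of course that work cannot be done inside the interrupt handler itself.
Looking at the do_softirq code, the first thing done before running the deferrable functions is to enable interrupts. Before that, however, bottom halves are disabled (local_bh_disable). So even if the code is interrupted, it is not preempted on return to the kernel; execution continues right here, and nothing gets rescheduled.
The consequence is that the code in softirq context keeps running until the limit on the number of iterations is reached, and then the daemon (ksoftirqd) is woken up.
Going back to the do_softirq() code confirms this: interrupts really are enabled before the softirq_action handlers are actually run, so it is fair to say that softirqs do not run their deferrable functions with interrupts disabled.
The analysis above overlooks one effect: local_bh_disable makes softirqs run serially on the local CPU, because do_softirq checks in_interrupt() at its very beginning.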
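The control flow described above (bail out if already in interrupt context, disable bottom halves, enable interrupts around the handlers, restart a bounded number of times, then wake the daemon) can be sketched as a small user-space model. This is only my reading of the flow, not the kernel source: struct model_cpu and every model_* name are hypothetical, the flags are plain fields rather than real CPU state, interrupts are assumed to be enabled on entry, and the restart limit of 10 is what I remember MAX_SOFTIRQ_RESTART to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_RESTART 10        /* assumed value of MAX_SOFTIRQ_RESTART */

/* Per-CPU state relevant to softirq processing (hypothetical model). */
struct model_cpu {
    unsigned int bh_disable_depth;  /* nonzero means in_interrupt() is true */
    bool irqs_enabled;              /* models the IF flag */
    uint32_t pending;               /* bitmask of raised softirqs */
    unsigned int handled;           /* how many handler calls were made */
    bool ksoftirqd_woken;
};

typedef void (*model_action_t)(struct model_cpu *cpu, unsigned int nr);

static void model_do_softirq(struct model_cpu *cpu, model_action_t action)
{
    int max_restart = MODEL_MAX_RESTART;
    uint32_t pending;

    if (cpu->bh_disable_depth > 0)  /* in_interrupt(): no nesting allowed */
        return;

    cpu->irqs_enabled = false;      /* sample the pending mask atomically */
    pending = cpu->pending;
    if (pending) {
        cpu->bh_disable_depth++;    /* local_bh_disable() */
        do {
            cpu->pending = 0;
            cpu->irqs_enabled = true;   /* handlers run with irqs enabled */
            for (unsigned int nr = 0; pending; nr++, pending >>= 1)
                if (pending & 1u)
                    action(cpu, nr);
            cpu->irqs_enabled = false;
            pending = cpu->pending;     /* anything raised meanwhile? */
        } while (pending && --max_restart);

        if (pending)                /* still busy: hand over to the daemon */
            cpu->ksoftirqd_woken = true;
        cpu->bh_disable_depth--;    /* local_bh_enable() */
    }
    cpu->irqs_enabled = true;       /* assumes irqs were enabled on entry */
}

/* Handler that simply records that it ran, with irqs on and bh off. */
static void model_count_action(struct model_cpu *cpu, unsigned int nr)
{
    (void)nr;
    assert(cpu->irqs_enabled && cpu->bh_disable_depth == 1);
    cpu->handled++;
}

/* Handler that re-raises itself and tries to nest softirq processing. */
static void model_storm_action(struct model_cpu *cpu, unsigned int nr)
{
    cpu->handled++;
    cpu->pending |= 1u << nr;
    model_do_softirq(cpu, model_storm_action);  /* returns at once */
}
```

With model_storm_action the nested call returns immediately because bottom halves are disabled, which is the serialization on the local CPU, and after the bounded number of passes the loop gives up and flags the daemon instead of spinning forever.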
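Actually, I have another half-baked idea:
The reason an interrupt/exception handler must be as short as possible is that the irq line is acked, and its state cleared, only after the handler has finished, so that a new irq on that line can then be recognized.
The interrupt state here can be looked at from two angles:
Inside the CPU, clearing the IF flag stops the CPU from responding to external interrupts, but external interrupts can still occur. Handled or not, the state of the irq line stays where it is, and it is seen as soon as the IF flag is set again. But if the state of the irq line is not cleared promptly after an interrupt occurs, a new interrupt of the same kind cannot be recognized, because the state of the line does not change.
This idea really is half-baked. The fuzzy part is when the irq line gets acked, which can be checked in do_IRQ: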
do_IRQ()->handle_irq()->eg. handle_level_irq():
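void
As can be seen, mask_ack_irq() is called at the very start of interrupt handling to ack the interrupt. Heh, but there is also a mask: even though the interrupt has now been acked, it is masked as well. This disables that interrupt at the external APIC, so if the APIC sees it again, it does not need to change the state of the irq line.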
static inline void mask_ack_irq(struct irq_desc *desc, int irq)
from ULK3:
Each IRQ line can be selectively disabled. Thus, the PIC can be programmed to disable IRQs. That is, the PIC can be told to stop issuing interrupts that refer to a given IRQ line, or to resume issuing them. Disabled interrupts are not lost; the PIC sends them to the CPU as soon as they are enabled again. This feature is used by most interrupt handlers, because it allows them to process IRQs of the same type serially.
Selective enabling/disabling of IRQs is not the same as global masking/unmasking of maskable interrupts. When the IF flag of the eflags register is clear, each maskable interrupt issued by the PIC is temporarily ignored by the CPU.
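The difference between masking one line at the PIC and clearing IF in the CPU can be illustrated with a toy model. This is a deliberately simplified sketch under my own assumptions (an 8-line controller with a request register and a mask register, lowest line number wins, no in-service tracking); the model_* names are hypothetical and this is not how any real PIC driver is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy 8-line interrupt controller: request bits and mask bits. */
struct model_pic {
    uint8_t irr;    /* requests latched by the controller */
    uint8_t imr;    /* lines that are selectively disabled */
};

/* A device raises a line: the request is latched even if masked. */
static void model_pic_raise(struct model_pic *pic, unsigned int line)
{
    assert(line < 8);
    pic->irr |= (uint8_t)(1u << line);
}

static void model_pic_mask(struct model_pic *pic, unsigned int line)
{
    assert(line < 8);
    pic->imr |= (uint8_t)(1u << line);
}

static void model_pic_unmask(struct model_pic *pic, unsigned int line)
{
    assert(line < 8);
    pic->imr &= (uint8_t)~(1u << line);
}

/*
 * One delivery attempt. Returns the line handed to the CPU, or -1 if
 * nothing is delivered, either because every requesting line is masked
 * or because the CPU has IF clear. Undelivered requests stay latched.
 */
static int model_deliver(struct model_pic *pic, bool if_flag)
{
    unsigned int ready = pic->irr & ~(unsigned int)pic->imr & 0xFFu;

    if (!if_flag || ready == 0)
        return -1;

    for (unsigned int line = 0; line < 8; line++) {
        if (ready & (1u << line)) {
            pic->irr &= (uint8_t)~(1u << line);     /* acknowledged */
            return (int)line;
        }
    }
    return -1;
}
```

In this model a request raised on a masked line is delivered as soon as the line is unmasked, and a request raised while IF is clear is delivered once IF is set again, which matches the "disabled interrupts are not lost" behavior the quotation describes.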
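How is kernel code placed above 1M in real mode?
At boot, the second part of the kernel is placed starting at 0x100000, that is, above 1M.
How is that done while the CPU is still in real mode?
The answer is simple: the kernel is put there by the bootloader. From reading the u-boot code, u-boot enters protected mode when it loads the kernel image, and once loading is complete, just before handing control to the Linux kernel, it takes the CPU back to real mode.
OK, a real-mode CPU can address memory above 1M, but only just under 64K of it, which is clearly not enough to hold the second part of the kernel image.
Addendum: the kernel's interrupt driver architecture:
Reference: Documentation/DocBook/genericirq.tmpl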
Wiki:
wangjianchangdx 2011-10-30 22:41:42
refer to:
http://code.google.com/p/wjcdx-learning/wiki/interrupt