Category: LINUX

2009-12-07 16:49:25

First entry to the Linux kernel 2.4
  • After minimal hardware setup, the bootstrap code in arch/i386/kernel/head.S calls start_kernel().

arch/i386/kernel/head.S

...
call SYMBOL_NAME(start_kernel)
...

  • start_kernel() runs many initialization routines and then starts a kernel thread that executes init().
  • init() calls do_basic_setup() and then starts the first user process, /sbin/init.
  • In do_basic_setup(), the socket layer is initialized (sock_init()), and then the initialization routines built into the kernel are called one after another (do_initcalls()).
init/main.c

asmlinkage void __init start_kernel(void)
{
...
printk(linux_banner); // "linux_banner" is defined in init/version.c (W.N.).
...
... // Dozens of initialize routines
...
kernel_thread(init, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
...
cpu_idle();
}

static int init(void * unused)
{
...
do_basic_setup();
...
execve("/sbin/init",argv_init,envp_init);
...
}

static void __init do_basic_setup(void)
{
...
sock_init(); // net/socket.c (SEE BELOW)
...
do_initcalls();
...
}

static void __init do_initcalls(void)
{
initcall_t *call;

call = &__initcall_start;
do {
(*call)();
call++;
} while (call < &__initcall_end);
...
}

  • Let's look at what sock_init() does.
  • It prints a familiar banner and fills net_families[] with null pointers. At this point, no protocols are registered in the kernel.
net/socket.c

...

/*
* The protocol list. Each protocol is registered in here.
*/

static struct net_proto_family *net_families[NPROTO]; // Current NPROTO is defined as 32
// in (W.N.).
...

void __init sock_init(void)
{
int i;

printk(KERN_INFO "Linux NET4.0 for Linux 2.4\n");
printk(KERN_INFO "Based upon Swansea University Computer Society NET3.039\n");

/*
* Initialize all address (protocol) families.
*/

for (i = 0; i < NPROTO; i++)
net_families[i] = NULL;
...
/*
* Initialize the protocols module.
*/

register_filesystem(&sock_fs_type);
sock_mnt = kern_mount(&sock_fs_type);

/* The real protocol initialization is performed when
* do_initcalls is run.
*/
...
}

initcall array in .initcall.init section

  • To trace what do_initcalls() does, add a short printk() statement to it as follows. The loop shows that there is an array holding the entry address of each initializer; the array starts at __initcall_start and ends just before __initcall_end. I also inserted a similar marker in drivers/net/loopback.c. The loopback device is the simplest network interface, and every kernel includes it.
modified do_initcalls() in init/main.c
static void __init do_initcalls(void)
{
initcall_t *call;

call = &__initcall_start;
do {
printk(KERN_INFO "+++ do_initcall: %08X\n", (unsigned int)call); // Dump the address of the table slot, not of the initializer itself (W.N.).
(*call)();
call++;
} while (call < &__initcall_end);

/* Make sure there is no pending stuff from the initcall sequence */
flush_scheduled_tasks();
}



modified loopback_init() in drivers/net/loopback.c
int __init loopback_init(struct net_device *dev)
{
printk(KERN_INFO "=== Executing loopback_init ===\n");
dev->mtu = PAGE_SIZE - LOOPBACK_OVERHEAD;
dev->hard_start_xmit = loopback_xmit;
dev->hard_header = eth_header;
dev->hard_header_cache = eth_header_cache;
dev->header_cache_update= eth_header_cache_update;
dev->hard_header_len = ETH_HLEN; /* 14 */
dev->addr_len = ETH_ALEN; /* 6 */
...
}

  • Then recompile the kernel, install it, reboot, and check the startup messages. Note that the printed values are the addresses of the table slots, not of the initializers themselves: they start at C029F4E8 and end at C029F574, in steps of 4 bytes (the size of a function pointer on i386). I have added comments at the important points.
  • First, you can see the banner message from sock_init().
  • You can also see an important fact: loopback_init() is called at the end of partition_setup().
  • dummy_init_module() and the network interface initializer (RTL-8139 in my case) follow.
  • Then come inet_init() and af_unix_init().
dmesg on my laptop
Linux version 2.4.3 (root@mebius) (gcc version 2.95.3 20010315 (Debian release)) #9 Tue Apr 3 17:37:44 JST 2001
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000ebc00 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000007ff0000 (usable)
BIOS-e820: 0000000007ff0000 - 0000000007fffc00 (ACPI data)
BIOS-e820: 0000000007fffc00 - 0000000008000000 (ACPI NVS)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
On node 0 totalpages: 32752
zone(0): 4096 pages.
zone(1): 28656 pages.
zone(2): 0 pages.
Kernel command line: root=/dev/hda1 mem=131008K
Initializing CPU#0
Detected 333.350 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 665.19 BogoMIPS
Memory: 126564k/131008k available (1076k kernel code, 4056k reserved, 387k data, 184k init, 0k highmem)
Dentry-cache hash table entries: 16384 (order: 5, 131072 bytes)
Buffer-cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
CPU: Before vendor init, caps: 0183f9ff 00000000 00000000, vendor = 0
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: After vendor init, caps: 0183f9ff 00000000 00000000 00000000
CPU: After generic, caps: 0183f9ff 00000000 00000000 00000000
CPU: Common caps: 0183f9ff 00000000 00000000 00000000
CPU: Intel Mobile Pentium II stepping 0a
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfd9be, last bus=0
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
got res[10000000:10000fff] for resource 0 of Ricoh Co Ltd RL5c475
Limiting direct PCI/PCI transfers.
Linux NET4.0 for Linux 2.4 // Message from sock_init()
Based upon Swansea University Computer Society NET3.039

+++ do_initcall: C029F4E8 // do_initcalls() START
+++ do_initcall: C029F4EC
+++ do_initcall: C029F4F0
// apm_init() in arch/i386/kernel/kernel.o
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.14)
+++ do_initcall: C029F4F4
+++ do_initcall: C029F4F8
+++ do_initcall: C029F4FC
// kswapd_init() in mm/mm.o
Starting kswapd v1.8
+++ do_initcall: C029F500
+++ do_initcall: C029F504
+++ do_initcall: C029F508
+++ do_initcall: C029F50C
+++ do_initcall: C029F510
+++ do_initcall: C029F514
+++ do_initcall: C029F518
+++ do_initcall: C029F51C
+++ do_initcall: C029F520
+++ do_initcall: C029F524
+++ do_initcall: C029F528
// partition_setup() in fs/fs.o
pty: 256 Unix98 ptys configured
block: queued sectors max/low 84058kB/28019kB, 256 slots per queue
RAMDISK driver initialized: 16 RAM disks of 8000K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller on PCI bus 00 dev 39
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfc90-0xfc97, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xfc98-0xfc9f, BIOS settings: hdc:pio, hdd:pio
hda: TOSHIBA MK8113MAT, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: 16006410 sectors (8195 MB), CHS=996/255/63, UDMA(33)
Partition check:
hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 >
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
=== Executing loopback_init === // loopback initialization is here!
+++ do_initcall: C029F52C // init_ext2_fs() in fs/fs.o
+++ do_initcall: C029F530
+++ do_initcall: C029F534
+++ do_initcall: C029F538
+++ do_initcall: C029F53C
+++ do_initcall: C029F540

loop: loaded (max 8 devices)
+++ do_initcall: C029F544
Serial driver version 5.05 (2000-12-13) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
+++ do_initcall: C029F548 // dummy_init_module() in drivers/net/net.o
+++ do_initcall: C029F54C // rtl8139_init_module() in drivers/net/net.o
8139too Fast Ethernet driver 0.9.15c loaded
PCI: Found IRQ 9 for device 00:03.0
PCI: The same IRQ used for device 00:07.2
eth0: RealTek RTL8139 Fast Ethernet at 0xc8800c00, 08:00:1f:06:79:20, IRQ 9
eth0: Identified 8139 chip type 'RTL-8139B'
+++ do_initcall: C029F550
+++ do_initcall: C029F554
+++ do_initcall: C029F558
+++ do_initcall: C029F55C
+++ do_initcall: C029F560
+++ do_initcall: C029F564 // inet_init() in net/network.o
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 8192)

+++ do_initcall: C029F568 // af_unix_init() in net/network.o
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
+++ do_initcall: C029F56C
+++ do_initcall: C029F570
+++ do_initcall: C029F574 // atalk_init() in net/network.o
NET4: AppleTalk 0.18a for Linux NET4.0 // do_initcalls() END
fatfs: bogus cluster size
reiserfs: checking transaction log (device 03:01) ...
Using r5 hash to sort names
ReiserFS version 3.6.25
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 184k freed
Adding Swap: 128516k swap-space (priority -1)
eth0: Setting half-duplex based on auto-negotiated partner ability 0000.

initcalls mechanism

  • Now it is time to understand how initcalls are defined.
  • Let's focus on inet_init() in net/network.o.
  • Running "cd net" and then "grep -r inet_init *" tells us that inet_init() is defined in net/ipv4/af_inet.c.
net/ipv4/af_inet.c
static int __init inet_init(void)
{
...
printk(KERN_INFO "NET4: Linux TCP/IP 1.0 for NET4.0\n");
...
}

module_init(inet_init);

  • This is where the "NET4: Linux TCP/IP 1.0 for NET4.0" startup message comes from.
  • Note the __init and module_init() macros; both are defined in include/linux/init.h.
include/linux/init.h
#ifndef MODULE
#ifndef __ASSEMBLY__
...
typedef int (*initcall_t)(void);
...
extern initcall_t __initcall_start, __initcall_end;
#define __initcall(fn) \
static initcall_t __initcall_##fn __init_call = fn
...
#endif /* __ASSEMBLY__ */

/*
* Mark functions and data as being only used at initialization
* or exit time.
*/
#define __init __attribute__ ((__section__ (".text.init")))
...
#define __init_call __attribute__ ((unused,__section__ (".initcall.init")))
...
/**
* module_init() - driver initialization entry point
* @x: function to be run at kernel boot time or module insertion
*
* module_init() will add the driver initialization routine in
* the "__initcall.int" code segment if the driver is checked as
* "y" or static, or else it will wrap the driver initialization
* routine with init_module() which is used by insmod and
* modprobe when the driver is used as a module.
*/
#define module_init(x) __initcall(x);
...
#else // MODULE
...
#define __init
...
#define __initcall(fn)
...
#define module_init(x) \
int init_module(void) __attribute__((alias(#x))); \
extern inline __init_module_func_t __init_module_inline(void) \
{ return x; }
...
#endif // MODULE

  • init.h behaves differently depending on the MODULE macro; the Makefile defines it (-DMODULE) when a file is compiled as a module.
  • In this kernel, the CONFIG_INET option cannot be set to module ("M"); it is always compiled into the kernel ("y").
  • So, after preprocessing, af_inet.c becomes the following.
net/ipv4/af_inet.c
static int __attribute__ ((__section__ (".text.init"))) inet_init(void)
{
...
printk(KERN_INFO "NET4: Linux TCP/IP 1.0 for NET4.0\n");
...
}

static initcall_t __initcall_inet_init __attribute__ ((unused,__section__ (".initcall.init"))) = inet_init;

  • These expansions mean that
    • The code of inet_init() is placed in the .text.init section. Collecting initializers there allows the kernel to free their memory once startup is finished (see "Freeing unused kernel memory: 184k freed" in dmesg).
    • A new variable, __initcall_inet_init, which holds the entry address of inet_init(), is placed in the .initcall.init section. Note that the macro in init.h declares this variable static, so it is a local symbol and does not appear by name in the link map (the linker lists only global symbols there).
  • For hacking purposes, remove the static keyword so that the __initcall_*** variables become global. CAUTION: this change can cause link errors when two files define initcalls with the same name (netfilter is one such case).
  • OK, let's hack the kernel!
modified include/linux/init.h
...
#define __initcall(fn) \
initcall_t __initcall_##fn __init_call = fn
...

How to inspect the kernel?

  • The Linux kernel is nothing special: it is just an ELF object file, like /bin/ls.
  • So you can examine the kernel (vmlinux) with nm, objdump, readelf, and so on.
  • By default, the top-level Makefile creates a System.map file for debugging, but it is only a list of symbols and their addresses. So I modified the Makefile to also produce a more informative link map, using the linker options "--cref -Map linux.map".
  • Now "make vmlinux" builds vmlinux and also writes linux.map.
/usr/src/linux/Makefile

vmlinux: $(CONFIGURATION) init/main.o init/version.o linuxsubdirs
	$(LD) $(LINKFLAGS) $(HEAD) init/main.o init/version.o \
		--start-group \
		$(CORE_FILES) \
		$(DRIVERS) \
		$(NETWORKS) \
		$(LIBS) \
		--end-group \
		--cref -Map linux.map \
		-o vmlinux
	$(NM) vmlinux | grep -v '\(compiled\)\|\(\.o$$\)\|\( [aUw] \)\|\(\.\.ng$$\)\|\(LASH[RL]DI\)' | sort > System.map

  • Next, check the sections in vmlinux with objdump; the -h option prints the section headers.
  • As expected, there are ".text.init" and ".initcall.init" sections, as described above.
  • Let's examine the contents of the .initcall.init section.
objdump -h vmlinux
vmlinux:     file format elf32-i386


Sections:
Idx Name                     Size      VMA       LMA       File off  Algn
  0 .text                    0010bf68  c0100000  c0100000  00001000  2**4
                             CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text.lock               00001130  c020bf68  c020bf68  0010cf68  2**2
                             CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata                  0004407c  c020d0a0  c020d0a0  0010e0a0  2**5
                             CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .kstrtab                 000062fe  c0251120  c0251120  00152120  2**5
                             CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 __ex_table               00001418  c0257420  c0257420  00158420  2**2
                             CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 __ksymtab                00001d68  c0258838  c0258838  00159838  2**2
                             CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .data                    00013abc  c025a5a0  c025a5a0  0015b5a0  2**5
                             CONTENTS, ALLOC, LOAD, DATA
  7 .data.init_task          00002000  c0270000  c0270000  00170000  2**5
                             CONTENTS, ALLOC, LOAD, DATA
  8 .text.init               0000f56c  c0272000  c0272000  00172000  2**4
                             CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .data.init               0001de60  c0281580  c0281580  00181580  2**5
                             CONTENTS, ALLOC, LOAD, DATA
 10 .setup.init              00000108  c029f3e0  c029f3e0  0019f3e0  2**2
                             CONTENTS, ALLOC, LOAD, DATA
 11 .initcall.init           00000090  c029f4e8  c029f4e8  0019f4e8  2**2
                             CONTENTS, ALLOC, LOAD, DATA
 12 .data.page_aligned       00000800  c02a0000  c02a0000  001a0000  2**5
                             CONTENTS, ALLOC, LOAD, DATA
 13 .data.cacheline_aligned  00001fe0  c02a0800  c02a0800  001a0800  2**5
                             CONTENTS, ALLOC, LOAD, DATA
 14 .bss                     0002b3d8  c02a27e0  c02a27e0  001a27e0  2**5
                             ALLOC
 15 .comment                 00003bc9  00000000  00000000  001a27e0  2**0
                             CONTENTS, READONLY
 16 .note                    00001a90  00000000  00000000  001a63a9  2**0
                             CONTENTS, READONLY

  • The following is a part of linux.map.
  • __initcall_start and __initcall_end, which appear in do_initcalls(), are defined at the start and end of this section.
  • Their addresses are C029F4E8 and C029F578, respectively.
  • Because I removed the static keyword from the __initcall() macro, the __initcall_*** variables are now visible. With this link map, we can trace every step of do_initcalls() during boot.
  • TIP: Every kernel hacker greps, greps, and greps again, but this is time-consuming. linux.map lists every global symbol, including function entry points (name and address), together with the object file that contains it. Why not try "less linux.map"?
linux.map

0xc029f4e8 __initcall_start=.

.initcall.init 0xc029f4e8 0x90
*(.initcall.init)
.initcall.init
0xc029f4e8 0xc arch/i386/kernel/kernel.o
0xc029f4e8 __initcall_dmi_scan_machine
0xc029f4ec __initcall_cpuid_init
0xc029f4f0 __initcall_apm_init
.initcall.init
0xc029f4f4 0x4 kernel/kernel.o
0xc029f4f4 __initcall_uid_cache_init
.initcall.init
0xc029f4f8 0xc mm/mm.o
0xc029f4f8 __initcall_kmem_cpucache_init
0xc029f4fc __initcall_kswapd_init
0xc029f500 __initcall_init_shmem_fs
.initcall.init
0xc029f504 0x3c fs/fs.o
0xc029f504 __initcall_bdflush_init
0xc029f508 __initcall_init_pipe_fs
0xc029f50c __initcall_fasync_init
0xc029f510 __initcall_filelock_init
0xc029f514 __initcall_dnotify_init
0xc029f518 __initcall_init_misc_binfmt
0xc029f51c __initcall_init_script_binfmt
0xc029f520 __initcall_init_elf_binfmt
0xc029f524 __initcall_init_proc_fs
0xc029f528 __initcall_partition_setup
0xc029f52c __initcall_init_ext2_fs
0xc029f530 __initcall_init_fat_fs
0xc029f534 __initcall_init_msdos_fs
0xc029f538 __initcall_init_iso9660_fs
0xc029f53c __initcall_init_reiserfs_fs
.initcall.init
0xc029f540 0x4 drivers/block/block.o
0xc029f540 __initcall_loop_init
.initcall.init
0xc029f544 0x4 drivers/char/char.o
0xc029f544 __initcall_rs_init
.initcall.init
0xc029f548 0x8 drivers/net/net.o
0xc029f548 __initcall_dummy_init_module
0xc029f54c __initcall_rtl8139_init_module
.initcall.init
0xc029f550 0x4 drivers/ide/idedriver.o
0xc029f550 __initcall_ide_cdrom_init
.initcall.init
0xc029f554 0x4 drivers/cdrom/driver.o
0xc029f554 __initcall_cdrom_init
.initcall.init
0xc029f558 0x4 drivers/pci/driver.o
0xc029f558 __initcall_pci_proc_init
.initcall.init
0xc029f55c 0x1c net/network.o
0xc029f55c __initcall_p8022_init
0xc029f560 __initcall_snap_init
0xc029f564 __initcall_inet_init
0xc029f568 __initcall_af_unix_init
0xc029f56c __initcall_netlink_proto_init
0xc029f570 __initcall_packet_init
0xc029f574 __initcall_atalk_init
0xc029f578 __initcall_end=.
0xc02a0000 .=ALIGN(0x1000)
0xc029f578 __init_end=.
0xc02a0000 .=ALIGN(0x1000)

  • We have now learned several important things.
    1. The first network-related code to run is sock_init().
    2. The loopback device is initialized within partition_setup().
    3. The dummy and Ethernet devices are then initialized.
    4. TCP/IP is initialized in inet_init().
    5. UNIX domain sockets are initialized in af_unix_init().
  • I will discuss each of these subjects later (please be patient).