
Category: Linux

2009-04-25 21:31:58

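1. Boot
2. Kernel initialization
3. init starts and runs according to inittab
4. Once the series of system processes has been created, login is started
X is then launched; at this point you enter your user name and password to log in to Linux.
Appendix 1: dmesg output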
Linux version 2.4.20-8 (bhcompile@porky.devel.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 Thu Mar 13 17:54:28 EST 2003
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000ca000 - 00000000000cc000 (reserved)
 BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000007ef0000 (usable)
 BIOS-e820: 0000000007ef0000 - 0000000007efc000 (ACPI data)
 BIOS-e820: 0000000007efc000 - 0000000007f00000 (ACPI NVS)
 BIOS-e820: 0000000007f00000 - 0000000008000000 (usable)
 BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
128MB LOWMEM available.
On node 0 totalpages: 32768
zone(0): 4096 pages.
zone(1): 28672 pages.
zone(2): 0 pages.
Kernel command line: ro root=LABEL=/
Initializing CPU#0
Detected 697.571 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1395.91 BogoMIPS
Memory: 124588k/131072k available (1347k kernel code, 5008k reserved, 999k data, 132k init, 0k highmem)
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 8192 (order: 3, 32768 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU serial number disabled.
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 0383fbff 00000000 00000000 00000000
CPU:             Common caps: 0383fbff 00000000 00000000 00000000
CPU: Intel Pentium III (Coppermine) stepping 08
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfd9a0, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Using IRQ router PIIX [8086/7110] at 00:07.0
PCI: Cannot allocate resource region 4 of device 00:07.1
Limiting direct PCI/PCI transfers.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16)
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
NET4: Frame Diverter 0.46
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00beta-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
    ide1: BM-DMA at 0x1478-0x147f, BIOS settings: hdc:DMA, hdd:pio
hdc: VMware Virtual IDE CDROM Drive, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
ide-floppy driver 0.99.newide
ide-floppy driver 0.99.newide
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 1024 buckets, 8Kbytes
TCP: Hash tables configured (established 8192 bind 16384)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: Compressed image found at block 0
Freeing initrd memory: 247k freed
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 11 for device 00:10.0
scsi: ***** BusLogic SCSI Driver Version 2.1.15 of 17 August 1998 *****
scsi: Copyright 1995-1998 by Leonard N. Zubkoff
scsi0: Configuring BusLogic Model BT-958 PCI Wide Ultra SCSI Host Adapter
scsi0:   Firmware Version: 5.07B, I/O Address: 0x1440, IRQ Channel: 11/Level
scsi0:   PCI Bus: 0, Device: 16, Address: 0xE8800000, Host Adapter SCSI ID: 7
scsi0:   Parity Checking: Enabled, Extended Translation: Enabled
scsi0:   Synchronous Negotiation: Ultra, Wide Negotiation: Enabled
scsi0:   Disconnect/Reconnect: Enabled, Tagged Queuing: Enabled
scsi0:   Scatter/Gather Limit: 128 of 8192 segments, Mailboxes: 211
scsi0:   Driver Queue Depth: 211, Host Adapter Queue Depth: 192
scsi0:   Tagged Queue Depth: Automatic, Untagged Queue Depth: 3
scsi0:   Error Recovery Strategy: Default, SCSI Bus Reset: Enabled
scsi0: *** BusLogic BT-958 Initialized Successfully ***
scsi0 : BusLogic BT-958
  Vendor: VMware,   Model: VMware Virtual S  Rev: 1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02
scsi0: Target 0: Queue Depth 28, Asynchronous
scsi0: Target 1: Queue Depth 3, Asynchronous
scsi0: Target 2: Queue Depth 3, Asynchronous
scsi0: Target 3: Queue Depth 3, Asynchronous
scsi0: Target 4: Queue Depth 3, Asynchronous
scsi0: Target 5: Queue Depth 3, Asynchronous
scsi0: Target 6: Queue Depth 3, Asynchronous
scsi0: Target 7: Queue Depth 3, Asynchronous
scsi0: Target 8: Queue Depth 3, Asynchronous
scsi0: Target 9: Queue Depth 3, Asynchronous
scsi0: Target 10: Queue Depth 3, Asynchronous
scsi0: Target 11: Queue Depth 3, Asynchronous
scsi0: Target 12: Queue Depth 3, Asynchronous
scsi0: Target 13: Queue Depth 3, Asynchronous
scsi0: Target 14: Queue Depth 3, Asynchronous
scsi0: Target 15: Queue Depth 3, Asynchronous
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 18874368 512-byte hdwr sectors (9664 MB)
Partition check:
 sda: sda1 sda2 sda3
Journalled Block Device driver loaded
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Freeing unused kernel memory: 132k freed
scsi0: Tagged Queuing now active for Target 0
usb.c: registered new driver usbdevfs
usb.c: registered new driver hub
usb-uhci.c: $Revision: 1.275 $ time 17:59:01 Mar 13 2003
usb-uhci.c: High bandwidth mode enabled
PCI: Found IRQ 9 for device 00:07.2
PCI: Sharing IRQ 9 with 00:12.0
usb-uhci.c: USB UHCI at I/O 0x1060, IRQ 9
usb-uhci.c: Detected 2 ports
usb.c: new USB bus registered, assigned bus number 1
hub.c: USB hub found
hub.c: 2 ports detected
usb-uhci.c: v1.275:USB Universal Host Controller Interface driver
usb.c: registered new driver hiddev
usb.c: registered new driver hid
hid-core.c: v1.8.1 Andreas Gal, Vojtech Pavlik
hid-core.c: USB HID support drivers
mice: PS/2 mouse device common for all mice
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,2), internal journal
Adding Swap: 257032k swap-space (priority -1)
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
IA-32 Microcode Update Driver: v1.11
microcode: CPU0 no microcode found! (sig=688, pflags=1)
parport0: PC-style at 0x378 [PCSPP,TRISTATE]
ip_tables: (C) 2000-2002 Netfilter core team
VMware vmxnet virtual NIC driver release 3.1.0 build-34685
PCI: Found IRQ 10 for device 00:11.0
Found vmxnet/PCI at 0x10a4, irq 10.
vmxnet: numRxBuffers=(100*24) numTxBuffers=(100*64) driverDataSize=9000
divert: allocating divert_blk for eth0
eth0: vmxnet ether at 0x10a4 assigned IRQ 10.
pcnet32.c:v1.27b 01.10.2002 tsbogend@alpha.franken.de
vmxnet_init_ring: offset=9000 length=9000
ide-floppy driver 0.99.newide
hdc: ATAPI 20X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
parport0: PC-style at 0x378 [PCSPP,TRISTATE]
lp0: using parport0 (polling).
lp0: console ready
mtrr: your processor doesn't support write-combining
cdrom: This disc doesn't have any tracks I recognize!
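
The BIOS-e820 lines near the top of this log describe the physical memory map handed over by the BIOS. As a rough sketch only (the names `parse_e820` and `usable_bytes` and the regular expression are my own, not anything from the kernel), the regions marked usable can be summed like this, assuming each end address is exclusive:

```python
import re

# Matches lines such as:
#  BIOS-e820: 0000000000100000 - 0000000007ef0000 (usable)
E820_LINE = re.compile(
    r"BIOS-e820:\s*([0-9a-fA-F]+)\s*-\s*([0-9a-fA-F]+)\s*\((.+?)\)"
)

def parse_e820(log_text):
    """Return a list of (start, end, kind) tuples found in dmesg text."""
    regions = []
    for line in log_text.splitlines():
        m = E820_LINE.search(line)
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions

def usable_bytes(regions):
    """Sum the sizes of all regions marked 'usable' (end treated as exclusive)."""
    return sum(end - start for start, end, kind in regions if kind == "usable")

# A subset of the BIOS-e820 lines from the log above.
SAMPLE = """\
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000007ef0000 (usable)
 BIOS-e820: 0000000007ef0000 - 0000000007efc000 (ACPI data)
 BIOS-e820: 0000000007efc000 - 0000000007f00000 (ACPI NVS)
 BIOS-e820: 0000000007f00000 - 0000000008000000 (usable)
"""

if __name__ == "__main__":
    regions = parse_e820(SAMPLE)
    print(usable_bytes(regions) // 1024, "KiB usable")
```

For these lines the three usable regions add up to 130622 KiB, a little under the 131072k total reported in the log's "Memory:" line.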
Appendix 2: partial contents of System.map
c0100000 A _text
c0100000 t startup_32
c01000a5 t checkCPUtype
c0100133 t is486
c0100142 t is386
c010018c t L6
c010018e t ready
c010018f t check_x87
c01001b6 t setup_idt
c01001d3 t rp_sidt
c01001e0 T stack_start
c01001e8 t int_msg
c01001fc t ignore_int
c010021e T idt_descr
c0100224 T cpu_gdt_descr
c0101000 T swapper_pg_dir
c0102000 T pg0
c0103000 T pg1
c0104000 T empty_zero_page
c0105000 T _stext
c0105000 T stext
c0105000 t rest_init
c0105040 t init
c0105170 t do_linuxrc
c0105270 T prepare_namespace
c01053c0 t huft_build
c0105900 t huft_free
c0105930 t inflate_codes
c0105e10 t inflate_stored
c0105fc0 t inflate_fixed
c0106140 t inflate_dynamic
c0106760 t inflate_block
c0106870 t inflate
c0106920 t makecrc
c01069b0 t gunzip
c0106f80 t __constant_c_and_count_memset
c0107020 T disable_hlt
c0107030 T enable_hlt
c0107040 T default_idle
c0107080 t poll_idle
c01070b0 T cpu_idle
c0107110 T machine_real_restart
c01071d0 T machine_restart
c0107250 T machine_halt
c0107260 T machine_power_off
c0107280 T show_regs
c0107428 t kernel_thread_helper
c0107440 T arch_kernel_thread
c0107500 T exit_thread
c0107510 T flush_thread
c0107590 T release_thread
c01075e0 T copy_thread
c01077a0 T dump_thread
c01078c0 T dump_task_regs
c0107970 T __switch_to
c0107aa0 T sys_fork
c0107af0 T sys_clone
c0107b60 T sys_vfork
c0107bb0 T sys_execve
c0107c30 T get_wchan
c0107cb0 t get_free_idx
c0107cf0 T sys_set_thread_area
c0107e90 T sys_get_thread_area
c0107fd0 t __constant_memcpy
c01080e0 t __constant_c_and_count_memset
c0108180 t __constant_copy_to_user
c0108200 t __constant_copy_from_user
c01082a0 T __up
c01082c0 T __down
c0108370 T __down_interruptible
c0108440 T __down_trylock
c010847c T __down_failed
c0108488 T __down_failed_interruptible
c0108494 T __down_failed_trylock
c01084a0 T __up_wakeup
c01084b0 T copy_siginfo_to_user
c0108570 T sys_sigsuspend
c01085f0 T sys_rt_sigsuspend
c01086c0 T sys_sigaction
c01087f0 T sys_sigaltstack
c0108820 t restore_sigcontext
c0108960 T sys_sigreturn
c0108a40 T sys_rt_sigreturn
c0108b50 t setup_sigcontext
c0108c80 t setup_frame
c0108e90 t setup_rt_frame
c0109170 t handle_signal
c0109290 T do_signal
c0109370 t sigorsets
c01093b0 t __constant_copy_from_user
c010944c T lcall7
c010949c T lcall27
c01094ec T ret_from_fork
c0109504 T system_call
c010953c T ret_from_sys_call
c010954d t restore_all
c010955c t signal_return
c0109574 t v86_signal_return
c0109584 t tracesys
c01095a7 t tracesys_exit
c01095b1 t badsys
c01095c0 T ret_from_intr
c01095c7 t ret_from_exception
c01095e0 t reschedule
c01095ec T divide_error
c01095f4 t error_code
c0109630 T coprocessor_error
c010963c T simd_coprocessor_error
c0109648 T device_not_available
c0109678 t device_not_available_emulate
c0109688 T debug
c0109694 T nmi
c01096c4 T int3
c01096d0 T overflow
c01096dc T bounds
c01096e8 T invalid_op
c01096f4 T coprocessor_segment_overrun
c0109700 T double_fault
c010970c T invalid_TSS
c0109718 T segment_not_present
c0109724 T stack_segment
c0109730 T general_protection
c010973c T alignment_check
c0109748 T page_fault
c0109754 T machine_check
c0109760 T spurious_interrupt_bug
c0109770 T show_trace
c0109860 T show_trace_task
c0109880 T show_stack
c0109910 T dump_stack
c0109930 T show_registers
c0109ae0 t handle_BUG
c0109b90 T die
c0109c10 T do_divide_error
c0109c80 T do_int3
c0109cd0 T do_overflow
c0109d20 T do_bounds
c0109d70 T do_invalid_op
c0109de0 T do_device_not_available
c0109e30 T do_double_fault
c0109e80 T do_coprocessor_segment_overrun
c0109ed0 T do_invalid_TSS
c0109f20 T do_segment_not_present
c0109f70 T do_stack_segment
c0109fc0 T do_alignment_check
c010a030 T do_general_protection
c010a0e0 t mem_parity_error
c010a120 t io_check_error
c010a180 t unknown_nmi_error
c010a1c0 t default_do_nmi
c010a240 t dummy_nmi_callback
c010a250 T do_nmi
c010a290 T set_nmi_callback
c010a2a0 T unset_nmi_callback
c010a2b0 T do_debug
c010a3e0 T math_error
c010a500 T do_coprocessor_error
c010a520 T simd_math_error
c010a600 T do_simd_coprocessor_error
c010a6a0 T do_spurious_interrupt_bug
c010a6b0 T math_state_restore
c010a6f0 T math_emulate
c010a740 T set_intr_gate
c010a770 t do_trap
c010a890 T no_action
c010a8a0 t enable_none
c010a8b0 t startup_none
c010a8c0 t disable_none
c010a8d0 t ack_none
c010a8f0 T get_irq_list
c010aa70 T handle_IRQ_event
c010aae0 T disable_irq
c010ab40 T enable_irq
c010abd0 T do_IRQ
c010acc0 T request_irq
c010ada0 T free_irq
c010ae30 T probe_irq_on
c010af70 T probe_irq_mask
c010aff0 T probe_irq_off
c010b060 T setup_irq
c010b100 t parse_hex_value
c010b1b0 t prof_cpu_mask_read_proc
c010b1e0 t prof_cpu_mask_write_proc
c010b220 t register_irq_proc
c010b2b0 T init_irq_proc
c010b330 T disable_irq_nosync
c010b380 t __constant_c_and_count_memset
c010b41a t .text.lock.irq
c010b440 T save_v86_state
c010b550 t mark_screen_rdonly
c010b620 T sys_vm86old
c010b710 T sys_vm86
c010b820 t do_sys_vm86
c010b930 t do_int
c010bc20 T handle_vm86_trap
c010bd00 T handle_vm86_fault
c010c5e0 t irq_handler
c010c650 T release_x86_irqs
c010c6b0 t do_vm86_irq_handling
c010c9b0 t __constant_c_and_count_memset
c010ca50 t __constant_copy_to_user
c010cad0 t __constant_copy_from_user
c010cb70 t putreg
c010cc40 t getreg
c010cca0 T ptrace_disable
c010ccd0 t ptrace_get_thread_area
c010cdd0 t ptrace_set_thread_area
c010cf20 T sys_ptrace
c010d5a0 T syscall_trace
c010d630 t __constant_copy_to_user
c010d6b0 t __constant_copy_from_user
c010d750 t common_interrupt
c010d763 t call_do_IRQ
c010d770 t IRQ0x00_interrupt
c010d778 t IRQ0x01_interrupt
c010d780 t IRQ0x02_interrupt
c010d788 t IRQ0x03_interrupt
c010d790 t IRQ0x04_interrupt
c010d798 t IRQ0x05_interrupt
c010d7a0 t IRQ0x06_interrupt
c010d7a8 t IRQ0x07_interrupt
c010d7b0 t IRQ0x08_interrupt
c010d7b8 t IRQ0x09_interrupt
c010d7c0 t IRQ0x0a_interrupt
c010d7c8 t IRQ0x0b_interrupt
c010d7d0 t IRQ0x0c_interrupt
c010d7dc t IRQ0x0d_interrupt
c010d7e8 t IRQ0x0e_interrupt
c010d7f4 t IRQ0x0f_interrupt
c010d800 t end_8259A_irq
c010d820 t startup_8259A_irq
c010d840 T disable_8259A_irq
c010d870 T enable_8259A_irq
c010d8a0 T i8259A_irq_pending
c010d8d0 T make_8259A_irq
c010d920 T mask_and_ack_8259A
c010d9f0 t math_error_irq
c010da20 t set_bitmap
c010dac0 T sys_ioperm
c010dbe0 T sys_iopl
c010dc60 t __constant_c_and_count_memset
c010dd00 t alloc_ldt
c010de90 T init_new_context
c010df90 T destroy_context
c010dfe0 t read_ldt
c010e0d0 t read_default_ldt
c010e120 t write_ldt
c010e320 T sys_modify_ldt
c010e3a0 t __constant_copy_from_user
c010e43b t .text.lock.ldt
c010e480 t show_cpuinfo
c010e6a0 t c_start
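
Each System.map line is an address, a symbol type (following the `nm` convention: `T`/`t` for global/local text symbols, `A` for absolute), and a name. One common use of the file is mapping a raw kernel address back to the nearest preceding symbol. The following is only a minimal sketch of that lookup; the helper names `parse_system_map` and `lookup` are hypothetical:

```python
import bisect

def parse_system_map(text):
    """Parse 'address type name' lines into a list of (addr, type, name), sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or damaged lines
        addr, sym_type, name = parts
        symbols.append((int(addr, 16), sym_type, name))
    symbols.sort(key=lambda s: s[0])
    return symbols

def lookup(symbols, address):
    """Return (name, offset) for the last symbol at or below address, or None."""
    addrs = [s[0] for s in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    return symbols[i][2], address - symbols[i][0]

# Four lines taken from the listing above.
SAMPLE = """\
c0107aa0 T sys_fork
c0107af0 T sys_clone
c0107b60 T sys_vfork
c0107bb0 T sys_execve
"""

if __name__ == "__main__":
    syms = parse_system_map(SAMPLE)
    name, off = lookup(syms, 0xC0107B04)
    print(f"{name}+{off:#x}")
```

With the sample lines, address 0xc0107b04 resolves to `sys_clone+0x14`. The sketch does not know where a function ends, so an address past the last symbol is still attributed to that symbol.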
Appendix 3: kernel startup
The previous post explained how computers boot up right up to the point where the boot loader, after stuffing the kernel image into memory, is about to jump into the kernel entry point. This last post about booting takes a look at the guts of the kernel to see how an operating system starts life. Since I have an empirical bent I'll link heavily to the sources for Linux kernel 2.6.25.6 at the Linux Cross Reference. The sources are very readable if you are familiar with C-like syntax; even if you miss some details you can get the gist of what's happening. The main obstacle is the lack of context around some of the code, such as when or why it runs or the underlying features of the machine. I hope to provide a bit of that context. Due to brevity (hah!) a lot of fun stuff - like interrupts and memory - gets only a nod for now. The post ends with the highlights for the Windows boot.

At this point in the Intel x86 boot story the processor is running in real-mode, is able to address 1 MB of memory, and RAM looks like this for a modern Linux system:

[Figure: RAM contents after boot loader is done]

The kernel image has been loaded to memory by the boot loader using the BIOS disk I/O services. This image is an exact copy of the file on your hard drive that contains the kernel, e.g. /boot/vmlinuz-2.6.22-14-server. The image is split into two pieces: a small part containing the real-mode kernel code is loaded below the 640K barrier; the bulk of the kernel, which runs in protected mode, is loaded after the first megabyte of memory.

The action starts in the real-mode kernel header pictured above. This region of memory is used to implement the Linux boot protocol between the boot loader and the kernel. Some of the values there are read by the boot loader while doing its work. These include amenities such as a human-readable string containing the kernel version, but also crucial information like the size of the real-mode kernel piece. The boot loader also writes values to this region, such as the memory address for the command-line parameters given by the user in the boot menu. Once the boot loader is finished it has filled in all of the parameters required by the kernel header. It's then time to jump into the kernel entry point. The diagram below shows the code sequence for the kernel initialization, along with source directories, files, and line numbers:

[Figure: Architecture-specific Linux Kernel Initialization]

The early kernel start-up for the Intel architecture is in file arch/x86/boot/header.S. It's in assembly language, which is rare for the kernel at large but common for boot code. The start of this file actually contains boot sector code, a leftover from the days when Linux could work without a boot loader. Nowadays this boot sector, if executed, only prints a "bugger_off_msg" to the user and reboots. Modern boot loaders ignore this legacy code. After the boot sector code we have the first 15 bytes of the real-mode kernel header; these two pieces together add up to 512 bytes, the size of a typical disk sector on Intel hardware.

After these 512 bytes, at offset 0x200, we find the very first instruction that runs as part of the Linux kernel: the real-mode entry point. It's in header.S:110 and it is a 2-byte jump written directly in machine code as 0x3aeb. You can verify this by running hexdump on your kernel image and seeing the bytes at that offset - just a sanity check to make sure it's not all a dream. The boot loader jumps into this location when it is finished, which in turn jumps to header.S:229 where we have a regular assembly routine called start_of_setup. This short routine sets up a stack, zeroes the bss segment (the area that contains static variables, so they start with zero values) for the real-mode kernel and then jumps to good old C code at arch/x86/boot/main.c:122.
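
The hexdump check can also be scripted. This is a sketch under assumptions: the function name `decode_entry_jump` is mine, and it relies only on the facts that the word 0x3aeb is stored little-endian (bytes EB 3A) and that EB is the x86 opcode for a short relative jump. The displacement byte may differ in other kernel versions.

```python
ENTRY_OFFSET = 0x200  # offset of the real-mode entry point, per the text above
JMP_SHORT = 0xEB      # x86 opcode for a short (8-bit relative) jump

def decode_entry_jump(image):
    """Return the target offset of the short jump at ENTRY_OFFSET, or None if absent."""
    if len(image) < ENTRY_OFFSET + 2:
        return None
    opcode, rel = image[ENTRY_OFFSET], image[ENTRY_OFFSET + 1]
    if opcode != JMP_SHORT:
        return None
    # The displacement is a signed byte, relative to the end of the 2-byte instruction.
    if rel >= 0x80:
        rel -= 0x100
    return ENTRY_OFFSET + 2 + rel

if __name__ == "__main__":
    # Synthetic stand-in for a kernel image: 512 filler bytes, then EB 3A.
    fake_image = bytes(ENTRY_OFFSET) + bytes([0xEB, 0x3A])
    print(hex(decode_entry_jump(fake_image)))
```

To try it on a real image, read the first 0x202 bytes of the kernel file in binary mode and pass them in; the synthetic image here yields a jump target of 0x23c.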

main() does some housekeeping like detecting memory layout, setting a video mode, etc. It then calls go_to_protected_mode(). Before the CPU can be set to protected mode, however, a few tasks must be done. There are two main issues: interrupts and memory. In real-mode the interrupt vector table for the processor is always at memory address 0, whereas in protected mode the location of the interrupt vector table is stored in a CPU register called IDTR. Meanwhile, the translation of logical memory addresses (the ones programs manipulate) to linear memory addresses (a raw number from 0 to the top of the memory) is different between real-mode and protected mode. Protected mode requires a register called GDTR to be loaded with the address of a Global Descriptor Table for memory. So go_to_protected_mode() calls setup_idt() and setup_gdt() to install a temporary interrupt descriptor table and global descriptor table.
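
To make the difference in address translation concrete, here is a deliberately simplified model. Both function names are hypothetical, the descriptor table is reduced to a plain list of base addresses, and segment limits, privilege checks and the A20 line are ignored.

```python
def real_mode_linear(segment, offset):
    """Real mode: linear address = segment * 16 + offset (both 16-bit values)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    return (segment << 4) + offset

def protected_mode_linear(gdt_bases, selector, offset):
    """Protected mode: the selector indexes a descriptor whose base is added to the offset.

    gdt_bases is a simplified stand-in for the GDT: a list of segment base addresses.
    """
    index = selector >> 3  # low 3 bits hold the table indicator and privilege level
    return gdt_bases[index] + offset

if __name__ == "__main__":
    # Real mode: 0x07C0:0x0000 is linear address 0x7C00.
    print(hex(real_mode_linear(0x07C0, 0x0000)))
    # Protected mode with flat segments (base 0): the offset is the linear address.
    flat_gdt = [0, 0, 0]
    print(hex(protected_mode_linear(flat_gdt, 0x10, 0x100000)))
```

With flat segments, as in the second call, the offset passes through unchanged, which is why protected-mode code can treat addresses as plain 32-bit numbers.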

We're now ready for the plunge into protected mode, which is done by protected_mode_jump, another assembly routine. This routine enables protected mode by setting the PE bit in the CR0 CPU register. At this point we're running with paging disabled; paging is an optional feature of the processor, even in protected mode, and there's no need for it yet. What's important is that we're no longer confined to the 640K barrier and can now address up to 4GB of RAM. The routine then calls the 32-bit kernel entry point, which is startup_32 for compressed kernels. This routine does some basic register initializations and calls decompress_kernel(), a C function to do the actual decompression.

decompress_kernel() prints the familiar "Decompressing Linux…" message. Decompression happens in-place and once it's finished the uncompressed kernel image has overwritten the compressed one pictured in the first diagram. Hence the uncompressed contents also start at 1MB. decompress_kernel() then prints "done." and the comforting "Booting the kernel." By "Booting" it means a jump to the final entry point in this whole story, given to Linus by God himself atop Mountain Halti, which is the protected-mode kernel entry point at the start of the second megabyte of RAM (0x100000). That sacred location contains a routine called, uh, startup_32. But this one is in a different directory, you see.

The second incarnation of startup_32 is also an assembly routine, but it contains 32-bit mode initializations. It clears the bss segment for the protected-mode kernel (which is the true kernel that will now run until the machine reboots or shuts down), sets up the final global descriptor table for memory, builds page tables so that paging can be turned on, enables paging, initializes a stack, creates the final interrupt descriptor table, and finally jumps to the architecture-independent kernel start-up, start_kernel(). The diagram below shows the code flow for the last leg of the boot:

[Figure: Architecture-independent Linux Kernel Initialization]

start_kernel() looks more like typical kernel code, which is nearly all C and machine independent. The function is a long list of calls to initializations of the various kernel subsystems and data structures. These include the scheduler, memory zones, time keeping, and so on. start_kernel() then calls rest_init(), at which point things are almost all working. rest_init() creates a kernel thread passing another function, kernel_init(), as the entry point. rest_init() then calls schedule() to kickstart task scheduling and goes to sleep by calling cpu_idle(), which is the idle thread for the Linux kernel. cpu_idle() runs forever and so does process zero, which hosts it. Whenever there is work to do - a runnable process - process zero gets booted out of the CPU, only to return when no runnable processes are available.

But here's the kicker for us. This idle loop is the end of the long thread we followed since boot; it's the final descendant of the very first jump executed by the processor after power up. All of this mess, from reset vector to BIOS to MBR to boot loader to real-mode kernel to protected-mode kernel, leads right here: jump by jump by jump, it ends in the idle loop for the boot processor, cpu_idle(). Which is really kind of cool. However, this can't be the whole story, otherwise the computer would do no work.

At this point, the kernel thread started previously is ready to kick in, displacing process 0 and its idle thread. And so it does, at which point kernel_init() starts running since it was given as the thread entry point. kernel_init() is responsible for initializing the remaining CPUs in the system, which have been halted since boot. All of the code we've seen so far has been executed on a single CPU, called the boot processor. As the other CPUs, called application processors, are started they come up in real-mode and must run through several initializations as well. Many of the code paths are common, as you can see in the code for startup_32, but there are slight forks taken by the late-coming application processors. Finally, kernel_init() calls init_post(), which tries to execute a user-mode process in the following order: /sbin/init, /etc/init, /bin/init, and /bin/sh. If all fail, the kernel will panic. Luckily init is usually there, and starts running as PID 1. It checks its configuration file to figure out which processes to launch, which might include the X Window System, programs for logging in on the console, network daemons, and so on. Thus ends the boot process as yet another Linux box starts running somewhere. May your uptime be long and untroubled.
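
The fallback order used by init_post() can be modelled in a few lines. This is only an illustration of the "try each path in turn" logic, not the kernel's C code: in the kernel a successful exec never returns, whereas here a hypothetical predicate `can_execute` stands in for the exec attempt and an exception stands in for the panic.

```python
# Candidate paths, in the order given in the text above.
INIT_CANDIDATES = ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"]

def pick_init(can_execute, candidates=INIT_CANDIDATES):
    """Return the first candidate accepted by can_execute; raise if none can run."""
    for path in candidates:
        if can_execute(path):
            return path
    # The real kernel panics at this point.
    raise RuntimeError("no init found")

if __name__ == "__main__":
    # Pretend only /bin/sh exists on this (hypothetical) root filesystem.
    print(pick_init(lambda path: path == "/bin/sh"))
```

On a running system, something like `pick_init(lambda p: os.access(p, os.X_OK))` (after `import os`) would report which of the four paths would be chosen.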

The process for Windows is similar in many ways, given the common architecture. Many of the same problems are faced and similar initializations must be done. When it comes to boot, one of the biggest differences is that Windows packs all of the real-mode kernel code, and some of the initial protected mode code, into the boot loader itself (C:\NTLDR). So instead of having two regions in the same kernel image, Windows uses different binary images. Plus Linux completely separates boot loader and kernel; in a way this automatically falls out of the open source process. The diagram below shows the main bits for the Windows kernel:

[Figure: Windows Kernel Initialization]

The Windows user-mode start-up is naturally very different. There's no /sbin/init, but rather Csrss.exe and Winlogon.exe. Winlogon spawns Services.exe, which starts all of the Windows Services, and Lsass.exe, the Local Security Authority Subsystem Service. The classic Windows login dialog runs in the context of Winlogon.

This is the end of this boot series. Thanks everyone for reading and for feedback. I'm sorry some things got superficial treatment; I've gotta start somewhere and only so much fits into blog-sized bites. But there's always another day; my plan is to do regular "Software Illustrated" posts like this series along with other topics. Meanwhile, here are some resources:

    * The best, most important resource is the source code for real kernels, either Linux or one of the BSDs.
    * Intel publishes excellent Software Developer's Manuals, which you can download for free.
    * Understanding the Linux Kernel is a good book and walks through a lot of the Linux Kernel sources. It's getting outdated and it's dry, but I'd still recommend it to anyone who wants to grok the kernel. Linux Device Drivers is more fun, teaches well, but is limited in scope. Finally, Patrick Moroney suggested Linux Kernel Development by Robert Love in the comments for this post. I've heard other positive reviews for that book, so it sounds worth checking out.
    * For Windows, the best reference by far is Windows Internals by David Solomon and Mark Russinovich, the latter of Sysinternals fame. This is a great book, well-written and thorough. The main downside is the lack of source code.




The following covers how the root filesystem is mounted:
One minor additional complexity. The initial root filesystem is, by default, assembled from the contents of the usr/ subdirectory in the kernel source tree (it's a compressed cpio archive) and linked into the kernel image; alternatively, a compressed filesystem image can be linked into the kernel image or provided in a separate file. Part of the boot process (in init/main.c:do_basic_setup()) involves executing 'initcalls', which are stored in an array of pointers to functions to be called at boot time, constructed by the linker. One of these initcalls is init/initramfs.c:populate_rootfs(), which initializes the nonswappable memory-backed filesystem which is always mounted at / (the *real* root filesystem is mounted over the top of it, later on). The rootfs is never unmounted: you can see it as the first entry in /proc/mounts. Then it uncompresses the cpio archive or arranges for the filesystem to be backed by that compressed filesystem image, if either is present, and executes /init on that filesystem, if present, to complete the boot via an 'early userspace', chroot to the real root filesystem once it's found it, and exec the real init. So the job of finding root filesystems is *completely* customizable. You can assemble it from a RAID array with some components pulled over the network if you like (I've done this in extremis as part of disaster recovery).
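
The point about rootfs staying mounted underneath the real root can be checked by looking for more than one filesystem at "/" in /proc/mounts. A minimal sketch follows; the helper names are hypothetical, and the sample text is illustrative rather than captured from the machine in the log above. Depending on the kernel version, the rootfs line may not be shown at all.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries

def filesystems_at(entries, mount_point):
    """Return the filesystem types mounted at mount_point, in mount order."""
    return [fs for _dev, mp, fs in entries if mp == mount_point]

# Illustrative sample only, not real output.
SAMPLE = """\
rootfs / rootfs rw 0 0
/dev/sda2 / ext3 rw 0 0
/dev/sda1 /boot ext3 rw 0 0
"""

if __name__ == "__main__":
    entries = parse_mounts(SAMPLE)
    print(filesystems_at(entries, "/"))
```

On a live system the same functions can be fed the contents of /proc/mounts; two entries for "/" with rootfs first would match the description above.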

Finally, if that didn't work and we still don't have a useful root filesystem with an /sbin/init on it, just before calling init_post(), the system may call prepare_namespace() in init/do_mounts.c. This can try to dig up a root filesystem in a variety of ways: pausing for a configurable amount of time so the user can do something to provide a filesystem, waiting for delayed device probes in case the root filesystem is on some slow-to-start thing like a SCSI disk or a USB key, doing automated RAID probing (somewhat dangerous because it can't tell if the array it's assembling is actually made of pieces that are meant to go together: the recommended way to boot off RAID is to use one of the earlier customizable boot processes and run the mdadm tool in there to do the assembly), mounting a block device specified via root= on the kernel command line, or even asking the user to insert a separate floppy containing the root filesystem (I'm not sure *anyone* does this anymore, even in emergencies).
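
The root= parameter mentioned above arrives as one token on the kernel command line; the dmesg log earlier in this post shows `ro root=LABEL=/`. A small sketch of splitting such a line into parameters is given below. The function name `parse_cmdline` is my own, and the sketch ignores quoted values containing spaces, which a real command line may use.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so values such as 'LABEL=/' stay intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params

if __name__ == "__main__":
    params = parse_cmdline("ro root=LABEL=/")
    print(params["root"])
```

Splitting on the first `=` only is the design choice that matters here: the value of root= can itself contain an equals sign.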

I haven't got into the half a dozen horrible ways the various early userspaces can signal their completion (echoing the real device numbers into a file in /proc, executing the horrible 'pivot_root()' syscall, or just deleting everything on the rootfs and doing a 'chroot exec /sbin/init' into the real root filesystem, which is the modern way to boot up because it doesn't rely on any horrible early-userspace-specific hacks). For more, see Documentation/filesystems/ramfs-rootfs-initramfs.txt and Documentation/initrd.txt in your favourite Linux kernel tree.