linux
分类: LINUX
2012-04-11 17:09:46
By
Understanding the Linux boot process is crucial for being able to effectively troubleshoot a Linux system, when boot problems occur. For example, someone could have made some undocumented changes during the previous shift to a critical file that affects the boot process. Now, it's your shift, and you have to bring the machine down for a hardware upgrade. Upon boot up, the system crashes, and you're stuck there until you solve the issue. Learning the steps involved in the boot process, and knowing what happens at each phase, will help you quickly diagnose any problems that come up.
Many current Linux distributions mask the boot process from the user by
using a boot splash screen that can usually be bypassed by pressing the
This cannot be stressed enough: you need to get familiar with the boot process before problems arise. There are some key files involved with the boot process; e.g., an error in /boot/grub/grub.conf, /etc/fstab or /etc/inittab, and you're sure to have problems. So, in addition to knowing the boot process, it's a good idea to really learn these files.
Caution: if you do anything with /boot/grub/grub.conf, or any critical configuration file, do it only on a machine that's a practice computer (sandbox), until you fully understand the process. I use Virtualbox virtualization software, to practice recovering from various self-inflicted boot problems.
[ When working with critical files - and doubly so when you're not sure what you're doing - making backups will save you again and again. At the very least, you should make a copy, or preferably a series of copies of the file before changing it (e.g., "grub.conf.3.orig" for "grub.conf"); there are times when critical errors can go unnoticed for a while. -- Ben ]
Power-On Self Test
The first thing to occur in the boot process is the Power-On Self Test (POST) that is performed by the compter's Basic Input/Output System (BIOS), when it is powered on. This self test is just that -- an internal check of the system's internal components. Other architectures may use different ways to accomplish this, but the purpose is identical. This checking normally involves the system RAM, CPUs, video card, hard drives, and other motherboard components. Many other types of internal cards such as RAID cards and various other types of controller cards perform their own self-testing. If errors occur, some form of alert will be generated or sent to a panel or console; the POST uses a series of beeps to indicate the specific error encountered. If all goes well, the BIOS reads the master boot record on the hard disk and loads the program that is found there into memory.
Master Boot Record
The Master Boot Record (MBR) is the first 512 bytes of the boot drive that is read into memory by the BIOS. (This is assuming we are using an x86 architecture.) The first 446 bytes of that 512 will normally contain a low-level boot code that points to a boot loader somewhere else on the disk - it can even point to another hard disk. The next 64 bytes contain the partition table for the disk (four 16-byte entries known as the IBM Partition Table Scheme). Finally, the remaining 2 bytes are the "Magic Number" (used for error detection).
Boot LoaderThe purpose of a bootloader is to load the operating system. Many boot loaders are available; however, LILO and GRUB are the most common for Linux. Windows has its own, called the New Technology Loader (NTLDR). You may have at one time or another seen the message "NTLDR is missing"; this is a Windows bootloader error.
Regardless of the bootloader being used, it is important to keep in mind that bootloaders are very complex and easily rendered inoperative by inexperienced users with root access. In my opinion, the best practice is to explore their workings only on machines you are using as sandboxes or in virtual machines to sharpen your skills. One slight typographical error while working with, e.g., grub.conf, and you may be spending more time than you anticipated trying to fix your error; however, this too, can be a great learning experience, as long as it is on a sandbox and you can afford the time.
GRUB
The most common bootloader program in use today for modern Linux systems is the GRand Unified Bootloader (GRUB). This is the bootloader we will be talking about, here. The GRUB bootloader is a program written to the master boot record and /boot partition of a hard drive that loads the operating system. The bootloader code used to fit in the first 446 bytes of the MBR, but, owing to progressively increasing complexity of operating systems and the need to boot almost any operating system, it has grown in size. Currently, a subset of the bootloader code is written to the MBRi, and the remainder is written to the /boot partition. In addition, the GRUB bootloader is modular in design and works in stages which I will only cover briefly.
The stages of GRUB
GRUB works in stages called Stage1, Stage1_5, and Stage 2. I will give a brief overview of what each stage does.
Stage1
The stage1 code of GRUB is written within the 512 bytes of the master boot record. Because of the limited size of the master boot record area, GRUB stage1 will usually point to the next stage of GRUB, stage1_5 or stage2. GRUB may or may not need to load stage1_5 depending on the types of filesystems present.
[root@localhost grub]# file stage1Stage1_5
Stage1_5 is the intermediate stage between stage1 and stage2. If you look at your /boot/grub area, you should see various stage1_5 files with names of various types of filesystems. Stage1_5 deals with specific types of filesystems, and is named after them. This code will allow the filesystem to be interpreted appropriately.
These are the stage1_5 files located in my /boot/grub directory:
e2fs_stage1_5Stage2
This is the main GRUB image, and will usually reside on the filesystem in the /boot partition at /boot/grub/stage2. It reads the /boot/grub/grub.conf file for configuration information that details how it will load the kernel. It also has an interactive interface that will allow you to troubleshoot, re-install, or modify how GRUB works. Stage2 will present the user with a graphical boot menu entry. If the appropriate key is not pressed in the time allotted to enter the hidden menu, or nothing is selected in the time frame allowed, GRUB will boot the default entry.
GRUB has a really helpful tab-completion feature that you can use to assist you in getting your system up and running if, for example, you have a non-bootable system due to a improperly specified line in your grub.conf file. I have used it in this way many times to get unbootable systems working again.
/boot/grub/grub.conf
This file specifies the kernel to load and the initrd image file with all of the modules to load for your system.
Here is a typical grub.conf file for Red Hat Linux:
# grub.conf generated by anacondaThis file is fairly easy to edit; because of this, it is a common location for errors.
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
Nothing but a GRUB prompt - now what do we do? If you're making any changes
to this file, remember to print it out first, and make a backup copy; it may
save you if you make a typo. You can use the hardcopy to recall the options
necessary to boot your machine and fix your mistake. There are various
errors, and it pays to become familiar with some of the common ones such as
an incorrectly specified kernel name, or a root partition that is
improperly designated in this file. If you do see this GRUB prompt, ask
yourself this question, "What do I do now to recover my system?"
If you have no idea at this point, and you are managing Linux systems, it
would probably be a good idea to get on a practice box and create some
errors within the grub.conf, and get to know your bootloader. Here's what
could be done in the above situation - notice where I hit the
Booting manually with GRUB tab completion
grub> root (hd0,0)
The "root (hd0,0)" specifies the first drive, and the
first partition on that drive. On the kernel line, you can type '/', hit
When GRUB transfers control to the kernel and the kernel is booting, you may see a lot of text output. On Red Hat-based systems, the line in grub.conf specifying the kernel may have an "rhgb quiet" appended to it that prevents this. It stands for "Red Hat graphical boot quiet" and will suppress kernel boot messages. When I need to see boot messages, I interrupt the GRUB cycle by hitting the escape key, hitting 'e' for edit (the procedure to modify the grub kernel arguments is at the bottom of the GRUB screen), and editing the kernel line. Removing the "rhgb quiet" allows me to see the kernel messages so I can determine if any of them are relevant to, e.g., a kernel panic or a similar problem.
What is actually going on at this time is that the kernel is probing your hardware and configuring itself for your hardware. The kernel is also loading modules in the initrd image that it needs to operate your hardware. Note that the information will scroll very quickly up your screen - so, if you believe your problem exists at the kernel level, be sure to watch closely, as this is sure to give you a clue about where your problem lies. Once the kernel is done with its initialization, it starts the system's first process, which is /sbin/init.
[ Much, but not all, of the boot information is available in the /var/log/dmesg file, once you've booted. -- Ben ]
INIT
Init is the first process running on your system. It reads the /etc/inittab file, executes /etc/rc.d/rc.sysinit, then boots into the runlevel as defined in /etc/inittab.
Init starts out with a Process ID (PID) of 1. In the image above, there's a
line saying "INIT: version 2.86 booting"; this is /sbin/init
taking over at this point in the boot process. On the line right after that
one, you see the messages being displayed by the /etc/rc.d/rc.sysinit shell
script; as a matter of fact, the entire screen contains messages from that
script, so you can get an idea of some of the functions it performs. Init
will also normally start several instances of /sbin/getty or /sbin/mingetty,
which are your virtual terminals. This is why you can hit
Next, we will look at the /etc/inittab file, specifically, at the configuration file for Init:
#
Depending on what Init is doing, you may see typical Init script
verification messages getting printed to the screen, i.e., [ OK ] or
[Failed] to aid in troubleshooting. You may see a message like "Press
'I' to enter interactive startup" (on Red Hat-based systems);
this is an indication of rc.sysinit executing, and allows the operator a certain level
of control over the still-booting system. rc.sysinit ends
with your default runlevel (as defined in /etc/inittab) being
started. This is another common place for errors, because servers will
usually have the "id:5:initdefault:" line set
to 3, so the machine boots to runlevel 3 instead of runlevel 5.
Another common place for errors is the line pertaining to
"ca::ctrlaltdel:/sbin/shutdown -t3 -r now"
commented out to prevent the 3-finger salute () from restarting the server. People are human and make
mistakes; I have seen typographical errors in both places that can
cause problems. Unlike the "mount -a"
command, which will alert you if errors exist in your mount points in
the /etc/fstab file, executing the command "init q"
will reread your /etc/inittab, but will not check for errors in
the runlevels themselves; the best way to know if errors exist is to learn
this file and to be very, very careful if you decide to modify any of the
/etc/rc*/* files.
When you enter your runlevel, you will see further Init messages being printed to the screen (depending if your machine is configured to do so), again ending with a [ OK ] or [Failed] depending on whether it started successfully or not. These are your startup services within your runlevels. When you look at your /etc/inittab file, you will see a line like "id:5:initdefault:"; this is your default runlevel. The default runlevel on most servers will be set to 3; on desktops, of course, it's set to 5, so we can have an X Window System session start as soon as the system boots up.
To get an overview of what processes get started or stopped for any particular runlevel, we should look within the /etc/rcX.d (where X is your runlevel) directory. Inside these directories, you will see symbolic links to the files in your /etc/init.d/ directory. The file names will be prefixed with either a 'K' or an 'S' (signifying kill or start) for the given daemon at that runlevel. The number immediately after the letter positions the script in the start order, because the processes are started alphabetically. With Red Hat-based systems, the "chkconfig" command will alter the symbolic links to start or stop the daemon in a desired runlevel; the 'S' or 'K' will change appropriately and the number will most likely change as well.
LoginThat sums up the outline of the boot process for a typical Linux machine. At this point, you should have a better understanding of what goes on up until the login prompt or dialog on your screen. Last of all, always pay close attention to the details: the machine will most likely tell you what is wrong if you're experiencing problems.