Category: LINUX
2014-06-13 10:15:13
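You are supposed to be able to kill any process with `kill -9 [PID]`, but you may come across a process that can't be killed. Usually this happens when you are trying to kill a process that is blocked inside the kernel, typically waiting on I/O from an unresponsive NFS server.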
Look for processes in state 'D' (uninterruptible sleep) or state 'Z' (defunct, i.e. a zombie). A zombie has already exited and only stays in the process table until its parent reaps it, so signals to it have no effect. The following command lists processes in state 'D' or 'Z'. If there are none, it still prints the `ps` header line but nothing else.
```shell
ps Haxwwo stat,pid,ppid,user,wchan:25,command | grep -e "^STAT" -e "^D" -e "^Z"
```
For testing, you might want to include ordinary sleeping processes (state 'S') as well:
```shell
ps Haxwwo stat,pid,ppid,user,wchan:25,command | grep -e "^STAT" -e "^D" -e "^Z" -e "^S"
```
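The state letter and wait channel that `ps` prints are also available directly under `/proc`: the third field of `/proc/PID/stat` is the state, and `/proc/PID/wchan` names the kernel function a sleeping task is waiting in. The following is a minimal POSIX `sh` sketch that reads them without `ps`, which can be useful on minimal systems. `procs_in_state` is a hypothetical helper name. It only looks at whole processes. To list individual threads, as the `H` option to `ps` does, you could glob `/proc/[0-9]*/task/[0-9]*` instead.

```shell
# Sketch: list processes whose state letter appears in $1 (default "DZ"),
# reading /proc directly instead of parsing ps output.
procs_in_state() {
    states=${1:-DZ}
    printf '%-8s %-5s %s\n' PID STATE WCHAN
    for dir in /proc/[0-9]*; do
        # The process may exit between the glob and the read; skip it if so.
        stat=$(cat "$dir/stat" 2>/dev/null) || continue
        # Field 2 (comm) is in parentheses and may contain spaces or ")",
        # so drop everything up to the LAST ") " before reading the state.
        rest=${stat##*") "}
        state=${rest%% *}
        case $states in
            *"$state"*)
                # wchan may read as 0 or be unreadable, depending on the kernel.
                wchan=$(cat "$dir/wchan" 2>/dev/null) || wchan='?'
                printf '%-8s %-5s %s\n' "${dir#/proc/}" "$state" "$wchan"
                ;;
        esac
    done
}
```

Run `procs_in_state` for the 'D'/'Z' check, or `procs_in_state DZS` for the testing variant above.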
If the stuck process is waiting on NFS, then after you send it a kill signal you must also send a kill signal to the `rpciod` kernel thread (it will restart when needed). This command finds it:
```shell
ps Haxwwo pid,command | grep "rpciod" | grep -v grep
```
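Where `pgrep` (from procps) is installed, `pgrep -x rpciod` avoids the `grep -v grep` step. The following is a minimal `sh` sketch of the same lookup done through `/proc/PID/comm` for systems without procps. `pids_by_comm` is a hypothetical helper name. Be aware that on many newer kernels `rpciod` is implemented as a workqueue, and kernel threads generally ignore signals, so this step may have no effect there.

```shell
# Sketch: print the PID of every process or kernel thread whose command
# name (as shown in /proc/PID/comm) is exactly $1, like `pgrep -x "$1"`.
pids_by_comm() {
    for dir in /proc/[0-9]*; do
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        [ "$name" = "$1" ] && printf '%s\n' "${dir#/proc/}"
    done
    return 0
}

# Usage, as root, following the advice above:
#   kill -9 $(pids_by_comm rpciod)
```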
You can sometimes kill a process by unmounting the filesystem it is stuck waiting on. If that does not make the process fail with an I/O error or a segfault, go back and try killing the process again.
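Use both `mount` and `cat /proc/mounts` to see which filesystems are mounted. `mount` sometimes does not show NFS mounts whose earlier `umount` is still pending, which is yet another headache when dealing with NFS. The two can disagree because older versions of `mount` read `/etc/mtab` rather than the kernel's own list in `/proc/mounts`.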
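To see only the NFS entries from the kernel's list, you can filter `/proc/mounts` on its filesystem-type column. The following is a small `sh`/`awk` sketch. `list_nfs_mounts` is a hypothetical helper name, and it assumes the NFS types appear as `nfs` or `nfs4`. It takes an optional file argument so it can be tried against a saved copy of the file.

```shell
# Sketch: print "source mountpoint options" for every NFS mount the kernel
# knows about. Reads /proc/mounts unless another file is given.
list_nfs_mounts() {
    # Fields: 1=source 2=mountpoint 3=fstype 4=options. Spaces inside paths
    # are escaped as \040 in this file, so splitting on whitespace is safe.
    awk '$3 == "nfs" || $3 == "nfs4" { print $1, $2, $4 }' "${1:-/proc/mounts}"
}
```

Comparing its output with `mount -t nfs,nfs4` shows whether the two views disagree.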
You can use `fuser` to show which processes have file descriptors open on a given filesystem. In the command below, DEV must be a device name such as '/dev/sda1' or an NFS network name such as 'some_nfs:/home/user'. Do not use the mount point directory for NFS mounts, because that makes `fuser` hang. Again, for NFS, use only the nfs_server:/path name.
```shell
fuser -v -m [DEV]
```
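If you only know the mount point, one hedged alternative is to compare the link targets under `/proc/PID/cwd` and `/proc/PID/fd/` with the mount point path as plain strings, so the path itself is never looked up on the filesystem. The following is a minimal `sh` sketch of that idea. `procs_using_path` is a hypothetical helper name. It does not check memory-mapped files (listed in `/proc/PID/maps`), and I have not verified that it can never block against a dead NFS server.

```shell
# Sketch: print PIDs whose working directory or any open file descriptor
# resolves to a path at or under $1. Only the strings returned by readlink
# on /proc entries are compared; $1 itself is never stat'ed.
procs_using_path() {
    prefix=${1%/}
    for dir in /proc/[0-9]*; do
        # Unreadable fd directories leave the glob unexpanded; readlink then
        # fails and the entry is skipped.
        for link in "$dir/cwd" "$dir"/fd/*; do
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$prefix" | "$prefix"/*)
                    printf '%s\n' "${dir#/proc/}"
                    break
                    ;;
            esac
        done
    done
    return 0
}
```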
You can force an NFS share to unmount with the lazy option of `umount`. This may cause the stuck process, and other processes, to segfault as memory-mapped files and the like suddenly disappear. Other odd things can happen too, because a lazy unmount does not actually close the connection for processes that were using the filesystem. For example, shells may still work, but after you `cd` into other directories you may be left with a meaningless working directory in which `ls` still lists files.
```shell
umount -l [MOUNT_POINT_OR_DEV]
```
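`umount` also has a `-f` (force) option, which its man page describes as intended for unreachable NFS servers. The following is a minimal `sh` sketch that tries `-f` first and falls back to the lazy unmount described above. `unmount_stuck` is a hypothetical helper name and must be run as root. `umount -f` may itself block on some systems. If it does, interrupt it and run `umount -l` directly.

```shell
# Sketch: try a forced unmount, then fall back to a lazy unmount.
unmount_stuck() {
    if [ $# -ne 1 ]; then
        echo "usage: unmount_stuck MOUNT_POINT_OR_DEV" >&2
        return 2
    fi
    # -f: force unmount (meant for unreachable NFS servers).
    # -l: lazy unmount; detach now, clean up references later.
    umount -f "$1" 2>/dev/null || umount -l "$1"
}
```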
Sometimes the only option is to reboot, but by default even `reboot` and `halt` first try to sync filesystems, and they get stuck too. This sounds like a Catch-22, but the fix is simple: pass '-n' so that no mounted filesystems are synced, and '-f' to force a reboot without calling `shutdown`.
```shell
reboot -n -f
```
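To find out what a stuck process is waiting on, read its wchan (the kernel function it is sleeping in), then attach `strace` or `gdb` to it. The session below uses `cat /dev/random` as a process that blocks in a `read` call. Since Linux 5.6, `/dev/random` only blocks until the kernel's random pool is initialized, so on newer systems this `cat` may not block at all. On 32-bit x86, gdb shows the process sitting in `__kernel_vsyscall`, the entry point it uses for the blocking system call.

```shell
cat /dev/random >/dev/null &
PID=$!
# "!-2" is bash history expansion (interactive shells only): it recalls
# the command line two entries back, here the cat command.
CMDLINE="!-2"
CMD=${CMDLINE%% *}
WCHAN=$(cat /proc/${PID}/wchan)
echo "command: ${CMD}, pid: ${PID}, wchan: ${WCHAN}"
strace -p ${PID}
gdb ${CMD} ${PID}
```

```
(gdb) disassemble
Dump of assembler code for function __kernel_vsyscall:
0xb7f6b420 <__kernel_vsyscall+0>:  push   %ecx
0xb7f6b421 <__kernel_vsyscall+1>:  push   %edx
0xb7f6b422 <__kernel_vsyscall+2>:  push   %ebp
0xb7f6b423 <__kernel_vsyscall+3>:  mov    %esp,%ebp
0xb7f6b425 <__kernel_vsyscall+5>:  sysenter
0xb7f6b427 <__kernel_vsyscall+7>:  nop
0xb7f6b428 <__kernel_vsyscall+8>:  nop
0xb7f6b429 <__kernel_vsyscall+9>:  nop
0xb7f6b42a <__kernel_vsyscall+10>: nop
0xb7f6b42b <__kernel_vsyscall+11>: nop
0xb7f6b42c <__kernel_vsyscall+12>: nop
0xb7f6b42d <__kernel_vsyscall+13>: nop
0xb7f6b42e <__kernel_vsyscall+14>: jmp    0xb7f6b423 <__kernel_vsyscall+3>
0xb7f6b430 <__kernel_vsyscall+16>: pop    %ebp
0xb7f6b431 <__kernel_vsyscall+17>: pop    %edx
0xb7f6b432 <__kernel_vsyscall+18>: pop    %ecx
0xb7f6b433 <__kernel_vsyscall+19>: ret
End of assembler dump.
```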
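The `"!-2"` history trick only works when typing into an interactive bash. The following is a minimal `sh` sketch that gathers the same facts from `/proc`, so it also works inside a script. `inspect_pid` is a hypothetical helper name. `/proc/PID/exe` gives the executable path that gdb wants. `/proc/PID/stack` (the task's kernel stack) is readable only by root and may be missing on kernels built without stack-trace support.

```shell
# Sketch: report what process $1 is and where in the kernel it is sleeping.
inspect_pid() {
    pid=$1
    [ -d "/proc/$pid" ] || { echo "no such pid: $pid" >&2; return 1; }
    # exe is only readable for your own processes unless you are root.
    exe=$(readlink "/proc/$pid/exe" 2>/dev/null) || exe='?'
    wchan=$(cat "/proc/$pid/wchan" 2>/dev/null) || wchan='?'
    echo "command: $(cat "/proc/$pid/comm"), pid: $pid, exe: $exe, wchan: $wchan"
    # Kernel stack of the task; silently skipped when not permitted.
    cat "/proc/$pid/stack" 2>/dev/null
    return 0
}

# Usage, then attach as in the session above:
#   inspect_pid 1234
#   strace -p 1234
#   gdb /path/printed/as/exe 1234
```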
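Mounting an NFS filesystem with the `soft` option helps prevent stuck processes when the network connection is lost. Once its retries are exhausted, a soft mount returns an I/O error to the caller instead of retrying forever as a `hard` mount does. The nfs(5) man page warns that soft mounts can cause silent data corruption in some cases, so weigh this before using them for data you write.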
```shell
showmount -e remote_nfs_server
mount remote_ls
mount -o soft nfs_server:/path /media/mount_point
```
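To audit which existing NFS mounts could still leave processes stuck in state 'D', you can check the options column of `/proc/mounts`, where the kernel lists `hard` or `soft` for NFS mounts. The following is a minimal sketch. `nfs_hard_mounts` is a hypothetical helper name, and it treats any NFS mount without `soft` in its options as hard. nfs(5) also documents `timeo` and `retrans`, which control how long a soft mount keeps retrying before it gives up.

```shell
# Sketch: print mount points of NFS mounts that are NOT mounted soft.
# Reads /proc/mounts unless another file is given.
nfs_hard_mounts() {
    # Wrapping the options in commas makes ",soft," match only the whole
    # option, not longer names such as "softreval".
    awk '($3 == "nfs" || $3 == "nfs4") && ("," $4 ",") !~ /,soft,/ { print $2 }' \
        "${1:-/proc/mounts}"
}
```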