分类: 系统运维
2017-02-16 17:59:43
最近在使用docker容器的时候,发现宿主机的agetty进程cpu占用率达到100%。
在Google上搜了下,引起这个问题的原因是在使用"docker run"运行容器时使用了 "/sbin/init"和"--privileged"参数。
使用/sbin/init启动容器并加上--privileged参数,相当于docker容器获得了宿主机的全权委托权限。这时docker容器内部的init与宿主机的init产生了混淆。
# 引用google到的一段话:
I've done all my testing on them without using --privileged, especially since that's so dangerous (effectively, you're telling this second init process on your system that it's cool to go ahead and manage your system resources, and then giving it access to them as well). I always think of --privileged as a hammer to be used very sparingly.
出于对安全的考虑,在启动容器时,docker容器里的系统只具有一些普通的linux权限,并不具有真正root用户的所有权限。而--privileged=true参数可以让docker容器具有linux root用户的所有权限。
为了解决这个问题,docker后来的版本中docker run增加了两个选项参数"--cap-add"和"--cap-drop"。
--cap-add : 获取default之外的linux的权限
--cap-drop: 放弃default linux权限
从docker官网的文档中可以查到,docker容器具有的default权限及--cap-add可以获取到的扩展权限如下:
Default 权限:
Capability Key |
Capability Description |
SETPCAP |
Modify process capabilities. |
MKNOD |
Create special files using mknod(2). |
AUDIT_WRITE |
Write records to kernel auditing log. |
CHOWN |
Make arbitrary changes to file UIDs and GIDs (see chown(2)). |
NET_RAW |
Use RAW and PACKET sockets. |
DAC_OVERRIDE |
Bypass file read, write, and execute permission checks. |
FOWNER |
Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file. |
FSETID |
Don’t clear set-user-ID and set-group-ID permission bits when a file is modified. |
KILL |
Bypass permission checks for sending signals. |
SETGID |
Make arbitrary manipulations of process GIDs and supplementary GID list. |
SETUID |
Make arbitrary manipulations of process UIDs. |
NET_BIND_SERVICE |
Bind a socket to internet domain privileged ports (port numbers less than 1024). |
SYS_CHROOT |
Use chroot(2), change root directory. |
SETFCAP |
Set file capabilities. |
通过--cap-add获取到的权限:
Capability Key |
Capability Description |
SYS_MODULE |
Load and unload kernel modules. |
SYS_RAWIO |
Perform I/O port operations (iopl(2) and ioperm(2)). |
SYS_PACCT |
Use acct(2), switch process accounting on or off. |
SYS_ADMIN |
Perform a range of system administration operations. |
SYS_NICE |
Raise process nice value (nice(2), setpriority(2)) and change the nice value for arbitrary processes. |
SYS_RESOURCE |
Override resource Limits. |
SYS_TIME |
Set system clock (settimeofday(2), stime(2), adjtimex(2)); set real-time (hardware) clock. |
SYS_TTY_CONFIG |
Use vhangup(2); employ various privileged ioctl(2) operations on virtual terminals. |
AUDIT_CONTROL |
Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules. |
MAC_OVERRIDE |
Allow MAC configuration or state changes. Implemented for the Smack LSM. |
MAC_ADMIN |
Override Mandatory Access Control (MAC). Implemented for the Smack Linux Security Module (LSM). |
NET_ADMIN |
Perform various network-related operations. |
SYSLOG |
Perform privileged syslog(2) operations. |
DAC_READ_SEARCH |
Bypass file read permission checks and directory read and execute permission checks. |
LINUX_IMMUTABLE |
Set the FS_APPEND_FL and FS_IMMUTABLE_FL i-node flags. |
NET_BROADCAST |
Make socket broadcasts, and listen to multicasts. |
IPC_LOCK |
Lock memory (mlock(2), mlockall(2), mmap(2), shmctl(2)). |
IPC_OWNER |
Bypass permission checks for operations on System V IPC objects. |
SYS_PTRACE |
Trace arbitrary processes using ptrace(2). |
SYS_BOOT |
Use reboot(2) and kexec_load(2), reboot and load a new kernel for later execution. |
LEASE |
Establish leases on arbitrary files (see fcntl(2)). |
WAKE_ALARM |
Trigger something that will wake up the system. |
BLOCK_SUSPEND |
Employ features that can block system suspend. |
所以,在运行容器时,可以不用--privileged参数的尽量不用,用--cap-add参数替代。如果必须使用--privileged=true参数的,可以通过在宿主机和容器中执行以下命令将agetty关闭。
shell> systemctl stop getty@tty1.service
shell> systemctl mask getty@tty1.service
参考资料: