Docker isolates the following resources: 1. The file system; 2. The network; 3. Inter-process communication; 4. Users and user groups, for permissions; 5. Process IDs, so that PIDs inside a container are separate from PIDs on the host; 6. The host name and domain name.
The operating environment of this tutorial: Linux 5.9.8, Docker 1.13.1, Dell G3 computer.
In essence, a Docker container is a process on the host.
Docker achieves resource isolation through namespaces, resource limits through cgroups, and efficient file operations through a copy-on-write mechanism.
The namespace mechanism provides a resource isolation solution.
System resources such as PID, IPC and Network are no longer global; each belongs to a specific namespace.
The resources in one namespace are transparent to, and invisible from, other namespaces.
One of the main purposes of namespaces in the Linux kernel is to support lightweight virtualization (container) services. Processes in the same namespace can perceive each other's changes and know nothing about processes outside it, which achieves independence and isolation.
What namespace can isolate
For a container not to interfere with other containers, the following are needed:
The file system must be isolated
The network must be isolated
Inter-process communication must be isolated
For permissions, users and user groups must be isolated
PIDs inside the container must be isolated from PIDs on the host
Each container must have its own host name
With the isolation above, a container can be considered isolated from the host and from other containers.
Linux namespaces provide exactly this.
namespace | Isolated content | System call parameters |
---|---|---|
UTS | Host name and domain name | CLONE_NEWUTS |
IPC | Semaphore, message queue and shared memory | CLONE_NEWIPC |
Network | Network devices, network stacks, ports, etc. | CLONE_NEWNET |
PID | Process number | CLONE_NEWPID |
Mount | Mount point (file system) | CLONE_NEWNS |
User | Users and User Groups | CLONE_NEWUSER |
UTS namespace
The UTS (UNIX Time-sharing System) namespace isolates the host name and domain name, so that each Docker container has its own host name and domain name and can be viewed on the network as an independent node rather than as a process on the host machine.
In Docker, the hostname of each image is generally named after the service it provides, and this has no effect on the host.
IPC namespace
The IPC resources involved in Inter-Process Communication (IPC) include the common semaphores, message queues and shared memory.
Applying for an IPC resource means applying for a globally unique 32-bit ID.
An IPC namespace therefore contains the system IPC identifiers and the file system that implements POSIX message queues.
Processes in the same IPC namespace are visible to each other, while processes in different IPC namespaces are invisible to each other.
PID namespace
PID namespace isolation is very practical. It renumbers process PIDs: two processes in different namespaces can have the same PID, and each PID namespace has its own counter.
The kernel maintains a tree structure for all PID namespaces. The topmost one is created when the system is initialized and is called the root namespace. A newly created PID namespace is called a child namespace, and the PID namespace it was created from is its parent namespace.
In this way, PID namespaces form a hierarchy. A parent namespace can see the processes in its child namespaces and can affect them through signals and other means, but a child namespace cannot see anything in the PID namespace of its parent.
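The hierarchy can be observed from user space. On Linux 4.1 and later, /proc/[pid]/status contains an NSpid line that lists the PID of the process in each PID namespace it belongs to, outermost first. The following is a minimal sketch of reading that line; the function names parse_nspid and nspid_of are illustrative only.

```python
import os


def parse_nspid(status_text):
    """Return the PIDs on the NSpid line of a /proc/[pid]/status dump.

    The list is ordered from the outermost PID namespace to the innermost.
    An empty list means the line is absent (older kernel or not Linux).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def nspid_of(pid="self"):
    """Read /proc/<pid>/status and return the per-namespace PIDs."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return []
    with open(path) as status_file:
        return parse_nspid(status_file.read())


if __name__ == "__main__":
    # For a process inside one nested PID namespace the list has two
    # entries: the PID seen from the parent namespace, then the PID
    # seen inside the child namespace.
    print(nspid_of())
```

A process that has the same PID everywhere it is visible yields a single-element list, while a process in a nested PID namespace yields one entry per level.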
mount namespace
mount namespace provides support for isolating file systems by isolating file system mount points.
After isolation, changes in file structures in different mount namespaces will not affect each other.
network namespace
The network namespace mainly isolates network resources, including network devices, the IPv4 and IPv6 protocol stacks, IP routing tables, firewalls, the /proc/net directory, the /sys/class/net directory, sockets, etc.
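Because /proc/net is part of what is isolated, a process only sees the network devices of its own network namespace there. As a rough illustration, the sketch below lists the interfaces named in /proc/net/dev; the helper names are hypothetical, and the parsing assumes the usual layout of two header lines followed by one "name: counters" line per device.

```python
def parse_proc_net_dev(text):
    """Extract interface names from the contents of /proc/net/dev."""
    names = []
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" in line:
            names.append(line.split(":", 1)[0].strip())
    return names


def visible_interfaces(path="/proc/net/dev"):
    """Return the interfaces visible in the caller's network namespace."""
    try:
        with open(path) as dev_file:
            return parse_proc_net_dev(dev_file.read())
    except OSError:
        return []  # file not available (for example, not running on Linux)


if __name__ == "__main__":
    print(visible_interfaces())
```

Running it on the host and again inside a container would normally print different lists, since each side reads the view of its own network namespace.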
user namespace
The user namespace isolates security-related identifiers and attributes, such as user IDs and group IDs.
namespace operations
The namespace API includes clone(), setns(), unshare() and some files under /proc.
To choose which namespaces are isolated, specify one or more of the following 6 flags, combined with | (bitwise OR): CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWNS and CLONE_NEWUSER, as listed in the table above.
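Combining the flags with | is ordinary bitwise OR on integer constants. The sketch below shows this with the numeric values from the Linux header linux/sched.h; the dictionary and the combine_flags helper are illustrative names, not part of any API.

```python
# Flag values as defined in the Linux kernel header <linux/sched.h>.
CLONE_FLAGS = {
    "CLONE_NEWNS": 0x00020000,    # mount namespace
    "CLONE_NEWUTS": 0x04000000,   # host name and domain name
    "CLONE_NEWIPC": 0x08000000,   # semaphores, message queues, shared memory
    "CLONE_NEWUSER": 0x10000000,  # users and user groups
    "CLONE_NEWPID": 0x20000000,   # process numbers
    "CLONE_NEWNET": 0x40000000,   # network devices, stacks, ports
}


def combine_flags(names):
    """OR together the flags named in `names` into a single integer."""
    flags = 0
    for name in names:
        flags |= CLONE_FLAGS[name]
    return flags


if __name__ == "__main__":
    # Request new UTS and IPC namespaces in one call.
    print(hex(combine_flags(["CLONE_NEWUTS", "CLONE_NEWIPC"])))
```

The resulting integer is what would be passed as the flags argument of clone() or unshare().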
clone()
Using clone() to create a process in its own independent namespaces is the most common approach, and it is the most basic way in which Docker uses namespaces.
int clone(int (*child_func)(void *), void *child_stack, int flags, void *arg);
clone() is a more general implementation of the Linux system call fork(). The flags argument controls which features are used.
There are more than 20 kinds of CLONE_* flags, which control all aspects of the clone process.
/proc/[pid]/ns
Under /proc/[pid]/ns, you can see files that point to the different namespaces of a process.
ls -l /proc/10/ns
In the output, each entry is a link of the form type:[number]; the number in square brackets is the namespace number.
If the namespace numbers pointed to by two processes are the same, they are in the same namespace.
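The comparison described above can be done programmatically by reading the links with readlink. The following sketch assumes link targets of the form uts:[4026531838]; the function names are made up for illustration, and reading the links of another user's process may require extra permissions.

```python
import os
import re

# A link target looks like "uts:[4026531838]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, namespace number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_id(pid, ns_type):
    """Return the namespace number of `pid` for one type, e.g. 'uts'."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{ns_type}"))[1]


def same_namespace(pid_a, pid_b, ns_type):
    """True if both processes point at the same namespace number."""
    return namespace_id(pid_a, ns_type) == namespace_id(pid_b, ns_type)


if __name__ == "__main__":
    # Compare the current process with PID 1 (may need root to read).
    print(same_namespace("self", 1, "uts"))
```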
Another purpose of these links is that, once a link file has been opened, the namespace continues to exist for as long as the file descriptor stays open, even if all processes in the namespace have ended, and later processes can join it.
Bind-mounting a file from the /proc/[pid]/ns directory with --bind keeps the namespace alive in the same way.
touch ~/uts
mount --bind /proc/10/ns/uts ~/uts
setns()
When the docker exec command runs a new command in an already running container, Docker needs to use setns().
Through the setns() system call, a process leaves its original namespace and joins an existing one.
Usually, so as not to affect the caller and to make the newly joined PID namespace take effect, the process calls clone() after setns() returns to create a child process that continues executing the command, and the original process then exits.
int setns(int fd, int nstype);
fd is the file descriptor of the namespace to join. It refers to a file in the /proc/[pid]/ns directory and is obtained by opening that link.
nstype lets the caller check whether the type of namespace that fd points to matches what is actually required; if it is 0, no check is performed.
To make use of the newly joined namespace, the execve() family of functions is needed to execute user commands. The most common use is to call /bin/bash and pass it parameters.
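The setns() and exec steps can be sketched as follows. This is only an illustration of the sequence, not how Docker itself implements docker exec: it calls the libc setns() through ctypes, the helper names are hypothetical, and it normally needs root privileges. It also skips the extra clone() step, so it is suited to namespace types other than PID.

```python
import ctypes
import os


def ns_path(pid, ns_type):
    """Path of the namespace link for `pid`, e.g. /proc/10/ns/uts."""
    return f"/proc/{pid}/ns/{ns_type}"


def enter_namespace_and_exec(pid, ns_type, argv):
    """Join one namespace of `pid`, then replace this process with argv."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(ns_path(pid, ns_type), os.O_RDONLY)
    try:
        # nstype 0 means: do not check the type of namespace behind fd.
        if libc.setns(fd, 0) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    # Only reached if setns() succeeded; does not return on success.
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    # Example: start a shell in the UTS namespace of process 10.
    enter_namespace_and_exec(10, "uts", ["/bin/bash"])
```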
unshare()
unshare() applies namespace isolation to the original process.
unshare() is very similar to clone(), except that unshare() does not need to start a new process and can be used on the original process.
Docker does not use unshare().
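As a rough demonstration of unshare(), the sketch below moves a process into a new UTS namespace and changes the host name there. It calls the libc unshare() through ctypes and usually needs root (CAP_SYS_ADMIN). The fork is not required by unshare() itself; it is used here only so that the demonstration does not alter the calling process. The function name is hypothetical.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # value from <linux/sched.h>


def hostname_in_new_uts(new_name):
    """Rename the host inside a fresh UTS namespace.

    Returns True if the child saw the new name, False if any step
    failed (for example, because of missing privileges).
    """
    pid = os.fork()
    if pid == 0:  # child: unshare acts on this process itself
        code = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWUTS) == 0:
                socket.sethostname(new_name)
                if socket.gethostname() == new_name:
                    code = 0
        except BaseException:
            code = 1
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    before = socket.gethostname()
    print("isolated rename worked:", hostname_in_new_uts("ns-demo"))
    # The host name seen by this process is untouched either way.
    print("host name unchanged:", socket.gethostname() == before)
```

Whether or not the rename succeeds, the host name of the calling process stays the same, which is the isolation the UTS namespace provides.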
fork() system call
fork does not belong to the namespace API
cgroups is a mechanism provided by the Linux kernel. As required, it can integrate (or separate) a series of system tasks and their subtasks into different groups graded by resource, thereby providing a unified framework for system resource management.
cgroups is another powerful kernel tool in Linux. With cgroups, you can not only limit the resources isolated by namespaces, but also set weights for resources, calculate usage, and control the starting and stopping of tasks (processes or threads). Put plainly, cgroups can limit and record the physical resources (including CPU, memory, IO, etc.) used by task groups, and it is the cornerstone of a series of virtualization management tools such as Docker.
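To see which groups a task has been placed in, Linux exposes /proc/[pid]/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following sketch parses that file; the function names are illustrative, and the sample paths in the comments are made up.

```python
def parse_proc_cgroup(text):
    """Parse /proc/[pid]/cgroup into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        # An empty controller list is how a cgroup v2 entry appears ("0::/path").
        controller_list = controllers.split(",") if controllers else []
        entries.append((int(hierarchy_id), controller_list, path))
    return entries


def cgroups_of(pid="self"):
    """Return the cgroup membership of a process, or [] if unavailable."""
    try:
        with open(f"/proc/{pid}/cgroup") as cgroup_file:
            return parse_proc_cgroup(cgroup_file.read())
    except OSError:
        return []


if __name__ == "__main__":
    for hierarchy_id, controllers, path in cgroups_of():
        print(hierarchy_id, controllers, path)
```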
The role of cgroups
cgroups provides a unified interface for resource management at different user levels, from the control of individual resources to operating-system-level virtualization. cgroups provides four major functions.