What you must know about Linux namespaces

WBOY (reposted), 2022-01-25 17:37:45, 3810 views

This article introduces Linux namespaces. Namespaces provide a lightweight form of virtualization that lets us view the global properties of a running system from different perspectives. We hope you find it helpful.

1. Basic Concepts

A namespace (Linux namespace) is a Linux kernel feature implemented to support container virtualization. Each container we create has its own namespaces, and the applications running in it behave as if they were running in an independent operating system. Namespaces ensure that containers do not affect each other.

The namespace mechanism of Linux provides a solution for resource isolation. System resources such as PID, IPC, and Network are no longer global, but belong to a specific Namespace. Namespace is a kind of encapsulation and isolation of global system resources, so that processes in different namespaces have independent global system resources. Changing system resources in a namespace will only affect processes in the current namespace and has no impact on processes in other namespaces.

Traditionally, in Linux and other derivatives of UNIX, many resources are managed globally. For example, all processes in the system are conventionally identified by PID, which means that the kernel must manage a global list of PIDs. Likewise, the system-related information returned by the uname system call (including the system name and some information about the kernel) is the same for all callers. User IDs are managed in a similar way: each user is identified by a globally unique UID number.
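To make these global identifiers concrete, here is a minimal sketch in C that collects the PID, the UID, and the uname information visible to the calling process. The struct global_view type and both helper functions are hypothetical names introduced only for this illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical container for the identifiers the calling process sees. */
struct global_view {
	pid_t pid;          /* process ID of the caller */
	uid_t uid;          /* real user ID of the caller */
	struct utsname uts; /* system name, node name, release, ... */
};

/* Fills in *view. Returns 0 on success, -1 if uname() fails. */
int read_global_view(struct global_view *view)
{
	view->pid = getpid();
	view->uid = getuid();
	return uname(&view->uts) == 0 ? 0 : -1;
}

/* Prints the collected identifiers on one line. */
void print_global_view(const struct global_view *view)
{
	printf("pid=%ld uid=%ld sysname=%s nodename=%s release=%s\n",
	       (long)view->pid, (long)view->uid,
	       view->uts.sysname, view->uts.nodename, view->uts.release);
}
```

Calling read_global_view followed by print_global_view from two different processes on a system without extra namespaces should show different PIDs but identical uname fields.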

Global IDs allow the kernel to selectively grant or deny certain privileges. While the root user with UID 0 is allowed to do basically anything, other user IDs are restricted. For example, the user with UID n is not allowed to kill a process belonging to user m (m ≠ n). But this does not prevent users from seeing each other, i.e. user n can see that another user m is also active on the computer. As long as users can only manipulate their own processes, this is fine, since there is no reason why users should not be allowed to see other users' processes.

But there are situations where this effect is undesirable. Suppose a web hosting provider intends to give users full access to a Linux computer, including root privileges. Traditionally, this required one computer per user, which was prohibitively expensive. Using a virtualization environment such as KVM or VMware is one way to solve the problem, but it does not use resources efficiently: each user of the computer requires a separate kernel and a fully installed set of supporting user-level applications.

Namespaces provide a different solution that requires fewer resources. In a virtualized system, one physical computer can run multiple kernels, possibly several different operating systems in parallel. Namespaces, by contrast, use only one kernel on one physical computer, and all of the aforementioned global resources are abstracted through namespaces. This makes it possible to place a set of processes into containers that are isolated from each other. Isolation means that the members of one container have no relationship with other containers. The separation between containers can also be reduced by allowing them to share certain kinds of information. For example, a container can be set up to use its own set of PIDs but still share parts of the file system with other containers.

2. Implementation

The implementation of namespaces requires two parts: a per-subsystem namespace structure that wraps all formerly global components, and a mechanism that associates a given process with each namespace to which it belongs.

The formerly global properties of each subsystem are now encapsulated in namespaces, and each process is associated with a selected namespace. Every namespace-aware kernel subsystem must provide a data structure that collects all objects that are available on a per-namespace basis. struct nsproxy is used to assemble pointers to the subsystem-specific namespace wrappers. The file nsproxy.h contains:

/*
 * A structure to contain pointers to all per-process
 * namespaces - fs (mount), uts, network, sysvipc, etc.
 *
 * The pid namespace is an exception -- it's accessed using
 * task_active_pid_ns.  The pid namespace here is the
 * namespace that children will use.
 *
 * 'count' is the number of tasks holding a reference.
 * The count for each namespace, then, will be the number
 * of nsproxies pointing to it, not the number of tasks.
 *
 * The nsproxy is shared by tasks which share all namespaces.
 * As soon as a single namespace is cloned or unshared, the
 * nsproxy is copied.
 */
struct nsproxy {
	atomic_t count;
	struct uts_namespace *uts_ns;
	struct ipc_namespace *ipc_ns;
	struct mnt_namespace *mnt_ns;
	struct pid_namespace *pid_ns_for_children;
	struct net 	     *net_ns;
	struct time_namespace *time_ns;
	struct time_namespace *time_ns_for_children;
	struct cgroup_namespace *cgroup_ns;
};

The following areas of the current kernel are namespace-aware:

1. The UTS namespace contains the name, version, underlying architecture type, and similar information about the running kernel. UTS is the abbreviation of UNIX Timesharing System.

2. All information related to inter-process communication (IPC) is stored in struct ipc_namespace.

3. The view of the mounted file systems is given in struct mnt_namespace.

4. Information about process IDs is provided by struct pid_namespace.

5. The information saved in struct user_namespace is used to limit the resource usage of each user.

6. struct net (referenced through the net_ns pointer) contains all network-related namespace parameters.

I will introduce the contents of each namespace container when discussing the corresponding subsystem. Because the clone system call (on which fork is built) can establish new namespaces when creating a new process, appropriate flags must be provided to control this behavior. Each namespace has a corresponding flag, defined in the sched.h file:

#define CLONE_NEWCGROUP		0x02000000	/* New cgroup namespace */
#define CLONE_NEWUTS		0x04000000	/* New utsname namespace */
#define CLONE_NEWIPC		0x08000000	/* New ipc namespace */
#define CLONE_NEWUSER		0x10000000	/* New user namespace */
#define CLONE_NEWPID		0x20000000	/* New pid namespace */
#define CLONE_NEWNET		0x40000000	/* New network namespace */
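As a rough illustration of how these flags are used from user space, the sketch below forks a child that calls unshare(2), which accepts the same CLONE_NEW* flags as clone(2), to enter new user and UTS namespaces and then changes the host name there. The helper name rename_host_in_new_uts_ns is hypothetical, and the code assumes that the kernel permits unprivileged user namespaces; where it does not, the helper simply reports a refusal.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks a child that moves itself into new user and
 * UTS namespaces and sets the host name inside them.
 * Returns 0 if the child succeeded, 1 if the kernel refused (for example
 * because unprivileged user namespaces are disabled), -1 on fork/wait errors.
 */
int rename_host_in_new_uts_ns(const char *new_name)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* CLONE_NEWUSER grants the capabilities needed for CLONE_NEWUTS. */
		if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) != 0)
			_exit(1);
		/* Only the child's private UTS namespace is modified here. */
		if (sethostname(new_name, strlen(new_name)) != 0)
			_exit(1);
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Whatever the return value, the host name reported by gethostname in the calling process should be unchanged afterwards, because the change is confined to the child's UTS namespace.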

The functions of different types of namespaces:

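IPC: isolates the resources required for inter-process communication (System V IPC, POSIX message queues). The PID and IPC namespaces can be used in combination. Processes in the same IPC namespace can see each other and are allowed to interact, while processes in different IPC namespaces cannot.

Network: a network namespace gives processes a view of a completely independent network protocol stack, including network device interfaces, the IPv4 and IPv6 protocol stacks, IP routing tables, firewall rules, sockets, and so on. A network namespace provides an independent network environment, just like a separate system.

Mount: every process lives in a mount namespace, which gives the process a view of the file hierarchy. If the CLONE_NEWNS flag is not set, the child process shares a mount namespace with its parent, and any later call to mount or umount by the child affects all processes in that namespace. If the child process is in a separate mount namespace, it can call mount or umount to build its own view of the file hierarchy.

PID: Linux manages process IDs through namespaces, so the same process has different process IDs in different namespaces. PID namespaces form a parent-child hierarchy, and a child namespace is visible to its parent namespace.

User: isolates users.

UTS: isolates the host name.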
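The PID behavior described above can be observed with a short experiment. The following C sketch creates a new PID namespace and checks that the first process created in it sees itself as PID 1, even though the parent namespace knows it under a different number. The helper name first_child_is_pid_one is hypothetical, and the code assumes unprivileged user namespaces are allowed; otherwise it returns -1.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the first process created in a new
 * PID namespace sees its own PID as 1, 0 if it sees another value, and
 * -1 if the namespaces could not be set up.
 */
int first_child_is_pid_one(void)
{
	int status;
	pid_t outer = fork();

	if (outer < 0)
		return -1;
	if (outer == 0) {
		pid_t inner;

		/* unshare(CLONE_NEWPID) does not move the caller itself;
		 * only children created afterwards enter the new namespace. */
		if (unshare(CLONE_NEWUSER | CLONE_NEWPID) != 0)
			_exit(2);
		inner = fork(); /* in the parent, "inner" is the PID in the outer namespace */
		if (inner < 0)
			_exit(2);
		if (inner == 0)
			_exit(getpid() == 1 ? 0 : 1); /* PID inside the new namespace */
		if (waitpid(inner, &status, 0) < 0 || !WIFEXITED(status))
			_exit(2);
		_exit(WEXITSTATUS(status));
	}
	if (waitpid(outer, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

The intermediate process is needed because a process never changes its own PID namespace; it can only arrange for its children to start in a new one.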

Each process is associated with its own view of the namespaces. The task structure task_struct contains the following definition:

struct task_struct {
	...
	/* namespaces */
	struct nsproxy *nsproxy;
	...
};

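Because a pointer is used, multiple processes can share one set of namespaces. A change to a given namespace is then visible to all processes that belong to it.

init_nsproxy defines the initial global namespace and holds pointers to the initial namespace objects of each subsystem. The file kernel/nsproxy.c contains: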

struct nsproxy init_nsproxy = {
	.count			= ATOMIC_INIT(1),
	.uts_ns			= &init_uts_ns,
#if defined(CONFIG_POSIX_MQUEUE) || defined(CONFIG_SYSVIPC)
	.ipc_ns			= &init_ipc_ns,
#endif
	.mnt_ns			= NULL,
	.pid_ns_for_children	= &init_pid_ns,
#ifdef CONFIG_NET
	.net_ns			= &init_net,
#endif
#ifdef CONFIG_CGROUPS
	.cgroup_ns		= &init_cgroup_ns,
#endif
#ifdef CONFIG_TIME_NS
	.time_ns		= &init_time_ns,
	.time_ns_for_children	= &init_time_ns,
#endif
};
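The sharing of namespaces between processes can be checked indirectly from user space. As a sketch, the C code below compares the UTS namespace of a process with that of a child created by a plain fork, using the device and inode numbers that stat reports for /proc/self/ns/uts. The helper name child_shares_uts_ns is hypothetical, and the comparison only reflects namespace membership; it does not expose the kernel's nsproxy pointer.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a child created by plain fork() is in
 * the same UTS namespace as its parent, 0 if not, -1 on error.
 * Two processes are in the same namespace when /proc/<pid>/ns/uts
 * resolves to the same device and inode number.
 */
int child_shares_uts_ns(void)
{
	struct stat parent_ns, child_ns;
	int status;
	pid_t pid;

	if (stat("/proc/self/ns/uts", &parent_ns) != 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* In the child, /proc/self now refers to the child process. */
		if (stat("/proc/self/ns/uts", &child_ns) != 0)
			_exit(2);
		_exit(child_ns.st_dev == parent_ns.st_dev &&
		      child_ns.st_ino == parent_ns.st_ino ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Since fork passes no CLONE_NEW* flags, the expected result is 1 wherever /proc is available.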

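3. UTS Namespace

The UTS namespace needs almost no special handling, because it only involves simple quantities and has no hierarchical organization. All relevant information is collected in an instance of the following structure, in the utsname.h file: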

struct uts_namespace {
	struct new_utsname name;
	struct user_namespace *user_ns;
	struct ucounts *ucounts;
	struct ns_common ns;
} __randomize_layout;

The attribute information provided by uts_namespace is itself contained in struct new_utsname:

struct oldold_utsname {
	char sysname[9];
	char nodename[9];
	char release[9];
	char version[9];
	char machine[9];
};

#define __NEW_UTS_LEN 64

struct old_utsname {
	char sysname[65];
	char nodename[65];
	char release[65];
	char version[65];
	char machine[65];
};

struct new_utsname {
	char sysname[__NEW_UTS_LEN + 1];
	char nodename[__NEW_UTS_LEN + 1];
	char release[__NEW_UTS_LEN + 1];
	char version[__NEW_UTS_LEN + 1];
	char machine[__NEW_UTS_LEN + 1];
	char domainname[__NEW_UTS_LEN + 1];
};

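The individual strings store the name of the system (Linux…), the kernel release, the machine name, and so on. The current values of these attributes can be obtained with the uname tool, and they are also visible in /proc/sys/kernel/: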

z@z-virtual-machine:~$ cat /proc/sys/kernel/ostype
Linux
z@z-virtual-machine:~$ cat /proc/sys/kernel/osrelease
5.3.0-40-generic
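The same comparison can be made programmatically. The C sketch below reads an entry from /proc/sys/kernel and checks it against the value returned by uname; read_kernel_sysctl and proc_matches_uname are hypothetical helper names, and the code assumes /proc is mounted.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: reads the first line of /proc/sys/kernel/<name>
 * (such as "ostype" or "osrelease") into buf, without the trailing newline.
 * Returns 0 on success, -1 on error.
 */
int read_kernel_sysctl(const char *name, char *buf, size_t len)
{
	char path[128];
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/sys/kernel/%s", name);
	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;
	if (fgets(buf, (int)len, fp) == NULL) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	buf[strcspn(buf, "\n")] = '\0'; /* drop the newline, if present */
	return 0;
}

/* Returns 1 if /proc and uname() agree on the system name, 0 if not, -1 on error. */
int proc_matches_uname(void)
{
	struct utsname uts;
	char ostype[sizeof(uts.sysname)];

	if (uname(&uts) != 0)
		return -1;
	if (read_kernel_sysctl("ostype", ostype, sizeof(ostype)) != 0)
		return -1;
	return strcmp(ostype, uts.sysname) == 0 ? 1 : 0;
}
```

Both sources are expected to report the values of the UTS namespace of the calling process, so they should agree.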

The initial settings are stored in init_uts_ns, in the init/version.c file:

struct uts_namespace init_uts_ns = {
	.ns.count = REFCOUNT_INIT(2),
	.name = {
		.sysname	= UTS_SYSNAME,
		.nodename	= UTS_NODENAME,
		.release	= UTS_RELEASE,
		.version	= UTS_VERSION,
		.machine	= UTS_MACHINE,
		.domainname	= UTS_DOMAINNAME,
	},
	.user_ns = &init_user_ns,
	.ns.inum = PROC_UTS_INIT_INO,
#ifdef CONFIG_UTS_NS
	.ns.ops = &utsns_operations,
#endif
};
