Using Netlink for communication between user space and kernel space

User space and kernel space communication: Netlink (Part 1)

In 2001, the IETF ForCES working group officially began standardization work on Netlink. Jamal Hadi Salim proposed defining Netlink as a protocol for communication between the routing engine component of a network device and its control and management component. His proposal was not adopted in the end. Instead we got the design we see today: Netlink as a new protocol family (domain).

Linus Torvalds, the father of Linux, once said, "Linux is evolution, not intelligent design." What does that mean here? Netlink follows the same Linux tradition: there is no complete specification or design document. So what is there? You guessed it: "Read the f**king source code".

Of course, this article does not analyze how Netlink is implemented in Linux. It covers two topics: "What is Netlink" and "How to make good use of Netlink". You only need to read the kernel source when you run into a problem and want to find out why.

What is Netlink

To understand Netlink, you need to grasp two key points:

1. Datagram-oriented connectionless messaging subsystem

2. Implemented based on the general BSD Socket architecture

The first point naturally brings UDP to mind, and that is a good instinct: understanding Netlink by analogy with UDP is quite reasonable. Drawing parallels, summarizing, and transferring what you already know is the essence of learning. Netlink provides bidirectional, asynchronous data communication between the kernel and user space (kernel->user and user->kernel). It also supports communication between two user processes, and even between two kernel subsystems. This article ignores the latter two cases and focuses on communication between user space and the kernel.

When you saw the second point, did a picture of the familiar BSD socket architecture flash through your mind? If so, you clearly have a knack for this. If not, never mind: the knack can grow slowly, haha.

When practicing Netlink socket programming later, we will mainly use system calls such as socket(), bind(), sendmsg() and recvmsg(), and of course the polling mechanisms that sockets provide.
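As a rough sketch of the first two calls in that list, creating and binding a Netlink socket might look like the code below. It assumes a Linux system. The helper name open_netlink_socket is made up for illustration, and passing NETLINK_ROUTE as the protocol (as in the usage note) is just one possible choice that this article has not introduced.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: create a Netlink socket for `protocol` and bind it.
 * `groups` is the multicast group mask (0 = join no multicast group).
 * Returns the socket descriptor, or -1 on failure (errno is set). */
int open_netlink_socket(int protocol, unsigned int groups)
{
    struct sockaddr_nl local;
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -1;

    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;
    local.nl_pid = 0;            /* 0 asks the kernel to pick a unique port ID */
    local.nl_groups = groups;

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use it as `int fd = open_netlink_socket(NETLINK_ROUTE, 0);` and later pass fd to sendmsg(), recvmsg() or poll().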

Netlink communication types

Netlink supports two communication modes: unicast and multicast.

Unicast: Often used for 1:1 data communication between a user process and a kernel subsystem. User space sends commands to the kernel and then receives the results of those commands from the kernel.

Multicast: Often used for 1:N data communication between a kernel subsystem and multiple user processes. The kernel initiates the session and the user-space applications are the receivers. To do this, the kernel-side code creates a multicast group, and every user-space process that is interested in its messages joins that group to receive them, as follows:

[Figure: unicast between process A and kernel subsystem 1; multicast between processes B, C and kernel subsystem 2]

The communication between process A and subsystem 1 is unicast, and the communication between processes B and C and subsystem 2 is multicast. The picture above tells us something else as well. Data sent from user space to the kernel does not need to be queued, so the operation completes synchronously. Data sent from kernel space to user space is queued, so it is asynchronous. Understanding this saves a lot of detours when developing Netlink-based modules. Suppose you send a message to the kernel to fetch something such as the routing table. If the table is very large, you have to think about how to receive the data when the kernel returns it through Netlink. After all, you have seen the output queue, and you cannot turn a blind eye to it.

Netlink message format

A Netlink message consists of two parts: a message header and a payload. The entire message is 4-byte aligned and is generally transmitted in host byte order. The header has a fixed length of 16 bytes, and the length of the message body is variable:

[Figure: Netlink message layout, a 16-byte header followed by a variable-length payload]
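The alignment rules can be checked with the length macros from linux/netlink.h. The following is only a sketch: build_message is a hypothetical helper, and it assumes the caller passes a buffer that is suitably aligned for struct nlmsghdr.

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: write one Netlink message (header + payload) into buf.
 * Returns the number of bytes the message occupies including padding,
 * or 0 if buf is too small. buf must be aligned for struct nlmsghdr. */
size_t build_message(void *buf, size_t cap, uint16_t type,
                     const void *payload, size_t payload_len)
{
    size_t space = NLMSG_SPACE(payload_len);     /* header + payload, padded to 4 bytes */
    struct nlmsghdr *nlh = buf;

    if (space > cap)
        return 0;

    memset(buf, 0, space);                       /* also zeroes the padding bytes */
    nlh->nlmsg_len = NLMSG_LENGTH(payload_len);  /* header + payload, without padding */
    nlh->nlmsg_type = type;
    if (payload_len > 0)
        memcpy(NLMSG_DATA(nlh), payload, payload_len);
    return space;
}
```

With a 5-byte payload, nlmsg_len is 16 + 5 = 21, while the message occupies 24 bytes in the buffer because of the 4-byte alignment.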

Netlink message header

The message header is defined in the file <linux/netlink.h> and is represented by the structure nlmsghdr:


struct nlmsghdr
{
    __u32 nlmsg_len;   /* Length of message including header */
    __u16 nlmsg_type;  /* Message content */
    __u16 nlmsg_flags; /* Additional flags */
    __u32 nlmsg_seq;   /* Sequence number */
    __u32 nlmsg_pid;   /* Sending process PID */
};

Explanation of each member of the message header:

nlmsg_len: The length of the entire message in bytes, including the Netlink message header itself.

nlmsg_type: The type of message, that is, whether it is a data message or a control message. Currently (kernel version 2.6.21) Netlink supports only four types of control messages:

NLMSG_NOOP - Empty message, nothing to do;

NLMSG_ERROR - The message contains an error;

NLMSG_DONE - If the kernel returns multiple messages through the Netlink queue, the last message in the queue has type NLMSG_DONE, and all the other messages have the NLM_F_MULTI bit set in nlmsg_flags;

NLMSG_OVERRUN - Not used yet.
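To make the control message types concrete, here is one way a receiver might walk the messages packed into a single receive buffer. This is a sketch under assumptions, not the kernel's own logic: walk_messages is a hypothetical helper, it treats every non-control type as a data message, and it relies on the NLMSG_OK and NLMSG_NEXT macros from linux/netlink.h.

```c
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: walk every message in one received buffer.
 * Adds the number of data messages found to *data_count.
 * Returns 1 once NLMSG_DONE is seen, 0 if more buffers are expected,
 * or a negative errno value if an error message is found. */
int walk_messages(void *buf, int len, int *data_count)
{
    struct nlmsghdr *nlh;

    for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        switch (nlh->nlmsg_type) {
        case NLMSG_NOOP:
            break;                       /* empty message, nothing to do */
        case NLMSG_DONE:
            return 1;                    /* last message of a multipart reply */
        case NLMSG_ERROR: {
            const struct nlmsgerr *err = NLMSG_DATA(nlh);

            if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
                return -EBADMSG;         /* truncated error message */
            if (err->error != 0)
                return err->error;       /* already a negative errno value */
            break;                       /* error == 0 is an acknowledgement */
        }
        default:
            (*data_count)++;             /* data message, normally NLM_F_MULTI */
            break;
        }
    }
    return 0;
}
```

A receive loop would call recvmsg() repeatedly and hand each buffer to walk_messages() until it returns a non-zero value.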

nlmsg_flags: Additional descriptive information attached to the message, such as NLM_F_MULTI mentioned above.

It is enough to know that nlmsg_flags can take multiple values. The role and meaning of each value can be found through Google and the source code, so I won't go into them here. All values defined in the 2.6.21 kernel:

[Figure: table of nlmsg_flags values in the 2.6.21 kernel]
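For reference, the four generic flag bits can be decoded with a few lines of C. This sketch covers only NLM_F_REQUEST, NLM_F_MULTI, NLM_F_ACK and NLM_F_ECHO as defined in linux/netlink.h; the modifier flags for GET and NEW requests are left out, and describe_flags is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: render the generic nlmsg_flags bits as text in `out`.
 * Returns `out`. Unknown bits are ignored. */
const char *describe_flags(unsigned int flags, char *out, size_t cap)
{
    static const struct { unsigned int bit; const char *name; } names[] = {
        { NLM_F_REQUEST, "NLM_F_REQUEST" },  /* this is a request message */
        { NLM_F_MULTI,   "NLM_F_MULTI"   },  /* part of a multipart reply */
        { NLM_F_ACK,     "NLM_F_ACK"     },  /* sender wants an acknowledgement */
        { NLM_F_ECHO,    "NLM_F_ECHO"    },  /* echo this request back */
    };
    size_t i, used = 0;

    if (cap == 0)
        return out;
    out[0] = '\0';

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        int n;

        if (!(flags & names[i].bit))
            continue;
        n = snprintf(out + used, cap - used, "%s%s",
                     used ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= cap - used)
            break;                           /* output truncated, stop here */
        used += (size_t)n;
    }
    return out;
}
```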

nlmsg_seq: The message sequence number. Because Netlink is datagram-oriented, messages can be lost, but Netlink provides a mechanism for detecting this, which developers can use according to their actual needs. Sequence numbers are generally used together with the NLM_F_ACK flag. If an application needs to be sure that every message it sends was received by the kernel, it sets the sequence number itself when sending. The kernel extracts the sequence number from the received message and puts the same number in the response it sends back to the user program. This is somewhat similar to TCP's acknowledgement mechanism.

Note: When the kernel actively sends a broadcast message to user space, this field in the message is always 0.

nlmsg_pid: When a user-space process and a kernel subsystem establish a data exchange channel through Netlink, Netlink assigns a unique numeric identifier to each such channel. Its main purpose is to associate requests from user space with their responses. Put bluntly, when several user-space processes talk to several kernel subsystems, Netlink must provide a way to keep the data exchanged by each user-kernel pair from getting mixed up.

[Figure: processes A and B each communicating with kernel subsystem 1 over their own Netlink channel]

That is, when processes A and B both obtain information from subsystem 1 through Netlink, subsystem 1 must make sure that the response intended for process A is not delivered to process B. This matters mainly when user-space processes fetch data from kernel space. A user-space process that expects a response from the kernel usually sets this field to its own process ID, obtained with getpid(), when it sends the message. For messages that the kernel sends to user space on its own initiative, this field is 0.
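Putting nlmsg_seq and nlmsg_pid together, a request/response pairing could be sketched like this. Both helpers are hypothetical. The sketch assumes the socket's port ID is known to the caller (for example because it was bound with the process ID) and that a reply carries that port ID in nlmsg_pid.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: fill in a header-only request so that the
 * kernel's reply can later be matched against it. */
void prepare_request(struct nlmsghdr *nlh, uint16_t type, uint32_t seq)
{
    memset(nlh, 0, sizeof(*nlh));
    nlh->nlmsg_len = NLMSG_LENGTH(0);              /* header only, no payload */
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;  /* ask for a confirmation */
    nlh->nlmsg_seq = seq;                          /* chosen by the sender */
    nlh->nlmsg_pid = (uint32_t)getpid();           /* identifies the sender */
}

/* Hypothetical helper: returns 1 if `reply` answers the request that was
 * sent with sequence number `seq` from port ID `port_id`, 0 otherwise.
 * Unsolicited kernel messages carry 0 in both fields and never match a
 * non-zero port ID. */
int is_reply_to(const struct nlmsghdr *reply, uint32_t seq, uint32_t port_id)
{
    return reply->nlmsg_seq == seq && reply->nlmsg_pid == port_id;
}
```

A simple counter incremented for each request is usually enough as a source of sequence numbers.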

Netlink message body

Netlink’s message body adopts TLV (Type-Length-Value) format:

[Figure: TLV (Type-Length-Value) layout of a Netlink attribute]

Each Netlink attribute is represented by struct nlattr{}:

[Figure: definition of struct nlattr]
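In current kernel headers, struct nlattr consists of two 16-bit fields, nla_len and nla_type, followed by the value, and each attribute is padded to a multiple of 4 bytes. The sketch below writes and searches such attributes in a plain buffer. put_attr and find_attr are hypothetical helpers, not kernel or library functions, and the attribute type numbers used with them are up to the caller.

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: append one attribute at offset `off` in buf.
 * Returns the offset just past the padded attribute, or 0 if it does not fit. */
size_t put_attr(void *buf, size_t cap, size_t off, uint16_t type,
                const void *data, uint16_t len)
{
    size_t total;
    struct nlattr *nla;

    if (len > UINT16_MAX - NLA_HDRLEN)
        return 0;                              /* nla_len would overflow */
    total = (size_t)NLA_ALIGN(NLA_HDRLEN + len);
    if (off + total > cap)
        return 0;

    nla = (struct nlattr *)((char *)buf + off);
    memset(nla, 0, total);                     /* also zeroes the padding */
    nla->nla_type = type;                                  /* Type */
    nla->nla_len = (uint16_t)(NLA_HDRLEN + len);           /* Length, unpadded */
    if (len > 0)
        memcpy((char *)nla + NLA_HDRLEN, data, len);       /* Value */
    return off + total;
}

/* Hypothetical helper: find the first attribute of `type` in buf[0..used).
 * Returns NULL if there is none or the buffer is malformed. */
const struct nlattr *find_attr(const void *buf, size_t used, uint16_t type)
{
    size_t off = 0;

    while (off + NLA_HDRLEN <= used) {
        const struct nlattr *nla =
            (const struct nlattr *)((const char *)buf + off);

        if (nla->nla_len < NLA_HDRLEN || off + nla->nla_len > used)
            break;                             /* malformed attribute */
        if (nla->nla_type == type)
            return nla;
        off += (size_t)NLA_ALIGN(nla->nla_len);
    }
    return NULL;
}
```

A 5-byte value therefore occupies 12 bytes (4-byte header, 5 bytes of data, 3 bytes of padding), while its nla_len is 9.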

Error indication message provided by Netlink


When an error occurs while a user-space application and a kernel subsystem communicate through Netlink, Netlink must notify user space of the error. Netlink encapsulates error messages in a separate structure:


struct nlmsgerr
{
    int error;           // Error code from errno.h, stored as a negative value (0 means acknowledgement); strerror(-error) describes it
    struct nlmsghdr msg; // Header of the message that triggered the error
};
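A receiver might interpret such a message roughly as follows. This is a sketch: error_from_message is a hypothetical helper, and it relies on the Linux convention that the error field holds a negative errno value, or 0 when the message is just an acknowledgement of a request sent with NLM_F_ACK.

```c
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical helper: interpret an NLMSG_ERROR message.
 * Returns 0 for a plain acknowledgement, a positive errno value for a
 * failure, or -1 if `nlh` is not a well-formed NLMSG_ERROR message.
 * If failed_seq is not NULL, it receives the sequence number of the
 * request that the kernel is answering. */
int error_from_message(const struct nlmsghdr *nlh, unsigned int *failed_seq)
{
    const struct nlmsgerr *err;

    if (nlh->nlmsg_type != NLMSG_ERROR ||
        nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*err)))
        return -1;

    /* The nlmsgerr payload starts right after the 16-byte header. */
    err = (const struct nlmsgerr *)((const char *)nlh + NLMSG_HDRLEN);
    if (failed_seq != NULL)
        *failed_seq = err->msg.nlmsg_seq;
    return -err->error;   /* kernel stores a negative errno; 0 stays 0 */
}
```

The positive return value can be passed to strerror() to get a readable description.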

Issues that need attention in Netlink programming

In Netlink-based user-kernel communication, two situations can cause messages to be lost:

1. Memory is exhausted;

2. The receive buffer of the user-space process overflows. The usual causes are that the user-space process runs too slowly or that the receive queue is too short.

If Netlink cannot deliver a message to the receiving process in user space, that process gets an ENOBUFS error ("No buffer space available") when it calls recvmsg(). Keep this in mind. Conversely, buffer overflow does not occur in sendmsg() calls from user space to the kernel. The reason was mentioned earlier, so please think it through yourself.
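One common way to reduce the chance of overflow is to ask for a larger receive queue and to check for ENOBUFS on every receive. The sketch below uses the standard SO_RCVBUF socket option. Both helper names are hypothetical, and the buffer size a program should request depends on its workload.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive queue of `bytes` bytes.
 * Returns the size the kernel actually applied, or -1 on failure. */
int enlarge_receive_buffer(int fd, int bytes)
{
    int applied = 0;
    socklen_t len = sizeof(applied);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &applied, &len) < 0)
        return -1;
    return applied;
}

/* Hypothetical helper: receive one datagram and report message loss.
 * Returns the number of bytes read, or -1 on error. *overrun is set to 1
 * when the error was ENOBUFS, meaning messages were dropped and the
 * application should resynchronize its state with the kernel. */
ssize_t receive_checked(int fd, void *buf, size_t cap, int *overrun)
{
    ssize_t n = recv(fd, buf, cap, 0);

    *overrun = (n < 0 && errno == ENOBUFS);
    return n;
}
```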

Of course, with a blocking socket there is no risk of memory exhaustion. Why is that? Go and look up what a blocking socket is. As the saying goes, learning without thinking leads to confusion, and thinking without learning leads to danger.

Netlink address structure

In the TCP blog post, we covered the generic socket address structure and the standard address structure used in Internet programming. Their relationship to the Netlink address structure is as follows:

[Figure: relationship between the generic socket address structure, the Internet address structure, and struct sockaddr_nl]

The detailed definition and description of struct sockaddr_nl{} is as follows:

struct sockaddr_nl
{
    sa_family_t nl_family;  /* This field is always AF_NETLINK */
    unsigned short nl_pad;  /* Currently not used, filled with 0 */
    __u32 nl_pid;           /* process pid */
    __u32 nl_groups;        /* multicast groups mask */
};

nl_pid: The ID of the process that sends or receives the message. As mentioned earlier, Netlink supports not only user-kernel communication but also communication between two user-space processes or between two kernel subsystems. This field is 0 in the following two situations:

First, when the destination is the kernel, that is, when sending from user space to kernel space, the nl_pid in the Netlink address structure we construct is usually set to 0. One thing needs explaining here: in the Netlink specification, PID actually stands for Port ID (32 bits), and its main purpose is to uniquely identify a Netlink socket channel. Normally nl_pid is set to the process ID of the current process. When several threads of one process use Netlink sockets at the same time, however, nl_pid is generally set as follows:


pthread_self() << 16 | getpid();

Second, when the kernel sends a multicast message to user space and the user-space process belongs to the multicast group, nl_pid in the address structure is also 0. In this case it is used together with the other field, introduced next.

nl_groups: A user-space process that wants to join a multicast group must call bind(). This field specifies the mask of the multicast groups that the caller wishes to join (note that it is a mask, not a group number; we will explain this field in detail later). If this field is 0, the caller does not want to join any multicast group. Each protocol in the Netlink protocol family can support up to 32 multicast groups this way (because nl_groups is 32 bits long), with each group represented by one bit.
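The difference between a group number and the nl_groups mask can be shown in a few lines. In this sketch, group number n corresponds to bit n-1 of the mask. The helpers are hypothetical, and the rtnetlink constants used in the usage note (RTNLGRP_LINK, RTMGRP_LINK) are only an example taken from linux/rtnetlink.h, not something this article has covered.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: convert a group number (1..32) into its bit in
 * nl_groups. Returns 0 for numbers that do not fit in the 32-bit mask. */
unsigned int group_to_mask(unsigned int group)
{
    if (group < 1 || group > 32)
        return 0;
    return 1U << (group - 1);
}

/* Hypothetical helper: destination address for messages sent to the kernel. */
void kernel_address(struct sockaddr_nl *addr)
{
    memset(addr, 0, sizeof(*addr));
    addr->nl_family = AF_NETLINK;   /* nl_pid = 0 means the kernel */
}

/* Hypothetical helper: local address to pass to bind() in order to join
 * the multicast groups selected by `mask`. */
void listener_address(struct sockaddr_nl *addr, unsigned int mask)
{
    memset(addr, 0, sizeof(*addr));
    addr->nl_family = AF_NETLINK;
    addr->nl_groups = mask;         /* one bit per multicast group */
}
```

For example, group_to_mask(RTNLGRP_LINK) yields the same value as the predefined mask RTMGRP_LINK.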

We will cover the remaining details of Netlink when they become useful in the practical sessions that follow.

Not finished, to be continued...
