Several classic Linux packet capture engines

Release: 2023-08-04 16:07:06

This article covers four classic Linux packet capture engines. If you know of others worth mentioning, feel free to leave a message. The four are:

  • libpcap/libpcap-mmap
  • PF_RING
  • DPDK
  • XDP

libpcap

libpcap's packet capture mechanism adds a bypass at the data link layer that does not interfere with the processing of the system's own network protocol stack. Sent and received packets are filtered and buffered by the Linux kernel and then passed directly to the upper-layer application.

  1. The data packet arrives at the network card (NIC).
  2. The NIC performs a DMA transfer according to its configuration ("first copy": NIC hardware -> ring buffer allocated by the kernel for the NIC).
  3. The NIC raises an interrupt to wake up the processor.
  4. The driver reads from the ring buffer and fills in the kernel skbuff structure ("second copy": kernel NIC ring buffer -> kernel skbuff structure).
  5. The netif_receive_skb function is then called:
  • 5.1 If a packet capture program is present, the packet enters the BPF filter through the network tap interface, and packets matching the filter rules are copied into the kernel capture buffer ("third copy"). BPF associates a filter and two buffers with each packet capture program that requires service. BPF allocates the buffers, typically 4 KB each: the store buffer receives data from the adapter, and the hold buffer is used to copy packets to the application.
  • 5.2 The data link layer bridging function is handled.
  • 5.3 The upper-layer protocol is determined from the skb->protocol field and the packet is handed to the network layer, entering the network protocol stack for higher-level processing.

libpcap bypasses the protocol-stack portion of the Linux kernel's receive path: through the PF_PACKET socket, the user-space API obtains packets from the link-layer driver by copying them from the kernel buffer into a user-space buffer ("fourth copy").
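To make the path above concrete, here is a minimal libpcap capture sketch in C. It opens an interface in promiscuous mode, installs a BPF filter (the filter described in step 5.1), and prints the length of each captured packet. The interface name "eth0" and the filter expression "tcp port 80" are illustrative assumptions.

```c
/* Minimal libpcap capture sketch. Build with: gcc capture.c -lpcap */
#include <pcap.h>
#include <stdio.h>
#include <stdlib.h>

static void on_packet(u_char *user, const struct pcap_pkthdr *hdr,
                      const u_char *bytes)
{
    (void)user; (void)bytes;
    printf("captured %u bytes (on wire: %u)\n", hdr->caplen, hdr->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    /* "eth0" is an assumption; replace with your interface. */
    pcap_t *handle = pcap_open_live("eth0", 65535, 1 /* promisc */,
                                    1000 /* ms timeout */, errbuf);
    if (!handle) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return EXIT_FAILURE;
    }

    /* The BPF filter from step 5.1: only matching packets are copied up. */
    struct bpf_program prog;
    if (pcap_compile(handle, &prog, "tcp port 80", 1, PCAP_NETMASK_UNKNOWN) == -1 ||
        pcap_setfilter(handle, &prog) == -1) {
        fprintf(stderr, "filter: %s\n", pcap_geterr(handle));
        return EXIT_FAILURE;
    }

    pcap_loop(handle, 10, on_packet, NULL);   /* capture 10 packets */
    pcap_freecode(&prog);
    pcap_close(handle);
    return 0;
}
```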
libpcap-mmap

libpcap-mmap is an improvement on the old libpcap implementation; new versions of libpcap basically all use the PACKET_MMAP mechanism. PACKET_MMAP uses mmap to eliminate one memory copy ("the fourth copy is gone"), which reduces frequent system calls and greatly improves packet capture efficiency.
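Below is a rough sketch of how an application sets up a PACKET_MMAP receive ring with the PACKET_RX_RING socket option; the kernel then writes packets into a ring that is mmap'd into user space, so the fourth copy and the per-packet receive system calls disappear. The ring geometry values are illustrative assumptions.

```c
/* Sketch of a PACKET_MMAP (PACKET_RX_RING) setup: the kernel fills a shared
 * ring that the application reads without an extra copy into user space. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    /* Illustrative ring geometry: 64 blocks of 4 KiB, 2 KiB frames. */
    struct tpacket_req req = {
        .tp_block_size = 4096,
        .tp_block_nr   = 64,
        .tp_frame_size = 2048,
        .tp_frame_nr   = 64 * (4096 / 2048),
    };
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) < 0) {
        perror("PACKET_RX_RING");
        return 1;
    }

    /* Map the ring into user space: packets land here directly. */
    size_t ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
    void *ring = mmap(NULL, ring_len, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) { perror("mmap"); return 1; }

    /* Each frame starts with a tpacket_hdr: the kernel sets TP_STATUS_USER
     * when a packet is ready, and the app resets it to TP_STATUS_KERNEL. */
    struct tpacket_hdr *hdr = ring;
    printf("ring mapped, first frame status = %lu\n", hdr->tp_status);

    munmap(ring, ring_len);
    close(fd);
    return 0;
}
```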

PF_RING

As we saw above, libpcap needs four memory copies and libpcap-mmap needs three. The core idea of PF_RING is to further reduce the number of times a packet is copied on its way to the application.

Compared with libpcap-mmap, PF_RING lets user-space memory be mmap'd directly onto the rx_buffer. This removes one more copy (libpcap-mmap's "second copy": rx_buffer -> skb).

PF_RING ZC implements DNA (Direct NIC Access) technology, which maps the user memory space onto the driver's memory space, allowing user applications to access the NIC's registers and data directly.

In this way packets are never buffered in the kernel, which removes yet another copy (libpcap's "first copy", the DMA-to-kernel-buffer copy). This is true zero copy.

The disadvantage is that only one application can open the DMA ring at a time (note that modern NICs can have multiple RX/TX queues, so one application can sit on each queue simultaneously). In other words, multiple user-space applications have to communicate with each other in order to distribute packets among themselves.
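For comparison with the libpcap example above, a minimal receive loop using the PF_RING user-space API might look like the sketch below; "eth0" is an assumed interface (with PF_RING ZC you would typically open a "zc:"-prefixed device instead).

```c
/* Sketch of a PF_RING receive loop (link against -lpfring -lpcap). */
#include <pfring.h>
#include <stdio.h>

int main(void)
{
    /* "eth0" is an assumption; adjust snaplen/flags for your setup. */
    pfring *ring = pfring_open("eth0", 1536 /* snaplen */, PF_RING_PROMISC);
    if (!ring) { fprintf(stderr, "pfring_open failed\n"); return 1; }

    pfring_set_application_name(ring, "pfring-sketch");
    pfring_enable_ring(ring);

    for (int i = 0; i < 10; i++) {
        u_char *buffer = NULL;
        struct pfring_pkthdr hdr;
        /* Last argument = 1: block until a packet arrives. */
        if (pfring_recv(ring, &buffer, 0, &hdr, 1) > 0)
            printf("received %u bytes\n", hdr.len);
    }

    pfring_close(ring);
    return 0;
}
```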

DPDK

Both PF_RING ZC and DPDK achieve zero-copy packet delivery, and both bypass the kernel, but their implementation principles differ slightly: PF_RING ZC takes over packets through its ZC driver (also at the application layer), while DPDK is built on top of UIO.

1 UIO mmap implements zero copy

UIO (Userspace I/O) is an I/O technique in which drivers run in user space. Normally, device drivers on a Linux system run in kernel space and are invoked by applications in user space; with UIO, only a small stub of the driver runs in kernel space, and the vast majority of the driver's functionality is implemented in user space. Using the UIO mechanism provided by Linux, the kernel can be bypassed and all packet processing completed in user space.
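Because the kernel-side UIO stub only exposes the device's memory regions and interrupt events, the user-space driver needs nothing more than open(), mmap() and read(). The generic sketch below assumes a device already bound to a UIO driver at /dev/uio0; the 4 KiB map size is an illustrative assumption (real sizes are published under /sys/class/uio/).

```c
/* Generic UIO sketch: map a device's first memory region into user space
 * and wait for interrupts by reading the uio file descriptor. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* /dev/uio0 and the 4 KiB size are assumptions; see
     * /sys/class/uio/uio0/maps/map0/size for the real region size. */
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0) { perror("open /dev/uio0"); return 1; }

    size_t map_size = 4096;
    /* mmap offset N*getpagesize() selects the device's N-th memory region. */
    volatile uint32_t *regs = mmap(NULL, map_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0 * getpagesize());
    if (regs == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first device register: 0x%08x\n", regs[0]);

    /* A blocking read() returns the interrupt count; DPDK's PMD skips this
     * entirely and busy-polls the mapped rings instead. */
    uint32_t irq_count;
    if (read(fd, &irq_count, sizeof(irq_count)) == sizeof(irq_count))
        printf("interrupts so far: %u\n", irq_count);

    munmap((void *)regs, map_size);
    close(fd);
    return 0;
}
```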

2 UIO PMD reduces interrupts and CPU context switching

DPDK's UIO driver masks interrupts issued by the hardware and then uses active polling in user mode. This mode is called PMD (Poll Mode Driver).

Compared with DPDK, PF_RING (without ZC) uses both NAPI polling and application-layer polling, while PF_RING ZC, like DPDK, uses only application-layer polling.
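The heart of a DPDK application is exactly this PMD busy-poll loop: no interrupts, just repeated calls to rte_eth_rx_burst(). The sketch below is stripped down and hedged; port 0, the pool size and the burst size are assumptions, and error handling is minimal.

```c
/* Stripped-down sketch of a DPDK poll-mode (PMD) receive loop.
 * Build against libdpdk (e.g. pkg-config --cflags --libs libdpdk)
 * and run with EAL arguments, e.g.: ./rxpoll -l 1 -n 4 */
#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define NUM_MBUFS    8191
#define BURST_SIZE   32

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0) {
        fprintf(stderr, "EAL init failed\n");
        return EXIT_FAILURE;
    }

    uint16_t port_id = 0;   /* assumption: first port bound to DPDK */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "MBUF_POOL", NUM_MBUFS, 256 /* cache */, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL) {
        fprintf(stderr, "mbuf pool creation failed\n");
        return EXIT_FAILURE;
    }

    /* One RX queue, no TX queues, default port configuration. */
    struct rte_eth_conf port_conf = {0};
    if (rte_eth_dev_configure(port_id, 1, 0, &port_conf) < 0 ||
        rte_eth_rx_queue_setup(port_id, 0, RX_RING_SIZE,
                               rte_eth_dev_socket_id(port_id), NULL, pool) < 0 ||
        rte_eth_dev_start(port_id) < 0) {
        fprintf(stderr, "port setup failed\n");
        return EXIT_FAILURE;
    }
    rte_eth_promiscuous_enable(port_id);

    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        /* Active polling: returns immediately even when nothing arrived. */
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);   /* process the packet, then free it */
    }
    return 0;
}
```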

3 HugePages reduce TLB misses

After the MMU (Memory Management Unit) was introduced into operating systems, the CPU has to access memory twice to read data: it first queries the page table to translate the logical address into a physical address, and then accesses that physical address to read the data or instruction.

To reduce the long lookup times caused by too many pages and overly large page tables, the TLB (Translation Lookaside Buffer), which can be translated as an address translation cache, was introduced. The TLB is part of the memory management unit, generally implemented in registers, and it stores the small subset of page table entries most likely to be accessed at the moment.

With the TLB in place, the CPU first looks up the address in the TLB. Since the TLB lives in registers and holds only a small number of page table entries, this lookup is very fast. If the lookup succeeds (a TLB hit), there is no need to query the page table in RAM; if it fails (a TLB miss), the page table in RAM is queried and the resulting entry is then loaded into the TLB.

DPDK uses HugePages, which on x86-64 support page sizes of 2 MB and 1 GB. This greatly reduces the total number of pages and the size of the page table, which in turn greatly reduces the probability of TLB misses and improves the CPU's addressing performance.
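The same effect can be observed outside DPDK with an anonymous mapping backed by huge pages. The sketch below assumes 2 MB huge pages have already been reserved (for example via vm.nr_hugepages); one 2 MB page is then covered by a single TLB entry instead of 512 entries for 4 KB pages.

```c
/* Sketch: allocate one 2 MB huge page with mmap(MAP_HUGETLB).
 * Assumes the system has reserved huge pages (vm.nr_hugepages > 0). */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HUGE_2MB (2UL * 1024 * 1024)

int main(void)
{
    void *buf = mmap(NULL, HUGE_2MB, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* likely no huge pages reserved */
        return 1;
    }

    /* The whole 2 MB region is covered by a single TLB entry instead of
     * 512 entries for 4 KB pages. */
    memset(buf, 0, HUGE_2MB);
    printf("2 MB huge page mapped at %p\n", buf);

    munmap(buf, HUGE_2MB);
    return 0;
}
```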

4 Other optimizations

  • SNA (Shared-nothing Architecture): the software architecture is decentralized and avoids global sharing as far as possible, because global sharing brings global contention and costs the ability to scale horizontally. On NUMA systems, memory is not accessed remotely across nodes.
  • SIMD (Single Instruction Multiple Data): from the earliest MMX/SSE to the latest AVX2, SIMD capabilities have kept growing. DPDK batches multiple packets and then uses vector programming to process all of them in one pass; memcpy, for example, uses SIMD to speed things up.
  • CPU affinity: binding a thread or process to specific CPU cores so the scheduler never migrates it (see the sketch after this list).
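As referenced in the last item, here is a small CPU affinity sketch that pins the calling process to core 0 with sched_setaffinity(); the core number is an assumption.

```c
/* Sketch: pin the calling process to CPU core 0 so the packet-processing
 * loop is never migrated by the scheduler. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                       /* core 0 is an assumption */

    if (sched_setaffinity(0 /* this process */, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pinned to core 0; running on CPU %d\n", sched_getcpu());
    return 0;
}
```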

XDP

XDP stands for eXpress Data Path and uses eBPF for packet filtering. Whereas DPDK sends packets straight to user mode and uses user mode as the fast data-processing plane, XDP creates a fast data plane in the driver layer: the packet is processed right after the NIC hardware has DMA'd the data into memory, before an skb is allocated.

Please note that XDP does not perform kernel bypass on data packets; it just does a little pre-processing ahead of the normal path.
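A minimal XDP program looks like the sketch below: it is an eBPF program compiled with clang for the BPF target, it runs in the driver before an skb is allocated, and in this illustrative example it drops UDP packets while passing everything else to the normal stack.

```c
/* Minimal XDP program: runs in the NIC driver, after DMA but before an skb
 * is allocated. Illustrative policy: drop UDP, pass everything else.
 * Compile: clang -O2 -g -target bpf -c xdp_drop_udp.c -o xdp_drop_udp.o
 * Attach:  ip link set dev eth0 xdp obj xdp_drop_udp.o sec xdp */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_udp(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Bounds checks are mandatory; the eBPF verifier rejects the program
     * without them. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    if (ip->protocol == IPPROTO_UDP)
        return XDP_DROP;      /* dropped before any skb exists */

    return XDP_PASS;          /* hand everything else to the normal stack */
}

char _license[] SEC("license") = "GPL";
```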

Compared with DPDK, XDP has the following advantages:

  • No third-party code libraries or licenses required
  • Supports both polled and interrupt-driven networking
  • No need to allocate huge pages
  • No dedicated CPUs required
  • No need to define a new security network model

XDP usage scenarios include:

  • DDoS defense
  • Firewalls
  • XDP_TX-based load balancing
  • Network statistics
  • Complex network sampling
  • High-speed trading platforms

OK, that's all for today's sharing. If you think there are other packet capture engines worth covering, feel free to leave a message and share.


Source: Linux中文社区