Learn about Linux ABI in ten minutes

WBOY
Release: 2023-08-03 16:33:04

LCTT Translation Note: Yesterday, AlmaLinux announced that it would give up 1:1 compatibility with RHEL but maintain ABI compatibility with it, so that software that runs on RHEL can run seamlessly on AlmaLinux. Some readers may not be clear about the concept of an ABI, so I translated this article to help everyone understand it.

Many Linux enthusiasts are familiar with Linus Torvalds' famous admonition, "We don't break user space", but perhaps not everyone who recognizes the phrase is certain about what it means. This "First Rule" reminds developers about the stability of the Application Binary Interface (ABI), through which applications communicate with and configure the kernel. What follows is intended to familiarize readers with the concept of an ABI, explain why ABI stability matters, and discuss what is included in Linux's stable ABI. The continued growth and evolution of Linux has required changes to the ABI, some of which have been controversial.

What is an ABI?

ABI stands for Application Binary Interface. One way to understand the concept is to consider how it differs from related ones. For many developers, the Application Programming Interface (API) is more familiar. Typically, a library's header files and documentation are considered its API, as are standards documents such as the HTML5 specification. Programs that call the library or exchange string-formatted data must adhere to the conventions described in the API or risk unexpected results. ABIs are similar to APIs in that they specify how commands are interpreted and how binary data is exchanged. For C programs, the ABI typically includes the return types and argument lists of functions, the layout of structures, and the meaning, order, and range of enumerated types. As of 2022, the Linux kernel is still written almost entirely in C and therefore must adhere to these specifications.
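To make the point about structure layout concrete, here is a minimal sketch using Python's ctypes module, which lays out fields according to the platform's C rules. The struct and field names are invented for illustration, and the numbers discussed afterwards assume a typical platform where a 32-bit integer is aligned to 4 bytes.

```python
import ctypes

# Hypothetical structs: the same three fields, declared in a different order.
class RecordV1(ctypes.Structure):
    _fields_ = [
        ("kind", ctypes.c_uint8),
        ("length", ctypes.c_uint32),
        ("flag", ctypes.c_uint8),
    ]

class RecordV2(ctypes.Structure):
    _fields_ = [
        ("length", ctypes.c_uint32),
        ("kind", ctypes.c_uint8),
        ("flag", ctypes.c_uint8),
    ]

def describe(struct_type):
    """Return ({field name: byte offset}, total size in bytes) for a struct."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ in struct_type._fields_
    }
    return offsets, ctypes.sizeof(struct_type)

v1_offsets, v1_size = describe(RecordV1)
v2_offsets, v2_size = describe(RecordV2)
print(v1_offsets, v1_size)
print(v2_offsets, v2_size)
```

Under the alignment assumption above, the first layout places length at byte 4 and occupies 12 bytes, while the second places it at byte 0 and occupies 8. Source code that refers to the fields by name would recompile against either definition, but a binary built against one layout would read the wrong bytes if handed the other. That is the sense in which layout belongs to the ABI rather than the API.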

A description of the kernel system call interface can be found in section 2 of the Linux manual; it includes the C versions of functions such as mount and sync that middleware applications can call. The binary layout of these functions is the first important part of the Linux ABI. Asked "What does the stable ABI for Linux include?", many users and developers answer "the contents of sysfs (/sys) and procfs (/proc)". In fact, the official Linux ABI documentation does focus mainly on these virtual filesystems. The preceding text concentrates on how programs exercise the Linux ABI, but it does not capture the equally important human aspect. As the figure below shows, a working ABI requires cooperation among the kernel community, C compilers (such as GCC or clang), the developers of the user-space C library (usually glibc), and binary applications laid out in the Executable and Linkable Format (ELF).

[Figure: Collaboration within the development community]
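As a rough illustration of how a program reaches the system call interface through the C library, the sketch below calls getpid through libc with ctypes. It assumes a Linux system; getpid is my choice because it has no side effects (the text's examples, mount and sync, do), and the declared argument and return types stand in for the binary contract that a C compiler would normally take from the headers.

```python
import ctypes
import os

# Load the C library already linked into this process (assumes Linux).
libc = ctypes.CDLL(None, use_errno=True)

# Declare the binary contract for getpid(2): no arguments, returns a C int.
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int

pid_from_libc = libc.getpid()
pid_from_python = os.getpid()
print(pid_from_libc, pid_from_python)
```

Both values should match, because Python's os.getpid ultimately goes through the same C library wrapper. If the declared types disagreed with what the library actually implements, the call would still be made, but the result would be misinterpreted, which is exactly the kind of failure a stable ABI prevents.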

Why do we care about ABI?

The Linux ABI stability guarantee, which comes from Torvalds himself, enables Linux distributions and individual users to update the kernel independently of the rest of the operating system.

Without a stable ABI for Linux, most or even all of the operating system would need to be reinstalled every time the kernel needed to be patched to address a security issue. Clearly, the stability of the binary interface is one of the important factors in the usability and widespread adoption of Linux.

[Figure: Terminal output]

As shown in the figure above, both the kernel (in linux-libc-dev) and Glibc (in libc6-dev) provide bitmasks that define file permissions. Obviously, these two sets of definitions must agree! The apt package manager identifies which package provides each file. The potentially unstable parts of the Glibc ABI are located in the bits/ directory.
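The permission bitmasks mentioned above are also exposed by Python's standard stat module under the same names the C headers use, which makes for a short sketch of how they combine. The choice of the rw-r--r-- mode is only an example.

```python
import stat

# Compose rw-r--r-- from the symbolic permission bits.
mode = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH
print(oct(mode))

# Add the "regular file" type bits and render the mode the way `ls -l` would.
print(stat.filemode(stat.S_IFREG | mode))
```

Every component that handles a file mode, whether the kernel, the C library, or a scripting language runtime, has to assign the same numeric value to each of these names, or the permissions one component writes would be misread by another.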

For the most part, the Linux ABI stability guarantee works well. In keeping with Conway's Law, vexing technical problems that arise during development are most often due to misunderstandings or disagreements between the different software development communities that contribute to Linux. The interface between these communities is easy to picture through the metadata of the Linux package manager, as shown in the figure above.

Y2038: An Example of ABI Breakage

The Linux ABI can be better understood by considering an example: the ongoing, slow-motion "Y2038" ABI break. In January 2038, 32-bit time counters will overflow and wrap around, much like the odometer on an older vehicle rolling over. January 2038 may still sound far away, but it is a safe bet that many IoT devices sold today will still be in operation then. Mundane products such as smart meters and smart parking systems installed this year may use a 32-bit processor architecture and may not support software updates.
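The arithmetic behind the rollover can be sketched in a few lines. The example assumes the counter is a signed 32-bit count of seconds since 1 January 1970 UTC, which is the conventional 32-bit time_t; for a signed counter the wraparound lands on a large negative value rather than on zero. ctypes.c_int32 is used only to mimic the truncation a C int32_t would perform.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1

# Last moment representable as a signed 32-bit count of seconds since the epoch.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the value no longer fits; c_int32 truncates it to 32 bits.
wrapped = ctypes.c_int32(INT32_MAX + 1).value
after_wrap = EPOCH + timedelta(seconds=wrapped)

print(last_moment.isoformat())
print(wrapped, after_wrap.isoformat())
```

The first line printed is the last representable second, on 19 January 2038. The second shows the wrapped counter and the date it would be interpreted as, which falls in December 1901.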

The Linux kernel has already moved internally to a 64-bit time_t opaque data type to represent later points in time. This means that system calls such as time() have already had their function signatures changed on 32-bit systems. How much work this took is plain to see in kernel header files such as time_types.h, which contain both new and "_old" versions of the data structures.

[Figure: Odometer rollover]

The Glibc project also supports 64-bit time, so we're all set, right? Unfortunately not, judging by discussions on the Debian mailing list. Distros face the unenviable choice of providing either two versions of every binary package for 32-bit systems or two versions of the installation media. In the latter case, users of 32-bit time will have to recompile their applications and reinstall. As always, proprietary applications will be a real headache.

What exactly is included in the Linux stable ABI?

Understanding the stable ABI is a bit tricky. One thing to consider is that while most of sysfs is stable ABI, the debug interfaces are deliberately unstable because they expose kernel internals to user space. Linus Torvalds has said that "don't break userspace" usually refers to protecting regular users who "just want it to work", rather than system programmers and kernel engineers, who should be able to read the kernel documentation and source code to find out what has changed between releases. The image below illustrates this distinction.

[Figure: Stability guarantees]

It is unlikely that the average user will interact with unstable parts of the Linux ABI, but system programmers may do so unintentionally. All parts of sysfs (/sys) and procfs (/proc) are stable except /sys/kernel/debug.

What about other binary interfaces visible to user space, such as device files in /dev, the kernel log (readable with the dmesg command), filesystem metadata, or the "boot parameters" supplied on the kernel "command line" (visible in a bootloader such as GRUB or U-Boot)? Naturally, "it depends."

Mounting old file systems

Next to watching a Linux system hang during boot, failing to mount a filesystem is the greatest disappointment. If the filesystem resides on a paying customer's SSD, the matter is serious indeed. Surely a Linux filesystem that was mountable under an older kernel version should still be mountable after the kernel is upgraded? In fact, "it depends."

In 2020, an aggrieved developer complained on the kernel mailing list:

The kernel already accepted this as a valid mountable filesystem format, without a single error or warning of any kind, and has been doing so reliably for years... I was generally under the impression that mounting existing root filesystems fell within the scope of the kernel<->userspace or kernel<->existing-system boundary, as defined by what the kernel accepts and existing userspace has used successfully, and that upgrading the kernel should work with existing userspace and systems.

But there was a catch: the unmountable filesystems had been created with a proprietary tool that relied on a flag that the kernel defined but did not use. The flag did not appear in Linux's API header files or in procfs/sysfs; it was an implementation detail. Interpreting the flag in user-space code therefore meant relying on "undefined behavior", a phrase that makes almost every software developer shudder. When the kernel community improved its internal testing and began making new consistency checks, the mount(2) system call suddenly started rejecting filesystems in the proprietary format. Since the format's creator was clearly a software developer, he got little sympathy from the kernel filesystem maintainers.
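The mechanism can be pictured with a small sketch. Everything here is hypothetical: the flag names, bit values, and both checking functions are invented to show how a stricter consistency check can start rejecting input that an older, more lenient check let through. It is not the kernel's actual code.

```python
# Hypothetical on-disk flags; names and bit positions are invented.
FLAG_COMPRESSED = 0x1
FLAG_ENCRYPTED = 0x2
FLAG_RESERVED = 0x80  # defined in the source, but never acted upon

SUPPORTED_FLAGS = FLAG_COMPRESSED | FLAG_ENCRYPTED

def check_flags_lenient(flags):
    """Older behavior: bits the code does not use are silently ignored."""
    return True

def check_flags_strict(flags):
    """Newer behavior: refuse any bit outside the supported set."""
    unexpected = flags & ~SUPPORTED_FLAGS
    if unexpected:
        raise ValueError(f"unsupported flag bits: {unexpected:#x}")
    return True

# A third-party tool that stores its own meaning in the unused bit.
vendor_flags = FLAG_COMPRESSED | FLAG_RESERVED

print(check_flags_lenient(vendor_flags))
try:
    check_flags_strict(vendor_flags)
except ValueError as err:
    print(err)
```

Nothing in the documented interface changed between the two checks, which is why a tool that depended on the lenient behavior had no guarantee to appeal to.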

[Figure: Construction sign says crews working on trees]

Threading the kernel dmesg log

Is the format of files in /dev guaranteed to be stable or not? The dmesg command reads from the file /dev/kmsg. In 2018, a developer made dmesg output threaded, enabling the kernel to "print a series of printk() messages to the console without being disturbed by concurrent printk() calls from other contexts". Sounds great! Threading was made possible by adding a thread ID to each line of /dev/kmsg output. Readers following closely will realize that this addition changed the ABI of /dev/kmsg, meaning that applications that parse the file needed to change too. Since many distros did not compile their kernels with the new feature enabled, most users of /bin/dmesg will not have noticed, but the change broke the GDB debugger's ability to read the kernel log.

Astute readers will surely think that users of GDB are out of luck, since a debugger is a developer tool. Not so: the code that needed updating to support the new /dev/kmsg format was "in-tree", part of the kernel's own Git source repository. For any sane project, the failure of programs within a single repository to work together is an outright bug, so a patch that made GDB work with threaded /dev/kmsg was merged.
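A parser that tolerates extra fields would not have been broken by this kind of change. The sketch below assumes the record shape as I understand it from the kernel's /dev/kmsg documentation: comma-separated prefix fields (priority, sequence number, timestamp in microseconds, flags), optional further fields, then a semicolon and the message. The sample records, including the caller=T1234 field standing in for the thread ID, are made up for illustration, and real records can also carry continuation lines that this sketch ignores.

```python
def parse_kmsg_record(record):
    """Split one /dev/kmsg-style record into its prefix fields and message."""
    prefix, _, message = record.partition(";")
    fields = prefix.split(",")
    parsed = {
        "priority": int(fields[0]),
        "sequence": int(fields[1]),
        "timestamp_us": int(fields[2]),
        "flags": fields[3],
        "extra": {},
        "message": message,
    }
    # Anything after the fourth field is optional: keep key=value pairs
    # and skip whatever else appears instead of failing on it.
    for field in fields[4:]:
        key, sep, value = field.partition("=")
        if sep:
            parsed["extra"][key] = value
    return parsed

# Made-up sample records, without and with a caller (thread ID) field.
old_style = "6,339,5140900,-;NET: Registered protocol family 10"
new_style = "6,340,5140950,-,caller=T1234;NET: Registered protocol family 10"

for sample in (old_style, new_style):
    print(parse_kmsg_record(sample))
```

A parser that instead expected exactly four prefix fields would handle the first record and fail on the second, which is roughly the situation the in-tree GDB scripts were in.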

What about BPF programs?

BPF is a powerful tool for monitoring and even configuring the running kernel in real time. It was originally designed to support on-the-fly network configuration by allowing system administrators to modify packet filters from the command line. Alexei Starovoitov and others have greatly extended BPF so that it can trace arbitrary kernel functions. Tracing is clearly the domain of developers rather than regular users, so it is certainly not subject to any ABI guarantee (although the bpf() system call carries the same stability promise as any other system call). On the other hand, BPF programs that create new functionality open up the possibility of replacing kernel modules as the de facto standard means of extending the kernel. Kernel modules make devices, filesystems, encryption, networking, and so on work, so they are clearly facilities that the regular user who "just wants it to work" relies on. The problem is that, unlike most open source kernel modules, BPF programs have traditionally not lived in the kernel source tree.

In the spring of 2022, one proposal drew attention: it suggested supporting a wide range of human interface devices (such as mice and keyboards) with tiny BPF programs instead of device driver patches.

A heated discussion ensued, but the issue was apparently settled by Torvalds' comments at the Open Source Summit:

He points out that if you break "real userspace tools used by normal (non-kernel developer) users" then you need to fix it, whether eBPF is used or not.

A consensus appears to be emerging that developers who want their BPF programs to keep working after a kernel update will need to submit them to an as-yet-unspecified location in the kernel source repository. Stay tuned to see what policy the kernel community adopts regarding BPF and ABI stability.

Conclusion

The kernel's ABI stability guarantees apply to procfs, sysfs, and the system call interface, with important exceptions. When a kernel change breaks "in-tree" code or a userspace application, the offending patch is usually reverted quickly. Proprietary code that relies on kernel implementation details is not protected, even when those details are accessible from user space, and receives little sympathy when things go wrong. When an ABI break cannot be avoided, as with Y2038, the transition is made as deliberately and systematically as possible. Meanwhile, new features such as BPF programs raise unanswered questions about where exactly the boundaries of ABI stability lie.

ACKNOWLEDGMENTS

Thanks to Akkana Peck, Sarah R. Newman, and Luke S. Crawford for their helpful comments on earlier versions of this material.

source:51cto.com