Overview
Today we'll talk about the portability of C++. If you develop in C++ regularly but are not very clear about its portability issues, I suggest you take a look at this series. Even if you don't currently need cross-platform development, it's still helpful to know about portability.
C++ portability is a huge topic, covering compilers, operating systems, hardware and more, and each of these areas has a lot of content. Given my limited ability and energy, I can only introduce the most common problems in each area for your reference.
I will cover it from the following angles: the compiler, C++ syntax, the operating system, third-party libraries, auxiliary tools, the development process, and so on.
Compiler
During the cross-platform development process, many problems are related to the compiler. So let’s talk about compiler-related issues first.
Compiler Selection
First, GCC should be the first compiler you support, because it is available on almost every operating system platform and has essentially become a universal compiler. If your code compiles with GCC on platform A, it will generally compile without major problems with a similar version of GCC on platform B. So supporting GCC is a must.
Next, consider whether to support the native compiler, that is, the compiler supplied by the operating system vendor. For example, the native compiler on Windows is Visual C++, and on Solaris it is Sun's CC. If performance matters to you, or you want to use advanced features of a native compiler, you may need to support it in addition to GCC.
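A common way to keep GCC and native-compiler builds in one code base is to confine compiler-specific code to blocks keyed on predefined macros. The sketch below is illustrative only: compiler_name() is a made-up helper, while __clang__, __GNUC__, _MSC_VER and __SUNPRO_CC are macros that Clang, GCC, Visual C++ and Sun's C++ compiler predefine.

```cpp
#include <string>

// Identify the compiler via predefined macros. Clang also defines __GNUC__,
// so it has to be checked before GCC.
std::string compiler_name() {
#if defined(__clang__)
    return "Clang " + std::to_string(__clang_major__);
#elif defined(__GNUC__)
    return "GCC " + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__);
#elif defined(_MSC_VER)
    return "Visual C++ (_MSC_VER " + std::to_string(_MSC_VER) + ")";
#elif defined(__SUNPRO_CC)
    return "Sun CC";
#else
    return "unknown compiler";
#endif
}
```

Keeping such #if blocks in one small header, rather than scattering them through the sources, makes adding another compiler later much cheaper.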
Compiler warnings
The compiler is the programmer's friend. It can detect many potential problems (including portability problems) and report them as warnings. If you routinely pay attention to these warnings, you will save yourself a lot of trouble. Therefore, I strongly recommend that you:
1. Raise the compiler's warning level;
2. Not ignore compiler warnings lightly.
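With GCC, raising the warning level typically means -Wall -Wextra (plus -Werror if warnings should fail the build); Visual C++ offers /W4. The hypothetical functions below show the kind of latent bug these warnings catch: comparing a signed int with an unsigned value silently converts the int to unsigned.

```cpp
// GCC reports this under -Wall (-Wsign-compare is part of -Wall in C++):
// the int is converted to unsigned, so -1 becomes UINT_MAX and the
// comparison gives the "wrong" answer.
bool below_limit_buggy(int value, unsigned limit) {
    return value < limit;  // warning: signed/unsigned comparison
}

// Warning-free version: handle negative values explicitly first.
bool below_limit(int value, unsigned limit) {
    return value < 0 || static_cast<unsigned>(value) < limit;
}
```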
Cross compiler
For the definition of a cross compiler, see "Wikipedia". In plain terms, a cross compiler builds, on platform A, a binary that runs on platform B. Suppose the application you are developing runs on Solaris, but you don't have a SPARC machine running Solaris: this is where a cross compiler comes in handy. Cross compilers are usually built from GCC. Space is limited, so I won't go into detail here; interested readers can refer to "here".
Exception handling
Because of limited space in the previous post, "Grammar", I didn't get to exceptions, so I will cover the exception-related issues separately here.
Beware of new failing to allocate memory
Code generated by some old compilers returns a null pointer when new fails; the Borland C++ 3.1 I once used seemed to behave this way. Such compilers should be rare now, but if the one you use still behaves like this, you are in trouble. You could consider overloading operator new so that it throws a bad_alloc exception, which makes the failure easier to handle.
Newer compilers don't simply return a null pointer. When operator new cannot obtain memory, the standard (see section 18.4.2 of the C++03 standard) says it should call the current new_handler function (its type is typedef void (*new_handler)();). The standard requires a new_handler to do one of the following three things:
1. Try to get more memory;
2. Throw a bad_alloc exception;
3. Call abort() or exit() to exit the process.
Since the new_handler can be replaced (by calling set_new_handler), any of the above behaviors may occur.
In summary, if new fails to allocate memory, there are three possibilities:
1. Return a null pointer;
2. Throw an exception;
3. The process terminates immediately.
If you want your code to be more portable, you have to take all three situations into consideration.
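One way to cover the three cases in a single place is sketched below, assuming a C++11 compiler (for nullptr); allocate_buffer() is a hypothetical helper, not a standard function. If you only want the null-pointer behavior, new (std::nothrow) is the standard way to request it explicitly.

```cpp
#include <cstddef>
#include <new>

// Allocate a buffer and report failure as nullptr, covering the listed
// behaviours as far as the calling code can. Case 3 (a new_handler that
// calls abort() or exit()) never returns control here; only the code that
// installs handlers via std::set_new_handler can prevent it.
char* allocate_buffer(std::size_t n) {
    try {
        char* p = new char[n];
        // Case 1: very old compilers return a null pointer instead of throwing.
        // (A conforming compiler may treat this check as always false.)
        if (p == nullptr) return nullptr;
        return p;
    } catch (const std::bad_alloc&) {
        // Case 2: standard behaviour, or a new_handler that throws bad_alloc.
        return nullptr;
    }
}
```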
Use exception specifications with caution
In my opinion, exception specifications are not a good thing. If you don't believe me, read Item 75 of "C++ Coding Standards: 101 Rules, Guidelines, and Best Practices". (I will discuss their specific drawbacks in a later post on C++ exceptions and error handling.) Back to the point: according to the standard (see section 18.6.2 of the C++03 standard), if a function throws an exception that is not listed in its exception specification, unexpected() should be called. But not all compiler-generated code follows the standard (some versions of Visual C++, for example, do not). If a compiler you need to support handles exception specifications inconsistently, consider removing the exception specification declarations.
Don’t throw exceptions across modules
Here, "module" means a dynamic library. If your program consists of multiple dynamic libraries, do not let exceptions escape a module's exported functions. After all, there is still no ABI standard for C++ (and there may never be one), so exceptions thrown across module boundaries can lead to all kinds of unpredictable behavior.
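A minimal sketch of the boundary rule, assuming a library that exports plain C functions: every exception is caught inside the exported function and converted to an error code. The names mylib_parse_port and parse_port and the error-code values are hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Internal C++ code: free to throw.
static int parse_port(const std::string& text) {
    int port = std::stoi(text);  // throws std::invalid_argument / std::out_of_range
    if (port <= 0 || port > 65535) throw std::out_of_range("port out of range");
    return port;
}

// Exported entry point: no exception ever crosses the module boundary.
extern "C" int mylib_parse_port(const char* text, int* out_port) {
    if (text == nullptr || out_port == nullptr) return -1;  // bad arguments
    try {
        *out_port = parse_port(text);
        return 0;                                            // success
    } catch (const std::exception&) {
        return -2;                                           // known C++ error
    } catch (...) {
        return -3;                                           // anything else
    }
}
```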
Do not use Structured Exception Handling (SEH)
If you have never heard of SEH, just pretend I didn't mention it and skip this paragraph. If you are used to using SEH (__try/__except), you need to break that habit before you start writing cross-platform code. Code containing SEH can only be compiled on the Windows platform, so it is certainly not cross-platform.
About catch(…)
In principle, a catch(...) clause catches only C++ exceptions and is powerless against non-C++ errors such as access violations and division by zero. In some cases, however (for example, Visual C++ when compiling with /EHa), access violations and division-by-zero errors can also be caught by catch(...). Therefore, if you want your code to be portable, your program logic must not rely on either behavior of catch(...).
Hardware system related
This part is mainly about hardware. If your program needs to support different types of CPU (x86, SPARC, PowerPC), or the same CPU family with different word lengths (such as x86 and x86-64), you need to pay attention to hardware differences.
Size of basic types
The size (number of bytes) of C++'s basic types can change with the CPU word length. Therefore, if you mean the number of bytes occupied by an int, do not write "4" directly (writing "4" directly also breaks the rule against magic numbers; see here for details), but write "sizeof(int)". Conversely, if you need a signed integer that must be exactly 4 bytes, do not use int directly; use a predefined fixed-size typedef instead (such as boost's int32_t or ACE's ACE_INT32).
I almost forgot: the size of pointers changes in the same way (typically 4 bytes on 32-bit systems and 8 bytes on 64-bit ones), so be careful with pointers too.
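If a C++11 compiler is available on all target platforms, the standard <cstdint> header provides fixed-width types in the same spirit as boost's int32_t or ACE_INT32, plus std::uintptr_t for holding pointer values (optional in the standard, but provided on mainstream platforms). A small sketch; pointer_to_integer and kRecordBytes are illustrative names:

```cpp
#include <cstddef>
#include <cstdint>

// Fixed-width types have the same size everywhere they exist.
static_assert(sizeof(std::int32_t) == 4, "int32_t is exactly 4 bytes");
static_assert(sizeof(std::int64_t) == 8, "int64_t is exactly 8 bytes");

// Storing a pointer in an integer: use uintptr_t, never int or long
// (pointers are 8 bytes on x86-64, while int stays 4 bytes and long is
// also 4 bytes on 64-bit Windows).
std::uintptr_t pointer_to_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Byte counts come from sizeof, not from a hard-coded "4".
constexpr std::size_t kRecordBytes = 16 * sizeof(std::int32_t);
```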
Endianness
If you haven't heard of "endianness", please read "Wikipedia" first. Put simply, if a 4-byte integer 0x01020304 on a big-endian machine is transferred to a little-endian machine via the network or a file, it becomes 0x04030201. There are reportedly also middle-endian machines (I have never worked with one), on which the same integer would become 0x02010403.
If your application involves network communication, remember to convert between host byte order and network byte order; if it transfers binary files between machines, you must perform similar conversions.
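The classic tools for host/network conversion are htonl/ntohl and friends (from <arpa/inet.h> on Posix systems, <winsock2.h> on Windows). An alternative sketch, assuming you control the wire format: write each byte with shifts on an unsigned type, which yields the same big-endian bytes regardless of the host's byte order. The function names are hypothetical.

```cpp
#include <cstdint>

// Serialize in big-endian ("network") order; identical bytes on any host.
void write_be32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value >> 24);
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}

// Inverse of write_be32.
std::uint32_t read_be32(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}

// Runtime check of the host's byte order (useful for diagnostics only).
bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}
```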
Memory Alignment
If you don't know what "memory alignment" is, please read "Wikipedia". Put simply, for the sake of CPU performance, the members of a structure are not packed tightly together; the compiler may insert padding between them so that each member's address is a multiple of its alignment (typically its own size, or a certain word length).
Since the C++ standard does not define the details of memory alignment, your code must not rely on them. Wherever you need the size of a structure, honestly write sizeof().
Some compilers support the #pragma pack directive (which changes the alignment used for structure members). However, not all compilers support this syntax, so use it with caution.
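The sketch below illustrates both points with a hypothetical Record struct: sizes and offsets come from sizeof/offsetof rather than hand-counted numbers, and data bound for a file or socket is written field by field so that padding never becomes part of the format. The exact sizeof(Record) depends on the compiler and platform; 12 is typical but not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>

struct Record {
    char         tag;   // 1 byte; the compiler usually inserts padding after it
    std::int32_t id;    // usually 4-byte aligned
    char         flag;  // usually followed by trailing padding
};

// Never hard-code 6 (1 + 4 + 1) or 12: let the compiler tell you.
constexpr std::size_t kRecordSize = sizeof(Record);
constexpr std::size_t kIdOffset = offsetof(Record, id);

// Write fields one by one (id in big-endian order) instead of dumping the
// struct's memory, so padding and alignment never leak into the data format.
void pack_record(const Record& r, unsigned char out[6]) {
    const std::uint32_t id = static_cast<std::uint32_t>(r.id);
    out[0] = static_cast<unsigned char>(r.tag);
    out[1] = static_cast<unsigned char>(id >> 24);
    out[2] = static_cast<unsigned char>(id >> 16);
    out[3] = static_cast<unsigned char>(id >> 8);
    out[4] = static_cast<unsigned char>(id);
    out[5] = static_cast<unsigned char>(r.flag);
}
```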
Shift operation
For right shifts of signed integers, some systems perform an arithmetic right shift (the sign bit is preserved) and some a logical right shift (the sign bit is filled with 0); for negative values the result is implementation-defined (until C++20, which specifies an arithmetic shift). Therefore, do not right-shift signed integers. By the way, even apart from portability, use shift operators as little as possible in your code. Readers who use shifts to try to improve performance should take particular note: not only does this hurt readability, it is also thankless work. Any reasonably decent compiler performs this optimization automatically, so the programmer doesn't need to worry about it.
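A short sketch of the alternatives (function names are illustrative): do bit manipulation on unsigned types, where >> always shifts in zeros, and write arithmetic as arithmetic. Note that x / 2 rounds toward zero, while an arithmetic right shift rounds toward negative infinity, so the two are not even equivalent for negative odd values.

```cpp
#include <cstdint>

// Bit manipulation on an unsigned type: >> is always a logical shift.
std::uint32_t high_half(std::uint32_t word) {
    return word >> 16;
}

// Arithmetic written as arithmetic; the compiler emits a shift (plus a
// rounding fix-up for negative values) where that is correct.
int half(int x) {
    return x / 2;  // -7 / 2 == -3, whereas an arithmetic -7 >> 1 gives -4
}
```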
Operating system
The previous post covered hardware-related topics. Today let's talk about the operating system. C++ cross-platform development involves a lot of trivial OS-related details, so this will be a long one. Please bear with me :-)
To keep things short, Linux and the various Unix systems are collectively referred to below as Posix systems.
File system (hereinafter FS)
Most newcomers to cross-platform development run into FS-related problems first, so let's start with the FS. The FS differences you are most likely to encounter are: different directory separators, different case sensitivity, and different sets of characters that are forbidden in paths.
In order to cope with the above differences, you should pay attention to the following points:
1. Name files and directories consistently
When naming files and directories, try to use only letters and digits. Do not put two files whose names differ only in case (such as foo.cpp and Foo.cpp) in the same directory. Do not use OS reserved names (such as aux, con, nul and prn, which are reserved on Windows) as file or directory names.
Note that this applies not only to source files but also to binaries and to any files your program creates at run time.
2. Write #include directives consistently
When you write #include directives, use forward slashes "/" (which work everywhere) instead of backslashes "\" (which work only on Windows). The file and directory names in an #include directive must match the case of the actual names exactly.
3. If the code involves FS operations, try to use ready-made libraries
There are already many mature third-party libraries for FS (such as boost::filesystem). If your code involves FS operations (such as directory traversal), try to use these third-party libraries, which can save you a lot of work.
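boost::filesystem later served as the basis for std::filesystem in C++17, so with a C++17 compiler the same style is available from the standard library. A minimal sketch (config_path and count_regular_files are hypothetical helpers): paths are joined with operator/ instead of hard-coded separators, and a directory is traversed with recursive_directory_iterator.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Join path components portably; generic_string() always uses '/',
// which is convenient for logs and tests on every platform.
std::string config_path(const std::string& base_dir, const std::string& file) {
    return (fs::path(base_dir) / "conf" / file).generic_string();
}

// Directory traversal: count regular files anywhere below root.
std::size_t count_regular_files(const fs::path& root) {
    std::size_t n = 0;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file()) ++n;
    }
    return n;
}
```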
★Carriage Return CR/Line Feed LF for text files
This annoying problem arises because several well-known operating systems handle carriage return/line feed differently: Windows uses CR followed by LF; Linux and most Unix systems use LF; the classic Mac OS used CR (Mac OS X, being Unix-based, uses LF).
For source code management, fortunately, many version control systems (such as CVS and SVN) can handle this for you, converting line endings to the local format when you check source code out of the repository (in SVN this is controlled by the svn:eol-style property).
If your program needs to process text files at runtime, pay attention to the difference between opening in text mode and opening in binary mode. Also, consider appropriate handling if it involves transferring text files across different systems.
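One defensive approach when text files may come from another system is to read them line by line as usual and strip a trailing '\r' from each line, which makes Windows-style CR LF files read the same on Posix systems. A sketch with hypothetical helper names; old Mac-style CR-only files are not handled here:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read one line, dropping a trailing '\r' left over from CR LF endings.
bool getline_portable(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}

// Split a whole text into lines using the helper above.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_portable(in, line)) lines.push_back(line);
    return lines;
}
```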
★File search path (including searching for executable files and dynamic libraries)
On Windows, when you run an executable or load a dynamic library, the current directory is generally searched; on Posix systems this is usually not the case. So if your application starts processes or loads dynamic libraries, be careful about this difference.
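A common way to avoid depending on the search rules is to load libraries by an absolute path built from a known base directory (for example, the executable's directory, obtained with platform-specific calls such as GetModuleFileNameW on Windows or by reading /proc/self/exe on Linux). The hypothetical helper below only builds the path, using each platform's usual library naming; the result would then be passed to LoadLibrary or dlopen.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: full path of a plugin library under base_dir,
// so loading never relies on the OS search order.
std::filesystem::path plugin_library_path(const std::filesystem::path& base_dir,
                                          const std::string& name) {
#if defined(_WIN32)
    return base_dir / (name + ".dll");
#elif defined(__APPLE__)
    return base_dir / ("lib" + name + ".dylib");
#else
    return base_dir / ("lib" + name + ".so");
#endif
}
```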
★Environment variables
To deal with the search path problem above, some readers add the current directory by modifying PATH and LD_LIBRARY_PATH. If you take this approach, modify only the process-level environment variables, never the system-level ones (a system-level change may affect other software on the same machine and cause side effects).
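A minimal sketch of a process-level change (set_process_env is a hypothetical wrapper around setenv on Posix and _putenv_s on Windows). Processes started afterwards inherit the change, and nothing outside the process is affected. One caveat worth checking on your platform: on Linux with glibc, the dynamic loader reads LD_LIBRARY_PATH at process startup, so changing it later mainly affects child processes, not dlopen calls in the current process.

```cpp
#include <cstdlib>
#include <string>

// Change an environment variable for this process (and its future children) only.
bool set_process_env(const std::string& name, const std::string& value) {
#if defined(_WIN32)
    return _putenv_s(name.c_str(), value.c_str()) == 0;
#else
    return setenv(name.c_str(), value.c_str(), 1) == 0;  // 1 = overwrite existing value
#endif
}
```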
★Dynamic library
If your application uses dynamic libraries, it is strongly recommended that they export standard C-style functions (avoid exporting classes). When loading a dynamic library on a Posix system, use the RTLD_GLOBAL flag with caution. It makes the library's symbols globally visible, which may cause symbol name conflicts between multiple dynamic libraries (when this happens, you get bizarre run-time errors that are extremely difficult to debug).
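The sketch below shows one common shape for such a C-style interface: an export macro (dllexport on Windows, default visibility for GCC-style compilers elsewhere) and an opaque handle, so no C++ class layout crosses the library boundary. All mylib_* names and the MYLIB_BUILD macro are hypothetical. On the loading side, passing RTLD_NOW | RTLD_LOCAL to dlopen, instead of RTLD_GLOBAL, keeps the library's symbols out of the global namespace.

```cpp
#include <new>

// MYLIB_BUILD would normally be set by the build system when compiling the
// library itself; it is defined here so this single file acts as "the library".
#define MYLIB_BUILD 1

#if defined(_WIN32)
#  if defined(MYLIB_BUILD)
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

// Public interface: plain C functions and an opaque handle, no classes.
extern "C" {
typedef struct mylib_counter mylib_counter;
MYLIB_API mylib_counter* mylib_counter_create(void);
MYLIB_API void mylib_counter_add(mylib_counter* c, int delta);
MYLIB_API int mylib_counter_value(const mylib_counter* c);
MYLIB_API void mylib_counter_destroy(mylib_counter* c);
}

// Private C++ implementation behind the opaque handle.
struct mylib_counter {
    int value = 0;
};

extern "C" {
mylib_counter* mylib_counter_create(void) { return new (std::nothrow) mylib_counter(); }
void mylib_counter_add(mylib_counter* c, int delta) { if (c) c->value += delta; }
int mylib_counter_value(const mylib_counter* c) { return c ? c->value : 0; }
void mylib_counter_destroy(mylib_counter* c) { delete c; }
}
```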
★Services/daemons
If you are not clear about the concepts of services and daemons, please read Wikipedia (here and here). For convenience, both are referred to below as services.
Since most modules developed in C++ are back-end modules, service problems come up often. Writing a service requires calling several system-specific APIs, which couples the code tightly to the operating system and makes it hard to keep a single code base. A better approach is therefore to abstract a generic service shell and load the business logic into it as a dynamic library. This way you keep at least a single set of business logic code; although you need two sets of service shell code (one for Windows and one for Posix), they are independent of the business logic and easy to reuse.
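A minimal sketch of the split, under the assumption that the business-logic library exports a start and a stop function which the shell looks up after loading it. ServiceLogic and run_service are hypothetical names. The platform-specific shells (a Windows service built on StartServiceCtrlDispatcher and a ServiceMain callback, or a Posix daemon that waits for SIGTERM) would only differ in how wait_for_stop_request blocks.

```cpp
#include <functional>

// Entry points the shell obtains from the business-logic library.
struct ServiceLogic {
    int  (*start)();  // returns 0 on success
    void (*stop)();
};

// Platform-independent core shared by every shell: start the logic, block
// until the platform layer reports a stop request, then shut down cleanly.
int run_service(const ServiceLogic& logic,
                const std::function<void()>& wait_for_stop_request) {
    if (logic.start() != 0) return 1;  // logic failed to start
    wait_for_stop_request();
    logic.stop();
    return 0;
}
```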
★Default stack size
The default stack size varies greatly between operating systems, from about a dozen KB (Symbian reportedly has only 12 KB, which is really stingy) to several MB. Therefore, find out the default stack size of the target system in advance. If you run into a stingy system like Symbian, consider increasing it with compiler or linker options. Of course, it is also important to develop the good habit of not defining large arrays or large objects on the stack; otherwise the stack will overflow no matter how large it is.
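The habit is easy to follow in C++: let a container own large buffers, so only a small object lives on the stack. A sketch with a hypothetical function; for threads you create yourself, Posix also lets you request a stack size with pthread_attr_setstacksize.

```cpp
#include <cstddef>
#include <vector>

// Risky where default stacks are small: a 1 MB array on the stack, e.g.
//   unsigned char block[1024 * 1024];
// Safer: the vector's storage is on the heap; only the vector object itself
// (a few pointers) is on the stack.
unsigned long checksum_large_block(std::size_t bytes) {
    std::vector<unsigned char> block(bytes);
    for (std::size_t i = 0; i < bytes; ++i)
        block[i] = static_cast<unsigned char>(i);  // 0, 1, ..., 255, 0, 1, ...
    unsigned long sum = 0;
    for (unsigned char b : block) sum += b;
    return sum;
}
```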
Multi-threading
The posts I've written over the past month or so have been rather miscellaneous, so this series went without an update for a long time, and another reader urged me on in the comments, which made me a little embarrassed. So let me hurry up and finish the multi-threading chapter today. Last time, when I covered the operating system, I touched on a lot of miscellaneous topics because OS-related issues are fairly scattered; the article was getting long, so I left multi-process and multi-threading for later.
★Compiler
◇About C runtime library options
Let's first talk about a very basic question: the choice of C runtime library (hereinafter CRT: C Run-Time). I hadn't planned to discuss something this elementary, but several people around me have been bitten by it, so I might as well mention it.
Most C/C++ compilers come with a CRT (possibly more than one). Some compilers ship separate single-threaded and multi-threaded CRTs. When you do multi-threaded development, make sure every C++ project involved uses a multi-threaded CRT; otherwise things can go badly wrong.
Be especially careful when you create projects with Visual C++. In older versions (such as VC6), if a newly created project does not use MFC (this includes Console projects and Win32 projects), the project defaults to the single-threaded CRT.
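Besides checking the project settings, a build-time guard can catch a project left on the single-threaded CRT. Visual C++ defines _MT when a multi-threaded CRT (/MT, /MTd, /MD or /MDd) is selected; later Visual C++ versions (2005 onward) removed the single-threaded CRT altogether. The crt_check_note() helper is hypothetical.

```cpp
#include <string>

// Fail the build early instead of failing mysteriously at run time.
#if defined(_MSC_VER) && !defined(_MT)
#  error "Build this project with a multi-threaded C runtime (/MT or /MD)."
#endif

// Describes which check applied to this build.
std::string crt_check_note() {
#if defined(_MSC_VER)
    return "Visual C++: multi-threaded CRT confirmed (_MT defined)";
#else
    return "not Visual C++: check your toolchain's threading flags (e.g. -pthread for GCC)";
#endif
}
```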
◇About optimization options
"Optimization options" is another critical compiler-related topic. Some compilers offer optimization options that sound impressive, but some of them carry potential risks: the compiler may reorder instructions on its own initiative, causing unexpected race conditions (see "here" for a detailed explanation of race conditions). Liu Weipeng gives several typical examples in "C++ Multi-threaded Memory Model"; take a look.
I recommend using only the compiler's regular speed optimization options. The extra gains from other fancy optimization options are often marginal, while the potential risks are considerable. It's really not worth it.
Taking GCC as an example, I recommend the -O2 option (-O2 is actually a collection of many individual options). There is no need to risk -O3 (unless you have a very good reason). Besides -O2 and -O3, GCC has a large number (probably hundreds) of other optimization options. If you plan to use one of them, first understand what it does and its possible side effects; otherwise you may not even know what hit you later.
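Since C++11, the portable way to make such ordering explicit is std::atomic with acquire/release semantics, which forbids both the compiler and the CPU from moving the data write past the flag. A minimal sketch of the publish pattern (publish_and_consume is a hypothetical name); with a plain bool flag this code would be a data race.

```cpp
#include <atomic>
#include <thread>

// One thread writes data and then publishes it; another waits for the flag.
int publish_and_consume() {
    int payload = 0;
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. then publish it
    });

    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();                     // wait for publication
    }
    int seen = payload;  // acquire/release guarantees this sees 42
    producer.join();
    return seen;
}
```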
★Selection of Thread Library
Since the current C++03 standard says almost nothing about threads (and even once C++0x adds a standard thread library, compiler vendors may not support it fully for some time), cross-platform multi-threading will depend on third-party libraries for a long time to come. So the choice of thread library is very important. Here is a brief introduction to several well-known cross-platform thread libraries.
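For reference, the C++0x thread library mentioned above was standardized in C++11 as <thread>, <mutex> and <condition_variable>, with an interface close to boost::thread. Where every target compiler supports C++11, a basic sketch looks like this (parallel_count is a hypothetical function):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Several threads increment a shared counter, serialized by a mutex.
long parallel_count(int threads, int increments_per_thread) {
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(m);  // released at end of each iteration
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();  // every thread finishes before we return
    return counter;
}
```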
◇ACE
First, ACE, a library with a long history. If you've never encountered it, read the introduction "here" first. As its full name (Adaptive Communication Environment) suggests, its main focus is communication. Still, its support for multi-threading, a side line, is very comprehensive: mutexes (ACE_Mutex), condition variables (ACE_Condition), semaphores (ACE_Semaphore), barriers (ACE_Barrier), atomic operations (ACE_Atomic_Op), and so on. Some types such as ACE_Mutex are further subdivided into thread read-write locks (ACE_RW_Thread_Mutex), thread recursive locks (ACE_Recursive_Thread_Mutex), and so on.
Besides comprehensive features, ACE has another obvious advantage: good support for a wide range of operating systems and their native compilers, including some old ones (such as VC6). Support here means not only that it compiles but that it runs stably. This is a significant advantage for cross-platform development.
What are its shortcomings? ACE was started very early (probably in the mid-1990s), when many C++ features that are now taken for granted were not yet available (let alone newer ones), so its overall style feels rather old-fashioned, far less modern and cutting-edge than boost.
◇boost::thread
boost::thread is in sharp contrast to ACE. It seems to have been part of boost since version 1.32, so it is much younger than ACE. However, backed by a group of experts in the boost community, it has developed quickly, and as of the current boost 1.38 it supports many features (though apparently not as many as ACE). Given that many members of the C++ standards committee are active in the boost community, boost::thread is likely to become the rising star of C++ threading over time, with a bright future!
The drawback of boost::thread is that it supports fewer compilers, especially older ones (many boost sub-libraries share this problem, mostly because they use advanced template techniques). This is a real problem for cross-platform development.
◇wxWidgets and Qt
wxWidgets and Qt are both GUI libraries, but they also have built-in thread support. For an introduction to wxWidgets threads see "here", and for Qt threads see "here". The two libraries offer similar thread support, both providing common mechanisms such as mutexes, conditions and semaphores, but their features are not as rich as ACE's.
◇How to weigh
If you are developing GUI software and already use wxWidgets or Qt, you can use their built-in thread support directly (provided you only need basic thread functionality), since their threading features are somewhat thin. If you need more advanced threading functionality, consider switching to boost::thread or ACE.
As for choosing between boost::thread and ACE, it mainly depends on the software's requirements. If you need to support many, diverse platforms, ACE is recommended, to avoid compiler-support problems. If you only need to support a few mainstream platforms (such as Windows, Linux and Mac), boost::thread is recommended; after all, compilers on the mainstream operating systems support boost well.
★Programming Precautions
There are many things to watch out for in multi-threaded development. I can only briefly list a few that stand out.
◇About volatile
When talking about the pitfalls of multi-threaded programming, the volatile keyword has to come up. If you don't know much about it, read "here" first. Since neither the C++98 nor the C++03 standard defines a multi-threaded memory model, volatile is about the only thing in them that seems related to threads. As a result, a great deal of ink has been spilled over volatile in the C++ community (including by many C++ luminaries). So I won't go into detail here, but I recommend a few great articles: Andrei Alexandrescu's article "here", and Hans Boehm's articles "here" and "here". Please read them yourselves.
◇About atomic operations
Some readers know only that concurrent writes from multiple threads need locking, without realizing that multiple readers with a single writer also need protection. For example, take an integer int nCount = 0x01020304; under concurrency, a writer thread changes its value to nCount = 0x05060708 while a reader thread reads it. Can the reader thread read a "torn" value (such as 0x05060304)?
Whether the data is corrupted or not depends on whether the reading and writing of nCount are atomic operations. This depends on many hardware-related factors (including the type of CPU, the word length of the CPU, the number of bytes of memory alignment, etc.). In some cases, data corruption may indeed occur.
Since we are talking about cross-platform development, who knows what hardware your code will run on in the future. So when dealing with problems like this, use the atomic operation classes/functions provided by third-party libraries (such as ACE's ACE_Atomic_Op) to be safe.
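With a C++11 compiler, std::atomic plays the same role as ACE_Atomic_Op. The sketch below replays the nCount scenario; it demonstrates the guarantee (the reader only ever sees 0x01020304 or 0x05060708) rather than proving it, since a single run cannot prove the absence of a race. The function name is hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Loads and stores of the whole value are indivisible: no torn reads.
std::atomic<std::uint32_t> nCount{0x01020304u};

bool reader_sees_whole_values() {
    std::thread writer([] { nCount.store(0x05060708u); });
    bool ok = true;
    for (int i = 0; i < 1000; ++i) {
        std::uint32_t v = nCount.load();
        if (v != 0x01020304u && v != 0x05060708u) ok = false;  // would be a torn value
    }
    writer.join();
    return ok;
}
```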
◇About object destruction
In the earlier series of posts "How do C++ objects die?", I covered the problem of threads dying unnaturally on the Win32 platform and on Posix platforms.
Since the cross-platform thread libraries above still call the operating system's own thread APIs underneath, you must still do your best to ensure that all threads end naturally.
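One way to make natural thread death the default is to tie the thread's lifetime to an object whose destructor asks the thread to finish and then joins it. A minimal sketch using C++11 std::thread (the Worker class is hypothetical); C++20's std::jthread builds the same request-then-join behavior into its destructor.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// The thread always ends by returning from run(), never by being killed.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        stop_.store(true);                    // ask the thread to finish
        if (thread_.joinable()) thread_.join();  // wait until it has
    }
    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

    int iterations() const { return iterations_.load(); }

private:
    void run() {
        while (!stop_.load()) {
            iterations_.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> stop_{false};
    std::atomic<int> iterations_{0};
    std::thread thread_;  // declared last, so it starts after the flags exist
};
```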