Compiling a C source file generates a binary object file with the suffix ".OBJ". After a C source program (.c file) is compiled by the compiler, a binary file (called an object file) with the suffix ".OBJ" is generated; finally, a program called the linker connects this ".OBJ" file with the library functions provided by the C language to generate an executable file with the suffix ".EXE".
The operating environment of this tutorial: Windows 7, C99, Dell G3 computer.
Compilation of C language source files
The suffix of a C language source file is ".c", and the suffix of the compiled file is ".obj". After linking, the suffix of the executable file is ".exe".
Steps to create a program in C language:
Editing: Creating and modifying the source code of the C program. The program we write is called source code.
Compilation: Converting the source code into machine language. The output of the compiler is the object code, and the file that stores it is called the object file. Its extension is .o or .obj. (Compilation here covers both an assembler translating assembly language and a compiler translating a high-level language.)
Linking: The linker combines the object code of the various modules generated by the compiler, adds the necessary code modules from the program libraries provided by the C language, and forms them into an executable file. The extension is .exe under Windows; under Unix there is no extension.
Execution: Run the program.
After the C source program is compiled by the C compiler, a binary file (called an object file) with the suffix ".OBJ" is generated. Finally, a program called the linker (Link) connects this ".OBJ" file with the library functions provided by the C language to generate an executable file with the suffix ".EXE". Clearly, a C source program cannot be executed directly.
The process diagram is as follows:
As the figure shows, the whole process is divided into two parts: compilation and linking. Compilation corresponds to the part of the figure enclosed in curly braces, and the rest is the linking process.
Compilation process
- The compilation process can be divided into two stages: compilation and assembly.
Compilation: Compilation reads the source program (a character stream), performs lexical and syntax analysis on it, and converts the high-level language instructions into functionally equivalent assembly code. The compilation of a source file consists of two main stages:
The first stage is the preprocessing stage, which occurs before the formal compilation stage. The preprocessing stage modifies the content of the source file according to the preprocessing directives placed in the file. For example, #include is a preprocessing directive that adds the contents of a header file to the .c file. Modifying source files before compilation in this way provides great flexibility for adapting to the constraints of different computers and operating system environments. The code required for one environment may differ from the code required for another because the available hardware or operating system is different. In many cases, you can put the code for different environments in the same file and let the preprocessing stage select the code that suits the current environment.
The preprocessing stage mainly handles the following:
- Macro definition directives, such as #define a b
  - For this kind of directive, all the preprocessor needs to do is replace every a in the program with b, except where a appears inside a string constant. There is also #undef, which cancels the definition of a macro so that later occurrences of the name are no longer replaced.
- Conditional compilation directives, such as #ifdef, #ifndef, #else, #elif, #endif, etc.
  - These directives let programmers decide which code the compiler processes by defining different macros. The preprocessor filters out the unneeded code according to which macros are defined.
- Header file inclusion directives, such as #include "FileName" or #include <FileName>
  - Header files generally use #define to define a large number of macros (most commonly symbolic constants), and also contain declarations of various external symbols. The main purpose of header files is to make certain definitions available to multiple C source programs: a source program that needs these definitions only has to add an #include directive, without repeating the definitions itself. The preprocessor adds all the definitions in the header file to the output file it generates, for processing by the compiler. The header files included in a C source program can be provided by the system. On UNIX-like systems these are generally placed in the /usr/include directory, and angle brackets (<>) are used to #include them. Developers can also define their own header files. These are generally placed in the same directory as the C source program, and double quotes ("") are used to #include them.
- Special symbols that the preprocessor recognizes
  - For example, the identifier __LINE__ appearing in the source program is interpreted as the current line number (a decimal number), and __FILE__ is interpreted as the name of the C source file being compiled. The preprocessor replaces occurrences of these identifiers in the source program with the appropriate values.
What the preprocessor does is basically "substitution" on the source program. After this substitution, an output file with no macro definitions, no conditional compilation directives, and no special symbols is generated. This file means the same as the unpreprocessed source file, but its content is different. Next, this output file is passed to the compiler as input and translated into machine instructions.
The second stage is the compilation and optimization stage. The output file obtained after preprocessing contains only constants (such as numbers and strings), variable definitions, and the identifiers, keywords, and symbols of the C language, such as main, if, else, for, while, {, }, +, -, *, \, and so on.
- What the compiler has to do is use lexical analysis and syntax analysis to confirm that all statements comply with the grammar rules, and then translate them into an equivalent intermediate code representation or assembly code.
- Optimization is one of the more difficult parts of a compilation system. The issues involved relate not only to compilation technology itself, but also to the hardware environment of the machine. One kind of optimization works on the intermediate code and is independent of the specific computer. The other kind mainly targets the generation of object code.
- For the former, the main work is common subexpression elimination, loop optimization (code hoisting, strength reduction, changing loop control conditions, constant folding, etc.), copy propagation, elimination of useless assignments, and so on.
- The latter kind of optimization is closely tied to the hardware structure of the machine. The most important consideration is how to make full use of the variable values held in the machine's hardware registers to reduce the number of memory accesses. In addition, how to adjust the instructions according to the way the hardware executes them (pipelining, RISC, CISC, VLIW, etc.) so that the object code is shorter and runs more efficiently is also an important research topic.
Assembly: Assembly refers to the process of translating assembly language code into target machine instructions. For each C source program handled by the toolchain, this step finally produces the corresponding object file. The object file stores the target machine language code that is equivalent to the source program. Object files are composed of sections. An object file usually has at least two sections:
- Code section: This section mainly contains the program's instructions. It is generally readable and executable, but generally not writable.
- Data section: This section mainly stores the global variables and static data used in the program. It is generally readable and writable; on modern systems it is usually not executable.
There are three main types of object files in the UNIX environment:
- Relocatable file: Contains code and data suitable for linking with other object files to create an executable or shared object file.
- Shared object file: This file stores code and data suitable for linking in two contexts. In the first, the linker processes it together with other relocatable files and shared object files to create another object file. In the second, the dynamic linker combines it with an executable file and other shared object files to create a process image.
- Executable file: This file contains a program that the operating system can load to create a process.
What the assembler generates is actually the first type of object file. The latter two require further processing, which is the job of the linker.
Linking process:
The object file generated by the assembler cannot be executed immediately, because many issues may still be unresolved.
- For example, a function in one source file may reference a symbol (such as a variable or a function) defined in another source file, or the program may call a function in a library file. All of these issues have to be resolved by the linker.
- The main job of the linker is to connect the related object files to each other, that is, to connect the symbols referenced in one file with the definitions of those symbols in another file, so that all these object files become a unified whole that can be loaded and executed by the operating system.
Depending on how the developer specifies that a given library function should be linked, link processing can be divided into two types:
- Static linking: With this method, the code of the function is copied from the static link library that contains it into the final executable program. The code is then loaded into the virtual address space of the process when the program is executed. A static link library is actually a collection of object files, each of which contains the code for one function or a group of related functions in the library.
- Dynamic linking: With this method, the code of the function is placed in an object file called a dynamic link library or shared object. The linker only records the name of the shared object and a small amount of other bookkeeping information in the final executable program. When the executable file is run, the entire contents of the dynamic link library are mapped into the virtual address space of the corresponding process. The dynamic linker finds the corresponding function code based on the information recorded in the executable program.
Function calls in an executable file can use either dynamic linking or static linking. Dynamic linking makes the final executable file smaller and saves memory when a shared object is used by multiple processes, because only one copy of the shared object's code needs to be kept in memory. However, this does not mean that dynamic linking is always superior to static linking. In some cases dynamic linking may carry a performance cost.