# Summary
While learning and using Go in practice, I ran into some seemingly strange memory usage behavior, so I decided to study Go's garbage collection model. This article summarizes the results.
What is garbage collection?
Memory management used to be a major problem for application developers. In traditional systems programming languages (mainly C/C++), programmers must manage memory carefully, controlling both its allocation and its release. A small slip can cause a memory leak; such problems are hard to detect and locate, and have long been a nightmare for developers. How was this headache dealt with? In the past, two methods were generally used.
Extensive observation has shown that in object-oriented programming languages most objects have very short lifetimes. The basic idea of generational collection is to divide the heap into two or more spaces called generations. Newly created objects are placed in the young generation (which is generally much smaller than the old generation). As garbage collection runs repeatedly, objects that survive long enough are promoted to the old generation. This gives rise to two different kinds of collection, young-generation collection and old-generation collection, each applied to the objects in its own space. Young-generation collection is very fast, several orders of magnitude faster than old-generation collection. Even though it runs more often, it is still more efficient, because most objects are short-lived and never need to be promoted to the old generation at all.
Go language garbage collection generally uses the classic mark and sweep algorithm.
The problems our team has hit most often, and found hardest, when working with Go are memory problems (mainly GC). The problems and lessons learned are summarized below; discussion is welcome.
This problem was discovered while stress testing a background service. We simulated a large number of user requests, and every service module showed a significant increase in memory usage. When the stress test stopped, however, memory usage did not drop noticeably. We spent a long time trying to locate the problem with various tools such as gprof, but found no cause. In the end it turned out that this behavior is normal, for two main reasons.
First, Go's garbage collection has a trigger threshold that grows as memory usage grows (for example, if the initial threshold is 10MB, the next one is 20MB, then 40MB, and so on). If GC has not been triggered for a long time, Go forces one (every 2 minutes). Once memory usage has risen during a peak, the threshold is almost never reached again unless the program keeps allocating, so you have to wait up to 2 minutes for the forced GC to run.
The second reason is that when Go returns memory to the system, it only tells the system that the memory is no longer needed and can be reclaimed. The operating system then applies a "lazy" strategy: it does not reclaim the memory immediately, but waits until system memory is tight. This way, if the program asks for memory again, allocation is extremely fast.
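The behavior described above can be observed from inside a program. The following is a minimal sketch, not the setup used in the stress test: it assumes a simulated 64 MB load peak, and `heapSnapshot` is a hypothetical helper name. It uses `runtime.ReadMemStats` to read how much heap the runtime holds and has released, and `debug.FreeOSMemory` to force a collection and ask the runtime to hand freed memory back to the OS.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapSnapshot (hypothetical helper) reports how many bytes of heap the
// runtime has obtained from the OS and how many it has released back.
func heapSnapshot() (sys, released uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapSys, m.HeapReleased
}

func main() {
	// Simulate a load peak: allocate about 64 MB in 1 MB chunks.
	peak := make([][]byte, 64)
	for i := range peak {
		peak[i] = make([]byte, 1<<20)
	}
	sys, rel := heapSnapshot()
	fmt.Printf("at peak:          HeapSys=%d MB, HeapReleased=%d MB\n", sys>>20, rel>>20)

	// Drop the references; nothing is reclaimed until a GC actually runs.
	peak = nil

	// Force a GC and ask the runtime to return as much memory as possible.
	debug.FreeOSMemory()
	sys, rel = heapSnapshot()
	fmt.Printf("after FreeOSMemory: HeapSys=%d MB, HeapReleased=%d MB\n", sys>>20, rel>>20)
}
```

The exact numbers depend on the Go version and the operating system, and the resident size shown by tools such as top may still lag behind because of the lazy reclamation described above.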
For back-end programs that must respond to user events, the stop-the-world pauses during Go GC are a nightmare. According to the introduction above, GC performance should improve considerably in Go 1.5 once those improvements are complete. Still, every garbage-collected language inevitably suffers some performance loss during GC. We should therefore avoid frequently creating temporary heap objects (with &abc{}, new, make, and so on) to reduce scanning time during garbage collection. Temporary objects that are needed frequently can be reused through an array cache. Many people use cgo to manage memory themselves and bypass garbage collection; this is not recommended unless absolutely necessary, because it can easily cause unpredictable problems. If you are forced into it, though, it is worth considering, since the effect is very noticeable.
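One way to reuse frequently needed temporary objects is sketched below. It uses the standard library's `sync.Pool` rather than a hand-written array cache, which is a substitution on my part; `bufPool` and `render` are hypothetical names chosen for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool caches *bytes.Buffer values so that hot paths do not allocate
// a new buffer on every call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render builds a greeting using a pooled buffer instead of a fresh one.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old data
	defer bufPool.Put(buf) // hand the buffer back for the next caller

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so returning it is safe
}

func main() {
	fmt.Println(render("gopher"))
}
```

Note that the pool may drop cached objects during a collection, so it reduces allocation pressure but is not a guarantee that every call reuses a buffer.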
One of our services has to handle many long-lived connections. In our implementation, each connection gets a reader goroutine and a writer goroutine, each running an endless for loop that keeps sending and receiving data. When the remote end closes the connection, these two goroutines keep running unless they are dealt with, and the channels they hold are never released. Be very careful here: once the goroutines are no longer needed, close the channels they depend on, and have each goroutine check whether its channel is closed so that it is guaranteed to exit.
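The shutdown pattern described above might look like the following sketch. The `conn` type, its fields, and `serve` are hypothetical and stand in for the real connection handling; the point is only that both loops select on a `done` channel and exit once it is closed.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for one long-lived connection.
type conn struct {
	in   chan string   // data arriving from the remote end
	out  chan string   // data waiting to be written back
	done chan struct{} // closed once when the connection goes away
}

func newConn() *conn {
	return &conn{in: make(chan string), out: make(chan string), done: make(chan struct{})}
}

// serve starts the reader and writer goroutines and returns a WaitGroup
// that is released only when both of them have exited.
func serve(c *conn) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	wg.Add(2)
	go func() { // reader loop
		defer wg.Done()
		for {
			select {
			case msg := <-c.in:
				select {
				case c.out <- "echo: " + msg:
				case <-c.done:
					return
				}
			case <-c.done: // connection closed: stop instead of looping forever
				return
			}
		}
	}()
	go func() { // writer loop
		defer wg.Done()
		for {
			select {
			case msg := <-c.out:
				fmt.Println(msg)
			case <-c.done:
				return
			}
		}
	}()
	return wg
}

func main() {
	c := newConn()
	wg := serve(c)
	c.in <- "ping"
	close(c.done) // simulate the remote end closing the connection
	wg.Wait()     // returns only if neither goroutine leaked
	fmt.Println("both goroutines exited")
}
```

Because `done` may be closed before the echo is written, the "echo: ping" line is not guaranteed to appear; the final line always is.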
APR 30TH, 2016 8:02 PM
This part introduces some basic knowledge about Go's GC. Because GC involves a lot of material, I will sort it out bit by bit.
The main reference is this:
http://morsmachine.dk/machine-gc
It was written in 2014, when the GC mechanism was presumably still fairly simple; newer versions of Go have probably changed the GC considerably.
There is also a relevant section on Go's GC in the Go language reading notes.
The term "memory leak" sounds familiar, but I had never actually looked up its precise meaning.
A memory leak is defined from the operating system's point of view. A vivid metaphor is that the storage space (virtual memory) the operating system can provide to all processes is being "drained" by one process. The cause is that the program keeps dynamically allocating storage while it runs and does not release it in time once it is no longer needed. After an application allocates a block of memory, a design error can make the program lose control of that block, and the space is wasted.
If a program allocates a block of memory and does not release it when it is done, and the runtime has no good GC mechanism to reclaim what the program allocated, a memory leak results.
From the user's point of view, memory leakage itself will not cause any harm, because it does not affect user functions, but if "memory leakage" occurs
For C and C++, languages without garbage collection, we mainly focus on two types of memory leaks:
There are many related issues involved in memory leaks, which will not be discussed here.
You can refer to this for specific advantages and disadvantages. Here is just a general introduction.
The GC in Go is basically built on the mark-and-sweep idea:
The memory heap (so called because a heap data structure is sometimes used when managing memory pages) holds a series of objects, which may reference one another. At some point a tracing garbage collector stops the running program and scans the set of objects the runtime already knows about, usually global variables and the objects on the stacks. The GC marks these objects as reachable, then finds every object that can be reached from them through references and marks those as reachable too. This step is called the mark phase; its main purpose is to collect the status of the objects.
Once all of these objects have been scanned, the GC takes every unreachable object and reclaims it. This step is called the sweep phase.
The GC only collects objects that are not marked as reachable. If the GC fails to recognize a reference, an object that is still in use may be reclaimed, causing a runtime error.
There are three main steps: scanning, reclaiming, and sweeping.
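The mark and sweep phases described above can be illustrated with a toy model. This is only a teaching sketch over a hypothetical `object` graph and `heap` type, not how the Go runtime is implemented.

```go
package main

import "fmt"

// object is a toy heap object that may reference other objects.
type object struct {
	name   string
	refs   []*object
	marked bool
}

// heap tracks every object that has been allocated.
type heap struct {
	objects []*object
}

func (h *heap) alloc(name string) *object {
	o := &object{name: name}
	h.objects = append(h.objects, o)
	return o
}

// mark flags o and everything reachable from it (the mark phase).
func mark(o *object) {
	if o == nil || o.marked {
		return
	}
	o.marked = true
	for _, r := range o.refs {
		mark(r)
	}
}

// collect marks from the roots, then sweeps unmarked objects and returns
// the names of the objects it reclaimed (the sweep phase).
func (h *heap) collect(roots []*object) []string {
	for _, r := range roots {
		mark(r)
	}
	var freed []string
	live := h.objects[:0]
	for _, o := range h.objects {
		if o.marked {
			o.marked = false // reset for the next collection cycle
			live = append(live, o)
		} else {
			freed = append(freed, o.name)
		}
	}
	h.objects = live
	return freed
}

func main() {
	h := &heap{}
	a, b := h.alloc("a"), h.alloc("b")
	h.alloc("c")               // never referenced by anything
	a.refs = append(a.refs, b) // a -> b
	fmt.Println(h.collect([]*object{a})) // prints [c]
}
```

Only `a` is a root here, `b` survives because it is reachable from `a`, and `c` is swept.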
Compared with other languages, the garbage collection model in Go strikes me as relatively simple.
The introduction of GC can be said to solve the problem of memory reclamation. With more recent languages (Java, Python, PHP, and so on), users do not need to care about releasing memory objects, only about allocating them. The runtime or VM does the related work to manage memory automatically. This automatic reclamation of memory that is no longer used is called garbage collection.
As stated above, correctly identifying references is the basis for the GC to work properly. So the first question is: how should the GC identify a reference?
The biggest problem is that references are hard to identify. It is difficult to tell from machine code what counts as a reference. If a reference is missed, memory that should not be freed is freed by mistake, so the strategy is to err on the side of treating too much as a reference rather than too little.
One strategy is to treat every value in memory as a possible reference (pointer value). This is called a conservative garbage collector, and it is how the Boehm garbage collector for C works. Ordinary variables in memory are treated as pointers in an attempt to cover every real pointer. If the value of an ordinary variable happens to point into a space where another object lives, that object will not be reclaimed. The Go implementation knows the full type information of each object and follows only real pointers when marking, which avoids the heap memory wasted by the C implementation (saving roughly 10-30%).
2014/6, Go 1.3: introduced concurrent sweeping (garbage collection running concurrently with user logic?)
2015/8, Go 1.5: introduced the tri-color marking method
On the introduction of concurrent sweeping: in version 1.3, the Go runtime separated the mark and sweep operations. As before, all tasks are paused first and marking starts (the mark phase still requires the program to be stopped). The paused tasks resume as soon as marking finishes, and the sweep then runs in parallel with other tasks, scheduled like an ordinary goroutine. On a multi-core processor, Go tries to run the GC task on a separate core so that it does not affect the business code. The Go team itself says this reduces pause time by 50%-70%.
The basic algorithm is the mark-and-sweep described earlier. The core of Go's GC optimization is to make the STW (Stop The World) time shorter and shorter.
After all this, how do we measure GC efficiency and determine whether it affects the running program? The first way is to set the GODEBUG environment variable, for example by running GODEBUG=gctrace=1 ./myserver. For details, you can refer to this article, which is really good. To understand the output, we need a deeper analysis of how the GC works. The strength of that article is that it clearly shows which factors determine Go's GC time, so we can take targeted measures to improve it:
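Besides the GODEBUG trace, GC activity can also be read from inside the process. The sketch below is an assumption about how one might do this, not something the referenced article prescribes. It uses `debug.ReadGCStats` from the standard library; `churn` and `collectAndReport` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the most recent allocation reachable so it must live on the heap.
var sink []byte

// churn allocates n short-lived 64 KB buffers to create work for the GC.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, 64<<10)
	}
}

// collectAndReport forces one collection and returns the runtime's GC statistics.
func collectAndReport() debug.GCStats {
	runtime.GC()
	var s debug.GCStats
	debug.ReadGCStats(&s)
	return s
}

func main() {
	churn(2000)
	s := collectAndReport()
	fmt.Printf("collections: %d, total pause: %v\n", s.NumGC, s.PauseTotal)
	if len(s.Pause) > 0 {
		// Pause holds the pause history, most recent first.
		fmt.Printf("most recent pause: %v\n", s.Pause[0])
	}
}
```

The figures vary from run to run; they are useful for comparing before and after a change rather than as absolute values.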
From the analysis above, we also know that GC in Go uses mark-and-sweep, so the total GC time is:
Tgc = Tseq + Tmark + Tsweep
(T represents time)
Related to Tmark: (1) the number of live objects in the heap during garbage collection; (2) the total amount of memory occupied by live objects that contain pointers; (3) the number of pointers in live objects.
For example, suppose the program currently uses 4M of memory (heap memory), meaning that its currently reachable memory is 4M. With GOGC at its default of 100, GC is triggered when the memory occupied by the program reaches reachable * (1 + GOGC/100) = 8M.
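The trigger formula above can be written out as a small calculation. In this sketch `nextTrigger` is a hypothetical helper that models only the simplified formula, not the runtime's actual pacing logic; `debug.SetGCPercent` is the standard library call that changes GOGC at run time and returns the previous setting.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// nextTrigger applies reachable * (1 + GOGC/100) using integer arithmetic.
func nextTrigger(reachable uint64, gogc uint64) uint64 {
	return reachable + reachable*gogc/100
}

func main() {
	const reachable = 4 << 20 // 4M of reachable heap

	fmt.Println(nextTrigger(reachable, 100)>>20, "M") // prints 8 M
	fmt.Println(nextTrigger(reachable, 200)>>20, "M") // prints 12 M

	// Raising GOGC at run time makes collections less frequent.
	old := debug.SetGCPercent(200)
	fmt.Println("previous GOGC setting:", old)
	debug.SetGCPercent(old) // restore the original value
}
```

A larger GOGC trades a bigger heap for fewer collections, which is the tuning direction discussed next.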
How to set GOGC should be decided by the actual production scenario; for example, increase GOGC to reduce the frequency of GC.
Tips
If you want deeper insight, gdb is indispensable. This article has compiled some introductory tips for using gdb.
Reduce object allocation
Reducing object allocation really means reusing objects wherever possible. Consider, for example, two ways of defining a read function.
The first function takes no parameters and returns a []byte on every call. The second takes a buf parameter of type []byte on every call and returns the number of bytes read. The first function allocates new space on every call, which puts extra pressure on the GC. The second reuses the buffer passed in by the caller on every call.
The familiar topic of string and []byte conversion
Converting between string and []byte puts pressure on the GC. With gdb, you can compare the underlying data structures of the two.
When one is converted to the other, the underlying data is copied, which makes the GC less efficient. There are two strategies. One is to use []byte throughout, especially for data transmission; []byte supports many of the useful operations commonly applied to string. The other is to use lower-level operations to convert directly and avoid the copy. See the first part on performance optimization from "Yuhen Academy" on WeChat, which mainly uses unsafe.Pointer for direct conversion.
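A direct conversion of the kind mentioned above is commonly written as shown below. This is a sketch of the well-known trick, assumed to be what the referenced material describes; `bytesToString` is a hypothetical name. It relies on the slice header starting with the same data pointer and length fields as the string header, and it is only safe if the byte slice is never modified afterwards.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString reinterprets b as a string without copying its contents.
// The caller must not modify b after the conversion, because the returned
// string shares b's backing array.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	b := []byte("no copy here")
	s := bytesToString(b)
	fmt.Println(s, len(s)) // prints no copy here 12
}
```

The ordinary `string(b)` conversion remains the safe default; the unsafe form is worth its risks only on measured hot paths.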
The use of unsafe probably deserves a separate article. For now, here is the relevant material: http://studygolang.com/articles/685 Intuitively, unsafe.Pointer can be understood as void* in C; in Go it acts as a bridge for converting between pointers of different types.
uintptr is an unsigned integer type large enough to hold the address value of a pointer, and it can be converted to and from unsafe.Pointer. The main difference is that uintptr can take part in pointer arithmetic, while unsafe.Pointer can only be converted between pointer types and cannot be used in arithmetic. If you want to do pointer arithmetic in Go, you can refer to this. The pointer must first be converted to uintptr before any further calculation, such as adding an offset.
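The conversion chain described above can be sketched as follows. `elemAt` is a hypothetical helper; the pointer is converted to uintptr, offset, and converted back to unsafe.Pointer within a single expression, which is the form the unsafe package documentation permits.

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemAt returns the i-th int32 counted from base using pointer arithmetic.
// The caller must guarantee that base points into an array with at least
// i+1 elements.
func elemAt(base *int32, i int) int32 {
	// unsafe.Pointer -> uintptr, add the byte offset, then back again.
	p := unsafe.Pointer(uintptr(unsafe.Pointer(base)) + uintptr(i)*unsafe.Sizeof(*base))
	return *(*int32)(p)
}

func main() {
	arr := [3]int32{10, 20, 30}
	fmt.Println(elemAt(&arr[0], 2)) // prints 30
}
```

Storing the intermediate uintptr in a variable and converting it back later is not safe, because the GC does not treat a uintptr as a reference.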
Use + sparingly to concatenate strings
Because concatenating strings with + creates new objects and adds to the GC's work, a better approach is to use the append function.
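The difference between the two approaches can be sketched as follows. `joinPlus` and `joinAppend` are hypothetical helper names, and the initial capacity of 64 bytes is an arbitrary assumption.

```go
package main

import "fmt"

// joinPlus concatenates with +, creating a new string on every iteration.
func joinPlus(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinAppend accumulates into one byte slice and converts once at the end.
func joinAppend(parts []string) string {
	buf := make([]byte, 0, 64) // capacity is a guess; append grows it if needed
	for _, p := range parts {
		buf = append(buf, p...)
	}
	return string(buf)
}

func main() {
	parts := []string{"mark", "-", "and", "-", "sweep"}
	fmt.Println(joinPlus(parts))   // prints mark-and-sweep
	fmt.Println(joinAppend(parts)) // prints mark-and-sweep
}
```

Both functions return the same result; the second simply produces far fewer intermediate objects when there are many parts.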
But append has a drawback of its own. In the example, after the append operation the capacity of the array grows from 1024 to 1312. So if the length of the array is known in advance, it is best to plan the space when it is first allocated. This adds some code management cost, but it reduces the pressure on the GC and makes the code more efficient.
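The capacity growth described above, and the effect of planning the capacity up front, can be observed with a short sketch. `appendOne` is a hypothetical helper, and the exact capacity after growth depends on the Go version, so no specific value is claimed for it here.

```go
package main

import "fmt"

// appendOne appends a single element and reports the capacity before and after.
func appendOne(s []int) (before, after int) {
	before = cap(s)
	s = append(s, 99)
	return before, cap(s)
}

func main() {
	// Length and capacity are both 1024, so append must reallocate and copy.
	full := make([]int, 1024)
	b, a := appendOne(full)
	fmt.Printf("full slice:    cap %d -> %d (reallocated)\n", b, a)

	// Capacity planned in advance: the append fits without a new allocation.
	planned := make([]int, 1024, 1025)
	b, a = appendOne(planned)
	fmt.Printf("planned slice: cap %d -> %d (no reallocation)\n", b, a)
}
```

When the final size is known, `make([]T, 0, n)` followed by appends avoids every intermediate reallocation.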