Recently, I had the opportunity to test and optimize several Java-based shopping and portal applications running on Sun/Oracle JVMs, some of them among the most visited applications in Germany. In many cases, garbage collection turned out to be a critical factor in Java server performance. In this article, we will study some advanced garbage collection algorithm ideas and some important tuning parameters, and compare these parameters in various real-world scenarios.
From a garbage collection perspective, Java server programs can have a wide variety of needs:
Some high-traffic applications have to answer a large number of requests and create a very large number of objects. Sometimes medium-traffic applications built on resource-hungry frameworks run into the same problem. In short, cleaning up all these objects efficiently is a big challenge for garbage collection.
Other applications have to run for a long time and deliver stable service throughout, with performance that neither degrades slowly over time nor collapses suddenly.
Some scenarios impose strict limits on user response time (such as online games or betting applications) and allow almost no additional GC pauses.
In many scenarios, you can combine several requirements with different priorities. Several of my sample programs have much higher requirements on the first point than on the second point, but most programs do not have high requirements on all three aspects at the same time. This leaves you with plenty of room for trade-offs.
The JVM has improved a lot, but it still cannot tune itself for your application at run time. Besides the three points mentioned above, the default JVM settings serve a further priority: keeping memory usage low. Consider the thousands of users who do not run the JVM on servers with ample memory. This also matters for many e-commerce products, because these applications are configured to run on development laptops most of the time rather than on commodity servers. Therefore, if you run your server with a minimal heap and GC configuration such as the following,
java -Xmx1024m -XX:MaxPermSize=256m -cp Portal.jar my.portal.Portal
this will almost certainly make the system run inefficiently. First, it is good practice to configure not only the maximum memory limits but also the initial memory sizes, so that the server does not have to grow its memory step by step during startup, which is costly. Once you know how much memory your server needs (and you should find that out early), it is best to make the initial size equal to the maximum. This can be done with the following JVM parameters:
-Xms1024m -XX:PermSize=256m
The last basic option that is often configured is the size of the new generation of the heap, set in a similar way:
-XX:NewSize=200m -XX:MaxNewSize=200m
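To confirm that the heap settings above actually reached the JVM, the standard java.lang.management API can report the initial and maximum heap size from inside the running process. The following is only a minimal sketch: the class name HeapSettingsCheck and its helper method are hypothetical and not part of the original setup, and the reported maximum may differ slightly from the -Xmx value depending on the collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class for illustration; not part of the portal application.
class HeapSettingsCheck {

    // Formats a byte count in MB; the management API reports -1 for undefined values.
    static String format(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // With -Xms equal to -Xmx, the initial and maximum values should be (nearly) the same.
        System.out.println("initial heap:   " + format(heap.getInit()));
        System.out.println("committed heap: " + format(heap.getCommitted()));
        System.out.println("maximum heap:   " + format(heap.getMax()));
    }
}
```

Running this class with the same JVM parameters as the server shows whether the initial size already matches the maximum, or whether the heap still has to grow after startup.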
The following sections explain the settings above as well as more advanced ones. First, let's look at how garbage collection behaves for a portal application under load test on a fairly slow test host:
Figure 1 GC behavior of the JVM with slightly optimized heap sizing over about 25 hours (-Xms1024m -Xmx1024m -XX:NewSize=200m -XX:MaxNewSize=200m)
The blue curve shows total heap memory usage over time, and the vertical gray lines mark GC pauses.
In addition to the graph, key indicators and performance of GC operations are displayed on the far right. First let's look at the average amount of garbage created (and collected) during this test. The 30.5MB/s value is marked in yellow because this is a sizable but acceptable garbage generation rate, which is fine for an introductory GC tuning example. Other values indicate the performance of the JVM in cleaning up this garbage: 99.55% of the garbage is cleaned in the new generation, and only 0.45% in the old generation. This result is pretty good, so it's marked green.
The reason for this result can be seen in the pauses that GC imposes on the JVM (and on the worker threads that handle user requests): there are many but very short new generation GC pauses, on average one every 6 s, each lasting no more than 50 ms. These pauses stopped the JVM for 0.77% of the total time, but each one was imperceptible to a user waiting for the server to respond.
The old generation GC pauses, on the other hand, account for only 0.19% of the total time. In that time, however, old generation GC cleaned up only 0.45% of the garbage, while new generation GC needed 0.77% of the time to clean up 99.55% of it. This shows how inefficient old generation GC is compared with new generation GC. Moreover, old generation GC pauses occur less than once an hour on average, but they last 8 seconds on average, with a maximum outlier of 19 seconds. Because these pauses really stop the JVM threads that process user requests, they should be as infrequent and as short as possible.
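The efficiency gap between the two generations can be made concrete by dividing the share of garbage each one collects by the share of time it pauses the JVM. The sketch below does this arithmetic with the figures quoted above; the class and method names are hypothetical, and the metric itself is just one simple way to compare the two, not something reported by the measurement tool.

```java
// Hypothetical helper for illustration; uses only the percentages quoted in the text.
class GcEfficiency {

    // Percent of all garbage collected per percent of total time spent paused.
    static double garbagePerPause(double garbageSharePercent, double pauseSharePercent) {
        return garbageSharePercent / pauseSharePercent;
    }

    public static void main(String[] args) {
        double newGen = garbagePerPause(99.55, 0.77); // new generation figures from Figure 1
        double oldGen = garbagePerPause(0.45, 0.19);  // old generation figures from Figure 1
        System.out.printf("new generation: %.1f%n", newGen);
        System.out.printf("old generation: %.1f%n", oldGen);
        System.out.printf("new / old:      %.0f%n", newGen / oldGen);
    }
}
```

By this measure, new generation GC is roughly 55 times as efficient per unit of pause time as old generation GC in this test run.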
From these observations, we can derive the basic tuning goal for generational garbage collection:
New generation GC should collect as much garbage as possible, so that old generation GC pauses become as infrequent and as short as possible.
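Let's start with the following figure. It can be obtained with JDK tools such as jstat, or jvisualvm with its visualgc plugin:
Figure 2 Heap memory structure of the JVM, including the subdivisions of the new generation (leftmost column)
The Java heap consists of the permanent generation (Perm), the old generation (Old) and the new generation (New or Young). The new generation is further divided into an Eden space and two Survivor spaces, S0 and S1. Eden is where objects are created; after a few rounds of new generation GC, surviving objects may be held in the Survivor spaces. To learn more, read the Sun/Oracle white paper Memory Management in the Java HotSpot Virtual Machine.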
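The same heap structure can also be inspected from inside a running JVM through the standard memory pool MXBeans. The following is a minimal sketch with a hypothetical class name; the pool names it prints are chosen by the JVM and depend on its version and on the collector in use (newer JVMs, for example, report a Metaspace pool instead of a permanent generation).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class HeapLayout {

    // One line per memory pool: type (heap or non-heap), JVM-specific name, current usage.
    static List<String> poolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            String used = usage == null ? "n/a" : (usage.getUsed() / 1024) + " kB used";
            lines.add(pool.getType() + " | " + pool.getName() + " | " + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : poolLines()) {
            System.out.println(line);
        }
    }
}
```

On a generational collector the output typically contains separate entries for Eden, the Survivor space and the old generation, which correspond to the columns shown by visualgc.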
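By default, the new generation as a whole, and the Survivor spaces in particular, are too small to hold objects long enough for GC to clean up most of them. These objects are therefore promoted prematurely to the old generation, which fills up quickly and has to be collected frequently. This is also the reason for the relatively many Full GC pauses in Figure 1.
(Translator's note: new generation garbage collection is generally also called Minor GC, and old generation garbage collection is called Major GC or Full GC.)
Tuning generational garbage collection means making the new generation, and especially the Survivor spaces, larger than the default. At the same time, you have to take into account the specific GC algorithm used by the virtual machine.
On current hardware, the Sun/Oracle virtual machine uses ParallelGC as its default GC algorithm. If it is not the default on your system, you can enable it explicitly with the following JVM parameter: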
-XX:+UseParallelGC
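By default, this algorithm does not work with fixed sizes for the Eden and Survivor spaces. It uses an adaptive resizing strategy called the "AdaptiveSizePolicy". As the name suggests, it can adapt to many scenarios, including use on machines other than servers, but it is not the optimal strategy on a server. To be able to set a fixed Survivor space size explicitly, turn it off with the following JVM parameter: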
-XX:-UseAdaptiveSizePolicy
With this setting in place, we can not only increase the size of the new generation further, but also set the Survivor spaces to a suitable size:
-XX:NewSize=400m -XX:MaxNewSize=400m -XX:SurvivorRatio=6
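"SurvivorRatio=6" means that each Survivor space is 1/6 of the Eden space, or 1/8 of the whole new generation, which is 50 MB in this example. The adaptive sizing policy, by contrast, often runs with very small Survivor spaces of only a few MB. Repeating the load test above with this configuration gives the following result:
Figure 3 GC behavior of the JVM with optimized heap sizing over 50 hours (-Xms1024m -Xmx1024m -XX:NewSize=400m -XX:MaxNewSize=400m -XX:-UseAdaptiveSizePolicy -XX:SurvivorRatio=6)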
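The sizes implied by the NewSize and SurvivorRatio parameters in this configuration can be checked with a little arithmetic: SurvivorRatio is the ratio of Eden to one Survivor space, so the new generation is split into SurvivorRatio + 2 equal parts. The sketch below uses hypothetical names and whole megabytes; the JVM aligns the spaces internally, so real sizes may differ slightly.

```java
// Hypothetical helper for illustration; plain arithmetic, no JVM internals involved.
class SurvivorSizing {

    // Size of ONE Survivor space: the new generation has survivorRatio + 2 equal parts.
    static long survivorMb(long newSizeMb, int survivorRatio) {
        return newSizeMb / (survivorRatio + 2);
    }

    // Eden is whatever remains after the two Survivor spaces.
    static long edenMb(long newSizeMb, int survivorRatio) {
        return newSizeMb - 2 * survivorMb(newSizeMb, survivorRatio);
    }

    public static void main(String[] args) {
        long newSizeMb = 400;  // -XX:NewSize=400m -XX:MaxNewSize=400m
        int survivorRatio = 6; // -XX:SurvivorRatio=6
        System.out.println("each Survivor space: " + survivorMb(newSizeMb, survivorRatio) + " MB"); // 50 MB
        System.out.println("Eden:                " + edenMb(newSizeMb, survivorRatio) + " MB");     // 300 MB
    }
}
```

With a 400 MB new generation, a ratio of 6 therefore yields two Survivor spaces of 50 MB each and an Eden space of 300 MB.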
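This test ran twice as long as the previous one, and the average garbage creation rate was essentially unchanged (30.2 MB/s, previously 30.5 MB/s). However, there were only two old generation (Full) GC pauses in the whole run, about one every 25 hours. This is because the rate at which garbage reaches the old generation (the so-called promotion rate) dropped from 137 kB/s to 6 kB/s, so that old generation collection now accounts for only 0.02% of the total. At the same time, the average duration of new generation GC pauses rose only from 48 ms to 57 ms, and the interval between two pauses grew from 6 s to 10 s. Overall, turning off adaptive sizing and sizing the heap sensibly reduced the share of total time spent in GC pauses from 0.95% to 0.59%, which is an excellent result.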
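These figures are consistent with each other, which can be verified with a rough back-of-the-envelope calculation. The sketch below uses hypothetical helper names and assumes 1 MB = 1000 kB; because the quoted durations and intervals are rounded averages, the results only approximate the percentages reported by the measurement tool.

```java
// Hypothetical helper for illustration; inputs are the rounded averages quoted in the text.
class TuningComparison {

    // Percent of total time spent paused, given average pause length and pause interval.
    static double pauseSharePercent(double pauseMs, double intervalSeconds) {
        return pauseMs / (intervalSeconds * 1000.0) * 100.0;
    }

    // Percent of all garbage that has to be collected in the old generation.
    // Assumption: 1 MB = 1000 kB.
    static double oldGenGarbagePercent(double promotionKbPerSec, double creationMbPerSec) {
        return promotionKbPerSec / (creationMbPerSec * 1000.0) * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("new gen pause share before: %.2f%%%n", pauseSharePercent(48, 6));
        System.out.printf("new gen pause share after:  %.2f%%%n", pauseSharePercent(57, 10));
        System.out.printf("old gen garbage before:     %.2f%%%n", oldGenGarbagePercent(137, 30.5));
        System.out.printf("old gen garbage after:      %.2f%%%n", oldGenGarbagePercent(6, 30.2));
    }
}
```

The new generation pause share drops from about 0.8% to about 0.57%, and the share of garbage left to the old generation drops from about 0.45% to about 0.02%, in line with the numbers above.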
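After tuning, similar results can be obtained with the ParNew algorithm as an alternative to the default ParallelGC. This algorithm was developed for compatibility with the CMS algorithm and can be enabled with the JVM parameter -XX:+UseParNewGC. CMS is discussed later. ParNew does not use the adaptive sizing policy and works with fixed Survivor space sizes. Therefore, even with the default setting SurvivorRatio=8, it achieves better server utilization than ParallelGC.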
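The last problem with the results above is the long old generation GC pauses, which average about 8 s. With proper tuning, old generation GC pauses have become rare, but when one is triggered it is still very annoying for users, because the JVM cannot run worker threads during the pause. In our example, the 8 s duration is due to the slow, old test machine; on modern hardware it would be about 3 times faster. On the other hand, today's applications typically use heaps larger than 1 GB, which hold more objects. Current web applications use heaps of up to 64 GB and need (at least) half of that to hold live objects. In that case, 8 s is short for an old generation pause. Old generation GC in such applications can easily approach 1 minute, which is absolutely unacceptable for interactive web applications.
One option to mitigate this problem is to process old generation GC in parallel. By default in Java 6, the ParallelGC and ParNew GC algorithms use multiple GC threads for new generation GC, while old generation GC is single-threaded. Taking the ParallelGC collector as an example, you can add the following parameter: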
-XX:+UseParallelOldGC
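Starting with Java 7, this option is activated by default together with -XX:+UseParallelGC. However, even on a 4-core or 8-core system, do not expect performance to improve by more than a factor of 2; usually the gain is somewhat smaller than that. In some cases, such as the 8 s in the example above, this improvement is quite helpful. In many more extreme cases, however, it is far from enough. The solution is to use a low-latency GC algorithm.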
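To see which collectors a given combination of JVM parameters actually activates, and how much time they have accumulated so far, the standard garbage collector MXBeans can be queried from inside the application. The following is a minimal sketch with hypothetical class and method names. The collector names are chosen by the JVM and vary by version, and the reported collection time is only an approximate accumulated value, so treat the percentages as a rough indication rather than an exact pause measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class for illustration only.
class GcOverview {

    // Accumulated collection time as a percentage of JVM uptime.
    static double sharePercent(long collectionMs, long uptimeMs) {
        return uptimeMs <= 0 ? 0.0 : 100.0 * collectionMs / uptimeMs;
    }

    public static void main(String[] args) {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long timeMs = gc.getCollectionTime(); // -1 if undefined for this collector
            System.out.printf("%s: %d collections, %d ms, %.2f%% of uptime%n",
                    gc.getName(), count, timeMs, sharePercent(Math.max(timeMs, 0), uptimeMs));
        }
    }
}
```

Typically one bean is reported for the new generation collector and one for the old generation collector, which makes it easy to compare their time shares in the same way as in the figures above.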
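The next part will discuss CMS (the Concurrent Mark and Sweep collector), the specter of fragmentation, the G1 (Garbage First) collector, and a quantitative comparison of garbage collectors, followed by a summary.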