
Concurrent programming knowledge points for Java thread learning


This article brings you relevant knowledge about Java, mainly covering issues related to concurrent programming, including the Java memory model, a detailed explanation of volatile, and the implementation principle of synchronized. Let's take a look; I hope it will be helpful to everyone.



1. JMM Basics - Computer Principles

The Java Memory Model, or JMM for short, defines how the Java Virtual Machine (JVM) works with computer memory (RAM). The JVM is a virtual model of an entire computer, so the JMM belongs to the JVM. The model was reworked in Java 1.5, and that revision is the one Java still uses today. The problems the JMM faces are similar to the concurrency problems encountered by modern physical computers, and the way physical machines handle concurrency is therefore of considerable reference value for the virtual machine's implementation.
Based on "Jeff Dean's Report at Google All-Engineering Conference" we can see that

[Figure: latency of common computer operations, from Jeff Dean's numbers]

when the computer performs some of our usual basic operations, the response times required differ greatly.

The following numbers are for illustration only and do not represent real measurements.

Suppose 1 MB of int data is read from memory and summed by the CPU. How long does it take?
Do a simple calculation. Java's int type is 32 bits (4 bytes), so 1 MB holds 1024 * 1024 / 4 = 262,144 integers. Assuming 0.6 ns per operation, the CPU computation time is 262,144 * 0.6 ≈ 157,286 ns, and we know that reading 1 MB from memory in bulk takes about 250,000 ns. There is a gap between the two (not a small one: a hundred thousand nanoseconds is enough time for the CPU to execute nearly two hundred thousand instructions), but they are still on the same order of magnitude. Without any caching mechanism, however, every number must be fetched from memory individually. At 100 ns per memory read, fetching 262,144 integers into the CPU plus the computation time comes to 262,144 * 100 + 250,000 = 26,464,400 ns, which is orders of magnitude apart.
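As a quick sanity check, the arithmetic above can be reproduced directly (the per-operation costs, 0.6 ns per operation, 100 ns per single memory read, and 250,000 ns per bulk 1 MB read, are the illustrative assumptions from the text, not measurements):

public class LatencyMath {
    public static void main(String[] args) {
        int ints = 1024 * 1024 / 4;               // 262,144 ints in 1 MB of int data
        long computeNs = Math.round(ints * 0.6);  // 262,144 * 0.6 ≈ 157,286 ns of CPU time
        long bulkReadNs = 250_000;                // one bulk read of 1 MB from memory
        long perReadNs = (long) ints * 100;       // 100 ns for each individual memory read
        System.out.println("compute only:   " + computeNs + " ns");
        System.out.println("bulk read:      " + bulkReadNs + " ns");
        // without caching, the per-read cost dominates: 26,214,400 + 250,000 = 26,464,400 ns
        System.out.println("uncached total: " + (perReadNs + bulkReadNs) + " ns");
    }
}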

In reality, moreover, most computing tasks cannot be completed by the processor "computing" alone. The processor must at least interact with memory, for example to read operands and store results, and this I/O is essentially impossible to eliminate (registers alone cannot carry all computing tasks). In early computers the CPU and memory ran at roughly the same speed, but in modern computers the CPU's instruction speed far exceeds memory access speed. Since there is a gap of several orders of magnitude between the speed of the computer's storage devices and the computing speed of the processor, modern computer systems insert a layer of cache (Cache), with read/write speeds as close as possible to the processor's, to act as a buffer between memory and the processor: the data needed for an operation is copied into the cache so the operation can proceed quickly, and when it completes the result is synchronized back to memory from the cache, so the processor does not have to wait for slow memory reads and writes.

[Figure: CPU cache sitting between the processor and main memory]

In a computer system, the registers form the L0 level of the cache hierarchy, followed by L1, L2, and L3 (and below them main memory, local disk, and remote storage). Going up the hierarchy, storage gets smaller, faster, and more expensive; going down, larger, slower, and cheaper. From top to bottom, each layer can be regarded as a cache of the layer below it: the L0 registers cache the L1 first-level cache, L1 caches L2, and so on. Each layer's data comes from the layer below it, so each layer's data is a subset of the data in the layer below.

[Figure: storage hierarchy from registers and caches down to memory, local disk, and remote storage]

On modern CPUs, generally speaking, L0, L1, L2, and L3 are all integrated inside the CPU, and L1 is split into a first-level data cache (Data Cache, D-Cache, L1d) and a first-level instruction cache (Instruction Cache, I-Cache, L1i), which store data and decoded instructions respectively. Each core has its own independent computation unit, controller, registers, and L1 and L2 caches, while the multiple cores of one CPU share the last layer of CPU cache, L3.

2. Java Memory Model (JMM)

From an abstract point of view, the JMM defines the relationship between threads and main memory: shared variables between threads are stored in main memory (Main Memory), and each thread has a private local memory (Local Memory) that holds its copy of the shared variables it reads/writes. Local memory is an abstraction of the JMM and does not really exist; it covers caches, write buffers, registers, and other hardware and compiler optimizations.

[Figure: JMM abstraction of threads, local memory, and main memory]

2.1. Visibility

Visibility means that when multiple threads access the same variable, if one thread modifies the variable's value, the other threads can immediately see the modified value.

Since all of a thread's operations on a variable must be performed in its working memory, and the thread cannot directly read and write variables in main memory, a shared variable V is first modified in the thread's own working memory and only then synchronized back to main memory. The change is not flushed to main memory immediately; there is a time lag, and during that window thread A's update to V is clearly not yet visible to thread B.
To solve this shared-object visibility problem, we can use the volatile keyword or a lock, as in the sketch below.
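A minimal sketch of the visibility problem (class and field names here are illustrative): without the volatile modifier on stop, the reader thread may keep the flag cached in its working memory and spin forever; declaring it volatile makes the write promptly visible.

public class VisibilityDemo {
    // volatile guarantees the reader sees the writer's update promptly;
    // remove it and the loop below may never terminate
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!stop) {
                // busy-wait on the flag held in this thread's working memory
            }
            System.out.println("reader saw stop == true");
        });
        reader.start();
        Thread.sleep(100);
        stop = true;  // flushed to main memory, so the reader observes it
    }
}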

2.2. Atomicity

Atomicity: an operation (or a group of operations) either executes completely, without being interrupted by any factor, or does not execute at all. We all know that CPU resources are allocated in units of threads and scheduled by time-sharing: the operating system lets a process run for a short period, say 50 milliseconds, and then picks another process to run (we call this a "task switch"). Those 50 milliseconds are called a "time slice", and most task switches happen when a time slice ends.

So why can a thread switch cause bugs? Because the operating system can perform a task switch after any CPU instruction completes. Note: a CPU instruction, not a statement in a high-level language. For example, count++ is a single statement in Java, but one high-level-language statement often requires multiple CPU instructions to complete; in fact, count++ involves at least three CPU instructions (read, add, write back). A minimal demonstration follows.
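A minimal demonstration of the lost-update effect (names illustrative): two threads each increment a plain counter 10,000 times, yet the final value is usually less than 20,000 because the read-add-write sequences interleave.

public class LostUpdate {
    static int count = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                count++;  // three CPU-level steps: read count, add 1, write back
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(count);  // typically < 20000: some increments are lost
    }
}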

3. Detailed explanation of volatile

3.1. Volatile features

You can treat a single read/write of a volatile variable as if the same lock were used to synchronize those individual read/write operations:

public class Volati {
    // declare a 64-bit long variable as volatile
    volatile long i = 0L;

    // read of a single volatile variable
    public long getI() {
        return i;
    }

    // write of a single volatile variable
    public void setI(long i) {
        this.i = i;
    }

    // compound (multiple) volatile read/write operations
    public void iCount() {
        i++;
    }
}
It can be seen as equivalent to the following code:

public class VolaLikeSyn {
    // a plain long variable
    long i = 0L;

    // reads of the single plain variable are synchronized on the same lock
    public synchronized long getI() {
        return i;
    }

    // writes of the single plain variable are synchronized on the same lock
    public synchronized void setI(long i) {
        this.i = i;
    }

    // ordinary method call
    public void iCount() {
        long temp = getI();  // call the synchronized read method
        temp = temp + 1L;    // ordinary write operation on a local
        setI(temp);          // call the synchronized write method
    }
}
So a volatile variable itself has the following features:

  • Visibility: a read of a volatile variable always sees the last write (by any thread) to that volatile variable.
  • Atomicity: a read or write of any single volatile variable is atomic, but compound operations such as volatile++ are not.
Although volatile ensures that a variable is flushed to main memory promptly after a write, for count++, which is not atomic and spans multiple instructions, a thread switch can still intervene: right after thread A has loaded count=0 into its working memory, thread B may start running. Both threads then compute 1 and both write 1 back to main memory, so the value in main memory ends up 1 instead of 2. One common fix is sketched below.
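One common fix, shown as a sketch: replace the volatile long with java.util.concurrent.atomic.AtomicLong, whose incrementAndGet() performs the whole read-modify-write as one atomic CAS-based operation.

import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounter {
    private final AtomicLong count = new AtomicLong(0);

    public void increment() {
        count.incrementAndGet();  // atomic, so no updates are lost under contention
    }

    public long get() {
        return count.get();
    }
}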

3.2. Implementation principle of volatile

    Writes to variables modified by the volatile keyword compile down to machine instructions carrying a "lock" prefix.
  • The lock prefix is not itself a memory barrier, but it performs a similar function: it locks the CPU bus and cache, and can be understood as a lock at the level of CPU instructions.
  • At the same time, the instruction writes the data of the current processor's cache line straight back to system memory, and that write invalidates the copies of the data cached at this address in other CPUs. A sketch of the visible effect follows.
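A minimal sketch of the ordering guarantee this buys (names illustrative): because ready is volatile, a reader that observes ready == true is also guaranteed to observe the ordinary write to data that happened before the volatile write.

public class SafePublication {
    static int data = 0;
    static volatile boolean ready = false;

    static void writer() {
        data = 42;     // ordinary write
        ready = true;  // volatile write: publishes everything written before it
    }

    static void reader() {
        if (ready) {                   // volatile read
            System.out.println(data);  // guaranteed to print 42, never 0
        }
    }
}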
4. The implementation principle of synchronized

Synchronized is implemented in the JVM based on entering and exiting a Monitor object, for both method synchronization and code-block synchronization. The two differ in implementation detail, but both can be built on the paired MonitorEnter and MonitorExit instructions.

For a synchronized block, the MonitorEnter instruction is inserted at the start of the block, and MonitorExit instructions are inserted at the end of the block and at the exception exit; the JVM guarantees that every MonitorEnter has a corresponding MonitorExit. When execution reaches MonitorEnter, the thread tries to acquire ownership of the object's Monitor, that is, tries to acquire the object's lock:

  1. If the monitor's entry count is 0, the thread enters the monitor, sets the entry count to 1, and becomes the monitor's owner.
  2. If the thread already occupies the monitor and is merely re-entering, the entry count is incremented by 1.
  3. If another thread already occupies the monitor, the thread blocks until the monitor's entry count drops to 0, then tries again to acquire ownership of the monitor.

For a synchronized method, the decompiled bytecode shows that method synchronization is not implemented with the monitorenter and monitorexit instructions; instead, compared with an ordinary method, the method's access flags carry an additional ACC_SYNCHRONIZED identifier.
    The JVM implements method synchronization based on this flag: when the method is invoked, the invoking instruction checks whether the method's ACC_SYNCHRONIZED access flag is set. If it is, the executing thread first acquires the monitor, executes the method body only after acquiring it, and releases the monitor when the method finishes. While the method executes, no other thread can acquire the same monitor object. Both forms are shown in the sketch below.
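Both forms can be seen in a small class (names illustrative); compiling it and inspecting the bytecode with javap -c -v shows monitorenter/monitorexit for the block form and the ACC_SYNCHRONIZED flag for the method form.

// inspect with: javac SyncDemo.java && javap -c -v SyncDemo
public class SyncDemo {
    private final Object lock = new Object();
    private int count = 0;

    public void incBlock() {
        synchronized (lock) {  // compiles to a monitorenter instruction
            count++;
        }                      // compiles to monitorexit (plus one on the exception path)
    }

    public synchronized void incMethod() {  // ACC_SYNCHRONIZED in the method's access flags
        count++;
    }
}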

The lock used by synchronized is stored in the Java object header, which consists of two parts, the Mark Word and the klass pointer:

  1. The Mark Word stores the object's synchronization state, flags, hashCode, GC age, and so on.
  2. The klass pointer stores the object's type pointer, which points to its class metadata. In addition, arrays carry an extra field recording the array length.

[Figure: Java object header layout]

The lock information lives in the object's Mark Word. By default the Mark Word stores the object's HashCode and related information.

[Figure: default Mark Word contents of an unlocked object]

But the Mark Word changes as operations on the object change; different lock states correspond to different storage layouts.

[Figure: Mark Word contents under the different lock states]
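One way to actually watch the Mark Word change is the OpenJDK JOL tool (a sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath):

import org.openjdk.jol.info.ClassLayout;

public class HeaderDemo {
    public static void main(String[] args) {
        Object o = new Object();
        // header of the unlocked object
        System.out.println(ClassLayout.parseInstance(o).toPrintable());
        synchronized (o) {
            // inside the block the Mark Word now encodes the lock state
            System.out.println(ClassLayout.parseInstance(o).toPrintable());
        }
    }
}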

4.1. Lock status

Comparing with the figure above, we can see there are four lock states: no-lock, biased lock, lightweight lock, and heavyweight lock. The lock escalates gradually as contention grows. To improve the efficiency of acquiring and releasing locks, a lock can be upgraded but not downgraded.

4.2. Biased lock

Background: in most cases a lock not only sees no multi-thread contention, but is also always acquired repeatedly by the same thread. Biased locking was introduced to make lock acquisition cheaper for that thread by reducing unnecessary CAS operations.
A biased lock, as the name suggests, is biased toward the first thread that acquires it. If, while the program runs, the lock is only ever accessed by that one thread and there is no multi-thread contention, the thread no longer needs CAS operations to lock and unlock (such as the CAS operations on the wait queue); the lock simply stays biased to that thread. If another thread preempts the lock while it is held, the thread holding the biased lock is suspended, the JVM revokes the bias, and the lock reverts to a standard lightweight lock. Biased locking improves program performance by eliminating synchronization primitives when there is no contention for the resource.

The figure below shows the process of acquiring a biased lock:

[Figure: biased lock acquisition flow]

Step 1. Check whether the biased flag in the Mark Word is 1 and whether the lock flag bits are 01; if so, the object is in a biasable state.

Step 2. If it is in the biasable state, test whether the thread ID points to the current thread. If so, go to step 5; otherwise go to step 3.
Step 3. If the thread ID does not point to the current thread, compete for the lock with a CAS operation. If the CAS succeeds, set the thread ID in the Mark Word to the current thread's ID and go to step 5; if it fails, go to step 4.
Step 4. If the CAS fails to acquire the biased lock, there is contention. At the next global safepoint, the thread that holds the biased lock is suspended, the biased lock is upgraded to a lightweight lock, and the thread blocked at the safepoint then continues executing the synchronized code. (Revoking the biased lock causes a stop-the-world pause.)
Step 5. Execute the synchronized code.

Biased lock revocation:

Revocation of the biased lock was mentioned in step 4 above. A biased lock is revoked only when some other thread tries to compete for it; the thread holding the biased lock never releases it on its own. Revocation has to wait for a global safepoint (a point in time at which no bytecode is executing). The JVM first pauses the thread that owns the biased lock and checks whether the lock object is still locked; after the bias is revoked, the object is restored to either the unlocked state (flag bits "01") or the lightweight-locked state (flag bits "00").

Applicable scenarios for biased locks:

Biased locking suits the case where only one thread ever executes the synchronized block, and no other thread touches it before that thread finishes and releases the lock; in other words, it is used when there is no contention for the lock. Once contention appears, the lock is upgraded to a lightweight lock, and upgrading requires revoking the bias, which triggers a stop-the-world operation.
When there is lock contention, biased locking adds a lot of extra work; in particular, revoking the bias forces a safepoint, and the safepoint causes a STW pause and degrades performance. In such cases biased locking should be disabled.

JVM flags to turn biased locking on/off:
Enable biased locking: -XX:+UseBiasedLocking -XX:BiasedLockingStartupDelay=0
Disable biased locking: -XX:-UseBiasedLocking

4.3. Lightweight lock

A lightweight lock is upgraded from a biased lock: the biased lock operates while a single thread enters the synchronized block; when a second thread joins the lock contention, the biased lock is upgraded to a lightweight lock.

Lightweight lock locking process:

  1. When execution enters the synchronized block, if the synchronization object's lock state is lock-free and not biasable (lock flag bits "01", biased flag "0"), the virtual machine first creates a space called a Lock Record in the current thread's stack frame, used to store a copy of the lock object's current Mark Word, officially called the Displaced Mark Word.
  2. The Mark Word in the object header is copied into the Lock Record.
  3. After the copy succeeds, the virtual machine uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record, and points the owner pointer in the Lock Record at the object's Mark Word. If the update succeeds, go to step 4; otherwise go to step 5.
  4. If the update succeeds, this thread owns the object's lock, and the lock flag bits of the object's Mark Word are set to "00", meaning the object is in the lightweight-locked state.
  5. If the update fails, the virtual machine first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already owns this object's lock and can enter the synchronized block directly and continue. Otherwise, multiple threads are competing for the lock: the thread spins waiting for it, and if the lock still has not been acquired after a certain number of spins, the lightweight lock inflates into a heavyweight lock. The lock flag bits change to "10", the Mark Word stores a pointer to the heavyweight lock (a mutex), and threads that wait for the lock afterwards enter the blocked state.

4.3.1. Spin lock principle

The principle of a spin lock is very simple: if the thread holding the lock can release it within a short time, then the threads waiting to compete for the lock need not switch between kernel mode and user mode to enter a blocked, suspended state. They just wait (spin) and acquire the lock immediately after the holder releases it, thereby avoiding the cost of switching between user threads and the kernel.
But spinning consumes CPU; bluntly, the CPU is doing useless work, and a thread cannot be allowed to occupy the CPU spinning forever, so a maximum spin waiting time must be set.
If the lock holder runs past the maximum spin waiting time without releasing the lock, the competing threads still cannot acquire it within the limit; at that point the contending thread stops spinning and enters the blocked state. A toy version is sketched below.
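A toy bounded spin lock (a sketch, not the JVM's actual implementation): compete for the lock with CAS, spin up to a fixed bound, then back off by yielding the CPU instead of continuing to burn it.

import java.util.concurrent.atomic.AtomicBoolean;

public class BoundedSpinLock {
    private static final int MAX_SPINS = 10;  // assumed bound, echoing the old fixed spin count
    private final AtomicBoolean locked = new AtomicBoolean(false);

    public void lock() {
        int spins = 0;
        // CAS succeeds only if the lock is currently free
        while (!locked.compareAndSet(false, true)) {
            if (++spins > MAX_SPINS) {
                Thread.yield();  // crude stand-in for blocking; the JVM would suspend the thread
                spins = 0;
            }
        }
    }

    public void unlock() {
        locked.set(false);  // AtomicBoolean.set has volatile semantics, so the release is visible
    }
}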

4.3.2. Advantages and Disadvantages of Spin Locks

Spin locks reduce thread blocking as much as possible. For code blocks where lock contention is mild and the lock is held for a very short time, they improve performance greatly, because the cost of spinning is lower than the cost of blocking a thread and later resuming it.
But if contention for the lock is fierce, or the thread holding the lock needs to occupy it for a long time while executing the synchronized block, spin locks are not appropriate: the spinning threads hog the CPU doing useless work before acquiring the lock, the cost of spinning exceeds the cost of blocking and suspending a thread, and other threads that need the CPU cannot get it, wasting CPU.

4.3.3. Spin lock time threshold

The point of a spin lock is to hold on to CPU resources without releasing them, and to process immediately once the lock is acquired. But how long should the spin last? If the spin time is too long, large numbers of threads sit spinning and occupy CPU resources, hurting overall system performance. So the spin count matters.
In JDK 1.5 the spin count is fixed, 10 by default. JDK 1.6 introduced adaptive spin locks: the spin time is no longer fixed, but is decided by the previous spin time on the same lock and the state of the lock's owner; the rough rule of thumb is that one thread context-switch time is the ideal spin duration.

In JDK 1.6, -XX:+UseSpinning enables the spin lock; from JDK 1.7 on, the parameter was removed and spinning is controlled by the JVM itself.


4.3.4. Comparison of different locks

[Figure: comparison of biased, lightweight, and heavyweight locks]

