1. Comparison between HashMap and ConcurrentHashMap
We use a short program to show that HashMap is not thread-safe and that ConcurrentHashMap is. The logic is simple: start 10,000 threads, each of which puts a key and then removes it. If the map is thread-safe, size() must be 0 at the end.
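A minimal sketch of such a test might look like the following. The class name PutRemoveDemo and the run helper are hypothetical, and it assumes a fixed pool of 16 worker threads running 10,000 small tasks rather than 10,000 real threads. The HashMap result varies from run to run, so treat it as an illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of the original article's code.
final class PutRemoveDemo {

    // Runs `tasks` put-then-remove operations, each on its own key,
    // and returns the map's size once all tasks have finished.
    static int run(Map<Integer, Integer> map, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(16); // assumption: 16 workers
        for (int i = 0; i < tasks; i++) {
            final int key = i;
            pool.execute(() -> {
                try {
                    map.put(key, key);
                    map.remove(key);
                } catch (RuntimeException e) {
                    // A corrupted HashMap may also throw; ignored in this demo.
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentHashMap size = " + run(new ConcurrentHashMap<>(), 10_000));
        // Unsynchronized access: the reported size is unpredictable.
        System.out.println("HashMap size = " + run(new HashMap<>(), 10_000));
    }
}
```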
In our run, the HashMap reported a size of 13, even though only one key was actually left in the map. Running the same code with ConcurrentHashMap gives a size of 0.
This shows that ConcurrentHashMap is thread-safe. Let's analyze from the source code how ConcurrentHashMap achieves this. The source code analyzed here is from JDK 1.8.
2. ConcurrentHashMap source code analysis
2.1 Basic attributes of ConcurrentHashMap
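    // Maximum table capacity
    private static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default initial table capacity
    private static final int DEFAULT_CAPACITY = 16;
    // Largest possible array size
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    // Default concurrency level; currently unused, kept only for compatibility
    private static final int DEFAULT_CONCURRENCY_LEVEL = 16;
    // Load factor, same as HashMap
    private static final float LOAD_FACTOR = 0.75f;
    // Same as HashMap: bin size at which a linked list is converted to a red-black tree (8)
    static final int TREEIFY_THRESHOLD = 8;
    // Bin size at which a red-black tree is converted back to a linked list (6)
    static final int UNTREEIFY_THRESHOLD = 6;
    // Minimum table length for treeification; below this, the table is resized instead
    static final int MIN_TREEIFY_CAPACITY = 64;
    // Minimum number of table slots each thread migrates during a resize
    private static final int MIN_TRANSFER_STRIDE = 16;
    // Number of bits used for the resize stamp stored in sizeCtl
    private static int RESIZE_STAMP_BITS = 16;
    // Maximum number of threads that can help with a resize
    private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;
    // Bit shift used to store the resize stamp in sizeCtl
    private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;
    // Value -1: a Node whose hash is MOVED is a forwarding node, i.e. the table is being resized
    static final int MOVED = -1;
    // Value -2: the bin is the root of a red-black tree
    static final int TREEBIN = -2;
    // Value -3: hash of a temporary placeholder (reservation) node
    static final int RESERVED = -3;
    // Mask of the usable bits of a normal node hash
    static final int HASH_BITS = 0x7fffffff;
    // The array of bins
    transient volatile Node<K,V>[] table;
    // Temporary table used during a resize; all elements are migrated into nextTable
    private transient volatile Node<K,V>[] nextTable;
    // Base counter
    private transient volatile long baseCount;
    // Control field for table initialization and resizing; its value has different meanings.
    // 0 (default): not initialized; -1: the table is being initialized;
    // less than -1: the table is being resized;
    // greater than 0: the initial size before initialization, or the next resize threshold afterwards
    private transient volatile int sizeCtl;
    // When the table grows from n to 2n, migration runs from index n - 1 down to 0;
    // transferIndex is the next index (plus one) still to be claimed
    private transient volatile int transferIndex;
    // Spin lock flag (taken with CAS) used when creating or resizing counterCells
    private transient volatile int cellsBusy;
    // Table of counter cells; its size is a power of two
    private transient volatile CounterCell[] counterCells;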
The above are the basic attributes of ConcurrentHashMap. Most of them are the same as in HashMap, with a few additions. Below we analyze how the added attributes are used.
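To make the sizeCtl encoding a little more concrete, here is a small sketch of how a negative sizeCtl is built during a resize. It assumes the JDK 8 formula for resizeStamp(n) and the value (rs << RESIZE_STAMP_SHIFT) + 2 that the first resizing thread writes in addCount. The class ResizeStampDemo and the helper firstResizerSizeCtl are hypothetical names used only for illustration.

```java
// Hypothetical demo class illustrating the sizeCtl layout during a resize.
final class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // Same formula as JDK 8: the leading-zero count of n with bit 15 set,
    // so that the stamp becomes negative once shifted into the high 16 bits.
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    // sizeCtl as written by the first thread that starts resizing a table of length n.
    static int firstResizerSizeCtl(int n) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2;
    }

    public static void main(String[] args) {
        int n = 16;
        int sc = firstResizerSizeCtl(n);
        System.out.println("sizeCtl is negative: " + (sc < 0));
        System.out.println("stamp in high bits:  " + (sc >>> RESIZE_STAMP_SHIFT));
        System.out.println("low 16 bits:         " + (sc & 0xffff));
    }
}
```

The high 16 bits identify which table length is being resized, while the low 16 bits grow and shrink as threads join and leave the resize, which is why transfer() later compares sc - 2 with the shifted stamp to detect the last thread.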
2.2 Common methods of ConcurrentHashMap
put method
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        // Neither key nor value may be null
        if (key == null || value == null) throw new NullPointerException();
        // Compute the hash
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            // If the table has not been initialized, initialize it
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            // Compute the array index
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                // The bin is empty: create a new node and add it with CAS
                if (casTabAt(tab, i, null, new Node<K,V>(hash, key, value, null)))
                    break; // no lock when adding to empty bin
            }
            // The table is being resized
            else if ((fh = f.hash) == MOVED)
                // Help with the resize
                tab = helpTransfer(tab, f);
            else {
                // The actual insertion starts here
                V oldVal = null;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        // A non-negative hash means the bin is an ordinary linked list
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                // Same key: overwrite the value of the existing node
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                // Otherwise append the new node to the end of the list
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key, value, null);
                                    break;
                                }
                            }
                        }
                        // If the bin is a tree, insert into the tree
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key, value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                // Check whether the bin should be converted to a red-black tree
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i); // convert to a red-black tree
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        // Update the element count with CAS and check whether a resize is needed
        addCount(1L, binCount);
        return null;
    }
The logic of ConcurrentHashMap's put method is not much different from HashMap's; what is new is the thread-safety handling. An empty bin is filled with a CAS operation, a non-empty bin is modified while holding a synchronized lock on its head node, and the size is then updated with CAS. The whole put process is fairly simple and can be summarized as follows:
1. Check whether the key or value is null. If either is null, throw an exception immediately.
2. Check whether the table array has been initialized. If not, initialize it.
3. Compute the array index from the key's hash. If that position is empty, construct a node and place it there with CAS.
4. If the table is being resized, call the method that helps with the resize.
5. Otherwise, lock the head node of the bin with synchronized and insert. If the key already exists, overwrite its value unless onlyIfAbsent is set; otherwise construct a node and append it to the end of the list. If the head node is a tree node, perform a tree insertion instead. If the number of nodes in the bin reaches the red-black tree threshold, convert the bin to a red-black tree.
6. Finally, update the size and check whether the table needs to be resized.
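The two locking paths in the steps above (CAS into an empty bin, synchronized on the head node otherwise) can be sketched in a much simplified form. This is not the JDK implementation: the class MiniBinMap is hypothetical, uses AtomicReferenceArray instead of Unsafe, has a fixed table of 16 bins, and leaves out resizing, trees, removal and counting.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical, heavily simplified map that only shows the two insertion paths.
final class MiniBinMap<K, V> {
    private static final class Node<K, V> {
        final int hash; final K key; volatile V val; volatile Node<K, V> next;
        Node(int hash, K key, V val) { this.hash = hash; this.key = key; this.val = val; }
    }

    // Assumption: fixed capacity, no resize.
    private final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    static int spread(int h) { return (h ^ (h >>> 16)) & 0x7fffffff; }

    V put(K key, V value) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int i = (table.length() - 1) & hash;
        for (;;) {
            Node<K, V> f = table.get(i);
            if (f == null) {
                // Empty bin: try a lock-free insert; on failure, loop and re-read.
                if (table.compareAndSet(i, null, new Node<>(hash, key, value))) return null;
                continue;
            }
            synchronized (f) {                    // lock only this bin's head node
                if (table.get(i) != f) continue;  // head changed; retry
                for (Node<K, V> e = f;; e = e.next) {
                    if (e.hash == hash && key.equals(e.key)) {
                        V old = e.val; e.val = value; return old;   // overwrite
                    }
                    if (e.next == null) {                            // append at the tail
                        e.next = new Node<>(hash, key, value); return null;
                    }
                }
            }
        }
    }

    V get(K key) {
        int hash = spread(key.hashCode());
        for (Node<K, V> e = table.get((table.length() - 1) & hash); e != null; e = e.next)
            if (e.hash == hash && key.equals(e.key)) return e.val;
        return null;
    }

    // Four threads insert 1,000 distinct keys each; returns how many keys are found afterwards.
    static int demo() {
        MiniBinMap<Integer, Integer> m = new MiniBinMap<>();
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            final int base = t * 1000;
            ts[t] = new Thread(() -> { for (int k = base; k < base + 1000; k++) m.put(k, k); });
            ts[t].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int found = 0;
        for (int k = 0; k < 4000; k++) if (Integer.valueOf(k).equals(m.get(k))) found++;
        return found;
    }
}
```

Because the lock is taken per bin rather than on the whole map, threads that hash to different bins do not block each other.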
get method
    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        // Compute the hash
        int h = spread(key.hashCode());
        // If the table is initialized and the bin at the computed index is not empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            // If the hash and the key match, return the value of this node directly
            if ((eh = e.hash) == h) {
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            // A negative hash marks a special node, e.g. a ForwardingNode during a resize
            // or a TreeBin; use that node's find method to locate the entry
            else if (eh < 0)
                return (p = e.find(h, key)) != null ? p.val : null;
            // Traverse the linked list looking for a node with the same hash and key
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }
The get method is relatively simple. The main process is as follows:
1. Compute the hash and locate the bin. If the hash and key of the head node match the key being searched for, return that node's value directly.
2. If the head node's hash is negative, for example a ForwardingNode while the table is being resized, call the node's find method to locate the entry.
3. Otherwise, loop through the linked list and return the value of the node with the same key and hash.
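Both put and get start from spread(key.hashCode()) followed by (n - 1) & hash. The sketch below reproduces that arithmetic in isolation, assuming the JDK 8 definition of spread; the class SpreadDemo and the helper indexFor are hypothetical names.

```java
// Hypothetical demo of the hash spreading and index computation used by put and get.
final class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;

    // XOR the high 16 bits into the low 16 bits and clear the sign bit,
    // so that normal nodes never collide with the negative marker hashes.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    // Index of the bin for a table of length n (n must be a power of two).
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int raw = 0x10000; // differs from 0 only in the high 16 bits
        System.out.println("index without spread: " + indexFor(raw, 16));
        System.out.println("index with spread:    " + indexFor(spread(raw), 16));
    }
}
```

Clearing the sign bit matters here because negative hashes are reserved for MOVED, TREEBIN and RESERVED, which is what lets get treat eh < 0 as "special node".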
Expansion of ConcurrentHashMap
Resizing a ConcurrentHashMap is more complicated than resizing a HashMap because several threads may take part in it. The main resize method is transfer. Let's go through its source code to see how the resize works.
    private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
        int n = tab.length, stride;
        // Make sure each thread migrates at least 16 slots
        if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
            stride = MIN_TRANSFER_STRIDE; // subdivide range
        if (nextTab == null) {            // initiating
            try {
                // Double the capacity
                @SuppressWarnings("unchecked")
                Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
                nextTab = nt;
            } catch (Throwable ex) {      // try to cope with OOME
                // On failure, give up resizing
                sizeCtl = Integer.MAX_VALUE;
                return;
            }
            // Keep a reference to the new array
            nextTable = nextTab;
            // Start the transfer index at the length of the old array
            transferIndex = n;
        }
        int nextn = nextTab.length;
        // Transition (forwarding) node used during the resize
        ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
        boolean advance = true;
        boolean finishing = false; // to ensure sweep before committing nextTab
        for (int i = 0, bound = 0;;) {
            Node<K,V> f; int fh;
            while (advance) {
                int nextIndex, nextBound;
                // Still inside the current range, or the resize is finishing
                if (--i >= bound || finishing)
                    advance = false;
                // If transferIndex is <= 0, there are no ranges left and the thread can exit
                else if ((nextIndex = transferIndex) <= 0) {
                    i = -1;
                    advance = false;
                }
                // Claim the next range with CAS
                else if (U.compareAndSwapInt
                         (this, TRANSFERINDEX, nextIndex,
                          nextBound = (nextIndex > stride ?
                                       nextIndex - stride : 0))) {
                    bound = nextBound;
                    i = nextIndex - 1;
                    advance = false;
                }
            }
            // An index below 0 means all ranges have been handed out
            if (i < 0 || i >= n || i + n >= nextn) {
                int sc;
                if (finishing) {          // The resize is complete: finish up
                    nextTable = null;     // clear the temporary array
                    table = nextTab;      // publish the new array
                    sizeCtl = (n << 1) - (n >>> 1); // new threshold: 0.75 * 2n
                    return;
                }
                // The resize is still running, but this thread is done, so decrement sizeCtl.
                // transfer() is called from helpTransfer() and addCount(), which increment
                // sizeCtl before each call, so every thread decrements it when it finishes.
                if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                    // If this is not the last resizing thread, just return
                    if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                        return;
                    // Last thread: mark the resize as finishing
                    finishing = advance = true;
                    i = n; // recheck before commit
                }
            }
            // Empty slot in the old table: put the forwarding node there
            else if ((f = tabAt(tab, i)) == null)
                // On success, move on to the next slot
                advance = casTabAt(tab, i, null, fwd);
            // Hash -1 means the slot has already been processed; move on
            else if ((fh = f.hash) == MOVED)
                advance = true; // already processed
            else {
                // The slot is non-empty and unprocessed: lock it and migrate it
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        Node<K,V> ln, hn;
                        // A non-negative hash means an ordinary linked list
                        if (fh >= 0) {
                            // Result of the bit operation: 0 stays at i, otherwise moves to i + n
                            int runBit = fh & n;
                            Node<K,V> lastRun = f;
                            for (Node<K,V> p = f.next; p != null; p = p.next) {
                                int b = p.hash & n;
                                if (b != runBit) {
                                    runBit = b;
                                    lastRun = p;
                                }
                            }
                            if (runBit == 0) {
                                ln = lastRun;
                                hn = null;
                            }
                            else {
                                hn = lastRun;
                                ln = null;
                            }
                            for (Node<K,V> p = f; p != lastRun; p = p.next) {
                                int ph = p.hash; K pk = p.key; V pv = p.val;
                                // Same logic as HashMap: the bin is split into two lists;
                                // see my article on HashMap for details
                                if ((ph & n) == 0)
                                    ln = new Node<K,V>(ph, pk, pv, ln);
                                else
                                    hn = new Node<K,V>(ph, pk, pv, hn);
                            }
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                        // Tree bin: migrate the tree nodes
                        else if (f instanceof TreeBin) {
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> lo = null, loTail = null;
                            TreeNode<K,V> hi = null, hiTail = null;
                            int lc = 0, hc = 0;
                            for (Node<K,V> e = t.first; e != null; e = e.next) {
                                int h = e.hash;
                                TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                                // Again a bit operation decides which of the two lists gets the node
                                if ((h & n) == 0) {
                                    if ((p.prev = loTail) == null)
                                        lo = p;
                                    else
                                        loTail.next = p;
                                    loTail = p;
                                    ++lc;
                                }
                                else {
                                    if ((p.prev = hiTail) == null)
                                        hi = p;
                                    else
                                        hiTail.next = p;
                                    hiTail = p;
                                    ++hc;
                                }
                            }
                            // Decide whether each half stays a tree or becomes a list again
                            ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                            hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                            // Store the two new bins in the new array
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                    }
                }
            }
        }
    }
Resizing in ConcurrentHashMap is fairly complicated, and most of the complexity lies in coordinating several threads. I have not analyzed the resize source in full detail. For one thing, it really is complex, and there are parts I have not fully worked out myself. For another, I think the point of studying it is to understand the ideas behind it. As long as we understand the key code and the key ideas, and are not re-implementing something similar ourselves, we do not need to worry about every detail. To summarize, the resize steps of ConcurrentHashMap are as follows:
1. Compute the stride, that is, the number of slots a single thread migrates at a time. It is at least 16.
2. Create a new array with twice the original capacity and construct a transition node (ForwardingNode) that serves lookups during the resize.
3. Migrate nodes in an infinite loop, using the finishing variable to determine whether the resize is complete. During the resize, different threads are given different index ranges, so each thread works on a different segment of the array, and a synchronized lock on the node being migrated ensures thread safety.
4. The position of a node in the new array is determined in the same way as in HashMap's resize logic. A bit operation decides whether the node stays at its original index or moves to the original index plus the old length. For a detailed analysis, see my article on HashMap.
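The bit operation in step 4 can be checked with a few lines of arithmetic. In this sketch, SplitDemo and newIndex are hypothetical names; the only thing taken from the source is the test (hash & n) == 0, where n is the old table length.

```java
// Hypothetical demo: where does a node land when the table grows from n to 2n?
final class SplitDemo {

    // n is the old table length (a power of two). A node at index i either
    // stays at i or moves to i + n, depending on the single bit hash & n.
    static int newIndex(int hash, int n) {
        int i = hash & (n - 1);
        return (hash & n) == 0 ? i : i + n;
    }

    public static void main(String[] args) {
        int n = 16;
        // All four hashes share old index 5 in a table of length 16.
        int[] hashes = {5, 21, 37, 53};
        for (int h : hashes) {
            System.out.println("hash " + h + ": old index " + (h & (n - 1))
                    + ", new index " + newIndex(h, n));
        }
    }
}
```

For the four sample hashes, 5 and 37 stay at index 5 while 21 and 53 move to index 21, which is the same result as recomputing hash & (2n - 1) from scratch.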
3. Summary
1. Most of the logic of ConcurrentHashMap is the same as in HashMap. It mainly uses synchronized and CAS to keep node insertion and resizing thread-safe. Some readers will certainly ask: why synchronized, rather than optimistic locking or a Lock? I personally think there are two likely reasons:
a. Optimistic locking suits scenarios with little contention. With many conflicts it leads to constant retries, which lowers performance.
b. After the JVM's optimizations, synchronized performs about as well as Lock, and in some scenarios it may even be faster. That, I think, is why synchronized was chosen for synchronization.
2. The core resize logic of ConcurrentHashMap is to assign different ranges of array indexes to different threads, after which each thread migrates the nodes in its own range of the table. The per-node processing reuses HashMap's logic: a bit operation tells where the node goes after the resize, either its original index or its original index plus the old length, and the resulting lists are then written directly into the new array.
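The range allocation described in point 2 can be sketched with an AtomicInteger standing in for transferIndex. This is an illustration under assumptions, not the JDK code: RangeClaimDemo, claim and everySlotClaimedOnce are hypothetical names, and the "migration" only counts visits per slot.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical demo of how threads claim disjoint index ranges with CAS.
final class RangeClaimDemo {
    static final int MIN_TRANSFER_STRIDE = 16;

    // Same arithmetic as at the top of transfer(): at least 16 slots per claim.
    static int stride(int n, int ncpu) {
        int s = (ncpu > 1) ? (n >>> 3) / ncpu : n;
        return Math.max(s, MIN_TRANSFER_STRIDE);
    }

    // Claims the next range from the top of the table downwards.
    // Returns {lowest index, highest index}, or null when nothing is left.
    static int[] claim(AtomicInteger transferIndex, int stride) {
        for (;;) {
            int next = transferIndex.get();
            if (next <= 0) return null;
            int bound = next > stride ? next - stride : 0;
            if (transferIndex.compareAndSet(next, bound))
                return new int[] { bound, next - 1 };
            // CAS failed: another thread claimed this range, so retry.
        }
    }

    // Lets `workers` threads process a table of length n and checks
    // that every slot was handled by exactly one thread.
    static boolean everySlotClaimedOnce(int n, int workers) {
        AtomicInteger transferIndex = new AtomicInteger(n);
        AtomicIntegerArray visits = new AtomicIntegerArray(n);
        int step = stride(n, workers);
        Thread[] ts = new Thread[workers];
        for (int t = 0; t < workers; t++) {
            ts[t] = new Thread(() -> {
                int[] r;
                while ((r = claim(transferIndex, step)) != null)
                    for (int i = r[1]; i >= r[0]; i--) visits.incrementAndGet(i);
            });
            ts[t].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (int i = 0; i < n; i++) if (visits.get(i) != 1) return false;
        return true;
    }
}
```

Since a range is only owned after a successful CAS on the shared index, no two threads can end up working on the same slots, and no lock is needed to hand out the work.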