What is a hash table
A hash table, also called a hash table (Hash Table), is a A data structure that provides a mapping relationship between keys and values. As long as a Key is given, the matching Value can be found efficiently, and the time complexity is close to O(1).
Online learning video recommendation: java video
How hash tables work
A hash table is essentially an array. We know that arrays can be accessed randomly based on subscripts, such as a[0], a[1], a[2], a[3], a[4]. This way, the query efficiency is very high. In a hash table, when we give a key, the corresponding value can be queried immediately. At this time, we need a "transfer station" to convert the Key and the array subscript in some way, and this transfer station is the hash function.
In different languages, hash functions are implemented in different ways. The one used in Java is HashMap
.
In Java and most object-oriented languages, each object has its own hashcode
to distinguish different objects, and this hashcode is an integer variable. At this time, we need to convert this integer variable into the subscript of the array. The simplest conversion method is to take the modulo of the array length.
The formula is as follows:
index = HashCode(key) % Array.length
For example:
Given an array with a length of 8, we want to find the Vaule corresponding to the key "001121", and " The hashcode of 001121" is 1420036703, then you can first get the array subscript through the following calculation:
index = HashCode("001121")%Array.length = 1420036703 % 8 = 7
Read and write operations of hash table
1. Write operation
The write operation is to insert a new key-value pair in the hash table (called Entry in jdk).
The specific method is: convert the key value into an array subscript through a hash function, and then insert the Entry at that position in the array (note that it is the Entry key value pair Key Value, not just the Value). It is conceivable that different key values may be converted into the same subscript, and then it becomes a hash conflict.
The commonly used methods to solve hash conflicts are the open addressing method and the linked list method.
The basic idea of the open addressing method is: when a hash conflict occurs, the Entry is placed in the next empty position in the array, that is, moved back one by one.
The basic idea of the linked list method (applied to Java's HashMap collection class) is that each element in the array is not only an Entry object, but also the head node of a linked list. Each Entry object points to its next Entry node through the next pointer. When a new Entry object is mapped to a conflicting array position, it only needs to be inserted into the corresponding linked list.
2. Read operation
The read operation corresponds to the write operation, and only the conflict situation needs to be specially handled.
The specific idea is: use the hash function to convert the Key value to be found into an array subscript, and then check whether the Key value at that position in the array is the Key we are looking for. If so, find it. The Value value of the Entry can be returned; otherwise, continue searching down the linked list to see if there is a corresponding Key value.
For example, when we want to find the value corresponding to Key 002936, we first convert the Key into an array subscript and get the subscript 2. We check the element and find that the Key of the element is 002947, which is not what we want. If you query the Key, then continue to search down the linked list.
3. Expansion
We know that array expansion is required when the number of elements in the array reaches the maximum length of the array When expanding the array, when will the hash table be expanded?
When the hash table reaches a certain degree of saturation after multiple element insertions, the probability of a hash conflict will increase. At this time, a large number of elements are crowded at the same array subscript position. This pair of post-order It has a great impact on the performance of insertion and query operations. At this time, it is necessary to expand the hash table.
The factors affecting the expansion of the hash table are:
Capacity,即HashMap的当前长度
LoadFactor,即HashMap的负载因子,默认值为0.75
扩容需要满足的条件:
HashMap.Size >= Capacity X LoadFactor
简单解释为:当哈希表中的条目数超出了当前容量与其加载因子的乘积时,并且要存放的位置已经有元素了(哈希碰撞),这两个条件满足时,需要进项扩容,会将容量扩大为原来的两倍。加载因子默认值0.75,是在空间和时间上的一个折中,加载因子过高(发生冲突可多存放在链表),虽然减少了空间成本,但也增加了查询成本。
扩容的步骤:
扩容不是简单地把散列表的长度扩大,而是经历了下面两个步骤:
1.扩容,创建一个新的Entry空数组,长度时原数组的2倍;
2.重新Hash,遍历原Entry数组,所有的Entry重新Hash到新数组中。
经过扩容,原本拥挤的散列表重新变得稀疏,原有的Entry也重新得到了尽可能均匀的分配。需要注意的是,关于HashMap的实现,JDK8和以前的版本有着很大的不同。当多个Entry被Hash到同一个数组下标位置时,为了提高插入和查找的效率,HashMap会把Entry的链表转化为红黑树这种数据结构。
相关文章教程推荐:java语言入门
The above is the detailed content of Detailed introduction to hash tables in java. For more information, please follow other related articles on the PHP Chinese website!