哈希表:碰撞、调整大小、哈希

WBOY
发布: 2024-08-17 08:50:02
原创
541 人浏览过

Hash Tables : Collisions, Resizing, Hashing

假设理解 Big O 表示法。 JavaScript 中有示例。资料参考 Gayle Laakmann McDowell 的《Cracking the Coding Interview》

了解哈希表

无论您听说过字典、哈希映射还是哈希表,它们本质上都是相同的。在本博客中,为了简单起见,我们将将此数据结构引用为哈希表。

让我们从定义什么是哈希表开始。哈希表是一种数据结构,它以键值对的形式将键映射到值,以实现高效查找。有多种方法可以实现它。


哈希表的关键组成部分

使用链表数组哈希函数我们可以实现一个哈希表。让我们更深入地了解什么是哈希函数。

什么是哈希函数?

哈希函数是哈希表的重要组成部分。它是一种算法,通常采用函数的形式,接受输入(或“键”)并返回固定大小的字节串(通常采用整数的形式)。输出称为哈希码或简称为哈希。

散列表中散列函数的主要目的是将散列码映射到桶/槽数组的有效索引,从中可以找到所需的值。在我们的例子中,这些桶/槽将是链接列表。

良好的哈希函数的特征:

  • 确定性:对于给定的输入,它应该始终产生相同的哈希输出。
  • 均匀分布:它应该在其输出范围内尽可能均匀地映射预期输入。
  • 高效:计算应该很快。
  • 雪崩效应:输入的微小变化应该会导致哈希输出显着不同。

为什么使用链表数组?

在哈希表实现中使用链表数组是一种常见技术,称为链接。这种方法有几个优点:

  1. 碰撞处理:使用链表数组的主要原因是为了有效地处理碰撞。当两个键散列并映射到同一个索引时,我们可以简单地将新的键值对添加到该索引处的链表中。
  2. 空间效率:它允许哈希表存储比底层数组大小更多的项目。每个数组槽可以通过其链表保存多个项目。
雷雷

在此示例中,键 1 和 2 哈希到索引 0,而键 4、5 和 6 都哈希到索引 2。


插入键值对

现在我们已经很好地理解了哈希函数以及为什么要使用链式,让我们来看看将键值对插入哈希表的流程:

  1. 插入键(任何值)时,我们首先计算键的哈希码(通常是 int 或 long)。两个不同的键可能具有相同的哈希码,因为可能有无限的键和有限的整数。

  2. 将哈希码映射到数组中的索引。将哈希码映射到数组的常见方法是使用模运算符。 (例如,hash(key) % array.length))。使用此方法,两个不同的哈希码可能映射到同一个索引。

  3. 在索引处,有一个键和值的链表。将键值对存储在此索引处。当键具有相同的哈希码或哈希码映射到相同的索引时,就会发生冲突。

访问键值对

在哈希表实现中访问键值对非常高效。只需根据键计算哈希码,然后根据哈希码计算索引,最后在链表中搜索具有该键的值。

假设实现良好,访问键值对(插入和删除也是如此)需要 O ( 1 ) O(1) O(1).


What Makes a Hash Table Implementation "Good"?

A well-implemented hash table should balance efficiency, space utilization, and collision handling. Here are the key factors that contribute to a good hash table implementation:

A Good Hash Function

The heart of any hash table is its hash function. A good hash function should:

  • Be quick to compute
  • Minimize collisions

Optimal Load Factor

Theload factoris the ratio of filled slots to total slots in the hash table. Maintaining an appropriate load factor is crucial:

A typicalsweet spotis between 0.6 and 0.75

  • Too low (< 0.5): Wastes space
  • Too high (> 0.8): Increases collision risk

Collision Resolution Techniques

Two primary methods for handling collisions are:

  1. Chaining: Each table position stores a linked list of collided items. Simple to implement but can lead to slower lookups if chains become long.

  2. Open Addressing: If a collision occurs, look for the next available slot. Keeps all data in the table but requires careful implementation to avoid clustering of stored data.

Note that chaining and open-addressing cannot coexist easily. Logically, it would not make sense to look for the next available slot but store collided items at a specific index.

Dynamic Resizing

As the number of elements grows, the hash table should resize to maintain performance:

Typically, the table size is doubled when the load factor exceeds a threshold. All elements need to be rehashed into the new, larger table.

This operation is expensive but infrequent, keeping the amortized time complexity at O(1).


JavaScript Implementation

This implementation will utilize resizing and chaining for collision resolution. We will assume that our keys are integers.

For the hash function + mapping, we will keep it very simple and simply perform the following given a key:

k e y % a r r a y c a p a c i t y key \hspace{0.2cm} \% \hspace{0.2cm} array \hspace{0.1cm} capacity key%arraycapacity

Classical OOP

class HashNode { constructor(key, value) { this.key = key; this.value = value; this.next = null; } } class HashTable { constructor(capacity = 16) { this.capacity = capacity; this.size = 0; this.buckets = new Array(this.capacity).fill(null); this.threshold = 0.75; } hash(key) { return key % this.capacity; } insert(key, value) { const index = this.hash(key); if (!this.buckets[index]) { this.buckets[index] = new HashNode(key, value); this.size++; } else { let currentNode = this.buckets[index]; while (currentNode.next) { if (currentNode.key === key) { currentNode.value = value; return; } currentNode = currentNode.next; } if (currentNode.key === key) { currentNode.value = value; } else { currentNode.next = new HashNode(key, value); this.size++; } } if (this.size / this.capacity >= this.threshold) { this.resize(); } } get(key) { const index = this.hash(key); let currentNode = this.buckets[index]; while (currentNode) { if (currentNode.key === key) { return currentNode.value; } currentNode = currentNode.next; } return undefined; } remove(key) { const index = this.hash(key); if (!this.buckets[index]) { return false; } if (this.buckets[index].key === key) { this.buckets[index] = this.buckets[index].next; this.size--; return true; } let currentNode = this.buckets[index]; while (currentNode.next) { if (currentNode.next.key === key) { currentNode.next = currentNode.next.next; this.size--; return true; } currentNode = currentNode.next; } return false; } resize() { const newCapacity = this.capacity * 2; const newBuckets = new Array(newCapacity).fill(null); this.buckets.forEach(head => { while (head) { const newIndex = head.key % newCapacity; const next = head.next; head.next = newBuckets[newIndex]; newBuckets[newIndex] = head; head = next; } }); this.buckets = newBuckets; this.capacity = newCapacity; } getSize() { return this.size; } getCapacity() { return this.capacity; } }
登录后复制

Functional OOP

function createHashTable(initialCapacity = 16) { let capacity = initialCapacity; let size = 0; let buckets = new Array(capacity).fill(null); const threshold = 0.75; function hash(key) { return key % capacity; } function resize() { const newCapacity = capacity * 2; const newBuckets = new Array(newCapacity).fill(null); buckets.forEach(function(head) { while (head) { const newIndex = head.key % newCapacity; const next = head.next; head.next = newBuckets[newIndex]; newBuckets[newIndex] = head; head = next; } }); buckets = newBuckets; capacity = newCapacity; } return { insert: function(key, value) { const index = hash(key); const newNode = { key, value, next: null }; if (!buckets[index]) { buckets[index] = newNode; size++; } else { let currentNode = buckets[index]; while (currentNode.next) { if (currentNode.key === key) { currentNode.value = value; return; } currentNode = currentNode.next; } if (currentNode.key === key) { currentNode.value = value; } else { currentNode.next = newNode; size++; } } if (size / capacity >= threshold) { resize(); } }, get: function(key) { const index = hash(key); let currentNode = buckets[index]; while (currentNode) { if (currentNode.key === key) { return currentNode.value; } currentNode = currentNode.next; } return undefined; }, remove: function(key) { const index = hash(key); if (!buckets[index]) { return false; } if (buckets[index].key === key) { buckets[index] = buckets[index].next; size--; return true; } let currentNode = buckets[index]; while (currentNode.next) { if (currentNode.next.key === key) { currentNode.next = currentNode.next.next; size--; return true; } currentNode = currentNode.next; } return false; }, getSize: function() { return size; }, getCapacity: function() { return capacity; } }; }
登录后复制

以上是哈希表:碰撞、调整大小、哈希的详细内容。更多信息请关注PHP中文网其他相关文章!

来源:dev.to
本站声明
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn
最新下载
更多>
网站特效
网站源码
网站素材
前端模板
关于我们 免责声明 Sitemap
PHP中文网:公益在线PHP培训,帮助PHP学习者快速成长!