How to apply a Bloom filter in Java
What is a Bloom filter
A Bloom filter is a very space-efficient probabilistic data structure. It uses a bit array (a BitSet in Java) to represent a set, maps each element to positions in that array through a fixed number of hash functions, and is used to check whether an element belongs to the set.
Core idea of implementation
For each element, multiple hash functions generate multiple hash values, and the corresponding bits in the bit array are set to 1. On lookup, if the bits for all of the element's hash values are 1, the element may be in the set; if the bit for at least one hash value is 0, the element is definitely not in the set. This allows efficient lookups in a small amount of space, at the cost of a certain false positive rate.
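The rule above can be sketched in a few lines of Java. This is a minimal illustration rather than a production design: the class name `TinyBloom`, the fixed sizes, and the choice of deriving the positions from `String.hashCode()` plus a second simple polynomial hash are all assumptions made for this example.

```java
import java.util.BitSet;

// Minimal sketch of the core idea: each element maps to several bit positions,
// and a lookup succeeds only if every one of those bits is set.
// TinyBloom and its hash choice are illustrative assumptions, not a library API.
class TinyBloom {
    private final BitSet bits;
    private final int size;      // number of bits in the array
    private final int numHashes; // number of positions derived per element

    public TinyBloom(int size, int numHashes) {
        this.size = size;
        this.numHashes = numHashes;
        this.bits = new BitSet(size);
    }

    // i-th position for an item, derived from two base hashes (double hashing).
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = secondHash(item);
        return Math.floorMod(h1 + i * h2, size);
    }

    // A second base hash: polynomial hash with a different multiplier.
    private static int secondHash(String item) {
        int h = 7;
        for (int c = 0; c < item.length(); c++) {
            h = h * 131 + item.charAt(c);
        }
        return h | 1; // make the step non-zero
    }

    public void add(String item) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(item, i));
        }
    }

    // false: definitely never added; true: possibly added (false positives can occur).
    public boolean mightContain(String item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(item, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom(1024, 3);
        filter.add("apple");
        System.out.println(filter.mightContain("apple"));  // true: an added item is always reported
        System.out.println(filter.mightContain("banana")); // usually false, but a false positive is possible
    }
}
```

Note that the filter never stores the elements themselves, only bits, which is why it cannot list its contents or rule out false positives.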
How it works
A typical Bloom filter is described by three parameters: the size of the bit array (the number of bits, not the number of stored elements); the number of hash functions; and the fill factor, that is, the ratio of the number of stored elements to the size of the bit array. Together these determine the false positive rate.
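To make the relationship between these parameters concrete, the sketch below uses the standard approximations: m = -n ln(p) / (ln 2)^2 for the bit array size, k = (m / n) ln 2 for the number of hash functions, and (1 - e^(-kn/m))^k for the resulting false positive rate, where n is the expected number of elements and p the target rate. The class name `BloomSizing` and its method names are hypothetical and exist only for this example.

```java
// Sketch of how the three parameters relate; names are illustrative assumptions.
class BloomSizing {
    // Bits needed for n expected elements at target false positive rate p.
    public static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Number of hash functions that minimizes the false positive rate for m bits and n elements.
    public static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Approximate false positive rate after n insertions into m bits with k hash functions.
    public static double estimatedFalsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        double p = 0.01;
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.printf("bits=%d (%.1f per element), hashes=%d, estimated rate=%.4f%n",
                m, (double) m / n, k, estimatedFalsePositiveRate(n, m, k));
    }
}
```

For one million elements and a 1% target, this works out to roughly 9.6 bits per element and 7 hash functions, far less than storing the elements themselves.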

As shown in the figure above, the basic operations of a Bloom filter are initializing the bit array and hash functions, inserting elements, and checking whether an element is in the set. Each element is mapped to multiple positions in the bit array by multiple hash functions. When checking an element, all of its corresponding bits must be 1 before it is considered possibly in the set.
Typical application scenarios
Spam filtering: For every blacklisted email, set the positions given by its hash values to 1 in the Bloom filter. For each new email, check whether all of its positions in the Bloom filter are 1. If so, the email is probably spam (a false positive is possible); otherwise it is definitely not on the blacklist;
URL deduplication: For every crawled URL, set the positions given by its hash values to 1 in the Bloom filter. For each new URL, check whether all of its positions in the Bloom filter are 1. If so, the URL is considered to have been crawled already; otherwise it still needs to be crawled;
Cache breakdown: For all data existing in the cache, set the positions given by the hash values of each key to 1 in the Bloom filter. For each queried key, check whether all of its positions in the Bloom filter are 1. If so, the key is considered likely to be in the cache; otherwise it is definitely not cached and must be queried from the database and added to the cache.
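As one illustration of the scenarios above, URL deduplication can be sketched as follows. The class `UrlDeduplicator`, its method `markIfNew`, the sizes, and the FNV-1a style hash derivation are hypothetical choices for this example. A real crawler would size the filter from its expected URL count and would have to accept that an occasional false positive causes a new URL to be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Sketch of URL deduplication backed by a Bloom filter style bit array.
// All names and sizes here are illustrative assumptions.
class UrlDeduplicator {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public UrlDeduplicator(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Returns true if the URL was definitely not seen before, and records it.
    // Returns false if the URL was probably seen already.
    public boolean markIfNew(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // non-zero step for double hashing
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int pos = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(pos)) {
                isNew = true; // at least one bit was 0, so the URL cannot have been added
                bits.set(pos);
            }
        }
        return isNew;
    }

    // FNV-1a style hash with a caller-supplied starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    public static void main(String[] args) {
        UrlDeduplicator seen = new UrlDeduplicator(1 << 20, 5);
        String[] queue = {"https://example.com/a", "https://example.com/b", "https://example.com/a"};
        for (String url : queue) {
            System.out.println((seen.markIfNew(url) ? "crawl " : "skip  ") + url);
        }
    }
}
```

Checking and setting the bits in a single pass keeps the crawl loop to one filter call per URL.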
Note that the false positive rate decreases as the bit array grows, but a larger array costs more memory, and using more hash functions adds computation to every insert and lookup. To make Bloom filters easier to understand, the following Java code implements a simple Bloom filter:
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

public class BloomFilter {
    private final BitSet bitSet;        // bit set holding the bits addressed by the hash values
    private final int bitSetSize;       // size of the bit set
    private final int numHashFunctions; // number of hash functions

    // Constructor: computes the bit set size and the number of hash functions
    // from the expected number of elements and the false positive rate
    public BloomFilter(int expectedNumItems, double falsePositiveRate) {
        this.bitSetSize = optimalBitSetSize(expectedNumItems, falsePositiveRate);
        this.numHashFunctions = optimalNumHashFunctions(expectedNumItems, bitSetSize);
        this.bitSet = new BitSet(bitSetSize);
    }

    // Computes the optimal bit set size from the expected number of elements and the false positive rate
    private int optimalBitSetSize(int expectedNumItems, double falsePositiveRate) {
        return (int) Math.ceil(expectedNumItems * (-Math.log(falsePositiveRate) / Math.pow(Math.log(2), 2)));
    }

    // Computes the optimal number of hash functions from the expected number of elements and the bit set size
    private int optimalNumHashFunctions(int expectedNumItems, int bitSetSize) {
        // Cast to double so the ratio is not truncated by integer division
        return (int) Math.ceil(((double) bitSetSize / expectedNumItems) * Math.log(2));
    }

    // Adds an element to the Bloom filter
    public void add(String item) {
        // Compute the bit positions for the element
        int[] hashes = createHashes(item.getBytes(StandardCharsets.UTF_8), numHashFunctions);
        // Set the bit at each position to true
        for (int hash : hashes) {
            bitSet.set(hash, true);
        }
    }

    // Checks whether an element may be present in the Bloom filter
    public boolean contains(String item) {
        // Compute the bit positions for the element
        int[] hashes = createHashes(item.getBytes(StandardCharsets.UTF_8), numHashFunctions);
        // Check that the bit at every position is true
        for (int hash : hashes) {
            if (!bitSet.get(hash)) {
                return false;
            }
        }
        return true;
    }

    // Computes numHashes bit positions for the given data
    private int[] createHashes(byte[] data, int numHashes) {
        int[] hashes = new int[numHashes];
        // Two deterministic base hashes, so the same input always yields the same positions
        int hash1 = Arrays.hashCode(data);
        int hash2 = fnv1a(data);
        for (int i = 0; i < numHashes; i++) {
            // Double hashing: combine the two base hashes, then map into [0, bitSetSize)
            hashes[i] = Math.floorMod(hash1 + i * hash2, bitSetSize);
        }
        return hashes;
    }

    // 32-bit FNV-1a hash, used as the second base hash
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }
}