Mar 10, 2025

Implmenting Bloom filters in JS

If you have been programming, you must be familiar with & have used different data structures like arrays, linked lists, trees, etc. Bloom filters are also a type of data structure. Its a data structure thats is used to check if an element is present in a set or not.

Bloom filters were originally invented to solve space efficiency problems. In the 1970s, memory was scarse and storing a larrge amount of data in memory was a impractical.

How does a bloom filter work ?

Bloom filters consist of a bit array and a set of hash functions. All bits in the array are initialized to 0. Adding a new value to the bloom filter involves computing the hash values of the value using the hash functions and setting the corresponding bits in the bit array to 1.

When we want to check if a value is present in bloom filter or not, we again compute the hash values of the value using the hash functions and check if all the corresponding bits in the bit array are set to 1.

If all the bits are set to 1, then the value is present in the bloom filter. If any of the bits are not set to 1, then the value is not present in the bloom filter.

So simply said, if bloom returns false for a lookup value, then the value is definitely not present in the bloom filter. But if bloom returns true for a lookup vaule, then the value might be present in the bloom filter.

Here's a quick illustration:

source

Building spell checker using bloom filters

Here's a simple example of building a spell checker using bloom filters.

class BloomFilter {
constructor(size = 1000, numHashes = 3) {
this.size = size; // Size of the bit array
this.numHashes = numHashes; // Number of hash functions
this.bitArray = new Array(size).fill(0); // Initialize all bits to 0
}

// Basic hash functions (for simplicity, we use a few different hash methods)
hash1(word) {
let hash = 0;
for (let i = 0; i < word.length; i++) {
hash = (hash << 5) + hash + word.charCodeAt(i);
hash = hash & hash; // Convert to 32-bit integer
hash = Math.abs(hash);
}
return hash % this.size;
}

hash2(word) {
let hash = 5381;
for (let i = 0; i < word.length; i++) {
hash = (hash * 33) ^ word.charCodeAt(i);
}
return Math.abs(hash) % this.size;
}

hash3(word) {
let hash = 0;
for (let i = 0; i < word.length; i++) {
hash = (hash << 3) + word.charCodeAt(i);
hash = hash & hash;
}
return Math.abs(hash) % this.size;
}

// Insert a word into the Bloom Filter
add(word) {
const indices = [
this.hash1(word),
this.hash2(word),
this.hash3(word),
].slice(0, this.numHashes); // Use first `numHashes` hashes

indices.forEach((index) => {
this.bitArray[index] = 1;
});
}

// Check if a word *might* be in the set (may have false positives)
contains(word) {
const indices = [
this.hash1(word),
this.hash2(word),
this.hash3(word),
].slice(0, this.numHashes);

return indices.every((index) => this.bitArray[index] === 1);
}
}
class BloomFilter {
constructor(size = 1000, numHashes = 3) {
this.size = size; // Size of the bit array
this.numHashes = numHashes; // Number of hash functions
this.bitArray = new Array(size).fill(0); // Initialize all bits to 0
}

// Basic hash functions (for simplicity, we use a few different hash methods)
hash1(word) {
let hash = 0;
for (let i = 0; i < word.length; i++) {
hash = (hash << 5) + hash + word.charCodeAt(i);
hash = hash & hash; // Convert to 32-bit integer
hash = Math.abs(hash);
}
return hash % this.size;
}

hash2(word) {
let hash = 5381;
for (let i = 0; i < word.length; i++) {
hash = (hash * 33) ^ word.charCodeAt(i);
}
return Math.abs(hash) % this.size;
}

hash3(word) {
let hash = 0;
for (let i = 0; i < word.length; i++) {
hash = (hash << 3) + word.charCodeAt(i);
hash = hash & hash;
}
return Math.abs(hash) % this.size;
}

// Insert a word into the Bloom Filter
add(word) {
const indices = [
this.hash1(word),
this.hash2(word),
this.hash3(word),
].slice(0, this.numHashes); // Use first `numHashes` hashes

indices.forEach((index) => {
this.bitArray[index] = 1;
});
}

// Check if a word *might* be in the set (may have false positives)
contains(word) {
const indices = [
this.hash1(word),
this.hash2(word),
this.hash3(word),
].slice(0, this.numHashes);

return indices.every((index) => this.bitArray[index] === 1);
}
}

Here's how you can use this:

// Initialize Bloom Filter
const bloomFilter = new BloomFilter(1000, 3);

// Add correct words to the "dictionary"
const dictionary = ["apple", "banana", "cherry", "date", "elderberry"];
dictionary.forEach(word => bloomFilter.add(word));

// Test words for spelling
const testWords = ["apple", "banana", "kiwi", "date", "mango"];

testWords.forEach(word => {
const exists = bloomFilter.contains(word);
console.log(`"${word}" is ${exists ? "probably correct" : "definitely wrong"}`);
});
// Initialize Bloom Filter
const bloomFilter = new BloomFilter(1000, 3);

// Add correct words to the "dictionary"
const dictionary = ["apple", "banana", "cherry", "date", "elderberry"];
dictionary.forEach(word => bloomFilter.add(word));

// Test words for spelling
const testWords = ["apple", "banana", "kiwi", "date", "mango"];

testWords.forEach(word => {
const exists = bloomFilter.contains(word);
console.log(`"${word}" is ${exists ? "probably correct" : "definitely wrong"}`);
});

Here's the output of this code:

"apple" is probably correct
"banana" is probably correct
"kiwi" is definitely wrong (correct, not in dictionary)
"date" is probably correct
"mango" is definitely wrong (correct, not in dictionary)
"apple" is probably correct
"banana" is probably correct
"kiwi" is definitely wrong (correct, not in dictionary)
"date" is probably correct
"mango" is definitely wrong (correct, not in dictionary)

When to use bloom filters ?

  • When you need extreme space efficiency e.g A Bloom filter can represent a 1-million-word dictionary in ~1MB (with a 1% false-positive rate), while a hash set would require 10–100× more memory.
  • When avoiding expensive lookups is critical e.g DB lookups, disk lookups, CDN lookups
  • When false positives are acceptable e.g. url blocking, ad blocking, recommendation system

When to NOT use bloom filters ?

SituationWhy Not to Use Bloom FiltersAlternative
False positives are unacceptableBloom filters are probabilisticHash sets, trees
Need to delete elementsStandard Bloom filters do not support deletionCounting Bloom filter, hash set
Need to enumerate or retrieve elementsCannot list or retrieve elementsHash set, list, tree
Dataset is smallOverhead outweighs benefitsHash set, bit array
Memory access patterns are criticalCan cause cache misses, slowdownsHash set, optimized arrays
Set size is unknown or unboundedFalse positive rate grows with overfillScalable Bloom filters, hash set
Need to store actual dataOnly stores membership, not dataHash set, map

Who is really using bloom filters ?

  • Akamai uses bloom filters for optimizing edge caching by filtering the content which are accessed only once.
  • Chrome browser uses bloom filters for validating malicious URLs on client side.
  • Brave browser uses uses Bloom filters to efficiently check if a domain is in a blocklist for blocking ads.
  • Google BigTable uses bloom filters to reduce the number of disk seeks for non existing rows.
  • Dropbox uses bloom filters to check if a file chunk is already present in disk ( file dedup ).