Mar 10, 2025

Implmenting Bloom filters in JS

If you have been programming, you must be familiar with & have used different data structures like arrays, linked lists, trees, etc. Bloom filters are also a type of data structure. Its a data structure thats is used to check if an element is present in a set or not.

Bloom filters were originally invented to solve space efficiency problems. In the 1970s, memory was scarse and storing a larrge amount of data in memory was a impractical.

How does a bloom filter work ?

Bloom filters consist of a bit array and a set of hash functions. All bits in the array are initialized to 0. Adding a new value to the bloom filter involves computing the hash values of the value using the hash functions and setting the corresponding bits in the bit array to 1.

When we want to check if a value is present in bloom filter or not, we again compute the hash values of the value using the hash functions and check if all the corresponding bits in the bit array are set to 1.

If all the bits are set to 1, then the value is present in the bloom filter. If any of the bits are not set to 1, then the value is not present in the bloom filter.

So simply said, if bloom returns false for a lookup value, then the value is definitely not present in the bloom filter. But if bloom returns true for a lookup vaule, then the value might be present in the bloom filter.

Here's a quick illustration:

source

Building spell checker using bloom filters

Here's a simple example of building a spell checker using bloom filters.

class BloomFilter {
  constructor(size = 1000, numHashes = 3) {
    this.size = size;          // Size of the bit array
    this.numHashes = numHashes; // Number of hash functions
    this.bitArray = new Array(size).fill(0); // Initialize all bits to 0
  }

  // Basic hash functions (for simplicity, we use a few different hash methods)
  hash1(word) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash << 5) + hash + word.charCodeAt(i);
      hash = hash & hash; // Convert to 32-bit integer
      hash = Math.abs(hash);
    }
    return hash % this.size;
  }

  hash2(word) {
    let hash = 5381;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 33) ^ word.charCodeAt(i);
    }
    return Math.abs(hash) % this.size;
  }

  hash3(word) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash << 3) + word.charCodeAt(i);
      hash = hash & hash;
    }
    return Math.abs(hash) % this.size;
  }

  // Insert a word into the Bloom Filter
  add(word) {
    const indices = [
      this.hash1(word),
      this.hash2(word),
      this.hash3(word),
    ].slice(0, this.numHashes); // Use first `numHashes` hashes
    
    indices.forEach((index) => {
      this.bitArray[index] = 1;
    });
  }

  // Check if a word *might* be in the set (may have false positives)
  contains(word) {
    const indices = [
      this.hash1(word),
      this.hash2(word),
      this.hash3(word),
    ].slice(0, this.numHashes);

    return indices.every((index) => this.bitArray[index] === 1);
  }
}

class BloomFilter {
  constructor(size = 1000, numHashes = 3) {
    this.size = size;          // Size of the bit array
    this.numHashes = numHashes; // Number of hash functions
    this.bitArray = new Array(size).fill(0); // Initialize all bits to 0
  }

  // Basic hash functions (for simplicity, we use a few different hash methods)
  hash1(word) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash << 5) + hash + word.charCodeAt(i);
      hash = hash & hash; // Convert to 32-bit integer
      hash = Math.abs(hash);
    }
    return hash % this.size;
  }

  hash2(word) {
    let hash = 5381;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 33) ^ word.charCodeAt(i);
    }
    return Math.abs(hash) % this.size;
  }

  hash3(word) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash << 3) + word.charCodeAt(i);
      hash = hash & hash;
    }
    return Math.abs(hash) % this.size;
  }

  // Insert a word into the Bloom Filter
  add(word) {
    const indices = [
      this.hash1(word),
      this.hash2(word),
      this.hash3(word),
    ].slice(0, this.numHashes); // Use first `numHashes` hashes
    
    indices.forEach((index) => {
      this.bitArray[index] = 1;
    });
  }

  // Check if a word *might* be in the set (may have false positives)
  contains(word) {
    const indices = [
      this.hash1(word),
      this.hash2(word),
      this.hash3(word),
    ].slice(0, this.numHashes);

    return indices.every((index) => this.bitArray[index] === 1);
  }
}

Here's how you can use this:

// Initialize Bloom Filter
const bloomFilter = new BloomFilter(1000, 3);

// Add correct words to the "dictionary"
const dictionary = ["apple", "banana", "cherry", "date", "elderberry"];
dictionary.forEach(word => bloomFilter.add(word));

// Test words for spelling
const testWords = ["apple", "banana", "kiwi", "date", "mango"];

testWords.forEach(word => {
  const exists = bloomFilter.contains(word);
  console.log(`"${word}" is ${exists ? "probably correct" : "definitely wrong"}`);
});

// Initialize Bloom Filter
const bloomFilter = new BloomFilter(1000, 3);

// Add correct words to the "dictionary"
const dictionary = ["apple", "banana", "cherry", "date", "elderberry"];
dictionary.forEach(word => bloomFilter.add(word));

// Test words for spelling
const testWords = ["apple", "banana", "kiwi", "date", "mango"];

testWords.forEach(word => {
  const exists = bloomFilter.contains(word);
  console.log(`"${word}" is ${exists ? "probably correct" : "definitely wrong"}`);
});

Here's the output of this code:

"apple" is probably correct
"banana" is probably correct
"kiwi" is definitely wrong (correct, not in dictionary)
"date" is probably correct
"mango" is definitely wrong (correct, not in dictionary)

"apple" is probably correct
"banana" is probably correct
"kiwi" is definitely wrong (correct, not in dictionary)
"date" is probably correct
"mango" is definitely wrong (correct, not in dictionary)

When to use bloom filters ?

When you need extreme space efficiency e.g A Bloom filter can represent a 1-million-word dictionary in ~1MB (with a 1% false-positive rate), while a hash set would require 10–100× more memory.
When avoiding expensive lookups is critical e.g DB lookups, disk lookups, CDN lookups
When false positives are acceptable e.g. url blocking, ad blocking, recommendation system

When to NOT use bloom filters ?

Situation	Why Not to Use Bloom Filters	Alternative
False positives are unacceptable	Bloom filters are probabilistic	Hash sets, trees
Need to delete elements	Standard Bloom filters do not support deletion	Counting Bloom filter, hash set
Need to enumerate or retrieve elements	Cannot list or retrieve elements	Hash set, list, tree
Dataset is small	Overhead outweighs benefits	Hash set, bit array
Memory access patterns are critical	Can cause cache misses, slowdowns	Hash set, optimized arrays
Set size is unknown or unbounded	False positive rate grows with overfill	Scalable Bloom filters, hash set
Need to store actual data	Only stores membership, not data	Hash set, map

Who is really using bloom filters ?

Akamai uses bloom filters for optimizing edge caching by filtering the content which are accessed only once.
Chrome browser uses bloom filters for validating malicious URLs on client side.
Brave browser uses uses Bloom filters to efficiently check if a domain is in a blocklist for blocking ads.
Google BigTable uses bloom filters to reduce the number of disk seeks for non existing rows.
Dropbox uses bloom filters to check if a file chunk is already present in disk ( file dedup ).