Bloom Filter in Pure PHP
PHP Code Editor
Execution Result
Ready to execute
Click the "Run Script" button to see the output here
Description
A Bloom filter answers one question cheaply: is this item definitely not in the set, or possibly in it. It never gives a false negative, so it is perfect as a fast pre-check in front of an expensive lookup, like skipping a database hit for a key you can prove was never inserted. The cost is a small, tunable false positive rate and the fact that you cannot remove items or list them back.
This implementation uses a bit array backed by a sparse PHP array and derives k positions per item from two hashes combined linearly, the standard double-hashing trick that behaves like k independent hashes without computing k of them. crc32 gives the first hash and the top bytes of a sha256 give the second. Size the bit array and hash count to your expected item count to control the false positive rate.
final class BloomFilter {
private array $bits = [];
public function __construct(private int $size = 8192, private int $hashes = 4) {}
private function positions(string $item): array {
$h1 = crc32($item);
$h2 = (int) hexdec(substr(hash('sha256', $item), 0, 8));
$out = [];
for ($i = 0; $i < $this->hashes; $i++) {
$out[] = abs(($h1 + $i * $h2) % $this->size);
}
return $out;
}
public function add(string $item): void {
foreach ($this->positions($item) as $p) {
$this->bits[$p] = true;
}
}
public function mightContain(string $item): bool {
foreach ($this->positions($item) as $p) {
if (empty($this->bits[$p])) {
return false; // definitely not present
}
}
return true; // probably present
}
}
$bf = new BloomFilter();
foreach (['alice', 'bob', 'carol'] as $name) {
$bf->add($name);
}
echo $bf->mightContain('alice') ? "alice: maybe\n" : "alice: no\n";
echo $bf->mightContain('dave') ? "dave: maybe\n" : "dave: no\n";
Comments
No comments yet
Be the first to share your thoughts!