Fix ScalableBloomFilter and tests on top of #5 by etrepum · Pull Request #6 · jaybaird/python-bloomfilter

etrepum · 2012-12-26T20:27:03Z

This takes the branch from pr #5 that fixes BF implementation, fixes SBF, and makes the tests pass again.

Empirical measurement of the false positive rate of BloomFilter is substantially lower that the requested error rates. For example, a BloomFilter with capacity 20,000 and an error rate of .01, gives actual error rate .0003. This is because the bit size allocated to the filter is unnecessarily large, and does not match the theoretically calculated required size.

…size bug

…FP error.

etrepum · 2012-12-26T20:29:42Z

The bad news is that ScalableBloomFilter is definitely implemented incorrectly, if you do the math it converges to a much higher error rate than it's supposed to, because the first filter starts off with the target error rate so it can only get worse from there since checking each filter is an independent event. It appears that this bug has also propagated to other implementations inspired by this one [1]. Should probably tell them once it's fixed.

Per the paper and other implementations [2] the error rate for each bloom filter in the SBF has to be calculated much differently. Specifically the sum([err * (1-ratio) * pow(ratio, n) for n in [0..inf]]) should converge to err. The implementation in [2] calculates the error rate of the next filter by just doing a multiply of the previous error rate, which is probably what we should do too.

(err is 0.001 in this case, with ratio of 0.9)

>>> sum([0.001*(1-0.9)*pow(0.9, n) for n in xrange(0,328)])
0.0009999999999999994

[1] http://code.google.com/p/scalable-bloom-filter/source/browse/trunk/trunk/scalable-bloom-filter/src/main/java/com/elaunira/sbf/ScalableBloomFilter.java
[2] https://github.com/basho/riak_core/blob/master/src/bloom.erl

Fix ScalableBloomFilter and tests on top of #5

glangford and others added 3 commits December 13, 2012 11:27

Add performance benchmarks for BloomFilter; test code to demonstrate …

dcb8e49

…size bug

Bump to 2.0. Fix ScalableBloomFilter and make the tests resilient to …

64e42c8

…FP error.

jaybaird mentioned this pull request Dec 27, 2012

Fix suboptimal allocation of BloomFilter bit array size #5

Closed

ghost assigned jaybaird Dec 27, 2012

jaybaird added a commit that referenced this pull request Dec 27, 2012

Merge pull request #6 from etrepum/master

abfccef

Fix ScalableBloomFilter and tests on top of #5

jaybaird merged commit abfccef into jaybaird:master Dec 27, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix ScalableBloomFilter and tests on top of #5#6

Fix ScalableBloomFilter and tests on top of #5#6
jaybaird merged 3 commits intojaybaird:masterfrom
etrepum:master

etrepum commented Dec 26, 2012

Uh oh!

etrepum commented Dec 26, 2012

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

etrepum commented Dec 26, 2012

Uh oh!

etrepum commented Dec 26, 2012

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants