Skip to content
This repository was archived by the owner on Jul 30, 2024. It is now read-only.

Fix ScalableBloomFilter and tests on top of #5#6

Merged
jaybaird merged 3 commits intojaybaird:masterfrom
etrepum:master
Dec 27, 2012
Merged

Fix ScalableBloomFilter and tests on top of #5#6
jaybaird merged 3 commits intojaybaird:masterfrom
etrepum:master

Conversation

@etrepum
Copy link
Copy Markdown
Contributor

@etrepum etrepum commented Dec 26, 2012

This takes the branch from pr #5 that fixes BF implementation, fixes SBF, and makes the tests pass again.

glangford and others added 3 commits December 13, 2012 11:27
Empirical measurement of the false positive rate of BloomFilter is
substantially lower that the requested error rates. For example, a
BloomFilter with capacity 20,000 and an error rate of .01, gives actual
error rate .0003. This is because the bit size allocated to the filter
is unnecessarily large, and does not match the theoretically calculated
required size.
@etrepum
Copy link
Copy Markdown
Contributor Author

etrepum commented Dec 26, 2012

The bad news is that ScalableBloomFilter is definitely implemented incorrectly, if you do the math it converges to a much higher error rate than it's supposed to, because the first filter starts off with the target error rate so it can only get worse from there since checking each filter is an independent event. It appears that this bug has also propagated to other implementations inspired by this one [1]. Should probably tell them once it's fixed.

Per the paper and other implementations [2] the error rate for each bloom filter in the SBF has to be calculated much differently. Specifically the sum([err * (1-ratio) * pow(ratio, n) for n in [0..inf]]) should converge to err. The implementation in [2] calculates the error rate of the next filter by just doing a multiply of the previous error rate, which is probably what we should do too.

(err is 0.001 in this case, with ratio of 0.9)

>>> sum([0.001*(1-0.9)*pow(0.9, n) for n in xrange(0,328)])
0.0009999999999999994

[1] http://code.google.com/p/scalable-bloom-filter/source/browse/trunk/trunk/scalable-bloom-filter/src/main/java/com/elaunira/sbf/ScalableBloomFilter.java
[2] https://github.com/basho/riak_core/blob/master/src/bloom.erl

@ghost ghost assigned jaybaird Dec 27, 2012
jaybaird added a commit that referenced this pull request Dec 27, 2012
Fix ScalableBloomFilter and tests on top of #5
@jaybaird jaybaird merged commit abfccef into jaybaird:master Dec 27, 2012
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants