More fixes and improvements to pattern matching #64
stevemk14ebr merged 8 commits into mandiant:master from ViRb3:master
I am working on reviewing this. I don't quite understand why we need the caches: they come with a very high overhead if the needle is common and the region being scanned contains it many times. We already have OOM issues on large samples, so I am wary of including that particular commit TBH. Our needle scan shouldn't really return duplicate regions (I could be missing something). Let's say the needle is a simple |
I've reverted the cache for now. I would consider adding it back if you can prove to me (ideally via a unit test in pattern_test.go) that it is necessary. Right now I cannot justify the memory overhead it introduces, and it doesn't appear to be necessary for the kind of patterns used by GoReSym itself. Thanks for the continued improvements!
This was a very simple one.
With the current implementation of needle search + "truncate right" to handle sub-matches, we end up re-scanning the same regions multiple times. In some cases this is negligible; in others, it's really bad. There's probably a better way to handle this, but to fix the most basic cases, we now cache each region (start + end address) and skip regex matching if the exact same region was processed before.
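The region cache described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `region` type, `scanRegions`, and `demoCache` are hypothetical names, and the candidate list is hard-coded to simulate a needle scan that yields the same region more than once.

```go
package main

import (
	"fmt"
	"regexp"
)

// region is a hypothetical candidate span, identified by its start and end offsets.
type region struct {
	start, end int
}

// scanRegions runs re against each candidate region of data, but skips any
// region whose exact (start, end) pair has already been processed.
// It returns the absolute offsets of the matches and the number of regex calls made.
func scanRegions(data []byte, candidates []region, re *regexp.Regexp) ([]region, int) {
	seen := make(map[region]struct{}) // the cache: one entry per distinct region
	var matches []region
	regexCalls := 0
	for _, c := range candidates {
		if _, ok := seen[c]; ok {
			continue // exact same region was scanned before
		}
		seen[c] = struct{}{}
		regexCalls++
		if loc := re.FindIndex(data[c.start:c.end]); loc != nil {
			// FindIndex offsets are relative to the slice; make them absolute.
			matches = append(matches, region{c.start + loc[0], c.start + loc[1]})
		}
	}
	return matches, regexCalls
}

// demoCache feeds four candidates, three of them identical, to scanRegions.
func demoCache() ([]region, int) {
	data := []byte("xxABCDxxABCDxx")
	re := regexp.MustCompile(`ABCD`)
	candidates := []region{{0, 8}, {0, 8}, {6, 14}, {0, 8}}
	return scanRegions(data, candidates, re)
}

func main() {
	matches, calls := demoCache()
	fmt.Println(matches, calls)
}
```

Note that the `seen` map grows by one entry per distinct region, which is the memory overhead raised in the review above when a common needle produces many candidate regions.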
This prevented one of my test cases from matching.
Changes the matching function's signature to also return end indexes, in preparation for unit tests.
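As a rough sketch of why returning end indexes helps unit tests: with both offsets available, a test can assert the exact span of every hit rather than only where it begins. The `match` type, `findMatches`, and `demoMatches` below are hypothetical and do not reflect GoReSym's actual function signature.

```go
package main

import (
	"fmt"
	"regexp"
)

// match holds the half-open byte range [Start, End) of one hit.
type match struct {
	Start, End int
}

// findMatches is a hypothetical matching function that reports both the
// start and the end index of every non-overlapping match of re in data.
func findMatches(data []byte, re *regexp.Regexp) []match {
	var out []match
	for _, loc := range re.FindAllIndex(data, -1) {
		out = append(out, match{Start: loc[0], End: loc[1]})
	}
	return out
}

// demoMatches uses a variable-length pattern, so the end index cannot be
// derived from the start index alone.
func demoMatches() []match {
	data := []byte("--AB12--AB3--")
	re := regexp.MustCompile(`AB\d+`)
	return findMatches(data, re)
}

func main() {
	for _, m := range demoMatches() {
		fmt.Printf("match at [%d, %d)\n", m.Start, m.End)
	}
}
```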
This is a workaround for the 2nd issue above.