zakird
approved these changes
Dec 10, 2024
Resolves #913
Motivation
In investigating #911, I built a Python wrapper that checks ZMap's scan coverage (whether we scan every target we expect to and never scan a target twice).
The new --fast-dryrun mode decreases the output data from ZMap by 97.5% to improve the scalability of these sorts of tests.
Bug
We were scanning the lowest 12 IPs twice (1.0.0.1/28 or so).
We're using multiplicative generators to randomly permute the IPv4 address space. The prime/modulus we use for single-port scans is 2 ** 32 + 15. This means that we generate 15 values whose lower 32 bits are identical to those of another target.
Ex:
4294967299 % (2 ** 32) == 3
and
3 % (2 ** 32) == 3
This is shown in these debugging logs. Both of these candidates mapped to 1.0.0.2.
Fix
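A minimal sketch of the collision described above, together with a bound check of the kind this fix introduces. The helper names are hypothetical and are not ZMap's own identifiers, and the sketch assumes that candidates at or above max_candidate are simply skipped:

```python
# Illustrative only: these names are hypothetical, not ZMap's actual code.
IPV4_SPACE = 2 ** 32
PRIME = 2 ** 32 + 15  # modulus quoted above for single-port scans


def low_32_bits(candidate: int) -> int:
    """Truncate a group element to its lower 32 bits (the IPv4 index)."""
    return candidate % IPV4_SPACE


def max_candidate(bits_for_port: int) -> int:
    """Upper bound from the PR text: (2 ** 32) * (2 ** bits_for_port)."""
    return IPV4_SPACE * (2 ** bits_for_port)


def is_valid(candidate: int, bits_for_port: int) -> bool:
    # Assumption: a candidate at or above the bound is skipped, not wrapped.
    return candidate < max_candidate(bits_for_port)


# Elements of the multiplicative group mod PRIME are 1 .. PRIME - 1, so the
# ones that do not fit in 32 bits are 2 ** 32 .. 2 ** 32 + 14.
overflowing = list(range(IPV4_SPACE, PRIME))
print(len(overflowing))                          # 15
print(low_32_bits(4294967299), low_32_bits(3))   # 3 3
print(is_valid(4294967299, 0), is_valid(3, 0))   # False True
```

With bits_for_port set to 0 (a single-port scan), the bound is exactly 2 ** 32, so the 15 overflowing candidates are rejected instead of aliasing onto low addresses.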
… max_candidate = (2 ** 32) * (2 ** bits_for_port)
… max_candidate
Additional Features
… conf/ to check that we're respecting the blocklist used by ZMap. It works on multiple ports and needs ~500 MB of RAM per port to store the per-target bitmap.
… --fast-dryrun mode that just outputs 6 bytes (4 bytes for the IP, 2 bytes for the port) for each target. This reduced the data generated by ZMap by 97%, from 248 bytes per target to 6.
… --dryrun and using grep, sort, and uniq to find duplicates. But this file was on the order of 50 GB per port scanned and took hours to sort even with tons of RAM and cores. No good.
… 5.55 Mpps, enabling us to run full-coverage checks of the sharding on the IPv4 space on multiple ports much more efficiently.
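The bitmap approach can be sketched roughly as follows. This is not the wrapper from this PR: the record layout (4-byte IP followed by a 2-byte port, both in network byte order) is an assumption, and every name below is hypothetical. The space_bits parameter exists only so the demo can run on a tiny address space instead of allocating a full 2 ** 32-bit bitmap.

```python
import struct
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumed record layout: 4-byte IP then 2-byte port, network byte order.
RECORD = struct.Struct("!IH")


def parse_records(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (ip, port) pairs from a buffer of 6-byte records."""
    usable = len(data) - len(data) % RECORD.size  # ignore a trailing partial record
    for ip, port in RECORD.iter_unpack(data[:usable]):
        yield ip, port


class Bitmap:
    """One bit per address; 2 ** 32 bits is 512 MiB."""

    def __init__(self, n_bits: int) -> None:
        self.bits = bytearray((n_bits + 7) // 8)

    def test_and_set(self, index: int) -> bool:
        """Mark index as seen and return whether it was already marked."""
        byte, mask = index >> 3, 1 << (index & 7)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


def find_duplicates(records: Iterable[Tuple[int, int]],
                    space_bits: int = 32) -> List[Tuple[int, int]]:
    """Return every (ip, port) that appears more than once, one bitmap per port."""
    bitmaps: Dict[int, Bitmap] = {}
    duplicates: List[Tuple[int, int]] = []
    for ip, port in records:
        bitmap = bitmaps.get(port)
        if bitmap is None:
            bitmap = bitmaps[port] = Bitmap(1 << space_bits)
        if bitmap.test_and_set(ip):
            duplicates.append((ip, port))
    return duplicates


# Demo on an 8-bit address space: target 2 on port 80 is emitted twice.
sample = b"".join(RECORD.pack(ip, 80) for ip in (1, 2, 3, 2))
print(find_duplicates(parse_records(sample), space_bits=8))  # [(2, 80)]
print((2 ** 32) // 8 // 2 ** 20)  # 512 (MiB of bitmap per port)
```

A fixed-size bitmap makes duplicate detection a single pass with constant work per record, which avoids sorting tens of gigabytes of text output.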