
Fix for #913 and added IPv4 scan coverage integration test and python wrapper with --fast-dryrun #916

Merged
zakird merged 29 commits into main from phillip/911-integration-test
Dec 10, 2024

@phillip-stephens phillip-stephens commented Dec 6, 2024

Resolves #913

Motivation

While investigating #911, I built a Python wrapper that checks ZMap's scan coverage: whether we scan every target we expect to, and never scan a target twice.
The new --fast-dryrun option cuts ZMap's output by roughly 97.6%, which makes these kinds of tests scale much better.

Bug

We were scanning the lowest 12 IPs twice (1.0.0.1/28 or so).

We use multiplicative generators to randomly permute the IPv4 address space. The prime modulus we use for single-port scans is 2 ** 32 + 15. This means we generate 15 candidates at or above 2 ** 32, and the lower 32 bits of such a candidate look identical to another target.

Ex:
4294967299 % (2 ** 32) == 3
and
3 % (2 ** 32) == 3

These debugging logs show it: both candidates mapped to 1.0.0.2.

Dec 05 20:14:57.334 [DEBUG] shard: 1.0.0.2 hit. Candidate: 4294967299
Dec 05 20:15:15.563 [DEBUG] shard: 1.0.0.2 hit. Candidate: 3
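
The arithmetic behind the collision can be checked directly. This is a minimal sketch, not ZMap code: the constant names are made up for illustration, and it only looks at the lower 32 bits of each candidate. How ZMap turns those bits into an address is not modeled here.

```python
# Illustrative names; these are not identifiers from the ZMap source.
MODULUS = 2 ** 32 + 15          # prime modulus quoted above for single-port scans
ADDRESS_SPACE = 2 ** 32         # number of distinct 32-bit values

# The multiplicative group mod a prime contains 1 .. MODULUS - 1.
# Every element at or above 2 ** 32 does not fit in 32 bits.
oversized = range(ADDRESS_SPACE, MODULUS)

# Map each oversized candidate to the value its lower 32 bits alias.
collisions = {candidate: candidate % ADDRESS_SPACE for candidate in oversized}

print(len(collisions))            # number of oversized candidates
print(collisions[4294967299])     # the candidate from the debug log
```

The candidate 4294967299 from the log aliases the value 3, which is the other candidate logged for 1.0.0.2.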

Fix

  • add a max_candidate = (2 ** 32) * (2 ** bits_for_port)
  • re-roll if candidate is >= max_candidate
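
A toy version of the re-roll can be sketched as follows. Everything here is an assumption made for illustration: the function names, the tiny prime 19 with generator 2 standing in for the real modulus, a 4-bit "address space", and the simplification that a candidate maps to a target by keeping its low bits. It shows only that skipping candidates at or above max_candidate removes the duplicates.

```python
from collections import Counter

def permute(prime, generator, start=1):
    """Yield each element of the multiplicative group mod `prime` exactly once."""
    current = start
    while True:
        yield current
        current = (current * generator) % prime
        if current == start:
            return

def targets(prime, generator, address_bits, port_bits=0, reroll=True):
    """Yield targets; with `reroll`, skip candidates at or above max_candidate."""
    max_candidate = (2 ** address_bits) * (2 ** port_bits)
    for candidate in permute(prime, generator):
        if reroll and candidate >= max_candidate:
            continue  # re-roll: move on to the next element of the cycle
        # Simplified mapping (assumption): keep only the low bits.
        yield candidate % max_candidate

# Toy parameters: 19 is prime, 2 is a primitive root mod 19, 16 targets.
TOY_PRIME, TOY_GENERATOR, TOY_BITS = 19, 2, 4

before = Counter(targets(TOY_PRIME, TOY_GENERATOR, TOY_BITS, reroll=False))
after = Counter(targets(TOY_PRIME, TOY_GENERATOR, TOY_BITS, reroll=True))

duplicates_before = sorted(t for t, n in before.items() if n > 1)
duplicates_after = sorted(t for t, n in after.items() if n > 1)
print(duplicates_before, duplicates_after)
```

In the toy, candidates 17 and 18 alias targets 1 and 2 when nothing is skipped; with the re-roll no target is produced twice.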

Additional Features

  • Integration Test
    • I added a python wrapper that uses a bitmap to identify targets that were never scanned or were scanned multiple times. It dynamically reads the blocklist from conf/ to check that we're respecting the blocklist used by ZMap. It works on multiple ports and needs ~500MB RAM per port to store the per-target bitmap.
  • To make the integration test fast, I added a --fast-dryrun mode that outputs just 6 bytes (4 bytes for the IP, 2 bytes for the port) for each target. This reduced the data generated by ZMap by roughly 97.6%, from 248 bytes per target to 6.
    • I initially tried using --dryrun with grep, sort, and uniq to find duplicates, but the output file was on the order of 50 GB per port scanned and took hours to sort, even with tons of RAM and cores. No good.
  • On my MacBook Air (M2), this test runs with the python wrapper at 5.55 Mpps, enabling us to check sharding coverage across the full IPv4 space on multiple ports much more efficiently.
    • A single port across the full IPv4 space with the blocklist takes ~11 min.
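
The bitmap idea can be sketched as below. This is not the wrapper from this PR: the function name, the single-port handling, and the record layout (network byte order, IP first, then port) are assumptions for illustration. One bit per IPv4 address is 2 ** 32 / 8 bytes, i.e. 512 MiB, which matches the ~500MB per port figure above.

```python
import io
import struct

# Assumed record layout: 4-byte IP then 2-byte port, network byte order.
RECORD = struct.Struct("!IH")

def check_coverage(stream, address_bits=32):
    """Read 6-byte records for one port; return (bitmap, duplicate records).

    Hypothetical helper. IPs must be below 2 ** address_bits.
    """
    seen = bytearray((1 << address_bits) // 8)   # one bit per address
    duplicates = []
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break                                 # end of stream
        ip, port = RECORD.unpack(chunk)
        byte_index, bit = divmod(ip, 8)
        mask = 1 << bit
        if seen[byte_index] & mask:
            duplicates.append((ip, port))         # bit already set: scanned twice
        seen[byte_index] |= mask
    return seen, duplicates

# Small in-memory demo with a 16-bit address space to keep the bitmap tiny.
demo = io.BytesIO(b"".join(RECORD.pack(ip, 80) for ip in (1, 2, 3, 2)))
seen, duplicates = check_coverage(demo, address_bits=16)
print(duplicates)
```

Targets that were never scanned are the addresses whose bit is still clear once the stream ends, after excluding blocklisted ranges.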

@phillip-stephens phillip-stephens marked this pull request as ready for review December 10, 2024 14:47
@zakird zakird merged commit b7c815a into main Dec 10, 2024
@zakird zakird deleted the phillip/911-integration-test branch December 10, 2024 15:30

Development

Successfully merging this pull request may close these issues.

Target scanned twice in single scan
