CLI tool to extract property registration data from Guatemala's DICABI website (Dirección de Catastro y Avalúo de Bienes Inmuebles).
Given a NIT (Número de Identificación Tributaria), it returns structured JSON with property details including addresses, valuations, registry data, and co-owners.
Requirements:

- Node.js 18+
- A display server (the browser must run in non-headless mode)
Install the dependencies, then run the scraper with a NIT:

```bash
npm install
node scraper.js <NIT>
```

The NIT can be provided with or without the dash before the check digit:

```bash
node scraper.js 6603025-0   # with dash
node scraper.js 66030250    # without dash (auto-normalized)
node scraper.js 693921K     # K check digit
```

Output is JSON written to stdout. Errors and status messages go to stderr.
Output is NDJSON (one JSON object per line):

```json
{"id":"1234567-9","idType":"NIT","name":"GARCIA LOPEZ , JUAN CARLOS","dpi":"2581 47936 0201","matriculas":[{"id":"01S400203","properties":[{"direccion":"LOTE 3, 12 AVE Y (GUATEMALA / GUATEMALA)","fechaDeclaracion":"09/07/2012","extensionMts2":317.15,"fechaOperacion":"01/09/2012","finca":"2847","folio":56,"libro":71249,"valorFincaQuetzales":80000,"procedencia":"01S035111","valorTerreno":0,"valorConstruccion":0,"areaConstruccionMts2":0,"valorCultivos":0}]}],"totalExtensionMts2":981.45,"totalValueQuetzales":475000}
```
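Because each line is a complete JSON object, downstream code can process results without buffering the whole output. The following is a minimal sketch of a consumer, assuming the scraper's stdout is piped into it; `parseNdjson` and `forEachRecord` are hypothetical helpers, not part of this tool.

```javascript
// Hypothetical consumer for the scraper's NDJSON output (not part of the tool).
const readline = require('node:readline');

/**
 * Parse a complete NDJSON string into an array of objects.
 * Blank lines (such as a trailing newline) are ignored.
 */
function parseNdjson(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

/**
 * Streaming variant: call `onRecord` for each JSON object as its line
 * arrives on `input`, without holding the whole output in memory.
 */
async function forEachRecord(input, onRecord) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines
    onRecord(JSON.parse(line));
  }
}

module.exports = { parseNdjson, forEachRecord };
```

For example, a script that calls `forEachRecord(process.stdin, (r) => console.log(r.id, r.totalValueQuetzales))` can be used as `node scraper.js <NIT> | node consume.js`.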
Pipe through jq for readable output:

```bash
node scraper.js 1234567-9 | jq .
```

Output fields:

| Field | Description |
|---|---|
| `id` | The queried NIT |
| `name` | Name associated with the queried NIT |
| `dpi` | DPI (Documento Personal de Identificación), if available |
| `note` | Present when the queried NIT is a co-owner, not the registered owner |
| `matriculas` | Array of matricula fiscal records |
| `matriculas[].id` | Matricula fiscal ID (e.g. `01R585038`) |
| `matriculas[].registeredOwner` | Present when the property owner differs from the queried NIT |
| `matriculas[].properties` | Array of properties under this matricula |
| `matriculas[].coOwners` | Array of co-owners, if any |
| `totalExtensionMts2` | Total area across all properties (square meters) |
| `totalValueQuetzales` | Total assessed value across all properties (Quetzales) |

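
The two totals aggregate over every property in every matricula. The sketch below shows one plausible way to derive them from the `matriculas` array. It assumes `totalExtensionMts2` is the sum of each property's `extensionMts2` and `totalValueQuetzales` is the sum of each `valorFincaQuetzales`, which the field table does not state explicitly; `computeTotals` is a hypothetical helper, not the scraper's code.

```javascript
/**
 * Hypothetical helper: recompute the report totals from the matriculas array.
 * Assumes the totals are plain sums of extensionMts2 and valorFincaQuetzales.
 */
function computeTotals(matriculas) {
  let totalExtensionMts2 = 0;
  let totalValueQuetzales = 0;
  for (const matricula of matriculas) {
    for (const property of matricula.properties ?? []) {
      totalExtensionMts2 += property.extensionMts2 ?? 0;
      totalValueQuetzales += property.valorFincaQuetzales ?? 0;
    }
  }
  return {
    // Round the area to 2 decimals to absorb floating-point drift in the sum.
    totalExtensionMts2: Math.round(totalExtensionMts2 * 100) / 100,
    totalValueQuetzales,
  };
}

module.exports = { computeTotals };
```

Rounding the area keeps sums such as `317.15 + 664.3` from surfacing as long binary fractions in the JSON.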
The tool validates NITs using Guatemala's Modulo 11 algorithm before querying the website. Invalid NITs are rejected immediately:
```
$ node scraper.js 1234567-0
Error: Invalid NIT "1234567-0" (check digit does not match).
```
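The normalization and check-digit rule can be sketched as follows. This is a minimal illustration of the standard Guatemalan NIT modulo-11 scheme, not the scraper's actual source: each digit of the base number (everything before the check digit) is multiplied by a weight that starts at the base length plus one for the leftmost digit and decreases to 2 for the rightmost; the check digit is (11 minus the sum mod 11) mod 11, with 10 written as `K`. For `6603025-0`, the weights 8 down to 2 give a sum of 121, and 121 mod 11 is 0, so the check digit is 0. The function names and the dashed normalized form (`6603025-0`) are assumptions.

```javascript
// Sketch of Guatemalan NIT normalization and modulo-11 validation.
// normalizeNit / nitCheckDigit / isValidNit are illustrative names, not the scraper's API.

/**
 * Strip whitespace and dashes, uppercase a trailing "k", and re-insert a dash
 * before the check digit. Returns null if the input is not shaped like a NIT.
 */
function normalizeNit(raw) {
  const compact = String(raw).replace(/[\s-]/g, '').toUpperCase();
  if (!/^\d+[\dK]$/.test(compact)) return null;
  return `${compact.slice(0, -1)}-${compact.slice(-1)}`;
}

/** Compute the expected check digit ("0" to "9", or "K") for the base digits. */
function nitCheckDigit(base) {
  let sum = 0;
  // Weights run from base.length + 1 (leftmost digit) down to 2 (rightmost).
  for (let i = 0; i < base.length; i++) {
    sum += Number(base[i]) * (base.length + 1 - i);
  }
  const check = (11 - (sum % 11)) % 11;
  return check === 10 ? 'K' : String(check);
}

/** True when the NIT is well formed and its check digit matches. */
function isValidNit(raw) {
  const nit = normalizeNit(raw);
  if (nit === null) return false;
  const [base, check] = nit.split('-');
  return nitCheckDigit(base) === check;
}

module.exports = { normalizeNit, nitCheckDigit, isValidNit };
```

For example, `normalizeNit('66030250')` returns `'6603025-0'`, and `isValidNit('1234567-0')` returns `false` because the computed check digit for `1234567` is 9.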
The DICABI website uses ASP.NET WebForms with an embedded Crystal Reports viewer. Plain HTTP requests are blocked by Cloudflare's web application firewall (WAF), so the scraper drives a real browser via Puppeteer with a stealth plugin to get past bot detection.
The browser must run in non-headless mode, because the WAF blocks headless browsers.
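A minimal launch sketch follows, assuming the widely used `puppeteer-extra` and `puppeteer-extra-plugin-stealth` packages (this README does not name the exact stealth plugin). `buildLaunchOptions` and `launchBrowser` are hypothetical names; the setting that matters here is `headless: false`.

```javascript
// Sketch only: assumes `npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth`.

/** Launch options for a visible (headed) browser, since headless mode is blocked. */
function buildLaunchOptions() {
  return {
    headless: false,       // headed mode; requires a display server
    defaultViewport: null, // use the real window size instead of Puppeteer's 800x600 default
  };
}

/** Launch Chromium through puppeteer-extra with the stealth plugin registered. */
async function launchBrowser() {
  // Required lazily so the options above can be inspected without the packages installed.
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  puppeteer.use(StealthPlugin());
  return puppeteer.launch(buildLaunchOptions());
}

module.exports = { buildLaunchOptions, launchBrowser };
```

`launchBrowser()` resolves to a regular Puppeteer `Browser`, so `browser.newPage()` and `page.goto()` work as usual.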
The report text is extracted from the Crystal Reports viewer's rendered output and parsed line-by-line using positional pattern matching. Multi-page reports are navigated automatically.
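A simplified sketch of the line-by-line grouping step is shown below. The real report layout is not documented here, so it assumes that (a) the extracted text has one report line per text line, (b) a line consisting solely of a matricula fiscal ID starts a new record, with IDs shaped like the examples in this README (two digits, one letter, six digits, e.g. `01R585038`), and (c) amounts use comma thousands separators. `groupByMatricula`, `parseAmount`, and `MATRICULA_RE` are hypothetical.

```javascript
// Assumed matricula ID shape, based on examples such as 01S400203 and 01R585038.
const MATRICULA_RE = /^\d{2}[A-Z]\d{6}$/;

/** Parse an amount such as "Q 80,000.00" or "317.15" into a number (NaN if none found). */
function parseAmount(text) {
  const match = /-?\d[\d,]*(?:\.\d+)?/.exec(text);
  return match ? Number(match[0].replace(/,/g, '')) : NaN;
}

/**
 * Walk the report text line by line: each matricula ID line opens a new group,
 * and the following non-empty lines are collected until the next ID appears.
 * Lines before the first ID (report headers) are ignored.
 */
function groupByMatricula(reportText) {
  const groups = [];
  let current = null;
  for (const rawLine of reportText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue;
    if (MATRICULA_RE.test(line)) {
      current = { id: line, lines: [] };
      groups.push(current);
    } else if (current) {
      current.lines.push(line);
    }
  }
  return groups;
}

module.exports = { MATRICULA_RE, parseAmount, groupByMatricula };
```

Note that `procedencia` values such as `01S035111` share the same shape, so a real parser needs positional context to tell the two apart; a bare regex like this one cannot.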
The website rate-limits clients that send too many requests in a short period. The scraper detects Cloudflare block pages and reports an error message that includes the Cloudflare Ray ID. If you are rate-limited, wait a few hours before retrying.
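Cloudflare block pages usually show a line such as `Cloudflare Ray ID: <hex id>`, so one way to detect a block and surface the Ray ID is a pattern match on the page's visible text. This is a sketch under that assumption: the marker phrases below are typical of Cloudflare block and rate-limit pages but are not confirmed for this site, and `detectCloudflareBlock` is a hypothetical helper.

```javascript
// Phrases commonly shown on Cloudflare block/rate-limit pages (assumed, not exhaustive).
const BLOCK_MARKERS = [/you have been blocked/i, /attention required/i, /rate limited/i];

/**
 * Inspect a page's visible text. Returns { blocked: true, rayId } when it looks
 * like a Cloudflare block page, otherwise { blocked: false, rayId: null }.
 */
function detectCloudflareBlock(pageText) {
  const rayMatch = /Ray ID:\s*([0-9a-f]+)/i.exec(pageText);
  const hasMarker = BLOCK_MARKERS.some((re) => re.test(pageText));
  if (rayMatch && hasMarker) {
    return { blocked: true, rayId: rayMatch[1] };
  }
  return { blocked: false, rayId: null };
}

module.exports = { detectCloudflareBlock };
```

In Puppeteer, the text can be obtained with `await page.evaluate(() => document.body.innerText)` before passing it to the check.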
License: ISC