legda is a small asyncio-based data retrieval app for government documents.
- Document and DocumentInstance (legda/models.py) are SQLAlchemy models for persistent state.
- RetrievalPlugin (legda/plugins/base.py) defines the plugin contract:
  - get_documents(datetime) -> list[Document]
  - get_document(Document, DocumentInstance) -> bytes
  - download_document(Document) -> list[DocumentInstance]
- FederalRegisterRetriever (legda/plugins/federal_register.py) provides shared Federal Register retrieval logic.
- ExecutiveOrderRetriever (legda/plugins/executive_orders.py) uses that base for executive orders.
- OmbMemoRetriever (legda/plugins/omb_memos.py) scrapes OMB memoranda from the White House guidance index.
- OpmFederalRegisterRetriever (legda/plugins/opm_federal_register.py) uses the FederalRegisterRetriever base for OPM actions.
- OpmMemoRetriever (legda/plugins/opm_memos.py) scrapes published OPM CHCOC memos.
- CongressRetriever (legda/plugins/congress.py) discovers passed legislation and downloads bill XML from Congress.gov.
- DiscoveryDriver (legda/drivers/discovery.py) runs all plugins and upserts discovered documents.
- DownloadDriver (legda/drivers/download.py) fetches documents with status=to_fetch.
- FetchDate (legda/models.py) stores per-plugin successful discovery timestamps.
- Plugin registration lives in legda/plugins/__init__.py via a plugin-key to class map.
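
The plugin contract and the key-to-class registration described above can be illustrated with a rough sketch. This is not the code in legda/plugins/base.py or legda/plugins/__init__.py: the dataclasses stand in for the SQLAlchemy models, the methods are assumed to be coroutines because the app is asyncio-based, and the `since` parameter name, the `PLUGINS` map, the `"example"` key, `ExamplePlugin`, and `create_plugin` are all hypothetical names chosen for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Document:
    """Hypothetical stand-in for the SQLAlchemy model in legda/models.py."""
    document_id: str
    document_type: str


@dataclass
class DocumentInstance:
    """Hypothetical stand-in for one stored rendition of a Document."""
    document_id: str
    extension: str


class RetrievalPlugin(ABC):
    """Sketch of the three-method contract; async methods are an assumption."""

    @abstractmethod
    async def get_documents(self, since: datetime) -> list[Document]:
        """Discover documents for the given datetime."""

    @abstractmethod
    async def get_document(self, document: Document, instance: DocumentInstance) -> bytes:
        """Return the raw bytes for one instance of a document."""

    @abstractmethod
    async def download_document(self, document: Document) -> list[DocumentInstance]:
        """Return the instances produced for a document."""


class ExamplePlugin(RetrievalPlugin):
    """Toy plugin that returns canned data instead of calling a remote source."""

    async def get_documents(self, since: datetime) -> list[Document]:
        return [Document(document_id="demo-1", document_type="example")]

    async def get_document(self, document: Document, instance: DocumentInstance) -> bytes:
        return b"<xml/>"

    async def download_document(self, document: Document) -> list[DocumentInstance]:
        return [DocumentInstance(document_id=document.document_id, extension="xml")]


# Hypothetical plugin-key to class map; the real keys are not shown in this README.
PLUGINS: dict[str, type[RetrievalPlugin]] = {"example": ExamplePlugin}


def create_plugin(key: str) -> RetrievalPlugin:
    """Instantiate the plugin registered under `key`, or raise ValueError."""
    try:
        plugin_cls = PLUGINS[key]
    except KeyError:
        raise ValueError(f"unknown plugin key: {key!r}") from None
    return plugin_cls()


async def demo() -> tuple[list[Document], list[DocumentInstance], bytes]:
    """Walk one document through all three contract methods."""
    plugin = create_plugin("example")
    docs = await plugin.get_documents(datetime(2025, 1, 1, tzinfo=timezone.utc))
    instances = await plugin.download_document(docs[0])
    payload = await plugin.get_document(docs[0], instances[0])
    return docs, instances, payload
```

Under these assumptions, adding a plugin means subclassing the contract and adding one entry to the map. Calling `asyncio.run(demo())` walks one document through all three methods.
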
python3 -m pip install -r requirements.txt
cat > .env <<'EOF'
CONGRESS_API_KEY=your_key_here
EOF
python3 -m legda

This fetches and downloads retrieved documents into:
data/<document_type>/<publication_date>-<document_id>.<extension>
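
As an illustration of the path layout above, here is a minimal sketch of how such a path could be assembled. The helper name `instance_path`, its parameters, and the ISO `YYYY-MM-DD` rendering of the publication date are assumptions; the README does not state the date format legda actually uses.

```python
from datetime import date
from pathlib import Path


def instance_path(
    document_type: str,
    publication_date: date,
    document_id: str,
    extension: str,
    root: Path = Path("data"),
) -> Path:
    """Build <root>/<document_type>/<publication_date>-<document_id>.<extension>."""
    # Assumption: the date is rendered in ISO format (YYYY-MM-DD).
    filename = f"{publication_date.isoformat()}-{document_id}.{extension}"
    return root / document_type / filename


path = instance_path("executive_orders", date(2025, 1, 20), "demo-id", "pdf")
```

The document type, ID, and extension in the usage line are made-up values, not ones taken from a real run.
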
Discovery uses the last successful plugin fetch timestamp from fetch_dates; new
plugins start from DEFAULT_DISCOVERY_SINCE in legda/constants.py.
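
The fallback rule can be sketched with an in-memory dictionary standing in for the fetch_dates table. The function names, the plugin keys, and the value given to DEFAULT_DISCOVERY_SINCE here are assumptions for illustration; the real constant is defined in legda/constants.py.

```python
from datetime import datetime, timezone

# Assumed placeholder value; the real constant lives in legda/constants.py.
DEFAULT_DISCOVERY_SINCE = datetime(2025, 1, 1, tzinfo=timezone.utc)


def discovery_since(fetch_dates: dict[str, datetime], plugin_key: str) -> datetime:
    """Return the last successful fetch time, or the default for a new plugin."""
    return fetch_dates.get(plugin_key, DEFAULT_DISCOVERY_SINCE)


def record_success(fetch_dates: dict[str, datetime], plugin_key: str, when: datetime) -> None:
    """Store the timestamp of a successful discovery run for one plugin."""
    fetch_dates[plugin_key] = when


# Hypothetical state: one plugin has run before, the other has not.
fetch_dates = {"congress": datetime(2025, 6, 1, tzinfo=timezone.utc)}
known = discovery_since(fetch_dates, "congress")
fresh = discovery_since(fetch_dates, "opm_memos")
```

Recording the timestamp only after a run succeeds means a failed run would be retried from the same starting point next time.
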