Fetch data gov #788

Open
amit-spatial wants to merge 3 commits into dev from fetch_data_gov

@amit-spatial
Collaborator

A general script for downloading data from data.gov.in: a robust, modular, and resumable downloader for data.gov.in resources.

FEATURES:

  • Resource Dictionary: Use short names (e.g., 'lgd_villages') instead of UUIDs.
  • Parallel Fetching: Multi-threaded downloads for speed.
  • Storage Backends: CSV, JSON, and JSONL (JSON Lines).
  • In-Memory Buffering: Option to buffer records in RAM and flush them to disk periodically, which limits how much data is lost in a crash.
  • Deduplication: Ensures clean restarts based on a unique key.
  • Resumable: Automatically detects the size of the existing output file and resumes from there.
  • Ignore Total: The --ignore-total flag forces downloading to continue when the record count reported by the API is wrong.
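The deduplication and resume behaviour listed above could look roughly like the sketch below. It is a minimal illustration, not the code in this PR: it assumes JSONL output and a unique key field named `id`, and the helper names `load_seen_keys` and `append_unique` are hypothetical. It also swaps the file-size check described above for a scan of the keys already on disk, which is simpler to show.

```python
import json
import os


def load_seen_keys(path, key="id"):
    """Return the set of key values already stored in a JSONL file.

    `key` is an assumed field name; the PR does not say which field is used.
    """
    seen = set()
    if not os.path.exists(path):
        return seen
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Skip lines that do not parse as JSON.
                continue
            if isinstance(record, dict) and key in record:
                seen.add(record[key])
    return seen


def append_unique(path, records, seen, key="id"):
    """Append records whose key is not yet in `seen`; return the count written.

    Records without the key are always written, since they cannot be compared.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            value = record.get(key)
            if value is not None:
                if value in seen:
                    continue
                seen.add(value)
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

On restart, a caller would build `seen` once with `load_seen_keys` and pass every freshly fetched page through `append_unique`, so pages that overlap with an earlier run do not create duplicate rows.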

USAGE:

The script resolves 'resource_name' from an internal dictionary.
If the name is not found, it assumes the input is a raw Resource ID (UUID).
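The lookup-with-fallback described here can be sketched as follows. The dictionary contents, the placeholder UUID, and the function names are all hypothetical, since the PR description does not list the real mapping. The `looks_like_uuid` check is an extra that the description does not mention; it is included only to show how a mistyped short name could be caught before a request is made.

```python
import re

# Hypothetical mapping of short names to resource IDs (placeholder values).
RESOURCES = {
    "example_dataset": "00000000-0000-0000-0000-000000000000",
}

# Canonical hyphenated UUID form: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def looks_like_uuid(value):
    """Return True if `value` has the canonical hyphenated UUID shape."""
    return bool(UUID_RE.match(value))


def resolve_resource_id(name_or_id, resources=RESOURCES):
    """Map a short name to its resource ID, else return the input unchanged."""
    if name_or_id in resources:
        return resources[name_or_id]
    # Fallback: treat the input as a raw resource ID, as the usage notes say.
    return name_or_id
```

A caller that wants stricter behaviour could reject inputs where `resolve_resource_id` returns something for which `looks_like_uuid` is false.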

  1. Using Short Names (fastest; recommended setup):
    python fetch_datagov.py VillageSHG --api-key YOUR_KEY --format jsonl --workers 5

  2. Using Raw Resource ID (Fallback):
    python fetch_datagov.py d4206736-a28b-4552-8900-7e0c23c707ac --api-key YOUR_KEY
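For reference, the command lines above imply an argument parser roughly like the one below. This is a sketch built only from the flags that appear in this description (`--api-key`, `--format`, `--workers`, `--ignore-total`); the defaults, the help strings, and whether `--api-key` is mandatory are assumptions, and the actual script may define more options.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="fetch_datagov.py",
        description="Download a data.gov.in resource.",
    )
    parser.add_argument(
        "resource_name",
        help="short name from the resource dictionary, or a raw resource ID",
    )
    # Assumption: the key is required; the script may read it from elsewhere.
    parser.add_argument("--api-key", required=True, help="data.gov.in API key")
    # Assumption: the default format is csv.
    parser.add_argument(
        "--format",
        choices=["csv", "json", "jsonl"],
        default="csv",
        help="storage backend for the downloaded records",
    )
    # Assumption: a single worker by default.
    parser.add_argument(
        "--workers", type=int, default=1, help="number of download threads"
    )
    parser.add_argument(
        "--ignore-total",
        action="store_true",
        help="keep downloading even if the API's reported count is wrong",
    )
    return parser
```

Parsing the first example with `build_parser().parse_args(["VillageSHG", "--api-key", "YOUR_KEY", "--format", "jsonl", "--workers", "5"])` yields `format == "jsonl"` and `workers == 5`, with `ignore_total` left as `False`.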
