This command-line tool lets data scientists and researchers efficiently download all data for a single docket from mirrulations, the public AWS Open Data S3 bucket.
- Downloads all text and (optionally) binary data for a given docket.
- Shows live progress and ETA.
- Does not require AWS credentials (uses public/unsigned access).
- Requires Python 3.9+.
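
The unsigned access mentioned above can be illustrated with the standard library alone. This is a minimal sketch, not the tool's own implementation: it assumes the bucket answers at the virtual-hosted URL https://mirrulations.s3.amazonaws.com/ and uses S3's ListObjectsV2 REST call without any signature. The helper names (build_list_url, parse_listing, list_objects) are hypothetical, and the key prefix is left to the caller because the bucket's internal layout is not described here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public bucket is reachable at the standard virtual-hosted endpoint.
BUCKET_URL = "https://mirrulations.s3.amazonaws.com/"
# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(prefix, token=None):
    """Build an anonymous ListObjectsV2 URL (no signature, no credentials)."""
    params = {"list-type": "2", "prefix": prefix}
    if token:
        params["continuation-token"] = token
    return BUCKET_URL + "?" + urllib.parse.urlencode(params)


def parse_listing(xml_text):
    """Return ([(key, size), ...], next_token) from a ListObjectsV2 response."""
    root = ET.fromstring(xml_text)
    items = [
        (
            entry.findtext("s3:Key", namespaces=S3_NS),
            int(entry.findtext("s3:Size", namespaces=S3_NS)),
        )
        for entry in root.findall("s3:Contents", S3_NS)
    ]
    # The token is only present when more pages remain.
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS)
    return items, token


def list_objects(prefix):
    """Yield (key, size) for every object under prefix, following pagination."""
    token = None
    while True:
        with urllib.request.urlopen(build_list_url(prefix, token)) as resp:
            items, token = parse_listing(resp.read())
        yield from items
        if not token:
            break
```

Summing the sizes returned by list_objects before downloading is one way a progress bar and ETA could be driven, though the tool may compute them differently.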
Installing with pip uses the project's setup.py to create the mirrulations-fetch command. We recommend creating a virtual environment and installing locally:
python3 -m venv .venv
source .venv/bin/activate
pip install .

mirrulations-fetch <docket_id> [OPTIONS]
- <docket_id>: The docket ID (e.g., DEA-2024-0059)
- --output-folder <target>: Target output folder (default: current directory)
- --include-binary: Include binary data in the download (default: off)
- --no-comments: Skip comments and derived-data (comment-related data only)

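
The argument and options described above map naturally onto an argparse parser. The sketch below is a hypothetical reconstruction for illustration only; the tool's actual parser may be written differently. The function name build_parser and the use of "." as the default for the current directory are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented argument and options."""
    parser = argparse.ArgumentParser(prog="mirrulations-fetch")
    parser.add_argument("docket_id", help="The docket ID (e.g., DEA-2024-0059)")
    parser.add_argument(
        "--output-folder",
        default=".",  # assumption: "." stands in for the current directory
        metavar="<target>",
        help="Target output folder (default: current directory)",
    )
    parser.add_argument(
        "--include-binary",
        action="store_true",  # off unless the flag is given
        help="Include binary data in the download",
    )
    parser.add_argument(
        "--no-comments",
        action="store_true",
        help="Skip comments and derived-data",
    )
    return parser


# Parse an example command line without touching sys.argv.
args = build_parser().parse_args(["DEA-2024-0059", "--include-binary"])
print(args.docket_id, args.output_folder, args.include_binary, args.no_comments)
# prints: DEA-2024-0059 . True False
```

Both flags use store_true so that they default to off, matching the documented defaults.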
Download all data for docket DEA-2024-0059 from the DEA agency into the current directory:
mirrulations-fetch DEA-2024-0059

Download including binary data, into a custom folder named mydata:
mirrulations-fetch DEA-2024-0059 --include-binary --output-folder ./mydata

Download docket and documents only (no comments or derived-data):
mirrulations-fetch DEA-2024-0059 --no-comments

The downloaded data will be organized as follows:
<output-folder>/
  <docket_id>/
    raw-data/
      docket/
      documents/
      comments/
      binary-<docket_id>/   # (if --include-binary)
    derived-data/
      <all derived data folders and files>
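
Once a download has finished, a quick way to check what arrived is to count files per subfolder. The following is a minimal sketch using only pathlib; the function name count_files_by_folder is hypothetical and is not part of the tool. It makes no assumption about the folder names beyond the <output-folder>/<docket_id> root listed above.

```python
from collections import Counter
from pathlib import Path


def count_files_by_folder(output_folder, docket_id):
    """Count downloaded files per subfolder, relative to <output-folder>/<docket_id>."""
    root = Path(output_folder) / docket_id
    counts = Counter()
    # Walk the whole tree and tally each file under its parent folder.
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.parent.relative_to(root).as_posix()] += 1
    return counts
```

For example, count_files_by_folder("./mydata", "DEA-2024-0059") returns a Counter keyed by relative folder path, which makes it easy to confirm that no comment folders were created after a run with --no-comments.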
This project is licensed under the MIT License.