FIA Docs Scraper

This application scrapes the FIA website for new events and documents. It downloads each document and uploads it to an S3 bucket, then takes a screenshot of every page of the document and uploads those to S3 as well.

Written in Rust and fully dockerized for easy deployment.
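
To illustrate what "new" documents means in practice, here is a minimal sketch of one way to filter a scraped list against documents that were already processed. It is not taken from this repository: the Document struct, the new_documents function, and the idea of keying on the document URL are all hypothetical choices made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical representation of a scraped document (not the project's real type).
#[derive(Debug, Clone, PartialEq)]
struct Document {
    title: String,
    url: String,
}

/// Return only the documents whose URL has not been seen before.
/// Assumption: the URL uniquely identifies a document.
fn new_documents<'a>(scraped: &'a [Document], seen: &HashSet<String>) -> Vec<&'a Document> {
    scraped
        .iter()
        .filter(|doc| !seen.contains(&doc.url))
        .collect()
}

fn main() {
    // URLs that were already downloaded and uploaded in an earlier run.
    let seen = HashSet::from(["https://example.invalid/doc-1.pdf".to_string()]);

    // Result of a (pretend) scrape of the documents page.
    let scraped = vec![
        Document {
            title: "Doc 1".to_string(),
            url: "https://example.invalid/doc-1.pdf".to_string(),
        },
        Document {
            title: "Doc 2".to_string(),
            url: "https://example.invalid/doc-2.pdf".to_string(),
        },
    ];

    // Only unseen documents would move on to download, screenshot, and upload.
    for doc in new_documents(&scraped, &seen) {
        println!("new document: {} ({})", doc.title, doc.url);
    }
}
```

In a real deployment the set of seen URLs would presumably come from the database rather than from memory.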

Setup

Configuration

The application is configured through environment variables. If a .env file is available, the app tries to load it at startup.

Here is an example .env:

DATABASE_TOKEN=<Libsql JWT>
S3_ACCESS_KEY=<S3 Bucket Access Key>
S3_SECRET_KEY=<S3 Bucket Secret Key>
SENTRY_DSN=<Sentry DSN for Error reporting>
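
For readers who want to see how such a configuration could be consumed, the following is a minimal sketch using only the Rust standard library. It is not the application's actual loading code: the Config struct, parse_dotenv, and load_config are hypothetical names, and two behaviours are assumptions made here, namely that every variable is required and that real environment variables take precedence over .env entries. DATABASE_URL is included because the compose file in the Docker section sets it.

```rust
use std::collections::HashMap;
use std::env;
use std::fs;

/// Hypothetical settings struct: the four names from the example .env
/// plus DATABASE_URL, which the compose file sets.
#[allow(dead_code)] // fields are only constructed in this sketch
#[derive(Debug, PartialEq)]
struct Config {
    database_url: String,
    database_token: String,
    s3_access_key: String,
    s3_secret_key: String,
    sentry_dsn: String,
}

/// Parse simple `KEY=VALUE` lines; blank lines and `#` comments are skipped.
/// Quoting and escaping are deliberately not handled.
fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Build the config from any lookup function and report the first missing name.
/// Assumption: all variables are required.
fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    let get = |name: &str| {
        lookup(name).ok_or_else(|| format!("missing environment variable {name}"))
    };
    Ok(Config {
        database_url: get("DATABASE_URL")?,
        database_token: get("DATABASE_TOKEN")?,
        s3_access_key: get("S3_ACCESS_KEY")?,
        s3_secret_key: get("S3_SECRET_KEY")?,
        sentry_dsn: get("SENTRY_DSN")?,
    })
}

fn main() {
    // A missing .env file is not an error; it just provides no fallback values.
    let file_vars = fs::read_to_string(".env")
        .map(|contents| parse_dotenv(&contents))
        .unwrap_or_default();

    // Assumption: the process environment wins over the .env file.
    let lookup = |name: &str| env::var(name).ok().or_else(|| file_vars.get(name).cloned());

    match load_config(lookup) {
        Ok(config) => println!("configuration loaded for {}", config.database_url),
        Err(err) => eprintln!("{err}"),
    }
}
```

Passing the lookup as a function keeps load_config independent of the process environment, which makes it easy to exercise with fixed values.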

Docker

This app is generally used together with the FIA-Docs-Bot.

Here is a docker-compose.yaml that sets up all three services:

  • A Libsql Database
  • This FIA-Docs-Scraper
  • The FIA-Docs-Bot

services:
  libsql-database:
    image: ghcr.io/tursodatabase/libsql-server:latest
    volumes:
      - ./data/libsql:/var/lib/sqld
    container_name: database
    env_file: ./.db.env
  discord-bot:
    image: ghcr.io/markustheort/fia-docs-bot:latest
    container_name: fia-docs-bot
    restart: unless-stopped
    env_file: ./.bot.env
    environment:
      - DATABASE_URL=http://database:8080
    depends_on:
      - libsql-database
  docs-scraper:
    image: codeberg.org/mto/fia-docs-scraper:latest
    container_name: fia-docs-scraper
    restart: unless-stopped
    env_file: ./.scraper.env
    environment:
      - DATABASE_URL=http://database:8080
    depends_on:
      - libsql-database
    stop_grace_period: 30s

Technologies used

  • Rust with Tokio as the async runtime.
  • ImageMagick for document screenshots.
  • S3-compatible storage for hosting screenshots and mirrored documents.
  • Libsql (SQLite) for data storage.
  • Sentry for error reporting.

License

This project is licensed under either the MIT License or the Apache License (see LICENSE-MIT and LICENSE-APACHE).