Backup Strategy with Restic and Healthchecks.io
Nerdy Articles (https://nerdyarticles.com/backup-strategy-with-restic-and-healthchecks-io/), Sun, 30 Jun 2024

Introduction

Welcome back! In my previous articles, we set up a Debian server and took a deep dive into Docker. By now, you probably have your first services up and running. Before we explore more advanced possibilities, we need to talk about backups.

My home lab is a mix of production services that I rely on daily and a playground where I test new things. You might say I should have a separate environment, and you wouldn’t be wrong. But for me, testing new stuff means weaving it into my daily routines, so I prefer the luxury of testing directly in my prod environment.

To recover from any mistakes or issues, I have a solid backup routine in place. What's even better is that it's completely hands-off and will notify me of any problems. To achieve my hands-off backup approach, I rely on a trio of powerful tools: Proxmox Backup Server, Restic, and Healthchecks.io. Proxmox Backup Server handles the backups of my virtual machines and containers, providing a robust and efficient solution. Restic takes care of file-level backups with its simplicity and reliability. Meanwhile, Healthchecks.io keeps a watchful eye on everything, ensuring I'm notified if anything goes wrong. This combination ensures my backup routine runs smoothly without requiring constant attention.

3-2-1: The Backup Gold Standard

Now, let’s discuss the gold standard of backups: the 3-2-1 rule.

3 stands for three different copies, meaning your data should exist in three copies. This ensures that even if one copy fails, you still have two copies to fall back on.

2 stands for two different storage devices. This means your data should be on at least two different physical devices. Some argue these should even be different device types, but I think two different disks, for example, are good enough!

1 stands for one off-site backup. You absolutely should have one copy/backup stored at a different location in case of flood, fire, or any other disaster that could compromise your main site.

In other words, you need:

  • The original data (Copy 1).
  • One backup copy (Copy 2).
  • Another backup copy (Copy 3).

So, you have the data stored in three locations in total:

  • Original Data: For example, on your primary computer or server.
  • First Backup: On an external hard drive, another server, or a different type of storage media in your server.
  • Second Backup: In the cloud or on an offsite server, providing additional protection against local disasters.

Overview of My Approach

I have a setup at home that includes my main Docker server running on a Debian VM within Proxmox, a NAS (TrueNAS Scale), and a Raspberry Pi. Additionally, I have a 13-year-old laptop running Proxmox Backup Server (PBS) located at my parents' house, which is connected to my network via VPN.

Here's how my backup strategy works: every night, my data on the NAS is backed up onto an additional HDD mounted in my Debian VM using Restic. This also applies to other data on various VMs, the Raspberry Pi, and my Docker directory on the Debian VM, which gets backed up additionally at noon. This serves as the first line of defence.

Moreover, another nightly backup runs using Restic to create a backup on my offsite laptop, with Docker data backed up again around noon. This ensures an offsite copy of my data, adhering to the 3-2-1 backup rule.

My Proxmox VMs are only backed up once a night onto my offsite laptop. While this doesn’t comply fully with the 3-2-1 rule, I’m okay with it because, in 99% of cases, I would simply rebuild the VM, and the data itself is already covered by Restic.

As a side note, I don't follow the best practice of stopping Docker containers or databases before they are backed up. My backups are incremental, and I keep numerous versions. It would be extremely unlucky for all these versions to be corrupted. Also, my most important services run their built-in backup jobs, creating export files or archives, which are then picked up by the Restic jobs automatically.

It's crucial to understand your own design decisions and risks. Only then can you properly weigh the risks of your decisions! If you’re not completely confident that you have everything covered or weighed properly, you should follow the standard rules. This ensures you have a solid foundation and peace of mind that your data is safe and recoverable.

Restic - Backups done right!

Let’s now dive into setting up Restic, getting your first backup job running, and integrating it with Healthchecks.io. Once we’re done, you’ll have a seamless, automated backup system that you can set and forget!

Personally, I rely on SFTP for transferring my backups due to its robust security features. However, if you're exploring other options, Restic's documentation lists all supported backends (local, REST server, S3, and more). Depending on your setup, you may need to make adjustments if you're not using SFTP like I do. It's always good to explore different methods to find what best suits your needs for backup management.

In this article, "Target" refers to the location where we want to push our backup. For the purposes of this article, I assume you have another server (whether on-site or off-site does not matter) to which you transfer the backup via SFTP. "Source" refers to the server that holds the data we want to back up.

SSH Config for root (for SFTP Backups)

Before we can initialize the target using SFTP, we need to ensure we have SSH access to it. If you followed my Debian article, we only allow key-based authentication. Additionally, I always run all my backups as root to avoid any file access issues. By "running the backup" I mean executing the script on our source system. This also means we need to ensure our root user has SSH access to the target.

If root does not yet have an SSH key, you'll need to generate one and then copy the public key to the target device where you want to set up Restic with SFTP.

sudo ssh-keygen -t ed25519 -a 100

Now that we have our SSH key ready to go and assuming your target location is securely configured (so no ssh-copy-id shortcut here!), let's manually grab that public key and add it to the authorized_keys file.

sudo cat /root/.ssh/id_ed25519.pub

Copy the value of the SSH public key. Next, use ssh to connect to the target machine. Once in, edit your authorized_keys file and paste the public key into it. This will allow you to authenticate securely using SSH keys. And yes, we are editing our own user's authorized_keys file, not root's, as we do not need root access on the target device.

Now, test your SSH connection as you normally would, but this time prefix it with sudo to make sure it works from the root user as well. This ensures you can perform administrative tasks securely and manage your system effectively through SSH.
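Assuming a non-standard SSH port like mine, the test might look like this (same placeholders as in the rest of the article):

```shell
# Connect as your normal user on the target, but from root on the source,
# so that root's SSH key is the one being tested
sudo ssh -p <PORT> <USERNAME>@<IP/HOSTNAME>
```

If the connection succeeds without a password prompt, root's key is set up correctly on the target.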

Setting up Restic

First, we need to install Restic on all devices we want to back up from. The target location does not need Restic installed!

sudo apt update
sudo apt install restic

As part of our Restic setup, we'll create a restic_password file in our home directory. This file securely stores the password used for encrypting and decrypting our Restic backups, a crucial step to ensure our data stays protected. Using a file-based password also avoids embedding the password directly in the bash script, keeping the script clean while the password itself stays protected.

touch ~/restic_password

To ensure that not everyone on the system can read the restic_password file, we need to tighten its permissions. We'll set it so that the owner can read and write it and the owner's group can only read it, with no access for anyone else (root can read it regardless).

chmod 640 ~/restic_password

Now, you can edit the restic_password file and enter your chosen password. For better security, consider randomly generating a password or simply mashing keys on your keyboard. Remember, this file should contain only your password and nothing else, and make sure to note the password down somewhere safe!
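If you'd rather not invent a password by hand, one possible approach is to fill the file with random characters using standard tools (a sketch; adjust the length to taste):

```shell
# 24 random bytes encoded as base64 yield a 32-character password
PASSFILE="$HOME/restic_password"
PASS=$(head -c 24 /dev/urandom | base64)

# Write the password without a trailing newline and restrict permissions,
# mirroring the chmod step above
printf '%s' "$PASS" > "$PASSFILE"
chmod 640 "$PASSFILE"
```

Whatever method you use, store the generated password in your password manager as well: losing it means losing access to the encrypted backups.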

Initialising Restic Repository

Alright, with Restic installed and our password securely stored, the next step is to initialize our target location. This involves creating a repository which will serve as our backup store. This repository will hold all our encrypted backup data, ensuring it's safe and accessible for future restores.

sudo restic -p /home/<USERNAME>/restic_password -r sftp://<USERNAME>@<IP/HOSTNAME>:<PORT>//<PATH_TO_STORE_BACKUP>/<BACKUP_NAME> init

Let's break down the restic command for initializing a repository step by step:

  1. -p /home/<USERNAME>/restic_password: This flag specifies the path to the file containing the Restic password.
  2. -r sftp://<USERNAME>@<IP/HOSTNAME>:<PORT>//<PATH_TO_STORE_BACKUP>/<BACKUP_NAME>:
    • -r: This flag indicates the repository location.
    • sftp://: Specifies the use of the SFTP protocol.
    • <USERNAME>: Your username on the remote server.
    • <IP/HOSTNAME>: The IP address or hostname of the remote server.
    • <PORT>: The port number (typically 22 for SFTP, but we always change this!).
    • //<PATH_TO_STORE_BACKUP>/<BACKUP_NAME>: The target location on the server where backups will be stored. Adjust <BACKUP_NAME> to your preferred name for the backup repository.
  3. init tells restic to initialise the repository.
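To make the placeholders concrete, here's what the command might look like with purely hypothetical values (substitute your own username, host, port, and paths):

```shell
# Example only: "alice", 192.168.10.50, port 2222, and the paths are made up
sudo restic -p /home/alice/restic_password \
  -r sftp://alice@192.168.10.50:2222//mnt/backups/docker-repo \
  init
```

Note the double slash after the port: in Restic's SFTP URLs it makes the path absolute on the remote side, while a single slash is interpreted relative to the login user's home directory.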

Congratulations! You've successfully initialized your first backup repository. Personally, I prefer using separate repositories for different backups, but you can also use a single repository if you prefer. A single repository is particularly useful when you have duplicate files across sources, as Restic automatically deduplicates data within a repository, saving storage space and simplifying management.

Running the First Backup

Now, let's kick off the initial backup manually before we proceed to script it and schedule it using cron.

sudo restic -p /home/<USERNAME>/restic_password -r sftp://<USERNAME>@<IP/HOSTNAME>:<PORT>//home/<USERNAME>/<BACKUP_NAME> backup /<LOCATION_TO_BACKUP>

We already know most of the command. Simply replace init with backup and include the path to the location you want to back up. This will initiate the backup process for that specific directory or file.

Creating the Backup Script

Next, we'll create a backup script that we can call from cron. Using a script allows us to organize multiple commands efficiently and easily adapt it for backups from other locations or servers by making slight adjustments.

We start by running restic unlock to ensure that the repository isn't locked due to a previous interrupted operation or another process accessing it. This command releases any locks and prepares the repository for the next operation, which is crucial for maintaining the integrity of our backups.

Next, we perform the restic backup command to initiate the backup process itself. This command takes the specified data and securely transfers it to the repository we've configured, ensuring our data is safely stored and encrypted.

After completing the backup, we execute the restic forget command. This command manages the retention policy of our backups by removing older snapshots according to specified criteria, such as keeping a certain number of recent backups (--keep-last) or retaining backups within a specific time frame (--keep-hourly, --keep-daily, etc.). This helps optimize storage usage while ensuring we retain relevant backup snapshots for future recovery needs.

#!/bin/bash

# Define variables
RESTIC_PASSWD="/home/<USERNAME>/restic_password"
BACKUP_SOURCE="/<LOCATION_TO_BACKUP>"
BACKUP_REPO="<PATH_TO_STORE_BACKUP>/<BACKUP_NAME>"
KEEP_OPTIONS="--keep-hourly 2 --keep-daily 6 --keep-weekly 3 --keep-monthly 1"

# Release any locks left over from an interrupted run
restic -p "$RESTIC_PASSWD" -r "$BACKUP_REPO" unlock

# Perform the Restic backup
restic -p "$RESTIC_PASSWD" -r "$BACKUP_REPO" backup "$BACKUP_SOURCE"

# Apply the retention policy and prune old data
# ($KEEP_OPTIONS is intentionally unquoted so it expands into separate flags)
restic -p "$RESTIC_PASSWD" -r "$BACKUP_REPO" forget $KEEP_OPTIONS --prune --cleanup-cache

Let's examine the script above and discuss a few points. I'm using variables to make it easier to modify the script without having to edit lengthy commands each time.

  • Variables:
    • RESTIC_PASSWD: Path to the file containing the Restic password.
    • BACKUP_SOURCE: Location on your system to be backed up.
    • BACKUP_REPO: Path where the backup will be stored, including the name of the backup repository. This includes the SFTP connection details in my case.
    • KEEP_OPTIONS: Specifies the retention policy for the backups, including how many hourly, daily, weekly, and monthly backups to retain. I am keeping 2 hourly backups, the backups of the last 6 days, then 3 weekly versions and one monthly one.
  • Restic forget command:
    • restic -p $RESTIC_PASSWD -r $BACKUP_REPO forget $KEEP_OPTIONS --prune --cleanup-cache
      • Initiates the forget operation in Restic, which manages the retention policy of backup snapshots.
      • --prune: Removes unreferenced data from the repository to free up storage space.
      • --cleanup-cache: Cleans up any temporary cache files used during the backup and forget operations.

Let's save this script somewhere. Assuming you are saving it in your home directory, we'll name it restic_data.sh.

Scheduling the Backup

Now, we can add this script to cron. We'll do this using the root user to ensure that all files can be accessed during the backup process.

sudo crontab -e

Let's assume we want to schedule the backup to run at midnight and noon every day. Here's the line you would add to the end of your root's crontab:

0 0,12 * * * sh /home/<USERNAME>/restic_data.sh
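For reference, here's how that cron expression breaks down, field by field:

```shell
# 0 0,12 * * *  sh /home/<USERNAME>/restic_data.sh
# | |    | | |
# | |    | | +-- day of week   (* = any day of the week)
# | |    | +---- month         (* = every month)
# | |    +------ day of month  (* = every day)
# | +----------- hour          (0 and 12 = midnight and noon)
# +------------- minute        (0 = on the hour)
```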

Now that you have a scheduled backup running, the next step is ensuring you have monitoring in place to track job statuses and failures. Before setting up and integrating Healthchecks.io into our script, let's quickly review how to perform a restore from a Restic backup.

Restoring using Restic

Restoring using Restic is straightforward. Simply create a folder on your device where you want to restore file(s) to, mount the Restic repository, and then browse the backup just like any other mounted directory on your system. This makes accessing and recovering files from backups simple and efficient.

I usually have a folder called RESTORE in my home directory where I mount my repositories.

mkdir ~/RESTORE

Let's mount the repository we created and backed up to earlier. Keep in mind that this terminal session will be occupied serving the mount, so you'll need another terminal session to browse the files. If you're looking to restore larger amounts, consider exploring the restore option provided by Restic for more efficient recovery.

restic -p /home/<USERNAME>/restic_password -r sftp://<USERNAME>@<IP/HOSTNAME>:<PORT>//home/<USERNAME>/<BACKUP_NAME> mount ~/RESTORE

Now that you've mounted the backup repository, you can use another terminal session to browse the files and start copying anything you need. Remember, Restic mounts are intentionally read-only to ensure the integrity of your backups. This setup allows you to access your backups securely and retrieve specific files as required.
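For larger restores, where copying files out of a FUSE mount would be slow, Restic's restore subcommand writes files directly to a target directory. A sketch with the same placeholders as before (--include is optional and limits the restore to matching paths; "latest" selects the most recent snapshot):

```shell
sudo restic -p /home/<USERNAME>/restic_password \
  -r sftp://<USERNAME>@<IP/HOSTNAME>:<PORT>//home/<USERNAME>/<BACKUP_NAME> \
  restore latest --target ~/RESTORE --include <PATH_WITHIN_BACKUP>
```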

Comply with the Rules - Another Backup Location

As we discussed initially, implementing two backup jobs per source (one of which should be off-site) ensures compliance with the best practice of the 3-2-1 backup strategy. This step is straightforward since you've already done it once. Simply select another location for the backup, go through the initialization process, connect to it via SFTP if it's a remote location (not on the same device), copy the backup script, and slightly adjust it to reflect the new destination.

I recommend scheduling this backup to run at a different time to minimize potential issues with the source system. This staggered scheduling helps avoid conflicts and ensures continuous, reliable backups.

💡
Side note: When backing up from different sources to the same Restic repository, you can run all of these backups simultaneously. Restic is capable of handling multiple backup jobs at once, ensuring efficient and concurrent data protection without issues.

Regular Manual Spotchecks are Mandatory

Next, we will set up monitoring, but remember to periodically perform a manual test restore. When I started using Restic, I did this every few months; now, I do it about once a year. Some even wipe out their backups annually and run a complete fresh backup.

The main point is this: your backup is your last resort when something goes wrong. Don’t put yourself in a situation where you've never tested a restore and then find out it doesn’t work when you need it most!

Healthchecks.io - Monitoring our Backup Jobs

Alright, we've done the heavy lifting to ensure we have a backup running. While it's tempting to consider the job done, it's crucial to always know whether your backups are successful. The worst-case scenario (something many individuals and companies have experienced) is needing to restore data only to discover that the backup process stopped working months ago without anyone noticing.

In this next part, we are going to set up the Healthchecks.io Docker container, configure monitoring for our backup job, and adjust our backup script to report back to Healthchecks. As a bonus, we will also log the backup time and send the logs to Healthchecks, which will help us analyse any issues should they occur.

Healthchecks.io Docker Setup

Setting up the Healthchecks.io Docker container is quite straightforward. I will not go into too many details about the different variables being used. After my Docker 101 article, I trust that you should be able to understand the following compose file. I would also highly encourage you to check out their list of variables and add whatever you need.

networks:
  frontend:
    name: frontend
    driver: bridge

services:
  healthchecks:
    container_name: healthchecks
    image: lscr.io/linuxserver/healthchecks:latest
    restart: unless-stopped
    networks:
      - frontend
    ports:
      - 8000:8000
    security_opt:
      - no-new-privileges:true
    environment:
      PUID: $PUID
      PGID: $PGID
      TZ: $TZ
      ALLOWED_HOSTS: <IP_OF_YOUR_HOST>
      SITE_ROOT: http://<IP_OF_YOUR_HOST>:8000
      SITE_NAME: Healthchecks
      SUPERUSER_EMAIL: $EMAIL
      SUPERUSER_PASSWORD: $SUPERUSER_PASSWORD
      DEBUG: 'False'
      DEFAULT_FROM_EMAIL: $EMAIL
      EMAIL_HOST: $SMTP_HOST
      EMAIL_PORT: 587
      EMAIL_USE_TLS: 'True'
      EMAIL_HOST_USER: $EMAIL
      EMAIL_HOST_PASSWORD: $EMAIL_HOST_PASSWORD
      SECRET_KEY: $SECRET_KEY
    volumes:
      - $DOCKERDIR/healthchecks/config:/config

If you have followed my Debian article, you have implemented the UFW firewall. Remember to open port 8000 (or your chosen port) in the firewall as well!

Set Up a Check in Healthchecks.io

Now that Healthchecks.io is running, we can set up our first monitoring check, which we will then integrate into our bash script.

Once logged in, you'll see there's already a check named "My First Check" that we're going to edit now. To do so, click on the three dots on the right side.

The first thing I did was change the name and add a short description. Next, I adjusted the schedule. Healthchecks.io needs to know when this backup is scheduled to run. I typically set a grace period of one hour, meaning the backup job must report to Healthchecks.io within an hour of its planned start time. If it doesn't, an email notification is sent to us. If the backup reports in after the grace period, indicating that everything is fine again, we'll receive another email notification.

Before we proceed to our script, let's quickly verify that the email notification is working. Click on "Integrations" at the top, then click on "Test!" for our already configured mail integration.

Lastly, you'll need to return to your check and obtain the "Ping URL". I personally prefer the UUID ping URL and don't use slugs at all. Slugs allow you to create checks from the outside, which I don't leverage.

Backup Script with Healthchecks.io Integration

Full disclosure, I'm not an expert in creating advanced bash scripts and relied on ChatGPT to assist me (which worked extremely well). I understand the process, but I wouldn't be able to create this without assistance. There are likely more elegant or sophisticated methods available, but this solution works well for me and I'm satisfied with it.

Here is the end result of my script:

#!/bin/bash

# Define variables
RESTIC_PASSWD="/home/<USERNAME>/restic_password"
BACKUP_SOURCE="/<LOCATION_TO_BACKUP>"
BACKUP_REPO="<PATH_TO_STORE_BACKUP>/<BACKUP_NAME>"
WEBHOOK_URL="http://192.168.10.199:8000/ping/ec0b8555-5611-4581-a082-5d7183187f37"
KEEP_OPTIONS="--keep-hourly 2 --keep-daily 6 --keep-weekly 3 --keep-monthly 1"

# Create a temporary file
TMP_FILE=$(mktemp)

# Initialize variables
FAILURE=0

# Sending start ping to measure runtime
curl --retry 3 --retry-max-time 120 $WEBHOOK_URL/start

# Function to run a command and capture its output to a temp file
run_command() {
  local CMD="$1"                      # Takes the first argument as the command to be executed
  echo "Running: $CMD" >> "$TMP_FILE" # Logs the command being executed to $TMP_FILE
  $CMD >> "$TMP_FILE" 2>&1            # Executes the command and redirects both stdout and stderr to $TMP_FILE
  if [ $? -ne 0 ]; then               # Checks the exit status of the command
    # Sets FAILURE variable to 1 if the command failed
    # Passes TMP_FILE content into OUTPUT variable
    # Sends ping to Healthchecks with the failure code
    # Deletes the TMP_FILE
    # Exits the bash script
    FAILURE=1
    OUTPUT=$(cat "$TMP_FILE")
    curl -fsS -m 10 --retry 3 --retry-max-time 120 --data-raw "$OUTPUT" $WEBHOOK_URL/$FAILURE
    rm "$TMP_FILE"
    exit 1
  fi
  echo -e "\n" >> "$TMP_FILE"         # Adds a newline to separate commands in $TMP_FILE
}

# Perform Restic unlock and capture output
run_command "restic -p $RESTIC_PASSWD -r $BACKUP_REPO unlock"

# Perform Restic backup
run_command "restic -p $RESTIC_PASSWD -r $BACKUP_REPO backup $BACKUP_SOURCE"

# Perform Restic forget
run_command "restic -p $RESTIC_PASSWD -r $BACKUP_REPO forget $KEEP_OPTIONS --prune --cleanup-cache"

OUTPUT=$(cat "$TMP_FILE")

# Determine status based on the FAILURE flag and send webhook
curl -fsS -m 10 --retry 3 --retry-max-time 120 --data-raw "$OUTPUT" $WEBHOOK_URL/$FAILURE

# Clean up temporary file
rm "$TMP_FILE"

Let's break down the bash script and go through each part. In the beginning, we have our familiar variables set up, with the addition of WEBHOOK_URL to hold our Ping URL. The script uses mktemp to create a temporary file and stores the path and filename in TMP_FILE. Another variable, FAILURE, is initialized to 0, which conventionally indicates a success code in bash scripting. This sets the stage for the subsequent operations in our backup monitoring script.

Next, we establish the initial connection to our Healthchecks container. To start the runtime clock for our backup, we append /start to our Ping URL. We then send the start ping using curl, configured to retry up to 3 times over a maximum of 120 seconds if the "ping" fails to send.

Now, we're tackling the most complex part. We've created a function to execute our various commands. This function takes a command as an argument, logs its execution to our temporary file, and checks its exit status. If the exit status indicates failure (anything other than 0), we set the FAILURE variable to 1, capture the contents of TMP_FILE in the OUTPUT variable, and send a Healthchecks.io ping containing both the log output and the failure code. We tidy up after ourselves and delete the TMP_FILE. The bash script then terminates as soon as one of the commands fails. This triggers an email notification from Healthchecks.io, which includes the logs of the run, aiding in troubleshooting and analysis.

In the final section, you'll notice our familiar three Restic commands now preceded by run_command, which invokes our custom function. This triggers the function to execute each command and monitor its exit status. As discussed earlier, the function captures the output of these commands in TMP_FILE. Afterwards, we assign the content of TMP_FILE to the OUTPUT variable. Following this, we send a Healthchecks ping containing the log content and the exit code (since the FAILURE variable was initialized to 0 and no command failed, we send success code 0). Finally, the script cleans up by removing the temporary file.
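If you want to see the run_command pattern in isolation before wiring it up to Healthchecks, here's a stripped-down, runnable sketch. The curl ping is replaced by a stub function so nothing leaves your machine, and the command is passed as separate arguments rather than one string, which sidesteps word-splitting pitfalls:

```shell
#!/bin/bash
TMP_FILE=$(mktemp)
FAILURE=0
notify() { echo "would ping Healthchecks with status=$1"; }  # stand-in for the curl ping

run_command() {
  echo "Running: $*" >> "$TMP_FILE"
  "$@" >> "$TMP_FILE" 2>&1          # capture stdout and stderr of the command
  if [ $? -ne 0 ]; then
    FAILURE=1
    notify "$FAILURE"               # the real script sends curl ... "$WEBHOOK_URL/$FAILURE" and exits
    return 1
  fi
}

run_command true                    # succeeds, prints nothing
run_command false || true           # fails, triggers the stubbed failure ping
rm -f "$TMP_FILE"
```

Running this prints a single "would ping Healthchecks with status=1" line for the failing command, while the successful one stays silent, exactly the behaviour we rely on in the full script.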

When you open the Healthchecks check, you'll find the events listed on the right-hand side. Initially, you should see the runtime logged. If you click on the last event, hopefully marked as "OK," you'll also find the logs from our latest run displayed. This provides a detailed record of the script's execution and its outcome within the monitoring interface.

Conclusion

Congratulations, you now have a fully monitored backup setup using Restic and Healthchecks.io. The worst-case scenario now would be if your machine running Healthchecks.io goes down, resulting in you not receiving failure notifications. However, in my case, this would mean essential services on my main server going offline, which I would definitely notice. To mitigate the risk of only the Healthchecks container going down, it's advisable to monitor the container itself using tools like Uptime Kuma, although this isn't covered in this article. It is also worth mentioning that you can use their hosted service and monitor up to 20 jobs for free!

TrueNAS Scale Backups with Restic

I run TrueNAS Scale bare metal on a small self-built server, and TrueNAS doesn't currently support Restic for backups out of the box. However, this limitation shouldn't deter us from utilizing Restic to back up data from our NAS.

In this section, we're focusing solely on the steps to set up a Restic-based backup script on TrueNAS Scale. The script itself remains largely unchanged. I'll also assume that you're familiar with navigating TrueNAS Scale.

Setting up Restic on TrueNAS Scale

To run backups using Restic on TrueNAS Scale, we need to download the Restic binary and make slight modifications to our existing backup script. To simplify this process, I created a new network share and stored both the Restic binary and backup script there.

Go to the releases page of Restic on GitHub and download the version ending with "*_linux_amd64.bz2". Once downloaded, unpack this file and place it on the network share or any preferred file location on your NAS. This will provide the Restic binary needed for the backup operations. Personally, I also rename the Restic binary file to just "restic" on my NAS. This helps because the binary has an update functionality, and I prefer not to have the version number in the filename.
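The download-and-unpack steps might look like the following on the NAS shell. The version number and share path below are examples only; check the releases page for the current version and adjust the destination to your own setup:

```shell
# Example version and paths: substitute the latest release and your share
wget https://github.com/restic/restic/releases/download/v0.16.4/restic_0.16.4_linux_amd64.bz2
bunzip2 restic_0.16.4_linux_amd64.bz2
mv restic_0.16.4_linux_amd64 /mnt/<POOL>/scripts/restic   # rename to plain "restic"
chmod +x /mnt/<POOL>/scripts/restic
```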

Copy your backup script to the NAS (for me, that would be onto the newly created network share). Adjust the three Restic commands in the script as follows and update the variables at the top accordingly (don't forget to set up a new check on Healthchecks and to update the WEBHOOK_URL). Ensure that each Restic command is prefixed with the full path to the Restic binary. For example, on my NAS, it would look like this (where restic is the actual binary we downloaded):

run_command "/mnt/nas/scripts/restic -p $RESTIC_PASSWD -r $BACKUP_REPO unlock"

I mentioned the update functionality of the Restic binary. You can navigate to the location of the binary in the terminal on your NAS (or specify the filepath) and execute the following command:

sudo ./restic self-update

This command updates the Restic binary to the latest version available, ensuring you have the most up-to-date features and security patches.

Scheduling the Backup

Now that we have Restic set up on TrueNAS Scale along with the adjusted backup script, the final step is to schedule our backup.

Log into the TrueNAS Scale web interface, navigate to "System Settings" > "Advanced," and then click on "Add" in the "Cron Jobs" section. This is where we'll schedule our backup script to run automatically at specified intervals.

For the settings, configure the schedule to match the times expected by your Healthchecks check and point the command at the backup script on your NAS.

Conclusion

As you can see, setting up a Restic backup on TrueNAS Scale follows a similar process to setting it up on a "normal" Linux machine. The main difference is that instead of installing Restic, we use the binary, and the cron job is configured through the web interface rather than the command line.

Proxmox Backups with Healthchecks.io

For this section, I'm assuming you have a Proxmox machine running and a Proxmox Backup Server already set up. Configuring backups is straightforward; you set the schedule and choose which VMs or LXC containers to back up from the Datacenter level, with retention policies managed on the Proxmox Backup Server.

The more complex part arises if we want to integrate with Healthchecks.io. This requires creating a bash script to handle calls to our Healthchecks container and implementing it as a hook script. Currently, I haven't found a way to send logs directly to Healthchecks from Proxmox Backup jobs, but for me, my primary backup focus remains on my Restic jobs; VM backups serve as an additional failsafe, so I can live with this.

Hook Script

Alright, let's proceed to create a hook script on your Proxmox machine (not on the Proxmox Backup Server). I'll assume you've stored it in your home directory and named it "proxmox_backup_hook.sh". Let's continue from here.

Here is my hook script:

#!/bin/bash

# Base URL for the webhook
WEBHOOK_URL="http://192.168.10.199/ping/c515d5cd-c7f0-4f2f-8593-d00d162705d7"

case "$1" in
    job-start)
        # Send a ping to the webhook when the backup starts
        curl -fsS --retry 3 --retry-max-time 120 "${WEBHOOK_URL}/start"
        ;;
    job-end)
        # Determine the exit status
        STATUS=$2

        # Send a ping to the webhook when the backup finishes, appending the status code
        curl -fsS --retry 3 --retry-max-time 120 "${WEBHOOK_URL}/${STATUS}"

        ;;
    *)
        # Other cases can be handled here if needed
        ;;
esac

exit 0

Let's take a quick look at what's happening in the script. We have the WEBHOOK_URL variable set again (remember to set up a new check on Healthchecks). The script then monitors the status of the backup job, specifically looking for "job-start" and "job-end" statuses.

When the script detects a "job-start" status, it sends a start ping to Healthchecks to start the runtime clock for the backup. Once the backup completes and the "job-end" status occurs, another ping is sent to Healthchecks.io. If the backup was successful, the STATUS variable will be 0, indicating success. Any other value indicates a failed job, triggering a notification.
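You can sanity-check this dispatch logic without running a real Proxmox job by stubbing out curl. This sketch reproduces the case statement and just prints the URL that would be pinged (the webhook URL is a placeholder):

```shell
WEBHOOK_URL="http://<HEALTHCHECKS_HOST>:8000/ping/<UUID>"  # placeholder check URL
ping_hc() { echo "would ping: $1"; }                       # stand-in for curl

hook() {
  case "$1" in
    job-start) ping_hc "${WEBHOOK_URL}/start" ;;  # start the runtime clock
    job-end)   ping_hc "${WEBHOOK_URL}/$2" ;;     # report the exit status
  esac
}

hook job-start
hook job-end 0   # success
hook job-end 1   # failure, would trigger a notification
```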

Implementing the Hook Script

The final step is to implement the hook script. Since there's no option to add a hook script directly in the Proxmox UI at the moment, we'll need to SSH into our Proxmox machine and edit the backup job configuration there. This allows us to integrate our custom hook script into the backup process.

Edit this file:

sudo nano /etc/pve/jobs.cfg

If you have multiple backup jobs configured and want to monitor each one with a hook script on Proxmox, you'll have to create specific scripts for each job. This ensures that pings are sent to their designated checks on Healthchecks.io. To implement the hook script, add the following line to the configuration of each job you want to monitor using a hook script:

script /home/<USERNAME>/proxmox_backup_hook.sh
⚠️
If you're using a personal user account to log into the Proxmox UI, you won't be able to manually start a backup job after adding the hook script line; Proxmox will throw an error indicating that only the root user can execute it. To start the backup job manually from the UI, you'll need to log in as root. This ensures you have the necessary permissions to manage and execute backup jobs as needed.
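For orientation, a vzdump job entry in /etc/pve/jobs.cfg might then look roughly like this. The job ID, schedule, storage, and mode shown here are made-up placeholders; the script line is the only part we add:

```
vzdump: backup-a1b2c3d4-e5f6
        schedule daily
        storage local
        mode snapshot
        script /home/<USERNAME>/proxmox-backup-hook.sh
```

Your existing entries will contain different options depending on how the job was configured in the UI; leave those untouched and only append the script line.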

Final Conclusion

Now you're all set! Ensure you've configured backups for everything critical to you, and always maintain two copies (one local and one off-site). Personally, I back up all my Docker volumes without stopping the containers, which carries a risk of file corruption. I schedule these backups at different times and retain multiple iterations so I can revert to a working version if needed. For highly critical data like Paperless, I use the built-in backup procedures as an extra layer of security.

Ideally, we never need these backups, but having multiple fail-safes keeps my mind at ease. As mentioned earlier, I also enjoy tinkering with the environment to explore new possibilities, and I feel confident that I can restore anything I potentially break.

I hope you found this article enjoyable and helpful in setting up a hassle-free, forget-about-it method for running and monitoring your backups.

]]>
<![CDATA[Docker 101]]>https://nerdyarticles.com/docker-101/658be51fa42132000139e235Fri, 31 May 2024 22:29:57 GMTIntroductionDocker 101

Welcome back, fellow self-hosting enthusiasts! In our previous journey together, we delved into the details of my self-hosted setup, took an in-depth look at Paperless, one of the many apps I run, and navigated through the basic configuration of Debian. Building upon that Debian Server foundation, we're now ready to embark on an exciting new chapter: setting up our Docker environment.

Before we plunge into the nuts and bolts of Docker installation, usage, and the magic of Docker Compose, let's take a moment to demystify Docker itself. I remember, about five years ago, spending days trying to wrap my head around Docker's concepts. It was a challenging yet enlightening journey, and with this guide, I aim to streamline your learning process, hopefully making it less daunting and more approachable.

I firmly believe that Docker, or containerization in general, is an exceptional choice for self-hosting and home labs. Self-hosting, for me, isn't just about stepping away from the clutches of big corporations; it's also about the joy of experimentation, tinkering, trying new things, and continuous learning. Docker empowers me to do all this without risking the integrity of other services or the stability of my server setup.

So, let's get ready to dive into the world of Docker, where you'll discover not only a powerful tool for your self-hosting arsenal but also a playground for your tech curiosity.

Docker Demystified: Understanding Containerization vs. Virtualization

Containers have significantly transformed the IT world in recent years and they are a superb choice for self-hosting endeavors. But what exactly do containers do? To understand this, it's helpful to start by looking at traditional virtualization.

Virtualization

Imagine you have a Debian server, and it's operating as a virtual machine on your host system (the physical server). In this setup, your virtual machine creates its own isolated area, complete with its own operating system, all while utilizing the underlying hardware of your host system. If we wanted to run 15 different applications in isolation from each other, we could create 15 different VMs.

Virtualization certainly has its perks. It allows you to run multiple, different operating systems on the same host system, which can be invaluable in certain scenarios. However, if your needs mostly revolve around running the same operating system for various applications, this is where the traditional approach to virtualization shows its inefficiency. You're essentially duplicating the same operating system multiple times, creating a significant amount of overhead.

Indeed, we could manage to run multiple applications on a single OS, ensuring they are as isolated as possible, but this would require considerable effort and a solid understanding of Linux. Plus, let's not forget, self-hosting for me is as much about exploration as it is about practicality. My testing often involves running something in a production-like environment for a while to assess its functionality, blurring the lines between tinkering and actual operation.

Containerization

To effectively utilize containers, an operating system is a prerequisite. In my setup, I use Debian, which itself operates as a virtual machine on my host system. To run containers within Debian, Docker is installed as the runtime environment. Unlike traditional virtual machines that require their own full-fledged operating systems, Docker containers share the underlying operating system of our Debian server. This shared use of the host OS is a key difference. While a VM encapsulates a complete OS, Docker containers only encapsulate the application and its dependencies, making them significantly more lightweight and efficient. This approach maximizes resource utilization and allows for a more streamlined and scalable deployment of applications.

The following image from Docker.com aims to visually represent this difference:

[Image: Containerized applications vs. virtual machines]
Source: docker.com

Pros & Cons of Docker for self-hosting

Docker is a great choice for self-hosting, offering a range of benefits that we'll explore. However, it's equally important to be aware of the potential downsides. Let's examine the pros and cons of using Docker in a self-hosting environment.

Pros

  • Isolation: Containers operate in a way that keeps each application separate from the rest of your server. This allows you to run numerous applications on the same machine while preventing them from interfering with each other. Additionally, it becomes incredibly simple to remove an application you no longer need, leaving no residual impact on your system.
  • Resource Efficiency: Containers utilize the kernel of the host system, making them more efficient and less resource-intensive than virtual machines. This enhanced efficiency leads to a more effective use of server resources, an advantage especially useful in self-hosting setups where resources might be limited.
  • Security: Due to their isolated nature, Docker containers provide enhanced security against malicious applications, as they have restricted access to the host system, limiting potential harm.
  • Portability: One of the standout features of containers is their portability. They are easily transferable and can be restored across different operating systems. This flexibility allows for effortless migration to different systems with minimal hassle in terms of configuration adjustments. As long as the target system supports the Docker environment, you can shift your containers to it, although some minor changes might still be necessary (like file system path).
    Moreover, Docker Compose helps by documenting your configuration directly in the compose file, making management and re-deployment simpler.
  • Community: A key factor in my continued use of Docker has been its vast and active community. Whenever I encounter problems, solutions are typically just a Google search away, thanks to the extensive documentation and user forums. For someone who isn't an expert, this level of community support and resource availability has been crucial.

Cons

  • Security: Docker and containerization don't address all security concerns. Ensuring tight access controls, safeguarding sensitive data, and using trusted container images are essential. Surprisingly, Docker manages its own firewall rules and bypasses system-level firewalls like UFW. To maintain security, you may need to implement firewall rules at a higher level, outside the Docker host.
  • Complexity: The learning curve for Docker can be steep. It requires understanding several distinct concepts before you can use it effectively. In this guide, I aim to introduce you to the basics, hoping to significantly ease this learning process.
  • Persistent Data Management: One aspect that often requires careful consideration is persistent data management. Containers are ephemeral, meaning any data written inside them is lost when the container is removed or recreated. To address this, we map so-called volumes from our host system into the container. This allows us to store configurations and files and share data between the host and the container. We'll delve deeper into this topic shortly.
  • Networking Complexity: Networking in Docker adds another layer of complexity that requires a solid understanding. By default, Docker creates its own network, and containers within it are not accessible from outside that network. To make containers accessible externally, additional configuration is necessary. Moreover, Docker offers various network types, each with its own pros and cons. I'll explain my choice later in the article.
  • Resource Overhead: Although I highlighted resource efficiency as a pro, it's essential to note that Docker isn't entirely free from resource overhead. While it's more efficient than traditional virtualization, some level of overhead is still present. In resource-constrained environments or for resource-intensive applications, this overhead can potentially impact performance. Furthermore, certain workloads, especially those with high I/O demands, might not achieve optimal performance in containerized environments.

Conclusion

Docker proves to be an excellent choice for self-hosting, provided you can navigate its complexities. In my experience, the effort has been entirely worthwhile. The foundational system setup I've created hasn't changed in over five years. Docker allows me to effortlessly introduce new applications, expose them to the internet with just a line or two of configuration, revert them to internal use, or remove them entirely without leaving system configuration clutter behind. However, it's essential to remember that while Docker can handle a wide array of tasks, it's not a universal solution. It excels in many scenarios, but like any tool, it has its limitations.

Docker basics

In this section, we'll unravel the mysteries of Docker, laying down the essential groundwork that will empower you to master Docker and unlock its full potential for your self-hosting adventures.

Images

At the core of Docker lies the concept of images, which serve as the foundation for your containers. These images encapsulate all the essential information, files, and often configurations required for an application to operate seamlessly. You can either build your own custom images (although I won't cover that in this article) or effortlessly download pre-configured ones from repositories. While Docker Hub is the most renowned repository, alternatives like GitHub (with image URLs starting with ghcr.io) and linuxserver.io (with URLs starting with lscr.io) offer a wealth of options. If you don't plan to upload your own images, there's usually no additional configuration needed, with Docker Hub as the default choice. Images come with versioning; there is no enforced standard, but most follow a common convention. Understanding this convention is crucial, as it helps determine the scope of changes with each update. Let's look at a version, e.g., 5.2.3:

  • Major version: The first number (e.g., 5) signifies significant changes, possibly including backward-incompatible alterations or major feature additions. Upgrading the major version might entail additional steps and breaking changes, so it makes sense to check the release notes prior to an upgrade.
  • Minor version: The second number (e.g., 2) introduces backward-compatible feature additions, improvements, or enhancements. It typically expands on existing features without breaking compatibility.
  • Patch version: The third number (e.g., 3) indicates backward-compatible bug fixes, corrections, or minor enhancements. Incrementing the patch version means resolving issues and enhancing the image without introducing new features or breaking changes.

Understanding these versioning principles ensures you choose the right image for your needs and maintain compatibility with your applications.

In addition to versioning, you'll often come across a tag called latest within Docker repositories. When you don't specify a particular image version, Docker defaults to latest. I personally find it beneficial to still write latest explicitly (when I don't use a specific version) as a convenient way to identify and reference images in my configuration. The use of latest streamlines management, making it easier to work with images without constantly specifying version numbers. However, it's essential to exercise caution when relying on latest, as it may not always align with your specific requirements or ensure version consistency across your environment.

In my Docker setup, I often use the latest tag for applications. However, I take a more cautious approach with critical elements like databases, my authentication service, and my reverse proxy. These vital components are assigned specific version tags to ensure stability and compatibility, particularly during updates when I might not be available to address issues promptly. My choice depends on whether I can risk potential disruptions or compatibility problems when updates occur.

It's important to note that during updates, developers typically update not only the specific release tag but also major and minor version tags. For example, if you're using version 5.2.3, it won't auto-update to 5.2.4. However, containers with "latest," "5," or "5.2" tags will receive the update. This flexible tagging approach allows users to choose the level of version granularity that suits their needs.
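This tagging behavior can be sketched in a few lines of shell. This only models the convention described above; the registry does the real retagging, and the release number 5.2.4 is hypothetical:

```shell
# Illustrative model: given a new release, report which pinned tags would move.
# "latest", the major tag, and the minor tag follow the release; a fully
# pinned patch tag stays where it is.
tags_after_release() {  # $1 = new release version, e.g. 5.2.4
  for tag in latest 5 5.2 5.2.3; do
    case "$1" in
      "$tag" | "$tag".*) echo "$tag -> now points at $1" ;;
      *) if [ "$tag" = "latest" ]; then
           echo "latest -> now points at $1"
         else
           echo "$tag -> unchanged"
         fi ;;
    esac
  done
}

tags_after_release 5.2.4
```

Running this shows latest, 5, and 5.2 all moving to the new release, while the fully pinned 5.2.3 stays unchanged, which is exactly why pinning the full version protects critical services from surprise updates.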

Additionally, when searching for Docker images in a repository, I prioritize those that offer an alpine version. Alpine images are built on Alpine Linux, known for their minimalistic design, security focus, and resource efficiency. These images are typically smaller in size and consume fewer resources, making them an excellent choice to maintain a lean and efficient Docker environment.

When inspecting a new container, my first step is to examine its tagging strategy. After doing so a few times, it becomes a quick habit that takes just a minute or two.

Networking

Instead of relying on Docker's default network, I prefer to create custom networks for my containers. This practice grants me greater control and enables stricter isolation between containers. Let us have a look at the different options we have to choose from:

  1. Bridge Network: Docker's default network mode that creates an internal private network, allowing containers to communicate with each other using internal IP addresses. This is what I mainly use.
  2. Host Network: Containers share the host's network stack, making them accessible directly on the host's network (most likely your home network). This mode offers optimal network performance but may limit isolation.
  3. None Network: Isolates containers from external networks completely, useful for security and testing purposes.
  4. Overlay Network: Used in Docker Swarm, it facilitates communication between containers across different hosts, creating a virtual network that spans multiple nodes. (not covered in this article)
  5. Macvlan Network: Provides containers with a unique MAC address, allowing them to behave like physical devices on the network, useful for scenarios requiring direct network access. (not covered in this article)
  6. IPvlan Network: Similar to Macvlan, it allows containers to connect directly to physical networks, but with different characteristics suited to specific use cases. (not covered in this article)

The bridge network is the most commonly used network mode in Docker. What's important to understand is that you can create multiple bridge networks to achieve more isolation. Containers can be assigned to one or more of these networks, allowing them to communicate across different networks. We'll delve into setting up specific networks later when we cover Docker Compose.
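Compose will create our networks for us later in this article, but if you want to experiment by hand first, user-defined bridge networks can also be created directly from the CLI:

```shell
sudo docker network create --driver bridge frontend
sudo docker network create --driver bridge backend
sudo docker network ls    # the new networks appear alongside the defaults
```

Containers attached to the same user-defined bridge can reach each other by container name, which is what the Compose setup below relies on.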

The host network mode places your container directly on the same network as your host system. Containers using this mode don't receive separate IP addresses; they share the host's network stack. This setup is valuable for scenarios where you want containers to be easily discoverable on the network, such as in smart home setups (e.g., Home Assistant) or media servers (e.g., Jellyfin).

The none network mode serves a valuable purpose in Docker when dealing with applications that don't require network access. In cases where a container performs periodic calculations and writes results directly to disk without the need for external network communication, the "none" network mode is a good choice. While I don't currently have such a scenario in my homelab, it's a useful option for specific use cases that prioritize isolation from the network.

Persistent storage

It's vital to grasp that containers are inherently ephemeral: they start from only the information shipped with their image. Any configurations, uploads, changes, or data created within a container will be lost when the container is removed or recreated unless we implement proper persistence mechanisms. This aspect is crucial to understand in Docker.

In our exploration of achieving persistence, we'll primarily focus on bind mounts. This technique involves mounting a folder or file from your server's directory directly into the container, ensuring that data remains intact across container restarts and updates.
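As a minimal illustration of the syntax (the image name and paths here are placeholders), a bind mount in plain docker run form is written host-path:container-path:

```shell
# Left of the colon: path on the host. Right of the colon: path inside the container.
sudo docker run -d -v /server/docker/myapp/config:/config my-image:latest
```

The same host:container pattern reappears in the volumes section of the Compose files later in this article.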

Additionally, it's worth noting that using named volumes provides an even more versatile option. With named volumes, it becomes possible to directly mount an NFS share from your NAS into the container, offering seamless integration with external storage solutions. But this will be for another article.

Setup Docker & Docker Compose

Before we dive deeper into the details of Docker and Docker Compose, let's ensure your server environment is ready to go. In this setup phase, we'll prepare your server to execute and experiment with various Docker commands and code snippets. Whether you're using a physical server, a virtual machine, or a cloud instance, having Docker installed and configured correctly is essential for the next steps. So, let's get your server up to speed and ready to explore the world of Docker.

Installing Docker

Setting up Docker is relatively straightforward. We'll be adhering to the official guidelines provided in the Docker documentation. For your convenience, I'll also outline the essential steps right here in our guide.

Assuming you've followed my Debian guide, we should already be starting from a clean state. However, to ensure everything is set up perfectly, let's take a moment to remove any old or unnecessary components before proceeding with the Docker setup:

for pkg in docker.io docker-doc docker-compose podman-docker containerd runc; do sudo apt-get remove $pkg; done

The next phase involves a few equally important tasks: updating the system, installing essential packages, and adding the appropriate repository to our system. Each of these steps plays a vital role in setting a solid foundation for Docker installation:

sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

It's now time to install the Docker Engine:

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

The official documentation suggests adding your user to the Docker group. However, I recommend against this. Instead, always use 'sudo' for your Docker commands. This maintains an important layer of security on your system.

Let's configure Docker to automatically start on system reboot, ensuring consistent availability of the service.

sudo systemctl enable docker.service
sudo systemctl enable containerd.service

To prevent issues with oversized logs, which I've encountered before, it's important to set up log rotation. To do this, create or modify the file /etc/docker/daemon.json and add the necessary configurations. This will help manage log sizes efficiently.

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}

By modifying the Docker configuration, logs will be rotated once they reach 100 MB, and at most three files are kept (the active log plus the two most recent rotated ones), avoiding the unlimited log growth of the default setup.

sudo systemctl restart docker.service
sudo systemctl restart containerd.service

Setting up Docker Compose

We can start Docker containers with the docker run command, but there's a big downside: it's just a command, so we need to remember (or document) everything we've done. Containers often need a lot of configuration, and all of it has to be packed into that one command.
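To make that downside concrete, here is roughly what starting the Paperless container we configure later would look like as a single docker run command. This is a sketch to show the pain point, not a command to copy; every port, network, volume, and variable has to be remembered and retyped each time:

```shell
sudo docker run -d \
  --name paperless \
  --network frontend \
  -p 8080:8000 \
  --security-opt no-new-privileges:true \
  -v /server/docker/paperless/paperless/data:/usr/src/paperless/data \
  -e PAPERLESS_TIME_ZONE=Europe/Berlin \
  ghcr.io/paperless-ngx/paperless-ngx:latest
```

And this is still incomplete: the real setup needs several more volumes and environment variables, plus the database and Redis containers started in the right order by hand.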

Docker Compose, on the other hand, is the complete documentation of our entire environment. From network configuration, secrets, volumes, and all the details of our applications, it's all in the Docker Compose file(s). Let's install it first!

sudo apt install docker-compose-plugin

To make sure everything is installed correctly, we can check it with:

sudo docker compose version

You should see a response with the Docker Compose version.

Creating our Docker Environment

Now, let's start setting up our Docker environment. We'll go through the configuration step by step.

File Creation

First, we need to create the necessary files. We'll start with the .env file, which stores key information that we'll reuse across containers or keep out of our Compose file for security reasons. Remember, the .env file should never be checked into a repository and must be kept safe.

Next, we'll create the Docker Compose file, which we'll configure shortly. Personally, I like to create a directory /server/docker where I store my environment and persistent data.

touch .env
touch docker-compose.yml # you can also just name it compose.yml

One more hint: remember that YAML (YML) files are very sensitive to spaces. The indentation of the different sections needs to be perfect.

.env file

Think of the .env file as your app's cheat sheet—a plain ol' text file with some serious mojo! It's where we stash all the goodies: settings, passwords, and other juicy bits. Why ".env"? It's short for environment, because it stores your environment variables.

## General
DOCKERDIR=/server/docker
PUID=1000
PGID=1000
TZ=Europe/Berlin

Let's start with the basics:

  • DOCKERDIR: This is now set to my standard path, /server/docker. Simple and to the point.
  • PUID: Stores the ID of my user on the Linux server.
  • PGID: Holds the group ID of my user group on the Linux server.
  • TZ: Defines the time zone I want to use.

One of the beauties here is the flexibility. If I decide to relocate my stack to a different path or adjust those IDs, it's a breeze—simply update the .env file! There's no need to delve into each of those potential 50 containers.

Just choose any name you like for the variable on the left, add an equals sign, and then the value. Need to handle special characters, like '#'? Simply enclose the value in single quotes, like this: '#MyValue'.

In your Docker Compose file, you can reference all the variables you've defined in the .env file. Simply type a dollar sign $ followed by the exact name of the variable (the left side of the equals sign).
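For example, with the .env file from above, a fragment of a service definition could pull in those values like this (myapp is a placeholder path):

```yaml
    environment:
      USERMAP_UID: $PUID          # plain form
      PAPERLESS_TIME_ZONE: ${TZ}  # braced form, safer when other text follows the name
    volumes:
      - $DOCKERDIR/myapp/config:/config
```

Docker Compose substitutes the values at startup, so the compose file itself stays free of machine-specific paths and IDs.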

Networks in Docker Compose

First order of business: setting up our docker networks. Fire up sudo docker network ls and voila! You'll notice there are networks already buzzing around. Now, I'm all about crafting my own networks rather than riding on the default ones. Why? Because I like being crystal clear about what I'm using and keeping the reins tight. Let's dive in!

networks:
  frontend:
    name: frontend
    driver: bridge
  backend:
    name: backend
    driver: bridge

I'm whipping up two networks here. First up, we've got "frontend," a bridge network. Then there's "backend," also a bridge network. Here's my rule of thumb: anything that has a UI gets on the frontend network. Meanwhile, all those behind-the-scenes supporting services? They're on the backend network. From a security standpoint, this setup won't exactly turn Fort Knox green with envy. If you're serious about security, you'll wanna go the extra mile, creating a network for each service and its entourage to max out that isolation level.

Creating our Services

Let's begin setting up some services, focusing on the Docker aspects using details from the Paperless article. If you need specific configuration details for the Paperless variables, feel free to refer to the article for more clarity.

services:
  service_name:
    container_name: ...
    image: ...
    restart: ...
    networks:
      ...
    ports:
      ...
    security_opt:
      - no-new-privileges:true
    depends_on:
      ...
    volumes:
      ...
    environment:
      ...

Let's kick off setting up our services by introducing the services section. Then, indented by two spaces (YAML does not allow tabs for indentation), we provide a name for our service. Further indented by the same amount, we configure our service following a consistent pattern:

  • container_name: Something easily recognizable to identify the container.
  • image: Specifies the image we want to use.
  • restart: Determines the restart policy (I always opt for 'unless-stopped').
  • networks: Lists the networks the container should have access to.
  • ports: If the container needs to be accessible from outside the Docker network, we map a port from our host machine to the container's port.
  • security_opt: I always include the setting no-new-privileges:true for security. It prevents processes in the container from gaining new privileges after they start. This reduces the attack surface and limits the impact of security vulnerabilities. Only change this setting if you fully understand the implications and trust the source.
  • depends_on: Specifies containers, like databases, that this container relies on. If not applicable, delete this section.
  • volumes: Indicates persistent volumes, if needed, by mapping the corresponding folder from our host to the container.
  • environment: The crucial configuration of the container. We need to consult the container's documentation to understand which variables can be used to configure the container upon startup.

Let's examine an example configuration for the Paperless containers.

services:
  paperless:
    container_name: paperless
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped

We're naming our service 'paperless' and the container_name will also be 'paperless'. We're fetching the image from 'ghcr.io/paperless-ngx/paperless-ngx:latest', utilizing the GitHub repository for this purpose. As for the restart policy, it'll stick to my default 'unless-stopped' setting.

    networks:
      - frontend
      - backend
    ports:
      - 8080:8000
    security_opt:
      - no-new-privileges:true
    depends_on:
      - paperless-redis
      - paperless-postgres

The Paperless container will be connected to the 'frontend' and 'backend' networks for access. Externally, it will be reachable on port 8080, which is mapped to its internal port 8000. We've added a security option to limit new privileges. Furthermore, we've noted dependencies on PostgreSQL and Redis services, which we'll configure in the following steps.

    volumes:
        # Docker container data
      - $DOCKERDIR/paperless/paperless/data:/usr/src/paperless/data
        # Location of your documents
      - $DOCKERDIR/paperless/paperless/media:/usr/src/paperless/media
        # Target for backups
      - $DOCKERDIR/paperless/paperless/export:/usr/src/paperless/export
        # Watch folder
      - $DOCKERDIR/paperless/paperless/consume:/usr/src/paperless/consume
    environment:
      USERMAP_UID: $PUID
      USERMAP_GID: $PGID
      PAPERLESS_TIME_ZONE: $TZ
      ...

When mapping volumes in Docker Compose, the left side represents the path on the host machine, and the right side represents the path within the container. The $DOCKERDIR variable stands for /server/docker, making the configuration more flexible. You can technically map the root directory of the container /, but it’s generally better to map specific directories to maintain clarity and security.

Ensure you check the container documentation to understand where the data you want to persist is located. If the specified host path does not exist, Docker will create it. Similarly, Docker will create the specified directory within the container if it doesn’t exist.

Additionally, environment variables defined in our .env file are referenced in the docker-compose.yml (or your compose.yml) file, keeping configurations clean and easily manageable. You can see here the first few environment variables of Paperless-ngx being set.

Now, let's take a quick look at the other two services we need to run our Paperless-ngx instance.

  paperless-postgres:
    container_name: paperless-postgres
    image: postgres:16.0-alpine #fixedVersion
    restart: unless-stopped
    networks:
      - backend
    security_opt:
      - no-new-privileges:true
    volumes:
      - $DOCKERDIR/paperless/postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: DBUSERNAME
      POSTGRES_DB: DBNAME
      POSTGRES_PASSWORD: $DBPASSWORD
  
  paperless-redis:
    container_name: paperless-redis
    image: redis:7.2-alpine #fixedVersion
    restart: unless-stopped
    networks:
      - backend
    security_opt:
      - no-new-privileges:true
    volumes:
      - $DOCKERDIR/paperless/redis:/data
    environment:
      REDIS_ARGS: "--save 60 10"

You can see the same pattern here again, with details differing according to the service we need.

For your benefit, here is the full setup. However, please note that the database parameters and other details still need to be adjusted to your specific needs.

networks:
  frontend:
    name: frontend
    driver: bridge
  backend:
    name: backend
    driver: bridge

services:
  paperless:
    container_name: paperless
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    networks:
      - frontend
      - backend
    ports:
      - 8080:8000
    security_opt:
      - no-new-privileges:true
    depends_on:
      - paperless-redis
      - paperless-postgres
    volumes:
        # Docker container data
      - $DOCKERDIR/paperless/paperless/data:/usr/src/paperless/data
        # Location of your documents
      - $DOCKERDIR/paperless/paperless/media:/usr/src/paperless/media
        # Target for backups
      - $DOCKERDIR/paperless/paperless/export:/usr/src/paperless/export
        # Watch folder
      - $DOCKERDIR/paperless/paperless/consume:/usr/src/paperless/consume
    environment:
      USERMAP_UID: $PUID
      USERMAP_GID: $PGID
      PAPERLESS_TIME_ZONE: $TZ
      PAPERLESS_OCR_LANGUAGE: deu+eng
      PAPERLESS_ENABLE_UPDATE_CHECK: "true"
      PAPERLESS_REDIS: redis://paperless-redis:6379
      PAPERLESS_DBHOST: paperless-postgres
      PAPERLESS_DBNAME: DBNAME
      PAPERLESS_DBUSER: DBUSERNAME
      PAPERLESS_DBPASS: $PAPERLESS_DBPASS # alternatively use PAPERLESS_DBPASS_FILE with Docker secrets
      PAPERLESS_SECRET_KEY: $PAPERLESS_SECRET_KEY
      PAPERLESS_FILENAME_FORMAT: "{created_year}/{correspondent}/{created} {title}"
      PAPERLESS_URL: "<YOUR PAPERLESS URL>"
      PAPERLESS_ALLOWED_HOSTS: PAPERLESS_URL 
      PAPERLESS_ADMIN_USER: "<ADMIN USER>" # only needed on first run
      PAPERLESS_ADMIN_PASSWORD: "<ADMIN PASSWORD>" # only needed on first run
  
  paperless-postgres:
    container_name: paperless-postgres
    image: postgres:16.0-alpine #fixedVersion
    restart: unless-stopped
    networks:
      - backend
    security_opt:
      - no-new-privileges:true
    volumes:
      - $DOCKERDIR/paperless/postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: DBUSERNAME
      POSTGRES_DB: DBNAME
      POSTGRES_PASSWORD: $PasswordFromEnvFile
  
  paperless-redis:
    container_name: paperless-redis
    image: redis:7.2-alpine #fixedVersion
    restart: unless-stopped
    networks:
      - backend
    security_opt:
      - no-new-privileges:true
    volumes:
      - $DOCKERDIR/paperless/redis:/data
    environment:
      REDIS_ARGS: "--save 60 10"

Docker / Docker Compose Commands

Last but not least, let's review the most important Docker commands, and I'll share some tips. I also encourage everyone to not be afraid to explore the help section of a new command. This is a common practice for me, and I learn a lot by doing so. However, don't get me wrong—I don't always understand everything I see. Most Linux commands have a help subcommand or a -h/--help option you can append to the command to get more details.

Firstly, we can utilize the docker ps command to display all our containers along with information such as uptime, the image used, status, etc. I always add the -a option to also view stopped containers.

sudo docker ps -a

When we are in the same directory as our Docker Compose file, we can use the docker compose command directly. You can also use it from elsewhere, but then you need to specify the file to be used with the -f option. With docker compose up we can have everything created or started. This includes networks, volumes, ... everything that is defined in your compose file.

If you run the command after adjusting one service or adding a new one, only the changed part or added part will be (re-) created. The rest will be untouched. You will want to add -d after the command to run in detached mode. In the Linux console, detached mode means the command runs in the background, allowing you to continue using the terminal for other tasks.

One more addition will be --remove-orphans. This ensures that anything removed from our Docker Compose file, such as a service or a network, is also removed from our environment.

sudo docker compose up -d --remove-orphans

Tip: If you've added or changed multiple services in your compose file but only want to recreate specific ones, just list their names at the end of the command. Use the service names defined in your compose file.

sudo docker compose up -d service1 service2 ...

Another important command to know is the docker compose down command. This stops all services and removes everything you've created. If you are not using bind mounts as we have in this guide and have decided to create volumes instead, you will need to add --volumes to remove them as well. Although I rarely use this command without specifying a specific service, it can be very helpful.

sudo docker compose down

Tip: We can also specify the services here, just as we did before. This allows you to stop and remove only the specified services.

sudo docker compose down service1 service2 ...

When it comes to updating our environment (by updating I mean getting the latest images), we can use the docker compose pull command. It's crucial to understand image versioning and what you have set in your compose file. With docker compose pull, Docker checks every referenced image for a new version and downloads it if available (it only downloads; running containers are not updated).

Personally, I prefer not to use services like Watchtower, which periodically check for updates and restart services to use the newest image. I have encountered bugs in new images, prompting me to revert to an older version and wait for the bug to be fixed. I don't want this to happen when I'm not available to address issues. I prefer to handle updates manually, ensuring I have time to investigate and resolve any problems.

sudo docker compose pull

Alright, now that we've downloaded new images using docker compose pull, let's apply them to our Docker environment using the command we're already familiar with: docker compose up -d --remove-orphans.

Last but not least, once we've updated our service, we want to tidy up the system afterwards and delete anything that's no longer needed. This could include old images, containers, etc. For this task, we'll use docker system prune.

I always add -af to the command. a stands for all, which removes all unused images rather than just dangling ones, and f stands for force, which disables the prompt that would otherwise ask for confirmation before deletion.

sudo docker system prune -af

All right, we now know the most important commands we need while working with Docker and Docker Compose. When I want to run an update, I use the following commands in this sequence to update the system. I could combine them or even put them into a shell script, but I prefer to see what is happening so I can investigate issues more quickly.

sudo docker compose pull
sudo docker compose up -d --remove-orphans
sudo docker system prune -af
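As mentioned, the sequence above could be wrapped in a shell script. Here's a hypothetical sketch of such a script; the DRY_RUN guard (my addition, not part of the article's workflow) defaults to printing the commands so you can review them before letting the script actually touch your environment:

```shell
#!/usr/bin/env bash
# Hypothetical update helper wrapping the three commands above.
# DRY_RUN defaults to 1 (print only); run with DRY_RUN=0 to actually execute.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: sudo $*"
  else
    sudo "$@"
  fi
}

run docker compose pull
run docker compose up -d --remove-orphans
run docker system prune -af
```

Run it once in dry-run mode first; if the printed commands look right, `DRY_RUN=0 ./update.sh` performs the real update.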

Conclusion

In conclusion, Docker proves to be an exceptional tool for self-hosting, offering a myriad of benefits for enthusiasts and professionals alike. While Docker simplifies the deployment and management of applications, it's essential to acknowledge and navigate its complexities effectively. Through diligent exploration and understanding, users can harness Docker's power to build robust, efficient, and scalable self-hosting environments.

My journey with Docker has been marked by continuous learning, experimentation, and growth. Leveraging Docker, I've maintained a stable and versatile server setup for over five years, seamlessly integrating new applications and services into my environment. Docker's ability to encapsulate applications and their dependencies in lightweight containers has revolutionized the way I approach self-hosting, empowering me to explore new technologies and expand my skill set.

However, it's crucial to recognize that Docker isn't a one-size-fits-all solution. While it excels in many scenarios, it's not without its limitations. Users must carefully evaluate their requirements and consider factors such as security, complexity, and resource management when leveraging Docker for self-hosting projects.

Despite its challenges, Docker offers unparalleled flexibility, efficiency, and scalability, making it an indispensable tool for modern self-hosting enthusiasts. By embracing Docker and mastering its intricacies, users can unlock a world of possibilities for their self-hosting adventures, propelling them towards new heights of innovation and productivity.

]]>
<![CDATA[Debian Server Essentials: Setup, Configure, and Hardening Your System]]>https://nerdyarticles.com/debian-server-essentials-setup-configure-and-hardening-your-system/6530279779ed7f0001cc455fThu, 19 Oct 2023 20:43:32 GMTIntroductionDebian Server Essentials: Setup, Configure, and Hardening Your System

Today, we're delving deep into my standard Debian setup. Debian has become my go-to choice for various server projects. I rely on Debian across the board, whether it's running on my Raspberry Pi, inside my LXC containers, or powering my virtual machines (VMs). In fact, even the hypervisor I use, Proxmox, is built on the Debian foundation.

As I'm not a Proxmox expert by any means, I'll steer clear of the hypervisor setup, but that is quite straightforward anyway. I will walk you through my step-by-step process for configuring a fresh Debian server. This methodology applies universally, be it a VM, Raspberry Pi, or LXC container.

I want to make it clear that I'm not a Linux expert or administrator; everything I've learned has been self-taught. The same applies to security practices.

If you happen to identify any issues or have valuable tips and advice, I welcome your input. I'm continually seeking ways to improve, and I'll maintain a change log at the bottom of this article to track any updates and enhancements.

Why Debian Wins my Heart and Server

I had been using Ubuntu as my primary operating system until the beginning of 2023. However, at that point, I encountered some performance issues within my setup that proved quite challenging to debug and identify the culprit. Faced with this predicament, I made the decision to overhaul my system and take the opportunity to explore Debian. I had heard numerous positive remarks about Debian and had come across articles suggesting that it could potentially provide a slight boost in performance, which piqued my interest.

I've always had a penchant for exploring new horizons and technologies. Considering that I'm not a Linux expert, I viewed this transition to Debian as a significant change that offered the perfect chance to learn something new. The allure of mastering a different Linux distribution and broadening my skill set was too enticing to pass up.

Since the complete rebuild of my homelab on Debian and parting ways with Ubuntu, I have experienced no issues and, perhaps subjectively, perceived a bump in performance. However, it's important to be entirely honest here—I lack concrete data to substantiate this perception. After all, running a homelab for several years, constantly tinkering and experimenting, may have introduced some issues. It's entirely plausible that a fresh start on Ubuntu might have yielded similar results. But the allure of Debian's reputation and the joy of embarking on a new learning journey were more than enough to justify the switch.

Embark on Your Debian Journey: A Step-by-Step Installation Guide

To kick off your Debian journey, start by obtaining the official Debian installer. You can either create an installation USB stick for your physical server or load the image directly into Proxmox for your virtualization environment.

Debian Installer

Installing Debian via the Internet

Debian Installer (RPi)

Tested images
☝️
This guide is centered around a Proxmox VM setup. Please note that there may be differences or additional steps required when setting up on a Raspberry Pi or LXC container.

Debian Installation

For those who may be unsure about the Debian installation process, I've documented my personal approach with screenshots and compiled it into a helpful video guide. Feel free to adapt the steps to your own needs and preferences. As a personal practice, I start with a straightforward, easily memorable password during the installation, which I then replace with a long, randomly generated password once the system is up and running. This adds an extra layer of security to the setup while keeping the initial installation process user-friendly.

Note as well that the root password is left empty on purpose. This will install sudo and grant our personal user sudo privileges.


Now that we've completed the initial setup, we'll transition to using SSH for all subsequent steps. To begin, you'll need to determine your server's IP address. If you have a DNS (Domain Name System) like I do, you can create a DNS record that points to the server's IP address. This not only simplifies access but also provides a more user-friendly and reliable way to connect to your Debian server.

If you're using a Windows PC and have set up WSL 2 (Windows Subsystem for Linux 2) with a Debian environment, you're in luck. This environment can serve as a powerful tool for managing and connecting to your Debian servers. You can seamlessly use your WSL Debian environment to SSH into your servers, simplifying the process and enhancing your workflow. With this setup, you'll have the best of both worlds, harnessing the flexibility and power of Linux without leaving your Windows environment behind.

First Steps to Server Bliss: Connecting and Initial Configuration

Let's proceed to the next steps to configure the basics of our new server and bolster its security.

Connecting via SSH

As our initial action, we're going to create an SSH key for secure and efficient remote access. This cryptographic key pair enhances both convenience and security. If you've already generated a private key, you can skip this step and proceed to the next stage of our server configuration.

If you need backward compatibility (pre-OpenSSH 6.5), use an RSA key. Otherwise, go for an Ed25519 key, which offers the same (if not better) security and is more efficient.

ssh-keygen -t ed25519 -a 100

The -a 100 option increases the number of key derivation function (KDF) rounds used to protect the private key, making a passphrase harder to brute-force at the cost of slightly slower key generation and unlocking.

For the RSA fallback:

ssh-keygen -t rsa -b 4096

When creating your SSH key, you'll notice that most fields are optional, and you can leave them blank to use the standard values. However, it's crucial to note that by doing so, your private key won't be secured with a password. While skipping the password might seem more convenient, it's also less secure. If you choose this option, it's imperative to ensure the physical and digital safety of your private key file. In the event that the key is ever lost or compromised, you should promptly revoke all access granted to that specific key to maintain the integrity of your server's security.
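If you skipped the passphrase and change your mind later, ssh-keygen can add or change it in place with its -p mode. A quick sketch using a throwaway key under /tmp (your real key would live in ~/.ssh/id_ed25519; the passphrases here are placeholders):

```shell
# Demo with a throwaway key so we don't touch any real keys.
rm -f /tmp/demo_ed25519 /tmp/demo_ed25519.pub

# Generate a key protected with a passphrase (-N), non-interactively (-q).
ssh-keygen -t ed25519 -a 100 -N 'first-pass' -f /tmp/demo_ed25519 -q

# Later, change (or add) the passphrase: -p mode, -P old passphrase, -N new one.
ssh-keygen -p -f /tmp/demo_ed25519 -P 'first-pass' -N 'second-pass' -q
```

The same -p invocation works for adding a passphrase to a key created without one (use -P '' in that case).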

Now it is time to copy the public key onto the server. I made a DNS entry for my server under 'nerdyarticles'. You can use your server name or IP address instead.

ssh-copy-id <YOUR USER>@nerdyarticles

Now we can connect using our private key.

ssh <YOUR USER>@nerdyarticles

Disable root

Our next priority is to enhance server security by prohibiting direct root login. This added layer of protection ensures that the root user won't have the ability to log in directly, which is a recommended practice to bolster your server's defenses.

This step is straightforward and effective. To prevent root login, we'll change the shell that root uses upon login to one that simply refuses logins. On Debian, the nologin binary lives at '/usr/sbin/nologin'.

sudo nano /etc/passwd
root:x:0:0:root:/root:/usr/sbin/nologin

Change Passwords

To elevate our server's security, it's time to change the initial, easy-to-remember password we used for convenience during setup. I personally rely on a password manager to generate and securely store long, randomly generated passwords. This practice not only enhances security but also simplifies password management.

passwd

Secure SSH

Now, it's time to reinforce the security of our SSH connection. We'll implement a set of configurations that I've gathered through online research, intended to enhance the security of your SSH setup.

sudo nano /etc/ssh/sshd_config

Let's delve into the configuration file and make the necessary changes or additions to the parameters. I've arranged them in the order they should appear for your convenience

Protocol 2
Port XXXX
PermitRootLogin no  
MaxAuthTries 3  
PubkeyAuthentication yes 
PasswordAuthentication no   
PermitEmptyPasswords no  
X11Forwarding no  
ClientAliveInterval 180  
ClientAliveCountMax 3  

Restart the service

sudo systemctl restart sshd

Let us try the connection.

exit
ssh <YOUR USER>@nerdyarticles -p XXXX

Disable IPv6

While there are clear advantages to having IPv6 enabled, I've chosen to disable it for my setup. In the future, I may explore this topic more deeply. It's worth noting that there are differing opinions on whether to leave it enabled, so I encourage you to research and decide based on your unique use case.

sudo nano /etc/sysctl.conf

Add the following to the bottom of the config file.

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
net.ipv6.conf.tun0.disable_ipv6 = 1

To put these changes into action and have them take full effect, a reboot is in order.

sudo reboot now
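After the reboot, you can verify that the settings took effect by reading the flags back from /proc. A quick check (1 means IPv6 is disabled for that scope; the tun0 entry only exists if you actually have such an interface):

```shell
# Print the disable_ipv6 flags; 1 = disabled, 0 = still enabled.
for scope in all default lo; do
  path="/proc/sys/net/ipv6/conf/$scope/disable_ipv6"
  if [ -r "$path" ]; then
    echo "$scope: $(cat "$path")"
  else
    echo "$scope: no IPv6 support in this kernel"
  fi
done
```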

Setting up a Firewall

Our next step involves setting up the Uncomplicated Firewall (UFW) to establish fine-grained control over the ports and services that can be accessed on our server. This configuration ensures that only the ports and services we've explicitly configured are accessible, adding a significant layer of security to our system.

sudo apt install ufw

Since we've disabled IPv6, it makes sense to extend the optimization to UFW by disabling IPv6 rules. This not only streamlines our configuration but also eliminates any duplicated rules.

sudo nano /etc/default/ufw

Change the IPv6 value to false.
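For reference, after the edit the relevant line in the file should look like this:

```
# /etc/default/ufw
IPV6=no
```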

Before enabling the firewall, it's crucial to grant access to the SSH service. Neglecting this step could lead to unintentionally locking ourselves out of the server. Therefore, let's ensure SSH access is allowed before we activate the firewall.

sudo ufw allow <PORT YOU CHOSE> comment "SSH service"

Get ready, it's firewall time!

sudo ufw enable

Enhanced Security: Safeguarding with Fail2ban

Continuing our journey, we'll bolster security and vigilance by setting up additional measures and implementing log monitoring. One such tool is 'fail2ban,' which scans server logs for repetitive failed login attempts and subsequently blocks IP addresses engaging in suspicious behavior. This extra layer of defense enhances your server's resistance to potential threats.

sudo apt install fail2ban
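On Debian, fail2ban enables an sshd jail out of the box, but if you moved SSH to a custom port you should tell it. A hypothetical override in /etc/fail2ban/jail.local (the port and ban time are placeholders to adjust; restart fail2ban afterwards):

```
# /etc/fail2ban/jail.local -- local overrides, takes precedence over jail.conf
[sshd]
enabled  = true
port     = XXXX
maxretry = 3
bantime  = 1h
```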

Adding Qemu Guest Agent

As I run my VM on Proxmox, I want to make sure the QEMU Guest Agent is installed. This tool plays a critical role in enhancing the interaction between the host and guest systems. It enables various features like graceful shutdowns, improved file system consistency, and provides better insights into the guest's operational state. Ensuring the QEMU Guest Agent's presence contributes to a more efficient and manageable virtualized environment.

Be sure to activate the QEMU Guest Agent option in your Proxmox VM's settings and reboot. The QEMU Guest Agent might already be active. If not, you can install it as follows:

sudo apt install qemu-guest-agent

A reboot might be necessary!

Closing Remarks

To reiterate, I'm a self-taught Linux user and also not a security expert. While I consider this setup quite safe, I strongly advise conducting your own research and taking responsibility for any intrusions or issues.

I sincerely hope that this guide has proven helpful in establishing a secure and reproducible server setup. While I could have embarked on the journey of learning Ansible to streamline this process into one (or more) playbooks, I've found that the commands outlined in this article offer a swift and efficient means of bringing a new server to life, fully configured and ready to roll.


Change Log

  • 20.10.2023: Updated install procedure (no root passwd), removed need to install sudo and change root passwd, added ssh key Ed25519 details, redone IPv6 paragraph
  • 19.10.2023: Initial article based on Debian Proxmox VM
]]>
<![CDATA[A Clutter-Free Life: Going Paperless with Paperless-ngx]]>https://nerdyarticles.com/a-clutter-free-life-with-paperless-ngx/651fb3aa8bf0e70001ab6e86Fri, 06 Oct 2023 23:06:21 GMTIntroductionA Clutter-Free Life: Going Paperless with Paperless-ngx

In my previous article, I provided a comprehensive overview of my homelab. If you haven't had a chance to read it yet but are interested, you can find it here. In the upcoming articles, I'll delve into detailed guides on how to create a homelab like mine. However, from time to time, like now, I like to shift the focus and spotlight a service or project I'm particularly fond of.

In this article, I'll explore what Paperless-ngx (we'll call it Paperless for short) is and the benefits it offers. I'll also share how I've structured my Paperless installation. Finally, I'll provide my code for you to use in setting up your own instance. For now, I'll assume you already have a functional Docker environment ready to deploy your Paperless instance.

Introducing Paperless-ngx, your key to simplifying life by going digital! Are you tired of hunting for crucial documents when doing your taxes? With Paperless, that hassle is a thing of the past. This fantastic tool has built-in OCR (Optical Character Recognition) and learns from your categorized documents, making it a breeze to organize your files. Say goodbye to endless digging through folders – now, all your essential documents are just a few clicks away. Once you've given Paperless a try for a while, you'll find that it consistently gets things right, reducing the need for manual adjustments. Say hello to a more streamlined and efficient way of managing your documents!

The big benefit of going digital in this space? Flexibility! Unlike traditional paper-based or folder-organized digital systems, your documents can belong to multiple categories at once. Ever wrestled with the dilemma of deciding which folder a document should go into? That's because it often fits more than one category. The solution? Tags. With a digital solution like Paperless, you can assign as many tags as you need, making categorization a breeze. Say goodbye to the folder confusion!


Paperless: A Living Proof of Open Source Greatness

In short: Paperless transitioned into Paperless-ng and then into Paperless-ngx

Paperless was initiated by Daniel Quinn, with his initial commit to the GitHub project dating back to 6th June 2016. The version 1 release came on 3rd July 2018 (with the first release, 0.3.0, launched on 1st January 2017). The last release under Daniel Quinn was on 27th January 2019.

As the original project began to lose momentum, Jonas Winkler's fork Paperless-ng emerged as the official successor. The first release (0.9) occurred on 18th November 2020, and the final release before the current transition was on 22nd August 2021.

At the time of writing this, Paperless-ngx is the official version and repository, having had its inaugural release on 5th March 2022.

This journey exemplifies the strength of the open-source community. Even if a project experiences slowdowns or transitions, passionate contributors ensure its legacy lives on and continues to evolve.

My Paperless Story

Efficiency Unleashed

Let's explore Paperless and my adoption of it. There are various approaches to adopting it, so find what suits you best. What works for me might just be the right fit for you too, or you might discover your own path along the way.

The documents view in Paperless-ngx

Here are the key features of Paperless:

  1. Document Management: Organize and manage your digital documents efficiently in the web interface.
  2. Tagging System: Easily categorize documents with tags for quick access.
  3. OCR Integration: Optical Character Recognition for extracting text from images and scanned documents.
  4. Automatic Categorization: Learns from your tags and categories to auto-classify future documents.
  5. Search Functionality: Powerful search capabilities to find documents swiftly.
  6. Multi-User Support: Collaborate with others by setting up multiple user accounts.
  7. Data Export: Export your documents and data for backup or migration purposes. They are also kept in a very neat folder structure and naming convention, enabling you to ditch Paperless without any effort.

From Paper to Pixels: Which Documents Belong in Your Digital Vault?

I'm in Germany, so these rules may not apply to you. Always do your own research. I am only sharing what I am doing and not advising you to do the same.

Generally, I digitize what I'd put in a physical folder, which is not every letter from banks, insurers, etc.

The next choice is whether to keep the physical copy or go digital only (make sure you have a proven backup solution). In Germany, you must keep physical copies of certificates, notarized docs, property papers, vehicle papers, specific financial docs (contract-related, not just statements), insurance policies, passports, and ID cards. I shred and dispose of any documents that don't require a physical copy. Fortunately, many documents are digital these days, saving me from making that decision and scanning them.


Tagging, Document Types, and Correspondents: My Organizational Secrets

To stay organized and efficient, it's crucial to consider document types, tags, and correspondents.

Correspondents

This part is straightforward. Every sender of documents you upload to Paperless becomes a correspondent. You can add new correspondents on the fly when you review new documents.

I do make exceptions; for instance, invoices from online shops are assigned to 'Online Shops' rather than having their own correspondent. Thanks to the full-text search, finding what I need is easy within the hundreds of documents in the system.

Document Types

I put a lot of thought into document types when setting up Paperless, but rarely ever need them. Although I don't use them often, they don't hurt as Paperless is self-learning.

Here are the types I've set up:

  • Invoice
  • Letter
  • Insurance
  • Contract
  • Other

Tags

Remember what we discussed earlier? Folders, whether they're physical or digital, limit us to just one category per document. Tags give us the power to assign multiple categories to a single document. So, go ahead, be a tagging superstar and create as many categories as your heart desires (and even assign them all to one document). It's your very own organizational playground!

In my experience, simplicity beats complexity. Keep it straightforward, no need for over-detailing. Plus, we've got this fantastic search feature to help you find what you need in a snap!

Now, let's dive into the magical world of tags.

Who's Affected?

I find it helpful to use distinct tags for the people the documents affect:

  • Wife
  • Me
  • Family (for example all bills around the house, shared insurances, ...)

Which Categories Are Affected?

I also like to create specific groupings to easily access all related documents.

  • Car (You could also create a tag per car you own.)
  • Doctor
  • Expenses
  • House (Have more real estate? Create a tag per property.)
  • ToDo (Will cover this later.)

For a smooth tax season, I rely on my tax tags, which make finding relevant documents a breeze.

  • Tax (A general tag for all tax relevant documents.)
  • Tax always (For documents that are relevant for every year.)
  • Tax 2021
  • Tax 2022
  • Tax 2023
  • ...

Documents like insurance papers, which are relevant for more than one year, usually have three tax tags: the general one and the two for the years they are relevant for.


Digitalization Made Easy: Uploading to Paperless-ngx

Now that everything is set up for organizing our documents, you might wonder how to get them in there. It's a straightforward process, with three simple methods (not counting potential mobile apps as a fourth option).

This is also where the 'ToDo' tag comes into play! Make sure, when you create the tag, that you tick the 'Inbox tag' box.

Setting of the ToDo tag to always apply to new documents

Regardless of how the documents are imported into Paperless, the ToDo tag is set. I have created a so-called 'view', which displays all documents with the ToDo tag, and added that view to my homepage. This is essentially my inbox in Paperless, which I go through from time to time to check if everything is set correctly.

The Paperless-ngx start screen with documents in my 'inbox'.

Uploading

In your web browser, click the upload button, select your file(s), and click upload. For an even easier method, just drag and drop your file(s) into the Paperless browser window, no matter where you are in Paperless. It's that simple!

Email Inbox

You can have Paperless periodically scan a specific email inbox and import files from there. It could be your standard inbox, where you watch for a certain subject or folder, or you can set up a dedicated inbox and forward emails with documents you want to import to it. This is what I do, so let's have a quick look at how to set that up.

After you have added an email inbox to monitor in the settings, you can specify what the importer will actually watch for and do.

Settings for the email importer

What you can see in the picture above is that my rule 'Main Importer' will import from 'Paperless Mail Inbox' (the name I gave the configuration of the email inbox to monitor). It watches the folder 'INBOX', processes all PDF documents, uses the file's name as the document name, sets the 'ToDo' tag, and deletes the email.

Don't worry about the 'Do not assign a correspondent' option; this actually means that the self-learning algorithm will try to set the correct correspondent.

Auto import from a folder

You can also have Paperless watch a specific folder and import all documents from there. This is especially useful if you already have many digital documents you want to import initially or if you have a scanner with the ability to scan to a network share.


Effortless Editing: Perfecting Your Documents

Paired with the OCR process and the self-learning algorithm, editing newly added documents is very easy and fast. Remember, though, that the self-learning algorithm needs to learn from your documents. Any new correspondent will definitely be wrong, as it will try to set a known correspondent. After some time, it will become a matter of just making sure everything is correct.

The title is something I nearly always redo to make it very meaningful, in order to identify the document very quickly.

I have learned that it is tricky to set the right date. There are a few to choose from (dates from the file's metadata, dates in the header, or any other dates mentioned in the document). Paperless gets it right most of the time, but do pay attention to it.

Editing view for Paperless-ngx documents

In the above screenshot, you can see a freshly imported document, which I can now check if everything is correct. Is it a new correspondent? No need to leave the window; you can create it straight from the input field here, same for the Document Type and Tags.

The archive serial number can also be very useful (I don't use it). The serial number automatically counts up by one from the last serial number used. You can then write this number on the physical document and just put it in a folder, without any extra organization. If you are later looking for a document, simply look up its serial number and pull it from the corresponding folder.

The last step when I am done is removing the 'ToDo' tag and 'Save & Close' the document.

Setting up Paperless-ngx in Docker

My Way

You've come a long way (or maybe you jumped ahead to grab my Docker config). Now, let's chat about how I've configured my Paperless instance to run in Docker. To make this work, we'll need three containers: a database (usually PostgreSQL, which I prefer), a Redis broker for scheduled tasks, and, of course, Paperless-ngx itself.

Use of Alpine images

I'm a fan of Alpine images – they're speedy and light on resources. So, when I set up a service, I check if there are Alpine images up for grabs.

If I bump into a hiccup, I switch back to the regular image first. Gotta make sure it's not an Alpine image causing the trouble with dependencies.

Redis Container

Alright, let's kick things off with the easiest part: the Redis broker.

paperless-redis:
  container_name: paperless-redis
  image: redis:7.2-alpine #fixedVersion
  restart: unless-stopped
  networks:
    - backend
  security_opt:
    - no-new-privileges:true
  volumes:
    - ./paperless/redis:/data
  environment:
    REDIS_ARGS: "--save 60 10"

Here are some important points to highlight in my Docker Compose code for the Redis broker:

Naming Convention: I use a naming convention for supporting services, where it's structured as <Name of the supported service>-<Name of the supporting service>. In this case, it's 'paperless-redis,' and this convention applies to both the service and container name.

Fixed Image Version: I always specify a fixed image version for supporting services. This helps prevent issues when upgrading the Docker stack. It ensures that the version I'm using is compatible with my main service. This is particularly crucial for databases, where version mismatches can require migration steps.

Docker Networks: I organize my Docker containers into two networks:

  • Frontend: This network is for services that run behind my reverse proxy and require outside access (outside means outside of the server running docker).
  • Backend: This network is for services that solely support other services and don't need outside access.

Container Privileges: I specify that the container cannot have additional privileges, which enhances security.

Bind Mount (Volume Path): I follow a specific structure for the bind mount (volume path). It starts with a folder named after the main service and then includes a subfolder named after the supporting service. This organization makes it easy to remove a specific configuration if I decide to discard something I've experimented with.

Redis Configuration: Specific to Redis, I set an environment variable that instructs it to snapshot its state to disk every 60 seconds if at least 10 write operations have occurred. This ensures data durability.

If you're wondering why there's no port setting: containers on the same Docker network can reach each other's ports without explicitly publishing them. A 'ports' definition creates a port mapping between the container and the host machine/VM, which we don't need here.

Warning: Ports ignore firewall rules

It's worth noting that ports published from the default Docker network (bridge network) bypass rules from an iptables-based firewall such as UFW, because Docker manages its own iptables chains. I'll dive deeper into this in an upcoming article about securing your Docker setup.

PostgreSQL Container

When it comes to databases in my self-hosting setup, I'm a fan of PostgreSQL; it's my preferred choice. Paperless-ngx defaults to PostgreSQL, but you do have the option to run MariaDB if that's your preference. I also always deploy one database per service and never share a database instance between services.

Let's have a look at the Docker Compose definition of the database.

paperless-postgres:
  container_name: paperless-postgres
  image: postgres:16.0-alpine #fixedVersion
  restart: unless-stopped
  networks:
    - backend
  security_opt:
    - no-new-privileges:true
  volumes:
    - ./paperless/postgres:/var/lib/postgresql/data
  environment:
    POSTGRES_USER: DBUSERNAME
    POSTGRES_DB: DBNAME
    POSTGRES_PASSWORD_FILE: /run/secrets/paperless_db_paperless_passwd
    #POSTGRES_PASSWORD: $PasswordFromEnvFile
  secrets:
    - paperless_db_paperless_passwd

I prefer to use a docker secret for the password if the image supports it. I've also included a commented-out line for providing the password via a .env file if you prefer that method.

Since I'm using a secret, I need to map it appropriately. I'll be covering the use of secrets in a future article about securing your Docker environment. Stay tuned for that!

Paperless-ngx Container

Finally, let's get to the main reason you're here: the Docker Compose configuration for my Paperless-ngx container.

paperless:
  container_name: paperless
  image: ghcr.io/paperless-ngx/paperless-ngx:latest
  restart: unless-stopped
  networks:
    - frontend
    - backend
  ports:
    - 8000:8000
  security_opt:
    - no-new-privileges:true
  depends_on:
    - paperless-redis
    - paperless-postgres
  volumes:
    - ./paperless/paperless/data:/usr/src/paperless/data # Docker container data
    - ./paperless/paperless/media:/usr/src/paperless/media # Location of your documents
    - ./paperless/paperless/export:/usr/src/paperless/export # Target for backups
    - ./paperless/paperless/consume:/usr/src/paperless/consume # Watch folder
  environment:
    USERMAP_UID: $PUID
    USERMAP_GID: $PGID
    PAPERLESS_TIME_ZONE: $TZ
    PAPERLESS_OCR_LANGUAGE: deu+eng
    PAPERLESS_REDIS: redis://paperless-redis:6379
    PAPERLESS_DBHOST: paperless-postgres
    PAPERLESS_DBNAME: DBNAME
    PAPERLESS_DBUSER: DBUSERNAME
    PAPERLESS_DBPASS_FILE: /run/secrets/paperless_db_paperless_passwd
    #PAPERLESS_DBPASS: $PAPERLESS_DBPASS
    PAPERLESS_SECRET_KEY_FILE: /run/secrets/paperless_secret_key
    #PAPERLESS_SECRET_KEY: $PAPERLESS_SECRET_KEY
    PAPERLESS_FILENAME_FORMAT: "{{created_year}}/{{correspondent}}/{{created}} {{title}}"
    PAPERLESS_URL: "<YOUR PAPERLESS URL>"
    PAPERLESS_ALLOWED_HOSTS: PAPERLESS_URL
    # Set the following two for your first launch
    # and change the admin password afterwards.
    # Once setup, you can safely remove these variables.
    PAPERLESS_ADMIN_USER: "<ADMIN_USER>"
    PAPERLESS_ADMIN_PASSWORD: "<ADMIN_PASSWORD>"

Let's take a closer look at some of the specific details in the Docker Compose configuration:

Networks: We join both networks because we want Paperless to communicate with the reverse proxy (not set up or covered in this article) and with other services on the backend network.

Ports: To access Paperless from other devices on your local network, we create a port mapping, which would not be needed when using a reverse proxy.

Depends On: This ensures that the specified containers are running before starting Paperless.

Volumes: We use volumes for data persistence, including our documents and Paperless configuration. I usually use NFS mounts for media, export, and consume which go straight to my NAS. For simplicity, bind mounts are used here.

Environment Variables

USERMAP_UID & USERMAP_GID: It's a good practice to set these to the IDs of your personal user on the Docker host system. This helps avoid permission issues when editing config files or accessing PDFs on the file system.

PAPERLESS_OCR_LANGUAGE: Set this to the language(s) you expect your documents to be in.

PAPERLESS_REDIS: Specify the path to your Redis container.

PAPERLESS_DB*: Configuration details to access your deployed database.

PAPERLESS_SECRET_KEY: It's essential to set this to enhance security. The default token is well-known, so changing it is recommended.

PAPERLESS_FILENAME_FORMAT: This determines the folder structure and filename of the PDFs stored in 'media'. You can customize it using various variables. I use a folder for the year created, followed by the correspondent, and then the filename as the creation date with the title from Paperless.
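To make that concrete, a 2023 letter from a correspondent called "ACME Insurance" with the title "Policy Update" would end up at a path something like this (names are purely illustrative; the base folder comes from Paperless's media layout):

```
media/documents/archive/2023/ACME Insurance/2023-05-10 Policy Update.pdf
```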

PAPERLESS_URL: Set it to your domain name or the URL, including the port you'll use to access Paperless.

PAPERLESS_ALLOWED_HOSTS: This setting restricts access to Paperless only from the addresses you specify.

PAPERLESS_ADMIN_USER & PAPERLESS_ADMIN_PASSWORD: Set these two before your first startup, and once you have logged in and changed the password, you can remove the variables; they are only needed to create the initial admin user. Alternatively, you can run the 'createsuperuser' command, but I prefer the variables above.

These are the configuration settings I've opted for, but there are many more options to explore. I recommend checking out the well-written Paperless-ngx documentation to tailor the setup to your needs. It's a valuable resource for fine-tuning your self-hosted document management system.

Data Safety 101: The Power of Regular Backups

A helpful Redditor reminded me that I hadn't covered backups. Can you blame me? My backup system is so smooth that I rarely think about it (it will be covered in a future article). However, let's dive into the specifics of backing up Paperless.

Since we're running everything in Docker and have the export path mounted, we can easily use Paperless's built-in backup solution. To automate this, we'll add a specific command to a crontab job (under root user) on our Docker host machine. Personally, I like to create a shell script and launch it from the crontab. This approach allows me to make changes to the command without editing the crontab directly, but you can also insert the command directly into the crontab if you prefer.

The command I am adding to my 'paperless_backup.sh' shell script (note there are no '-it' flags here, since cron jobs run without a TTY):

docker exec paperless document_exporter ../export -d -f -p -sm -z

You can find the documentation of the backup command here. But in short:

-d: deletes old backups
-f: uses my custom filename format
-p: uses dedicated folders for archive, originals, thumbnails, and JSON files
-sm: creates one JSON file per document instead of one large file
-z: zips the backup
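The exporter call can be wrapped in a slightly bigger sketch of 'paperless_backup.sh' that also reports the result to Healthchecks.io, which ties in with the monitoring approach from my backup article. This is a minimal sketch; the ping URL is a placeholder, not my real check:

```shell
#!/bin/sh
# Hypothetical paperless_backup.sh: run the export, then ping Healthchecks.io.
# HC_URL is a placeholder - replace it with your check's ping URL.
HC_URL="${HC_URL:-https://hc-ping.com/your-check-uuid}"

run_backup() {
    if docker exec paperless document_exporter ../export -d -f -p -sm -z; then
        # Success ping: if this ping ever stops arriving, Healthchecks.io alerts you.
        curl -fsS -m 10 --retry 3 "$HC_URL" >/dev/null
        echo "backup ok"
    else
        # Explicit failure report via the /fail endpoint.
        curl -fsS -m 10 --retry 3 "$HC_URL/fail" >/dev/null
        echo "backup failed" >&2
        return 1
    fi
}

# In the real script, finish with: run_backup
```

The crontab entry stays the same; only the script body grows. Healthchecks.io then alerts you both when the script reports a failure and when the ping simply never arrives.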

Now, let's add this to the crontab. We need to include it in the root's crontab because our normal user doesn't have the access rights to run Docker commands without 'sudo,' and using 'sudo' won't work in a crontab due to the password prompt.

sudo crontab -e

I'm scheduling it to run at 4 am because my hypervisor backs up at 3 am and my off-site backup starts at 5 am. This way, the off-site run includes the fresh Paperless backup. The backup itself only takes a couple of minutes for me, so there's no risk of overlapping with the off-site backup process.

Instead of calling the shell script, you could also put the command from the script directly into the crontab.

0 4 * * * sh /server/docker/scripts/paperless_backup.sh

Warning: Supposed to stop Paperless before backup

According to the documentation, we should stop Paperless before running the backup. However, since we're running it in Docker, we can't stop the container, as we need it to run the backup command.

One of the maintainers wrote that everything should run fine as an online backup: there are measures in place to prevent issues when a backup runs while files are being uploaded. Check the GitHub discussion.

Keeping Your Digital Archive Up-to-Date

We now have everything up and running and backups configured. Let's talk about updating your Paperless instance.

Personally, I don't create an extra backup before upgrading the Paperless container. It's different when I update Redis or PostgreSQL! Generally, I update my whole homelab a couple of times a week, and every now and then I go through the Docker stack and check whether the background services should be updated. All those containers run a fixed version.

Let me walk you through the different steps.

Updating Paperless-ngx

Updating Paperless is very simple. It's just a matter of pulling a new image and starting the Docker Compose stack, which will recreate Paperless if a new image is available.

sudo docker compose pull 
sudo docker compose up -d 

If you have more services in the Docker Compose with the latest tag or just want to make sure to only update one container, add the container's name like this.

sudo docker compose pull paperless
sudo docker compose up -d paperless

Updating Redis

I don't take any special precautions when updating Redis. It's just an in-memory message broker, so if anything goes wrong, I can delete the persistent data from './paperless/redis', go back to the image that worked, and spin it back up.

If you want to be extra safe, you can stop the container and make a backup of the './paperless/redis' folder before updating.

Now, all you need to do is set the image version to your choice, pull it, and start the container.

After making changes, check Paperless and make sure everything's running smoothly, for example by verifying that your email import still works as expected.

Updating PostgreSQL

Now, let's dive into the trickier bit: updating the database. I don't do this casually, and I don't do it often.

Minor Version upgrades

We distinguish between minor and major version upgrades. If the major version (the first number) stays the same:

  • Halt all three Paperless containers.
sudo docker stop paperless paperless-redis paperless-postgres
  • Create a copy of the './paperless' directory.
mkdir backup
sudo cp -r ./paperless ./backup/paperless
  • Update the image version.
  • Fetch the latest image.
sudo docker compose pull
  • Start the containers.
sudo docker compose up -d

Major Version Upgrades

Major version upgrades for databases can be quite stressful. Some services specify compatible versions, but I couldn't find this info in Paperless-ngx docs. So, it's a bit of a gamble.

Ensure you have a safe backup ready, just in case we need it later. Safety first!

docker exec -it paperless document_exporter ../export -d -f -p -sm -z

With everything safely backed up, we're ready to start the update process (source of the process).

  • Stop paperless and redis containers
sudo docker stop paperless paperless-redis
  • Create a backup of the database and stop the container afterwards. Note that we don't use '-it' here; the '-t' flag would corrupt the dump with carriage returns.
sudo docker exec paperless-postgres pg_dumpall -U paperless > ./upgrade_backup.sql
sudo docker stop paperless-postgres
  • It's a good idea to check the 'upgrade_backup.sql' file to ensure the backup was successful. If you see only a few lines or encounter permission issues, pause and resolve them until you have a working dump. Don't rush this step.
  • Create a copy of the './paperless/postgres' folder
  • Delete the './paperless/postgres' folder
  • Set the new version you would like to use
  • Start the database again
sudo docker compose up -d paperless-postgres
  • Make sure the 'upgrade_backup.sql' file is still in your working directory (we created it there earlier); the import below reads it from there
  • Now run the command to import the backup into the new container ('-i' feeds the dump via stdin)
sudo docker exec -i paperless-postgres psql -U paperless < ./upgrade_backup.sql
  • Start all paperless containers and check everything works fine
sudo docker compose up -d

There you have it, all updated and good to go!

Paperless-ngx in Docker: The Full Stack

version: "3.9"

networks:
  frontend:
    name: frontend
    driver: bridge
  backend:
    name: backend
    driver: bridge
  default:
    name: default
    driver: bridge

secrets:
  paperless_db_paperless_passwd:
    file: ./secrets/paperless_db_paperless_passwd
  paperless_secret_key:
    file: ./secrets/paperless_secret_key

services:
  paperless:
    container_name: paperless
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    networks:
      - frontend
      - backend
    ports:
      - 8000:8000
    security_opt:
      - no-new-privileges:true
    depends_on:
      - paperless-redis
      - paperless-postgres
    volumes:
      - ./paperless/paperless/data:/usr/src/paperless/data # Docker container data
      - ./paperless/paperless/media:/usr/src/paperless/media # Location of your documents
      - ./paperless/paperless/export:/usr/src/paperless/export # Target for backups
      - ./paperless/paperless/consume:/usr/src/paperless/consume # Watch folder
    environment:
      USERMAP_UID: $PUID
      USERMAP_GID: $PGID
      PAPERLESS_TIME_ZONE: $TZ
      PAPERLESS_OCR_LANGUAGE: deu+eng
      PAPERLESS_ENABLE_UPDATE_CHECK: "true"
      PAPERLESS_REDIS: redis://paperless-redis:6379
      PAPERLESS_DBHOST: paperless-postgres
      PAPERLESS_DBNAME: DBNAME
      PAPERLESS_DBUSER: DBUSERNAME
      PAPERLESS_DBPASS_FILE: /run/secrets/paperless_db_paperless_passwd
      #PAPERLESS_DBPASS: $PAPERLESS_DBPASS
      PAPERLESS_SECRET_KEY_FILE: /run/secrets/paperless_secret_key
      #PAPERLESS_SECRET_KEY: $PAPERLESS_SECRET_KEY
      PAPERLESS_FILENAME_FORMAT: "{{created_year}}/{{correspondent}}/{{created}} {{title}}"
      PAPERLESS_URL: "<YOUR PAPERLESS URL>"
      PAPERLESS_ALLOWED_HOSTS: PAPERLESS_URL
      # Set the following two for your first launch
      # and change the admin password afterwards.
      # Once setup, you can safely remove these variables.
      PAPERLESS_ADMIN_USER: "<ADMIN_USER>"
      PAPERLESS_ADMIN_PASSWORD: "<ADMIN_PASSWORD>"
    secrets:
      - paperless_db_paperless_passwd
      - paperless_secret_key
  
  paperless-postgres:
    container_name: paperless-postgres
    image: postgres:16.0-alpine #fixedVersion
    restart: unless-stopped
    networks:
      - backend
    security_opt:
      - no-new-privileges:true
    volumes:
      - ./paperless/postgres:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: DBUSERNAME
      POSTGRES_DB: DBNAME
      POSTGRES_PASSWORD_FILE: /run/secrets/paperless_db_paperless_passwd
      #POSTGRES_PASSWORD: $PasswordFromEnvFile
    secrets:
      - paperless_db_paperless_passwd
  
  paperless-redis:
    container_name: paperless-redis
    image: redis:7.2-alpine #fixedVersion
    restart: unless-stopped
    networks:
      - backend
    security_opt:
      - no-new-privileges:true
    volumes:
      - ./paperless/redis:/data
    environment:
      REDIS_ARGS: "--save 60 10"

Full Docker Compose Stack for Paperless-ngx

Farewell, Paper Clutter

Closing Notes

Whew, that was quite a journey! We delved into Paperless-ngx, discovered the wonders of digital tagging over paper folders, and I spilled the beans on my battle-tested setup that's served me well for years. Our docker-compose adventure unfolded, providing you with the full recipe to kickstart your own paperless journey (if you have a running docker environment).

As for alternatives, I've been so content with Paperless that I've lost track of what's out there today. I can only vouch for the solution I know and love, but who knows what other gems might be waiting in the wings?

Farewell for now, fellow self-hosters. Keep the servers humming and the data flowing! Until next time, happy self-hosting.

Henning

]]>
<![CDATA[Welcome to My Homelab: Your Gateway to Self-Hosting!]]>https://nerdyarticles.com/my-homelab-a-general-overview/651aabb275413c000180a921Tue, 03 Oct 2023 00:22:46 GMTWelcome to my self-hosting adventure! Welcome to My Homelab: Your Gateway to Self-Hosting!

Over the past four years, I've spent many hours building my homelab, a place where I've explored open-source software, IT infrastructure, and the joys of self-hosting. In this first article, I'm excited to give you a glimpse into my homelab's general setup. From the hardware to the software stack, I'll paint the big picture. But hang tight, because this is just the beginning. In future articles, I'll dive deep into each component, offering guides and sharing the reasons behind my choices. My goal is simple: to empower you to embark on your self-hosting journey while inspiring you with the endless possibilities of open source.

Let's get started!

Hardware Overview

Let's kick things off by talking about the hardware I've got:

  • Hyper1: The main server, running Proxmox and the majority of my load, is an old gaming PC from 2012. The graphics card, RAM, SSDs, and CPU cooler have been upgraded.
    - i7-3820 (overclocked to 8x 3.9 GHz)
    - GTX 1060 6GB
    - 32 GB DDR3 RAM
  • Asgard: It all started on a self-built mini server, which today runs TrueNAS SCALE.
    - J4105-ITX (up to 4x 2.5 GHz)
    - 8 GB DDR3 RAM
  • Thor: Raspberry Pi 4 8GB
  • Bifroest: An old laptop from 2011 running my Proxmox Backup Server.
    - i5-2410M (up to 4x 2.9 GHz)
    - 8 GB RAM
Welcome to My Homelab: Your Gateway to Self-Hosting!
Overview of my HomeLab created in Excalidraw (state 02.10.2023)

Secure Off-Site Backup Strategy

Let's start with something simple yet crucial: Bifroest, my trusty backup server. It runs Proxmox Backup Server (PBS) and sits safely at my parents' house, serving as a reliable off-site backup. It is connected via VPN (WireGuard), although it's temporarily chilling in my basement due to some maintenance.

Every night, Bifroest gets to work. It receives the backups of my virtual machines (VMs) and LXC containers from Proxmox. Plus, it's the go-to spot for my file backups via SFTP. Using Restic, I keep my Docker directories and important files (photos, documents, and OwnCloud files) safely stored away at set intervals. It's all locked up tight with encryption, and I maintain monthly, weekly, and daily incremental backups for a set retention period. Restoring? Piece of cake. Thanks to Proxmox, it's as simple as a few clicks, and with Restic, it's all about "just" mounting it. Stay tuned, as I will cover this part in more detail.
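As a sketch of what such a Restic job can look like: repository path, source directory, and keep counts below are illustrative assumptions rather than my exact values, and a real run also needs RESTIC_PASSWORD (or a password file) for the repository encryption:

```shell
# Illustrative nightly Restic run over SFTP to the backup box.
RESTIC_REPO="${RESTIC_REPO:-sftp:backup@bifroest:/srv/restic/docker}"

run_restic_backup() {
    # Back up the Docker directory into the encrypted repository...
    restic -r "$RESTIC_REPO" backup /server/docker &&
    # ...then apply a daily/weekly/monthly retention policy and prune old data.
    restic -r "$RESTIC_REPO" forget --prune \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6
}
```

Restoring is then a matter of 'restic mount' or 'restic restore' against the same repository.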

Ad-Free Network with DNS Failover

I use Pi-hole as my network-wide ad blocker. It is set as my DNS server. There are two ways to set up the DNS (at least with the hardware at my disposal).

  • Option 1
    is to point my router to the Pi-hole DNS. The advantage is that I can configure a secondary DNS, but the downside is that all traffic appears to come from my router, which can be challenging for debugging and doesn't provide detailed statistics. Plus, there's an extra hop for data as it goes through the router as DNS, then to the Pi-hole, and finally to an upstream DNS like Cloudflare.
  • Option 2
    is what I use. I edited my network's DHCP settings in my router (an AVM FritzBox 6660 Cable) and added the Pi-hole DNS there. The advantage here is that all devices get assigned the DNS directly, eliminating the extra hop. The downside is that I can only assign one DNS. If that DNS goes down, most of my services won't work, authentication fails, and the internet goes down.

To address this, I've introduced a second, backup Pi-hole running in a Docker container on a Raspberry Pi 4; my main Pi-hole runs in a Debian-based LXC container on Proxmox. The Pi-holes sync their config every 5 minutes in both directions, and both devices run Keepalived. Keepalived provides a virtual IP, which is set as my DNS address, directing all traffic to the master Pi-hole. If the master Pi-hole goes down, it takes just one second to switch over to the backup. No more interruptions to the internet! And most importantly, your significant other won't come running asking about the internet!
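For reference, the failover piece boils down to a small keepalived.conf on each Pi-hole host. This is a minimal sketch; the interface name, router ID, priorities, and virtual IP are placeholders for whatever fits your network:

```
vrrp_instance PIHOLE_DNS {
    state MASTER            # set to BACKUP on the second Pi-hole
    interface eth0          # NIC that faces your LAN
    virtual_router_id 51    # must match on both hosts
    priority 150            # lower value (e.g. 100) on the backup
    advert_int 1            # 1s advertisements -> roughly 1s failover
    virtual_ipaddress {
        192.168.1.2/24      # the VIP you hand out as DNS via DHCP
    }
}
```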

I prefer using hostnames over IP addresses whenever possible. It's convenient and reduces the hassle of changing IP addresses everywhere. Just remember, your DNS becomes a critical part of your homelab setup.

Mail Servers: Proceed with Caution

Running a mail server with Mailcow and Mailjet works well for me, but I wouldn't recommend it. It's a complex setup that takes time and can be risky. I am not sure this is a topic I will be diving into here. I don't rely solely on it to avoid locking myself out completely.

Homelab's Core: Docker Host VM - Midgard

Let's dive into the heart of my homelab where the magic and fun really happens.

Midgard, my Debian-based VM, handles Docker, and all my containers are managed with a single docker-compose file. Don't worry; I'll explain my setup in detail soon. But here's a spoiler: I prioritize security while also aiming for convenience.

Networking with Traefik, Cloudflare, IPs, and SSL

I rely on Traefik as my reverse proxy for all web-based services. It handles my three domains, manages Let's Encrypt SSL certificates, and sets up routes based on Docker labels. The beauty? I rarely need to tinker with Traefik; Traefik adapts to new configurations without restarts.

I have three domains: one for this site, another for public services, and an internal one only reachable from my network (thanks to a wildcard DNS entry in Pi-hole). Even the internal domain has an official SSL certificate.

Cloudflare helps manage my public DNS entries. Since I lack a static IP, I run a Cloudflare-DDNS container per domain to keep IP addresses updated. For the public domain, the Cloudflare-Companion container handles DNS entries for subdomains I launch. It's a smooth setup!

Security with Authelia, OIDC, LDAP, MFA

Security is paramount, but convenience is key. I've set up LDAP with lldap and created separate groups. Now, let's check out Authelia: it's my security guardian for most services. It ensures I'm in the right LDAP group, sends MFA requests via DUO, and bars unauthorized access.

Authelia also lets me use OIDC on my network, reducing the need for proprietary logins. Once logged in, I usually don't see the extra security layer until my cookie expires.

Here's a pro tip: set up Redis with Authelia to keep your sessions alive, even when you need to restart the container for config changes. I use a "deny all, except configured otherwise" approach. It requires adding new services to the config, but it's a foolproof way to stay secure.
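The "deny all, except configured otherwise" idea maps to Authelia's access control section roughly like this (a sketch only; domain and group names are placeholders, not my actual config):

```yaml
# Deny-by-default sketch for Authelia's configuration.yml
access_control:
  default_policy: deny            # anything not matched below is blocked
  rules:
    - domain: "service.home.example.com"
      policy: two_factor          # require login plus MFA
      subject: "group:admins"     # and membership in this LDAP group
```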

Talking about security, I use Vaultwarden (a Bitwarden spin-off) as my password manager. I always generate unique passwords and never reuse them for any account or service.

Home Entertainment: Highly Automated

Let's dive into something fun: entertainment! Surprisingly, I don't have many *arr apps deployed because I didn't really need them.

For my media server, I use Plex, though I must admit I'm not its biggest fan. Jellyfin had some issues with Chromecast and playback interaction (like pausing when someone's at the door). With my new TV (Samsung The Frame), it's even trickier, as some movies can't be played via the built-in Apple TV. But at least Plex has a great app on the TV.

To save storage, I'm experimenting with Tdarr, which transcodes files to H.265. Sonarr manages my shows, Radarr handles movies, NZBHydra2 searches Usenet, and SABnzbd gets it all.

Sharing is caring: Recipes and Nerdy Articles

I got tired of our overflowing physical recipe book and loose printed files. We often cook recipes from blog articles too. So, I discovered Mealie, a digital recipe collection that can import from websites and assist while cooking.

We decided to share these recipes with friends and family, making them accessible to the public.

Nerdy Articles is my latest venture to share my self-hosting knowledge. After teaching three people everything I've built and talking everyone's ears off about the latest services and enhanced security, I'm now sharing my knowledge here. By the way, this website is based on Ghost. I use Matomo as my self-hosted Google Analytics alternative, and don't worry, I've kept everything cookie-free (apart from the mandatory, unavoidable cookies) for your privacy and to avoid annoying pop-ups.

Smart Home & Statistics

While it's not on the Midgard VM in Proxmox, it's still important. I have quite a lot of Zigbee-based devices. A Deconz ConBee II Zigbee dongle is my gateway to wield power over my Zigbee network. It can't be in the basement because of signal strength, therefore it's connected to the Raspberry Pi 4 in our living room, which also serves as my backup Pi-hole.

The living room Pi runs Zigbee2MQTT, which is connected to my MQTT broker. Home Assistant is the heart of my smart home, connecting to a lot of things in our house. Zigbee devices, my robotic lawn mower, rooftop solar panel data, current energy prices, and much more are all part of the Home Assistant ecosystem.

It serves as both a control center for smart devices and a hub for checking various statistics, including the outdoor temperature.

Speaking of statistics, Home Assistant deletes its data after 7 days (in a rolling cycle), except for the statistical sensors. I currently have over 700 sensors in Home Assistant, with 370 actively in use. To maintain performance, some data has to be cleared after a few days, as some of the sensors update nearly every second. However, there are specific statistics I want to retain for a longer period, like our solar panel energy production and hourly energy prices. These are gathered using Telegraf or sent by Home Assistant and stored in an InfluxDB.
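On the Home Assistant side, forwarding only the long-lived statistics boils down to an include filter in its InfluxDB integration. A rough sketch with hypothetical host and entity names (check the integration's docs for your version):

```yaml
# Home Assistant configuration.yaml sketch: send only selected sensors
influxdb:
  host: influxdb.home      # placeholder hostname
  database: homeassistant
  include:
    entity_globs:
      - sensor.solar_*         # hypothetical solar production sensors
      - sensor.energy_price_*  # hypothetical price sensors
```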

File & Document storage

You might wonder: what's the difference between file and document storage? Well, let me explain. I use OwnCloud for file storage, similar to Dropbox or OneDrive, and it's straightforward. We keep photos, spreadsheets, and more there, and our phones sync pictures over the local network.

But one thing annoyed me: sifting through physical folders for tax documents. So, I tackled this issue for good. After trying a few solutions, I settled on Paperless-ngx (formerly Paperless and then Paperless-ng). I'll write a more detailed guide later, but here's the core: unlike with physical folders, I can tag my PDFs however I want, like Tax, Tax 2023, or Insurance. Plus, it has machine-learning features and will pre-set your files' metadata based on previously processed documents.

I'll forward emails to Paperless-ngx or upload scanned documents. Periodically, I review the Paperless inbox, ensuring tags and metadata (like dates) are correct. Come tax time, I can easily access all the documents I need by searching for the relevant tags. Simple and efficient!

Documentation

Yes, I have a home wiki! It's not extensive, but it's really handy. Using Codex-Docs as the base, I've created instructions for things at home that we rarely do but need to remember how they work. It also includes a cheat sheet for less frequently used commands and tasks, as well as a step-by-step guide for setting up Debian (even though I know the steps by heart, it's convenient to have it on hand).

The diagram you saw earlier was created with Excalidraw, which is a new addition to my toolkit. I'll be using it more often when creating articles for you, my readers.

Handy tools for debugging

When you're tinkering with your homelab, quick access to logs is essential. Glances provides a speedy overview of your hardware/VMs, processes, and system stats. To delve into the logs of all containers on the host, you can use Dozzle. Just remember, these are the containers' standard output logs, not service-specific log files written to disk.

Ansible: Updating a Homelab with one command

Updating a homelab can be a breeze or a potential headache, especially when you're juggling numerous servers and containers. I've found Ansible to be my ultimate savior in this regard. With Ansible, I can orchestrate updates across all my servers, both physical and virtual, in a structured and efficient way. My playbook not only handles system updates but also knows how to deal with different applications running on each server. However, it's essential to exercise caution and ensure you have the time to oversee the process. Updates don't always go as planned, and I've had my share of hiccups, like dealing with broken Docker images. To mitigate risks, I keep core services on specific versions and manually update them only after creating a backup. I'll be sharing more about this process soon, so stay tuned for a detailed guide on how Ansible keeps my homelab up to date and hassle-free.
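The core of such an update playbook is surprisingly small. A minimal sketch, assuming Debian-based hosts and an inventory group called 'homelab' (both assumptions; my real playbook does considerably more):

```yaml
# update.yml - hypothetical minimal version of the update playbook
- hosts: homelab
  become: true
  tasks:
    - name: Update the apt cache and upgrade all packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Check whether a reboot is required
      ansible.builtin.stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: Reboot if needed
      ansible.builtin.reboot:
      when: reboot_required.stat.exists
```

Run it with 'ansible-playbook update.yml' and one command updates every host in the group.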

Article Wrap-Up: Stay Tuned for More!

I haven't covered all the background services I run, like PostgreSQL, MariaDB, or MySQL databases, Redis caches, and more. These will be featured in more detailed articles about specific aspects of my homelab.

Thanks for sticking with me through this high-level overview of what I've built and what I plan to delve into in future articles. I hope you enjoyed reading it, found it insightful, and are excited to dive deeper with me in the upcoming articles.

If there are specific aspects of this homelab you're eager to learn more about, please let me know via a comment! Stay tuned, sign up to stay informed about new articles, and share with friends who might be interested in this topic too.

Farewell for now, fellow self-hosters. Keep the servers humming and the data flowing! Until next time, happy self-hosting.

Henning

]]>
<![CDATA[Coming soon]]>Nerdy Articles will become my collection of things I have learned, mainly about self-hosting. For about 4 years, I have been on this journey, taught a few people all I know, and now want to be able to share this passion with you.

Soon, I will be starting to post

]]>
https://nerdyarticles.com/coming-soon/65086a2d3b0013000187f189Mon, 18 Sep 2023 20:07:00 GMT

Nerdy Articles will become my collection of things I have learned, mainly about self-hosting. For about 4 years, I have been on this journey, taught a few people all I know, and now want to be able to share this passion with you.

Soon, I will be starting to post the first articles. From a general overview of my home lab to detailed articles about aspects of my setup, which will guide you to achieve the same or something similar.

Stay tuned and subscribe!

Disclaimer

As stated above, I am also quite interested in AI, and the best way to learn new technologies is by using them. Therefore, I will be utilizing tools like text-to-image (i.e., Midjourney) and text-to-text (i.e., ChatGPT) generators.

These tools will just be my helping hand!

For example, the picture of this post has been generated with Leonardo.ai, and the text has been proofread by ChatGPT (as English is not my mother tongue).

Farewell for now, fellow self-hosters. Keep the servers humming and the data flowing! Until next time, happy self-hosting.

Henning

]]>