EricCloninger.com · 100% human-generated content.

Self-Hosting Vaultwarden on Proxmox with Tailscale and HTTPS
2026-03-01 · Eric Cloninger

Bringing my passwords to my local server, with secure mobile app support.

NOTE: This is yet another long post about a technical project. I haven’t found a write-up on this topic in a single location, so I’m going to document what I discovered and hopefully it will help the next person.

If you want to skip the boring background bits, click here.


The Boring Background Bits

I’ve been using Bitwarden for about 7 years. I am happy to pay for their work and for using their infrastructure to keep my passwords secure. This year, two circumstances had me looking into self-hosting a password server:

  1. Bitwarden increased the annual cost of hosting by 20%
  2. I’m currently unemployed and have time on my hands

The cost of the hosting wasn’t bad. It increased from $40 to $48 per year for a family plan that my wife and I share. They hadn’t increased the price in those previous 7 years, so it’s not that they’re being jerks to do it this year. They’ve earned a raise. But, $48 will buy a couple cases of beer, so I decided to see if I could make this happen. (Hint: I did).

Proxmox

My Proxmox homelab server has been running great for more than 2 years. In the time since I started working on Proxmox, I’ve installed and tried out a bunch of different container projects. Some I continue to use and others I’ve deleted after deciding they didn’t suit my needs. A great help in this has been the Proxmox Community Scripts project, which started out as one person’s attempt to make a stable and repeatable installation process and has since grown exponentially. (RIP, TTeck)

Along the way, I have learned how to use a reverse proxy to make some of these services available on the open Internet. I have a custom domain and I use subdomains to route traffic to the different services running on Proxmox.

The reverse proxy I use is called Nginx Proxy Manager. There are several other choices, such as Caddy and Traefik. I tried both of those, but found the Nginx proxy worked great and had the fewest configuration hassles.

Nginx Proxy Manager (unfortunately abbreviated as NPM, but fortunately having nothing to do with Node.js), coupled with CloudFlare DNS, allows me to create a subdomain and handle traffic to and from that domain. The admin panel in NPM allows me to request and install a LetsEncrypt SSL certificate so that communication between whatever device I’m using and the server is as secure as can be. I don’t expose all the services on Proxmox to the open Internet, just those where I need to be able to access the information using a mobile app over secure connections.

Having said that, for this project, I’m NOT going to use Nginx and CloudFlare. I explicitly do NOT want to take a chance that a stray CVE or missed setting will give someone my passwords. But, I want the type of convenience they provide in having HTTPS and DNS.

That’s where Tailscale comes in.

Tailscale

Tailscale is a personal VPN that implements the WireGuard protocol. The folks that built Tailscale deserve a few beers for implementing something that “just works”. With Tailscale installed on a server, workstation, phone, router, or NAS, you can access protected assets when you’re away from your local network without exposing them to the open Internet.

This is exactly what I want for my password server. I don’t want it facing down Internet randos on a daily basis. It needs to just sit there and run on my local network. When I need to sync with it, I will start the Tailscale client on my phone or laptop to connect to my home services. Otherwise, they remain out of reach to casual visitors and hackers.

VaultWarden on Proxmox

As I mentioned earlier, I’ve been happy with Bitwarden from a user perspective. Of all the password managers, it has had the fewest reported intrusions or CVEs logged against it. I feel comfortable putting my passwords in there, but I’m aware that not every service is guaranteed to remain secure.

VaultWarden is an open-source alternative to the Bitwarden back-end server. The project exists with the blessing of the Bitwarden team. There is even a Bitwarden employee working on VaultWarden to ensure the servers are compatible.

A Proxmox Community Script installs VaultWarden as an unprivileged Linux Container (LXC). The installer builds from sources, so it may take a little longer than other Proxmox projects or Docker instances that you have used. For my N100 mini-PC setup with 32GB of RAM, it took around 20 minutes.

Get On With It, Already

At this point, I’m going to assume that you have the VaultWarden project installed and running as an LXC on your Proxmox server. For the sake of this tutorial, I’m assuming the local network address of that instance is 192.168.1.10.

When you install VaultWarden with the default settings, the directory on your Linux server is /opt/vaultwarden. Within that directory is a config file called .env where you’ll have to enter some values. Use your favorite text editor to make changes, which is almost certainly vi because you’re too smart to use emacs.

Create a Tailscale account and install the clients on the devices that you need to access. This will take some time, but won’t be difficult if everything you’re using falls into the broad category of “common computing platforms”. I won’t explain how to configure Tailscale in this tutorial, because there are dozens of tutorials on YouTube and elsewhere that do just that.

In the end, you’ll have a pseudo-domain on the Tailscale network, such as “correct-horse”, which I will use as an example. Each device on your tailnet will have a unique subdomain under this pseudo-domain. So, your phone might be pixel9a.correct-horse.ts.net and your vaultwarden instance might be vw.correct-horse.ts.net.

Admin Access

By default, the VaultWarden admin token is empty in the configuration to force the administrator to go through these steps to set it up. For these steps, you’ll use the Proxmox console for your VaultWarden instance.

This wiki page describes how to set up a token. You’ll want a passphrase, like you use with your SSH key, so think of something like “correct horse battery staple”. Feed that phrase into the hash generator using this command.

/opt/vaultwarden/bin/vaultwarden hash

Copy the line beginning with ADMIN_TOKEN and paste it into /opt/vaultwarden/.env. Restart vaultwarden with

systemctl restart vaultwarden

If anything goes wrong in restarting the server, check the output of the service with

journalctl -u vaultwarden

Go to the bottom of the output to find the most recent error messages.

Now, access your instance from a web browser and you’ll be presented with a login form, where you enter your passphrase of correct horse battery staple.

http://192.168.1.10/admin

Your browser will likely complain that the site is not secure. For now, you’ll have to bypass this warning. Each browser has a different process to get around this warning, which is there for perfectly valid reasons.

We’ll come back to the admin panel later. For now, just remember your passphrase.

SSL with LetsEncrypt

The VaultWarden installer generates a self-signed certificate and key when it runs. These are good enough to experiment with, but your browser is going to warn you that they aren’t generated by a recognized Certificate Authority (CA). You’ll need to generate a recognizable certificate from a known authority before you trust this setup with your passwords.

If you’ve worked with servers, you’re no doubt familiar with LetsEncrypt, so I won’t bother going into detail about it other than to say it’s a great service and I’m happy to not have to pay a corporation to generate a file of seemingly random digits. Tailscale will request LetsEncrypt certs for any device connected on a tailnet, with some caveats.

Using the Proxmox console for VaultWarden, type

tailscale cert --cert-file /opt/vaultwarden/vw.correct-horse.ts.net.crt --key-file /opt/vaultwarden/vw.correct-horse.ts.net.key vw.correct-horse.ts.net

Once this is complete, which may take 15-30 seconds, you’ll have the certificate and key files in the /opt/vaultwarden directory.

Open /opt/vaultwarden/.env and edit the line that begins with ROCKET_TLS= to point to these 2 new files.

ROCKET_TLS='{certs="/opt/vaultwarden/vw.correct-horse.ts.net.crt",key="/opt/vaultwarden/vw.correct-horse.ts.net.key"}'

Next, note the line that begins with ROCKET_PORT=. You’ll want to set this to 443, the standard port for HTTPS traffic. You don’t need to set the ROCKET_ADDRESS value, although you can. Save the file and exit the editor.
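Putting those pieces together, the relevant lines in /opt/vaultwarden/.env end up looking like this (a sketch using the example tailnet name from above; ROCKET_ADDRESS is shown commented out because it’s optional):

```shell
# /opt/vaultwarden/.env (excerpt)
ROCKET_TLS='{certs="/opt/vaultwarden/vw.correct-horse.ts.net.crt",key="/opt/vaultwarden/vw.correct-horse.ts.net.key"}'
ROCKET_PORT=443
# ROCKET_ADDRESS=0.0.0.0  # optional; leave unset unless you need a specific interface
```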

Restart VaultWarden with

systemctl restart vaultwarden

If everything works, you can now go to your VaultWarden instance on a machine connected to Tailscale with

https://vw.correct-horse.ts.net/admin

Admin Panel Change

From the admin panel, enter your passphrase and you can start administering users. You may notice a warning that your domain name doesn’t match the value from when you installed VaultWarden. Go into General settings and change the Domain URL to your tailnet subdomain name (e.g. https://vw.correct-horse.ts.net).

The rest of the admin panel settings and users are for you to decide on your own. At this point, you have a secure connection from your desktop browser to the server and you can start importing passwords from your other services if you choose.

New Accounts

One item you should consider is whether you want users who find your site to be able to create new accounts. For me, that is absolutely not the case. I’m only creating accounts for my wife and me. You can control this behavior in 2 ways.

  1. Add SIGNUPS_ALLOWED=false to /opt/vaultwarden/.env
  2. Uncheck the Allow new signups box in General Settings

In either case, the web UI will continue to show the new user form, but the create button will not be enabled and the back-end will not accept any attempts to create a new account.

Set Up Mobile Devices

The VaultWarden service is set up to be a drop-in replacement for the Bitwarden back-end.

NOTE: I tried to take screenshots of the screens I’m describing and the app (rightly) prevented me from doing so. I didn’t want to take a photograph, so you’ll just have to make do with the descriptions I provide.

Install the official Bitwarden app on your mobile device or PC. By default, these apps are configured to use the bitwarden.com back-end. There is a drop-down menu below the login credentials where you can choose to change from bitwarden.com (or bitwarden.eu) to self-hosted. Choose self-hosted from the list.

The self-hosted server screen is a form with many fields. The only one that matters for now is the Server URL at the top. For this field, enter https://vw.correct-horse.ts.net and press the Save button.

Once you are able to log in, find the Sync option to connect to your server. After a few seconds, you should have passwords on your mobile device. 🎉

Wait 90 Days

One detail that I deliberately skipped over earlier is the lifetime of the SSL certificate and key that gives us HTTPS access. LetsEncrypt certs expire after 90 days and must be renewed.

Because Tailscale doesn’t know where or how we’re using the certificate they requested, they cannot issue renewals without us initiating the request. It’s up to us to issue the tailscale cert command again before the old one expires and then move or rename the files as necessary. This is described in more detail in the Tailscale documentation.

I haven’t needed to renew my certificate yet, so I can’t say for a fact this will work. I tried to renew the cert 1 day after issuing it, but the tool reported that the cert had not yet expired. I need to figure out how often to request renewal. For now, this is how I have configured my certificate renewal.

I attempt the job at 2:00 AM on the first of each month using cron, with an entry in a new file called /etc/cron.d/renew-tailscale-cert containing this information:

0 2 1 * * root tailscale cert --cert-file /opt/vaultwarden/vw.correct-horse.ts.net.crt --key-file /opt/vaultwarden/vw.correct-horse.ts.net.key vw.correct-horse.ts.net
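As far as I can tell, Rocket only reads the certificate files at startup, so a renewed cert won’t be served until the service restarts. A variant of the same cron entry that restarts VaultWarden after a successful renewal (a sketch, using the same paths as above):

```shell
# /etc/cron.d/renew-tailscale-cert: renew, then restart so Rocket reloads the files
0 2 1 * * root tailscale cert --cert-file /opt/vaultwarden/vw.correct-horse.ts.net.crt --key-file /opt/vaultwarden/vw.correct-horse.ts.net.key vw.correct-horse.ts.net && systemctl restart vaultwarden
```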

U-Turns

There were a couple of choices I nearly made that sent me down paths that I probably would’ve regretted later. Fortunately, I caught myself before going too far down these paths.

  1. Making the service available on the open Internet. Actually, this was never a goal, but I experimented by putting vaultwarden.my-domain.com on CloudFlare with an empty server that returned a 500 status code. Judging from the logs, this server got a lot of traffic. I suspect miscreants watch for new DNS entries and pounce as quickly as they can when they find one with an interesting name.

  2. Trying to install the tun kernel module in my LXC. It turns out that an unprivileged LXC doesn’t have access to the /dev/net/tun device by default. I had to enable it by shutting the container down and adding the following line at the bottom of the LXC config file using the Proxmox console (e.g. /etc/pve/lxc/123.conf). After saving and starting the container, the Tailscale service would start and I could connect to the control server with tailscale up.

    lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file

    NOTE: As a follow-up to this, it turns out this is an option that is off by default when setting up an LXC project from the community scripts. If you select this option at install time, the script will do this for you.

  3. Installing the LetsEncrypt certbot tool on my server. Normally, this is how you would get a certificate renewed, but I couldn’t figure out which web server VaultWarden was using. It turns out to be neither Apache nor Nginx, but its own server framework called Rocket. While trying to decide how to handle this, I realized that Tailscale handles the request for users with better results than I could get myself.

  4. Thinking I needed certs in PEM format. When the VaultWarden server failed to start, the error logs pointed to the cert files. I thought the problem was that the certs were in .crt format while the examples showed PEM files. I tracked down how to convert the files with OpenSSL, but it turned out I had mistyped a file name in the .env file and just needed to correct it. 😳

Wrap It Up

I have all this working on my devices and haven’t had any problems in about a week of use. I was able to reconfigure my devices and my wife’s in a few minutes each. My wife doesn’t understand what Tailscale is for, but trusts that I know what I’m doing.

The process of installing and figuring out the configuration details took about 6 hours over the course of a few days. Hopefully, this post will help the next person around some of the potential missteps.

Once again, many thanks to the devs that create quality open-source projects. I hope that by documenting these projects and linking to their sites that I’m able to help shine some light on great software.

Talk to Me

I didn’t start out trying to write this as a tutorial, so I had to go back in my console and browser histories to see what I was doing and why. If, for any reason, you find something incorrect or I need to update the instructions, I’ll happily do so.

My GitHub profile is linked in the sidebar. I’ll absolutely take PRs and merge them, but you can also just send an email to the gmail account with the same user name as my GitHub profile. Similarly, with Discord.

My Music Hosting Project
2026-02-06 · Eric Cloninger

With some time on my hands, I delved into organizing my digital music library for online streaming.

NOTE: This is another long post about a technical project.


The Boring Background Bits

Like many software people, I’m a bit obsessive about categorizing and organizing things. While my office is cluttered, my photo and music collections are not. I build hierarchies with detailed metadata about the content of the files. Photos are tagged with information about where the photos were taken, the names of the people in them, and the subject matter, if applicable.

For Christmas, I made photo books for my children with pictures of them from their childhood with their mother and me. I was able to do this over the course of a few evenings by querying a database built from all that metadata using a tool called Digikam to sort and select the images. The books ended up with about 2000 photos for each of them, grouped by year and sometimes highlighting special events like visiting Disney World or the 3-week trip to Europe we took in 2015. My kids loved the photos and the effort wasn’t excessive because of the attention I paid to maintaining accurate metadata through the years.

For my digital music collection, I have always had a fairly consistent strategy so that I could find and play the songs I wanted on an iPod or my Android phone. The CDs were ripped to MP3 with as high a bit-rate as possible at the time. Through the years, I’ve added the relevant metadata using a variety of tools that have been built for the purpose, but not to the extent that modern “data hoarders” do. My collection grew organically as I added new physical media. Years ago, I removed the disks and their liner notes to sleeve cases and put the plastic jewel cases in the recycling bin. I revisit the physical media occasionally, when new technology comes along that lets me get better quality playback.

My Own Private Spotify

Which brings me to my next project. I want to replace the audio streaming service that charges a monthly fee with something on my own hardware with the content that I paid for decades ago. While I won’t have immediate access to nearly every song recorded, I don’t have to worry that my subscription is subsidizing hateful and harmful podcasts on that streaming service. I’ll just build my own Spotify, without Joe Rogan.

You Can Take My Server When You Pry My Cold, Dead Fingers From the Keyboard

A few years ago, I built what is commonly referred to as a “homelab” server. This is a dedicated piece of computer hardware that runs multiple versions of Linux simultaneously, each with a different purpose. These individual instances run without knowledge of each other using a very cool open-source software product called Proxmox.

One of these servers runs my home automation system, while another hosts my photo gallery. I have my own self-hosted versions of many popular cloud-based commercial tools. I spent several years learning and improving this system, to the point where I can have an idea of something I want to do and I can have a new service up and running on a secure network in under an hour. There is a great community of developers that take open-source server projects and wrap them into easily-installed packages hosted on Proxmox.

Sifting Through the Options

I trialed several self-hosted audio streaming projects, including Plex and Jellyfin. While I have a lifetime subscription to Plex, their audio playback setup isn’t ideal. The PlexAmp app is nice, but the real hurdle was getting my playlists into their system. I wanted something that worked with open standards and didn’t require me to convert into other formats. I settled on Navidrome, which uses the Subsonic streaming protocol, allowing me to choose from dozens of mobile apps.

The web interface of Navidrome isn’t beautiful, but it’s functional. It allows me to view my collection by a number of criteria, lets me use my existing playlists unaltered, and lets me share my collection with my family. Once I had my music collection with playlists on the server, I quickly became aware of one notable problem for someone with an obsession for organizing…

My Metadata is Crap!

As of this writing, I have 30,831 MP3 files from my own ripped CDs and others from download sites. The quality of the metadata from downloaded songs is very inconsistent and I find it difficult to enjoy my music when a song I’m expecting to hear doesn’t play because the metadata is incorrect and the song is attributed to a different artist. So, I needed to do a massive cleanup operation on the downloads and some maintenance on the ones I had ripped to MP3 a decade earlier.

To get my metadata into order, I wrote some Python scripts to help. The metadata is in a format called ID3, which is not a formal standard. Many software tools have implemented ID3 in their own way using their own conventions, so getting information on the right way to do things is a bit messy.

Fortunately, there is a project called MusicBrainz that is attempting to catalog every recording, and they’ve made a very good start. They have a software tool called Picard that will attempt to match your music collection with the metadata in their catalog. In addition to setting values correctly, the software inserts special values called GUIDs (Globally Unique IDentifiers) into the metadata so that other tools and playback software can know more about the song. All for the ridiculous price of nothing.

Something I did to make this process much faster was to host the MusicBrainz Server on my own hardware. The MusicBrainz project provides an installable package that runs on Docker or raw hardware. I ran it with Docker on an old laptop running Ubuntu, which drastically sped up the process of tagging music by querying a machine on my local network rather than the open Internet. I contributed a small sum to their project via GitHub after using the tool and server software for a few days.

After getting all those songs matched up, I wrote some Python code to remove the extra metadata tags that the download sites add that point back to their site. These fields provide no value and so I removed them. I also wrote some Python to “lint” the collection, looking for inconsistent dates and performer names. I used the Python Mutagen library to process and write the updated tags.
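To give a flavor of the linting step, here is a simplified sketch. The real scripts read tags with Mutagen; in this version the tags are plain dicts and the field names are illustrative:

```python
from collections import defaultdict

def lint_tags(tracks):
    """Flag albums whose tracks disagree on year or album-artist.

    `tracks` is a list of dicts with 'album', 'date', and 'albumartist'
    keys, as you might collect from Mutagen's EasyID3 interface.
    Returns (album, field, conflicting_values) tuples.
    """
    albums = defaultdict(lambda: {"date": set(), "albumartist": set()})
    for t in tracks:
        albums[t["album"]]["date"].add(t.get("date", ""))
        albums[t["album"]]["albumartist"].add(t.get("albumartist", ""))

    problems = []
    for album, seen in albums.items():
        for field, values in seen.items():
            if len(values) > 1:  # more than one distinct value is a conflict
                problems.append((album, field, sorted(values)))
    return problems
```

Running this over a collection turns up the albums where one rip says 1998 and a download says 1999, which is exactly the kind of inconsistency that breaks browsing by year.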

Can I Get in a Word or Two

As I was finishing the metadata cleanup, I came across someone else’s passion project to bring in lyrics for songs. This is a nice feature that both Spotify and YouTube Music have but, as with ID3, there’s no standard around it, only some widely adopted conventions. The project has a simple desktop app that uses the information from MusicBrainz to identify songs and will add the lyrics to a file if they can be found in its database.

As with MusicBrainz’ Picard, the LRCLIB project has a self-hosted server and database to speed up the process. I was able to add lyrics to my entire collection of 30,000+ songs in under an hour using LRCGET and my self-hosted server. As with MusicBrainz, I donated a small amount to the author of LRCLIB to show my appreciation for their efforts.

Play it Again, Sam

One of my problems has been that when I create a playlist and then move the songs to a new medium or server, the playlist goes stale and no longer points to the correct directories. This isn’t a “me” problem; it’s universal when an external file lists the contents of the playlist. So, I decided to put the playlist membership into the song metadata itself, which solves the problem for me. Some songs belong to many playlists, so I re-used an existing field that was made for this purpose and wrote Python code to stuff multiple values into that single-valued field. Then, other code generates each playlist based on the current location of the songs. It’s not clean, but it works for me.
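A stripped-down sketch of the idea follows. The separator and the dict-based tag handling are my illustrative choices, not the exact code (the real scripts read and write the field with Mutagen):

```python
SEP = ";"  # several playlist names packed into one single-valued tag field

def add_playlist(tag_value, name):
    """Add `name` to the packed playlist field, avoiding duplicates."""
    names = [n for n in tag_value.split(SEP) if n] if tag_value else []
    if name not in names:
        names.append(name)
    return SEP.join(names)

def build_playlists(songs):
    """Regenerate playlist -> file-path lists from current song locations.

    `songs` maps each file path to its packed playlist field, so the
    generated playlists always reflect where the files live right now.
    """
    playlists = {}
    for path, packed in songs.items():
        for name in filter(None, packed.split(SEP)):
            playlists.setdefault(name, []).append(path)
    return playlists
```

Because membership travels inside the file, moving the collection to a new disk or server and re-running the generator yields fresh playlists with correct paths.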

I have the scripts I wrote on GitHub, but the code quality is low. I need to clean up the scripts to properly use classes and add some documentation before I share them publicly.

From My Server to My Ears

Like most everyone else, my phone is tethered to my body most of the time. I have a subset of my music collection on the phone and I can play it anytime and anywhere I want. I used software called Poweramp on my phone for many years. I think I originally paid for it in 2010 or about that time. It works really well.

However, Poweramp does not connect to my personal server, so I did some research on different software products. I tried one called Substreamer for a while and I’ll likely use it more, but it is not updated very often.

I trialed another product called Symfonium and eventually paid for it. However, it requires a bit of customization to get everything “just right”. There are a lot of people on forums discussing their configurations and the developer is involved, but they are rather prickly about any comments that aren’t praising their creation. I don’t need this person as a friend, but when I have a question, I don’t need peevish, unhelpful replies. So, I continue to look.

My iPod touch doesn’t have much life left in it. I still have an original FireWire scroll-wheel iPod and I would love for that thing to make a comeback (with USB-C, of course). I bought a knock-off of the Sansa Clip on Amazon and it’s everything you might expect from a $30 piece of crapware. I have a few audiobooks on it, but it’s mostly unusable as a music player.

Cans

I probably have an addiction to headphones. I can’t count the number that I actually have. On my desk, right this moment, I see 7 different things that I can listen to my music on. So, I’m far from unbiased.

I generally don’t like things in my ears. I tried Samsung’s earbuds when I was working for them and I found them uncomfortable to wear and the sound quality was bad. I have a cheap workout neck-band headset with buds that go in my ears. I use these while doing yardwork, mostly to keep the sound of power equipment from damaging my ears even more. They work well enough for that use case, but they’re not exactly what I’d choose when I want to listen to music with high fidelity.

My favorites would have to be

  • Sennheiser HD 380 Pro wired headphones. The sound quality is exceptional and they are very comfortable for the long listening sessions. However, the coiled wire is a bit taut for my liking. I prefer a non-coiled option. I bought a Sennheiser wireless telephone headset a few years ago and was less impressed by it.

  • Plantronics Backbeat Pro wireless headphones. These big over-the-ear cans are comfy and they are smart enough to sense when I take them off and they will alert my music source to stop playing. They’re a bit outdated now and charge with micro-USB, so I’ve stopped taking them on trips because I prefer everything to charge with USB Type-C.

  • Anker SoundCore Q30. These are inexpensive noise-cancelling headphones by a company that is prevalent on Amazon. They have good battery life and decent noise cancelling. The controls are a bit difficult to operate, but at the price I paid, I don’t worry overly much about them getting lost in an airport or broken in my backpack.

Cones

As for music playback in an open environment, like when I’m home alone or we have friends over… that’s an entirely different matter and I’m still trying to come up with a good solution. I have a Soundcore Bluetooth boombox that is waterproof and does an adequate job. It’s a bit bassy, but the battery life is good and I don’t worry if I left it out in the rain.

I had a bunch of old-school analog equipment that I gave to a guy who fixes and sells it to the retro market. I’m looking for something modern to play around the home that isn’t tied to a commercial cloud service run by a company that hates its customers (à la Sonos).

Time will tell.

Technical Deep Dive of a Newspaper Archival Project
2025-12-16 · Eric Cloninger

I took on a project to help a local non-profit bring their data online.

NOTE: This is a very long post about a technical project that I started back in September. I tried to keep the narrative as linear as possible, but the truth is that it was a series of going down paths and returning back to the main task.

I included information about research paths that didn’t work out, but not in great detail because I didn’t keep exhaustive notes. This wasn’t intended to be a Masters thesis, but it probably could’ve been.


In high school, I worked for our home-town newspaper. With the energy of a teenager, I ran about the office lifting boxes, composing headlines, developing film, printing photos, laying out pages, and stuffing mailers with each week’s issue. I parlayed my earnings into the purchase of a Pentax K1000 35mm camera and a 50mm lens, which I still own. One of my former colleagues and her husband moved to a nearby town and set up their own weekly newspaper. I helped them set up the darkroom and showed her husband how to mix and use the chemicals, but it was too far from college to work there.

Time moved on, and eventually the local newspaper shut down, as a lot of small-town newspapers did. About the time the Internet arrived, the final owners decided they couldn’t keep it running. The newspaper operated by my former coworker and her husband continued up to her death in 2024.

A few months ago, my former colleague’s husband and I talked about bringing those newspapers online, and that was the beginning of this project.

The Data

The content for this project came from the Oklahoma Historical Society. They provided us with PDF files containing scanned images of the Hydro Review from July 1904 through December 1947. Each PDF contained 2 or 3 years of issues, with roughly 1,200 pages in each PDF file. There are a few missing issues, indicated by a single blank page with the words “MISSING FILES” in large letters.

Big Picture

What we wanted was a site that would allow easy browsing and searching of the content. We imagined that most people finding the site would want to search for a relative’s name or find a specific date. We wanted to make those user journeys as easy as possible, given several constraints…

1) We couldn’t spend much money on the project, relying on grants and our own contributions.

2) For handling the data and converting files, I would need to rely on Free and Open-Source Software (FOSS) to the largest extent that I could. In situations where I couldn’t use FOSS, I wanted to use products that do not require a subscription.

3) I can’t be a system administrator. I can’t spend time fighting off people trying to get into the administration panel or inserting malicious code into pages, so however the site operates needs to be as low-maintenance as possible.

4) The project must be as automated as possible. While the project won’t scale in the sense that I’ll be adding content years from now, there were thousands of pages to be processed. Fortunately, there are plenty of frameworks and CI/CD processes to make this simpler.

User Stories

I need to represent several user journeys through the site.

  • The front-door visitor who finds the site by typing the domain into their browser or follows a link that lands them at the top. I need them to find something of interest and hopefully consider their time well-spent. This is not a commercial site, so I’m not going to spend a lot of effort reading visitor logs, although I will activate Google Analytics, mostly to make sure people are finding what they need. I’ll look at high-level metrics like time on page and pages visited.

  • Someone coming in from a web search. Over time, I hope that people looking for their ancestors in Google Search will find the site. This person likely wants a specific family name and to find out what became of a relative. In reading the early editions, it seems that a lot of people died of things we can now easily prevent. A ruptured appendix or tonsillitis was a life-threatening illness in 1904. One man, thrown in the town’s jail for an evening, was found dead the next morning with a bottle of laudanum (liquid codeine) by his side. Those people’s names aren’t reflected in the current town’s populace, but they mattered to someone, somewhere.

  • Someone looking for a specific date. Historical dates, such as statehood, the stock market crash, and the declaration of war, should be findable with just a few clicks.

  • Someone coming from a social media link, primarily Facebook. Like it or not, Facebook is the primary platform for people sharing genealogy research and searching for family connections. We want the site to operate properly when linked from these platforms.

If someone is looking for information on a site, the best experience is to have closely related information immediately available, without a search or a trip back to the front page to navigate from there.

One important aspect is to get the link hierarchy correct from the start. Once the pages are picked up by search engines and linked from social media, I don’t want to have to invalidate them or rely on redirections.

Converting and Processing the Data

All of the work described here was done on a Windows PC with Python scripts running under PowerShell. All of the image and file processing is written in platform-agnostic Python code. Some of the image processing steps used GUI tools on Windows, but the process of converting images and building the website can easily be done on macOS or Linux.

On a few occasions, I found issues with open-source tools or libraries and reported errors back to the maintainers, typically via GitHub issues. The original files were about 13GB of PDF pages. The first task was to separate them all into individual pages and figure out how to map those into issues by date.

There is a set of tools for handling PDF files collectively called “poppler”. I used pdfseparate to save all the files using the original file name as a base and a sequential number for each separate page. This resulted in 13,224 individual PDF files, most of which were roughly 1MB in size.
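
That splitting step can be sketched roughly as follows. This is a minimal sketch, not the actual script: it assumes the poppler tools are on the PATH, and the folder and file names are illustrative. pdfseparate substitutes the page number for the %d placeholder in its output pattern.

```python
from pathlib import Path

def pdfseparate_cmd(src: Path) -> list[str]:
    """Build the pdfseparate invocation for one archive PDF.

    The %03d placeholder makes pdfseparate emit zero-padded page numbers,
    e.g. 1904-07-15-1905-06-23-001.pdf, -002.pdf, and so on.
    """
    pattern = f"{src.stem}-%03d.pdf"
    return ["pdfseparate", str(src), pattern]

# To actually run it over a folder of archive PDFs (not executed here):
#   import subprocess
#   for src in Path("archive").glob("*.pdf"):
#       subprocess.run(pdfseparate_cmd(src), check=True)
```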

There was a small glitch in the software: about every 500 pages, I would end up with a file that contained only 1 page but was the size of the entire PDF archive. Fortunately, when I created a script to re-convert just these specific pages, they were saved at the expected file size. I didn’t try to understand the nature of this glitch or report an issue back to the project.

Creating an Index of Pages

Now that I had individual pages, I needed to figure out how to turn those 13,224 sequentially numbered files into file names that reflected their content. The first stab at this covered 3 1/2 years of issues and required an entire evening of copying and pasting in Excel. I tracked the year of the issue, volume number, edition numbers, month, day, and file name, as below:

Year  Volume  Edition  Month  Day  Page  File
1904  3       35       7      15   1     1904-07-15-1905-06-23-001.pdf
1904  3       35       7      15   2     1904-07-15-1905-06-23-002.pdf
1904  3       35       7      15   3     1904-07-15-1905-06-23-003.pdf
1904  3       35       7      15   4     1904-07-15-1905-06-23-004.pdf

The file names came from doing a directory listing and pasting the output into Excel. The years and volumes were sequential, so I was able to build formulae to number them automatically. Similarly, for the months and days, I built formulae that checked the values of previous rows and highlighted when certain conditions occurred, such as an odd number of pages, which shouldn’t exist in a newspaper with content on both sides of the page. The exception to this was the 30+ issues that were missing and contained a single “missing issue” page.
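
The same sanity check the Excel formulae performed can be sketched in a few lines of Python. This is an illustrative sketch, not the spreadsheet logic itself; the sample rows are made up, and the real data lived in Excel columns rather than tuples.

```python
from collections import Counter

def odd_page_issues(rows):
    """rows: iterable of (year, month, day, page) tuples from the spreadsheet.

    Return issue dates whose page count is odd -- suspicious for a paper
    printed on both sides of the page -- skipping single-page entries,
    which represent the 'missing issue' placeholders.
    """
    counts = Counter((y, m, d) for y, m, d, _page in rows)
    return sorted(date for date, n in counts.items() if n % 2 == 1 and n > 1)

rows = [
    (1904, 7, 15, 1), (1904, 7, 15, 2), (1904, 7, 15, 3),  # 3 pages: flag it
    (1904, 7, 22, 1), (1904, 7, 22, 2),                    # 2 pages: fine
    (1904, 7, 29, 1),                                      # missing-issue page
]
print(odd_page_issues(rows))  # [(1904, 7, 15)]
```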

One thing that helped this process was using a desktop PC and 2 monitors. I created thumbnails of the pages and opened them with IrfanView on the second monitor. This allowed me to quickly see how many pages each issue had. Most of the early issues had 6 pages, which eventually grew to 8 pages. However, there would be special issues that had a sales insert or perhaps legal notices from the county. So, I couldn’t rely on dozens of issues having the same number of pages and had to visually inspect them all.

Even though this was a manual process, I was able to build out the spreadsheet over the course of 2 evenings while watching baseball playoff games. Once I had the spreadsheet built, I wrote a Python script that read it in and renamed the files to indicate which issue they belonged to. In the end, I had a folder structure that looked like this…
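
The core of that rename script is just a mapping from spreadsheet row to target path. A minimal sketch, assuming the spreadsheet is exported to CSV first; the function and the HR- prefix match the layout shown below, but the code itself is illustrative rather than the actual script.

```python
from pathlib import Path

def target_path(year: int, month: int, day: int, page: int) -> Path:
    """Map one spreadsheet row to the HR-YYYY-MM-DD-PP folder layout."""
    date = f"{year:04d}-{month:02d}-{day:02d}"
    return Path(f"{year:04d}") / date / f"HR-{date}-{page:02d}.pdf"

# The rename loop then reads each CSV row and moves the sequentially
# numbered file into place, e.g.:
#   old_file.rename(target_path(year, month, day, page))
print(target_path(1904, 7, 15, 1).as_posix())  # 1904/1904-07-15/HR-1904-07-15-01.pdf
```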

+ -- 1904
|    + -- 1904-07-15
|    |    + -- HR-1904-07-15-01.pdf
|    |    + -- HR-1904-07-15-02.pdf
|    |    + -- HR-1904-07-15-03.pdf
|    |    + -- HR-1904-07-15-04.pdf
|    + -- 1904-07-22
|         + -- HR-1904-07-22-01.pdf
|         + -- HR-1904-07-22-02.pdf
|         + -- HR-1904-07-22-03.pdf
|         + -- HR-1904-07-22-04.pdf
...
+ -- 1905
|    + -- 1905-01-06

Converting to Raster Images

PDF is fine for documents, but it’s not a good storage choice for image data that needs to be manipulated. I converted the PDFs to JPEG in high quality for all further activities using the poppler tools.

pdftocairo -singlefile -jpeg -jpegopt quality=92,optimize=y <file> <output>

The pdftocairo tool had several odd behaviors that I had to work around. The biggest one was that it assumed the PDF input was multipage, so no matter what I gave for the output, it always appended a “-1.jpg” suffix. At first I renamed the files with a script, but then I read the docs in more detail and found the -singlefile parameter that suppresses that behavior.

After converting to JPEG, I had another script make smaller thumbnail images that could be shown on each issue’s post page. For this, I used the ImageMagick suite of image tools. ImageMagick is great because if something can be done to an image, there’s a tool or option to do it. However, to do the simplest of operations, you have to wade through a dozen pages of examples and StackOverflow questions, each not quite what you need.

magick <input> -thumbnail '450x>' -gravity North -extent 450x450 <output>

This command creates an image spanning the entire width of the page and as much of its length as needed to make a square, shown in the gallery of printed pages at the bottom of each issue’s post.

OCR Investigations

This is the part of the project that is the most technically complex and the part that still doesn’t have a great solution. I have spoken to half a dozen people for whom this is their profession and they all admit it’s an area where they struggle. Given the budget and the lack of access to commercial scanning tools, we’re having to rely on open-source or free-tier tools.

I started by running the tesseract tool on the images to see what it would produce. Tesseract is the most commonly used OCR library, with rich support for C++ and Python. The initial results were not great because of several issues:

  • There are a lot of scanning artifacts that aren’t part of the printed page. This includes dust specks, folds in the pages, and overlapping pages
  • Inconsistent image layout due to these pages being scanned by many different people over many decades of archiving
  • The number and size of columns changed over the course of publication

Sometimes there would be a vertical border between columns, and other times there would be scarcely 1/8” (3mm) between them, so the OCR could not reliably determine where the boundaries of the content were. Advertisements were typically enclosed within a border, but the columns of text tended to run together, especially in the earlier years of publication.

I read articles about how to improve OCR results. The most consistent pieces of advice were:

  • Reduce the images to grayscale and reduce the number of shades of gray to 256
  • Remove the content of the pages from the black borders surrounding the pages
  • Properly align the images and remove any skew (distortion) introduced by the scanner not being perpendicular to the material (aka parallax)

The first issue was easily solved in bulk using Python scripts, requiring a few hours of processing time. The other two issues began a research project that could’ve easily turned into a Master’s thesis, had I wanted to do that again. The most promising avenue was to use OpenCV, a widely used computer vision project.
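
The bulk grayscale step can be sketched by shelling out to ImageMagick, which was already in the toolchain: -colorspace Gray converts to grayscale and -depth 8 limits the output to 256 shades, matching the advice above. This is a hedged sketch under those assumptions, with illustrative paths, not the actual script.

```python
from pathlib import Path

def grayscale_cmd(src: Path, dst: Path) -> list[str]:
    """ImageMagick invocation: grayscale, 8 bits (256 shades) per pixel."""
    return ["magick", str(src), "-colorspace", "Gray", "-depth", "8", str(dst)]

# Batch form over the page tree (not executed here):
#   import subprocess
#   for src in Path("pages").rglob("*.jpg"):
#       dst = src.with_name(src.stem + "-gray.jpg")
#       subprocess.run(grayscale_cmd(src, dst), check=True)
```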

OpenCV

I hung out for several days on Slack, Discord, and Reddit groups devoted to the OpenCV project and its myriad tools. I asked general questions and eventually more detailed questions about my specific use cases. Forums are great for gathering ideas, but everyone there has their own projects, and none of them were similar to this one. Unfortunately, there were no turn-key solutions already built for newspapers. At least, none that the budget for this project would allow.

I spent several days experimenting with purpose-built tools to do object detection using YOLO (You Only Look Once). This had promise, but the inconsistent nature of the scans made it difficult to handle more than a few weeks’ worth of issues at a time. Each batch of issues required me to adjust my code and determine which markers to base the image extraction on. This was leading to a deeper investment of time than I had to give, and I ended up abandoning these tools.

One thing I noticed when looking at OpenCV is there is a vast amount of plagiarism across blogs and sites discussing the project. One particular method with Python code for finding contours and using the results to identify the borders of an image was reproduced dozens of times across sites. Some of these sites were selling their code as original products, yet they didn’t even change the names of their variables or the comments in the code. As much promise as CV has, it’s clear there are frauds presenting themselves as experts.

In the end, I resorted to a manual process. I built automations in Photoshop to roughly determine the edges of the pages and crop them. This was a labor-intensive process that took over a week of evenings and a cold, rainy weekend while watching football games. During this process, I identified approximately 175 pages that were duplicated scans and removed them from the archives.

Image Improvement

Now that I had nicely cropped images, I looked for ways to determine the rotation and skew of each page. Those old scans weren’t perfectly aligned to the scanner and could be off by several degrees. This has a huge impact on the quality of the OCR process, so I used what I learned from the OpenCV investigations to build Python code to determine how much each image needed to be rotated with ImageMagick.

As part of this, I realized that ImageMagick uses EXIF/JFIF image metadata rather than the image data itself when rotating images. I asked on a forum about how to correct this and ended up in a long conversation with someone who is a maintainer of exiftool. They said they had great experience using a tool called ScanTailor to re-orient and deskew images of printed pages.

ScanTailor required a bit of a learning curve. Its history has seen it worked on and abandoned a few times, only to be taken up by someone needing additional functions it could provide. The UX is crude at best, but underneath, it has a great ability to operate on large numbers of images and do the right things. I spent a day running this tool on all 13,100 images and it did a wonderful job cleaning up the remaining borders, rotating the images properly, and removing any skew.

ScanTailor saves its output as uncompressed TIFF images, which are 4x-5x larger than the JPEGs; that’s fine for the purpose of this project. The PDFs from the historical society used compression, so there’s no way to restore that lost image quality, but these are newspaper scans and not high-fidelity photos. Either JPEG or TIFF is fine to continue with.

Back to OCR

At this point, with nicely cropped, rotated, and deskewed images, I started playing with tesseract and reading up on how to improve the results. One particular value I could manipulate is the ‘page segmentation mode’, which tells the tool to treat the page contents in different ways. I experimented with a number of different values, and none provided better results than --psm 3.

I set up a long-running Python script to run OCR against every image and store the results in a text file. This process ran for almost 14 hours on my 13th-generation Intel i9 workstation, and I gave it as much CPU and RAM as it needed.
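
That long-running pass can be sketched as a fan-out over the tesseract CLI. This is a minimal sketch under a few assumptions: tesseract is on the PATH, --psm 3 is the segmentation mode settled on above, and the directory names are illustrative. The tesseract CLI takes an output base name and appends .txt itself.

```python
import subprocess
from pathlib import Path

def tesseract_cmd(image: Path, psm: int = 3) -> list[str]:
    """Build the tesseract invocation; output lands in <image base>.txt."""
    base = image.with_suffix("")  # tesseract appends .txt to this base
    return ["tesseract", str(image), str(base), "--psm", str(psm)]

def ocr_one(image: Path) -> None:
    subprocess.run(tesseract_cmd(image), check=True)

# Full pass across CPU cores (not executed here):
#   from concurrent.futures import ProcessPoolExecutor
#   images = sorted(Path("pages").rglob("*.jpg"))
#   with ProcessPoolExecutor() as pool:
#       list(pool.map(ocr_one, images))
```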

With a bulk of text to be indexed, I took a look at what the OCR process thought was on each page. Of course, there were a lot of mistakes and words that didn’t make sense. I went back to the original goals for the project and determined that it’s best to think about what people would be searching for: names of people and places.

What constitutes a name? For places, I could rely on lists culled from Wikipedia. I extracted the names of countries, US states, city and county names in Oklahoma, cemeteries, post offices, and physical features, such as bodies of water, mountains, and valleys. All of these went into a file that I would use later.

I found lists of surnames and given names that are common in the US. I primarily focused on surnames from Europe, Central America, and native tribes, although there are names from Asia and the Middle East in there as well. I want to give credit here to Rhett Butler for the work published at probablyhelpful.com/ for putting it into easily used tables. All data is derived from David L. Word, Charles D. Coleman, Robert Nunziata and Robert Kominski (2008). Demographic Aspects of Surnames from Census 2000. U.S. Census Bureau.

In addition, I added a file containing place names and family names from this region that I knew from memory.

All of these words were combined into an allow list containing over 150,000 entries.

Conversely, I have a block list. I struggled briefly with this from an ethical perspective, but it wasn’t a difficult decision to make. There are words that were used openly in the past that are now slurs. I did not redact the pages containing those words because they are part of the historical record, but I don’t want them being searched for on a site that I’ve created. There is a text file containing these words in the repo for the tools to create the site and that repo is not public.

I ran an analysis on the words that Tesseract found and realized that anything longer than 17 characters was not a name. Usually, it was a concatenation of other words or a scanning error. Words shorter than 3 characters would increase the search index size without providing any helpful searches for the user, so I removed those as well.
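
The filtering pass described above reduces to a few set operations: keep tokens that are on the allow list, drop anything on the block list, and enforce the length limits. A minimal sketch; the word lists here are tiny stand-ins for the real 150,000-entry files.

```python
def searchable_words(ocr_words, allow, block):
    """Keep OCR tokens that are plausible names: on the allow list, not on
    the block list, and between 3 and 17 characters long."""
    keep = []
    for word in ocr_words:
        w = word.strip(".,;:'\"").lower()  # shed punctuation from OCR tokens
        if 3 <= len(w) <= 17 and w in allow and w not in block:
            keep.append(w)
    return keep

allow = {"smith", "guthrie", "oklahoma"}
block = set()  # the real block list lives in the private repo
print(searchable_words(["Smith,", "xj", "Guthrie", "qqqqqqqqqqqqqqqqqqqq"], allow, block))
# ['smith', 'guthrie']
```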

Building a Website

With the input data as clean as it can be, the next phase of the project was to build a website.

I ran a series of Python scripts on the input images to generate appropriately sized images for web viewing and to combine the individual page images into a single PDF for each issue. Toward the end of the project, I realized the images and PDFs only had basic EXIF information in them. I chose to add additional values, such as the GPS location and a link to the website. I also put in a title describing the image based on the publication date and the page numbers, where applicable. I asked the archivists that I met online whether they watermark or put other digital signatures on the images to prevent abuse, and they said they did not, so I followed their lead.

As I mentioned earlier, I can’t spend evenings and weekends making sure people aren’t trying to attack the site. For that reason, I chose to forego traditional CMSes like WordPress, Joomla, and Drupal. Like my personal site(s), I want the attack surface to be as small as possible and the potential gain for the attackers to be zero. They might disable the site by flooding the domain with requests, but they won’t add spurious files and malware. AWS CloudFront should mitigate this risk. Bots can scrape the content, but honestly I’d prefer their operators just ask and I can provide them with a link to an archive.

All the links on the site are relative to the site itself. While a top-level domain exists, someone could take the site code and put it on their own server. In fact, that’s exactly what I do when I go to local groups to show it off–I take my laptop and don’t bother connecting to the Internet.

Technical Stack

I chose to use Jekyll for the publishing system. This is arguably the most well-known of the static site generators (SSGs), and GitHub itself provides support for it. There are hundreds of themes and tools made to work with Jekyll. I can host the site locally, on GitHub.io, or on a cloud provider. There are CI/CD scripts to support Jekyll build and deploy models. I have experience running Jekyll sites on AWS, and I like the fact that I can easily control my costs there.

The drawback (and benefit) of Jekyll and every other SSG is that everything about the site must be known in advance. There is a robust scripting language to build pages and link between them, but every new piece of content requires rebuilding the site. Given that The Hydro Review ceased publishing in the 1990s, this doesn’t seem like much of an impediment once it’s all running.

Another slight drawback is that there isn’t a lot of new development for Jekyll. What you can find today is probably all you will get in terms of turn-key tools and themes. If you need more than what is available, you’ll have to build it yourself. I wouldn’t build a new commercial product using Jekyll, but for this kind of personal interest/community project it works fine.

Jekyll Themes

There was a somewhat thriving market in themes when I first started using Jekyll in 2012. I initially searched for themes that were meant for newspaper or literary archives. I found one, but it had not been maintained or updated in over 5 years. I decided to go with a well-maintained, general-purpose theme with good support for desktop and mobile browsers.

The theme I’m using is called Minimal Mistakes, which is widely used and is still supported by the author. It looks good on desktop and mobile devices. For the initial phase of this project, I chose to use this theme, but I didn’t go to great lengths to customize it, hoping that eventually we’ll have funding for a full-featured site with managed hosting.

Jekyll supports JavaScript on pages. For most of the site, I used only what is included in the theme. However, the page that displays the scanned JPEG images has JavaScript that I wrote to handle keyboard input and moving between pages, similar to what you might see on an image-sharing site like Flickr or Instagram.

Hosting

I used AWS because that’s where my skills lie, having launched at least a dozen projects there. The only recurring expenses should be storage and bandwidth, aside from the annual renewal of the domain through Route 53.

The code for the site is in a public GitHub repo. The repo has an automated CI/CD action to build and deploy the site to AWS whenever there is a check-in to the main branch. I chose to use a separate account for this project from my personal projects so that I could properly hand it over to the non-profit that is supporting the work. I gave my personal account the ability to submit code to the repo and manually initiate a build action, but in all other aspects I kept the accounts separate. It’s just good housekeeping.

On AWS, the site is stored in one S3 bucket while the image and PDF data are stored in another. While developing the site, my changes had nothing to do with the image data, which was static once processed, and having the images in with the website code increased build times considerably. I assigned a subdomain to the image data to keep the two buckets separated. I could’ve used configuration settings in CloudFront to make the image data bucket appear as /content of the site, but I chose to keep them separated for now. Perhaps in a future phase, I’ll clean up that bit of technical debt.

CloudFront and Certificate Manager serve up the site with HTTPS without the cost of a recurring SSL certificate. I’m keeping an eye on the hosting costs through the AWS Billing Manager. While I have a robots.txt file to keep known crawlers out, I do want Google and others to index the site. AI tools are causing headaches for other projects because they don’t honor robots.txt, so I put a fairly low cap in AWS on hosting costs. If this cap is exceeded, the site will shut down until the end of the billing cycle. I’ll get an email from AWS with this information and we can re-assess how much the non-profit is willing to spend on hosting.

Search Indexes

Having an on-site search function is crucial to help users find the information about the people they are interested in finding. Google search isn’t sufficient for this task, although it will certainly be a component.

The Jekyll theme that I’m using has a built-in integration for a search engine called Lunr, a lightweight client-side engine inspired by Apache Solr. Because it runs entirely in the browser, it wasn’t going to scale well for our purpose. I tried it out on a local build of this project and it did not give good results.

The other integration that’s built into Minimal Mistakes uses the Algolia search engine. I’ve used Algolia enterprise search at a previous job and the results were good. I created a free tier account at Algolia in the name of this project and requested API keys. I’m going to pause for a bit and talk about how I architected the site navigation because it has bearing on how search is done. I’ll get back to this topic a bit later, but for now I’ll divulge that Algolia powers the search function on the site.

Project Layout, Navigation, and Data Architecture

Jekyll is a publishing system, not a Content Management System (CMS). I’m not going to go into great detail on the differences because that’s not germane to the goal of this document. Jekyll, like a CMS, allows us to create an interlinked site with nice visuals by following its requirements around file naming and placement. The task of organizing the content is on the site creator.

Issue Pages

Each issue of the newspaper has a page devoted to its contents. At the top is a thumbnail image of the masthead that links to a PDF containing the full issue. This PDF has had OCR run on it using a tool called OCRmyPDF, which also uses the Tesseract engine. I ran this tool on all the generated PDFs after they were built from the cropped and straightened images. When the link is clicked, the PDF opens in a new browser tab and a text search will work from within the browser. Again, this search is limited by the ability of the Tesseract engine to recognize text.



Beneath the PDF masthead for each issue are thumbnails of the individual pages as reduced JPEG images, scaled down to 2560px in height at 70% quality by the build scripts. These images do not have OCR text embedded in them because that is not supported by the JPEG standard. The viewer page for images supports basic keyboard navigation using a bit of JavaScript that I wrote.

Each page has a hidden element containing the filtered words from the scanned images. Someone wanting to find a specific family name using the search function can land on an issue page by virtue of these words being in the HTML, albeit hidden from casual view. The intent is that the visitor will click on the PDF and use the search function on that file to find the specific location on the page where the name exists. This isn’t perfect; in fact, it’s barely rudimentary, but it costs nothing to implement beyond the CPU cycles to create the files.
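
The hidden element is emitted at build time by the page generator. A sketch of that emitter is below; the class name and markup are illustrative rather than the site’s exact markup, which the theme templates control.

```python
import html

def hidden_word_div(words):
    """Emit the hidden word list appended to an issue page so the search
    indexer sees it, while casual readers do not."""
    body = html.escape(" ".join(words))
    return f'<div class="ocr-words" hidden>{body}</div>'

print(hidden_word_div(["smith", "guthrie"]))
# <div class="ocr-words" hidden>smith guthrie</div>
```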

Each issue page has “Next” and “Previous” buttons to move between issues.

I wanted several ways to navigate by browsing, so the left-hand sidebar provides a list of issues by year. There is also a full listing of all 2,200+ issues accessible from the top menu named ‘Issues’. The index of years shows how many issues exist for each year, using the YAML metadata embedded in each issue’s post.

Blog Pages

Each issue exists as a separate blog entry, but the date of each post is the publication date of the issue itself. This allows us to write spotlight and how-to articles with recent dates highlighting an interesting aspect of a particular issue, such as the declaration of war in World War II.

Putting It All Together

Another script walks the list of images and PDF files and then generates an output folder that contains the Jekyll website pages. The website is in a separate directory from the image sources because the Jekyll theme and scaffolding for the website won’t work with files that aren’t the website itself. It’s easier and less error-prone to generate the website pages based on the content of the folders that the script finds. This script also generates OpenGraph metadata for each page, so social media links will include information about the page.

With the Jekyll site files generated, I run the Jekyll build process on my workstation. This takes about 1 minute, starting from a clean state. If I’m only updating a few files, it takes only a few seconds until I can review it at http://localhost:4000.

Once I’ve tested the local instance, I check the changes in to GitHub. GitHub notices that I’ve checked something into the main branch and uses that to build the site. Assuming the site builds with no errors, it is sent to the AWS hosting via GitHub Actions. The last step is for the GitHub Action to invalidate the site cache in AWS CloudFront. This ensures that anyone viewing the site will see the latest updates and not stale content. From the time I submit the changes to GitHub until I can see the results in a live browser is usually about 5 minutes.
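
A pipeline like that looks roughly as follows as a GitHub Actions workflow. This is a hedged sketch, not the project’s actual workflow: the secret names and step details are placeholders, and a credentials step (e.g. aws-actions/configure-aws-credentials) would be needed before the AWS CLI calls.

```yaml
name: Build and deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          bundler-cache: true
      # Build the static site from the checked-in sources
      - run: bundle exec jekyll build
      # Push the generated pages to the site bucket
      - run: aws s3 sync _site "s3://${{ secrets.SITE_BUCKET }}" --delete
      # Invalidate the CDN cache so visitors see fresh content
      - run: >
          aws cloudfront create-invalidation
          --distribution-id "${{ secrets.CF_DISTRIBUTION_ID }}" --paths "/*"
```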

To make the search engine work, I run another build of the website locally using a different set of parameters to activate the Algolia search plugin. This plugin reviews all the blog posts and issue pages, looking for text that isn’t part of the page markup. Since I have the text for each page in the hidden <div> on each issue page, the Algolia plugin treats it as part of the text of the page. Users won’t see the text unless they deliberately expand the toggle arrow underneath all the page thumbnails. If they do, they’ll see a bunch of words with a note indicating why they are there.

Re-running the Process

When working with this much data, there will always be a need to go back and re-process a subset of the data.

For example, while showing off the site to the board of the non-profit, one of the members noticed there was an issue with 5 pages. “Shouldn’t a newspaper always have an even number of pages?” I replied that it is indeed odd, but there were issues that were poorly scanned and didn’t have the full set of pages.

After the meeting, I searched for issues with an odd number of pages. Sure enough, about 25 issues (out of 2,262) had duplicated pages. When I wrote the OCR and conversion scripts, I deliberately added an option that allows me to limit processing to a single issue, month, or year. So, for those 25 issues, I re-ran the scripts with those specific dates and then rebuilt the website contents. Rather than taking 14+ hours, it took about 45 minutes once I had identified the duplicates.

On the Web

With the site on the web, I spent time making sure that social media links to the site looked nice. The scripts I wrote generate appropriately-sized OpenGraph images for each issue. At first, the images didn’t work correctly and I had to rely on web tools to debug the process. I tried several, and the least horrible was opengraph.xyz. The Facebook Sharing Debugger was useful here. One thing I had to do was allow the Facebook web scraper to access the site, by putting an explicit allow statement into robots.txt.

I have the site connected to Google Analytics, but I’m not seeing the types of results that I want to see there, so I need to investigate why page views aren’t triggering the analytics code on the pages. The non-profit doesn’t care about the analytics data, per se; they will see the results in people sharing on social media. However, I want to know that the site is working correctly, and Google Analytics is one of the tools. I’m not sure if my own visits aren’t registering because I run privacy tools on my workstation and phone, or if no devices are triggering the analytics. I need to let the site operate a while and do more research over the holidays.

Lessons Learned

I’ll be clear that the results of this project are a mixed bag. It was a fun project and I learned better ways to write and run Python. I also learned about the OpenCV project and how it might be applied to other work I might want to do in the future. I also wrote a bunch of really bad Python code that I ended up throwing away. That’s part of the path to discovery, so it wasn’t wasted.

This is a completely unordered list of the things that stick in my mind about what I’ve learned.

  • I used git submodules for the first time in the website project. The Jekyll theme needs to be in the same folder with the web code, but it is tracked in its own repo because I want to continue to apply upstream changes. I would make changes to my fork of the theme and test them out on my workstation. When they were ready, I would check in the changes to the theme first, then check in changes to the website repo for everything to show up with the theme changes. That made updating the site much simpler.

  • The RegEx search/replace functions in VSCode are extremely useful when handling large data sets. In the past, I would record a macro in Notepad++ and run it repeatedly on a large set of files. The VSCode implementation is nice in that it highlights a preview of every replacement. You can even apply a RegEx replace across an entire project, which is great until it isn’t. Notepad++ is still great for its ability to copy/cut/paste in columns; I wish VSCode had this ability.

  • The initial scripts used subprocess.Popen() to call Exiftool and ImageMagick tools. When I would run those scripts on all 13,000+ images, one particular script would take 3 hours to complete. I learned that there is a wrapper class for ImageMagick called Wand and another for Exiftool that operate in the same process space as the calling tool. Further, if I instantiated those classes at launch rather than for each invocation, I gained even more time. After refactoring the code and using the wrapper classes, the process took about 20 minutes to process all those same files again. Clearly, external process calls are expensive in terms of CPU cycles. I similarly found a wrapper class for OCRmyPDF.

  • StackOverflow was more useful for Python code questions than AI summaries from Google. I recognize that this could be my own bias, so I continue to look at the summaries.

  • Similarly, the Copilot code generator did not provide a working example the one time I tried to use it for something moderately complex. Copilot itself wants too much autonomy in VSCode, and I eventually grew tired of its recommendations. To be clear, I absolutely rely on intelligent code highlighting, hints when filling out API calls, and static analysis in the editor. What I don’t want while trying to solve a problem is a window popping up and offering unprompted suggestions.

  • The things that I’ve mentioned in this post that made it into the final site are mentioned on the Technical Details page of the site. I am grateful for the work of those that went to the trouble to create and maintain those tools and I hope this document helps someone else that might be looking at a similar project. You are certainly welcome to contact me and I’ll share what knowledge I have.

Future Investigations

  • There is a project called Open ONI that is connected to the Library of Congress’ Chronicling America project. I’ve talked with the developers and archivists using this project and it has promise as a web platform. I’ve installed their code base under Docker on one of my spare machines, but it has a very steep learning curve.

  • The Oklahoma Historical Society has their Newspaper Gateway that is based on a project from the University of North Texas. I’m meeting with the OHS archivist in January 2026 to see if we might be able to get our issues into their portal.

  • The person I’ve been working with on this project is in possession of the archives (aka morgue books) for the newspaper. For those issues that the Oklahoma Historical Society doesn’t have on microfilm, we might consider building a photography rig to digitize the pages ourselves. This would be a pretty significant step, but there are homebrew rigs online and I’ve bookmarked them for future consideration.

Technical Debt

There are always things that could be done better and features that aren’t fully implemented. The site is live and I’ll make small changes as requested by the non-profit, but until we get new issues from the Oklahoma Historical Society or start digitizing the archive books, I won’t be spending as much time on it. However, I am keeping these things in mind.

  • Naming of files - The processing scripts rely on the files being named a certain way, namely HR-YYYY-MM-DD-PP. This isn’t so bad, but it might be better to use a manifest to indicate which files should be processed.
  • Better exception handling - I have basic exception handling in the scripts, but I have run into exceptions because files didn’t exist where I expected them to and the scripts crashed.
  • Better search - The site search won’t find “Bob Smith” as two words in order. It only checks whether Bob appears in an issue and whether Smith appears in that issue; if both are present, it returns a match, even if Bob is on page 1 and Smith is on page 4. One change I could easily make is to keep every allow-listed word in the search index for each page in the order it is found, even if the word already appears on that page. This is probably one of the first things I will investigate.
  • Monolithic Python scripts - Initially, I had 2 scripts that did a lot of work. I’ve refactored the processing so that I can restart specific parts as needed, but they still tend to be long stretches of 100-200 lines of code without much reuse. I can do better.
  • Python requirements - I pulled a lot of libraries into the Python virtual environment for these projects. I need to isolate them and capture them as a single requirements file in the tools repo documentation.
  • Manual intervention - There are still times when I have to manually enter a command or move a file. This is better than it was, but there’s always room for improvement.
  • Content and website in the same domain/bucket - Right now, all the images and PDFs are served from content.thehydroreview.com, a separate S3 bucket from the site itself. This isn’t a strong requirement, but I may move them in the future. However, the PDFs and images are publicly linked from the site, so I could break social media links if people have shared the PDF link rather than the page link.
  • Refine the dictionary - Update the allowed list to include more surnames specifically from this area, common diseases and their historical names, and other relevant names.
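To make the ordered-index idea from the search item above concrete, here is a minimal sketch of what that fix could look like. Everything in it is hypothetical (the page IDs, allow list, and function names are mine), and a real version would also have to decide whether a dropped, non-allow-listed word between two query words should break adjacency.

```python
# Store every allow-listed word for each page, in reading order and with
# repeats, so a phrase query can check adjacency and order.
def build_index(pages, allow):
    return {page_id: [w for w in text.lower().split() if w in allow]
            for page_id, text in pages.items()}

# A page matches only if the query words appear consecutively, in order.
def phrase_match(index, phrase):
    words = phrase.lower().split()
    return [page_id for page_id, tokens in index.items()
            if any(tokens[i:i + len(words)] == words
                   for i in range(len(tokens) - len(words) + 1))]

allow = {"bob", "smith", "visited", "prize"}
pages = {
    # Hypothetical page IDs following the HR-YYYY-MM-DD-PP convention.
    "HR-1925-01-02-01": "Bob Jones visited town on Tuesday",
    "HR-1925-01-02-04": "Mary Smith won the prize",
    "HR-1925-01-09-02": "Bob Smith visited relatives",
}
index = build_index(pages, allow)
print(phrase_match(index, "Bob Smith"))  # -> ['HR-1925-01-09-02']
```

Unlike the current presence check, the first two pages no longer match, because Bob and Smith never appear adjacent and in order on them.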

Final Thoughts

This project was a labor of love, obviously. I had a personal connection to the newspaper having worked there in the 1980s. The newspaper documented the lives of the early settlers, which included my own great-grandfather and his mother, who moved here in 1899. I never knew them, but I found mentions of them in these pages and it connects me to them.

Similarly, my wife’s family moved here prior to statehood and those family names show up frequently in the pages of the newspaper as well. We’ve shared at least a dozen snapshots of articles mentioning their ancestors with her mother and siblings, which has prompted them to question if the stories they always heard happened the way they were described.

Finally, I was able to reconnect with someone I knew almost 40 years ago while studying for my master’s degree. His grandfather had an unfortunate connection to my town and his death was documented in the pages of the newspaper. Even though we haven’t spoken in at least 20 years, I still had his phone number. I shared this history with him and he was grateful. We had a good chat and plan to meet again in the future. So, all-in-all it’s been a good project.

Eric Cloninger
Maps of Eric’s Travels (2025-12-12, /blog/erics-travels-maps)

Thanks to a friend, I found a new website to obsess over.

A friend put a map of his travels on a social media post with a link to the site that created it. I’ve always kept a set of pins in Google Maps as a KML file, but I like this better.

The site is called MapChart and you can make your own there. They provide an Excel file for keeping track of your travels, which you upload to their site to produce the visuals. You can download the result as JPG, PNG, or SVG (vector), with lots of options.

US Travel

US Counties in RED are those that I’ve only flown through and haven’t actually done anything other than be in an airport. Counties in GREEN are those that I’ve lived in. YELLOW are counties that I’ve driven through and ORANGE are those that I’ve spent at least one night.

The one in Alaska was Anchorage in 1965 or 1966. We were on a military transport to join our father at Subic Bay, Philippines. Mom said I was such a perfect angel that the pilots took me with them while the plane refueled. I don’t remember any of it and I still plan to visit Alaska as an adult.

Similarly, I don’t remember living in Sunnyvale, California, but Dad was stationed at Moffett Field NAS. I’ve seen enough of the area, having worked twice within half a mile of the place I was born.


Click the image to view a zoomable vector image of the map.

You can see a lot of road trips, usually taking the kids on multi-week vacations. Before kids, Jackie and I explored a lot of Colorado and did a 2 week road trip in a van, north into Manitoba, heading in a roundabout way to a wedding in Minnesota. A month later, we headed for the West Coast. We listened to a lot of books on tape and took showers at state parks and truck stops. It was a lot of fun and probably the biggest factor in us buying an RV a few years later.

International Travel

As you can see, I’ve done a lot of travel in Europe and the Americas, but not so much in Asia other than work trips to South Korea and a trade show in Beijing. If you look closely, you’ll see the Philippines in GREEN as I mentioned earlier about living there when I was 2 years old. I was scheduled to fly to Manila to meet my new team in March 2020, but then COVID happened, so I’ll have to wait a bit longer.


Click the image to view a zoomable vector image of the map.

The one RED country is Japan. I spent about 4 hours in the United Club in Narita, waiting for a connecting flight. My son has visited Japan before I have and I’m only slightly jealous about that.

Eric Cloninger
Travel to Amsterdam and Netherlands (2025-06-30, /blog/netherlands-travel)

We took a 10-day trip to Amsterdam and the region and wanted to share some insights for other Americans.

I don’t usually write travel logs, but Amsterdam is one of the world’s most popular destinations and there is conflicting information online, so here’s my €0.02 on the experience. I won’t bore you with photos and tales of our exploits–we had a great time with minimal disruptions.

I thought I’d share some insights for other (North) Americans who travel there and may not be familiar with the lay of the land.

  1. Don’t rent a car. You do not need a vehicle, even if you plan to leave the city. There isn’t a lot of parking for cars and driving seems miserable inside the city, constantly avoiding bicycles and pedestrians. If you want to get out of the city for a day somewhere that isn’t easily reached by the transit system, then go ahead and get a day rental, but you do not need or want a car in the city.

  2. Take public transit. There are myriad options when you search for how to use public transit: 3-day passes, 7-day passes, some requiring a European bank account. Ignore all of that.

    All the public transportation systems in the greater Amsterdam area and other nearby cities use the same infrastructure, whether it’s buses, trams, subway, or the intercity trains. All use NFC/tap-to-pay and you don’t need a special card. Just your phone.

    Before you leave the States, set up your phone with wallet software, whether Apple Pay, Google Wallet, or Samsung Pay. After installing the app, register a credit card in the wallet, ideally one with no foreign transaction fees.

    Once you’re in Amsterdam, tap your phone at the entrance to the Metro (subway) or as you’re getting on a bus or tram. You have to remember to tap when you enter and when you exit or you will pay a higher fare. Your phone needs to be unlocked before you hit the payment reader or you’ll get mildly rebuked by the 7 people in the queue behind you. Most of the Metro stations have turnstiles, so you can’t bypass them easily or legally.

    This includes getting from Schiphol airport to Amsterdam Central (Centraal) station. When you leave the customs/baggage area, look for the intercity trains, which are 1 level below via an escalator or elevator. Before you go down, look for the NS pedestal near the escalator. It is probably yellow and blue and waist-high. Tap it with your phone and get on the next train to Central. They leave about every 15 minutes and take about 20 minutes to get to the Central Station with maybe 1 stop.

    If you’re traveling by intercity trains, you may be at a station where there aren’t turnstiles. In that case, there will be a pedestal with a reader on the platform where the trains depart. Just remember to tap as you enter and exit.

    Be aware that the public transit workers will occasionally strike and some lines may be disrupted. We had this happen and fortunately didn’t have anything that relied on us taking the intercity trains that day. We found things to do in the city and just waited for the next day to visit the countryside.

    For public transit, even between big cities, the 9292 app is your friend. Tell it where you are and where you’re going and it will give you multiple options with the platform numbers you need to be on.

  3. Get reservations for anything you really want to see. Pretty much all of the major attractions use a reservation system now and you can’t just walk in when you choose. This includes the Van Gogh Museum, Rijksmuseum, Anne Frank House, etc. As of 2025, Anne Frank House tickets open exactly 6 weeks in advance, so set an alarm on your phone for the day 6 weeks before your visit if you want a selection of times.

  4. Get reservations for supper. Especially in the nice months when the place is full of other tourists. You don’t need to reserve far in advance unless it’s a place that’s truly noteworthy. I booked 2 restaurants about 6 weeks out, but we did run into quite a few places that weren’t taking food orders past 8 PM or were booked up. So, if you want to eat past 8 PM, plan where you want to eat and get a seat. I found the Zenchef app is more common there than OpenTable.

  5. Leave the city. Get on a train and go somewhere. You don’t really need advance tickets, just tap your phone and go.

    • There are windmills a 20 minute train ride from Central in Zaandijk/Zaanse Schans.
    • Utrecht has a neat inner city and market days on Saturday are filled with people.
    • Delft and Den Haag (The Hague) are an hour away.
    • Alkmaar is a nice smaller city with a cheese market.
  6. Get lost. Leave some free space in your calendar. There are a lot of things we just happened across while walking around that we’re so glad we encountered. We watched groups walking in double-time to get to the next thing on their list, missing the vibrant city all around them.

  7. Build up your calves and thighs. You’re going to walk and you’re going to be walking up and down stairs. They have adapted the city for accessibility as best they can, but they’re not knocking down buildings simply because they don’t have an elevator. If you rent an Airbnb, there’s a good chance you’ll be upstairs. Also, not every train station has an escalator or elevator. Even if they do, they’re not guaranteed to be in working condition.

  8. Leave your cash at home, mostly. I took €300 with me and rarely used cash. Everyone took tap-to-pay, even for a Coke from a street vendor. You might want some €0.50 or €1 coins for restrooms. I came back with probably €100, and that was after spending in the Duty Free store at Schiphol.

  9. Leave our tipping culture at home. Restaurant workers and bartenders make a living wage. They don’t need tips. In fact, if you’re presented with a credit card machine that wants a tip, just hit 0% with no guilt. I did encounter one server that tried to quickly tap the tip button on the machine and I voided the transaction and asked them to restart it.

    HOWEVER, it’s OK to leave a €1 coin at your table if you enjoyed the service. I’ve seen this done in France, but I’ve been told by a Dutch citizen who read this blog that they’ve never left a tip, so it is neither expected nor customary.

Definitely visit and take in everything that a modern, mixed-culture city has to offer. You’ll see and experience things unlike anything at home. You might find things that interest and excite you, but you might see things that bewilder and annoy you. It’s all part of the joys of travel.

Eric Cloninger
Where did Eric go? (2025-01-17, /blog/where-did-eric-go)

If you were redirected here, here’s how to find me.

I completely wiped my Twitter account down to nothing. Sorry about that for folks who actually linked to my account. It’s all gone now. Don’t blame me–blame the idiot who owns Twitter.

Where is Eric?

As far as social media is concerned, I guess I’m on Bluesky. I’m @ericcloninger.bsky.social. I’m also on Mastodon at https://mastodon.world/@ericc but I’m not really using that account.

Time will tell where I publicly post, if anywhere ever again.

Good luck out there. Peace.

Eric Cloninger
Goodbye, Netlify (2024-03-01, /blog/goodbye-netlify)

In the tech world, the saying goes “If the service is free, you are the product”.

Or, as our parents’ generation might’ve said “TANSTAAFL” (There Ain’t No Such Thing As A Free Lunch).

A few years ago, I switched my hosting option for this site from Amazon AWS to a service called Netlify. I didn’t mind the few dollars per month it took to host the site on Amazon, but the process of getting my updates posted required a number of manual steps that, if done incorrectly, would render the site unreachable.

After screwing up a few times, I realized I needed to spend some time researching and implementing a modern, automated workflow. That’s when someone on the Jekyll forums mentioned Netlify. Netlify wraps up all the automation in a smooth package for the low, low price of nothing for non-profits and vanity projects.

Netlify made their money getting nerds like me to use it for our personal projects and then hoping we’d recommend it to our bosses when we need such a service. No harm in that, companies do it all the time with students and educators.

I was happy using Netlify and I really have nothing bad to say about the service as I used it.

However

This week a rather disturbing post came across Reddit. A free-tier user with a hobby site, like mine, was sent an invoice for $104,000 after a traffic spike sent hundreds of terabytes of requests to his site. The user, who is a web developer, said the requests were all for a copy of a 20-year-old song by a Cantonese singer. Hmmm.

The developer contacted the company about the invoice and they offered to reduce the charge to $21,000, and after some pushback, they offered “only” $5,000. There is no argument that there was a traffic spike, although there is some speculation about why it happened at this time. The CEO came onto Hacker News and said they’ve cancelled the invoice, and that they do so occasionally when this specific problem arises. So they’ve known this can happen, but did nothing to prevent it.

The problem is that there are currently no controls on Netlify to prevent these kinds of traffic spikes, which can disable a website and lead to crazy invoices. Amazon AWS, on the other hand, has fine-grained controls over such activity. The account I used when I previously had this site on AWS was configured to send me an email if my monthly bill ran over $30 and to stop all activity if it hit $40.
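For the curious, the "email me at $30" half of that setup is a CloudWatch alarm on the EstimatedCharges billing metric. This is only a sketch of the request shape, not my actual configuration: the SNS topic ARN is a made-up placeholder, and you'd pass these parameters to boto3's cloudwatch.put_metric_alarm() from an account with billing alerts enabled.

```python
# Parameters for a CloudWatch billing alarm. The SNS topic ARN below is
# a placeholder; substitute your own notification topic.
billing_alarm = {
    "AlarmName": "monthly-bill-over-30-usd",
    "Namespace": "AWS/Billing",       # requires billing alerts enabled
    "MetricName": "EstimatedCharges",
    "Dimensions": [{"Name": "Currency", "Value": "USD"}],
    "Statistic": "Maximum",
    "Period": 21600,                  # the metric updates a few times a day
    "EvaluationPeriods": 1,
    "Threshold": 30.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:billing-alerts"],
}
print(billing_alarm["AlarmName"], "fires above $", billing_alarm["Threshold"])
```

The hard-stop at $40 is a separate piece (budget actions), but even the alarm alone would have turned a $104,000 surprise into an email.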

That was the driver behind me moving back to Amazon AWS. Now, to spend a few hours at the keyboard.

Getting There

Here I am on a Friday night, trying to remember all the steps to getting a static site running on AWS with SSL enabled. I’ve done this a dozen times, but it’s been a few years. When it was all done, I’d spent about 4 hours going through all these steps several times for several domains:

  • Transferring ownership of my 3 domains to the AWS account that hosts the content. I could’ve left them with the existing registrar, but I prefer to have it all in one place with the content.
  • Getting all the Route53 records pointing to the same DNS servers as the zone records.
  • Requesting ACM certificates so that I could present my site with HTTPS. This took WAY too long for my liking, but fortunately I had other steps I could work on while I waited.
  • Creating S3 buckets with the right permissions for static website hosting without leaving them wide open.
  • Creating roles and users that can access the S3 bucket from Github Actions
  • Creating a CloudFront distribution to serve up the content from S3 worldwide with HTTPS.
  • Getting the Github Actions to push my updated website to AWS. This took about 45 minutes for this domain and 10 for the second. I decided not to publish the third domain just yet and I have a subdomain that I’m not sure I want to put online any longer.
  • Shutting down my Netlify account

If you’re interested, here’s a good summary of the Github Actions setup. There was one change I had to make in AWS that is mentioned in this StackOverflow comment regarding ACLs. In addition to that change, there is a setting in S3 buckets to re-enable ACL access to the bucket. I don’t know why AWS has all these settings with similar purposes, but all I can do is write it down here for the next person to discover. Good luck.
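One gotcha worth noting if you ever push the built _site tree to S3 yourself instead of through an Action: every object needs a correct Content-Type, or browsers will download your pages rather than render them. Here's a stdlib sketch of the guessing step; the file names are illustrative, and the upload itself would, as I recall boto3's usual shape, pass the result via ExtraArgs={"ContentType": ...} to upload_file.

```python
import mimetypes

# Guess the Content-Type for each file in a built Jekyll site, falling
# back to a safe default for anything mimetypes doesn't recognize.
def content_type(path: str) -> str:
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"

# Hypothetical file list; a real deploy would walk the _site directory.
for name in ["index.html", "feed.xml", "assets/main.css", "blog/map.svg"]:
    print(f"{name} -> {content_type(name)}")
```

Most sync tools handle this for you, which is one more argument for letting the GitHub Action do the pushing.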

Welcome Back

If you’re reading this, it means that I got everything moved over to AWS and configured correctly. I’ll go back to paying $4/month knowing that I won’t ever get a $104,000 invoice.

Eric Cloninger
My Holiday Downtime Project (2023-12-28, /blog/my-holiday-downtime-project)

My employer shuts the entire company down over the holiday break, so I find myself with idle hands. Naturally, it’s time to break something.

Caution: There is no TL;DR on this one.

Our ISP has been running fiber in our neighborhood for about 6 months and they’re close to having things wired up. I have a pedestal in the alley behind the house and as an existing DSL (yes, you read that right) customer, I’m one of the first to get the upgrade. In anticipation, I’m doing some upgrades to my home network.

Back in Time

OK, first about the DSL. I had cable internet for about 18 years from the local cable TV company. In 2001, when we bought this house, the 20 Mbps down / 2 Mbps up connection was pretty amazing. There were always issues, but we learned to deal with them. Over time, that speed went up to 50/5 and then 100/10. Over that same period, the company changed from “Weatherford Cable” to “ClassicNet”. Then from “ClassicNet” to “CEBridge”. A few years later, it changed from “CEBridge” to “Suddenlink” and that’s when things really started to go off the rails.

When Suddenlink took over, I had monthly downtimes lasting hours. I could count on an hour-long outage on Monday at 11AM as the local employees brought down the network for maintenance. At least once a quarter, I had an outage of more than 1 day. A couple times, during ice and snow storms, the entire community was without internet for 4 or 5 days. This was beginning to have an impact on my ability to work. During those outages, I would tether my work laptop to my phone, but that’s really not a reliable or scalable solution.

In 2019, I started looking at my options. There are several fixed-wireless ISPs that operate in the area. I know the owner of one such provider and I asked him to see if I could switch to his service. Unfortunately, there was no place on my property where we could locate my antenna so it could see his antenna. There are quite a few trees in our neighborhood and these were blocking line-of-sight to his tower.

The local telephone company also provides internet service. I asked about their packages and they offered both fixed wireless and DSL. The DSL came over twisted-pair copper lines and could provide up to 25 Mbps down / 5 Mbps up. I know that seems insanely slow these days, but I talked with their network engineers and they said their DSL hadn’t gone down in over 3 years. I decided to roll the dice.

I kept my cable internet account for a few months while I set up the DSL and experimented with it. True to their word, I had no interruptions in service. In January 2020, I cancelled my cable internet.

Think about that. January 2020. What was happening about that time? People were saying stuff like Have you heard about that virus going on in China? Well, the cable company did 2 things at that point:

  1. They sold themselves, once again, and became Optimum

  2. Optimum shut down the local office, so trouble calls were now routed through a central office in Arkansas. I learned from friends and family that still used them that sometimes it took 3 weeks to get a problem fixed. Meanwhile, all those new telecommuters didn’t have internet.

Talk about dodging a bullet.

Back to Now

My ISP has a combo modem/router box. They do all the management on the box and they do not allow customers to modify settings. They made one small concession to me, which was to bridge one of the ethernet ports on the modem so that I could attach my own network to theirs. I bought a set of Google Home mesh routers and placed them around the house. I don’t like the Google router, to be honest, but it sucked less than the Netgear Orbi mesh that I was replacing.

The problem with the Google router is that it can only be updated with a mobile app. That app hides some of the crucial settings in strange places and sometimes it would just refuse to connect. And, in typical Google fashion, there was no customer support.

What I wanted to get was a modern router similar to the old workhorse Cisco/Linksys WRT54G that I had years ago with a custom Tomato or DD-WRT firmware. I could tweak a lot of settings to my liking and it always worked. Here’s what I wanted from the new network:

  • House network for personal devices with adblocking across entire network.

  • Separate network segment for IOT devices that can’t see the house network. I’m not comfortable with the amount of telemetry some of these shitboxes send home and I’d like to throttle it without killing my home network. This would run on 2.4 GHz because a lot of IOT stuff doesn’t work on 5 GHz.

  • Have a lab segment for work.

  • A guest network that can be accessed up to 24 hours without me having to hand out a password, but isn’t so convenient that neighbors just leech off my network.

As it happens, my director at my new employer is formerly from Cisco. He filled my head with ideas on how to achieve my goals. Naturally, it involved spending a bit of money. Toward this goal, I’ve purchased:

  • (1) 4-port mini-PC with 1TB storage and 32GB RAM. You can get these on Amazon or AliExpress (if you’re willing to wait for international shipping). I bought the barebones model and installed Corsair memory and NVMe modules.

  • (1) TP Link 8-port managed gigabit switch that supports VLAN and Power-over-Ethernet (POE).

  • (2) TP Link EAP 610s with POE for indoor use

  • (1) TP Link EAP 610 (Outdoor sealed) with POE

Let’s Get it Started

I didn’t want to disrupt internet access over Christmas while our kids were home, so I started working on this away from our regular internet infrastructure. My wife was on holiday last week and this week and will be back at work next week, while I’m not back until January 8th. I wanted to make sure I had everything working in the garage before tearing out the existing connections. I told her that this is like trying to grow a tree from the limbs out to the leaves without having a trunk.

So, imagine me in the cold garage, sitting on a folding chair with a 3 foot (1 meter) piece of lumber for a worktable. I had a portable heater at my feet to hold off the chill as I fastened things together and configured software.

The day after Christmas, with the kids back at their own homes, I installed Proxmox VE on the mini-PC. Proxmox is a virtualization environment that will let me run a bunch of small services that normally would go on a Raspberry Pi. In the first VM, I installed OPNSense, which is an open-source firewall.

I had the wireless APs attached to the switch, which was in turn attached to the mini-PC in port 2. Port 1 (aka ETH0) will eventually go into the ISP box, where it will get a dynamic IP address via their DHCP server.

I was having a lot of problems understanding how to configure things like virtual adapters in Proxmox, as well as bridging them in a way that traffic could move about. I watched hours of YouTube videos from alleged experts in this field. I’m not sure if they’re experts in virtual environments or just really good at getting their videos to the top of searches for virtualization. Regardless, I did a lot of experimenting using a couple of older PCs that worked well in the confines of the garage. I thought I had everything working right and was ready for the moment to plug it into the ISP box.

The Point Where Everything Fell Apart

I asked my wife if she minded me turning off the internet for an hour or two and she said she didn’t mind, as she was reading a book I bought her for Christmas. So, off I went to disconnect the existing network and plug in my new network.

I’ll save you the suspense. Nothing worked. Not partially, not intermittently. Nothing at all.

I tweaked settings and nothing would send traffic out of my local network. After 2 hours of this, I was frustrated and I disconnected the new network. I had made so many changes trying to get it to work that I couldn’t even connect to the Proxmox environment at that point. I decided the best thing to do was to wipe everything and start over, taking what I’d learned over the last 2 days to maybe do things right. So, I factory reset the mini-PC and then set it down for the day.

When I plugged the old network back to the ISP box, nothing worked there, either. I powered everything down and restarted it all back in order, so that the ISP box would renew its lease to the ISP DHCP server. That didn’t work, either. I factory reset the Google router and the mesh nodes as well. Nothing would work.

<img src="this-is-fine.jpg" alt="image of man screaming at a tree">

At this point, it was 5:30 PM and I was tired and angry. I was definitely ready for an adult beverage. I knew my ISP office was closed, but I didn’t know what else to do, so I called their technical support line. They have a 24-hour rollover to a remote management company, so I knew I could at least maybe get them to reset things from their side.

That’s when I heard the message.

We apologize for the inconvenience, but our network connection is not working. Please be patient while we fix the problem.

That’s right. For the first time in 4 years, my ISP had an outage. And it just so happened on the very afternoon that I was making big changes to my network.

As the kids these days say, FML. The ISP network came back on about 9 PM and I finished resetting my Google router. Which, it turns out, is another aggravating thing about the Google setup. If you don’t have an upstream connection, you can’t configure their router with their app. I suppose it’s made for dummies, not anyone wanting to do anything remotely technical.

Let’s Try This Again

On Wednesday, I reinstalled all the software on the mini-PC and made sure the APs could communicate. I left everything alone until Thursday, when my wife was leaving for the day to go see her best friend.

On Thursday, I made sure everything still worked correctly when not on the main network. With that done, I repeated the process from 2 days prior. I removed the Google router from the ISP box and unplugged it from power. I plugged in the mini-PC and tested the connection. There were a few minor settings I needed to update to get the firewall and router working correctly. It’s amazing how things work when they actually work.

At this point, I disassembled everything and started running ethernet cabling to 2 of the places where the indoor APs will go. This involved digging in the attic, which has fiberglass insulation. Yay. Once I had the cabling run to those points and the mounting brackets installed, I attached the APs to the ethernet and… Voila! I had 5 GHz Wi-Fi 6 served throughout the house.

With everything working, I backed up the server and stored the backup image on my local PC. Then, I installed the AdGuard Home ad-blocker on the OPNSense firewall. In doing so, I didn’t pay close enough attention to the installation options. I ended up overwriting the port of the main administration tool with the AdGuard Home port. Fortunately, this was easily fixed with a couple of edits. I spent about 30 minutes digging through directories with the command line to find the configuration setting that controls which port does what.
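If you ever have to do the same hunt, those 30 minutes of directory spelunking can be scripted. This is a hypothetical reconstruction, not the actual OPNSense/AdGuard layout: the file names, contents, and regex are all mine.

```python
import pathlib
import re
import tempfile

# Walk a config tree and report every line that looks like it binds a
# port, so two services claiming the same port stand out immediately.
def find_port_bindings(root: pathlib.Path):
    pattern = re.compile(r"(?:port|bind)\D{0,3}(\d{2,5})", re.IGNORECASE)
    hits = []
    for conf in sorted(root.rglob("*.conf")):
        for lineno, line in enumerate(conf.read_text().splitlines(), start=1):
            match = pattern.search(line)
            if match:
                hits.append((conf.name, lineno, int(match.group(1))))
    return hits

# Fake config tree standing in for the real firewall directories.
root = pathlib.Path(tempfile.mkdtemp())
(root / "adguard.conf").write_text("bind_host: 0.0.0.0\nbind_port: 443\n")
(root / "webgui.conf").write_text("# admin UI\nport = 443\n")

for name, lineno, port in find_port_bindings(root):
    print(f"{name}:{lineno} binds port {port}")
```

Two files claiming port 443 is exactly the collision I created by not reading the installer options.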

So now, I have ad-blocking on the entire house network. Woot.

Loose Ends

There’s still work to do, but I’m going to take a few days off until I start mucking about with things.

  • At this point, I have 2 SSIDs serving the house. One is for IOT traffic (Roomba, TVs, thermostats, etc.) and the other is for our personal devices. These are not on a VLAN, so both networks are getting the full firewall treatment. The IOT network is on 2.4GHz and the house network is on 5GHz. I will need to set up a VLAN on the firewall, switch, and APs to get these separated.

  • The outdoor AP is not installed where it needs to go. I’ll wait until we’re not getting precipitation and 30 MPH (50 kph) winds before I install the outdoor AP on top of the 30’ (10m) antenna mast. For now, it sits in the garage, serving up internet to our vehicles.

  • I have a lot to learn about routing in the days ahead. I have the simplest routing setup now, but I know I want to do more complicated things. But, routing is one of those things that if you screw it up, you have to start over from scratch. Best to learn more concepts before experimenting on a live network.

  • I want to install a second VM with some Linux distros that I want to try out. I want to play with NextCloud and a few other tools.

  • Eventually, I want to replace my dedicated Arlo wifi camera system with one that uses POE ethernet cameras. The Arlo is OK, but it’s very limited.

  • The big TODO, and the typical technical debt, is that I only have this written down in a notebook. All the details of how it works together are just scribbles on a sheet of paper. I need to write down all the procedures to make the network operate for the inevitable day when I’m on a work trip and my wife is dealing with a non-functioning network at home. Bah, kick that can a little further down the road!

Finally, the point of all this work is in preparation for fiber internet. I see work trucks at the nearby phone offices, and crews are still pulling fiber bundles along the streets. I'm hoping to get set up in the next few months.

]]>
Eric Cloninger
Some site (and life) updates2023-06-02T00:00:00+00:002023-06-02T00:00:00+00:00/blog/some-site-and-life-updatesIt’s been a while, so I guess I’ll get off my lazy butt and make some updates.

I wanted to add Google Analytics to this site as an experiment for some work projects. Getting GA4 set up and configured so that my Jekyll theme generates the correct content took a couple of hours to get straight. So that's what's driving this update. I'll likely make similar changes to My72MGB as it uses the same Jekyll template as this site.

I don't post much personal information on here because I value my privacy and the privacy of those around me. The short update is:

  • Life has had some ups and some downs. Mostly ups, so I don’t have a lot to complain about.
  • I started a new job–hopefully the last time I say this.

There you go.

]]>
Eric Cloninger
Where on the Internet is Eric?2022-11-29T00:00:00+00:002022-11-29T00:00:00+00:00/blog/where-on-the-internet-is-ericWith the recent chaos at Twitter, finding me on the Internet is a bit more difficult. So, here’s an update.

I've noticed a recent trend on various social platforms of people posting updates about where they can be found, away from Twitter. I had a rather enviable early username there (@ericc) that was frequently tagged by fans of a couple of other "Eric C's" (Church and Clapton). Perhaps Twitter will come out of its spiral, but it's unlikely, so here are some other ways to contact me, ranked by their likelihood of reaching me.

Email :arrow_double_up:

The domain is pobox.com and the account is my first name plus the first letter of my last name (i.e. the first 5 letters of this website). I have aggressive spam settings on that domain, set as high as they go, with filters that block domains based on the sending system's country. So, if you really want to email, send it as plain text from a well-known sender, like gmail.com, outlook.com, etc., and don't send it from any of the countries with a reputation for scams.

LinkedIn :arrow_up:

I have a LinkedIn profile and I will respond to invites if I know the requester. I check LinkedIn about once per week, so don’t expect an immediate response.

Phone :arrow_up:

If you have my cell number from the last 22+ years, that number still works. It starts with ‘4’ and ends with ‘9’. I have a Google Voice account and that number begins with ‘4’ and ends with ‘8’. I no longer have a landline, so if you had a number with ‘663’ in it, that number no longer works.

If you use Signal Messenger and know my cell number, I will connect.

Discord :arrow_up:

I've been on Discord for a few years, but I really don't socialize there. It's mostly out of need, though I will likely increase my presence there.

ehcloninger

Slack :neutral_face:

I use Slack frequently for work, but I haven't gotten into finding communities on it. If you have a Slack workspace where you'd like a joker such as myself, send an invite.

Mastodon :neutral_face:

Like many other refugees from Twitter, I have created a Mastodon account, but I haven’t yet dug into the culture and behavior of the service to see if it suits me.

@[email protected]

Twitter :arrow_down:

RIP

Facebook :arrow_double_down:

I have a Facebook account for the simple reason that my first wife has a memorial page there. I log in perhaps once every few years. I have no content other than a few listings in Marketplace. I don’t accept new friend requests.

WhatsApp, Instagram :no_entry_sign:

I would love to use WhatsApp for the community and its near-standard status as a messaging platform, but I'm not using a Meta platform. Similarly, Instagram.

Telegram :no_entry_sign:

I don’t use services with proprietary cryptographic techniques regardless of where the service is located, so this is a non-starter.

]]>
Eric Cloninger