Credit for this solution must go to Kannan Suresh, who published a walkthrough for this solution on his website, aneejian.com, and hosts example code on his GitHub Repo, jekyll-blog-archive-workflow.
In the context of Jekyll sites, or other blog sites, the term “archives” means a listing of all blog posts, often grouped by some metadata. In this case we are going to group blog posts by category, tag and year of publication.
It is assumed that all blog posts in the site will have Front Matter YAML which includes both category and tag metadata for the post.
In order to make this work, a few things are needed in your Jekyll site, and in your GitHub Repo:
An archivedata.txt data file will be used by a GitHub Action to loop through each category, tag and year associated with the site’s blog posts.
Three layout templates will define the appearance of each listing of blog pages. In other words, these are templates for the pages that show, for example, a list of blog pages that match a given category, etc.
If you don’t need different formatting for these listings it would be possible to change this solution to have just one template, but other changes would be needed, especially in the Python script used by the GitHub Action.
The GitHub Action includes steps that will generate new pages in your site for each category, tag and publication year of your posts, listing the blog posts relating to that metadata.
Not included in the solution on aneejian.com are pages that hold indexes of each category, tag or year by which the blogs are grouped, with links to each page of grouped content.
It is assumed that you are reading/following the published solution on aneejian.com.
A rough elaboration of the solution, plus a few changes that were needed in my own implementation, follows:
The solution suggests putting the archivedata.txt file in a folder named _archives. The name of this folder seems to be optional, but if you want to change it you’ll need to adjust some of the configuration that follows.
The contents of this folder are published as a Collection, defined in the _config.yml file:
collections:
  archives:
    output: true
    permalink: /archives/:path/
Thus, the archivedata.txt file is published under the
/archives/archivedata/ path in your site, which is important as it is needed
later on when the GitHub Action fires.
NOTE: Despite there (eventually) being other folders and files under this
path, e.g. /_archives/years/2024.md, they aren’t published under the
/archives/ path. This is because each page has its own permalink defined
in its Front Matter, overriding the path that the Collection would have created.
/_archives/years/2024.md:
---
title: 2024
year: "2024"
layout: archive-years
permalink: "year/2024"
---
Eventually, a number of new files will be generated by the GitHub Action under
the _archives folder, e.g. /_archives/years/2024.md. These files will have
Front Matter data that defines the layout (i.e. template) to be used to render
that page. See above for example Front Matter for such a page.
The published solution defines three layouts depending on whether posts are being grouped by category, tag or year of publication. These layouts are effectively hard-coded in the Python script used in the GitHub action so, unless you want to rewrite the script, it’s easiest to create the three layout files as suggested.
WARNING: The example layout files are themselves based on a default layout (template) for the site. In the example this template is named default but in my Jekyll Minima site the default template has been renamed as base, so the Front Matter for the layout files needed to be changed:
---
layout: base
---
Also, I just wanted a simple listing of the blog pages, with links to each, rather than the more advanced formatting used in the solution (which used an include), so I changed the content of my layout files accordingly.
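As an illustration, my simplified layout looked something like the following minimal sketch. It assumes the generated pages carry the category name in a Front Matter field, by analogy with the year Front Matter example above; it is not the published solution's markup:

```liquid
---
layout: base
---
<h1>{{ page.title }}</h1>
<ul>
  {% for post in site.categories[page.category] %}
    <li><a href="{{ post.url | relative_url }}">{{ post.title }}</a></li>
  {% endfor %}
</ul>
```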
The GitHub Action (workflow) is triggered by changes made under the _posts folder, but it can be triggered manually from the Actions tab of your repo on GitHub. As noted above, it performs the following steps:
The provided solution will run the workflow when changes are made within the _posts path, but this assumes that folder is in the root of the repo. In my case the Jekyll site was within a docs folder, so this needed to be fixed:
on:
  workflow_dispatch:
  push:
    paths:
      - "docs/_posts/**"
I made the required changes to specify the location of the archivedata file and the output path for the archive files. The following snippet of the add_archives.yml file shows these changes, as part of the step where the original repo is used to define the core of the Action:
- name: Generate Jekyll Archives
  uses: kannansuresh/jekyll-blog-archive-workflow@master
  with:
    archive_url: "https://dev.joynt.co.uk/archives/archivedata"
    archive_folder_path: "docs/_archives"
From what I can tell, this looks for the action.yml file in that repo which then creates a docker image using a dockerfile from that same repo. The Docker image includes a copy of the Python script that creates the many output files for each category, tag and year.
When I first tried to run this action I noticed errors saying that it failed to update my repo. This was caused by two problems: the workflow was using master as the branch name (mine is main), and restrictive permissions did not allow the workflow to push. So I made some more changes:
The original code used the master branch:
git push origin master
I needed to change this to main (the newer default name for the first branch of a new GitHub repo). However I needed to make further changes to this line; see below.
I added the following, although I cannot be certain it was needed:
permissions:
  contents: write
My repo doesn’t allow pushes from just anyone, so I created a new Personal
Access Token (PAT) and saved this as a secret in my repo. I changed the final
line of the workflow to use this token in the git push command:
git push https://x-access-token:${{ secrets.ACCESS_TOKEN }}@github.com/${{ github.repository }}.git HEAD:main || echo "No changes to push."
It is worth noting, as they originally confused me, how the trailing echo
statements worked, e.g.
primary_command || echo "some text"
If the primary_command fails, i.e. gives a non-zero exit value, then the
command following the || (OR operator) runs.
Thus, if there were no changes that needed to be made to the repo then the logs for the Action would show this explicitly.
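This short-circuit behaviour is easy to demonstrate in a shell:

```shell
# '||' runs the right-hand command only when the left-hand one fails
false || echo "fallback ran"          # prints: fallback ran
true || echo "this is not printed"    # prints nothing
```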
It may be that some Jekyll themes automatically include templates that will create index pages for categories, but these pages weren’t created in my case. To fix this I created two new pages (I wasn’t so bothered about indexing by year):
/_pages/categories.md and /_pages/tags.md. These needed to list all the categories/tags in the site.posts list, and to
provide links to the “archive” page for each category/tag. This would have been
easy, except that these archive pages had names generated by the Python script.
The script sanitizes the names of the files it creates, which are based on the names of the categories and tags included in the blog posts. Thus, the names of the pages these indexes need to link to cannot be taken immediately from the original category or tag names. The following Liquid code was used to update the names of the categories and tags to match the output of the script:
{% assign value_escaped = category[0] | replace: ' ', '-' | replace: '.', '-' %}
{% assign value_escaped = value_escaped | replace: '#', 'sharp' %}
{% assign value_escaped = value_escaped | downcase %}
{% assign value_escaped = value_escaped | replace_regex: '[^a-z0-9_-]', '-' %}
---
layout: page
title: "Categories"
permalink: /categories/
---
<ul>
{% assign sorted_categories = site.categories | sort %}
{% for category in sorted_categories %}
{% assign value_escaped = category[0] | replace: ' ', '-' | replace: '.', '-' %}
{% assign value_escaped = value_escaped | replace: '#', 'sharp' %}
{% assign value_escaped = value_escaped | downcase %}
{% assign value_escaped = value_escaped | replace_regex: '[^a-z0-9_-]', '-' %}
<li>
<a href="/category/{{ value_escaped }}">{{ category[0] }}</a> ({{ category[1].size }} posts)
</li>
{% endfor %}
</ul>
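For reference, the same sanitisation can be sketched in shell. This is my own approximation of what the script's transformation amounts to, based on the Liquid filters above; the actual Python code may differ:

```shell
# Hypothetical shell equivalent of the name sanitisation:
# spaces and dots to hyphens, '#' to 'sharp', lowercase,
# then any remaining disallowed character to '-'.
sanitize() {
  printf '%s' "$1" \
    | sed -e 's/[ .]/-/g' -e 's/#/sharp/g' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9_-]/-/g'
}

sanitize "C# Tips"   # prints: csharp-tips
```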
Unrelated to the deployment of Syncthing, but relevant to the overall goal of the exercise, was the use of Samba. This was used to set up a share on the local network so that devices could stream music files from this server. Thus, the location where Syncthing would save these files was important.
There is an official docker container available for Syncthing on DockerHub: syncthing/syncthing:latest.
I was able to get this up and running easily, directly in Docker. However, I was sure that I wanted to orchestrate this container using Kubernetes, so I moved on quickly to Microk8s.
As expected, a NodePort was needed to expose the Syncthing UI to the local network.
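A NodePort Service for this might look like the following sketch. The service name, selector label and nodePort value here are my illustrative assumptions, not taken from my actual manifest; Syncthing's web UI listens on port 8384:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: syncthing-ui        # hypothetical name
spec:
  type: NodePort
  selector:
    app: syncthing          # must match the deployment's pod labels
  ports:
    - port: 8384
      targetPort: 8384
      nodePort: 30084       # must be within 30000-32767
```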
One of the challenges / worries that I had when considering how to run Syncthing
with Microk8s was how it would access the host file system. I didn’t want it to
just create its own folders for mounting storage within the Docker container. I
knew there would be a lot of data copied to/from the server and it needed to be
on a separate physical disk, mounted at /data/ on the host.
In order to get this working, I implemented the following combination:
# Under spec/template/spec/containers:
env:
  - name: PUID
    value: "110"   # local account UID
  - name: PGID
    value: "110"   # local account GID
securityContext:
  runAsUser: 110
  runAsGroup: 110

# Under spec/template/spec/containers:
volumeMounts:
  - mountPath: /var/syncthing
    name: syncthing-config
  - mountPath: /data/syncthing
    name: syncthing-data

# Under spec/template/spec/
volumes:
  - name: syncthing-config
    hostPath:
      path: /data/syncthing/config
      type: Directory
  - name: syncthing-data
    hostPath:
      path: /data/syncthing
      type: Directory
It didn’t seem necessary to create a PersistentVolumeClaim for this one-off instance.
Snaps are app packages for desktop, cloud and IoT that are easy to install, secure, cross-platform and dependency-free. Snaps are discoverable and installable from the Snap Store, the app store for Linux with an audience of millions.
A snap is a bundle of an app and its dependencies that works without modification across Linux distributions. –Canonical Snapcraft
Applications in a Snap run in a “container” [sandbox] with limited access to the host system. The Snap sandbox heavily relies on the AppArmor Linux Security Module from the upstream Linux kernel. –Wikipedia
In theory this means that services running as snaps cannot access certain paths on the host system; however, in my own deployments I was able to mount local paths into Docker containers.
With Docker installed as a snap, it behaved pretty much as expected, at least for the simple testing I did.
For those used to simply running kubectl commands, when using Microk8s it
seems that these have to be “wrapped” by the microk8s command. For example:
microk8s kubectl get all -n kube-system
Microk8s is a version of Kubernetes designed for small systems, test deployments, etc. Like any Kubernetes cluster, it is configured by applying YAML configuration files. To avoid losing these config files, I have saved them in a Microk8s folder, with sub-folders per namespace.
microk8s kubectl apply -f <file_name> -n <namespace>
To help see what is going on within the Kubernetes deployment, the microk8s dashboard was enabled, however it wasn’t immediately visible on the local network. To achieve this I had to:
Note that one of the restrictions was that the NodePort port needed to be in the range of 30000 to 32767.
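As a sketch, exposing the dashboard could be done with a NodePort Service along these lines. The names and ports here are assumptions based on a standard dashboard deployment, not my actual configuration; the dashboard normally lives in the kube-system namespace and serves HTTPS on container port 8443:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dashboard-nodeport   # hypothetical name
  namespace: kube-system
spec:
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30443        # must be within 30000-32767
```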
VSCode supports additional functionality via extensions that can be added to the IDE via .vsix files downloaded from the VSCode Marketplace. The one I chose to experiment with first is:
Once installed (and VSCode reloads its extensions) there will be a new icon in the Activity Bar. Clicking on this opens the Continue extension in the Primary Side Bar. The recommendation is to move this to the Secondary Side Bar panel on the right hand side.
If it is not already visible, open the Secondary Side Bar panel with Ctrl+Alt+B.
You can then drag the Continue icon from the Activity Bar over to the Secondary
Side Bar.
You should now see a prompt box with the text “Ask anything. ‘@’ to add context” and a few other icons.
Continue.dev is a VSCode (and JetBrains) extension that provides the following functionality, similar to GitHub Copilot:
Out of the box, the extension is configured to use Anthropic’s Claude 3.5 Sonnet, for which you’ll need an API key and paid credit. You can select (add) other models via a drop-down at the bottom of the prompt box, choosing the + Add Chat Model option. A new panel will open, providing a list of LLM providers to choose from. If you have an OpenAI API key, you could enter that here and use GPT-4o, for example.
In my case I want to use a LLM running on my local network, at least to test how well this worked compared to GitHub Copilot. I already have LM Studio running as a server, so I chose LM Studio from the list of providers. I left the other options as default (“Install provider” was the LM Studio website and “Model” was set to auto-detect) and clicked Connect.
Doing this adds the following configuration for Continue in its config.json file:
{
  "apiBase": "http://localhost:1234/v1/",
  "model": "AUTODETECT",
  "title": "Autodetect (1)",
  "provider": "lmstudio"
}
If you’re not running LM Studio on the local machine, change localhost to be
the IP address or hostname of the machine LM Studio is running on. Once you have
done this you should now see a list of “auto-detected” models in the drop-down
list as an alternative to Claude 3.5 Sonnet. The list is provided by LM Studio
so if you want different models you need to download them there.
Once you have a model selected, you can use the prompt box to interrogate the
LLM. You can use data provided by VSCode for context; just type @ and a list
of options appears. For example, you can specify the currently active file and
ask questions about the contents.
If you want to include a chunk of highlighted code in your message to the LLM,
press CTRL+L to copy the highlighted code across.
You may not want everything you are working on to be sent to the LLM for code completion suggestions. Click on Continue in the status bar (right-hand end) and you can enable/disable code completion.
Code completion options are set in the config.json file. In order to set up completion suggestions you need to add/update the following JSON in the config:
"tabAutocompleteModel": {
"apiBase": "http://localhost:1234/v1/",
"title": "lmstudio",
"provider": "lmstudio",
"model": "stable-code-instruct-3b"
},
Again, replace localhost with the server name or IP address of the machine
running LM Studio if it isn’t running locally. You may see some errors in the
chat bot panel if the extension attempts to retrieve completion suggestions from
the LLM but isn’t able to connect.
Note also that the model name needs to be hard-coded in this chunk of JSON. The documentation suggests entering the full “path” of the model in LM Studio, but based on the 404 error message from the server it’s only necessary to use the short name for the model in LM Studio.
Editing code in-place in the active file tab works as described (see link above) but the quality of the suggestions will obviously depend on the model used.
Not all models are suitable for all tasks. For example, when selecting stable-code-instruct-3b in the example above for code completion, the Continue extension shows the following warning:
Warning: stable-code-instruct-3b is not trained for tab-autocomplete, and will result in low-quality suggestions. See the docs to learn more about why:
The FAQs and other documentation on docs.continue.dev explain that models need to be trained for fill-in-the-middle usage to work well for auto-completion, while general purpose “chat” LLMs are not so optimised.
The recommendation from Continue is to use Qwen2.5-Coder 1.5B.
For normal chat and editing of code, more traditional models can be used; for example, the Llama 2 model might be useful. Choosing a model that is small enough to run on your hardware yet capable enough to perform well is important.
While looking for suitable tools, I came across a number of different IDEs that supported AI integration. There’s not much point listing them here as a web search for “IDE with AI integration” provides what is needed.
Some projects that I have wanted to work on require access to APIs that charge for access. While the prices aren’t high, especially for relatively low usage, it seemed unnecessary to pay anything when open-source models are available that can run on relatively modest hardware (NVidia RTX GPUs).
So I started looking into options for serving large language models (LLMs) on my local network, specifically, on a Windows 11 PC with an 8GB RTX 2080 GPU. As it happens, the first app I tried seemed to meet my needs so I didn’t look further!
LM Studio, on version 0.3.6 at the time of writing, is described on its own website as “a desktop app for developing and experimenting with LLMs on your computer”. There are installers for Windows, MacOS and Linux. Once installed you can:
It doesn’t seem to matter what architecture you choose for the models you want to run (e.g. Llama, Phi, etc.). Perhaps obviously, you can only download open-source models; you can’t run versions of ChatGPT or Claude.
You need to choose models that will fit in the available VRAM (8GB in my case). There are options to offload some inference to the CPU and main RAM but I chose not to experiment with that.
Whichever model you choose, the API offered by LM Studio running as a server is OpenAI-compatible.
It is worth mentioning that the other LLM server that I have most frequently come across is Ollama. I may experiment with this in the future but for now, LM Studio is working fine.
Another option is vLLM. Like Ollama this appears to be command-line only, and designed to be run as a server rather than a desktop app with a UI.
It was necessary to use a remote_theme setting to pull
down the latest version of the theme to support all the documented features.
The breakthrough was as simple as reading the _config.yml file used by the GitHub Pages for the jekyll/minima theme itself:
All that was really needed was to comment out the theme setting and replace
it with a remote_theme setting, like so:
# As of November 2023, GitHub Pages still uses Minima 2.5.1
# (https://pages.github.com/versions/).
# If you want to use the latest Minima version on GitHub Pages, use the
# following setting and comment out the "theme: minima" line.
remote_theme: jekyll/minima
# theme: minima
This has ensured that the following features (and probably some others I haven’t found) now work:
Initially the purpose of creating these GPTs was simply to explore the new feature(s) being released by OpenAI. Simple/trivial use-cases such as customised DALL-E image generation seemed to be more fun. Over time these GPTs have been extended slightly to explore how GPT Actions can interact with remote APIs such as the GitHub API and AWS Lambda functions.
For more information on these, please visit my gpt.joynt.co.uk site, which will be more up-to-date than this post here!
It’s probably best to open and read these pages first, then come back to the notes below:
You enable GitHub pages via the Settings for your repository, where there should be a Pages option in the menu at the left hand side.
In brief, GitHub allows users to host pages under the owner.github.io domain, where owner is your GitHub username. If you want to use a custom domain name instead, you need to set up a CNAME in your own domain that points to owner.github.io.
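As a sketch, the CNAME record in your DNS zone might look like the following zone-file entry (hypothetical domain and TTL):

```
; hypothetical zone-file entry for a custom domain on GitHub Pages
www.example.com.    3600    IN    CNAME    owner.github.io.
```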
I have chosen to put my documents in the /docs/ subfolder in my main
branch, rather than use a separate gh-pages branch, at least for now. It seems
more intuitive to use just one branch for a personal GitHub Pages site.
Setting up the site as above has automatically created a GitHub Action that will build the site. The success (or otherwise) of these deployments can be seen in the Actions section of the repo.
Formatting for GitHub pages is provided by Jekyll, which can transform Markdown
(and other markups) into themed websites, with little more than a few lines of
YAML at the top of each page (known as Front Matter) and/or in a
/docs/_config.yml file.
Creating an index.md file in the /docs/ folder seems to have created a simple site, served under the /docs/ path.
The documentation seems to suggest that a _config.yml file will be created
automatically; this didn’t seem to happen, so I have created one manually within
the /docs/ folder.
This file defines the overall look and feel for the site, albeit with a number of options pre-defined by GitHub.
There is a list of supported themes which
can be specified by adding theme: jekyll-theme-NAME in _config.yml:
title: dev.joynt.co.uk
description: Home
theme: minima
However it seems that, for some of these themes at least, the version of the
theme provided by GitHub is older than the main release of the theme. I wanted
to try the latest minima theme, so I had to comment out theme: minima and
specify a remote_theme:
#theme: minima
remote_theme: jekyll/minima
As I created more pages, the root folder of the site started filling up. I created a _pages subfolder and moved various markdown pages into that. Sadly, when the site rebuilt they weren’t present. The fix for this was to specify the folder in the _config.yml file:
include:
- _pages
NOTE:
For the files in the _pages folder to appear in the built site directly
under the root path, e.g. https://dev.joynt.co.uk/blog/, it was necessary to
add permalink: <name> to the Front Matter (see below) on each page.
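For example, a hypothetical page at /_pages/blog.md would need Front Matter along these lines for it to appear at https://dev.joynt.co.uk/blog/ (the title and layout values here are illustrative assumptions):

```yaml
---
layout: page
title: Blog
permalink: /blog/
---
```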
Each page in the Jekyll-themed site should have some lines of YAML at the very top, known as Front Matter.
For example, this “blog post” page has the following:
---
layout: post
title: Learning about GitHub Pages
---