This week I’ve been dealing with PostgreSQL table bloat issues with some production data. When tables accumulate dead tuples from frequent updates and deletes, they can become severely bloated, impacting query performance.
I explored pg_repack, a PostgreSQL extension that reorganizes tables online and holds an exclusive lock only briefly at the start and end of the run.
Unlike VACUUM FULL, pg_repack allows concurrent reads and writes during the reorganization process. This makes it perfect for production environments where downtime isn’t an option.
The tool works by creating a new copy of the table, recording concurrent changes in a log table through a trigger, replaying those changes onto the copy, and then swapping the tables atomically.
pg_repack removes bloat from tables and indexes, and optionally restores the physical order of clustered indexes.
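To make the copy, log, replay and swap idea concrete, here is a minimal in-memory sketch in Ruby. It is a toy model and not pg_repack's actual implementation: `ToyTable`, `toy_repack` and the `dead_tuples` counter are hypothetical names invented for illustration, and a plain array stands in for the trigger-fed log table.

```ruby
# Toy model of the copy / log / replay / swap idea. NOT pg_repack's code:
# ToyTable and toy_repack are hypothetical names used only for illustration.
class ToyTable
  attr_reader :rows, :dead_tuples

  def initialize(rows = {})
    @rows = rows        # live rows: id => value
    @dead_tuples = 0    # stale row versions left behind by updates and deletes
    @change_log = nil   # stands in for the trigger-fed log table
  end

  def update(id, value)
    @dead_tuples += 1 if @rows.key?(id)   # the old version becomes a dead tuple
    @rows[id] = value
    @change_log << [:upsert, id, value] if @change_log
  end

  def delete(id)
    @dead_tuples += 1 if @rows.delete(id)
    @change_log << [:delete, id] if @change_log
  end

  def start_logging
    @change_log = []
  end

  def drain_log
    log = @change_log
    @change_log = nil
    log
  end
end

def toy_repack(table)
  table.start_logging                   # 1. start recording concurrent writes
  copy = ToyTable.new(table.rows.dup)   # 2. copy only the live rows
  yield table if block_given?           #    writes that arrive during the copy
  table.drain_log.each do |op, id, value|   # 3. replay the logged changes
    if op == :upsert
      copy.rows[id] = value
    else
      copy.rows.delete(id)
    end
  end
  copy                                  # 4. the caller swaps this in
end

bloated = ToyTable.new({ 1 => "a", 2 => "b", 3 => "c" })
bloated.update(1, "a2")
bloated.delete(2)

# A write lands while the repack is in progress and is still carried over.
repacked = toy_repack(bloated) { |t| t.update(3, "c2") }

puts "dead tuples before: #{bloated.dead_tuples}, after: #{repacked.dead_tuples}"
```

The copy starts with no dead tuples because only live rows are carried over, and the write made during the copy is not lost because it is replayed from the log before the swap.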
Note: This was tested on macOS with a local PostgreSQL instance.
First we installed pg_repack using the PostgreSQL Extension Network client:
$ brew install pgxnclient
$ cd /path/to/development-kit
$ pgxn install pg_repack
Then we enabled the extension in our target database:
CREATE EXTENSION IF NOT EXISTS pg_repack;
Finally we ran pg_repack on a specific partitioned table that was experiencing heavy bloat:
$ pg_repack \
--dbname="postgresql://user:<password>@<host>:5432/development_db" \
--table='gitlab_partitions_dynamic.ci_builds' \
--jobs=4 \
--no-order \
--wait-timeout=36000 \
-w \
--echo \
2>&1 | ts '[%Y-%m-%d %H:%M:%S]' | tee -a "pg_repack_ci_builds_$(date +%Y%m%d_%H%M%S).log"
The --echo flag shows all SQL commands being executed, which is helpful for understanding what pg_repack is doing behind the scenes.
The entire repack of our 500GB partition completed in 3 hours, achieving a 30% reduction in table size.
💰 Total: €
💰 Total: 1 225€
| Day | Activity | Budget |
|---|---|---|
| 1 | Kuta | € |
| 2 | Ubud | € |
| 3 | Ubud | € |
| 4 | Ubud | € |
| 5 | Ubud | € |
| 6 | Lovina | € |
| 7 | Lovina | € |
| 8 | Lovina | € |
| 9 | Lovina | € |
| 10 | Amed | € |
| 11 | Amed | € |
| 12 | Amed | € |
| 13 | Jimbaran | € |
| 14 | Jimbaran | € |
| 15 | Jimbaran | € |
| 16 | Sanur | € |
💰 Total: €
| Type | Budget |
|---|---|
| Transport | 2 841€ |
| Accommodation | 1 225€ |
| Activities | € |
| Total | ~4 066€ |
💰 Total: 990€
💰 Total: 735€
| Day | Activity | Budget |
|---|---|---|
| 1 | Arrival | € |
| 2 | | € |
| 3 | | € |
| 4 | | € |
| 5 | | € |
| 6 | | € |
| 7 | | € |
| 8 | Departure | 0€ |
💰 Total: €
| Type | Budget |
|---|---|
| Transport | 990€ |
| Accommodation | 735€ |
| Activities | € |
| Total | ~€ |
This week I have been playing around with AI models from Anthropic and Vertex AI at GitLab to familiarize myself with this new technology.
I have learned a lot of new skills and discovered how powerful these recent models are.
We have been working on a small project to identify when several issues contain the same information.
The idea was to take the content of an issue and generate an embedding for it.
This technique is called semantic similarity.
An embedding is a compressed numerical representation of a piece of text that captures its meaning and is generated by an ML model.
Note: The code is available here.
First we installed pgvector, a PostgreSQL extension, to store those embeddings in our database:
$ brew install pgvector
CREATE EXTENSION vector;
Then we added a new column to persist those records in the issue table:
ALTER TABLE issues ADD COLUMN embedding vector(768);
And a new index to speed up the queries:
CREATE INDEX ON issues USING hnsw (embedding vector_l2_ops);
Finally we ran a script to import all existing issues for a given team and generate an embedding for each one.
To determine how similar two issues are, we used the neighbor gem to measure the Euclidean distance between their vectors:
nearest_issue = issue.nearest_neighbors(:embedding, distance: "euclidean").first
nearest_issue.neighbor_distance
=> 0.6025755637811935
This gives us a good approximation of how similar an issue is to the rest of them, but the distance threshold still needs to be tuned against our entire dataset.
For our use case, a distance of 0.3 gave us a good approximation of similar issues.
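As a rough illustration of what that threshold means, the sketch below computes the Euclidean distance by hand and compares it with the 0.3 cutoff. The helper names `euclidean_distance` and `similar?` and the three-dimensional vectors are hypothetical; the real embeddings have 768 dimensions, and in the project the distance comes from the neighbor gem, not from this code.

```ruby
# Euclidean (L2) distance between two equal-length vectors.
def euclidean_distance(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.length == b.length

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cutoff taken from the text; it would need re-tuning for another dataset.
SIMILARITY_THRESHOLD = 0.3

def similar?(a, b, threshold: SIMILARITY_THRESHOLD)
  euclidean_distance(a, b) <= threshold
end

# Hypothetical toy vectors standing in for real issue embeddings.
issue_a = [0.10, 0.20, 0.30]
issue_b = [0.10, 0.25, 0.30]
issue_c = [0.90, 0.10, 0.40]

puts similar?(issue_a, issue_b) # true: the distance is about 0.05
puts similar?(issue_a, issue_c) # false: the distance is about 0.81
```

A smaller distance means the two texts are closer in meaning, so lowering the threshold trades missed duplicates for fewer false matches.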
18 m with an instructor
20 m: the pressure is 3 bar
5 m
50 bar / 500 psi remaining
BALLO = Bouée; Air; Lestage; Largage; OK (BCD, air, weights, releases, OK).
| Exercise | Price |
|---|---|
| E-learning | 210€ |
| Dives | 420€ |
| Site | Time | Depth |
|---|---|---|
| Lion de mer | 37 min | 6 m |
| Lion de mer | 47 min | 13 m |
| Lion de mer | 47 min | 20 m |
| Pyramide | 46 min | 20 m |
| Sec de Suisse | 41 min | 20 m |
| La roche Percé | 46 min | 20 m |
💰 Total: €
💰 Total: €
| Day | Activity | Budget |
|---|---|---|
| 1 | | € |
| 2 | | € |
| 3 | | € |
| 4 | | € |
| 5 | | € |
| 6 | | € |
| 7 | | € |
| 8 | | € |
| 9 | | € |
| 10 | | € |
| 11 | | € |
💰 Total: €
| Type | Budget |
|---|---|
| Transport | € |
| Accommodation | € |
| Activities | € |
| Total | ~€ |
💰 Total: $2769
Marco: $119
💰 Total: $2541
| Day | Activity | Budget | Hike |
|---|---|---|---|
| 1 | Las Vegas -> Zion | $ | The watchman trail |
| 2 | Zion -> Bryce Canyon -> Lake Powell | $ | Navajo Loop |
| 3 | Lake Powell -> Horseshoe Bend -> Antelope Canyon -> Grand Canyon | $ | Horseshoe bend trail |
| 4 | Grand Canyon | $ | Plane, Rim trail |
| 5 | Grand Canyon -> Death Valley | $ | |
| 6 | Death Valley | $ | Red Cathedral, Dante view |
| 7 | Death Valley -> Sequoia Park | $ | |
| 8 | Sequoia Park | $ | Sherman Tree |
| 9 | Sequoia Park -> San Francisco | $ | |
| 10 | San Francisco | $ | |
| 11 | San Francisco | $ | |
| 12 | San Francisco | $ | |
| 13 | San Francisco -> Los Angeles | $ | |
| 14 | Los Angeles | $ | |
| 15 | Los Angeles | $ | |
| 16 | Los Angeles -> Paris | $ | |
💰 Total: $
| Type | Budget |
|---|---|
| Transport | $2803 |
| Accommodation | $2541 |
| Activities | $ |
| Total | ~$5344 |
💰 Total: 370€
💰 Total: 220€
| Day | Activity | Budget |
|---|---|---|
| 1 | Ajaccio | 30€ |
| 2 | Porticcio | 30€ |
| 3 | Porticcio | 30€ |
| 4 | Porticcio | 30€ |
| 5 | Porticcio | 30€ |
| 6 | Porticcio | 30€ |
| 7 | Sotta | 30€ |
| 8 | Porto-Vecchio | 50€ |
| 9 | Bonifacio | 50€ |
| 10 | Sartene, Propriano | 50€ |
| 11 | Ajaccio | 50€ |
💰 Total: 600€
| Type | Budget |
|---|---|
| Transport | 100€ |
| Accommodation | 220€ |
| Activities | 300€ |
| Total | ~620€ |
💰 Total: 100€
💰 Total: 110€/night
| Day | Activity | Budget |
|---|---|---|
| 1 | Milan | 150€ |
| 2 | Milan | 150€ |
| 3 | Milan | 50€ |
💰 Total: 600€
| Type | Budget |
|---|---|
| Transport | 100€ |
| Accommodation | 220€ |
| Activities | 300€ |
| Total | ~620€ |