feat: add podSelectorRefs for dynamic pg_hba address resolution #10148

Merged: mnencia merged 13 commits into main from dev/10087 on Mar 10, 2026

armru (Member) commented Mar 5, 2026

Introduces a declarative way to manage pg_hba rules by resolving pod label selectors into IP addresses, eliminating the need for static CIDR ranges or manual updates when client pods restart.

Users define named label selectors in .spec.podSelectorRefs and reference them with ${podselector:NAME} in pg_hba address fields. The operator resolves matching pod IPs and the instance manager expands each reference into individual /32 or /128 entries at render time, with automatic reload on pod lifecycle changes.
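
The expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the instance manager's Go code: the function names are hypothetical, the IPv6 address is made up, and the sketch assumes at most one reference per rule and simply emits no line when a selector resolves to no IPs (the real behavior in both cases may differ).

```python
import ipaddress
import re

# Token syntax taken from the PR description: ${podselector:NAME}
TOKEN = re.compile(r"\$\{podselector:([^}]+)\}")


def host_cidr(ip: str) -> str:
    """Return a single-host CIDR: /32 for IPv4, /128 for IPv6."""
    addr = ipaddress.ip_address(ip)
    return f"{addr}/{addr.max_prefixlen}"


def expand_rule(rule: str, resolved: dict[str, list[str]]) -> list[str]:
    """Expand one pg_hba rule into one line per resolved pod IP.

    Assumption: at most one reference per rule. Rules without a
    reference pass through unchanged.
    """
    match = TOKEN.search(rule)
    if match is None:
        return [rule]
    ips = resolved.get(match.group(1), [])
    return [rule.replace(match.group(0), host_cidr(ip)) for ip in ips]


# Hypothetical resolved addresses for a selector named "marshall-pooler".
resolved = {"marshall-pooler": ["10.244.1.16", "fd00:0:0:1::10"]}
rule = "hostssl all all ${podselector:marshall-pooler} scram-sha-256"
for line in expand_rule(rule, resolved):
    print(line)
```

Each matching pod IP yields its own rule, so the rendered file never needs a CIDR range wider than a single host.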

Closes #10087

@armru armru requested review from a team, NiccoloFei, jsilvela and litaocdl as code owners March 5, 2026 12:36
@dosubot dosubot bot added the size:XXL label Mar 5, 2026
armru (Member, Author) commented Mar 5, 2026

/test

@cnpg-bot cnpg-bot added the backport-requested ◀️, release-1.25, release-1.27 and release-1.28 labels Mar 5, 2026
github-actions bot (Contributor) commented Mar 5, 2026

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

github-actions bot (Contributor) commented Mar 5, 2026

@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22718198409

@dosubot dosubot bot added the enhancement 🪄 label Mar 5, 2026
@armru armru added the do not backport label and removed the backport-requested ◀️, release-1.25, release-1.27 and release-1.28 labels Mar 5, 2026
@cnpg-bot cnpg-bot added the ok to merge 👌 label Mar 5, 2026
sxd (Member) commented Mar 5, 2026

@mnencia this should probably be tested with a pg_hba config like the one here: https://github.com/cloudnative-pg/postgres-keycloak-oauth-validator?tab=readme-ov-file#example-cloudnativepg-configuration
Also, I don't see a dual-stack case in the tests. For ordering, should that case probably be only one line?
On the other hand, instead of using a hard-coded string for the mask, it would be better to use something like https://pkg.go.dev/net#IPMask.String, but that's just an idea.
That was a quick check, but the dual-stack case definitely needs to be tested.

gbartolini (Contributor) commented:

There is a problem with the markdown code for docs/src/cloudnative-pg.v1.md.

gbartolini (Contributor) commented:

I created the angus cluster with a pgbouncer pooler (marshall):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: angus
spec:
  instances: 3

  podSelectorRefs:
    - name: marshall-pooler
      selector:
        matchLabels:
          cnpg.io/podRole: pooler
          cnpg.io/cluster: angus
          cnpg.io/poolerName: marshall

  postgresql:
    pg_hba:
      - "hostssl all all ${podselector:marshall-pooler} scram-sha-256"
      - "host all all all reject"

  storage:
    size: 1Gi
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: marshall
spec:
  cluster:
    name: angus

  instances: 3
  type: rw
  pgbouncer:
    poolMode: session
    parameters:
      max_client_conn: "1000"
      default_pool_size: "10"

The cluster defines the marshall-pooler podSelectorRef, which matches the pooler's pods via three labels. The first pg_hba rule allows access to the database from those pods, and the second is a fallback rule that rejects every other connection.
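
For readers less familiar with `matchLabels`: a pod is selected only when every key/value pair in the selector is present in the pod's labels. Below is a minimal Python sketch of that rule. The selector is the one from the manifest above. The pod labels are assumptions made up for illustration, not output from a real cluster.

```python
def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Selector copied from the marshall-pooler podSelectorRef.
selector = {
    "cnpg.io/podRole": "pooler",
    "cnpg.io/cluster": "angus",
    "cnpg.io/poolerName": "marshall",
}

# Assumed labels, for illustration only.
pods = {
    "marshall-6796d674ff-czmkj": {
        "cnpg.io/podRole": "pooler",
        "cnpg.io/cluster": "angus",
        "cnpg.io/poolerName": "marshall",
    },
    "angus-1": {"cnpg.io/podRole": "instance", "cnpg.io/cluster": "angus"},
    "angus-pgbench-7q56c": {"job-name": "angus-pgbench"},
}

selected = [name for name, labels in pods.items() if matches(selector, labels)]
print(selected)  # ['marshall-6796d674ff-czmkj']
```

Only the pooler pod carries all three labels, so only its IP would end up in the expanded rule.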

I then created two simple pgbench jobs, one going through the pooler (PGHOST=marshall) and one going directly to the angus-rw service.

apiVersion: batch/v1
kind: Job
metadata:
  name: marshall-pgbench
spec:
  template:
    spec:
      containers:
      - args:
        - -i
        command:
        - pgbench
        env:
        - name: PGHOST
          value: marshall
        - name: PGDATABASE
          value: app
        - name: PGPORT
          value: "5432"
        - name: PGUSER
          valueFrom:
            secretKeyRef:
              key: username
              name: angus-app
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: angus-app
        image: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie
        name: pgbench
      restartPolicy: Never
---
apiVersion: batch/v1
kind: Job
metadata:
  name: angus-pgbench
spec:
  template:
    spec:
      containers:
      - args:
        - -i
        command:
        - pgbench
        env:
        - name: PGHOST
          value: angus-rw
        - name: PGDATABASE
          value: app
        - name: PGPORT
          value: "5432"
        - name: PGUSER
          valueFrom:
            secretKeyRef:
              key: username
              name: angus-app
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: angus-app
        image: ghcr.io/cloudnative-pg/postgresql:18.1-system-trixie
        name: pgbench
      restartPolicy: Never

This is the result:

$ kubectl get pods
NAME                        READY   STATUS      RESTARTS   AGE
angus-1                     1/1     Running     0          17m
angus-2                     1/1     Running     0          17m
angus-3                     1/1     Running     0          14m
angus-pgbench-7q56c         0/1     Error       0          7s
csi-hostpathplugin-0        8/8     Running     0          6h21m
marshall-6796d674ff-czmkj   1/1     Running     0          17m
marshall-6796d674ff-gzmf9   1/1     Running     0          17m
marshall-6796d674ff-wmlwk   1/1     Running     0          17m
marshall-pgbench-blm7c      0/1     Completed   0          7s

$ kubectl logs jobs/angus-pgbench
Found 2 pods, using pod/angus-pgbench-7q56c
pgbench: error: connection to server at "angus-rw" (10.96.155.110), port 5432 failed: FATAL:  pg_hba.conf rejects connection for host "10.244.1.20", user "app", database "app", SSL encryption
connection to server at "angus-rw" (10.96.155.110), port 5432 failed: FATAL:  pg_hba.conf rejects connection for host "10.244.1.20", user "app", database "app", no encryption
pgbench: error: could not create connection for initialization

$ kubectl logs jobs/marshall-pgbench
dropping old tables...
creating tables...
generating data (client-side)...
100000 of 100000 tuples (100%) of pgbench_accounts done (elapsed 0.01 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 0.10 s (drop tables 0.03 s, create tables 0.00 s, client-side generate 0.03 s, vacuum 0.02 s, primary keys 0.01 s).

The pg_hba rule created with the pod selector allows access through the pooler, while the job that tries to connect directly through the angus-rw primary service is rejected.

gbartolini (Contributor) commented:

Moreover, the status of the cluster contains:

  podSelectorRefs:
  - ips:
    - 10.244.1.16
    - 10.244.2.16
    - 10.244.3.24
    name: pooler

If I delete a pgbouncer pod, a new one is recreated with a different IP and the status is immediately updated:

  podSelectorRefs:
  - ips:
    - 10.244.1.16
    - 10.244.2.20
    - 10.244.3.24
    name: pooler
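
As a small illustration of what changed between the two status snapshots, the Python sketch below diffs the two IP lists. It is not how the operator detects changes. The `diff_ips` helper and the `needs_reload` flag are hypothetical names used only here.

```python
def diff_ips(old: list[str], new: list[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) between two resolved IP lists."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


# IP lists copied from the two status snapshots above.
before = ["10.244.1.16", "10.244.2.16", "10.244.3.24"]
after = ["10.244.1.16", "10.244.2.20", "10.244.3.24"]

added, removed = diff_ips(before, after)
print("added:", sorted(added))      # added: ['10.244.2.20']
print("removed:", sorted(removed))  # removed: ['10.244.2.16']

# Hypothetical flag: any difference means pg_hba must be re-rendered.
needs_reload = bool(added or removed)
```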

@dosubot dosubot bot added the lgtm label Mar 6, 2026
@gbartolini gbartolini changed the title feat: add podSelectorRefs for dynamic pg_hba address resolution feat: add podSelectorRefs for dynamic pg_hba address resolution Mar 6, 2026
@mnencia mnencia self-assigned this Mar 6, 2026
@mnencia mnencia requested review from gbartolini and mnencia March 6, 2026 09:48
gbartolini (Contributor) commented:

postgres=# SELECT database, user_name, address, netmask FROM pg_catalog.pg_hba_file_rules WHERE auth_method = 'scram-sha-256' and type = 'hostssl';
 database | user_name |   address   |     netmask
----------+-----------+-------------+-----------------
 {all}    | {all}     | 10.244.1.16 | 255.255.255.255
 {all}    | {all}     | 10.244.2.20 | 255.255.255.255
 {all}    | {all}     | 10.244.3.24 | 255.255.255.255
(3 rows)
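
The 255.255.255.255 netmask in the query output is the /32 single-host mask in dotted form. A quick way to check the correspondence is Python's standard `ipaddress` module. The helper name below is hypothetical, and the addresses are the three from the query result.

```python
import ipaddress


def address_and_netmask(cidr: str) -> tuple[str, str]:
    """Split a CIDR into the address and netmask columns PostgreSQL reports."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.netmask)


for cidr in ("10.244.1.16/32", "10.244.2.20/32", "10.244.3.24/32"):
    address, netmask = address_and_netmask(cidr)
    print(address, netmask)
```

Every line prints the pod address followed by 255.255.255.255, matching the three rows returned by `pg_hba_file_rules`.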

gbartolini (Contributor) commented Mar 6, 2026

> @mnencia probably this should be tested with a pg_hba config like the one here https://github.com/cloudnative-pg/postgres-keycloak-oauth-validator?tab=readme-ov-file#example-cloudnativepg-configuration

Although I have not tested the validator itself, I added that row to pg_hba.conf and it was parsed correctly, which was enough for me.

> Also, I don't see a dual-stack case in the tests. For ordering, should that case probably be only one line? On the other hand, instead of using a hard-coded string for the mask, it would be better to use something like https://pkg.go.dev/net#IPMask.String, but that's just an idea. That was a quick check, but the dual-stack case definitely needs to be tested.

The dual-stack issue should be addressed globally, as it is not tied to this specific feature (as far as I understand). There is an ongoing discussion about this (#7116), but my response aligns with what I’ve been saying: considering everything, I believe there are more critical areas for the project to focus on at this stage, particularly regarding technical debt.

I believe the feature, as it stands, provides significant value and should be considered a major feature for version 1.29. However, it would be great to get your review if you have time.

@mnencia mnencia force-pushed the dev/10087 branch 2 times, most recently from b091b17 to 1589982 Compare March 9, 2026 15:36
mnencia (Member) commented Mar 9, 2026

/test

github-actions bot (Contributor) commented Mar 9, 2026

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22861590448

armru and others added 11 commits March 9, 2026 21:35
Enable dynamic pod IP resolution in pg_hba rules via named label
selectors. Users can define podSelectorRefs with label selectors
that match application pods, then reference them in pg_hba rules
using ${podselector:<name>} syntax. The operator resolves matching
pod IPs and the instance manager expands ${podselector:<name>}
tokens into one pg_hba line per IP with /32 (IPv4) or /128 (IPv6)
masks.

Signed-off-by: Armando Ruocco <[email protected]>

Co-authored-by: Leonardo Cecchi <[email protected]>
Signed-off-by: Gabriele Bartolini <[email protected]>
Signed-off-by: Gabriele Bartolini <[email protected]>
Signed-off-by: Gabriele Bartolini <[email protected]>
Add list-type markers to status field for consistent SSA behavior,
detect duplicate selector names in the webhook, fix documentation
inaccuracies and clarify code comments.

Signed-off-by: Marco Nenciarini <[email protected]>
Signed-off-by: Marco Nenciarini <[email protected]>
Add tests for dual-stack pods, multiple selectors, pending pods
without IPs, label and deletion timestamp predicate transitions,
mixed IPv4/IPv6 HBA expansion, duplicate names, invalid label
selectors, and multiple podselector references per line.

Assisted-by: Claude Opus 4.6
Signed-off-by: Marco Nenciarini <[email protected]>
Replace ensureCIDR with hostCIDR using net.IPNet for proper CIDR
formatting and IPv6 address normalization.

Signed-off-by: Marco Nenciarini <[email protected]>
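
To illustrate why normalization matters in the commit above, here is a rough Python analogue built on the standard `ipaddress` module. It is not the Go `hostCIDR` function from the PR. The `normalize_hosts` name, the deduplication and the sort order are assumptions chosen for this sketch.

```python
import ipaddress


def normalize_hosts(ips: list[str]) -> list[str]:
    """Return unique single-host CIDRs in a deterministic order.

    Different spellings of one IPv6 address collapse to a single entry.
    Sorting IPv4 before IPv6 is an arbitrary choice made for this sketch.
    """
    addrs = {ipaddress.ip_address(ip) for ip in ips}
    ordered = sorted(addrs, key=lambda a: (a.version, int(a)))
    return [f"{a}/{a.max_prefixlen}" for a in ordered]


print(normalize_hosts(["FD00:0:0:0:0:0:0:1", "fd00::1", "10.244.1.16"]))
# ['10.244.1.16/32', 'fd00::1/128']
```

Parsing the address first, rather than appending a mask to the raw string, is what lets two spellings of the same IPv6 address produce the same rule.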
mnencia (Member) commented Mar 9, 2026

/test

github-actions bot (Contributor) commented Mar 9, 2026

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/22873766938

Clarify actor attribution between operator and instance manager,
remove duplicate zero-matches bullet already covered by the warning
admonition, tone down "real-time" wording, drop unrelated
max_worker_processes from the sample, and remove dead slices.Sort
call from the e2e test.

Signed-off-by: Marco Nenciarini <[email protected]>
@mnencia mnencia merged commit 4c41964 into main Mar 10, 2026
41 of 42 checks passed
@mnencia mnencia deleted the dev/10087 branch March 10, 2026 08:09

Labels

do not backport This PR must not be backported - it will be in the next minor release enhancement 🪄 New feature or request lgtm This PR has been approved by a maintainer ok to merge 👌 This PR can be merged size:XXL This PR changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Feature]: Dynamic pg_hba entries via spec.podSelectorRefs

6 participants