
fix(keycloak): use management port health endpoints for probes #2162

Merged
kvaps merged 2 commits into cozystack:main from mattia-eleuteri:fix/keycloak-health-probes
Mar 10, 2026

Conversation

@mattia-eleuteri
Contributor

Summary

  • Fix Keycloak crashloop caused by misconfigured liveness/readiness probes
  • Add KC_HEALTH_ENABLED=true to activate health endpoints on management port
  • Switch probes from application port 8080 (/, /realms/master) to management port 9000 (/health/live, /health/ready)

Problem

Keycloak 26.x redirects all HTTP requests on port 8080 to the configured KC_HOSTNAME (HTTPS). Since kubelet does not follow redirects, probes fail with:

Probe terminated redirects, Response body:

After consecutive failures, kubelet kills the container → restart → crashloop.

Additionally, KC_HEALTH_ENABLED was not set, so the dedicated health endpoints on the management port (9000) returned 404 even though the management interface was active (via KC_METRICS_ENABLED=true).
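
As a rough illustration of the missing setting, the container environment might look like the fragment below. This is a sketch, not the actual contents of sts.yaml: the layout of the `env:` list is assumed, and only the two variable names come from the description above.

```yaml
# Hypothetical excerpt of the Keycloak container spec (layout assumed).
env:
  # Already set before this PR; per the description, this is what keeps
  # the management interface on port 9000 active.
  - name: KC_METRICS_ENABLED
    value: "true"
  # Added by this PR; without it, /health/live and /health/ready
  # return 404 on the management port.
  - name: KC_HEALTH_ENABLED
    value: "true"
```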

Changes

  • packages/system/keycloak/templates/sts.yaml:
    • Add KC_HEALTH_ENABLED=true env var to activate /health/live and /health/ready
    • Expose management port 9000 in container ports
    • Liveness probe: GET /health/live on port 9000 (was GET / on 8080)
    • Readiness probe: GET /health/ready on port 9000 (was GET /realms/master on 8080)
    • Increase failure thresholds for better startup tolerance
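
The port and probe changes listed above could look roughly like this. It is a sketch assembled from this description and the probe values shown in the first review diff, not a copy of the merged template. The port name `management` is taken from the probe definitions in that review; the `ports:` entries, the `http` port name mapping to 8080, and the indentation are assumptions.

```yaml
# Sketch of the container ports and probes (first revision of the PR,
# before the startupProbe follow-up); not the merged template.
ports:
  - name: http
    containerPort: 8080
  - name: management          # newly exposed management port
    containerPort: 9000
livenessProbe:
  httpGet:
    path: /health/live        # was "/" on the http port
    port: management
  initialDelaySeconds: 120
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /health/ready       # was "/realms/master" on the http port
    port: management
  initialDelaySeconds: 60
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```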

Test plan

  • Verified /health/live returns {"status":"UP"} (HTTP 200) on port 9000
  • Verified /health/ready returns {"status":"UP","checks":[{"name":"Keycloak database connections async health check","status":"UP"}]} (HTTP 200)
  • Confirmed 0 restarts after 10+ minutes
  • Confirmed no more ProbeWarning or Killing events

🤖 Generated with Claude Code

@dosubot dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files.) and bug (Something isn't working) labels on Mar 6, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request addresses a critical issue causing Keycloak instances to crashloop due to misconfigured liveness and readiness probes. By enabling specific health endpoints and directing Kubernetes probes to the correct management port and paths, the stability and reliability of Keycloak deployments are significantly improved, preventing unnecessary restarts.

Highlights

  • Keycloak Crashloop Fix: Resolved a Keycloak crashloop issue by reconfiguring liveness and readiness probes to correctly interact with the application's health endpoints.
  • Health Endpoint Activation: Enabled the KC_HEALTH_ENABLED environment variable to activate dedicated health endpoints on Keycloak's management port.
  • Probe Configuration Update: Switched liveness and readiness probes from the application port (8080) and generic paths to the management port (9000) and specific /health/live and /health/ready endpoints.
  • Probe Threshold Adjustment: Increased the failureThreshold for both liveness and readiness probes to provide better startup tolerance for Keycloak.


Changelog

  • packages/system/keycloak/templates/sts.yaml
    • Added KC_HEALTH_ENABLED=true environment variable.
    • Exposed the management port 9000.
    • Updated the liveness probe to use /health/live on port 9000 with increased periodSeconds and failureThreshold.
    • Updated the readiness probe to use /health/ready on port 9000 with increased periodSeconds, timeoutSeconds, and failureThreshold.

Activity

  • Verified that /health/live returns {"status":"UP"} (HTTP 200) on port 9000.
  • Verified that /health/ready returns {"status":"UP","checks":[{"name":"Keycloak database connections async health check","status":"UP"}]} (HTTP 200).
  • Confirmed zero restarts of Keycloak pods after more than 10 minutes of operation.
  • Confirmed the absence of ProbeWarning or Killing events in the Kubernetes cluster.

@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The head commit changed during the review from c9a52fb to 0873691.


Keycloak 26.x exposes dedicated health endpoints on the management
port (9000) via /health/live and /health/ready. The previous probes
used GET / on port 8080 which redirects to the configured KC_HOSTNAME
(HTTPS), causing kubelet to fail the probe with "Probe terminated
redirects" and eventually kill the pod in a crashloop.

Changes:
- Add KC_HEALTH_ENABLED=true to activate health endpoints
- Expose management port 9000 in container ports
- Switch liveness probe to /health/live on port 9000
- Switch readiness probe to /health/ready on port 9000
- Increase failure thresholds for more tolerance during startup

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: mattia-eleuteri <[email protected]>
@mattia-eleuteri mattia-eleuteri force-pushed the fix/keycloak-health-probes branch from c9a52fb to 0873691 on March 6, 2026 09:30
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a crash loop in Keycloak by reconfiguring the health probes to use the management port and its dedicated health endpoints. The changes are logical and directly address the described problem. I have added one suggestion to further enhance the probe configuration by introducing a startupProbe. This is a Kubernetes best practice for applications with slow startup times like Keycloak, and it will make the deployment more robust by preventing premature restarts during initialization.

Comment on lines 136 to +151
          livenessProbe:
            httpGet:
-             path: /
-             port: http
+             path: /health/live
+             port: management
            initialDelaySeconds: 120
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 5
          readinessProbe:
            httpGet:
-             path: /realms/master
-             port: http
+             path: /health/ready
+             port: management
            initialDelaySeconds: 60
-           timeoutSeconds: 1
            periodSeconds: 10
+           timeoutSeconds: 5
            failureThreshold: 3
medium

For applications with long startup times like Keycloak, using a startupProbe is a more robust approach than setting a long initialDelaySeconds on the liveness and readiness probes. A startup probe defers other probes until the application has successfully started, preventing the pod from being killed prematurely. This also allows for more responsive liveness checks immediately after startup. Consider replacing the current probe configuration with one that uses a startupProbe.

          startupProbe:
            httpGet:
              path: /health/ready
              port: management
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: management
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /health/ready
              port: management
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

Contributor Author

Good suggestion, applied in d18ed79. Added a startupProbe and removed initialDelaySeconds from both liveness and readiness probes.
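
For a sense of what the startup probe buys, the fragment below annotates the values from the review suggestion above with the time budget they imply. It assumes those values were applied unchanged in d18ed79, which this conversation does not confirm.

```yaml
# Values copied from the review suggestion; whether d18ed79 uses exactly
# these numbers is an assumption.
startupProbe:
  httpGet:
    path: /health/ready
    port: management
  periodSeconds: 10
  failureThreshold: 30
  # Startup budget: up to 30 failed attempts, 10 s apart, is roughly
  # 300 s before kubelet restarts the container. Liveness and readiness
  # probes are held off until this probe succeeds once.
```

With a budget like this in place, the liveness and readiness probes no longer need initialDelaySeconds, which matches the change described in the reply.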

Use a startupProbe to defer liveness/readiness checks until Keycloak
has fully started, instead of relying on initialDelaySeconds. This is
more robust for applications with variable startup times.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: mattia-eleuteri <[email protected]>
@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label on Mar 10, 2026
@kvaps kvaps added the backport (Should change be backported on previous release) and backport-previous labels on Mar 10, 2026
@kvaps kvaps merged commit 5c7311d into cozystack:main Mar 10, 2026
10 of 11 checks passed
@github-actions

Successfully created backport PR for release-1.0:

@github-actions

Successfully created backport PR for release-1.1:

kvaps added a commit that referenced this pull request Mar 10, 2026
…oints for probes (#2178)

# Description
Backport of #2162 to `release-1.0`.
kvaps added a commit that referenced this pull request Mar 10, 2026
…oints for probes (#2179)

# Description
Backport of #2162 to `release-1.1`.