opencl: fix leak in Adreno q8_0 path #21212

Merged
max-krasnyansky merged 1 commit into ggml-org:master from qualcomm:lh/q8_0-leak-fix on Apr 1, 2026

@lhez commented on Mar 31, 2026

Overview

There is a leak in the Adreno q8_0 GEMM path: a sub-buffer and an image1d_buffer were not released after transposing the activation.
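
To illustrate the pattern, here is a minimal sketch that assumes nothing about the actual patch. The names `mock_mem`, `mock_create`, `mock_release`, `step_leaky`, `step_fixed`, and `outstanding_after` are hypothetical stand-ins so the example runs without an OpenCL driver; in real code the create/release pair would be OpenCL calls such as `clCreateSubBuffer` or `clCreateImage` and `clReleaseMemObject`. It shows how two unreleased temporaries per call accumulate over repeated calls.

```cpp
#include <cassert>

// Hypothetical stand-ins for OpenCL object creation and release, so the
// sketch runs without a driver. In real code these would correspond to
// clCreateSubBuffer / clCreateImage and clReleaseMemObject.
static int g_outstanding = 0;   // objects created but not yet released

using mock_mem = int;           // stand-in for cl_mem

mock_mem mock_create() {
    return ++g_outstanding;     // hand out a dummy handle and count it
}

void mock_release(mock_mem /*obj*/) {
    assert(g_outstanding > 0);  // releasing more than was created is a bug
    --g_outstanding;
}

// One GEMM-like step that needs two temporaries (a sub-buffer and an
// image view of it) but never releases them.
void step_leaky() {
    mock_mem sub = mock_create();
    mock_mem img = mock_create();
    (void) sub;
    (void) img;                 // ... kernel would be enqueued here ...
}

// The same step, releasing both temporaries once they are no longer needed.
void step_fixed() {
    mock_mem sub = mock_create();
    mock_mem img = mock_create();
    // ... kernel would be enqueued here ...
    mock_release(img);
    mock_release(sub);
}

// Run `step` n_calls times and report how many objects are still outstanding.
int outstanding_after(void (*step)(), int n_calls) {
    g_outstanding = 0;
    for (int i = 0; i < n_calls; ++i) {
        step();
    }
    return g_outstanding;
}
```

With these definitions, `outstanding_after(step_leaky, 100)` returns 200, while `outstanding_after(step_fixed, 100)` returns 0: the leaky variant adds two outstanding objects on every call.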

This has little impact on Linux-based systems, but it causes a considerable slowdown on Windows (the Adreno Windows driver seems extremely sensitive to the accumulation of outstanding CL objects). It also causes a large discrepancy between the tg (token generation) numbers reported by llama-bench and llama-completion for q8_0 on Windows.

Requirements

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels on Mar 31, 2026
@lhez marked this pull request as ready for review on March 31, 2026 14:17
@lhez requested a review from a team as a code owner on March 31, 2026 14:17
@max-krasnyansky merged commit 95a6eba into ggml-org:master on Apr 1, 2026
49 of 50 checks passed
slartibardfast pushed a commit to slartibardfast/llama.cpp that referenced this pull request Apr 12, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

3 participants