Dragonfly - Latest posts https://dragonfly.discourse.group Latest posts Help segmenting I have a CT scan of a stack of mussels. I would like to separate the mussels out individually (mussel 1 in one dataset, mussel 2 in another, and so on) without losing any data, since I want to analyse each mussel for shell thickness, porosity, etc. The mussels are fairly close together but never touch; because of their curvature, however, you can’t simply split the volume, as the regions would overlap. Could anyone guide me in the right direction in Dragonfly (I have Dragonfly 3D 2025) on the steps to do this?

Much appreciated

Jessica

]]>
https://dragonfly.discourse.group/t/help-segmenting/339#post_1 Tue, 03 Mar 2026 10:32:07 +0000 dragonfly.discourse.group-post-1295
Error training model training model(s) – Pre-trained 2.5D (3 slices) U-Net Depth 5 We are currently performing training for particulate organic matter (POM) segmentation using the Segmentation Wizard. Specifically, we are using the Model Generation Strategy Pre-trained 2.5D (3 slices) U-Net Depth 5. However, during the training process, we consistently encounter an error message stating:

“The following error occurred while training model(s) – Pre-trained 2.5D (3 slices) U-Net Depth 5. Unknown error occurred during training.”

]]>
https://dragonfly.discourse.group/t/error-training-model-training-model-s-pre-trained-2-5d-3-slices-u-net-depth-5/335#post_1 Tue, 24 Feb 2026 13:58:39 +0000 dragonfly.discourse.group-post-1290
Unusual snapshot behaviour Hi @joezhou_df,

Did you get a chance to look into this? I just returned to it and retested after significantly reducing the volume of data in our S3 bucket, and it now appears to be functioning better: it identifies the snapshots in the bucket in around 30 seconds and starts successfully. It seems like there may be some inefficient S3 path traversal happening in there somewhere.

Thanks

]]>
https://dragonfly.discourse.group/t/unusual-snapshot-behaviour/332#post_4 Tue, 17 Feb 2026 15:15:21 +0000 dragonfly.discourse.group-post-1289
Unusual snapshot behaviour Here you go:

Dragonfly:
image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.31.0

Operator:
image: docker.dragonflydb.io/dragonflydb/operator:v1.2.1

OS:
Ubuntu 22.04.5 LTS

Kernel:
Linux vector-state-gen1-0 6.12.63-84.121.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Dec 31 02:07:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Yeah this is running on k8s - EKS specifically.

]]>
https://dragonfly.discourse.group/t/unusual-snapshot-behaviour/332#post_3 Tue, 03 Feb 2026 16:15:39 +0000 dragonfly.discourse.group-post-1286
Unusual snapshot behaviour Hi @adochan,

Thanks for sharing the issue with us. I assume you are running Dragonfly using the K8s operator? Could you share more information, so I can ask the engineering team to check:

  • Dragonfly version: 1.31.0 or 1.36.0
  • Dragonfly K8s operator version: (e.g., 1.4.0)
  • OS: (e.g., Ubuntu 20.04)
  • Kernel: (e.g., from the command uname -a)
  • Containerized?: Kubernetes
]]>
https://dragonfly.discourse.group/t/unusual-snapshot-behaviour/332#post_2 Tue, 03 Feb 2026 11:18:13 +0000 dragonfly.discourse.group-post-1285
RedisSemanticCache with LangChain not working after migrating from Redis to Dragonfly Hi @blancamartnez01,

Thank you for choosing Dragonfly, and sorry for the trouble you’re experiencing with RedisSemanticCache.

The issue is likely due to compatibility. Dragonfly has its own built-in search (Dragonfly Search), and its FT.SEARCH command does not yet support all the options that RediSearch does. This gap is probably why your semantic cache isn’t hitting.

We’d really like to investigate this. Could you please open a GitHub issue on our repo with your specific use case details and potentially error logs? This helps us track and prioritize these compatibility needs.

]]>
https://dragonfly.discourse.group/t/redissemanticcache-with-langchain-not-working-after-migrating-from-redis-to-dragonfly/330#post_2 Tue, 03 Feb 2026 09:38:49 +0000 dragonfly.discourse.group-post-1284
Unusual snapshot behaviour Hi there,

We are currently deploying multiple dragonfly clusters which are configured to snapshot to S3. We are seeing a very odd problem with one of those clusters.

We have created a bucket - let’s call it df-snapshots and in each of the dragonfly instances, we define the snapshots as:

  snapshot:
    cron: '*/5 * * * *'
    dir: s3://df-snapshots/clusterX

where X is a number from 1 to 22.
You will notice that there is no trailing slash on the dir config. We found that our cluster1 cluster was failing to start up: it would log that it was searching for a snapshot, then eventually time out, and the readiness probes would kill it.
Without the slash in place, we realised that it was effectively reading from cluster1* (which, at the time, matched folders containing a lot of snapshots). We resolved this by adding a trailing slash to the dir, and the problem appeared to be fixed: we restarted the pods and they came up instantly.
However, we later found that the same issue was happening again - pods failing while searching for snapshot on startup.
We can replicate this by deleting the cluster1 folder from S3; this allows the pod to start, with the log line:
W20260128 11:34:07.311956 1 server_family.cc:945] Load snapshot: No snapshot found
but as soon as a snapshot is created in that location and we roll the pod, we get the same hanging behaviour.
We can edit the DF config to point to a completely different folder (e.g., cluster123), and it appears to work just fine.
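The prefix behaviour described above can be sketched in plain Python: S3’s ListObjects matches object keys by raw string prefix, not by “folder”, so a dir of s3://df-snapshots/cluster1 (no trailing slash) also matches cluster12, cluster123, and so on. The key names below are hypothetical.

```python
# Hypothetical snapshot keys in the df-snapshots bucket.
keys = [
    "cluster1/dump-2026-01-28.dfs",
    "cluster12/dump-2026-01-28.dfs",
    "cluster123/dump-2026-01-28.dfs",
]

def list_by_prefix(prefix):
    # S3 ListObjects matches keys by raw string prefix, not by "folder".
    return [k for k in keys if k.startswith(prefix)]

print(list_by_prefix("cluster1"))   # all three "folders" match
print(list_by_prefix("cluster1/"))  # only cluster1/ matches
```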

While trying to figure this out, I temporarily upgraded to 1.36.0 (we’re currently on 1.31.0) and saw an equally perplexing issue. In this case, pointing to the cluster1 folder works just fine - I can see snapshots being created and the pods can restart. However, the pods always indicate that
W20260128 11:34:07.311956 1 server_family.cc:945] Load snapshot: No snapshot found

What is going on here?

]]>
https://dragonfly.discourse.group/t/unusual-snapshot-behaviour/332#post_1 Wed, 28 Jan 2026 11:41:34 +0000 dragonfly.discourse.group-post-1283
RedisSemanticCache with LangChain not working after migrating from Redis to Dragonfly Hi everyone, I’m running into an issue after migrating from Redis to Dragonfly and was wondering if anyone has experienced something similar.

I previously had a system working correctly using Redis together with LangChain’s RedisSemanticCache, and semantic caching worked as expected. I’ve now migrated the backend to Dragonfly. The standard (non-semantic) cache works fine, with no errors and expected behavior. However, the semantic cache does not seem to work anymore:

  • No errors are thrown
  • The application runs normally
  • Cache entries appear to be written/read for the normal cache
  • But semantic cache hits never occur

From the application side everything looks correct, and the same code worked with Redis before the migration. My questions are:

  • Is RedisSemanticCache fully compatible with Dragonfly?
  • Are there any known limitations regarding vector similarity / semantic caching features?
  • Is there any additional configuration required in Dragonfly for this use case?

Any pointers, documentation, or similar experiences would be greatly appreciated.

]]>
https://dragonfly.discourse.group/t/redissemanticcache-with-langchain-not-working-after-migrating-from-redis-to-dragonfly/330#post_1 Mon, 26 Jan 2026 10:54:08 +0000 dragonfly.discourse.group-post-1281
Jedis jsonSet Large data storage exception Related discussion on GitHub.

]]>
https://dragonfly.discourse.group/t/jedis-jsonset-large-data-storage-exception/320#post_3 Fri, 05 Dec 2025 19:07:46 +0000 dragonfly.discourse.group-post-1276
Affinity not applied Thanks for the update!

We don’t get a lot of issue reports regarding this, to be honest.

If you encounter something in the future regarding the operator, consider opening a GitHub issue directly as well in the Dragonfly K8s Operator repo.

]]>
https://dragonfly.discourse.group/t/affinity-not-applied/324#post_3 Fri, 05 Dec 2025 19:05:24 +0000 dragonfly.discourse.group-post-1275
Affinity not applied It works now. Perhaps we had something not updated.

]]>
https://dragonfly.discourse.group/t/affinity-not-applied/324#post_2 Wed, 03 Dec 2025 11:00:20 +0000 dragonfly.discourse.group-post-1274
Table_used_memory metric information Recording @borys’s answer here for visibility:

table_used_memory shows total memory usage by internal hash tables. It can be different for the primary instance and replicas, because a replica is created after the primary and has more information regarding needed metadata and hash table size.

In most cases users should ignore this metric, as it was created for developers & maintainers.

]]>
https://dragonfly.discourse.group/t/table-used-memory-metric-information/325#post_3 Tue, 02 Dec 2025 17:50:33 +0000 dragonfly.discourse.group-post-1273
Table_used_memory metric information While I’m not sure about the exact meaning of this specific field, but in code, it is related to snapshotting.

During the initial full sync between a primary and a replica, a snapshot is needed from the primary instance. Replicas don’t need to send snapshots to others. Thus, I’d say that a higher value of table_used_memory on the primary instance is expected.

We will update the documentation to reflect this.

]]>
https://dragonfly.discourse.group/t/table-used-memory-metric-information/325#post_2 Fri, 28 Nov 2025 17:25:23 +0000 dragonfly.discourse.group-post-1272
Table_used_memory metric information Hello everyone,

I’m running a Dragonfly setup on Kubernetes with a Master/Replica configuration. I’ve noticed a significant discrepancy in the memory usage between the two instances.

Specifically, the table_used_memory metric is different on the Master and the Replica, and I’m unsure of its exact meaning.

I have checked the official documentation and other online sources, but I cannot find a clear definition for table_used_memory.

Could anyone help clarify what this metric represents?

(I’ve also opened a documentation issue here: https://github.com/dragonflydb/documentation/issues/490)

Thank you!

]]>
https://dragonfly.discourse.group/t/table-used-memory-metric-information/325#post_1 Thu, 27 Nov 2025 11:04:59 +0000 dragonfly.discourse.group-post-1271
Affinity not applied Hi,

We got fed up with Redis not behaving well and constantly crashing, even though it was expected to be highly available. So, we moved to Dragonfly, and we’ve been quite happy since then.

However, we noticed an odd behavior, and perhaps it’s something obvious. When setting up podAntiAffinity, it’s not actually applied to the StatefulSet (STS) or the pods. I don’t think it can be, but does the operator perform any magic to enforce the affinity settings defined in the Dragonfly resource?

Below is the Dragonfly resource. The affinity section is not being propagated to the StatefulSet created by the operator:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  labels:
    app.kubernetes.io/managed-by: dragonfly-operator
    app.kubernetes.io/version: 0.0.1
    contact/alerts.pagerduty: Kubernetes_B.Hours
    contact/alerts.slack: engops-notifications
    contact/help.slack: engops-help
    contact/jira: ENGOPS
    contact/owner: EngOps
    helm.sh/chart: levelblue-dragonfly-0.0.4
  name: apm-redis
  namespace: apm
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: apm-redis
          topologyKey: topology.kubernetes.io/zone
        weight: 100
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app: apm-redis
          topologyKey: kubernetes.io/hostname
        weight: 100
  args:
  - --cluster_mode=emulated
  image: nexus.aveng.me:5000/levelblue-dragonfly:v1.34.2
  imagePullPolicy: Always
  labels:
    app: apm-redis
    app.kubernetes.io/managed-by: dragonfly-operator
    app.kubernetes.io/version: 0.0.1
    contact/alerts.pagerduty: Kubernetes_B.Hours
    contact/alerts.slack: engops-notifications
    contact/help.slack: engops-help
    contact/jira: ENGOPS
    contact/owner: EngOps
    helm.sh/chart: levelblue-dragonfly-0.0.4
  replicas: 3
  resources:
    limits:
      cpu: 600m
      memory: 750Mi
    requests:
      cpu: 500m
      memory: 500Mi

]]>
https://dragonfly.discourse.group/t/affinity-not-applied/324#post_1 Mon, 24 Nov 2025 06:33:32 +0000 dragonfly.discourse.group-post-1270
DragonFly RAM Entries and SSD Entries Please try data tiering with the latest Dragonfly v1.35, which includes plenty of improvements for this feature.

]]>
https://dragonfly.discourse.group/t/dragonfly-ram-entries-and-ssd-entries/316#post_2 Tue, 18 Nov 2025 07:47:11 +0000 dragonfly.discourse.group-post-1269
M_replace_aux Error with DragonFly 1.34.2 Hi @joezhou_df

Thanks for your response.

I was running an experiment with 20M keys, each with a 1KB payload, using SET with an expire TTL of 1 week, while SSD tiering was enabled with a 0.7 threshold.

[root@ip-172-31-0-22 dragonfly]# uname -a
Linux ip-172-31-0-22.ap-south-1.compute.internal 6.1.148-173.267.amzn2023.aarch64 #1 SMP Sun Aug 24 03:50:23 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

]]>
https://dragonfly.discourse.group/t/m-replace-aux-error-with-dragonfly-1-34-2/317#post_3 Wed, 29 Oct 2025 19:32:46 +0000 dragonfly.discourse.group-post-1266
Flags Configurations Details were getting lost - 1.34.2 Hi @joezhou_df

During my experiments, memory usage reached 20GB, so I set maxmemory to 25GB using redis-cli.

[root@ip-172-31-0-22 dragonfly]# uname -a
Linux ip-172-31-0-22.ap-south-1.compute.internal 6.1.148-173.267.amzn2023.aarch64 #1 SMP Sun Aug 24 03:50:23 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

[root@ip-172-31-0-22 ~]# cat /etc/dragonfly/dragonfly.flags.CONFIG_REWRITE.conf 

# Generated by CONFIG REWRITE
--maxmemory=24.13GiB
[root@ip-172-31-0-22 ~]# 

A switchover to another replica occurred (possibly due to a restart). I then found that the log directory had been set to the default instead of the configured one, and the conf file contained only this single entry.

I will share further details if I reproduce the same.

]]>
https://dragonfly.discourse.group/t/flags-configurations-details-were-getting-lost-1-34-2/319#post_3 Wed, 29 Oct 2025 19:28:36 +0000 dragonfly.discourse.group-post-1265
Jedis jsonSet Large data storage exception uname -a
Linux it663-com 5.14.0-366.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Sep 14 23:37:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

]]>
https://dragonfly.discourse.group/t/jedis-jsonset-large-data-storage-exception/320#post_2 Wed, 29 Oct 2025 09:04:00 +0000 dragonfly.discourse.group-post-1263
Jedis jsonSet Large data storage exception redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:262)
at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:55)
at redis.clients.jedis.Protocol.process(Protocol.java:131)
at redis.clients.jedis.Protocol.read(Protocol.java:221)
at redis.clients.jedis.Connection.protocolRead(Connection.java:430)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:443)
at redis.clients.jedis.Connection.getOne(Connection.java:416)
at redis.clients.jedis.Connection.executeCommand(Connection.java:212)
at redis.clients.jedis.executors.DefaultCommandExecutor.executeCommand(DefaultCommandExecutor.java:24)
at redis.clients.jedis.UnifiedJedis.executeCommand(UnifiedJedis.java:265)
at redis.clients.jedis.UnifiedJedis.jsonSet(UnifiedJedis.java:4220)
at cn.wind.rdemo.DD.main(DD.java:43)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288)
at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314)
at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)
at java.base/java.io.InputStream.read(InputStream.java:218)
at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:256)
… 11 more

(v1.25.1 was particularly fast.)

]]>
https://dragonfly.discourse.group/t/jedis-jsonset-large-data-storage-exception/320#post_1 Wed, 29 Oct 2025 06:20:28 +0000 dragonfly.discourse.group-post-1261
M_replace_aux Error with DragonFly 1.34.2 Do you know how this error happened? (i.e., what command you used to trigger this).

I suppose you are using the same setup as mentioned in the other thread?

  • OS - 6.1.148-173.267.amzn2023.aarch64
  • Dragonfly version : 1.34.2

Can you provide these as well?

  • Kernel, by using command: uname -a
  • Containerized?: [Bare Metal, Docker, Docker Compose, Docker Swarm, Kubernetes, Other]
]]>
https://dragonfly.discourse.group/t/m-replace-aux-error-with-dragonfly-1-34-2/317#post_2 Wed, 29 Oct 2025 04:11:32 +0000 dragonfly.discourse.group-post-1260
Flags Configurations Details were getting lost - 1.34.2 Hi @thangaprakash,

Can you please elaborate?

If you use the CONFIG REWRITE command after setting configs at runtime, these configs should be persisted. Are you saying it’s not the case for you?

Thanks.

]]>
https://dragonfly.discourse.group/t/flags-configurations-details-were-getting-lost-1-34-2/319#post_2 Wed, 29 Oct 2025 04:07:30 +0000 dragonfly.discourse.group-post-1259
Flags Configurations Details were getting lost - 1.34.2 When I set maxmemory using redis-cli with CONFIG SET maxmemory 25GB, I observed that the configuration the server booted with was overwritten with a single configuration entry.

DragonFly version 1.34.2

/etc/dragonfly/dragonfly.flags.conf:

# CONFIG REWRITE
--maxmemory=25GB

]]>
https://dragonfly.discourse.group/t/flags-configurations-details-were-getting-lost-1-34-2/319#post_1 Tue, 28 Oct 2025 20:59:40 +0000 dragonfly.discourse.group-post-1258
M_replace_aux Error with DragonFly 1.34.2 ct 27 19:54:58 ip-172-31-0-22 dragonfly[78125]: F20251027 19:54:57.888782 78133 init.cc:140] Uncaught exception: basic_string::_
M_replace_aux
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: *** Check failure stack trace: ***
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f1f0f4 google::LogMessage::SendToLog()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f17fd8 google::LogMessage::Flush()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f19914 google::LogMessageFatal::~LogMessageFatal()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1335f5c _ZZN13MainInitGuardC4EPiPPPcjENKUlvE_clEv.isra.0
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab133604c _ZZN13MainInitGuardC4EPiPPPcjENUlvE_4_FUNEv
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab2026e94 __cxxabiv1::__terminate()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab2026ee8 std::terminate()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab202706c __cxa_throw
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab12f90d0 std::__throw_length_error()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab20a2918 std::__cxx11::basic_string<>::_M_replace_aux()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1a01d80 dfly::CompactObj::GetString()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1a01e48 dfly::CompactObj::GetSlice()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13be138 _ZN4dfly13RdbSerializer9SaveValueERKNS_10CompactObjE.localalias
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13be2e4 dfly::RdbSerializer::SaveEntry()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d2514 _ZN4dfly13SliceSnapshot14SerializeEntryEtRKNS_10CompactObjES3_.localalias
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d29d8 dfly::SliceSnapshot::SerializeBucket()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d2cbc dfly::SliceSnapshot::BucketSaveCb()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d30f8 dfly::SliceSnapshot::IterateBucketsFb()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d33c0 _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberEN4util3fb219FixedStackAllocatorEZNS6_6detail15WorkerFiberImplIZN4dfly13SliceSnapshot5StartEbNSB_13SnapshotFlushEEUlvE2_JEEC4IS7_EESt17basic_string_viewIcSt11char_traitsIcEENS6_13FiberPriorityERKNS0_12preallocatedEOT_OSD_EUlOS4_E_EEEEvNS1_10transfer_tE
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1c43794 make_fcontext
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: *** SIGABRT received at time=1761594900 on cpu 7 ***
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: PC: @ 0xffffa09779b4 (unknown) __pthread_kill_implementation
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f77a4c 464 absl::lts_20250512::AbslFailureSignalHandler()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xffffa0c79830 5104 (unknown)
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xffffa092e3a0 64 gsignal
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xffffa091a264 320 abort
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f24bb4 32 google::DumpStackTraceAndExit()
Oct 27 19:55:00 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f185d4 192 google::LogMessage::Fail()
Oct 27 19:55:01 ip-172-31-0-22 audit[78125]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=78125 comm="Proactor7" exe="/usr/local/bin/dragonfly" sig=6 res=1
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f1f0f4 16 google::LogMessage::SendToLog()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f17fd8 208 google::LogMessage::Flush()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f19914 80 google::LogMessageFatal::~LogMessageFatal()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1335f5c 16 MainInitGuard::MainInitGuard()::{lambda()#1}::operator()()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab133604c 352 MainInitGuard::MainInitGuard()::{lambda()#1}::_FUN()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab2026e94 16 __cxxabiv1::__terminate()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab2026ee8 16 std::terminate()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab202706c 16 __cxa_throw
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab12f90d0 48 std::__throw_length_error()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab20a2918 32 std::__cxx11::basic_string<>::_M_replace_aux()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1a01d80 64 dfly::CompactObj::GetString()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1a01e48 32 dfly::CompactObj::GetSlice()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13be138 224 dfly::RdbSerializer::SaveValue()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: F20251027 19:55:01.475198 78131 init.cc:140] Uncaught exception: basic_string::_M_replace_aux
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: *** Check failure stack trace: ***
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f1f0f4 google::LogMessage::SendToLog()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f17fd8 google::LogMessage::Flush()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1f19914 google::LogMessageFatal::~LogMessageFatal()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1335f5c _ZZN13MainInitGuardC4EPiPPPcjENKUlvE_clEv.isra.0
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab133604c _ZZN13MainInitGuardC4EPiPPPcjENUlvE_4_FUNEv
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab2026e94 __cxxabiv1::__terminate()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab2026ee8 std::terminate()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab202706c __cxa_throw
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab12f90d0 std::__throw_length_error()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab20a2918 std::__cxx11::basic_string<>::_M_replace_aux()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1a01d80 dfly::CompactObj::GetString()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1a01e48 dfly::CompactObj::GetSlice()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13be138 _ZN4dfly13RdbSerializer9SaveValueERKNS_10CompactObjE.localalias
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13be2e4 dfly::RdbSerializer::SaveEntry()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d2514 _ZN4dfly13SliceSnapshot14SerializeEntryEtRKNS_10CompactObjES3_.localalias
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d29d8 dfly::SliceSnapshot::SerializeBucket()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d2cbc dfly::SliceSnapshot::BucketSaveCb()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d30f8 dfly::SliceSnapshot::IterateBucketsFb()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d33c0 _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberEN4util3fb219FixedStackAllocatorEZNS6_6detail15WorkerFiberImplIZN4dfly13SliceSnapshot5StartEbNSB_13SnapshotFlushEEUlvE2_JEEC4IS7_EESt17basic_string_viewIcSt11char_traitsIcEENS6_13FiberPriorityERKNS0_12preallocatedEOT_OSD_EUlOS4_E_EEEEvNS1_10transfer_tE
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab1c43794 make_fcontext
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: [failure_signal_handler.cc : 377] RAW: Signal 6 raised at PC=0xffffa09779b4 while already in AbslFailureSignalHandler()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13be2e4 176 dfly::RdbSerializer::SaveEntry()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d2514 224 dfly::SliceSnapshot::SerializeEntry()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d29d8 272 dfly::SliceSnapshot::SerializeBucket()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d2cbc 80 dfly::SliceSnapshot::BucketSaveCb()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d30f8 320 dfly::SliceSnapshot::IterateBucketsFb()
Oct 27 19:55:02 ip-172-31-0-22 dragonfly[78125]: @ 0xaaaab13d33c0 240 boost::context::detail::fiber_entry<>()

]]>
https://dragonfly.discourse.group/t/m-replace-aux-error-with-dragonfly-1-34-2/317#post_1 Mon, 27 Oct 2025 20:42:38 +0000 dragonfly.discourse.group-post-1256
DragonFly RAM Entries and SSD Entries 3 VMs – Each VM 8 vCPU, 32 GB, SSD Storage 250GB

OS - 6.1.148-173.267.amzn2023.aarch64

DragonFly version : 1.34.2
MaxMemory : 20GB
SSD Storage: 200GB
1 Master and 2 Replicas with 3 Sentinels

Sentinel version : 7.2.5 (redis)

With SSD tiering enabled, I need to insert 50M entries with unique keys, each with a 1024-byte payload and a TTL of 864000 seconds.

I am unable to insert even 30M entries, and TPS (SET with EXPIRE) was also decreasing gradually.

tiered_offload_threshold=0.7

Please guide me on this.

Sample Entry:

Key : DRAGONFLY_POC_1:3177222
Value : “Value-3177222:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA”
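For reference, entries like the sample above can be generated with a short sketch; the exact padding scheme (filling with ‘A’s up to the 1024-byte payload size) is an assumption based on the sample value.

```python
# Sketch: build a key/value pair matching the sample entry above.
# Assumption: the value is the "Value-<n>:" prefix padded with 'A's
# up to the 1024-byte payload size.
def make_entry(i, payload_size=1024):
    key = f"DRAGONFLY_POC_1:{i}"
    prefix = f"Value-{i}:"
    value = prefix + "A" * (payload_size - len(prefix))
    return key, value

key, value = make_entry(3177222)
print(key)           # DRAGONFLY_POC_1:3177222
print(len(value))    # 1024
```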

]]>
https://dragonfly.discourse.group/t/dragonfly-ram-entries-and-ssd-entries/316#post_1 Mon, 27 Oct 2025 20:28:26 +0000 dragonfly.discourse.group-post-1255
DragonFly Config File Support in v1.34.2 - Equivalent to Redis Thanks @joezhou_df for the guidance.

]]>
https://dragonfly.discourse.group/t/dragonfly-config-file-support-in-v1-34-2-equivalent-to-redis/314#post_3 Mon, 27 Oct 2025 20:04:43 +0000 dragonfly.discourse.group-post-1254
DragonFly Config File Support in v1.34.2 - Equivalent to Redis Hi @thangaprakash,

Sorry for the confusion. Dragonfly doesn’t support exactly the same configuration options Redis does. Due to the architectural design differences, we need a different set of configs. They can all be found here: Server Configuration Flags | Dragonfly

For example, we use a cron string to save backups periodically:

--snapshot_cron="*/5 * * * *"

To streamline the experience of using the CLI, environment variables, or a flagfile, all options use _ instead of -, so it is --slowlog_log_slower_than in Dragonfly.
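As a rough illustration, renaming a dashed Redis-style flag to Dragonfly’s underscore convention is mechanical. This hypothetical helper only rewrites the flag name; it does not check that Dragonfly actually supports the flag in question.

```python
def to_underscore_flag(flag):
    # Split "--name=value" and replace dashes only in the name part.
    name, sep, value = flag.lstrip("-").partition("=")
    return "--" + name.replace("-", "_") + sep + value

print(to_underscore_flag("--slowlog-log-slower-than=10000"))
# --slowlog_log_slower_than=10000
```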

Also, we don’t have Redis’s 8 maxmemory eviction policies. If you want eviction, simply pass:

--cache_mode=true
]]>
https://dragonfly.discourse.group/t/dragonfly-config-file-support-in-v1-34-2-equivalent-to-redis/314#post_2 Thu, 23 Oct 2025 20:38:10 +0000 dragonfly.discourse.group-post-1252
DragonFly Config File Support in v1.34.2 - Equivalent to Redis Starting journey with DragonFly version 1.34.2 …

I am unable to provide a dragonfly.conf (as with Redis) when starting the Dragonfly service (Linux).

I am able to provide “--flagfile flags.conf”, but it doesn’t support all the properties Redis does.

[root@ip-172-31-0-22 ~]# /dragonFly/dragonfly --flagfile /etc/dragonfly/dragonfly_flags.conf
ERROR: Unknown command line flag ‘announce-ip’
ERROR: Unknown command line flag ‘announce-port’
ERROR: Unknown command line flag ‘databases’
ERROR: Unknown command line flag ‘loglevel’
ERROR: Unknown command line flag ‘maxmemory-policy’
ERROR: Unknown command line flag ‘notify-keyspace-events’
ERROR: Unknown command line flag ‘protected-mode’
ERROR: Unknown command line flag ‘save’
ERROR: Unknown command line flag ‘save’
ERROR: Unknown command line flag ‘save’
ERROR: Unknown command line flag ‘slowlog-log-slower-than’
ERROR: Unknown command line flag ‘slowlog-max-len’
ERROR: Unknown command line flag ‘tcp-keepalive’

Content of “flags.conf”:

--announce-ip=172.Y.x.x
--announce-port=6379
--bind=172.Y.x.x
--bind=127.0.0.1
--databases=16
--dbfilename=dump.rdb
--dir=/dragonFly/dragondata/data
--log_dir=/dragonFly/var/log/dragonfly
--loglevel=notice
--maxmemory=20GB
--maxmemory-policy=noeviction
--notify-keyspace-events=""
--port=6379
--protected-mode=no
--save=300 10
--save=60 10000
--save=900 1
--slowlog-log-slower-than=10000
--slowlog-max-len=128
--tcp-keepalive=300
--tiered_prefix=/dragonFly/dragondata/ssd

Please guide for the right documentation.

]]>
https://dragonfly.discourse.group/t/dragonfly-config-file-support-in-v1-34-2-equivalent-to-redis/314#post_1 Thu, 23 Oct 2025 19:12:56 +0000 dragonfly.discourse.group-post-1251
Is it possible to change maxmemory_policy? Thank you very much for your answer!
It works!

]]>
https://dragonfly.discourse.group/t/is-it-possible-to-change-maxmemory-policy/313#post_3 Thu, 04 Sep 2025 06:52:20 +0000 dragonfly.discourse.group-post-1250
Is it possible to change maxmemory_policy? Hi @KleinenberG,

I don’t think Dragonfly allows changing the eviction policy on the fly while the server is running at the moment, but a server flag (and a restart of the server) can be used.

Note that we don’t have the same eviction policies Redis has. We have only one cache mode, which is a sophisticated and efficient combined LRU + LFU algorithm.

After the change, below is what I get. Dragonfly will then evict keys when used memory gets close to maxmemory.

dragonfly$> INFO MEMORY
# ...
cache_mode:cache
maxmemory_policy:eviction
# ...
]]>
https://dragonfly.discourse.group/t/is-it-possible-to-change-maxmemory-policy/313#post_2 Wed, 03 Sep 2025 17:21:19 +0000 dragonfly.discourse.group-post-1249
Is it possible to change maxmemory_policy? Hello!
I see in the output INFO MEMORY:
maxmemory_policy:noeviction

can i change this policy?

127.0.0.1:6379> info server
redis_version:7.4.0
dragonfly_version:df-v1.28.1

as far as I understand - no!?

]]>
https://dragonfly.discourse.group/t/is-it-possible-to-change-maxmemory-policy/313#post_1 Wed, 03 Sep 2025 15:39:11 +0000 dragonfly.discourse.group-post-1248
Arguments are getting passed but not overriding Thank you :slight_smile: Args are passing into container, but are duplicate · Issue #5746 · dragonflydb/dragonfly · GitHub

]]>
https://dragonfly.discourse.group/t/arguements-are-getting-passed-but-not-overriding/312#post_5 Fri, 29 Aug 2025 15:15:54 +0000 dragonfly.discourse.group-post-1247
Arguments are getting passed but not overriding Hey @Ben, sorry for the delay. Our engineers with expertise in this area will take a look at this soon; I’ve already notified them. In the meantime, feel free to add your discovery as a GitHub issue in our repo too if you’d like.

]]>
https://dragonfly.discourse.group/t/arguements-are-getting-passed-but-not-overriding/312#post_3 Thu, 28 Aug 2025 16:24:37 +0000 dragonfly.discourse.group-post-1244
Arguments are getting passed but not overriding Hey @Ben, thanks for reporting.

The Helm chart is not an area I’m very familiar with. Some of these configs appear to be defaults, but it’s odd to have duplicates. I’ve notified our engineering team to check, and we will get back to you.

]]>
https://dragonfly.discourse.group/t/arguements-are-getting-passed-but-not-overriding/312#post_2 Sun, 24 Aug 2025 03:36:27 +0000 dragonfly.discourse.group-post-1243
Arguments are getting passed but not overriding Hello,

I have installed the Dragonfly operator in my cluster via Helm (chart v1.33.1). I have been testing how to pass configuration arguments using a manifest.

For example:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  labels:
    app.kubernetes.io/name: dragonfly
    app.kubernetes.io/instance: dragonfly-sample
    app.kubernetes.io/part-of: dragonfly-operator
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/created-by: dragonfly-operator
  name: dragonfly-sample
spec:
  replicas: 3
  resources:
    requests:
      cpu: 550m
      memory: 550Mi
    limits:
      cpu: 600m
      memory: 750Mi

  labels:
    app.kubernetes.io/name: dragonfly
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "9999"
  args:
    - --primary_port_http_enabled=true
    - --requirepass=$(DFLY_requirepass)
    - --admin_port=9999
  env:
    - name: DFLY_requirepass
      valueFrom:
        secretKeyRef:
          name: dragonfly-auth # The name of your secret
          key: password       # The key within the secret 
  snapshot:
    cron: "*/15 * * * *"
    persistentVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 750Mi

  topologySpreadConstraints: 
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: dragonfly
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                    - "dragonfly"
            topologyKey: kubernetes.io/hostname   

This configuration runs, and the state is good. However, digging a little deeper, I’m not sure what to make of the arguments I’ve passed.

 ~ kubectl get po dragonfly-sample-0 -oyaml | grep arg -A10                                                                                                                                                                   
  - args:
    - --alsologtostderr
    - --primary_port_http_enabled=false
    - --admin_port=9999
    - --admin_nopass
    - --primary_port_http_enabled=true
    - --requirepass=$(DFLY_requirepass)
    - --admin_port=9999
    - --dir=/dragonfly/snapshots
    - --snapshot_cron=*/15 * * * *

Some of these arguments are present even if I don’t add the spec.args section to my Dragonfly Kind manifest.

Further:

~ kubectl exec -it dragonfly-sample-0 -- /bin/sh

# ps ax
    PID TTY      STAT   TIME COMMAND
      1 ?        Ssl    0:06 dragonfly --logtostderr --alsologtostderr --primary_port_http_enabled=false --admin_port=9999 --admin_nopass --primary_port_http_enabled=true --requirepass=password --admin_port=9999 --dir=/dra
   1608 pts/0    Ss     0:00 /bin/sh
   1631 pts/0    R+     0:00 ps ax

This is all unexpected, especially seeing both primary_port_http_enabled=false AND primary_port_http_enabled=true.
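For what it’s worth, flag parsers in the gflags family usually apply a last-occurrence-wins rule (an assumption on my part for Dragonfly’s parser), in which case the duplicates would resolve like this:

```python
def effective_flags(argv):
    """Resolve duplicated --key=value flags, keeping the last occurrence."""
    flags = {}
    for arg in argv:
        if arg.startswith("--"):
            key, _, value = arg[2:].partition("=")
            flags[key] = value if value else "true"  # bare flag means enabled
    return flags

argv = [
    "--primary_port_http_enabled=false",  # operator default comes first
    "--admin_port=9999",
    "--admin_nopass",
    "--primary_port_http_enabled=true",   # user-supplied value appended later
    "--admin_port=9999",
]
print(effective_flags(argv)["primary_port_http_enabled"])  # → true
```

Under that rule, the operator’s defaults come first and the user-supplied args appended afterwards override them.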

Is there something I am doing wrong?

]]>
https://dragonfly.discourse.group/t/arguements-are-getting-passed-but-not-overriding/312#post_1 Thu, 21 Aug 2025 21:47:24 +0000 dragonfly.discourse.group-post-1242
DragonflyDB Active-Active (Multi-Master) Support in 2025? Hi!! Thank you so much for this precious information! I will keep searching and thinking about all of that. Thanks! Wishing you the best.

]]>
https://dragonfly.discourse.group/t/dragonflydb-active-active-multi-master-support-in-2025/304#post_5 Tue, 19 Aug 2025 22:27:06 +0000 dragonfly.discourse.group-post-1240
About the Off Topic category (Replace this first paragraph with a brief description of your new category. This guidance will appear in the category selection area, so try to keep it below 200 characters.)

Use the following paragraphs for a longer description, or to establish category guidelines or rules:

  • Why should people use this category? What is it for?

  • How exactly is this different than the other categories we already have?

  • What should topics in this category generally contain?

  • Do we need this category? Can we merge with another category, or subcategory?

]]>
https://dragonfly.discourse.group/t/about-the-off-topic-category/310#post_1 Mon, 18 Aug 2025 19:09:14 +0000 dragonfly.discourse.group-post-1239
DragonflyDB Active-Active (Multi-Master) Support in 2025? That’s not as easy as it seems. Take databases (or data stores) as an example.

Some databases rely on sophisticated strong consensus algorithms (like Raft or Paxos) to ensure strict consistency across nodes, even across regions. However, this comes at the cost of higher latency. These systems are typically on-disk databases that prioritize durability and throughput, where slightly higher latency is an acceptable tradeoff for scalability and performance.

On the other hand, some databases opt for eventual consistency, often using conflict-free replicated data types (CRDTs). These prioritize low latency and high throughput. If Dragonfly were to implement active-active replication, CRDTs could be a viable choice, but this would mean sacrificing strong immediate consistency. In such cases, a certain degree of stale reads might be acceptable.
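To make the CRDT idea concrete, here is a toy grow-only counter (G-Counter), one of the simplest CRDTs: each replica increments only its own slot, and merging takes the element-wise maximum, so all replicas converge no matter the order in which they exchange state. This is purely illustrative Python, not a suggestion of how Dragonfly would implement it:

```python
class GCounter:
    """Grow-only counter CRDT: per-node counts, merged by element-wise max."""
    def __init__(self, node):
        self.node = node
        self.counts = {}

    def increment(self, n=1):
        # A replica only ever bumps its own slot.
        self.counts[self.node] = self.counts.get(self.node, 0) + n

    def merge(self, other):
        # Merge is commutative, associative, and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())

# Two regions update independently, then exchange state.
us, eu = GCounter("us"), GCounter("eu")
us.increment(3)
eu.increment(2)
us.merge(eu)
eu.merge(us)
assert us.value() == eu.value() == 5  # replicas converge
```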

Alternatively, you can consider just region-splitting the application (amazon.com for the US and amazon.ca for Canada). If an application is not there yet (popularity, user base, etc.), I’d suggest not complicating things.

Ultimately, it all comes down to tradeoffs, whether at the database layer or the application layer. Hope that helps.

]]>
https://dragonfly.discourse.group/t/dragonflydb-active-active-multi-master-support-in-2025/304#post_4 Mon, 18 Aug 2025 15:49:27 +0000 dragonfly.discourse.group-post-1212
DragonflyDB Active-Active (Multi-Master) Support in 2025? Okay, thank you for the clear clarification!
I’m really looking forward to seeing what you can do regarding the active-active solution.

Since you’re an expert, I’d love to hear how you would implement an active-active approach in the current context. My challenge is at the level of a rate limiter: I want to use an active-active setup to share the limit across all my backend applications (for example, my backend is distributed across multiple regions, like Germany and the US).

Currently, not replicating a user’s quota creates a risk of token rate limiting evasion — for instance, a user could hit the limit in the Germany region and then immediately make additional requests in the US region, effectively bypassing the intended limits.
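To make the shared state concrete, here is a single-node sketch of such a quota (plain Python standing in for the data store; a fixed-window variant chosen only for brevity). The counter dictionary is exactly the state that would have to be shared across regions for the limit to hold globally:

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per user.
    In a multi-region setup, `counters` is the state each region would
    otherwise keep privately, which is what enables limit evasion."""
    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.counters = {}  # user -> (window_start, count)

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        start, count = self.counters.get(user, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # new window, reset the count
        if count >= self.limit:
            return False
        self.counters[user] = (start, count + 1)
        return True

limiter = FixedWindowLimiter(limit=3, window=60)
results = [limiter.allow("alice", now=0) for _ in range(4)]
print(results)  # fourth request in the same window is rejected
```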

As a junior dev, I’d appreciate your thoughts on the best way to handle this in an active-active configuration.

]]>
https://dragonfly.discourse.group/t/dragonflydb-active-active-multi-master-support-in-2025/304#post_3 Sun, 17 Aug 2025 22:50:57 +0000 dragonfly.discourse.group-post-1211
DragonflyDB Active-Active (Multi-Master) Support in 2025? Hi @BelroDe,

Dragonfly Swarm is our solution for horizontal scaling. It is a sharded topology with clustering. So, Dragonfly standalone scales vertically first. And if your workload is getting beyond 1TB, Dragonfly Swarm can be used to scale horizontally.

Here is an example Dragonfly Swarm topology. For example, replica-0 is there to provide high availability and optionally serve additional reads for primary-0. Data in primary-0 & replica-0 never appears in other shards. Please don’t confuse sharding with active-active, even though they both have multiple primary instances (masters).

On the other hand, in an active-active topology, a key-value pair may reside in multiple masters.
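A toy sketch of that difference: in a sharded topology, a key deterministically maps to exactly one primary (here via a simple CRC32 hash, chosen only for illustration; the real slot-mapping scheme may differ), whereas active-active would allow the same key to live on several primaries:

```python
import zlib

SHARDS = ["primary-0", "primary-1", "primary-2"]

def shard_for(key):
    """Deterministically map a key to exactly one shard (toy example)."""
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

# Every key has exactly one home shard; no key appears on two primaries.
owners = {k: shard_for(k) for k in ["user:1", "user:2", "session:9"]}
assert all(shard_for(k) == owners[k] for k in owners)  # mapping is stable
```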

As of 2025, Dragonfly Swarm multi-shard cluster is already available in Dragonfly Cloud. But we don’t have a solid plan for active-active yet.

]]>
https://dragonfly.discourse.group/t/dragonflydb-active-active-multi-master-support-in-2025/304#post_2 Sun, 17 Aug 2025 18:41:04 +0000 dragonfly.discourse.group-post-1209
DragonflyDB Active-Active (Multi-Master) Support in 2025? Hi everyone,

I’m trying to clarify the current state of active-active / multi-master support in DragonflyDB as of 2025. Information online seems a bit confusing, especially with the introduction of DragonflyDB Swarm.

Some sources suggest that Swarm is bringing clustering and replication improvements, but it’s not clear whether this includes true active-active (multi-master) capabilities, or if it’s still limited to primary-replica setups.

Does anyone know the exact status? Is active-active supported now, on the roadmap, or still not part of DragonflyDB’s design?

Thanks!

]]>
https://dragonfly.discourse.group/t/dragonflydb-active-active-multi-master-support-in-2025/304#post_1 Sun, 17 Aug 2025 18:20:55 +0000 dragonfly.discourse.group-post-1207
ASYNC flushes possible? Thanks for the feedback. I’ll update the documentation to reflect this behavior. And I’m glad the current behavior can provide what you need at least.

Going back to your original post, I’d like to clarify one point: when using FLUSHALL or FLUSHDB in Redis, the ASYNC option doesn’t necessarily “slow down” the deletion. Just that the SYNC option handles the entire deletion in the Redis main thread, which essentially blocks all other incoming requests during that period.

You’re right that Dragonfly behaves differently in this case, but I’d argue it’s actually an improvement, as newer versions of Redis are also leaning towards ASYNC as the default.
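To illustrate the sync-vs-async distinction in plain Python (a conceptual sketch, not Dragonfly internals): a synchronous flush blocks the caller for the entire deletion, while an asynchronous flush swaps in an empty dataset immediately and frees the old one off the serving thread:

```python
import threading

store = {f"key:{i}": i for i in range(100_000)}

def flush_sync(db):
    db.clear()  # caller blocks until every entry is gone

def flush_async(db_holder):
    # New empty dict becomes visible immediately; old data is freed off-thread.
    old, db_holder[0] = db_holder[0], {}
    threading.Thread(target=old.clear).start()

holder = [store]
flush_async(holder)
assert holder[0] == {}  # from the client's view, the flush is instant
```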

]]>
https://dragonfly.discourse.group/t/async-flushes-possible/297#post_5 Mon, 11 Aug 2025 15:23:33 +0000 dragonfly.discourse.group-post-1200
ASYNC flushes possible? Hi Joe,

thanks for your answer.

It would be good if you could include this in the documentation. It would also be helpful if, for compatibility reasons, not only FLUSHALL but also FLUSHDB accepted the parameter. In our case, this was the reason Dragonfly was not fully Redis-compatible.

]]>
https://dragonfly.discourse.group/t/async-flushes-possible/297#post_4 Wed, 06 Aug 2025 10:34:02 +0000 dragonfly.discourse.group-post-1199
ASYNC flushes possible? Hey @bergfreund, I confirmed with the team that FLUSHALL and FLUSHDB are actually by default async. They cannot be sync at the moment, and accepting the option is for compatibility reasons.

]]>
https://dragonfly.discourse.group/t/async-flushes-possible/297#post_3 Tue, 29 Jul 2025 15:47:33 +0000 dragonfly.discourse.group-post-1198
ASYNC flushes possible? Hey @bergfreund,

You’re right. At the moment, the ASYNC/SYNC option is accepted, but I don’t think it takes effect. Accepting the option is for compatibility reasons.

That said, I can check whether the operation actually happens in the background or not.

Thanks!

]]>
https://dragonfly.discourse.group/t/async-flushes-possible/297#post_2 Mon, 28 Jul 2025 17:28:10 +0000 dragonfly.discourse.group-post-1197
ASYNC flushes possible? We have previously emptied our databases in Redis using FLUSHDB ASYNC. Dragonfly does not accept parameters for FLUSHDB; however, FLUSHALL ASYNC is accepted.

The documentation does not mention this, however.

So my question is: Is there a way in Dragonfly to perform asynchronous flushes in order to slow down the deletion of keys and thus reduce the load on the DB?

]]>
https://dragonfly.discourse.group/t/async-flushes-possible/297#post_1 Fri, 25 Jul 2025 10:25:38 +0000 dragonfly.discourse.group-post-1196
Unable to restore from .dfs unless --force_epoll=true Hey @hgorni,

Please take a look at this GitHub issue.

There seems to be a bug in the kernel version you’re using. For Dragonfly Cloud, we run on 6.8.x versions. Is it possible to upgrade on your side?

This could be related to the maxmemory issue you have in the other thread as well.

]]>
https://dragonfly.discourse.group/t/unable-to-restore-from-dfs-unless-force-epoll-true/295#post_2 Thu, 17 Jul 2025 18:44:09 +0000 dragonfly.discourse.group-post-1194
Memory usage keeps increasing after reaching --maxmemory limit Hi @hgorni, Thanks for reaching out and for the details. I will ask our engineers to check this out.

]]>
https://dragonfly.discourse.group/t/memory-usage-keeps-increasing-after-reaching-maxmemory-limit/294#post_2 Thu, 17 Jul 2025 14:29:09 +0000 dragonfly.discourse.group-post-1193
Unable to restore from .dfs unless --force_epoll=true My Dragonfly instance is configured with --cache_mode and --maxmemory=160G , running on a GCP Compute Engine VM with 22 vCPUs and 176 GB of RAM (c3-highmem-22 , Intel Sapphire Rapids, x86_64, Debian 6.1.140-1).

I’ve noticed that while my .dfs snapshots are around 50 GB, the restore process works smoothly. However, once the snapshot size grows above ~150 GB, Dragonfly detects the file, starts loading, and then immediately gets stuck, typically before even reaching 1 GB of memory usage.

Initially, I suspected snapshot corruption and rebuilt the cache from scratch. But after using the SAVE command and restarting Dragonfly, the issue reappeared with the newly generated snapshot.

I’ve tried several approaches to work around this:

  • Reduced proactor threads to 6, then to 4
  • Changed disk type
  • Recreated the VM from scratch
  • Tuned various runtime parameters

The only configurations that successfully restore large snapshots (~150 GB+) are:

  • Setting --proactor_threads=1
  • Or enabling --force_epoll=true

Currently, I’m using --force_epoll, since restoring with a single thread is too slow. But I understand that Dragonfly prefers the default io_uring I/O engine for performance.

I’d appreciate any insight into why this might be happening and whether there’s a way to restore large .dfs snapshots using multiple threads without disabling io_uring.

For reference, this is how I start my Dragonfly instance:

docker run -d --name "app_cache" \
    --network host \
    --log-driver=gcplogs \
    --log-opt gcp-log-cmd=true \
    -v /data:/data \
    -m "172g" \
    docker.dragonflydb.io/dragonflydb/dragonfly \
    --port "6379" \
    --logtostderr \
    --force_epoll=true \
    --dir /data \
    --cache_mode \
    --maxmemory 160G

Thanks in advance!

]]>
https://dragonfly.discourse.group/t/unable-to-restore-from-dfs-unless-force-epoll-true/295#post_1 Mon, 14 Jul 2025 20:01:13 +0000 dragonfly.discourse.group-post-1192
Memory usage keeps increasing after reaching --maxmemory limit I recently migrated from Redis to Dragonfly in my production environment, setting up an instance with --cache_mode and --maxmemory=160G. Everything works as expected at first, but as memory usage approaches the limit, old keys are evicted, yet memory usage keeps increasing instead of stabilizing or decreasing.

Observed behavior:

  • After restarting the instance, the .dfs dump is restored correctly and memory usage drops to normal. But once new writes resume, memory usage starts growing again and eventually exceeds maxmemory.
  • Running MEMORY DEFRAGMENT manually has no visible effect - memory stats don’t change, even if I completely stop new writes.
  • Key eviction occurs while writes are active, but stops immediately once I pause writes, even though memory usage remains above the limit.
  • After hours of inactivity (no new keys), memory usage remains high and does not reduce.
  • I tried increasing --max_eviction_per_heartbeat to 200000 (from the default of 100), but that didn’t make a difference.

Here’s how I’m running my Dragonfly instance:

docker run -d --name "app_cache" \
    --network host \
    --log-driver=gcplogs \
    --log-opt gcp-log-cmd=true \
    -v /data:/data \
    -m "172g" \
    docker.dragonflydb.io/dragonflydb/dragonfly \
    --port "6379" \
    --logtostderr  \
    --dir /data \
    --cache_mode \
    --maxmemory 160G

And these are the instance’s stats:

127.0.0.1:6379> info all

# Server

redis_version:7.4.0
dragonfly_version:df-v1.31.1
redis_mode:standalone
arch_bits:64
os:Linux 6.1.0-37-cloud-amd64 x86_64
thread_count:22
multiplexing_api:epoll
tcp_port:6379
uptime_in_seconds:3993
uptime_in_days:0

# Clients

connected_clients:1
max_clients:64000
client_read_buffer_bytes:256
blocked_clients:0
pipeline_queue_length:0
send_delay_ms:0
timeout_disconnects:0

# Memory

used_memory:154617454176
used_memory_human:144.00GiB
used_memory_peak:154619189920
used_memory_peak_human:144.00GiB
fibers_stack_vms:10256368
fibers_count:157
used_memory_rss:176138014720
used_memory_rss_human:164.04GiB
used_memory_peak_rss:176138014720
maxmemory:171798691840
maxmemory_human:160.00GiB
used_memory_lua:0
object_used_memory:96439049648
type_used_memory_string:96439049648
table_used_memory:56645575768
prime_capacity:852799080
expire_capacity:850650360
num_entries:474603534
inline_keys:0
small_string_bytes:48701660592
pipeline_cache_bytes:0
dispatch_queue_bytes:0
dispatch_queue_subscriber_bytes:0
dispatch_queue_peak_bytes:0
client_read_buffer_peak_bytes:65792
tls_bytes:5664
snapshot_serialization_bytes:0
commands_squashing_replies_bytes:0
lsn_buffer_size_sum:0
lsn_buffer_bytes_sum:0
cache_mode:cache
maxmemory_policy:eviction
replication_streaming_buffer_bytes:0
replication_full_sync_buffer_bytes:0

# Stats

total_connections_received:426235
total_handshakes_started:0
total_handshakes_completed:0
total_commands_processed:411575
instantaneous_ops_per_sec:0
total_pipelined_commands:0
pipeline_throttle_total:0
pipelined_latency_usec:0
total_net_input_bytes:55554981821
connection_migrations:0
connection_recv_provided_calls:0
total_net_output_bytes:7043752
rdb_save_usec:0
rdb_save_count:0
big_value_preemptions:0
compressed_blobs:0
instantaneous_input_kbps:-1
instantaneous_output_kbps:-1
rejected_connections:-1
expired_keys:44717
evicted_keys:90086370
total_heartbeat_expired_keys:44567
total_heartbeat_expired_bytes:4755136
total_heartbeat_expired_calls:8634627
hard_evictions:0
garbage_checked:45269533
garbage_collected:150
bump_ups:0
stash_unloaded:4
oom_rejections:0
traverse_ttl_sec:17887
delete_ttl_sec:21
keyspace_hits:0
keyspace_misses:0
keyspace_mutations:564734621
total_reads_processed:1724481
total_writes_processed:429089
huffenc_attempt_total:0
huffenc_success_total:0
defrag_attempt_total:507013963
defrag_realloc_total:8012608
defrag_task_invocation_total:1417605
reply_count:429089
reply_latency_usec:0
blocked_on_interpreter:0
lua_interpreter_cnt:0
lua_blocked_total:0
lua_interpreter_return:0
lua_force_gc_calls:0
lua_gc_freed_memory_total:0
lua_gc_duration_total_sec:0

# Tiered

tiered_entries:0
tiered_entries_bytes:0
tiered_total_stashes:0
tiered_total_fetches:0
tiered_total_cancels:0
tiered_total_deletes:0
tiered_total_uploads:0
tiered_total_stash_overflows:0
tiered_heap_buf_allocations:0
tiered_registered_buf_allocations:0
tiered_allocated_bytes:0
tiered_capacity_bytes:0
tiered_pending_read_cnt:0
tiered_pending_stash_cnt:0
tiered_small_bins_cnt:0
tiered_small_bins_entries_cnt:0
tiered_small_bins_filling_bytes:0
tiered_cold_storage_bytes:0
tiered_offloading_steps:0
tiered_offloading_stashes:0
tiered_ram_hits:0
tiered_ram_cool_hits:0
tiered_ram_misses:0

# Persistence

current_snapshot_perc:0
current_save_keys_processed:0
current_save_keys_total:0
last_success_save:1752515648
last_saved_file:
last_success_save_duration_sec:0
loading:0
saving:0
current_save_duration_sec:0
rdb_changes_since_last_success_save:564734621
rdb_bgsave_in_progress:0
rdb_last_bgsave_status:ok
last_failed_save:0
last_error:
last_failed_save_duration_sec:0

# Transaction

tx_shard_polls:0
tx_shard_optimistic_total:409898
tx_shard_ooo_total:0
tx_global_total:0
tx_normal_total:409898
tx_inline_runs_total:18655
tx_schedule_cancel_total:0
tx_batch_scheduled_items_total:391243
tx_batch_schedule_calls_total:391243
tx_with_freq:409898,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
tx_queue_len:0
eval_io_coordination_total:0
eval_shardlocal_coordination_total:0
eval_squashed_flushes:0

# Replication

role:master
connected_slaves:0
master_replid:cacbb1d0e0d11212117a0782f6810a744fcccf03

# Commandstats

cmdstat_command:calls=2,usec=12673,usec_per_call=6336.5
cmdstat_info:calls=1549,usec=1680434,usec_per_call=1084.85
cmdstat_memory:calls=3,usec=1196,usec_per_call=398.667
cmdstat_ping:calls=122,usec=3259,usec_per_call=26.7131
cmdstat_set:calls=409898,usec=43167388,usec_per_call=105.313

# Modules

module:name=ReJSON,ver=20000,api=1,filters=0,usedby=[search],using=,options=[handle-io-errors]
module:name=search,ver=20000,api=1,filters=0,usedby=,using=[ReJSON],options=[handle-io-errors]

# Search

search_memory:0
search_num_indices:0
search_num_entries:0

# Errorstats

COMMAND DOCS Not Implemented:1
-LOADING Dragonfly is loading the dataset in memory:14665
syntax_error:2

# Keyspace

db0:keys=474603534,expires=474187297,avg_ttl=-1

# Cpu

used_cpu_sys:405.383363
used_cpu_user:1968.673714
used_cpu_sys_children:0.1299
used_cpu_user_children:0.1633
used_cpu_sys_main_thread:16.125300
used_cpu_user_main_thread:88.914462

# Cluster

cluster_enabled:0
migration_errors_total:0
total_migrated_keys:0

And the result of memory malloc-stats:

127.0.0.1:6379> memory malloc-stats
___ Begin malloc stats ___
arena: 14913536, ordblks: 56, smblks: 0
hblks: 0, hblkhd: 0, usmblks: 0
fsmblks: 0, uordblks: 10634416, fordblks: 4279120, keepcost: 197632
___ End malloc stats ___

___ Begin mimalloc stats ___
heap stats:     peak       total       freed     current        unit       count   
  reserved:   168.0 GiB   168.0 GiB     0         168.0 GiB                          
 committed:   163.3 GiB   168.0 GiB     4.8 GiB   163.1 GiB                          
     reset:     0      
    purged:   337.5 MiB
   touched:   947.7 KiB    15.1 MiB   163.6 MiB  -148.5 MiB                          ok
  segments:   241         243           2         241                                not all freed
-abandoned:     0           0           0           0                                ok
   -cached:     0           0           0           0                                ok
     pages:     0           0         693        -693                                ok
-abandoned:     0           0           0           0                                ok
 -extended:     0      
 -noretire:     0      
    arenas:    35      
-crossover:     0      
 -rollback:     0      
     mmaps:     0      
   commits:     0      
    resets:     0      
    purges:   232      
   threads:    44          44           0          44                                not all freed
  searches:     0.0 avg
numa nodes:     1
   elapsed:  5828.089 s
   process: user: 2030.446 s, system: 417.763 s, faults: 17, rss: 164.0 GiB, commit: 163.3 GiB
___ End mimalloc stats ___

Any help or insights on what I might be doing wrong in my configuration or how to sort this out are much appreciated.

]]>
https://dragonfly.discourse.group/t/memory-usage-keeps-increasing-after-reaching-maxmemory-limit/294#post_1 Mon, 14 Jul 2025 19:42:17 +0000 dragonfly.discourse.group-post-1191