Description
I'm having an issue with my Docker Swarm setup: connections between containers are reset every 2 hours. I don't see this issue when I run the same containers in a Docker Compose setup.
I'm running the following containers:
- 1 Redis instance on one worker
- 3 Redis Sentinel instances monitoring the Redis instance, one on each of the two workers and one on the manager
- A Spring Boot client on one worker that connects to the Sentinels and retrieves values from Redis every 5 seconds
Steps to reproduce the issue:
- Start docker swarm
- Leave the services running for 2 hours
- A connection reset exception appears in the Spring Boot client logs
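To narrow down the steps above without Spring Boot or Jedis in the picture, a plain-socket probe along these lines might help. It is only a sketch under assumptions: the class and method names are made up for illustration, the default address is the Sentinel from the log below, and `+switch-master` is, as far as I know, the channel that the `JedisSentinelPool` listener subscribes to. It opens one connection, subscribes, then sits idle and reports how long the connection survived.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the reported setup.
public class IdleConnectionProbe {

    // Builds a Redis inline command that subscribes to a single channel.
    static String subscribeCommand(String channel) {
        return "SUBSCRIBE " + channel + "\r\n";
    }

    // Formats a duration as hours, minutes and seconds for log output.
    static String formatElapsed(Duration d) {
        long s = d.getSeconds();
        return (s / 3600) + "h " + ((s % 3600) / 60) + "m " + (s % 60) + "s";
    }

    // Connects, subscribes, then blocks on read until the peer closes or
    // resets the connection. Returns how long the connection stayed open.
    // Failures while connecting are propagated to the caller.
    static Duration holdUntilClosed(String host, int port, String channel) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            Instant start = Instant.now();
            try {
                OutputStream out = socket.getOutputStream();
                out.write(subscribeCommand(channel).getBytes(StandardCharsets.US_ASCII));
                out.flush();

                InputStream in = socket.getInputStream();
                byte[] buffer = new byte[4096];
                while (in.read(buffer) != -1) {
                    // Discard whatever the server sends; only the lifetime matters here.
                }
                System.out.println("Peer closed the connection (EOF)");
            } catch (SocketException e) {
                // A TCP reset surfaces here, e.g. "Connection reset".
                System.out.println("Socket error: " + e.getMessage());
            }
            return Duration.between(start, Instant.now());
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults taken from the log in this report; override via arguments.
        String host = args.length > 0 ? args[0] : "192.168.122.141";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 26380;
        Duration alive = holdUntilClosed(host, port, "+switch-master");
        System.out.println("Connection lasted " + formatElapsed(alive));
    }
}
```

Running it once inside the swarm and once under Docker Compose, against the same Sentinel, should show whether the idle connection is cut after roughly the same interval in both environments.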
Describe the results you received:
2017-09-15 13:08:45.871 ERROR 5 --- [.122.141:26380]] redis.clients.jedis.JedisSentinelPool : Lost connection to Sentinel at 192.168.122.141:26380. Sleeping 5000ms and retrying.
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Connection reset
at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:202) ~[jedis-2.9.0.jar!/:na]
at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.Protocol.process(Protocol.java:151) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.Protocol.read(Protocol.java:215) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.Connection.getRawObjectMultiBulkReply(Connection.java:285) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.JedisPubSub.process(JedisPubSub.java:121) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.JedisPubSub.proceed(JedisPubSub.java:115) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.Jedis.subscribe(Jedis.java:2680) ~[jedis-2.9.0.jar!/:na]
at redis.clients.jedis.JedisSentinelPool$MasterListener.run(JedisSentinelPool.java:291) ~[jedis-2.9.0.jar!/:na]
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210) ~[na:1.8.0_144]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_144]
at java.net.SocketInputStream.read(SocketInputStream.java:127) ~[na:1.8.0_144]
at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196) ~[jedis-2.9.0.jar!/:na]
... 9 common frames omitted
Analysis of the tcpdump capture shows the connection being reset by the Sentinel Docker container:
tcpdump.txt
Describe the results you expected:
No TCP connection reset
Additional information you deem important (e.g. issue happens only occasionally):
Docker Swarm configuration:
version: '3'
# a 192.168.122.32
# b 192.168.122.141
# mgr 192.168.122.162
services:
  redis:
    image: redis
    command: redis-server --bind 0.0.0.0 --slave-announce-ip 192.168.122.32 --slave-announce-port 6379
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-h", "127.0.0.1", "ping"]
      interval: 5s
      timeout: 10s
      retries: 300
    user: root
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-a]
  sentinel_1:
    image: joshula/redis-sentinel
    command:
      - --port 26379
      - --sentinel announce-ip 192.168.122.32
      - --sentinel announce-port 26379
      - --sentinel monitor master 192.168.122.32 6379 1
    expose:
      - 26379
    ports:
      - 26379:26379
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-p", "26379", "ping"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-a]
  sentinel_2:
    image: joshula/redis-sentinel
    command:
      - --port 26380
      - --sentinel announce-ip 192.168.122.141
      - --sentinel announce-port 26380
      - --sentinel monitor master 192.168.122.32 6379 1
    expose:
      - 26380
    ports:
      - 26380:26380
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-p", "26380", "ping"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-b]
  sentinel_3:
    image: joshula/redis-sentinel
    command:
      - --port 26381
      - --sentinel announce-ip 192.168.122.162
      - --sentinel announce-port 26381
      - --sentinel monitor master 192.168.122.32 6379 1
    expose:
      - 26381
    ports:
      - 26381:26381
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-p", "26381", "ping"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == controller]
  ixortalk-cache-client:
    image: ixortalk/ixortalk.cache.client
    environment:
      - JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dspring.redis.sentinel.nodes=192.168.122.32:26379,192.168.122.141:26380,192.168.122.162:26381
    expose:
      - 5005
      - 8200
    ports:
      - 5005:5005
      - 8200:8200
    depends_on:
      - sentinel_1
      - sentinel_2
      - sentinel_3
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-a]
Output of docker version:
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 3a232c8
Built: Tue Feb 28 08:10:07 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
[root@tpsplus-a ~]# docker info
Containers: 4
Running: 3
Paused: 0
Stopped: 1
Images: 20
Server Version: 17.03.0-ce
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: active
NodeID: mpd7188kvciah3gtgbxof89b7
Is Manager: false
Node Address: 192.168.122.32
Manager Addresses:
192.168.122.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-514.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 11.58 GiB
Name: tpsplus-a
ID: ETPN:6AWH:3WIG:V62L:EADH:3VTX:TPWN:YIQJ:EUIN:ZVVD:2IFT:4GBI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: davydewaele
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
Docker swarm running 2 workers and a manager in VMs running on one physical server.