Docker swarm resetting TCP connection every 2 hours #34867

@wjans

Description

In my Docker swarm setup, TCP connections between containers are reset every 2 hours. The issue does not occur when I run the same containers in a Docker Compose setup.

I'm running the following containers:

  • 1 Redis instance on one worker
  • 3 Redis Sentinel instances (one on each of the two workers and one on the manager), all monitoring the Redis instance
  • A Spring Boot client on one worker that connects to the Sentinels and retrieves values from Redis every 5 seconds

Steps to reproduce the issue:

  1. Start the Docker swarm
  2. Leave the services running for 2 hours
  3. A connection reset exception appears in the Spring Boot client logs
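
To take the Spring Boot client and Jedis out of the picture, the steps above could also be run with a bare TCP probe. The following is only a minimal sketch, not part of the original setup: the class `IdleConnectionProbe` and its `probe` method are hypothetical names, and the default host and port are simply the Sentinel address from the log below. It opens a connection, sends nothing, and reports how long the connection lived and how it ended. `SO_KEEPALIVE` is enabled as an assumption about what a typical client does; the Linux default keep-alive idle time is 7200 seconds, which happens to match the 2-hour interval, although that may be a coincidence.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;
import java.time.Instant;

public class IdleConnectionProbe {

    /** Outcome of one probe: how long the connection lasted and how it ended. */
    public static final class Result {
        public final Duration lifetime;
        public final String ending; // "closed by peer" or the exception message

        Result(Duration lifetime, String ending) {
            this.lifetime = lifetime;
            this.ending = ending;
        }
    }

    /** Connects, stays idle, and blocks until the connection is closed or reset. */
    public static Result probe(String host, int port) throws IOException {
        try (Socket socket = new Socket()) {
            socket.setKeepAlive(true); // assumption: mimic a client that enables SO_KEEPALIVE
            socket.connect(new InetSocketAddress(host, port), 5000);
            Instant start = Instant.now();
            String ending;
            try {
                InputStream in = socket.getInputStream();
                byte[] buffer = new byte[1024];
                // Discard anything the peer sends; read() returns -1 on an orderly close.
                while (in.read(buffer) != -1) {
                    // nothing to do, we only care about when the stream ends
                }
                ending = "closed by peer";
            } catch (IOException e) {
                // A TCP reset surfaces here, typically as "Connection reset".
                ending = e.getMessage();
            }
            return new Result(Duration.between(start, Instant.now()), ending);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults are the Sentinel address from the log in this report.
        String host = args.length > 0 ? args[0] : "192.168.122.141";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 26380;
        Result result = probe(host, port);
        System.out.println("Connection ended after " + result.lifetime + ": " + result.ending);
    }
}
```

Running the probe once from inside a swarm container and once from a Docker Compose container would show whether the reset depends on the swarm networking path rather than on the client library.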

Describe the results you received:

2017-09-15 13:08:45.871 ERROR 5 --- [.122.141:26380]] redis.clients.jedis.JedisSentinelPool    : Lost connection to Sentinel at 192.168.122.141:26380. Sleeping 5000ms and retrying.

redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketException: Connection reset
	at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:202) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.Protocol.process(Protocol.java:151) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.Protocol.read(Protocol.java:215) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.Connection.getRawObjectMultiBulkReply(Connection.java:285) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.JedisPubSub.process(JedisPubSub.java:121) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.JedisPubSub.proceed(JedisPubSub.java:115) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.Jedis.subscribe(Jedis.java:2680) ~[jedis-2.9.0.jar!/:na]
	at redis.clients.jedis.JedisSentinelPool$MasterListener.run(JedisSentinelPool.java:291) ~[jedis-2.9.0.jar!/:na]
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:210) ~[na:1.8.0_144]
	at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[na:1.8.0_144]
	at java.net.SocketInputStream.read(SocketInputStream.java:127) ~[na:1.8.0_144]
	at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196) ~[jedis-2.9.0.jar!/:na]
	... 9 common frames omitted
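
To confirm the 2-hour period from the client logs rather than by eye, the gaps between successive "Lost connection to Sentinel" lines can be computed. This is a rough sketch under assumptions: `ResetIntervals` is a hypothetical helper, and it assumes every relevant log line starts with a timestamp in the `yyyy-MM-dd HH:mm:ss.SSS` form shown above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class ResetIntervals {

    // Timestamp layout at the start of each Spring Boot log line (23 characters).
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
    private static final int TIMESTAMP_LENGTH = 23;

    /**
     * Returns the time between consecutive "Lost connection" events
     * for one Sentinel address, e.g. "192.168.122.141:26380".
     */
    public static List<Duration> intervals(List<String> logLines, String sentinel) {
        String marker = "Lost connection to Sentinel at " + sentinel;
        List<Duration> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : logLines) {
            if (line.length() < TIMESTAMP_LENGTH || !line.contains(marker)) {
                continue; // stack trace lines and unrelated messages are skipped
            }
            LocalDateTime current =
                    LocalDateTime.parse(line.substring(0, TIMESTAMP_LENGTH), TIMESTAMP);
            if (previous != null) {
                gaps.add(Duration.between(previous, current));
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ResetIntervals <log file> <sentinel host:port>
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Duration gap : intervals(lines, args[1])) {
            System.out.println(gap);
        }
    }
}
```

If the gaps cluster tightly around two hours for every Sentinel, that points to a fixed timer somewhere in the network path rather than to load or a sporadic failure.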

Analysis of a tcpdump capture shows the connection being reset by the Sentinel Docker instance:
tcpdump.txt

Describe the results you expected:

No TCP connection reset

Additional information you deem important (e.g. issue happens only occasionally):
Docker Swarm configuration:

version: '3'

# a   192.168.122.32
# b   192.168.122.141
# mgr 192.168.122.162

services:
  redis:
    image: redis
    command: redis-server --bind 0.0.0.0 --slave-announce-ip 192.168.122.32 --slave-announce-port 6379
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-h", "127.0.0.1", "ping"]
      interval: 5s
      timeout: 10s
      retries: 300
    user: root
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-a]

  sentinel_1:
    image: joshula/redis-sentinel
    command:
      - --port 26379
      - --sentinel announce-ip 192.168.122.32
      - --sentinel announce-port 26379
      - --sentinel monitor master 192.168.122.32 6379 1
    expose:
      - 26379
    ports:
      - 26379:26379
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-p", "26379", "ping"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-a]

  sentinel_2:
    image: joshula/redis-sentinel
    command:
      - --port 26380
      - --sentinel announce-ip 192.168.122.141
      - --sentinel announce-port 26380
      - --sentinel monitor master 192.168.122.32 6379 1
    expose:
      - 26380
    ports:
      - 26380:26380
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-p", "26380", "ping"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-b]

  sentinel_3:
    image: joshula/redis-sentinel
    command:
      - --port 26381
      - --sentinel announce-ip 192.168.122.162
      - --sentinel announce-port 26381
      - --sentinel monitor master 192.168.122.32 6379 1
    expose:
      - 26381
    ports:
      - 26381:26381
    depends_on:
      - redis
    healthcheck:
      test: ["CMD", "/usr/local/bin/redis-cli", "-p", "26381", "ping"]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == controller]


  ixortalk-cache-client:
    image: ixortalk/ixortalk.cache.client
    environment:
      - JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dspring.redis.sentinel.nodes=192.168.122.32:26379,192.168.122.141:26380,192.168.122.162:26381
    expose:
      - 5005
      - 8200
    ports:
      - 5005:5005
      - 8200:8200
    depends_on:
      - sentinel_1
      - sentinel_2
      - sentinel_3
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.hostname == worker-a]

Output of docker version:

Client:
 Version:      17.03.0-ce
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 08:10:07 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.0-ce
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   3a232c8
 Built:        Tue Feb 28 08:10:07 2017
 OS/Arch:      linux/amd64
 Experimental: false

Output of docker info:

[root@tpsplus-a ~]# docker info
Containers: 4
 Running: 3
 Paused: 0
 Stopped: 1
Images: 20
Server Version: 17.03.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: mpd7188kvciah3gtgbxof89b7
 Is Manager: false
 Node Address: 192.168.122.32
 Manager Addresses:
  192.168.122.162:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 977c511eda0925a723debdc94d09459af49d082a
runc version: a01dafd48bc1c7cc12bdb01206f9fea7dd6feb70
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 11.58 GiB
Name: tpsplus-a
ID: ETPN:6AWH:3WIG:V62L:EADH:3VTX:TPWN:YIQJ:EUIN:ZVVD:2IFT:4GBI
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: davydewaele
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
The Docker swarm consists of 2 workers and a manager, each running in a VM on one physical server.
