
gracefulClose stops servers due to a lot of TCP states #438

@larskuhtz


We run a p2p network with Haskell nodes using network + tls + warp for the server and network + tls + http-client for the client components.

We observed that nodes using network version < 3.1.1.0 have been running without issues for weeks, while nodes using network >= 3.1.1.0 stop making and serving requests after running for a few days.

Bad nodes don't accept any incoming connections and fail to establish outgoing connections.

On the bad nodes there is no increase in memory consumption and CPU usage is low, since they are not doing anything useful without being able to make network connections. The number of open file descriptors is moderate, but many of the TCP sockets are in a CLOSE_WAIT state. Most of those sockets are not listed by lsof, but are only shown by netstat without an associated process.
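To track how the number of CLOSE_WAIT sockets grows over time, the kernel's socket table can be tallied directly instead of reading netstat output by hand. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` is available (IPv4 only; `/proc/net/tcp6` has the same layout). The names `stateName` and `countStates` are made up for this illustration and are not part of network, tls, or warp.

```haskell
module Main (main) where

import Data.List (group, sort)

-- | Translate the hexadecimal state code found in the "st" column of
-- /proc/net/tcp into the name that netstat prints.
stateName :: String -> String
stateName code = case code of
  "01" -> "ESTABLISHED"
  "02" -> "SYN_SENT"
  "03" -> "SYN_RECV"
  "04" -> "FIN_WAIT1"
  "05" -> "FIN_WAIT2"
  "06" -> "TIME_WAIT"
  "07" -> "CLOSE"
  "08" -> "CLOSE_WAIT"
  "09" -> "LAST_ACK"
  "0A" -> "LISTEN"
  "0B" -> "CLOSING"
  other -> "UNKNOWN(" ++ other ++ ")"

-- | Count sockets per TCP state, given the full text of /proc/net/tcp.
-- The first line is a column header and is skipped; the state code is
-- the fourth whitespace-separated field of every remaining line.
countStates :: String -> [(String, Int)]
countStates contents =
  [ (stateName c, length cs) | cs@(c : _) <- group (sort codes) ]
  where
    codes =
      [ st | (_ : _ : _ : st : _) <- map words (drop 1 (lines contents)) ]

main :: IO ()
main = do
  -- Assumption: Linux procfs; this path does not exist on other systems.
  contents <- readFile "/proc/net/tcp"
  mapM_ (\(name, n) -> putStrLn (name ++ " " ++ show n)) (countStates contents)
```

Running this periodically on a bad node and on a good node would show whether the CLOSE_WAIT count increases steadily before the node stops serving requests. Because it reads the system-wide table, it also counts sockets that lsof no longer attributes to a process.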

The following are two typical TCP sessions:

No.	Time	Source	Destination	Protocol	Length	Info
28	2.100345	172.31.20.24	47.108.144.139	TCP	66	41372 → 9443 [FIN, ACK] Seq=1 Ack=1 Win=211 Len=0 TSval=2546593558 TSecr=1766578430
30	2.373855	47.108.144.139	172.31.20.24	TCP	198	9443 → 41372 [PSH, ACK] Seq=1 Ack=2 Win=227 Len=132 TSval=1766583433 TSecr=2546593558 [TCP segment of a reassembled PDU]
31	2.373887	172.31.20.24	47.108.144.139	TCP	54	41372 → 9443 [RST] Seq=2 Win=0 Len=0
32	2.373894	47.108.144.139	172.31.20.24	TCP	98	9443 → 41372 [PSH, ACK] Seq=133 Ack=2 Win=227 Len=32 TSval=1766583433 TSecr=2546593558 [TCP segment of a reassembled PDU]
33	2.373897	172.31.20.24	47.108.144.139	TCP	54	41372 → 9443 [RST] Seq=2 Win=0 Len=0
34	2.373899	47.108.144.139	172.31.20.24	TCP	84	9443 → 41372 [PSH, ACK] Seq=165 Ack=2 Win=227 Len=18 TSval=1766583433 TSecr=2546593558 [TCP segment of a reassembled PDU]
35	2.373901	172.31.20.24	47.108.144.139	TCP	54	41372 → 9443 [RST] Seq=2 Win=0 Len=0
36	2.373902	47.108.144.139	172.31.20.24	HTTP	66	HTTP/1.1 426 Upgrade Required          (text/plain)
37	2.373904	172.31.20.24	47.108.144.139	TCP	54	41372 → 9443 [RST] Seq=2 Win=0 Len=0
No.	Time	Source	Destination	Protocol	Length	Info
161	7.803316	47.245.52.190	172.31.20.24	TCP	74	43686 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2824405820 TSecr=0 WS=128
220	8.818459	47.245.52.190	172.31.20.24	TCP	74	[TCP Retransmission] 43686 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2824406835 TSecr=0 WS=128
244	10.834435	47.245.52.190	172.31.20.24	TCP	74	[TCP Retransmission] 43686 → 443 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2824408851 TSecr=0 WS=128

TCP sessions carrying HTTP traffic from other processes seem fine.
