Connection Pooling Deadlocks #6029
Description
We are experiencing application lock-ups, which appear to be caused by thread pool exhaustion.
Analysing the memory dump shows approximately 200 tasks waiting for a database connection, whilst the other ~20 tasks are writing to the error logs:

Within the last 6 hours, we migrated the database from a separate host onto the same host as the application, so that both now run within Docker Swarm. This was an attempt to rule out networking issues, but to no avail.
Traces:
[Error] An exception occurred while iterating over the results of a query for context type 'Rust.Domain.Infrastructure.Uow.RustDbContext'.
System.InvalidOperationException: An exception has been raised that is likely due to a transient failure.
---> Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
---> System.TimeoutException: Timeout during reading attempt
at Npgsql.Internal.NpgsqlReadBuffer.<Ensure>g__EnsureLong|55_0(NpgsqlReadBuffer buffer, Int32 count, Boolean async, Boolean readingNotifications)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.AuthenticateSASL(List`1 mechanisms, String username, Boolean async, CancellationToken cancellationToken)
at Npgsql.Internal.NpgsqlConnector.Authenticate(String username, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.Internal.NpgsqlConnector.<Open>g__OpenCore|214_1(NpgsqlConnector conn, SslMode sslMode, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.Internal.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.PoolingDataSource.OpenNewConnector(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.PoolingDataSource.<Get>g__RentAsync|33_0(NpgsqlConnection conn, NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlConnection.<Open>g__OpenAsync|42_0(Boolean async, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
System.InvalidOperationException: An exception has been raised that is likely due to a transient failure.
at async Task<TResult> Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync<TState, TResult>(TState state, Func<DbContext, TState, CancellationToken, Task<TResult>> operation, Func<DbContext, TState, CancellationToken, Task<ExecutionResult<TResult>>> verifySucceeded, CancellationToken cancellationToken)
at async ValueTask<bool> Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable<T>+AsyncEnumerator.MoveNextAsync() ---> Npgsql.NpgsqlException: Exception while reading from stream
at async void Npgsql.Internal.NpgsqlReadBuffer.Ensure(int count)+EnsureLong(?)
at async ValueTask<IBackendMessage> Npgsql.Internal.NpgsqlConnector.ReadMessageLong(bool async, DataRowLoadingMode dataRowLoadingMode, bool readingNotifications, bool isReadingPrependedMessage)
at async Task Npgsql.Internal.NpgsqlConnector.AuthenticateSASL(List<string> mechanisms, string username, bool async, CancellationToken cancellationToken)
at async Task Npgsql.Internal.NpgsqlConnector.Authenticate(string username, NpgsqlTimeout timeout, bool async, CancellationToken cancellationToken)
at async Task Npgsql.Internal.NpgsqlConnector.Open(NpgsqlTimeout timeout, bool async, CancellationToken cancellationToken)+OpenCore(?)
at async Task Npgsql.Internal.NpgsqlConnector.Open(NpgsqlTimeout timeout, bool async, CancellationToken cancellationToken)
at async ValueTask<NpgsqlConnector> Npgsql.PoolingDataSource.OpenNewConnector(NpgsqlConnection conn, NpgsqlTimeout timeout, bool async, CancellationToken cancellationToken)
at async ValueTask<NpgsqlConnector> Npgsql.PoolingDataSource.Get(NpgsqlConnection conn, NpgsqlTimeout timeout, bool async, CancellationToken cancellationToken)+RentAsync(?)
at async void Npgsql.NpgsqlConnection.Open()+OpenAsync(?)
at async Task Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(bool errorsExpected, CancellationToken cancellationToken) x 2
at async Task<bool> Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, bool errorsExpected)
at async Task<RelationalDataReader> Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at async Task<bool> Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable<T>+AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
at async Task<TResult> Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync<TState, TResult>(TState state, Func<DbContext, TState, CancellationToken, Task<TResult>> operation, Func<DbContext, TState, CancellationToken, Task<ExecutionResult<TResult>>> verifySucceeded, CancellationToken cancellationToken) ---> System.TimeoutException: Timeout during reading attempt
--- End of inner exception stack trace ---
--- End of inner exception stack trace ---
These are the thread pool metrics, sampled once every 5 seconds, leading up to and including the event, until the application was manually restarted at 04:44 am:

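For anyone trying to reproduce this, thread pool samples like the ones described above can be gathered with something along the lines of the sketch below. This is illustrative only and is not the collector used for this report: `ThreadPoolSampler`, `Snapshot` and `Start` are hypothetical names, and the output format is an assumption. It samples from a dedicated thread rather than a pool thread, so sampling should keep running even while the pool itself is starved.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration; not part of Npgsql or of the application in this report.
public static class ThreadPoolSampler
{
    // Formats one sample of the current thread pool state.
    public static string Snapshot()
    {
        ThreadPool.GetAvailableThreads(out int availableWorkers, out int availableIo);
        ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);

        return $"{DateTime.UtcNow:O} threads={ThreadPool.ThreadCount} " +
               $"queued={ThreadPool.PendingWorkItemCount} " +
               $"busyWorkers={maxWorkers - availableWorkers} " +
               $"busyIo={maxIo - availableIo}";
    }

    // Starts a dedicated background thread that prints one sample per interval
    // until the token is cancelled.
    public static Thread Start(TimeSpan interval, CancellationToken ct)
    {
        var thread = new Thread(() =>
        {
            // WaitOne returns false on timeout (take a sample) and true once cancelled (stop).
            while (!ct.WaitHandle.WaitOne(interval))
            {
                Console.WriteLine(Snapshot());
            }
        })
        {
            IsBackground = true,
            Name = "threadpool-sampler"
        };

        thread.Start();
        return thread;
    }
}
```

Calling `ThreadPoolSampler.Start(TimeSpan.FromSeconds(5), cancellationToken)` at startup would match the 5-second sampling interval mentioned above.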
Connection String
NpgsqlConnectionStringBuilder builder = new()
{
    Database = configuration.Database.Database,
    Pooling = true,
    MaxPoolSize = configuration.Database.MaxPoolSize, // 2000
    MinPoolSize = configuration.Database.MinPoolSize, // 1000
    Password = configuration.Database.Password,
    Port = configuration.Database.Port,
    Host = configuration.Database.Host,
    Username = configuration.Database.Username,
    Timeout = 15,
    KeepAlive = 30,
    ApplicationName = nameof(Program),
    CommandTimeout = 120
};
Database configuration:
Other than some thread and memory adjustments, and explicitly disabling SSL, everything is left at its default.
Packages:
Npgsql 9.0.2
Npgsql.EntityFrameworkCore.PostgreSQL 9.0.2
Environment:
Docker Swarm
.NET 9.0.2
Unix 6.8.0.51 x64