RFC – Timeout Handling and Retry in LLM API Calls

Background

We recently got this error in our logs:

LLM API call failed: 504 Server Error: Gateway Time-out for url: https://api.xyz.com/chat/completions

This happened in our summarization service, the microservice responsible for summarizing transcripts.

We found additional logs related to the error:

Got transcript text from database for transcript with ID: xyz.
Calling LLM API with 194 seconds timeout...

The transcript with ID xyz was a large piece of text, and the LLM API call took longer than the 194-second timeout, so the call failed.
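
A fixed timeout is unlikely to fit transcripts of very different lengths. As a minimal sketch, assuming response time grows roughly linearly with input size, the timeout could be derived from the transcript length. The function name and every constant below are hypothetical placeholders, not values measured from the service.

```python
def estimate_timeout_seconds(
    text: str,
    *,
    base_s: float = 30.0,                 # assumed fixed overhead per call
    chars_per_token: float = 4.0,         # rough heuristic, not a tokenizer
    seconds_per_1k_tokens: float = 20.0,  # assumed processing rate
    min_s: float = 30.0,
    max_s: float = 600.0,
) -> float:
    """Estimate a request timeout from the transcript length."""
    estimated_tokens = len(text) / chars_per_token
    timeout = base_s + (estimated_tokens / 1000.0) * seconds_per_1k_tokens
    # Clamp so tiny inputs still get a sane floor and huge inputs
    # cannot block a worker indefinitely.
    return max(min_s, min(timeout, max_s))
```

In practice the rate and the bounds would be calibrated against observed latencies in the logs rather than guessed.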

Timing out and giving up on the transcript is not ideal: the expensive GPU time already spent on the failed call is wasted, and the user doesn't get a summary.

We should handle this case more gracefully.

Improve the code by adding a retry mechanism using tenacity and a better way of estimating the timeout, in seconds, for each LLM API call.

1. Clone the repository (if you fork it, other applicants will see your code).
2. Create a new branch for your changes.
3. Make the necessary changes to the codebase in that branch.
4. Commit your changes with a clear message and create a well-documented pull request (describing your changes and reasoning) in your clone.
5. Send us the link to your pull request.

Files in the repository:
.env.dev
.gitignore
README.md
requirements.txt
summarization_service.py