Merged
This is a rollup MR for a couple of bugs I've observed around timeouts while uploading large files to underpowered VMs.
The primary bug involved hitting nginx read/write timeouts when an upload's network traffic stalls: the actual content has finished uploading, but the client is still waiting for the server to respond and finish the HTTP request. This happened because we chunk large uploads into tiny files in minio, then merge them back together into one when the upload is finished. The merge consumes a fair bit of I/O and could take longer than nginx's default timeouts.
This also exposed a weak spot in the codebase around upload failures and making sure that all related minio objects were being deleted. In particular, only completed chunks were being added to the deletion queue should the request fail. If a chunk was in progress when the upload failed, that partial chunk was never added to the queue and so never deleted, leaving behind many <16 MB chunks.
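To illustrate the cleanup ordering described above, here is a minimal sketch, not the code from this MR. It assumes a hypothetical `ObjectStore` interface and an in-memory `MemoryStore` standing in for minio; `uploadChunks` and the key naming are also made up for the example. The only point it shows is that a chunk's key is queued for deletion before its write starts, so a chunk that fails mid-write is still cleaned up.

```typescript
// Hypothetical minimal object-store interface (NOT the real minio client API).
interface ObjectStore {
  put(key: string, data: Uint8Array): Promise<void>;
  remove(key: string): Promise<void>;
}

// In-memory stand-in for minio, used only to make the sketch runnable.
class MemoryStore implements ObjectStore {
  readonly objects = new Map<string, Uint8Array>();
  private puts = 0;
  private readonly failOnPut: number;

  // failOnPut: zero-based index of the put() call that should fail (-1 = never).
  constructor(failOnPut: number = -1) {
    this.failOnPut = failOnPut;
  }

  async put(key: string, data: Uint8Array): Promise<void> {
    // Store first, then fail, to mimic a partial chunk being left behind.
    this.objects.set(key, data);
    if (this.puts++ === this.failOnPut) {
      throw new Error("upload interrupted");
    }
  }

  async remove(key: string): Promise<void> {
    this.objects.delete(key);
  }
}

// Writes each chunk under `${uploadId}/chunk-N` and returns the keys written.
// On any failure, every queued key is deleted, including the in-progress one.
async function uploadChunks(
  store: ObjectStore,
  uploadId: string,
  chunks: Uint8Array[],
): Promise<string[]> {
  const deletionQueue: string[] = [];
  try {
    for (const [i, chunk] of chunks.entries()) {
      const key = `${uploadId}/chunk-${i}`;
      // Queue the key BEFORE the write begins, not after it completes.
      deletionQueue.push(key);
      await store.put(key, chunk);
    }
    return deletionQueue;
  } catch (err) {
    // Best-effort cleanup: one failed delete should not block the others.
    await Promise.allSettled(deletionQueue.map((key) => store.remove(key)));
    throw err;
  }
}
```

If the key were pushed after `await store.put(...)` instead, the chunk being written at the moment of failure would never reach the queue, which is the leak described above.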
This also fixes a pnpm issue: the image we were using has been unmaintained for the last 2 years, which caused newer local installations of pnpm to break docker builds.