Use multipart upload in S3 push pull backend #1230

@Jackmin801

Description

Currently, pushing a Document stream to S3 works fine.

```python
from docarray import DocumentArray
from docarray.documents import TextDoc

N: int = 2 ** 20
DocumentArray[TextDoc].push_stream((TextDoc(text=f'text {i}') for i in range(N)), url=f's3://da-pushpull/da-{N}', show_progress=True)
```

However, the upload does not use multipart upload. This is because smart_open, which we use to write to the S3 object, is very slow when uploading any part other than the first.

Multipart upload was therefore turned off so that the upload succeeds, but this makes the upload use more memory than necessary.
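For context, a minimal sketch of what a non-multipart write via smart_open could look like. This is an assumption about the current approach, not a copy of the actual implementation; `serialized_docs` is a hypothetical iterator of serialized Documents, and the object key is taken from the example above.

```python
# Sketch (assumption): writing via smart_open with multipart disabled.
# With multipart_upload=False, smart_open buffers the whole object in memory
# and uploads it in a single request, so memory grows with the stream size.
import smart_open

with smart_open.open(
    's3://da-pushpull/da-1048576',  # illustrative key, matching N = 2 ** 20 above
    'wb',
    transport_params={'multipart_upload': False},
) as fileobj:
    for doc_bytes in serialized_docs:  # hypothetical iterator of serialized Documents
        fileobj.write(doc_bytes)
```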

Describe the solution you'd like
Uploading a DocumentArray whose size exceeds the multipart part size should have constant memory usage,
i.e. uploading a 1 GB Document stream should use the same amount of memory as uploading a 2 GB Document stream.

Describe alternatives you've considered
Maybe we can implement the multipart upload ourselves?
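As a rough illustration of that alternative, here is a minimal sketch of a hand-rolled multipart upload with boto3 that keeps at most one part in memory at a time. The bucket/key, the part size, and the `chunks` iterable are placeholders, not part of the existing codebase.

```python
# Sketch (assumption): manual multipart upload with boto3,
# keeping memory bounded by the part size instead of the full object size.
import boto3

s3 = boto3.client('s3')
bucket, key = 'da-pushpull', 'da-1048576'  # placeholder bucket/key
PART_SIZE = 16 * 1024 * 1024               # placeholder part size (S3 minimum is 5 MiB)

upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts, buffer, part_number = [], bytearray(), 1
try:
    for chunk in chunks:  # hypothetical iterator of serialized Document bytes
        buffer.extend(chunk)
        if len(buffer) >= PART_SIZE:
            resp = s3.upload_part(
                Bucket=bucket, Key=key, UploadId=upload['UploadId'],
                PartNumber=part_number, Body=bytes(buffer),
            )
            parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
            part_number += 1
            buffer.clear()
    if buffer:  # flush the final, possibly smaller, part
        resp = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=upload['UploadId'],
            PartNumber=part_number, Body=bytes(buffer),
        )
        parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload['UploadId'],
        MultipartUpload={'Parts': parts},
    )
except Exception:
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload['UploadId'])
    raise
```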

Additional context
The multipart upload works fine in local tests and CI, which use a minio container as the S3 endpoint.
However, it gets stuck when uploading to "the real" S3.
