refactor: optimization on stacking #1016
Conversation
Signed-off-by: Sami Jaghouar <[email protected]>
JohannesMessner
left a comment
LGTM, but did we make sure that this is equivalent to torch.stack() with no unintended consequences?
I am not sure I understand the question. What does "this" refer to here?
We talked about this on the other PR; "this" being the technique where we pre-allocate a tensor of the desired shape and then set its slices iteratively. I just want to make sure that it doesn't do anything funky in terms of runtime/memory.
Okay, now I remember. Unfortunately, nobody responded to me on the PyTorch forums: https://discuss.pytorch.org/t/what-is-happening-under-the-hood-when-calling-torch-stack/170105. Still, I checked and it looks like torch.stack() does the same thing, and from my understanding it should behave the same.
Okay then!
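For reference, the equivalence discussed above can be sketched as follows. This is a minimal illustration of the technique (not the PR's actual code): pre-allocating a buffer with torch.empty and filling its slices iteratively produces the same result as torch.stack().

```python
import torch

# Five source tensors of identical shape, as torch.stack() requires.
tensors = [torch.randn(3, 4) for _ in range(5)]

# torch.stack allocates the result and copies each tensor into it.
stacked = torch.stack(tensors)

# Pre-allocation approach: create the target buffer once, then
# assign each source tensor into the corresponding slice.
prealloc = torch.empty(len(tensors), 3, 4)
for i, t in enumerate(tensors):
    prealloc[i] = t  # copies t into row i of the buffer

assert torch.equal(stacked, prealloc)
```

Both variants end up doing one allocation plus one copy per source tensor, which is why no runtime/memory difference is expected.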
📝 Docs are deployed on https://ft-refactor-optimization-stack--jina-docs.netlify.app 🎉
Context
We recently introduced some refactoring on DocumentArrayStacked in #1008, which changed slightly the way we handle stacking (we removed any state where the document does not hold the data; instead, the document sometimes holds a view of the data that has been moved to the column).
As noticed by @JohannesMessner, this new way of stacking could be optimized: once the data has been stacked, we re-traverse the DocumentArray to put the view from the column back into each document.
Originally posted by @samsja in #1008 (comment): there is room for improvement.
We could:
What this PR does:
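The re-traversal described above could be avoided by handing each document its view of the column in the same pass that fills the column. This is a hypothetical sketch under assumed names (the list `doc_tensors` and the `views` list stand in for the DocumentArray and its documents), not the PR's implementation:

```python
import torch

# Hypothetical sketch: build the column and hand each "document" a
# view of its row in a single pass, instead of stacking first and
# re-traversing the DocumentArray afterwards.
doc_tensors = [torch.randn(4) for _ in range(3)]

column = torch.empty(len(doc_tensors), 4)
views = []
for i, t in enumerate(doc_tensors):
    column[i] = t            # copy the document's data into the column
    views.append(column[i])  # column[i] is a view sharing the column's storage

# Each view shares storage with the column: a write to the column is
# visible through the document's view, with no second traversal needed.
column[0, 0] = 42.0
assert views[0][0].item() == 42.0
```

The design point is that indexing a row of the column returns a view rather than a copy, so the documents stay in sync with the column for free.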