Skip to content

refactor: optimization on stacking#1016

Merged
samsja merged 6 commits intofeat-rewrite-v2from
refactor-optimization-stack
Jan 16, 2023
Merged

refactor: optimization on stacking#1016
samsja merged 6 commits intofeat-rewrite-v2from
refactor-optimization-stack

Conversation

@samsja
Copy link
Copy Markdown
Member

@samsja samsja commented Jan 13, 2023

Context

We recently introduced some refactoring on DocumentArrayStacked : #1008 which change a little bit the way we handled the stacking (We removed any state where the document does not hold the data but instead some time its hold a view of the data that has been moved to the column)

As noticed by @JohannesMessner this new way of stacking could be optimized because once the data has been stacked we re traverse the DocumentArray to put the view from the column to the document.

Originally posted by @samsja in #1008 (comment) there is room for improvements.
We could:

  • pre-allocate the data for the tensor column (we can predict the shape and the type by looking at the first row of the da)
  • when collecting the actual data we could replace the data with a view to the pre alocated column
  • when stacking put the data into the preallocate column in place. Therefore each document has already a view to the good part of the column and we don't need iterate again

What this PR do:

  • Use the optimization explained above
  • refactor the column creation code base to make it more concise and more readable

@samsja samsja changed the title Refactor optimization stack refactor: optimization on stacking Jan 13, 2023
Signed-off-by: Sami Jaghouar <[email protected]>
Copy link
Copy Markdown
Member

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but did we make sure that this is equivalent to torch.stack() with no unintended consequnces?

Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
@samsja
Copy link
Copy Markdown
Member Author

samsja commented Jan 16, 2023

LGTM, but did we make sure that this is equivalent to torch.stack() with no unintended consequnces?

I am not sure to understand the question. What "this" refer to here ?

@JohannesMessner
Copy link
Copy Markdown
Member

LGTM, but did we make sure that this is equivalent to torch.stack() with no unintended consequnces?

I am not sure to understand the question. What "this" refer to here ?

We talked about this on the other PR; "this" being the technique where we pre-allocate a tensor of desired shape and then set slices of it iteratively. I just want to make sure that that doesn't do anything funky in terms of runtime/memory that torch.stack() wouldn't do

@samsja
Copy link
Copy Markdown
Member Author

samsja commented Jan 16, 2023

LGTM, but did we make sure that this is equivalent to torch.stack() with no unintended consequnces?

I am not sure to understand the question. What "this" refer to here ?

okay now I remember, unfortunately https://discuss.pytorch.org/t/what-is-happening-under-the-hood-when-calling-torch-stack/170105 nobody responded to me on pytorch forums. Though I checked it looks like it is doing the same, and from my understanding it should do the same

@JohannesMessner
Copy link
Copy Markdown
Member

LGTM, but did we make sure that this is equivalent to torch.stack() with no unintended consequnces?

I am not sure to understand the question. What "this" refer to here ?

okay now I remember, unfortunately https://discuss.pytorch.org/t/what-is-happening-under-the-hood-when-calling-torch-stack/170105 nobody responded to me on pytorch forums. Though I checked it looks like it is doing the same, and from my understanding it should do the same

Okay then!

Signed-off-by: Sami Jaghouar <[email protected]>
@github-actions
Copy link
Copy Markdown

📝 Docs are deployed on https://ft-refactor-optimization-stack--jina-docs.netlify.app 🎉

@samsja samsja merged commit e2c46d4 into feat-rewrite-v2 Jan 16, 2023
@samsja samsja deleted the refactor-optimization-stack branch January 16, 2023 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants