Skip to content

[DPL] Support for resizable DataChunk#1846

Closed
matthiasrichter wants to merge 2 commits intoAliceO2Group:devfrom
matthiasrichter:dev-framework-data-allocator
Closed

[DPL] Support for resizable DataChunk#1846
matthiasrichter wants to merge 2 commits intoAliceO2Group:devfrom
matthiasrichter:dev-framework-data-allocator

Conversation

@matthiasrichter
Copy link
Copy Markdown
Collaborator

@matthiasrichter matthiasrichter commented Apr 2, 2019

Change DataChunk to be std::vector<char, o2::pmr::polymorphic_allocator>

Interface changes:

  • makeChunk returns reference to vector object
  • return value of adoptChunk is dropped, has not been used

This goes in line with the general idea of DataAllocator's make methods, where the
object is booked with one single call and is send automatically after processing ends.
Requires return by reference and to use a reference variable in the code. The disadvantage
is that accidental copy is possible and impossible to detect at compile time.

Details:
Introducing VectorObject to MessageContext, holding an instance of an o2::vector.
The memory is directly allocated in the message memory using the pmr functionality.
In order to avoid that objects go out of scope and the allocated buffer is available
at send, the created vector can not be returned by move, but is kept in the context
and returned by reference. This requires changes in the code to use the reference
instead of a copy.

The return value of adoptChunk must be dropped because a buffer can only be adopted
to a vector object using a special allocator. With this it can not simply be returned
as it is a different type. Using polymorphic_allocator even for trivially default
constructible types still invokes the constructor for the elements, thus changing
the memory of the underlying resource.

TODO:

  • adjust payload size in DataHeader before sending
  • add unit test for resizable chunk
  • use std::vector with polymorphic_allocator in favor of o2::vector
  • make sure that result of makeChunk is not copied to local variable - not in this PR requires more effort, see [DPL] Support for resizable DataChunk #1846 (comment)
  • add specialization of make for o2::vector<T> - not in this PR

@matthiasrichter matthiasrichter force-pushed the dev-framework-data-allocator branch 2 times, most recently from 92a7ff6 to ad180b6 Compare April 2, 2019 23:06
@matthiasrichter matthiasrichter requested a review from ktf April 3, 2019 07:12
@matthiasrichter matthiasrichter force-pushed the dev-framework-data-allocator branch from ad180b6 to 134d444 Compare April 5, 2019 06:09
@matthiasrichter matthiasrichter changed the title [DPL] Change DataChunk to be of type o2::vector<char> [DPL] Change DataChunk to vector type Apr 5, 2019
@matthiasrichter matthiasrichter force-pushed the dev-framework-data-allocator branch from 134d444 to 392a1f9 Compare April 5, 2019 10:22
@matthiasrichter matthiasrichter changed the title [DPL] Change DataChunk to vector type [DPL] Support for resizable DataChunk Apr 5, 2019
@matthiasrichter matthiasrichter force-pushed the dev-framework-data-allocator branch 2 times, most recently from 07395ee to f225104 Compare April 5, 2019 10:31
@matthiasrichter
Copy link
Copy Markdown
Collaborator Author

I have updated the PR and solved the task list. This introduces resizable DataChunk which one can get by a single newChunk call and which is automatically sent after processing. The make API though comes with a drawback, and this is the return by reference, because the instance needs to be stored internally for sending. The return by reference though easily leads to copy if the target variable is not a reference, i.e. a simple forgotten ampersand.

Have studied three options, just to summarize here:

  1. forbid the copy constructor
template<> std::vector<char, o2::pmr::polymorphic_allocator<char>>::vector(const std::vector<char, o2::pmr::polymorphic_allocator<char>>&) = delete;
  1. make DataChunk a class and forbid copy constructor
class DataChunk : public std::vector<char, o2::pmr::polymorphic_allocator<char>> {
 public:
  DataChunk(const DataChunk&) = delete;
  ...
}
  1. Use std::ref to return std::reference_wrapper

None of them is optimal.
Option 1) is probably the most efficient and cleanest solution but it has a drastic global effect, and for this side effect it's most likely out of the question.

Option 2) A bit more cumbersome code, and no option which can be applied to PODs, otherwise quite ok

Option 3) would work very nicely also for the make specializations for PODs because std::reference_wrapper implements the assignment operator. But the discouraging fact is that it also provides a type conversion operator, and this automatic conversion to the underlying type always allows copy. So the forgotten-ampersand mistake is not solved, and option 3 is ruled out.

In the end it's a question of DataAllocator's make interface., and probably we have to live with the drawbacks which come from not returning a pointer. Something which also makes sense. In summary there isn't an optimal solution, but this PR is in line with the current API.

@ktf
Copy link
Copy Markdown
Member

ktf commented Apr 5, 2019

Thank you for the nice write-up.

For 3) if you use auto you do not have the problem you mention, correct?

Anyways, I would go for 2), as 1) is a bit too drastic IMHO, we can always introduce it later on.

Another possibility would be to use a o2::observer_ptr (IIRC the name), but I know you have concerns for that.

To summarise I would go for what I consider the most local change (2) and then expand, once we make up our mind.

@matthiasrichter matthiasrichter force-pushed the dev-framework-data-allocator branch from f225104 to 067e294 Compare April 10, 2019 09:44
Change DataChunk to be of type std::vector<char, o2::pmr::polymorphic_allocator>

Interface changes:
- makeChunk returns reference to std::vector<char> object with polymorphic_allocator
- return value of adoptChunk is dropped, not possible to preserve this, not been used
  anyhow

This goes in line with the general idea of DataAllocator's `make` methods, where the
object is booked with one single call and is send automatically after processing ends.
Requires return by reference and to use a reference variable in the code. The disadvantage
is that accidental copy is possible and impossible to detect at compile time.

Details:
Introducing VectorObject to MessageContext, holding an instance of a vector.
The memory is directly allocated in the message memory using the pmr functionality.
In order to avoid that objects go out of scope and the allocated buffer is available
at send, the created vector can not be returned by move, but is kept in the context
and returned by reference. This requires changes in the code to use the reference
instead of a copy.

The return value of adoptChunk must be dropped because a buffer can only be adopted
to a vector object using a special allocator. With this it can not simply be returned
as it is a different type. Using polymorphic_allocator even for trivially default
constructible types still invokes the constructor for the elements, thus changing
the memory of the underlying resource.
Make DataChunk a class deriving from std::vector instead of an alias. This allows
to delete the copy constructor and assignment operator.

DPL's output allocator returns the created DataChunk by reference and thus it's important
not to make a copy in the code. Now it's ensured to detect this at compile time.
@matthiasrichter matthiasrichter force-pushed the dev-framework-data-allocator branch from 067e294 to 5fb8d96 Compare April 10, 2019 10:36
@matthiasrichter
Copy link
Copy Markdown
Collaborator Author

Can not access CI results from this PR any more, will open another one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants