refactor: save schema in proto #1029
Signed-off-by: Sami Jaghouar <[email protected]>
JoanFM
left a comment
A good test could be to add a simple RPC method receiving DocArrays
Isn't it overkill? As long as the proto can be serialized and deserialized, aren't we already sure it will work? Or were you referring to the case where we don't have the class on the receiving side?
@samsja raised a point that renaming classes in the future (as part of a refactoring) would break backwards and forwards compatibility of the proto between docarray versions. So let's go with an explicit key instead of the class name.
JohannesMessner
left a comment
private decorators look so weird, @_whatever 🥲
Co-authored-by: Charlotte Gerhaher <[email protected]>
Signed-off-by: samsja <[email protected]>
📝 Docs are deployed on https://ft-refactor-save-schema-in-proto--jina-docs.netlify.app 🎉
Context
Refactor the proto serialization under the hood.
In DocArray we have different types with different functionality that are stored using the same raw type (bytes, str, NdArray). So once we serialize these types, we lose the information about the true DocArray type. Is the NdArray in the proto an AudioTensor or a VideoTensor?
Therefore we need to have a way to encode this information somewhere in the proto.
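To make the problem concrete, here is a minimal sketch in plain Python. `AudioBytes` and `VideoBytes` are hypothetical stand-ins invented for this illustration, not DocArray classes; they only show how two distinct types collapse to the same raw value once serialized.

```python
class AudioBytes(bytes):
    """Hypothetical audio-specific type backed by raw bytes."""


class VideoBytes(bytes):
    """Hypothetical video-specific type backed by raw bytes."""


audio = AudioBytes(b"\x00\x01")
video = VideoBytes(b"\x00\x01")

# Serializing keeps only the raw payload, so the subclass is dropped.
raw_audio = bytes(audio)
raw_video = bytes(video)

# Both payloads are plain bytes and compare equal: nothing in the
# serialized form tells us which type each one came from.
assert type(raw_audio) is bytes
assert raw_audio == raw_video
```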
Before this PR
Before this PR we were storing this information (which DocArray type the object belongs to) in the key of the proto itself.
`docarray.proto` looked like this:
But this does not scale well, as we need to modify the proto each time we add a new modality.
What this PR does
This PR changes the behavior: instead of storing the information as a key, we store it as an extra string. We basically save the schema of the Document in the proto.
Therefore, each of our types needs a corresponding string that is used when serializing. Each class stores its key, so we have a mapping from class to key. For deserializing, however, we need the inverse, i.e. a mapping from key to class.
To avoid storing this double mapping in two places and copy-pasting, we introduce the `register_proto` class decorator. It takes a key as a parameter, assigns the key to the class, and at the same time registers the class and the key in a dict, which gives us the mapping from key to class. This dict is then used in the deserializing part.
The proto looks like this: