Skip to content

refactor: save schema in proto#1029

Merged
samsja merged 18 commits intofeat-rewrite-v2from
refactor-save-schema-in-proto
Jan 19, 2023
Merged

refactor: save schema in proto#1029
samsja merged 18 commits intofeat-rewrite-v2from
refactor-save-schema-in-proto

Conversation

@samsja
Copy link
Copy Markdown
Member

@samsja samsja commented Jan 17, 2023

Context

refactor the proto serialization under the hood.

In DocArray we have different types with different functionality that will be stored using the same raw type (bytes, str, NdArray). So once we serialize this types we loose the information of what is the true DocArray type. Is the NdArray in the proto and AudioTensor or a VideoTensor ?

Therefore we need to have a way to encode this information somewhere in the proto.

Before this PR

Before this PR we were storing this information (which DocArray type does the object belongs to) in the key of the proto itself. docarray.proto looked like this :

message NodeProto {

  oneof content {
    bytes blob = 1;
    // the ndarray of the image/audio/video document
    NdArrayProto ndarray = 2;
    // a text
    string text = 3;
    // a sub Document
    
    NdArrayProto audio_ndarray = 4
    NdArrayProto video_ndarray = 5
  }

  oneof docarray_type {
    string type = 6;
  }

}

But this does not scale well as we need to modify the proto each time we add a new modalities.

What this PR do

This PR change the behavior and instead of storing the information as a key we store it as a extra string. We basically save the schema of the Document in the proto.

Therefore for each of our Type we need to assign a corresponding string that will be used when serializing. In class is store its key so we have mapping from class to key. Nevertheless for deserializing we need to inverse, i.e, a mapping from key to class.

To avoid to store this double mapping in two place and doing copy pasting we introduce the
register_proto class decorator that take a key as parameters and will assign the key to the class and at the same time register the class and the key into a dict to have the mapping from key to class. This dict will then be used in the deserializing part.

The proto looks like this

message NodeProto {

  oneof content {
    bytes blob = 1;
    // the ndarray of the image/audio/video document
    NdArrayProto ndarray = 2;
    // a text
    string text = 3;
    // a sub Document

  }

  oneof docarray_type {
    string type = 6;
  }

}

Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
Signed-off-by: Sami Jaghouar <[email protected]>
@samsja samsja changed the title Refactor save schema in proto refactor: save schema in proto Jan 18, 2023
Signed-off-by: Sami Jaghouar <[email protected]>
Copy link
Copy Markdown
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good test could be to add a simple RPC method receiving DocArrays

@samsja
Copy link
Copy Markdown
Member Author

samsja commented Jan 18, 2023

A good test could be to add a simple RPC method receiving DocArrays

Isn't it overkill ? As long as the proto can be serialize and deserialize aren't we already sure it will work ?

Or where u reference to the case when we don't have the class in the receiving part ?

JoanFM
JoanFM previously requested changes Jan 18, 2023
@samsja samsja force-pushed the refactor-save-schema-in-proto branch from 1855ad2 to 17a0008 Compare January 19, 2023 09:00
@JohannesMessner
Copy link
Copy Markdown
Member

@samsja raised a point that renaming classes in the future (as part of a refactoring) would break backwards and forwards compatibility of the proto between docarray versions. So let's go with an explicit key instead of the class name

Copy link
Copy Markdown
Member

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private decorators look so weird, @_whatever 🥲

samsja and others added 2 commits January 19, 2023 16:31
@github-actions
Copy link
Copy Markdown

📝 Docs are deployed on https://ft-refactor-save-schema-in-proto--jina-docs.netlify.app 🎉

@samsja samsja dismissed JoanFM’s stale review January 19, 2023 15:40

orally agree

@samsja samsja merged commit cebc2b4 into feat-rewrite-v2 Jan 19, 2023
@samsja samsja deleted the refactor-save-schema-in-proto branch January 19, 2023 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants