fix: doc from dataclass with list #1018

Merged
JohannesMessner merged 4 commits into main from fix-doc-from-dataclass-with-list on Jan 16, 2023

@anna-charlotte anna-charlotte commented Jan 16, 2023

Goals:

If we create a Document instance from a dataclass instance, each attribute is stored in its own Document in the chunks. If one of the attributes is an iterable, such as a List[Image], we want to be able to access it as a DocumentArray. However, if the list has length 1, accessing the images returns a single Document instead of a DocumentArray of length 1.

To change this behaviour, we need to check whether the original attribute type was an iterable and, if so, return a DocumentArray.
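
The check described above can be sketched in isolation. This is only a minimal illustration, not the code changed in this PR: the helper names `is_sequence_annotation` and `select_chunks` are hypothetical, and plain Python lists stand in for Document and DocumentArray objects.

```python
import collections.abc
from typing import Any, List, get_origin


def is_sequence_annotation(annotation: Any) -> bool:
    """Return True if a type annotation such as List[str] describes a sequence.

    Hypothetical helper for illustration; not part of docarray.
    """
    # get_origin(List[str]) is list; get_origin(str) is None.
    origin = get_origin(annotation)
    return isinstance(origin, type) and issubclass(origin, collections.abc.Sequence)


def select_chunks(chunks: list, annotation: Any):
    """Decide what attribute access returns, based on the declared field type.

    `chunks` is a plain list standing in for the chunk Documents of one field.
    """
    if is_sequence_annotation(annotation):
        # The field was declared as an iterable: always return a container,
        # even when it holds a single item.
        return list(chunks)
    # The field was declared as a single value: return the item itself.
    return chunks[0]


one_image = select_chunks(["image_0"], List[str])
two_images = select_chunks(["image_0", "image_1"], List[str])
title = select_chunks(["doc 1"], str)
```

Under these assumptions, a field declared as `List[...]` comes back as a container regardless of its length, while a field declared with a single type, such as `title`, comes back as a single item.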

from docarray import Document, dataclass
from docarray.typing import Text, Image
from typing import List


@dataclass
class MyDoc:
    title: Text
    images: List[Image]


doc1 = MyDoc(title='doc 1', images=[
    'https://bkimg.cdn.bcebos.com/pic/359b033b5bb5c9ea15ce3757e86fa1003af33a871573?x-bce-process=image/watermark,image_d2F0ZXIvYmFpa2UxNTA=,g_7,xp_5,yp_5',
])
doc1 = Document(doc1)
print(f"type(doc1.images) = {type(doc1.images)}")
print(f"doc1.images = {doc1.images}")

doc2 = MyDoc(title='doc 2', images=[
    'https://bkimg.cdn.bcebos.com/pic/359b033b5bb5c9ea15ce3757e86fa1003af33a871573?x-bce-process=image/watermark,image_d2F0ZXIvYmFpa2UxNTA=,g_7,xp_5,yp_5',
    'https://bkimg.cdn.bcebos.com/pic/359b033b5bb5c9ea15ce3757e86fa1003af33a871573?x-bce-process=image/watermark,image_d2F0ZXIvYmFpa2UxNTA=,g_7,xp_5,yp_5',
])

doc2 = Document(doc2)
print(f"type(doc2.images) = {type(doc2.images)}")
print(f"doc2.images = {doc2.images}")

Output:

type(doc1.images) = <class 'docarray.document.Document'>
doc1.images = <Document ('id', 'parent_id', 'granularity', 'tensor', 'uri', '_metadata', 'modality') at 8ad7d1ac2dfd5541a297d7378b0106cc>
type(doc2.images) = <class 'docarray.array.chunk.ChunkArray'>
doc2.images = <DocumentArray (length=2) at 4766922160>

Expected:

type(doc1.images) = <class 'docarray.array.chunk.ChunkArray'>
doc1.images = <DocumentArray (length=1) at 7483917480>
type(doc2.images) = <class 'docarray.array.chunk.ChunkArray'>
doc2.images = <DocumentArray (length=2) at 4766922160>
  • check and update documentation, if required. See guide

@anna-charlotte anna-charlotte force-pushed the fix-doc-from-dataclass-with-list branch from b878081 to 7aff87c on January 16, 2023 10:46
codecov-commenter commented Jan 16, 2023

Codecov Report

Base: 88.55% // Head: 88.56% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (c27b5fa) compared to base (3940c5b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1018   +/-   ##
=======================================
  Coverage   88.55%   88.56%           
=======================================
  Files         155      155           
  Lines        8058     8061    +3     
=======================================
+ Hits         7136     7139    +3     
  Misses        922      922           
Flag Coverage Δ
docarray 88.56% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
docarray/document/mixins/multimodal.py 94.11% <100.00%> (+0.15%) ⬆️


@anna-charlotte anna-charlotte marked this pull request as ready for review January 16, 2023 10:58
anna-charlotte added 2 commits January 16, 2023 12:11
@JohannesMessner JohannesMessner merged commit a924709 into main Jan 16, 2023
@JohannesMessner JohannesMessner deleted the fix-doc-from-dataclass-with-list branch January 16, 2023 13:08

Successfully merging this pull request may close these issues.

Custom multimodal Document raises an error
