bug: _ipython_display_ stacking tensors/embeddings #130
Description
When a DocumentArray is displayed in a Jupyter notebook, _ipython_display_ is called, which in turn calls summary. summary currently stacks the embeddings/tensors of all Documents.
This fails with ValueError: all input arrays must have the same shape whenever the tensors have different shapes:
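```python
from docarray import DocumentArray, Document
import numpy as np

# Two Documents whose tensors have different shapes: (3,) and (4,)
da = DocumentArray([Document(tensor=np.zeros(3)), Document(tensor=np.zeros(4))])
da._ipython_display_()  # raises ValueError: all input arrays must have the same shape
```

But this works as expected: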
```
In [4]: from docarray import DocumentArray, Document
   ...: import numpy as np
   ...: da = DocumentArray([Document(tensor=np.zeros(3)), Document(tensor=np.zeros(3))])
   ...: da._ipython_display_()

              Documents Summary

  Length                 2
  Homogenous Documents   True
  Common Attributes      ('id', 'tensor')

                  Attributes Summary

  Attribute   Data type      #Unique values   Has empty value
 ─────────────────────────────────────────────────────────────
  id          ('str',)       2                False
  tensor      ('ndarray',)   2                False

     Storage Summary

  Class     DocumentArrayInMemory
  Backend   In Memory
```

Why this happens
When a DocumentArray is displayed in a Jupyter notebook, _ipython_display_ is called, which calls summary, which in turn calls
all_attrs_values = self._get_attributes(*all_attrs_names). If the Documents have tensor or embedding fields, all_attrs_names contains them, so .tensors or .embeddings gets accessed. That breaks because tensors of different shapes can't be stacked into a single array.
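The failure can be reproduced without docarray at all. A minimal sketch, assuming the stacking step behaves like np.stack (the error message in this issue is exactly the one np.stack raises); try_stack is a hypothetical helper used only for illustration:

```python
import numpy as np


def try_stack(arrays):
    """Stack a list of arrays; return the stacked array or the error message.

    Hypothetical helper. Assumption: summary's stacking of .tensors behaves
    like np.stack, which requires every input to have exactly the same shape.
    """
    try:
        return np.stack(arrays)
    except ValueError as err:
        return str(err)


same = try_stack([np.zeros(3), np.zeros(3)])   # like the working example
mixed = try_stack([np.zeros(3), np.zeros(4)])  # like the failing example

print(same.shape)  # (2, 3)
print(mixed)       # all input arrays must have the same shape
```

Equal shapes stack into one (2, 3) array, while shapes (3,) and (4,) produce the same ValueError seen in the notebook.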
Workaround
Never call .tensors or .embeddings when building the summary. Calling them is dangerous for big datasets anyway, because stacking allocates memory for all the vectors at once.
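One possible direction, sketched under assumptions: summarize a tensor attribute from per-document metadata (type and shape) instead of from a stacked array. summarize_tensors is a hypothetical helper, not part of docarray, and it takes a plain list of arrays rather than a DocumentArray so it stays self-contained:

```python
from collections import Counter

import numpy as np


def summarize_tensors(tensors):
    """Summarize per-document tensors without stacking them (hypothetical helper).

    Only metadata (type name and shape) is read from each tensor, so arrays
    of different shapes are accepted and no combined N x D array is allocated.
    """
    present = [t for t in tensors if t is not None]
    return {
        "data_types": tuple(sorted({type(t).__name__ for t in present})),
        "shapes": Counter(t.shape for t in present),
        "has_empty_value": len(present) < len(tensors),
    }


summary = summarize_tensors([np.zeros(3), np.zeros(4), None])
print(summary["data_types"])       # ('ndarray',)
print(dict(summary["shapes"]))     # {(3,): 1, (4,): 1}
print(summary["has_empty_value"])  # True
```

Stacking, by contrast, materializes one contiguous array: for example, 1,000,000 float32 embeddings of dimension 768 need 1,000,000 × 768 × 4 bytes, about 3.07 GB, just to print a summary.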