test(sqlite): add more tests to cover sqlite backend #81
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##             main      #81      +/-   ##
==========================================
+ Coverage   83.75%   84.18%   +0.43%
==========================================
  Files          92       93       +1
  Lines        4117     4147      +30
==========================================
+ Hits         3448     3491      +43
+ Misses        669      656      -13
```
```python
@pytest.mark.parametrize('da_cls', [DocumentArray, DocumentArraySqlite])
def test_empty_non_zero(da_cls):
```
Although `empty` is possible for DocumentArraySqlite (basically because we allow config-free construction via temp files), I don't think it should be supported for storage backends in general (and for no backend in particular).
If we think about it, where would `DocumentArrayWeaviate.empty(10)` store its docs? Should they go to the default `localhost:8080` server configuration?
ref1: https://github.com/RaRe-Technologies/sqlitedict/blob/master/sqlitedict.py
ref2: https://github.com/osoken/sqlitecollections
They both support a default connection as a temp file, so this is not invented by us. Let's follow this behavior.
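To make the temp-file fallback concrete, here is a minimal sketch of the pattern the two references above use. `default_connection` is a hypothetical name for illustration, not docarray's actual implementation; only the stdlib `sqlite3` and `tempfile` modules are assumed.

```python
import sqlite3
import tempfile

def default_connection(path=None):
    """Open a sqlite connection; with no config, fall back to a temp file."""
    if path is None:
        # a throwaway on-disk database the caller never has to configure,
        # mirroring sqlitedict's config-free default connection
        path = tempfile.NamedTemporaryFile(suffix='.db', delete=False).name
    return sqlite3.connect(path)

conn = default_connection()
conn.execute('CREATE TABLE docs (doc_id TEXT PRIMARY KEY, body BLOB)')
conn.execute("INSERT INTO docs VALUES ('a', x'00')")
count = conn.execute('SELECT COUNT(*) FROM docs').fetchone()[0]
```

This is why a config-free `empty()` is unambiguous for sqlite (every call gets its own throwaway file) but not for a server-backed store like Weaviate, where "no config" still has to pick a shared server.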
docarray/array/mixins/group.py (outdated)
```python
def batch_indices(
    self,
    batch_size: int,
    shuffle: bool = False,
) -> Generator[list, None, None]:
```
this function doesn't seem to be used anywhere?
It's not. It can be removed, or kept as an alternative to `batch_ids`, since docarray supports both indexing by ids and indexing by indices. For in-memory it's probably faster to do it by index, and in sqlite by ids.
Removed; we can restore it later if needed.
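For reference, the two batching strategies mentioned above can be sketched in a few lines. This is an illustrative standalone version, not the removed `batch_indices` method itself: it yields index batches, which are then mapped back to document ids for backends (like sqlite) where id lookup is cheaper.

```python
import random
from typing import Generator, List

def batch_indices(n: int, batch_size: int, shuffle: bool = False) -> Generator[List[int], None, None]:
    """Yield batches of positional indices over a collection of size n."""
    idx = list(range(n))
    if shuffle:
        random.shuffle(idx)
    for i in range(0, n, batch_size):
        yield idx[i:i + batch_size]

ids = ['doc0', 'doc1', 'doc2', 'doc3', 'doc4']
index_batches = list(batch_indices(len(ids), 2))
# batching by ids: translate each index batch through the id list
id_batches = [[ids[i] for i in b] for b in index_batches]
```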
```python
class SqliteBinaryIOMixin(BinaryIOMixin):
    """Save/load an array to a binary file."""

    @classmethod
    def load_binary(
        cls: Type['T'],
        file: Union[str, BinaryIO, bytes],
        protocol: str = 'protobuf-array',
        compress: Optional[str] = None,
        _show_progress: bool = False,
        streaming: bool = False,
    ) -> Union['DocumentArray', Generator['Document', None, None]]:
        """Load array elements from a compressed binary file.

        :param file: File or filename or serialized bytes where the data is stored.
        :param protocol: protocol to use. 'pickle-array' is not supported for DocumentArraySqlite
        :param compress: compress algorithm to use
        :param _show_progress: show progress bar, only works when protocol is `pickle` or `protobuf`
        :param streaming: if `True` returns a generator over `Document` objects.
            In case protocol is pickle the `Documents` are streamed from disk to save memory usage
        :return: a DocumentArray object
        """
        _check_protocol(protocol)
```
I think a better way to do this is the following:

- move `_check_protocol(protocol)` to a `DocumentArray`-level mixin
- add a class variable `available_protocols`, e.g.

```python
class IOMixin:
    available_protocols = ['a', 'b', 'c', 'd']

    def _check_protocols(self, value: str):
        if value not in self.available_protocols:
            raise ...

    def from_bytes(self):
        self._check_protocols
```

- Finally, every DA subclass just needs to specify a whitelist by overriding the `available_protocols` class variable.
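A runnable sketch of the whitelist-override pattern proposed above (class names `IOMixin`/`SqliteIOMixin` and the protocol list are illustrative, not docarray's actual classes):

```python
class IOMixin:
    # full protocol whitelist; the single check lives at the top level
    available_protocols = ('pickle', 'pickle-array', 'protobuf', 'protobuf-array')

    def _check_protocol(self, value: str) -> None:
        if value not in self.available_protocols:
            raise ValueError(
                f'protocol {value!r} not supported, choose from {self.available_protocols}'
            )

class SqliteIOMixin(IOMixin):
    # the sqlite backend narrows the list: 'pickle-array' is excluded
    available_protocols = ('pickle', 'protobuf', 'protobuf-array')

SqliteIOMixin()._check_protocol('protobuf')  # accepted
try:
    SqliteIOMixin()._check_protocol('pickle-array')
    rejected = False
except ValueError:
    rejected = True
```

The design keeps the validation logic in one place while each backend only declares data (its whitelist), instead of every subclass overriding the checking method.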
@hanxiao I'll still need to subclass the mixin and override methods like `load_binary`, because I need to change the default protocol. Is it okay to implement your suggestion while still overriding the methods?
There's another option, which is adding a `default_protocol` attribute in the same way and changing the `protocol` parameter to `None` (which will be bad for autocompletion).
Your thoughts?
you are right, okay then let's leave it like this.
hanxiao left a comment:
implement `available_protocols` on the top level
```python
for si, _val in zip(index, value):
    self[si] = _val  # leverage existing setter
```
Invalid types in the sequence should be caught rather than failing silently; in particular, things like `da[1.0, 2.0]` coming from np or torch might be a pitfall.
Suggested change:

```diff
     self[si] = _val  # leverage existing setter
+else:
+    raise IndexError(f"{index} should be either a sequence of bool, int or str")
```
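A self-contained sketch of the validation suggested above (the function name `assign_by_sequence` and the plain-dict store are illustrative stand-ins for docarray's internals): unsupported element types such as floats raise instead of silently assigning nothing.

```python
def assign_by_sequence(store, index, values):
    """Assign values at the given positions, validating index element types."""
    if all(isinstance(i, int) and not isinstance(i, bool) for i in index):
        for si, _val in zip(index, values):
            store[si] = _val  # leverage existing setter
    elif all(isinstance(i, str) for i in index):
        for si, _val in zip(index, values):
            store[si] = _val  # id-based path
    else:
        # floats sneaking in from np/torch tensors end up here
        raise IndexError(f'{index} should be either a sequence of bool, int or str')

store = {0: 'a', 1: 'b'}
assign_by_sequence(store, [0, 1], ['x', 'y'])  # int path: ok
try:
    assign_by_sequence(store, [1.0, 2.0], ['x', 'y'])  # floats: rejected
    caught = False
except IndexError:
    caught = True
```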
```python
if _a in ('tensor', 'embedding'):
    if _a == 'tensor':
        _docs.tensors = _v
    elif _a == 'embedding':
        _docs.embeddings = _v
    for _d in _docs:
        self._set_doc_by_id(_d.id, _d)
```
Suggested change:

```diff
-if _a in ('tensor', 'embedding'):
-    if _a == 'tensor':
-        _docs.tensors = _v
-    elif _a == 'embedding':
-        _docs.embeddings = _v
-    for _d in _docs:
-        self._set_doc_by_id(_d.id, _d)
+if _a == 'tensor':
+    _docs.tensors = _v
+elif _a == 'embedding':
+    _docs.embeddings = _v
+for _d in _docs:
+    self._set_doc_by_id(_d.id, _d)
```
```python
for _d in _docs:
    self._set_doc_by_id(_d.id, _d)
```

should be common to both branches