v0.31.0 release note draft #1456

@samsja

Release Note

This release contains 4 new features, 11 bug fixes, and several documentation improvements.

💥 Breaking changes

Return type of DocVec Optional Tensor (#1472)

Optional tensor fields in a DocVec now return None instead of a list of NaN if the column does not hold any tensor.

This code snippet shows the breaking change:

from typing import Optional

from docarray import BaseDoc, DocVec
from docarray.typing import NdArray

class MyDoc(BaseDoc):
    tensor: Optional[NdArray[10]]

docs = DocVec[MyDoc]([MyDoc() for j in range(2)])

print(docs.tensor)

Version   Return type
0.30.0    [nan nan]
0.31.0    None
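
Code that relied on the 0.30.0 behavior may need a None check after upgrading. The sketch below shows one way to guard for it. `summarize_tensor_column` is a hypothetical helper written for illustration, not part of DocArray, and plain Python sequences stand in for the value returned by `docs.tensor`.

```python
from typing import Optional, Sequence


def summarize_tensor_column(column: Optional[Sequence[float]]) -> str:
    """Describe an optional tensor column without assuming it holds data.

    Hypothetical helper for illustration; not part of DocArray.
    """
    if column is None:
        # 0.31.0 behavior: a column that holds no tensor is None.
        return "empty column"
    return f"column with {len(column)} entries"


print(summarize_tensor_column(None))
print(summarize_tensor_column([0.1, 0.2]))
```

Checking `is None` explicitly, rather than relying on truthiness, avoids the ambiguity of evaluating an array in a boolean context.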

🆕 Features

Add InMemoryDocIndex (#1441)

This version introduces InMemoryDocIndex, a Document Index that performs in-memory exact vector search (as opposed to the approximate nearest neighbor search in vector databases).

InMemoryDocIndex can be used for prototyping and is suited to small collections (1k-10k documents). A vector database is better suited to larger scales but comes with a performance overhead at smaller ones.

from docarray import BaseDoc, DocList
from docarray.index.backends.in_memory import InMemoryDocIndex
from docarray.typing import NdArray

import numpy as np

class MyDoc(BaseDoc):
    tensor: NdArray[512]

docs = DocList[MyDoc](MyDoc(tensor=i*np.ones(512)) for i in range(10))

doc_index = InMemoryDocIndex[MyDoc]()
doc_index.index(docs)

print(doc_index.find(3*np.ones(512), search_field='tensor', top_k=3))

FindResult(documents=<DocList[MyDoc] (length=10)>, scores=array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))
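
To illustrate what "exact" search means here, the following is a minimal brute-force sketch in plain Python: every stored vector is scored against the query, so no candidate is skipped. It is not the InMemoryDocIndex implementation. The function names `cosine_similarity` and `exact_search` are hypothetical, and the choice of cosine similarity as the metric is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_search(
    query: Sequence[float], vectors: Sequence[Sequence[float]], limit: int
) -> List[Tuple[int, float]]:
    """Score every vector against the query and return the best `limit` matches."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    # Highest similarity first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = exact_search([1.0, 0.1], vectors, limit=2)
print(results)  # (index, score) pairs, best match first
```

The cost grows linearly with the number of stored vectors, which is why this style of search fits small collections and prototyping.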

DocList inherits from Python list (#1457)

DocList is now a subclass of Python's list, so every method available on Python lists can be used on DocList objects. For example, you can use len on DocList objects, and tools like Pydantic or FastAPI can work with them more easily.
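
The effect of subclassing list can be sketched with a toy class. `NameList` below is a hypothetical stand-in, not DocList itself; it only shows that a list subclass inherits the list methods and passes `isinstance(..., list)` checks, which is the kind of check generic tooling may rely on.

```python
class NameList(list):
    """Toy list subclass (hypothetical; not DocArray's DocList)."""

    def upper(self) -> "NameList":
        # Custom method added on top of everything inherited from list.
        return NameList(name.upper() for name in self)


names = NameList(["carol", "alice"])
names.append("bob")  # inherited from list
names.sort()         # inherited from list

print(isinstance(names, list))  # generic list checks accept the subclass
print(len(names))
print(names.upper())
```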

Add len to DocIndex (#1454)

You can now call len(vector_index), which is equivalent to vector_index.num_docs().
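
The equivalence can be pictured as `__len__` delegating to `num_docs()`. The sketch below uses a hypothetical `ToyIndex` class to show the pattern; it is an assumption about the general shape, not the actual DocIndex code.

```python
from typing import Any, Iterable, List


class ToyIndex:
    """Hypothetical stand-in for a document index; not DocArray's DocIndex."""

    def __init__(self) -> None:
        self._docs: List[Any] = []

    def index(self, docs: Iterable[Any]) -> None:
        """Store the given documents."""
        self._docs.extend(docs)

    def num_docs(self) -> int:
        """Return how many documents are stored."""
        return len(self._docs)

    def __len__(self) -> int:
        # len(index) simply delegates to num_docs().
        return self.num_docs()


toy_index = ToyIndex()
toy_index.index(["doc-a", "doc-b", "doc-c"])
print(len(toy_index) == toy_index.num_docs())
```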

Other minor features

🐞 Bug Fixes

Point to older versions when importing Document or DocumentArray (#1422)

Importing Document or DocumentArray from DocArray would previously raise an error saying that you needed to downgrade DocArray to use these two objects. This behavior has been fixed.

Fix AnyDoc from_protobuf (#1437)

AnyDoc can now read any BaseDoc protobuf file. The same applies to DocList.

Other bug fixes

📗 Documentation Improvements

🤟 Contributors

We would like to thank all contributors to this release:
