v0.31.0 release note draft #1456
Release Note
This release contains 4 new features, 11 bug fixes, and several documentation improvements.
💥 Breaking changes
Return type of DocVec Optional Tensor (#1472)
Optional tensor fields in a DocVec now return None instead of a list of NaN values when the column does not hold any tensor.
This code snippet shows the breaking change:

    from typing import Optional

    from docarray import BaseDoc, DocVec
    from docarray.typing import NdArray


    class MyDoc(BaseDoc):
        tensor: Optional[NdArray[10]]


    docs = DocVec[MyDoc]([MyDoc() for j in range(2)])
    print(docs.tensor)

| Version | Value of docs.tensor |
|---|---|
| 0.30.0 | [nan nan] |
| 0.31.0 | None |
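Code that has to run against both 0.30.0 and 0.31.0 can normalise the two return values before using the column. The helper below is a minimal sketch in plain Python, not part of DocArray: the name `column_is_empty` is hypothetical, and it assumes a flat column of floats like the one printed in the table.

```python
import math


def column_is_empty(column) -> bool:
    """Return True if an optional tensor column holds no tensors.

    Hypothetical helper, not part of DocArray. It treats both the
    0.31.0 value (None) and the 0.30.0 value (a flat sequence of NaN)
    as "empty".
    """
    if column is None:  # 0.31.0 behaviour
        return True
    # 0.30.0 behaviour: every entry in the column is NaN
    return all(math.isnan(x) for x in column)


print(column_is_empty(None))                          # True
print(column_is_empty([float("nan"), float("nan")]))  # True
print(column_is_empty([1.0, 2.0]))                    # False
```

A check like this keeps the version difference in one place instead of scattering `is None` tests through calling code.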
🆕 Features
Add InMemoryDocIndex (#1441)
This release introduces InMemoryDocIndex, a Document Index that performs in-memory exact vector search (as opposed to the approximate nearest neighbor search used in vector databases).
InMemoryDocIndex is intended for prototyping and small collections (1k-10k documents). A vector database is better suited to larger scales but comes with a performance overhead at smaller ones.
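To make "exact vector search" concrete: an exact search scores the query against every stored vector and sorts the results, whereas an approximate index skips most of those comparisons. The following is a minimal plain-Python sketch of that idea, not InMemoryDocIndex's actual implementation. The names `cosine_similarity` and `exact_find`, and the choice of cosine similarity as the metric, are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_find(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Brute-force search: score every vector, return the best k as (index, score)."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
for index, score in exact_find([1.0, 0.1], vectors, k=2):
    print(index, round(score, 3))
```

Every query touches every stored vector, so the cost grows linearly with the size of the collection. That is consistent with the advice above to use this kind of index for small collections.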
The snippet below indexes ten documents and runs a query against them:

    from docarray import BaseDoc, DocList
    from docarray.index.backends.in_memory import InMemoryDocIndex
    from docarray.typing import NdArray
    import numpy as np


    class MyDoc(BaseDoc):
        tensor: NdArray[512]


    docs = DocList[MyDoc](MyDoc(tensor=i*np.ones(512)) for i in range(10))
    doc_index = InMemoryDocIndex[MyDoc]()
    doc_index.index(docs)
    print(doc_index.find(3*np.ones(512), search_field='tensor', top_k=3))

Output:

    FindResult(documents=<DocList[MyDoc] (length=10)>, scores=array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))

DocList inherits from Python list (#1457)
DocList is now a subclass of Python's list, so all the methods available on Python lists can be used on DocList objects. For example, you can call len on a DocList, and tools like Pydantic or FastAPI can work with it more easily.
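What inheriting from list provides can be illustrated with a toy subclass. This is a plain-Python sketch, not DocList's implementation: `TypedList` and `total_length` are hypothetical names. The point is that code written against plain lists accepts the subclass unchanged.

```python
from typing import List


class TypedList(list):
    """Toy list subclass that only accepts items of one type (hypothetical).

    Only append() is validated here to keep the sketch short.
    """

    def __init__(self, item_type: type, items=()):
        self.item_type = item_type
        super().__init__()
        for item in items:
            self.append(item)

    def append(self, item) -> None:
        if not isinstance(item, self.item_type):
            raise TypeError(
                f"expected {self.item_type.__name__}, got {type(item).__name__}"
            )
        super().append(item)


def total_length(strings: List[str]) -> int:
    """Stands in for third-party code that only knows about plain lists."""
    return sum(len(s) for s in strings)


names = TypedList(str, ["doc", "array"])
print(isinstance(names, list))  # True
print(len(names))               # 2
print(names[::-1])              # ['array', 'doc']
print(total_length(names))      # 8
```

Because `isinstance(names, list)` holds, libraries that check for or iterate over lists need no special handling for the subclass.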
Add len to DocIndex (#1454)
You can now call len(vector_index), which is equivalent to vector_index.num_docs().
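This equivalence follows the usual Python pattern of having `__len__` delegate to an existing counting method. A minimal sketch with a hypothetical `ToyIndex` class (not DocArray's code) shows the pattern:

```python
class ToyIndex:
    """Hypothetical stand-in for a document index."""

    def __init__(self):
        self._docs = []

    def index(self, docs) -> None:
        """Store the given documents."""
        self._docs.extend(docs)

    def num_docs(self) -> int:
        """Number of stored documents."""
        return len(self._docs)

    def __len__(self) -> int:
        # len(index) simply forwards to num_docs()
        return self.num_docs()


toy_index = ToyIndex()
toy_index.index(["a", "b", "c"])
print(len(toy_index))                          # 3
print(len(toy_index) == toy_index.num_docs())  # True
```

One side effect of defining `__len__` on any Python class is that an empty instance becomes falsy, so `if toy_index:` is False until something has been indexed.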
Other minor features
- Add a `to_json` alias to `BaseDoc` (#1494)
🐞 Bug Fixes
Point to older versions when importing Document or DocumentArray (#1422)
Trying to load Document or DocumentArray from DocArray would previously raise an error, saying that you needed to downgrade your version of DocArray if you wanted to use these two objects. This behavior has been fixed.
Fix AnyDoc from_protobuf (#1437)
AnyDoc can now read any BaseDoc protobuf file. The same applies to DocList.
Other bug fixes
- Fix infinite recursion when calling `extend` on a `DocList` with itself (#1493)
- Fix bug when calling `dict()` on `BaseDoc` (#1481)
- Fix bug when calling `json()` on `BaseDoc` (#1481)
- Support Pandas 2.0 by using `pd.concat()` instead of `df.append()` in `to_dataframe()` to avoid warning (#1478)
- Add logs to Elasticsearch index (#1427)
- Fix a bug in Document Index where Torch tensors that required grad could not be converted to `ndarray` (#1429)
- Fix a bug with HNSW by passing max_element when loading an index in hnswlib (#1426)
- Hubble binary format version bump (#1414)
- Save index during creation for `hnswlib` (#1424)
📗 Documentation Improvements
- Fix FastAPI docs (#1453)
- Index predefined Documents (#1434)
- Clean up data types section (#1412)
- Remove duplicate API reference section (#1408)
- Fix Docindex URLs (#1433)
- Fix install commands hint (#1421)
- Add Google Analytics (#1432)
- Add install instructions for `hnswlib` and `elastic` document indexes (#1431)
- Various fixes (#1436, #1417, #1423, #1418, #1411, #1419)
🤟 Contributors
We would like to thank all contributors to this release:
- Alex Cureton-Griffiths (@alexcg1)
- samsja (@samsja)
- Johannes Messner (@JohannesMessner)
- Anne Yang (@AnneYang720)
- Scott Martens (@scott-martens)
- カレン (@RStar2022)
- Aman Agarwal (@agaraman0)
- Yanlong Wang (@nomagick)
- Charlotte Gerhaher (@anna-charlotte)