Merged
45 commits
0d00c3f
docs: add hnswDocumentIndex
nan-wang Apr 7, 2023
4ea4b7d
docs: add the crud operations
nan-wang Apr 7, 2023
1429e4c
docs: add docs
nan-wang Apr 7, 2023
8cd8d81
docs: rename the file
nan-wang Apr 7, 2023
de8b0cb
docs: complete the hnswlib index
nan-wang Apr 8, 2023
b151c52
docs: add the index api
nan-wang Apr 8, 2023
3d14556
docs: add elastic index
nan-wang Apr 8, 2023
bffd2b5
docs: complete the elastic index
nan-wang Apr 8, 2023
4538559
docs: add geolocation filter example
nan-wang Apr 9, 2023
6aaa21e
docs: update elastic index
AnneYang720 Apr 12, 2023
cc1719d
Merge branch 'feat-rewrite-v2' into docs-index
AnneYang720 Apr 12, 2023
2c7146f
docs: update es index filter and todo
AnneYang720 Apr 12, 2023
0eba062
docs: es index querybuilder and runtimeconfig
AnneYang720 Apr 12, 2023
f89a710
docs: add doc index docs
JohannesMessner Apr 12, 2023
840078f
Merge branch 'feat-rewrite-v2' into docs-index
AnneYang720 Apr 13, 2023
b3bc25d
ci: fix elastic in documentation
AnneYang720 Apr 13, 2023
d374df4
Merge branch 'feat-rewrite-v2' into docs-index
AnneYang720 Apr 13, 2023
d971e95
docs: fix elastic code examples
AnneYang720 Apr 13, 2023
43c4122
docs: add more stuff
JohannesMessner Apr 13, 2023
135c0df
Merge remote-tracking branch 'origin/docs-index' into docs-index
JohannesMessner Apr 13, 2023
714b14b
fix: test docs
AnneYang720 Apr 13, 2023
c4b5ea7
fix: import es fixture for docs test
AnneYang720 Apr 13, 2023
0cb7f9d
docs: add back code snippet
JohannesMessner Apr 13, 2023
bbd4f18
fix: minor fix
AnneYang720 Apr 13, 2023
d1ad967
docs: add info about config stuff
JohannesMessner Apr 13, 2023
3428220
Merge remote-tracking branch 'origin/docs-index' into docs-index
JohannesMessner Apr 13, 2023
f6b7546
Merge branch 'feat-rewrite-v2' into docs-index
AnneYang720 Apr 14, 2023
28aa3ed
fix: mypy
AnneYang720 Apr 14, 2023
effd8dd
docs: explain advanced configs
JohannesMessner Apr 14, 2023
b05bc09
Merge remote-tracking branch 'origin/docs-index' into docs-index
JohannesMessner Apr 14, 2023
7d28c02
docs: add missing import
JohannesMessner Apr 14, 2023
2a96868
docs: add backend specific docs
JohannesMessner Apr 14, 2023
15f660d
docs: remove unneeded snippets
JohannesMessner Apr 14, 2023
96216f4
docs: polishing
JohannesMessner Apr 17, 2023
99038ca
docs: add qdrant
JohannesMessner Apr 17, 2023
921fefc
docs: fix code snippets
AnneYang720 Apr 17, 2023
0fbb24e
docs: tweak es docs
JohannesMessner Apr 17, 2023
70557a1
Merge remote-tracking branch 'origin/docs-index' into docs-index
JohannesMessner Apr 17, 2023
95df90a
docs: add explanation of vector search
JohannesMessner Apr 17, 2023
5c41d20
docs: add nested stuff
JohannesMessner Apr 17, 2023
c73324b
Merge branch 'main' into docs-index
JohannesMessner Apr 17, 2023
988c177
docs: fix rendering and links
JohannesMessner Apr 17, 2023
bbdab4d
docs: fix rendering of tabs
JohannesMessner Apr 17, 2023
8ff9979
test: exclude index docs from doctests
JohannesMessner Apr 17, 2023
69ea7f3
Merge branch 'main' into docs-index
JohannesMessner Apr 17, 2023
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -101,6 +101,7 @@ jobs:
python -m pip install --upgrade pip
python -m pip install poetry
poetry install --all-extras
+poetry run pip install elasticsearch==8.6.2
sudo apt-get update
sudo apt-get install --no-install-recommends ffmpeg

@@ -147,6 +148,7 @@ jobs:
python -m pip install poetry
rm poetry.lock
poetry install --all-extras
+poetry run pip install elasticsearch==8.6.2
sudo apt-get update
sudo apt-get install --no-install-recommends ffmpeg

8 changes: 4 additions & 4 deletions docarray/index/backends/elastic.py
@@ -58,7 +58,7 @@ class ElasticDocIndex(BaseDocIndex, Generic[TSchema]):
def __init__(self, db_config=None, **kwargs):
"""Initialize ElasticDocIndex"""
super().__init__(db_config=db_config, **kwargs)
-self._db_config = cast(self.DBConfig, self._db_config)
+self._db_config = cast(ElasticDocIndex.DBConfig, self._db_config)

# ElasticSearch client creation
if self._db_config.index_name is None:
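
The switch from `cast(self.DBConfig, ...)` to `cast(ElasticDocIndex.DBConfig, ...)` presumably serves static type checking (one commit in this PR is titled `fix: mypy`). `typing.cast` does nothing at runtime, but a type checker needs its first argument to be a type it can resolve statically. A minimal sketch of the pattern, using a hypothetical `Index` class rather than the real `ElasticDocIndex`:

```python
from dataclasses import dataclass
from typing import Any, Optional, cast


class Index:
    """Hypothetical stand-in for a document index; not DocArray's class."""

    @dataclass
    class DBConfig:
        index_name: Optional[str] = None

    def __init__(self, db_config: Any) -> None:
        # cast() returns its argument unchanged at runtime; it only tells the
        # type checker to treat the value as Index.DBConfig from here on.
        # Naming the class explicitly (instead of self.DBConfig) gives the
        # checker a type expression it can resolve statically.
        self._db_config = cast(Index.DBConfig, db_config)


config = Index.DBConfig(index_name="docs")
index = Index(config)
```

Because `cast` is a runtime no-op, `index._db_config` is the very same object as `config`; only the statically inferred type changes.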
@@ -406,7 +406,7 @@ def execute_query(self, query: Dict[str, Any], *args, **kwargs) -> Any:
resp = self._client.search(index=self._index_name, **query)
docs, scores = self._format_response(resp)

-return _FindResult(documents=docs, scores=scores)
+return _FindResult(documents=docs, scores=parse_obj_as(NdArray, scores))

def _find(
self, query: np.ndarray, limit: int, search_field: str = ''
@@ -417,7 +417,7 @@ def _find(

docs, scores = self._format_response(resp)

-return _FindResult(documents=docs, scores=scores)
+return _FindResult(documents=docs, scores=parse_obj_as(NdArray, scores))

def _find_batched(
self,
@@ -576,7 +576,7 @@ def _form_text_search_body(
}
return body

-def _format_response(self, response: Any) -> Tuple[List[Dict], NdArray]:
+def _format_response(self, response: Any) -> Tuple[List[Dict], List[Any]]:
docs = []
scores = []
for result in response['hits']['hits']:
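
The annotation change above reflects that the helper accumulates plain Python lists, which is why the callers now convert the scores with `parse_obj_as(NdArray, scores)`. The loop body is cut off in this diff, so the following is only a sketch of the general shape: `format_hits` is a hypothetical name, and it assumes the standard `_source` and `_score` keys of an Elasticsearch search hit.

```python
from typing import Any, Dict, List, Tuple


def format_hits(response: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Split a search response into parallel lists of documents and scores."""
    docs: List[Dict[str, Any]] = []
    scores: List[Any] = []
    for result in response['hits']['hits']:
        # '_source' and '_score' are the usual per-hit keys in an
        # Elasticsearch search response; how the real method post-processes
        # them is not visible in this diff.
        docs.append(result['_source'])
        scores.append(result['_score'])
    return docs, scores


# Hand-written response, for illustration only.
sample = {
    'hits': {
        'hits': [
            {'_source': {'id': 'a'}, '_score': 1.5},
            {'_source': {'id': 'b'}, '_score': 0.7},
        ]
    }
}
docs, scores = format_hits(sample)
```

The returned `scores` is an ordinary list, so any conversion to an array type has to happen at the call site, as the hunks above do.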
4 changes: 3 additions & 1 deletion docarray/index/backends/elasticv7.py
@@ -3,11 +3,13 @@
from typing import Any, Dict, List, Optional, Sequence, TypeVar, Union

import numpy as np
+from pydantic import parse_obj_as

from docarray import BaseDoc
from docarray.index import ElasticDocIndex
from docarray.index.abstract import BaseDocIndex, _ColumnInfo
from docarray.typing import AnyTensor
+from docarray.typing.tensor.ndarray import NdArray
from docarray.utils.find import _FindResult

TSchema = TypeVar('TSchema', bound=BaseDoc)
@@ -120,7 +122,7 @@ def execute_query(self, query: Dict[str, Any], *args, **kwargs) -> Any:
resp = self._client.search(index=self._index_name, body=query)
docs, scores = self._format_response(resp)

-return _FindResult(documents=docs, scores=scores)
+return _FindResult(documents=docs, scores=parse_obj_as(NdArray, scores))

###############################################
# Helpers #
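
Comparing this hunk with the `execute_query` in `elastic.py`, the two backends pass the same query dict to the client differently: `elastic.py` unpacks it into keyword arguments (`**query`), while `elasticv7.py` passes it whole as `body=query`. The sketch below illustrates only that calling-convention difference. Both client classes are hypothetical stubs, not the real `elasticsearch` client.

```python
from typing import Any, Dict


class StubClientV8:
    """Hypothetical stand-in: accepts search fields as keyword arguments."""

    def search(self, *, index: str, **fields: Any) -> Dict[str, Any]:
        return {'index': index, 'request': fields}


class StubClientV7:
    """Hypothetical stand-in: accepts the whole query as one `body` dict."""

    def search(self, *, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        return {'index': index, 'request': body}


query = {'query': {'match_all': {}}, 'size': 10}

# Same query, two calling conventions.
v8_resp = StubClientV8().search(index='docs', **query)
v7_resp = StubClientV7().search(index='docs', body=query)
```

With either convention the stub ends up with the same request content, which is why the surrounding code in `execute_query` can otherwise stay the same in both backends.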
2 changes: 1 addition & 1 deletion docarray/index/backends/hnswlib.py
@@ -28,8 +28,8 @@
_raise_not_supported,
)
from docarray.proto import DocProto
-from docarray.typing import NdArray
from docarray.typing.tensor.abstract_tensor import AbstractTensor
+from docarray.typing.tensor.ndarray import NdArray
from docarray.utils._internal.misc import import_library, is_np_int
from docarray.utils.filter import filter_docs
from docarray.utils.find import _FindResult, _FindResultBatched
5 changes: 3 additions & 2 deletions docs/.gitignore
@@ -1,6 +1,7 @@
api/*
proto/*

-../README.md
-index.md
+<<<<<<< HEAD
+README.md
+#index.md
CONTRIBUTING.md
3 changes: 3 additions & 0 deletions docs/api_references/index/backends.md
@@ -0,0 +1,3 @@
+# Backends
+
+::: docarray.index.backends
21 changes: 19 additions & 2 deletions docs/user_guide/storing/first_step.md
@@ -1,9 +1,9 @@
-# Intro
+# Overview

In the previous sections we saw how to use [`BaseDoc`][docarray.base_doc.doc.BaseDoc], [`DocList`][docarray.array.doc_list.doc_list.DocList] and [`DocVec`][docarray.array.doc_vec.doc_vec.DocVec] to represent multi-modal data and send it over the wire.
In this section we will see how to store and persist this data.

-DocArray offers to ways of storing your data:
+DocArray offers two ways of storing your data, each of which has its own documentation section:

1. In a **[Document Store](#document-store)** for simple long-term storage
2. In a **[Document Index](#document-index)** for fast retrieval using vector similarity
@@ -24,3 +24,20 @@ This section covers the following three topics:
- [Store on S3](doc_store/store_s3.md)

## Document Index
+
+A Document Index lets you store your Documents and search through them using vector similarity.
+
+This is useful if you want to store a bunch of data, and at a later point retrieve Documents that are similar to
+some query that you provide.
+Concrete examples where this is relevant are neural search applications, augmenting LLMs and chatbots with domain knowledge ([Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)),
+or recommender systems.
+
+DocArray's Document Index concept achieves this by providing a unified interface to a number of [vector databases](https://learn.microsoft.com/en-us/semantic-kernel/concepts-ai/vectordb).
+In fact, you can think of Document Index as an **[ORM](https://sqlmodel.tiangolo.com/db-to-code/) for vector databases**.
+
+Currently, DocArray supports the following vector databases:
+
+- [Weaviate](https://weaviate.io/) | [Docs](index_weaviate.md)
+- [Qdrant](https://qdrant.tech/) | [Docs](index_qdrant.md)
+- [Elasticsearch](https://www.elastic.co/elasticsearch/) v7 and v8 | [Docs](index_elastic.md)
+- [HNSWlib](https://github.com/nmslib/hnswlib) | [Docs](index_hnswlib.md)
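
To make "search through them using vector similarity" concrete, here is a minimal brute-force sketch in plain Python. It is not DocArray's API and not how the listed backends work internally (they use approximate indexes). The `find` function, the toy vectors, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find(
    query: Sequence[float],
    stored: List[Tuple[str, Sequence[float]]],
    limit: int,
) -> List[Tuple[str, float]]:
    """Return the `limit` stored ids most similar to `query`, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy "index": ids with 2-dimensional embeddings (made-up values).
stored = [
    ('cat', [1.0, 0.0]),
    ('dog', [0.8, 0.6]),
    ('car', [0.0, 1.0]),
]
results = find([1.0, 0.1], stored, limit=2)
```

Here the query vector points almost along the first axis, so `cat` ranks first and `dog` second. A vector database answers the same kind of question, but over far more vectors and without comparing the query against every stored item.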