6 changes: 3 additions & 3 deletions docs/advanced/document-store/annlite.md
@@ -1,7 +1,7 @@
(annlite)=
# Annlite

One can use [Annlite](https://github.com/jina-ai/annlite) as the document store for DocumentArray. It is useful when one wants to have faster Document retrieval on embeddings, i.e. `.match()`, `.find()`.
You can use [Annlite](https://github.com/jina-ai/annlite) as a document store for DocumentArray. It's suitable for faster Document retrieval on embeddings, i.e. `.match()`, `.find()`.

````{tip}
This feature requires `annlite`. You can install it via `pip install "docarray[annlite]"`.
@@ -10,7 +10,7 @@ This feature requires `annlite`. You can install it via `pip install "docarray[a

## Usage

One can instantiate a DocumentArray with Annlite storage like so:
You can instantiate a DocumentArray with Annlite storage like so:

```python
from docarray import DocumentArray
@@ -20,7 +20,7 @@ da = DocumentArray(storage='annlite', config={'n_dim': 10})

Usage is the same as for an ordinary DocumentArray.

To access a DocumentArray formerly persisted, one can specify the `data_path` in `config`.
To access a DocumentArray formerly persisted, you can specify the `data_path` in `config`.

```python
from docarray import DocumentArray
10 changes: 5 additions & 5 deletions docs/advanced/document-store/elasticsearch.md
@@ -2,7 +2,7 @@

# Elasticsearch

One can use [Elasticsearch](https://www.elastic.co) as the document store for DocumentArray. It is useful when one wants to have faster Document retrieval on embeddings, i.e. `.match()`, `.find()`.
You can use [Elasticsearch](https://www.elastic.co) as a document store for DocumentArray. It's suitable for faster Document retrieval on embeddings, i.e. `.match()`, `.find()`.

````{tip}
This feature requires `elasticsearch`. You can install it via `pip install "docarray[elasticsearch]"`.
@@ -41,7 +41,7 @@ docker-compose up

### Create DocumentArray with Elasticsearch backend

Assuming service is started using the default configuration (i.e. server address is `http://localhost:9200`), one can instantiate a DocumentArray with Elasticsearch storage as such:
Assuming the service is started with the default configuration (i.e. the server address is `http://localhost:9200`), you can instantiate a DocumentArray with Elasticsearch storage as follows:

```python
from docarray import DocumentArray
@@ -70,7 +70,7 @@ da = DocumentArray(

See [the official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#elasticsearch-security-certificates) for how to obtain the certificate, password, etc.

To access a DocumentArray formerly persisted, one can specify `index_name` and the hosts.
To access a DocumentArray formerly persisted, you can specify `index_name` and the hosts.

The following example will build a DocumentArray with previously stored data from `old_stuff` on `http://localhost:9200`:

@@ -160,7 +160,7 @@ You can read more about parallel bulk config and their default values [here](htt

### Vector search with filter query

One can perform Approximate Nearest Neighbor Search and pre-filter results using a filter query that follows [ElasticSearch's DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html).
You can perform Approximate Nearest Neighbor Search and pre-filter results using a filter query that follows [ElasticSearch's DSL](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html).

Consider Documents with embeddings `[0,0,0]` up to `[9,9,9]` where the Document with embedding `[i,i,i]`
has a tag `price` with value `i`. We can create such an example with the following code:
@@ -238,7 +238,7 @@ You can read more about approximate kNN tuning [here](https://www.elastic.co/gui

### Search by filter query

One can search with user-defined query filters using the `.find` method. Such queries can be constructed following the
You can search with user-defined query filters using the `.find` method. Such queries can be constructed following the
guidelines in [ElasticSearch's Documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html).

Consider you store Documents with a certain tag `price` into ElasticSearch and you want to retrieve all Documents
18 changes: 9 additions & 9 deletions docs/advanced/document-store/extend.md
@@ -25,7 +25,7 @@ Let's get started!

## Step 1: create the folder

Go to `docarray/array/storage` folder, create a sub-folder for your document store. Let's call it `mydocstore`. You will need to create four empty files in that folder:
Go to the `docarray/array/storage` folder and create a sub-folder for your document store. Let's call it `mydocstore`. You need to create four empty files in that folder:

```{code-block}
---
@@ -80,7 +80,7 @@ class GetSetDelMixin(BaseGetSetDelMixin):
...
```

You will need to implement the above five functions, which correspond to the logics of get/set/delete items via a string `.id`. They are essential to ensure DocumentArray works.
You need to implement the above five functions, which correspond to the logic of getting, setting and deleting items via a string `.id`. They are essential to ensure DocumentArray works.
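
As a rough illustration of what these five functions are responsible for, here is a self-contained sketch that uses a plain dictionary as the backend. It does not subclass `BaseGetSetDelMixin`: the class name `InMemoryGetSetDel`, the attribute names and the method names are assumptions made for this example, and a real document store would talk to its database instead. Unlike DocArray, where the position-to-id mapping is maintained for you by the base class, this toy updates the mapping itself so that it can run on its own.

```python
from typing import Dict, List


class InMemoryGetSetDel:
    """Toy stand-in for a get/set/delete mixin. Not DocArray code."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}        # hypothetical backend: id -> document
        self._offset2ids: List[str] = []         # position -> id, gives list-like order
        self._persisted_offsets: List[str] = []  # pretend this lives in the database

    def _get_doc_by_id(self, _id: str) -> dict:
        return self._store[_id]

    def _set_doc_by_id(self, _id: str, value: dict) -> None:
        # Upsert: insert when the id is new, overwrite when it already exists.
        if _id not in self._store:
            self._offset2ids.append(_id)
        self._store[_id] = value

    def _del_doc_by_id(self, _id: str) -> None:
        del self._store[_id]
        self._offset2ids.remove(_id)

    def _save_offset2ids(self) -> None:
        # Persist the ordering so it survives a reload.
        self._persisted_offsets = list(self._offset2ids)

    def _load_offset2ids(self) -> None:
        # Restore the ordering from the persisted copy.
        self._offset2ids = list(self._persisted_offsets)
```

Setting the same id twice leaves a single entry in both the store and the ordering, which is the upsert behaviour the guide asks for.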

Note that DocumentArray maintains an `offset2ids` mapping to allow a list-like behaviour. This mapping is
inherited from the `BaseGetSetDelMixin`. Therefore, you need to implement methods to persist this mapping, in case you
@@ -111,9 +111,9 @@ upper level. Also, make sure that `_set_doc_by_id` performs an **upsert operatio
```{tip}
Let's call the above five functions **the essentials**.

If you aim for high performance, it is recommeneded to implement other methods *without* leveraging your essentials. They are: `_get_docs_by_ids`, `_del_docs_by_ids`, `_clear_storage`, `_set_doc_value_pairs`, `_set_doc_value_pairs_nested`, `_set_docs_by_ids`. One can get their full signatures from {class}`~docarray.array.storage.base.getsetdel.BaseGetSetDelMixin`. These functions define more fine-grained get/set/delete logics that are frequently used in DocumentArray.
If you aim for high performance, it is recommended to implement other methods *without* leveraging your essentials. They are: `_get_docs_by_ids`, `_del_docs_by_ids`, `_clear_storage`, `_set_doc_value_pairs`, `_set_doc_value_pairs_nested`, `_set_docs_by_ids`. You can get their full signatures from {class}`~docarray.array.storage.base.getsetdel.BaseGetSetDelMixin`. These functions define more fine-grained get/set/delete logic that is frequently used in DocumentArray.

Implementing them is fully optional, and you can only implement some of them not all of them. If you are not implementing them, those methods will use a generic-but-slow version that is based on your five essentials.
Implementing them is fully optional, and you can implement only some of them. Any method you do not implement falls back to a generic-but-slow version based on your five essentials.
```
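
The trade-off described in the tip can be sketched with a small standalone example. It is not DocArray code: `GenericBase`, `DictStore`, `BatchedDictStore` and the `calls` counter are hypothetical names used only to show how a batch method can fall back on a single-item essential, and how overriding it reduces the number of simulated backend round trips.

```python
from typing import Dict, Iterable, List


class GenericBase:
    """Hypothetical base class: the batch method falls back on the essential."""

    def _get_doc_by_id(self, _id: str) -> dict:
        raise NotImplementedError

    def _get_docs_by_ids(self, ids: Iterable[str]) -> List[dict]:
        # Generic but slow: one backend call per id.
        return [self._get_doc_by_id(_id) for _id in ids]


class DictStore(GenericBase):
    """Implements only the essential and inherits the generic batch method."""

    def __init__(self, data: Dict[str, dict]) -> None:
        self._data = dict(data)
        self.calls = 0  # counts simulated backend round trips

    def _get_doc_by_id(self, _id: str) -> dict:
        self.calls += 1
        return self._data[_id]


class BatchedDictStore(DictStore):
    """Overrides the batch method without going through the essential."""

    def _get_docs_by_ids(self, ids: Iterable[str]) -> List[dict]:
        # One simulated round trip for the whole batch.
        self.calls += 1
        return [self._data[_id] for _id in ids]
```

Both classes return the same Documents for the same ids; only the number of simulated round trips differs, which is the reason for implementing the optional methods directly when the backend supports batch operations.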

```{seealso}
@@ -149,7 +149,7 @@ class SequenceLikeMixin(BaseSequenceLikeMixin):
...

def insert(self, index: int, value: 'Document'):
# Optional. By default, this will add a new item and update offset2id
# Optional. By default, this adds a new item and updates offset2id
# if you want to customize this, make sure to handle offset2id
...

@@ -162,7 +162,7 @@ class SequenceLikeMixin(BaseSequenceLikeMixin):
...

def __iter__(self) -> Iterator['Document']:
# Optional. By default, this will rely on offset2id to iterate
# Optional. By default, this relies on offset2id to iterate
...
```

@@ -244,7 +244,7 @@ By default, this should be set to `True`.
Further, you have to store the value of this flag in `self._list_like`. Some methods that are handled outside of your control will take the value from there and use it appropriately.

`_init_storage` is a very important function to be called during the DocumentArray construction.
You will need to handle different construction & copy behaviors in this function.
You need to handle different construction and copy behaviors in this function.

`_ensure_unique_config` is needed to support DocArray's subindex feature.
A subindex inherits its configuration from the root index, unless a field of the configuration is explicitly provided to the subindex.
@@ -308,7 +308,7 @@ class StorageMixins(BackendMixin, GetSetDelMixin, SequenceLikeMixin, ABC):
...
```

Just copy-paste it will do the work.
Just copying and pasting it should work.

If you have implemented a `find.py` module, make sure to also inherit the `FindMixin`:
```python
@@ -391,7 +391,7 @@ Done! Now you should be able to use it like `DocumentArrayMyDocStore`!

## On pull request: add tests and type-hint

Welcome to contribute your extension back to DocArray. You will need to include `DocumentArrayMyDocStore` in at least the following tests:
You are welcome to contribute your extension back to DocArray. You need to include `DocumentArrayMyDocStore` in at least the following tests:

```text
tests/unit/array/test_advance_indexing.py