Merged
19 changes: 10 additions & 9 deletions docs/advanced/document-store/annlite.md
@@ -38,15 +38,16 @@ Other functions behave the same as in-memory DocumentArray.

The following configs can be set:

| Name | Description | Default |
|-------------------|---------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| `n_dim` | Number of dimensions of embeddings to be stored and retrieved | **This is always required** |
| `data_path` | The data folder where the data is located | **A random temp folder** |
| `metric` | Distance metric to be used during search. Can be 'cosine', 'dot' or 'euclidean' | 'cosine' |
| `ef_construction` | The size of the dynamic list for the nearest neighbors (used during the construction) | `None`, defaults to the default value in the AnnLite package* |
| `ef_search` | The size of the dynamic list for the nearest neighbors (used during the search) | `None`, defaults to the default value in the AnnLite package* |
| `max_connection` | The number of bi-directional links created for every new element during construction. | `None`, defaults to the default value in the AnnLite package* |
| `n_components` | The output dimension of PCA model. Should be a positive number and less than `n_dim` if it's not `None` | `None`, defaults to the default value in the AnnLite package* |
| Name | Description | Default |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| `n_dim` | Number of dimensions of embeddings to be stored and retrieved | **This is always required** |
| `data_path` | The data folder where the data is located | **A random temp folder** |
| `metric` | Distance metric to be used during search. Can be 'cosine', 'dot' or 'euclidean' | 'cosine' |
| `ef_construction` | The size of the dynamic list for the nearest neighbors (used during the construction) | `None`, defaults to the default value in the AnnLite package* |
| `ef_search` | The size of the dynamic list for the nearest neighbors (used during the search) | `None`, defaults to the default value in the AnnLite package* |
| `max_connection` | The number of bi-directional links created for every new element during construction. | `None`, defaults to the default value in the AnnLite package* |
| `n_components` | The output dimension of PCA model. Should be a positive number and less than `n_dim` if it's not `None` | `None`, defaults to the default value in the AnnLite package* |
| `list_like` | Controls if ordering of Documents is persisted in the Database. Disabling this breaks list-like features, but can improve performance. | True |

*You can check the default values in [the AnnLite source code](https://github.com/jina-ai/annlite/blob/main/annlite/core/index/hnsw/index.py)

25 changes: 13 additions & 12 deletions docs/advanced/document-store/elasticsearch.md
@@ -391,18 +391,19 @@ results = da.find('cheap', index='price')

The following configs can be set:

| Name | Description | Default |
|-------------------|-------------------------------------------------------------------------------------------------------|---------------------------------------------------------|
| `hosts` | Hostname of the Elasticsearch server | `http://localhost:9200` |
| `es_config` | Other ES configs in a Dict and pass to `Elasticsearch` client constructor, e.g. `cloud_id`, `api_key` | None |
| `index_name` | Elasticsearch index name; the class name of Elasticsearch index object to set this DocumentArray | None |
| `n_dim` | Dimensionality of the embeddings | None |
| `distance` | Similarity metric in Elasticsearch | `cosine` |
| `ef_construction` | The size of the dynamic list for the nearest neighbors. | `None`, defaults to the default value in ElasticSearch* |
| `m` | The number of neighbors each node is connected to in the HNSW graph | `None`, defaults to the default value in ElasticSearch* |
| `index_text` | Boolean flag indicating whether to index `.text` or not | False |
| `tag_indices` | List of tags to index | False |
| `batch_size` | Batch size used to handle storage refreshes/updates | 64 |
| Name | Description | Default |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|
| `hosts` | Hostname of the Elasticsearch server | `http://localhost:9200` |
| `es_config` | Other ES configs in a Dict and pass to `Elasticsearch` client constructor, e.g. `cloud_id`, `api_key` | None |
| `index_name` | Elasticsearch index name; the class name of Elasticsearch index object to set this DocumentArray | None |
| `n_dim` | Dimensionality of the embeddings | None |
| `distance` | Similarity metric in Elasticsearch | `cosine` |
| `ef_construction` | The size of the dynamic list for the nearest neighbors. | `None`, defaults to the default value in ElasticSearch* |
| `m` | The number of neighbors each node is connected to in the HNSW graph | `None`, defaults to the default value in ElasticSearch* |
| `index_text` | Boolean flag indicating whether to index `.text` or not | False |
| `tag_indices` | List of tags to index | False |
| `batch_size` | Batch size used to handle storage refreshes/updates | 64 |
| `list_like` | Controls if ordering of Documents is persisted in the Database. Disabling this breaks list-like features, but can improve performance. | True |

```{tip}
You can read more about HNSW parameters and their default values [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params)
3 changes: 2 additions & 1 deletion docs/advanced/document-store/index.md
@@ -564,7 +564,8 @@ The solution is simple: use {ref}`column-selector<bulk-access>`:
da[0, 'text'] = 'hello'
```

### Performance Issue caused by List-like structure
### Performance issue caused by list-like structure

DocArray allows list-like behavior by adding an offset-to-id mapping structure to storage backends. With this structure in place, the database stores,
along with the documents, meta information about document order.
However, list_like behavior is not useful in indexers where concurrent usage is possible and users do not need information about document location.
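
To make the trade-off concrete, here is a minimal sketch of an offset-to-id mapping. It is not DocArray's implementation: `ToyStore` and its attributes are hypothetical names invented for illustration, and a plain dict stands in for the storage backend. It only shows that integer access depends on the extra order metadata, while id access does not.

```python
from typing import Dict, List, Optional, Union


class ToyStore:
    """Hypothetical id-keyed store with an optional offset-to-id mapping."""

    def __init__(self, list_like: bool = True) -> None:
        self.list_like = list_like
        # Stand-in for the backend: document id -> payload.
        self._docs: Dict[str, str] = {}
        # Order metadata, kept only when list-like behavior is enabled.
        self._offset2id: Optional[List[str]] = [] if list_like else None

    def append(self, doc_id: str, payload: str) -> None:
        is_new = doc_id not in self._docs
        self._docs[doc_id] = payload
        if self._offset2id is not None and is_new:
            # Extra bookkeeping on every insert of a new document.
            self._offset2id.append(doc_id)

    def __getitem__(self, key: Union[int, str]) -> str:
        if isinstance(key, int):
            if self._offset2id is None:
                raise TypeError("integer access requires list_like=True")
            # Translate the position into an id, then look the id up.
            return self._docs[self._offset2id[key]]
        return self._docs[key]


ordered = ToyStore(list_like=True)
ordered.append("a", "hello")
ordered.append("b", "world")
first = ordered[0]       # resolved through the offset-to-id mapping
by_id = ordered["b"]     # id lookup never touches the mapping

unordered = ToyStore(list_like=False)
unordered.append("a", "hello")
by_id_only = unordered["a"]  # still works without order metadata
```

In this sketch, disabling `list_like` removes one write per insert and the need to keep the mapping consistent across concurrent writers, at the cost of losing positional access.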
35 changes: 18 additions & 17 deletions docs/advanced/document-store/qdrant.md
@@ -76,23 +76,24 @@ Other functions behave the same as in-memory DocumentArray.

The following configs can be set:

| Name | Description | Default |
|-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|
| `n_dim` | Number of dimensions of embeddings to be stored and retrieved | **This is always required** |
| `collection_name` | Qdrant collection name client | **Random collection name generated** |
| `distance` | Distance metric to use during search. Can be 'cosine', 'dot' or 'euclidean' | `'cosine'` |
| `host` | Hostname of the Qdrant server | `'localhost'` |
| `port` | Port of the Qdrant server | `6333` |
| `grpc_port` | Port of the Qdrant gRPC interface | `6334` |
| `prefer_grpc` | Set `True` to use gRPC interface whenever possible in custom methods | `False` |
| `api_key` | API key for authentication in Qdrant Cloud | `None` |
| `https` | Set `True` to use HTTPS(SSL) protocol | `None` |
| `serialize_config` | [Serialization config of each Document](../../../fundamentals/document/serialization.md) | `None` |
| `scroll_batch_size` | Batch size used when scrolling over the storage | `64` |
| `ef_construct` | Number of neighbours to consider during the index building. Larger = more accurate search, more time to build index | `None`, defaults to the default value in Qdrant* |
| `full_scan_threshold` | Minimal size (in KiloBytes) of vectors for additional payload-based indexing | `None`, defaults to the default value in Qdrant* |
| `m` | Number of edges per node in the index graph. Larger = more accurate search, more space required | `None`, defaults to the default value in Qdrant* |
| `columns` | Other fields to store in Document | `None` |
| Name | Description | Default |
|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|
| `n_dim` | Number of dimensions of embeddings to be stored and retrieved | **This is always required** |
| `collection_name` | Qdrant collection name client | **Random collection name generated** |
| `distance` | Distance metric to use during search. Can be 'cosine', 'dot' or 'euclidean' | `'cosine'` |
| `host` | Hostname of the Qdrant server | `'localhost'` |
| `port` | Port of the Qdrant server | `6333` |
| `grpc_port` | Port of the Qdrant gRPC interface | `6334` |
| `prefer_grpc` | Set `True` to use gRPC interface whenever possible in custom methods | `False` |
| `api_key` | API key for authentication in Qdrant Cloud | `None` |
| `https` | Set `True` to use HTTPS(SSL) protocol | `None` |
| `serialize_config` | [Serialization config of each Document](../../../fundamentals/document/serialization.md) | `None` |
| `scroll_batch_size` | Batch size used when scrolling over the storage | `64` |
| `ef_construct` | Number of neighbours to consider during the index building. Larger = more accurate search, more time to build index | `None`, defaults to the default value in Qdrant* |
| `full_scan_threshold` | Minimal size (in KiloBytes) of vectors for additional payload-based indexing | `None`, defaults to the default value in Qdrant* |
| `m` | Number of edges per node in the index graph. Larger = more accurate search, more space required | `None`, defaults to the default value in Qdrant* |
| `columns` | Other fields to store in Document | `None` |
| `list_like` | Controls if ordering of Documents is persisted in the Database. Disabling this breaks list-like features, but can improve performance. | True |

*You can read more about the HNSW parameters and their default values [here](https://qdrant.tech/documentation/indexing/#vector-index)
