Merged
19 changes: 14 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -32,8 +32,9 @@ DocArray handles your data while integrating seamlessly with the rest of your **
- :chains: DocArray data can be sent as JSON over **HTTP** or as **[Protobuf](https://protobuf.dev/)** over **[gRPC](https://grpc.io/)**


> :bulb: **Where are you coming from?** Depending on your use case and background, there are different was to "get" DocArray.
> You can navigate to the following section for an explanation that should fit your mindest:
> :bulb: **Where are you coming from?** Depending on your use case and background, there are different ways to "get" DocArray.
> You can navigate to the following section for an explanation that should fit your mindset:
>
> - [Coming from pure PyTorch or TensorFlow](#coming-from-pytorch)
> - [Coming from Pydantic](#coming-from-pydantic)
> - [Coming from FastAPI](#coming-from-fastapi)
@@ -46,7 +47,8 @@ DocArray was released under the open-source [Apache License 2.0](https://github.
DocArray allows you to **represent your data**, in an ML-native way.

This is useful for different use cases:
- :running_woman: You are **training a model**, there are myriads of tensors of different shapes and sizes flying around, representing different _things_, and you want to keep a straight head about them

- :woman_running: You are **training a model**, there are myriads of tensors of different shapes and sizes flying around, representing different _things_, and you want to keep a straight head about them
- :cloud: You are **serving a model**, for example through FastAPI, and you want to specify your API endpoints
- :card_index_dividers: You are **parsing data** for later use in your ML or DS applications

@@ -61,6 +63,7 @@ from docarray import BaseDoc
from docarray.typing import TorchTensor, ImageUrl
import torch


# Define your data model
class MyDocument(BaseDoc):
description: str
@@ -95,6 +98,7 @@ from docarray.typing import TorchTensor, ImageUrl
from typing import Optional
import torch


# Define your data model
class MyDocument(BaseDoc):
description: str
@@ -160,6 +164,7 @@ That's why you can easily collect multiple `Documents`:
When building or interacting with an ML system, usually you want to process multiple Documents (data points) at once.

DocArray offers two data structures for this:

- **`DocVec`**: A vector of `Documents`. All tensors in the `Documents` are stacked up into a single tensor. **Perfect for batch processing and use inside of ML models**.
- **`DocList`**: A list of `Documents`. All tensors in the `Documents` are kept as-is. **Perfect for streaming, re-ranking, and shuffling of data**.

@@ -185,7 +190,7 @@ vec = DocVec[Image]( # the DocVec is parametrized by your personal schema!
for _ in range(100)
]
)
```
```

As you can see in the code snippet above, `DocVec` is **parametrized by the type of Document** you want to use with it: `DocVec[Image]`.

@@ -263,6 +268,7 @@ DocArray allows you to **send your data**, in an ML-native way.
This means there is native support for **Protobuf and gRPC**, on top of **HTTP** and serialization to JSON, JSONSchema, Base64, and Bytes.

This is useful for different use cases:

- :cloud: You are **serving a model**, for example through **[Jina](https://github.com/jina-ai/jina/)** or **[FastAPI](https://github.com/tiangolo/fastapi/)**
- :spider_web: You **distribute your model** across machines and need to send your data between nodes
- :gear: You are building a **microservice** architecture and need to send your data between microservices
@@ -278,6 +284,7 @@ from docarray import BaseDoc
from docarray.typing import ImageTorchTensor
import torch


# model your data
class MyDocument(BaseDoc):
description: str
@@ -302,7 +309,7 @@ doc_5 = MyDocument.parse_raw(json)
```

Of course, serialization is not all you need.
So check out how DocArray integrates with FatAPI and Jina.
So check out how DocArray integrates with FastAPI and Jina.


## Store
@@ -311,6 +318,7 @@ Once you've modelled your data, and maybe sent it around, usually you want to **
But fret not! DocArray has you covered!

**Document Stores** let you, well, store your Documents, locally or remotely, all with the same user interface:

- :cd: **On disk** as a file in your local file system
- :bucket: On **[AWS S3](https://aws.amazon.com/de/s3/)**
- :cloud: On **[Jina AI Cloud](https://cloud.jina.ai/)**
@@ -348,6 +356,7 @@ dl_2 = DocList[ImageDoc].pull('s3://my-bucket/my-documents', show_progress=True)
**Document Indexes** let you index your Documents into a **vector database**, for efficient similarity-based retrieval.

This is useful for:

- :left_speech_bubble: Augmenting **LLMs and Chatbots** with domain knowledge ([Retrieval Augmented Generation](https://arxiv.org/abs/2005.11401))
- :mag: **Neural search** applications
- :bulb: **Recommender systems**
12 changes: 6 additions & 6 deletions docarray/array/doc_list/io.py
@@ -760,22 +760,22 @@ def save_binary(
"""Save DocList into a binary file.

It will use the protocol to pick how to save the DocList.
If used 'picke-doc_list` and `protobuf-array` the DocList will be stored
If used `pickle-array` or `protobuf-array` as protocol, the DocList will be stored
and compressed as a whole using `pickle` or `protobuf`.
When using `protobuf` or `pickle` as protocol, each Document in the DocList
will be stored individually, which makes it available for streaming.

:param file: File or filename to which the data is saved.
:param protocol: protocol to use. It can be 'pickle-array', 'protobuf-array', 'pickle' or 'protobuf'
:param compress: compress algorithm to use between `lz4`, `bz2`, `lzma`, `zlib`, `gzip`
:param show_progress: show progress bar, only works when protocol is `pickle` or `protobuf`

!!! note
If `file` is `str` it can specify `protocol` and `compress` as file extensions.
This functionality assumes `file=file_name.$protocol.$compress` where `$protocol` and `$compress` refer to a
string interpolation of the respective `protocol` and `compress` methods.
For example if `file=my_docarray.protobuf.lz4` then the binary data will be created using `protocol=protobuf`
and `compress=lz4`.

:param file: File or filename to which the data is saved.
:param protocol: protocol to use. It can be 'pickle-array', 'protobuf-array', 'pickle' or 'protobuf'
:param compress: compress algorithm to use between `lz4`, `bz2`, `lzma`, `zlib`, `gzip`
:param show_progress: show progress bar, only works when protocol is `pickle` or `protobuf`
"""
if isinstance(file, io.BufferedWriter):
file_ctx = nullcontext(file)
4 changes: 3 additions & 1 deletion docarray/array/doc_list/pushpull.py
@@ -38,7 +38,9 @@ def __len__(self) -> int:

@staticmethod
def resolve_url(url: str) -> Tuple[PUSH_PULL_PROTOCOL, str]:
"""Resolve the URL to the correct protocol and name."""
"""Resolve the URL to the correct protocol and name.

:param url: the URL to resolve
:return: the resolved protocol and name
"""
protocol, name = url.split('://', 2)
if protocol in SUPPORTED_PUSH_PULL_PROTOCOLS:
protocol = cast(PUSH_PULL_PROTOCOL, protocol)
14 changes: 9 additions & 5 deletions docarray/array/doc_list/sequence_indexing_mixin.py
@@ -41,12 +41,16 @@ class IndexingSequenceMixin(Iterable[T_item]):

You can index into, delete from, and set items in an IndexingSequenceMixin like a numpy array or torch tensor:

.. code-block:: python
docs[0] # index by position
docs[0:5:2] # index by slice
docs[[0, 2, 3]] # index by list of indices
docs[True, False, True, True, ...] # index by boolean mask
---

```python
docs[0] # index by position
docs[0:5:2] # index by slice
docs[[0, 2, 3]] # index by list of indices
docs[True, False, True, True, ...] # index by boolean mask
```

---

"""

14 changes: 9 additions & 5 deletions docarray/array/doc_vec/list_advance_indexing.py
@@ -11,12 +11,16 @@ class ListAdvancedIndexing(IndexingSequenceMixin[T_item]):

You can index into a ListAdvancedIndexing like a numpy array or torch tensor:

.. code-block:: python
docs[0] # index by position
docs[0:5:2] # index by slice
docs[[0, 2, 3]] # index by list of indices
docs[True, False, True, True, ...] # index by boolean mask
---

```python
docs[0] # index by position
docs[0:5:2] # index by slice
docs[[0, 2, 3]] # index by list of indices
docs[True, False, True, True, ...] # index by boolean mask
```

---

"""

19 changes: 12 additions & 7 deletions docarray/base_doc/docarray_response.py
@@ -15,15 +15,20 @@ class DocArrayResponse(JSONResponse):
This is a custom Response class for FastAPI and Starlette. It is needed
to handle serialization of the Document types when using FastAPI.

EXAMPLE USAGE
.. code-block:: python
from docarray.documets import Text
from docarray.base_doc import DocResponse
---

```python
from docarray.documents import Text
from docarray.base_doc import DocArrayResponse


@app.post("/doc/", response_model=Text, response_class=DocArrayResponse)
async def create_item(doc: Text) -> Text:
    return doc
```

---

@app.post("/doc/", response_model=Text, response_class=DocResponse)
async def create_item(doc: Text) -> Text:
return doc
"""

def render(self, content: Any) -> bytes:
4 changes: 2 additions & 2 deletions docarray/base_doc/mixins/update.py
@@ -28,12 +28,12 @@ def update(self, other: T):
- Setting data properties of the second Document to the first Document
if they are not None
- Concatenating lists and updating sets
- Updating recursively Documents and DocArrays
- Updating recursively Documents and DocLists
- Updating Dictionaries of the left with the right

It behaves as an update operation for Dictionaries, except that since
it is applied to a static schema type, the presence of the field is
given by the field not having a None value and that DocArrays,
given by the field not having a None value and that DocLists,
lists and sets are concatenated. It is worth mentioning that Tuples
are not merged together since they are meant to be immutable,
so they behave as regular types and the value of `self` is updated
2 changes: 1 addition & 1 deletion docarray/computation/abstract_comp_backend.py
@@ -144,7 +144,7 @@ def minmax_normalize(
`tensor` can be a 1D array or a 2D array. When `tensor` is a 2D array, then
normalization is row-based.

.. note::
!!! note
- with `t_range=(0, 1)`, the minimum of the data is normalized to 0 and the maximum to 1;
- with `t_range=(1, 0)`, the minimum of the data is normalized to 1 and the maximum to 0.
3 changes: 2 additions & 1 deletion docarray/computation/numpy_backend.py
@@ -91,7 +91,8 @@ def minmax_normalize(
`tensor` can be a 1D array or a 2D array. When `tensor` is a 2D array, then
normalization is row-based.

.. note::
!!! note

- with `t_range=(0, 1)`, the minimum of the data is normalized to 0 and the maximum to 1;
- with `t_range=(1, 0)`, the minimum of the data is normalized to 1 and the maximum to 0.
3 changes: 2 additions & 1 deletion docarray/computation/torch_backend.py
@@ -147,7 +147,8 @@ def minmax_normalize(
`tensor` can be a 1D array or a 2D array. When `tensor` is a 2D array, then
normalization is row-based.

.. note::
!!! note

- with `t_range=(0, 1)`, the minimum of the data is normalized to 0 and the maximum to 1;
- with `t_range=(1, 0)`, the minimum of the data is normalized to 1 and the maximum to 0.
80 changes: 42 additions & 38 deletions docarray/documents/helper.py
@@ -25,16 +25,6 @@ def create_doc(
) -> Type['T_doc']:
"""
Dynamically create a subclass of BaseDoc. This is a wrapper around pydantic's create_model.
:param __model_name: name of the created model
:param __config__: config class to use for the new model
:param __base__: base class for the new model to inherit from, must be BaseDoc or its subclass
:param __module__: module of the created model
:param __validators__: a dict of method names and @validator class methods
:param __cls_kwargs__: a dict for class creation
:param __slots__: Deprecated, `__slots__` should not be passed to `create_model`
:param field_definitions: fields of the model (or extra fields if a base is supplied)
in the format `<name>=(<type>, <default default>)` or `<name>=<default value>`
:return: the new Document class

```python
from docarray.documents import Audio
@@ -51,6 +41,17 @@
assert issubclass(MyAudio, BaseDoc)
assert issubclass(MyAudio, Audio)
```

:param __model_name: name of the created model
:param __config__: config class to use for the new model
:param __base__: base class for the new model to inherit from, must be BaseDoc or its subclass
:param __module__: module of the created model
:param __validators__: a dict of method names and @validator class methods
:param __cls_kwargs__: a dict for class creation
:param __slots__: Deprecated, `__slots__` should not be passed to `create_model`
:param field_definitions: fields of the model (or extra fields if a base is supplied)
in the format `<name>=(<type>, <default default>)` or `<name>=<default value>`
:return: the new Document class
"""

if not issubclass(__base__, BaseDoc):
@@ -76,32 +77,34 @@ def create_doc_from_typeddict(
):
"""
Create a subclass of BaseDoc based on the fields of a `TypedDict`. This is a wrapper around pydantic's create_model_from_typeddict.
:param typeddict_cls: TypedDict class to use for the new Document class
:param kwargs: extra arguments to pass to `create_model_from_typeddict`
:return: the new Document class

EXAMPLE USAGE
---

.. code-block:: python
```python
from typing_extensions import TypedDict

from typing_extensions import TypedDict
from docarray import BaseDoc
from docarray.documents import Audio
from docarray.documents.helper import create_doc_from_typeddict
from docarray.typing.tensor.audio import AudioNdArray

from docarray import BaseDoc
from docarray.documents import Audio
from docarray.documents.helper import create_doc_from_typeddict
from docarray.typing.tensor.audio import AudioNdArray

class MyAudio(TypedDict):
title: str
tensor: AudioNdArray

class MyAudio(TypedDict):
title: str
tensor: AudioNdArray

Doc = create_doc_from_typeddict(MyAudio, __base__=Audio)

Doc = create_doc_from_typeddict(MyAudio, __base__=Audio)
assert issubclass(Doc, BaseDoc)
assert issubclass(Doc, Audio)
```

assert issubclass(Doc, BaseDoc)
assert issubclass(Doc, Audio)
---

:param typeddict_cls: TypedDict class to use for the new Document class
:param kwargs: extra arguments to pass to `create_model_from_typeddict`
:return: the new Document class
"""

if '__base__' in kwargs:
@@ -122,24 +125,25 @@ def create_doc_from_dict(model_name: str, data_dict: Dict[str, Any]) -> Type['T_
In case `data_dict` contains `None` as a value,
the corresponding field will be viewed as the type `Any`.

:param model_name: Name of the new Document class
:param data_dict: Dictionary of field types to their corresponding values.
:return: the new Document class

EXAMPLE USAGE
---

.. code-block:: python
```python
import numpy as np
from docarray.documents import ImageDoc
from docarray.documents.helper import create_doc_from_dict

import numpy as np
from docarray.documents import ImageDoc
from docarray.documents.helper import create_doc_from_dict
data_dict = {'image': ImageDoc(tensor=np.random.rand(3, 224, 224)), 'author': 'me'}

data_dict = {'image': ImageDoc(tensor=np.random.rand(3, 224, 224)), 'author': 'me'}
MyDoc = create_doc_from_dict(model_name='MyDoc', data_dict=data_dict)

MyDoc = create_doc_from_dict(model_name='MyDoc', data_dict=data_dict)
assert issubclass(MyDoc, BaseDoc)
```

assert issubclass(MyDoc, BaseDoc)
---

:param model_name: Name of the new Document class
:param data_dict: Dictionary of field types to their corresponding values.
:return: the new Document class
"""
if not data_dict:
raise ValueError('`data_dict` should contain at least one item')