docarray · samsja · Apr 3, 2023 · Mar 29, 2023 · Mar 29, 2023 · Mar 29, 2023
diff --git a/README.md b/README.md
@@ -482,13 +482,13 @@ INFO - docarray - HnswDocumentIndex[SimpleDoc] has been initialized
 To try out the alpha you can install it via git:
 
 ```shell
-pip install "git+https://github.com/docarray/[email protected]#egg=docarray[common,torch,image]"
+pip install "git+https://github.com/docarray/[email protected]#egg=docarray[proto,torch,image]"
 ```
 
 ...or from the latest development branch
 
 ```shell
-pip install "git+https://github.com/docarray/docarray@feat-rewrite-v2#egg=docarray[common,torch,image]"
+pip install "git+https://github.com/docarray/docarray@feat-rewrite-v2#egg=docarray[proto,torch,image]"
 ```
 
 ## See also

diff --git a/docs/user_guide/first_step.md b/docs/user_guide/first_step.md
diff --git a/docs/user_guide/intro.md b/docs/user_guide/intro.md
@@ -1 +1,51 @@
-# User Guide - Intro
+# User Guide - Introduction
+
+This user guide shows you how to use `DocArray` with most of its features.
+
+There are three main sections:
+
+- [Representing Data](representing/first_step.md): This section will show you how to use `DocArray` to represent your data. This is a great starting point if you want to better organize the data in your ML models, or if you are looking for a "pydantic for ML".
+- [Sending Data](sending/first_step.md): This section will show you how to use `DocArray` to send your data. This is a great starting point if you want to serve your ML model, for example through FastAPI.
+- [Storing Data](storing/first_step.md): This section will show you how to use `DocArray` to store your data. This is a great starting point if you are looking for an "ORM for vector databases".
+
+You should start by reading the [Representing Data](representing/first_step.md) section, and then the [Sending Data](sending/first_step.md) and [Storing Data](storing/first_step.md) sections can be read in any order.
+
+You will first need to install `DocArray` in your Python environment. 
+
+## Install DocArray
+
+To install `DocArray`, you can use the following command:
+
+```console
+pip install "docarray[full]"
+```
+
+This will install the main dependencies of `DocArray` and will work with all the supported data modalities.
+
+!!! note 
+    To install a very light version of `DocArray` with only the core dependencies, you can use the following command:
+    ```
+    pip install "docarray"
+    ``` 
+
+    If you want to use `protobuf` and `DocArray`, you can run:
+
+    ```
+    pip install "docarray[proto]"
+    ``` 
+
+Depending on your usage, you might want to use `DocArray` with only a couple of specific modalities and their dependencies.
+For instance, if you only want to work with images, you can install `DocArray` using the following command:
+
+```
+pip install "docarray[image]"
+```
+
+...or with images and audio:
+
+```
+pip install "docarray[image, audio]"
+```
+
+!!! warning 
+    This way of installing `DocArray` is only valid starting with version `0.30`.
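
The version constraint in the warning above can be checked after installation. The following is a minimal sketch that uses only the standard library; the helper names `installed_version`, `release_tuple` and `supports_extras_install` are hypothetical and are not part of `DocArray`.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '0.30.0a3' -> (0, 30, 0)."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def supports_extras_install(version: str) -> bool:
    """True if `version` is at least 0.30, the threshold named in the warning."""
    return release_tuple(version)[:2] >= (0, 30)


version = installed_version('docarray')
if version is None:
    print('docarray is not installed')
elif supports_extras_install(version):
    print(f'docarray {version}: the extras-based install applies')
else:
    print(f'docarray {version}: older than 0.30, the extras-based install does not apply')
```

Only the leading numeric components of the version are compared, which is enough for a `0.30` threshold; pre-release suffixes are ignored.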
diff --git a/docs/user_guide/representing/first_step.md b/docs/user_guide/representing/first_step.md
@@ -0,0 +1,135 @@
+# Representing
+
+At the heart of `DocArray` lies the concept of [`BaseDoc`][docarray.base_doc.doc.BaseDoc].
+
+A [BaseDoc][docarray.base_doc.doc.BaseDoc] is very similar to a [Pydantic](https://docs.pydantic.dev/)
+[`BaseModel`](https://docs.pydantic.dev/usage/models) - in fact it _is_ a specialized Pydantic `BaseModel`. It allows you to define custom `Document` schemas (or `Model` in
+the Pydantic world) to represent your data.
+
+## Basic `Doc` usage
+
+Before going into detail about what we can do with [BaseDoc][docarray.base_doc.doc.BaseDoc] and how to use it, let's
+see what it looks like in practice.
+
+The following Python code defines a `BannerDoc` class that can be used to represent the data of a website banner.
+
+```python
+from docarray import BaseDoc
+from docarray.typing import ImageUrl
+
+
+class BannerDoc(BaseDoc):
+    image_url: ImageUrl
+    title: str
+    description: str
+```
+
+You can then instantiate a `BannerDoc` object and access its attributes.
+
+```python
+banner = BannerDoc(
+    image_url='https://example.com/image.png',
+    title='Hello World',
+    description='This is a banner',
+)
+
+assert banner.image_url == 'https://example.com/image.png'
+assert banner.title == 'Hello World'
+assert banner.description == 'This is a banner'
+```
+
+## `BaseDoc` is a Pydantic `BaseModel`
+
+The class [BaseDoc][docarray.base_doc.doc.BaseDoc] inherits from Pydantic [BaseModel](https://docs.pydantic.dev/usage/models). So you can use
+all the features of `BaseModel` in your `Doc` class. 
+
+This means that `BaseDoc`:
+
+* Will perform data validation: `BaseDoc` will check that the data you pass to it is valid. If not, it will raise an
+error. What counts as "valid" is defined by the type used in the type hint itself; we will come back to this concept later. (TODO add typing section)
+* Can be configured using a nested `Config` class. See the Pydantic [documentation](https://docs.pydantic.dev/usage/model_config/) for more detail on the configuration options Pydantic offers.
+* Can be used as a drop-in replacement for `BaseModel` in your code and is compatible with tools that use Pydantic, like [FastAPI](https://fastapi.tiangolo.com/).
+
+### How does it differ from Pydantic `BaseModel`? (INCOMPLETE)
+
+LINK TO THE VERSUS (not ready)
+
+[BaseDoc][docarray.base_doc.doc.BaseDoc] is more than a [BaseModel](https://docs.pydantic.dev/usage/models):
+
+* You can use it with DocArray [types](docarray.typing) that are oriented toward multimodal (image, audio, ...) data and
+machine learning use cases. TODO link the type section.
+
+Another difference is that [BaseDoc][docarray.base_doc.doc.BaseDoc] has an `id` field, generated by default, that uniquely identifies a Document.
+
+## `BaseDoc` allows representing multimodal and nested data
+
+Let's say you want to represent a YouTube video in your application, perhaps to build a search system for YouTube videos.
+A YouTube video is not only composed of a video, but also has a title, description, thumbnail (and more, but let's keep it simple).
+
+All of these elements are from different `modalities` LINK TO MODALITIES SECTION (not ready): the title and description are text, the thumbnail is an image, and the video in itself is, well, a video.
+
+DocArray allows you to represent all of this multimodal data in a single object.
+
+Let's first create a `BaseDoc` for each of the elements that compose the YouTube video.
+
+First, for the thumbnail, which is an image:
+
+```python
+from typing import Optional
+
+from docarray import BaseDoc
+from docarray.typing import ImageUrl, ImageBytes
+
+
+class ImageDoc(BaseDoc):
+    url: ImageUrl
+    bytes: Optional[ImageBytes] = None  # not always loaded in memory, so optional
+```
+
+Then for the video itself:
+
+```python
+from typing import Optional
+
+from docarray import BaseDoc
+from docarray.typing import VideoUrl, VideoBytes
+
+
+class VideoDoc(BaseDoc):
+    url: VideoUrl
+    bytes: Optional[VideoBytes] = None  # not always loaded in memory, so optional
+``` 
+
+Then for the title and description (which are text) we will just use a `str` type.
+
+All the elements that compose a YouTube video are ready:
+
+```python
+from docarray import BaseDoc
+
+
+class YouTubeVideoDoc(BaseDoc):
+    title: str
+    description: str
+    thumbnail: ImageDoc
+    video: VideoDoc
+```
+
+You now have `YouTubeVideoDoc`, which is a Pythonic representation of a YouTube video.
+
+This representation can now be used to send (LINK) or to store (LINK) data. You can even use it directly to [train a machine learning](../../how_to/multimodal_training_and_serving.md) [PyTorch](https://pytorch.org/docs/stable/index.html) model on this representation.
+
+!!! note
+
+    You see here that `ImageDoc` and `VideoDoc` are also [BaseDoc][docarray.base_doc.doc.BaseDoc], and they are later used inside another [BaseDoc][docarray.base_doc.doc.BaseDoc].
+    This is what we call nested data representation. 
+
+    [BaseDoc][docarray.base_doc.doc.BaseDoc] can be nested to represent any kind of data hierarchy.
+
+See also:
+
+* [BaseDoc][docarray.base_doc.doc.BaseDoc] API Reference
+* DOCUMENT_ARRAY REF
+* DOCUMENT INDEX REF
+* DOCUMENT STORE REF
+* ...
diff --git a/docs/user_guide/sending/first_step.md b/docs/user_guide/sending/first_step.md
@@ -0,0 +1 @@
+# Sending
diff --git a/docs/user_guide/storing/first_step.md b/docs/user_guide/storing/first_step.md
@@ -0,0 +1 @@
+# Storing
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -74,7 +74,9 @@ nav:
   - Home: README.md
   - Tutorial - User Guide:
     - user_guide/intro.md
-    - user_guide/first_step.md
+    - user_guide/representing/first_step.md
+    - user_guide/sending/first_step.md
+    - user_guide/storing/first_step.md
 
   - How-to:
     - how_to/add_doc_index.md

diff --git a/poetry.lock b/poetry.lock
diff --git a/pyproject.toml b/pyproject.toml
@@ -29,7 +29,7 @@ smart-open = {version = ">=6.3.0", extras = ["s3"], optional = true}
 jina-hubble-sdk = {version = ">=0.34.0", optional = true}
 
 [tool.poetry.extras]
-common = ["protobuf", "lz4"]
+proto = ["protobuf", "lz4"]
 pandas = ["pandas"]
 image = ["pillow", "types-pillow"]
 video = ["av"]