Closed
4 changes: 2 additions & 2 deletions docs/datatypes/index.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Multimodal Data
# Multimodal data

Whether you’re working with text, image, video, audio, 3D meshes or the nested or the combined of them, you can always represent them as Documents and process them as DocumentArray. Here are some motivate examples:
Whether you’re working with text, image, video, audio, 3D meshes, nested data, or some combination of these, you can always represent them as Documents and process them as DocumentArrays. Here are some motivating examples:


```{toctree}
6 changes: 3 additions & 3 deletions docs/fundamentals/document/construct.md
@@ -1,7 +1,7 @@
(construct-doc)=
# Construct

Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Document, filled Document. One can also construct Document from bytes, JSON, Protobuf message as introduced {ref}`in the next chapter<serialize>`.
Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.
Review comment (Member):

Suggested change
Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.
This section introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.

We should avoid this language; there will always be something that is not easy for some users.


## Construct an empty Document

@@ -15,7 +15,7 @@ d = Document()
<Document ('id',) at 5dd542406d3f11eca3241e008a366d49>
```

Every Document will have a unique random `id` that helps you identify this Document. It can be used to {ref}`access this Document inside a DocumentArray<access-elements>`.
Every Document has a unique random `id` that helps you identify the Document. It can be used to {ref}`access this Document inside a DocumentArray<access-elements>`.
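
As a rough illustration of what such an `id` looks like, the sketch below uses only Python's standard `uuid` module. It assumes, as the tip in this section states, that the `id` is the hex value of a UUID1; the variable names are illustrative and no DocArray call is involved.

```python
import uuid

# Assumption: an id is produced like uuid.uuid1().hex, as the tip describes.
doc_id = uuid.uuid1().hex      # 32 lowercase hex characters, no dashes
as_uuid = uuid.UUID(doc_id)    # parse the hex form back into a UUID object
canonical = str(as_uuid)       # dashed 8-4-4-4-12 string form

print(doc_id)
print(canonical)
```

The two printed values contain the same hex digits; only the dashes differ.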

````{tip}
The random `id` is the hex value of [UUID1](https://docs.python.org/3/library/uuid.html#uuid.uuid1). To convert it into the string of UUID:
@@ -230,4 +230,4 @@ world

## What's next?

One can also construct Document from bytes, JSON, Protobuf message. These methods are introduced {ref}`in the next chapter<serialize>`.
You can also construct Documents from bytes, JSON, and Protobuf messages. These methods are introduced {ref}`in the next chapter<serialize>`.
20 changes: 10 additions & 10 deletions docs/fundamentals/documentarray/construct.md
@@ -13,7 +13,7 @@ da = DocumentArray()
<DocumentArray (length=0) at 4453362704>
```

Now you can use list-like interfaces such as `.append()` and `.extend()` as you would add elements to a Python List.
Now you can use list-like interfaces such as `.append()` and `.extend()` as you would to add elements to a Python List.

```python
da.append(Document(text='hello world!'))
@@ -24,7 +24,7 @@ da.extend([Document(text='hello'), Document(text='world!')])
<DocumentArray (length=3) at 4446140816>
```

Directly printing a DocumentArray does not show you too much useful information, you can use {meth}`~docarray.array.mixins.plot.PlotMixin.summary`.
Directly printing a DocumentArray doesn't show much useful information. Instead, you can use {meth}`~docarray.array.mixins.plot.PlotMixin.summary`.


```python
@@ -49,7 +49,7 @@ da.summary()

## Construct with empty Documents

Like `numpy.zeros()`, you can quickly build a DocumentArray with only empty Documents:
You can quickly build a DocumentArray with only empty Documents, similar to `numpy.zeros()`:

```python
from docarray import DocumentArray
@@ -63,7 +63,7 @@ da = DocumentArray.empty(10)

## Construct from list-like objects

You can construct DocumentArray from a `Sequence`, `List`, `Tuple` or `Iterator` that yields `Document` object.
You can construct a DocumentArray from a `Sequence`, `List`, `Tuple`, or an `Iterator` that yields `Document` objects.

````{tab} From list of Documents
```python
@@ -90,15 +90,15 @@ da = DocumentArray((Document() for _ in range(10)))
````


As DocumentArray itself is also a "list-like object that yields `Document`", you can also construct DocumentArray from another DocumentArray:
As DocumentArray itself is also a "list-like object that yields `Document` objects", you can also construct a DocumentArray from another DocumentArray:

```python
da = DocumentArray(...)
da1 = DocumentArray(da)
```


## Construct from multiple DocumentArray
## Construct from multiple DocumentArrays

You can use `+` or `+=` to concatenate DocumentArrays together:

@@ -135,7 +135,7 @@ da = DocumentArray(d1)

## Deep copy on elements

Note that, as in Python list, adding Document object into DocumentArray only adds its memory reference. The original Document is *not* copied. If you change the original Document afterwards, then the one inside DocumentArray will also change. Here is an example,
Note that, as in Python list, adding a Document object into DocumentArray only adds its memory reference. The original Document is *not* copied. If you change the original Document afterwards, then the one inside the DocumentArray will also change. Here is an example:

```python
from docarray import DocumentArray, Document
@@ -189,7 +189,7 @@ hello

## Construct from local files

You may recall the common pattern that {ref}`I mentioned here<content-uri>`. With {meth}`~docarray.document.generators.from_files` One can easily construct a DocumentArray object with all file paths defined by a glob expression.
You may recall the common pattern that {ref}`I mentioned here<content-uri>`. With {meth}`~docarray.document.generators.from_files`, one can easily construct a DocumentArray object with all file paths defined by a glob expression.

```python
from docarray import DocumentArray
@@ -199,11 +199,11 @@ da_png = DocumentArray.from_files('images/*.png')
da_all = DocumentArray.from_files(['images/**/*.png', 'images/**/*.jpg', 'images/**/*.jpeg'])
```

This will scan all filenames that match the expression and construct Documents with filled `.uri` attribute. You can control if to read each as text or binary with `read_mode` argument.
This will scan all filenames that match the expression and construct Documents with filled `.uri` attributes. You can specify whether to read each as text or binary with the `read_mode` argument.
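
To make the glob expressions above concrete, here is a minimal sketch using Python's standard `glob` module on a throwaway directory. It only illustrates how `*` and `**` patterns behave in the standard library; whether `from_files` follows exactly the same matching rules is an assumption, and the file names are made up for the example.

```python
import glob
import os
import tempfile

# Build a small temporary tree so the sketch is self-contained.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "images", "cats"))
for rel in ("images/a.png", "images/b.jpg", "images/cats/c.png"):
    with open(os.path.join(root, *rel.split("/")), "wb") as f:
        f.write(b"")

# '*' matches names inside a single directory.
top_level = sorted(glob.glob(os.path.join(root, "images", "*.png")))

# '**' with recursive=True also descends into subdirectories.
recursive = sorted(
    glob.glob(os.path.join(root, "images", "**", "*.png"), recursive=True)
)

top_names = [os.path.basename(p) for p in top_level]
rec_names = [os.path.basename(p) for p in recursive]
print(top_names)
print(rec_names)
```

The non-recursive pattern finds only the PNG directly under `images/`, while the `**` pattern also picks up the one in the `cats/` subdirectory.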




## What's next?

In the next chapter, we will see how to construct DocumentArray from binary bytes, JSON, CSV, dataframe, Protobuf message.
In the next chapter, we will see how to construct DocumentArrays from binary bytes, JSON, CSV, dataframe, and Protobuf message.
13 changes: 7 additions & 6 deletions docs/fundamentals/documentarray/index.md
@@ -1,7 +1,7 @@
(documentarray)=
# DocumentArray

This is a Document, we already know it can be a mix in data types and nested in structure:
This is a Document. We already know it can be a mix of data types and nested in structure:

```{figure} images/docarray-single.svg
:width: 30%
@@ -14,15 +14,15 @@ Then this is a DocumentArray:
```


{class}`~docarray.array.document.DocumentArray` is a list-like container of {class}`~docarray.document.Document` objects. It is **the best way** when working with multiple Documents.
{class}`~docarray.array.document.DocumentArray` is a list-like container of {class}`~docarray.document.Document` objects. It is **the best way** of working with multiple Documents.

In a nutshell, you can simply consider it as a Python `list`, as it implements **all** list interfaces. That is, if you know how to use Python `list`, you already know how to use DocumentArray.
In a nutshell, you can simply think of it as a Python `list`, as it implements **all** list interfaces. That is, if you know how to use a Python `list`, you already know how to use DocumentArray.

It is also powerful as Numpy `ndarray` and Pandas `DataFrame`, allowing you to efficiently [access elements](access-elements.md) and [attributes](access-attributes.md) of contained Documents.
It is also as powerful as Numpy's `ndarray` and Pandas's `DataFrame`, allowing you to efficiently access [elements](access-elements.md) and [attributes](access-attributes.md) of contained Documents.

What makes it more exciting is those advanced features of DocumentArray. These features greatly accelerate data scientists work on accessing nested elements, evaluating, visualizing, parallel computing, serializing, matching etc.
What makes it more exciting is the advanced features of DocumentArray. These features greatly accelerate data scientists' work on accessing nested elements, evaluating, visualizing, parallel computing, serializing, matching etc.

Finally, if your data is too big to fit into memory, you can simply switch to an {ref}`on-disk/remote document store<doc-store>`. All API and user experiences remain the same. No need to learn anything else.
Finally, if your data is too big to fit into memory, you can simply switch to an {ref}`on-disk/remote document store<doc-store>`. All APIs and user experiences remain the same. No need to learn anything else.

## What's next?

@@ -43,4 +43,5 @@ embedding
matching
subindex
evaluation
interaction-cloud
Review comment (Member):

We already have a cloud-support section.

Suggested change
interaction-cloud

```
41 changes: 41 additions & 0 deletions docs/fundamentals/documentarray/interaction-cloud.md
@@ -0,0 +1,41 @@
(interaction-cloud)=
Review comment (Member, @alaeddine-13, Nov 10, 2022):

This is a duplicate.
We already added a cloud support section in this PR:
#697

Let's remove this file.

# Interaction with Jina AI Cloud

```{important}
Review comment (Member):

@samsja I believe this note is not needed, right? These dependencies come when you install docarray with `pip install docarray`?

Review comment (Member):

Yes, not needed.

This feature requires the `rich` and `requests` dependencies. You can do `pip install "docarray[full]"` to install them.
```

The {meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.push` and {meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.pull` methods allow you to serialize a DocumentArray object to Jina AI Cloud and share it across machines.

Imagine you're working on a GPU machine via Google Colab/Jupyter. After preprocessing and embedding, you have everything you need in a DocumentArray. You can easily store it to the cloud via:

```python
from docarray import DocumentArray

da = DocumentArray(...) # heavy lifting, processing, GPU tasks...
da.push('myda123', show_progress=True)
Review comment (Member):

Don't we require login for this now?

Review comment (Member):

yes, let's add it

```

```{figure} images/da-push.png

```

Then on your local laptop, simply pull it:

```python
from docarray import DocumentArray

da = DocumentArray.pull('myda123', show_progress=True)
```

Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends.
Review comment (Member):

Suggested change
Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends.
Now you can continue your work locally, analyzing `da` or visualizing it. Your friends and colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues and friends.


The maximum size of an upload is 4GB under the `protocol='protobuf'` and `compress='gzip'` settings. The lifetime of an upload is one week after its creation.

To avoid unnecessary downloads when the upstream DocumentArray is unchanged, you can add `DocumentArray.pull(..., local_cache=True)`.

```{seealso}
DocArray allows pushing, pulling, and managing your DocumentArrays in Jina AI Cloud.
Read more about how to manage your data in Jina AI Cloud, using either the console or the DocArray Python API, in the
{ref}`Data Management section <data-management>`.
```