docs: refactor getting started section #728
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
| @@ -1,7 +1,7 @@ | ||||
| (documentarray)= | ||||
| # DocumentArray | ||||
|
|
||||
| This is a Document, we already know it can be a mix in data types and nested in structure: | ||||
| This is a Document. We already know it can be a mix of data types and nested in structure: | ||||
|
|
||||
| ```{figure} images/docarray-single.svg | ||||
| :width: 30% | ||||
|
|
@@ -14,15 +14,15 @@ Then this is a DocumentArray: | |||
| ``` | ||||
|
|
||||
|
|
||||
| {class}`~docarray.array.document.DocumentArray` is a list-like container of {class}`~docarray.document.Document` objects. It is **the best way** when working with multiple Documents. | ||||
| {class}`~docarray.array.document.DocumentArray` is a list-like container of {class}`~docarray.document.Document` objects. It is **the best way** of working with multiple Documents. | ||||
|
|
||||
| In a nutshell, you can simply consider it as a Python `list`, as it implements **all** list interfaces. That is, if you know how to use Python `list`, you already know how to use DocumentArray. | ||||
| In a nutshell, you can simply think of it as a Python `list`, as it implements **all** list interfaces. That is, if you know how to use a Python `list`, you already know how to use DocumentArray. | ||||
|
|
||||
| It is also powerful as Numpy `ndarray` and Pandas `DataFrame`, allowing you to efficiently [access elements](access-elements.md) and [attributes](access-attributes.md) of contained Documents. | ||||
| It is also as powerful as Numpy's `ndarray` and Pandas's `DataFrame`, allowing you to efficiently access [elements](access-elements.md) and [attributes](access-attributes.md) of contained Documents. | ||||
|
|
||||
| What makes it more exciting is those advanced features of DocumentArray. These features greatly accelerate data scientists work on accessing nested elements, evaluating, visualizing, parallel computing, serializing, matching etc. | ||||
| What makes it more exciting is the advanced features of DocumentArray. These features greatly accelerate data scientists' work on accessing nested elements, evaluating, visualizing, parallel computing, serializing, matching etc. | ||||
|
|
||||
| Finally, if your data is too big to fit into memory, you can simply switch to an {ref}`on-disk/remote document store<doc-store>`. All API and user experiences remain the same. No need to learn anything else. | ||||
| Finally, if your data is too big to fit into memory, you can simply switch to an {ref}`on-disk/remote document store<doc-store>`. All APIs and user experiences remain the same. No need to learn anything else. | ||||
|
|
||||
| ## What's next? | ||||
|
|
||||
|
|
@@ -43,4 +43,5 @@ embedding | |||
| matching | ||||
| subindex | ||||
| evaluation | ||||
| interaction-cloud | ||||
|
Member
We already have a cloud-support section.
Suggested change
|
||||
| ``` | ||||
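The DocumentArray text in the diff above describes a list-like container that "implements **all** list interfaces". As a rough illustration of what that means in Python, here is a minimal sketch built on the standard library's `collections.abc.MutableSequence`. The class name `TinyDocList` and the plain dictionaries standing in for Documents are hypothetical; this is not how DocArray implements `DocumentArray`, only a small example of the general pattern.

```python
from collections.abc import MutableSequence


class TinyDocList(MutableSequence):
    """Hypothetical list-like container; not DocArray's implementation."""

    def __init__(self, docs=None):
        self._docs = list(docs or [])

    def __getitem__(self, index):
        # Slices return a new container; integers return a single item.
        if isinstance(index, slice):
            return TinyDocList(self._docs[index])
        return self._docs[index]

    def __setitem__(self, index, value):
        self._docs[index] = value

    def __delitem__(self, index):
        del self._docs[index]

    def __len__(self):
        return len(self._docs)

    def insert(self, index, value):
        self._docs.insert(index, value)


# Plain dicts stand in for Documents in this sketch.
docs = TinyDocList([{"text": "hello"}, {"text": "world"}])
docs.append({"text": "!"})         # append() is supplied by MutableSequence
texts = [d["text"] for d in docs]  # iteration is supplied by the base class
first_two = docs[0:2]              # slicing keeps the container type
```

Only five methods are written by hand; `append`, `extend`, `pop`, `remove`, `index`, `count`, iteration and membership tests come from the abstract base class, which is why code written for a `list` tends to work unchanged on such a container.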
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,41 @@ | ||||||
| (interaction-cloud)= | ||||||
|
Member
This is a duplicate. Let's remove this file. |
||||||
| # Interaction with Jina AI Cloud | ||||||
|
|
||||||
| ```{important} | ||||||
|
Member
@samsja I believe this
Member
Yes, not needed. |
||||||
| This feature requires the `rich` and `requests` dependencies. You can do `pip install "docarray[full]"` to install them. | ||||||
| ``` | ||||||
|
|
||||||
| The {meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.push` and {meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.pull` methods allow you to serialize a DocumentArray object to Jina AI Cloud and share it across machines. | ||||||
|
|
||||||
| Imagine you're working on a GPU machine via Google Colab/Jupyter. After preprocessing and embedding, you have everything you need in a DocumentArray. You can easily store it to the cloud via: | ||||||
|
|
||||||
| ```python | ||||||
| from docarray import DocumentArray | ||||||
|
|
||||||
| da = DocumentArray(...) # heavy lifting, processing, GPU tasks... | ||||||
| da.push('myda123', show_progress=True) | ||||||
|
Member
Don't we require login for this now?
Member
Yes, let's add it. |
||||||
| ``` | ||||||
|
|
||||||
| ```{figure} images/da-push.png | ||||||
|
|
||||||
| ``` | ||||||
|
|
||||||
| Then on your local laptop, simply pull it: | ||||||
|
|
||||||
| ```python | ||||||
| from docarray import DocumentArray | ||||||
|
|
||||||
| da = DocumentArray.pull('myda123', show_progress=True) | ||||||
| ``` | ||||||
|
|
||||||
| Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends. | ||||||
|
Member
Suggested change
|
||||||
|
|
||||||
| The maximum size of an upload is 4GB under the `protocol='protobuf'` and `compress='gzip'` settings. The lifetime of an upload is one week after its creation. | ||||||
|
|
||||||
| To avoid unnecessary downloads when the upstream DocumentArray is unchanged, you can add `DocumentArray.pull(..., local_cache=True)`. | ||||||
|
|
||||||
| ```{seealso} | ||||||
| DocArray allows pushing, pulling, and managing your DocumentArrays in Jina AI Cloud. | ||||||
| Read more about how to manage your data in Jina AI Cloud, using either the console or the DocArray Python API, in the | ||||||
| {ref}`Data Management section <data-management>`. | ||||||
| ``` | ||||||
We should avoid this language; there will always be something that is not easy for some users.
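For readers wondering what the `local_cache=True` note in the new file is getting at (skipping a download when the upstream data is unchanged), the general idea can be sketched with a checksum comparison. Everything below is an assumption made for illustration: `FakeRemote`, its `checksum` and `download` methods, and the free function `pull` are hypothetical stand-ins, and the sketch does not claim to reflect how DocArray or Jina AI Cloud implement caching.

```python
import hashlib


class FakeRemote:
    """Hypothetical in-memory stand-in for a remote store."""

    def __init__(self):
        self._blobs = {}
        self.downloads = 0  # counts full payload transfers

    def push(self, name, payload):
        self._blobs[name] = payload

    def checksum(self, name):
        # Stands in for a cheap metadata call that transfers no payload.
        return hashlib.sha256(self._blobs[name]).hexdigest()

    def download(self, name):
        self.downloads += 1
        return self._blobs[name]


def pull(remote, name, cache, local_cache=True):
    """Return the payload, downloading again only if the checksum changed."""
    digest = remote.checksum(name)
    cached = cache.get(name)
    if local_cache and cached is not None and cached[0] == digest:
        return cached[1]
    payload = remote.download(name)
    cache[name] = (digest, payload)
    return payload


remote, cache = FakeRemote(), {}
remote.push("myda123", b"serialized docs")
pull(remote, "myda123", cache)           # first pull: downloaded
pull(remote, "myda123", cache)           # unchanged upstream: served from cache
remote.push("myda123", b"new docs")
latest = pull(remote, "myda123", cache)  # changed upstream: downloaded again
```

The design choice worth noting is that the freshness check has to be much cheaper than the download itself; otherwise the cache saves nothing.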