
docs: refactor getting started section #728

Closed
NicholasDunham wants to merge 3 commits into docarray:main from NicholasDunham:docs-revise-getting-started

Conversation

@NicholasDunham
Contributor

Goals:

  • Simplify Getting Started section
  • Move "Interaction with Jina AI Cloud" to a separate page
  • Fix grammar

@codecov-commenter

codecov-commenter commented Nov 8, 2022

Codecov Report

Base: 87.58% // Head: 62.13% // Decreases project coverage by -25.45% ⚠️

Coverage data is based on head (6e6da38) compared to base (cc037a7).
Patch has no changes to coverable lines.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #728       +/-   ##
===========================================
- Coverage   87.58%   62.13%   -25.46%     
===========================================
  Files         133      133               
  Lines        6703     6703               
===========================================
- Hits         5871     4165     -1706     
- Misses        832     2538     +1706     
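The percentages in the diff above follow from the Hits and Lines rows. A minimal Python sketch of that arithmetic, assuming coverage is hits divided by lines and truncated (not rounded) to two decimals, which is consistent with the reported 87.58% and 62.13% but is not confirmed Codecov behavior. The helper name `coverage_pct` is illustrative and not part of Codecov:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Return hits/lines as a percentage truncated to two decimals.

    Truncation (rather than rounding) is an assumption inferred from the
    figures in this report, not documented Codecov behavior.
    """
    if lines <= 0:
        raise ValueError("lines must be positive")
    # Integer arithmetic avoids floating-point surprises before truncating.
    return (hits * 10_000 // lines) / 100


base = coverage_pct(5871, 6703)  # base commit cc037a7
head = coverage_pct(4165, 6703)  # head commit 6e6da38
print(f"base={base}%, head={head}%, lost hits={5871 - 4165}")
```

Lines stays at 6703 and the patch touches no coverable lines, so the whole drop corresponds to the 1706 lines that were hit in the base report but not in the head report.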
| Flag | Coverage Δ | |
| --- | --- | --- |
| docarray | 62.13% <ø> (-25.46%) | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| docarray/document/strawberry_type.py | 0.00% <0.00%> (-100.00%) | ⬇️ |
| docarray/document/mixins/rich_embedding.py | 0.00% <0.00%> (-100.00%) | ⬇️ |
| docarray/math/evaluation.py | 0.00% <0.00%> (-94.83%) | ⬇️ |
| docarray/array/mixins/embed.py | 9.89% <0.00%> (-82.42%) | ⬇️ |
| docarray/document/mixins/strawberry.py | 16.27% <0.00%> (-79.07%) | ⬇️ |
| docarray/array/mixins/evaluation.py | 9.09% <0.00%> (-79.03%) | ⬇️ |
| docarray/document/mixins/mesh.py | 24.44% <0.00%> (-75.56%) | ⬇️ |
| docarray/document/mixins/text.py | 24.00% <0.00%> (-74.00%) | ⬇️ |
| docarray/array/mixins/reduce.py | 26.92% <0.00%> (-73.08%) | ⬇️ |
| docarray/array/mixins/io/pushpull.py | 23.27% <0.00%> (-69.83%) | ⬇️ |
| ... and 81 more | | |


@JoanFM JoanFM requested review from JohannesMessner, alaeddine-13, hanxiao and samsja and removed request for hanxiao November 8, 2022 08:36
Co-authored-by: Joan Fontanals <[email protected]>
Signed-off-by: Nicholas Dunham <[email protected]>
(interaction-cloud)=
# Interaction with Jina AI Cloud

```{important}
Member

@samsja I believe this note is not needed, right? These dependencies come when you install docarray with `pip install docarray`?

Member

yes not needed

da = DocumentArray.from_dataframe(df)
```

## From/to cloud
Member

@alaeddine-13 alaeddine-13 Nov 10, 2022

I think removing this section breaks some of our content.
In this PR I chose to keep it and document cloud support further in a separate section.

@@ -0,0 +1,41 @@
(interaction-cloud)=
Member

@alaeddine-13 alaeddine-13 Nov 10, 2022

This is a duplicate.
We already added a cloud support section in this PR: #697

Let's remove this file

# Construct

- Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Document, filled Document. One can also construct Document from bytes, JSON, Protobuf message as introduced {ref}`in the next chapter<serialize>`.
+ Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.
Member

Suggested change
- Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.
+ This section introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.

We should avoid this language; there will always be something that is not easy for some users.

da = DocumentArray.pull('myda123', show_progress=True)
```

Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends.
Member

Suggested change
- Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends.
+ Now you can continue your work locally, analyzing `da` or visualizing it. Your friends and colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues and friends.

from docarray import DocumentArray

da = DocumentArray(...) # heavy lifting, processing, GPU tasks...
da.push('myda123', show_progress=True)
Member

Don't we require login for this now?

Member

yes, let's add it

@@ -358,43 +357,3 @@ To build a DocumentArray from dataframe,
df = ...
da = DocumentArray.from_dataframe(df)
```
Member

As I said, removing this section breaks content that relies on the push/pull usage briefly discussed here.
We also have a cloud support section, so this content should not be moved to docs/fundamentals/documentarray/interaction-cloud.md.

Suggested change
```
```
## From/to cloud
```{important}
This feature requires the `rich` and `requests` dependencies. You can run `pip install "docarray[full]"` to install them.
```
{meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.push` and {meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.pull` allow you to serialize a DocumentArray object to Jina Cloud and share it across machines.
Suppose you are working on a GPU machine via Google Colab/Jupyter. After preprocessing and embedding, you have everything you need in a DocumentArray. You can store it in the cloud via:
```python
from docarray import DocumentArray
da = DocumentArray(...)  # heavy lifting, processing, GPU tasks...
da.push('myda123', show_progress=True)
```
```{figure} images/da-push.png
```
Then on your local laptop, simply pull it:
```python
from docarray import DocumentArray
da = DocumentArray.pull('myda123', show_progress=True)
```
Now you can continue your work locally, analyzing `da` or visualizing it. Your friends and colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues and friends.
The maximum size of an upload is 4GB under the `protocol='protobuf'` and `compress='gzip'` settings. The lifetime of an upload is one week after its creation.
To avoid unnecessary downloads when the upstream DocumentArray is unchanged, you can use `DocumentArray.pull(..., local_cache=True)`.
```{seealso}
DocArray allows pushing, pulling, and managing your DocumentArrays in Jina AI Cloud.
Read more about how to manage your data in Jina AI Cloud, using either the console or the DocArray Python API, in the
{ref}`Data Management section <data-management>`.
```
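The limits quoted in the suggestion above (a 4GB maximum upload and a one-week lifetime) can be checked before pushing. A minimal sketch under stated assumptions: `MAX_UPLOAD_BYTES` reads 4GB as 4 × 1024³ bytes, which the text does not specify, and `check_upload` with its parameters is a hypothetical helper, not part of the DocArray API. The payload size is assumed to be the size of the serialized DocumentArray.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "4GB" means 4 * 1024**3 bytes; the docs do not say which unit.
MAX_UPLOAD_BYTES = 4 * 1024**3
# From the text: an upload lives for one week after its creation.
UPLOAD_LIFETIME = timedelta(weeks=1)


def check_upload(payload_size: int, created_at: datetime) -> tuple[bool, datetime]:
    """Hypothetical helper: report whether a payload fits and when it expires."""
    if payload_size < 0:
        raise ValueError("payload_size must be non-negative")
    fits = payload_size <= MAX_UPLOAD_BYTES
    expires_at = created_at + UPLOAD_LIFETIME
    return fits, expires_at


created = datetime(2022, 11, 8, tzinfo=timezone.utc)
ok, expires = check_upload(512 * 1024**2, created)  # a 512 MiB payload
print(ok, expires.date())
```

A check like this only guards against the documented limits; whether a push succeeds still depends on the service itself.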

matching
subindex
evaluation
interaction-cloud
Member

We already have a cloud-support section.

Suggested change
- interaction-cloud

@CatStark

The part about the integration with Jina Cloud should have more concrete examples.
Its readers are users who will not read all the documentation; they just need to quickly know what to do with their data.

Some examples of creating a docarray from a PDF, image, or video would be nice.

I know it's explained in the other sections, but we need easy onboarding for newcomers who don't need to know the technical details.

@JoanFM JoanFM closed this Nov 11, 2022
@NicholasDunham NicholasDunham deleted the docs-revise-getting-started branch November 11, 2022 18:48
7 participants