docs: refactor getting started section by NicholasDunham · Pull Request #728 · docarray/docarray

NicholasDunham · 2022-11-08T04:52:26Z

Goals:

Simplify Getting Started section
Move "Interaction with Jina AI Cloud" to a separate page
Fix grammar

codecov-commenter · 2022-11-08T04:58:11Z

Codecov Report

Base: 87.58% // Head: 62.13% // Decreases project coverage by -25.45% ⚠️

Coverage data is based on head (6e6da38) compared to base (cc037a7).
Patch has no changes to coverable lines.

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #728       +/-   ##
===========================================
- Coverage   87.58%   62.13%   -25.46%     
===========================================
  Files         133      133               
  Lines        6703     6703               
===========================================
- Hits         5871     4165     -1706     
- Misses        832     2538     +1706
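As a sanity check on the figures above, line coverage is hits divided by total coverable lines. A minimal sketch using only the numbers from the diff table (the `coverage_pct` helper is a hypothetical name for illustration, not part of Codecov):

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Return line coverage as a percentage of all coverable lines."""
    total = hits + misses
    return 100.0 * hits / total


# Hits and misses copied from the Codecov diff table.
base = coverage_pct(hits=5871, misses=832)   # base commit cc037a7
head = coverage_pct(hits=4165, misses=2538)  # head commit 6e6da38

print(f"base:  {base:.4f}%")
print(f"head:  {head:.4f}%")
print(f"delta: {head - base:.4f} percentage points")
```

Both commits count the same 6703 lines, so the whole drop comes from 1706 lines moving from hits to misses. Since the patch changes no coverable lines, this suggests the drop is unrelated to the documentation edits themselves.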

| Flag | Coverage Δ | |
|---|---|---|
| docarray | `62.13% <ø> (-25.46%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| docarray/document/strawberry_type.py | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| docarray/document/mixins/rich_embedding.py | `0.00% <0.00%> (-100.00%)` | ⬇️ |
| docarray/math/evaluation.py | `0.00% <0.00%> (-94.83%)` | ⬇️ |
| docarray/array/mixins/embed.py | `9.89% <0.00%> (-82.42%)` | ⬇️ |
| docarray/document/mixins/strawberry.py | `16.27% <0.00%> (-79.07%)` | ⬇️ |
| docarray/array/mixins/evaluation.py | `9.09% <0.00%> (-79.03%)` | ⬇️ |
| docarray/document/mixins/mesh.py | `24.44% <0.00%> (-75.56%)` | ⬇️ |
| docarray/document/mixins/text.py | `24.00% <0.00%> (-74.00%)` | ⬇️ |
| docarray/array/mixins/reduce.py | `26.92% <0.00%> (-73.08%)` | ⬇️ |
| docarray/array/mixins/io/pushpull.py | `23.27% <0.00%> (-69.83%)` | ⬇️ |

... and 81 more

docs/fundamentals/documentarray/serialization.md

Co-authored-by: Joan Fontanals <[email protected]> Signed-off-by: Nicholas Dunham <[email protected]>

JoanFM · 2022-11-10T10:06:26Z

docs/fundamentals/documentarray/interaction-cloud.md

+(interaction-cloud)=
+# Interaction with Jina AI Cloud
+
+```{important}


@samsja I believe this note is not needed, right? They come when you install docarray with `pip install docarray`?

Yes, not needed.

alaeddine-13 · 2022-11-10T10:52:55Z

docs/fundamentals/documentarray/serialization.md

 da = DocumentArray.from_dataframe(df)
 ```
-
-## From/to cloud


I think removing this section would break some of our content.
In this PR I chose to keep it and document cloud support further in a separate section.

alaeddine-13 · 2022-11-10T10:53:47Z

docs/fundamentals/documentarray/interaction-cloud.md

@@ -0,0 +1,41 @@
+(interaction-cloud)=


This is a duplicate.
We already added a cloud support section in this PR:
#697

Let's remove this file

JohannesMessner · 2022-11-10T10:57:00Z

docs/fundamentals/document/construct.md

 # Construct

-Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Document, filled Document. One can also construct Document from bytes, JSON, Protobuf message as introduced {ref}`in the next chapter<serialize>`.
+Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.


Suggested change

Initializing a Document object is super easy. This chapter introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.

This section introduces the ways of constructing empty Documents and filled Documents. One can also construct Documents from bytes, JSON, and Protobuf messages, as introduced {ref}`in the next chapter<serialize>`.

We should avoid this language; there is always something that will not be easy for some users.

JohannesMessner · 2022-11-10T10:58:13Z

docs/fundamentals/documentarray/interaction-cloud.md

+da = DocumentArray.pull('myda123', show_progress=True)
+```
+
+Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends.


Suggested change

Now you can continue your work locally, analyzing `da` or visualizing it. Your friends & colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues & friends.

Now you can continue your work locally, analyzing `da` or visualizing it. Your friends and colleagues who know the token `myda123` can also pull that DocumentArray. It's useful when you want to quickly share the results with your colleagues and friends.

JohannesMessner · 2022-11-10T10:58:35Z

docs/fundamentals/documentarray/interaction-cloud.md

+from docarray import DocumentArray
+
+da = DocumentArray(...)  # heavy lifting, processing, GPU tasks...
+da.push('myda123', show_progress=True)


Don't we require login for this now?

yes, let's add it

alaeddine-13 · 2022-11-10T14:05:20Z

docs/fundamentals/documentarray/serialization.md

@@ -358,43 +357,3 @@ To build a DocumentArray from dataframe,
 df = ...
 da = DocumentArray.from_dataframe(df)
 ```


As I said, removing this section breaks content that relies on the push/pull feature briefly discussed here.
We also have a cloud support section, so this content should not be moved to docs/fundamentals/documentarray/interaction-cloud.md.

Suggested change

```

```

## From/to cloud

```{important}

This feature requires the `rich` and `requests` dependencies. You can run `pip install "docarray[full]"` to install them.

```

{meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.push` and {meth}`~docarray.array.mixins.io.pushpull.PushPullMixin.pull` allow you to serialize a DocumentArray object to Jina Cloud and share it across machines.

Suppose you are working on a GPU machine via Google Colab/Jupyter. After preprocessing and embedding, you have everything you need in a DocumentArray. You can store it in the cloud via:

```python

from docarray import DocumentArray

da = DocumentArray(...) # heavylifting, processing, GPU task, ...

da.push('myda123', show_progress=True)

```

```{figure} images/da-push.png

```

Then on your local laptop, simply pull it:

```python

from docarray import DocumentArray

da = DocumentArray.pull('myda123', show_progress=True)

```

Now you can continue your work locally, analyzing `da` or visualizing it. Your friends and colleagues who know the token `myda123` can also pull that DocumentArray. This is useful when you want to quickly share results with them.

The maximum size of an upload is 4GB under the `protocol='protobuf'` and `compress='gzip'` setting. The lifetime of an upload is one week after its creation.

To avoid unnecessary downloads when the upstream DocumentArray is unchanged, you can use `DocumentArray.pull(..., local_cache=True)`.

```{seealso}

DocArray allows pushing, pulling, and managing your DocumentArrays in Jina AI Cloud.

Read more about how to manage your data in Jina AI Cloud, using either the console or the DocArray Python API, in the

{ref}`Data Management section <data-management>`.

```

alaeddine-13 · 2022-11-10T14:06:02Z

docs/fundamentals/documentarray/index.md

 matching
 subindex
 evaluation
+interaction-cloud


We already have a cloud-support section.

Suggested change

interaction-cloud

CatStark · 2022-11-11T10:36:56Z

The part about the integration with Jina Cloud should have more concrete examples.
These are users who will not read all the documentation; they just need to quickly know what to do with their data.

Some examples of creating a docarray from a PDF, image, or video would be nice.

I know it's explained in the other sections, but we need easy onboarding for newcomers who don't need to know the technical details.

docs: refactor getting started section

df30de8

JoanFM requested review from JohannesMessner, alaeddine-13, hanxiao and samsja and removed request for hanxiao November 8, 2022 08:36

JoanFM requested changes Nov 8, 2022

docs: fix typo

5ae9c77

Co-authored-by: Joan Fontanals <[email protected]> Signed-off-by: Nicholas Dunham <[email protected]>

JoanFM reviewed Nov 10, 2022

alaeddine-13 reviewed Nov 10, 2022

JohannesMessner requested changes Nov 10, 2022

alaeddine-13 reviewed Nov 10, 2022

Merge branch 'main' into docs-revise-getting-started

6e6da38

JoanFM closed this Nov 11, 2022

NicholasDunham deleted the docs-revise-getting-started branch November 11, 2022 18:48

NicholasDunham wants to merge 3 commits into docarray:main from NicholasDunham:docs-revise-getting-started

7 participants