From 92abbfe4eb0098e9a0b124b087620b6b5bdf9615 Mon Sep 17 00:00:00 2001 From: Alex C-G Date: Fri, 3 Mar 2023 13:57:07 +0100 Subject: [PATCH] docs: fixes for graphql, torch pages Signed-off-by: Alex C-G --- docs/advanced/graphql-support/index.md | 19 +++++++++---------- docs/advanced/torch-support/index.md | 23 ++++++++++++----------- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/docs/advanced/graphql-support/index.md b/docs/advanced/graphql-support/index.md index 45a1ce5f888..1f784c00882 100644 --- a/docs/advanced/graphql-support/index.md +++ b/docs/advanced/graphql-support/index.md @@ -1,9 +1,9 @@ # GraphQL -DocArray supports GraphQL. You can use GraphQL to query a DocumentArray and get exactly the fields you need: `.embedding` is too big and too verbose, then don't query it. Comparing to the REST API, clients using GraphQL are fast and stable because they control the data they get, not the server. +DocArray supports GraphQL for querying a DocumentArray and getting exactly the fields you need: if `.embedding` is too big and verbose, you don't need to query it. Compared to the REST API, clients using GraphQL are fast and stable because they are in control of the data they get, not the server. -When integrating DocArray in a GraphQL app, you only need to implement the *query* (in GraphQL idiom, this is like the API endpoint that your server allows). The *schema* part is provided by DocArray and can be used out of the box. +When integrating DocArray into a GraphQL app, you only need to implement the *query* (in GraphQL idiom, this is like the API endpoint that your server allows). The *schema* part is provided by DocArray and can be used out of the box. ````{tip} This feature requires `strawberry`. You can install it via `pip install "docarra ```` ```{seealso} -This article does *not* serve as the introduction to GraphQL. 
If you don't have GraphQL background, it is stronly recommended to learn more about GraphQL in the [official GraphQL documentation](https://graphql.org/). You may also want to learn more about [Strawberry](https://strawberry.rocks/). Otherwise, you may get confused by the GraphQL idioms, e.g. query, schema. +This article does *not* serve as an introduction to GraphQL. If you don't have a GraphQL background, we strongly recommend learning more about GraphQL in the [official GraphQL documentation](https://graphql.org/). You may also want to learn more about [Strawberry](https://strawberry.rocks/). Otherwise, you may get confused by GraphQL idioms such as query and schema. ``` ## Basic example @@ -73,11 +73,11 @@ class Query: schema = strawberry.Schema(query=Query) ``` -Notice how I leverage {class}`~docarray.document.strawberry_type.StrawberryDocument` and use {meth}`~docarray.array.mixins.pydantic.StrawberryMixin.to_strawberry_type` to convert the type in the resolver before returning the result. +Notice how we leverage {class}`~docarray.document.strawberry_type.StrawberryDocument` and use {meth}`~docarray.array.mixins.pydantic.StrawberryMixin.to_strawberry_type` to convert the type in the resolver before returning the result. -In practice, `da` could be your final search results, or some DocumentArray after embedding or preprocessing. Here I just use the dummy matches I created before to serve as the results. +In practice, `da` could be your final search results, or some DocumentArray after embedding or preprocessing. Here we just use the dummy matches we created before to serve as the results. -Finally, save all code snippets above into `toy.py` and run it from the console via: +Finally, save all code snippets above into `toy.py` and run it from the terminal: ```bash strawberry server toy ``` @@ -107,11 +107,11 @@ Try the following query :width: 90% ``` -Now we have one endpoint that allows user to selectively read fields from a DocumentArray. 
Additional endpoints can be added to `Query` class, to support advance filtering and selecting, but this is beyond the scope of this tutorial. It is also your responsibility as the app/service provider to decide what API you want to expose to users. +Now we have one endpoint that allows users to selectively read fields from a DocumentArray. We can add additional endpoints to the `Query` class to support advanced filtering and selecting, but this is beyond the scope of this tutorial. It is also your responsibility as the app/service provider to decide what API you want to expose to users. ## Integrate with FastAPI -Strawberry's built-in server is perfect for prototyping an API. When it comes to production, you can use FastAPI. Here is a short example how you can wrap the above snippet it in a FastAPI app: +Strawberry's built-in server is perfect for prototyping an API. When it comes to production, you can use FastAPI. Here's a short example to show how to wrap the above snippet in a FastAPI app: ```python from strawberry.asgi import GraphQL @@ -124,5 +124,4 @@ app.add_route('/graphql', graphql_app) app.add_websocket_route('/graphql', graphql_app) ``` -You can learn more about [FastAPI GraphQL support from here](https://fastapi.tiangolo.com/advanced/graphql/). - +Learn more about [FastAPI GraphQL support](https://fastapi.tiangolo.com/advanced/graphql/). diff --git a/docs/advanced/torch-support/index.md b/docs/advanced/torch-support/index.md index 6fadc51d815..ec0e6112108 100644 --- a/docs/advanced/torch-support/index.md +++ b/docs/advanced/torch-support/index.md @@ -1,10 +1,10 @@ # PyTorch/Deep Learning Frameworks -DocArray can be easily integrated into PyTorch, Tensorflow, PaddlePaddle frameworks. +DocArray can be easily integrated into the PyTorch, Tensorflow and PaddlePaddle frameworks. -The `.embedding` and `.tensor` attributes in Document class can contain PyTorch sparse/dense tensor, Tensorflow sparse/dense tensor or PaddlePaddle dense tensor. 
+The `.embedding` and `.tensor` attributes in the Document class can contain a PyTorch sparse/dense tensor, a Tensorflow sparse/dense tensor, or a PaddlePaddle dense tensor. -It means that if you store the Document on disk in `pickle` or `protobuf` with/o compression, or transit the Document over the network in `pickle` or `protobuf` with/o compression, the data type of `.embedding` and `.tensor` is preserved. +This means that if you store the Document on disk in `pickle` or `protobuf`, with or without compression, or transmit the Document over the network in `pickle` or `protobuf`, with or without compression, the data type of `.embedding` and `.tensor` is preserved. ```python import numpy as np @@ -28,6 +28,7 @@ da.save_binary('test.protobuf.gz') ``` Now let's load them again and check the data type: + ```python from docarray import DocumentArray @@ -44,20 +45,20 @@ for d in DocumentArray.load_binary('test.protobuf.gz'): ``` ## Load, map, batch in one-shot -There is a very common pattern in the deep learning engineering: loading big data, mapping it via some function for preprocessing on CPU, and batching it to GPU for intensive deep learning stuff. +There is a very common pattern in deep learning engineering: loading big data, mapping it via some function for preprocessing on the CPU, and batching it to the GPU for intensive deep learning computation. There are many pitfalls in this pattern when not implemented correctly, to name a few: -- data may not fit into memory; -- mapping via CPU only utilizes a single-core; -- data-draining problem: GPU is not fully utilized as data is blocked by the slow CPU preprocessing step. +- Data may not fit into memory. +- Mapping via CPU only utilizes a single core. +- Data-draining problem: the GPU is not fully utilized as data is blocked by the slow CPU preprocessing step. 
The following figure illustrates this function: +DocArray provides a high-level function {meth}`~docarray.array.mixins.dataloader.DataLoaderMixin.dataloader` that allows you to do this in one shot, avoiding all these pitfalls. The following figure illustrates this function: ```{figure} dataloader.svg :width: 80% ``` -Say we have a one million 32 x 32 color images, which takes 3.14GB on the disk with `protocol='protobuf'` and `compress='gz'`. To process it: +Say we have one million 32x32 color images, which take up 3.14GB on disk with `protocol='protobuf'` and `compress='gz'`. To process it: ```python import time @@ -96,6 +97,6 @@ cpu job done GPU job done cpu job done GPU job done -cpu job donecpu job done +cpu job done +cpu job done ``` -
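As a note accompanying this patch: the load/map/batch pattern that the torch page describes can be sketched with the Python standard library alone. This is a hypothetical illustration of the pattern, not DocArray's actual `dataloader` implementation; `stream_items`, `preprocess`, and `batched_loader` are made-up names for this sketch.

```python
# A minimal sketch of the load -> map -> batch pattern: stream items lazily,
# preprocess them on a pool of CPU workers, and yield fixed-size batches for
# the downstream (e.g. GPU) step. Hypothetical names, standard library only.
from itertools import islice
from multiprocessing.pool import ThreadPool


def stream_items(n):
    # Stand-in for lazily loading Documents from disk; yields one item at a
    # time so the full dataset never has to fit into memory.
    for i in range(n):
        yield i


def preprocess(item):
    # Stand-in for a CPU-bound step such as decoding or normalizing a tensor.
    return item * 2


def batched_loader(items, func, batch_size, num_worker=4):
    # Map `func` over the stream with a worker pool, then group the ordered
    # results into batches, mirroring the one-shot load/map/batch behavior.
    with ThreadPool(num_worker) as pool:
        it = pool.imap(func, items)
        while True:
            batch = list(islice(it, batch_size))
            if not batch:
                return
            yield batch


batches = list(batched_loader(stream_items(10), preprocess, batch_size=4))
print(batches)  # [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

Because `imap` consumes the stream lazily and the pool keeps several workers busy, this shape sidesteps the three pitfalls listed above: memory stays bounded, more than one core does the mapping, and batches are ready as soon as the workers produce them.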