`docs/advanced/graphql-support/index.md`

# GraphQL


DocArray supports GraphQL for querying a DocumentArray and getting exactly the fields you need: if `.embedding` is too big or too verbose, simply don't query it. Compared to the REST API, GraphQL clients are fast and stable because they control the data they get, not the server.

When integrating DocArray into a GraphQL app, you only need to implement the *query* (in GraphQL idiom, this is like the API endpoint that your server allows). The *schema* part is provided by DocArray and can be used out of the box.


````{tip}
This feature requires `strawberry`. You can install it via `pip install "docarray[full]"` or `pip install "strawberry-graphql[debug-server]"`.
````

```{seealso}
This article is *not* an introduction to GraphQL. If you don't have a GraphQL background, we strongly recommend the [official GraphQL documentation](https://graphql.org/). You may also want to learn more about [Strawberry](https://strawberry.rocks/). Otherwise, you may be confused by GraphQL idioms such as *query* and *schema*.
```

## Basic example
```python
class Query:
    ...  # resolver definition collapsed in the diff view

schema = strawberry.Schema(query=Query)
```

Notice how we leverage {class}`~docarray.document.strawberry_type.StrawberryDocument` and use {meth}`~docarray.array.mixins.pydantic.StrawberryMixin.to_strawberry_type` to convert the type in the resolver before returning the result.

In practice, `da` could be your final search results, or some DocumentArray after embedding or preprocessing. Here we just use the dummy matches we created before to serve as the results.

Finally, save all the code snippets above into `toy.py` and run it from the terminal:

```bash
strawberry server toy
```

Try the following query:
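The playground screenshot is not reproduced here. Purely as an illustration (the field names below are assumptions, not taken from the original), a query that selects only a couple of fields might look like:

```graphql
{
  docs {
    id
    text
  }
}
```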

Now we have one endpoint that lets users selectively read fields from a DocumentArray. We can add more endpoints to the `Query` class to support advanced filtering and selection, but that is beyond the scope of this tutorial. As the app/service provider, it is also your responsibility to decide what API to expose to users.

## Integrate with FastAPI

Strawberry's built-in server is perfect for prototyping an API. When it comes to production, you can use FastAPI. Here's a short example showing how to wrap the above snippet in a FastAPI app:

```python
from strawberry.asgi import GraphQL
from fastapi import FastAPI

# `schema` comes from the previous snippet
graphql_app = GraphQL(schema)

app = FastAPI()
app.add_route('/graphql', graphql_app)
app.add_websocket_route('/graphql', graphql_app)
```

Learn more about [FastAPI GraphQL support](https://fastapi.tiangolo.com/advanced/graphql/).

`docs/advanced/torch-support/index.md`

# PyTorch/Deep Learning Frameworks

DocArray integrates easily with the PyTorch, TensorFlow, and PaddlePaddle frameworks.

The `.embedding` and `.tensor` attributes of the Document class can contain a PyTorch sparse/dense tensor, a TensorFlow sparse/dense tensor, or a PaddlePaddle dense tensor.

This means that if you store the Document on disk in `pickle` or `protobuf` (with or without compression), or transmit it over the network in `pickle` or `protobuf` (with or without compression), the data type of `.embedding` and `.tensor` is preserved.
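This round-trip guarantee can be illustrated with plain `pickle` and a NumPy array standing in for a framework tensor (a minimal sketch, not DocArray's actual serialization code):

```python
import pickle

import numpy as np

# a float32 embedding, standing in for a PyTorch/TensorFlow/PaddlePaddle tensor
emb = np.arange(6, dtype=np.float32).reshape(2, 3)

# serialize and deserialize, as storing and loading a Document would
restored = pickle.loads(pickle.dumps(emb))

print(restored.dtype)  # the dtype survives the round trip, not upcast to float64
```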

```python
import numpy as np
...  # (snippet collapsed in the diff view)

da.save_binary('test.protobuf.gz')
```

Now let's load them again and check the data type:

```python
from docarray import DocumentArray

for d in DocumentArray.load_binary('test.protobuf.gz'):
    ...  # (check the types of `d.tensor` and `d.embedding`; collapsed in the diff view)
```

## Load, map, batch in one-shot

There is a very common pattern in deep learning engineering: loading big data, mapping it through a preprocessing function on the CPU, and batching it to the GPU for compute-intensive deep learning work.

There are many pitfalls in this pattern when it's not implemented correctly, to name a few:
- Data may not fit into memory.
- Mapping on the CPU only utilizes a single core.
- Data draining: the GPU is underutilized because data is blocked by the slow CPU preprocessing step.
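The essence of the pattern (stream, map, batch) can be sketched with the standard library alone; this is a conceptual sketch, not the DocArray API:

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from a (possibly huge) iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def preprocess(x):
    return x * 2  # stand-in for a CPU-bound preprocessing step


# stream -> map (CPU) -> batch (ready for a GPU step)
batches = list(batched(map(preprocess, range(10)), 4))
print(batches)  # [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

Because everything is lazy until the final `list()`, only one batch of preprocessed items needs to be held in memory at a time.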

DocArray provides a high-level function {meth}`~docarray.array.mixins.dataloader.DataLoaderMixin.dataloader` that allows you to do this in one shot, avoiding all these pitfalls. The following figure illustrates this function:

```{figure} dataloader.svg
:width: 80%
```

Say we have one million 32x32 color images, which take up 3.14GB on disk with `protocol='protobuf'` and `compress='gz'`. To process them:

```python
import time
...  # (snippet collapsed in the diff view)
```

```text
cpu job done
GPU job done
cpu job done
GPU job done
cpu job done
cpu job done
```
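The interleaved messages above reflect the prefetching design: CPU workers prepare the next batch while the consumer is busy. A minimal stdlib sketch of that overlap (the sleeps stand in for real preprocessing and GPU work; this is not DocArray's implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_preprocess(i):
    time.sleep(0.01)  # stand-in for CPU-bound preprocessing
    print('cpu job done')
    return i


def gpu_consume(i):
    time.sleep(0.01)  # stand-in for the GPU step
    print('GPU job done')
    return i


# two workers keep preprocessing ahead while the consumer drains results in order
with ThreadPoolExecutor(max_workers=2) as pool:
    processed = [gpu_consume(i) for i in pool.map(cpu_preprocess, range(4))]
```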