diff --git a/README.md b/README.md
index cbb79461ed8..9253bc06040 100644
--- a/README.md
+++ b/README.md
@@ -14,21 +14,21 @@
-DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer the multi-modal data with a Pythonic API.
+DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API.
-🚪 **Door to cross-/multi-modal world**: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of [Jina](https://github.com/jina-ai/jina), [CLIP-as-service](https://github.com/jina-ai/clip-as-service), [DALL·E Flow](https://github.com/jina-ai/dalle-flow), [DiscoArt](https://github.com/jina-ai/discoart) etc.
+🚪 **Door to multimodal world**: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of [Jina](https://github.com/jina-ai/jina), [CLIP-as-service](https://github.com/jina-ai/clip-as-service), [DALL·E Flow](https://github.com/jina-ai/dalle-flow), [DiscoArt](https://github.com/jina-ai/discoart) etc.
🧑‍🔬 **Data science powerhouse**: greatly accelerate data scientists' work on embedding, k-NN matching, querying, visualizing, evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.
🚡 **Data in transit**: optimized for network communication, ready-to-wire at anytime with fast and compressed serialization in Protobuf, bytes, base64, JSON, CSV, DataFrame. Perfect for streaming and out-of-memory data.
-🔎 **One-stop k-NN**: Unified and consistent API for mainstream vector databases that allows nearest neighboour search including Elasticsearch, Redis, ANNLite, Qdrant, Weaviate.
+🔎 **One-stop k-NN**: Unified and consistent API for mainstream vector databases that allows nearest neighbor search including Elasticsearch, Redis, AnnLite, Qdrant, Weaviate.
-👒 **For modern apps**: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable webservices.
+👒 **For modern apps**: GraphQL support makes your server versatile on request and response; built-in data validation and JSON Schema (OpenAPI) help you build reliable web services.
-🐍 **Pythonic experience**: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotation simplify the code you write.
+🐍 **Pythonic experience**: as easy as a Python list. If you can Python, you can DocArray. Intuitive idioms and type annotation simplify the code you write.
-🛸 **Integrate with IDE**: pretty-print and visualization on Jupyter notebook & Google Colab; comprehensive auto-complete and type hint in PyCharm & VS Code.
+🛸 **IDE integration**: pretty-print and visualization on Jupyter notebook and Google Colab; comprehensive autocomplete and type hints in PyCharm and VS Code.
Read more on [why should you use DocArray](https://docarray.jina.ai/get-started/what-is/) and [comparison to alternatives](https://docarray.jina.ai/get-started/what-is/#comparing-to-alternatives).
@@ -61,9 +61,9 @@ DocArray consists of three simple concepts:
Let's see DocArray in action with some examples.
-### Example 1: represent multimodal data in dataclass
+### Example 1: represent multimodal data in a dataclass
-The following news article card can be easily represented via `docarray.dataclass` and type annotation:
+You can easily represent the following news article card with `docarray.dataclass` and type annotation:
-Fun is fun, but recall our goal is to match left images against right images and so far we have only handled the left. Let's repeat the same procedure for the right:
-
+Fun is fun, but our goal is to match left images against right images, and so far we have only handled the left. Let's repeat the same procedure for the right:
-What we did here is revert the preprocessing steps (i.e. switching axis and normalizing) on the copied matches, so that you can visualize them using image sprites.
+Here we reversed the preprocessing steps (i.e. switching axis and normalizing) on the copied matches, so you can visualize them using image sprites.
### Quantitative evaluation
@@ -350,7 +347,7 @@ groundtruth = DocumentArray(
)
```
-Here we create a new DocumentArray with real matches by simply replacing the filename, e.g. `left/00001.jpg` to `right/00001.jpg`. That's all we need: if the predicted match has the identical `uri` as the groundtruth match, then it is correct.
+Here we created a new DocumentArray with real matches by simply replacing the filename, e.g. `left/00001.jpg` to `right/00001.jpg`. That's all we need: if the predicted match has the identical `uri` as the groundtruth match, then it is correct.
Now let's check recall rate from 1 to 5 over the full dataset:
@@ -372,25 +369,23 @@ recall@4 0.052194148936170214
recall@5 0.0573470744680851
```
-More metrics can be used such as `precision_at_k`, `ndcg_at_k`, `hit_at_k`.
+You can also use other metrics like `precision_at_k`, `ndcg_at_k`, `hit_at_k`.
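To make the correctness rule concrete (a predicted match counts as correct when its `uri` equals the groundtruth `uri`), here is a minimal pure-Python sketch of recall and precision at `k`. It is not DocArray's implementation: the function names, the `predictions` dictionary and the toy URIs are all hypothetical, chosen only to illustrate the idea on two queries.

```python
from typing import Callable, Dict, List


def recall_at_k(predicted: List[str], relevant: List[str], k: int) -> float:
    """Share of the relevant URIs that show up in the top-k predictions."""
    if not relevant:
        return 0.0
    top_k = set(predicted[:k])
    return sum(uri in top_k for uri in relevant) / len(relevant)


def precision_at_k(predicted: List[str], relevant: List[str], k: int) -> float:
    """Share of the top-k slots that hold a relevant URI."""
    if k <= 0:
        return 0.0
    relevant_set = set(relevant)
    return sum(uri in relevant_set for uri in predicted[:k]) / k


def mean_metric(
    metric: Callable[[List[str], List[str], int], float],
    predictions: Dict[str, List[str]],
    groundtruth: Dict[str, List[str]],
    k: int,
) -> float:
    """Average a per-query metric over all queries."""
    scores = [metric(predictions[q], groundtruth[q], k) for q in predictions]
    return sum(scores) / len(scores)


# Hypothetical toy data: ranked match URIs for two left-hand query images.
predictions = {
    'left/00001.jpg': ['right/00003.jpg', 'right/00001.jpg', 'right/00002.jpg'],
    'left/00002.jpg': ['right/00002.jpg', 'right/00001.jpg', 'right/00003.jpg'],
}

# Groundtruth is built by swapping the folder name, as described in the text.
groundtruth = {
    query: [query.replace('left/', 'right/')] for query in predictions
}

for k in (1, 2):
    print(f'recall@{k}', mean_metric(recall_at_k, predictions, groundtruth, k))
    print(f'precision@{k}', mean_metric(precision_at_k, predictions, groundtruth, k))
```

With a single groundtruth match per query, recall at `k` reduces to the fraction of queries whose true match appears in the top `k` results.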
-If you think a pretrained ResNet50 is good enough, let me tell you with [Finetuner](https://github.com/jina-ai/finetuner) you could do much better in just 10 extra lines of code. [Here is how](https://finetuner.jina.ai/notebooks/image_to_image/).
+A pretrained ResNet50 may seem good enough, but with [Finetuner](https://github.com/jina-ai/finetuner) you can do much better with [just another ten lines of code](https://finetuner.jina.ai/notebooks/image_to_image/).
### Save results
-You can save a DocumentArray to binary, JSON, dict, DataFrame, CSV or Protobuf message with/without compression. In its simplest form,
+You can save a DocumentArray to binary, JSON, dict, DataFrame, CSV or Protobuf message with/without compression. In its simplest form:
```python
left_da.save('left_da.bin')
```
-To reuse it, do `left_da = DocumentArray.load('left_da.bin')`.
-
+To reuse that DocumentArray's data, use `left_da = DocumentArray.load('left_da.bin')`.
If you want to transfer a DocumentArray from one machine to another or share it with your colleagues, you can do:
-
```python
left_da.push('my_shared_da')
```
@@ -406,7 +401,7 @@ Intrigued? That's only scratching the surface of what DocArray is capable of. [R
## Support
-- Join our [Slack community](https://slack.jina.ai) and chat with other community members about ideas.
+- Join our [Slack community](https://jina.ai/slack) and chat with other community members about ideas.
> DocArray is a trademark of LF AI Projects, LLC