From a959710c0a2c23ad44af0bef1d87754c0d375c53 Mon Sep 17 00:00:00 2001
From: Joan Fontanals Martinez
Date: Wed, 28 Sep 2022 18:00:48 +0200
Subject: [PATCH 1/4] docs: adjust readme according to suggestions

---
 README.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index bd476cb5556..b346292b00a 100644
--- a/README.md
+++ b/README.md
@@ -195,9 +195,13 @@ Or you can simply pull it from Jina Cloud:
 left_da = DocumentArray.pull('demo-leftda', show_progress=True)
 ```

+**Note**
+If you have at least XGB of RAM and want to try using the whole dataset instead of just the first 1000 images, remove [:1000] when loading the files into the DocumentArrays left_da and right_da.
+
+
 You will see a running progress bar to indicate the downloading process.

-To get a feeling of the data you will handle, plot them in one sprite image:
+To get a feeling of the data you will handle, plot them in one sprite image, you will need to have matplotlib and torch installed to run this snippet:

 ```python
 left_da.plot_image_sprites()
@@ -243,10 +247,10 @@ This step takes ~30 seconds on GPU.
 Beside PyTorch, you can also use TensorFlow,

 ### Visualize embeddings

-You can visualize the embeddings via tSNE in an interactive embedding projector:
+You can visualize the embeddings via tSNE in an interactive embedding projector, you will need to have pydantic, uvicorn and fastapi installed to run this snippet::

 ```python
-left_da.plot_embeddings()
+left_da.plot_embeddings(image_sprites=True)
 ```

@@ -268,7 +272,7 @@ Fun is fun, but recall our goal is to match left images against right images and
 right_da = (
-    DocumentArray.pull('demo-rightda', show_progress=True)
+    DocumentArray.pull('demo-rightda', show_progress=True)[:1000]
     .apply(preproc)
     .embed(model, device='cuda')
 )
 ```
@@ -277,7 +281,7 @@ right_da = (
 ```python
 right_da = (
-    DocumentArray.from_files('right/*.jpg').apply(preproc).embed(model, device='cuda')
+    DocumentArray.from_files('right/*.jpg')[:1000].apply(preproc).embed(model, device='cuda')
 )
 ```
@@ -296,9 +300,8 @@ left_da.match(right_da, limit=9)
 Let's inspect what's inside `left_da` matches now:

 ```python
-for d in left_da:
-    for m in d.matches:
-        print(d.uri, m.uri, m.scores['cosine'].value)
+for m in left_da[0].matches:
+    print(left_da[0].uri, m.uri, m.scores['cosine'].value)
 ```

 ```text

From 7528f4ba913819ddb46ae042446a3d9aa29be631 Mon Sep 17 00:00:00 2001
From: Joan Fontanals
Date: Thu, 29 Sep 2022 10:35:56 +0200
Subject: [PATCH 2/4] fix: apply suggestions from code review

Co-authored-by: Nicholas Dunham <11730795+NicholasDunham@users.noreply.github.com>
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b346292b00a..4a22063c47d 100644
--- a/README.md
+++ b/README.md
@@ -201,7 +201,7 @@ If you have at least XGB of RAM and want to try using the whole dataset instead

 You will see a running progress bar to indicate the downloading process.

-To get a feeling of the data you will handle, plot them in one sprite image, you will need to have matplotlib and torch installed to run this snippet:
+To get a feeling of the data you will handle, plot them in one sprite image. You will need to have matplotlib and torch installed to run this snippet:

 ```python
 left_da.plot_image_sprites()
@@ -247,7 +247,7 @@ This step takes ~30 seconds on GPU.
 Beside PyTorch, you can also use TensorFlow,

 ### Visualize embeddings

-You can visualize the embeddings via tSNE in an interactive embedding projector, you will need to have pydantic, uvicorn and fastapi installed to run this snippet::
+You can visualize the embeddings via tSNE in an interactive embedding projector. You will need to have pydantic, uvicorn and fastapi installed to run this snippet:

 ```python
 left_da.plot_embeddings(image_sprites=True)

From fa9594aa43f571edf1c6de2e973fc8152a503c6b Mon Sep 17 00:00:00 2001
From: Joan Fontanals Martinez
Date: Thu, 29 Sep 2022 22:17:34 +0200
Subject: [PATCH 3/4] docs: set amount of RAM needed for example

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b346292b00a..6521cf44e22 100644
--- a/README.md
+++ b/README.md
@@ -196,7 +196,7 @@ left_da = DocumentArray.pull('demo-leftda', show_progress=True)
 ```

 **Note**
-If you have at least XGB of RAM and want to try using the whole dataset instead of just the first 1000 images, remove [:1000] when loading the files into the DocumentArrays left_da and right_da.
+You will need more than 15GB of RAM if you want to try using the whole dataset instead of just the first 1000 images, remove [:1000] when loading the files into the DocumentArrays left_da and right_da.

 You will see a running progress bar to indicate the downloading process.
From e6cb3b0f657d2bd79b8e84a0ab1f2113f5d3ba18 Mon Sep 17 00:00:00 2001
From: Joan Fontanals
Date: Thu, 29 Sep 2022 22:27:51 +0200
Subject: [PATCH 4/4] fix: update README.md

Co-authored-by: Nicholas Dunham <11730795+NicholasDunham@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 33ae363a2a2..2746e36546a 100644
--- a/README.md
+++ b/README.md
@@ -196,7 +196,7 @@ left_da = DocumentArray.pull('demo-leftda', show_progress=True)
 ```

 **Note**
-You will need more than 15GB of RAM if you want to try using the whole dataset instead of just the first 1000 images, remove [:1000] when loading the files into the DocumentArrays left_da and right_da.
+If you have more than 15GB of RAM and want to try using the whole dataset instead of just the first 1000 images, remove [:1000] when loading the files into the DocumentArrays left_da and right_da.

 You will see a running progress bar to indicate the downloading process.
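
The README passage these patches edit calls `left_da.match(right_da, limit=9)` and then reads each match's `scores['cosine'].value`. As a rough sketch of what that step computes conceptually — ranking each left embedding's nearest right embeddings by cosine distance — here is a minimal pure-Python illustration. The helper names `cosine_distance` and `match` are hypothetical and are not DocArray's API; the real library does this vectorized over the stored embeddings.

```python
import math


def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; 0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def match(left_embs, right_embs, limit=9):
    # For each left vector, keep the `limit` closest right vectors
    # (smaller distance = better match), as (index, distance) pairs.
    results = []
    for emb in left_embs:
        scored = sorted(
            (cosine_distance(emb, r), i) for i, r in enumerate(right_embs)
        )[:limit]
        results.append([(i, d) for d, i in scored])
    return results


# Toy embeddings standing in for the image embeddings in the tutorial.
left = [[1.0, 0.0], [0.0, 1.0]]
right = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
matches = match(left, right, limit=2)
```

Under this sketch, `matches[0]` would list right vector 0 first (nearly parallel to `left[0]`), mirroring how the corrected loop in PATCH 1 walks `left_da[0].matches` in ascending cosine-distance order.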