docs: add DocList and DocVec section #1343
Merged
22 commits
58bca52 docs : wip add AnyDocArray docs (samsja)
a9c7051 docs : add array section (samsja)
f30593b docs : add array section (samsja)
0375443 docs : aadd glossary (samsja)
c303cdb feat: apply johannes suggestion (samsja)
bc34b2c feat: apply johannes suggestion (samsja)
67c6275 feat: apply johannes suggestion (samsja)
c3468e3 feat: apply johannes suggestion (samsja)
25cf163 docs : fix johannes suggestion (samsja)
eee36d4 docs : fix typo (samsja)
53f0f4c docs : fix typo (samsja)
4440257 docs : reove pydantic stuff (samsja)
4f60d02 docs : fix title (samsja)
5ba6d43 feat: apply gammarly (samsja)
f01e9e1 fix: fix apply grammarly on glossary (samsja)
8fc808b fix: fix doc test (samsja)
b6dc7d9 docs: fix english (alexcg1)
dcaa575 fix: fix add gpt to generative ai (samsja)
5dee5e9 fix: fix sentence (samsja)
7f3de31 docs: fix english of fixed sentence (alexcg1)
0eee05f feat: apply alex suggestion (samsja)
bae09f9 fix: fix link (samsja)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # AnyDocArray | ||
|
|
||
| ::: docarray.array.doc_vec.doc_vec.DocVec |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| # DocVec | ||
|
|
||
| ::: docarray.array.doc_vec.doc_vec.DocVec | ||
| ::: docarray.array.any_array.AnyDocArray |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Glossary | ||
|
|
||
| DocArray sits at the intersection of several fields, from AI to web apps. To make the documentation easier to follow, this glossary defines the terms it uses. | ||
|
|
||
| ## Concepts | ||
|
|
||
| ### `Multimodal Data` | ||
| Multimodal data is data composed of several modalities, such as image, text, video, and audio. | ||
| For example, a YouTube video is composed of a video, a title, a description, and a thumbnail. | ||
|
|
||
| In fact, most real-world data is multimodal. | ||
|
|
||
| ### `Multimodal AI` | ||
|
|
||
| Multimodal AI is the field of AI that focuses on multimodal data. | ||
|
|
||
| Most of the recent breakthroughs in AI involve multimodal models: | ||
|
|
||
| * [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release), [Midjourney](https://www.midjourney.com/home/?callbackUrl=%2Fapp%2F), and [DALL-E 2](https://openai.com/product/dall-e-2) generate *images* from *text*. | ||
| * [Whisper](https://openai.com/research/whisper) generates *text* from *speech*. | ||
| * [GPT-4](https://openai.com/product/gpt-4) and [Flamingo](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) are MLLMs (Multimodal Large Language Models) that understand both *text* and *images*. | ||
|
|
||
| AI labs focus on multimodal AI for two reasons: it can solve many practical problems, and it may be | ||
| a requirement for building a strong AI system, as Yann LeCun argues in [this article](https://www.noemamag.com/ai-and-the-limits-of-language/), which states that "a system trained on language alone will never approximate human intelligence." | ||
|
|
||
| ### `Generative AI` | ||
|
|
||
| Generative AI is also at the epicenter of the latest AI revolution. Generative tools allow us to *generate* data. | ||
|
|
||
| * [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release), [Midjourney](https://www.midjourney.com/home/?callbackUrl=%2Fapp%2F), and [DALL-E 2](https://openai.com/product/dall-e-2) generate *images* from *text*. | ||
|
||
| * LLMs (Large Language Models) such as GPT, Flan, LLaMA, and BLOOM generate *text*. | ||
|
|
||
| ### `Neural Search` | ||
|
|
||
| Neural search is search powered by neural networks. Unlike traditional keyword-based search methods, neural search understands the context and semantic meaning of a user's query, allowing it to find relevant results even when the exact keywords are not present. | ||
|
|
||
| ### `Vector Database` | ||
|
|
||
| A vector database is a specialized storage system designed to handle high-dimensional vectors, which are common representations of data in machine learning and AI applications. It enables efficient storage, indexing, and querying of these vectors, and typically supports operations like nearest neighbor search, similarity search, and clustering. | ||
|
|
||
| ## Tools | ||
|
|
||
| ### `Jina` | ||
|
|
||
| [Jina](https://jina.ai) is a framework for building multimodal applications. It relies heavily on DocArray to represent and send data. | ||
|
|
||
| DocArray was originally part of Jina, but it is now a standalone project that is independent of Jina. | ||
|
|
||
| ### `Pydantic` | ||
|
|
||
| [Pydantic](https://github.com/pydantic/pydantic/) is a Python library that validates data using Python type hints. | ||
| DocArray relies on Pydantic. | ||
|
|
||
| ### `FastAPI` | ||
|
|
||
| [FastAPI](https://fastapi.tiangolo.com/) is a Python library for building APIs using Python type hints. | ||
|
|
||
| It is built on top of Pydantic and nicely extends to DocArray. | ||
|
|
||
| ### `Weaviate` | ||
|
|
||
| [Weaviate](https://weaviate.io/) is an open-source vector database that is supported in DocArray. | ||
|
|
||
| ### `Qdrant` | ||
|
|
||
| [Qdrant](https://qdrant.tech/) is an open-source vector database that is supported in DocArray. | ||
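The glossary entries for neural search and vector databases both rest on nearest neighbor search over embedding vectors. The sketch below illustrates that idea in plain Python. It is not DocArray's API or how any particular vector database is implemented: the function names, the three-dimensional vectors, and the document labels are all hypothetical, and real systems use model-generated embeddings with approximate indexes rather than a brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; a real model would output many more dimensions.
index = {
    "cat photo": [0.9, 0.1, 0.0],
    "kitten video": [0.8, 0.2, 0.1],
    "tax form": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of the query "cat"

results = nearest_neighbors(query, index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The query shares no keywords with "kitten video", yet that document ranks second because its vector points in a similar direction. This is the behavior the neural search entry describes.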
Stick to "multimodal" (what we use on jina.ai), not "multi modal" or "multi-modal" (or any other variation)
sure thx