docs: index predefined documents#1434

Merged
JohannesMessner merged 1 commit into main from docs-predefined-index on Apr 24, 2023

@JohannesMessner
Member

Explains how to index predefined documents into a document index

Signed-off-by: Johannes Messner <[email protected]>
@github-actions

📝 Docs are deployed on https://ft-docs-predefined-index--jina-docs.netlify.app 🎉

@JohannesMessner JohannesMessner merged commit fad1290 into main Apr 24, 2023
@JohannesMessner JohannesMessner deleted the docs-predefined-index branch April 24, 2023 10:06

### Using a predefined Document as schema

DocArray offers a number of predefined Documents, like [ImageDoce][docarray.documents.ImageDoc] and [TextDoc][docarray.documents.TextDoc].
Contributor

Suggested change
- DocArray offers a number of predefined Documents, like [ImageDoce][docarray.documents.ImageDoc] and [TextDoc][docarray.documents.TextDoc].
+ DocArray offers a number of predefined Documents, like [ImageDoc][docarray.documents.ImageDoc] and [TextDoc][docarray.documents.TextDoc].


DocArray offers a number of predefined Documents, like [ImageDoce][docarray.documents.ImageDoc] and [TextDoc][docarray.documents.TextDoc].
If you try to use these directly as a schema for a Document Index, you will get unexpected behavior:
Depending on the backend, and exception will be raised, or no vector index for ANN lookup will be built.
Contributor

Suggested change
- Depending on the backend, and exception will be raised, or no vector index for ANN lookup will be built.
+ Depending on the backend, an exception will be raised, or no vector index for ANN lookup will be built.

```

Once the schema of your Document Index is defined in this way, the data that you are indexing can be either of the
predefined Document type, or of your custom Document type.
Contributor

Suggested change
- predefined Document type, or of your custom Document type.
+ predefined Document types, or your custom Document type.

- A and B have the same field names and field types
- A and B have the same field names, and, for every field, the type of B is a subclass of the type of A

In particular this means that you can easily [index predefined Documents](#using-a-predefined-document-as-schema) into a Document Index.
Contributor

Suggested change
- In particular this means that you can easily [index predefined Documents](#using-a-predefined-document-as-schema) into a Document Index.
+ In particular, this means that you can easily [index predefined Documents](#using-a-predefined-document-as-schema) into a Document Index.
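
As an illustration of the two compatibility rules quoted in the hunk above, here is a minimal plain-Python sketch. It is not DocArray's implementation: `schema_compatible` and the example classes are hypothetical names, it assumes A is the index schema and B is the type of the data being indexed, and it only handles fields annotated with plain classes (no generics or unions).

```python
from typing import get_type_hints


def schema_compatible(schema_cls: type, data_cls: type) -> bool:
    """Check the rule above for schema A (schema_cls) and data type B (data_cls).

    Hypothetical helper for illustration only, not part of DocArray.
    """
    schema_fields = get_type_hints(schema_cls)
    data_fields = get_type_hints(data_cls)
    # Both rules require identical field names.
    if schema_fields.keys() != data_fields.keys():
        return False
    # Each field type of B must equal, or be a subclass of, the field type of A.
    # issubclass(X, X) is True, so the "same types" rule is covered as well.
    return all(
        issubclass(data_fields[name], schema_fields[name])
        for name in schema_fields
    )


class Text(str):
    """Stand-in for a more specific field type (assumption for this example)."""


class SchemaDoc:
    title: str
    year: int


class SameDoc:
    title: str
    year: int


class SubtypeDoc:
    title: Text
    year: int


class OtherDoc:
    title: str
    pages: int
```

Under these assumptions, `schema_compatible(SchemaDoc, SameDoc)` and `schema_compatible(SchemaDoc, SubtypeDoc)` hold, while `schema_compatible(SubtypeDoc, SchemaDoc)` and `schema_compatible(SchemaDoc, OtherDoc)` do not, because the check is directional and requires matching field names.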

Contributor

What's the policy on capitalizing Document now that we don't use that class name? I think @samsja mentioned on Discord we don't do that any more.

Member Author

I think we should still capitalize it, since it is a concept in our library. Lowercased it looks a bit weird and "unofficial" to me. Plus, I think the rule of thumb was always that "concepts" are capitalized, whereas classes go in between backticks

Member

No strong feeling here. But technically speaking, Document is not a concept in terms of code in the library.

Member Author

I think it is a concept but just not a class, otherwise "concept" and "class" would be synonyms. But I just checked the pydantic documentation, they don't capitalize "model". So no strong feeling either

@JohannesMessner
Member Author

@alexcg1 I was a bit fast on the trigger there, your suggested fixes are here: #1436

3 participants