QueryQuack

Help command in action
Querying a PDF in English
Querying a PDF in French
Querying a 30+ philosophy paper
Being able to clear namespaces

Inspiration

Ctrl-F is not much help when you are trying to find specific or implicit information in a PDF. Sometimes we also want to turn the information extracted from a document into something more meaningful, or simply ask a chatbot questions about the PDF. So we decided to build a Discord bot powered by Cohere to address these problems.

What it does

The Discord bot accepts PDF files and lets the user chat with it about the uploaded document. You can ask it questions, tell it to give a brief summary of the document, or ask anything else that relates to the background knowledge it has.

Since the bot uses multilingual embeddings (100+ languages), neither the PDF nor the query has to be in English.

Discord Commands

Commands	Description
!help	Returns a message containing the commands and their description
!help [command]	Like !help, but returns more detail on a single command
!load [namespace]	Loads the attached PDF into a namespace so it can be queried. QueryQuack queries all PDFs in the same namespace. If no namespace is given, a new namespace is generated or the latest namespace is used.
!ask [prompt]	Asks the bot a question about the loaded documents. Requires a PDF to be loaded beforehand.
!clearPDF	Clears all the PDFs in storage. Warning: this does not clear the namespace.
!clearNamespace [namespace*]	Deletes the given namespace and clears its contents
!listNamespaces	Lists all namespaces
!listPDFs	Lists the PDFs currently saved in the data folder
!setNamespace [namespace]	Sets the namespace to query from

Using separate namespaces keeps loaded documents apart and prevents their content from bleeding into each other. For example, if you give the bot two different letters in the same namespace and then ask who wrote the letter, the answer may be unexpected. On the other hand, storing documents that cover the same topic in one namespace can improve the answers to the user's queries.
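To make the namespace behaviour concrete, here is a minimal sketch of how the commands above might map onto a store. It is not the bot's actual code: `NamespaceStore` is a hypothetical in-memory stand-in for Pinecone namespaces, keyword matching stands in for vector search, and the rule for a missing namespace (reuse the latest one, otherwise generate a name) is one assumed reading of the `!load` description.

```python
from collections import defaultdict


class NamespaceStore:
    """In-memory stand-in for a namespaced vector store (hypothetical helper)."""

    def __init__(self):
        # namespace name -> list of (source_pdf, chunk_text)
        self._chunks = defaultdict(list)
        self.current = None  # namespace that queries read from

    def load(self, pdf_name, chunks, namespace=None):
        # Assumption: with no namespace given, reuse the latest one,
        # or generate a new name if none exists yet.
        if namespace is None:
            namespace = self.current or f"namespace-{len(self._chunks) + 1}"
        self._chunks[namespace].extend((pdf_name, c) for c in chunks)
        self.current = namespace
        return namespace

    def set_namespace(self, namespace):
        self.current = namespace

    def list_namespaces(self):
        return sorted(self._chunks)

    def clear_namespace(self, namespace):
        self._chunks.pop(namespace, None)
        if self.current == namespace:
            self.current = None

    def search(self, keyword):
        # Only chunks in the current namespace are visible to a query.
        visible = self._chunks.get(self.current, [])
        return [text for _, text in visible if keyword.lower() in text.lower()]


store = NamespaceStore()
store.load("letter_a.pdf", ["Dear Sam, see you soon. Yours, Alice"], namespace="letter-a")
store.load("letter_b.pdf", ["Dear Kim, thank you. Yours, Bob"], namespace="letter-b")

store.set_namespace("letter-a")
print(store.search("yours"))  # only the letter in "letter-a" is searched
```

With each letter in its own namespace, a question about "the letter" can only draw on one of them. Loading both into a single namespace would make both visible to every query.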

How we built it

When the user uploads a PDF file to the bot with a Discord command, the document is split into small chunks. Each chunk is embedded with Cohere Embeddings (multilingual-22-12), and the resulting vector is stored in a namespace (partition) of a Pinecone vector database. Once the namespace contains some vectors, the bot has background knowledge from the given PDF and can be queried.

When the user enters a question with the ask command, the bot finds the chunks in the selected namespace whose vectors are most similar to the query. The bot is set up with a Cohere LLM (command-xlarge-nightly), some memory, and a prompt template. LangChain passes the user's query and the similar context chunks into the prompt, then into Cohere. The bot posts the response on Discord and updates its memory by appending the user's query and its own reply.
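The flow described above (chunk, embed, retrieve, fill the prompt, update memory) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: `toy_embed` is a bag-of-words stand-in for the Cohere embedding model, a plain list stands in for the Pinecone namespace, and the template in `build_prompt` and every function name here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = toy_embed(query)
    return sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks, memory):
    """Fill a simple template with chat history, retrieved context, and the question."""
    history = "\n".join(f"User: {q}\nBot: {a}" for q, a in memory)
    context = "\n---\n".join(context_chunks)
    return f"History:\n{history}\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


document = (
    "Ducks are waterfowl that live near ponds and rivers. "
    "Pinecone is a managed vector database. "
    "Discord bots respond to commands sent in a channel."
)
chunks = chunk_text(document, max_words=9)
memory = []  # list of (user query, bot reply) pairs

question = "What is a vector database?"
best = top_chunks(question, chunks, k=1)
prompt = build_prompt(question, best, memory)
print(prompt)

# In the real bot the prompt would go to the LLM; here the reply is a placeholder.
memory.append((question, "<model reply would go here>"))
```

A real embedding model captures meaning rather than word overlap, which is what lets a French query match an English chunk. The word-count vectors here only show the shape of the retrieval step.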

Challenges we ran into

We had trouble accessing PDF files from Discord in a way that LangChain could read. We worked around this by saving the file locally and reading it from there. There was also a lot of unexpected behaviour that we had to debug. We edited our code heavily along the way and adopted an object-oriented design, since we ran into issues keeping global variables across the bot.
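As a rough sketch of the two fixes mentioned above, the class below keeps the bot's state on an instance instead of in globals and writes uploaded bytes to a local data folder so that a path-based loader can read them. `QueryBotState` and its methods are hypothetical names, and the Discord side is left out: the assumption is that the attachment's bytes have already been read from the message.

```python
import tempfile
from pathlib import Path


class QueryBotState:
    """Holds per-bot state on an instance instead of in module-level globals."""

    def __init__(self, data_dir):
        self.data_dir = Path(data_dir)
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.namespace = None  # namespace currently being queried
        self.memory = []       # (user query, bot reply) pairs

    def save_pdf(self, filename, data):
        # Keep only the final path component so an upload cannot escape data_dir.
        path = self.data_dir / Path(filename).name
        path.write_bytes(data)
        return path

    def list_pdfs(self):
        return sorted(p.name for p in self.data_dir.glob("*.pdf"))

    def clear_pdfs(self):
        # Removes stored files only; the namespace is left untouched.
        for p in self.data_dir.glob("*.pdf"):
            p.unlink()


with tempfile.TemporaryDirectory() as tmp:
    state = QueryBotState(Path(tmp) / "data")
    saved = state.save_pdf("paper.pdf", b"%PDF-1.4 example bytes")
    print(saved.name, state.list_pdfs())
    state.clear_pdfs()
    print(state.list_pdfs())
```

Because every command handler would receive the same `state` object, there is one obvious place for the current namespace and the chat memory to live.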

Accomplishments that we're proud of

We made a Discord bot that can answer questions from a PDF. This is our first time making a bot this advanced, so we had to spend a lot of time reading documentation and debugging. Being able to finish something is a huge accomplishment for us as well!

What we learned

We learned how to use Cohere, specifically how to use its LLMs and embeddings through LangChain. On the Discord side, we learned how to define bot commands and their arguments, and how to structure the bot with object-oriented programming.

What's next for QueryQuack

Allow the bot to accept other file types such as Word, TXT, and PowerPoint (PPT)
Accept larger files and process them faster
Host it 24/7 and use a non-local database to hold the files

Built With

cohere
discord
langchain
pinecone
python

Updates

Nina . started this project — May 07, 2023 03:38 AM EDT
