quire.ink

We've been thinking a lot about the future of the humanities. Literature and philosophy offer us entry into the minds of other humans, deepen our thought, and tie us to our cultural heritage; we believe they're extremely important, and worth investing in. Yet despite having ~instantaneous access to ~any major work in history, these days we seem to be getting farther and farther from the texts that made us the way we are. Paradoxically, our unprecedented access has come with less engagement, not more.

We believe this is an interface problem: compared to what we're used to on the web, historical texts are much harder to discover, contextualize, and navigate. As technologists, we naturally ask: what new affordances could recent advances, particularly in NLP, offer us? For TreeHacks this year, we wanted to explore this question. We created Quire, an embedding-based tool for exploring and interacting with classic works in the humanities.

A quire, in bookbinding, is a set of leaves stitched together. Likewise, quire.ink stitches together the Project Gutenberg archive (more than 70k books*, dating from the past century all the way back to before 1105 BC) in a creative, experimental search interface designed for navigating and connecting a huge library. You can use Quire for all kinds of purposes, like a philosophy research tool, a novel discovery engine, or a poetry comparison mechanism; above all it is designed to bring us closer to the texts that shape us, at a time when our tools are encouraging us more and more to abstract away from them.

In this devpost there's a bunch of detail about how we built it, but we think the best way to understand a new type of tool is to use it! Try it out at https://quire.ink/!

How we built Quire

We downloaded (most of*) the Project Gutenberg digital archive and embedded it paragraph-by-paragraph using Elasticsearch's jina-embeddings-v3, storing the vectors in a custom database. We implemented an efficient custom cosine similarity search algorithm that cut query time from about 10 s to about 50 ms (roughly 200x), then built a ton of features in React for interacting with the archive through the embeddings, serving everything from a single Flask server.
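
To make the search step concrete, here is a minimal sketch of cosine similarity search over paragraph embeddings. It is not our actual algorithm (the details of that aren't covered in this writeup); it only illustrates the common trick of normalizing every vector once up front, so that each query reduces to dot products plus a top-k selection. The passage IDs, the toy 3-dimensional vectors, and the function names `normalize`, `build_index`, and `search` are all made up for illustration, and it uses only the Python standard library for readability rather than speed.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_index(passages):
    """Normalize every (passage_id, embedding) pair once, ahead of query time."""
    return [(pid, normalize(emb)) for pid, emb in passages]


def search(index, query_embedding, k=3):
    """Return the k highest-scoring (cosine_similarity, passage_id) pairs."""
    q = normalize(query_embedding)
    scored = (
        (sum(a * b for a, b in zip(q, emb)), pid)
        for pid, emb in index
    )
    # nlargest keeps only k candidates in memory instead of sorting everything.
    return heapq.nlargest(k, scored)


# Hypothetical passage IDs with toy 3-dimensional "embeddings";
# real embeddings have hundreds of dimensions.
index = build_index([
    ("walden-p12", [0.9, 0.1, 0.0]),
    ("republic-p40", [0.0, 1.0, 0.2]),
    ("leaves-p7", [0.8, 0.3, 0.1]),
])
results = search(index, [1.0, 0.2, 0.0], k=2)
```

At the scale of a full archive, the same idea would presumably be expressed as a single vectorized matrix-vector product rather than a Python loop; the sketch keeps the loop only so that each step is visible.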

What Quire does

The fundamental affordance of Quire is moving laterally through texts, "stitching" them together and finding passages with similar semantic content by searching in the embedding space. You start out with an embedding search bar, which takes in any text query, embeds it, and then runs our fast search algorithm to find the most similar passages across the Gutenberg archive. You can then open any of these passages to see the entire book it's contained within, see other similar passages, and, most importantly, save it to your collection.

The collection is a sidebar space that accumulates passages of text. These can be passages from books, but can also be custom text, or text that you altered using a built-in LLM rewriter (e.g. you can "rewrite this passage with a male main character" if you want to search for similar passages with characters of a different gender). Once you've accumulated text in your collection, you can average across a group of passages to create a combined embedding vector and search with it. This lets you search for passages that are similar to a whole group rather than to a single passage, averaging out the particular textual details and getting a more abstract representation.
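
The averaging step can be sketched roughly as follows. This is an illustration under assumptions, not a copy of Quire's code: the function names `unit` and `combine` are invented here, and normalizing each passage before averaging (so that no single passage dominates the mean) is one reasonable design choice rather than something this writeup specifies.

```python
import math


def unit(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine(embeddings):
    """Average a group of passage embeddings into a single query vector."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Assumption: give each passage equal weight by normalizing it first.
    normalized = [unit(e) for e in embeddings]
    mean = [sum(col) / len(normalized) for col in zip(*normalized)]
    # Re-normalize so the result can be compared by dot product.
    # This raises ValueError if the passages cancel out exactly.
    return unit(mean)


# Toy 2-dimensional example: two passages pointing in different directions.
collection = [[2.0, 0.0], [0.0, 1.0]]
query_vector = combine(collection)
```

The resulting `query_vector` sits halfway between the two passages in direction, which is the sense in which averaging trades the details of any single passage for what the group has in common.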

What opportunities does this open up?

We think that Quire is the first prototype for a new way of interacting with texts in the humanities. In particular, Quire lets you ask all sorts of new questions that were much harder to ask before, and takes out the middle layers between you and the text. Rather than relying only on scholarship that already exists, you can now search passage-by-passage, semantically, through this huge corpus of major works, and move through the embedding space using custom alterations and search vectors.

Some examples:

You can look at how historical works across forms and genres engaged with themes or questions you're interested in: what's a good life? What's my role in history? How did people think about the times they lived in? Quire will point you to the exact passages, and let you explore onwards from there.
You can find the passages in human literature that are maximally romantic, or maximally philosophical, or both of those at once, by collecting passages that have those characteristics, combining their embeddings, and searching.
You can find primary sources for specific research questions, e.g. how people thought about 'sovereignty' throughout time, just by searching in natural language (e.g. 'philosophical passages on the question of sovereignty').
You can discover precursor works and possible influences in poetry by embedding a corpus of a poet's work.

But to be honest, we think nobody will really get what Quire is until they've used it. Try it out at https://quire.ink/! (works great on both mobile + desktop)

Source code is at https://github.com/nicholascc/gutemgrep/. (The repo name is an old pun based on Gutenberg + mgrep.)

* (Due to rate limits we couldn't download and embed the whole corpus in the time we had, so we're actually working with a significant fraction of the Gutenberg archive. The infrastructure we built would of course support the whole archive if we had enough time to process all the data.)