Note: The thumbnail image was made using this project!
Inspiration
Visum stands for Visual Summary.
I wanted to combine two awesome AI models to do something.
The idea of turning a paragraph into an image for use in a blog post popped into my head and was too cool not to try.
What it does
Given a paragraph, it turns it into an image by first summarizing the paragraph in one line and then feeding that line to GLIDE, a text-to-image model (https://github.com/openai/glide-text2im).
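The two steps above can be sketched roughly like this. It is a minimal sketch, not the project's actual code: `visual_summary`, `make_t5_summarizer`, and the `generate_image` callable are hypothetical names, the "summarize: " prefix and generation settings are assumptions, and the GLIDE step is left as a plug-in function rather than guessing at its API.

```python
from typing import Any, Callable, Tuple


def make_t5_summarizer(model_name: str = "snrspeaks/t5-one-line-summary") -> Callable[[str], str]:
    """Build a one-line summarizer (assumes the `transformers` and `torch` packages)."""
    # Imported lazily so the rest of the sketch runs without the heavy dependencies.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    def summarize(paragraph: str) -> str:
        # Assumption: the model expects the usual T5 "summarize: " task prefix.
        input_ids = tokenizer.encode(
            "summarize: " + paragraph,
            return_tensors="pt",
            truncation=True,
            max_length=512,
        )
        # Assumption: beam search with a short output cap is enough for one line.
        output_ids = model.generate(input_ids=input_ids, num_beams=5, max_length=50)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return summarize


def visual_summary(
    paragraph: str,
    summarize: Callable[[str], str],
    generate_image: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Paragraph -> one-line caption -> image, returning both."""
    paragraph = paragraph.strip()
    if not paragraph:
        raise ValueError("paragraph must not be empty")
    caption = summarize(paragraph).strip()
    # `generate_image` is a hypothetical wrapper around the text-to-image model.
    image = generate_image(caption)
    return caption, image
```

Keeping both models behind plain callables means the pipeline can be tried with stand-in functions before any weights are downloaded.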
How we built it
Front-end: Svelte
Back-end: Flask
Summarization: "snrspeaks/t5-one-line-summary" from HuggingFace
Text to Image: GLIDE
Challenges we ran into
Model weights are large, and the code uses a lot of RAM and GPU memory.
I know nothing about front-end development, let alone Svelte (I literally Googled how to center a div and still have no idea).
Figuring out how to make the front-end talk to the back-end, have the back-end load the model and get predictions, and then send the results back to the front-end.
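For the front-end/back-end part, one possible shape is a single Flask JSON endpoint that the Svelte app posts the paragraph to. This is only a sketch under assumptions: the `/generate` route, the JSON field names, and the `pipeline` callable (which should wrap models loaded once at startup, not per request) are all made up for illustration and are not taken from the project.

```python
import base64
from typing import Any, Callable, Dict, Optional, Tuple


def handle_generate(
    payload: Optional[Dict[str, Any]],
    pipeline: Callable[[str], Tuple[str, bytes]],
) -> Tuple[Dict[str, str], int]:
    """Framework-free request logic: validate input, run the models, build the reply."""
    paragraph = (payload or {}).get("paragraph", "")
    if not isinstance(paragraph, str) or not paragraph.strip():
        return {"error": "send JSON with a non-empty 'paragraph' string"}, 400
    # `pipeline` is a hypothetical callable returning (caption, image bytes).
    caption, image_bytes = pipeline(paragraph)
    # Base64 lets the image travel inside JSON; the front-end can show it
    # through a data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"caption": caption, "image_base64": encoded}, 200


def create_app(pipeline: Callable[[str], Tuple[str, bytes]]):
    """Wrap the handler in a Flask app (assumes Flask 2.0 or newer is installed)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/generate")  # hypothetical route name
    def generate():
        body, status = handle_generate(request.get_json(silent=True), pipeline)
        return jsonify(body), status

    return app
```

Building the `pipeline` once and passing it into `create_app` keeps the large weights in memory across requests, which matters given how much RAM and GPU memory they take. If the Svelte dev server runs on a different port, the browser will also need CORS to be allowed on the Flask side.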
Accomplishments that we're proud of
It works!
What we learned
This was my first time using Svelte.
Making a UI is hard.
Getting even a simple front-end working with a back-end is hard enough; adding a deep learning model on top is too much (I'm still going to keep trying, though).
It's hard to find an ML model that works out of the box (HuggingFace saved me!).
What's next for Visum
No idea!
