Inspiration
My goal is to bridge the gap between drawing and AI image generation. I was inspired by the Renaissance artists who based their art on a perspective grid. I would love to find more methods of controlling image generation. Prompting is difficult and imprecise, while other methods such as ControlNets or reference images are very exact. This project sits in between: it gives the user precise control over perspective, while still leaving complete freedom to prompt anything they want to see.
What it does
This is a Kontext LoRA trained on pairs of images: an image with a dominant perspective point, and a control image in which this point is indicated. It allows the user to specify the exact desired perspective. I have added a web app for easy construction of the control images, with fast inference using the FAL API.
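To make the idea of a control image concrete, here is a minimal sketch of one way the geometry could be built: rays fanning out from the chosen vanishing point to the image border. This is an assumption for illustration only; the exact control-image format used for training is not described here, and the names `ray_endpoint`, `control_rays` and `n_rays` are hypothetical. The sketch also assumes the vanishing point lies inside the frame.

```python
import math

def ray_endpoint(vx, vy, angle, width, height):
    """Point where a ray from (vx, vy) at `angle` leaves the image rectangle.

    Assumes (vx, vy) lies inside the rectangle [0, width] x [0, height].
    """
    dx, dy = math.cos(angle), math.sin(angle)
    eps = 1e-12
    hits = []
    # Distance along the ray to the vertical border it is heading towards.
    if dx > eps:
        hits.append((width - vx) / dx)
    elif dx < -eps:
        hits.append(-vx / dx)
    # Distance along the ray to the horizontal border it is heading towards.
    if dy > eps:
        hits.append((height - vy) / dy)
    elif dy < -eps:
        hits.append(-vy / dy)
    t = min(hits)  # the first border reached is where the ray exits
    return (vx + t * dx, vy + t * dy)

def control_rays(vx, vy, width, height, n_rays=16):
    """Evenly spaced line segments from the vanishing point to the border."""
    if not (0 <= vx <= width and 0 <= vy <= height):
        raise ValueError("this sketch only handles vanishing points inside the frame")
    rays = []
    for i in range(n_rays):
        angle = 2 * math.pi * i / n_rays
        rays.append(((vx, vy), ray_endpoint(vx, vy, angle, width, height)))
    return rays
```

The returned segments could then be drawn onto a blank canvas with any drawing library, or with the canvas API in a web app.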
How we built it
I vibe coded the data pipeline, trained the LoRA with the AI toolkit on Runpod, and made the web app in JavaScript.
Challenges we ran into
Finding adequate data was difficult. There are many perspective estimation algorithms, but most of them are not great. Luckily I found a dataset of more than 1,000 images with known perspective lines that I could use.
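With known perspective lines, a vanishing point can be recovered as the intersection of two of them. The sketch below shows the standard homogeneous-coordinates approach. It is an illustration, not the pipeline's actual code: the function names are hypothetical, and the dataset's annotation format is assumed to give two points per line.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def vanishing_point(line1, line2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if they are parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    # The cross product of the two line vectors gives the intersection point
    # in homogeneous coordinates (x, y, w).
    x = b1 * c2 - b2 * c1
    y = c1 * a2 - c2 * a1
    w = a1 * b2 - a2 * b1
    if abs(w) < eps:
        return None  # parallel in the image: the vanishing point is at infinity
    return (x / w, y / w)
```

For example, the lines through (0, 0)-(1, 1) and (0, 2)-(2, 0) meet at (1, 1). With more than two annotated lines, a least-squares intersection would be more robust than picking a single pair.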
Accomplishments that we're proud of
I'm proud that this novel idea seems to work and that it's quite fun to play with.
What we learned
Data quality needs to be better: higher resolution, and a bit more balancing of extreme cases (far vanishing points). Two-point perspective is also an obvious next step.
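One possible way to balance extreme cases is to bucket the training samples by how far the vanishing point is from the image center and weight each sample inversely to its bucket size. This is only a sketch of that idea, not something that was done in this project: the bucket edges, the normalization by the image diagonal, and the function names are all assumptions.

```python
import math
from collections import Counter

def distance_bucket(vp, width, height, edges=(0.25, 0.5, 1.0)):
    """Bucket index based on the vanishing point's distance from the image center.

    The distance is normalized by the image diagonal; `edges` are arbitrary
    illustrative thresholds.
    """
    cx, cy = width / 2, height / 2
    d = math.hypot(vp[0] - cx, vp[1] - cy) / math.hypot(width, height)
    for i, edge in enumerate(edges):
        if d <= edge:
            return i
    return len(edges)  # farther than the last edge

def balancing_weights(samples):
    """Sampling weights so that every non-empty bucket gets equal total weight.

    `samples` is a list of (vanishing_point, width, height) tuples.
    The returned weights sum to 1.
    """
    buckets = [distance_bucket(vp, w, h) for vp, w, h in samples]
    counts = Counter(buckets)
    return [1.0 / (len(counts) * counts[b]) for b in buckets]
```

With three samples near the center and one far outside the frame, the far sample gets weight 1/2 and each near sample gets 1/6, so rare far vanishing points are drawn as often as the common case.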
What's next for Vanishing Points
It might be fun to combine this with other LoRAs (e.g. scene rotation).
Try it out on https://kontext-perspective.jasperschoormans.nl/
Built With
- aitoolkit
- javascript
- python