Perspective Control using Vanishing Points

Web app with responsive perspective grid editor and image generation tool
Image generated with the perspective LoRA and a given perspective grid
Example of the control images used for training and during inference
Before and after

Inspiration

I want to bridge the gap between drawing and AI image generation. I was inspired by the Renaissance artists who based their art on a perspective grid, and I would love to find more ways of controlling image generation. Prompting is difficult and imprecise, while other methods such as ControlNets or reference images are very exact. This project sits in between: it gives the user precise control over perspective while still leaving them free to prompt anything they want to see.

What it does

This is a Kontext LoRA trained on image pairs: an image with a dominant vanishing point, and a control image in which that point is marked. It lets the user specify the exact desired perspective. I also built a web app for easily constructing the control images and running fast inference through the FAL API.
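The control images are built in a JavaScript web app whose code isn't shown here. As a rough illustration only, the Python sketch below computes the geometry of a one-point perspective grid: rays from a chosen vanishing point to evenly spaced points on the image border, clipped to the frame with Liang-Barsky line clipping so that vanishing points outside the image still produce a usable grid. The function names, the ray spacing, and the assumption that the control image is drawn as such rays are all hypothetical; the actual control image format (colors, line widths, markers) isn't described.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def clip_segment(p0: Point, p1: Point, width: float, height: float) -> Optional[Segment]:
    """Clip the segment p0 -> p1 to the rectangle [0, width] x [0, height] (Liang-Barsky)."""
    x0, y0 = p0
    dx, dy = p1[0] - x0, p1[1] - y0
    t0, t1 = 0.0, 1.0
    # One (p, q) pair per edge: left, right, top, bottom.
    for p, q in ((-dx, x0), (dx, width - x0), (-dy, y0), (dy, height - y0)):
        if p == 0:
            if q < 0:
                return None  # parallel to this edge and fully outside it
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)  # segment enters the rectangle here
            else:
                t1 = min(t1, t)  # segment leaves the rectangle here
    if t0 > t1:
        return None  # segment misses the rectangle entirely
    return (x0 + t0 * dx, y0 + t0 * dy), (x0 + t1 * dx, y0 + t1 * dy)


def border_points(width: float, height: float, n_per_side: int) -> List[Point]:
    """n_per_side evenly spaced points on each image edge (each corner appears once)."""
    pts: List[Point] = []
    for i in range(n_per_side):
        f = i / n_per_side
        pts.append((f * width, 0.0))              # top edge
        pts.append((width, f * height))           # right edge
        pts.append((width - f * width, height))   # bottom edge
        pts.append((0.0, height - f * height))    # left edge
    return pts


def perspective_grid(width: float, height: float, vp: Point, n_per_side: int = 8) -> List[Segment]:
    """Hypothetical one-point perspective grid: rays from the vanishing point towards
    the border, clipped to the image (vp may lie outside the frame)."""
    segments: List[Segment] = []
    for target in border_points(width, height, n_per_side):
        seg = clip_segment(vp, target, width, height)
        # Drop rays that only touch the frame in a single point.
        if seg is not None and math.hypot(seg[1][0] - seg[0][0], seg[1][1] - seg[0][1]) > 1e-9:
            segments.append(seg)
    return segments
```

For example, `perspective_grid(512, 512, (256.0, 256.0), n_per_side=4)` yields 16 rays fanning out from the image center. Each segment could then be rasterized onto a blank canvas, for instance with Pillow's `ImageDraw.Draw(img).line(...)`.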

How we built it

I vibe coded the data pipeline, trained the LoRA with AI Toolkit on Runpod, and built the web app in JavaScript.

Challenges we ran into

Finding adequate data was difficult. There are many perspective estimation algorithms, but most of them are not reliable enough for this purpose. Luckily, I found a dataset of over 1k images with known perspective lines that I could use.
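The writeup doesn't say how the dataset's perspective lines were turned into the single vanishing point marked in each control image. One common approach, sketched here under that assumption, is the least-squares intersection of the lines: each segment defines a line n·v = c with unit normal n, and the point v that minimizes the sum of squared distances to all lines solves a 2x2 linear system. The function name and the input format (a list of segment endpoints per image) are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_vanishing_point(segments: List[Tuple[Point, Point]]) -> Point:
    """Least-squares point closest to all lines through the given segments (hypothetical helper)."""
    # Accumulate the normal equations A v = b with A = sum(n n^T), b = sum(n c).
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip degenerate segments
        nx, ny = -dy / length, dx / length  # unit normal to the line
        c = nx * x0 + ny * y0               # line equation: nx*x + ny*y = c
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * c
        b2 += ny * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; the vanishing point is at infinity")
    # Solve the 2x2 system with Cramer's rule.
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

Three lines that all pass through (100, 50), such as `((0, 0), (50, 25))`, `((0, 100), (50, 75))` and `((0, 50), (10, 50))`, recover that point. Plain least squares is sensitive to outlier lines, so with noisier, automatically detected lines a robust variant (for example RANSAC over line pairs) would be the usual choice. The exception for near-parallel lines also flags cases where the vanishing point is very far away or at infinity.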

Accomplishments that we're proud of

I'm proud that this novel idea seems to work and that it's quite fun to play with.

What we learned

The data quality needs to improve: higher resolution, and better balancing of extreme cases (far vanishing points). Two-point perspective is also an obvious next step.
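One simple way to rebalance extreme cases, sketched here as an assumption rather than something the project does, is to bin training images by how far their vanishing point lies from the image center and give each bin equal total sampling weight. The bin edges, the function names, and the assumption that all training images share one resolution are hypothetical.

```python
import bisect
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_bin(vp: Point, width: float, height: float,
                 edges: Sequence[float] = (0.5, 1.0, 2.0)) -> int:
    """Bin index for how far the vanishing point is from the image center.

    The distance is normalized by the half-diagonal, so 1.0 means "as far as a corner";
    values above 1.0 always lie outside the frame.
    """
    cx, cy = width / 2, height / 2
    r = math.hypot(vp[0] - cx, vp[1] - cy) / math.hypot(cx, cy)
    return bisect.bisect_right(edges, r)


def balancing_weights(vps: List[Point], width: float, height: float,
                      edges: Sequence[float] = (0.5, 1.0, 2.0)) -> List[float]:
    """Inverse-frequency sample weights so every distance bin carries equal total weight."""
    bins = [distance_bin(vp, width, height, edges) for vp in vps]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)  # rescale so the mean weight is 1.0
    return [w * scale for w in raw]
```

The weights can drive oversampling, e.g. `random.choices(range(len(vps)), weights=weights, k=len(vps))` to draw one rebalanced epoch. Reweighting only repeats the rare far-vanishing-point images; collecting more of them remains the more robust fix.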