Inspiration
I recently attended a local meetup where the speaker compared creating a video from an image to implementing code from a spec document with GenAI tools. That framing got me interested in the media creation process and in iteratively refining an image until it produces the right video.
To test this process, I decided to implement it as a plugin for the MX Creative Console and other devices that use the Logi Actions SDK. The MX Creative Console becomes a physical shortcut surface and visual indicator for the image-to-video feedback loop with GenAI tools. I wanted the process to be as seamless as possible: clear visual indicators for each process, no manual context transfer between tools, and automatic integration with popular creative tools such as Photoshop and DaVinci Resolve.
What It Does
SmolCreate maps a multi-model generative AI pipeline onto the MX Creative Console. Nine LCD keys represent different pipeline stages and actions: New Concept, Generate Image, Generate Video, Refine, Approve, Reject, Open in Photoshop, Open in Resolve, and a notification key that watches for actionable events. Each key shows a color-coded status to indicate whether a stage is idle, processing, ready, or in an error state.
Keys use a two-press interaction model. The first press selects a key and binds the dial to its parameter (e.g. selecting Gen Image lets the dial set the number of variations from 1 to 4). The second press executes the action. After a batch returns, the dial switches to cycling through the results. The key's status color tells you which mode you're in.
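Since Phase 1 has no implementation yet, here is a minimal Python sketch of how the two-press model could be expressed as a small state machine. The class name, the `DialMode` values, and the `generate` callback are all hypothetical. The choice that a press while browsing re-selects the key is also my assumption, not a settled design.

```python
from dataclasses import dataclass, field
from enum import Enum


class DialMode(Enum):
    IDLE = "idle"            # key not selected, dial unbound
    PARAMETER = "parameter"  # dial tunes the selected key's parameter
    BROWSE = "browse"        # dial cycles through returned results


@dataclass
class GenImageKey:
    """Hypothetical model of the Gen Image key and its dial binding."""
    variations: int = 1                  # dial-controlled, clamped to 1..4
    mode: DialMode = DialMode.IDLE
    results: list = field(default_factory=list)
    cursor: int = 0                      # index of the result being shown

    def press(self, generate):
        if self.mode is DialMode.PARAMETER:
            # Second press: run the action, then hand the dial to browsing.
            self.results = generate(self.variations)
            self.cursor = 0
            self.mode = DialMode.BROWSE
        else:
            # First press (or a press while browsing, by assumption):
            # select the key and bind the dial to its parameter.
            self.mode = DialMode.PARAMETER

    def turn(self, ticks):
        if self.mode is DialMode.PARAMETER:
            self.variations = max(1, min(4, self.variations + ticks))
        elif self.mode is DialMode.BROWSE and self.results:
            self.cursor = (self.cursor + ticks) % len(self.results)
```

In this sketch the key's status color would simply be derived from `mode`, which keeps the "which mode am I in" signal tied to a single field.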
A companion dashboard provides the full context for the user: prompt editor, image comparison grid, video cost gate, and quality triage results. For MX Master 4 support, the same actions appear in the Actions Ring overlay with the dashboard as the primary feedback surface.
How I Built It (Phase 1)
This is a Phase 1 concept pitch. The deliverables are mockups, architecture planning, and this writeup.
I started by mapping the pipeline stages and identifying which steps are mechanical (routing files, calling APIs) and which need human judgment (selecting an image, confirming video spend). In this manner, SmolCreate handles the seamless routing and integration of tools while the user is kept in the loop at key decision points.
From there I designed the hardware mapping. Nine LCD keys, one dial, and a set of status colors needed to communicate enough pipeline state that the dashboard is helpful but not required. The dashboard adds depth: editing prompts, comparing batches side by side, reviewing triage diagnostics.
The mockups are interactive React components built to match the Logi Options+ dark palette.
Architecture
The Logi Plugin (C# / Actions SDK) is a thin event processing layer. It receives key presses and dial turns, sends them to the backend, and receives key image updates in response. It runs as a universal plugin with no foreground app requirement.
The SmolCreate Backend (Python / FastAPI) is the orchestration layer running on localhost. It exposes a REST API for the plugin and a WebSocket for dashboard updates. It routes AI calls, manages session state, and tracks costs. Text calls (prompt refinement via Claude, video analysis via Gemini) route through OpenRouter. Media generation (Imagen, Veo) calls Vertex AI directly. Local processing like masking, color extraction, and frame manipulation runs through rembg, colorthief, ffmpeg, and Pillow.
The Dashboard is a localhost web app with the prompt editor, image grid, video triage, cost tracker, and pipeline state visualization.
Key Design Decisions
Video Cost Gate. Video generation costs roughly 40x more than images. Before any video call, the dashboard shows the composed prompt, selected model, audio setting, and estimated cost. The Gen Video key changes color by cost tier (green under $1, yellow $1 to $3, red above $3). Generation requires explicit confirmation.
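A rough sketch of the cost gate logic in Python, using the tier thresholds above. `VideoRequest`, `cost_tier`, and `gate` are hypothetical names, and I am assuming exactly $1 and exactly $3 both fall in the yellow tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoRequest:
    """Hypothetical container for what the dashboard shows before a video call."""
    prompt: str
    model: str
    audio: bool
    estimated_usd: float


def cost_tier(estimated_usd: float) -> str:
    """Map an estimated cost to the Gen Video key color."""
    if estimated_usd < 1.0:
        return "green"
    if estimated_usd <= 3.0:
        return "yellow"
    return "red"


def gate(request: VideoRequest, confirmed: bool) -> dict:
    """Return the key color, a summary line, and whether generation may run."""
    audio = "on" if request.audio else "off"
    return {
        "key_color": cost_tier(request.estimated_usd),
        "summary": f"{request.model}, audio {audio}, est. ${request.estimated_usd:.2f}",
        "allowed": confirmed,  # never true without explicit confirmation
    }
```

The point of keeping `allowed` separate from the tier is that even a green request still waits for confirmation. The color only informs the decision.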
Quality Triage. After video generation, Gemini Flash analyzes the clip and categorizes the result into one of four types: prompt issue, start frame issue, partial salvage, or drift. Each category routes to a specific re-entry point in the pipeline, and the console keys update to reflect the recommended next action.
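The routing step could look something like the Python sketch below. The four categories come from the design above, but the specific re-entry point for each one is a placeholder assumption, as is the fallback to the notification key when the model returns an unrecognized label.

```python
from enum import Enum


class Triage(Enum):
    PROMPT_ISSUE = "prompt_issue"
    START_FRAME_ISSUE = "start_frame_issue"
    PARTIAL_SALVAGE = "partial_salvage"
    DRIFT = "drift"


# Assumed mapping from triage category to the console key that should be
# highlighted as the recommended next action. These targets are illustrative.
REENTRY = {
    Triage.PROMPT_ISSUE: "Refine",
    Triage.START_FRAME_ISSUE: "Generate Image",
    Triage.PARTIAL_SALVAGE: "Open in Resolve",
    Triage.DRIFT: "Generate Video",
}


def route(label: str) -> str:
    """Turn a triage label into the key to highlight next."""
    try:
        category = Triage(label)
    except ValueError:
        # Unknown label: do not guess, surface it to the user instead.
        return "Notification"
    return REENTRY[category]
```

For example, `route("start_frame_issue")` would point the user back to Generate Image under this mapping.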
Clip Extension. Veo generates a maximum of 8 seconds per clip. SmolCreate chains clips by extracting the last frame of clip N as the start frame of clip N+1, composing a continuity-aware prompt, and concatenating the results.
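The mechanical parts of clip chaining can be sketched in Python as below. The helper names and file paths are hypothetical, and the functions only build the ffmpeg command lines rather than running them. The sketch assumes ffmpeg is on the PATH and that all clips share codec settings, which stream-copy concatenation requires.

```python
import math

MAX_CLIP_SECONDS = 8  # per-clip limit stated above


def clips_needed(target_seconds: float) -> int:
    """How many chained clips are needed to cover the target duration."""
    return max(1, math.ceil(target_seconds / MAX_CLIP_SECONDS))


def last_frame_cmd(clip: str, frame_png: str) -> list:
    """ffmpeg command that keeps only the final frame of a clip.

    -sseof -1 seeks to one second before the end, and -update 1 makes the
    image writer overwrite the same file, so the last frame is what remains.
    """
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", clip, "-update", "1", frame_png]


def concat_list(clips: list) -> str:
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{c}'\n" for c in clips)


def concat_cmd(list_path: str, output: str) -> list:
    """ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]
```

A 20 second target would need `clips_needed(20)` clips, with `last_frame_cmd` run on each clip before generating the next. Each command list can be passed to `subprocess.run` once the real pipeline exists.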
Challenges
The main design challenge was fitting the pipeline intuitively onto a physical keypad. Every action on the console needs to have value over simple on-screen navigation. The notification key was added after realizing users need a simple way to switch active contexts to the window or app that needs attention.
The two-press interaction model also took some iteration. The dial needs to serve multiple purposes (parameter tuning before generation, result browsing after), and the key's status color needs to make it obvious which mode is active. Switching what the dial and key do without the user having to think about it was important for the user experience.
Session state management is more involved than it first appears. Every action in the pipeline changes what the next valid action is, and the console keys need to reflect that in real time. I had to carefully consider how to manage the current creative session in terms of keeping track of state, triggering the right visual feedback keys, routing the correct context around, and updating data in the dashboard.
What's Next
Phase 2 is implementation. The C# plugin layer, FastAPI backend, and dashboard are scoped. The quality triage system is the most interesting piece to build, since it requires Gemini Flash to diagnose why a generation failed and route to the correct pipeline re-entry point.
Built With
- applescript
- c#
- claudecode
- fastapi
- logiactionssdk
- logipluginservice
- mxcreativeconsole
- python