A natural language image editing pipeline with Photoshop-style adjustments, filters, and layers, plus 3D object manipulation — powered by Claude (intent parsing), SAM2 (segmentation), Depth Anything V2 (depth estimation), Zero123++ / TripoSR (3D novel view synthesis), and FLUX.1-dev (diffusion).
You describe what you want in plain English. The system figures out which model to use.
```python
ps = AiPhotoshop("photo.jpg")
ps.edit("boost contrast 20, add warm white balance")   # instant, no GPU
ps.edit("rotate the chair 45 degrees to the left")     # 3D object edit, ~30s
ps.edit("remove the person on the right")              # FLUX inpaint, ~20s
ps.edit("set blend mode to soft light, opacity 60%")   # layer op, instant
ps.save("output.jpg")
```
⚠️ **FLUX.1-dev license notice:** This pipeline uses FLUX.1-dev for inpainting and generative edits. FLUX.1-dev is released under a non-commercial license — it may not be used for commercial purposes without a separate agreement with Black Forest Labs. You must accept the license at huggingface.co/black-forest-labs/FLUX.1-dev before downloading weights. All other models in this stack (SAM2, Depth Anything V2, Zero123++, TripoSR, Grounding DINO) are Apache 2.0 or MIT.
Every prompt goes through an LLM intent parser (Claude) that extracts a structured operation — op type, target object, and parameters — then routes to the cheapest backend that can do the job correctly.
```
prompt
  └─► Claude intent parser → {op_type, operation, target_object, params}
        ├─ whole_image_geometric → OpenCV (instant, no GPU)
        ├─ object_2d_geometric   → SAM2 + OpenCV + FLUX (~10s)
        ├─ object_3d_geometric   → SAM2 + Depth + Zero123++/TripoSR + FLUX (~30–60s)
        ├─ adjustment            → NumPy / PIL (instant, no GPU)
        ├─ filter                → OpenCV / PIL (instant, no GPU)
        ├─ generative/semantic   → FLUX.1-dev (~20s)
        └─ layer_operation       → LayerStack (instant)
```
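A minimal sketch of what the parsing step can look like with the Anthropic SDK; the model id, system prompt, and JSON schema below are illustrative assumptions, not this repo's actual internals:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = (
    "Extract an image-editing operation from the user's request. "
    "Reply with JSON only: {op_type, operation, target_object, params}."
)

def parse_intent(prompt: str) -> dict:
    # Hypothetical schema matching the routing diagram above.
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id; any current Claude model works
        max_tokens=512,
        system=SYSTEM,
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(msg.content[0].text)

# parse_intent("rotate the chair 45 degrees to the left") might yield:
# {"op_type": "object_3d_geometric", "operation": "rotate",
#  "target_object": "the chair", "params": {"yaw": -45}}
```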
Core design principle: geometric math is never delegated to a diffusion model. Rotation, scaling, and translation use OpenCV for exact pixel-level results. Diffusion only handles what it's actually good at — synthesizing novel views, filling inpainted holes, and semantic edits.
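For instance, a whole-image rotation is one deterministic OpenCV warp, with no sampling involved (a sketch; the helper name is ours, not the pipeline's):

```python
import cv2
import numpy as np

def rotate_image(img: np.ndarray, degrees: float) -> np.ndarray:
    """Exact affine rotation about the image center; positive angle = counter-clockwise."""
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), degrees, 1.0)
    return cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REFLECT)
```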
```
Input image + "rotate the chair 45 degrees left"
  │
  ├─ SAM2 (Grounding DINO → bounding box → precise mask)
  ├─ Depth Anything V2 (metric depth map for scene context)
  │
  ├─ [fast]    Zero123++ — diffusion novel view synthesis at target (pitch, yaw, roll)
  └─ [quality] TripoSR — 3D mesh reconstruction → render from new camera angle
  │
  ├─ FLUX.1 inpaint — fill the background hole left by the moved object
  └─ Alpha composite — place rotated object onto inpainted background
```
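The final stage is plain array math; a sketch of the alpha-composite step, assuming the rotated object comes back as an RGBA cutout that fits inside the background at (x, y):

```python
import numpy as np

def alpha_composite(background: np.ndarray, cutout_rgba: np.ndarray,
                    x: int, y: int) -> np.ndarray:
    """Blend an RGBA cutout onto the (already inpainted) background."""
    out = background.copy()
    h, w = cutout_rgba.shape[:2]
    alpha = cutout_rgba[:, :, 3:4].astype(np.float32) / 255.0
    region = out[y:y + h, x:x + w].astype(np.float32)
    out[y:y + h, x:x + w] = (alpha * cutout_rgba[:, :, :3]
                             + (1.0 - alpha) * region).astype(np.uint8)
    return out
```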
Use `zero123` (default, ~30s) for speed, or `triposr` (~60s) for mesh-accurate results:

```python
ps.edit("rotate the sculpture 60 degrees yaw, use triposr for quality")
```

Place your own before/after images in `assets/` after running the examples. See `assets/README.md` for instructions.
| Operation | Before | After |
|---|---|---|
| 3D yaw rotation | `assets/demo_3d_before.jpg` | `assets/demo_3d_after.jpg` |
| Object removal | `assets/demo_remove_before.jpg` | `assets/demo_remove_after.jpg` |
To generate your own:
```bash
# 3D rotation demo
python examples/3d_editing.py --input photo.jpg --object "the chair" --output_dir outputs/3d/

# Retouching and filters
python examples/basic_edits.py --input photo.jpg --output_dir outputs/basic/

# Layer workflow
python examples/layer_workflow.py --input photo.jpg --output_dir outputs/layers/
```

| Prompt | Backend | Speed |
|---|---|---|
| "rotate 15 degrees clockwise" | OpenCV | instant |
| "flip horizontally" | OpenCV | instant |
| "zoom in 1.5x" | OpenCV | instant |
| "crop to center square" | OpenCV | instant |
| "perspective correct the sign" | OpenCV | instant |
| "flip the car horizontally" | SAM2 + OpenCV + FLUX | ~10s |
| "rotate the chair 45 degrees left" | SAM2 + Depth + Zero123++ + FLUX | ~30s |
| "rotate the sculpture 60° left, high quality" | SAM2 + Depth + TripoSR + FLUX | ~60s |
Adjustments:

- `brightness`, `contrast`, `sharpness`
- `curves` — per-channel (R, G, B, or composite), arbitrary control points
- `levels` — black point, white point, gamma, output range
- `hsl` — hue shift, saturation, lightness
- `vibrance` — saturation boost weighted toward less-saturated pixels
- `white_balance` — temperature and tint
- `shadows_highlights` — recover shadow/highlight detail independently
- `vignette` — strength, feather, midpoint
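To make the implementation style concrete, here is a sketch of a vibrance-type adjustment, where the saturation boost is weighted toward less-saturated pixels (illustrative, not the repo's exact code):

```python
import cv2
import numpy as np

def vibrance(img_bgr: np.ndarray, amount: float = 0.3) -> np.ndarray:
    """Boost saturation more on muted pixels than on already-vivid ones."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    sat = hsv[:, :, 1] / 255.0              # 0 = gray, 1 = fully saturated
    weight = 1.0 - sat                      # muted pixels get the largest boost
    hsv[:, :, 1] = np.clip(hsv[:, :, 1] * (1.0 + amount * weight), 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```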
Filters:

- `gaussian_blur`, `lens_blur`, `motion_blur`
- `bilateral_denoise` — edge-preserving noise reduction
- `unsharp_mask` — amount, radius, threshold
- `grain` — simulated film grain
- `emboss`, `posterize`, `pixelate`
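For reference, the classic unsharp-mask formula behind those `amount` / `radius` / `threshold` parameters (a sketch; the threshold is applied as a simple low-contrast gate):

```python
import cv2
import numpy as np

def unsharp_mask(img: np.ndarray, amount: float = 1.0,
                 radius: float = 2.0, threshold: int = 0) -> np.ndarray:
    """sharpened = original + amount * (original - gaussian_blur(original))."""
    blurred = cv2.GaussianBlur(img, (0, 0), radius)   # kernel size derived from sigma
    diff = img.astype(np.float32) - blurred.astype(np.float32)
    if threshold > 0:
        diff[np.abs(diff) < threshold] = 0            # leave low-contrast areas untouched
    return np.clip(img + amount * diff, 0, 255).astype(np.uint8)
```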
Generative (FLUX.1-dev):

- `inpaint` — fill a region described in natural language
- `remove_object` — erase and fill background
- `replace_background` — swap the entire background
- `add_object` — add a new element to the scene
- `text_to_image` — generate from scratch
- `style_transfer`, `change_season`, `change_weather`, `colorize`, `age`
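These all route to FLUX.1-dev. A sketch of what the inpaint call can look like through the `diffusers` `FluxInpaintPipeline` (the repo's own wrapper may differ; see the license notice above):

```python
import torch
from diffusers import FluxInpaintPipeline
from PIL import Image

# Assumes you have accepted the FLUX.1-dev license and run `huggingface-cli login`.
pipe = FluxInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = Image.open("photo.jpg").convert("RGB")
mask = Image.open("mask.png").convert("L")   # white = region to repaint

result = pipe(prompt="empty street, nobody present",
              image=image, mask_image=mask, strength=0.9).images[0]
result.save("inpainted.jpg")
```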
Layers:

- Named layers with per-layer opacity and visibility
- Blend modes: Normal, Multiply, Screen, Overlay, Soft Light, Hard Light, Darken, Lighten, Difference, Hue, Saturation, Luminosity
- Per-layer masks (uint8 alpha)
- Duplicate, delete, reorder, merge visible, flatten
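Blend modes are per-pixel formulas over normalized channel values; a sketch of two of them (the soft-light branch uses a simplified form of the W3C equation):

```python
import numpy as np

def blend_screen(base: np.ndarray, top: np.ndarray) -> np.ndarray:
    """Screen: inverted multiply, always lightens. Inputs in [0, 1]."""
    return 1.0 - (1.0 - base) * (1.0 - top)

def blend_soft_light(base: np.ndarray, top: np.ndarray) -> np.ndarray:
    """Soft light: gentle dodge/burn centered on 50% gray. Inputs in [0, 1]."""
    return np.where(top <= 0.5,
                    base - (1.0 - 2.0 * top) * base * (1.0 - base),
                    base + (2.0 * top - 1.0) * (np.sqrt(base) - base))
```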
Requirements:

- Python 3.10+
- CUDA GPU with 16 GB+ VRAM recommended (24 GB for TripoSR + FLUX simultaneously)
- An Anthropic API key for intent parsing
```bash
git clone https://github.com/yourname/ai-photoshop.git
cd ai-photoshop
pip install -r requirements.txt
```

Or install as a package:

```bash
pip install -e .
```

SAM2:

```bash
pip install git+https://github.com/facebookresearch/segment-anything-2.git
```
```bash
# Download weights
wget -P weights/sam2 https://dl.fbaipublicfiles.com/segment_anything_2/sam2_hiera_large.pt
wget -P weights/sam2 https://dl.fbaipublicfiles.com/segment_anything_2/sam2_hiera_large.yaml
```

Depth Anything V2:

```bash
pip install depth-anything-v2
wget -P weights/depth \
  https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth
```
FLUX.1-dev:

⚠️ Accept the non-commercial license at huggingface.co/black-forest-labs/FLUX.1-dev first.

```bash
huggingface-cli login  # authenticate — weights download automatically on first use (~24 GB)
```

TripoSR:

```bash
pip install git+https://github.com/VAST-AI-Research/TripoSR.git
```

Anthropic API key (for intent parsing):

```bash
export ANTHROPIC_API_KEY=sk-ant-...
```

See weights/README.md for full download instructions, all model sizes, and licensing details.
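To sanity-check the SAM2 install against the weights downloaded above, a quick smoke test (a sketch based on the upstream `sam2` package; the Hydra config name may differ between versions):

```python
import cv2
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# The config name resolves against the package's bundled Hydra configs;
# the checkpoint path matches the wget commands above.
model = build_sam2("sam2_hiera_l.yaml", "weights/sam2/sam2_hiera_large.pt")
predictor = SAM2ImagePredictor(model)

img = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(img)

# In the real pipeline this box would come from Grounding DINO (xyxy pixels).
masks, scores, _ = predictor.predict(box=np.array([100, 100, 400, 400]))
print(masks.shape, scores)
```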
```python
from ai_photoshop_pipeline import AiPhotoshop

ps = AiPhotoshop("photo.jpg")

# Instant — no GPU
ps.edit("increase brightness by 10")
ps.edit("boost contrast 25")
ps.edit("shift hue +15 degrees, saturation +20")
ps.edit("add subtle film grain")
ps.edit("soft vignette, strength 0.4")
ps.save("output.jpg")
```

```python
# Fast (Zero123++ novel view synthesis, ~30s)
ps.edit("rotate the chair 45 degrees to the left")
ps.edit("tilt the bottle 20 degrees forward")
# High quality (TripoSR mesh, ~60s)
ps.edit("rotate the sculpture 60 degrees left, use triposr for quality")
```

```python
ps.edit("flip the car horizontally")
ps.edit("scale the dog up by 30%")
ps.edit("move the coffee cup 100 pixels to the right")ps.edit("remove the person standing on the right")
ps.edit("replace the sky with a dramatic stormy sky")
ps.edit("make it look like a watercolor painting")ps.edit("duplicate this layer")
ps.edit("gaussian blur sigma 10")
ps.edit("set blend mode to screen")
ps.edit("set opacity 40%")
ps.show_layers()
# ID Name Blend Opacity Visible
# ▶a3f2 Layer copy screen 40% True
# b7c1 Background normal 100% True
ps.edit("merge visible layers")
ps.save("final.jpg")ai-photoshop/
├── ai_photoshop_pipeline.py — main pipeline (all engines in one file)
├── requirements.txt — pip dependencies
├── pyproject.toml — packaging metadata
├── .gitignore
├── examples/
│ ├── basic_edits.py — adjustments, filters, 2D geometry (no GPU)
│ ├── 3d_editing.py — 3D rotation and object-aware 2D transforms
│ └── layer_workflow.py — blend modes, opacity, non-destructive editing
├── weights/
│ ├── README.md — download instructions for all models
│ ├── sam2/ — SAM2 weights (manual download)
│ └── depth/ — Depth Anything V2 weights (manual download)
└── assets/
├── pipeline_diagram.svg — routing architecture (shown in this README)
└── README.md — instructions for adding before/after images
| `op_type` | Backend | GPU | Approx. time |
|---|---|---|---|
| `whole_image_geometric` | GeometricEngine (OpenCV) | No | <1s |
| `object_2d_geometric` | SegmentationEngine + GeometricEngine + DiffusionEngine | Yes | ~10s |
| `object_3d_geometric` | SegmentationEngine + ThreeDEngine (Zero123++ or TripoSR) | Yes | ~30–60s |
| `adjustment` | AdjustmentEngine (NumPy/PIL) | No | <1s |
| `filter` | FilterEngine (OpenCV/PIL) | No | <1s |
| `generative` | DiffusionEngine (FLUX.1-dev) | Yes | ~20s |
| `semantic_edit` | DiffusionEngine (FLUX.1-dev img2img) | Yes | ~20s |
| `layer_operation` | LayerStack | No | <1s |
All models are lazily loaded — if you only use adjustments and 2D geometry, no diffusion or segmentation weights are ever loaded into memory.
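A sketch of that lazy-loading pattern (illustrative, not the file's actual code):

```python
from functools import cached_property

class EngineRegistry:
    """Each heavy model is built on first access, never at import time."""

    @cached_property
    def diffusion(self):
        import torch                      # deferred: CPU-only edits never import torch
        from diffusers import FluxInpaintPipeline
        return FluxInpaintPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
        ).to("cuda")

engines = EngineRegistry()
# Adjustments and 2D geometry never touch `engines.diffusion`,
# so FLUX weights are only loaded the first time a generative edit runs.
```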
Related work:

- GeoDiffuser (WACV 2025) — geometric transforms baked into diffusion attention layers
- FreeFine (2025) — decoupled geometric editing pipeline with GeoBench
- GeoEdit (2026) — Effects-Sensitive Attention for lighting-aware geometric edits
- DiT4Edit (2024) — first DiT-based image editing framework
The key difference from those papers: this pipeline never asks a diffusion model to perform geometric math. Affine transforms are always OpenCV (exact, deterministic, sub-millisecond). Diffusion handles only what it's actually good at.
| Model | Authors | License | Purpose |
|---|---|---|---|
| FLUX.1-dev | Black Forest Labs | Non-commercial | Inpainting, generative, style transfer |
| SAM2 | Meta AI | Apache 2.0 | Object segmentation |
| Grounding DINO | IDEA Research | Apache 2.0 | Text-to-bounding-box detection |
| Depth Anything V2 | University of Hong Kong | Apache 2.0 | Monocular depth estimation |
| Zero123++ | Sudo AI | Apache 2.0 | Novel view synthesis (fast 3D) |
| TripoSR | Stability AI + VAST AI | MIT | 3D mesh reconstruction |
| Claude | Anthropic | Commercial API | Intent parsing |
This project is released under the MIT License.
Model weights carry their own licenses — see the table above. FLUX.1-dev in particular is non-commercial only. If you need a commercial deployment, either replace the FLUX components with a commercially licensed model (e.g. FLUX.1-schnell, Stable Diffusion 3.5) or obtain a license from Black Forest Labs.