.
├── configs/ # configuration files for distinct operational setups.
├── data/ # dataset definitions and data loading pipelines.
├── methods/ # core algorithm implementations and logic.
├── scripts/ # shell scripts for execution of training and sampling.
├── services/ # auxiliary utilities and shared tooling services.
└── steerers/ # primary control flows for training and sampling.

A TwinFlow-trained model is an any-step model: it supports few-step, any-step, and multi-step sampling at the same time. To achieve this, the model takes two timestep conditions: the current timestep and the target timestep.
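The two-condition interface above can be sketched in a toy form. This is a hypothetical illustration of the calling convention only (names like `any_step_sample` and `t_tgt` are invented, not the TwinFlow API): each network call sees both where it is and where it should land, so one-step, few-step, and multi-step sampling all reuse the same model.

```python
def any_step_sample(model, x, steps):
    """Toy sketch (not the TwinFlow API): walk t from 1.0 down to 0.0 in
    `steps` jumps; each model call is conditioned on both the current
    timestep t and the target timestep t_tgt, so steps=1, steps=4, and
    steps=30 all reuse the same network."""
    ts = [1.0 - i / steps for i in range(steps + 1)]  # t: 1.0 -> 0.0
    for t, t_tgt in zip(ts, ts[1:]):
        x = model(x, t, t_tgt)  # two timestep conditions per call
    return x

# Toy "model": shrinks x proportionally to the jump. With it, any number
# of steps lands on the same endpoint -- the any-step property.
toy = lambda x, t, t_tgt: x * (t_tgt / t)
print(any_step_sample(toy, 8.0, 1), any_step_sample(toy, 8.0, 4))  # 0.0 0.0
```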
During training, key configurations control the target-timestep distribution (see `TwinFlow/src/methodes/twinflow/twinflow.py`, line 132 at commit `f123185`):
Common practices:
- `probs = {"e2e": 1, "mul": 1, "any": 1, "adv": 1}`: TwinFlow training
- `probs = {"e2e": 0, "mul": 0, "any": 1, "adv": 0}`: RCGM training (in theory)
- `probs = {"e2e": 1, "mul": 1, "any": 1, "adv": 0}`: RCGM training (in practice). In this case, you need to comment out these lines to run correctly: `TwinFlow/src/methodes/twinflow/twinflow.py`, lines 358 to 365 at commit `f123185`.
- `probs = {"e2e": 0, "mul": 1, "any": 0, "adv": 0}`: Flow Matching training. In this case, also set `consistc_ratio`, `enhanced_ratio`, and `estimate_order` to 0.
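One way to read the `probs` dict is as relative branch weights. The sketch below is our interpretation, not the repository's code (the actual logic lives in `src/methodes/twinflow/twinflow.py`): equal nonzero weights pick each training branch equally often, and a zero weight disables a branch.

```python
import random

def pick_branch(probs, rng=random):
    """Hypothetical sketch: draw one training branch ("e2e", "mul", "any",
    or "adv") with probability proportional to its weight in `probs`."""
    names = [k for k, w in probs.items() if w > 0]
    weights = [probs[k] for k in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Flow Matching setting: only the "mul" branch can ever be drawn.
print(pick_branch({"e2e": 0, "mul": 1, "any": 0, "adv": 0}))  # mul
```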
To deploy TwinFlow on OpenUni, please follow the procedure outlined below.
Begin by setting up the environment prerequisites as detailed in the official OpenUni repository. Ensure all dependencies are correctly installed before proceeding.
- Generator Backbone
Download the OpenUni generator backbone checkpoint locally. Once downloaded, update the configuration file `configs/openuni_task/openuni_full.yaml` to reflect the local path:

```yaml
model:
  type: ./networks/openuni/openuni_l_internvl3_2b_sana_1_6b_512_hf.py
  path: path/to/openuni_l_internvl3_2b_sana_1_6b_512_hf_blip3o60k.pth
  in_chans: 16
```

- Other Components
- OpenUni Encoder: InternVL3-2B
- SANA 1.6B: Sana_1600M_512px_diffusers
- DC-AE: dc-ae-f32c32-sana-1.1-diffusers
Prior to initiating training, define the environment variables pointing to your downloaded component models. You may modify scripts/openuni/train_ddp.sh directly or export them in your script:
```shell
export INTERNVL3_PATH="path/to/InternVL3-2B"
export SANA_1600M_512PX_PATH="path/to/Sana_1600M_512px_diffusers"
export DCAE_PATH="path/to/dc-ae-f32c32-sana-1.1-diffusers"
```

- Standard Training (TwinFlow on OpenUni):

```shell
scripts/openuni/train_ddp.sh configs/openuni_task/openuni_full.yaml
```

- Data-Free Training (No Text-Image Pairs Required):

```shell
scripts/openuni/train_ddp.sh configs/openuni_task/openuni_full_imgfree.yaml
```

To directly run sampling:

```shell
scripts/openuni/sample_demo.sh configs/openuni_task/openuni_full.yaml
```

After training, the trained model supports three sampling modes: few-step, any-step, and standard multi-step. We provide a sampling configuration for each mode in the yaml for reference:
```yaml
# few-step sampling
sample:
  ckpt: "700" # <- change to the ckpt
  cfg_scale: 0
  cfg_interval: [0.00, 0.00]
  sampling_steps: 2 # 1
  stochast_ratio: 1.0 # 0.8
  extrapol_ratio: 0.0
  sampling_order: 1
  time_dist_ctrl: [1.0, 1.0, 1.0]
  rfba_gap_steps: [0.001, 0.7]
  sampling_style: few
```

```yaml
# any-step sampling
sample:
  ckpt: "700"
  cfg_scale: 0
  cfg_interval: [0.00, 0.00]
  sampling_steps: 4 # 8
  stochast_ratio: 0.0
  extrapol_ratio: 0.0
  sampling_order: 1
  time_dist_ctrl: [1.0, 1.0, 1.0]
  rfba_gap_steps: [0.001, 0.5]
  sampling_style: any
```

```yaml
# multi-step sampling
sample:
  ckpt: "700"
  cfg_scale: 0
  cfg_interval: [0.00, 0.00]
  sampling_steps: 30
  stochast_ratio: 0.0
  extrapol_ratio: 0.0
  sampling_order: 1
  time_dist_ctrl: [1.17, 0.8, 1.1]
  rfba_gap_steps: [0.001, 0.0]
  sampling_style: mul
```

Note
When using 1-order training, `any` is NOT required, `one` is required.
When using 2-order training, `any` is required, `one` is optional.
See `TwinFlow/src/methodes/twinflow/twinflow_lora.py`, lines 134 to 136 at commit `cbfdbae`.
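As a reading aid for the sampling configurations above, here is a hedged sketch of how `sampling_steps` and `rfba_gap_steps` could jointly determine the visited timesteps. This is our interpretation of the config fields (the function `timesteps` and the clipping semantics are assumptions, not the reference implementation): `rfba_gap_steps = [start_gap, end_gap]` trims both ends of the unit time interval, and the steps are spaced evenly in between.

```python
def timesteps(sampling_steps, rfba_gap_steps):
    """Assumed semantics: evenly spaced timesteps from 1 - start_gap
    down to end_gap, one endpoint per sampling step boundary."""
    start_gap, end_gap = rfba_gap_steps
    hi, lo = 1.0 - start_gap, end_gap
    return [hi - (hi - lo) * i / sampling_steps for i in range(sampling_steps + 1)]

# Few-step config stops early (end gap 0.7); multi-step runs down to 0.0.
print([round(t, 4) for t in timesteps(2, [0.001, 0.7])])   # [0.999, 0.8495, 0.7]
print(round(timesteps(30, [0.001, 0.0])[-1], 4))           # 0.0
```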
- LoRA Training (TwinFlow on OpenUni) — you need to comment out L52 and use the original transformer in L51:

```shell
# 1 order
scripts/openuni/train_ddp_lora.sh configs/openuni_task/openuni_lora_1order.yaml
# 2 order
scripts/openuni/train_ddp_lora.sh configs/openuni_task/openuni_lora_2order.yaml
```

- LoRA Training (TwinFlow on SD3.5-M):

```shell
# 1 order
scripts/sd3/train_ddp_lora.sh configs/sd_task/sd35_lora_1order.yaml
# 2 order
scripts/sd3/train_ddp_lora.sh configs/sd_task/sd35_lora_2order.yaml
```

- Full Training (TwinFlow on QwenImage):

```shell
scripts/qwenimage/train_fsdp.sh configs/qwenimage_task/qwenimage_full.yaml
```

- LoRA Training (TwinFlow on QwenImage):
Note
When using 1-order training, `any` is NOT required, `one` is required.
When using 2-order training, `any` is required, `one` is optional.
See `TwinFlow/src/methodes/twinflow/twinflow_lora.py`, lines 134 to 136 at commit `cbfdbae`.
```shell
# 1 order
scripts/qwenimage/train_ddp_lora.sh configs/qwenimage_task/qwenimage_lora_1order.yaml
# 2 order
scripts/qwenimage/train_ddp_lora.sh configs/qwenimage_task/qwenimage_lora_2order.yaml
```

Note
Due to memory limits, we did not add a separate EMA for QwenImage; thus, we set `ema_decay_rate: 0` in the config. Users could switch to FSDP2, which can fully support this. See `TwinFlow/src/steerers/qwenimage/sft_fsdp.py`, lines 132 to 136 at commit `ab5808d`.
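To see why `ema_decay_rate: 0` effectively disables EMA, here is a minimal sketch of the standard exponential-moving-average update (plain Python, hypothetical weight lists; not the repo's optimizer code): with decay 0, the shadow weights just copy the online weights, so keeping a separate EMA buys nothing for the extra memory.

```python
def ema_update(ema_w, w, decay):
    """Standard EMA step: ema <- decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_w, w)]

weights = [0.2, -1.0]
print(ema_update([0.0, 0.0], weights, decay=0.0))    # [0.2, -1.0] -> just copies w
print(ema_update([0.0, 0.0], weights, decay=0.999))  # barely moves from [0.0, 0.0]
```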
Note
Switching to LoRA training is easy; we suggest referring to `src/steerers/stable_diffusion_3/sft_ddp_lora.py` to add LoRA, and setting the method config like `configs/sd_task/sd35_lora_1order.yaml` or `configs/sd_task/sd35_lora_2order.yaml`.
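For readers new to LoRA, the idea behind that file can be sketched in plain Python (toy 2x2 shapes and the helper names `matmul`/`lora_forward` are invented for illustration; the repo wires LoRA into the real transformer): the frozen weight `W` is augmented with a trainable low-rank update `B @ A`.

```python
def matmul(A, B):
    """Naive matrix multiply on nested lists (illustration only)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_forward(x, W, A, B, scale=1.0):
    """y = x @ (W + scale * B @ A)^T: frozen base weight plus a trainable
    rank-r update, where r = len(A) is much smaller than the full dim."""
    delta = matmul(B, A)  # low-rank update B @ A
    W_eff = [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
    return matmul(x, [list(c) for c in zip(*W_eff)])  # x @ W_eff^T

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight (identity here)
A = [[1.0, 1.0]]              # rank-1 factors: A is 1x2, B is 2x1
B = [[0.5], [0.0]]
print(lora_forward([[1.0, 2.0]], W, A, B))  # [[2.5, 2.0]]
```

With `B` initialized to zeros (as is standard for LoRA), the update starts as a no-op and only the small factors `A` and `B` receive gradients.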
Note
The current modeling supports a single reference image input for editing; you need minor code modifications to support edit training.
- In the config, change `QwenImage` to `QwenImageEdit`.
- Modify the dataloader to support loading the reference image, e.g.:

```python
text, image, control_image = batch["text"], batch["image"].cuda(), batch["control_img"].cuda()
```

- Prepare inputs like:

```python
with torch.no_grad():
    (
        prompt_embeds,
        prompt_embeds_mask,
        uncond_prompt_embeds,
        uncond_prompt_embeds_mask,
    ) = model.encode_prompt(text, control_image, do_cfg=True)
    prompt_embeds = prompt_embeds.to(torch.float32)
    uncond_prompt_embeds = uncond_prompt_embeds.to(torch.float32)
    latents = model.pixels_to_latents(image).to(torch.float32)
    control_latents = model.pixels_to_latents(control_image).to(torch.float32)
```

- Pass the inputs to `training_step` like:
```python
loss = method.training_step(
    model,
    latents,
    c=[prompt_embeds, prompt_embeds_mask, control_latents],
    e=[uncond_prompt_embeds, uncond_prompt_embeds_mask, control_latents],
    step=(global_step - 1),
    v=None,
)
```