This repository is the official implementation of FlowAlign, an inversion-free and training-free image editing algorithm.
💡 Recent inversion-free, flow-based editors leverage models like Stable Diffusion 3 to enable text-driven image editing via ODE integration.
🤔 However, skipping latent inversion often leads to unstable trajectories and poor source consistency.
🚀 FlowAlign addresses this by introducing a flow-matching loss—a simple yet effective regularizer that ensures smooth, semantically aligned, and structurally consistent edits.
🌟 Thanks to its ODE-based formulation, FlowAlign naturally supports reverse editing, highlighting its reversible and robust transformation capability.
Clone this repo:
git clone https://github.com/FlowAlign/FlowAlign.git
cd FlowAlign
To install requirements:
conda create -n flowalign python==3.11
conda activate flowalign
pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
For text-based image editing, run:
Example 1
python run_edit.py \
--img_path "samples/bicycle.jpg" \
--src_prompt "a slanted mountain bicycle on the road in front of a building" \
--tgt_prompt "a slanted rusty mountain bicycle on the road in front of a building"
The expected result:
Example 2
python run_edit.py \
--img_path "samples/cat.jpg" \
--src_prompt "a opened eyes cat sitting on wooden floor" \
--tgt_prompt "a closed eyes cat sitting on wooden floor"
The expected result:
You can change the editing method with the method argument:
method: dual / sdedit / flowedit / flowalign
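To compare the four methods on the same image, the command line can be assembled programmatically. The sketch below only builds and prints the commands; it does not run them. It assumes the method is passed as `--method <name>` and that `--efficient_memory` is a plain switch, neither of which is confirmed above, and `build_command` is a hypothetical helper that is not part of this repository.

```python
import shlex

# Method names as listed in this README.
METHODS = ["dual", "sdedit", "flowedit", "flowalign"]


def build_command(method, img_path, src_prompt, tgt_prompt, efficient_memory=False):
    """Return the argv list for one run_edit.py call (hypothetical helper)."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    cmd = [
        "python", "run_edit.py",
        "--img_path", img_path,
        "--src_prompt", src_prompt,
        "--tgt_prompt", tgt_prompt,
        "--method", method,  # assumption: the flag is spelled --method
    ]
    if efficient_memory:
        cmd.append("--efficient_memory")  # assumption: a switch with no value
    return cmd


# Print one shell-quoted command per method for the bicycle example.
commands = [
    build_command(
        m,
        "samples/bicycle.jpg",
        "a slanted mountain bicycle on the road in front of a building",
        "a slanted rusty mountain bicycle on the road in front of a building",
    )
    for m in METHODS
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the argv list can be handed to `subprocess.run(cmd, check=True)` from the repository root.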
If you use --efficient_memory, the text encoder pre-computes the text embeddings and is then removed from the GPU.
This allows image editing to run on a single GPU with 24 GB of VRAM.
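The general pattern behind this kind of option can be sketched in PyTorch: encode every prompt once, keep only the resulting tensors, then drop the encoder before the editing loop starts. This is a minimal illustration under stated assumptions, not the code in this repository. `ToyTextEncoder` and `toy_tokenize` are hypothetical stand-ins for the real text encoders and tokenizers.

```python
import gc

import torch
from torch import nn


class ToyTextEncoder(nn.Module):
    """Hypothetical stand-in for a real text encoder: token ids -> embeddings."""

    def __init__(self, vocab_size=1000, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):
        return self.embed(token_ids)


def toy_tokenize(prompt, vocab_size=1000, max_len=8):
    """Toy deterministic tokenizer (not a real one): one id per word, zero-padded."""
    ids = [sum(ord(c) for c in word) % vocab_size for word in prompt.split()][:max_len]
    ids += [0] * (max_len - len(ids))
    return torch.tensor([ids])


def precompute_embeddings(encoder, prompts, device):
    """Encode each prompt once, without tracking gradients."""
    encoder.to(device).eval()
    with torch.no_grad():
        return {p: encoder(toy_tokenize(p).to(device)) for p in prompts}


device = "cuda" if torch.cuda.is_available() else "cpu"
src_prompt = "a opened eyes cat sitting on wooden floor"
tgt_prompt = "a closed eyes cat sitting on wooden floor"

encoder = ToyTextEncoder()
embeds = precompute_embeddings(encoder, [src_prompt, tgt_prompt], device)

# The embeddings stay on the device; the encoder weights are no longer needed.
encoder.to("cpu")
del encoder
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # return cached, now-unused blocks to the driver

print({p: tuple(e.shape) for p, e in embeds.items()})
```

The saving comes from the fact that the prompts are fixed for the whole edit, so the encoder is needed exactly once, while the embeddings it produces are small compared with its weights.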
All edited images were generated on a single NVIDIA RTX 3090 GPU, using a fixed random seed of 123 and a Classifier-Free Guidance (CFG) scale of 13.5.
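For readers unfamiliar with the CFG scale, the standard classifier-free guidance rule extrapolates from the unconditional prediction toward the text-conditioned one. The sketch below shows that textbook rule on plain Python lists. How FlowAlign applies guidance internally is not described here, so `cfg_combine` and the toy values are illustrative assumptions only.

```python
def cfg_combine(uncond, cond, scale):
    """Standard CFG rule, elementwise: uncond + scale * (cond - uncond)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


CFG_SCALE = 13.5  # the scale reported above

# Toy predictions: the first component differs between the two branches,
# the second is identical and therefore unaffected by guidance.
v_uncond = [0.0, 1.0]
v_cond = [1.0, 1.0]

guided = cfg_combine(v_uncond, v_cond, CFG_SCALE)
print(guided)
```

With a scale of 1 the rule returns the conditional prediction unchanged; larger scales amplify whatever the text prompt changes relative to the unconditional branch.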