LUSD: Localized Update Score Distillation for Text-Guided Image Editing

1VISTEC     2Siam Commercial Bank     3Faculty of Medicine Siriraj Hospital     4Pixiv
*Equal contributions

ICCV 2025

Example edits (input → output pairs): "+ drawing of a coffee cup"; "dog → Dark Soul boss"; "+ minion".

LUSD is a novel score distillation technique for object insertion and image editing. It is based on Stable Diffusion 1.5, requires no additional training, and uses a single configuration for all inputs.


Abstract

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.

Main Idea

Method Overview. Given an input image and a target prompt, we obtain the gradient of the SBP loss and an attention-based mask. With spatial regularization, gradient filtering, and normalization, we modify the image to match the prompt.

Key Ideas

  1. An implicit editing mask derived from attention features to reduce the spatial variation of gradient magnitudes.
  2. A normalization and thresholding mechanism to filter out "counterproductive" gradients whose magnitudes have low standard deviation.
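The two ideas above can be sketched as a single post-processing step on the raw loss gradient. This is a minimal illustration, not the paper's exact implementation: the function and parameter names (`spatially_regularize`, `attn_mask`, `eps`) and the max-based normalization are assumptions.

```python
import numpy as np

def spatially_regularize(grad, attn_mask, eps=1e-8):
    """Illustrative sketch of key idea 1 plus normalization.

    grad      : (H, W) raw gradient of the distillation loss w.r.t. the image
    attn_mask : (H, W) implicit editing mask in [0, 1] from attention features
    """
    # Key idea 1: confine updates to the region highlighted by the
    # attention-derived mask, reducing spatial variation of magnitudes.
    masked = grad * attn_mask
    # Normalize so the update scale is comparable across timesteps,
    # noise samples, prompts, and images.
    return masked / (np.abs(masked).max() + eps)
```

In this sketch the mask zeroes out gradients in the background, so background pixels are never pushed away from the source image.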

Goal and Challenge

Given an input source image and a target prompt describing how the image should be modified, our goal is to edit the image so that it matches the prompt. Our method builds upon score distillation sampling (SDS) with a simple L2 regularization toward the source image. We address a key challenge: how to modulate variations in gradient magnitudes and their spatial distributions.
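A minimal sketch of the SDS-with-L2-regularization update described above. This is an assumption-laden toy: `predict_noise` stands in for the diffusion U-Net's noise prediction, and the names `w_t`, `lam`, and `lr` are illustrative, not the paper's hyperparameters.

```python
import numpy as np

def sds_l2_update(z, z_src, predict_noise, eps, w_t=1.0, lam=0.5, lr=0.1):
    """One gradient-descent step of SDS plus an L2 pull toward the source.

    z       : current image (or latent) being optimized
    z_src   : source image; the L2 term preserves its content/background
    predict_noise : callable standing in for the diffusion model's
                    noise prediction at the current timestep
    eps     : the noise that was injected before prediction
    """
    # SDS gradient: discrepancy between predicted and injected noise.
    sds_grad = w_t * (predict_noise(z) - eps)
    # L2 regularizer: pulls the image back toward the source.
    reg_grad = lam * (z - z_src)
    return z - lr * (sds_grad + reg_grad)
```

With a zero SDS term, each step simply shrinks the distance to the source image, which is the regularizer's intended behavior.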

Challenge in Gradient Variations (figure panels: standard deviation of gradient magnitudes)

Challenge: gradients from score distillation sampling need modulation. They vary across timesteps, noise samples, prompts, and images. The standard deviation of gradient magnitudes can be used to filter out less focused, "counterproductive" gradients.
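The filtering criterion above can be sketched as a simple test on the per-pixel gradient magnitudes. The threshold value and the exact statistic are assumptions for illustration; the paper's rule may differ.

```python
import numpy as np

def keep_gradient(grad, std_threshold=0.1):
    """Return True if this gradient should be applied.

    A gradient whose per-pixel magnitudes have low standard deviation is
    spread uniformly rather than focused on an edit region, so it is
    treated as "counterproductive" and skipped.
    """
    return bool(np.abs(grad).std() >= std_threshold)
```

A flat gradient (same magnitude everywhere) is rejected, while a gradient concentrated on a few pixels passes the test.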

In-the-wild Results

BibTeX

@misc{chinchuthakun2025lusd,
  title={LUSD: Localized Update Score Distillation for Text-Guided Image Editing},
  author={Worameth Chinchuthakun and Tossaporn Saengja and Nontawat Tritrong and Pitchaporn Rewatbowornwong and Pramook Khungurn and Supasorn Suwajanakorn},
  year={2025},
  eprint={2503.11054},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2503.11054},
}