SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting

Abstract

Recent advancements in generative AI have made text-guided image inpainting (adding, removing, or altering image regions using textual prompts) widely accessible. However, generating semantically correct, photorealistic imagery typically requires carefully crafted prompts and iterative refinement based on evaluating the realism of the generated content, tasks commonly performed by humans. To automate the generative process, we propose Semantically Aligned and Uncertainty Guided AI Image Inpainting (SAGI), a model-agnostic pipeline that samples prompts from a distribution closely aligned with human perception, evaluates the generated content, and discards outputs that deviate from that distribution, which we approximate using pretrained Large Language Models and Vision-Language Models. By applying this pipeline to multiple state-of-the-art inpainting models, we create the SAGI Dataset (SAGI-D), currently the largest and most diverse dataset of AI-generated inpaintings, comprising over 95k inpainted images and a human-evaluated subset. Our experiments show that semantic alignment significantly improves image quality and aesthetics, while uncertainty guidance effectively identifies realistic manipulations: human accuracy in distinguishing inpainted images from real ones drops from 74% to 35% after applying our pipeline. Moreover, using SAGI-D to train several image forensic approaches increases in-domain detection performance on average by 37.4% and out-of-domain generalization by 26.1% in terms of IoU, demonstrating its utility in countering malicious exploitation of generative AI.

Methodology

We propose SAGI, a unified framework integrating two main components:

Semantically Aligned Object Replacement (SAOR): Leverages image semantics and LLMs to create context-aware prompts, which yield inpaintings of higher aesthetic quality than prompts built from simple object labels or captions.
Uncertainty Guided Deceptiveness Assessment (UGDA): A realism assessment method that uses vision-language models to compare inpainted images with their originals and identify convincing manipulations.

Using the proposed pipeline, we introduce SAGI-D, the first semantically aligned deceptive dataset for AI-generated inpainting detection, designed to evaluate the effectiveness of our components. The dataset relies on SAOR for context-aware prompt generation and UGDA for realism evaluation, ensuring high-quality and diverse inpainted images.
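The way the two components chain together can be illustrated with a minimal Python sketch. It is not the released implementation: the names `run_pipeline`, `make_prompt`, `inpaint`, `is_deceptive`, and `InpaintingResult` are hypothetical, images are represented by plain strings, and the choice to drop rejected results (rather than only label them) is an assumption based on the abstract's description of discarding content that deviates from the target distribution. The internals of the LLM and VLM calls are deliberately left as pluggable callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class InpaintingResult:
    original: str    # stand-in for the source image (e.g. a file path)
    prompt: str      # context-aware prompt produced by the SAOR step
    inpainted: str   # stand-in for the generated image


def run_pipeline(
    images: Iterable[str],
    make_prompt: Callable[[str], str],         # hypothetical LLM wrapper (SAOR)
    inpaint: Callable[[str, str], str],        # any inpainting model (model-agnostic)
    is_deceptive: Callable[[str, str], bool],  # hypothetical VLM check (UGDA)
) -> List[InpaintingResult]:
    kept = []
    for image in images:
        prompt = make_prompt(image)
        edited = inpaint(image, prompt)
        # Keep only results the realism check accepts; the rest are discarded.
        if is_deceptive(image, edited):
            kept.append(InpaintingResult(image, prompt, edited))
    return kept


# Toy stand-ins so the sketch runs without any model.
results = run_pipeline(
    images=["kitchen.jpg", "beach.jpg"],
    make_prompt=lambda img: f"an object that fits {img}",
    inpaint=lambda img, prompt: f"inpainted-{img}",
    is_deceptive=lambda orig, edited: orig.startswith("k"),
)
print([r.inpainted for r in results])
```

Passing the three stages as callables mirrors the model-agnostic claim: swapping BrushNet for PowerPaint, or one VLM for another, would only change the function handed to `run_pipeline`.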

SAGI Pipeline Overview: Semantically Aligned Object Replacement (top) and Uncertainty Guided Deceptiveness Assessment (bottom)

SAGI-D Examples

The following examples showcase the high-quality inpainted images generated using our SAGI pipeline across different state-of-the-art inpainting models. Each pair shows the original image (left) and the inpainted result (right), along with the semantically aligned prompt used for generation.

BrushNet

"a juicy orange to add a vibrant pop of color to the composition"

"a majestic snow-capped mountain to create a scenic landscape"

PowerPaint

"a cozy blanket and fluffy pillows to complete the bedroom scene"

"a grand marble fountain surrounded by lush greenery"

HD-Painter

"a playful otter swimming in the river stream"

"a cluster of small red berries growing in the grass"

ControlNet

"a fresh, delicious sandwich to complete the meal"

"a clear blue sky to enhance the mountain landscape"

Inpaint-Anything

"a delicious cheeseburger to make the meal even more tempting"

"a lush green meadow, adding a touch of nature to the serene landscape"

Remove-Anything

BibTeX

        
@misc{giakoumoglou2025sagisemanticallyaligneduncertainty,
      title={SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting}, 
      author={Paschalis Giakoumoglou and Dimitrios Karageorgiou and Symeon Papadopoulos and Panagiotis C. Petrantonakis},
      year={2025},
      eprint={2502.06593},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.06593}, 
}