Any-Resolution AI-Generated Image Detection by Spectral Learning

Dimitrios Karageorgiou1,2, Symeon Papadopoulos1, Ioannis Kompatsiaris1, Efstratios Gavves2,3

1Information Technologies Institute, CERTH, Greece

2University of Amsterdam, The Netherlands
3Archimedes/Athena RC, Greece

Accepted at CVPR 2025

Abstract

Recent works have established that AI models introduce spectral artifacts into generated images and propose approaches for learning to capture them using labeled data. However, the significant differences in such artifacts among different generative models hinder these approaches from generalizing to generators not seen during training. In this work, we build upon the key idea that the spectral distribution of real images constitutes both an invariant and highly discriminative pattern for AI-generated image detection. To model this under a self-supervised setup, we employ masked spectral learning using the pretext task of frequency reconstruction. Since generated images constitute out-of-distribution samples for this model, we propose spectral reconstruction similarity to capture this divergence. Moreover, we introduce spectral context attention, which enables our approach to efficiently capture subtle spectral inconsistencies in images of any resolution. Our spectral AI-generated image detection approach (SPAI) achieves a 5.5% absolute improvement in AUC over the previous state-of-the-art across 13 recent generative approaches, while exhibiting robustness against common online perturbations.

Key Contributions

Our work introduces several key innovations in AI-generated image detection:

  1. Spectral Distribution as an Invariant Pattern: We establish that the spectral distribution of real images serves as a suitable invariant pattern for distinguishing between real and AI-generated images, as it is not directly affected by the introduction of specific generative models while providing significant discriminative power.
  2. Masked Spectral Learning: We propose a self-supervised approach that employs the pretext task of frequency reconstruction to model the spectral distribution of real images, using only real images for training.
  3. Spectral Reconstruction Similarity (SRS): We introduce a novel method for detecting AI-generated images as out-of-distribution samples by measuring the divergence between the reconstructed frequencies and those actually present in the image.
  4. Spectral Context Attention (SCA): We develop a mechanism that enables efficient processing of images at any resolution without prior pre-processing, allowing our approach to capture subtle spectral inconsistencies in high-resolution images.
  5. State-of-the-Art Performance: SPAI achieves a 5.5% absolute improvement in AUC over the previous state-of-the-art across 13 recent generative approaches, while exhibiting robustness against common online perturbations.

Method

Masked Spectral Learning

To build a spectral model of real images, we propose using the pretext task of frequency reconstruction under a self-supervised learning setup, using only real images. We randomly mask the low- or high-frequency component of the input images and train the model under the objective of reconstructing the missing frequencies. This approach allows us to learn the spectral distribution of real images without requiring labeled AI-generated samples.
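The masking step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a single-channel image, a circular cutoff in the centred Fourier spectrum, and a hypothetical `radius` parameter. The function names are made up for this example.

```python
import numpy as np

def radial_frequency_mask(height, width, radius):
    """True for frequencies within `radius` of the DC term of a centred spectrum."""
    ys = np.arange(height) - height // 2
    xs = np.arange(width) - width // 2
    dist = np.sqrt(ys[:, None] ** 2 + xs[None, :] ** 2)
    return dist <= radius

def split_frequencies(image, radius):
    """Split a 2-D image into its low- and high-frequency components."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    low_mask = radial_frequency_mask(image.shape[0], image.shape[1], radius)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

def make_pretext_pair(image, radius, rng):
    """Randomly hide one band; return (visible input, reconstruction target)."""
    low, high = split_frequencies(image, radius)
    if rng.random() < 0.5:
        return high, low  # low band hidden: the model must recover it
    return low, high      # high band hidden: the model must recover it
```

Because the two masks are complementary, the visible input and the target always sum back to the original image, so the target is exactly the information the model was not shown.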

Spectral Reconstruction Similarity

As our model constitutes a spectral model of real images, it is expected to reconstruct the missing frequencies of real images better than those of AI-generated ones. We introduce Spectral Reconstruction Similarity (SRS) to measure this divergence. SRS calculates the similarity between the features of the original image and those of its low- and high-frequency components; AI-generated images are expected to yield lower similarity (larger feature distances) than real ones.
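A rough sketch of such a similarity measure is given below. It assumes the features are plain vectors produced by some encoder and uses cosine similarity. Both choices are assumptions made for illustration, and the function and key names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two feature vectors (flattened if needed)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def spectral_reconstruction_similarity(feat_original, feat_low, feat_high):
    """Compare features of the original image with those of its two bands.

    Lower values suggest the image deviates from the learned spectral
    distribution of real images.
    """
    return {
        "orig_vs_low": cosine_similarity(feat_original, feat_low),
        "orig_vs_high": cosine_similarity(feat_original, feat_high),
    }
```

In this sketch a real image would be expected to give values closer to 1 than a generated one. How the values are turned into a final decision is outside the scope of the example.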

Spectral Context Attention

Capturing subtle clues in images is crucial for effectively distinguishing between real and AI-generated content. However, most computer vision models cannot efficiently scale to the native resolution of modern photos. Our Spectral Context Attention (SCA) enables processing of high-resolution images without resizing, combining the most discriminative spectral reconstruction similarity values from different patches according to their respective context. This allows our approach to efficiently capture subtle spectral inconsistencies in images of any resolution.
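The idea of processing an image at its native resolution and then combining per-patch values can be sketched as below. This is an illustrative simplification, not the SCA module itself: the patch size of 224, the overlapping border patch, the single query vector, and all function names are assumptions.

```python
import numpy as np

def patch_starts(length, patch, stride):
    """Start offsets that cover a side of `length` pixels without resizing."""
    if length < patch:
        raise ValueError("image side is shorter than the patch size")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # last patch overlaps so the border is covered
    return starts

def extract_patches(image, patch=224, stride=224):
    """Tile an image of arbitrary size into fixed-size patches."""
    rows = patch_starts(image.shape[0], patch, stride)
    cols = patch_starts(image.shape[1], patch, stride)
    return np.stack([image[r:r + patch, c:c + patch] for r in rows for c in cols])

def attention_pool(patch_values, patch_context, query):
    """Weight per-patch values (N, D) by softmax scores from context (N, K)."""
    scores = patch_context @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ patch_values, weights
```

With this layout the number of patches grows with the image size while each patch keeps its original pixels, and the attention weights indicate which patches contributed most to the pooled result.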

Results

Performance Comparison

We evaluated SPAI against 12 state-of-the-art methods across 13 different generative models, including early approaches like Stable Diffusion 1.3/1.4 and recent high-fidelity generators like Stable Diffusion 3, DALLE-3, and Midjourney v6.1. While competing methods often excel on some generators but fail catastrophically on others, SPAI consistently performs well across all tested generative models.

AUC per generator. Generators are grouped by image size into < 0.5 MPixels, 0.5 - 1.0 MPixels, and > 1.0 MPixels; AVG is the average over all generators.

Approach       Glide   SD1.3   SD1.4    Flux  DALLE2     SD2    SDXL     SD3 GigaGAN    MJv5  MJv6.1  DALLE3 Firefly     AVG
NPR             72.2    89.6    60.5    19.8     3.9    12.5    18.1    60.6    83.2    15.3    19.8    97.1    38.0    45.4
Dire            33.3    59.9    61.3    45.7    52.2    68.5    46.9    49.2    36.3    41.9    50.3    65.2    49.9    50.8
CNNDet.         59.2    59.0    61.2    39.8    71.5    57.5    67.4    30.2    73.4    48.8    56.7    23.5    73.4    55.5
FreqDet.        43.6    92.3    92.7    36.5    47.4    42.5    66.5    69.8    63.2    36.9    27.5    42.2    80.9    57.1
Fusing          63.0    62.8    62.2    57.5    76.7    66.9    62.1    38.8    80.4    64.0    74.0    25.2    76.3    62.3
LGrad           76.5    82.4    83.4    74.9    85.7    60.7    70.2    12.7    89.9    69.2    79.6    30.0    42.0    65.9
UnivFD          63.3    80.8    81.2    36.3    91.4    84.3    78.3    28.6    86.2    57.1    60.5    31.0    95.5    67.3
GramNet         78.2    83.9    84.3    78.6    85.2    66.7    77.8    19.2    85.0    63.8    84.9    42.9    38.0    68.4
DeFake          86.1    64.2    63.6    90.5    41.4    66.2    52.3    87.7    71.7    67.0    87.5    93.3    39.4    70.1
PatchCr.        78.4    95.7    96.2    86.9    81.8    95.7    96.7    33.8    98.0    79.0    96.1    28.1    79.1    80.4
DMID            73.1   100.0   100.0    97.2    54.3    99.7    99.6    67.9    67.9    99.9    94.4    41.3    90.2    83.5
RINE            95.6    99.9    99.9    93.0    93.0    96.6    99.3    39.1    92.9    96.4    81.2    41.8    82.9    85.5
SPAI (Ours)     90.2    99.6    99.6    83.0    91.1    96.5    97.4    75.9    85.4    94.5    84.0    90.2    96.0    91.0

As shown in the table above, SPAI achieves an average AUC of 91.0, a 5.5% absolute improvement over the previous state-of-the-art (RINE, 85.5). Individual competitors score higher on specific generators, but they drop sharply on others (for example, RINE falls to 39.1 on SD3 and 41.8 on DALLE3), whereas SPAI stays at or above 75.9 on every generator, demonstrating its generalization capability.
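The headline numbers can be checked directly from the table. The short script below copies the per-generator AUC values of RINE and SPAI from the table rows and recomputes their averages and worst-case scores; the variable names are chosen for this example only.

```python
# Per-generator AUC values copied from the results table, in column order:
# Glide, SD1.3, SD1.4, Flux, DALLE2, SD2, SDXL, SD3, GigaGAN, MJv5, MJv6.1, DALLE3, Firefly
rine = [95.6, 99.9, 99.9, 93.0, 93.0, 96.6, 99.3, 39.1, 92.9, 96.4, 81.2, 41.8, 82.9]
spai = [90.2, 99.6, 99.6, 83.0, 91.1, 96.5, 97.4, 75.9, 85.4, 94.5, 84.0, 90.2, 96.0]

def mean(values):
    return sum(values) / len(values)

avg_rine = round(mean(rine), 1)   # reported as 85.5 in the AVG column
avg_spai = round(mean(spai), 1)   # reported as 91.0 in the AVG column
improvement = round(avg_spai - avg_rine, 1)

print("average AUC:", avg_rine, "(RINE)", avg_spai, "(SPAI)")
print("absolute improvement:", improvement)
print("worst generator:", min(rine), "(RINE)", min(spai), "(SPAI)")
```

The worst-case comparison is what separates the two methods: the averages differ by 5.5 points, while the minimum per-generator scores differ by far more.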

Qualitative Analysis

Our Spectral Context Attention mechanism correctly identifies problematic regions in AI-generated images, such as anatomical anomalies (e.g., six-fingered hands) and texture inconsistencies, as shown in the examples below:

Figure: Six-finger case correctly spotted by SPAI
Figure: SPAI attending to texture-rich regions

Robustness Analysis

We evaluated the robustness of SPAI against common online perturbations that images typically undergo when shared on the internet. The results demonstrate that our approach maintains superior performance even when images are subjected to various types of degradation.

Figure: Robustness of SPAI and competing methods under JPEG compression, WebP compression, Gaussian blur, Gaussian noise, and resizing (one panel per perturbation).

The robustness evaluation shows that SPAI consistently outperforms competing methods across various types of perturbations, including JPEG and WebP compression, Gaussian blur, Gaussian noise, and image resizing. This demonstrates the practical applicability of our approach in real-world scenarios where images often undergo multiple transformations.
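To make the perturbations concrete, the sketch below applies three of them (Gaussian noise, Gaussian blur, and resizing) to a grayscale `uint8` image using NumPy only. It is not the evaluation code used for the results above: the strength parameters are hypothetical, nearest-neighbour resizing is a simplification, and JPEG and WebP compression are left out because they need an image codec library.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D image with reflected borders."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    conv = lambda v: np.convolve(v, kernel, mode="valid")
    blurred = np.apply_along_axis(conv, 1, padded)   # blur along rows
    blurred = np.apply_along_axis(conv, 0, blurred)  # blur along columns
    return np.clip(np.rint(blurred), 0, 255).astype(np.uint8)

def resize_nearest(image, scale):
    """Nearest-neighbour resize by a scale factor (a deliberately simple stand-in)."""
    h, w = image.shape[:2]
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]
```

Each function returns a new image of the same dtype, so the perturbations can be chained to imitate an image that was edited several times before reaching the detector.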

Limitations

While SPAI exhibits superior generalization performance and robustness, it still faces challenges with derivative AI-generated images (screenshots, memes, photographs of screens, printed material). Compression algorithms and noisy digital/analog channels can corrupt the spectral information needed for detection. These limitations affect any detector relying solely on image signal properties and highlight potential future directions combining spectral learning with semantic context understanding.

BibTeX

@inproceedings{karageorgiou2025any,
  title={Any-Resolution AI-Generated Image Detection by Spectral Learning},
  author={Karageorgiou, Dimitrios and Papadopoulos, Symeon and Kompatsiaris, Ioannis and Gavves, Efstratios},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}