Any-Resolution AI-Generated Image Detection by Spectral Learning

Abstract

Recent works have established that AI models introduce spectral artifacts into generated images and propose approaches for learning to capture them using labeled data. However, the significant differences in such artifacts among different generative models hinder these approaches from generalizing to generators not seen during training. In this work, we build upon the key idea that the spectral distribution of real images constitutes both an invariant and highly discriminative pattern for AI-generated image detection. To model this under a self-supervised setup, we employ masked spectral learning using the pretext task of frequency reconstruction. Since generated images constitute out-of-distribution samples for this model, we propose spectral reconstruction similarity to capture this divergence. Moreover, we introduce spectral context attention, which enables our approach to efficiently capture subtle spectral inconsistencies in images of any resolution. Our spectral AI-generated image detection approach (SPAI) achieves a 5.5% absolute improvement in AUC over the previous state-of-the-art across 13 recent generative approaches, while exhibiting robustness against common online perturbations.

Key Contributions

Our work introduces several key innovations in AI-generated image detection:

Spectral Distribution as an Invariant Pattern: We establish that the spectral distribution of real images serves as a suitable invariant pattern for distinguishing between real and AI-generated images, as it is not directly affected by the introduction of specific generative models while providing significant discriminative power.
Masked Spectral Learning: We propose a self-supervised approach that employs the pretext task of frequency reconstruction to model the spectral distribution of real images, using only real images for training.
Spectral Reconstruction Similarity (SRS): We introduce a novel method for detecting AI-generated images as out-of-distribution samples by measuring the divergence between the reconstructed frequencies and those actually present in the image.
Spectral Context Attention (SCA): We develop a mechanism that enables efficient processing of images at any resolution without prior pre-processing, allowing our approach to capture subtle spectral inconsistencies in high-resolution images.
State-of-the-Art Performance: SPAI achieves a 5.5% absolute improvement in AUC over previous state-of-the-art across 13 recent generative approaches, while exhibiting robustness against common online perturbations.

Method

Masked Spectral Learning

To build a spectral model of real images, we propose using the pretext task of frequency reconstruction under a self-supervised learning setup, using only real images. We randomly mask the low- or high-frequency component of the input images and train the model under the objective of reconstructing the missing frequencies. This approach allows us to learn the spectral distribution of real images without requiring labeled AI-generated samples.
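The masking step described above can be sketched with a plain 2D FFT. This is a minimal illustration, not the authors' implementation: the radial cutoff `radius`, the grayscale input, and the helper names `radial_frequency_mask`, `split_frequencies`, and `mask_random_band` are all assumptions made for the example.

```python
import numpy as np

def radial_frequency_mask(height, width, radius):
    """Boolean mask, True inside a disc centered on the DC term of a shifted spectrum."""
    ys = np.arange(height) - height // 2
    xs = np.arange(width) - width // 2
    dist = np.sqrt(ys[:, None] ** 2 + xs[None, :] ** 2)
    return dist <= radius

def split_frequencies(image, radius):
    """Split a grayscale image into low- and high-frequency components."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    low_mask = radial_frequency_mask(image.shape[0], image.shape[1], radius)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

def mask_random_band(image, radius, rng):
    """Randomly hide the low or the high band.

    Returns (visible component, hidden component); the hidden component is
    what a reconstruction model would be trained to predict.
    """
    low, high = split_frequencies(image, radius)
    if rng.random() < 0.5:
        return high, low  # low frequencies were masked
    return low, high      # high frequencies were masked

rng = np.random.default_rng(0)
image = rng.random((16, 16))
visible, hidden = mask_random_band(image, radius=4, rng=rng)
```

Because the two masks are complementary, the visible and hidden components sum back to the original image, which is what makes the hidden band a well-defined reconstruction target.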

Spectral Reconstruction Similarity

As our model constitutes a spectral model of real images, it is expected to better reconstruct the missing frequencies of real images compared to AI-generated ones. We introduce Spectral Reconstruction Similarity (SRS) to measure this divergence. SRS calculates the similarity between the features of the original image and those of its low- and high-frequency components, expecting lower similarity (that is, larger feature distances) for AI-generated images than for real ones.
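A rough sketch of the comparison, assuming the features have already been extracted: the feature vectors are stand-ins for the output of a hypothetical encoder, and cosine similarity is an assumed choice of similarity measure, not one confirmed by the text above.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two 1-D feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def spectral_reconstruction_similarity(feat_orig, feat_low, feat_high):
    """Compare features of the original image with those of its frequency components.

    All three inputs are assumed to come from the same (hypothetical) encoder.
    Lower values would indicate a larger divergence from the real-image model.
    """
    return {
        "orig_low": cosine_similarity(feat_orig, feat_low),
        "orig_high": cosine_similarity(feat_orig, feat_high),
    }

# Toy feature vectors, for illustration only.
scores = spectral_reconstruction_similarity([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```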

Spectral Context Attention

Capturing subtle clues in images is crucial for effectively distinguishing between real and AI-generated content. However, most computer vision models cannot efficiently scale to the native resolution of modern photos. Our Spectral Context Attention (SCA) enables processing of high-resolution images without resizing, combining the most discriminative spectral reconstruction similarity values from different patches according to their respective context. This allows our approach to efficiently capture subtle spectral inconsistencies in images of any resolution.
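The idea of combining per-patch values according to their context can be illustrated with generic attention pooling. This is only a sketch under stated assumptions, not the SCA module itself: non-overlapping tiling, the single `query` vector, and the names `extract_patches` and `attention_pool` are hypothetical choices for the example.

```python
import numpy as np

def extract_patches(image, patch):
    """Tile a (H, W) image into non-overlapping patch x patch blocks.

    Any remainder at the right or bottom edge is dropped in this sketch.
    """
    h, w = image.shape
    rows, cols = h // patch, w // patch
    cropped = image[: rows * patch, : cols * patch]
    return (cropped.reshape(rows, patch, cols, patch)
                   .swapaxes(1, 2)
                   .reshape(rows * cols, patch, patch))

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(patch_values, patch_context, query):
    """Weight per-patch values by how well each patch's context matches a query.

    patch_values: (P, D) per-patch similarity values.
    patch_context: (P, C) per-patch context vectors.
    query: (C,) vector, which would be learned in a real model.
    Returns the pooled (D,) vector and the (P,) attention weights.
    """
    scores = patch_context @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ patch_values, weights

image = np.arange(35, dtype=float).reshape(5, 7)
patches = extract_patches(image, 2)  # works for any input size
```

Since the number of patches simply grows with the image size while the pooled output keeps a fixed dimension, a scheme of this kind does not require resizing the input.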

Results

Performance Comparison

We evaluated SPAI against 12 state-of-the-art methods across 13 different generative models, including early generators like Stable Diffusion 1.3/1.4 and recent high-fidelity generators like Stable Diffusion 3, DALLE-3, and Midjourney v6.1. The table reports AUC (%) per generator, with generators grouped by the resolution of their images in megapixels. While competing methods often excel on some generators but fail catastrophically on others, SPAI consistently performs well across all tested generative models.

Approach	< 0.5 MPixels			0.5 - 1.0 MPixels						> 1.0 MPixels				AVG
Approach	Glide	SD1.3	SD1.4	Flux	DALLE2	SD2	SDXL	SD3	GigaGAN	MJv5	MJv6.1	DALLE3	Firefly	AVG
NPR	72.2	89.6	60.5	19.8	3.9	12.5	18.1	60.6	83.2	15.3	19.8	97.1	38.0	45.4
Dire	33.3	59.9	61.3	45.7	52.2	68.5	46.9	49.2	36.3	41.9	50.3	65.2	49.9	50.8
CNNDet.	59.2	59.0	61.2	39.8	71.5	57.5	67.4	30.2	73.4	48.8	56.7	23.5	73.4	55.5
FreqDet.	43.6	92.3	92.7	36.5	47.4	42.5	66.5	69.8	63.2	36.9	27.5	42.2	80.9	57.1
Fusing	63.0	62.8	62.2	57.5	76.7	66.9	62.1	38.8	80.4	64.0	74.0	25.2	76.3	62.3
LGrad	76.5	82.4	83.4	74.9	85.7	60.7	70.2	12.7	89.9	69.2	79.6	30.0	42.0	65.9
UnivFD	63.3	80.8	81.2	36.3	91.4	84.3	78.3	28.6	86.2	57.1	60.5	31.0	95.5	67.3
GramNet	78.2	83.9	84.3	78.6	85.2	66.7	77.8	19.2	85.0	63.8	84.9	42.9	38.0	68.4
DeFake	86.1	64.2	63.6	90.5	41.4	66.2	52.3	87.7	71.7	67.0	87.5	93.3	39.4	70.1
PatchCr.	78.4	95.7	96.2	86.9	81.8	95.7	96.7	33.8	98.0	79.0	96.1	28.1	79.1	80.4
DMID	73.1	100.0	100.0	97.2	54.3	99.7	99.6	67.9	67.9	99.9	94.4	41.3	90.2	83.5
RINE	95.6	99.9	99.9	93.0	93.0	96.6	99.3	39.1	92.9	96.4	81.2	41.8	82.9	85.5
SPAI (Ours)	90.2	99.6	99.6	83.0	91.1	96.5	97.4	75.9	85.4	94.5	84.0	90.2	96.0	91.0

As shown in the table above, SPAI achieves an average AUC of 91.0, a 5.5-point absolute improvement over the previous state-of-the-art (RINE, 85.5). Other methods score higher on individual generators, but every competing method drops below 50 AUC on at least one generator, whereas SPAI's lowest AUC is 75.9 (Stable Diffusion 3), demonstrating its superior generalization capability.
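The averages behind this comparison can be rechecked directly from the SPAI and RINE rows of the table. The values below are copied from those two rows; the variable names are only for illustration.

```python
from statistics import mean

# Per-generator AUC (%) in table column order:
# Glide, SD1.3, SD1.4, Flux, DALLE2, SD2, SDXL, SD3, GigaGAN, MJv5, MJv6.1, DALLE3, Firefly
spai = [90.2, 99.6, 99.6, 83.0, 91.1, 96.5, 97.4, 75.9, 85.4, 94.5, 84.0, 90.2, 96.0]
rine = [95.6, 99.9, 99.9, 93.0, 93.0, 96.6, 99.3, 39.1, 92.9, 96.4, 81.2, 41.8, 82.9]

spai_avg = round(mean(spai), 1)       # 91.0
rine_avg = round(mean(rine), 1)       # 85.5
gain = round(spai_avg - rine_avg, 1)  # 5.5

# Worst-case AUC per method shows how far each one can fall on a single generator.
spai_worst = min(spai)  # 75.9 (SD3)
rine_worst = min(rine)  # 39.1 (SD3)
```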

Qualitative Analysis

Our Spectral Context Attention mechanism correctly identifies problematic regions in AI-generated images, such as anatomical anomalies (e.g., six-fingered hands) and texture inconsistencies.

Figure: Six-finger case correctly spotted by SPAI.

Figure: SPAI attending to texture-rich regions.

Robustness Analysis

We evaluated the robustness of SPAI against common online perturbations that images typically undergo when shared on the internet. The results demonstrate that our approach maintains superior performance even when images are subjected to various types of degradation.

Figure: Robustness plots for JPEG Compression, WebP Compression, Gaussian Blur, Gaussian Noise, and Resize.

The robustness evaluation shows that SPAI consistently outperforms competing methods across various types of perturbations, including JPEG and WebP compression, Gaussian blur, Gaussian noise, and image resizing. This demonstrates the practical applicability of our approach in real-world scenarios where images often undergo multiple transformations.
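Two of the listed perturbations, Gaussian noise and Gaussian blur, can be sketched with NumPy alone. The parameter values, the `[0, 1]` float image range, and the helper names are assumptions for illustration and are not the settings used in the evaluation; JPEG/WebP compression and resizing would need an image library and are left out.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a float image in [0, 1] and clip the result."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-(xs ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a (H, W) image with reflect padding."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    padded = np.pad(image, radius, mode="reflect")
    # Filter along rows first, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

rng = np.random.default_rng(0)
image = rng.random((16, 16))
noisy = add_gaussian_noise(image, sigma=0.05, rng=rng)
blurred = gaussian_blur(image, sigma=1.0)
```

Both operations alter the high-frequency content of an image, which is why perturbations of this kind are a relevant stress test for a detector that relies on spectral cues.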

Examples

Flux - Detection score: 1.0

GigaGAN - Detection score: 1.0

Midjourney v6.1 - Detection score: 0.748

Stable Diffusion 3 - Detection score: 0.87

DALLE-2 - Detection score: 0.99

DALLE-3 - Detection score: 0.90

Firefly - Detection score: 1.0

Glide - Detection score: 1.0

Midjourney v5 - Detection score: 1.0

Stable Diffusion 1.3 - Detection score: 1.0

Stable Diffusion 1.4 - Detection score: 1.0

Stable Diffusion 2 - Detection score: 0.98

Stable Diffusion XL - Detection score: 1.0

COCO (Real) - Detection score: 0.0

FODB (Real) - Detection score: 0.0

ImageNet (Real) - Detection score: 0.0

Open Images (Real) - Detection score: 0.09

RAISE (Real) - Detection score: 0.0

Limitations

While SPAI exhibits superior generalization performance and robustness, it still faces challenges with derivative AI-generated images (screenshots, memes, photographs of screens, printed material). Compression algorithms and noisy digital/analog channels can corrupt the spectral information needed for detection. These limitations affect any detector relying solely on image signal properties and highlight potential future directions combining spectral learning with semantic context understanding.

BibTeX

@inproceedings{karageorgiou2025any,
  title={Any-Resolution AI-Generated Image Detection by Spectral Learning},
  author={Karageorgiou, Dimitrios and Papadopoulos, Symeon and Kompatsiaris, Ioannis and Gavves, Efstratios},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}