
Can AI detectors be fooled?

Updated 2026-05-10 | 5 min read

The honest answer is yes: AI detectors can be fooled, both deliberately and accidentally. Understanding the patterns matters because it changes how much weight a clean ('likely authentic') detection score should carry.

Pattern 1: Recompression

Saving an AI image as a JPEG, screenshotting it, and posting it to a platform that recompresses it again: each step strips away some of the signal that detectors rely on. By the time the image circulates widely, most detectors will struggle. This is unintentional evasion: the platform's compression does the work, not the user.

Pattern 2: Adversarial perturbation

Tools exist that add tiny pixel-level noise to AI images, designed to push them across the decision boundary of common detectors. To a human viewer the image looks unchanged, but it scores as 'likely authentic'. Adversarial perturbation is the cat-and-mouse frontier of the field.
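As a toy illustration of the idea (not a real detector or evasion tool), the sketch below assumes a hypothetical linear detector whose score is a weighted sum of pixel values. Shifting each pixel by a small step against the sign of its weight, in the style of the fast gradient sign method, moves the score across the decision threshold while no pixel changes by more than that step. The weights, bias, and pixel values are invented for illustration.

```python
def detector_score(pixels, weights, bias):
    """Toy linear detector: a positive score means 'likely synthetic'."""
    return sum(p * w for p, w in zip(pixels, weights)) + bias


def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def perturb(pixels, weights, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score."""
    return [
        min(255.0, max(0.0, p - epsilon * sign(w)))  # clamp to the valid pixel range
        for p, w in zip(pixels, weights)
    ]


# Hypothetical detector parameters and an 8-pixel "image" (assumed values).
WEIGHTS = [0.5, -0.25, 0.75, -0.5, 0.25, -0.75, 0.5, -0.25]
BIAS = -20.0
pixels = [120.0, 130.0, 125.0, 128.0, 122.0, 127.0, 124.0, 126.0]

original = detector_score(pixels, WEIGHTS, BIAS)
adversarial_pixels = perturb(pixels, WEIGHTS, epsilon=2.0)
adversarial = detector_score(adversarial_pixels, WEIGHTS, BIAS)

print(original)     # 3.0  (positive: flagged as likely synthetic)
print(adversarial)  # -4.5 (negative: scores as likely authentic)
```

Each pixel moves by 2 out of 255 levels, yet the verdict flips. Real detectors are non-linear and real attacks estimate the gradient rather than reading weights directly, but the mechanism is the same.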

Pattern 3: Humanization

For text: paraphrasing tools rewrite LLM output to defeat detectors trained on the original style. For images: pipelines that re-render an AI image through another model (style transfer, super-resolution) shift the signal patterns detectors look for. For voice: adding ambient noise or compression to a cloned voice makes the markers of cloning harder to detect.

Pattern 4: Provenance attacks

Provenance attacks strip or forge C2PA-style content credentials. A real photo can have its credentials removed, which looks suspicious to a credentials-aware verifier but often still passes a detector. An AI image can have forged credentials added, which defeats provenance-only checks. Detectors and provenance checks need to work together.

What this implies

A 'likely synthetic' result is meaningful evidence, since it is hard to fake in that direction without re-running the generation: the attacks above all push scores toward 'authentic', not toward 'synthetic'. A 'likely authentic' result on an unknown source is much weaker evidence, since multiple plausible attacks could produce it.
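The asymmetry can be made concrete with Bayes' rule. The sketch below is a worked example under assumed numbers, not measurements of any real detector: a 50% prior, a detector that flags 90% of untouched synthetic images but only 10% of ones that have been through one of the attacks above, 60% of circulating synthetic images having been altered that way, and a 5% false-positive rate on real images.

```python
def posterior_synthetic(prior, p_flag_synth, p_flag_real, flagged):
    """Bayes' rule: probability the image is synthetic given the detector result."""
    if flagged:
        like_synth, like_real = p_flag_synth, p_flag_real
    else:
        like_synth, like_real = 1 - p_flag_synth, 1 - p_flag_real
    evidence = prior * like_synth + (1 - prior) * like_real
    return prior * like_synth / evidence


# All numbers below are illustrative assumptions.
PRIOR = 0.5            # belief that the image is synthetic before testing
CLEAN_RECALL = 0.9     # flag rate on untouched synthetic images
EVADED_RECALL = 0.1    # flag rate on recompressed or perturbed synthetic images
EVADED_SHARE = 0.6     # share of synthetic images that have been altered
FALSE_POSITIVE = 0.05  # flag rate on real images

# Flag rate averaged over altered and untouched synthetic images.
effective_recall = (1 - EVADED_SHARE) * CLEAN_RECALL + EVADED_SHARE * EVADED_RECALL

after_flag = posterior_synthetic(PRIOR, effective_recall, FALSE_POSITIVE, flagged=True)
after_clean = posterior_synthetic(PRIOR, effective_recall, FALSE_POSITIVE, flagged=False)

print(f"P(synthetic | flagged)     = {after_flag:.2f}")
print(f"P(synthetic | not flagged) = {after_clean:.2f}")
```

Under these assumptions a flag moves the estimate from 0.50 to about 0.89, while a clean result only lowers it to about 0.38. The exact figures depend entirely on the assumed rates, but the lopsided shape holds whenever evasion is common and false positives are rare.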

The practical workflow remains: detection plus provenance plus source verification. No single layer is enough.
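One way to picture the layered workflow is a small decision function. This is a minimal sketch with hypothetical names and labels (`Evidence`, `assess`, and the verdict strings are assumptions for illustration, not part of any tool); a real pipeline would weigh scores and context, not apply fixed rules.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    detector_label: str    # "likely_synthetic" or "likely_authentic"
    credentials: str       # "valid", "missing", or "invalid"
    source_verified: bool  # True if the original source was independently confirmed


def assess(evidence):
    """Combine the three layers; no single layer can return 'likely authentic' alone."""
    if evidence.detector_label == "likely_synthetic":
        # A synthetic flag is the stronger signal, so it takes priority.
        return "likely synthetic"
    if evidence.credentials == "invalid":
        # Credentials that fail validation suggest tampering or forgery.
        return "suspicious"
    if evidence.credentials == "valid" and evidence.source_verified:
        # Clean detector result, valid credentials, and a confirmed source.
        return "likely authentic"
    # A clean detector score without the other layers stays unresolved.
    return "unverified"


print(assess(Evidence("likely_authentic", "missing", False)))  # unverified
print(assess(Evidence("likely_authentic", "valid", True)))     # likely authentic
```

The design choice worth noting is the default: a clean detector score with missing credentials or an unconfirmed source falls through to "unverified" instead of being treated as authentic.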
