Glossary

AI detection glossary.

Plain-language definitions for the terms that come up across AI detection, synthetic media, and content provenance — with links into the relevant tools and longer reading where useful.

Deepfake

Short. Media in which a person's face, voice, or full body is synthetically generated or transferred without their performance.

The umbrella term for face-swap video, voice-clone audio, and full-body synthesis where the depicted person did not perform the action shown. The 'deep' refers to the deep neural networks that made the technique accessible starting in the late 2010s.

Diffusion model

Short. A type of generative AI that creates images by reversing a noise process, used by Stable Diffusion, Midjourney, and DALL·E.

Diffusion models learn by destroying training images with noise, then learning to reverse the process. At inference they start from random noise and 'denoise' toward an image matching a text prompt. The architecture leaves characteristic frequency-domain residue that detectors can pick up.

GAN (Generative Adversarial Network)

Short. An older generative architecture that pits a generator against a discriminator. Dominant before diffusion models took over for images.

Two networks train against each other: the generator produces synthetic outputs, the discriminator tries to tell them from real samples. GANs powered StyleGAN-era 'this person does not exist' portraits. Now largely superseded by diffusion for images, still used in some voice and video pipelines.

Voice clone

Short. Synthesized speech generated to sound like a specific real person, usually trained on a few minutes of reference audio.

Modern voice-clone systems can produce convincing impersonations from very short audio samples. Detection focuses on prosody flatness, breath-gap absence, and frequency-band quirks. Used in CEO-fraud calls, fake panic calls to relatives, and unauthorized celebrity voice content.

Synthetic media

Short. Any image, video, audio, or text generated or substantially altered by an AI model.

The neutral umbrella term covering AI-generated images, deepfakes, voice clones, AI text, and AI music. Useful when a discussion needs to span content types without committing to a specific technique.

C2PA

Short. Coalition for Content Provenance and Authenticity — an open technical standard for cryptographically signing the origin and edit history of media.

C2PA produces 'content credentials' attached to media: who captured or generated it, what edits were applied, and a cryptographic signature. Adoption is uneven but growing in cameras, image editors, and major model providers. Provenance complements detection — proving authenticity directly rather than inferring it from signals.

Content provenance

Short. Documentation of where a piece of media came from and how it was modified along the way.

Provenance asks 'where did this come from?' Detection asks 'does this look generated?' A complete trust workflow uses both: provenance gives positive evidence of origin; detection gives negative evidence about manipulation.

Watermarking

Short. Embedding an imperceptible marker in generated media so the model's output can be identified later.

Some model providers embed statistical watermarks in images, audio, or text they generate. In principle this lets verifiers detect that content came from a specific model. In practice, watermarks can often be stripped by re-saving, paraphrasing, or re-encoding.

Face-swap

Short. A specific type of deepfake where one person's face replaces another's in existing video footage.

The most common deepfake variant. Tools range from research-grade pipelines to consumer apps. Detection focuses on the boundary between the swapped face and the surrounding scene — lighting mismatch, edge flickering, and ear/jaw shape inconsistency.

How to detect deepfake videos →

Prompt injection

Short. An attack on language-model systems where instructions hidden in input data hijack model behavior.

Outside the scope of media detection but increasingly relevant where detectors themselves use language models. A document or web page can include text designed to override the system prompt of a downstream LLM. Mitigation is mostly architectural — separating untrusted input from instructions.

Hallucination

Short. Confident output from a generative AI that has no basis in real data — a fabricated quote, a made-up source, a non-existent paper.

A reliability problem rather than a detection problem. Hallucinations are why AI-written text needs source verification regardless of how authentic the writing style appears. Detection cannot catch a hallucination; only fact-checking can.

Generative AI

Short. AI systems that produce new content — images, video, audio, or text — rather than classifying or analyzing existing content.

Generative AI includes diffusion models for images, transformers for text, audio models for music and speech, and combinations of all of these. The detection problem exists because generative AI is now good enough that humans cannot reliably tell its output from real content.

Perceptual hash

Short. A short fingerprint of an image, computed so that visually similar images produce similar fingerprints.

Used in reverse image search and de-duplication systems. Useful for detecting reused photos but does not identify AI generation directly. A new AI image will not match any perceptual hash unless it has been published before.

EXIF metadata

Short. Information embedded in a digital photo describing the camera, settings, and time of capture.

Real camera files carry EXIF data; many AI-generated images do not, or carry obviously synthetic values. Social platforms strip EXIF on upload, so missing EXIF on an Instagram photo is not informative. Present-and-coherent EXIF is mildly authenticating; absent EXIF is neutral.

Latent diffusion

Short. A diffusion model that operates in a compressed latent space rather than directly on pixels — the architecture behind Stable Diffusion.

By compressing images to a smaller latent representation before applying diffusion, latent models run faster and at higher resolution than pixel-space diffusion. The compression also leaves characteristic patterns in the output that detectors can target.

Adversarial example

Short. Synthetic content specifically tweaked to evade a detector while remaining convincing to a human.

Adversarial tuning is the moving frontier of the cat-and-mouse between generators and detectors. Each new detector eventually faces inputs designed to defeat it. The practical implication: a 'likely synthetic' result is more meaningful than a 'likely authentic' result on an unknown source.

Why AI detectors are not 100% accurate →