How accurate are AI text detectors?
AI text detection has the widest gap between claimed and real-world accuracy of any detection category. The technology is real, but the marketing claims are not. This post is the honest version, written for educators, editors, and HR teams who actually have to make decisions.
Two structural problems
Problem one: human writers in formal registers (academic English, legal writing, business prose) share many surface patterns with LLM output: regular sentence rhythm, generic transitions, hedged claims. Detectors keyed on those patterns flag legitimate writing as AI-generated, as the sketch below illustrates.
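To make that concrete, here is a minimal sketch of the kind of surface features a stylometric detector might compute. Everything in it is an assumption for illustration (the GENERIC_TRANSITIONS set, the surface_features function, the choice of features); it is not any vendor's actual model.

```python
import re
import statistics

# Assumed, illustrative list of "generic transition" markers.
GENERIC_TRANSITIONS = {
    "moreover", "however", "furthermore", "additionally",
    "in conclusion", "overall",
}

def surface_features(text: str) -> dict:
    # Crude sentence split; real systems use proper tokenizers.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    lowered = text.lower()
    transition_hits = sum(lowered.count(t) for t in GENERIC_TRANSITIONS)
    word_count = max(len(lowered.split()), 1)
    return {
        "mean_sentence_len": round(statistics.mean(lengths), 1),
        # Low spread = very regular rhythm, a pattern shared by LLM
        # output AND polished formal human prose.
        "sentence_len_spread": (round(statistics.stdev(lengths), 1)
                                if len(lengths) > 1 else 0.0),
        "transitions_per_100_words": round(100 * transition_hits / word_count, 1),
    }

formal_human = (
    "Moreover, the results indicate a clear trend. However, further "
    "analysis is required. Additionally, the data support the hypothesis. "
    "Overall, the findings are consistent with prior work."
)
print(surface_features(formal_human))
```

Notice that a careful academic writer scores "regular" on exactly these features, which is why false positives concentrate in formal registers.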
Problem two: LLM text is easy to humanize. Light editing, paraphrasing tools, or simply running the output through a second pass with different instructions can defeat detectors trained on raw, unedited model output, as the before-and-after toy below shows.
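The flip side is just as mechanical: those same features are trivially easy to move. A self-contained toy with illustrative text and a single assumed metric (sentence-length spread):

```python
# The same passage before and after a light human pass. The metric and
# the sample text are illustrative only.
import re
import statistics

def length_spread(text: str) -> float:
    # Spread of sentence lengths; low spread reads as "regular rhythm".
    lengths = [len(s.split()) for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return statistics.stdev(lengths)

raw = ("Moreover, the results indicate a clear trend. However, further "
       "analysis is required. Additionally, the data support the hypothesis.")
edited = ("The results point one way. But is it a real trend? Only after "
          "digging further into the data does the hypothesis mostly hold up.")

print(f"unedited LLM-style text: spread {length_spread(raw):.1f}")    # ~1.0
print(f"lightly edited text:     spread {length_spread(edited):.1f}") # ~4.4
```

A few minutes of editing moves the text well away from the profile a detector trained on raw model output expects.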
Reported vs realistic accuracy
Vendor benchmarks frequently report 90%+ accuracy on long, unedited LLM samples. Independent evaluations on classroom-realistic essays (short, edited, mixed-author) typically find accuracy below 80%, with non-trivial false-positive rates on real student work.
The OpenAI text classifier was retired in 2023 partly because of this gap. The problem has not become easier since.
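To see why the false-positive rate matters more than the headline number, run the base rates. All figures below are assumptions for illustration, not measurements of any particular product:

```python
# Back-of-the-envelope base-rate math with assumed, illustrative numbers:
# a detector that catches 80% of AI text (true positive rate) and wrongly
# flags 5% of human text (false positive rate), applied to a stack of 200
# essays of which 10% are actually AI-written.
essays = 200
ai_share = 0.10          # assumed prevalence
tpr, fpr = 0.80, 0.05    # assumed detector rates

ai_essays = essays * ai_share          # 20
human_essays = essays - ai_essays      # 180

true_flags = ai_essays * tpr           # 16 AI essays correctly flagged
false_flags = human_essays * fpr       # 9 human essays wrongly flagged

precision = true_flags / (true_flags + false_flags)
print(f"flagged essays: {true_flags + false_flags:.0f}")
print(f"wrongly flagged students: {false_flags:.0f}")
print(f"chance a flagged essay is actually AI: {precision:.0%}")  # 64%
```

Under these assumptions, roughly one flag in three lands on a student who wrote their own work. That arithmetic is the reason for the next section.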
What to do with the score
Use it as a triage signal that prompts a conversation, not as evidence. A 'likely AI' score should lead to a discussion with the writer about their drafting process, not directly to a grade or HR decision. A 'likely human' score does not prove anything either.
If you're an educator, see the for-educators page for the workflow we recommend.
When the score is more reliable
The score is most reliable on long samples, unedited output, casual register, and tasks where LLM patterns are most visible: formulaic structures, list-heavy writing, generic openings. Short samples and creative writing are the worst cases for detection accuracy; the toy simulation below shows why length matters.
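Why does length matter so much? Every feature a detector measures is a statistical estimate, and estimates from a handful of sentences are noisy. A toy simulation, with an entirely assumed writer profile, shows how slowly that noise shrinks:

```python
# Estimate one feature (mean sentence length) from n sentences drawn from
# the same simulated writer, and watch how unstable the estimate is when
# n is small. The writer profile (mean 18 words, sd 6) is assumed.
import random
import statistics

random.seed(42)

def sentence_length() -> float:
    return random.gauss(mu=18, sigma=6)

for n in (3, 10, 50, 200):
    # Spread of the estimate across 1000 independent samples of n sentences.
    estimates = [statistics.mean(sentence_length() for _ in range(n))
                 for _ in range(1000)]
    print(f"n={n:>3} sentences: estimates range roughly "
          f"{min(estimates):.1f} to {max(estimates):.1f} words")
```

At three sentences (a short paragraph), the estimate can miss by several words in either direction, and any classification built on it inherits that noise.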
Try the tool
AI Essay Detector
Run a sample to see the specific patterns the engine is reacting to — useful as a discussion starter, not a verdict.