How accurate are AI text detectors?
AI text detection has the widest gap between claimed and real-world accuracy of any detection category. The technology is real, but the marketing claims are not. This post is the honest version, written for educators, editors, and HR teams who actually have to make decisions.
Two structural problems
Problem one: human writers in formal registers (academic English, legal writing, business prose) share many surface patterns with LLM output: regular sentence rhythm, generic transitions, hedged claims. Detectors keyed on those patterns flag legitimate writing as AI-generated, as the sketch below illustrates.
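To make that concrete, here is a minimal sketch of the kind of surface features a stylometric detector might compute. Everything in it is an assumption for illustration (the GENERIC_TRANSITIONS set, the surface_features function, the choice of features); it is not any vendor's actual model.

```python
import re
import statistics

# Assumed, illustrative list of "generic transition" markers.
GENERIC_TRANSITIONS = {
    "moreover", "however", "furthermore", "additionally",
    "in conclusion", "overall",
}

def surface_features(text: str) -> dict:
    # Crude sentence split; real systems use proper tokenizers.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    lowered = text.lower()
    transition_hits = sum(lowered.count(t) for t in GENERIC_TRANSITIONS)
    word_count = max(len(lowered.split()), 1)
    return {
        "mean_sentence_len": round(statistics.mean(lengths), 1),
        # Low spread = very regular rhythm, a pattern shared by LLM
        # output AND polished formal human prose.
        "sentence_len_spread": (round(statistics.stdev(lengths), 1)
                                if len(lengths) > 1 else 0.0),
        "transitions_per_100_words": round(100 * transition_hits / word_count, 1),
    }

formal_human = (
    "Moreover, the results indicate a clear trend. However, further "
    "analysis is required. Additionally, the data support the hypothesis. "
    "Overall, the findings are consistent with prior work."
)
print(surface_features(formal_human))
```

Notice that a careful academic writer scores "regular" on exactly these features, which is why false positives concentrate in formal registers.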
Problem two: LLM text is easy to humanize. Light editing, paraphrasing tools, or simply running the output through a second pass with different instructions can defeat detectors trained on raw, unedited model output, as the before-and-after toy below shows.
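The flip side is just as mechanical: those same features are trivially easy to move. A self-contained toy with illustrative text and a single assumed metric (sentence-length spread):

```python
# The same passage before and after a light human pass. The metric and
# the sample text are illustrative only.
import re
import statistics

def length_spread(text: str) -> float:
    # Spread of sentence lengths; low spread reads as "regular rhythm".
    lengths = [len(s.split()) for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return statistics.stdev(lengths)

raw = ("Moreover, the results indicate a clear trend. However, further "
       "analysis is required. Additionally, the data support the hypothesis.")
edited = ("The results point one way. But is it a real trend? Only after "
          "digging further into the data does the hypothesis mostly hold up.")

print(f"unedited LLM-style text: spread {length_spread(raw):.1f}")    # ~1.0
print(f"lightly edited text:     spread {length_spread(edited):.1f}") # ~4.4
```

A few minutes of editing moves the text well away from the profile a detector trained on raw model output expects.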
Reported vs realistic accuracy
Vendor benchmarks frequently report 90%+ accuracy on long, unedited LLM samples. Independent evaluations on classroom-realistic essays (short, edited, mixed-author) typically find accuracy below 80%, with non-trivial false-positive rates on real student work.
The OpenAI text classifier was retired in 2023 partly because of this gap. The problem has not become easier since.
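To see why the false-positive rate matters more than the headline number, run the base rates. All figures below are assumptions for illustration, not measurements of any particular product:

```python
# Back-of-the-envelope base-rate math with assumed, illustrative numbers:
# a detector that catches 80% of AI text (true positive rate) and wrongly
# flags 5% of human text (false positive rate), applied to a stack of 200
# essays of which 10% are actually AI-written.
essays = 200
ai_share = 0.10          # assumed prevalence
tpr, fpr = 0.80, 0.05    # assumed detector rates

ai_essays = essays * ai_share          # 20
human_essays = essays - ai_essays      # 180

true_flags = ai_essays * tpr           # 16 AI essays correctly flagged
false_flags = human_essays * fpr       # 9 human essays wrongly flagged

precision = true_flags / (true_flags + false_flags)
print(f"flagged essays: {true_flags + false_flags:.0f}")
print(f"wrongly flagged students: {false_flags:.0f}")
print(f"chance a flagged essay is actually AI: {precision:.0%}")  # 64%
```

Under these assumptions, roughly one flag in three lands on a student who wrote their own work. That arithmetic is the reason for the next section.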
What to do with the score
Use it as a triage signal that prompts a conversation, not as evidence. A 'likely AI' score should lead to a discussion with the writer about their drafting process, not directly to a grade or HR decision. A 'likely human' score does not prove anything either.
If you're an educator, see the for-educators page for the workflow we recommend.
When the score is more reliable
The score is most reliable on long samples, unedited output, casual register, and tasks where LLM patterns are most visible: formulaic structures, list-heavy writing, generic openings. Short samples and creative writing are the worst cases for detection accuracy; the toy simulation below shows why length matters.
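Why does length matter so much? Every feature a detector measures is a statistical estimate, and estimates from a handful of sentences are noisy. A toy simulation, with an entirely assumed writer profile, shows how slowly that noise shrinks:

```python
# Estimate one feature (mean sentence length) from n sentences drawn from
# the same simulated writer, and watch how unstable the estimate is when
# n is small. The writer profile (mean 18 words, sd 6) is assumed.
import random
import statistics

random.seed(42)

def sentence_length() -> float:
    return random.gauss(mu=18, sigma=6)

for n in (3, 10, 50, 200):
    # Spread of the estimate across 1000 independent samples of n sentences.
    estimates = [statistics.mean(sentence_length() for _ in range(n))
                 for _ in range(1000)]
    print(f"n={n:>3} sentences: estimates range roughly "
          f"{min(estimates):.1f} to {max(estimates):.1f} words")
```

At three sentences (a short paragraph), the estimate can miss by several words in either direction, and any classification built on it inherits that noise.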
Try the tool
AI Essay Detector
Run a sample to see the specific patterns the engine is reacting to — useful as a discussion starter, not a verdict.