Papers
arxiv:2507.02778

Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Published on Jul 3, 2025
· Submitted by
Ken Tsui
on Jul 4, 2025
Authors:

Abstract

LLMs exhibit a 'Self-Correction Blind Spot' where they fail to correct errors in their own outputs, which can be mitigated by appending "Wait" and is related to the composition of training data.

Although large language models (LLMs) have become transformative, they still make mistakes and can explore unproductive reasoning paths. Self-correction is an important capability for a trustworthy LLM, particularly an autoregressive LLM. While LLMs can identify error in user input, they exhibit a systematic 'Self-Correction Blind Spot' - failing to correct identical error in their own outputs. To systematically study this phenomenon, we introduce Self-Correction Bench, a systematic framework to measure this phenomenon through controlled error injection at three complexity levels. Testing 14 models, we find an average 64.5% blind spot rate. We find multiple evidences that this limitation relates to training data composition: human training demonstrations predominantly show error-free responses rather than error-correction sequences, unlike RL-trained models that learn error correction through outcome feedback. Remarkably, simply appending "Wait" reduces blind spots by 89.3%, suggesting that the capability exists but requires activation. Our work highlights a critical limitation in current LLMs and offers potential avenues for improving their reliability and trustworthiness.

Community

Wonderful work @kenhktsui !

·
Paper author

Thanks @huu-ontocord !

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2507.02778
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.02778 in a model README.md to link it from this page.

Datasets citing this paper 4

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.02778 in a Space README.md to link it from this page.

Collections including this paper 2