Entropy Is Not Enough: Uncertainty Quantification for LLMs fails under Aleatoric Uncertainty

Tim Tomov, Dominik Fuchsgruber, Tom Wollschläger, Stephan Günnemann

Accurate uncertainty quantification (UQ) in Large Language Models (LLMs) is critical for trustworthy deployment. While real-world language is inherently ambiguous, existing UQ methods implicitly assume scenarios with no ambiguity. Therefore, a natural question is how they work under ambiguity. In this work, we demonstrate that current uncertainty estimators only perform well under the restrictive assumption of no aleatoric uncertainty and degrade significantly on ambiguous data. Specifically, we provide theoretical insights into this limitation and introduce two question-answering (QA) datasets with ground-truth answer probabilities. Using these datasets, we show that current uncertainty estimators perform close to random under real-world ambiguity. This highlights a fundamental limitation in existing practices and emphasizes the urgent need for new uncertainty quantification approaches that account for the ambiguity in language modeling.
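To illustrate the core problem the abstract raises, the sketch below (an assumption-laden illustration, not the paper's method) computes predictive entropy over repeatedly sampled answers, a common LLM uncertainty estimator. The answer samples are hypothetical: one question is genuinely ambiguous (its ground-truth answer distribution is split, i.e. aleatoric uncertainty), the other is unambiguous but the model is simply unsure (epistemic uncertainty). Entropy assigns both the same score, so it cannot separate the two cases.

```python
import math
from collections import Counter

def predictive_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution
    obtained by sampling the model repeatedly on one question."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical samples. Case 1: an ambiguous question whose ground-truth
# answer distribution is genuinely 50/50 (aleatoric uncertainty).
ambiguous = ["A"] * 50 + ["B"] * 50

# Case 2: an unambiguous question with a single correct answer, where the
# model happens to waver between two options (epistemic uncertainty).
unsure = ["A"] * 50 + ["B"] * 50

# Both cases yield the same entropy (log 2 ≈ 0.693 nats), so the estimator
# cannot tell "the data is ambiguous" apart from "the model doesn't know".
print(predictive_entropy(ambiguous), predictive_entropy(unsure))
```

This is exactly the failure mode the paper's ground-truth answer-probability datasets are designed to expose: an estimator that only reports total predictive uncertainty performs close to random once aleatoric uncertainty is present.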
