noisebench

Description

NoiseBench is a named entity recognition (NER) benchmark in which clean training data is corrupted with six types of real noise, including expert errors, crowdsourcing errors, automatic annotation errors, and LLM errors. It enables evaluation of noise-robust learning methods and shows that real noise is significantly more challenging than simulated noise, and that current state-of-the-art approaches fall far short of their theoretical upper bound.
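
To make the idea of a clean label set paired with a corrupted one concrete, here is a minimal Python sketch that computes the share of tokens whose noisy tag differs from the clean tag. The function name `token_noise_rate` and the toy tag sequences are hypothetical illustrations, not part of the benchmark's data or tooling, and the benchmark's authors may quantify noise differently.

```python
from typing import List


def token_noise_rate(clean: List[List[str]], noisy: List[List[str]]) -> float:
    """Return the fraction of tokens whose noisy tag differs from the clean tag.

    Hypothetical helper for illustration only; not taken from the benchmark.
    """
    total = 0
    mismatched = 0
    for clean_tags, noisy_tags in zip(clean, noisy):
        if len(clean_tags) != len(noisy_tags):
            raise ValueError("each sentence needs the same number of tags in both sets")
        total += len(clean_tags)
        mismatched += sum(c != n for c, n in zip(clean_tags, noisy_tags))
    return mismatched / total if total else 0.0


# Toy data (assumed, not from the benchmark): two sentences with BIO tags.
clean_labels = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "O"]]
noisy_labels = [["B-PER", "O", "O", "B-LOC"], ["O", "B-LOC", "O"]]

rate = token_noise_rate(clean_labels, noisy_labels)
print(f"token-level noise rate: {rate:.3f}")
```

A token-level rate is only one possible view; an entity-level comparison would count a partially mislabeled span as a single error.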

