commonlid

Description

CommonLID (Community-driven Language Identification benchmark) is a human-annotated benchmark for language identification (LID) in the web domain. It covers 109 languages, many of them previously under-served, and is designed to evaluate LID models on noisy, heterogeneous web data. The code and annotations are released under an open, permissive license.
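
As a rough illustration of what evaluating an LID model against human annotations can look like, the sketch below scores predicted language labels against gold labels. The metric (macro-averaged F1), the function name `macro_f1`, and the toy labels are all assumptions made for illustration; this description does not state which metric or data format CommonLID itself uses.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over the languages that occur in the gold labels.

    Assumption: macro-F1 is used here only as a common choice for
    imbalanced multi-class tasks, not as the benchmark's official metric.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # the gold language was missed
            fp[p] += 1  # the predicted language was wrongly assigned

    scores = []
    for lang in sorted(set(gold)):
        denom = 2 * tp[lang] + fp[lang] + fn[lang]
        scores.append(2 * tp[lang] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy labels (language codes), not taken from the benchmark.
gold = ["en", "en", "fr", "sw"]
pred = ["en", "fr", "fr", "en"]
score = macro_f1(gold, pred)
print(f"macro-F1: {score:.3f}")
```

Macro-averaging gives every language equal weight regardless of how many samples it has, which matters when low-resource languages make up a small share of the data.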

