RoleEval

Description

RoleEval is a bilingual benchmark for assessing how well large language models memorize, use, and reason over role knowledge. It comprises two subsets, RoleEval-Global and RoleEval-Chinese, which together contain 6,000 Chinese-English parallel multiple-choice questions about 300 influential real and fictional characters from diverse domains. The questions cover both basic facts and multi-hop reasoning, and they are vetted through automatic and human quality checks.
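As a rough illustration of what a Chinese-English parallel multiple-choice item and its scoring could look like, here is a minimal Python sketch. The `ParallelItem` class, its field names, and the `accuracy` helper are hypothetical and are not the benchmark's actual schema or evaluation code. The sample question is made up for illustration and is not taken from RoleEval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelItem:
    """Hypothetical record for one parallel question (not the official schema)."""
    character: str
    question: dict[str, str]        # language code -> question text
    options: dict[str, list[str]]   # language code -> answer options, same order
    answer: int                     # index of the correct option in both languages


def accuracy(items: list[ParallelItem], predictions: list[int]) -> float:
    """Fraction of items whose predicted option index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return correct / len(items)


# Made-up example item, for illustration only.
ITEM = ParallelItem(
    character="Sherlock Holmes",
    question={
        "en": "Who created the character Sherlock Holmes?",
        "zh": "夏洛克·福尔摩斯这一角色的创作者是谁？",
    },
    options={
        "en": ["Agatha Christie", "Arthur Conan Doyle", "Edgar Allan Poe", "Charles Dickens"],
        "zh": ["阿加莎·克里斯蒂", "阿瑟·柯南·道尔", "埃德加·爱伦·坡", "查尔斯·狄更斯"],
    },
    answer=1,
)

if __name__ == "__main__":
    # One correct and one incorrect prediction over the same item.
    print(accuracy([ITEM, ITEM], [1, 0]))
```

Because the answer index is shared across languages in this sketch, the same predictions list can be scored against either the Chinese or the English rendering of a question, which is one simple way to compare a model's performance across the two languages.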


arXiv/roleeval | OpenReward