roleeval
Description
RoleEval is a bilingual benchmark that assesses how well large language models memorize, utilize, and reason over role knowledge. It comprises two subsets, RoleEval-Global and RoleEval-Chinese, which together contain 6,000 Chinese-English parallel multiple-choice questions about 300 influential real and fictional characters across diverse domains. The questions cover both basic facts and multi-hop reasoning, and they are vetted through automatic and human quality checks.
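Because the questions are multiple-choice and span two kinds (basic facts and multi-hop reasoning), a natural way to score a model is accuracy per question kind. The following is a minimal sketch under stated assumptions, not RoleEval's official evaluation code: the record fields `id`, `answer`, and `type`, the type labels, and the function name `accuracy_by_type` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_type(records, predictions):
    """Compute accuracy per question type.

    records: list of dicts with hypothetical keys 'id', 'answer', 'type'.
    predictions: dict mapping question id to the predicted option letter.
    A missing prediction counts as incorrect.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        qtype = rec["type"]
        total[qtype] += 1
        if predictions.get(rec["id"]) == rec["answer"]:
            correct[qtype] += 1
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Toy records in the assumed schema (not real RoleEval data).
sample = [
    {"id": "q1", "answer": "A", "type": "basic"},
    {"id": "q2", "answer": "C", "type": "basic"},
    {"id": "q3", "answer": "B", "type": "multi_hop"},
]
preds = {"q1": "A", "q2": "D", "q3": "B"}

scores = accuracy_by_type(sample, preds)
print(scores)
```

Reporting the two kinds separately keeps a model's recall of basic facts from masking weaker multi-hop reasoning; the same function could be run once per language to compare the Chinese and English versions of each question.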