roleeval

Name: arXiv/roleeval
Author: arXiv


Description

RoleEval is a bilingual benchmark that assesses how well large language models memorize, use, and reason over role knowledge. It comprises two subsets, RoleEval-Global and RoleEval-Chinese, totaling 6,000 Chinese-English parallel multiple-choice questions about 300 influential real and fictional characters from diverse domains. The questions test both basic facts and multi-hop reasoning, and they are vetted through automatic and human quality checks.
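To make the question format concrete, here is a minimal sketch of how one Chinese-English parallel multiple-choice item could be represented and scored by accuracy. The class name, field names, language codes, and the sample item are all hypothetical illustrations; they are not taken from the RoleEval release, and the actual data schema and evaluation protocol may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelItem:
    """Hypothetical container for one parallel multiple-choice question."""
    character: str
    question: dict[str, str]       # language code -> question text
    options: dict[str, list[str]]  # language code -> answer options, same order in both
    answer_index: int              # index of the correct option, shared across languages


def accuracy(items, predictions):
    """Return the fraction of items whose predicted option index is correct."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)


# Illustrative item only; NOT drawn from the benchmark.
SAMPLE = ParallelItem(
    character="Sherlock Holmes",
    question={
        "en": "Which author created Sherlock Holmes?",
        "zh": "哪位作家创造了夏洛克·福尔摩斯？",
    },
    options={
        "en": ["Agatha Christie", "Arthur Conan Doyle",
               "Edgar Allan Poe", "Charles Dickens"],
        "zh": ["阿加莎·克里斯蒂", "阿瑟·柯南·道尔",
               "埃德加·爱伦·坡", "查尔斯·狄更斯"],
    },
    answer_index=1,
)
```

Because the two language versions share one answer index in this sketch, the same list of predicted indices can be scored against either the Chinese or the English prompts, for example `accuracy([SAMPLE], [1])`.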


