scholarsearch

Description

ScholarSearch is a benchmark dataset for evaluating how well large language models handle complex academic information retrieval. It emphasizes four properties: academic practicality; high difficulty, with answers often requiring multiple deep searches; concise evaluation, with clear sources and brief explanations; and broad coverage across at least 15 disciplines. It is designed to mirror real academic research tasks, measuring LLM performance in deep literature tracing and organization, professional database support, long-tail knowledge navigation, and academic rigor.

Implementations (1)
Environment                      Stars   Last Updated
GeneralReasoning/ScholarSearch   0       1 month ago