climaqa

Description

ClimaQA-Gold is an expert-annotated benchmark dataset for evaluating the quality and scientific validity of LLM outputs on climate science question answering. Its QA pairs are derived from graduate textbooks and generated by the ClimaGen adaptive framework with climate scientists in the loop. It is complemented by ClimaQA-Silver, a large-scale synthetic QA dataset.

