LongFact
Description
LongFact is an environment for evaluating long-form factual accuracy. Based on Google DeepMind's LongFact benchmark, it presents agents with open-ended questions that require detailed factual responses. Evaluation uses the SAFE (Search-Augmented Factuality Evaluator) pipeline: responses are decomposed into atomic facts, each fact is verified via web search, and the response is scored by factual precision.
Capabilities
- Long-form factual question answering
- Generating detailed and accurate responses across 38 subject areas
Compute Requirements
Agents in LongFact are given a standard environment with no sandbox or file system access.
License
MIT.
Tasks
One split: test (2,280 tasks) spanning 38 subject areas.
Reward Structure
Single-turn evaluation. Agent submits a long-form response via submit_answer. The response is decomposed into atomic facts using gpt-5-mini, then each fact is verified via web search. Reward is factual precision: supported_facts / relevant_facts, ranging from 0.0 to 1.0. Irrelevant facts are excluded from the denominator.
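The reward computation above can be sketched as a small function. This is an illustrative reimplementation, not the environment's actual code; the verdict label strings are assumptions.

```python
def factual_precision(verdicts):
    """Compute SAFE-style factual precision from per-fact verdicts.

    verdicts: list of strings, each one of "supported",
    "not_supported", or "irrelevant". Irrelevant facts are
    excluded from the denominator, matching the reward above.
    """
    relevant = [v for v in verdicts if v != "irrelevant"]
    if not relevant:
        return 0.0
    supported = sum(1 for v in relevant if v == "supported")
    return supported / len(relevant)

# Example: 3 supported, 1 not supported, 1 irrelevant -> 3/4 = 0.75
print(factual_precision(
    ["supported", "supported", "not_supported", "irrelevant", "supported"]
))
```

A response with no relevant facts scores 0.0 rather than raising a division error, which keeps the reward bounded in [0.0, 1.0] as stated.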
Data
longfact_data.parquet sourced from HuggingFace claserken/longfact. Stored on the OpenReward platform.
Tools
Single tool: submit_answer — submit a long-form factual response for SAFE evaluation.
Time Horizon
Single-turn.
Environment Difficulty
The original paper evaluates 13 models across four model families (Gemini, GPT, Claude, PaLM-2) using F1@K metrics. Top performers were GPT-4-Turbo, Gemini-Ultra, and PaLM-2-L-IT-RLHF. Larger models consistently achieve higher factual precision than smaller variants within the same family.
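The F1@K metric from the original paper combines factual precision with a recall term capped at K supported facts. A minimal sketch, following the paper's definition (precision over checked facts, recall min(S/K, 1)):

```python
def f1_at_k(supported, not_supported, k):
    """F1@K as defined in the LongFact/SAFE paper (sketch).

    supported: number of facts verified as supported (S).
    not_supported: number of facts verified as not supported.
    k: the target number of supported facts rewarded for recall.
    """
    if supported == 0:
        return 0.0
    precision = supported / (supported + not_supported)
    recall_k = min(supported / k, 1.0)
    return 2 * precision * recall_k / (precision + recall_k)

# 50 supported out of 100 checked facts, K = 100:
# precision = 0.5, recall = 0.5, so F1@100 = 0.5
print(f1_at_k(50, 50, 100))
```

Note that this environment's reward is precision only; F1@K is what the original paper reports when comparing model families.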
Other Environment Requirements
OpenAI API key required for fact decomposition and web-search-based verification. Pass via secrets={"openai_api_key": "..."}.
Safety
Agents in LongFact generate factual responses in a standard environment. The environment does not present direct safety risks.
Citation
@inproceedings{wei2024longfact,
  title={Long-form factuality in large language models},
  author={Wei, Jerry and Yang, Chengrun and Song, Xinying and Lu, Yifeng and Hu, Nathan and Huang, Jie and Tran, Dustin and Peng, Daiyi and Liu, Ruibo and Huang, Da and Du, Cosmo and Le, Quoc V.},
  booktitle={NeurIPS},
  year={2024}
}