codereval

Name: arXiv/codereval
Author: arXiv


Description

CoderEval is a benchmark for evaluating code generation models. It consists of 230 Python and 230 Java tasks curated from real-world open-source projects, together with a self-contained execution platform that automatically assesses the functional correctness of generated code. The tasks cover six levels of context dependency, where "context" means code elements defined outside the target function, such as types, APIs, variables, and constants. This lets the benchmark measure how well models generate non-standalone functions, not only standalone ones.
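To make the idea of context-dependency levels concrete, here is a minimal sketch of how a function might be assigned a level from the kinds of external code it depends on. The six level names and their ordering (self-contained, slib-runnable, plib-runnable, class-runnable, file-runnable, project-runnable) reflect our reading of the CoderEval paper and should be checked against it. The rule that a function takes the level of its widest-scope dependency is an assumption, and the dependency-kind strings and the `classify` helper are hypothetical. None of this is CoderEval's actual tooling.

```python
from enum import IntEnum


class ContextLevel(IntEnum):
    # Ordered from narrowest to widest scope of required context
    # (ordering assumed from the CoderEval paper's taxonomy).
    SELF_CONTAINED = 0    # uses only built-ins
    SLIB_RUNNABLE = 1     # also uses the standard library
    PLIB_RUNNABLE = 2     # also uses public third-party libraries
    CLASS_RUNNABLE = 3    # also uses members of the enclosing class
    FILE_RUNNABLE = 4     # also uses other code in the same file
    PROJECT_RUNNABLE = 5  # also uses code from other files in the project


# Hypothetical labels for where a referenced name is defined.
_KIND_TO_LEVEL = {
    "builtin": ContextLevel.SELF_CONTAINED,
    "stdlib": ContextLevel.SLIB_RUNNABLE,
    "third_party": ContextLevel.PLIB_RUNNABLE,
    "class": ContextLevel.CLASS_RUNNABLE,
    "file": ContextLevel.FILE_RUNNABLE,
    "project": ContextLevel.PROJECT_RUNNABLE,
}


def classify(dependency_kinds):
    """Return the widest-scope level among a function's dependencies.

    A function with no external dependencies is treated as self-contained.
    """
    return max(
        (_KIND_TO_LEVEL[kind] for kind in dependency_kinds),
        default=ContextLevel.SELF_CONTAINED,
    )


if __name__ == "__main__":
    # A method that calls a stdlib function and reads a field of its own class.
    print(classify(["stdlib", "class"]).name)
```

Taking the maximum over an ordered enum keeps the rule to a single line. A function that needs class members can only run once its class is available, whatever narrower dependencies it also has.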
