SocialData

Description

SocialData is a collection of data science competition environments sourced from DrivenData. It contains 5 multi-turn sandboxed tasks where agents develop machine learning models to solve real-world prediction problems spanning public health, infrastructure, natural language processing, and disaster response.

Capabilities

Data exploration and feature engineering
Machine learning model development
Time series prediction
Multi-target classification and regression
Document summarization with LLMs

Compute Requirements

Agents are given a sandboxed environment with 1 CPU and 4 GB RAM, with access to scientific Python libraries (pandas, scikit-learn, etc.).

Tasks

There are 5 environment variants, each with a train split:

Variant	Description	Metric
FluVaccinePrediction	Predict H1N1 and seasonal flu vaccination probabilities	Mean ROC AUC
PumpItUpPrediction	Classify water pump functionality in Tanzania	F1-micro
DocSumTask	Summarize social science research papers	ROUGE-2 F1
DengAIPrediction	Predict weekly dengue fever case counts	Mean Absolute Error
RichterPrediction	Predict earthquake building damage grades	F1-micro

Reward Structure

This is a multi-turn environment. Agents explore data, develop models, generate predictions, and submit via the submit_predictions tool. Each variant uses its specific evaluation metric:

FluVaccinePrediction: Mean ROC AUC across H1N1 and seasonal targets (0-1)
PumpItUpPrediction: Micro-averaged F1 across 3 classes (0-1)
DocSumTask: ROUGE-2 F1 score (0-1)
DengAIPrediction: Mean Absolute Error, inverted so that lower error yields higher reward
RichterPrediction: Micro-averaged F1 across 3 damage grades (0-1)
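
The metrics listed above can be approximated locally before submitting. The following is a minimal sketch using only the Python standard library. The function names are illustrative and are not part of the environment, and the official scorer may differ in details such as tie handling. The transform used to invert MAE into a reward is not specified here, so only the raw error is computed. In the sandbox, scikit-learn's `roc_auc_score`, `f1_score(average="micro")`, and `mean_absolute_error` would be the more usual choice.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass data.

    With exactly one label per row, micro-precision and micro-recall
    both equal plain accuracy, so micro-F1 does too.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    This O(n^2) form is meant for small validation sets only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_roc_auc(targets):
    """Average ROC AUC over several (y_true, scores) pairs, one per target."""
    return sum(roc_auc(t, s) for t, s in targets) / len(targets)
```

For the two-target flu task, `mean_roc_auc` would be called with one `(labels, probabilities)` pair per target, computed on a held-out slice of the training data.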

Data

Competition data is mounted read-only at /orwd_data. Each competition includes:

Training features and labels
Test features (labels hidden)
Data dictionaries and descriptions

Data is sourced from DrivenData competitions and stored on the OpenReward platform.
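
A first exploration step is usually to see which files are actually present under the mount point. The sketch below walks a directory and reports each file with its size. The helper name `list_data_files` is hypothetical, and the layout of files under /orwd_data is not documented here, so nothing about specific file names is assumed.

```python
from pathlib import Path


def list_data_files(root="/orwd_data"):
    """Return sorted (relative path, size in bytes) pairs for files under root."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"data directory not found: {base}")
    return sorted(
        (p.relative_to(base).as_posix(), p.stat().st_size)
        for p in base.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Inside the sandbox this lists the mounted competition files.
    try:
        for rel_path, size in list_data_files():
            print(f"{size:>12}  {rel_path}")
    except FileNotFoundError as exc:
        print(exc)
```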

Tools

Each variant provides CLI tools plus a submission tool:

Tool	Description
`bash`	Execute shell commands in the sandbox
`glob`	Find files by pattern
`grep`	Search file contents
`ls`	List directory contents
`read`	Read file contents
`write`	Write file contents
`edit`	Edit existing files
`multi_edit`	Make multiple edits
`todo_write`	Track task progress
`submit_predictions`	Submit predictions CSV for evaluation. Ends the episode.

Time Horizon

Multi-turn. Agents explore data, develop and train models, generate predictions, save to submission.csv, and submit for evaluation.
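
The final step, saving predictions to submission.csv, can be sketched as follows. Only the file name comes from this document. The helper `write_submission` and the column names in the usage example are assumptions for illustration, and the required columns for each variant should be taken from the data dictionary or sample files shipped with the competition data.

```python
import csv


def write_submission(path, header, rows):
    """Write a header plus prediction rows to a CSV file; return the row count."""
    rows = [list(r) for r in rows]
    for i, row in enumerate(rows):
        # Catch ragged rows early rather than at scoring time.
        if len(row) != len(header):
            raise ValueError(f"row {i} has {len(row)} fields, expected {len(header)}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical column names and values, for illustration only.
    n = write_submission(
        "submission.csv",
        ["id", "prediction"],
        [(1, 0.12), (2, 0.87)],
    )
    print(f"wrote {n} rows")
```

Because `submit_predictions` ends the episode, checking the row count and column names against the test features before submitting is a cheap safeguard.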

Other Environment Requirements

None. All evaluation is deterministic using competition-specific metrics.

Safety

Agents in SocialData work within sandboxed environments to develop ML models. The environment does not present direct safety risks.

Citation

@software{socialdata_openreward,
  title={SocialData: DrivenData Competition Environments for OpenReward},
  author={GeneralReasoning},
  year={2025},
  url={https://openreward.ai/GeneralReasoning/SocialData}
}

Repository

Source repository

EnvCommons/SocialData

Compute Configuration

Resource allocation for this environment.

Component	Configuration
Environment Server	1 vCPU / 4 GB RAM
Sandbox Machine	1 vCPU / 4 GB RAM

Estimated Cost

Pay per second of active session usage. Billing starts when your session begins and stops when it ends.

Component	Cost / second
Environment	$0.0000320
Sandbox	$0.0000320
Total	$0.0000640

Examples

5-minute session	$0.0192
1-hour session	$0.2304
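
The example figures follow directly from the per-second rates: total rate multiplied by session length in seconds. A small sketch of that arithmetic, using `Decimal` to avoid floating-point rounding. The constant and function names are illustrative, and the rates are copied from the table above.

```python
from decimal import Decimal

# Per-second rates as listed in the cost table.
ENVIRONMENT_RATE = Decimal("0.0000320")
SANDBOX_RATE = Decimal("0.0000320")


def session_cost(seconds):
    """Cost in dollars of a session lasting the given number of seconds."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return (ENVIRONMENT_RATE + SANDBOX_RATE) * seconds


if __name__ == "__main__":
    print(session_cost(5 * 60))   # 5-minute session
    print(session_cost(60 * 60))  # 1-hour session
```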