Skip to content

facebookresearch/halteval

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Halteval-prelim

Data for program termination evaluation used in the Code World Model (CWM) paper [1]. For more information on CWM, see: https://github.com/facebookresearch/cwm

We generate Halteval-prelim by automatically translating C programs with termination annotations into Python using LLaMA-3-70B via few-shot prompting. The C programs are sourced from the International Competition on Software Verification (SVCOMP) [2] and the Termination Problems Database (TPDB) [3]. Each original problem comes with termination/non-termination annotations, which we manually verify are preserved during the Python translation phase and otherwise discard. We obtained a balanced dataset consisting of 115 terminating (T) and 115 non-terminating (NT) Python programs.

Each example in data/halteval_prelim.jsonl comes with the following fields:

  • id: string id from the original repository (in SVCOMP or TPDB).
  • terminates: boolean with the termination property.
  • enc_code: The Python function the termination property refers to.

The Python function is encoded to mitigate potential contamination in the future. To obtain the code:

import json
import pickle
import zlib
import base64

code = json.loads(
       pickle.loads(zlib.decompress(
       base64.b64decode(
       data["enc_code"].encode("utf-8"))
)))

Citation

If you use this dataset, please cite the CWM paper, where this data was originally used:

@misc{faircodegenteam2025cwmopenweightsllmresearch,
      title={CWM: An Open-Weights LLM for Research on Code Generation with World Models},
      author={Jade Copet and Quentin Carbonneaux and Gal Cohen and Jonas Gehring and Jacob Kahn and Jannik Kossen and Felix Kreuk and Emily McMilin and Michel Meyer and Yuxiang Wei and David Zhang and Kunhao Zheng and Jordi Armengol-Estapé and Pedram Bashiri and Maximilian Beck and Pierre Chambon and Abhishek Charnalia and Chris Cummins and Juliette Decugis and Zacharias V. Fisches and François Fleuret and Fabian Gloeckle and Alex Gu and Michael Hassid and Daniel Haziza and Badr Youbi Idrissi and Christian Keller and Rahul Kindi and Hugh Leather and Gallil Maimon and Aram Markosyan and Francisco Massa and Pierre-Emmanuel Mazaré and Vegard Mella and Naila Murray and Keyur Muzumdar and Peter O'Hearn and Matteo Pagliardini and Dmitrii Pedchenko and Tal Remez and Volker Seeker and Marco Selvi and Oren Sultan and Sida Wang and Luca Wehrstedt and Ori Yoran and Lingming Zhang and Taco Cohen and Yossi Adi and Gabriel Synnaeve},
      year={2025},
      eprint={2510.02387},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2510.02387},
}

Translations License

The Data is released CC BY and is intended for benchmarking purposes only. The data are outputs of Llama 3.1 and subject to the Llama 3.1 license (https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE). If you use of this portion of the data to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name. Third party content pulled from other locations are subject to its own licenses and you may have other legal obligations or restrictions that govern your use of that content.

References

[1] CWM: https://github.com/facebookresearch/cwm

[2] SVCOMP: https://sv-comp.sosy-lab.org/

[3] TPDB: https://termination-portal.org/wiki/TPDB

About

Program termination evaluation dataset (HaltEval-prelim) accompanying the CWM release

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •