
TruX-DTF/CLCCD


The Struggles of LLMs in Cross-lingual Code Clone Detection

Description:

This project explores whether large language models (LLMs), baseline models, and traditional classification algorithms can identify code clones across different programming languages.

This work aims to:

  • Evaluate the effectiveness of LLMs in cross-lingual code clone detection.
  • Compare LLM performance to baselines and traditional methods.
  • Identify the most effective approach for detecting similar code across languages.
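The README does not state which metrics the evaluation reports. As an assumption, cross-lingual clone detection is usually framed as binary classification over code pairs (1 = clone, 0 = non-clone) and scored with precision, recall, and F1. The function below is a hypothetical, minimal sketch of that scoring, not code taken from this repository.

```python
from typing import Dict, Sequence


def clone_detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 for binary clone labels (1 = clone, 0 = non-clone).

    Hypothetical helper: the repository may compute its metrics differently.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives over all pairs.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class never appears.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1]  # ground truth for five toy code pairs
    preds = [1, 0, 0, 1, 1]   # e.g. answers parsed from an LLM
    print(clone_detection_metrics(labels, preds))
```

F1 is reported alongside precision and recall because clone datasets are often imbalanced, so accuracy alone can hide a model that rarely predicts "clone".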

Workflow

Content of the repository

This repository contains the following files and directories:

  • data_selection: contains the code used to select the dataset subsets
  • data: contains the subsets of XLCoST and CodeNet used in the experiments
  • classifier: contains the code and data for the classification experiments
  • results: contains the results for each dataset and each LLM
  • get_embeddings.py: generates the embedding vectors for each code snippet
  • sp, se, sct: run the experiments with gpt-3.5-turbo, one for each experiment setting
  • llama2_inf, falcon_inf, starchat_inf, starcoder_inf: run the experiments with llama-2-7b-chat-hf, falcon-7b-instruct, starchat-beta, and starcoder2-15b-instruct, respectively
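The README does not describe how the vectors produced by `get_embeddings.py` are turned into clone/non-clone decisions. The sketch below shows two common options under stated assumptions: thresholding the cosine similarity of two snippet embeddings, and building a symmetric pair feature vector (element-wise absolute difference and product) that a traditional classifier could consume. The names `cosine_similarity`, `predict_clone`, `pair_features`, and the 0.8 threshold are hypothetical and are not taken from the repository.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def predict_clone(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> int:
    """Label a pair as a clone (1) when similarity reaches a hypothetical threshold."""
    return int(cosine_similarity(a, b) >= threshold)


def pair_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Order-independent pair features: |a - b| followed by a * b (element-wise).

    A classifier trained on these features gives the same input for (a, b) and (b, a).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    diff = [abs(x - y) for x, y in zip(a, b)]
    prod = [x * y for x, y in zip(a, b)]
    return diff + prod


if __name__ == "__main__":
    # Toy 3-d vectors standing in for embeddings of a Java and a Python snippet.
    java_vec = [0.9, 0.1, 0.3]
    python_vec = [0.8, 0.2, 0.35]
    print(cosine_similarity(java_vec, python_vec))
    print(predict_clone(java_vec, python_vec))
    print(pair_features(java_vec, python_vec))
```

In practice the threshold would be tuned on a validation split rather than fixed, and real embeddings have hundreds or thousands of dimensions instead of three.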
