This project explores whether large language models (LLMs), baseline methods, and traditional classification algorithms can identify code clones across different programming languages.
This work aims to:
- Evaluate the effectiveness of LLMs in cross-lingual code clone detection.
- Compare LLM performance to baselines and traditional methods.
- Determine which approach best identifies similar code across languages.
This repository contains the following files and directories:

- `data_selection`: all the code used to select the dataset subsets.
- `data`: the subsets of XLCoST and CodeNet used in the experiments.
- `classifier`: all the code and data for the classification part (a baseline sketch follows this list).
- `results`: all the results for each dataset and each LLM.
- `get_embeddings.py`: generates the embedding vector for each code snippet (see the sketch after this list).
- `p`, `se`, `sct`: run the gpt-3.5-turbo experiments; which script to use depends on the experiment (a prompting sketch follows this list).
- `llama2_inf`, `falcon_inf`, `starchat_inf`, `starcoder_inf`: run the experiments with llama-2-7b-chat-hf, falcon-7b-instruct, starchat-beta, and starcoder2-15b-instruct, respectively.
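As an illustration of the embedding step, the sketch below shows one common way to turn a code snippet into a vector with a HuggingFace encoder. The model name (`microsoft/codebert-base`) and the mean-pooling strategy are assumptions for illustration; `get_embeddings.py` may use a different model or pooling.

```python
# Hypothetical sketch of the embedding step: encode a code snippet with a
# HuggingFace model and mean-pool the token states into one fixed-size vector.
# The model name and pooling choice are assumptions, not the repo's exact setup.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "microsoft/codebert-base"  # assumed encoder; swap for the one used here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def embed(snippet: str) -> torch.Tensor:
    """Return a single embedding vector for one code snippet."""
    inputs = tokenizer(snippet, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, dim)
    # Mean-pool over non-padding tokens to get one vector per snippet.
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

vec = embed("def add(a, b):\n    return a + b")
print(vec.shape)  # torch.Size([1, 768]) for codebert-base
```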
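For the classification part, a traditional baseline can be trained directly on pairs of snippet embeddings. The sketch below fits a scikit-learn logistic regression on the absolute difference of the two vectors in each pair; the feature construction and classifier choice are assumptions, and the random arrays merely stand in for real embeddings produced by `get_embeddings.py`.

```python
# Illustrative sketch of a traditional baseline for the classifier directory:
# train a binary classifier on pairs of snippet embeddings. The pair featurization
# (absolute difference of the two vectors) is an assumption for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Placeholder data: in the real pipeline these would come from get_embeddings.py.
emb_a = rng.normal(size=(1000, 768))    # embeddings of the first snippet in each pair
emb_b = rng.normal(size=(1000, 768))    # embeddings of the second snippet
labels = rng.integers(0, 2, size=1000)  # 1 = clone pair, 0 = non-clone pair

features = np.abs(emb_a - emb_b)        # one common way to featurize a pair
X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("F1:", f1_score(y_test, clf.predict(X_test)))
```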
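The LLM experiments amount to asking a chat model whether two snippets are clones. The sketch below shows one possible prompt and answer-parsing approach for gpt-3.5-turbo using the official `openai` client; the exact prompt wording and parsing used in `p`, `se`, and `sct` may differ.

```python
# Hedged sketch of the kind of query the gpt-3.5-turbo scripts could send;
# the prompt wording and answer parsing here are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def are_clones(snippet_a: str, snippet_b: str) -> bool:
    """Ask the model whether two snippets implement the same functionality."""
    prompt = (
        "Do these two code snippets solve the same problem? Answer yes or no.\n\n"
        f"Snippet 1:\n{snippet_a}\n\nSnippet 2:\n{snippet_b}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")
```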