
Instruct-or-Interact

Replication package of ICSE 2025 paper "Instruct or Interact? Exploring and Eliciting LLMs’ Capability in Code Snippet Adaptation Through Prompt Engineering".


Installation

Run the following command to install the required dependencies:

pip install -r requirements.txt

Run the inference

cd src

# adaptation (GPT-3.5 in instruction prompt)
python run.py --data_path ../data/retrieved.json --pattern instruction --task 1 --task_level 0 --context_level 1 --cot_level 0 --repeat 5 --model 0 --source api --temperature 0.8

The pattern parameter describes the prompt style: direct (the prompt is the same as the input text), instruction (the instruction prompt used in our empirical study), enhanced (the enhanced prompt with task decomposition), human (Human-LLM interaction), and mac/mae (multi-agent prompts with a counselor or an evaluator, respectively). Example invocations follow the parameter notes below.

The task parameter describes whether the task is generation (0) or adaptation (1).

The task_level, context_level, and cot_level parameters describe the information provided in the prompt. The detailed settings are in prompt_loader.py and utils.py.

The model parameter describes which LLM is used.

The source parameter serves as a label in the output file's name.
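
For instance, a generation run and a CodeLlama adaptation run might look like the following (a sketch based on the parameter values documented in this README; the level settings are assumed to carry over unchanged):

cd src

# generation (GPT-3.5 in instruction prompt); --task 0 selects generation
python run.py --data_path ../data/retrieved.json --pattern instruction --task 0 --task_level 0 --context_level 1 --cot_level 0 --repeat 5 --model 0 --source api --temperature 0.8

# adaptation with CodeLlama (--model 8), labeled 34b in the output file name
python run.py --data_path ../data/retrieved.json --pattern instruction --task 1 --task_level 0 --context_level 1 --cot_level 0 --repeat 5 --model 8 --source 34b --temperature 0.6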


Run the test

cd src

# adaptation (GPT-3.5 in instruction prompt)
python test.py  --pattern instruction --task 1 --task_level 0 --context_level 1 --cot_level 0 --repeat 5 --model 0 --source api --temperature 0.8

The test script shares almost the same parameters as the inference script; it can be used to obtain the evaluation results directly from the inference output, without re-running the inference script.
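
A typical workflow therefore pairs the two scripts with identical parameters (a minimal sketch reusing the example settings above):

cd src

# 1) inference: writes model outputs under ./src/output
python run.py --data_path ../data/retrieved.json --pattern instruction --task 1 --task_level 0 --context_level 1 --cot_level 0 --repeat 5 --model 0 --source api --temperature 0.8

# 2) evaluation: picks up the matching output file and writes results under ./src/test
python test.py --pattern instruction --task 1 --task_level 0 --context_level 1 --cot_level 0 --repeat 5 --model 0 --source api --temperature 0.8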


To reproduce our results

1. RQ1: Re-run from inference phase

For each model and each temperature setting, run the inference script with the following commands (a loop sketch for sweeping all temperatures follows the output note below):

cd src

# LLM adaptations
# --model: 0(GPT-3.5) 8(CodeLlama) 10(Llama-3)
# --source: 7b, 8b, 13b, 34b, api, ... It is a field for label.
# --temperature: 0, 0.2, 0.4, 0.6, 0.8, 1.0
python run.py --data_path ../data/retrieved.json --pattern instruction --task 1 --context_level 1 --repeat 5 --model MOD --source Xb --temperature TEMP


# GPT-3.5 generation (baseline)
# --temperature: 0, 0.2, 0.4, 0.6, 0.8, 1.0
python run.py --pattern instruction --context_level 1  --repeat 5 --model 0 --source api --temperature TEMP

Outputs can be found in ./src/output; test results can be found in ./src/test.
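
To sweep all temperatures for one model in a single shot, a minimal shell loop sketch (shown here for GPT-3.5; substitute the --model/--source pairs above for the other models):

cd src

for TEMP in 0 0.2 0.4 0.6 0.8 1.0; do
    # GPT-3.5 adaptation
    python run.py --data_path ../data/retrieved.json --pattern instruction --task 1 --context_level 1 --repeat 5 --model 0 --source api --temperature $TEMP
    # GPT-3.5 generation (baseline)
    python run.py --pattern instruction --context_level 1 --repeat 5 --model 0 --source api --temperature $TEMP
done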

2. RQ1: Re-run our results (test only)

# copy results to output dir
cp ./results/RQ1/MODEL/PARAM/XXX.json ./src/output/

cd src

# LLM adaptations
# --model: 0(GPT-3.5) 8(CodeLlama) 10(Llama-3)
# --source: 7b, 8b, 13b, 34b, api, ... It is a field for label.
# --temperature: 0, 0.2, 0.4, 0.6, 0.8, 1.0
python test.py --pattern instruction --task 1 --context_level 1 --repeat 5 --model MOD --source SRC --temperature TEMP

# GPT-3.5 generation (baseline)
# --temperature: 0, 0.2, 0.4, 0.6, 0.8, 1.0
python test.py --pattern instruction --context_level 1  --repeat 5 --model 0 --source api --temperature TEMP

Test results can be found in ./src/test.

3. RQ2+RQ3: Manual Annotation

See ./results/RQ2+3/manual_annotation.xlsx.

4. Evaluation Results For Interactive Prompting

4.1 Ablation Study

cd src

# Enhanced
python run.py --data_path ../data/retrieved.json --pattern enhanced --task 1 --context_level 3 --cot_level 1 --repeat 5 --source api --temperature 0.8

# Ablation
--context_level 1: without docstring & code
--cot_level 0: without dependency
--pattern instruction: without task decomposition
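
Spelled out, the three ablation runs would look like the following (a sketch derived from the Enhanced command above, changing one flag at a time):

cd src

# without docstring & code
python run.py --data_path ../data/retrieved.json --pattern enhanced --task 1 --context_level 1 --cot_level 1 --repeat 5 --source api --temperature 0.8

# without dependency
python run.py --data_path ../data/retrieved.json --pattern enhanced --task 1 --context_level 3 --cot_level 0 --repeat 5 --source api --temperature 0.8

# without task decomposition
python run.py --data_path ../data/retrieved.json --pattern instruction --task 1 --context_level 3 --cot_level 1 --repeat 5 --source api --temperature 0.8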

4.2 Mitigation of Bugs

See ./results/Evaluation/mitigation_of_bugs/manual_annotation.xlsx.

4.3 Multi-Agent Interactions

cd src

# Human-LLM
# --model: 0 for GPT-3.5 (default)
# --source: manual, ... It is a field for label.
# --temperature: 0.8 for GPT-3.5
python run.py --data_path ../data/retrieved.json --pattern human --task 2 --context_level 3 --cot_level 2 --repeat 5 --source manual --temperature 0.8

# MAE
# --model: 0 for GPT-3.5 (default)
# --source: manual, ... It is a field for label.
# --temperature: 0.8 for GPT-3.5
python run.py --data_path ../data/retrieved.json --pattern mae --task 0 --context_level 3 --cot_level 1 --repeat 5 --source manual --temperature 0.8

# MAC
# --model: 0(GPT-3.5) 8(CodeLlama) 10(Llama-3)
# --source: 8b, 34b, api, manual, ... It is a field for label.
# --temperature: 0.8 for GPT-3.5, 0.6 for CodeLlama-34B, Llama-3-8B
python run.py --data_path ../data/retrieved.json --pattern mac --task 2 --context_level 3 --cot_level 1 --repeat 5 --model MOD --source SRC --temperature TEMP
