JVM: Add JVM support in Prototyper #819

Open: wants to merge 9 commits into base: main

71 changes: 66 additions & 5 deletions llm_toolkit/prompt_builder.py
@@ -559,6 +559,9 @@ def __init__(self,
     elif benchmark.is_cpp_target:
       self.priming_template_file = self._find_template(
           self.agent_templare_dir, 'prototyper-priming.cpp.txt')
+    elif benchmark.is_java_target:
+      self.priming_template_file = self._find_template(
+          self.agent_templare_dir, 'prototyper-priming.jvm.txt')
     else:
       self.problem_template_file = self._find_template(
           self.agent_templare_dir, 'prototyper-priming.txt')
@@ -572,6 +575,53 @@ def __init__(self,
     self.context_template_file = self._find_template(template_dir,
                                                      'context.txt')
 
+  def _format_jvm_requirement(self) -> str:
+    """Formats a requirement based on the prompt template for JVM."""
+    requirement = self._get_template(
+        self._find_template(self._template_dir, 'jvm_requirement.txt'))
+
+    harness_name = ''
+    if self.benchmark:
+      harness_name = os.path.basename(self.benchmark.target_path).replace(
+          '.java', '')
+    if harness_name:
+      requirement = requirement.replace('{HARNESS_NAME}', harness_name)
+    else:
+      requirement = requirement.replace('{HARNESS_NAME}', 'Fuzz')
+
+    requirement = requirement.replace('{IMPORT_MAPPINGS}', '')
+    requirement = requirement.replace('{STATIC_OR_INSTANCE}', '')
+    requirement = requirement.replace('{NEED_CLOSE}', '')
+
+    return requirement
+
+  def format_jvm_problem(self, signature: str, priming: str) -> str:
+    """Format target problem specifically for JVM project."""
+    if not self.benchmark:
+      return ''
+
+    template_file = self._find_template(self._template_dir, 'jvm_target.txt')
+
+    class_name = signature.split('].')[0][1:]
+
+    target = self._get_template(template_file)
+    target = target.replace('{CLASS}', class_name)
+    target = target.replace('{SIGNATURE}', signature)
+    target = target.replace('{PROJECT_NAME}', self.benchmark.project)
+    target = target.replace(
+        '{PROJECT_URL}',
+        oss_fuzz_checkout.get_project_repository(self.benchmark.project))
+
+    priming = priming.replace('{TARGET}', target)
+    priming = priming.replace('{REQUIREMENTS}', self._format_jvm_requirement())
+    priming = priming.replace(
+        '{DATA_MAPPING}',
+        self._get_template(
+            self._find_template(self._template_dir,
+                                'jvm_specific_data_filler.txt')))
+
+    return priming
+
   def build(self,
             example_pair: list[list[str]],
             project_example_content: Optional[list[list[str]]] = None,
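The string handling in `_format_jvm_requirement` and `format_jvm_problem` can be exercised in isolation. The sketch below pulls the class-name and harness-name derivations and the placeholder filling out into plain functions. The function names are hypothetical, and the `[package.Class].method(args)` signature layout is inferred from the `split('].')[0][1:]` expression rather than documented in this diff.

```python
import os


def class_name_from_signature(signature: str) -> str:
  """Returns 'pkg.Class' from a '[pkg.Class].method(args)' signature.

  Same expression as format_jvm_problem: keep the text before '].' and
  drop the leading '['.
  """
  return signature.split('].')[0][1:]


def harness_name_from_path(target_path: str) -> str:
  """Returns e.g. 'FooFuzzer' for '/src/FooFuzzer.java'."""
  name = os.path.basename(target_path).replace('.java', '')
  # Same fallback as _format_jvm_requirement when no name can be derived.
  return name or 'Fuzz'


def fill_placeholders(template: str, values: dict[str, str]) -> str:
  """Replaces each '{KEY}' in `template` with values[KEY], in dict order."""
  for key, value in values.items():
    template = template.replace('{' + key + '}', value)
  return template
```

For example, `class_name_from_signature('[com.example.Parser].parse(java.lang.String)')` returns `'com.example.Parser'`.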
@@ -583,10 +633,16 @@ def build(self,
       return self._prompt
     priming = self._format_priming(self.benchmark)
     priming = priming.replace('{PROJECT_DIR}', project_dir)
-    final_problem = self.format_problem(self.benchmark.function_signature)
-    final_problem += (f'You MUST call <code>\n'
-                      f'{self.benchmark.function_signature}\n'
-                      f'</code> in your solution!\n')
+
+    if self.benchmark.language == 'jvm':
+      priming = self.format_jvm_problem(self.benchmark.function_signature,
+                                        priming)
+      final_problem = ''
+    else:
+      final_problem = self.format_problem(self.benchmark.function_signature)
+      final_problem += (f'You MUST call <code>\n'
+                        f'{self.benchmark.function_signature}\n'
+                        f'</code> in your solution!\n')
     if project_context_content:
       final_problem += self.format_context(project_context_content)
     self._prepare_prompt(priming, final_problem, example_pair,
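The new branch in `build()` changes where the target description travels: for JVM it is folded into the priming text and `final_problem` is left empty, while other languages keep a separate problem plus the "You MUST call" reminder. A simplified sketch, assuming a hypothetical `assemble` helper that receives the already formatted target text; in the PR, `format_jvm_problem` also fills `{REQUIREMENTS}` and `{DATA_MAPPING}`, which is omitted here.

```python
def assemble(language: str, signature: str, priming: str,
             problem: str) -> tuple[str, str]:
  """Returns (priming, final_problem) following the branch added to build().

  `problem` stands in for the formatted target text: format_jvm_problem's
  target block for JVM, format_problem's output for other languages.
  """
  if language == 'jvm':
    # JVM: the target is embedded in the priming via {TARGET}, so no
    # separate problem section is sent.
    return priming.replace('{TARGET}', problem), ''
  # Other languages: separate problem text plus an explicit reminder.
  final_problem = problem + ('You MUST call <code>\n'
                             f'{signature}\n'
                             '</code> in your solution!\n')
  return priming, final_problem
```

Because the JVM branch returns an empty `final_problem`, any project context appended afterwards by `format_context` becomes the only content of that section.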
@@ -607,8 +663,13 @@ def __init__(self,
               initial: Any = None):
     super().__init__(model, benchmark, template_dir, initial)
     # Load templates.
+    if benchmark.is_java_target:
+      priming_file = 'prototyper-fixing.jvm.txt'
+    else:
+      priming_file = 'prototyper-fixing.txt'
+
     self.priming_template_file = self._find_template(self.agent_templare_dir,
-                                                     'prototyper-fixing.txt')
+                                                     priming_file)
     self.build_result = build_result
     self.compile_log = compile_log

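Both template choices in this diff (priming in the first hunk, fixing in the last) are if/elif chains keyed on the target language. A table-driven equivalent is sketched below; the language keys `'c'` and `'cpp'` are assumptions, since the PR branches on `benchmark.is_c_target`, `is_cpp_target` and `is_java_target` instead. A lookup like this also keeps every branch writing one attribute; in the first hunk, the unchanged `else` branch assigns `self.problem_template_file` rather than `self.priming_template_file`.

```python
# Hypothetical language keys; the PR itself branches on benchmark.is_c_target,
# is_cpp_target and is_java_target.
PRIMING_TEMPLATES = {
    'c': 'prototyper-priming.c.txt',
    'cpp': 'prototyper-priming.cpp.txt',
    'jvm': 'prototyper-priming.jvm.txt',
}
FIXING_TEMPLATES = {
    'jvm': 'prototyper-fixing.jvm.txt',
}


def priming_template(language: str) -> str:
  """Priming template name, defaulting to the generic one."""
  return PRIMING_TEMPLATES.get(language, 'prototyper-priming.txt')


def fixing_template(language: str) -> str:
  """Fixing template name; only JVM has a dedicated file in this PR."""
  return FIXING_TEMPLATES.get(language, 'prototyper-fixing.txt')
```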
32 changes: 32 additions & 0 deletions prompts/agent/prototyper-fixing.jvm.txt
@@ -0,0 +1,32 @@
Failed to build fuzz target. Please help fix it.

Your response MUST follow the format shown here:
<conclusion>
[Conclusion summarising your findings and describing your modified fuzzing harness design.]
</conclusion>
<fuzz target>
[The full code of the modified and generated fuzzing harness.]
</fuzz target>
<build script>
[The full code of the build script if it is modified. Otherwise, skip this tag.]
</build script>
</system>


Here is the full Java fuzz target, build script, compilation command, and compilation output.
<fuzz target>\n{FUZZ_TARGET_SOURCE}\n</fuzz target>
{BUILD_TEXT}
<compilation log>\n{COMPILE_LOG}\n</compilation log>
YOU MUST first analyze the error messages, the fuzz target and the build script carefully to identify the root cause.
YOU MUST NOT make any assumptions about the source code or build environment. Always confirm assumptions with source code evidence obtained via Bash commands.
Once you are absolutely certain of the root cause of the error, output the FULL SOURCE CODE of the fuzz target (and output the FULL SOURCE CODE of the build script only if /src/build.sh is insufficient and requires modification).

TIPS:
1. If necessary, add additional import statements for missing class dependencies.
2. Consult existing cross-referencing code or unit tests to help you identify the root cause of the error.
3. After collecting information and analyzing and understanding the root cause of the error, YOU MUST take at least one step to validate your theory with source code evidence.
4. Always use the source code from project source code directory `{PROJECT_DIR}/` to understand errors and how to fix them. For example, search for the key words (e.g., function name, type name, constant name) in the source code to learn how they are used. Similarly, learn from the other fuzz targets and the build script to understand how to import the correct classes.
5. Once you have verified the root cause of the error, output the FULL SOURCE CODE of the fuzz target (and the FULL SOURCE CODE of the build script, if /src/build.sh is insufficient or requires modification).
6. Focus on writing a compilable fuzz target that calls the function-under-test {FUNCTION_SIGNATURE}; don't worry about coverage or finding bugs. We can improve that later, but first ensure it calls the function-under-test {FUNCTION_SIGNATURE} and compiles successfully.
7. If an error happens repeatedly and cannot be fixed, try to mitigate it. For example, replace or remove the line.
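The fixing template relies on the placeholders `{FUZZ_TARGET_SOURCE}`, `{BUILD_TEXT}`, `{COMPILE_LOG}`, `{PROJECT_DIR}` and `{FUNCTION_SIGNATURE}`; the code that fills them is not part of this diff. Chained `str.replace` calls work, but a later call re-scans text an earlier call inserted (Java source or a compile log that happens to contain a `{NAME}`-style token), and unknown placeholders are silently left behind. A hypothetical single-pass alternative that fails loudly on a missing value:

```python
import re

# Placeholders in these templates are upper-case words in braces.
_PLACEHOLDER = re.compile(r'\{([A-Z_]+)\}')


def render(template: str, values: dict[str, str]) -> str:
  """Fills {NAME} placeholders in one pass; raises if any has no value."""
  missing = sorted({name for name in _PLACEHOLDER.findall(template)
                    if name not in values})
  if missing:
    raise KeyError(f'no value for placeholders: {missing}')
  # re.sub scans only the original template, so braces inside the
  # inserted values are never treated as placeholders.
  return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Unlike the chained replacements, a typo in a placeholder name surfaces as an error instead of reaching the model as literal text.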

113 changes: 113 additions & 0 deletions prompts/agent/prototyper-priming.jvm.txt
@@ -0,0 +1,113 @@
<system>
As a security testing engineer, you must write a fuzzing harness in Java following the Jazzer framework.
Objective: Your goal is to modify the existing fuzz target `{FUZZ_TARGET_PATH}` into a minimal fuzz target for the method shown below that builds successfully.
Your response MUST follow the format shown here:
<conclusion>
[Conclusion summarising your findings and describing your modified fuzzing harness design.]
</conclusion>
<fuzz target>
[The full code of the modified and generated fuzzing harness.]
</fuzz target>
<build script>
[The full code of the build script if it is modified. Otherwise, skip this tag.]
</build script>
</system>

The details of the fuzz target are shown below.
{TARGET}

<steps>
Follow these steps to write a minimum fuzz target:

Step 1. Determine the information you need to write an effective fuzz target.
This includes:
* **Source code** of the function under test.
* **Custom Types and Dependencies** definitions and implementations.
* **Initialization and setup** requirements and steps.
* **Build details** and integration steps.
* **JDK version** required to build the target project.
* **Build system** required to build the target project, including but not limited to **Maven**, **Gradle**, **Ant** or plain **javac**.
* Valid and edge-case input values.
* Environmental and runtime dependencies.

Step 2. Collect information using the Bash tool.
Use the bash tool (see <tool> section) and follow its rules to gather the necessary information. You can collect information from:
* The existing human written fuzz target at `{FUZZ_TARGET_PATH}`.
* The existing human written build script `/src/build.sh`.
* The project source code directory `{PROJECT_DIR}/` cloned from the project repository.
* Documentation about the project, the function, and the variables/constants involved, and the Javadoc API for the target project.
* Existing test cases and examples from the project source code directory `{PROJECT_DIR}/`.
* The build system used by the target project by searching for build system property files, including but not limited to **pom.xml**, **build.gradle**, **build.gradle.kts** or **build.xml**.
* Environment variables.
* Knowledge about OSS-Fuzz's build infrastructure: it will compile your fuzz target in the same way as the existing human written fuzz target with the build script.

Step 3. Analyze the method and its classes and parameters.
Understand the method under test by analyzing its source code, documentation and Javadoc API:
* **Purpose and functionality** of the method.
* **Input processing** and internal logic.
* **Dependencies** on other functions or global variables.
* **Exception handling** and edge cases.
* **Class construction** for invoking the necessary object.
* **Import statements** for necessary classes.
* **Modifiers** of the method, including but not limited to whether it is static or public.

Step 4. Understand initialization requirements.
Identify what is needed to properly initialize the method or its class:
* **Complex input parameters or objects** initialization.
* **Constructor methods** or initialization routines.
* **Global state** or configuration that needs to be set up.
* **Mocking** external dependencies if necessary.
* If a complex initialisation can be done once and for all, consider moving it into the `public static void fuzzerInitialize()` method.

Step 5. Understand general requirements for Java fuzzing harness writing.
* **<requirements>** tag contains a list of additional requirements for the fuzzing harness generation that you MUST follow.

{REQUIREMENTS}

Step 6. Understand Constraints and edge cases.
For each input parameter, understand:
* Valid ranges and data types.
* Invalid or edge-case values (e.g., zero, null, predefined constants, predefined enum values, maximum values).
* Special values that trigger different code paths.

Step 7: Plan Fuzz Target Implementation.
Decide how to implement the fuzz target:
* **Extract parameters** and random data from the `com.code_intelligence.jazzer.api.FuzzedDataProvider data` object passed to the fuzzing entry point `public static void fuzzerTestOneInput(com.code_intelligence.jazzer.api.FuzzedDataProvider data)`. Consult the data mapping table and the requirements for extracting parameters, wrapped in the **<data_mapping>** tag below.
* Handle fixed-size versus variable-size data.
* **Initialize method's parameters** by appropriately mapping the raw input bytes.
* Ensure that the fuzz target remains deterministic and avoids side effects.

{DATA_MAPPING}


Step 8: **Write** the fuzz target code.
Implement the `fuzzerTestOneInput`, `fuzzerInitialize` and `fuzzerTearDown` methods:
* Global initialisation:
  * Put all once-and-for-all initialisation in the `fuzzerInitialize` method.
* External class import:
  * Investigate how existing fuzz targets import external classes.
  * Investigate where those classes are located in the project.
  * Collect all additional external classes required by your fuzz target and import them.
* Input Handling:
  * Use `FuzzedDataProvider` if and only if the fuzz target at `{FUZZ_TARGET_PATH}` is a Java file.
  * Check that the input size is sufficient.
  * Extract parameters from the input data through the data mapping given in the previous step.
  * Handle any necessary conversions or validations.
* Method Invocation:
  * Initialize required objects or state.
  * Modify the existing fuzz target at `{FUZZ_TARGET_PATH}` to fuzz the method under test with the fuzzed parameters.
  * Ensure proper exception handling.
* Cleanup:
  * Reset any global state if necessary.
  * Consider invoking `System.gc()` if a large number of objects are created during fuzzing.
  * Call the `close` method on created objects whose classes implement the `AutoCloseable` interface.
  * Clean up and close all global initialisation in the `fuzzerTearDown` method.

Step 9 (Optional): **Modify** the Build Script.
Write a new build script only if the existing one (`/src/build.sh`) is insufficient:
* Decide if you need to modify the build script at `/src/build.sh` to successfully build the new fuzz target. For example, new dependency jars are needed.
* Include compilation steps for the project under test.
* Include compilation steps for the new fuzz target.
* Specify necessary compiler and linker flags.
* Ensure all dependencies are correctly linked.
</steps>
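Step 2 asks the agent to identify the build system from marker files such as **pom.xml**, **build.gradle**, **build.gradle.kts** or **build.xml**. As a rough illustration of that check, here is a hypothetical helper that assumes only the project root is inspected and that Maven wins when several markers coexist; both are assumptions, not rules from the template.

```python
from pathlib import Path

# Marker files listed in Step 2, in an assumed priority order.
BUILD_MARKERS = (
    ('pom.xml', 'Maven'),
    ('build.gradle', 'Gradle'),
    ('build.gradle.kts', 'Gradle'),
    ('build.xml', 'Ant'),
)


def detect_build_system(project_dir: str) -> str:
  """Names the build system whose marker file sits in project_dir.

  Only the top-level directory is checked; multi-module projects may keep
  markers in subdirectories as well.
  """
  root = Path(project_dir)
  for marker, system in BUILD_MARKERS:
    if (root / marker).is_file():
      return system
  # No known marker: assume the project is compiled with plain javac.
  return 'javac'
```

In the agent loop itself the same information comes from Bash commands run against `{PROJECT_DIR}/`; the helper only makes the decision rule explicit.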