@rzikm (Member) commented Oct 23, 2025

This PR introduces corpus support for the DotnetFuzzing project. Currently this works for local runs only. OneFuzz requires creating a special container for the corpus, which probably needs to be done manually, once per fuzzer; this is left for future work.

Currently, some fuzzers (IMO incorrectly) use dictionaries as a substitute for the missing corpus support, which makes fuzzing inefficient. This PR prepares the ground for future improvements in this regard.

To validate the concept, this PR converts ZipArchive fuzzing to use a corpus instead of a dictionary, and adds code to fuzz Deflate64 (for which we have a managed implementation that is used internally when reading ZipArchives).

@rzikm rzikm marked this pull request as ready for review October 23, 2025 13:57
@Copilot Copilot AI review requested due to automatic review settings October 23, 2025 13:57
@github-actions github-actions bot added the needs-area-label label Oct 23, 2025
@rzikm (Member Author) commented Oct 23, 2025

cc @MihaZupan

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds corpus support to the DotnetFuzzing project for local runs, allowing fuzzers to use seed corpora instead of relying on dictionaries. The change improves fuzzing efficiency by providing proper initial test cases. The ZipArchiveFuzzer is updated to use a corpus, and a new Deflate64Fuzzer is introduced to test the managed Deflate64 decompression implementation.

  • Adds corpus infrastructure to the fuzzing framework with validation and deployment logic
  • Converts ZipArchiveFuzzer to use corpus instead of dictionary for better fuzzing effectiveness
  • Introduces Deflate64Fuzzer to test internal Deflate64 decompression used in ZipArchive reading

Reviewed Changes

Copilot reviewed 6 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/libraries/Fuzzing/DotnetFuzzing/Program.cs | Adds corpus directory handling, validation, and deployment logic for both OneFuzz and local runs |
| src/libraries/Fuzzing/DotnetFuzzing/IFuzzer.cs | Extends IFuzzer interface with optional Corpus property |
| src/libraries/Fuzzing/DotnetFuzzing/Fuzzers/ZipArchiveFuzzer.cs | Adds corpus property to use seed files instead of dictionary |
| src/libraries/Fuzzing/DotnetFuzzing/Fuzzers/Deflate64Fuzzer.cs | New fuzzer for testing Deflate64 decompression with reflection-based stream creation |
| src/libraries/Fuzzing/DotnetFuzzing/DotnetFuzzing.csproj | Simplifies fuzzer file inclusion using wildcard and adds corpus files to build output |
| eng/pipelines/libraries/fuzzing/deploy-to-onefuzz.yml | Adds OneFuzz deployment task for new Deflate64Fuzzer |
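
For readers who have not seen an optional interface member before, the "optional Corpus property" mentioned for IFuzzer.cs could look roughly like the sketch below, which uses C# default interface members. This is not the code from the PR: `IFuzzerSketch`, `ZipLikeFuzzer`, `PlainFuzzer`, the `string?` property type and the `"ziparchive"` directory name are all hypothetical and chosen only for illustration.

```csharp
#nullable enable
using System;

// Hypothetical sketch; NOT the actual IFuzzer from DotnetFuzzing.
public interface IFuzzerSketch
{
    // Default interface members: a fuzzer that does not override
    // these simply has no dictionary and no corpus.
    string? Dictionary => null;
    string? Corpus => null;

    void FuzzTarget(ReadOnlySpan<byte> bytes);
}

public sealed class ZipLikeFuzzer : IFuzzerSketch
{
    // Assumed meaning: name of a directory holding seed inputs.
    public string? Corpus => "ziparchive";

    public void FuzzTarget(ReadOnlySpan<byte> bytes) { }
}

public sealed class PlainFuzzer : IFuzzerSketch
{
    // Opts out of both Dictionary and Corpus by not overriding them.
    public void FuzzTarget(ReadOnlySpan<byte> bytes) { }
}
```

With this shape, existing fuzzers compile unchanged, and callers can test `fuzzer.Corpus is not null` through the interface to decide whether a corpus directory should be deployed.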

Comment on lines +25 to +26
TestArchive(CopyToRentedArray(bytes), bytes.Length, async: false).GetAwaiter().GetResult();
TestArchive(CopyToRentedArray(bytes), bytes.Length, async: true).GetAwaiter().GetResult();
Copilot AI, Oct 23, 2025

The fuzzer rents two separate arrays for synchronous and asynchronous test paths. Consider reusing a single rented array for both paths to reduce allocation overhead during fuzzing.

@rzikm rzikm added area-Meta and removed needs-area-label labels Oct 23, 2025
@dotnet-policy-service (Contributor)

Tagging subscribers to this area: @dotnet/area-meta
See info in area-owners.md if you want to be subscribed.

@rzikm rzikm requested a review from MihaZupan October 23, 2025 14:01
string script = $"%~dp0/libfuzzer-dotnet.exe --target_path=%~dp0/DotnetFuzzing.exe --target_arg={fuzzer.Name}";

if (fuzzer.Dictionary is not null)
// We don't support dictionaries and corpora at the same time yet, and some fuzzers

Member

Nit: Any reason why not?
The corpus will be ignored by OneFuzz, but I don't see why we should block it locally.

@rzikm (Member Author), Oct 23, 2025

This changes only the local run script; OneFuzz runs are unaffected.

Since we don't have corpora set up for OneFuzz (yet), some fuzzers place example inputs as dictionary entries instead, but that does not work as well as having them in the corpus. I didn't want to remove the dictionaries entirely because that might slow down OneFuzz runs. As a temporary compromise, the corpus takes precedence over the dictionary when running locally, since it is more efficient to omit the suboptimal dictionary in this case.

Of course, having both at the same time (a corpus of whole example inputs, plus a dictionary with the right "alphabet" from which to compose inputs) would be best and is something we should aim for in the future.
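
The precedence rule described above can be sketched as follows. This is a simplified illustration, not the code in Program.cs: `FuzzerInfo` and `BuildLocalScript` are hypothetical names, and the `dictionary` file name passed to `-dict=` (libFuzzer's dictionary flag) is an assumption. Only the first part of the command line and the ` %~dp0/corpus` suffix are taken from the diff.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the fuzzer metadata read by the script generator.
public sealed record FuzzerInfo(string Name, string? Dictionary, string? Corpus);

public static class LocalScriptSketch
{
    public static string BuildLocalScript(FuzzerInfo fuzzer)
    {
        string script = $"%~dp0/libfuzzer-dotnet.exe --target_path=%~dp0/DotnetFuzzing.exe --target_arg={fuzzer.Name}";

        if (fuzzer.Corpus is not null)
        {
            // Corpus wins locally: the dictionary is skipped even if present.
            script += " %~dp0/corpus";
        }
        else if (fuzzer.Dictionary is not null)
        {
            // Assumed file name; -dict= is the libFuzzer dictionary option.
            script += " -dict=%~dp0/dictionary";
        }

        return script;
    }
}
```

Supporting both at once, as suggested as a future goal, would amount to replacing the `else if` with a second independent `if`.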

Member

If those dictionaries aren't adding any value beyond seeding the initial corpus for OneFuzz, I think they should be fine to delete now - OneFuzz will reuse the current (already seeded) corpus for new runs for us.

// use corpus as corpus if available as it is more effective that way.
if (fuzzer.Corpus is not null)
{
script += " %~dp0/corpus";

Member

Please move this after the "additional arguments" so that it becomes the last option.

Going by the text in https://llvm.org/docs/LibFuzzer.html#options:

> To run the fuzzer, pass zero or more corpus directories as command line arguments. The fuzzer will read test inputs from each of these corpus directories, and any new test inputs that are generated will be written back to the first corpus directory

This way if you use a custom folder when fuzzing locally, you'll see the inputs being written there instead of in the deployment folder.
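
A small sketch of why the ordering matters, under the assumption that every argument not starting with `-` is treated as a corpus directory and that the first such directory receives newly generated inputs (per the libFuzzer text quoted above). `CorpusOrderSketch`, `BuildArguments` and `FirstCorpusDirectory` are hypothetical helpers, not part of the PR.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class CorpusOrderSketch
{
    // Places the bundled corpus directory after any user-supplied arguments.
    public static string[] BuildArguments(IEnumerable<string> userArgs, string bundledCorpusDir)
        => userArgs.Append(bundledCorpusDir).ToArray();

    // Assumption: the first non-flag argument is the directory that
    // new inputs are written back to.
    public static string? FirstCorpusDirectory(IEnumerable<string> args)
        => args.FirstOrDefault(a => !a.StartsWith("-", StringComparison.Ordinal));
}
```

If a custom folder such as `my-corpus` is passed locally, it comes first and collects the new inputs; when no folder is passed, the bundled `%~dp0/corpus` directory is the only (and therefore first) corpus directory.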

@MihaZupan MihaZupan added this to the 11.0.0 milestone Oct 23, 2025