Skip to content

Conversation

@kkysen
Copy link
Contributor

@kkysen kkysen commented Jan 8, 2026

As a general-purpose solution to bypassing functions the postprocessor's comment transferer (and future transforms, too) can't handle yet, this adds an exclude list to skip functions in files. There is also --ident-filter, which is a single regex just based on the function name, but this is more specific and also takes into account the file name. The file passed to --exclude-file is a YAML file containing file paths (relative to the exclude file's directory) and the function names within them to skip. This gets us around issues like conditionally-compiled comments (#1534), the first function in a file also including license comments and #includes, and anything else. I've run this on json-c so far

(cd c2rust-postprocess/ && uv run postprocess $OLDPWD/tests/integration/tests/json-c/repo/lib.rs --no-update-rust --exclude-file $OLDPWD/tests/integration/tests/json-c/postprocess-exclude.yml)

and checked-in the LLM cache for it, but setting up the testing infrastructure to run this automatically is still a WIP (should be the next PR). Funnily enough, the LLM cache is going to end up behaving similarly to snapshot tests once it's up and running.

@kkysen kkysen requested a review from thedataking January 8, 2026 22:42
Copy link
Contributor

@thedataking thedataking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should drop the functionality provided by --ident-filter. I would like to retain the ability to filter on the command line without having to provide a map file. I like to just translate a single function over and over for development; I'm too lazy to write a file for that.

Please make it such that the internal exclude list can be built from a file or a command line argument? I would just like to pass a flag that lets me restrict postprocessing to a given identifier (or, ideally, set of related identifiers which is why I used a regex).

Copy link
Contributor Author

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should drop the functionality provided by --ident-filter. I would like to retain the ability to filter on the command line without having to provide a map file. I like to just translate a single function over and over for development; I'm too lazy to write a file for that.

Okay, I can add that back. Is it important that it's still a regex?

Please make it such that the internal exclude list can be built from a file or a command line argument? I would just like to pass a flag that lets me restrict postprocessing to a given identifier (or, ideally, set of related identifiers which is why I used a regex).

Is this also needed if there's also --ident-filter?

@thedataking
Copy link
Contributor

thedataking commented Jan 8, 2026

Okay, I can add that back. Is it important that it's still a regex?

It is nice to be able to filter on a glob pattern since related C functions share a prefix or suffix. Not sure we need a full regex. I think you should consider supporting glob patterns in the exclude files however since many related C files follow a pattern.... I'm assuming it is easy to do and useful but I'm open to being told otherwise.

Is this also needed if there's also --ident-filter?

No, I didn't mean to specify implementation. I shouldn't have phrased it that way. I would like something similar to what we had and I wasn't sure if you'd just wanna drop the commit so we preserve the old functionality or merge it with the new somehow. Please do what you think makes sense and we can continue the discussion from there.

@kkysen kkysen force-pushed the kkysen/postprocess-exclude branch 2 times, most recently from 5674ba2 to dc319fd Compare January 9, 2026 06:47
@kkysen kkysen requested a review from thedataking January 9, 2026 06:47
Copy link
Contributor Author

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thedataking, I added back --ident-filter now. Seems like the simplest solution rather than trying to combine them.

@kkysen
Copy link
Contributor Author

kkysen commented Jan 9, 2026

@fw-immunant, --emit-c-decl-map is failing on zstd and python2 in slice_decl_with_fixups.

Copy link
Contributor

@thedataking thedataking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall 🚀 Minor stuff:

  • exclude_list.py: using assert for data validation isn't encouraged. They can be disabled with the optimize flag. Moreover, incorrect input data aren't truly exceptional events. Can you raise ValueError, TypeError, or KeyError as appropriate to communicate bad exclude file contents?
  • postprocess-exclude.yml: How do we capture why an identifier is skipped? I know we have the postprocessor issue where some comments are guarded but there could also be other reasons to skip a function. Along the same lines, how do I capture that we should skip an identifier for transform X but apply transform Y? (I'm fine landing this without support for multiple transforms on the assumption that it is something we can easily add later.)
  • Could you document the exclude file syntax in the README? Doesn't have to be fancy or perfect; could be a simple example plus some limitations (can I skip a whole file, can't use globs to skip a group of identifiers.).

Again, looks great, most important is to not use asserts for data validation IMO. Also happy that this functionality surfaced problems with --emit-c-decl-map which we need to get fixed.

Copy link
Contributor Author

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • exclude_list.py: using assert for data validation isn't encouraged. They can be disabled with the optimize flag. Moreover, incorrect input data aren't truly exceptional events. Can you raise ValueError, TypeError, or KeyError as appropriate to communicate bad exclude file contents?

Oh, whoops. I didn't realize Python was like that and you could disable asserts. I'll fix that, but we also use asserts in a few other places that need to be checked, too.

  • postprocess-exclude.yml: How do we capture why an identifier is skipped? I know we have the postprocessor issue where some comments are guarded but there could also be other reasons to skip a function.

We don't do anything for why an identifier is skipped for now, but that is something we could probably add.

Along the same lines, how do I capture that we should skip an identifier for transform X but apply transform Y?

We might want to just have a different set of excludes for every transform. I'm not sure how interrelated they'll end up being, so we should probably wait until we have other transforms to decide how exactly we manage that. Shouldn't be too difficult whichever way we decide to handle that.

(I'm fine landing this without support for multiple transforms on the assumption that it is something we can easily add later.)

Yup, we should be able to add it later.

  • Could you document the exclude file syntax in the README? Doesn't have to be fancy or perfect; could be a simple example plus some limitations (can I skip a whole file, can't use globs to skip a group of identifiers.).

👍

@kkysen kkysen requested a review from thedataking January 12, 2026 22:35
@kkysen kkysen force-pushed the kkysen/postprocess-exclude branch from 67be1e4 to 0638a26 Compare January 12, 2026 22:37
@kkysen
Copy link
Contributor Author

kkysen commented Jan 12, 2026

@thedataking, I fixed the asserts introduced here, but not all of them in the codebase (we probably should, though, if we expect someone may turn off assertions, as they're used similarly elsewhere as well).

I added the README explanations as well.

I'll do the exclude file extensions later once we know more about what exactly we want for that.

Should I try to exclude --emit-c-decl-map for the projects it fails on, or wait until that's fixed first (simpler but would have to wait longer)?

@thedataking
Copy link
Contributor

Everything you wrote here #1541 (comment) sounds good. Can you unpack:

Should I try to exclude --emit-c-decl-map for the projects it fails on, or wait until that's fixed first (simpler but would have to wait longer)?

Couldn't we wrap up this PR and then add exclude files for additional projects in a follow up PR? If yes, I'd like to do that.

Copy link
Contributor Author

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything you wrote here #1541 (comment) sounds good. Can you unpack:

Should I try to exclude --emit-c-decl-map for the projects it fails on, or wait until that's fixed first (simpler but would have to wait longer)?

Couldn't we wrap up this PR and then add exclude files for additional projects in a follow up PR? If yes, I'd like to do that.

I added --emit-c-decl-map in templates.py in this PR so that after running the tests, you can run c2rust-postprocess manually on it. That's what's breaking in CI right now. I could

  1. Don't include --emit-c-decl-map in this PR, so to test yourself, you'd first have to add it there.
  2. Add support to conditionally add --emit-c-decl-map and only enable it on the projects that work at the moment.
  3. Fix the --emit-c-decl-map bug.

@kkysen kkysen force-pushed the kkysen/postprocess-exclude branch 2 times, most recently from 94af50b to 64b51fd Compare January 13, 2026 22:03
@kkysen
Copy link
Contributor Author

kkysen commented Jan 13, 2026

I removed 5946368 for now until #1545 is fixed so I can merge it now.

`--ident-filter` takes a single regex,
while `--exclude-file` is a YAML file
containing file paths and identifiers of functions to skip.
This is more precise and scalable.
We can use this as a general solution to
skipping functions we can't handle yet so that we can scale up testing first.
`--ident-filter` is still kept since it's easy to use one-off,
while `--exclude-file` is better for more comprehensive tests.
Also, check-in the LLM cache for running `c2rust-postprocess` on `json-c`.

This doesn't yet set up automated testing in CI,
but we're now able to run `c2rust-postprocess` on `json-c` manually
after running the existing `tests/integration` tests
and adding `--emit-c-decl-map` to `TRANSPILE_SH` in `templates.py`.
…e `assert`s can be disabled

`assert`s can be disabled in Python at runtime,
so replace them with actual `if` checks raising errors.
This does so for `isinstance` checks.
@kkysen kkysen force-pushed the kkysen/postprocess-exclude branch from 64b51fd to 53c18d1 Compare January 14, 2026 21:03
Copy link
Contributor Author

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed 5946368 for now until #1545 is fixed so I can merge it now.

#1545 is fixed now by #1546, so nevermind, but 5946368 is included in #1546.

@kkysen kkysen merged commit b648f56 into master Jan 14, 2026
11 checks passed
@kkysen kkysen deleted the kkysen/postprocess-exclude branch January 14, 2026 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants