-
Notifications
You must be signed in to change notification settings - Fork 294
postprocess: add --exclude-file for skipping functions
#1541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
thedataking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should drop the functionality provided by --ident-filter. I would like to retain the ability to filter on the command line without having to provide a map file. I like to just translate a single function over and over for development; I'm too lazy to write a file for that.
Please make it such that the internal exclude list can be built from a file or a command line argument? I would just like to pass a flag that lets me restrict postprocessing to a given identifier (or, ideally, set of related identifiers which is why I used a regex).
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we should drop the functionality provided by
--ident-filter. I would like to retain the ability to filter on the command line without having to provide a map file. I like to just translate a single function over and over for development; I'm too lazy to write a file for that.
Okay, I can add that back. Is it important that it's still a regex?
Please make it such that the internal exclude list can be built from a file or a command line argument? I would just like to pass a flag that lets me restrict postprocessing to a given identifier (or, ideally, set of related identifiers which is why I used a regex).
Is this also needed if there's also --ident-filter?
It is nice to be able to filter on a glob pattern since related C functions share a prefix or suffix. Not sure we need a full regex. I think you should consider supporting glob patterns in the exclude files however since many related C files follow a pattern.... I'm assuming it is easy to do and useful but I'm open to being told otherwise.
No, I didn't mean to specify implementation. I shouldn't have phrased it that way. I would like something similar to what we had and I wasn't sure if you'd just wanna drop the commit so we preserve the old functionality or merge it with the new somehow. Please do what you think makes sense and we can continue the discussion from there. |
5674ba2 to
dc319fd
Compare
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thedataking, I added back --ident-filter now. Seems like the simplest solution rather than trying to combine them.
|
@fw-immunant, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall 🚀 Minor stuff:
exclude_list.py: usingassertfor data validation isn't encouraged. They can be disabled with the optimize flag. Moreover, incorrect input data aren't truly exceptional events. Can you raiseValueError,TypeError, orKeyErroras appropriate to communicate bad exclude file contents?postprocess-exclude.yml: How do we capture why an identifier is skipped? I know we have the postprocessor issue where some comments are guarded but there could also be other reasons to skip a function. Along the same lines, how do I capture that we should skip an identifier for transformXbut apply transformY? (I'm fine landing this without support for multiple transforms on the assumption that it is something we can easily add later.)- Could you document the exclude file syntax in the README? Doesn't have to be fancy or perfect; could be a simple example plus some limitations (can I skip a whole file, can't use globs to skip a group of identifiers.).
Again, looks great, most important is to not use asserts for data validation IMO. Also happy that this functionality surfaced problems with --emit-c-decl-map which we need to get fixed.
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exclude_list.py: usingassertfor data validation isn't encouraged. They can be disabled with the optimize flag. Moreover, incorrect input data aren't truly exceptional events. Can you raiseValueError,TypeError, orKeyErroras appropriate to communicate bad exclude file contents?
Oh, whoops. I didn't realize Python was like that and you could disable asserts. I'll fix that, but we also use asserts in a few other places that need to be checked, too.
postprocess-exclude.yml: How do we capture why an identifier is skipped? I know we have the postprocessor issue where some comments are guarded but there could also be other reasons to skip a function.
We don't do anything for why an identifier is skipped for now, but that is something we could probably add.
Along the same lines, how do I capture that we should skip an identifier for transform
Xbut apply transformY?
We might want to just have a different set of excludes for every transform. I'm not sure how interrelated they'll end up being, so we should probably wait until we have other transforms to decide how exactly we manage that. Shouldn't be too difficult whichever way we decide to handle that.
(I'm fine landing this without support for multiple transforms on the assumption that it is something we can easily add later.)
Yup, we should be able to add it later.
- Could you document the exclude file syntax in the README? Doesn't have to be fancy or perfect; could be a simple example plus some limitations (can I skip a whole file, can't use globs to skip a group of identifiers.).
👍
67be1e4 to
0638a26
Compare
|
@thedataking, I fixed the asserts introduced here, but not all of them in the codebase (we probably should, though, if we expect someone may turn off assertions, as they're used similarly elsewhere as well). I added the README explanations as well. I'll do the exclude file extensions later once we know more about what exactly we want for that. Should I try to exclude |
|
Everything you wrote here #1541 (comment) sounds good. Can you unpack:
Couldn't we wrap up this PR and then add exclude files for additional projects in a follow up PR? If yes, I'd like to do that. |
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything you wrote here #1541 (comment) sounds good. Can you unpack:
Should I try to exclude
--emit-c-decl-mapfor the projects it fails on, or wait until that's fixed first (simpler but would have to wait longer)?Couldn't we wrap up this PR and then add exclude files for additional projects in a follow up PR? If yes, I'd like to do that.
I added --emit-c-decl-map in templates.py in this PR so that after running the tests, you can run c2rust-postprocess manually on it. That's what's breaking in CI right now. I could
- Don't include
--emit-c-decl-mapin this PR, so to test yourself, you'd first have to add it there. - Add support to conditionally add
--emit-c-decl-mapand only enable it on the projects that work at the moment. - Fix the
--emit-c-decl-mapbug.
94af50b to
64b51fd
Compare
`--ident-filter` takes a single regex, while `--exclude-file` is a YAML file containing file paths and identifiers of functions to skip. This is more precise and scalable. We can use this as a general solution to skipping functions we can't handle yet so that we can scale up testing first. `--ident-filter` is still kept since it's easy to use one-off, while `--exclude-file` is better for more comprehensive tests.
Also, check-in the LLM cache for running `c2rust-postprocess` on `json-c`. This doesn't yet set up automated testing in CI, but we're now able to run `c2rust-postprocess` on `json-c` manually after running the existing `tests/integration` tests and adding `--emit-c-decl-map` to `TRANSPILE_SH` in `templates.py`.
…e `assert`s can be disabled `assert`s can be disabled in Python at runtime, so replace them with actual `if` checks raising errors. This does so for `isinstance` checks.
…ughly in the README
64b51fd to
53c18d1
Compare
kkysen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general-purpose solution to bypassing functions the postprocessor's comment transferer (and future transforms, too) can't handle yet, this adds an exclude list to skip functions in files. There is also
--ident-filter, which is a single regex just based on the function name, but this is more specific and also takes into account the file name. The file passed to--exclude-fileis a YAML file containing file paths (relative to the exclude file's directory) and the function names within them to skip. This gets us around issues like conditionally-compiled comments (#1534), the first function in a file also including license comments and#includes, and anything else. I've run this onjson-cso farand checked-in the LLM cache for it, but setting up the testing infrastructure to run this automatically is still a WIP (should be the next PR). Funnily enough, the LLM cache is going to end up behaving similarly to snapshot tests once it's up and running.