-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix digit escaping in OpenQASM 3 identifier #15305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The previous regex used `\w` as the match for an initial id character, and `[\w\d]` for continuations. Actually, `\w` matches any character that `str.isalnum` returns `True` plus the underscore, which includes all Unicode characters in the "digit" class, so is a superset of `\d`. The match `\d` is only the Unicode group `[Nd]` (decimal digits), which is not as complete as the entire `[N?]` set, so we can't even use variants of `[^\W\d_]` or the like to get something usable; we'd have to pull in the third-party `regex` package or the like to make an exact match. This instead causes the Qiskit exporter to allow only a subset of valid OQ3 identifiers on export in lieu of _complete_ matching. Additionally, we now do not accidentally include digits in the allowed set of initial characters for IDs (but do allow then in continuations).
|
One or more of the following people are relevant to this code:
|
Pull Request Test Coverage Report for Build 19320339788Details
💛 - Coveralls |
|
Would it be worth adding a test case for consecutive invalid characters? For example, if a register is named Just wanted to check if this edge case is worth covering, or if you think the existing tests are sufficient? |
|
Thanks Debasmita, that's a good idea. Done in 9745eab. |
46af300 to
240bf12
Compare
The previous regex used
\was the match for an initial id character, and[\w\d]for continuations. Actually,\wmatches any character thatstr.isalnumreturnsTrueplus the underscore, which includes all Unicode characters in the "digit" class, so is a superset of\d. The match\dis only the Unicode group[Nd](decimal digits), which is not as complete as the entire[N?]set, so we can't even use variants of[^\W\d_]or the like to get something usable; we'd have to pull in the third-partyregexpackage or the like to make an exact match.This instead causes the Qiskit exporter to allow only a subset of valid OQ3 identifiers on export in lieu of complete matching. Additionally, we now do not accidentally include digits in the allowed set of initial characters for IDs (but do allow then in continuations).
Summary
Details and comments
Fix #15303
Fix #15304