feat: Refine JS/TS binding entity classification for const/function/generator#21
feat: Refine JS/TS binding entity classification for const/function/generator#21MattCozendey wants to merge 1 commit intoAtaraxy-Labs:mainfrom
Conversation
JS/TS binding declarations were not being classified with the right level
of specificity.
Before this change:
- plain `const` bindings in JS/TS were flattened to `variable`
- function-valued bindings were only partially refined, and the behavior
was not consistent across the TypeScript parser and the Rust core
- generator-valued bindings were not surfaced as a distinct entity type
- review risk detection only treated `function` and `method` deletions as
function-like removals
After this change:
- `const foo = "abc"` is classified as `constant`
- `let foo = "abc"` and `var foo = "abc"` stay `variable`
- `const foo = function () {}` is classified as `function`
- `const foo = () => {}` is classified as `function`
- `let foo = function* () {}` is classified as `generator`
- function-like classification wins over declaration kind, so `const`
does not mask a more specific function/generator entity type
- deleted `generator` entities are included in review risk signals the
same way deleted functions already are
Implementation notes:
- added a small declaration classifier in the TS parser that first checks
whether a binding's assigned value is function-like and only falls back
to declaration kind (`const` vs `let`/`var`) when it is not
- mirrored the same logic in `sem-core` so the Rust path and TS path stay
behaviorally aligned
- kept local non-function bindings inside function bodies suppressed, so
this does not widen entity extraction noise
- added focused parser tests for both `.ts` and `.js` inputs covering
constant, variable, function expression, arrow function, and generator
cases
Repro before:
1. Create a file such as `bindings.ts` with:
const constantValue = "abc";
let mutableValue = "def";
var legacyValue = "ghi";
const functionExpr = function () { return 1; };
const arrowFn = () => 2;
let generatorExpr = function* () { yield 3; };
2. Run the parser / semantic diff flow against that file.
3. Observe that:
- `constantValue` is reported as `variable` instead of `constant`
- function-valued declarations are not consistently classified between
parser implementations
- generator-valued declarations are not surfaced as `generator`
Expected behavior after this change:
- `constantValue` => `constant`
- `mutableValue` => `variable`
- `legacyValue` => `variable`
- `functionExpr` => `function`
- `arrowFn` => `function`
- `generatorExpr` => `generator`
Verification:
- added Vitest coverage for JS/TS binding classification
- added `sem-core` unit coverage for the same cases
- ran `cargo test -p sem-core`
d462d0d to
5ddbb19
Compare
|
Thanks @Palanikannan1437 will check the JS/TS scenario and will revert back. |
There was a problem hiding this comment.
Thanks so much @MattCozendey for the PR, here's some comments/questions/nitpicks....
| if ((node.type === 'lexical_declaration' || node.type === 'variable_declaration') && isFunctionLikeDeclaration(node)) { | ||
| return 'function'; | ||
| if (node.type === 'lexical_declaration') { | ||
| return getFunctionLikeDeclarationType(node) ?? (/^\s*const\b/.test(node.text) ? 'constant' : 'variable'); |
There was a problem hiding this comment.
btw buildEntityId() bakes entityType into the ID, and the fuzzy matcher in identity.ts:109 requires same type too — so anything that flips from variable to constant/function/generator will show up as delete+add instead of modified. probably fine but worth knowing
| } | ||
| } | ||
|
|
||
| fn get_function_like_declaration_type<'a>(node: Node, source: &'a [u8]) -> Option<&'a str> { |
There was a problem hiding this comment.
what happens with const a = 1, f = () => {};? this returns on the first matching declarator so the classification depends on ordering. pre-existing thing since we only emit one entity per declaration anyway but maybe worth a test to pin the behavior
| expect(appEntity?.entityType).toBe('function'); | ||
| }); | ||
|
|
||
| it('classifies bindings by declaration kind and assigned function shape in JS and TS', () => { |
There was a problem hiding this comment.
could we add async variants too? const f = async () => {} and const g = async function*() {} — just to make sure the stripping works e2e
|
|
||
| const deletedFunctions = result.changes.filter( | ||
| c => c.changeType === 'deleted' && (c.entityType === 'function' || c.entityType === 'method') | ||
| c => c.changeType === 'deleted' && (c.entityType === 'function' || c.entityType === 'method' || c.entityType === 'generator') |
There was a problem hiding this comment.
heads up — anyone with .semrc rules matching entityType: variable will quietly stop matching const declarations now. maybe a release note thing
JS/TS binding declarations were not being classified with the right level of specificity.
Before this change:
constbindings in JS/TS were flattened tovariablefunctionandmethoddeletions as function-like removalsAfter this change:
const foo = "abc"is classified asconstantlet foo = "abc"andvar foo = "abc"stayvariableconst foo = function () {}is classified asfunctionconst foo = () => {}is classified asfunctionlet foo = function* () {}is classified asgeneratorconstdoes not mask a more specific function/generator entity typegeneratorentities are included in review risk signals the same way deleted functions already areImplementation notes:
constvslet/var) when it is notsem-coreso the Rust path and TS path stay behaviorally aligned.tsand.jsinputs covering constant, variable, function expression, arrow function, and generator casesRepro before:
Create a file such as
bindings.tswith:const constantValue = "abc"; let mutableValue = "def"; var legacyValue = "ghi";
const functionExpr = function () { return 1; }; const arrowFn = () => 2; let generatorExpr = function* () { yield 3; };
Run the parser / semantic diff flow against that file.
Observe that:
constantValueis reported asvariableinstead ofconstantgeneratorExpected behavior after this change:
constantValue=>constantmutableValue=>variablelegacyValue=>variablefunctionExpr=>functionarrowFn=>functiongeneratorExpr=>generatorVerification:
sem-coreunit coverage for the same casescargo test -p sem-core