Skip to content

feat: Refine JS/TS binding entity classification for const/function/generator#21

Open
MattCozendey wants to merge 1 commit intoAtaraxy-Labs:mainfrom
MattCozendey:feat/generators-and-constants
Open

feat: Refine JS/TS binding entity classification for const/function/generator#21
MattCozendey wants to merge 1 commit intoAtaraxy-Labs:mainfrom
MattCozendey:feat/generators-and-constants

Conversation

@MattCozendey
Copy link
Contributor

JS/TS binding declarations were not being classified with the right level of specificity.

Before this change:

  • plain const bindings in JS/TS were flattened to variable
  • function-valued bindings were only partially refined, and the behavior was not consistent across the TypeScript parser and the Rust core
  • generator-valued bindings were not surfaced as a distinct entity type
  • review risk detection only treated function and method deletions as function-like removals

After this change:

  • const foo = "abc" is classified as constant
  • let foo = "abc" and var foo = "abc" stay variable
  • const foo = function () {} is classified as function
  • const foo = () => {} is classified as function
  • let foo = function* () {} is classified as generator
  • function-like classification wins over declaration kind, so const does not mask a more specific function/generator entity type
  • deleted generator entities are included in review risk signals the same way deleted functions already are

Implementation notes:

  • added a small declaration classifier in the TS parser that first checks whether a binding's assigned value is function-like and only falls back to declaration kind (const vs let/var) when it is not
  • mirrored the same logic in sem-core so the Rust path and TS path stay behaviorally aligned
  • kept local non-function bindings inside function bodies suppressed, so this does not widen entity extraction noise
  • added focused parser tests for both .ts and .js inputs covering constant, variable, function expression, arrow function, and generator cases

Repro before:

  1. Create a file such as bindings.ts with:

    const constantValue = "abc"; let mutableValue = "def"; var legacyValue = "ghi";

    const functionExpr = function () { return 1; }; const arrowFn = () => 2; let generatorExpr = function* () { yield 3; };

  2. Run the parser / semantic diff flow against that file.

  3. Observe that:

    • constantValue is reported as variable instead of constant
    • function-valued declarations are not consistently classified between parser implementations
    • generator-valued declarations are not surfaced as generator

Expected behavior after this change:

  • constantValue => constant
  • mutableValue => variable
  • legacyValue => variable
  • functionExpr => function
  • arrowFn => function
  • generatorExpr => generator

Verification:

  • added Vitest coverage for JS/TS binding classification
  • added sem-core unit coverage for the same cases
  • ran cargo test -p sem-core

JS/TS binding declarations were not being classified with the right level
of specificity.

Before this change:
- plain `const` bindings in JS/TS were flattened to `variable`
- function-valued bindings were only partially refined, and the behavior
  was not consistent across the TypeScript parser and the Rust core
- generator-valued bindings were not surfaced as a distinct entity type
- review risk detection only treated `function` and `method` deletions as
  function-like removals

After this change:
- `const foo = "abc"` is classified as `constant`
- `let foo = "abc"` and `var foo = "abc"` stay `variable`
- `const foo = function () {}` is classified as `function`
- `const foo = () => {}` is classified as `function`
- `let foo = function* () {}` is classified as `generator`
- function-like classification wins over declaration kind, so `const`
  does not mask a more specific function/generator entity type
- deleted `generator` entities are included in review risk signals the
  same way deleted functions already are

Implementation notes:
- added a small declaration classifier in the TS parser that first checks
  whether a binding's assigned value is function-like and only falls back
  to declaration kind (`const` vs `let`/`var`) when it is not
- mirrored the same logic in `sem-core` so the Rust path and TS path stay
  behaviorally aligned
- kept local non-function bindings inside function bodies suppressed, so
  this does not widen entity extraction noise
- added focused parser tests for both `.ts` and `.js` inputs covering
  constant, variable, function expression, arrow function, and generator
  cases

Repro before:
1. Create a file such as `bindings.ts` with:

   const constantValue = "abc";
   let mutableValue = "def";
   var legacyValue = "ghi";

   const functionExpr = function () { return 1; };
   const arrowFn = () => 2;
   let generatorExpr = function* () { yield 3; };

2. Run the parser / semantic diff flow against that file.
3. Observe that:
   - `constantValue` is reported as `variable` instead of `constant`
   - function-valued declarations are not consistently classified between
     parser implementations
   - generator-valued declarations are not surfaced as `generator`

Expected behavior after this change:
- `constantValue` => `constant`
- `mutableValue` => `variable`
- `legacyValue` => `variable`
- `functionExpr` => `function`
- `arrowFn` => `function`
- `generatorExpr` => `generator`

Verification:
- added Vitest coverage for JS/TS binding classification
- added `sem-core` unit coverage for the same cases
- ran `cargo test -p sem-core`
@MattCozendey MattCozendey force-pushed the feat/generators-and-constants branch from d462d0d to 5ddbb19 Compare March 9, 2026 12:19
@rs545837
Copy link
Member

rs545837 commented Mar 9, 2026

Thanks @Palanikannan1437 will check the JS/TS scenario and will revert back.

Copy link
Contributor

@Palanikannan1437 Palanikannan1437 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @MattCozendey for the PR, here's some comments/questions/nitpicks....

if ((node.type === 'lexical_declaration' || node.type === 'variable_declaration') && isFunctionLikeDeclaration(node)) {
return 'function';
if (node.type === 'lexical_declaration') {
return getFunctionLikeDeclarationType(node) ?? (/^\s*const\b/.test(node.text) ? 'constant' : 'variable');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw buildEntityId() bakes entityType into the ID, and the fuzzy matcher in identity.ts:109 requires same type too — so anything that flips from variable to constant/function/generator will show up as delete+add instead of modified. probably fine but worth knowing

}
}

fn get_function_like_declaration_type<'a>(node: Node, source: &'a [u8]) -> Option<&'a str> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens with const a = 1, f = () => {};? this returns on the first matching declarator so the classification depends on ordering. pre-existing thing since we only emit one entity per declaration anyway but maybe worth a test to pin the behavior

expect(appEntity?.entityType).toBe('function');
});

it('classifies bindings by declaration kind and assigned function shape in JS and TS', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add async variants too? const f = async () => {} and const g = async function*() {} — just to make sure the stripping works e2e


const deletedFunctions = result.changes.filter(
c => c.changeType === 'deleted' && (c.entityType === 'function' || c.entityType === 'method')
c => c.changeType === 'deleted' && (c.entityType === 'function' || c.entityType === 'method' || c.entityType === 'generator')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heads up — anyone with .semrc rules matching entityType: variable will quietly stop matching const declarations now. maybe a release note thing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants