Skip to content

Feature/sem review#19

Open
antiartificial wants to merge 10 commits intoAtaraxy-Labs:mainfrom
antiartificial:feature/sem-review
Open

Feature/sem review#19
antiartificial wants to merge 10 commits intoAtaraxy-Labs:mainfrom
antiartificial:feature/sem-review

Conversation

@antiartificial
Copy link

@antiartificial antiartificial commented Mar 8, 2026

Adds a few new features:

sem review and sem log work with historical state whereas sem graph and sem impact work with current state. Signature change detection (new capability, powers the [signature changed] / [body only] labels in review and changelog).

  • sem log full history of any function (entity), following it through renames and moves across commits
  • sem changelog auto-categorized release notes from a commit range with a suggested semver bump
  • sem review groups changes by impact (API surface, internal, config) with dependency counts and a risk assessment

Other items in this PR:

  • This includes a slight fix to the version string handling, previously this was hardcoded and could become stale, now reflects what's in the Cargo.toml.
  • Test coverage for all three new modules (19 new tests)
  • README updated with documentation for the new commands

Example usage/output

sem review --from HEAD~2 --to HEAD # last two commits

  ┌─ Internal Changes ──────────────────────────────────
  │  ∆ function   getSupportedMimeType      [modified]
  │  ∆ variable   types                     [modified]
  └───────────────────────────────────────────────────────

  Summary: 2 internal
  Risk: low (internal/config changes only, no public API impact)

Example usages:
sem log --entity authenticateUser # finds all references to entity (ex. a function)
sem log --entity login --file src/auth.ts # constrains search to specific file

Example output:

>sem log --entity getSupportedMimeType                                                                                                              (base)

  prism-web/src/lib/audio.ts :: function :: getSupportedMimeType

  349f65f  2026-02-18  Aaron Barton  [modified]       Fix Brave misdetected as Safari: prefer webm over mp4 MIME type
  45b819f  2026-02-06  antiartifici  [added]          Add Prism: voice-driven Socratic interview simulator

Example usages:
sem changelog --from v1.2.0 --to HEAD # ex changes spanning a tag range (or commits)
sem changelog --from v1.0.0 --to v2.0.0 --format markdown

Example output:

sem changelog --from HEAD~1 --to HEAD                                                                                                              (base)

## Unreleased — 2026-03-08

### Internal
  · `getSupportedMimeType` — function modified in prism-web/src/lib/audio.ts
  · `types` — variable modified in prism-web/src/lib/audio.ts

Suggested version bump: PATCH (no new public API, no breaking changes)

  Groups entity-level changes into API surface (cross-file dependents),
  internal (body-only, deletions, no cross-file refs), and config/data
  categories. Includes approximate dependent counts from the entity graph
  and a risk heuristic (low/medium/high). Supports terminal and JSON output.
Classifies entity-level changes into Keep-a-Changelog categories
(Breaking, Added, Changed, Removed, Internal) with semver bump
suggestion. Includes Conventional Commits parsing from commit
messages and supports terminal, markdown, and JSON output formats.
Also adds get_log_range() to GitBridge for commit message access.
Re-parses before/after entity content with tree-sitter to extract
parameter lists and classify changes as breaking (params removed or
reordered) vs non-breaking (params added, body-only). Supports
TypeScript, JavaScript, Python, Rust, Go, Java, C, C++, and more.
Integrated into `sem changelog` for automatic breaking change detection.
Add README sections for entity history tracking, semantic review,
and changelog generation. Update language table with Swift, Elixir,
Bash, and Vue. Document sem diff file comparison mode.
19 new tests covering risk assessment, config classification,
entity resolution, changelog generation, and modification
classification. All 83 tests pass.
@antiartificial antiartificial marked this pull request as ready for review March 8, 2026 23:31
@rs545837
Copy link
Member

rs545837 commented Mar 9, 2026

Thanks I will take a look at this asap.

@rs545837
Copy link
Member

rs545837 commented Mar 9, 2026

Replied on the discussion with more context on how this relates to inspect and the broader roadmap. The sem log and sem changelog modules look good and fill real gaps in the CLI. sem review overlaps significantly with inspect which does entity-level triage with graph-based risk scoring and LLM integration.

Going to review the code in detail. Main things I want to check: how the signature detection interacts with sem-core's existing structural_hash (which already distinguishes cosmetic vs structural changes), and whether the log module's git history walking could live in sem-core as a library function so inspect and weave can use it too.

Palanikannan1437

This comment was marked as off-topic.

Copy link
Contributor

@Palanikannan1437 Palanikannan1437 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great feature set — review, changelog, and log are solid additions. Left some inline comments. The two I'd want addressed before merge are the signature breaking classification (return type changes + language-aware param handling) and the entity name collision in log history tracking. The rest are suggestions for follow-ups.

Also we could discuss more on what should stay here vs in inspect as @rs545837 mentioned above

pub fn is_breaking(&self) -> bool {
matches!(
self,
SignatureChangeKind::ParamsRemoved { .. }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ReturnTypeChanged should be in here too — changing a return type will break every caller that depends on the old type. Right now it silently gets classified as non-breaking, which means changelog/semver will miss real breakage.

pub fn label(&self) -> &str {
match self {
SignatureChangeKind::BodyOnly => "body only",
SignatureChangeKind::ParamsAdded { .. } => "parameter added",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only non-breaking in languages with optional/default params (JS/TS/Python). In Rust, Go, Java, C — adding a required param is absolutely breaking. Since we don't actually check for defaults here (only compare names), I'd either treat this as breaking by default, or at least label it "signature changed" instead of claiming it's safe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great callout as I've been testing this feature with Go primarily.

pub fn get_log(&self, limit: usize) -> Result<Vec<CommitInfo>, GitError> {
/// Walk commits lazily from `to` (or HEAD) backward, stopping at `from`.
/// The callback returns `true` to continue or `false` to stop early.
pub fn for_each_commit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice API — the lazy callback with early-stop is a good design for entity log. One thing to consider for later: the callback receives CommitInfo with sha as a String, but callers like build_entity_log immediately re-resolve that SHA back into trees via resolve_tree. Passing richer data (or at least the Oid + parent) would avoid that round-trip.


// 3. Resolve both trees once per commit, then read blobs.
let after_tree = git.resolve_tree(&commit.sha).ok();
let before_tree = git.resolve_tree(&format!("{}~1", commit.sha)).ok();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If commit.sha is the very first (root) commit, sha~1 won't resolve and this will silently skip the "before" content. Might be worth handling that case explicitly so we don't miss the initial creation of an entity.


// 5. Find the change that matches our tracked entity.
for change in &diff.changes {
if change.entity_name != tracked_name {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching by entity_name alone could misattribute history when different files have entities with the same name (e.g., login in auth.ts and login in admin.ts). Worth also checking change.file_path == tracked_file here.

}
};

let file_changes = if let Some(ref sha) = opts.commit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire if/else-if chain for resolving file changes is duplicated almost verbatim in changelog.rs (and partially in diff.rs). Would be nice to extract a small commands/common.rs helper like resolve_file_changes(git, commit, from, to, staged, file_exts) — keeps each command focused on its own logic.

@@ -0,0 +1,53 @@
/// Format a Unix timestamp (seconds since epoch) as YYYY-MM-DD.
pub fn format_unix_date(unix_seconds: i64) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: hand-rolled calendar math always makes me a bit nervous (leap year edges, negative timestamps, etc.). If you ever pull in time or chrono, this would be a good candidate to replace. Works fine as-is though.

format: String,

/// Follow renames (default: true)
#[arg(long, default_value_t = true)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up: default_value_t = true means --follow is always on and there's no way to turn it off from CLI. Clap won't parse --follow false. You'd want either a --no-follow flag, or use #[arg(long, action = ArgAction::Set, default_value_t = true)].

fn assess_risk(
api_surface: &[ReviewChange],
internal: &[ReviewChange],
_config: &[ReviewChange],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious — _config is unused in the risk assessment. Feels like config changes with high impact (e.g., changing a database URL or feature flag) could warrant at least a mention in the risk reason. Not blocking, just something to think about.

@antiartificial
Copy link
Author

I'll have more time to digest and reflect this evening, thank you for your eyes and input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants