Skip to content

refactor(database): split LoadMetadataFor into orchestration + helpers#170

Open
dpage wants to merge 2 commits into
mainfrom
fix/issue-153-load-metadata-refactor
Open

refactor(database): split LoadMetadataFor into orchestration + helpers#170
dpage wants to merge 2 commits into
mainfrom
fix/issue-153-load-metadata-refactor

Conversation

@dpage

@dpage dpage commented May 27, 2026

Copy link
Copy Markdown
Member

Summary

  • Extracts the metadata loader SQL into internal/database/load_metadata.sql (embedded with //go:embed) so it can be reviewed as SQL on its own.
  • Introduces scanMetadataRow and buildTableInfo in internal/database/metadata.go. buildTableInfo is pure (no I/O, no time, no shared state) and produces the metadata map, schema set, and column count from a slice of scanned rows.
  • Reduces LoadMetadataFor in internal/database/connection.go to orchestration: query, scan loop, build, atomic swap, log. The file shrinks from 687 to 491 lines.
  • Adds 10 table-driven unit tests for buildTableInfo covering the zero-column-table case (issue Metadata loader fails with NULL column_name when database contains a table with no columns #126 regression), grouping, multiple schemas, vector columns (with and without dimensions), and full column-attribute preservation. Tests do not require a database.

No behavior change. The existing live-DB regression test TestLoadMetadata_TableWithNoColumns (PR #144) still passes.

Test plan

  • make build
  • make test — all existing tests still pass; 10 new unit tests for buildTableInfo pass.
  • golangci-lint run ./internal/database/... — 0 issues. (make lint still reports 3 pre-existing staticcheck findings in internal/embedding/, untouched by this PR.)
  • gofmt -l internal/database/ — clean.
  • go vet ./internal/database/... — clean.
  • go test -race ./internal/database/... — passes.

Closes #153

Summary by CodeRabbit

  • Refactor

    • Improved internal architecture of metadata loading by modularizing query execution and data transformation logic into dedicated helper functions.
    • Extracted metadata query into a dedicated external SQL file for better maintainability.
  • Tests

    • Added comprehensive unit test suite for metadata building with table-driven tests covering edge cases, multi-schema scenarios, and vector column detection.
  • Documentation

    • Updated changelog to document refactoring effort (no behavior changes).

Review Change Stack

The CTE-based metadata query and its 70-line row-scan loop lived in
a single ~260-line function, making it hard to test the
transformation in isolation and hard to spot defects in the scan
loop (cf. #126 / PR #144).

This commit splits the function along three seams:

- `internal/database/load_metadata.sql` holds the SQL, embedded via
  `//go:embed`, so it can be diffed and reviewed as SQL.
- `scanMetadataRow` in `internal/database/metadata.go` takes one
  `pgx.Rows` and returns a typed `metadataRow`. Nullable columns
  use `sql.NullString` / `sql.NullInt32`, preserving the
  zero-column-table tolerance from #126.
- `buildTableInfo` in the same file is a pure function that takes
  `[]metadataRow` and produces the metadata map, schema set, and
  column count. It does no I/O and is covered by table-driven
  unit tests that do not need a database.

`LoadMetadataFor` now orchestrates: query, scan loop, build,
atomic swap, log. No behavior change.

Closes #153
@codacy-production

codacy-production Bot commented May 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 4 medium

Results:
4 new issues

Category Results
Compatibility 1 medium (1 false positive)
BestPractice 3 medium (3 false positives)

View in Codacy

🟢 Metrics 51 complexity · 0 duplication

Metric Results
Complexity 51
Duplication 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 10296a00-17d9-4c21-8312-7c28ea35ff5f

📥 Commits

Reviewing files that changed from the base of the PR and between ecfd640 and 400d938.

📒 Files selected for processing (2)
  • internal/database/metadata.go
  • internal/database/metadata_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/database/metadata_test.go
  • internal/database/metadata.go

Walkthrough

This PR refactors Client.LoadMetadataFor by embedding the metadata SQL, adding scanMetadataRow to produce typed rows, extracting a pure buildTableInfo transformer that groups rows into TableInfo per schema.table, and adding table-driven tests for buildTableInfo; LoadMetadataFor now orchestrates these parts.

Changes

Metadata Loader Refactor

Layer / File(s) Summary
SQL query and row capture
internal/database/load_metadata.sql, internal/database/metadata.go
Embed multi-CTE load_metadata.sql; add metadataRow and scanMetadataRow to capture query results in SELECT-list order.
Pure metadata transformation (buildTableInfo)
internal/database/metadata.go
Add vectorColumnInfo, columnInfoFromRow, tableInfoFromRow, and buildTableInfo to group rows by schema.table, detect/parse pgvector dims, skip empty column names, and return metadata map, schema set, and column count.
LoadMetadataFor integration and imports cleanup
internal/database/connection.go
Refactor LoadMetadataFor to run loadMetadataSQL, accumulate metadataRow via scanMetadataRow, call buildTableInfo, atomically swap conn.Metadata, set MetadataLoadedAt, and log; remove now-unused stdlib imports.
Table-driven tests for buildTableInfo
internal/database/metadata_test.go
Add tests covering empty input, table-with-no-columns, grouping, multi-schema, pgvector detection (with/without dimensions), non-vector types, table-level attributes selection, full column attribute preservation, and empty-column-name behavior; includes helpers.
Changelog entry
docs/changelog.md
Document the refactor: embedded SQL, split scan/build, added tests for pure transform; state no behavior change and reference #153.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs:

Suggested reviewers:

  • rshoemaker
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: refactoring LoadMetadataFor by splitting it into orchestration and helper functions.
Linked Issues check ✅ Passed All objectives from issue #153 are met: SQL extracted to load_metadata.sql, scanMetadataRow introduced, pure buildTableInfo function created, LoadMetadataFor reduced to orchestration, and comprehensive table-driven tests added.
Out of Scope Changes check ✅ Passed All changes are directly aligned with #153's decomposition objectives; no unrelated modifications or scope creep detected across the five modified files.
Docstring Coverage ✅ Passed Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/issue-153-load-metadata-refactor

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/database/load_metadata.sql`:
- Around line 73-88: The fk_columns CTE can emit multiple rows per source column
when a column participates in multiple foreign-key pairs, causing duplicate
ColumnInfo downstream; update the fk_columns CTE to collapse those duplicates by
grouping on schema_name, table_name, column_name and either aggregating
fk_reference (e.g., string_agg(DISTINCT fk_reference, ',') AS fk_reference) or
using DISTINCT ON (n.nspname, c.relname, a.attname) to pick one reference, so
the later join by (schema, table, column) returns a single row per source
column.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9330ba79-5c4a-4f1f-8319-402bc2c715ee

📥 Commits

Reviewing files that changed from the base of the PR and between 26cbb2e and ecfd640.

📒 Files selected for processing (5)
  • docs/changelog.md
  • internal/database/connection.go
  • internal/database/load_metadata.sql
  • internal/database/metadata.go
  • internal/database/metadata_test.go

Comment thread internal/database/load_metadata.sql
…exity

Codacy's Lizard flagged buildTableInfo at cyclomatic complexity 9
(limit 8) and TestBuildTableInfo_PreservesAllColumnAttributes at 10
(limit 8). Split the vector-detection branch and the ColumnInfo and
TableInfo construction into their own helper functions
(vectorColumnInfo, columnInfoFromRow, tableInfoFromRow), bringing
buildTableInfo's body down to a single short loop. Consolidate the
test to a single struct comparison.

No behavior change. All existing tests still pass.
@dpage

dpage commented May 27, 2026

Copy link
Copy Markdown
Member Author

Codacy findings — summary of what I'm addressing

Codacy's PR check is green ("Up to standards"), but the report lists 36 new findings. Here's how each category breaks down:

Addressed in code

  • Complexity (2) — Lizard flagged buildTableInfo at cc=9 and TestBuildTableInfo_PreservesAllColumnAttributes at cc=10 (limit 8). Both are my own code from this PR. Fixed in 400d938 by extracting vectorColumnInfo / columnInfoFromRow / tableInfoFromRow helpers and collapsing the test's per-field assertions into one struct comparison.

Not addressed — pre-existing SQL, out of scope per issue #153

The remaining 34 findings all sit on internal/database/load_metadata.sql. Codacy sees that file as "new" because it didn't exist on main, but its contents are moved verbatim from the inline query string that lived in LoadMetadataForgit log -p confirms the SQL itself didn't change. Issue #153 explicitly scopes these out: "Out of scope: Changing the SQL itself or the metadata schema."

  • TSQLLint (5 BestPractice / Compatibility)SET QUOTED_IDENTIFIER, SET ANSI_NULLS, SET NOCOUNT, SET TRANSACTION ISOLATION LEVEL. All T-SQL (SQL Server) directives applied to PostgreSQL; Codacy itself flags all five as false positives at 95% probability.
  • SQLFluff RF02 (1 ErrorProne) — "Unqualified reference inhrelid". This is inside a subquery on pg_inherits where the reference is unambiguous; Codacy flags it as a false positive at 80%.
  • SQLFluff CodeStyle (28) — table aliasing without AS, layout/indent rules, etc. Stylistic; the SQL was reviewed and merged as-is, and rewriting it would conflict with #153's explicit out-of-scope rule.

If we want to address the SQL-style findings now that the query is in its own file (which would be more justifiable than it was when it was a Go string literal), happy to do that in a follow-up PR — let me know.

CodeRabbit FK-duplicate-rows thread

Left unresolved with a reply — same reasoning: the fk_columns CTE didn't change here and the deeper data-model question deserves a separate issue rather than a MIN() patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor: LoadMetadataFor is large and hard to test in isolation

1 participant