130 changes: 130 additions & 0 deletions SCENARIOS-oracle-completion.md
@@ -0,0 +1,130 @@
# Oracle Production Completion Scenarios

Goal: Oracle completion is production-ready for Bytebase when omni exposes parser-native completion signals that let Bytebase replace its current ANTLR/C3 `backend/plugin/parser/plsql/completion.go` implementation without losing SQL editor behavior.

Status legend: `[ ]` pending, `[x]` passing, `[~]` partial/deferred.

## Phase 0: Public Parser Completion API

- [x] Empty input at cursor 0 returns statement starter candidates.
- [x] Whitespace-only input returns statement starter candidates.
- [x] Cursor after semicolon returns statement starter candidates for a new statement.
- [x] Cursor inside a partially typed keyword backs up to token start and prefix-filters candidates.
- [x] `Tokenize` returns non-EOF tokens with stable byte `Loc` and `End`.
- [x] `TokenName` returns uppercase Oracle keyword text for keyword token types.
- [x] `Collect` never panics on incomplete SQL.
- [x] `CollectCompletion` returns candidates, prefix, scope, CTEs, and object intent.
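The "backs up to token start and prefix-filters candidates" scenario can be sketched roughly as follows. This is a minimal illustration, not omni's implementation: `prefixAt`, `filterByPrefix`, and the `starterKeywords` list are hypothetical names introduced here, and the real API works on lexer tokens rather than raw bytes.

```go
package main

import (
	"fmt"
	"strings"
)

// starterKeywords is a small illustrative subset, not omni's real candidate list.
var starterKeywords = []string{"SELECT", "INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER", "DROP"}

// isWordByte reports whether b may appear in an unquoted Oracle identifier or keyword.
func isWordByte(b byte) bool {
	return b == '_' || b == '$' || b == '#' ||
		(b >= '0' && b <= '9') || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
}

// prefixAt walks left from the cursor to the start of the word being typed
// and returns that start offset together with the typed prefix.
func prefixAt(sql string, cursor int) (start int, prefix string) {
	if cursor < 0 {
		cursor = 0
	}
	if cursor > len(sql) {
		cursor = len(sql)
	}
	start = cursor
	for start > 0 && isWordByte(sql[start-1]) {
		start--
	}
	return start, sql[start:cursor]
}

// filterByPrefix keeps the candidates that begin with prefix, ignoring case.
// An empty prefix keeps every candidate.
func filterByPrefix(candidates []string, prefix string) []string {
	var out []string
	upper := strings.ToUpper(prefix)
	for _, c := range candidates {
		if strings.HasPrefix(c, upper) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	start, prefix := prefixAt("SELECT 1; de", 12)
	fmt.Println(start, prefix, filterByPrefix(starterKeywords, prefix))
}
```

Collecting candidates at the word start rather than at the raw cursor means a half-typed keyword is never parsed as an identifier; the typed text is applied afterwards as a filter only.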

## Phase 1: Bytebase SELECT Object Completion

- [x] `SELECT * FROM |` emits table-reference intent.
- [x] `SELECT * FROM schema.|` emits table-reference intent qualified by schema.
- [x] `SELECT | FROM t` emits column-reference intent and table scope for `t`.
- [x] `SELECT DISTINCT | FROM t` emits expression/column-reference intent.
- [x] `SELECT a, | FROM t` emits expression/column-reference intent after target-list comma.
- [x] `SELECT t.| FROM t` emits column-reference intent qualified by table `t`.
- [x] `SELECT alias.| FROM t alias` resolves `alias` as a visible range reference.
- [x] `SELECT * FROM t, |` emits table-reference intent after comma-separated table sources.
- [x] `SELECT * FROM t JOIN |` emits table-reference intent.
- [x] `SELECT * FROM t LEFT/RIGHT/FULL OUTER/CROSS/NATURAL JOIN |` emits table-reference intent.
- [x] `SELECT * FROM t JOIN u ON |` emits column-reference intent with both tables visible.
- [x] `SELECT * FROM t JOIN u USING (|` emits column-reference intent with both tables visible.
- [x] `SELECT * FROM t WHERE |` emits column-reference intent.
- [x] `SELECT * FROM t WHERE a = 1 AND/OR |` emits expression/column-reference intent.
- [x] `SELECT * FROM t WHERE a + | > 0` emits expression/column-reference intent after operators.
- [x] `SELECT * FROM t WHERE a IN (|` emits expression/column-reference intent.
- [x] `SELECT * FROM t WHERE a BETWEEN | AND ...` and `BETWEEN ... AND |` emit expression/column-reference intent.
- [x] `SELECT * FROM t WHERE EXISTS (|` suggests `SELECT`.
- [x] `SELECT c FROM t GROUP BY |` emits column-reference intent.
- [x] `SELECT c FROM t GROUP BY c, |` emits column-reference intent.
- [x] `SELECT c FROM t GROUP BY c HAVING |` emits expression/column-reference intent.
- [x] `SELECT c AS alias FROM t ORDER BY |` emits column-reference intent, and the Bytebase adapter returns select aliases.
- [x] `SELECT c FROM t ORDER BY c, |` emits column-reference intent.
- [x] `SELECT * FROM t |` suggests clause starters such as `WHERE`, `JOIN`, `GROUP`, and `ORDER`.
- [x] `SELECT * FROM (SELECT c FROM t) x WHERE x.|` exposes virtual table `x`.
- [x] `SELECT | FROM (SELECT c FROM t) x` keeps nested source tables out of local scope.
- [x] `SELECT * FROM t WHERE EXISTS (SELECT | FROM u WHERE u.id = t.id)` records local and outer scope levels separately.
- [x] `SELECT c FROM t UNION SELECT | FROM u` scopes completion to the current set-operation arm.
- [x] `WITH x AS (SELECT * FROM t) SELECT * FROM |` exposes CTE `x` as a table-reference candidate.
- [x] `WITH x(c1, c2) AS (SELECT * FROM t) SELECT x.| FROM x` exposes explicit CTE columns.
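The alias and correlated-subquery scenarios above depend on keeping one list of visible range references per query level. A minimal sketch of such a structure follows; `RangeRef`, `Scope`, and `Resolve` are hypothetical names for illustration and are not taken from omni. The sketch also assumes unquoted, case-insensitive identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// RangeRef is a hypothetical stand-in for one table source in a FROM clause.
type RangeRef struct {
	Name  string // table, CTE, or derived-table name
	Alias string // optional alias; when set, it hides Name
}

// VisibleName returns the identifier a qualifier such as `x.` must match.
func (r RangeRef) VisibleName() string {
	if r.Alias != "" {
		return r.Alias
	}
	return r.Name
}

// Scope keeps one slice of range references per query level.
// Levels[0] is the query block containing the cursor; higher indexes
// are enclosing query blocks.
type Scope struct {
	Levels [][]RangeRef
}

// Resolve searches from the innermost level outward and reports the level
// at which the qualifier was found. Matching ignores case, which is a
// simplification that only holds for unquoted identifiers.
func (s Scope) Resolve(qualifier string) (ref RangeRef, level int, ok bool) {
	for i, refs := range s.Levels {
		for _, r := range refs {
			if strings.EqualFold(r.VisibleName(), qualifier) {
				return r, i, true
			}
		}
	}
	return RangeRef{}, -1, false
}

func main() {
	// SELECT * FROM t WHERE EXISTS (SELECT | FROM u WHERE u.id = t.id)
	s := Scope{Levels: [][]RangeRef{
		{{Name: "u"}}, // local level
		{{Name: "t"}}, // outer level
	}}
	_, level, ok := s.Resolve("t")
	fmt.Println(level, ok)
}
```

Keeping levels separate lets an adapter rank local columns ahead of outer ones, and it keeps a derived table's inner sources out of the outer query's scope.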

## Phase 2: DML Completion

- [x] `INSERT INTO |` emits table-reference intent.
- [x] `INSERT INTO t (|)` emits column-reference intent scoped to table `t`.
- [x] `INSERT INTO t (c1, |)` emits column-reference intent after insert column-list comma.
- [x] `INSERT INTO t VALUES (|)` emits expression/column-reference context.
- [x] `INSERT INTO t VALUES (1, |)` emits expression/column-reference context after values-list comma.
- [x] `INSERT INTO t SELECT | FROM u` emits column-reference intent scoped to `u`.
- [x] `WITH x AS (...) INSERT INTO t SELECT | FROM x` exposes the CTE source.
- [x] `UPDATE | SET c = 1` emits table-reference intent.
- [x] `UPDATE t SET |` emits column-reference intent scoped to table `t`.
- [x] `UPDATE t SET c = |` emits expression/column-reference context.
- [x] `UPDATE t SET c = 1, |` emits column-reference intent after assignment comma.
- [x] `UPDATE t SET c = 1 WHERE |` emits column-reference intent scoped to table `t`.
- [x] `DELETE FROM |` emits table-reference intent.
- [x] `DELETE FROM t WHERE |` emits column-reference intent scoped to table `t`.
- [x] `MERGE INTO |` emits table-reference intent.
- [x] `MERGE INTO t USING |` emits table-reference intent for the source.
- [x] `MERGE INTO t USING u ON |` emits column-reference intent with target and source visible.
- [x] `MERGE ... WHEN MATCHED THEN UPDATE SET c = |` emits expression/column-reference intent with target and source visible.
- [x] `MERGE ... WHEN NOT MATCHED THEN INSERT (|)` emits column-reference intent with target and source visible.
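The scenarios in this file appear to use `|` as the cursor position. Assuming that convention, a table-driven test could turn each scenario into SQL plus a byte offset with a helper like the one below. `scenario`, `dmlScenarios`, and `splitCursor` are hypothetical names, and the intent strings simply restate the checklist wording.

```go
package main

import (
	"fmt"
	"strings"
)

// scenario pairs marked SQL with the intent the checklist expects.
type scenario struct {
	marked string
	intent string
}

var dmlScenarios = []scenario{
	{"INSERT INTO |", "table-reference"},
	{"UPDATE t SET |", "column-reference"},
	{"DELETE FROM t WHERE |", "column-reference"},
	{"MERGE INTO t USING |", "table-reference"},
}

// splitCursor removes the "|" marker and returns the SQL text plus the
// cursor byte offset. ok is false unless the input holds exactly one
// marker, which also rejects SQL that uses Oracle's || operator.
func splitCursor(marked string) (sql string, offset int, ok bool) {
	if strings.Count(marked, "|") != 1 {
		return marked, 0, false
	}
	offset = strings.Index(marked, "|")
	return marked[:offset] + marked[offset+1:], offset, true
}

func main() {
	for _, sc := range dmlScenarios {
		sql, offset, ok := splitCursor(sc.marked)
		fmt.Printf("%q cursor=%d ok=%v want=%s\n", sql, offset, ok, sc.intent)
	}
}
```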

## Phase 3: DDL And Utility Completion

- [x] `CREATE |` suggests supported Oracle object-type keywords.
- [x] `CREATE TABLE t (c |)` emits datatype candidates.
- [x] `CREATE TABLE t (c NUMBER, |)` suggests column/constraint starters.
- [x] `CREATE TABLE t (c NUMBER |)` suggests column options such as `NOT`, `DEFAULT`, `PRIMARY`, `UNIQUE`, `REFERENCES`, and `CHECK`.
- [x] `CREATE TABLE t (c NUMBER REFERENCES |)` emits table-reference intent.
- [x] `CREATE TABLE t (... PRIMARY KEY (|))` emits column-reference intent scoped to the new table.
- [x] `CREATE TABLE t (... FOREIGN KEY (|) REFERENCES u(c))` emits column-reference intent scoped to the new table.
- [x] `CREATE TABLE t (... REFERENCES u(|))` emits column-reference intent scoped to referenced table `u`.
- [x] `CREATE INDEX idx ON |` emits table-reference intent.
- [x] `CREATE INDEX idx ON t (|)` emits column-reference intent scoped to table `t`.
- [x] `ALTER TABLE |` emits table-reference intent.
- [x] `ALTER TABLE t ADD |` suggests column/constraint action keywords.
- [x] `ALTER TABLE t MODIFY |` emits column-reference intent scoped to table `t`.
- [x] `ALTER TABLE t DROP COLUMN |` emits column-reference intent scoped to table `t`.
- [x] `ALTER TABLE t DROP CONSTRAINT |` emits constraint-reference intent scoped to table `t`.
- [x] `ALTER SEQUENCE |` emits sequence-reference intent.
- [x] `ALTER VIEW |` emits view-reference intent.
- [x] `ALTER PROCEDURE |` emits procedure-reference intent.
- [x] `DROP TABLE |` emits table-reference intent.
- [x] `DROP VIEW |` emits view/table-reference intent.
- [x] `DROP SEQUENCE |` emits sequence-reference intent.
- [x] `DROP INDEX |` emits index-reference intent.
- [x] `DROP SYNONYM |` emits synonym-reference intent.
- [x] `DROP TRIGGER |` emits trigger-reference intent.
- [x] `TRUNCATE TABLE |` emits table-reference intent.
- [x] `COMMENT ON TABLE |` emits table-reference intent.
- [x] `COMMENT ON COLUMN t.|` emits column-reference intent scoped to table `t`.
- [x] `GRANT SELECT ON |` emits table-reference intent.
- [x] `REVOKE SELECT ON |` emits table-reference intent.
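The object-kind expectations in this phase can be restated as a lookup table, which may be handy as a test oracle. The sketch below is only that: omni derives intent from parser rules, not from string matching, and `objectKindByHead` and `objectKindBeforeCursor` are hypothetical names. It covers only the scenarios where the cursor directly follows the statement head.

```go
package main

import (
	"fmt"
	"strings"
)

// objectKindByHead maps the keywords before the cursor to the object kind
// named in the checklist above.
var objectKindByHead = map[string]string{
	"ALTER TABLE":      "table",
	"ALTER SEQUENCE":   "sequence",
	"ALTER VIEW":       "view",
	"ALTER PROCEDURE":  "procedure",
	"DROP TABLE":       "table",
	"DROP VIEW":        "view/table",
	"DROP SEQUENCE":    "sequence",
	"DROP INDEX":       "index",
	"DROP SYNONYM":     "synonym",
	"DROP TRIGGER":     "trigger",
	"TRUNCATE TABLE":   "table",
	"COMMENT ON TABLE": "table",
	"GRANT SELECT ON":  "table",
	"REVOKE SELECT ON": "table",
}

// objectKindBeforeCursor normalizes case and whitespace in the text before
// the cursor and looks the whole text up. Anything after the head, such as
// `ALTER TABLE t ADD `, deliberately does not match.
func objectKindBeforeCursor(before string) (string, bool) {
	head := strings.Join(strings.Fields(strings.ToUpper(before)), " ")
	kind, ok := objectKindByHead[head]
	return kind, ok
}

func main() {
	fmt.Println(objectKindBeforeCursor("drop  sequence "))
	fmt.Println(objectKindBeforeCursor("ALTER TABLE t ADD "))
}
```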

## Phase 4: Oracle-Specific Production Hardening

- [x] Quoted identifier prefix completion keeps the user's quoting mode.
- [x] Reserved keywords are not suggested as unquoted identifiers where the parser rejects them.
- [x] Type keywords come from the Oracle keyword manifest.
- [x] Function-like keyword candidates appear only in call-capable expression positions.
- [x] Pseudo-column candidates appear in expression positions.
- [x] `seq.|` emits sequence-member intent for `NEXTVAL`/`CURRVAL`-style completion.
- [x] `pkg.|` in SQL and PL/SQL blocks emits package-member intent.
- [x] `table@|` emits database-link intent.
- [x] `SELECT c INTO | FROM t` emits PL/SQL variable-reference intent.
- [x] `DECLARE v |;` emits datatype candidates for PL/SQL declarations.
- [x] `BEGIN | END;` suggests PL/SQL statement starters.
- [x] Case-sensitive metadata names can be quoted by the Bytebase adapter.
- [x] Multi-statement scripts isolate completion to the cursor statement.
- [x] Malformed earlier statements do not prevent completion in the current statement.
- [x] Large scripts avoid expensive whole-file parsing in the completion hot path.
- [x] Oracle parser soft-fail, strictness, keyword, Loc, and corpus gates remain green.
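For the quoting scenarios, an adapter has to decide when a metadata name can be emitted bare. A conservative sketch follows, assuming names arrive exactly as stored in the data dictionary (unquoted names are stored in uppercase). `reserved`, `needsQuoting`, and `quoteIfNeeded` are hypothetical, and the reserved-word set is a tiny sample rather than the full keyword table.

```go
package main

import "fmt"

// reserved is a small illustrative subset of Oracle reserved words.
var reserved = map[string]bool{"SELECT": true, "TABLE": true, "ORDER": true, "GROUP": true}

// needsQuoting reports whether a stored name would change meaning, or fail
// to parse, if written without double quotes. A bare name must start with
// A-Z and continue with A-Z, 0-9, _, $ or #. Anything else, including
// lowercase and non-ASCII bytes, is treated as needing quotes, which is
// stricter than Oracle itself for non-ASCII letters.
func needsQuoting(name string) bool {
	if reserved[name] {
		return true
	}
	for i := 0; i < len(name); i++ {
		b := name[i]
		switch {
		case b >= 'A' && b <= 'Z':
			// always allowed
		case i > 0 && ((b >= '0' && b <= '9') || b == '_' || b == '$' || b == '#'):
			// allowed after the first character
		default:
			return true
		}
	}
	return false
}

// quoteIfNeeded wraps the name in double quotes only when required.
func quoteIfNeeded(name string) string {
	if needsQuoting(name) {
		return `"` + name + `"`
	}
	return name
}

func main() {
	for _, n := range []string{"EMPLOYEES", "Employees", "ORDER", "EMP_1", "1ABC"} {
		fmt.Println(quoteIfNeeded(n))
	}
}
```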

## Phase 5: Bytebase Adapter Cutover

- [x] Bytebase Oracle completion calls omni Oracle completion APIs.
- [x] Bytebase Oracle completion no longer imports ANTLR or `github.com/bytebase/parser/plsql`.
- [x] Existing Bytebase `backend/plugin/parser/plsql/test-data/test_completion.yaml` passes unchanged or with only intended ordering updates.
- [x] Bytebase Oracle LSP completion continues returning `base.Candidate` schema/table/view/sequence/column/function/keyword values.
- [x] Bytebase Oracle completion preserves schema-as-database metadata behavior.
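The cutover splits responsibilities: the parser side emits an intent, and the adapter turns it into concrete candidates from live metadata. The sketch below illustrates that split with invented types. `Intent`, `Candidate`, `Metadata`, and `resolve` are hypothetical and do not mirror the real omni or Bytebase `base.Candidate` definitions.

```go
package main

import (
	"fmt"
	"sort"
)

// Intent is a hypothetical parser-side signal: what kind of object is
// expected at the cursor, and how it is qualified or scoped.
type Intent struct {
	Kind   string // "table" or "column"
	Schema string // optional schema qualifier
	Table  string // table scope for column intents
}

// Candidate is a hypothetical product-side completion item.
type Candidate struct {
	Type string
	Text string
}

// Metadata is a stand-in for a live catalog: schema -> table -> columns.
type Metadata map[string]map[string][]string

// resolve turns an intent into sorted candidates. An unqualified intent
// falls back to defaultSchema.
func resolve(in Intent, md Metadata, defaultSchema string) []Candidate {
	schema := in.Schema
	if schema == "" {
		schema = defaultSchema
	}
	var out []Candidate
	switch in.Kind {
	case "table":
		for table := range md[schema] {
			out = append(out, Candidate{Type: "TABLE", Text: table})
		}
	case "column":
		for _, col := range md[schema][in.Table] {
			out = append(out, Candidate{Type: "COLUMN", Text: col})
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Text < out[j].Text })
	return out
}

func main() {
	md := Metadata{"HR": {"EMPLOYEES": {"ID", "NAME"}, "DEPARTMENTS": {"ID"}}}
	fmt.Println(resolve(Intent{Kind: "table"}, md, "HR"))
	fmt.Println(resolve(Intent{Kind: "column", Table: "EMPLOYEES"}, md, "HR"))
}
```

With this shape the parser layer stays free of catalog access, so its intent and scope tests can run without a database, while the adapter tests exercise real metadata.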
7 changes: 4 additions & 3 deletions docs/PARSER-DEFENSE-MATRIX.md
@@ -122,7 +122,7 @@ Every SQL parser in omni needs a systematic suite of defensive tests to ensure c
PG Done -- -- Partial -- Done Partial
MySQL Partial* Partial* Done Done Done Done Done
MSSQL Done Partial Done -- -- Done* Done
-Oracle Done Done Done Partial* Done -- Done
+Oracle Done Done Done Partial* Done Partial* Done
CosmosDB -- -- N/A -- -- -- --
MongoDB Partial -- N/A -- -- -- --
```
@@ -135,6 +135,7 @@ Legend: `Done` = complete, `Partial` = in progress, `--` = not started, `N/A` =
- MSSQL L2: 14 oracle mismatches remaining (12 option validation + 2 multi-statement). Option validation is the L2 Strict core work.
- MSSQL L6: Core instrumentation complete (9 phases, 3659 tests), but 4 secondary CREATE statements uninstrumented + catalog resolution stubbed
- Oracle L4: Parser coverage accounting is strict but not full grammar support. 171 BNF rows are classified with 0 unknown rows, high-value statement families have no missing/unknown rows, and every non-covered BNF row carries explicit approval/debt metadata. 47 deferred and 86 partial rows remain approved feature debt.
- Oracle L6: Parser-native completion API, SELECT/CTE/subquery/DML/DDL rule signals, keyword-driven candidates, Bytebase adapter cutover, DDL object-kind filtering, and quoted/reserved/case-sensitive metadata hardening are implemented for the tracked production scenario set.

### PG
- **L1 Soft-Fail**: Dual-return migration and soft-fail fixes complete
@@ -180,7 +181,7 @@ Legend: `Done` = complete, `Partial` = in progress, `--` = not started, `N/A` =
- **L3 Keyword**: **Done for declared Oracle lexer set plus SQL reserved-word audit.** `TestOracleKeyword*` covers all 344 entries in the local Oracle lexer keyword table across reserved, nonreserved, context, type, function-like, pseudo-column, clause-starter, quoted identifier, keyword-as-expression, and reserved-identifier guard behavior. `TestOracleKeywordOfficialSQLReserved26aiAudit` pins the Oracle 26ai SQL reserved-word list and prevents official SQL reserved words from being missing or lexed as plain identifiers. `TestOracleVReservedWordsKeywordAudit` passed against Oracle Free and checked 107 word-like reserved/context entries.
- **L4 Coverage**: **Partial for full grammar support, strict for accounting.** `TestOracleBNFCoverageManifestCompleteness` classifies all 171 BNF files with 0 unknown rows, `TestOracleHighValueBNFGapsClosed` keeps high-value statement families free of missing/unknown rows, `TestOracleBNFImplementationDebtRequiresApproval` requires approval metadata on every non-covered row, and `TestOracleCoverage` enforces the soft-fail, strictness, keyword, BNF, Loc-node, and reference-oracle gates. The explicit Oracle Free reference run passed all 20 rows.
- **L5 Corpus**: **Done for current corpus.** `TestVerifyCorpus` reports 128 total statements, 125 parser accepts, 3 expected parser rejects, 0 parse violations, 0 Loc violations, and 0 crashes; Loc violations are fatal.
-- **L6 Completion**: Not implemented. Parser readiness gates are in place; completion scope is tracked in `docs/plans/2026-04-28-oracle-completion-scope.md`.
+- **L6 Completion**: **Done for the tracked Bytebase production scenario set.** `Collect`, `CollectCompletion`, token/rule candidates, visible scope, CTE/subquery references, DML/DDL/utility intents, datatype/function/pseudo-column keyword candidates, and Bytebase metadata-backed adapter integration are implemented. Bytebase Oracle completion no longer imports ANTLR/C3, existing YAML completion tests pass unchanged, DDL object-kind filtering is covered, and quoted/reserved/case-sensitive metadata behavior is tested.
- **L7 Loc**: **Done for parser layer, with fixture debt tracked.** `NoLoc()`/`Loc.IsUnknown()` enforce `{-1,-1}` as the only unknown sentinel, mixed sentinels are rejected, synthetic/corpus Loc contracts pass, and `TestOracleLocNodeCoverage` classifies 249 Loc-bearing node rows with 0 unknown rows. Direct SQL fixture coverage is 152 rows; the remaining 97 deferred rows carry approval metadata.

### CosmosDB / MongoDB
@@ -234,7 +235,7 @@ Ordered by dependency chain. Within each tier, sorted by impact.
|--------|-------|-----------|
| MSSQL | L2 Strict | MySQL already found extensive "too lenient" issues; MSSQL very likely has the same (blocked by L1 cleanup) |
| PG | L2 Strict | Largest engine with no systematic strictness testing |
-| Oracle | L6 Completion | Only major engine without a completion engine |
+| Oracle | L6 Completion expansion | Add future Oracle editor scenarios beyond the current Bytebase production set as product needs surface |

### Tier 3: Verification (Global Regression Safety Net)

6 changes: 4 additions & 2 deletions docs/engine-capability-guide.md
@@ -186,11 +186,13 @@ This is the most complex layer, comprising four subsystems.
- CandidateSet collection: keywords, table names, column names, function names, schema names, type names
- Table reference extraction: extract visible tables from FROM/JOIN clauses in the current query
- Tricky completion fallback: when the parser can't determine context due to truncation, fall back to heuristic completion
- Bytebase integration boundary: omni emits parser-native token/rule/scope/intent signals; Bytebase resolves live metadata into product `Candidate` values.

**Lessons from PG**:
- Completion has hard dependencies on L2 (Loc End) and L3 (Soft-fail). **If either is incomplete, Completion cannot function**
- Recursive CTE self-references, materialized view context filtering, and similar scenarios are very difficult — these can be marked as partial
- Integration tests verifying real completion behavior across multi-table schemas are essential
- Oracle showed that replacing ANTLR/C3 safely needs both parser-native intent/scope tests in omni and unchanged Bytebase YAML completion tests against real metadata adapter behavior.

### L7 Bytebase Integration

@@ -331,8 +333,8 @@ PROGRESS_SUMMARY.json # Progress summary
| L3 Keyword | ✅ Parser keyword matrix complete | 344-row exhaustive manifest plus Oracle 26ai SQL reserved-word audit |
| L4 Strict/Soft-fail/Coverage | ⚠️ Strict accounting, partial grammar support | 62 soft-fail, 121 strictness, 171 BNF rows classified, high-value BNF gaps closed, non-covered BNF rows require approval metadata |
| L5 Corpus | ✅ Current corpus clean | 128 statements, 125 parser accepts, 3 expected rejects, 0 parse violations, 0 Loc violations, 0 crashes |
-| L6 Completion | ❌ Scoped, not implemented | Scope plan: `docs/plans/2026-04-28-oracle-completion-scope.md` |
-| L7 Integration | ❌ Not started | Catalog/migration not in parser-layer scope |
+| L6 Completion | ✅ Bytebase production set complete | Parser-native API, SELECT/CTE/DML/DDL signals, keyword candidates, adapter cutover, object-kind filtering, and quoted/reserved/case-sensitive metadata behavior covered |
+| L7 Integration | 🚧 Parser integration started | Completion adapter now uses omni locally; catalog/migration remain outside parser-layer scope |

---
