|
| 1 | +# Porting Rules from Eugene to postgres-language-server |
| 2 | + |
| 3 | +This document tracks the progress of porting lint rules from [Eugene](https://github.com/kaaveland/eugene) to the postgres-language-server analyzer. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Eugene is a PostgreSQL migration linter that detects dangerous operations. We are porting its rules to the postgres-language-server to provide similar safety checks within the language server environment. |
| 8 | + |
| 9 | +**Eugene source location**: `eugene/eugene/src/lints/rules.rs` |
| 10 | +**Hint metadata location**: `eugene/eugene/src/hint_data.rs` |
| 11 | +**Example SQL files**: `eugene/eugene/examples/*/` |
| 12 | + |
| 13 | +## Step-by-Step Porting Process |
| 14 | + |
| 15 | +### 1. Understand the Rule |
| 16 | + |
| 17 | +1. **Read Eugene's implementation** in `eugene/eugene/src/lints/rules.rs` |
| 18 | + - Find the rule function (e.g., `added_serial_column`) |
| 19 | + - Understand the AST patterns it matches |
| 20 | + - Note any special logic (e.g., checking previous statements) |
| 21 | + |
| 22 | +2. **Read the hint metadata** in `eugene/eugene/src/hint_data.rs` |
| 23 | + - Get the ID (e.g., "E11") |
| 24 | + - Get name, condition, effect, and workaround text |
| 25 | + - This provides documentation content |
| 26 | + |
| 27 | +3. **Review example SQL** in `eugene/eugene/examples/<ID>/` |
| 28 | + - `bad.sql` - Invalid cases that should trigger the rule |
| 29 | + - `good.sql` - Valid cases that should NOT trigger |
| 30 | + |
| 31 | +### 2. Create the Rule |
| 32 | + |
| 33 | +```bash |
| 34 | +# Create rule with appropriate severity (error/warn) |
| 35 | +just new-lintrule safety <ruleName> <severity> |
| 36 | + |
| 37 | +# Example: |
| 38 | +just new-lintrule safety addSerialColumn error |
| 39 | +``` |
| 40 | + |
| 41 | +This generates: |
| 42 | +- `crates/pgt_analyser/src/lint/safety/<rule_name>.rs` |
| 43 | +- `crates/pgt_analyser/tests/specs/safety/<ruleName>/basic.sql` |
| 44 | +- Updates to configuration files |
| 45 | +- Diagnostic category registration |
| 46 | + |
| 47 | +### 3. Implement the Rule |
| 48 | + |
| 49 | +**File**: `crates/pgt_analyser/src/lint/safety/<rule_name>.rs` |
| 50 | + |
| 51 | +#### Key Components: |
| 52 | + |
| 53 | +```rust |
| 54 | +use pgt_analyse::{context::RuleContext, declare_lint_rule, Rule, RuleDiagnostic, RuleSource}; |
| 55 | +use pgt_console::markup; |
| 56 | +use pgt_diagnostics::Severity; |
| 57 | + |
| 58 | +declare_lint_rule! { |
| 59 | + /// Brief one-line description (shown in lists). |
| 60 | + /// |
| 61 | + /// Detailed explanation of what the rule detects and why it's problematic. |
| 62 | + /// Explain the PostgreSQL behavior and performance/safety implications. |
| 63 | + /// |
| 64 | + /// ## Examples |
| 65 | + /// |
| 66 | + /// ### Invalid |
| 67 | + /// |
| 68 | + /// ```sql,expect_diagnostic |
| 69 | + /// -- SQL that should trigger the rule |
| 70 | + /// ALTER TABLE users ADD COLUMN id serial; |
| 71 | + /// ``` |
| 72 | + /// |
| 73 | + /// ### Valid |
| 74 | + /// |
| 75 | + /// ```sql |
| 76 | + /// -- SQL that should NOT trigger |
| 77 | + /// CREATE TABLE users (id serial PRIMARY KEY); |
| 78 | + /// ``` |
| 79 | + /// |
| 80 | + pub RuleName { |
| 81 | + version: "next", |
| 82 | + name: "ruleName", |
| 83 | + severity: Severity::Error, // or Warning |
| 84 | + recommended: true, // or false |
| 85 | + sources: &[RuleSource::Eugene("<ID>")], // e.g., "E11" |
| 86 | + } |
| 87 | +} |
| 88 | + |
| 89 | +impl Rule for RuleName { |
| 90 | + type Options = (); |
| 91 | + |
| 92 | + fn run(ctx: &RuleContext<Self>) -> Vec<RuleDiagnostic> { |
| 93 | + let mut diagnostics = Vec::new(); |
| 94 | + |
| 95 | + // Pattern match on the statement type |
| 96 | + if let pgt_query::NodeEnum::AlterTableStmt(stmt) = &ctx.stmt() { |
| 97 | + // Rule logic here |
| 98 | + |
| 99 | + if /* condition */ { |
| 100 | + diagnostics.push( |
| 101 | + RuleDiagnostic::new( |
| 102 | + rule_category!(), |
| 103 | + None, |
| 104 | + markup! { |
| 105 | + "Error message with "<Emphasis>"formatting"</Emphasis>"." |
| 106 | + }, |
| 107 | + ) |
| 108 | + .detail(None, "Additional context about the problem.") |
| 109 | + .note("Suggested fix or workaround."), |
| 110 | + ); |
| 111 | + } |
| 112 | + } |
| 113 | + |
| 114 | + diagnostics |
| 115 | + } |
| 116 | +} |
| 117 | +``` |
| 118 | + |
| 119 | +#### Important Patterns: |
| 120 | + |
| 121 | +**Accessing previous statements** (for rules like `multipleAlterTable`): |
| 122 | +```rust |
| 123 | +let file_ctx = ctx.file_context(); |
| 124 | +let previous = file_ctx.previous_stmts(); |
| 125 | +``` |
| 126 | + |
| 127 | +**Schema normalization** (treating empty schema as "public"): |
| 128 | +```rust |
| 129 | +let schema_normalized = if schema.is_empty() { |
| 130 | + "public" |
| 131 | +} else { |
| 132 | + schema.as_str() |
| 133 | +}; |
| 134 | +``` |
| 135 | + |
| 136 | +**Checking for specific ALTER TABLE actions**: |
| 137 | +```rust |
| 138 | +for cmd in &stmt.cmds { |
| 139 | + if let Some(pgt_query::NodeEnum::AlterTableCmd(cmd)) = &cmd.node { |
| 140 | + if cmd.subtype() == pgt_query::protobuf::AlterTableType::AtAddColumn { |
| 141 | + // Handle ADD COLUMN |
| 142 | + } |
| 143 | + } |
| 144 | +} |
| 145 | +``` |
| 146 | + |
| 147 | +**Extracting column type**: |
| 148 | +```rust |
| 149 | +if let Some(pgt_query::NodeEnum::ColumnDef(col_def)) = &cmd.def.as_ref().and_then(|d| d.node.as_ref()) { |
| 150 | + if let Some(type_name) = &col_def.type_name { |
| 151 | + let type_str = get_type_name(type_name); |
| 152 | + } |
| 153 | +} |
| 154 | + |
| 155 | +fn get_type_name(type_name: &pgt_query::protobuf::TypeName) -> String { |
| 156 | + type_name |
| 157 | + .names |
| 158 | + .iter() |
| 159 | + .filter_map(|n| { |
| 160 | + if let Some(pgt_query::NodeEnum::String(s)) = &n.node { |
| 161 | + Some(s.sval.as_str()) |
| 162 | + } else { |
| 163 | + None |
| 164 | + } |
| 165 | + }) |
| 166 | + .collect::<Vec<_>>() |
| 167 | + .join(".") |
| 168 | +} |
| 169 | +``` |
| 170 | + |
| 171 | +### 4. Create Comprehensive Tests |
| 172 | + |
| 173 | +**Directory**: `crates/pgt_analyser/tests/specs/safety/<ruleName>/` |
| 174 | + |
| 175 | +Create multiple test files covering: |
| 176 | + |
| 177 | +#### Invalid Cases (should trigger): |
| 178 | +```sql |
| 179 | +-- expect_lint/safety/<ruleName> |
| 180 | +-- Description of what this tests |
| 181 | +<SQL that should trigger> |
| 182 | +``` |
| 183 | + |
| 184 | +#### Valid Cases (should NOT trigger): |
| 185 | +```sql |
| 186 | +-- Description of what this tests |
| 187 | +-- expect_no_diagnostics |
| 188 | +<SQL that should NOT trigger> |
| 189 | +``` |
| 190 | + |
| 191 | +#### Example Test Structure: |
| 192 | +``` |
| 193 | +tests/specs/safety/addSerialColumn/ |
| 194 | +├── basic.sql # Basic case triggering the rule |
| 195 | +├── bigserial.sql # Variant (bigserial type) |
| 196 | +├── generated_stored.sql # Another variant (GENERATED) |
| 197 | +├── valid_regular_column.sql # Valid: regular column |
| 198 | +└── valid_create_table.sql # Valid: CREATE TABLE context |
| 199 | +``` |
| 200 | + |
| 201 | +**Run tests and accept snapshots**: |
| 202 | +```bash |
| 203 | +cargo insta test -p pgt_analyser --accept |
| 204 | +``` |
| 205 | + |
| 206 | +### 5. Verify and Generate Code |
| 207 | + |
| 208 | +```bash |
| 209 | +# Check compilation |
| 210 | +cargo check |
| 211 | + |
| 212 | +# Generate lint code and documentation |
| 213 | +just gen-lint |
| 214 | + |
| 215 | +# Run all tests |
| 216 | +cargo test -p pgt_analyser --test rules_tests |
| 217 | + |
| 218 | +# Final verification |
| 219 | +just ready |
| 220 | +``` |
| 221 | + |
| 222 | +### 6. Test the Rule Manually |
| 223 | + |
| 224 | +Create a test SQL file: |
| 225 | +```sql |
| 226 | +-- test.sql |
| 227 | +ALTER TABLE users ADD COLUMN id serial; |
| 228 | +``` |
| 229 | + |
| 230 | +Run the CLI: |
| 231 | +```bash |
| 232 | +cargo run -p pgt_cli -- check /path/to/test.sql |
| 233 | +``` |
| 234 | + |
| 235 | +## Common Pitfalls and Solutions |
| 236 | + |
| 237 | +### 1. Schema Matching |
| 238 | + |
| 239 | +**Problem**: Need to match tables across statements with different schema notations. |
| 240 | + |
| 241 | +**Solution**: Normalize schema names: |
| 242 | +```rust |
| 243 | +let schema_normalized = if schema.is_empty() { "public" } else { schema.as_str() }; |
| 244 | +``` |
| 245 | + |
| 246 | +### 2. AST Navigation |
| 247 | + |
| 248 | +**Problem**: Eugene uses a simplified AST (`StatementSummary`), but pgt uses the full PostgreSQL AST. |
| 249 | + |
| 250 | +**Solution**: Use pattern matching and helper functions. Look at existing rules for examples. |
| 251 | + |
| 252 | +### 3. File Context Rules |
| 253 | + |
| 254 | +**Problem**: Rules that need to track state across statements (like `multipleAlterTable`). |
| 255 | + |
| 256 | +**Solution**: Use `ctx.file_context()` to access `AnalysedFileContext`: |
| 257 | +```rust |
| 258 | +let file_ctx = ctx.file_context(); |
| 259 | +let previous_stmts = file_ctx.previous_stmts(); |
| 260 | +``` |
| 261 | + |
| 262 | +### 4. Transaction State |
| 263 | + |
| 264 | +**Problem**: Some Eugene rules check transaction state (e.g., `RUNNING_STATEMENT_WHILE_HOLDING_ACCESS_EXCLUSIVE`). |
| 265 | + |
| 266 | +**Solution**: This is more complex and may require extending the `AnalysedFileContext` to track transaction state. Consider implementing simpler rules first. |
| 267 | + |
| 268 | +### 5. Test Expectations |
| 269 | + |
| 270 | +**Problem**: `expect_lint` expects exactly ONE diagnostic, but rule generates multiple. |
| 271 | + |
| 272 | +**Solution**: Either: |
| 273 | +- Split into separate test files (one diagnostic each) |
| 274 | +- Adjust the test to only trigger once |
| 275 | +- Use expect_diagnostic for each occurrence (check existing tests) |
| 276 | + |
| 277 | +## Rule Mapping Considerations |
| 278 | + |
| 279 | +### Overlapping Rules |
| 280 | + |
| 281 | +Some Eugene rules may overlap with existing pgt rules: |
| 282 | + |
| 283 | +| Eugene Rule | Potential PGT Overlap | Action | |
| 284 | +|-------------|----------------------|--------| |
| 285 | +| `SET_COLUMN_TYPE_TO_JSON` | `preferJsonb` | Review both, may enhance existing | |
| 286 | +| `CREATE_INDEX_NONCONCURRENTLY` | `requireConcurrentIndexCreation` | Review both | |
| 287 | +| `CHANGE_COLUMN_TYPE` | `changingColumnType` | Review both | |
| 288 | +| `ADD_NEW_UNIQUE_CONSTRAINT_WITHOUT_USING_INDEX` | `disallowUniqueConstraint` | Review both | |
| 289 | + |
| 290 | +**When overlap exists**: |
| 291 | +1. Compare implementations |
| 292 | +2. If Eugene's is more comprehensive, consider updating the existing rule |
| 293 | +3. If they cover different aspects, keep both |
| 294 | +4. Document any differences |
| 295 | + |
| 296 | +### Transaction-Aware Rules |
| 297 | + |
| 298 | +These rules require tracking transaction state across multiple statements: |
| 299 | + |
| 300 | +- `RUNNING_STATEMENT_WHILE_HOLDING_ACCESS_EXCLUSIVE` (E4) |
| 301 | +- `LOCKTIMEOUT_WARNING` (E9) |
| 302 | + |
| 303 | +**Approach**: |
| 304 | +1. First implement simpler, statement-level rules |
| 305 | +2. Design transaction state tracking in `AnalysedFileContext` |
| 306 | +3. Add fields to track: |
| 307 | + - Current transaction state (BEGIN/COMMIT/ROLLBACK) |
| 308 | + - Locks held |
| 309 | + - Lock timeout settings |
| 310 | +4. Update state as statements are processed |
| 311 | + |
| 312 | +## Eugene AST vs PostgreSQL AST |
| 313 | + |
| 314 | +### Eugene's Simplified AST |
| 315 | + |
| 316 | +Eugene uses `StatementSummary` enum with simplified representations: |
| 317 | +```rust |
| 318 | +enum StatementSummary { |
| 319 | + AlterTable { schema: String, name: String, actions: Vec<AlterTableAction> }, |
| 320 | + CreateIndex { schema: String, idxname: String, concurrently: bool, target: String }, |
| 321 | + // ... |
| 322 | +} |
| 323 | + |
| 324 | +enum AlterTableAction { |
| 325 | + AddColumn { column: String, type_name: String, stored_generated: bool, ... }, |
| 326 | + SetType { column: String, type_name: String }, |
| 327 | + // ... |
| 328 | +} |
| 329 | +``` |
| 330 | + |
| 331 | +### PostgreSQL AST (pgt_query) |
| 332 | + |
| 333 | +We use the full PostgreSQL protobuf AST: |
| 334 | +```rust |
| 335 | +pgt_query::NodeEnum::AlterTableStmt(stmt) |
| 336 | + -> stmt.cmds: Vec<Node> |
| 337 | + -> NodeEnum::AlterTableCmd(cmd) |
| 338 | + -> cmd.subtype: AlterTableType |
| 339 | + -> cmd.def: Option<Node> |
| 340 | + -> NodeEnum::ColumnDef(col_def) |
| 341 | +``` |
| 342 | + |
| 343 | +**Translation Strategy**: |
| 344 | +1. Look at Eugene's simplified logic |
| 345 | +2. Map to corresponding PostgreSQL AST nodes |
| 346 | +3. Use existing pgt rules as references |
| 347 | +4. Check `pgt_query::protobuf` for available types/enums |
| 348 | + |
| 349 | +## Useful References |
| 350 | + |
| 351 | +- **Eugene source**: `eugene/eugene/src/lints/rules.rs` |
| 352 | +- **Existing pgt rules**: `crates/pgt_analyser/src/lint/safety/` |
| 353 | +- **Contributing guide**: `crates/pgt_analyser/CONTRIBUTING.md` |
| 354 | +- **AST types**: `crates/pgt_query/src/lib.rs` |
| 355 | +- **PostgreSQL protobuf**: `pgt_query::protobuf` module |
| 356 | + |
| 357 | +## Next Steps |
| 358 | + |
| 359 | +1. **Priority**: Port high-priority safety rules (E1, E2, E6, E7, E9) |
| 360 | +2. **Review overlaps**: Check if E3, E5, E6, E7 overlap with existing rules |
| 361 | +3. **Transaction tracking**: Design transaction state tracking for E4, E9 |
| 362 | +4. **Documentation**: Update Eugene source attribution in ported rules |
| 363 | +5. **Testing**: Ensure comprehensive test coverage for all ported rules |
| 364 | + |
| 365 | +## Template Checklist |
| 366 | + |
| 367 | +When porting a new rule, ensure: |
| 368 | + |
| 369 | +- [ ] Rule implementation in `src/lint/safety/<rule>.rs` |
| 370 | +- [ ] Documentation with examples (invalid and valid cases) |
| 371 | +- [ ] `sources: &[RuleSource::Eugene("<ID>")]` attribution |
| 372 | +- [ ] At least 3-5 test files (mix of invalid and valid) |
| 373 | +- [ ] Snapshot tests accepted with `cargo insta test --accept` |
| 374 | +- [ ] All tests pass: `cargo test -p pgt_analyser --test rules_tests` |
| 375 | +- [ ] Compilation clean: `cargo check` |
| 376 | +- [ ] Code generation: `just gen-lint` |
| 377 | +- [ ] Manual CLI test with sample SQL |
| 378 | +- [ ] Update this document with completed rule |
0 commit comments