Merged
26 commits
b156a86
Fix 53 parser issues: keyword types, implicit aliases, empty tuples, …
claude Dec 13, 2025
3be711d
Fix 41 more parser issues: NAN/INF literals, keywords as identifiers,…
claude Dec 13, 2025
ca277d5
Fix 14 more parser issues: engine names, Nested types, settings, LIMI…
claude Dec 13, 2025
c3f8187
Add INSERT INTO FUNCTION SETTINGS support and INDEX definitions in CR…
claude Dec 13, 2025
55ed562
Fix 27 parser and explain output issues
claude Dec 13, 2025
e65fcd7
Fix more parser and explain output issues
claude Dec 13, 2025
af21faa
Add WITH clause element explain output support
claude Dec 13, 2025
0bc058e
Fix IN expression and WITH clause parsing
claude Dec 13, 2025
93d6e99
Add hex, binary, octal literals and number separators to lexer
claude Dec 13, 2025
151dbbd
Add WITH CUBE support for GROUP BY clause
claude Dec 13, 2025
06f016f
Add hex and binary string literal support in lexer
claude Dec 13, 2025
3949190
Support WITH TOTALS at end of SELECT statement
claude Dec 13, 2025
7fff627
Fix multiple parser and explain output issues
claude Dec 13, 2025
1635e12
Fix CAST expression rendering and operator mappings
claude Dec 13, 2025
a49c9ba
Fix INTERVAL expression unit formatting
claude Dec 13, 2025
866dd1c
Fix CAST operator syntax literal formatting
claude Dec 13, 2025
cb6b879
Add PRIMARY KEY output to CREATE TABLE explain
claude Dec 13, 2025
fc06a9b
Fix explain output formatting and test comparison
claude Dec 14, 2025
e62fd78
Add PARTITION BY support and fix tuple expansion
claude Dec 14, 2025
8c333a4
Fix aliased expression handling for binary/unary/function/identifier
claude Dec 14, 2025
eb8f29d
Fix float literal formatting to avoid scientific notation
claude Dec 14, 2025
c68db17
Add window function (OVER clause) support to explain output
claude Dec 14, 2025
cd167d1
Fix explain output for TableJoin, table function aliases, array casts…
claude Dec 14, 2025
8370b18
Add table identifier alias support to explain output
claude Dec 14, 2025
106be54
Flatten chained || (concat) operations in explain output
claude Dec 14, 2025
fac8af3
Update TODO.md with current state and identified parser issues
claude Dec 14, 2025
190 changes: 127 additions & 63 deletions TODO.md
@@ -2,120 +2,179 @@

## Current State

- **Tests passing:** 5,197 (76.2%)
- **Tests skipped:** 1,627 (23.8%)
- Parser issues: ~675
- Explain mismatches: ~637
- **Tests passing:** 5,933 (86.9%)
- **Tests skipped:** 891 (13.1%)

## Parser Issues
## Recently Fixed (explain layer)

- ✅ TableJoin output - removed join type keywords
- ✅ Table function aliases (e.g., `remote('127.1') AS t1`)
- ✅ Table identifier aliases (e.g., `system.one AS xxx`)
- ✅ Array/tuple cast formatting for `::` syntax
- ✅ SETTINGS placement with FORMAT clause
- ✅ Concat operator `||` flattening into single `concat` function
- ✅ Window function (OVER clause) support
- ✅ Float literal formatting
- ✅ Aliased expression handling for binary/unary/function/identifier
- ✅ PARTITION BY support in CREATE TABLE
- ✅ Server error message stripping from expected output
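
The `||` flattening item in the list above can be pictured as collapsing a left-nested chain of binary concat nodes into a single argument list. The sketch below is only an illustration: `Expr`, `Ident`, `Concat`, and `flattenConcat` are hypothetical names, not the types or functions used in `internal/explain/`.

```go
package main

import "fmt"

// Expr is a hypothetical expression node (the project has its own ast package).
type Expr interface{}

// Ident is a hypothetical leaf node.
type Ident struct{ Name string }

// Concat is a hypothetical binary node for `left || right`.
type Concat struct{ Left, Right Expr }

// flattenConcat collects the operands of a chain of || operations
// in source order, descending into nested Concat nodes on either side.
func flattenConcat(e Expr) []Expr {
	c, ok := e.(*Concat)
	if !ok {
		return []Expr{e}
	}
	return append(flattenConcat(c.Left), flattenConcat(c.Right)...)
}

func main() {
	// a || b || c is assumed to parse left-associatively as (a || b) || c.
	chain := &Concat{&Concat{&Ident{"a"}, &Ident{"b"}}, &Ident{"c"}}
	for _, arg := range flattenConcat(chain) {
		fmt.Println(arg.(*Ident).Name)
	}
}
```

With the operands in one slice, the explain layer can emit a single `concat` function node instead of one node per `||`.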

## Parser Issues (High Priority)

These require changes to `parser/parser.go`:

### Table/Database Names Starting with Numbers
Tables and databases with names starting with digits fail to parse:
### DROP TABLE with Multiple Tables
Parser only captures first table when multiple are specified:
```sql
DROP TABLE IF EXISTS 03657_gby_overflow;
DROP DATABASE IF EXISTS 03710_database;
DROP TABLE IF EXISTS t1, t2, t3;
-- Expected: ExpressionList with 3 TableIdentifiers
-- Got: Single Identifier for t1
```
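
One way to capture every table is to loop while the next token is a comma. This is a minimal sketch over a pre-split token slice; `parseTableList` is a hypothetical helper and does not reflect the actual API of `parser/parser.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTableList (hypothetical) reads identifiers separated by commas
// and returns them together with the unconsumed tokens.
func parseTableList(tokens []string) (tables []string, rest []string) {
	for len(tokens) > 0 {
		tables = append(tables, tokens[0])
		tokens = tokens[1:]
		if len(tokens) == 0 || tokens[0] != "," {
			break
		}
		tokens = tokens[1:] // consume the comma and read the next name
	}
	return tables, tokens
}

func main() {
	// Tokens as they might appear after `DROP TABLE IF EXISTS`.
	tables, rest := parseTableList(strings.Fields("t1 , t2 , t3 ;"))
	fmt.Println(tables, rest)
}
```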

### FORMAT Null
The `FORMAT Null` clause is not recognized:
### Negative Integer Literals
Negative numbers are parsed as `Function negate` instead of negative literals:
```sql
SELECT ... FORMAT Null;
SELECT -1, -10000;
-- Expected: Literal Int64_-1
-- Got: Function negate (children 1) with Literal UInt64_1
```
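
A possible fix is to fold `negate` applied directly to an unsigned integer literal into a signed literal. The sketch below assumes hypothetical `Literal` and `Negate` node types in which unsigned values are stored as `uint64`; the project's real AST may represent these differently.

```go
package main

import (
	"fmt"
	"math"
)

// Expr is a hypothetical expression node.
type Expr interface{}

// Literal is hypothetical: Value holds uint64 for unsigned
// integers and int64 for signed ones.
type Literal struct{ Value interface{} }

// Negate is a hypothetical unary minus node.
type Negate struct{ Operand Expr }

// foldNegate turns Negate(Literal uint64) into Literal int64.
// Anything else, including magnitudes above MaxInt64, is returned unchanged.
func foldNegate(e Expr) Expr {
	n, ok := e.(*Negate)
	if !ok {
		return e
	}
	lit, ok := n.Operand.(*Literal)
	if !ok {
		return e
	}
	u, ok := lit.Value.(uint64)
	if !ok || u > math.MaxInt64 {
		return e
	}
	return &Literal{Value: -int64(u)}
}

func main() {
	folded := foldNegate(&Negate{&Literal{uint64(10000)}})
	fmt.Printf("Int64_%d\n", folded.(*Literal).Value)
}
```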

### FETCH FIRST ... ROW ONLY
SQL standard fetch syntax is not supported:
### CREATE TABLE with INDEX Clause
INDEX definitions in CREATE TABLE are not captured:
```sql
SELECT ... FETCH FIRST 1 ROW ONLY;
CREATE TABLE t (x Array(String), INDEX idx1 x TYPE bloom_filter(0.025)) ENGINE=MergeTree;
```

### INSERT INTO FUNCTION
Function-based inserts are not supported:
### SETTINGS Inside Function Arguments
SETTINGS clause within function calls is not parsed:
```sql
INSERT INTO FUNCTION file('file.parquet') SELECT ...;
SELECT * FROM icebergS3(s3_conn, filename='test', SETTINGS key='value');
-- The SETTINGS should become a Set child of the function
```

### WITH ... AS Subquery Aliases
Subquery aliases in FROM clauses with keyword `AS`:
### CREATE TABLE with Column TTL
TTL expressions on columns are not captured:
```sql
SELECT * FROM (SELECT 1 x) AS alias;
CREATE TABLE t (c Int TTL expr()) ENGINE=MergeTree;
-- Expected: ColumnDeclaration with 2 children (type + TTL function)
```

### String Concatenation Operator ||
The `||` operator in some contexts:
### Empty Tuple in ORDER BY
`ORDER BY ()` should capture empty tuple expression:
```sql
SELECT currentDatabase() || '_test' AS key;
CREATE TABLE t (...) ENGINE=MergeTree ORDER BY ();
-- Expected: Function tuple (children 1) with empty ExpressionList
-- Got: Storage definition with no ORDER BY
```

### MOD/DIV Operators
The MOD and DIV keywords as operators:
### String Escape Handling
Parser stores escaped characters literally instead of unescaping:
```sql
SELECT number MOD 3, number DIV 3 FROM ...;
SELECT 'x\'e2\'';
-- Parser stores: x\'e2\' (with backslashes)
-- Should store: x'e2' (unescaped)
```
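
Unescaping could happen when the lexer or parser builds the string literal. The sketch below handles only a handful of escapes, and passing unknown escapes through as the bare character is an assumption, not confirmed ClickHouse behavior; `unescape` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// unescape (hypothetical) resolves backslash escapes in the body of a
// single-quoted string. Only \n, \t, \0, \' and \\ are treated specially;
// any other escaped character is emitted as-is (an assumption).
func unescape(s string) string {
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		// Copy ordinary bytes and a lone trailing backslash unchanged.
		if s[i] != '\\' || i+1 == len(s) {
			sb.WriteByte(s[i])
			continue
		}
		i++ // skip the backslash and look at the escaped byte
		switch s[i] {
		case 'n':
			sb.WriteByte('\n')
		case 't':
			sb.WriteByte('\t')
		case '0':
			sb.WriteByte(0)
		default:
			sb.WriteByte(s[i]) // covers \' and \\
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(unescape(`x\'e2\'`))
}
```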

### Reserved Keyword Handling
Keywords like `LEFT`, `RIGHT` used as table aliases:
## Parser Issues (Medium Priority)

### CREATE DICTIONARY
Dictionary definitions are not supported:
```sql
SELECT * FROM numbers(10) AS left RIGHT JOIN ...;
CREATE DICTIONARY d0 (c1 UInt64) PRIMARY KEY c1 LAYOUT(FLAT()) SOURCE(...);
```

### Parameterized Settings
Settings with `$` parameters:
### CREATE USER / CREATE FUNCTION
User and function definitions are not supported:
```sql
SET param_$1 = 'Hello';
CREATE USER test_user GRANTEES ...;
CREATE OR REPLACE FUNCTION myFunc AS ...;
```

### Incomplete CASE Expression
CASE without END:
### QUALIFY Clause
Window function filtering clause:
```sql
SELECT CASE number -- missing END
SELECT x QUALIFY row_number() OVER () = 1;
```

## Explain Output Issues
### INTO OUTFILE with TRUNCATE
Extended INTO OUTFILE syntax:
```sql
SELECT 1, 2 INTO OUTFILE '/dev/null' TRUNCATE FORMAT Npy;
```

These require changes to `internal/explain/`:
### GROUPING SETS
Advanced grouping syntax:
```sql
SELECT ... GROUP BY GROUPING SETS ((a), (b));
```

### Double Equals (==) Operator
The `==` operator creates extra nested equals/tuple nodes:
### view() Table Function
The view() table function in FROM:
```sql
SELECT value == '127.0.0.1:9181'
SELECT * FROM view(SELECT 1 as id);
```
Expected: `Function equals` with `Identifier` and `Literal`
Got: Nested `Function equals` with extra `Function tuple`

### CreateQuery Spacing
Some ClickHouse versions output extra space before `(children`:
### CREATE TABLE ... AS SELECT
CREATE TABLE with inline SELECT:
```sql
CREATE TABLE src ENGINE=Memory AS SELECT 1;
```
CreateQuery d1 (children 1) -- two spaces
CreateQuery d1 (children 1) -- one space (our output)

### Variant() Type with PRIMARY KEY
Complex column definitions:
```sql
CREATE TABLE t (c Variant() PRIMARY KEY) ENGINE=Redis(...);
```

### Server Error Messages in Expected Output
Some test expected outputs include trailing messages:
## Parser Issues (Lower Priority)

### INTERVAL with Dynamic Type
INTERVAL with type cast:
```sql
SELECT INTERVAL 1 MINUTE AS c0, INTERVAL c0::Dynamic DAY;
```
The query succeeded but the server error '42' was expected

### ALTER TABLE with Multiple Operations
Multiple ALTER operations in parentheses:
```sql
ALTER TABLE t (DELETE WHERE ...), (MODIFY SETTING ...), (UPDATE ... WHERE ...);
```
These are not part of the actual EXPLAIN output.

## Lower Priority
### Tuple Type in Column with Subfield Access
Tuple type with engine using subfield:
```sql
CREATE TABLE t (t Tuple(a Int32)) ENGINE=EmbeddedRocksDB() PRIMARY KEY (t.a);
```

### DateTime64 with Timezone
Type parameters with string timezone:
### insert() Function with input()
INSERT using input() function:
```sql
DateTime64(3,'UTC')
INSERT INTO FUNCTION null() SELECT * FROM input('x Int') ...;
```

### Complex Type Expressions
Nested type expressions in column definitions:
## Explain Issues (Remaining)

### Scientific Notation for Floats
Very small/large floats should use scientific notation:
```sql
CREATE TABLE t (c LowCardinality(UUID));
SELECT 2.2250738585072014e-308;
-- Expected: Float64_2.2250738585072014e-308
-- Got: Float64_0.0000...22250738585072014
```
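
For reference, Go's shortest-representation `'g'` format already switches to an exponent for very small or very large magnitudes. The sketch below uses only `strconv.FormatFloat`; `formatFloat` is a hypothetical helper, and Go's switching thresholds and exponent padding (for example `1e-05`) may not match ClickHouse's output, so treat it as a starting point only.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatFloat (hypothetical) renders a float literal in the explain style
// using the shortest representation that round-trips.
func formatFloat(f float64) string {
	return "Float64_" + strconv.FormatFloat(f, 'g', -1, 64)
}

func main() {
	fmt.Println(formatFloat(2.2250738585072014e-308))
	fmt.Println(formatFloat(0.5))
}
```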

### Parameterized Views
View definitions with parameters:
### Array Literals with Negative Numbers
Arrays with negative integers expand to Function instead of Literal:
```sql
CREATE VIEW v AS SELECT ... WHERE x={parity:Int8};
SELECT [-10000, 5750];
-- Expected: Literal Array_[Int64_-10000, UInt64_5750]
-- Got: Function array with Function negate for -10000
```

### WithElement for CTE Subqueries
Some CTE subqueries should use WithElement wrapper:
```sql
WITH sub AS (SELECT ...) SELECT ...;
-- Expected: WithElement (children 1) > Subquery > SelectWithUnionQuery
```

## Testing Notes
@@ -127,10 +186,15 @@ go test ./parser -timeout 5s -v

Count test results:
```bash
go test ./parser -timeout 5s -v 2>&1 | grep -E 'PASS:|SKIP:' | cut -d':' -f1 | sort | uniq -c
go test ./parser -v 2>&1 | grep -E 'PASS:|SKIP:' | wc -l
```

View explain mismatches:
```bash
go test ./parser -timeout 5s -v 2>&1 | grep -A 30 "TODO: Explain output mismatch" | head -100
go test ./parser -v 2>&1 | grep -A 30 "TODO: Explain output mismatch" | head -100
```

View parser failures:
```bash
go test ./parser -v 2>&1 | grep "TODO: Parser does not yet support" | head -20
```
73 changes: 63 additions & 10 deletions ast/ast.go
@@ -2,6 +2,9 @@
package ast

import (
"encoding/json"
"math"

"github.com/kyleconroy/doubleclick/token"
)

Expand Down Expand Up @@ -51,6 +54,7 @@ type SelectQuery struct {
Where Expression `json:"where,omitempty"`
GroupBy []Expression `json:"group_by,omitempty"`
WithRollup bool `json:"with_rollup,omitempty"`
WithCube bool `json:"with_cube,omitempty"`
WithTotals bool `json:"with_totals,omitempty"`
Having Expression `json:"having,omitempty"`
Window []*WindowDefinition `json:"window,omitempty"`
Expand Down Expand Up @@ -199,13 +203,14 @@ func (s *SettingExpr) End() token.Position { return s.Position }

// InsertQuery represents an INSERT statement.
type InsertQuery struct {
Position token.Position `json:"-"`
Database string `json:"database,omitempty"`
Table string `json:"table,omitempty"`
Function *FunctionCall `json:"function,omitempty"` // For INSERT INTO FUNCTION syntax
Columns []*Identifier `json:"columns,omitempty"`
Select Statement `json:"select,omitempty"`
Format *Identifier `json:"format,omitempty"`
Position token.Position `json:"-"`
Database string `json:"database,omitempty"`
Table string `json:"table,omitempty"`
Function *FunctionCall `json:"function,omitempty"` // For INSERT INTO FUNCTION syntax
Columns []*Identifier `json:"columns,omitempty"`
Select Statement `json:"select,omitempty"`
Format *Identifier `json:"format,omitempty"`
HasSettings bool `json:"has_settings,omitempty"` // For SETTINGS clause
}

func (i *InsertQuery) Pos() token.Position { return i.Position }
Expand Down Expand Up @@ -261,15 +266,27 @@ func (c *ColumnDeclaration) End() token.Position { return c.Position }

// DataType represents a data type.
type DataType struct {
Position token.Position `json:"-"`
Name string `json:"name"`
Parameters []Expression `json:"parameters,omitempty"`
Position token.Position `json:"-"`
Name string `json:"name"`
Parameters []Expression `json:"parameters,omitempty"`
HasParentheses bool `json:"has_parentheses,omitempty"`
}

func (d *DataType) Pos() token.Position { return d.Position }
func (d *DataType) End() token.Position { return d.Position }
func (d *DataType) expressionNode() {}

// NameTypePair represents a named type pair, used in Nested types.
type NameTypePair struct {
Position token.Position `json:"-"`
Name string `json:"name"`
Type *DataType `json:"type"`
}

func (n *NameTypePair) Pos() token.Position { return n.Position }
func (n *NameTypePair) End() token.Position { return n.Position }
func (n *NameTypePair) expressionNode() {}

// CodecExpr represents a CODEC expression.
type CodecExpr struct {
Position token.Position `json:"-"`
Expand Down Expand Up @@ -589,6 +606,42 @@ func (l *Literal) Pos() token.Position { return l.Position }
func (l *Literal) End() token.Position { return l.Position }
func (l *Literal) expressionNode() {}

// MarshalJSON handles special float values (NaN, +Inf, -Inf) that JSON doesn't support.
func (l *Literal) MarshalJSON() ([]byte, error) {
	type literalAlias Literal
	// Handle special float values
	if f, ok := l.Value.(float64); ok {
		if math.IsNaN(f) {
			return json.Marshal(&struct {
				*literalAlias
				Value string `json:"value"`
			}{
				literalAlias: (*literalAlias)(l),
				Value:        "NaN",
			})
		}
		if math.IsInf(f, 1) {
			return json.Marshal(&struct {
				*literalAlias
				Value string `json:"value"`
			}{
				literalAlias: (*literalAlias)(l),
				Value:        "+Inf",
			})
		}
		if math.IsInf(f, -1) {
			return json.Marshal(&struct {
				*literalAlias
				Value string `json:"value"`
			}{
				literalAlias: (*literalAlias)(l),
				Value:        "-Inf",
			})
		}
	}
	return json.Marshal((*literalAlias)(l))
}
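
For context on why this method is needed: `encoding/json` returns an error when asked to marshal NaN or an infinity, so such values have to be replaced before encoding. The standalone sketch below shows the same substitution on a bare `float64`; `encodeFloat` and `specials` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// specials lists the float values that JSON cannot represent.
var specials = []float64{math.NaN(), math.Inf(1), math.Inf(-1)}

// encodeFloat (hypothetical) marshals f, substituting a string
// for NaN and the two infinities, as the method above does.
func encodeFloat(f float64) (string, error) {
	var v interface{} = f
	switch {
	case math.IsNaN(f):
		v = "NaN"
	case math.IsInf(f, 1):
		v = "+Inf"
	case math.IsInf(f, -1):
		v = "-Inf"
	}
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	// Marshaling the raw value fails with an unsupported-value error.
	_, err := json.Marshal(math.NaN())
	fmt.Println("raw marshal failed:", err != nil)

	for _, f := range specials {
		s, _ := encodeFloat(f)
		fmt.Println(s)
	}
}
```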

// LiteralType represents the type of a literal.
type LiteralType string

6 changes: 6 additions & 0 deletions internal/explain/explain.go
@@ -63,6 +63,8 @@ func Node(sb *strings.Builder, node interface{}, depth int) {
explainSubquery(sb, n, indent, depth)
case *ast.AliasedExpr:
explainAliasedExpr(sb, n, depth)
case *ast.WithElement:
explainWithElement(sb, n, indent, depth)
case *ast.Asterisk:
explainAsterisk(sb, n, indent)

Expand Down Expand Up @@ -97,6 +99,8 @@ func Node(sb *strings.Builder, node interface{}, depth int) {
explainExtractExpr(sb, n, indent, depth)

// DDL statements
case *ast.InsertQuery:
explainInsertQuery(sb, n, indent, depth)
case *ast.CreateQuery:
explainCreateQuery(sb, n, indent, depth)
case *ast.DropQuery:
@@ -117,6 +121,8 @@ func Node(sb *strings.Builder, node interface{}, depth int) {
// Types
case *ast.DataType:
explainDataType(sb, n, indent, depth)
case *ast.NameTypePair:
explainNameTypePair(sb, n, indent, depth)
case *ast.Parameter:
explainParameter(sb, n, indent)
