|
| 1 | +# CLI Module - Pydantic Serialization Patterns |
| 2 | + |
| 3 | +## Quick Reference: When Pydantic Auto-Serializes Types |
| 4 | + |
| 5 | +| Type | In Model Fields | In Arbitrary Dicts | Solution for Dicts | |
| 6 | +| ----------- | -------------------------- | ---------------------- | ----------------------- | |
| 7 | +| `Path` | ✅ Auto → `str` | ❌ Requires custom | `str(path)` | |
| 8 | +| `datetime` | ✅ Auto → ISO format | ❌ Requires custom | `dt.isoformat()` | |
| 9 | +| `dataclass` | ✅ Auto → dict (recursive) | ❌ Requires conversion | `asdict()` then recurse | |
| 10 | + |
| 11 | +**Key insight:** Pydantic's automatic serialization only applies to **typed model fields**, not arbitrary Python objects nested in dicts. |
| 12 | + |
| 13 | +--- |
| 14 | + |
| 15 | +## Design Decision: json_output.py Architecture |
| 16 | + |
| 17 | +### Dict-Only Approach with Pydantic Validation Where Needed |
| 18 | + |
| 19 | +The `emit_json()` function accepts only dicts: |
| 20 | + |
| 21 | +```python |
| 22 | +def emit_json(data: dict[str, Any]) -> None: |
| 23 | + serialized = _serialize_for_json(data) |
| 24 | + json_str = json.dumps(serialized, indent=2) |
| 25 | + machine_output(json_str) |
| 26 | +``` |
| 27 | + |
| 28 | +**Rationale:** |
| 29 | + |
| 30 | +1. **Single contract** - Clear expectation: "give me a dict" |
| 31 | + - No union types (`BaseModel | dict`) |
| 32 | + - No branching logic based on type |
| 33 | + - Simple, explicit interface |
| 34 | + |
| 35 | +2. **Pydantic models convert first** - Commands using schemas call `.model_dump(mode='json')` |
| 36 | + - Example: `emit_json(status_model.model_dump(mode='json'))` |
| 37 | + - Pydantic handles Path/datetime serialization during conversion |
| 38 | + - Validation happens at model construction, not at emit time |
| 39 | + |
| 40 | +3. **Custom serialization for dict values** - `_serialize_for_json()` handles special types |
| 41 | + - Path → str conversion |
| 42 | + - datetime → ISO format |
| 43 | + - dataclass → dict recursion |
| 44 | + |
| 45 | +**Why not accept Pydantic models directly?** |
| 46 | + |
| 47 | +- Eliminates dual-path complexity |
| 48 | +- Makes conversion explicit at call sites |
| 49 | +- Pydantic `.model_dump(mode='json')` already does serialization |
| 50 | +- No performance benefit to accepting BaseModel (still converts to dict internally) |
| 51 | + |
| 52 | +--- |
| 53 | + |
| 54 | +## Why NOT Use RootModel for Dict Serialization? |
| 55 | + |
| 56 | +### The Tempting (But Wrong) Approach |
| 57 | + |
| 58 | +```python |
| 59 | +# ❌ This seems like it would work, but doesn't |
| 60 | +from pydantic import RootModel |
| 61 | + |
| 62 | +data = {"path": Path("/test"), "timestamp": datetime.now()} |
| 63 | +model = RootModel[dict[str, Any]](data) |
| 64 | +json_str = model.model_dump_json() # ← TypeError: Path not JSON serializable |
| 65 | +``` |
| 66 | + |
| 67 | +**Why it fails:** |
| 68 | + |
| 69 | +- `RootModel[dict[str, Any]]` tells Pydantic "this is a dict" |
| 70 | +- Pydantic only serializes Path/datetime when they're **typed fields** |
| 71 | +- Generic `Any` type provides no serialization hints |
| 72 | +- The Path/datetime are **dict values**, not model fields |
| 73 | + |
| 74 | +### What Actually Works |
| 75 | + |
| 76 | +**Option 1: Typed model fields** (what Pydantic excels at) |
| 77 | + |
| 78 | +```python |
| 79 | +# ✅ Pydantic knows path is a Path field |
| 80 | +class MyModel(BaseModel): |
| 81 | + path: Path |
| 82 | + timestamp: datetime |
| 83 | + |
| 84 | +model = MyModel(path=Path("/test"), timestamp=datetime.now()) |
| 85 | +json_str = model.model_dump_json() # ← Works: {"path": "/test", "timestamp": "..."} |
| 86 | +``` |
| 87 | + |
| 88 | +**Option 2: Custom serialization function** (for arbitrary dicts) |
| 89 | + |
| 90 | +```python |
| 91 | +# ✅ Explicit control over serialization |
| 92 | +def _serialize_for_json(obj: Any) -> Any: |
| 93 | + if isinstance(obj, Path): |
| 94 | + return str(obj) |
| 95 | + if isinstance(obj, datetime): |
| 96 | + return obj.isoformat() |
| 97 | + # ... handle dicts, lists, nested structures |
| 98 | + |
| 99 | +serialized = _serialize_for_json(data) |
| 100 | +json_str = json.dumps(serialized, indent=2) |
| 101 | +``` |
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## Performance Characteristics |
| 106 | + |
| 107 | +### RootModel Construction Overhead |
| 108 | + |
| 109 | +**Creating a Pydantic model has significant overhead:** |
| 110 | + |
| 111 | +```python |
| 112 | +# Every emit_json() call would do this: |
| 113 | +model = RootModel[dict[str, Any]](data) # ← Model construction |
| 114 | +json_str = model.model_dump_json() # ← Validation + serialization |
| 115 | +``` |
| 116 | + |
| 117 | +**Overhead includes:** |
| 118 | + |
| 119 | +- Schema construction/validation |
| 120 | +- Field inspection and type checking |
| 121 | +- Serialization setup |
| 122 | +- Internal model bookkeeping |
| 123 | + |
| 124 | +**From Pydantic documentation:** |
| 125 | + |
| 126 | +> "Schema construction carries significant overhead...it is recommended to create a TypeAdapter for a given type just once and reuse it" |
| 127 | +
|
| 128 | +### Custom Function Performance |
| 129 | + |
| 130 | +```python |
| 131 | +# Direct Python recursion - minimal overhead |
| 132 | +serialized = _serialize_for_json(data) # ← Pure Python recursion |
| 133 | +json_str = json.dumps(serialized) # ← Standard library |
| 134 | +``` |
| 135 | + |
| 136 | +**Characteristics:** |
| 137 | + |
| 138 | +- No model construction |
| 139 | +- No validation overhead |
| 140 | +- Direct type checking with `isinstance()` |
| 141 | +- Minimal memory allocation |
| 142 | + |
| 143 | +**Performance comparison:** |
| 144 | + |
| 145 | +- RootModel: ~100-1000x slower (model construction dominates) |
| 146 | +- Custom function: Near-zero overhead beyond the serialization itself |
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## Common Mistakes to Avoid |
| 151 | + |
| 152 | +### Mistake 1: Assuming RootModel Auto-Serializes Nested Types |
| 153 | + |
| 154 | +```python |
| 155 | +# ❌ WRONG: Assumes RootModel handles Path in dict values |
| 156 | +data = {"worktree": {"path": Path("/test")}} |
| 157 | +model = RootModel[dict[str, Any]](data) |
| 158 | +json_str = model.model_dump_json() # ← Fails: Path not serializable |
| 159 | +``` |
| 160 | + |
| 161 | +**Why it fails:** `dict[str, Any]` doesn't tell Pydantic that values contain Path objects. |
| 162 | + |
| 163 | +**Correct approach:** |
| 164 | + |
| 165 | +```python |
| 166 | +# ✅ CORRECT: Define schema with typed fields |
| 167 | +class WorktreeInfo(BaseModel): |
| 168 | + path: Path |
| 169 | + |
| 170 | +class Output(BaseModel): |
| 171 | + worktree: WorktreeInfo |
| 172 | + |
| 173 | +model = Output(worktree=WorktreeInfo(path=Path("/test"))) |
| 174 | +json_str = model.model_dump_json() # ← Works: field typing provides serialization hint |
| 175 | +``` |
| 176 | + |
| 177 | +### Mistake 2: Using model_dump() Then json.dumps() |
| 178 | + |
| 179 | +```python |
| 180 | +# ❌ INEFFICIENT: Two-step serialization |
| 181 | +model = MyPydanticModel(...) |
| 182 | +dict_data = model.model_dump(mode='json') # ← Pydantic converts to dict |
| 183 | +json_str = json.dumps(dict_data, indent=2) # ← json.dumps formats it |
| 184 | +``` |
| 185 | + |
| 186 | +**Why inefficient:** Pydantic already has formatted JSON serialization. |
| 187 | + |
| 188 | +**Correct approach:** |
| 189 | + |
| 190 | +```python |
| 191 | +# ✅ CORRECT: One-step serialization |
| 192 | +model = MyPydanticModel(...) |
| 193 | +json_str = model.model_dump_json(indent=2) # ← Single call, already formatted |
| 194 | +``` |
| 195 | + |
| 196 | +### Mistake 3: Not Converting Pydantic Model to Dict |
| 197 | + |
| 198 | +```python |
| 199 | +# ❌ WRONG: emit_json() only accepts dicts |
| 200 | +def my_command(): |
| 201 | + model = MyResponseModel(name="test", path=Path("/foo")) |
| 202 | + emit_json(model) # ← Type error: Expected dict, got BaseModel |
| 203 | +``` |
| 204 | + |
| 205 | +**Why wrong:** `emit_json()` signature is `dict[str, Any]`, not `BaseModel | dict`. |
| 206 | + |
| 207 | +**Correct approach:** |
| 208 | + |
| 209 | +```python |
| 210 | +# ✅ CORRECT: Convert Pydantic model to dict first |
| 211 | +def my_command(): |
| 212 | + model = MyResponseModel(name="test", path=Path("/foo")) |
| 213 | + emit_json(model.model_dump(mode="json")) # ← Explicit conversion |
| 214 | +``` |
| 215 | + |
| 216 | +--- |
| 217 | + |
| 218 | +## Design Rationale Summary |
| 219 | + |
| 220 | +### Why json_output.py Uses This Approach |
| 221 | + |
| 222 | +1. **Dict-only interface** - Single, simple contract |
| 223 | + - All callers pass dicts to `emit_json()` |
| 224 | + - No union types or branching logic |
| 225 | + - Explicit conversions at call sites |
| 226 | + |
| 227 | +2. **Pydantic models call .model_dump(mode='json')** - Explicit conversion |
| 228 | + - Validation happens at model construction |
| 229 | + - Serialization happens during .model_dump() |
| 230 | + - emit_json() receives pre-serialized dict |
| 231 | + |
| 232 | +3. **Custom serialization for dict values** - Handles special types |
| 233 | + - Path → str, datetime → ISO format |
| 234 | + - Explicit, performant, testable |
| 235 | + - Follows LBYL (Look Before You Leap) principles |
| 236 | + |
| 237 | +4. **ErrorResponse still uses Pydantic** - Provides validation value |
| 238 | + - Validates exit_code range (0-255) |
| 239 | + - Converts to dict before emit: `emit_json(error.model_dump(mode='json'))` |
| 240 | + |
| 241 | +5. **No RootModel** - Avoids anti-pattern |
| 242 | + - Doesn't solve the serialization problem |
| 243 | + - Adds performance overhead |
| 244 | + - More complex without benefit |
| 245 | + |
| 246 | +### When to Use Pydantic Models |
| 247 | + |
| 248 | +**Use Pydantic models before calling emit_json() when:** |
| 249 | + |
| 250 | +- ✅ Output has a well-defined schema |
| 251 | +- ✅ Want runtime validation (e.g., field constraints) |
| 252 | +- ✅ Multiple commands share the same structure |
| 253 | +- ✅ Need to document JSON schema |
| 254 | + |
| 255 | +**Pattern:** |
| 256 | + |
| 257 | +```python |
| 258 | +model = MyResponseModel(...) # ← Validation happens here |
| 259 | +emit_json(model.model_dump(mode='json')) # ← Serialization happens here |
| 260 | +``` |
| 261 | + |
| 262 | +**Use plain dicts when:** |
| 263 | + |
| 264 | +- ✅ Output structure is dynamic or simple |
| 265 | +- ✅ Don't need validation |
| 266 | +- ✅ One-off command output |
| 267 | + |
| 268 | +--- |
| 269 | + |
| 270 | +## Code Examples |
| 271 | + |
| 272 | +### Example 1: Pydantic Model with Explicit Conversion |
| 273 | + |
| 274 | +```python |
| 275 | +from pathlib import Path |
| 276 | +from datetime import datetime |
| 277 | +from pydantic import BaseModel |
| 278 | +from erk.cli.json_output import emit_json |
| 279 | + |
| 280 | +class WorktreeResponse(BaseModel): |
| 281 | + name: str |
| 282 | + path: Path # ← Typed field: auto-converts to str |
| 283 | + created: datetime # ← Typed field: auto-converts to ISO format |
| 284 | + |
| 285 | +# Usage |
| 286 | +response = WorktreeResponse( |
| 287 | + name="feature", |
| 288 | + path=Path("/repo/worktrees/feature"), |
| 289 | + created=datetime.now(), |
| 290 | +) |
| 291 | + |
| 292 | +# Convert to dict then emit |
| 293 | +emit_json(response.model_dump(mode="json")) |
| 294 | +# Output: { |
| 295 | +# "name": "feature", |
| 296 | +# "path": "/repo/worktrees/feature", |
| 297 | +# "created": "2025-11-17T10:30:45.123456Z" |
| 298 | +# } |
| 299 | +``` |
| 300 | + |
| 301 | +### Example 2: Dict with Custom Serialization |
| 302 | + |
| 303 | +```python |
| 304 | +from pathlib import Path |
| 305 | +from datetime import datetime |
| 306 | + |
| 307 | +def _serialize_for_json(obj: Any) -> Any: |
| 308 | + """Recursively serialize special types.""" |
| 309 | + if isinstance(obj, Path): |
| 310 | + return str(obj) |
| 311 | + if isinstance(obj, datetime): |
| 312 | + return obj.isoformat() |
| 313 | + if isinstance(obj, dict): |
| 314 | + return {k: _serialize_for_json(v) for k, v in obj.items()} |
| 315 | + if isinstance(obj, list): |
| 316 | + return [_serialize_for_json(item) for item in obj] |
| 317 | + return obj |
| 318 | + |
| 319 | +# Usage |
| 320 | +data = { |
| 321 | + "name": "feature", |
| 322 | + "path": Path("/repo/worktrees/feature"), |
| 323 | + "created": datetime.now(), |
| 324 | +} |
| 325 | + |
| 326 | +serialized = _serialize_for_json(data) |
| 327 | +json_str = json.dumps(serialized, indent=2) |
| 328 | +# Output: { |
| 329 | +# "name": "feature", |
| 330 | +# "path": "/repo/worktrees/feature", |
| 331 | +# "created": "2025-11-17T10:30:45.123456" |
| 332 | +# } |
| 333 | +``` |
| 334 | + |
| 335 | +### Example 3: ErrorResponse with Validation |
| 336 | + |
| 337 | +```python |
| 338 | +from pydantic import BaseModel, Field |
| 339 | + |
| 340 | +class ErrorResponse(BaseModel): |
| 341 | + error: str |
| 342 | + error_type: str |
| 343 | + exit_code: int = Field(default=1, ge=0, le=255) # ← Validates range |
| 344 | + |
| 345 | +# Valid |
| 346 | +error = ErrorResponse(error="File not found", error_type="FileNotFoundError", exit_code=1) |
| 347 | + |
| 348 | +# Invalid - raises ValidationError |
| 349 | +error = ErrorResponse(error="Bad", error_type="Error", exit_code=999) # ← Fails: > 255 |
| 350 | +``` |
| 351 | + |
| 352 | +--- |
| 353 | + |
| 354 | +## Related Files |
| 355 | + |
| 356 | +**Implementation:** |
| 357 | + |
| 358 | +- `src/erk/cli/json_output.py` - Core JSON output utilities |
| 359 | +- `src/erk/cli/json_schemas.py` - Pydantic schemas for command responses |
| 360 | +- `src/erk/cli/rendering.py` - Output rendering framework |
| 361 | + |
| 362 | +**Tests:** |
| 363 | + |
| 364 | +- `tests/cli/test_json_output.py` - Tests for emit_json() and serialization |
| 365 | +- `tests/cli/test_rendering.py` - Tests for renderer framework |
| 366 | + |
| 367 | +--- |
| 368 | + |
| 369 | +## Further Reading |
| 370 | + |
| 371 | +**Pydantic documentation:** |
| 372 | + |
| 373 | +- [Serialization](https://docs.pydantic.dev/latest/concepts/serialization/) - Model dump, JSON mode |
| 374 | +- [Custom Serializers](https://docs.pydantic.dev/latest/concepts/serialization/#custom-serializers) - Field and model serializers |
| 375 | +- [JSON Schema](https://docs.pydantic.dev/latest/concepts/json_schema/) - Generating schemas |
| 376 | + |
| 377 | +**Erk coding standards:** |
| 378 | + |
| 379 | +- `AGENTS.md` - Top 6 critical rules |
| 380 | +- `dignified-python` skill - Complete Python standards |
0 commit comments