Merged
3 changes: 2 additions & 1 deletion .gitignore
@@ -9,6 +9,7 @@
*.user
*.userosscache
*.sln.docstates
AGENTS.local.md

# User-specific files (MonoDevelop/Xamarin Studio)
*.userprefs
@@ -362,4 +363,4 @@ MigrationBackup/
FodyWeavers.xsd

.claude/
.idea/
.idea/
147 changes: 147 additions & 0 deletions AGENTS.md
@@ -0,0 +1,147 @@
# Agent Contribution Guide

## Project Overview

This repository contains the official Entity Framework Core provider for ClickHouse. It is built on top of `ClickHouse.Driver` and implements EF Core relational provider services with ClickHouse-specific SQL generation, type mapping, migrations, and write-path behavior.

Primary target versions:

- .NET: `net10.0`
- EF Core: `Microsoft.EntityFrameworkCore.Relational` 10.x
- ClickHouse ADO.NET driver: `ClickHouse.Driver`
- Tests: xUnit v3 with Testcontainers for integration coverage

## Project Structure

```text
src/EFCore.ClickHouse/
  Extensions/                            Public entry points such as UseClickHouse()
  Infrastructure/                        Options, model validation, singleton options
  Diagnostics/Internal/                  Provider logging definitions
  Storage/Internal/                      Connection, SQL helper, database creator, type mapping source
  Storage/Internal/Mapping/              Individual ClickHouse type mappings
  Query/                                 SQL expression factory
  Query/Internal/                        Query pipeline visitors, processors, SQL generator
  Query/Expressions/Internal/            Custom SQL AST nodes
  Query/ExpressionTranslators/Internal/  LINQ member/method/aggregate translators
  Metadata/                              Annotations, fluent API builders, conventions
  Migrations/                            Migration operations and SQL generation
  Update/Internal/                       SaveChanges insert batching and unsupported mutation paths

test/EFCore.ClickHouse.Tests/            Focused unit and integration tests
test/EFCore.ClickHouse.FunctionalTests/  EF relational-harness/Northwind-style query tests
test/EFCore.ClickHouse.DesignSmoke/      dotnet-ef/design-time smoke project
```

## Build And Test

Use the solution-level commands unless you have a reason to narrow the scope:

```bash
dotnet build
dotnet test
```

Integration and functional tests require Docker because they use `Testcontainers.ClickHouse` to start a real ClickHouse server.

For targeted runs:

```bash
dotnet test test/EFCore.ClickHouse.Tests/EFCore.ClickHouse.Tests.csproj
dotnet test test/EFCore.ClickHouse.FunctionalTests/EFCore.ClickHouse.FunctionalTests.csproj
dotnet test --filter FullyQualifiedName~TypeMapping
```

For coverage, prefer the collector output and parse the Cobertura XML directly:

```bash
dotnet test --collect:"XPlat Code Coverage"
```

Do not generate HTML coverage reports for routine agent work; they are slower and harder to inspect programmatically.

### Coverage Helpers

Both test projects include `coverlet.collector` and `coverlet.msbuild`. After running coverage, use the helper scripts in `scripts/` to inspect the generated Cobertura XML:

```bash
python3 scripts/coverage-summary.py "test/**/coverage.cobertura.xml" "test/**/TestResults/**/coverage.cobertura.xml"
python3 scripts/coverage-uncovered.py "test/**/coverage.cobertura.xml" "test/**/TestResults/**/coverage.cobertura.xml" ClickHouseTypeMappingSource.cs
```

`scripts/coverage-summary.py` prints per-file coverage sorted worst-first. `scripts/coverage-uncovered.py` prints uncovered line numbers for a specific source file. Both scripts accept multiple coverage XML paths or glob patterns and use the most recent matching file.
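
As a rough illustration of what the helpers read, the sketch below computes per-file line coverage from a small inline Cobertura document. The sample XML and the function name `per_file_coverage` are made up for this example and are not part of the repository; real reports are larger and carry more attributes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, hand-written Cobertura fragment (not real provider output).
SAMPLE = """<coverage>
  <packages><package name="demo"><classes>
    <class name="A" filename="src/A.cs"><lines>
      <line number="1" hits="3"/>
      <line number="2" hits="0"/>
    </lines></class>
    <class name="B" filename="src/B.cs"><lines>
      <line number="1" hits="1"/>
    </lines></class>
  </classes></package></packages>
</coverage>"""


def per_file_coverage(xml_text: str) -> dict[str, tuple[int, int]]:
    """Return {filename: (covered_lines, total_lines)}."""
    counts: defaultdict[str, list[int]] = defaultdict(lambda: [0, 0])
    for cls in ET.fromstring(xml_text).findall(".//class"):
        filename = cls.get("filename", "")
        for line in cls.findall(".//line"):
            counts[filename][1] += 1          # every <line> counts toward the total
            if int(line.get("hits", 0)) > 0:  # a line is covered if it was hit at least once
                counts[filename][0] += 1
    return {name: (covered, total) for name, (covered, total) in counts.items()}


if __name__ == "__main__":
    for name, (covered, total) in sorted(per_file_coverage(SAMPLE).items()):
        print(f"{covered / total * 100:5.1f}% ({covered}/{total}) {name}")
```

A line counts as covered when its `hits` attribute is greater than zero, which is the same rule the helper scripts apply.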

## Development Workflow

- Make focused changes that match the existing provider patterns.
- Keep public docs current when behavior changes. Update `README.md`, `CHANGELOG.md`, or `RELEASENOTES.md` when appropriate.
- Do not edit local-only files such as `AGENTS.local.md` if present.
- If there is any doubt about how ClickHouse behaves, test it empirically.
- For PR or diff reviews, use the project-specific review guidance in `skills/review/SKILL.md`.
- Avoid unrelated refactors, formatting churn, and broad rewrites.
- Avoid ad-hoc solutions; prefer clean abstractions and logical groupings that are extensible and reusable.
- When adding provider services, register them in `ClickHouseServiceCollectionExtensions.AddEntityFrameworkClickHouse()`.

## Where To Make Common Changes

- New ClickHouse type mapping: add or update a class under `Storage/Internal/Mapping/`, then register it in `ClickHouseTypeMappingSource`.
- New LINQ method translation: add a translator under `Query/ExpressionTranslators/Internal/` and register it in the relevant translator provider.
- SQL syntax changes: update `ClickHouseQuerySqlGenerator` or `ClickHouseSqlGenerationHelper`.
- Custom SQL expression node: add it under `Query/Expressions/Internal/`, then handle it in SQL generation and nullability processing.
- Migrations or DDL behavior: update `Migrations/Internal/` and cover the generated SQL.
- SaveChanges write-path behavior: update `Update/Internal/ClickHouseModificationCommandBatch` and related factory/connection code.

## Testing Guidelines

- Use unit tests for type mapping resolution, SQL literal generation, nullability processing, and SQL generator edge cases.
- Use integration tests in `EFCore.ClickHouse.Tests` for provider behavior that must run against real ClickHouse.
- Use functional tests in `EFCore.ClickHouse.FunctionalTests` for EF relational query-suite parity and Northwind-style query behavior.
- Use `IClassFixture<T>` and shared fixtures so ClickHouse containers are not started per test.
- xUnit v3 `IAsyncLifetime` methods return `Task`.
- Give each integration fixture an isolated database or table setup. Prefer deterministic seed data.
- Assert both result semantics and SQL shape when a bug is specifically about translation.
- Cover runtime paths such as `GenerateNonNullSqlLiteral()`, data-reader materialization, conversion helpers, type resolution branches, query translators, and SQL generator overrides.
- It is acceptable to leave trivial `Clone()` overrides, pass-through constructors, and no-op transaction plumbing lightly covered.

In general, prefer integration tests that actually talk to the database over unit tests.

When writing tests that use the driver directly from the test project, prefer `global::ClickHouse.Driver.ADO.ClickHouseConnection` to avoid namespace collisions with this provider's `ClickHouse.EntityFrameworkCore` namespace.

## Design Considerations

ClickHouse is not a general OLTP database, and the provider should preserve ClickHouse semantics rather than forcing a standard relational shape where it does not fit.

Provider-specific design rules:

- Use ClickHouse-native SQL functions when translating LINQ.
- Preserve .NET observable semantics in translations, especially around nulls, indexing, and default values.
- Do not assume ClickHouse supports relational constraints, row-level transactions, `RETURNING`, identity values, or OLTP-style updates.
- ClickHouse does not support transactions, foreign keys, unique primary keys, or returned auto-increment ids.
- Prefer efficient write paths. Inserts should use the driver's native bulk APIs where possible.
- Be explicit about ClickHouse settings that affect semantics. For example, left join null semantics depend on `join_use_nulls`.
- Be careful with composite type mappings. `Array`, `Map`, `Tuple`, `Variant`, `Dynamic`, `Json`, and geo types often require store-type-driven resolution.
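
To make the `join_use_nulls` point concrete, here is a toy Python model (not ClickHouse or provider code) of how unmatched right-side columns are filled in a left join. It assumes the documented ClickHouse behavior: with `join_use_nulls = 0` unmatched cells take the column type's default value, and with `join_use_nulls = 1` they become `NULL`. The function, schema, and data names are hypothetical.

```python
from typing import Any

# Type defaults used when join_use_nulls = 0 (toy subset for illustration).
TYPE_DEFAULTS: dict[type, Any] = {int: 0, str: "", float: 0.0}


def left_join(
    left: list[dict[str, Any]],
    right: list[dict[str, Any]],
    key: str,
    right_schema: dict[str, type],
    join_use_nulls: bool,
) -> list[dict[str, Any]]:
    """Toy LEFT JOIN that mimics how unmatched right columns are filled."""
    rows = []
    for l_row in left:
        matches = [r for r in right if r[key] == l_row[key]]
        if not matches:
            # No match: fill right-side columns with None (NULL) or type defaults.
            filler = {
                col: None if join_use_nulls else TYPE_DEFAULTS[col_type]
                for col, col_type in right_schema.items()
            }
            matches = [filler]
        for r_row in matches:
            rows.append({**l_row, **{c: r_row[c] for c in right_schema}})
    return rows


orders = [{"customer_id": 1}, {"customer_id": 2}]
customers = [{"customer_id": 1, "name": "Ada"}]
schema = {"name": str}

print(left_join(orders, customers, "customer_id", schema, join_use_nulls=False))
print(left_join(orders, customers, "customer_id", schema, join_use_nulls=True))
```

In this model the unmatched order gets `name == ""` under the first setting and `name is None` under the second, so a null check written in LINQ can only observe the missing row in the second case.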

## Current Feature Areas

The provider supports connection setup, read-oriented LINQ queries, grouping and aggregates, string and math translations, joins, subqueries, set operations, insert-only `SaveChanges`, bulk insert, table engine configuration, migrations for supported DDL operations, and a broad ClickHouse type system.

Known unsupported or limited areas include:

- UPDATE and DELETE mutation support.
- Server-generated values such as identity columns or `RETURNING`.
- Reverse engineering/scaffolding.
- Collection method translation.
- Full EF Core specification-test coverage.
- Advanced JSON features.

## Pre-PR Checklist

Before finishing a change:

- Build the solution or the affected projects.
- Run the relevant test project or a targeted filter.
- Add or update tests for changed behavior.
- Check code coverage using the provided scripts.
- Update public docs for user-visible behavior.
- Launch a sub-agent to do a review. Evaluate the result and implement any necessary changes.
- If the changes have implications for the long-term design of the library, make sure to mention them.
13 changes: 13 additions & 0 deletions AI_POLICY.md
@@ -0,0 +1,13 @@
# ClickHouse AI Policy

You can use AI for ClickHouse development. We welcome and embrace AI usage, as well as research and experiments with the frontier AI models and novel methods of AI applications for software engineering.

You don't have to disclose your usage of AI. You can talk about it, share your experience, and show your methods, but it is not required. AI is a normal developer's tool, similar to an IDE, an OS, or a keyboard. We don't judge your work on the basis of AI usage, but we recommend making an effort to filter out slop before sending a pull request; otherwise, it may negatively affect your reputation as an engineer.

When sending generated code, you take responsibility for it in the same way as for code you have typed manually. Make an effort to read and review the code before sending it; otherwise it is disrespectful to maintainers. Make an effort to understand the code base, with or without the help of AI. Low-effort pull requests that require high effort from maintainers will be closed. Do not use AI to automate your responses to maintainers.

Prefer using AI for improving the code base, such as removing and simplifying code, improving the build speed, improving continuous integration tools and quality checks, reverting bad modifications, security research, and bug fixing. Keep in mind that using AI for implementing big features requires as much design consideration as without AI.

When using AI, the same rules around intellectual property apply as with manually written code. Do not copy, reproduce, or include code belonging to others unless its license explicitly permits this use and all license requirements are met. You are responsible for ensuring that you have all required permissions for any submitted code, whether AI-generated or not.

We will be happy to participate in research and experiments with AI models and their application methods on top of the ClickHouse code base. Examples include benchmarks and comparisons of models, testing models by having them solve identical tasks, AI reproducibility studies, performance of agentic loops, and AI sandboxing. ClickHouse provides an extremely comprehensive test suite to support these studies, and it is one of the most actively developed open-source projects in the world. If you want to share your research, you can write to ai@clickhouse.com.
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -0,0 +1 @@
@AGENTS.md
92 changes: 92 additions & 0 deletions scripts/coverage-summary.py
@@ -0,0 +1,92 @@
#!/usr/bin/env python3
"""Per-file coverage summary from cobertura XML, sorted worst-first."""

import os
import subprocess
import sys
import xml.etree.ElementTree as ET
from glob import glob
from collections import defaultdict


def expand_paths(paths: list[str]) -> list[str]:
    expanded: list[str] = []
    for path in paths:
        matches = glob(path, recursive=True)
        expanded.extend(matches or [path])

    return [path for path in expanded if os.path.isfile(path)]


def main() -> int:
    xml_paths: list[str] = []
    changed_ref: str | None = None
    i = 1

    while i < len(sys.argv):
        if sys.argv[i] == "--changed":
            changed_ref = (
                sys.argv[i + 1]
                if i + 1 < len(sys.argv) and not sys.argv[i + 1].startswith("-")
                else "HEAD"
            )
            if changed_ref != "HEAD":
                i += 1
            i += 1
        else:
            xml_paths.append(sys.argv[i])
            i += 1

    if not xml_paths:
        print(f"Usage: {sys.argv[0]} <coverage.xml> ... [--changed [ref]]", file=sys.stderr)
        return 1

    xml_paths = expand_paths(xml_paths)
    if not xml_paths:
        print("No coverage XML files matched.", file=sys.stderr)
        return 1

    xml_path = max(xml_paths, key=os.path.getmtime)
    changed_files: set[str] | None = None

    if changed_ref is not None:
        result = subprocess.run(
            ["git", "diff", "--name-only", changed_ref],
            capture_output=True,
            text=True,
            check=False,
        )
        changed_files = {os.path.basename(f) for f in result.stdout.strip().splitlines()}

    tree = ET.parse(xml_path)
    by_file: defaultdict[str, list[int]] = defaultdict(lambda: [0, 0])

    for cls in tree.getroot().findall(".//class"):
        lines = cls.findall(".//line")
        if not lines:
            continue

        filename = cls.get("filename", "")
        if changed_files is not None and os.path.basename(filename) not in changed_files:
            continue

        by_file[filename][0] += sum(1 for line in lines if int(line.get("hits", 0)) > 0)
        by_file[filename][1] += len(lines)

    if changed_files is not None and not by_file:
        print("No changed files found in coverage report.", file=sys.stderr)
        return 0

    for pct, path, covered, total in sorted(
        [
            (covered / total * 100 if total else 0, path, covered, total)
            for path, (covered, total) in by_file.items()
        ]
    ):
        print(f"{pct:5.1f}% ({covered:3d}/{total:3d}) {path}")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
68 changes: 68 additions & 0 deletions scripts/coverage-uncovered.py
@@ -0,0 +1,68 @@
#!/usr/bin/env python3
"""Find uncovered lines for a specific file from cobertura XML."""

import os
import sys
import xml.etree.ElementTree as ET
from glob import glob


def expand_paths(paths: list[str]) -> list[str]:
    expanded: list[str] = []
    for path in paths:
        matches = glob(path, recursive=True)
        expanded.extend(matches or [path])

    return [path for path in expanded if os.path.isfile(path)]


def main() -> int:
    if len(sys.argv) < 3:
        print(
            f"Usage: {sys.argv[0]} <path/to/coverage.cobertura.xml> ... <filename>",
            file=sys.stderr,
        )
        return 1

    target = sys.argv[-1]
    xml_paths = expand_paths(sys.argv[1:-1])
    if not xml_paths:
        print("No coverage XML files matched.", file=sys.stderr)
        return 1

    xml_path = max(xml_paths, key=os.path.getmtime)

    tree = ET.parse(xml_path)
    found = False

    for cls in tree.getroot().findall(".//class"):
        if target in cls.get("filename", ""):
            found = True
            uncovered = sorted(
                {
                    int(line.get("number", "0"))
                    for line in cls.findall(".//line")
                    if int(line.get("hits", 0)) == 0
                }
            )
            uncovered_lines = [
                str(line)
                for line in uncovered
                if line > 0
            ]

            if uncovered_lines:
                print(cls.get("filename"))
                print(f" Uncovered lines: {', '.join(uncovered_lines)}")
            else:
                print(f"{cls.get('filename')}: fully covered")

    if not found:
        print(f"No classes matching '{target}' found in coverage report.", file=sys.stderr)
        return 1

    return 0


if __name__ == "__main__":
    raise SystemExit(main())