DuckDB provider for ServiceStack.OrmLite - A fast, simple, and typed ORM for .NET.
This is an independent, community-maintained provider for ServiceStack.OrmLite. It is not officially maintained or endorsed by ServiceStack.
This package enables ServiceStack.OrmLite to work with DuckDB, an in-process analytical database management system. DuckDB excels at:
- Fast analytical queries - Columnar storage and vectorized execution
- In-process - No separate server process required
- SQL standard - Familiar SQL syntax with PostgreSQL compatibility
- Data processing - Native support for Parquet, CSV, JSON
- OLAP workloads - Aggregations, window functions, complex analytics
✅ Full OrmLite Support
- Complete CRUD operations (sync + async)
- LINQ query expressions
- Transactions
- AutoIncrement with sequences and INSERT...RETURNING
- Complex queries (JOINs, aggregations, subqueries)
- Parameterized queries
- Batch operations
- Async/await support (pseudo-async)
- High-performance bulk insert - 10-100x faster using DuckDB's Appender API
- Bulk insert with deduplication - Production-safe staging table pattern for massive tables
- Type-safe LINQ expressions - IntelliSense support for unique column selection
- Multi-database support - Query across multiple DuckDB files transparently
✅ Complete Type Support
- All .NET primitive types
- DateTime, DateTimeOffset, TimeSpan
- Decimal and all integer types
- Guid (UUID)
- byte[] (BLOB)
- Nullable types
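As a quick illustration, a model exercising these types maps directly (a hypothetical example, not taken from the package's test suite):

```csharp
public class Measurement
{
    [AutoIncrement]
    public int Id { get; set; }
    public Guid DeviceId { get; set; }             // stored as UUID
    public DateTime RecordedAt { get; set; }
    public DateTimeOffset? SyncedAt { get; set; }  // nullable types supported
    public TimeSpan Duration { get; set; }
    public decimal Value { get; set; }
    public byte[] RawPayload { get; set; }         // stored as BLOB
}
```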
✅ Production Ready
- 90 comprehensive tests (100% passing)
- Optimized for DuckDB 1.3.2
- SQL injection prevention
- Robust error handling
- Connection timeout/retry for multi-process scenarios
Install via NuGet:

```bash
dotnet add package DuckDB.OrmLite
```

```csharp
using ServiceStack.OrmLite;
using DuckDB.OrmLite;
// Create connection factory
var dbFactory = new DuckDbOrmLiteConnectionFactory("Data Source=myapp.db");
// Define your models
public class Customer
{
[AutoIncrement]
public int Id { get; set; }
[Required]
public string Name { get; set; }
public string Email { get; set; }
public decimal CreditLimit { get; set; }
public DateTime RegisteredAt { get; set; }
}
// Use OrmLite
using var db = dbFactory.Open();
// Create table
db.CreateTable<Customer>();
// Insert
var customer = new Customer
{
Name = "Acme Corp",
Email = "[email protected]",
CreditLimit = 50000,
RegisteredAt = DateTime.UtcNow
};
db.Insert(customer);
// Query with LINQ
var highValueCustomers = db.Select<Customer>(c =>
c.CreditLimit > 10000 && c.RegisteredAt > DateTime.UtcNow.AddMonths(-6)
);
// Update
customer.CreditLimit = 75000;
db.Update(customer);
// Aggregations
var totalCredit = db.Scalar<decimal>(
db.From<Customer>().Select(c => Sql.Sum(c.CreditLimit))
);
// JOINs
var orders = db.Select(db.From<Order>()
.Join<Customer>((o, c) => o.CustomerId == c.Id)
.Where<Customer>(c => c.Name == "Acme Corp")
);
// Transactions
using (var trans = db.OpenTransaction())
{
db.Insert(customer);
db.Insert(order);
trans.Commit();
}
```

DuckDB.OrmLite supports async/await for all operations through ServiceStack.OrmLite's built-in async APIs.
Important: Since DuckDB.NET.Data does not provide native async I/O operations, these are pseudo-async implementations (similar to SQLite). Operations will block the calling thread but provide API compatibility with other OrmLite providers.
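Conceptually, a pseudo-async method runs the synchronous operation and hands back an already-completed task. A minimal sketch of the idea (illustrative only, not the provider's actual implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;
using ServiceStack.OrmLite;

public static class PseudoAsyncSketch
{
    // Blocks the calling thread, then returns a completed Task -
    // API compatibility without real async I/O underneath.
    public static Task<List<T>> SelectPseudoAsync<T>(
        this IDbConnection db,
        Expression<Func<T, bool>> predicate,
        CancellationToken token = default)
    {
        token.ThrowIfCancellationRequested();
        return Task.FromResult(db.Select(predicate)); // synchronous call under the hood
    }
}
```

Usage is identical to other OrmLite providers: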
```csharp
using var db = dbFactory.Open();
// SELECT operations
var customers = await db.SelectAsync<Customer>(c => c.CreditLimit > 10000);
var customer = await db.SingleAsync<Customer>(c => c.Id == 1);
var count = await db.CountAsync<Customer>();
// INSERT operations
await db.InsertAsync(new Customer { Name = "New Corp", ... });
await db.InsertAllAsync(customers);
// UPDATE operations
customer.CreditLimit = 75000;
await db.UpdateAsync(customer);
await db.UpdateAllAsync(customers);
// DELETE operations
await db.DeleteAsync<Customer>(c => c.CreditLimit < 1000);
await db.DeleteByIdAsync<Customer>(customerId);
// SQL operations
var results = await db.SqlListAsync<Customer>("SELECT * FROM Customer WHERE CreditLimit > 50000");
var total = await db.SqlScalarAsync<decimal>("SELECT SUM(CreditLimit) FROM Customer");
// Transactions work with async too
using (var trans = db.OpenTransaction())
{
await db.InsertAsync(customer);
await db.InsertAsync(order);
trans.Commit();
}
```

Considerations:
- Suitable for maintaining a consistent async/await code style
- API-compatible with other OrmLite providers
- Not suitable for high-concurrency scenarios expecting true async I/O benefits
- Consider using synchronous methods if async benefits are not needed
Query across multiple DuckDB database files transparently - perfect for time-series data, archival scenarios, or partitioned datasets.
```csharp
var factory = new DuckDbOrmLiteConnectionFactory("Data Source=main.db")
.WithAdditionalDatabases("archive_2024.db", "archive_2023.db")
.WithMultiDatabaseTables("CmcPrice", "FiatPrice");
// Queries automatically span all databases
using var db = factory.Open();
var allPrices = db.Select<CmcPrice>(x => x.Symbol == "BTC");
// Returns data from main.db, archive_2024.db, and archive_2023.db
// Writes go to main database only
using var writeDb = factory.OpenForWrite();
writeDb.Insert(new CmcPrice { Date = DateTime.Today, Symbol = "ETH", Price = 3500 });
```

```csharp
// Generic factory - automatically configures multi-database support for CmcPrice
var factory = new DuckDbOrmLiteConnectionFactory<CmcPrice>("Data Source=main.db")
.WithAdditionalDatabases("archive.db");
// Or explicit configuration
var factory = new DuckDbOrmLiteConnectionFactory("Data Source=main.db")
.WithAdditionalDatabases("archive.db")
.WithMultiDatabaseTable<CmcPrice>();
```

```csharp
// Setup: Current year + archives
var factory = new DuckDbOrmLiteConnectionFactory("Data Source=prices_2025.db")
.WithAdditionalDatabases("prices_2024.db", "prices_2023.db", "prices_2022.db")
.WithMultiDatabaseTables("CmcPrice");
// Read: Query spans all years transparently
using (var db = factory.Open())
{
// Get Bitcoin prices for last 2 years
var btcPrices = db.Select<CmcPrice>(x =>
x.Symbol == "BTC" &&
x.Date >= DateTime.Today.AddYears(-2));
// Aggregations work across all databases
var avgPrice = db.Scalar<decimal>(
db.From<CmcPrice>()
.Where(x => x.Symbol == "BTC")
.Select(x => Sql.Avg(x.Price))
);
}
// Write: New data goes to current year database
using (var writeDb = factory.OpenForWrite())
{
writeDb.Insert(new CmcPrice
{
Date = DateTime.Today,
Symbol = "BTC",
Price = 95000
});
}
```

How it works:
- Automatic ATTACH: Additional databases are attached on connection open
- Unified Views: Creates `{TableName}_Unified` views with `UNION ALL` across all databases
- Query Routing: Read queries automatically use unified views; writes go directly to the main database
- Zero Code Changes: Application code using `db.Select<T>()` works unchanged
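For a table spread across a main file and one archive, the setup is conceptually equivalent to the following (illustrative only - the alias `db1` and the exact DDL are assumptions; the real statements are generated internally):

```csharp
// Conceptual equivalent of what the provider arranges on Open():
//
//   ATTACH 'archive_2024.db' AS db1;
//   CREATE VIEW CmcPrice_Unified AS
//       SELECT * FROM main.CmcPrice
//       UNION ALL
//       SELECT * FROM db1.CmcPrice;
//
// Reads of CmcPrice are routed to CmcPrice_Unified;
// writes always target main.CmcPrice.
```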
✅ All OrmLite operations work across databases:
- SELECT with WHERE, ORDER BY, LIMIT
- Aggregations (COUNT, SUM, AVG, MAX, MIN)
- JOINs (multi-db table with single-db table) - see the sketch after this list
- Complex LINQ expressions
- Async operations
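For example, a multi-database table can be joined against a lookup table that lives only in the main database (a sketch - the `Exchange` model and the `ExchangeId` column are assumptions for illustration):

```csharp
// CmcPrice resolves to its unified view; Exchange exists only in main.db
var btcOnBinance = db.Select(db.From<CmcPrice>()
    .Join<Exchange>((p, e) => p.ExchangeId == e.Id)
    .Where<Exchange>(e => e.Name == "Binance")
    .And<CmcPrice>(p => p.Symbol == "BTC"));
```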
✅ Smart table detection:
- Only creates views for tables that exist in databases
- Handles tables existing in subset of databases
✅ Flexible configuration:
- Mix multi-db and single-db tables in same factory
- Toggle auto-configuration with `.WithAutoConfigureViews(false)`
Limitations:
- Schema consistency: Tables must have identical schemas across all databases
- No cross-database transactions: Transactions only work with `OpenForWrite()` (single database)
- Read-only archives: Additional databases should be read-only for data consistency
- No automatic deduplication: `UNION ALL` doesn't deduplicate - ensure your partitioning prevents duplicates
DuckDB uses an exclusive file lock when opening a database. If another process has the database open, subsequent attempts will fail immediately. The timeout feature provides automatic retry with exponential backoff:
```csharp
// Default behavior - fail immediately if database is locked
using var db = factory.Open();
// Retry for up to 30 seconds with exponential backoff
using var db = factory.Open(TimeSpan.FromSeconds(30));
// Also available for write connections
using var writeDb = factory.OpenForWrite(TimeSpan.FromSeconds(30));
// Customize retry delays (useful for high-contention scenarios)
var factory = new DuckDbOrmLiteConnectionFactory("Data Source=myapp.db")
{
RetryDelayMs = 100, // Start with 100ms delays (default: 50ms)
MaxRetryDelayMs = 5000 // Cap at 5 seconds (default: 1000ms)
};
```

How it works:
- Detects lock errors: "Could not set lock", "database is locked", "IO Error"
- Exponential backoff: starts at `RetryDelayMs`, doubles each retry, caps at `MaxRetryDelayMs`
- Random jitter (up to 25% of the delay) to avoid the thundering-herd problem
- Throws `TimeoutException` if the lock is not acquired within the timeout
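The retry loop is conceptually similar to this sketch (an illustration of the algorithm under the stated defaults, not the provider's internals; `open` and `isLockError` stand in for the real open attempt and lock-error detection):

```csharp
using System;
using System.Threading;

public static class LockRetrySketch
{
    // Illustrative retry-with-backoff loop (a sketch, not the provider's actual code).
    public static T OpenWithRetry<T>(
        Func<T> open,                  // attempts to open the connection
        Func<Exception, bool> isLockError,
        TimeSpan timeout,
        int retryDelayMs = 50,         // default starting delay
        int maxRetryDelayMs = 1000)    // default cap
    {
        var deadline = DateTime.UtcNow + timeout;
        var delayMs = retryDelayMs;
        var rng = new Random();

        while (true)
        {
            try { return open(); }
            catch (Exception ex) when (isLockError(ex))
            {
                if (DateTime.UtcNow >= deadline)
                    throw new TimeoutException("Could not acquire database lock within timeout.", ex);

                var jitter = rng.Next(0, delayMs / 4 + 1);         // up to ~25% jitter
                Thread.Sleep(delayMs + jitter);
                delayMs = Math.Min(delayMs * 2, maxRetryDelayMs);  // double each retry, capped
            }
        }
    }
}
```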
Use cases:
- Multi-process scenarios where processes may temporarily lock the database
- Automated scripts that need to wait for other processes to finish
- Services that need resilient database access
Concurrency notes:
- Within same process: Multiple threads can read/write concurrently (MVCC) - see the sketch after this list
- Across processes: Only one process can open at a time (exclusive lock)
- Use timeout feature for cross-process coordination
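For example, within one process several threads can open connections from the same factory concurrently (a sketch; error handling omitted, `Customer` as defined in the quick start):

```csharp
using System.Linq;
using System.Threading.Tasks;

var factory = new DuckDbOrmLiteConnectionFactory("Data Source=myapp.db");

// Same-process concurrency: multiple threads reading in parallel (MVCC)
var counts = await Task.WhenAll(Enumerable.Range(0, 4).Select(_ => Task.Run(() =>
{
    using var db = factory.Open();
    return db.Count<Customer>();
})));
```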
Recommended partitioning strategies:
- Time-based: One database per year/month
- Categorical: Separate databases by data type or category
- Archival: Current database + historical archives
Workflow pattern:
```csharp
// Daily process: Keep current database for writes
using (var writeDb = factory.OpenForWrite())
{
writeDb.InsertAll(todaysData);
}
// Analytics: Query across all time periods
using (var readDb = factory.Open())
{
var historicalTrends = readDb.Select<CmcPrice>(x =>
x.Date >= new DateTime(2020, 1, 1));
}
// Year-end: Rotate current to archive
// 1. Copy prices_2025.db to archive location
// 2. Update factory configuration to include new archive
// 3. Create fresh prices_2026.db for new year
```

```csharp
// DuckDB excels at analytical queries
var salesByMonth = db.SqlList<dynamic>(@"
SELECT
DATE_TRUNC('month', OrderDate) as Month,
COUNT(*) as OrderCount,
SUM(TotalAmount) as Revenue
FROM Orders
GROUP BY Month
ORDER BY Month DESC
");// High-performance bulk insert (10-100x faster than InsertAll)
db.BulkInsert(largeDataset); // Uses DuckDB's native Appender API
// For smaller datasets or when transaction control is needed
db.InsertAll(smallDataset);
// Process with SQL
db.ExecuteSql(@"
INSERT INTO ProcessedOrders
SELECT * FROM Orders
WHERE Status = 'completed'
AND OrderDate > CURRENT_DATE - INTERVAL '30 days'
");// Use in-memory database for fast processing
var dbFactory = new DuckDbOrmLiteConnectionFactory("Data Source=:memory:");
```

For maximum performance when inserting large datasets, use `BulkInsert()`, which leverages DuckDB's native Appender API:
```csharp
using var db = dbFactory.Open();
var products = new List<Product>();
for (int i = 0; i < 100000; i++)
{
products.Add(new Product {
Id = i,
Name = $"Product {i}",
Price = i * 1.5m
});
}
// BulkInsert is 10-100x faster than InsertAll for large datasets
db.BulkInsert(products); // Uses DuckDB Appender API
```

| Method | 1,000 rows | 10,000 rows | 100,000 rows |
|---|---|---|---|
| `InsertAll()` | ~100ms | ~1s | ~10s |
| `BulkInsert()` | ~10ms | ~50ms | ~500ms |
| Speed improvement | 10x | 20x | 20-100x |
✅ Blazing fast - Uses DuckDB's native Appender API for direct memory-to-database transfer
✅ All data types supported - DateTime, Guid, decimal, byte[], TimeSpan, etc.
✅ Async support - BulkInsertAsync() available (pseudo-async wrapper)
✅ Simple API - Drop-in replacement for InsertAll()
Limitations:
- No transaction participation - the Appender auto-commits on completion
- No return values - Generated IDs are not returned (unlike `Insert()`)
- All-or-nothing - If any row fails, the entire batch fails
- For transactions, use `InsertAll()` - standard inserts support explicit transactions
Use BulkInsert() when:
- Inserting 100+ rows
- Performance is critical
- You don't need generated ID values returned
- Auto-commit behavior is acceptable
Use InsertAll() when:
- Inserting < 100 rows
- You need transaction control (explicit BEGIN/COMMIT)
- You need to mix inserts with other operations in a transaction
- You need generated ID values returned
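To make the trade-off concrete, here is a hedged sketch (variable names like `smallBatch` and `largeBatch` are placeholders):

```csharp
using var db = dbFactory.Open();

// InsertAll() participates in an explicit transaction and can be
// mixed with other operations that commit or roll back together
using (var trans = db.OpenTransaction())
{
    db.InsertAll(smallBatch);   // rolled back if Commit() is never reached
    db.Update(customer);
    trans.Commit();
}

// BulkInsert() does not participate in transactions -
// the Appender auto-commits when the batch completes
db.BulkInsert(largeBatch);
```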
Async bulk insert:

```csharp
using var db = dbFactory.Open();
var products = GetLargeDataset();
await db.BulkInsertAsync(products);
// Note: This is pseudo-async (wraps sync operation)
// since DuckDB.NET doesn't provide a native async Appender
```

For large tables where indexes cannot fit in memory (a known DuckDB limitation), use `BulkInsertWithDeduplication()`. This is the recommended approach for production use with tables containing hundreds of millions of rows.
DuckDB's ART indexes must fit in memory. With 845+ million rows, maintaining PRIMARY KEY or UNIQUE constraints becomes impractical. The staging table pattern provides:
✅ Zero risk to main table - Data is validated in staging before touching the main table
✅ Atomic duplicate detection - A SQL JOIN ensures no duplicates
✅ Fast rollback - Just drop the staging table on error
✅ Minimal lock time - The main table is locked only during the final INSERT SELECT
✅ High performance - Uses the Appender API for staging-table loading
```csharp
using var db = dbFactory.Open();
// Your 845M row table with composite unique key
public class CryptoPrice
{
public DateTime Timestamp { get; set; }
public string Symbol { get; set; }
public long ExchangeId { get; set; }
public decimal Price { get; set; }
}
// Load 70,000 new records (internally unique, but may overlap with existing)
var newPrices = LoadNewPrices(); // 70K records
// Option 1: LINQ expression (type-safe, recommended)
var insertedCount = db.BulkInsertWithDeduplication(
newPrices,
x => new { x.Timestamp, x.Symbol, x.ExchangeId }
);
// Option 2: String column names (flexible)
var insertedCount = db.BulkInsertWithDeduplication(
newPrices,
"Timestamp", "Symbol", "ExchangeId"
);
Console.WriteLine($"Inserted {insertedCount} new records (duplicates filtered)");// Step 1: Create temporary staging table (same schema as main)
CREATE TABLE CryptoPrice_Staging_<guid> AS SELECT * FROM CryptoPrice LIMIT 0
-- Step 2: BulkInsert into staging (fast, isolated from the main table);
--         uses the Appender API - 10-100x faster than InsertAll

-- Step 3: Atomic INSERT ... SELECT with deduplication
--         (handles both internal and external duplicates)
INSERT INTO CryptoPrice
WITH DeduplicatedStaging AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Timestamp, Symbol, ExchangeId ORDER BY (SELECT NULL)) as rn
FROM CryptoPrice_Staging_<guid>
)
SELECT columns  -- (full column list elided)
FROM DeduplicatedStaging s
LEFT JOIN CryptoPrice m ON
s.Timestamp = m.Timestamp AND
s.Symbol = m.Symbol AND
s.ExchangeId = m.ExchangeId
WHERE m.Timestamp IS NULL -- Not in main table
AND s.rn = 1 -- First occurrence from incoming data
-- Step 4: DROP the staging table (always executed, even on error)
```

Duplicate Handling:
- Internal duplicates (within incoming data): Keeps first occurrence only
- External duplicates (with main table): Filters out all matches
- Deterministic behavior ensures consistent results
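To make both cases concrete, a small sketch using the `CryptoPrice` model from above (timestamps and counts are illustrative):

```csharp
var t1 = new DateTime(2025, 1, 1);
var t2 = new DateTime(2025, 1, 2);
// Assume a row with key (t1, "BTC", 1) already exists in the main table.

var incoming = new List<CryptoPrice>
{
    new() { Timestamp = t1, Symbol = "BTC", ExchangeId = 1, Price = 95001m }, // external duplicate: filtered out
    new() { Timestamp = t2, Symbol = "BTC", ExchangeId = 1, Price = 95100m }, // new key: inserted
    new() { Timestamp = t2, Symbol = "BTC", ExchangeId = 1, Price = 95101m }, // internal duplicate: first occurrence wins
};

var inserted = db.BulkInsertWithDeduplication(
    incoming, x => new { x.Timestamp, x.Symbol, x.ExchangeId });

// inserted == 1: only the first occurrence of the genuinely new key (t2, "BTC", 1) is inserted
```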
Option 1: LINQ Expression (Type-Safe, Recommended)
```csharp
// Single column
var insertedCount = db.BulkInsertWithDeduplication(
users,
x => x.Email
);
// Multiple columns
var insertedCount = db.BulkInsertWithDeduplication(
prices,
x => new { x.Timestamp, x.Symbol, x.ExchangeId }
);
// Async
var insertedCount = await db.BulkInsertWithDeduplicationAsync(
prices,
x => new { x.Timestamp, x.Symbol }
);
```

Option 2: String Column Names (Flexible)

```csharp
var insertedCount = db.BulkInsertWithDeduplication(
prices,
"Timestamp", "Symbol", "ExchangeId"
);
```

Option 3: Auto-Detect from Attributes

```csharp
// CompositeIndex attribute
[CompositeIndex(nameof(Timestamp), nameof(Symbol), nameof(ExchangeId), Unique = true)]
public class CryptoPrice { ... }
// Auto-detects unique columns from attribute
var insertedCount = db.BulkInsertWithDeduplication(newPrices);
// Single [Unique] attribute
public class User
{
[Unique]
public string Email { get; set; }
...
}
var insertedCount = db.BulkInsertWithDeduplication(newUsers);
```

Performance:

| Operation | Time (70K records) | Notes |
|---|---|---|
| Append to staging | ~5-10ms | Appender API - blazing fast |
| INSERT SELECT with JOIN | ~50-200ms | Depends on main table size |
| Drop staging | ~1ms | Cleanup |
| Total overhead | ~60-210ms | Minimal cost for safety |
Compare this overhead to the risk of a direct append:
- Direct append failure: Hours to recover 845M row table
- Duplicate corruption: Potentially unfixable without backups
```csharp
// Daily ETL process for a massive time-series table
public async Task LoadDailyPrices()
{
using var db = dbFactory.Open();
// Main table: 845,000,000 rows
// New batch: 70,000 rows (may contain internal duplicates)
// Expected: ~500 duplicates with main table, ~69,500 new records
var newBatch = await FetchDailyPricesAsync(); // 70K records
var sw = Stopwatch.StartNew();
var insertedCount = db.BulkInsertWithDeduplication(
newBatch,
"Timestamp", "Symbol", "ExchangeId"
);
sw.Stop();
logger.LogInformation(
"Inserted {Inserted} of {Total} records in {Ms}ms (duplicates filtered: {Duplicates})",
insertedCount,
newBatch.Count,
sw.ElapsedMilliseconds,
newBatch.Count - insertedCount
);
// Typical output:
// Inserted 69,500 of 70,000 records in 180ms (duplicates filtered: 500)
}
```

The async variant:

```csharp
var insertedCount = await db.BulkInsertWithDeduplicationAsync(
newRecords,
"Timestamp", "Symbol", "ExchangeId"
);
```

Use BulkInsertWithDeduplication() when:
- ✅ Table has 100M+ rows (indexes can't fit in memory)
- ✅ You need duplicate prevention without UNIQUE constraints
- ✅ Main table safety is critical (zero corruption risk)
- ✅ You're doing ETL/data loading with potential duplicates
- ✅ You want automatic cleanup and transactional safety
Use BulkInsert() when:
- ✅ Data is guaranteed unique (no duplicates possible)
- ✅ Table is small enough for UNIQUE constraints
- ✅ Maximum speed is priority (no duplicate checking needed)
- ✅ You're loading into a temporary/staging table
Use InsertAll() when:
- ✅ Small datasets (< 100 rows)
- ✅ Need explicit transaction control
- ✅ Need generated ID values returned
Connection strings:
- File-based database: `"Data Source=myapp.db"`
- In-memory database: `"Data Source=:memory:"`
- Read-only mode: `"Data Source=myapp.db;Read Only=true"`

DuckDB requires parameter handling that differs slightly from other databases. The provider includes a `BeforeExecFilter` that handles this automatically:
```csharp
// Automatically configured by DuckDbOrmLiteConnectionFactory
// Handles:
// - Parameter name conversion ($ prefix handling)
// - 1-based positional parameter indexing
// - DbType.Currency → DbType.Decimal conversion
```

This package depends on:
- ServiceStack.OrmLite (>= 8.5.2) - The ORM framework
- DuckDB.NET.Data.Full (>= 1.3.0) - .NET bindings for DuckDB
Both dependencies are automatically installed when you add this package.
- .NET: 8.0+
- DuckDB: 1.3.2+
- ServiceStack.OrmLite: 8.5.2+
DuckDB is optimized for analytical workloads:
- Fast aggregations - Columnar storage enables efficient aggregations
- Vectorized execution - SIMD optimizations for bulk operations
- Memory efficient - Optimized for large datasets
- Zero-copy reads - Direct memory access where possible
- TimeSpan: Limited to ~24 hours when using HH:MM:SS format
- Concurrent writes: DuckDB uses single-writer model
- DuckDB Official Documentation
- ServiceStack.OrmLite Documentation
- Development Documentation - Implementation details and history
```bash
# Run all tests
dotnet test
# Run specific test category
dotnet test --filter "FullyQualifiedName~AdvancedFeatureTests"
```

Test coverage:
- 25 core OrmLite functionality tests
- 15 advanced feature tests (JOINs, aggregations, edge cases)
- 17 async/await tests
- 18 multi-database tests
- 11 concurrency tests (thread safety, optimistic concurrency)
- 4 generic factory tests
- 90 total tests (100% passing)
- Production-ready error handling and SQL injection prevention
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is licensed under the MIT License - see the LICENSE.md file for details.
- ServiceStack.OrmLite - The excellent ORM framework
- DuckDB - The fast in-process analytical database
- DuckDB.NET - .NET bindings for DuckDB
- Issues: GitHub Issues
- ServiceStack OrmLite: ServiceStack Support
- DuckDB: DuckDB Discord