Support upsert operations for SQL datasets #5090

@ElenaKhaustova

Description

In the reference project kedro-agentic-workflows, we needed to create a custom dataset (SQLAlchemyEngineDataset) that returns a db_engine. This allowed us to handle writes and updates to the database within agentic workflows (e.g., creating claims, updating sessions, logging interactions).

While this approach works, it feels like a workaround. The current Kedro SQL datasets focus primarily on read-only workflows (e.g., SQLQueryDataset) or batch inserts. They do not directly support upsert patterns (insert a row, or update it if it already exists), which are increasingly common in LLM-driven workflows, streaming data, and session management.
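
To make the distinction concrete, here is a minimal sketch of upsert semantics using only Python's standard-library `sqlite3` module. The `sessions` table and its columns are hypothetical and chosen purely for illustration. The `ON CONFLICT ... DO UPDATE` syntax assumes SQLite 3.24 or newer, and other databases spell the same idea differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, ended_at TEXT)")
conn.execute("INSERT INTO sessions VALUES ('s1', NULL)")

# A plain insert of an existing key fails, which is why an insert-only
# write path cannot express "update this row if it already exists".
try:
    conn.execute("INSERT INTO sessions VALUES ('s1', '2025-01-01T10:00:00')")
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True

# An upsert inserts the row, or updates it when the key already exists.
UPSERT_SQL = """
INSERT INTO sessions (session_id, ended_at) VALUES (?, ?)
ON CONFLICT (session_id) DO UPDATE SET ended_at = excluded.ended_at
"""
conn.execute(UPSERT_SQL, ("s1", "2025-01-01T10:00:00"))  # updates s1
conn.execute(UPSERT_SQL, ("s2", None))                   # inserts s2

rows = conn.execute(
    "SELECT session_id, ended_at FROM sessions ORDER BY session_id"
).fetchall()
```

After the two upserts, `s1` carries its end timestamp and `s2` exists as a new row, with no key-existence check in application code.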

The goal of this ticket is to explore how Kedro can better support upserts in SQL datasets without requiring users to drop down to engine-level operations, and whether doing so makes sense from a framework perspective.

Context

  • In LLM/agent workflows, we frequently need to:
    • Create new records (e.g., new claims).
    • Update existing records (e.g., session end timestamps, claim statuses).
    • Log events incrementally rather than in bulk.
  • With the current datasets, this requires manual SQLAlchemy connections or custom datasets.
  • A first-class, upsert-capable SQL dataset could simplify these workflows and make them more idiomatic within Kedro.
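
As a rough illustration of what an upsert-capable dataset might feel like to use, the sketch below wraps the write path in a dataset-shaped class with `save` and `load` methods. Everything here is an assumption made for discussion: `UpsertTableSketch`, its constructor arguments, and the `claims` table are hypothetical, the class deliberately does not subclass any Kedro base class, and it uses the standard-library `sqlite3` module (SQLite 3.24+) rather than SQLAlchemy so that it runs on its own.

```python
import sqlite3
from typing import Any


class UpsertTableSketch:
    """Hypothetical dataset-shaped wrapper; not part of Kedro or kedro-datasets."""

    def __init__(self, conn: sqlite3.Connection, table: str, key: str, columns: list[str]):
        # Identifiers are interpolated into SQL below, so they must come from
        # trusted configuration, never from user input. Assumes at least one
        # non-key column, otherwise the UPDATE clause would be empty.
        self._conn, self._table, self._key, self._columns = conn, table, key, columns

    def save(self, records: list[dict[str, Any]]) -> None:
        cols = ", ".join(self._columns)
        placeholders = ", ".join(f":{c}" for c in self._columns)
        updates = ", ".join(
            f"{c} = excluded.{c}" for c in self._columns if c != self._key
        )
        sql = (
            f"INSERT INTO {self._table} ({cols}) VALUES ({placeholders}) "
            f"ON CONFLICT ({self._key}) DO UPDATE SET {updates}"
        )
        with self._conn:  # commit on success, roll back on error
            self._conn.executemany(sql, records)

    def load(self) -> list[dict[str, Any]]:
        cur = self._conn.execute(
            f"SELECT {', '.join(self._columns)} FROM {self._table} ORDER BY {self._key}"
        )
        return [dict(zip(self._columns, row)) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT PRIMARY KEY, status TEXT)")
claims = UpsertTableSketch(conn, "claims", "claim_id", ["claim_id", "status"])

claims.save([{"claim_id": "c1", "status": "open"}])       # creates a new claim
claims.save([{"claim_id": "c1", "status": "approved"},    # updates claim c1
             {"claim_id": "c2", "status": "open"}])       # creates claim c2
result = claims.load()
```

The point of the sketch is the calling convention: a node could return a small list of records and let the dataset decide between insert and update, instead of receiving an engine and issuing SQL itself. Whether that fits Kedro's dataset model is exactly the open question of this ticket.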

Metadata

Assignees: No one assigned
Labels: Issue: Feature Request (New feature or improvement to existing feature)
Status: No status
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests