Hoptimator turns SQL into running, multi-hop data pipelines that span Kafka, Flink, Venice, and anything else you plug in. You declare what you want — a materialized view from one system into another — and Hoptimator plans the topology, generates the specs, deploys them, and reconciles them.
```sql
CREATE MATERIALIZED VIEW ADS.AUDIENCE AS
SELECT FIRST_NAME, LAST_NAME
FROM ADS.PAGE_VIEWS NATURAL JOIN PROFILE.MEMBERS;
```

What that statement becomes depends on the templates and databases registered in your environment. With a typical Kafka + Flink setup, it expands into:
- a `View` and a `Pipeline` resource,
- a connector configuration on each side,
- a Flink SQL job that maintains the result,
- and any intermediate hops (e.g. CDC topics) the planner determined were needed to get from sources to sink.
Swap in different templates and the same SQL can target a different stack.
The deployment target is pluggable: the bundled deployers target Kubernetes, but `hoptimator-api` is the actual extension point.
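As a rough illustration, a generated `View` resource might look something like the sketch below. The `apiVersion`, field names, and values are assumptions for illustration, not the real schema; use the CLI's `!specify` command to see exactly what your environment produces.

```yaml
# Hypothetical sketch of a generated View resource.
# apiVersion and field names are illustrative assumptions, not the actual CRD schema.
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: View
metadata:
  name: ads-audience
spec:
  view: ADS.AUDIENCE
  sql: |
    SELECT FIRST_NAME, LAST_NAME
    FROM ADS.PAGE_VIEWS NATURAL JOIN PROFILE.MEMBERS
```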
- One SQL surface across many systems. Kafka, Flink, Venice, MySQL — and pluggable for the rest. The catalog is unified; joins span systems.
- Multi-hop, declarative. You don't write Flink jobs and you don't request topics. The planner figures out the topology from a query.
- Kubernetes out of the box, not as a hard requirement. The bundled deployers target Kubernetes, so pipelines show up as first-class CRDs and `kubectl get pipelines` Just Works. The `Deployer` interface is the actual extension point: anything that knows how to materialize a spec can take the place of the defaults.
- Inspectable before it deploys. `!specify` (CLI) and `plan` (MCP) emit the exact specs Hoptimator would apply. No "magic" deploys.
- Pluggable. New sources, sinks, engines, deployers, and validators are all extension points on `hoptimator-api`.
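To make the extension point concrete, here is a toy deployer in the same spirit. The method names (`create`, `delete`) and the inlined interface are assumptions for illustration; the real `Deployer` contract lives in `hoptimator-api` and may differ.

```java
import java.util.List;

public class FileDeployerSketch {

    // Stand-in for the hoptimator-api Deployer contract (names are assumed).
    interface Deployer {
        List<String> create();   // specs this deployer would apply
        void delete();           // tear the pipeline back down
    }

    // A deployer that "materializes" specs by collecting them, e.g. to write
    // to disk or POST to an internal API instead of applying to Kubernetes.
    static class CollectingDeployer implements Deployer {
        private final List<String> specs;

        CollectingDeployer(List<String> specs) {
            this.specs = specs;
        }

        @Override
        public List<String> create() {
            return specs;
        }

        @Override
        public void delete() {
            // no-op for the sketch
        }
    }

    public static void main(String[] args) {
        Deployer d = new CollectingDeployer(List.of("kind: Pipeline", "kind: KafkaTopic"));
        d.create().forEach(System.out::println);
    }
}
```

The point is only that "deploy" is a pluggable verb: whatever implements the contract decides where the specs land.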
You need Docker Desktop with Kubernetes enabled (or kind), kubectl, and
JDK 17+. Then:
```shell
make build install   # build the project and install the SQL CLI
make deploy-demo     # install CRDs and a couple of demo databases
./hoptimator         # start the SQL CLI
> !intro
```

Inside the CLI, declare a materialized view:
```sql
CREATE MATERIALIZED VIEW ADS.AUDIENCE AS
SELECT FIRST_NAME, LAST_NAME
FROM ADS.PAGE_VIEWS NATURAL JOIN PROFILE.MEMBERS;
```

Then, in another terminal, watch what shows up:

```shell
kubectl get views
kubectl get pipelines
```

For a full walkthrough, including how to inspect the plan before deploying and how to clean up, see the Quickstart.
```
SQL ──▶ Planner ──▶ Pipeline (sources, sink, job)
                        │
                        ▼
                    Deployers
                        │
                        ▼
               Kubernetes resources
              (Pipeline, KafkaTopic,
               FlinkSessionJob, …)
                        │
                        ▼
                     Operator
                (reconcile loop)
```
Hoptimator plays three roles: planner (parse + optimize the SQL across the unified catalog), adapter (translate plan elements into target-system specs), and operator (apply specs to Kubernetes and reconcile drift). The same machinery powers the SQL CLI, the JDBC driver, the MCP server, and the standalone operator.
For the long version, see the Architecture overview.
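The operator role boils down to a level-triggered reconcile loop: compare the desired specs against what actually exists and converge. The sketch below shows that general pattern in miniature; it is not Hoptimator's operator code, and the string-keyed maps are a deliberate simplification.

```java
import java.util.HashMap;
import java.util.Map;

public class ReconcileSketch {

    // Toy reconcile step: mutate `actual` until it matches `desired`.
    // Keys are resource names, values stand in for resource specs.
    static void reconcile(Map<String, String> desired, Map<String, String> actual) {
        // Create or update anything missing or drifted.
        desired.forEach((name, spec) -> {
            if (!spec.equals(actual.get(name))) {
                actual.put(name, spec); // "apply" the spec
            }
        });
        // Delete anything no longer desired.
        actual.keySet().removeIf(name -> !desired.containsKey(name));
    }

    public static void main(String[] args) {
        Map<String, String> desired = new HashMap<>(Map.of(
                "pipeline/audience", "v2",
                "kafkatopic/cdc", "v1"));
        Map<String, String> actual = new HashMap<>(Map.of(
                "pipeline/audience", "v1",   // drifted
                "flinkjob/stale", "v1"));    // no longer desired
        reconcile(desired, actual);
        System.out.println(actual); // converged to the desired state
    }
}
```

Running the loop repeatedly is what makes it self-healing: drift introduced between iterations is corrected on the next pass.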
The full docs live in docs/:
- Getting started — quickstart, concepts, architecture.
- User guide — SQL CLI, JDBC driver, MCP server, DDL reference, hints.
- Kubernetes guide — operator, CRD reference, templates, triggers, configuration.
- Extending Hoptimator — adding data sources, writing deployers, validators, config providers.
- Learn more — engineering blog posts and case studies.
Hoptimator is alpha. APIs, including the SQL grammar, the `hoptimator-api` interfaces, and the `v1alpha1` CRDs, are subject to change without notice. The project is still early-stage and experimental from an open source perspective; if you adopt it today, expect to follow `main` and pin to specific versions deliberately.
That said, Hoptimator is not a research toy: LinkedIn runs production pipelines on it internally. Pre-release artifacts for the modules in this repo are published to LinkedIn's JFrog Artifactory.
Bug reports, feature requests, and PRs are welcome. See CONTRIBUTING.md for how to file an issue, send a pull request, or report a security vulnerability.