[Feature] Schema management functionality on streams for InfinyOn Cloud and Fluvio Topics #3267

drc-infinyon · 2023-05-20T01:27:45Z

Related RFC: #3081

Summary

This product brief describes the need for schema management functionality. Members of our developer community have asked whether we support functionality similar to the Kafka Schema Registry. This document describes the problem space and the functionality needed to serve InfinyOn customers.

Opportunity

Schema is an essential input to implementing and maintaining data contracts and data quality. The majority of the data world operates on defined schemas and data models. The ability to implement a schema on topics will enable many different features, including time-window-based aggregation and materialized views, which rely on a tabular structure.

Target audience

Schema management will be relevant for InfinyOn Cloud developers and analysts who need to implement a schema configuration in their data flows.

Customer Insights

Among our current user feedback, we have an IoT company that described its need for schemas.

They receive data from sensors that are made and deployed by different vendors. The sensors send similar payloads that differ in attribute names and in the measurement units used for dimensions. These differences need to be reconciled during cleanup. Below is 5 minutes of the customer describing the use case.
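
As a rough illustration of the reconciliation step described above, the sketch below maps vendor-specific attribute names and units onto one canonical record. The vendor names, attribute names, and conversions are hypothetical and are not taken from the customer's data.

```python
# Hypothetical per-vendor mappings:
# source attribute -> (canonical attribute, conversion function).
VENDOR_MAPPINGS = {
    "vendor_a": {
        "temp_f": ("temperature_c", lambda f: (f - 32) * 5 / 9),  # Fahrenheit -> Celsius
        "dev": ("device_id", str),
    },
    "vendor_b": {
        "temperatureC": ("temperature_c", float),
        "deviceId": ("device_id", str),
    },
}


def normalize(vendor, payload):
    """Rename attributes and convert units so all vendors share one shape."""
    mapping = VENDOR_MAPPINGS[vendor]
    record = {}
    for key, value in payload.items():
        if key not in mapping:
            continue  # attributes without a mapping are dropped in this sketch
        name, convert = mapping[key]
        record[name] = convert(value)
    return record
```

A schema attached to the topic would give the canonical names and types a single place to live, instead of scattering them across per-vendor cleanup code.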

Another consumption pattern was shared by a SaaS company developing usage-based billing. The company receives consumption data from its users and provides them with billing and invoicing capabilities.

Experience

Currently, users may have a wide range of experiences with schemas, given that schemas are handled differently in different systems such as databases or streaming tools like Kafka.

As we consider what schema management should look like for the InfinyOn Cloud user, we need to be informed by the data sources, the payloads, and the consumption patterns.

For instance, if we are looking at semi-structured data from web pages, RSS feeds, or clickstreams, we would expect XML or JSON inputs. Considering the consumption patterns and the serialization and deserialization requirements, we have come across customers and prospects who use Avro or Protobuf as serialization formats and store the data in Parquet-based table formats like Hudi or Iceberg, or in other optimized columnar formats like Arrow.

A schema provides the ability to model semi-structured data in a tabular model, which makes it possible to perform aggregations, create derived columns, and model the data for analytical workflows.
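
A minimal sketch of that idea, assuming JSON payloads and reusing the column names from the configuration example later in this brief: the schema fixes the column order, each record becomes a row, and aggregation becomes a simple operation over a column. The helper names are illustrative only.

```python
import json
from collections import defaultdict

# Column order taken from the example schema in the Configuration section.
COLUMNS = ["fruit_id", "fruit_name", "fruit_color"]


def to_row(raw):
    """Project one JSON record onto the schema's columns (missing keys become None)."""
    doc = json.loads(raw)
    return tuple(doc.get(col) for col in COLUMNS)


def count_by(rows, column):
    """Count rows per distinct value of the given column."""
    idx = COLUMNS.index(column)
    counts = defaultdict(int)
    for row in rows:
        counts[row[idx]] += 1
    return dict(counts)
```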

For InfinyOn customers, we need to enable schema management on the data collected from the edge, both to generate alerts on schema changes or payload issues from the source and to support dynamic computation using SmartModules based on attribute values.
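
One way schema-change alerts could be derived is by comparing two versions of a column list. The sketch below is an assumption about how that comparison might look, not a description of an existing Fluvio feature; `diff_schemas` and the message formats are hypothetical.

```python
def diff_schemas(old, new):
    """Return human-readable differences between two column lists.

    Each column is a dict with at least "name" and "type" keys.
    """
    old_cols = {c["name"]: c["type"] for c in old}
    new_cols = {c["name"]: c["type"] for c in new}
    changes = []
    for name in sorted(old_cols.keys() - new_cols.keys()):
        changes.append(f"removed column: {name}")
    for name in sorted(new_cols.keys() - old_cols.keys()):
        changes.append(f"added column: {name}")
    for name in sorted(old_cols.keys() & new_cols.keys()):
        if old_cols[name] != new_cols[name]:
            changes.append(
                f"type changed: {name} {old_cols[name]} -> {new_cols[name]}"
            )
    return changes
```

An empty result means the two versions are compatible at the column level; any entry could be surfaced as an alert.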

Acceptance Criteria

Ability to define a schema configuration using YAML files specifying the schema type and the keys
Ability to apply the schema configuration using the Fluvio CLI
Ability to detect changes in the schema or incorrect data and generate error messages
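
To make the third criterion concrete, here is a hedged sketch of record validation against a column schema that produces error messages. The schema mirrors the configuration example in this brief; the `validate` function, the type table, and the message wording are assumptions for illustration.

```python
# Mapping from schema type names to Python types (only the two types
# used in the example schema are covered here).
TYPE_CHECKS = {"integer": int, "string": str}

SCHEMA = [
    {"name": "fruit_id", "key": True, "type": "integer"},
    {"name": "fruit_name", "type": "string"},
    {"name": "fruit_color", "type": "string"},
]


def validate(record, schema=SCHEMA):
    """Return a list of error messages; an empty list means the record conforms."""
    errors = []
    expected = {col["name"] for col in schema}
    for col in schema:
        name = col["name"]
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        py_type = TYPE_CHECKS[col["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, py_type):
            errors.append(
                f"field {name}: expected {col['type']}, got {type(value).__name__}"
            )
    for name in sorted(set(record) - expected):
        errors.append(f"unexpected field: {name}")
    return errors
```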

Competitive Insights

Confluent Schema Registry: https://docs.confluent.io/platform/current/schema-registry/index.html
Slalom schema registry introduction: https://medium.com/slalom-technology/introduction-to-schema-registry-in-kafka-915ccf06b902
Confluent Schema Registry 101, Avro, JSON: https://youtu.be/ovIsHhIrie8

Interface

Configuration

Schema configuration example applied to topic:

*schema-config.yaml*
meta:
  name: column-schema-1
  version: 1.0 # semver expected
  # schema names a smart module conforming to a smart module schema interface
  schema-provider: infinyon/[email protected] # alternatives include column, protobuf, parquet, arrow

# spec is a user-defined custom specification string; the schema does not parse the spec, which is passed to the schema smart module
# as an opaque string
spec: |
  - name: fruit_id
    key: true
    type: integer
  - name: fruit_name
    type: string
  - name: fruit_color
    type: string
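
The configuration above treats `spec` as an opaque string that is handed to whichever schema provider is named. The sketch below illustrates that dispatch under stated assumptions: the provider registry, the `column` provider key, and the tiny parser (which only handles the flat list form shown in the example, not general YAML) are all hypothetical.

```python
def parse_column_spec(spec):
    """Parse the flat '- name: ...' list form from the example (not general YAML)."""
    columns = []
    for line in spec.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            columns.append({})  # a leading dash starts a new column entry
            line = line[2:]
        if not columns:
            raise ValueError("spec must start with a '- ' list item")
        key, _, value = line.partition(":")
        columns[-1][key.strip()] = value.strip()
    return columns


# Hypothetical registry: provider name -> function that interprets the opaque spec.
PROVIDERS = {"column": parse_column_spec}


def load_schema(provider, spec):
    """Hand the opaque spec string to the named provider without inspecting it."""
    return PROVIDERS[provider](spec)
```

The caller never looks inside `spec`, so supporting another format would mean registering another provider rather than changing the configuration layout. Values stay as strings in this sketch; a real provider would decide how to type them.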

CLI

CLI Commands concept

fluvio schema create

fluvio schema list

fluvio schema describe SCHEMA_NAME[@VERSION]

fluvio schema apply SCHEMA_NAME TOPIC_NAME

fluvio schema remove SCHEMA_NAME TOPIC_NAME

fluvio schema delete

fluvio schema disable SCHEMA_NAME@VERSION

fluvio schema create --config schema-config.yaml


ajhunyady · 2023-05-20T14:07:58Z

@drc-infinyon, as per our conversation, the schema should be applied at the topic level. Do you have the notes or a picture from the whiteboard session?

fluvio topic create <name> --config <config with schema definition>
fluvio topic apply <name> --config <config with schema definition>

drc-infinyon added RFC features/materialize_view labels May 20, 2023

drc-infinyon added this to InfinyOn Public Roadmap May 20, 2023

drc-infinyon moved this to 🏷 Features in InfinyOn Public Roadmap May 20, 2023

drc-infinyon moved this from 🏷 Features to 🏗 In progress in InfinyOn Public Roadmap Jun 12, 2023

drc-infinyon changed the title ~~Schema management functionality on streams for InfinyOn Cloud and Fluvio Topics~~ [Feature] Schema management functionality on streams for InfinyOn Cloud and Fluvio Topics Jul 5, 2023
