This product brief describes the need for schema management functionality. Members of our developer community have asked whether we support functionality similar to the Kafka Schema Registry. This document describes the problem space and the functionality needed to serve InfinyOn customers.
Opportunity
Schemas are an essential input to implementing and maintaining data contracts and data quality. The majority of the data world operates on defined schemas and data models. The ability to apply a schema to topics will enable many different features, including time-window-based aggregation and materialized views, which rely on a tabular structure.
Target audience
Schema management will be relevant for InfinyOn Cloud developers, as well as analysts, who need to implement a schema configuration in their data flows.
Customer Insights
Among our current user feedback, we have an IoT company that described its need for schemas.
They receive data from sensors that are made and deployed by different vendors. The sensors send similar payloads that differ in attribute names and in the measurement units of dimensions. These differences need to be reconciled during cleanup. Below is 5 minutes of the customer describing the use case.
Another consumption pattern was shared by a SaaS company developing usage-based billing, which receives consumption data from its users and provides them with billing and invoicing capabilities.
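To make the IoT reconciliation problem above concrete, here is a minimal Python sketch of how payloads from two vendors could be mapped onto one canonical record. Every attribute name, unit, and vendor payload in it is hypothetical and chosen only for illustration; it is not taken from the customer's data or from any InfinyOn API.

```python
from typing import Any, Callable

# Maps each vendor-specific attribute to (canonical name, unit conversion).
# All attribute names and units here are hypothetical.
CANONICAL_FIELDS: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "temp_c": ("temperature_c", lambda v: v),
    "temp_f": ("temperature_c", lambda v: (v - 32.0) * 5.0 / 9.0),
    "device": ("sensor_id", lambda v: v),
    "sensor_id": ("sensor_id", lambda v: v),
}


def normalize(payload: dict[str, Any]) -> dict[str, Any]:
    """Rename known attributes and convert units; reject unknown attributes."""
    record: dict[str, Any] = {}
    for key, value in payload.items():
        if key not in CANONICAL_FIELDS:
            raise KeyError(f"unknown attribute: {key}")
        name, convert = CANONICAL_FIELDS[key]
        record[name] = convert(value)
    return record


# Two hypothetical vendors reporting the same kind of reading differently.
vendor_a = {"device": "a-17", "temp_c": 21.5}
vendor_b = {"sensor_id": "b-03", "temp_f": 212.0}

print(normalize(vendor_a))
print(normalize(vendor_b))
```

A declared schema would let this kind of mapping be driven by configuration instead of being hand-written for each vendor.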
Experience
Currently, users may have a wide range of experiences with schemas, given that schemas are handled differently in different systems, such as databases or streaming tools like Kafka.
As we consider what the schema management experience should look like for the InfinyOn Cloud user, we need to be informed by the data sources, the payloads, and the consumption patterns.
For instance, for semi-structured data from web pages, RSS feeds, and clickstreams, we would expect XML and JSON inputs. On the consumption side, considering serialization and deserialization requirements, we have come across customers and prospects who use Avro or Protobuf for serialization and store the data in Parquet-based table formats like Hudi or Iceberg, or in other optimized columnar formats like Arrow.
A schema makes it possible to model semi-structured data as a table, which in turn makes it possible to perform aggregations, create derived columns, and model the data for analytical workflows.
For InfinyOn customers, we need to enable schema management on the data collected from the edge in order to generate alerts on schema changes or payload issues at the source, and to support dynamic computation using smart modules based on attribute values.
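As a rough illustration of alerting on schema change, the Python sketch below infers column types from an incoming record and compares them with an expected schema. The function names (`infer_schema`, `diff_schema`), the type labels, and the alert wording are assumptions made for this example, not part of Fluvio or InfinyOn Cloud; the column names are borrowed from the example configuration in this brief.

```python
from typing import Any

# Assumed mapping from Python types to schema type labels.
TYPE_NAMES = {int: "integer", float: "float", str: "string", bool: "boolean"}


def infer_schema(record: dict[str, Any]) -> dict[str, str]:
    """Infer a flat column -> type mapping from one decoded record."""
    return {key: TYPE_NAMES.get(type(value), "unknown") for key, value in record.items()}


def diff_schema(expected: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Return one alert per missing, unexpected, or retyped column."""
    alerts = []
    for name in sorted(expected.keys() - observed.keys()):
        alerts.append(f"missing column: {name}")
    for name in sorted(observed.keys() - expected.keys()):
        alerts.append(f"unexpected column: {name}")
    for name in sorted(expected.keys() & observed.keys()):
        if expected[name] != observed[name]:
            alerts.append(f"type changed for {name}: {expected[name]} -> {observed[name]}")
    return alerts


expected = {"fruit_id": "integer", "fruit_name": "string", "fruit_color": "string"}
# A source that renamed one attribute and started sending the id as a string.
record = {"fruit_id": "7", "fruit_name": "apple", "fruit_colour": "red"}

for alert in diff_schema(expected, infer_schema(record)):
    print(alert)
```

In a real deployment this comparison would presumably run inside the stream processing path; the sketch only shows the comparison logic itself.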
Acceptance Criteria
Ability to define a schema configuration using YAML files specifying the schema type and the keys
Ability to apply the schema configuration using the Fluvio CLI
Ability to detect changes in the schema or incorrect data and generate error messages
*schema-config.yaml*
meta:
  name: column-schema-1
  version: 1.0 # semver expected
# schema names a smart module conforming to a smart module schema interface
schema-provider: infinyon/[email protected] # alternatives include column, protobuf, parquet, arrow
# spec is a user-defined custom specification string; the schema does not parse the spec, which is passed to the schema smartmodule
# as an opaque string
spec: |
  - name: fruit_id
    key: true
    type: integer
  - name: fruit_name
    type: string
  - name: fruit_color
    type: string
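The third acceptance criterion (detecting incorrect data and generating error messages) could look roughly like the Python sketch below, which checks a JSON payload against the column spec from the example above. The spec is hard-coded as a Python list rather than parsed from YAML to keep the example dependency-free. The `validate` function, the error message wording, and the rule that only `key` columns are required are all assumptions for illustration; the brief does not define these semantics.

```python
import json

# Mirrors the column spec in the example configuration above.
SPEC = [
    {"name": "fruit_id", "key": True, "type": "integer"},
    {"name": "fruit_name", "type": "string"},
    {"name": "fruit_color", "type": "string"},
]

# Assumed mapping from spec type labels to Python types.
PY_TYPES = {"integer": int, "string": str}


def validate(raw: bytes, spec: list = SPEC) -> list[str]:
    """Return a list of error messages; an empty list means the record is valid."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(record, dict):
        return ["payload is not a JSON object"]

    errors = []
    for column in spec:
        name = column["name"]
        if name not in record:
            # Assumption: key columns are required, other columns are optional.
            if column.get("key"):
                errors.append(f"missing key column: {name}")
            continue
        value = record[name]
        expected = PY_TYPES[column["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: expected {column['type']}, got {type(value).__name__}")
    return errors


print(validate(b'{"fruit_id": 1, "fruit_name": "apple", "fruit_color": "red"}'))
print(validate(b'{"fruit_id": "1", "fruit_name": "apple"}'))
print(validate(b'{"fruit_name": "apple"}'))
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue with a record at once, which fits the goal of generating useful error messages.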
drc-infinyon changed the title from "Schema management functionality on streams for InfinyOn Cloud and Fluvio Topics" to "[Feature] Schema management functionality on streams for InfinyOn Cloud and Fluvio Topics" on Jul 5, 2023.
Related RFC: #3081
Competitive Insights
Interface
Configuration
Schema configuration example applied to topic:
CLI
CLI Commands concept