https://github.com/infinyon/fluvio/issues/3267 #3081

Open
Tracked by #3231
sehz opened this issue Mar 19, 2023 · 0 comments


sehz commented Mar 19, 2023

Topic with Columns

Column Schema is an opt-in feature. All existing topic features (consume and produce) continue to work as before. If a column schema is enabled, the record key and value are interpreted according to a schema definition, much like a table in a relational (SQL) database. The column schema is configured with a YAML file. For example, here is a sample schema, fruit.yaml:

- name: fruit_id
  key: true
  type: integer
- name: fruit_name
  type: string
- name: fruit_color
  type: string

Column Rules

Each column has the following attributes:

Attribute  Type             Mandatory  Comment
name       string           Y
type       enum             Y          One of bool, string, integer
key        bool             N          Only a single column can be marked as key
optional   bool             N
validate   transformations  N          Performs validation; must return Option<Value>

A key column will be used to interpret the key value. All other columns will be used to interpret the value part of the record.
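As a rough illustration of the column rules, the Rust sketch below models a schema such as fruit.yaml in memory and checks that at most one column is marked as key. The names `Column`, `ColumnType`, and `check_schema` are hypothetical, loading the YAML file itself (for example with a serde-based YAML crate) is left out, and the `validate` attribute is not modeled. Rejecting duplicate column names is an added assumption; the RFC only states the single-key rule.

```rust
use std::collections::HashSet;

/// Column types allowed by the column rules: bool, string, integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    Bool,
    String,
    Integer,
}

/// Hypothetical in-memory model of one schema entry.
/// `key` and `optional` are the non-mandatory attributes and default to false;
/// the `validate` attribute is not modeled in this sketch.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: String,
    pub ty: ColumnType,
    pub key: bool,
    pub optional: bool,
}

impl Column {
    /// Builds a mandatory, non-key column.
    pub fn new(name: &str, ty: ColumnType) -> Self {
        Column { name: name.to_string(), ty, key: false, optional: false }
    }
}

/// Checks a parsed schema against the column rules.
/// The duplicate-name check is an assumption, not part of the RFC.
pub fn check_schema(columns: &[Column]) -> Result<(), String> {
    let mut seen = HashSet::new();
    let mut key_count = 0;
    for col in columns {
        if !seen.insert(col.name.as_str()) {
            return Err(format!("duplicate column `{}`", col.name));
        }
        if col.key {
            key_count += 1;
        }
    }
    // Only a single column can be marked as key.
    if key_count > 1 {
        return Err(format!("{} columns are marked as key, but only one is allowed", key_count));
    }
    Ok(())
}
```

Under this model, fruit.yaml maps to three `Column` values with `fruit_id` as the only key column, so it passes the check.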

Creating a topic with columns

$ fluvio topic create fruits --columns fruit.yaml

topic "fruits" with columns is created

Produce with JSON

The CLI is extended to produce columnar records from a special JSON format with two fields: key and value. For example, fruit records can be specified with the following JSON (fruits.json):

[
  { "key": 1, "value": { "fruit_name": "Banana", "fruit_color": "Yellow", "description": "Yummy" }},
  { "key": 2, "value": { "fruit_name": "Apple", "fruit_color": "Red" }}
]

The file can then be produced to the topic with:

$ fluvio produce fruits --json-file fruits.json
2 records produced

Records are parsed as follows:

  • If a mandatory column is missing: display an error, skip the record, and continue.
  • If a value cannot be parsed as its column type: display an error, skip the record, and continue.
  • If validation fails: display an error, skip the record, and continue.
  • If a field is not defined in the schema: ignore the field, display a warning, build the record, and continue.
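The parsing rules above could be applied to the value part of each record roughly as follows. This is a hypothetical Rust sketch, not Fluvio's implementation: it assumes the value object has already been decoded into a map of `Value` entries, treats a column as mandatory unless it is marked optional, treats a validator returning `None` as a validation error, and returns error and warning messages for the caller to display instead of printing them. The key column is assumed to be handled separately, so `columns` lists only the value columns; `non_empty` is a made-up example validator.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Str(String),
    Int(i64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Bool,
    String,
    Integer,
}

/// Assumed validator shape: returns Some(value) to accept, None to reject.
pub type Validator = fn(&Value) -> Option<Value>;

pub struct Column {
    pub name: &'static str,
    pub ty: ColumnType,
    pub optional: bool,
    pub validate: Option<Validator>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Record built; `warnings` lists undefined fields that were ignored.
    Built { row: BTreeMap<String, Value>, warnings: Vec<String> },
    /// Record skipped; the message should be displayed as an error.
    Skipped(String),
}

fn type_matches(ty: ColumnType, v: &Value) -> bool {
    matches!(
        (ty, v),
        (ColumnType::Bool, Value::Bool(_))
            | (ColumnType::String, Value::Str(_))
            | (ColumnType::Integer, Value::Int(_))
    )
}

/// Hypothetical example validator: rejects empty strings.
pub fn non_empty(v: &Value) -> Option<Value> {
    match v {
        Value::Str(s) if !s.is_empty() => Some(v.clone()),
        _ => None,
    }
}

pub fn parse_value(columns: &[Column], fields: &BTreeMap<String, Value>) -> Outcome {
    let mut row = BTreeMap::new();
    for col in columns {
        // Rule 1: a missing mandatory column skips the record.
        let v = match fields.get(col.name) {
            Some(v) => v,
            None if col.optional => continue,
            None => return Outcome::Skipped(format!("missing mandatory column `{}`", col.name)),
        };
        // Rule 2: a type mismatch skips the record.
        if !type_matches(col.ty, v) {
            return Outcome::Skipped(format!("wrong type for column `{}`", col.name));
        }
        // Rule 3: a failed validation skips the record.
        let v = match col.validate {
            Some(check) => match check(v) {
                Some(checked) => checked,
                None => return Outcome::Skipped(format!("validation failed for column `{}`", col.name)),
            },
            None => v.clone(),
        };
        row.insert(col.name.to_string(), v);
    }
    // Rule 4: undefined fields are dropped with a warning; the record is still built.
    let warnings: Vec<String> = fields
        .keys()
        .filter(|k| !columns.iter().any(|c| c.name == k.as_str()))
        .map(|k| format!("ignoring undefined field `{}`", k))
        .collect();
    Outcome::Built { row, warnings }
}
```

With the fruit schema, the Banana record from fruits.json would be built from its two schema columns, with one warning for the undefined description field.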

Inspecting (consuming) Data

Existing consumer behavior stays the same, except in the following scenarios.

Table Output

With a column schema, records can be rendered directly as a table.

$ fluvio consume fruits -Bd --output table

fruit_id  fruit_name  fruit_color  description
--------  ----------  -----------  -----------
1         Banana      Yellow       Yummy
2         Apple       Red
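For the table output, one plausible approach is to place the key column first, followed by the value columns in schema order, and pad every column to its widest cell. The Rust sketch below assumes each record has already been converted to a map from column name to display string; `render_table` and `row` are hypothetical helpers, and the exact layout Fluvio would use may differ. Cells missing from a record, such as an absent optional column, are left blank.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds one record from (column, text) pairs.
pub fn row(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Writes one left-aligned line, two spaces between columns, no trailing spaces.
fn push_line(out: &mut String, cells: &[String], widths: &[usize]) {
    let padded: Vec<String> = cells
        .iter()
        .zip(widths)
        .map(|(c, w)| format!("{:<width$}", c, width = *w))
        .collect();
    out.push_str(padded.join("  ").trim_end());
    out.push('\n');
}

/// Renders records as a text table with one column per header name.
pub fn render_table(headers: &[&str], rows: &[HashMap<String, String>]) -> String {
    // Each column is as wide as its header or its widest cell.
    let widths: Vec<usize> = headers
        .iter()
        .map(|h| {
            rows.iter()
                .filter_map(|r| r.get(*h))
                .map(|c| c.chars().count())
                .fold(h.chars().count(), usize::max)
        })
        .collect();

    let mut out = String::new();
    let header_cells: Vec<String> = headers.iter().map(|h| h.to_string()).collect();
    push_line(&mut out, &header_cells, &widths);
    let rule: Vec<String> = widths.iter().map(|w| "-".repeat(*w)).collect();
    push_line(&mut out, &rule, &widths);
    for r in rows {
        // Missing cells render as empty strings.
        let cells: Vec<String> = headers
            .iter()
            .map(|h| r.get(*h).cloned().unwrap_or_default())
            .collect();
        push_line(&mut out, &cells, &widths);
    }
    out
}
```

Sizing columns from the data keeps the table readable for short values, at the cost of scanning all displayed records before printing the first line.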

JSON output

For columnar topics, consuming with the JSON output format applies the columns by default:

$ fluvio consume fruits -Bd --output json
{
	"fruit_id": 1,
	"fruit_name": "Banana",
	"fruit_color": "Yellow",
	"description": "Yummy"
}
{
	"fruit_id": 2,
	"fruit_name": "Apple",
	"fruit_color": "Red"
}

To consume JSON output without the columns applied, use the --ignore-columns flag.
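The column-applied JSON output can be thought of as flattening each record: the key is emitted under the key column's name, followed by the value columns in schema order. The standard-library-only Rust sketch below illustrates that idea; `Cell`, `escape`, and `to_json` are hypothetical names, only integer and string cells are modeled, the output is compact single-line JSON rather than the indented form shown above, and a real implementation would more likely use a JSON library such as serde_json. The --ignore-columns behavior is not modeled.

```rust
use std::collections::HashMap;

/// A decoded cell; only the integer and string column types are modeled here.
pub enum Cell {
    Int(i64),
    Str(String),
}

/// Quotes a string as a JSON string literal, escaping quotes, backslashes,
/// and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

fn render(cell: &Cell) -> String {
    match cell {
        Cell::Int(n) => n.to_string(),
        Cell::Str(s) => escape(s),
    }
}

/// Flattens one record into a single JSON object: key column first, then the
/// value columns in schema order. Columns absent from the record are skipped.
pub fn to_json(key_column: &str, key: &Cell, value_columns: &[&str], value: &HashMap<String, Cell>) -> String {
    let mut fields = vec![format!("{}: {}", escape(key_column), render(key))];
    for name in value_columns {
        if let Some(cell) = value.get(*name) {
            fields.push(format!("{}: {}", escape(name), render(cell)));
        }
    }
    format!("{{{}}}", fields.join(", "))
}
```

For the Apple record, this would produce {"fruit_id": 2, "fruit_name": "Apple", "fruit_color": "Red"}.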

Topic listing

The topic listing gains a new COLUMNS column that indicates whether the topic has a column schema applied:

$ fluvio topic list
NAME     COLUMNS  TYPE      PARTITIONS  REPLICAS  RETENTION  COMPRESSION  STATUS       REASON
topic-1           computed  1           1         7days      any          provisioned
fruits   Y        computed  1           1         7days      any          provisioned

NOTE:

  • The new COLUMNS column shows Y if a column schema is applied and is left blank otherwise.
sehz changed the title from "feature: Column Schema" to "RFC: Column Schema" on Apr 18, 2023
sehz added the RFC label on Apr 18, 2023
sehz mentioned this issue on Apr 18, 2023
sehz added this to the 0.10.8 milestone on Apr 18, 2023
sehz changed the milestone from 0.10.8 to 0.10.9 on May 1, 2023
sehz removed this from the 0.10.9 milestone on May 13, 2023
drc-infinyon changed the title from "RFC: Column Schema" to "https://github.com/infinyon/fluvio/issues/3267" on May 20, 2023