
Enhance Datagen with Protobuf Support and Nested Key References #112

Open · wants to merge 6 commits into base: main
Conversation

recursethis

What type of PR is this?

  • ♻️ Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 👷 Optimization
  • 📝 Documentation Update
  • 🚩 Other

Description

  • updated faker-js to 8.0.0
  • added the ability to reference a value in a nested key in the JSON schema using the dot (.) separator
  • the Kafka data generator can now produce output messages in the protobuf format; protobuf schema definitions are referenced in the `_meta` of the JSON schema
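For illustration only, a sketch of how the dot-separated nested key reference might be used. The field names and faker expression below are hypothetical, not taken from this PR:

```json
{
  "_meta": {
    "topic": "air_quality",
    "key": "location.id"
  },
  "location": {
    "id": "datatype.uuid"
  }
}
```

Here `location.id` would point the Kafka record key at the nested `id` field under `location`, rather than requiring the key field to live at the top level.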

Added to documentation?

  • 📜 readme
  • 🙅 no documentation needed

@recursethis marked this pull request as ready for review February 26, 2024 21:49
@CLAassistant

CLAassistant commented Feb 26, 2024

CLA assistant check
All committers have signed the CLA.

@chuck-alt-delete
Contributor

Thanks for the PR! I’ll take a look

README.md Outdated
```diff
@@ -213,7 +213,11 @@ Here is the general syntax for a JSON input schema:
 {
     "_meta": {
         "topic": "<my kafka topic>",
-        "key": "<field to be used for kafka record key>" ,
+        "key": "<field to be used for kafka record key>" ,
+        "proto": {
```
Contributor:

It’s ok to add an example that uses protobuf output format, but we shouldn’t make breaking changes to what is required for the input json schema.

Author:

I am updating the README to add a separate section called:

JSON Schema with Protobuf.

The proto {} meta field is intended to be optional.
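For context, a hedged sketch of what the optional `proto` block in `_meta` might look like in that new README section. The key names, path, and message name below are illustrative, not confirmed by this PR:

```json
{
  "_meta": {
    "topic": "air_quality",
    "key": "id",
    "proto": {
      "schema": "./protos/air_quality.proto",
      "message": "AirQuality"
    }
  }
}
```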

```diff
@@ -82,6 +82,12 @@ export async function generateMegaRecord(schema: any) {
         });
     }
 
+    // specify the proto field for the topic
+    if ("proto" in _meta) {
```
Contributor:

I think this should be invalid if the user doesn’t specify proto output format. In other words, if they have proto in _meta but are using -f avro for output, then _meta.proto should be ignored.

Author:

The problem with this is that I would have to pass the format into the generateMegaRecord method to skip the if ("proto" in _meta) check.

The _meta.proto is effectively ignored already by all output formats, including Avro, because _meta.proto is only referenced in protoFormat.ts.
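For illustration, a minimal sketch of the alternative the reviewer suggests: passing the output format into the record-generation path so that `_meta.proto` is only honored for protobuf output. The function and type names here are hypothetical, not from this PR:

```typescript
// Hypothetical sketch: gate _meta.proto handling on the requested output
// format instead of relying on downstream code to ignore the field.
type ProtoMeta = { schema?: string; message?: string };
type Meta = { topic: string; key?: string; proto?: ProtoMeta };

export function resolveProtoMeta(meta: Meta, format: string): ProtoMeta | undefined {
    // Only honor _meta.proto when the user asked for protobuf output (-f proto);
    // for avro/json output the field is dropped entirely.
    if (format === "proto" && meta.proto !== undefined) {
        return meta.proto;
    }
    return undefined;
}
```

With this shape, a schema that carries `proto` in `_meta` but is run with `-f avro` would simply see the field ignored, matching the behavior the reviewer asked for.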

```diff
@@ -1,7 +1,11 @@
 {
     "_meta": {
         "topic": "air_quality",
-        "key": "id"
+        "key": "id",
+        "proto": {
```
Contributor:

Let’s not change existing tests, but rather add a new test

Author:

I am adding an interationIndex-proto.json file to use instead.

```diff
@@ -3,6 +3,10 @@
     "_meta": {
         "topic": "mz_datagen_users",
         "key": "id",
+        "proto": {
```
Contributor:

Same comment. Let’s not change existing tests to require proto

Author:

I am adding a schema-proto.json file to use instead

Comment on lines +65 to +68:

```typescript
async register(megaRecord: any): Promise<void> {
    this.schemaFiles = await ProtoFormat.getProtoSchemaFiles(megaRecord);
    this.schemas = await ProtoFormat.getProtoSchemas(megaRecord, Array.from(this.schemaFiles));
}
```
Collaborator:

Thank you for that PR, it looks great!

One thing that I believe might be missing is that the schemas do not get registered in the CSR, so I'm not 100% sure how consumers will decode the Protobuf messages.

@recursethis (Author) commented Jun 24, 2024

You do not need a CSR to use protos with Kafka. Often, consumers will have their own references to the schemas. The datagen tool working with protos should be independent of how consumers get the protos; the datagen tool is about message production. To decode the messages, consumers have the option of:

  1. having the .proto files or the generated classes from the .protos, or
  2. using the CSR.

Is (2) a requirement for this PR?

@recursethis
Author

Hi. Thank you for the feedback!

I'll reply to the comments and make sure to update the PR once I get time to do so!

Thank you so much for this program! It's great!

4 participants