Python wrappers for source creation #1016

pyalex · 2025-07-18T17:27:03Z

Summary

Adds functions to the Python SDK for creating source objects. The change is fully backward compatible: users can keep using the Thrift-based classes alongside the new Python wrappers.

Why / Goal

The primary motivation is to allow extra attributes at the source level. As in GroupBy and Join, all extra arguments are stored in the customJson attribute in Thrift.
Sources can carry all sorts of metadata, e.g. bootstrap.server for a Kafka source, which can be helpful for a streaming job.
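
A minimal sketch of that idea, not the code in this PR: `ThriftEventSource` is a stand-in dataclass for the Thrift-generated class, and `event_source` and `bootstrap_servers` are hypothetical names. It assumes extra keyword arguments are serialized to JSON and that customJson stays unset when none are given.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ThriftEventSource:
    """Stand-in for the Thrift-generated EventSource (not the real class)."""
    table: Optional[str] = None
    topic: Optional[str] = None
    customJson: Optional[str] = None


def event_source(table: str, topic: Optional[str] = None, **kwargs: Any) -> ThriftEventSource:
    """Hypothetical wrapper: unknown keyword arguments end up in customJson."""
    # Assumption: with no extra attributes, customJson is left unset.
    custom_json = json.dumps(kwargs, sort_keys=True) if kwargs else None
    return ThriftEventSource(table=table, topic=topic, customJson=custom_json)


src = event_source(
    table="data.events",
    topic="events_topic",
    bootstrap_servers="broker-1:9092",  # extra attribute, not a Thrift field
)
print(src.customJson)
```

A streaming job could then read the attribute back with `json.loads(src.customJson)`.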

Additional benefits:

Less verbose API
before:

my_source = ttypes.Source(
    events=ttypes.EventSource(
        table=...
    )
)

after:

my_source = source.EventSource(
    table=...
)

Improved API consistency: the existing Python wrappers (e.g. GroupBy, Join) use Pythonic snake case for parameter names, whereas code generated from Thrift uses camel case (e.g. snapshotTable in EntitySource)
More meaningful errors: omitting a required attribute fails with a clearer message
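
The last two points can be illustrated with a small sketch. `ThriftEntitySource` is a stand-in for the generated class and `entity_source` is a hypothetical wrapper, not the signature added in this PR. It assumes the generated class accepts missing fields at construction time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThriftEntitySource:
    """Stand-in for the Thrift-generated EntitySource; every field defaults to None."""
    snapshotTable: Optional[str] = None


def entity_source(snapshot_table: str) -> ThriftEntitySource:
    """Hypothetical wrapper: snake_case parameter mapped onto the camelCase Thrift field."""
    return ThriftEntitySource(snapshotTable=snapshot_table)


# Built directly, the stand-in accepts a missing snapshotTable silently.
incomplete = ThriftEntitySource()

# The wrapper makes the parameter required, so Python names the missing argument.
try:
    entity_source()
except TypeError as err:
    message = str(err)
    print(message)
```

Because the requirement lives in the Python signature, the error surfaces where the source is defined rather than later in the pipeline.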

Test Plan

Added Unit Tests
[ x ] Covered by existing CI
Integration tested

Checklist

Documentation update

Reviewers

api/py/ai/chronon/source.py

nikhil-zlai

I like the source wrappers. Can you edit docs also please?

Given that there is already a way to specify stream config alongside the topic, can we skip the api change to add customJson?

For specifying streaming params - we use topicInfo:

chronon/online/src/main/scala/ai/chronon/online/DataStreamBuilder.scala

Line 32 in bf9497b

    // kafka://topic_name/schema=my_schema/host=X/port=Y should parse into TopicInfo(topic_name, kafka, {schema: my_schema, host: X, port Y})
    case class TopicInfo(name: String, topicType: String, params: Map[String, String])

topic="kafka://topic_name/schema=my_schema/host=X/port=Y" will parse into

    TopicInfo(topic_name, kafka, {schema: my_schema, host: X, port: Y})
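
For readers less familiar with the topic string format, here is a rough Python sketch of the parsing described above. It is an illustration only: the real logic is the Scala code in DataStreamBuilder.scala, `parse_topic` is a hypothetical name, and the sketch assumes the string always carries a `scheme://` prefix.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TopicInfo:
    name: str
    topic_type: str
    params: Dict[str, str]


def parse_topic(topic: str) -> TopicInfo:
    """Illustrative parser for '<type>://<name>/<key>=<value>/...' strings."""
    topic_type, _, rest = topic.partition("://")
    # The first path segment is the topic name; the remaining ones are key=value pairs.
    name, *fields = rest.split("/")
    params = {}
    for field in fields:
        key, _, value = field.partition("=")
        params[key] = value
    return TopicInfo(name=name, topic_type=topic_type, params=params)


info = parse_topic("kafka://topic_name/schema=my_schema/host=X/port=Y")
print(info)
```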

api/py/ai/chronon/source.py

api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1

api/py/ai/chronon/source.py

pyalex · 2025-07-18T18:51:10Z

Hey @nikhil-zlai , thanks for the review!
There's more use for those extra attributes than just the Kafka host and port. For example, I want to store the Avro JSON schema next to the source definition and attach it to the source, or specify all kinds of Kafka consumer properties.

TopicInfo has limited usage since it treats / and = as special symbols; if I were to add anything base64-encoded to the topic string, it would simply break.
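
A small sketch of the failure mode, assuming a parser that splits the topic string on / and = (the actual Scala parser may differ in detail). The payload is chosen so that its base64 form contains / characters. The same value survives a JSON round trip, which is what customJson relies on.

```python
import base64
import json

payload = b"\xff\xff\xff"
encoded = base64.b64encode(payload).decode("ascii")  # "////"

# Embedding the value in a topic string: "/" is taken as a segment separator.
topic = f"kafka://events/schema={encoded}"
_, _, rest = topic.partition("://")
name, *fields = rest.split("/")
params = dict(field.partition("=")[::2] for field in fields if field)
print(params)  # the schema value is lost

# Storing the same value as JSON keeps it intact.
custom_json = json.dumps({"schema": encoded})
restored = base64.b64decode(json.loads(custom_json)["schema"])
print(restored == payload)
```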

pyalex · 2025-07-18T20:21:04Z

Updated docs

nikhil-zlai · 2025-07-18T22:21:27Z

"I want to store the Avro JSON schema near the source definition and attach it to the source. Or specify all kinds of Kafka consumer properties."

I see. That definitely justifies the change.

api/py/ai/chronon/source.py

nikhil-zlai

one minor nit. but lgtm!

pyalex · 2025-07-23T15:54:40Z

@hzding621, please take another look

pyalex · 2025-08-04T19:01:51Z

Ping @hzding621

pengyu-hou · 2025-09-29T21:07:15Z

api/thrift/api.thrift

+    /**
+    * Any extra attributes can be stored here. Ie, Kafka bootstrap servers for a streaming source, or AWS IAM role for accessing Iceberg table
+    **/
+    5: optional string customJson


Can we change it to use MetaData? We already have customJson in MetaData, and wrapping everything in MetaData means we don't have to add new fields in the future.

Suggested change

-    5: optional string customJson
+    5: optional MetaData metaData

@pengyu-hou I think we just went straight to customJson since there are a few fields in MetaData that don't apply to sources. For example:

    10: optional bool consistencyCheck
    // percentage of online serving requests to log to warehouse
    11: optional double samplePercent

pengyu-hou · 2025-09-29T21:07:50Z

api/thrift/api.thrift

+    /**
+    * Any extra attributes can be stored here. Ie, Kafka bootstrap servers for a streaming source, or AWS IAM role for accessing Iceberg table
+    **/
+    5: optional string customJson


Ditto, use MetaData instead of creating a new customJson field.

Suggested change

-    5: optional string customJson
+    5: optional MetaData metaData

python wrapper for sources

450004a

pyalex changed the title ~~Python wrappers for source objects~~ Python wrappers for source creation Jul 18, 2025

Oleksii Moskalenko added 3 commits July 18, 2025 13:32

source.py

5468b95

doc strings

ef96b80

lint

ed33b97

pyalex marked this pull request as ready for review July 18, 2025 17:59

nikhil-zlai reviewed Jul 18, 2025

api/py/ai/chronon/source.py (outdated, resolved)

nikhil-zlai requested changes Jul 18, 2025

api/py/ai/chronon/source.py (outdated, resolved)

api/py/test/sample/production/group_bys/sample_team/event_sample_group_by.v1 (outdated, resolved)

api/py/ai/chronon/source.py (outdated, resolved)

Oleksii Moskalenko added 2 commits July 18, 2025 16:09

docs

91ffc91

more docs

f754bc0

test signatures match

94c60d8

pyalex requested review from hzding621 and nikhil-zlai July 21, 2025 14:58

nikhil-zlai reviewed Jul 21, 2025

api/py/ai/chronon/source.py (outdated, resolved)

nikhil-zlai approved these changes Jul 21, 2025

Oleksii Moskalenko and others added 2 commits July 22, 2025 16:06

empty customJson

7fca3d2

Merge branch 'main' into netflix/custom-attributes-in-source

985d052

pengyu-hou reviewed Sep 29, 2025

abbywh Nov 16, 2025

5 participants
