-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ingest/cassandra): Add support for Cassandra as a source #11822
base: master
Are you sure you want to change the base?
feat(ingest/cassandra): Add support for Cassandra as a source #11822
Conversation
dc5c84b
to
56b339a
Compare
Things to confirm:
|
set -> ArrayType
list -> ArrayType And UDT -> Flattened structure |
Views & Tables should have the same aspects
And ensure we have table + column comments as description fields. .....And Views have
|
Table + Column Comments should be QA'd locally. |
Also please make sure error handling is clear. And try-excepts are around areas where we need. This connector should only fail in known cases. We should not have ANY uncaught exceptions! |
d8ecb9c
to
2d8f887
Compare
2d8f887
to
f089c77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra_config.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra_utils.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra_utils.py
Outdated
Show resolved
Hide resolved
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra.py
Outdated
Show resolved
Hide resolved
if self.config.cloud: | ||
cloud_config = self.config.cloud_config | ||
assert cloud_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can replace this check as if self.config.cloud_config
if we remove empty defaults in config as suggested earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have additional optional config so this check invalid right? self.config.cloud validated token and bundle path
connect_timeout: int = Field(
default=60,
description="Timeout in seconds for establishing new connections to Cassandra.",
)
request_timeout: int = Field(
default=60, description="Timeout in seconds for individual Cassandra requests."
)
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra_api.py
Show resolved
Hide resolved
column_infos = self.cassandra_session.execute( | ||
CassandraQueries.GET_COLUMNS_QUERY, [keyspace_name, table_name] | ||
) | ||
column_infos = sorted(column_infos, key=lambda c: c.column_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does cassandra provide column order in its system view of columns? Some systems do provide it, and if so we should honour it rather than sorting by name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only applied for PRIMARY KEY not for others.
metadata-ingestion/src/datahub/ingestion/source/cassandra/cassandra_api.py
Show resolved
Hide resolved
# Key interests are in setting the AllowAllNetworkAuthorizer, | ||
# and in enabling materialized view feature. | ||
# ---------------------------------------------------------------------------------------- # | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, this would be great -> https://github.com/datahub-project/datahub/pull/9680/files#r1460632601
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the commented part
31f887e
to
7929bd2
Compare
7929bd2
to
5b61cbb
Compare
5b61cbb
to
5534e0c
Compare
Checklist