-
Notifications
You must be signed in to change notification settings - Fork 574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: Astra DB Source Connector Support #3212
base: main
Are you sure you want to change the base?
FEAT: Astra DB Source Connector Support #3212
Conversation
@erichare I'll check this out this week. Thanks! |
Hi @potter-potter , just wondering would it be helpful to you if i merged in the changes from #3179 into this PR? I can do so and close the other PR if so! |
@property | ||
def filename(self): | ||
return ( | ||
Path(self.processor_config.output_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to the wrong folder. It needs to go to the download folder so that the Partitioner can pick it up. So something like this:
@property
def filename(self):
return (
Path(self.read_config.download_dir)
/ self.connector_config.collection_name
/ f"{self.metadata['_id']}.txt"
).resolve()
@SourceConnectionError.wrap | ||
@requires_dependencies(["astrapy"], extras="astra") | ||
@BaseSingleIngestDoc.skip_if_file_exists | ||
def get_file(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this should go to the download folder, I think we usually do something like this. Maybe check out how the mongodb is doing this.
self.filename.parent.mkdir(parents=True, exist_ok=True)
with open(self.filename, "w") as f:
@BaseSingleIngestDoc.skip_if_file_exists | ||
def get_file(self): | ||
with open(self.filename, "w") as f: | ||
f.write(self.metadata["content"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see a "content" key in the data that I was downloading from the collection we are using.
Also, can you have each field's value write to a separate line like we do in Mongodb? Or were you mimicing a different source connector.
For mongodb connector the download .txt file just looks like this:
1901
Kansas Saloon Smashers
American
Unknown
nan
unknown
https://en.wikipedia.org/wiki/Kansas_Saloon_Smashers
@erichare This looks like it's pretty close to being done. |
@potter-potter pushed some updates just now based on the feedback! Thanks so much. EDIT: Also just wanted to check if you want me to just merge in the changes from #3179 into this one, and close 3179? just let me know. |
|
||
@property | ||
def _output_filename(self): | ||
return Path(self.processor_config.output_dir) / f"{self.filename}.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting saved in wrong directory due to self.filename being a path. Should be something like:
return (Path(self.processor_config.output_dir)
/ self.connector_config.collection_name
/ f"{self.metadata['_id']}.json"
)
@erichare You can merge in the changes from 3179. |
Bring in Astra Environment Variable Updates to Source Connector PR
Thanks @potter-potter ! I've done that, made the update to the output paths, and did some little cleanup of the imports. Let me know if this looks good or if there's anything else to address. Thanks so much for the reviews. |
@erichare This all looks good. I'll take over and get it into PR today or tomorrow. Nice work! |
Thanks @potter-potter ! |
This Pull Request adds support for Astra DB as a source connector. The idea being that some collections in Astra DB may not be vector collections. Here, we can pull structured data, and use the destination connector to produce a structured table with vector support. CC @potter-potter