ZeroBus - File Mode Prototype DAB #588
base: main
Pull Request Overview
This PR introduces a "ZeroBus - File Mode Prototype DAB" that provides a lightweight, no-code file ingestion workflow for Databricks Unity Catalog tables using Auto Loader. The implementation allows users to configure tables via JSON, deploy resources through Databricks Asset Bundles (DAB), and drop files into volume paths for automatic ingestion.
- Implements a complete file push workflow with configuration management, table validation, and Auto Loader integration
- Provides debugging capabilities through dev mode deployment and interactive notebook for table configuration refinement
- Includes comprehensive documentation and examples for both quick start and troubleshooting scenarios
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
filepush/dab/src/utils/tablemanager.py | Core table management utilities for config validation, volume path handling, and DataFrame creation with Auto Loader |
filepush/dab/src/utils/initialization.py | Initialization script for setting up workspace resources, volume structure, and environment configuration |
filepush/dab/src/utils/formatmanager.py | Format-specific Auto Loader configuration management for CSV and JSON with validation and option merging |
filepush/dab/src/utils/envmanager.py | Environment configuration management and catalog storage validation utilities |
filepush/dab/src/ingestion.py | DLT pipeline implementation for streaming data ingestion with dynamic table creation |
filepush/dab/src/debug_table_config.py | Interactive debugging notebook for testing and refining table configurations |
filepush/dab/src/configs/tables.json | Example table configuration file defining ingestion parameters |
filepush/dab/resources/*.yml | DAB resource definitions for volume, schema, pipeline, and job configurations |
filepush/dab/databricks.yml | Main DAB configuration with deployment targets and variables |
filepush/README.md | Comprehensive documentation with quick start guide and debugging instructions |
filepush/.gitignore | Git ignore configuration for the project |
CODEOWNERS | Adds code ownership assignment for the filepush directory |
ws.schemas.update(full_name=f"{catalog_name}.{schema_name}", properties={
    "filepush.volume_path_root": volume_path_root,
    "filepush.volume_path_data": volume_path_data,
    "filepush.volume_path_data": volume_path_archive
Copilot
AI
Oct 4, 2025
Incorrect property assignment - should be 'filepush.volume_path_archive' instead of 'filepush.volume_path_data' for the archive path.
Suggested change:
- "filepush.volume_path_data": volume_path_archive
+ "filepush.volume_path_archive": volume_path_archive
Copilot uses AI. Check for mistakes.
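The effect of the duplicated key can be reproduced in plain Python: a dict literal with a repeated key raises no error and keeps only the last value, so the data path is silently replaced by the archive path. This is a minimal sketch; the path values below are placeholders for illustration and are not taken from the PR.

```python
# Placeholder paths for illustration only (not from the PR).
volume_path_root = "/Volumes/example_catalog/example_schema/example_volume"
volume_path_data = volume_path_root + "/data"
volume_path_archive = volume_path_root + "/archive"

# Literal as written in the reviewed code: the key is repeated,
# so the second assignment overwrites the first.
buggy = {
    "filepush.volume_path_root": volume_path_root,
    "filepush.volume_path_data": volume_path_data,
    "filepush.volume_path_data": volume_path_archive,
}

# Literal with the suggested key name: all three properties survive.
fixed = {
    "filepush.volume_path_root": volume_path_root,
    "filepush.volume_path_data": volume_path_data,
    "filepush.volume_path_archive": volume_path_archive,
}

print(sorted(buggy))
print(sorted(fixed))
```

With the repeated key, the schema would end up with no archive property at all and a data property that points at the archive location.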
def get_configs() -> list:
    json_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "configs", "tables.json")
    if not os.path.exists(json_path):
        raise RuntimeError(f"Missing table configs file: {json_path}. Please following README.md to create one, deploy and run configuration_job.")
Copilot
AI
Oct 4, 2025
Corrected 'following' to 'follow' in the error message.
Suggested change:
- raise RuntimeError(f"Missing table configs file: {json_path}. Please following README.md to create one, deploy and run configuration_job.")
+ raise RuntimeError(f"Missing table configs file: {json_path}. Please follow README.md to create one, deploy and run configuration_job.")
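For context, the existence check with the corrected message could sit in a loader along the following lines. This is a sketch under stated assumptions, not the PR's implementation: the helper name `load_table_configs`, the explicit path parameter, and the expectation that `tables.json` holds a JSON list (suggested only by the `-> list` annotation) are all hypothetical.

```python
import json
import os
import tempfile


def load_table_configs(json_path: str) -> list:
    """Hypothetical helper mirroring the existence check in get_configs."""
    if not os.path.exists(json_path):
        raise RuntimeError(
            f"Missing table configs file: {json_path}. "
            "Please follow README.md to create one, deploy and run configuration_job."
        )
    with open(json_path, encoding="utf-8") as f:
        configs = json.load(f)
    # Assumption: the file holds a JSON list with one entry per table.
    if not isinstance(configs, list):
        raise RuntimeError(f"Expected a JSON list of table configs in {json_path}")
    return configs


# Usage: write a throwaway config file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tables.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump([{"name": "example_table", "format": "csv"}], f)
    loaded = load_table_configs(path)
    print(loaded)
```

Taking the path as a parameter, rather than deriving it from `__file__` as the reviewed code does, is only a convenience that makes the check easy to exercise in isolation.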
# schema hints
schema_hints = table_config.get("schema_hints")
if schema_hints:
    reader = reader.option("cloudFiles.schemaHints", ", ".join({schema_hints} | fmt_mgr.default_schema))
Copilot
AI
Oct 4, 2025
Set union operation with string and set will fail. Should convert schema_hints to a set first: {schema_hints}.union(fmt_mgr.default_schema) or use fmt_mgr.default_schema | {schema_hints}.
Suggested change:
- reader = reader.option("cloudFiles.schemaHints", ", ".join({schema_hints} | fmt_mgr.default_schema))
+ if isinstance(schema_hints, str):
+     schema_hints_set = {schema_hints}
+ else:
+     schema_hints_set = set(schema_hints)
+ reader = reader.option("cloudFiles.schemaHints", ", ".join(schema_hints_set | fmt_mgr.default_schema))
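The normalization in the suggested change can be tried outside Spark. In plain Python, `{schema_hints}` is already a one-element set when `schema_hints` is a string, so the union with another set works; it raises `TypeError` only when `schema_hints` is an unhashable value such as a list. The sketch below assumes `fmt_mgr.default_schema` is a set of hint strings (the review does not show its type), and the helper name and sample hints are hypothetical.

```python
def build_schema_hints(schema_hints, default_schema):
    """Merge user hints with format defaults into one schemaHints string."""
    if isinstance(schema_hints, str):
        hints = {schema_hints}       # a single hint string becomes a 1-element set
    else:
        hints = set(schema_hints)    # a list or tuple of hints becomes a set
    # sorted() is an addition here: set iteration order is not guaranteed,
    # so sorting keeps the resulting option string stable between runs.
    return ", ".join(sorted(hints | set(default_schema)))


# Assumed default hint, for illustration only.
defaults = {"_rescued_data STRING"}

print(build_schema_hints("id INT", defaults))
print(build_schema_hints(["id INT", "ts TIMESTAMP"], defaults))
```

Joining a set directly, as both the original line and the suggestion do, produces the same hints but in an arbitrary order, which may be worth avoiding if the option string is ever compared or logged.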
_supported_formats: dict[str, AutoLoaderFormat] = {f.name: f for f in (CSV(), JSON())}

def get_format_manager(fmt: str) -> dict[str, str]:
Copilot
AI
Oct 4, 2025
Return type annotation is incorrect. The function returns an AutoLoaderFormat instance, not a dict[str, str].
Suggested change:
- def get_format_manager(fmt: str) -> dict[str, str]:
+ def get_format_manager(fmt: str) -> AutoLoaderFormat:
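A self-contained sketch of the registry with the corrected annotation is shown below. The class bodies are stand-ins: the review shows only that `CSV()` and `JSON()` instances expose a `name` attribute, so the `"csv"` and `"json"` values, the case-insensitive lookup, and the `ValueError` for unknown formats are assumptions rather than the PR's behavior.

```python
class AutoLoaderFormat:
    """Stand-in base class; the real attributes are not shown in the review."""
    name: str = ""


class CSV(AutoLoaderFormat):
    name = "csv"   # assumed value


class JSON(AutoLoaderFormat):
    name = "json"  # assumed value


# Registry keyed by format name, built as in the reviewed code.
_supported_formats: dict[str, AutoLoaderFormat] = {f.name: f for f in (CSV(), JSON())}


def get_format_manager(fmt: str) -> AutoLoaderFormat:
    """Return the format manager instance registered under fmt."""
    key = fmt.strip().lower()  # assumption: lookups are case-insensitive
    if key not in _supported_formats:
        raise ValueError(
            f"Unsupported format: {fmt!r}. Supported: {sorted(_supported_formats)}"
        )
    return _supported_formats[key]


manager = get_format_manager("CSV")
print(type(manager).__name__, manager.name)
```

With the annotation matching the returned object, a type checker can flag callers that treat the result as a `dict[str, str]`.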
Submission of Lab Project ZeroBus - File Mode
Proposal
CUJ
This is a DAB config that deploys resources to the customer's workspace and invokes script jobs to set up the file push endpoints. No new API or SQL syntax is introduced.