DataOps Agent: Turning Data into Actionable Insights with Google's ADK, Vertex AI, and Gemini

Unlock data intelligence without writing SQL. The DataOps ADK Agent lets data scientists extract insights from BigQuery and Cloud Storage using natural language.
Run with Docker:

```bash
# Clone the repository
git clone <repository-url>
cd dataops-adk-agent

# Configure environment
cp .env.docker.example .env.docker
# Edit .env.docker with your Google Cloud settings

# Run with Docker
./run_docker.sh
```

Or run locally:

```bash
# Install dependencies
pip install uv
uv sync
source .venv/bin/activate

# Configure environment
cp dataops/.env.example dataops/.env
# Edit dataops/.env with your Google Cloud settings

# Run locally
streamlit run app/app.py
```

Access the application at http://localhost:8501.
Ask natural language questions about GitHub repositories and get instant insights powered by BigQuery's massive GitHub dataset:
- "What are the top 10 languages by bytes for tensorflow/tensorflow?"
- "Find files in microsoft/vscode that contain the term 'TODO' and show snippets"
- "Who are the top committers in the last year for facebook/react?"
- "Show the top repositories by watch count"
- "Search for security-related code patterns across repositories"
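As an illustration, the first question above could compile to a query like the following. This is one plausible form, assuming the public dataset's `languages` table (an array of `name`/`bytes` structs per repository); the agent's actual generated SQL may differ:

```sql
-- Top 10 languages by bytes for tensorflow/tensorflow
SELECT l.name AS language, l.bytes
FROM `bigquery-public-data.github_repos.languages`,
     UNNEST(language) AS l
WHERE repo_name = 'tensorflow/tensorflow'
ORDER BY l.bytes DESC
LIMIT 10;
```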
- Natural Language Processing: Converts your question into optimized BigQuery SQL
- Cost Analysis: Performs dry-run analysis and shows estimated costs
- User Approval: Asks for your permission before executing expensive queries
- Smart Execution: Runs the query and provides intelligent insights
- Results Visualization: Displays results in an easy-to-understand format
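The cost-analysis and approval steps can be sketched as a small pure function. The pricing constant and approval threshold below are assumptions for illustration (BigQuery on-demand pricing is roughly $6.25 per TiB scanned at the time of writing), not values taken from this repo:

```python
# Sketch of the dry-run cost gate. BigQuery's dry run reports bytes processed
# without actually running the query; the pipeline can price that figure and
# decide whether to ask the user before executing.

TIB = 1024 ** 4
PRICE_PER_TIB_USD = 6.25        # assumed on-demand rate; check current pricing
APPROVAL_THRESHOLD_USD = 0.10   # hypothetical "expensive query" threshold

def estimate_cost_usd(bytes_processed: int) -> float:
    """Convert a dry run's bytes-processed figure into an estimated cost."""
    return bytes_processed / TIB * PRICE_PER_TIB_USD

def needs_approval(bytes_processed: int) -> bool:
    """Return True when the estimated cost warrants asking the user first."""
    return estimate_cost_usd(bytes_processed) >= APPROVAL_THRESHOLD_USD

# A query scanning 500 GiB costs about $3.05, so approval would be requested.
cost = estimate_cost_usd(500 * 1024 ** 3)
```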
This project implements a sophisticated 3-stage AI agent pipeline:
```mermaid
graph LR
    A[Natural Language Query] --> B[SQL Generator Agent]
    B --> C[Query Explainer Agent]
    C --> D[Query Executor Agent]
    D --> E[Insights & Results]
    F[BigQuery GitHub Dataset] --> D
    G[Cost Estimation] --> C
```
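In plain Python, the three-stage flow behaves roughly like this. These are stand-in functions, not the actual ADK agents; the template SQL and byte counts are fabricated for illustration:

```python
from dataclasses import dataclass

@dataclass
class QueryPlan:
    sql: str
    est_bytes: int
    approved: bool = False

def generate_sql(question: str) -> QueryPlan:
    """Stage 1 (SQL Generator Agent): turn the question into SQL.
    A fixed template stands in for the Gemini-backed generation."""
    sql = f"-- generated for: {question}\nSELECT 1"
    return QueryPlan(sql=sql, est_bytes=10 * 1024 ** 2)  # pretend dry-run figure

def explain_query(plan: QueryPlan, auto_approve: bool) -> QueryPlan:
    """Stage 2 (Query Explainer Agent): surface the cost and get approval."""
    plan.approved = auto_approve
    return plan

def execute_query(plan: QueryPlan) -> str:
    """Stage 3 (Query Executor Agent): run only approved queries."""
    if not plan.approved:
        return "Query cancelled by user."
    return f"Executed ({plan.est_bytes} bytes scanned)."

def pipeline(question: str, auto_approve: bool = True) -> str:
    return execute_query(explain_query(generate_sql(question), auto_approve))

result = pipeline("Show the top repositories by watch count")
```

The key design point the sketch preserves is that execution is gated: stage 3 never runs a query the explainer stage has not marked approved.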
| Component | Purpose | Technology |
|---|---|---|
| Agent Pipeline | Core AI logic & orchestration | Google ADK, Python |
| Web Interface | Interactive user interface | Streamlit |
| Cloud Deployment | Scalable agent hosting | Vertex AI Agent Engine |
| Infrastructure | Cloud resource management | Terraform, Google Cloud |
| Containerization | Consistent deployments | Docker, Docker Compose |
BigQuery Public Dataset: bigquery-public-data.github_repos
- 265M+ commits across open-source repositories
- 280M+ file contents (text files under 1 MB)
- 2.3B+ file metadata entries
- 3.3M+ repositories with detailed information
- Programming languages, licenses, and contributor data
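For example, the "top repositories by watch count" question maps naturally onto the dataset's `sample_repos` table. This is a sketch; the column names assume the public schema:

```sql
SELECT repo_name, watch_count
FROM `bigquery-public-data.github_repos.sample_repos`
ORDER BY watch_count DESC
LIMIT 10;
```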
- Python 3.12+ - Modern Python with latest features
- UV Package Manager - Fast, reliable dependency management
- Google ADK - Agent Development Kit for AI agents
- Streamlit - Interactive web applications
- Docker - Containerization platform
- Vertex AI Agent Engine - Managed AI agent hosting
- BigQuery - Serverless data warehouse
- Cloud Storage - Object storage for artifacts
- IAM - Identity and access management
- Terraform - Infrastructure as Code
- Docker Compose - Local development orchestration
- GitHub Actions - CI/CD pipelines
Perfect for development and testing:

```bash
./run_docker.sh
```

Scalable cloud deployment:

```bash
# Deploy infrastructure
cd infra && terraform apply

# Deploy agent
cd agent-deployment && python deploy.py
```

Direct Python execution:

```bash
source .venv/bin/activate
streamlit run app/app.py
```

- Python 3.12+ - Download
- UV Package Manager - `pip install uv`
- Docker - Install Docker
- Google Cloud CLI - Install gcloud
- Create a Google Cloud Project
- Enable required APIs:
  - Vertex AI API
  - BigQuery API
  - Cloud Storage API
- Set up authentication:
  - Service account or Application Default Credentials
- Configure IAM roles:
  - BigQuery User
  - BigQuery Job User
  - AI Platform User
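The API enablement and IAM steps above can be scripted with `gcloud`. The project ID and service-account address below are placeholders for your own values, and the roles map to `roles/bigquery.user`, `roles/bigquery.jobUser`, and `roles/aiplatform.user`:

```bash
# Enable the required APIs (once per project)
gcloud services enable aiplatform.googleapis.com \
  bigquery.googleapis.com storage.googleapis.com

# Grant the roles to the identity the agent runs as
for role in roles/bigquery.user roles/bigquery.jobUser roles/aiplatform.user; do
  gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:agent@your-project-id.iam.gserviceaccount.com" \
    --role="$role"
done
```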
Create dataops/.env (local) or .env.docker (Docker):
```bash
# Google Cloud Configuration
GOOGLE_GENAI_USE_VERTEXAI=1
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_CLOUD_STORAGE_BUCKET=gs://your-bucket-name

# Agent Configuration (populated after deployment)
AGENT_ENGINE_ID=projects/.../locations/.../reasoningEngines/...
```

Edit infra/terraform.tfvars:

```hcl
project_id        = "your-project-id"
region            = "us-central1"
agent_bucket_name = "your-unique-bucket-name"
```

Agent testing:

```bash
cd dataops/
adk run   # Interactive testing
adk web   # Web interface testing
```

Deployment testing:

```bash
cd agent-deployment/
python test_deployment.py
```

Application testing:

```bash
# Local testing
streamlit run app/app.py

# Docker testing
./run_docker.sh

# Production testing
./run_production.sh
```

- Developer's Guide - Comprehensive development documentation
- Architecture Details - System design and components
- Deployment Guide - Production deployment instructions
- Troubleshooting - Common issues and solutions
We welcome contributions! Please see our Developer's Guide for detailed information on:
- Project architecture and components
- Setting up the development environment
- Running tests and validation
- Building and deploying changes
- Debugging and troubleshooting
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check the Developer's Guide
- Issues: Report bugs via GitHub Issues
- Discussions: Join GitHub Discussions for questions
- Contact: Reach out to the development team
- Google Cloud for the Agent Development Kit and BigQuery
- GitHub for the public repositories dataset
- Streamlit for the amazing web app framework
- Open Source Community for the tools and libraries used
Ready to explore GitHub repositories like never before? Get started with the DataOps ADK Agent today!
