mikaeelkhalid/dataops-adk-agent
πŸ” DataOps Agent β€” Turning Data into Actionable Insights with Google’s ADK, VertexAI and Gemini

Unlock data intelligence without writing SQL. The DataOps ADK Agent lets data scientists extract insights from BigQuery and Cloud Storage using natural language.

Python 3.12+ Google ADK Streamlit Docker Terraform Vertex AI Agent Engine

🚀 Quick Start

🐳 Docker (Recommended)

git clone <repository-url>
cd dataops-adk-agent

# Configure environment
cp .env.docker.example .env.docker
# Edit .env.docker with your Google Cloud settings

# Run with Docker
./run_docker.sh

💻 Local Development

# Install dependencies
pip install uv
uv sync
source .venv/bin/activate

# Configure environment
cp dataops/.env.example dataops/.env
# Edit dataops/.env with your Google Cloud settings

# Run locally
streamlit run app/app.py

🌐 Access the application at http://localhost:8501

✨ What This Agent Does

Ask natural language questions about GitHub repositories and get instant insights powered by BigQuery's massive GitHub dataset:

💬 Example Queries

  • "What are the top 10 languages by bytes for tensorflow/tensorflow?"
  • "Find files in microsoft/vscode that contain the term 'TODO' and show snippets"
  • "Who are the top committers in the last year for facebook/react?"
  • "Show the top repositories by watch count"
  • "Search for security-related code patterns across repositories"

🔄 How It Works

  1. 🧠 Natural Language Processing: Converts your question into optimized BigQuery SQL
  2. 💰 Cost Analysis: Performs dry-run analysis and shows estimated costs
  3. ✅ User Approval: Asks for your permission before executing expensive queries
  4. 📊 Smart Execution: Runs the query and provides intelligent insights
  5. 📈 Results Visualization: Displays results in an easy-to-understand format
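The cost check in step 2 boils down to mapping dry-run bytes to a price. A minimal sketch of that arithmetic, assuming BigQuery's on-demand rate (roughly $6.25 per TiB scanned at the time of writing; verify current pricing for your region):

```python
TIB = 1024 ** 4  # on-demand BigQuery pricing is per TiB scanned

def estimate_on_demand_cost(bytes_processed: int, usd_per_tib: float = 6.25) -> float:
    """Rough USD cost for a query, given total_bytes_processed from a dry run."""
    return (bytes_processed / TIB) * usd_per_tib

# e.g. a dry run reporting 500 GiB scanned:
cost = estimate_on_demand_cost(500 * 1024 ** 3)
```

In the real pipeline the byte count comes from a BigQuery dry-run job (a query submitted with dry_run enabled, which returns the bytes it would scan without running); the agent surfaces this estimate before asking for approval.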

πŸ—οΈ Architecture

DataOps Agent Architecture

This project implements a sophisticated 3-stage AI agent pipeline:

graph LR
    A[Natural Language Query] --> B[SQL Generator Agent]
    B --> C[Query Explainer Agent]
    C --> D[Query Executor Agent]
    D --> E[Insights & Results]
    
    F[BigQuery GitHub Dataset] --> D
    G[Cost Estimation] --> C
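Stripped of the ADK specifics, the three stages chain like this (a plain-Python sketch with illustrative names, not the project's actual API):

```python
from typing import Callable, Optional

def run_pipeline(
    question: str,
    generate_sql: Callable[[str], str],   # SQL Generator Agent: NL -> SQL
    explain: Callable[[str], dict],       # Query Explainer Agent: dry run + cost
    execute: Callable[[str], list],       # Query Executor Agent: run the query
    approve: Callable[[dict], bool],      # user approval gate from the UI
) -> Optional[list]:
    sql = generate_sql(question)          # stage 1
    report = explain(sql)                 # stage 2: cost estimate for approval
    if not approve(report):               # stop before spending money
        return None
    return execute(sql)                   # stage 3: rows back to the user
```

The key design point is that execution is fenced behind the explainer stage, so no query runs until its estimated cost has been shown and approved.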

🧩 Components

| Component | Purpose | Technology |
| --- | --- | --- |
| 🤖 Agent Pipeline | Core AI logic & orchestration | Google ADK, Python |
| 🌐 Web Interface | Interactive user interface | Streamlit |
| ☁️ Cloud Deployment | Scalable agent hosting | Vertex AI Agent Engine |
| 🏗️ Infrastructure | Cloud resource management | Terraform, Google Cloud |
| 🐳 Containerization | Consistent deployments | Docker, Docker Compose |

📊 Data Sources

BigQuery Public Dataset: bigquery-public-data.github_repos

  • πŸ“ 265M+ commits across open-source repositories
  • πŸ“„ 280M+ file contents (text files under 1MB)
  • πŸ—‚οΈ 2.3B+ file metadata entries
  • πŸ“¦ 3.3M+ repositories with detailed information
  • 🏷️ Programming languages, licenses, and contributor data
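For the first example query above ("top 10 languages by bytes"), the SQL the agent generates would look roughly like this (hand-written here for illustration, not actual agent output; the dataset's languages table stores one row per repository with an array of per-language byte counts):

```python
# Illustrative SQL against the public dataset's `languages` table.
TOP_LANGUAGES_SQL = """
SELECT l.name AS language, SUM(l.bytes) AS total_bytes
FROM `bigquery-public-data.github_repos.languages`,
     UNNEST(language) AS l
WHERE repo_name = 'tensorflow/tensorflow'
GROUP BY language
ORDER BY total_bytes DESC
LIMIT 10
"""
```

Because the agent dry-runs queries like this first, you see the bytes it would scan before it ever executes.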

πŸ› οΈ Technology Stack

Core Technologies

  • 🐍 Python 3.12+ - Modern Python with latest features
  • ⚡ UV Package Manager - Fast, reliable dependency management
  • 🤖 Google ADK - Agent Development Kit for AI agents
  • 🎨 Streamlit - Interactive web applications
  • 🐳 Docker - Containerization platform

Google Cloud Services

  • 🧠 Vertex AI Agent Engine - Managed AI agent hosting
  • 📊 BigQuery - Serverless data warehouse
  • ☁️ Cloud Storage - Object storage for artifacts
  • 🔐 IAM - Identity and access management

Infrastructure & DevOps

  • πŸ—οΈ Terraform - Infrastructure as Code
  • πŸ™ Docker Compose - Local development orchestration
  • πŸ“‹ GitHub Actions - CI/CD pipelines

🚀 Deployment Options

🐳 Local Docker Development

Perfect for development and testing:

./run_docker.sh

☁️ Google Cloud Production

Scalable cloud deployment:

# Deploy infrastructure
cd infra && terraform apply

# Deploy agent
cd agent-deployment && python deploy.py

🖥️ Local Development

Direct Python execution:

source .venv/bin/activate
streamlit run app/app.py

📋 Prerequisites

Required Tools

  • 🐍 Python 3.12+
  • ⚡ uv package manager
  • 🐳 Docker & Docker Compose
  • 🏗️ Terraform

Google Cloud Setup

  1. Create a Google Cloud Project
  2. Enable required APIs:
    • Vertex AI API
    • BigQuery API
    • Cloud Storage API
  3. Set up authentication:
    • Service account or Application Default Credentials
  4. Configure IAM roles:
    • BigQuery User
    • BigQuery Job User
    • AI Platform User

🔧 Configuration

Environment Variables

Create dataops/.env (local) or .env.docker (Docker):

# Google Cloud Configuration
GOOGLE_GENAI_USE_VERTEXAI=1
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_CLOUD_STORAGE_BUCKET=gs://your-bucket-name

# Agent Configuration (populated after deployment)
AGENT_ENGINE_ID=projects/.../locations/.../reasoningEngines/...
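A small startup check can fail fast when required variables are missing (a sketch; the variable names come from the example above, and treating AGENT_ENGINE_ID as optional before deployment is an assumption):

```python
import os

# Variables from dataops/.env that must be set before the app starts.
REQUIRED_VARS = [
    "GOOGLE_GENAI_USE_VERTEXAI",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_CLOUD_STORAGE_BUCKET",
]

def missing_vars(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# AGENT_ENGINE_ID is only known after deployment, so it is not checked here.
if missing_vars():
    print("Missing configuration:", ", ".join(missing_vars()))
```

Running a check like this at app startup turns a confusing mid-request failure into an immediate, readable error.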

Terraform Variables

Edit infra/terraform.tfvars:

project_id = "your-project-id"
region = "us-central1"
agent_bucket_name = "your-unique-bucket-name"

🧪 Testing

Local Agent Testing

cd dataops/
adk run  # Interactive testing
adk web  # Web interface testing

Deployed Agent Testing

cd agent-deployment/
python test_deployment.py

Web Application Testing

# Local testing
streamlit run app/app.py

# Docker testing
./run_docker.sh

# Production testing
./run_production.sh

📚 Documentation

🤝 Contributing

We welcome contributions! Please see our Developer's Guide for detailed information on:

  • πŸ—οΈ Project architecture and components
  • πŸ’» Setting up the development environment
  • πŸ§ͺ Running tests and validation
  • πŸ“¦ Building and deploying changes
  • πŸ› Debugging and troubleshooting

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • 📖 Documentation: Check the Developer's Guide
  • 🐛 Issues: Report bugs via GitHub Issues
  • 💬 Discussions: Join GitHub Discussions for questions
  • 📧 Contact: Reach out to the development team

πŸ™ Acknowledgments

  • Google Cloud for the Agent Development Kit and BigQuery
  • GitHub for the public repositories dataset
  • Streamlit for the amazing web app framework
  • Open Source Community for the tools and libraries used

πŸ” Ready to explore GitHub repositories like never before? Get started with DataOps ADK Agent today!