subhayu99/README.md

Hi there, I'm Subhayu Kumar Bala! 👋

🚀 About Me

Data and Infrastructure Engineer with 3+ years of experience bridging traditional data engineering with modern AI systems. I specialize in building scalable architectures through event-driven design, containerization, and agentic LLM systems.

  • 🏆 Performance Champion: Reduced a 27.5-hour SQL procedure to 5 seconds using Python-based solutions for 80M+ rows
  • 🤖 AI/LLM Expert: Building agentic architectures with fine-tuned models achieving 95%+ accuracy
  • ☁️ Multi-Cloud Specialist: Experienced across AWS, Azure, and GCP ecosystems
  • 📊 Data Pipeline Architect: Designed Bronze-Silver-Gold data layers processing millions of records
  • 🔬 Published Researcher: Co-authored paper on quantum computing simulation systems
  • 📍 Currently based in Gurugram, India

💼 Professional Experience

🔹 Data Engineer @ FiftyFive Technologies (Jun 2022 - Present)

  • Data Engineering: Designed optimized pipelines using Python, PySpark, SQL, and Airflow
  • AI Integration: Built AI-powered applications with FastAPI, OpenAI, and LangChain
  • Performance: Achieved 99.9% runtime reduction (27.5 hours → 5 seconds) on 80M+ row processing
  • Impact: Improved campaign outcomes by 16% through real-time dashboards and APIs

🔹 Software Engineer Intern @ FiftyFive Technologies (Jan 2022 - May 2022)

  • Platform Development: Built cloud-native tools for 50k+ daily users
  • Migration: Migrated systems from PostgreSQL to MongoDB
  • DevOps: Implemented CI/CD pipelines with Azure DevOps

🛠️ Tech Stack & Expertise

Core Programming & Data Processing

Python, SQL, PySpark, Pandas, DuckDB

AI & LLM Engineering

OpenAI, LangChain, TensorFlow, Hugging Face

Cloud Platforms & Infrastructure

AWS, Azure, GCP, Terraform

Data Orchestration & Workflow

Apache Airflow, dbt, Apache Kafka

DevOps & Containerization

Docker, Kubernetes, Jenkins

Databases & Vector Stores

MongoDB, PostgreSQL, BigQuery, ChromaDB


🏆 Notable Projects & Achievements

🎯 Johnson & Johnson - Enterprise Data Pipeline Architecture

  • Built scalable Bronze-Silver-Gold data layers in Treasure Data
  • Engineered 12-scenario truth table for multichannel consent processing
  • Integrated data across CDP, S3, Treasure Data, and SFMC platforms

🚀 QxLab - SOTA Agent-Based LLM System

  • Built agent-based LLM system with 95%+ accuracy using fine-tuned Mistral 7B and Llama 13B
  • Developed advanced CLI tool processing 10B+ tokens in minutes
  • Deployed FastAPI on Docker with GPU-accelerated inference

⚑ CV Advisors - Performance Optimization Breakthrough

  • Reduced 27.5-hour SQL procedure to 5 seconds using Python/Pandas/DuckDB
  • Processed 80M+ rows for 150 clients on a single machine
  • Showcased Python's capability for high-performance data processing

🤖 Prospexs - AI-Powered Outreach Platform

  • Reduced manual prospecting effort by 60%
  • Improved client response rates by 45% with AI-generated communications
  • Integrated multiple APIs (OpenAI, Perplexity, LinkedIn) for profile validation

📚 Research & Publications

🔬 Published Research: "QuDiet: A Classical Simulation Platform for Qubit-Qudit Hybrid Quantum Systems" - IET Quantum Communication (2023)

ORCID


🎯 Current Focus Areas

  • 🧠 Agentic AI Systems: Building sophisticated multi-agent LLM architectures
  • ⚑ Performance Engineering: Optimizing data processing at massive scale
  • ☁️ Multi-Cloud Architecture: Designing platform-agnostic solutions
  • 🔄 Real-time Data Streaming: Event-driven architectures with Kafka
  • 🤖 LLM Fine-tuning: Custom model optimization for domain-specific tasks

🌐 Connect with Me

LinkedIn GitHub Email ORCID Phone



"Transforming complex business requirements into scalable technical reality, one optimized pipeline at a time." 🚀✨

Pinned

  1. datasetpipeline (Public)

    A data processing and analysis pipeline designed to handle various jobs related to data transformation, quality assessment, deduplication, and formatting.

    Python 1

  2. smart-commit (Public)

    An AI-powered git commit message generator with repository context awareness, built with Python and Typer.

    Python 1

  3. creatree (Public)

    A Python package and CLI tool for creating directory structures from a tree-like string.

    Python 1

  4. finadict (Public)

    A web app to predict financial prices

    Python 14 18

  5. BetterPassphrase (Public)

    A Python library to generate secure, meaningful passphrases.

    Python 1

  6. PyScripts (Public)

    A collection of great Python scripts, from basic to advanced levels, for automating monotonous tasks. This project is for those who have no idea of Open Source contribution but want to get i…

    Python 3 5