exp by Eamon2009 · Pull Request #58 · Eamon2009/Quadtrix.cpp

Eamon2009 · 2026-05-31T13:52:10Z

No description provided.

…ions

## Summary: Structural Changes and Technical Rationale

### 1. Fixed Broken Multi-Stage Conditional Branching

* **The Issue:** The original Dockerfile attempted to resolve the final stage dynamically using `FROM deps-${CUDA:-1}` or `FROM deps-${CUDA}`. Standard Docker builders evaluate `FROM` lines at the very beginning of parsing, before stage targets are explicitly created or named dynamically via evaluated shell variables. Furthermore, the conditional default fallback syntax (`:-1`) does not resolve cleanly within base target selectors.
* **The Solution:** Standardized stage resolution by implementing explicitly mapped, strict literal intermediate target stages (`deps-cpu` and `deps-cuda`). A unified `final` target is then cleanly resolved using an un-nested `ARG TARGET_ENV` selector.

### 2. Multi-Stage Build Layer Separations

* **The Issue:** The previous implementation left heavy development headers, tooling binaries (`build-essential`, `git`), and cached installer metadata embedded directly into the final execution layers.
* **The Solution:** Extracted all shared, system-level compilation utilities into isolated intermediate compiler stages. The final target images inherit exclusively from minimal runtime bases, stripping unnecessary build-essential tooling away from the final deliverable.

### 3. PyTorch Dependency Layer Pinning and Caching Controls

* **The Issue:** Bundling `requirements.txt` alongside massive framework downloads (`torch`, `torchvision`) within a single dense execution command limits layer caching capability. Modifying a minor backend requirement would discard the entire layer, forcing a complete download of PyTorch's multi-gigabyte files every build cycle.
* **The Solution:** Isolated the massive PyTorch setup operations into distinct cache-pinned execution sequences. This separates transient Python packages from heavy ML frameworks, minimizing build time and pipeline failures.

### 4. Consolidated Python Binary and Symlinking Alignment

* **The Issue:** Operating system discrepancies across `python:3.11-slim` (which uses a default global `python` and `pip` command) and `ubuntu22.04` (which requires explicit `python3.11` version tags and manual `python3-pip` mappings) caused path collisions, missing dependencies, and system-package management errors.
* **The Solution:** Uniformly aligned execution runtimes. The CUDA target environment now establishes automated symlinks mapping local references (`python` -> `python3.11` and `pip` -> `pip3`), ensuring standard execution profiles function uniformly across environments.

### 5. Automated Build Layer Cleanup

* **The Issue:** Minor storage leaks accumulated via system package manager footprints (`/var/lib/apt/lists/*`) and internal user pip caching structures (`~/.cache/pip`).
* **The Solution:** Implemented zero-cache directives (`--no-cache-dir`) on all pip installation pipelines and appended file cleanup hooks natively onto system installation scripts.


…g using AVX/SSE

- Added SIMD vectorization support (`__AVX__` and `__SSE__`) for element-wise `add`, `add_inplace`, and `scale` operations.
- Maintained scalar fallback paths for non-vectorized bounds and platforms lacking hardware extensions.
- Explicitly defined rule-of-five constructors (`default` and `noexcept` moves) within the `Tensor` struct layout.
- Optimized vector initialization across the core construct layer via `std::move` and `std::vector::reserve`.
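The SIMD-plus-scalar-fallback pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: the names `add_f32`, `scale_f32`, and `add_vec` are hypothetical, and the sketch assumes unaligned loads so that no special allocator is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__AVX__) || defined(__SSE__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the project's API): out[i] = a[i] + b[i].
inline void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    // AVX path: 8 floats per iteration, unaligned loads and stores.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#elif defined(__SSE__)
    // SSE path: 4 floats per iteration.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar path: handles the tail, and the whole range when no SIMD is available.
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Hypothetical helper: x[i] *= s, in place.
inline void scale_f32(float* x, float s, std::size_t n) {
    std::size_t i = 0;
#if defined(__AVX__)
    const __m256 vs = _mm256_set1_ps(s);
    for (; i + 8 <= n; i += 8) {
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
    }
#elif defined(__SSE__)
    const __m128 vs = _mm_set1_ps(s);
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(x + i, _mm_mul_ps(_mm_loadu_ps(x + i), vs));
    }
#endif
    for (; i < n; ++i) x[i] *= s;
}

// Convenience wrapper over std::vector; both inputs must have the same length.
inline std::vector<float> add_vec(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    add_f32(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

Because the scalar loop starts wherever the vector loop stopped, the same function covers lengths that are not a multiple of the vector width and builds where neither macro is defined.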

…ing evaluation

- Replaced the periodic block evaluation layout with standard, per-step logging metrics (`loss`, `ms`, and `tok/s`).
- Shifted initial validation loss calculation out of the iteration cycle to establish a zero-state baseline.
- Restructured token streaming so that generations are triggered conditionally inside the training loop post-evaluation windows.
- Streamlined architecture parameter reporting and consolidated command-line configuration visual prints.
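As a rough illustration of the per-step metrics named above, the sketch below derives `tok/s` from a token count and an elapsed time in milliseconds and formats one log line. The function names and the exact line format are assumptions made for this example; the PR does not show the project's actual output.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>

// Tokens processed per second, given the step duration in milliseconds.
// Returns 0 for a non-positive duration instead of dividing by zero.
inline double tokens_per_second(std::size_t tokens, double elapsed_ms) {
    if (elapsed_ms <= 0.0) return 0.0;
    return static_cast<double>(tokens) * 1000.0 / elapsed_ms;
}

// Hypothetical log-line layout: step, loss, ms, tok/s.
inline std::string format_step(int step, double loss, double elapsed_ms,
                               std::size_t tokens) {
    char buf[128];
    std::snprintf(buf, sizeof(buf), "step %d | loss %.4f | %.2f ms | %.0f tok/s",
                  step, loss, elapsed_ms, tokens_per_second(tokens, elapsed_ms));
    return std::string(buf);
}

// Measures the wall-clock duration of one call to f, in milliseconds.
template <typename F>
double time_ms(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In a training loop, `time_ms` would wrap the forward, backward, and optimizer calls for a single step, and the token count would typically be batch size times sequence length.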

…ions

- Add advanced memory footprint optimization using forward-activation recomputation for LayerNorm and GeLU.
- Optimize layer-wise activation buffer layout using a centralized `TensorSpec` registry to support large batch scaling.
- Integrate cuBLASLt matmul fusions, optional cuDNN attention layers, and stochastic rounding options.
- Fall back gracefully to `cudaMallocManaged` under heavy loads to prevent Outlier/OOM crashes.
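The idea behind forward-activation recomputation can be shown on the CPU with a toy layer. The sketch below is an assumption-laden illustration rather than the PR's CUDA implementation: `RecomputeLayer` is a hypothetical type computing `y = sum_i w[i] * gelu(x[i])`, and the tanh approximation of GeLU is assumed because the PR does not say which variant it uses. The forward pass keeps only the pre-activation input, and the backward pass recomputes `gelu(x[i])` instead of reading a cached activation buffer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// tanh approximation of GeLU (assumed variant).
inline float gelu(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Derivative of the tanh approximation above with respect to x.
inline float gelu_grad(float x) {
    const float k = 0.7978845608f;
    const float t = std::tanh(k * (x + 0.044715f * x * x * x));
    return 0.5f * (1.0f + t) +
           0.5f * x * (1.0f - t * t) * k * (1.0f + 3.0f * 0.044715f * x * x);
}

// Hypothetical toy layer: y = sum_i w[i] * gelu(x[i]).
struct RecomputeLayer {
    std::vector<float> w;
    std::vector<float> saved_x;  // only the pre-activation input is retained

    float forward(const std::vector<float>& x) {
        assert(x.size() == w.size());
        saved_x = x;  // gelu(x) itself is not stored
        float y = 0.0f;
        for (std::size_t i = 0; i < x.size(); ++i) y += w[i] * gelu(x[i]);
        return y;
    }

    // dy is the upstream gradient; dw and dx receive the parameter and input gradients.
    void backward(float dy, std::vector<float>& dw, std::vector<float>& dx) const {
        dw.assign(w.size(), 0.0f);
        dx.assign(w.size(), 0.0f);
        for (std::size_t i = 0; i < w.size(); ++i) {
            const float act = gelu(saved_x[i]);  // recomputed, trading compute for memory
            dw[i] = dy * act;
            dx[i] = dy * w[i] * gelu_grad(saved_x[i]);
        }
    }
};
```

The trade-off is one extra GeLU evaluation per element during the backward pass in exchange for not holding the activation tensor between the two passes.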

Eamon2009 and others added 30 commits May 29, 2026 11:35

feat(ci): optimize workflow pipeline and update docker configurations

de5c112

feat(ci): optimize workflow pipeline and update docker configurations

e91fff5

feat(ci): optimize workflow pipeline and update docker configurations

576c2b8

feat(ci): optimize workflow pipeline and update docker configurations

f1e4bb8

feat(ci): optimize workflow pipeline and update docker configurations

43d51c8

feat(ci): optimize workflow pipeline and update docker configurations

d96de27

feat(ci): optimize workflow pipeline and update docker configurations

c7d669b

feat(ci): optimize workflow pipeline and update docker configurations

a652ec8

refactor(ci): optimize workflow pipeline and update docker configurat…

ebd8e20

…ions

refactor : optimize workflow pipeline and update docker configurations

c2d78c8

refactor : optimize workflow pipeline and update docker configurations

ff173b4

refactor : optimize workflow pipeline and update docker configurations

8db6cd1

Added MIT LICENSE to this project Quadtrix.cpp

07288d9

Refactor Dockerfile to use ARG for CUDA version

d01af15

Refactor Dockerfile for backend dependencies

ed37774

refactor : Dockerfile.backend optimize workflow pipeline

068cdb7

refactor : Dockerfile.backend optimize workflow pipeline

07826e1

refactor : Dockerfile.backend optimize workflow pipeline

7f4d25a

refactor : Dockerfile.backend optimize workflow pipeline

dcd14e1

Delete .devops/Dockerfile.frontend

2139c1d

Delete .devops/Dockerfile.dev.frontend

74b46ec

refactor : Dockerfile.backend optimize workflow pipeline

0770909

refactor : Dockerfile.backend optimize workflow pipeline

f0ae40b

refactored (CI): consolidated manual Docker build jobs into a matrix …

9f909f4

…strategy to reduce duplication

refactored (CI): consolidated manual Docker build jobs into a matrix …

0ff70b2

…strategy to reduce duplication

refactor(ui): rewrite ThinkingIndicator to use inline styles and CSS …

31e960d

…keyframes

refactor : message bubble layout to use inline styles

31ef90d

refactor(ui): complete inline-style migration and update auto-scroll …

9b19a92

…implementation

refactor(ui): complete inline-style migration for MessageAvatar compo…

250b2fc

…nent

Eamon2009 added 5 commits May 30, 2026 20:54

refactor(ui): rewrite EmptyState component using pure inline styles

7e4270d

Update README.md with new banner for qudtrix.cpp

6519631

Eamon2009 self-assigned this May 31, 2026

Eamon2009 requested a review from codeaddict-119 May 31, 2026 13:54

Eamon2009 merged commit 4ebd73f into exp May 31, 2026
10 checks passed

Eamon2009 merged 35 commits into exp from master
