Abstract

Many MLOps issues stem from data-related problems. Unfortunately, the hype surrounding algorithms overshadows these problems, even though they cause data leakage and pipeline jungles that lead to model failure. I propose a streamlined, adaptive, data-centric ML pipeline for domains such as hardware verification, where schemas are absent, data types are inaccurate, and data drift is extreme (both shape and type change). In this pipeline, schemas are inferred from raw data and used for monitoring and preprocessing. During serving, schema mismatches are resolved, which increases robustness. The pipeline also makes “data tuning” (preprocessing optimization) easy, which improved model performance in real-world benchmark testing.
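
To make the schema idea concrete, here is a minimal sketch of inferring a schema from raw records and resolving mismatches at serving time. It is an illustration under stated assumptions, not the pipeline's actual implementation: the function names (`infer_type`, `infer_schema`, `reconcile`), the three-type lattice (`int`, `float`, `str`), the policy of filling unresolvable values with `None`, and the example column names are all hypothetical.

```python
# Hypothetical sketch: schema inference plus serving-time reconciliation.
# All names and policies here are assumptions made for illustration.

CASTERS = {"int": int, "float": float, "str": str}


def infer_type(values):
    """Return the narrowest of 'int', 'float', 'str' that fits all present values."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "str"  # no evidence, so fall back to the most permissive type
    for name in ("int", "float"):
        try:
            for v in present:
                CASTERS[name](str(v))  # go through str so 1.5 is not truncated to 1
            return name
        except ValueError:
            continue
    return "str"


def infer_schema(rows):
    """Infer {column: type name} from a list of raw dict records."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    return {col: infer_type([row.get(col) for row in rows]) for col in columns}


def reconcile(row, schema):
    """Coerce one serving-time record to the schema and report every mismatch."""
    report = {
        "missing": [c for c in schema if c not in row],
        "extra": [c for c in row if c not in schema],
        "uncastable": [],
    }
    clean = {}
    for col, type_name in schema.items():
        value = row.get(col)
        if value in (None, ""):
            clean[col] = None
            continue
        try:
            clean[col] = CASTERS[type_name](str(value))
        except ValueError:
            clean[col] = None  # type drift: keep the column, drop the bad value
            report["uncastable"].append(col)
    return clean, report


# Training-time records arrive as untyped strings.
train = [
    {"cycles": "120", "voltage": "1.1", "status": "pass"},
    {"cycles": "98", "voltage": "0.9", "status": "fail"},
]
schema = infer_schema(train)

# A serving-time record with shape drift (missing/extra columns) and type drift.
clean, report = reconcile({"cycles": "n/a", "voltage": "1", "seed": "7"}, schema)
```

The `report` dictionary doubles as a monitoring signal: a rising count of missing, extra, or uncastable columns is one simple way to surface shape and type drift, while `clean` always has the shape the model was trained on.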
