Abstract

Many MLOps issues stem from data-related problems. These problems cause data leakage and pipeline jungles and ultimately lead to model failure, yet they are often overshadowed by the hype surrounding algorithms. I propose a streamlined, adaptive, data-centric ML pipeline for domains such as hardware verification, where schemas are absent, declared data types are inaccurate, and data drift is extreme (changes in both data shape and data types). In this pipeline, schemas are inferred from the raw data and then used for monitoring and preprocessing. During serving, mismatches between incoming data and the inferred schema are resolved, which increases robustness. The pipeline also makes “data tuning” (optimizing preprocessing) straightforward, which improved model performance in real-world benchmark testing.
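The abstract does not specify how schemas are inferred or how serving-time mismatches are resolved. The following is only a minimal sketch under stated assumptions, not the author's pipeline: records are assumed to arrive as dicts of raw strings (for example, parsed from logs), each field's type is assumed to be the narrowest of `bool`, `int`, `float`, `str` that parses every observed value, and mismatches are assumed to be resolved by dropping unknown fields and nulling missing or unparseable values. `TYPE_ORDER`, `infer_schema`, and `resolve_record` are hypothetical names.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical type ladder, tried from most to least specific. A field's
# inferred type is the first one that can parse every observed non-empty value.
TYPE_ORDER = ["bool", "int", "float", "str"]


def _is_empty(value: Any) -> bool:
    """Treat None and blank strings as missing values."""
    return value is None or str(value).strip() == ""


def _can_parse(text: str, type_name: str) -> bool:
    """Return True if the raw text can be parsed as type_name."""
    if type_name == "bool":
        return text.strip().lower() in {"true", "false"}
    parsers = {"int": int, "float": float}
    if type_name in parsers:
        try:
            parsers[type_name](text)
            return True
        except ValueError:
            return False
    return True  # any value can be kept as a string


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Infer a {field: type_name} schema from raw records with no declared schema."""
    observed: Dict[str, List[str]] = {}
    for record in records:
        for key, value in record.items():
            values = observed.setdefault(key, [])
            if not _is_empty(value):
                values.append(str(value))
    schema: Dict[str, str] = {}
    for key, values in observed.items():
        if not values:
            schema[key] = "str"  # no evidence: fall back to the most general type
        else:
            schema[key] = next(
                t for t in TYPE_ORDER if all(_can_parse(v, t) for v in values)
            )
    return schema


def _coerce(value: Any, type_name: str) -> Any:
    """Convert a raw value to type_name, or return None if it cannot be parsed."""
    if _is_empty(value):
        return None
    text = str(value)
    if not _can_parse(text, type_name):
        return None
    if type_name == "bool":
        return text.strip().lower() == "true"
    if type_name == "int":
        return int(text)
    if type_name == "float":
        return float(text)
    return text


def resolve_record(
    record: Dict[str, Any], schema: Dict[str, str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Align a serving-time record with the inferred schema.

    Missing fields and unparseable values become None, unknown fields are
    dropped, and every mismatch is also reported so it can be monitored.
    """
    resolved: Dict[str, Any] = {}
    issues: List[str] = []
    for key, type_name in schema.items():
        if key not in record:
            issues.append(f"missing field: {key}")
            resolved[key] = None
            continue
        value = _coerce(record[key], type_name)
        if value is None and not _is_empty(record[key]):
            issues.append(f"type mismatch: {key}={record[key]!r} is not {type_name}")
        resolved[key] = value
    for key in record:
        if key not in schema:
            issues.append(f"unexpected field: {key}")
    return resolved, issues
```

For example, training records whose `cycles` values are `"120"` and `"98"` yield an `int` field; at serving time, a value such as `"n/a"` in a `float` field is set to `None` and reported rather than breaking the model's input path. Nulling and reporting is just one possible resolution policy; a real pipeline might instead impute, reject the record, or re-infer the schema when drift persists.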