The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Updated Dec 19, 2024 - Python
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Refine high-quality datasets and visual AI models
A light-weight, flexible, and expressive statistical data testing library
Jupyter notebook and datasets from the pandas video series
General Assembly's 2015 Data Science course in Washington, DC
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Simple tools for data cleaning in R
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Prepping tables for machine learning
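An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue large language model. (General-purpose base model, GPU deployment, data cleaning.) With tribute to: LLaMA, MOSS, BELLE, Ziya, vLLM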
Schema-Inspector is a simple JavaScript object sanitization and validation module.
Easy-to-use Python library of customized functions for cleaning and analyzing data.
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets you run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
🗣️ A book and repo to get you started programming voice computing applications in Python (10 chapters and 200+ scripts).
Data Science Feature Engineering and Selection Tutorials
Exploratory data analysis 📊 using Python 🐍 of a used car 🚘 database taken from Kaggle