An open-source compound AI toolchain for fast and accurate entity matching, powered by LLMs.
Updated Jul 3, 2024 - Python
Entity resolution (also known as data matching, data linkage, record linkage, and many other terms) is the task of finding records in a dataset that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). It is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record shape, storage location, or curator style or preference.
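As a rough illustration of the idea, the sketch below links records from two small sources that share no common identifier by comparing normalized field values. It uses only Python's standard library (difflib.SequenceMatcher); the record layout, the field names (name, city), the helper names, and the 0.85 threshold are all assumptions made for this example and are not taken from any of the projects listed here.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] computed on the normalized values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Return (left_id, right_id, score) for pairs whose mean field similarity
    reaches the threshold. The threshold is an illustrative assumption."""
    matches = []
    for l in left:
        for r in right:
            # Average the per-field similarities (hypothetical fields: name, city).
            score = (similarity(l["name"], r["name"]) + similarity(l["city"], r["city"])) / 2
            if score >= threshold:
                matches.append((l["id"], r["id"], score))
    return matches


# Two hypothetical sources with different keys and slightly different spellings.
crm = [
    {"id": "c1", "name": "Jon Smith", "city": "New York"},
    {"id": "c2", "name": "Maria Garcia", "city": "Austin"},
]
billing = [
    {"id": "b7", "name": "John Smith", "city": "new york"},
    {"id": "b9", "name": "Mario Rossi", "city": "Boston"},
]

matches = match_records(crm, billing)
for left_id, right_id, score in matches:
    print(left_id, right_id, f"{score:.2f}")
```

Comparing every pair is quadratic in the number of records, so this only suits tiny inputs; larger data sets typically need a candidate-reduction (blocking) step and a more principled scoring model.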
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.
Emulates the methods the US Census Bureau uses to link people across multiple data sources, using open-source software (Splink) and simulated data (from pseudopeople).
An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.
Fuzzy string matching in R. Inspired by Python's thefuzz (but without the Python).
🔎 Finds fuzzy matches between datasets
🔎 Finds fuzzy matches between CSV files
This project aims to provide users with lists containing only great movies, based on only a few filters and search parameters.
Link Wikidata items to large catalogs
Resources for tackling record linkage / deduplication / data matching problems
A powerful and modular toolkit for record linkage and duplicate detection in Python
A list of free data matching and record linkage software.
Weka Comparator to match rules to test data with filtering abilities
ProxCluster is a framework for Incremental Entity Resolution that leverages concepts similar to K-Means for clustering duplicates. This work was developed as the final paper for my bachelor's degree in Computer Science.
A browser user interface for manual labeling of record pairs.
AdapterEM: Pre-trained Language Model Adaptation for Generalized Entity Matching using Adapter-tuning
An extension for ASReview Lab to preprocess the dataset before importing it into ASReview
A collection of awesome resources regarding Record Linkage.
Welcome to Snowman App – a Data Matching Benchmark Platform.
Created by Halbert L. Dunn
Released 1946