Releases · mitmedialab/sherlock-project

22 Feb 10:57

madelonhulsebos

v1.0.0

5b3ac69

Feature extraction speedup, bugfixes and model code.

Latest

This release provides:

a significant speedup and memory reduction of the feature extraction phase,
bugfixes in the feature extraction pipeline,
the code of the original model architecture (TensorFlow Keras),
alignment of the SherlockModel class with the scikit-learn API (i.e. with fit, predict, and predict_proba methods),
improved notebooks demonstrating 1) full reproduction of the feature extraction and model training/evaluation pipelines, 2) out-of-the-box usage of the Sherlock model for a given table, and 3) how performance can be improved with additional classifiers.
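
To illustrate what alignment with the scikit-learn API means in practice, here is a minimal sketch of a classifier exposing the fit, predict, and predict_proba methods named above. The class MajorityTokenClassifier, its token-counting logic, and the example data are hypothetical stand-ins for illustration only; this is not the SherlockModel implementation, and the real class's constructor arguments and input types may differ.

```python
from collections import Counter, defaultdict


class MajorityTokenClassifier:
    """Hypothetical toy classifier, NOT the SherlockModel implementation.

    It only demonstrates the scikit-learn method convention:
    fit(X, y) returns self, predict_proba(X) returns one probability
    row per sample, and predict(X) returns one label per sample.
    """

    def fit(self, X, y):
        # X: list of columns (each a list of strings); y: one label per column.
        self.token_counts_ = defaultdict(Counter)
        for column, label in zip(X, y):
            for value in column:
                self.token_counts_[label][value.lower()] += 1
        self.classes_ = sorted(self.token_counts_)
        return self

    def predict_proba(self, X):
        probabilities = []
        for column in X:
            # Add-one smoothing: a column with no known values gets a
            # uniform distribution over the classes.
            scores = [
                1 + sum(self.token_counts_[label][value.lower()] for value in column)
                for label in self.classes_
            ]
            total = sum(scores)
            probabilities.append([score / total for score in scores])
        return probabilities

    def predict(self, X):
        # Pick the class with the highest probability for each column.
        predictions = []
        for row in self.predict_proba(X):
            best_index = max(range(len(row)), key=row.__getitem__)
            predictions.append(self.classes_[best_index])
        return predictions


# Usage with made-up training columns and labels.
model = MajorityTokenClassifier().fit(
    [["Paris", "Berlin"], ["red", "blue"]],
    ["city", "color"],
)
predicted_labels = model.predict([["berlin", "Rome"]])
```

Because the three methods follow the same call pattern as scikit-learn estimators, code written against that convention (for example, a loop that calls predict_proba on several models and averages the results) can treat such a class like any other classifier.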

Contributions by:
@lowecg
@madelonhulsebos


09 Feb 11:57

madelonhulsebos

v0.1.0

6254a62

Original code

Pre-release

This release reflects the code that was used for the experiments in the paper "Sherlock: a deep learning approach to semantic data type detection" (link to the paper on arXiv). This release provides code for:

Download of the original train and test data used for the experiment results as reported in the paper.
Feature extraction to numerically represent new columns.
Evaluating a trained Sherlock model on unseen table columns.
Retraining the original Sherlock model.
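
As a rough illustration of what "numerically representing" a column can look like, the sketch below maps a list of raw string values to a small, fixed set of numeric features. The function extract_column_features, the helper _is_number, and the chosen features are assumptions made for this example; they are not the project's actual feature set or API.

```python
import statistics


def _is_number(value):
    """Return True if the string parses as a Python float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def extract_column_features(values):
    """Map a column of raw strings to a fixed set of numeric features.

    Hypothetical, simplified features for illustration only.
    """
    n_values = len(values)
    if n_values == 0:
        # An empty column still yields the same keys, so every column
        # maps to a vector of the same length.
        return {
            "n_values": 0,
            "frac_unique": 0.0,
            "frac_numeric": 0.0,
            "mean_length": 0.0,
            "std_length": 0.0,
        }
    lengths = [len(value) for value in values]
    return {
        "n_values": n_values,
        "frac_unique": len(set(values)) / n_values,
        "frac_numeric": sum(_is_number(v) for v in values) / n_values,
        "mean_length": statistics.fmean(lengths),
        "std_length": statistics.pstdev(lengths),
    }


# Usage with a made-up column of cell values.
features = extract_column_features(["12", "7", "abc", "7"])
```

Every column, regardless of its length, produces the same keys, which is what allows a downstream classifier to consume columns from arbitrary tables.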

This release contains inefficiencies and bugs, so the latest release of this project is recommended for production settings and new research projects. More about this project can be found on this website.
