This is the scikit-learn client library for ModelDB. It stores scikit-learn machine learning operations, such as LogisticRegression().fit(x_train, y_train), in ModelDB. Browse the modeldb folder to explore the library.
First, make sure you have followed the setup instructions for ModelDB and have built the client.
Next, put the python client on your PYTHONPATH:
export PYTHONPATH=[path_to_modeldb_dir]/client/python:$PYTHONPATH
You can also put it on your PYTHONPATH permanently by adding the above line to your ~/.bashrc. Afterwards, run:
source ~/.bashrc
Alternatively, put the following lines at the beginning of your Python files:
import sys
sys.path.append("[path_to_modeldb]/client/python")
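If the same lines may run more than once (for example in a notebook), a small guard keeps duplicate entries off sys.path. The following is a minimal sketch: the checkout location ~/modeldb and the names MODELDB_DIR, CLIENT_PATH and add_client_to_path are placeholders chosen for illustration, not part of ModelDB.

```python
import os
import sys

# Hypothetical location of the ModelDB checkout; replace with your own path.
MODELDB_DIR = os.path.expanduser("~/modeldb")
CLIENT_PATH = os.path.join(MODELDB_DIR, "client", "python")


def add_client_to_path(path=CLIENT_PATH):
    """Append the client directory to sys.path once.

    Returns True if the directory exists on disk, False otherwise.
    """
    if path not in sys.path:
        sys.path.append(path)
    return os.path.isdir(path)


if not add_client_to_path():
    print("Warning: %s does not exist; check the ModelDB path." % CLIENT_PATH)
```

Calling add_client_to_path() again is harmless, since the path is only appended when it is missing.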
This guide assumes that you already have an ML workflow that you want to instrument with ModelDB, and highlights only the ModelDB-specific steps. Aside from importing modules and initializing the syncer, most of the integration consists of appending _sync to the scikit-learn function names.
from modeldb.sklearn_native import *
from modeldb.sklearn_native.ModelDbSyncer import *
ModelDBSyncer is the object that logs models and operations to the ModelDB backend. You can initialize the Syncer with your own configuration as shown below. See the ModelDBSyncer module in the library for more details on the Syncer object and the different ways to initialize it.
# initialize syncer explicitly
syncer_obj = Syncer(
    NewOrExistingProject("proj_name", "username", "proj_description"),
    NewOrExistingExperiment("exp_name", "exp_desc"),
    NewExperimentRun("simple sample test"))
Next, when you want to log an operation to ModelDB, use the ModelDB sync variant of the function by appending _sync to the method name. The original scikit-learn fit calls become fit_sync, save calls become save_sync, and so on.
x_train, x_test, y_train, y_test = cross_validation.train_test_split_sync(df, target, test_size=0.3)
lr = LogisticRegression()
lr.fit_sync(x_train, y_train) # instead of the usual lr.fit(x_train, y_train)
y_pred = lr.predict_sync(x_test)
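To make the _sync naming convention concrete, the sketch below shows one way such variants could be attached to a class: wrap the original method, call it, and record that the call happened. This is an illustration of the general pattern only and is not ModelDB's implementation; add_sync_variant, EVENT_LOG and ToyModel are hypothetical names, and the toy model stands in for a scikit-learn estimator so the example runs without any dependencies.

```python
import functools

# Stand-in for a logging backend; not part of ModelDB.
EVENT_LOG = []


def add_sync_variant(cls, method_name):
    """Attach `<method_name>_sync` to cls.

    The new method calls the original one, then records the call.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def synced(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        EVENT_LOG.append((type(self).__name__, method_name))
        return result

    setattr(cls, method_name + "_sync", synced)


class ToyModel(object):
    """Toy estimator that always predicts the mean of the training targets."""

    def fit(self, x, y):
        self.mean_ = sum(y) / float(len(y))
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


add_sync_variant(ToyModel, "fit")
add_sync_variant(ToyModel, "predict")

model = ToyModel().fit_sync([[0], [1]], [1.0, 3.0])
predictions = model.predict_sync([[2]])
```

The original fit and predict methods remain available, so code that does not need logging is unaffected.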
To compute metrics, use the ModelDB metrics class SyncableMetrics.
SyncableMetrics.compute_metrics(model, scoring_function, labels, predictions, dataframe, predictionCol, labelCol)
syncer_obj.sync()
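The arguments to compute_metrics can be easier to picture with a dependency-free example. The sketch below assumes that the scoring function follows the scikit-learn convention of taking the true labels first and the predictions second. That is an assumption about how ModelDB calls it, not something confirmed here. compute_metric_sketch is a hypothetical stand-in that does not contact any backend; it only shows what kind of record such a call could produce.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    matches = sum(1 for label, pred in zip(labels, predictions) if label == pred)
    return matches / float(len(labels))


def compute_metric_sketch(scoring_function, labels, predictions,
                          prediction_col, label_col):
    """Hypothetical stand-in: score the predictions and return a record.

    The record mirrors the information passed to the call; it is not
    ModelDB's storage format.
    """
    return {
        "metric": scoring_function.__name__,
        "value": scoring_function(labels, predictions),
        "prediction_col": prediction_col,
        "label_col": label_col,
    }


record = compute_metric_sketch(
    accuracy, [1, 0, 1, 1], [1, 0, 0, 1], "predictions", "target")
```

Any function with the same (labels, predictions) signature could be passed in place of accuracy.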
The full code for this example is in samples/sklearn/SimpleSampleWithModelDB.py. You can also compare it with the original workflow written without ModelDB. Run the sample model and all the model information will be stored in ModelDB.
python samples/sklearn/SimpleSampleWithModelDB.py
You should now be able to view the model in ModelDB.
The samples/sklearn folder contains the scikit-learn examples. These include common models with the ModelDB workflow incorporated into them. You may need to install any missing external Python modules used in the samples in order to run them.
Try running a sample, for example:
python samples/sklearn/LabelEncoding.py
Try running the unit tests with:
python -m unittest discover modeldb/tests/sklearn/
Note: the unit tests have been run with scikit-learn version 0.17.
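Test discovery can also be driven from Python, which is convenient when you want the result object rather than just console output. The sketch below uses the standard library's unittest discovery; the function name run_sklearn_tests is hypothetical, and the start directory is assumed to be relative to the directory you run from.

```python
import os
import unittest


def run_sklearn_tests(start_dir=os.path.join("modeldb", "tests", "sklearn")):
    """Discover and run the tests under start_dir.

    Returns the unittest result object, or None if the directory is missing.
    """
    if not os.path.isdir(start_dir):
        print("Test directory not found: %s" % start_dir)
        return None
    suite = unittest.defaultTestLoader.discover(start_dir)
    return unittest.TextTestRunner(verbosity=2).run(suite)


result = run_sklearn_tests()
if result is not None:
    print("All tests passed" if result.wasSuccessful() else "Some tests failed")
```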