using SHAP with petastorm dataset #776

Open
sdaza opened this issue Sep 9, 2022 · 1 comment

sdaza commented Sep 9, 2022

Hello,
I am trying to compute SHAP values for a neural network model (Keras) that is trained on a petastorm dataset (specifically, one created with make_tf_dataset from a Spark DataFrame).

Here is an example:

import shap
import tensorflow as tf


def train_and_evaluate():
    # train_converter / test_converter are petastorm converters built from Spark DataFrames;
    # model, all_features, target, batch_size, workers_count and max_epochs are defined elsewhere.
    with train_converter.make_tf_dataset(transform_spec=transform_train, batch_size=batch_size, workers_count=workers_count) as train_dataset, \
         test_converter.make_tf_dataset(transform_spec=transform_test, batch_size=batch_size, workers_count=workers_count) as test_dataset:

        # Map each namedtuple batch to the (features, label) structure Keras expects.
        train_dataset = train_dataset.map(lambda x: (tuple(getattr(x, col) for col in all_features), getattr(x, target)))
        test_dataset = test_dataset.map(lambda x: (tuple(getattr(x, col) for col in all_features), getattr(x, target)))

        steps_per_epoch = len(train_converter) // batch_size
        validation_steps = len(test_converter) // batch_size

        print(f"steps_per_epoch: {steps_per_epoch}, validation_steps: {validation_steps}")

        callbacks_list = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)]

        history = model.fit(train_dataset,
                            steps_per_epoch=steps_per_epoch,
                            epochs=max_epochs,
                            shuffle=True,
                            validation_data=test_dataset,
                            validation_steps=validation_steps,
                            callbacks=callbacks_list,
                            verbose=2)

        # This is where it fails: the tf.data datasets are passed to SHAP directly.
        explainer = shap.DeepExplainer(model, train_dataset)
        shap_values = explainer.shap_values(test_dataset)

    return {'history': history, 'shap_values': shap_values}

The error I get is AttributeError: 'DatasetV1Adapter' object has no attribute 'shape', probably because SHAP doesn't accept the tf.data dataset format: for framework == 'tensorflow', DeepExplainer expects [numpy.array] or [pandas.DataFrame].
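
For reference, this is roughly the input format DeepExplainer does accept (a minimal sketch, assuming TF 2.x eager execution so a single batch can be materialized from the datasets inside the with block; the array shapes may still need adjusting to match the model's inputs):

import numpy as np

# Materialize one batch from each dataset and convert the feature tensors to NumPy arrays.
background_features, _ = next(iter(train_dataset))          # tuple of per-feature tensors
background = [np.asarray(f) for f in background_features]   # list of arrays, one per model input

sample_features, _ = next(iter(test_dataset))
samples = [np.asarray(f) for f in sample_features]

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(samples)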

Any suggestions or ideas on how to deal with this? Thanks!

selitvin (Collaborator) commented

Would be happy to look into this if you can provide a runnable snippet to reproduce the issue.
