using SHAP with petastorm dataset #776

Open
sdaza opened this issue Sep 9, 2022 · 1 comment

sdaza commented Sep 9, 2022

Hello,
I am trying to compute SHAP values for a neural network model (Keras) that is trained on a petastorm dataset (specifically, one created with make_tf_dataset from a Spark DataFrame).

Here is an example:

import shap
import tensorflow as tf


def train_and_evaluate():
    # train_converter / test_converter are petastorm converters built from Spark DataFrames;
    # model, all_features, target, batch_size, workers_count and max_epochs are defined elsewhere.
    with train_converter.make_tf_dataset(transform_spec=transform_train, batch_size=batch_size, workers_count=workers_count) as train_dataset, \
         test_converter.make_tf_dataset(transform_spec=transform_test, batch_size=batch_size, workers_count=workers_count) as test_dataset:

        # Map each namedtuple batch to the (features, label) structure Keras expects.
        train_dataset = train_dataset.map(lambda x: (tuple(getattr(x, col) for col in all_features), getattr(x, target)))
        test_dataset = test_dataset.map(lambda x: (tuple(getattr(x, col) for col in all_features), getattr(x, target)))

        steps_per_epoch = len(train_converter) // batch_size
        validation_steps = len(test_converter) // batch_size

        print(f"steps_per_epoch: {steps_per_epoch}, validation_steps: {validation_steps}")

        callbacks_list = [tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)]

        history = model.fit(train_dataset,
                            steps_per_epoch=steps_per_epoch,
                            epochs=max_epochs,
                            shuffle=True,
                            validation_data=test_dataset,
                            validation_steps=validation_steps,
                            callbacks=callbacks_list,
                            verbose=2)

        # This is where it fails: the tf.data datasets are passed to SHAP directly.
        explainer = shap.DeepExplainer(model, train_dataset)
        shap_values = explainer.shap_values(test_dataset)

    return {'history': history, 'shap_values': shap_values}

The error I get is AttributeError: 'DatasetV1Adapter' object has no attribute 'shape', probably because SHAP doesn't accept the tf.data dataset format: for framework == 'tensorflow', DeepExplainer expects [numpy.array] or [pandas.DataFrame].
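
For reference, this is roughly the input format DeepExplainer does accept (a minimal sketch, assuming TF 2.x eager execution so a single batch can be materialized from the datasets inside the with block; the array shapes may still need adjusting to match the model's inputs):

import numpy as np

# Materialize one batch from each dataset and convert the feature tensors to NumPy arrays.
background_features, _ = next(iter(train_dataset))          # tuple of per-feature tensors
background = [np.asarray(f) for f in background_features]   # list of arrays, one per model input

sample_features, _ = next(iter(test_dataset))
samples = [np.asarray(f) for f in sample_features]

explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(samples)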

Any suggestions or ideas on how to deal with this? Thanks!

selitvin (Collaborator) commented

Would be happy to look into this if you can provide a runnable snippet to reproduce the issue.
