tabular_trees.calculate_shapley_values

tabular_trees.calculate_shapley_values(tabular_trees, row)

Calculate Shapley values from a TabularTrees model for a single row of data.

This is Algorithm 1 as presented in https://arxiv.org/pdf/1802.03888.pdf.

Parameters:
  • tabular_trees (TabularTrees) – Model (multiple trees) in tabular structure, for example the object returned by to_tabular_trees() in the Examples below.

  • row (pd.Series) – Single row of data to explain the prediction for. It is the user's responsibility to pass only the relevant columns in row (i.e. the columns used by the model). Extra columns increase the number of runs exponentially, even if they are not relevant to the model.
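
As a rough illustration of why extra columns matter, the sketch below counts the feature subsets implied by a row before and after trimming it to the model's columns. The row values and the `model_features` list are hypothetical, not something the library provides; in practice the list would come from wherever the model's feature names are known (for instance the `feature_names` passed to `xgb.DMatrix` in the Examples).

```python
import pandas as pd

# Hypothetical row carrying two columns the model was never trained on.
row = pd.Series({"age": 0.04, "sex": 0.05, "bmi": 0.06, "id": 1.0, "notes": 0.0})

# Assumed list of the feature names the model actually uses.
model_features = ["age", "sex", "bmi"]

# The number of feature subsets doubles with every column in the row.
subsets_with_extras = 2 ** len(row)

# Keep only the relevant columns before asking for an explanation.
trimmed_row = row[model_features]
subsets_trimmed = 2 ** len(trimmed_row)

print(subsets_with_extras, subsets_trimmed)
```

Here the two unused columns multiply the number of subsets by four.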

Returns:

results – Shapley values for prediction from model on the input row.

Return type:

ShapleyValues

Notes

This algorithm has O(TL2^M) complexity, where T is the number of trees, L is the maximum number of leaves in any tree and M is the number of features. This implementation is not intended to be efficient; it is intended to illustrate the algorithm. Beware of using it on models or datasets (specifically, numbers of columns) of any significant size.
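
To show where the 2^M term comes from, here is a minimal sketch of the brute-force approach on one hand-built tree. It is not this library's implementation: the `TREE` dictionary, its field names and the helpers `exp_value` and `shapley_values` are all hypothetical. `exp_value` follows the idea of Algorithm 1 in the paper: it follows the split when the feature is in the subset, and otherwise averages both children weighted by their cover. `shapley_values` then loops over every feature subset.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy tree: internal nodes have a split, leaves have a value.
# "cover" is the number of training rows that reached the node.
TREE = {
    0: {"feature": "a", "threshold": 0.5, "left": 1, "right": 2, "cover": 10},
    1: {"feature": "b", "threshold": 0.5, "left": 3, "right": 4, "cover": 6},
    2: {"value": 40.0, "cover": 4},
    3: {"value": 10.0, "cover": 3},
    4: {"value": 20.0, "cover": 3},
}


def exp_value(x, subset, tree, node=0, weight=1.0):
    """Estimate E[f(x) | x_S] for one tree, where S is `subset`."""
    n = tree[node]
    if "value" in n:  # leaf node
        return weight * n["value"]
    left, right = n["left"], n["right"]
    if n["feature"] in subset:
        # Feature is known: follow the branch that x takes.
        child = left if x[n["feature"]] <= n["threshold"] else right
        return exp_value(x, subset, tree, child, weight)
    # Feature is unknown: average both branches, weighted by cover.
    return exp_value(
        x, subset, tree, left, weight * tree[left]["cover"] / n["cover"]
    ) + exp_value(
        x, subset, tree, right, weight * tree[right]["cover"] / n["cover"]
    )


def shapley_values(x, features, tree):
    """Brute-force Shapley values: one pass over every subset per feature."""
    m = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(m):
            w = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                with_i = exp_value(x, set(s) | {i}, tree)
                without_i = exp_value(x, set(s), tree)
                total += w * (with_i - without_i)
        phi[i] = total
    return phi


x = {"a": 0.2, "b": 0.9}
phi = shapley_values(x, ["a", "b"], TREE)
base = exp_value(x, set(), TREE)            # expected value with no features known
prediction = exp_value(x, {"a", "b"}, TREE)  # value with all features known
print(phi, base, prediction)
```

The Shapley values plus the base value add up to the prediction for the row. The inner loops visit every subset of the other features, so each additional feature doubles the work per tree.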

Examples

>>> import xgboost as xgb
>>> import pandas as pd
>>> from sklearn.datasets import load_diabetes
>>> from tabular_trees import export_tree_data
>>> from tabular_trees import calculate_shapley_values
>>> # get data in DMatrix
>>> diabetes = load_diabetes()
>>> data = xgb.DMatrix(
...     diabetes["data"][:,:3],
...     label=diabetes["target"],
...     feature_names=diabetes["feature_names"][:3]
... )
>>> # build model
>>> params = {"max_depth": 3, "verbosity": 0}
>>> model = xgb.train(params, dtrain=data, num_boost_round=10)
>>> # export to TabularTrees
>>> xgboost_tabular_trees = export_tree_data(model)
>>> tabular_trees = xgboost_tabular_trees.to_tabular_trees()
>>> # get data to score
>>> scoring_data = pd.DataFrame(
...     diabetes["data"][:,:3],
...     columns=diabetes["feature_names"][:3]
... )
>>> row_to_score = scoring_data.iloc[0]
>>> # calculate shapley values
>>> results = calculate_shapley_values(tabular_trees, row=row_to_score)
>>> type(results)
<class 'tabular_trees.explain.shapley_values.ShapleyValues'>