tabular_trees.decompose_prediction

tabular_trees.decompose_prediction(tabular_trees, row)

Decompose a prediction from a tree-based model using the Saabas method [1].

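This method attributes the change in prediction that occurs when a row moves from a node to one of its child nodes to the feature that node was split on. These per-split changes are summed over every split on the row's path through a tree, and then over all trees in the model, to give each feature's total contribution.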
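The attribution described above can be sketched on a hand-built toy tree. This is only an illustration of the idea, not the library's implementation: the dictionary layout, the helper names `saabas_contributions` and `decompose_ensemble`, the "less than goes left" split rule, and the assumption that every node stores its own prediction value are all hypothetical.

```python
# Hypothetical toy tree (NOT the TabularTrees structure). Each node stores a
# prediction value; internal nodes also store the split feature, the
# threshold and the ids of their children.
TOY_TREE = {
    0: {"feature": "bmi", "threshold": 0.0, "left": 1, "right": 2, "value": 150.0},
    1: {"feature": "age", "threshold": 0.0, "left": 3, "right": 4, "value": 100.0},
    2: {"feature": None, "value": 200.0},
    3: {"feature": None, "value": 80.0},
    4: {"feature": None, "value": 120.0},
}


def saabas_contributions(tree, row):
    """Walk one tree and credit each change in node value to the split feature."""
    node = tree[0]
    # The root value acts as the baseline before any split is applied.
    contributions = {"base": node["value"]}
    while node["feature"] is not None:
        feature = node["feature"]
        # Assumed split rule: values below the threshold go to the left child.
        goes_left = row[feature] < node["threshold"]
        child = tree[node["left"] if goes_left else node["right"]]
        change = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + change
        node = child
    return contributions


def decompose_ensemble(trees, row):
    """Sum the per-tree contributions over all trees in a (toy) model."""
    total = {}
    for tree in trees:
        for name, value in saabas_contributions(tree, row).items():
            total[name] = total.get(name, 0.0) + value
    return total


row = {"bmi": -0.5, "age": 0.3}
single_tree = saabas_contributions(TOY_TREE, row)
two_trees = decompose_ensemble([TOY_TREE, TOY_TREE], row)
```

Here the row goes left at the root (150 to 100, so -50 is attributed to `bmi`) and right at the next split (100 to 120, so +20 is attributed to `age`). The baseline plus the contributions equals the value of the leaf the row lands in, which is the property that makes the decomposition additive.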

Parameters:

tabular_trees (TabularTrees) – Tree-based model whose prediction is to be explained.
row (pd.DataFrame) – Single-row DataFrame containing the observation whose prediction from the tabular_trees object is to be explained.

Returns:

results – Prediction decomposed into the change attributed to each feature.

Return type:

PredictionDecomposition

Notes

[1] Saabas, Ando (2014) ‘Interpreting random forests’, Diving into data blog, 19 October. Available at http://blog.datadive.net/interpreting-random-forests/ (Accessed 26 February 2023).

Examples

>>> import xgboost as xgb
>>> import pandas as pd
>>> from sklearn.datasets import load_diabetes
>>> from tabular_trees import export_tree_data
>>> from tabular_trees import decompose_prediction
>>> # get data in DMatrix
>>> diabetes = load_diabetes()
>>> data = xgb.DMatrix(
...     diabetes["data"],
...     label=diabetes["target"],
...     feature_names=diabetes["feature_names"]
... )
>>> # build model
>>> params = {"max_depth": 3, "verbosity": 0}
>>> model = xgb.train(params, dtrain=data, num_boost_round=10)
>>> # export to TabularTrees
>>> xgboost_tabular_trees = export_tree_data(model)
>>> tabular_trees = xgboost_tabular_trees.to_tabular_trees()
>>> # get data to score
>>> scoring_data = pd.DataFrame(diabetes["data"], columns=diabetes["feature_names"])
>>> row_to_score = scoring_data.iloc[[0]]
>>> # decompose prediction
>>> results = decompose_prediction(tabular_trees, row=row_to_score)
>>> type(results)
<class 'tabular_trees.explain.prediction_decomposition.PredictionDecomposition'>