Quick Start

Welcome to the quick start guide for tabular_trees.

This package contains various bits of functionality for working with trees from common ML packages.

Installation

The easiest way to get tabular_trees is to install directly with pip;

pip install tabular_trees

Editable LightGBM Booster

Warning

The EditableBooster class is experimental and has not been tested with all options available within LightGBM.

The EditableBooster class provides an object that can be converted to and from a lgb.Booster. The EditableBooster can be modified to change the lgb.Booster, for example defining specific trees.

First build a lgb.Booster model:

import numpy as np
import lightgbm as lgb

data = np.random.rand(500, 10)
label = np.random.randint(2, size=500)
train_data = lgb.Dataset(data, label=label)
param = {'num_leaves': 31, 'objective': 'binary'}
num_round = 10
bst = lgb.train(param, train_data, num_round)

Then convert to EditableBooster:

from tabular_trees import EditableBooster

editable_booster = EditableBooster.from_booster(bst)

Add an extra tree to the EditableBooster ensemble:

from tabular_trees import BoosterTree

extra_tree = BoosterTree(
    tree=0,
    num_leaves=2,
    num_cat=0,
    split_feature=[0],
    split_gain=[0],
    threshold=[0],
    decision_type=[2],
    left_child=[-1],
    right_child=[-2],
    leaf_value=[1, 2],
    leaf_weight=[1, 1],
    leaf_count=[1, 1],
    internal_value=[0],
    internal_weight=[1],
    internal_count=[1],
    is_linear=0,
    shrinkage=1,
)

editable_booster.trees.append(extra_tree)

extra_tree_size = len(extra_tree.get_booster_sting()) + 1
editable_booster.header.tree_sizes.append(extra_tree_size)

This example adds a simple tree structure with only a single split on the first feature.

Convert back to lgb.Booster object:

new_booster = editable_booster.to_booster()

Now that we have a lgb.Booster object we can make predictions with the modified model.

Tabular Tree Data

Tree based models (specifically GBMs) from xgboost, lightgbm or scikit-learn can be exported to tabular data objects for further analysis.

The following models are supported:

Prediction Explanation

The decompose_prediction and calculate_shapley_values functions can be used to explain each feature’s contribution to a single prediction.