softdtree

softdtree is a Python library that implements a classifier and a regressor based on the Soft Decision Tree model.
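
To give a feel for the model, the sketch below follows the formulation in Irsoy et al. (listed under References): each internal node computes a sigmoid gate over a linear function of the input and blends the responses of its two children, and each leaf returns a constant. It is an illustration only, not softdtree's implementation; the dict layout and the names sigmoid and soft_tree_response are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic function used as the soft gate."""
    return 1.0 / (1.0 + math.exp(-a))


def soft_tree_response(node, x):
    """Return the response of a soft decision tree for one sample x.

    A leaf is a dict {"value": z}. An internal node is a dict with
    "weights", "bias", "left", and "right" (hypothetical layout).
    """
    if "value" in node:
        return node["value"]
    # Gate: probability-like weight given to the left child.
    a = node["bias"] + sum(w * xi for w, xi in zip(node["weights"], x))
    g = sigmoid(a)
    # Soft split: both children contribute, weighted by g and 1 - g.
    left = soft_tree_response(node["left"], x)
    right = soft_tree_response(node["right"], x)
    return g * left + (1.0 - g) * right


# A depth-1 tree with two leaves.
tree = {
    "weights": [1.0, -1.0],
    "bias": 0.0,
    "left": {"value": 1.0},
    "right": {"value": -1.0},
}

print(soft_tree_response(tree, [0.0, 0.0]))  # 0.0 (gate is exactly 0.5)
```

Unlike a hard decision tree, every sample reaches every leaf with some weight, which is what makes the response differentiable and trainable by gradient descent.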

Installation

softdtree requires Eigen3, so install it (along with CMake) beforehand:

macOS:

$ brew install eigen cmake

Ubuntu:

$ sudo apt-get install libeigen3-dev cmake

Then, install softdtree from PyPI:

$ pip install -U softdtree

Usage

The API of softdtree is compatible with scikit-learn.

Classifier:

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from softdtree import SoftDecisionTreeClassifier

X, y = load_digits(n_class=4, return_X_y=True)

clf = Pipeline([
    ("scaler", StandardScaler()),
    ("tree", SoftDecisionTreeClassifier(
        max_depth=4, eta=0.01, max_epoch=100, random_seed=42))
])

scores = cross_val_score(clf, X, y, cv=5)
print(f"Accuracy: {scores.mean():.3f} ± {scores.std():.3f}")

Regressor:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from softdtree import SoftDecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

reg = Pipeline([
    ("scaler", MinMaxScaler()),
    ("tree", SoftDecisionTreeRegressor(
        max_depth=4, eta=0.1, max_epoch=100, random_seed=42))
])

scores = cross_val_score(reg, X, y, cv=5)
print(f"R^2: {scores.mean():.3f} ± {scores.std():.3f}")

Parameters

  • max_depth (int): The maximum depth of the tree. The default is 8.
  • max_features (float): The fraction of features considered at each node. The number of features used is max(1, min(n_features, n_features * max_features)). The default is 1.0.
  • max_epoch (int): The maximum number of epochs to train. The default is 100.
  • batch_size (int): The number of samples used in one iteration. The default is 5.
  • eta (float): The learning rate. The default is 0.1.
  • beta1 (float): The exponential decay rate for the first moment estimates. The default is 0.9.
  • beta2 (float): The exponential decay rate for the second moment estimates. The default is 0.999.
  • epsilon (float): The term added to the denominator for numerical stability. The default is 1e-8.
  • tol (float): The tolerance for the optimization. The default is 1e-4.
  • verbose (int): If it is set to a value greater than 0, the estimator outputs a log. The default is 0.
  • random_seed (int): The random seed. If -1, the seed is drawn from a uniformly distributed integer random number generator. The default is -1.
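
Two of the descriptions above can be made concrete. The sketch below computes the per-node feature count from the max_features formula, assuming the product is truncated to an integer (the rounding rule is not stated here), and performs one Adam-style update, assuming that eta, beta1, beta2, and epsilon play their usual Adam roles, which their descriptions suggest. The function names are hypothetical and not part of softdtree.

```python
import math


def n_features_used(n_features, max_features=1.0):
    """Feature count per node: max(1, min(n_features, n_features * max_features)).

    Assumption: the product is truncated toward zero before clamping.
    """
    return max(1, min(n_features, int(n_features * max_features)))


def adam_step(param, grad, m, v, t,
              eta=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One standard Adam update for a scalar parameter.

    m and v are the running first and second moment estimates,
    and t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    param = param - eta * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


# With the 64 features of the digits data, max_features=0.5 keeps 32 per node.
print(n_features_used(64, 0.5))

# First update from param=1.0 with gradient 2.0 and zero-initialized moments.
param, m, v = adam_step(1.0, 2.0, m=0.0, v=0.0, t=1)
print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly eta regardless of the gradient's magnitude.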

References

  • O. Irsoy, O. T. Yildiz, and E. Alpaydin, "Soft Decision Trees," in Proc. ICPR 2012, 2012.

License

softdtree is available as open source under the terms of the BSD-3-Clause License.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/softdtree. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
