softdtree is a Python library that implements a classifier and a regressor based on Soft Decision Trees.
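For readers new to the model, the following is a minimal sketch of how a soft decision tree computes its response, following the formulation in Irsoy et al. (2012): each internal node applies a sigmoid gate to the input and blends the responses of both children instead of routing to only one. The dict-based node layout and the function names are hypothetical, chosen for illustration only; this is not softdtree's internal implementation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_tree_response(node, x):
    """Response of a soft decision tree at input x.

    A leaf is a dict {"value": float}; an internal node is a dict
    {"w": [...], "b": float, "left": node, "right": node}.
    This layout is hypothetical and used only for this illustration.
    """
    if "value" in node:
        return node["value"]
    # Gate: the weight given to the left child for this input.
    g = sigmoid(sum(wi * xi for wi, xi in zip(node["w"], x)) + node["b"])
    # Both children contribute, weighted by the gate.
    left = soft_tree_response(node["left"], x)
    right = soft_tree_response(node["right"], x)
    return g * left + (1.0 - g) * right


tree = {
    "w": [1.0, -1.0],
    "b": 0.0,
    "left": {"value": 1.0},
    "right": {"value": 0.0},
}
print(soft_tree_response(tree, [0.0, 0.0]))  # gate is 0.5, so this prints 0.5
```

Because the gate is smooth, the response is differentiable with respect to the node weights, which is what makes gradient-based training possible.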
softdtree requires Eigen3, so install it beforehand.
macOS:
$ brew install eigen cmake
Ubuntu:
$ sudo apt-get install libeigen3-dev cmake
Then, install softdtree from PyPI:
$ pip install -U softdtree
The API of softdtree is compatible with scikit-learn.
Classifier:
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from softdtree import SoftDecisionTreeClassifier
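# Use the first four digit classes as a small multi-class problem.
X, y = load_digits(n_class=4, return_X_y=True)
# Standardize the features, then fit the soft decision tree.
clf = Pipeline([
    ("scaler", StandardScaler()),
    ("tree", SoftDecisionTreeClassifier(
        max_depth=4, eta=0.01, max_epoch=100, random_seed=42))
])
# Evaluate with 5-fold cross-validation.
scores = cross_val_score(clf, X, y, cv=5)
print(f"Accuracy: {scores.mean():.3f} ± {scores.std():.3f}")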
Regressor:
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from softdtree import SoftDecisionTreeRegressor
X, y = load_diabetes(return_X_y=True)
# Scale the features to [0, 1], then fit the soft decision tree.
reg = Pipeline([
    ("scaler", MinMaxScaler()),
    ("tree", SoftDecisionTreeRegressor(
        max_depth=4, eta=0.1, max_epoch=100, random_seed=42))
])
# Evaluate with 5-fold cross-validation.
scores = cross_val_score(reg, X, y, cv=5)
print(f"R^2: {scores.mean():.3f} ± {scores.std():.3f}")
- max_depth (int): The maximum depth of the tree. The default is 8.
- max_features (float): The ratio of the number of features used at each node. The number of features used is max(1, min(n_features, n_features * max_features)). The default is 1.0.
- max_epoch (int): The maximum number of epochs to train. The default is 100.
- batch_size (int): The number of samples used in one iteration. The default is 5.
- eta (float): The learning rate. The default is 0.1.
- beta1 (float): The exponential decay rate for the first moment estimates. The default is 0.9.
- beta2 (float): The exponential decay rate for the second moment estimates. The default is 0.999.
- epsilon (float): The term added to the denominator for numerical stability. The default is 1e-8.
- tol (float): The tolerance for the optimization. The default is 1e-4.
- verbose (int): If it is set to a value greater than 0, the estimator outputs a log. The default is 0.
- random_seed (int): The random seed. If -1, then it will be set to a number generated by a uniformly-distributed integer random number generator. The default is -1.
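To make two of the parameter descriptions above concrete, here is a small sketch. The first function evaluates the max_features formula; it assumes the product is truncated to an integer, which the text above does not specify. The second shows one update step of the Adam optimizer, whose hyperparameters match the descriptions of eta, beta1, beta2, and epsilon; that softdtree uses exactly this update rule is an assumption, since the optimizer is not named above. Both function names are hypothetical.

```python
import math


def n_features_per_node(n_features: int, max_features: float) -> int:
    """Evaluate max(1, min(n_features, n_features * max_features)).

    Assumption: the product is truncated to an integer; the rounding
    rule actually used by softdtree is not stated in the text above.
    """
    return max(1, min(n_features, int(n_features * max_features)))


def adam_step(w, grad, m, v, t, eta=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for a single scalar weight (illustrative helper).

    m and v are the running first and second moment estimates,
    and t is the 1-based step count.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    w = w - eta * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v


print(n_features_per_node(64, 0.5))   # 32
print(n_features_per_node(10, 0.01))  # 1 (the lower bound applies)

w, m, v = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(w)  # the first step moves w by roughly eta, against the gradient sign
```

Under this reading, max_features=1.0 uses every feature at each node, and very small ratios still leave at least one feature.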
- O. Irsoy, O. T. Yildiz, and E. Alpaydin, "Soft Decision Trees," In Proc. ICPR2012, 2012.
softdtree is available as open source under the terms of the BSD-3-Clause License.
Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/softdtree. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.