forked from scikit-learn/scikit-learn
Open
Description
RandomTreesEmbedding in scikit-learn currently outputs only the leaf node that each sample lands in. In the existing examples, it is used solely to transform the dataset into a higher-dimensional representation. To make it more useful for unsupervised clustering and classification, I plan to introduce three distance metrics that can help improve classification performance.
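For context, a minimal sketch of the current behaviour on a toy dataset (the data and parameter values are illustrative assumptions, not taken from this issue). `fit_transform` returns a sparse one-hot encoding of the leaves, and `apply` returns the raw leaf index per tree:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomTreesEmbedding

# Toy data; sizes and parameters are illustrative assumptions.
X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

embedder = RandomTreesEmbedding(n_estimators=10, max_depth=3, random_state=0)

# Sparse one-hot encoding: one column per leaf, across all trees.
X_sparse = embedder.fit_transform(X)

# Raw leaf indices: one column per tree.
leaves = embedder.apply(X)

print(X_sparse.shape)  # (n_samples, total number of leaves)
print(leaves.shape)    # (n_samples, n_estimators)
```

Each row of the sparse output has exactly one non-zero entry per tree, so no information about how far apart two leaves are in a tree is retained.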
Planned Enhancement in the Form of PR
- Implement an algorithm that derives three distance metrics from a fitted RandomTreesEmbedding estimator in scikit-learn: 1) depth of the nearest common ancestor; 2) length of the shortest path; 3) proximity matrix
- Provide examples of how to choose a clustering algorithm and its parameters for use on the output of RandomTreesEmbedding
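One possible way to compute the three quantities is sketched below. It is not the planned implementation: `tree_metrics` is a hypothetical helper, averaging over trees is an assumption, and "proximity" is taken here to mean the fraction of trees in which two samples share a leaf. The sketch relies only on the existing `decision_path` and `apply` methods of the fitted trees. Note that the nearest-common-ancestor depth and the proximity behave as similarities (larger means closer), while the path length behaves as a distance.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomTreesEmbedding


def tree_metrics(embedder, X):
    """Hypothetical helper: pairwise quantities averaged over all trees."""
    n_samples = X.shape[0]
    lca_depth = np.zeros((n_samples, n_samples))
    path_length = np.zeros((n_samples, n_samples))
    proximity = np.zeros((n_samples, n_samples))
    for tree in embedder.estimators_:
        # paths[i, k] == 1 if sample i passes through node k (root and leaf included).
        paths = tree.decision_path(X).toarray().astype(float)
        # Two root-to-leaf paths share exactly the nodes from the root down to
        # their nearest common ancestor, so its depth is (shared nodes - 1).
        lca = paths @ paths.T - 1
        leaf_depth = paths.sum(axis=1) - 1
        lca_depth += lca
        # Edges on the shortest path between the two leaves.
        path_length += leaf_depth[:, None] + leaf_depth[None, :] - 2 * lca
        # 1 where both samples fall into the same leaf of this tree.
        leaf_ids = tree.apply(X)
        proximity += leaf_ids[:, None] == leaf_ids[None, :]
    n_trees = len(embedder.estimators_)
    return lca_depth / n_trees, path_length / n_trees, proximity / n_trees


# Toy data; sizes and parameters are illustrative assumptions.
X, _ = make_blobs(n_samples=40, centers=2, random_state=0)
embedder = RandomTreesEmbedding(n_estimators=50, max_depth=4, random_state=0).fit(X)
lca_depth, path_length, proximity = tree_metrics(embedder, X)

# The proximity matrix is a similarity, so it can be passed to a clustering
# algorithm that accepts a precomputed affinity.
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(proximity)
```

The dense n_samples x n_samples matrices make this practical only for small datasets. For algorithms that expect a precomputed distance instead of a similarity, `path_length` or `1 - proximity` could be used.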