Currently, there is only one outlier detection comparison example in sklearn 2.7.1, and that example does not measure the performance of the algorithms.
I have worked on an outlier detection comparison on a 3-D toy dataset. The dataset has three informative dimensions, similar to Figure 1 in this paper. I also add a noise dimension variable, D_noise, because most neurodata has many noise dimensions. The example measures each algorithm's performance using the AUC score from sklearn.metrics.roc_auc_score.
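
To make the idea concrete, here is a minimal sketch of how such a measurement could look. It is not the proposed example itself. The data generator `make_toy_data`, the helper `outlier_auc`, the cluster shapes, the sample counts, and the choice of `IsolationForest` are all assumptions made for illustration; only `roc_auc_score` and the `D_noise` idea come from the description above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score


def make_toy_data(n_inliers=300, n_outliers=30, d_noise=5, seed=0):
    """Hypothetical toy data: 3 informative dimensions plus d_noise noise dimensions."""
    rng = np.random.RandomState(seed)
    # Inliers: a tight Gaussian blob in the 3 informative dimensions (assumed shape).
    inliers = rng.normal(0.0, 0.5, size=(n_inliers, 3))
    # Outliers: spread uniformly over a much larger cube (assumed shape).
    outliers = rng.uniform(-4.0, 4.0, size=(n_outliers, 3))
    X_info = np.vstack([inliers, outliers])
    # Noise dimensions carry no information about the outlier label.
    noise = rng.normal(0.0, 1.0, size=(X_info.shape[0], d_noise))
    X = np.hstack([X_info, noise])
    # Label convention used here: 1 = outlier, 0 = inlier.
    y = np.r_[np.zeros(n_inliers, dtype=int), np.ones(n_outliers, dtype=int)]
    return X, y


def outlier_auc(model, X, y):
    """Fit an unsupervised detector and score it with ROC AUC."""
    model.fit(X)
    # score_samples returns lower values for more abnormal points,
    # so negate it to get "higher = more anomalous" for roc_auc_score.
    scores = -model.score_samples(X)
    return roc_auc_score(y, scores)


X, y = make_toy_data(d_noise=5)
auc = outlier_auc(IsolationForest(random_state=0), X, y)
print(f"IsolationForest AUC with 5 noise dimensions: {auc:.3f}")
```

Calling `make_toy_data` with different `d_noise` values and re-running `outlier_auc` would show how each detector degrades as the number of noise dimensions grows.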