
mutualinfo with KSG1/KSG2 gives results different from scikit-learn and is about 2x slower for large arrays #367

@liuyxpp

Description


In Python + scikit-learn

from sklearn import feature_selection
import numpy as np

X = np.arange(10)
X = X / np.pi
X = X.reshape((10,1))
y =  np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 1.0, 0.9])
# By default: n_neighbors = 3
feature_selection.mutual_info_regression(X, y)   # = 0.69563492

In Julia + CausalityTools

import CausalityTools: mutualinfo, KSG1, KSG2

X = (0:9) ./ π
y = [0.1, 0.3, 0.2, 0.4, 0.5, 0.7, 0.6, 0.8, 1.0, 0.9]
mutualinfo(KSG1(k=3), X, y)  # = -0.43223601423458935
mutualinfo(KSG2(k=3), X, y)  # = 0.4746008686099013

The results are clearly different. In addition, scikit-learn clamps negative estimates: if mi < 0, it reports 0 instead.
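For cross-checking the two libraries on small inputs, here is a minimal brute-force sketch of the KSG estimator 1 in plain NumPy. The names `ksg1_mi` and `digamma_int` are mine, not from either library, and this O(N²) version is only meant for small arrays like the 10-point example above, not for the timing comparison below.

```python
import numpy as np

def digamma_int(n):
    # Digamma at a positive integer n: psi(n) = -gamma + H_{n-1}.
    # All digamma arguments in KSG1 are integers, so no scipy is needed.
    gamma = 0.5772156649015329
    return -gamma + sum(1.0 / i for i in range(1, n))

def ksg1_mi(x, y, k=3):
    # Brute-force KSG estimator 1:
    # I(X;Y) = psi(k) + psi(N) - <psi(nx + 1) + psi(ny + 1)>
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])  # pairwise |x_i - x_j|
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)               # Chebyshev norm in the joint space
    acc = 0.0
    for i in range(n):
        d = np.sort(dz[i])
        eps = d[k]                        # distance to k-th neighbor (d[0] = 0 is self)
        nx = int(np.sum(dx[i] < eps)) - 1 # strictly-inside marginal counts, minus self
        ny = int(np.sum(dy[i] < eps)) - 1
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n
```

A reference like this makes it possible to tell which library's number the textbook formula reproduces; note that to match scikit-learn's reported value one would additionally clamp with `max(0.0, mi)`.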

For running time, I get the following.

In Python

X = np.random.rand(171522,1)
y = np.random.rand(171522,)
tic=time.perf_counter(); feature_selection.mutual_info_regression(X,y); toc=time.perf_counter()
array([0.00072401])
toc - tic  # = 0.96151989325881 seconds

In Julia

mutualinfo(KSG1(k=3), rand(171522), rand(171522))
@time mutualinfo(KSG1(k=3), rand(171522), rand(171522))  # 2.245426 seconds (857.67 k allocations: 83.230 MiB)

mutualinfo(KSG2(k=3), rand(171522), rand(171522))
@time mutualinfo(KSG2(k=3), rand(171522), rand(171522))  # 2.113910 seconds (857.68 k allocations: 91.082 MiB)

So the Julia implementation is roughly 2x slower than the Python implementation.
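For a fairer comparison, it helps to time a warmed-up call several times and take the best, on both sides (the Julia runs above already discard the first call, which includes compilation). A minimal helper sketch; the `bench` name is mine:

```python
import time

def bench(fn, *args, repeat=3):
    # Best-of-`repeat` wall time after one untimed warm-up call,
    # so caches and any lazy initialization don't skew the first run.
    fn(*args)
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best
```

For example, `bench(feature_selection.mutual_info_regression, X, y)` would give a best-of-three figure comparable to the Julia `@time` numbers.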
