
Can't predict with randomForest when the test set contains NAs in features #1515

@florianfendt


I don't know whether this is some sort of bug or whether I'm overlooking something, but it baffled @ja-thomas and me a bit this morning.
Consider a simple case where the test set contains a single missing value, as in this example:

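```r
library(mlr)
# Train a random forest on the built-in iris task
lrn.rf = makeLearner("classif.randomForest")
mod = train(lrn.rf, iris.task)
# Reuse the task data as a test set, with one missing feature value
test.df = getTaskData(iris.task)
test.df[1L, 1L] = NA
```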

mlr then throws an error when predicting on this set, whereas randomForest's own predict method does not:

```r
# Throws an error: "row names contain missing values"
predict(mod, newdata = test.df)
# Calling randomForest's predict method directly works
predict(mod$learner.model, test.df)
```

I tried printing .newdata in predictLearner.classif.randomForest to see whether we do something unwanted to the data.frame before passing it to the learner's predict method, but the row names, str() output, etc. look fine.
Any ideas?
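One hypothesis that might be worth checking (an assumption, not a confirmed diagnosis): base R's data.frame() takes the names of a named vector argument as row names and stops with a "row names contain missing values" error if any of those names is NA. If the vector returned by randomForest's predict method carried NA names for rows it could not predict, and that vector were later wrapped in a data.frame, the error would appear even though .newdata itself looks fine. The sketch below reproduces only the base R effect, using a hand-built, hypothetical named factor `pred`, and shows that unname() avoids it.

```r
# Hypothetical stand-in for a predicted response: a named factor whose
# second name is NA (this is NOT taken from randomForest's actual output)
pred = factor(c("setosa", NA), levels = c("setosa", "versicolor", "virginica"))
names(pred) = c("1", NA)

# data.frame() adopts the vector's names as row names and rejects the NA one
res = tryCatch(data.frame(response = pred), error = function(e) e)
print(conditionMessage(res))

# Dropping the names first lets data.frame() fall back to default row names
ok = data.frame(response = unname(pred))
print(nrow(ok))

# To test the hypothesis on the real prediction (needs mod and test.df):
# p = predict(mod$learner.model, test.df)
# anyNA(names(p))
```

If anyNA(names(p)) returned TRUE for the real prediction, that would point at the names of the prediction rather than at .newdata; this has not been verified here.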
