Hi! First of all, thanks for making this. I just did some quick benchmarking and {fastDummies} really is fast! I have a use case where this will actually make a big difference: simulating several million combinations of models using simulated data where each iteration requires creating new dummy variables. So this will help a lot!
In my case, though, I also need to create interaction variables, and it would be great if there were a way to build on this package to make them. Here is an example to show what I mean.
    df <- data.frame(
      price = runif(100, 5, 10),
      brand = sample(c("Nike", "Adidas"), 100, replace = TRUE)
    )

    # Dummies only: adds one 0/1 column per brand
    df_without_ints <- fastDummies::dummy_cols(df, "brand")

    # Dummies plus price-by-brand interactions, no intercept column
    df_with_ints <- as.data.frame(
      model.matrix(~ price + brand + price * brand - 1, data = df)
    )
The df_without_ints data frame uses fastDummies::dummy_cols() to generate dummies, but it doesn't include interactions between price and the dummied brand columns. In contrast, model.matrix() generates both (see the df_with_ints object). model.matrix() isn't as fast, but it works well if you need both dummies and interactions with other columns. Does this make sense, and do you think others might be interested in it?
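To make the request more concrete, here is a rough sketch of the kind of thing I mean, built on top of the dummy_cols() output. The add_interactions() helper and the price_x_ column naming are hypothetical and not part of fastDummies; the sketch also assumes the dummy columns are named with the brand_ prefix.

```r
library(fastDummies)

set.seed(42)
df <- data.frame(
  price = runif(100, 5, 10),
  brand = sample(c("Nike", "Adidas"), 100, replace = TRUE)
)

# Hypothetical helper (not part of fastDummies): multiplies one numeric
# column by every dummy column whose name starts with "<dummy_prefix>_".
add_interactions <- function(data, numeric_col, dummy_prefix) {
  dummy_names <- grep(paste0("^", dummy_prefix, "_"), names(data), value = TRUE)
  for (d in dummy_names) {
    # New column name is an assumed convention, e.g. price_x_brand_Nike
    data[[paste0(numeric_col, "_x_", d)]] <- data[[numeric_col]] * data[[d]]
  }
  data
}

df_dummies <- dummy_cols(df, select_columns = "brand")
df_ints <- add_interactions(df_dummies, "price", "brand")

# Reference interaction column from model.matrix() for comparison
mm <- model.matrix(~ price * brand - 1, data = df)
```

Unlike model.matrix(), this creates an interaction column for every brand level, so a reference level would need to be dropped before fitting a model that also includes price.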