Description
I looked through the existing issues but didn't find anything comparable; apologies if I missed something and this duplicates an older discussion.
Whenever I work with categorical data, it's usually something simple like `"male"`/`"female"`, but the original dataset often codes it with placeholders such as `1` and `2`, or `'m'` and `'f'`. So if I want a categorical array with `"male"` and `"female"`, I have to take two steps: create the array, then recode it. It would be more straightforward to allow recoding when the array is created, and that could also be faster when there is a lot of data. I'm thinking about an API that takes a vector of pairs, like this:
```julia
arr = [1, 2, 2, 1, 2, 1]
cat = categorical(arr, levels = [2 => "female", 1 => "male"])
```
This sets the labels I want and, at the same time, sets a level order that differs from the natural `1, 2` sequence.
As far as I can tell, one currently has to do something like this:
```julia
cat = recode(categorical(arr, levels = [2, 1]), 1 => "male", 2 => "female")
```
This gets more cumbersome as the number of levels grows, and it requires creating two full arrays.
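Until something like this exists, the proposed behavior can be approximated with a small wrapper. This is a minimal sketch assuming CategoricalArrays.jl's `categorical(x; levels = ...)` keyword; the helper name `categorical_labeled` is hypothetical and not part of the package. It maps each raw code to its label through a `Dict` and uses the order of the pairs as the level order. It still allocates an intermediate vector of labels, so it only addresses the convenience point, not the performance one.

```julia
using CategoricalArrays

# Hypothetical helper, not part of CategoricalArrays.jl: emulates the proposed
# `levels = [code => label, ...]` API by mapping codes to labels, then building
# a categorical array whose level order follows the order of the pairs.
function categorical_labeled(arr::AbstractVector, mapping::AbstractVector{<:Pair})
    lookup = Dict(mapping)               # raw code => label
    labels = [last(p) for p in mapping]  # level order as listed in `mapping`
    # Throws a KeyError if `arr` contains a code missing from `mapping`.
    return categorical([lookup[x] for x in arr]; levels = labels)
end

arr = [1, 2, 2, 1, 2, 1]
sex = categorical_labeled(arr, [2 => "female", 1 => "male"])
levels(sex)  # ["female", "male"]
```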