Adds perceptive actor-critic class #114
base: main
Conversation
Hey @pascal-roth, very cool, thanks a lot!
rsl_rl/networks/cnn.py (outdated)
```python
if self.flatten:
    x = x.flatten(start_dim=1)
elif self.avgpool is not None:
    x = self.avgpool(x)
    x = x.flatten(start_dim=1)
return x
```
I don't get this logic. Why do we always flatten when we do avgpool and why do we not do avgpool when we also set the flatten flag?
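For context, one way to remove the ambiguity this comment points at is to make pooling and flattening orthogonal: average-pool only when configured, then always flatten. This is a hypothetical restructuring sketch, not the code in this PR; the class name `PoolThenFlatten` is illustrative only.

```python
import torch
import torch.nn as nn

class PoolThenFlatten(nn.Module):
    """Output head: optional global average pool, then always flatten to (B, -1)."""

    def __init__(self, use_avgpool: bool):
        super().__init__()
        # pool spatial dims down to 1x1 only if requested
        self.avgpool = nn.AdaptiveAvgPool2d(1) if use_avgpool else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.avgpool is not None:
            x = self.avgpool(x)  # (B, C, 1, 1)
        return x.flatten(start_dim=1)  # (B, C) or (B, C*H*W)
```

With this layout there is a single flatten path, and the pooling decision is independent of it.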
@pascal-roth I modified the CNN quite a bit to have more flexibility. Could you quickly check if this version works for you? I also modified your perceptive environment configs to work with the changes here; it would be awesome if you could run the env to see if it still trains the same :)
Hello! I was following the progress of this PR and had a question: does it make sense to extend this ActorCriticPerceptive to have a shared CNN backbone for both actor and critic? Do you think it should be an option, or is it needed at all? I thought it could greatly simplify the neural network and speed up training.
Hi @lvjonok, I would assume the speedup is negligible. But if you want to test this, please share the results :)
Hi @ClemensSchwarke, in my tests it seems that using a shared CNN backbone between actor and critic can help save VRAM. I find this particularly helpful for running more environments per GPU.
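A minimal sketch of the shared-backbone idea discussed above, with illustrative names that are not the rsl_rl API: both heads hold a reference to the *same* encoder module, so its weights (and their gradients and optimizer state) exist only once instead of twice.

```python
import torch
import torch.nn as nn

class SharedBackboneAC(nn.Module):
    """Hypothetical actor-critic where one CNN encoder feeds both heads."""

    def __init__(self, in_channels: int, enc_dim: int, num_actions: int):
        super().__init__()
        # single encoder instance, referenced by both forward paths
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, enc_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.actor_head = nn.Linear(enc_dim, num_actions)
        self.critic_head = nn.Linear(enc_dim, 1)

    def act(self, obs_2d: torch.Tensor) -> torch.Tensor:
        return self.actor_head(self.encoder(obs_2d))

    def evaluate(self, obs_2d: torch.Tensor) -> torch.Tensor:
        return self.critic_head(self.encoder(obs_2d))
```

The memory saving comes from storing only one set of convolutional parameters; whether the gradient interference between actor and critic losses on the shared weights hurts training is exactly what would need to be tested empirically.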
```diff
 from tensordict import TensorDict

-from rsl_rl.modules import ActorCritic, ActorCriticRecurrent
+from rsl_rl.modules import ActorCritic, ActorCriticPerceptive, ActorCriticRecurrent
```
Maybe a better name for the actor, to differentiate between CNN-perceptive and RayCaster-perceptive.
```python
    "PerceptiveActorCritic.__init__ got unexpected arguments, which will be ignored: "
    + str([key for key in kwargs])
)
nn.Module.__init__(self)
```
Can probably do `super(ActorCritic, self).__init__()` instead.
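A standalone illustration of why this suggestion works, using stand-in classes rather than the real rsl_rl ones: the two-argument `super()` starts the MRO lookup *after* `ActorCritic`, so `ActorCritic.__init__` is skipped and `Module.__init__` runs directly, the same effect as `nn.Module.__init__(self)` without naming `nn.Module`.

```python
calls = []

class Module:  # stand-in for nn.Module
    def __init__(self):
        calls.append("Module.__init__")

class ActorCritic(Module):
    def __init__(self):
        calls.append("ActorCritic.__init__")
        super().__init__()

class PerceptiveActorCritic(ActorCritic):
    def __init__(self):
        # starts MRO lookup after ActorCritic: skips ActorCritic.__init__
        super(ActorCritic, self).__init__()

PerceptiveActorCritic()
```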
```python
self.obs_groups = obs_groups
num_actor_obs_1d = 0
self.actor_obs_groups_1d = []
actor_in_dims_2d = []
actor_in_channels_2d = []
self.actor_obs_groups_2d = []
for obs_group in obs_groups["policy"]:
    if len(obs[obs_group].shape) == 4:  # B, C, H, W
        self.actor_obs_groups_2d.append(obs_group)
        actor_in_dims_2d.append(obs[obs_group].shape[2:4])
        actor_in_channels_2d.append(obs[obs_group].shape[1])
    elif len(obs[obs_group].shape) == 2:  # B, C
        self.actor_obs_groups_1d.append(obs_group)
        num_actor_obs_1d += obs[obs_group].shape[-1]
    else:
        raise ValueError(f"Invalid observation shape for {obs_group}: {obs[obs_group].shape}")
```
What do you think of having a list passed to the class constructor that tells you the names of the 2D groups directly? If not, I feel this code should at least be its own private method of the class for clarity.
```python
# Check if multiple 2D actor observations are provided
if len(self.actor_obs_groups_2d) > 1 and all(isinstance(item, dict) for item in actor_cnn_cfg.values()):
    assert len(actor_cnn_cfg) == len(self.actor_obs_groups_2d), (
        "The number of CNN configurations must match the number of 2D actor observations."
    )
elif len(self.actor_obs_groups_2d) > 1:
    print(
        "Only one CNN configuration for multiple 2D actor observations given, using the same configuration "
        "for all groups."
    )
    actor_cnn_cfg = dict(zip(self.actor_obs_groups_2d, [actor_cnn_cfg] * len(self.actor_obs_groups_2d)))
else:
    actor_cnn_cfg = dict(zip(self.actor_obs_groups_2d, [actor_cnn_cfg]))
```
This logic can be simplified.
```python
# Actor MLP
self.state_dependent_std = state_dependent_std
if self.state_dependent_std:
    self.actor = MLP(num_actor_obs_1d + encoding_dim, [2, num_actions], actor_hidden_dims, activation)
```
Why is this not `2 * num_actions` directly?
```python
if self.critic_obs_groups_2d:
    assert critic_cnn_cfg is not None, "A critic CNN configuration is required for 2D critic observations."

    # check if multiple 2D critic observations are provided
    if len(self.critic_obs_groups_2d) > 1 and all(isinstance(item, dict) for item in critic_cnn_cfg.values()):
        assert len(critic_cnn_cfg) == len(self.critic_obs_groups_2d), (
            "The number of CNN configurations must match the number of 2D critic observations."
        )
    elif len(self.critic_obs_groups_2d) > 1:
        print(
            "Only one CNN configuration for multiple 2D critic observations given, using the same configuration"
            " for all groups."
        )
        critic_cnn_cfg = dict(zip(self.critic_obs_groups_2d, [critic_cnn_cfg] * len(self.critic_obs_groups_2d)))
    else:
        critic_cnn_cfg = dict(zip(self.critic_obs_groups_2d, [critic_cnn_cfg]))
```
same as above
```python
num_critic_obs_1d = 0
self.critic_obs_groups_1d = []
critic_in_dims_2d = []
critic_in_channels_2d = []
self.critic_obs_groups_2d = []
for obs_group in obs_groups["critic"]:
    if len(obs[obs_group].shape) == 4:  # B, C, H, W
        self.critic_obs_groups_2d.append(obs_group)
        critic_in_dims_2d.append(obs[obs_group].shape[2:4])
        critic_in_channels_2d.append(obs[obs_group].shape[1])
    elif len(obs[obs_group].shape) == 2:  # B, C
        self.critic_obs_groups_1d.append(obs_group)
        num_critic_obs_1d += obs[obs_group].shape[-1]
    else:
        raise ValueError(f"Invalid observation shape for {obs_group}: {obs[obs_group].shape}")
```
same as above
```python
Normal.set_default_validate_args(False)

def _update_distribution(self, mlp_obs: torch.Tensor, cnn_obs: dict[str, torch.Tensor]) -> None:
    if self.actor_cnns is not None:
```
Does it make sense to assume that the user can use the ActorCriticPerceptive as a normal ActorCritic? In that case, I would prefer an assertion in the init telling them to use the default ActorCritic instead.
```python
mlp_obs = torch.cat([mlp_obs, cnn_enc], dim=-1)

if self.state_dependent_std:
    return self.actor(obs)[..., 0, :]
```
This should be `mlp_obs`, no? Also, this logic is superfluous if you make the output `2 * num_actions` and just do `self.actor(obs)[..., :self.num_actions]`.
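To make the suggested layout concrete: with a single output vector of size `2 * num_actions`, mean and (pre-activation) std parameters come from plain slicing, with no `[2, num_actions]` shaped head and no extra indexing dimension. A toy sketch with a mock output tensor:

```python
import torch

num_actions = 3
# mock actor output of size 2 * num_actions: first half mean, second half std
out = torch.arange(2 * num_actions, dtype=torch.float32)
mean = out[..., :num_actions]
std_param = out[..., num_actions:]
```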
```python
obs_list_1d = [obs[obs_group] for obs_group in self.actor_obs_groups_1d]
obs_dict_2d = {}
for obs_group in self.actor_obs_groups_2d:
    obs_dict_2d[obs_group] = obs[obs_group]
return torch.cat(obs_list_1d, dim=-1), obs_dict_2d
```
Can't `obs_dict_2d` also be a list, where you assume the order is the same as `self.actor_obs_groups_2d`? Is the dict faster?
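The list-based alternative this comment suggests would rely on the fixed ordering of the 2D group names instead of keying by name. A hypothetical sketch (the function name is illustrative, and plain lists stand in for tensors):

```python
def split_obs(obs: dict, groups_1d: list[str], groups_2d: list[str]):
    """Split an observation dict into ordered 1D and 2D lists."""
    obs_1d = [obs[g] for g in groups_1d]
    obs_2d = [obs[g] for g in groups_2d]  # order matches groups_2d
    return obs_1d, obs_2d
```

Performance-wise the difference between dict and list here is almost certainly negligible; the real trade-off is readability (named lookup) versus an implicit ordering contract.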
Adds a new perceptive actor-critic class that can define CNN layers for every 2D observation term.
Example usage shown in IsaacLab isaac-sim/IsaacLab#3467