Conversation

@pascal-roth
Contributor

Adds a new perceptive actor-critic class that can define CNN layers for every 2D observation term.

Example usage shown in IsaacLab isaac-sim/IsaacLab#3467


@ClemensSchwarke left a comment


Hey @pascal-roth, very cool, thanks a lot!

@ClemensSchwarke changed the base branch from main to fix/recurrent_symmetry October 22, 2025 15:39
@ClemensSchwarke force-pushed the feature/perceptive-nav-rl branch 2 times, most recently from 22cd1ec to 8a0a959 Compare October 22, 2025 15:57
Comment on lines 98 to 103
if self.flatten:
x = x.flatten(start_dim=1)
elif self.avgpool is not None:
x = self.avgpool(x)
x = x.flatten(start_dim=1)
return x

@ClemensSchwarke Oct 23, 2025


I don't follow this logic. Why do we always flatten when we do avgpool, and why do we skip avgpool when the flatten flag is set?
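One possible restructuring of the quoted forward pass, sketched under the assumption that the two flags should act independently (pool if `avgpool` is configured, then flatten if `flatten` is set), rather than `flatten` implying "skip avgpool":

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the PR's code: pooling and flattening are
# decoupled so each flag has exactly one effect.
class Head(nn.Module):
    def __init__(self, flatten: bool = True, avgpool: bool = False):
        super().__init__()
        self.flatten = flatten
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) if avgpool else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.avgpool is not None:
            x = self.avgpool(x)  # (B, C, H, W) -> (B, C, 1, 1)
        if self.flatten:
            x = x.flatten(start_dim=1)  # collapse all non-batch dims
        return x
```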

@ClemensSchwarke force-pushed the feature/perceptive-nav-rl branch from 42e406d to df9e1ad Compare October 24, 2025 12:03
Base automatically changed from fix/recurrent_symmetry to main October 24, 2025 12:40
@ClemensSchwarke force-pushed the feature/perceptive-nav-rl branch from df9e1ad to 0a1756d Compare October 24, 2025 12:44
@ClemensSchwarke

ClemensSchwarke commented Oct 29, 2025

@pascal-roth I modified the CNN quite a bit to have more flexibility. Could you quickly check whether this version works for you? I also modified your perceptive environment configs to work with the changes here; it would be awesome if you could run the env to see if it still trains the same :)

@lvjonok

lvjonok commented Oct 30, 2025

Hello!

I have been following the progress of this PR, and I had a question: does it make sense to extend this ActorCriticPerceptive to have a shared CNN backbone for both critic and actor? Do you think it should be an option, or is it needed at all?

I thought it could greatly reduce the complexity of the network and speed up training.

@ClemensSchwarke

Hi @lvjonok, I would assume the speedup is negligible. But if you want to test this please share the results :)

@garylvov

garylvov commented Nov 5, 2025

Hi @ClemensSchwarke, in my tests it seems that using a shared CNN backbone between actor and critic can help save VRAM.

I find this particularly helpful for fitting more environments per GPU.
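The shared-backbone idea being discussed can be sketched as follows. This is an illustrative toy, not the PR's API (class and attribute names are made up): one CNN encoder feeds both heads, so the conv weights, gradients, and activations are stored once instead of twice, which is where the VRAM saving would come from.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a shared CNN backbone for actor and critic.
class SharedBackboneAC(nn.Module):
    def __init__(self, in_channels: int, enc_dim: int, num_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
            nn.Linear(16, enc_dim),
        )
        self.actor_head = nn.Linear(enc_dim, num_actions)
        self.critic_head = nn.Linear(enc_dim, 1)

    def forward(self, img: torch.Tensor):
        z = self.encoder(img)  # image encoded once, reused by both heads
        return self.actor_head(z), self.critic_head(z)
```

One caveat with sharing: the critic's value-loss gradients then also flow into the actor's perception features, which can change training dynamics, so a comparison run is worthwhile.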

from tensordict import TensorDict

from rsl_rl.modules import ActorCritic, ActorCriticRecurrent
from rsl_rl.modules import ActorCritic, ActorCriticPerceptive, ActorCriticRecurrent


maybe a better name for the actor class, to differentiate between CNN-perceptive and RayCaster-perceptive

"PerceptiveActorCritic.__init__ got unexpected arguments, which will be ignored: "
+ str([key for key in kwargs])
)
nn.Module.__init__(self)


can probably do
super(ActorCritic, self).__init__()
instead
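The suggestion relies on how two-argument `super` interacts with the MRO: calling `super(ActorCritic, self).__init__()` from the subclass starts the method lookup *after* `ActorCritic`, so `nn.Module.__init__` runs without invoking `ActorCritic.__init__`. A minimal stand-alone illustration with plain classes (names mirror the PR only loosely):

```python
class Module:                      # stands in for nn.Module
    def __init__(self):
        self.initialized = "Module"

class ActorCritic(Module):
    def __init__(self):
        super().__init__()
        self.initialized = "ActorCritic"

class PerceptiveActorCritic(ActorCritic):
    def __init__(self):
        # MRO is [PerceptiveActorCritic, ActorCritic, Module, object];
        # starting the lookup after ActorCritic reaches Module.__init__
        # directly, skipping ActorCritic's __init__.
        super(ActorCritic, self).__init__()
```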

Comment on lines +45 to +60
self.obs_groups = obs_groups
num_actor_obs_1d = 0
self.actor_obs_groups_1d = []
actor_in_dims_2d = []
actor_in_channels_2d = []
self.actor_obs_groups_2d = []
for obs_group in obs_groups["policy"]:
if len(obs[obs_group].shape) == 4: # B, C, H, W
self.actor_obs_groups_2d.append(obs_group)
actor_in_dims_2d.append(obs[obs_group].shape[2:4])
actor_in_channels_2d.append(obs[obs_group].shape[1])
elif len(obs[obs_group].shape) == 2: # B, C
self.actor_obs_groups_1d.append(obs_group)
num_actor_obs_1d += obs[obs_group].shape[-1]
else:
raise ValueError(f"Invalid observation shape for {obs_group}: {obs[obs_group].shape}")


What do you think of passing a list to the class constructor that directly gives the names of the 2D groups? If not, I at least feel all of this code should be its own private method of the class, for clarity.
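The second option could look like the helper below, a hypothetical sketch (function name is illustrative) that factors the quoted 1D/2D grouping loop into one reusable function, so the actor and critic paths share it instead of duplicating it:

```python
import torch

# Hypothetical refactor of the grouping loop above into one helper.
def split_obs_groups(obs: dict, group_names: list):
    groups_1d, groups_2d = [], []
    num_obs_1d, in_dims_2d, in_channels_2d = 0, [], []
    for name in group_names:
        shape = obs[name].shape
        if len(shape) == 4:  # (B, C, H, W) -> 2D observation
            groups_2d.append(name)
            in_dims_2d.append(tuple(shape[2:4]))
            in_channels_2d.append(shape[1])
        elif len(shape) == 2:  # (B, C) -> 1D observation
            groups_1d.append(name)
            num_obs_1d += shape[-1]
        else:
            raise ValueError(f"Invalid observation shape for {name}: {shape}")
    return groups_1d, num_obs_1d, groups_2d, in_dims_2d, in_channels_2d
```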

Comment on lines +81 to +93
# Check if multiple 2D actor observations are provided
if len(self.actor_obs_groups_2d) > 1 and all(isinstance(item, dict) for item in actor_cnn_cfg.values()):
assert len(actor_cnn_cfg) == len(self.actor_obs_groups_2d), (
"The number of CNN configurations must match the number of 2D actor observations."
)
elif len(self.actor_obs_groups_2d) > 1:
print(
"Only one CNN configuration for multiple 2D actor observations given, using the same configuration "
"for all groups."
)
actor_cnn_cfg = dict(zip(self.actor_obs_groups_2d, [actor_cnn_cfg] * len(self.actor_obs_groups_2d)))
else:
actor_cnn_cfg = dict(zip(self.actor_obs_groups_2d, [actor_cnn_cfg]))


this logic can be simplified
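One way the quoted branching could collapse, sketched as a hypothetical helper (not the PR's code): normalize whatever was given into a per-group dict in a single step, treating a dict of dicts as per-group and anything else as one shared config.

```python
# Hypothetical simplification of the cfg-normalization branches above.
def normalize_cnn_cfg(cnn_cfg: dict, obs_groups_2d: list) -> dict:
    # A non-empty dict whose values are all dicts is already per-group.
    if len(cnn_cfg) > 0 and all(isinstance(v, dict) for v in cnn_cfg.values()):
        assert set(cnn_cfg) == set(obs_groups_2d), (
            "CNN configurations must match the 2D observation groups."
        )
        return cnn_cfg
    # Otherwise it is a single shared config: replicate it for every group.
    return {group: cnn_cfg for group in obs_groups_2d}
```

The same helper would then serve both the actor and the critic path, removing the duplicated block further down.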

# Actor MLP
self.state_dependent_std = state_dependent_std
if self.state_dependent_std:
self.actor = MLP(num_actor_obs_1d + encoding_dim, [2, num_actions], actor_hidden_dims, activation)


why is this not 2*num_actions directly?
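The suggested shape change can be illustrated with a minimal stand-in for the MLP (a plain `nn.Linear`; this is a sketch, not the PR's code): a flat `2 * num_actions` output that is sliced into two halves, instead of an output shaped `[2, num_actions]`.

```python
import torch
import torch.nn as nn

num_obs, num_actions = 10, 4
actor = nn.Linear(num_obs, 2 * num_actions)  # stand-in for the actor MLP

out = actor(torch.zeros(3, num_obs))     # (B, 2 * num_actions)
mean = out[..., :num_actions]            # first half: action mean
std_params = out[..., num_actions:]      # second half: state-dependent std
```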

Comment on lines +130 to +145
if self.critic_obs_groups_2d:
assert critic_cnn_cfg is not None, " A critic CNN configuration is required for 2D critic observations."

# check if multiple 2D critic observations are provided
if len(self.critic_obs_groups_2d) > 1 and all(isinstance(item, dict) for item in critic_cnn_cfg.values()):
assert len(critic_cnn_cfg) == len(self.critic_obs_groups_2d), (
"The number of CNN configurations must match the number of 2D critic observations."
)
elif len(self.critic_obs_groups_2d) > 1:
print(
"Only one CNN configuration for multiple 2D critic observations given, using the same configuration"
" for all groups."
)
critic_cnn_cfg = dict(zip(self.critic_obs_groups_2d, [critic_cnn_cfg] * len(self.critic_obs_groups_2d)))
else:
critic_cnn_cfg = dict(zip(self.critic_obs_groups_2d, [critic_cnn_cfg]))


same as above

Comment on lines +61 to +75
num_critic_obs_1d = 0
self.critic_obs_groups_1d = []
critic_in_dims_2d = []
critic_in_channels_2d = []
self.critic_obs_groups_2d = []
for obs_group in obs_groups["critic"]:
if len(obs[obs_group].shape) == 4: # B, C, H, W
self.critic_obs_groups_2d.append(obs_group)
critic_in_dims_2d.append(obs[obs_group].shape[2:4])
critic_in_channels_2d.append(obs[obs_group].shape[1])
elif len(obs[obs_group].shape) == 2: # B, C
self.critic_obs_groups_1d.append(obs_group)
num_critic_obs_1d += obs[obs_group].shape[-1]
else:
raise ValueError(f"Invalid observation shape for {obs_group}: {obs[obs_group].shape}")


same as above

Normal.set_default_validate_args(False)

def _update_distribution(self, mlp_obs: torch.Tensor, cnn_obs: dict[str, torch.Tensor]) -> None:
if self.actor_cnns is not None:

@epalmaEth Nov 13, 2025


Does it make sense to assume that the user can use the ActorCriticPerceptive as a normal ActorCritic? I would prefer an assertion in the init in that case, telling them to use the default ActorCritic.
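The proposed guard could look like the following sketch (function name and wording are illustrative, not the PR's code): fail fast in the constructor when no 2D observation groups exist, instead of silently supporting the degenerate case at runtime.

```python
# Hypothetical init-time guard suggested in the review.
def check_has_2d_obs(actor_obs_groups_2d: list, critic_obs_groups_2d: list):
    assert actor_obs_groups_2d or critic_obs_groups_2d, (
        "No 2D observation groups found; use the default ActorCritic "
        "instead of ActorCriticPerceptive."
    )
```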

mlp_obs = torch.cat([mlp_obs, cnn_enc], dim=-1)

if self.state_dependent_std:
return self.actor(obs)[..., 0, :]

@epalmaEth Nov 13, 2025


This should be mlp_obs, no? Also, this logic is superfluous if you have the output be 2*num_actions and just do
self.actor(obs)[..., :self.num_actions]

Comment on lines +250 to +254
obs_list_1d = [obs[obs_group] for obs_group in self.actor_obs_groups_1d]
obs_dict_2d = {}
for obs_group in self.actor_obs_groups_2d:
obs_dict_2d[obs_group] = obs[obs_group]
return torch.cat(obs_list_1d, dim=-1), obs_dict_2d


Can't obs_dict_2d also be a list, where you assume the order matches self.actor_obs_groups_2d? Is the dict faster?
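The list variant would be a one-liner, sketched below (illustrative helper, not the PR's code). The lookup-cost difference between a small dict and a list is negligible here; the real trade-off is that a dict keeps the group-name association explicit, while a list relies on the caller preserving the `actor_obs_groups_2d` ordering.

```python
import torch

# Hypothetical list-based alternative to the obs_dict_2d construction.
def get_2d_obs_list(obs: dict, obs_groups_2d: list) -> list:
    # Order is guaranteed to match obs_groups_2d by construction.
    return [obs[group] for group in obs_groups_2d]
```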


7 participants