Description
Hi RVT team,
First, thank you for open-sourcing this impressive work! I'm attempting to reproduce RVT-2's RLBench results and encountered an inconsistency in the "slide block to color target" task that I'd like to discuss.
Issue Description
Reported vs Reproduced Results:
While the paper reports a 92% success rate on this task, my reproduction reaches only 48%.
Failure Analysis:
By reviewing videos of the failed episodes, I observed that the model frequently mispredicts the ignore_collision flag. When I manually force this flag to True, the success rate matches the paper's result (92-100%).
Key Insight:
Since collision prediction should be a comparatively simple subtask, I suspect the mispredictions come from a discrepancy in the data generation logic rather than from the model itself.
In dataset.py (specifically the _add_keypoints_to_replay function), I noticed the following implementation:
    (
        trans_indicies,
        rot_grip_indicies,
        ignore_collisions,  # this ignore_collisions flag corresponds to the keypoint but is not used
        action,
        attention_coordinates,
    ) = _get_action(
        obs_tp1,
        obs_tm1,
        rlbench_scene_bounds,
        voxel_sizes,
        rotation_resolution,
        crop_augmentation,
    )
    terminal = k == len(episode_keypoints) - 1
    reward = float(terminal) * 1.0 if terminal else 0
    obs_dict = extract_obs(
        obs,
        CAMERAS,
        t=k - next_keypoint_idx,
        prev_action=prev_action,
        episode_length=25,
    )  # the ignore_collisions in obs_dict corresponds to the current observation
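To make the suspected mismatch concrete, here is a minimal toy sketch. It is not the RVT code: ToyObs, collision_labels, and the example episode are hypothetical names and data I made up for illustration, and I am assuming each transition starts at the previous keypoint. It only shows how a label read from the current observation can disagree with the one read from the target keypoint.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyObs:
    # Hypothetical stand-in for a demo frame; only the flag matters here.
    ignore_collisions: bool


def collision_labels(
    demo: List[ToyObs], keypoints: List[int]
) -> List[Tuple[bool, bool]]:
    """For each transition, return (label_from_current_obs, label_from_keypoint_obs)."""
    pairs = []
    current_idx = 0  # assumption: the first transition starts at frame 0
    for kp in keypoints:
        pairs.append(
            (demo[current_idx].ignore_collisions, demo[kp].ignore_collisions)
        )
        current_idx = kp  # assumption: the next transition starts at this keypoint
    return pairs


# Made-up episode: the flag is set only on the frames where contact is expected.
demo = [ToyObs(False), ToyObs(False), ToyObs(True), ToyObs(True), ToyObs(False)]
keypoints = [2, 4]

pairs = collision_labels(demo, keypoints)
mismatches = sum(cur != kp for cur, kp in pairs)
print(f"{mismatches} of {len(pairs)} transitions get a different label")
```

If the real replay data shows a similar disagreement rate on this task, that would be consistent with the flag being learned from the wrong frame.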
Request
Could you please clarify:
1. Is this a known discrepancy between the implementation and the paper?
2. Would taking ignore_collision from the keypoint observation, rather than from the current observation, match your original design?
Thank you for your time and insights!