The action evaluation in PO-rollout

Hi, I noticed that in `po_rollout.pyx`, the loop in `_search` iterates over `action` but never passes it to the rollout call:
```
for action in legal_actions:
    rewards = []
    for i in range(self._num_sims // len(legal_actions)):
        state = self._agent.belief.random()
        total_discounted_reward = self._rollout(state, 0)
        rewards.append(total_discounted_reward)
```
and inside `_rollout`, every step's action, including the first, is chosen by the rollout policy rather than being the action under evaluation:
```
while depth <= self._max_depth:
    action = self._rollout_policy.rollout(state, history=history)
    next_state, observation, reward, nsteps = sample_generative_model(self._agent, state, action)
    ...
```
So I think every action is evaluated with the same distribution of rollouts, driven only by the rollout policy. The per-action averages are therefore essentially i.i.d. and do not reflect the action being evaluated.
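
To illustrate the concern, here is a minimal, self-contained sketch of what I would expect instead: the action under evaluation is taken as the first step, and the rollout policy takes over afterwards. This is not the pomdp_py code. `sample_step`, `uniform_policy`, `rollout`, `evaluate`, and the two toy actions are hypothetical stand-ins for `sample_generative_model`, the rollout policy, `_rollout`, and `_search`.

```python
import random

GAMMA = 0.9                  # discount factor (assumed value)
MAX_DEPTH = 5                # rollout horizon (assumed value)
ACTIONS = ("good", "bad")    # hypothetical toy action set


def sample_step(state, action, rng):
    """Hypothetical toy generative model: 'good' pays 1, 'bad' pays 0."""
    reward = 1.0 if action == "good" else 0.0
    return state, reward


def uniform_policy(state, rng):
    """Hypothetical rollout policy: pick an action uniformly at random."""
    return rng.choice(ACTIONS)


def rollout(state, depth, rng, first_action=None):
    """Return a discounted return. If first_action is given, it is used for
    the first step; all later steps follow the rollout policy."""
    total, discount = 0.0, 1.0
    action = first_action
    while depth <= MAX_DEPTH:
        if action is None:
            action = uniform_policy(state, rng)
        state, reward = sample_step(state, action, rng)
        total += discount * reward
        discount *= GAMMA
        depth += 1
        action = None  # only the first step is forced
    return total


def evaluate(actions, num_sims, rng, force_first_action):
    """Average rollout return per action, mirroring the loop in the issue."""
    values = {}
    for action in actions:
        returns = []
        for _ in range(num_sims // len(actions)):
            state = 0  # stand-in for sampling a state from the belief
            first = action if force_first_action else None
            returns.append(rollout(state, 0, rng, first_action=first))
        values[action] = sum(returns) / len(returns)
    return values


if __name__ == "__main__":
    rng = random.Random(0)
    print("action ignored:", evaluate(ACTIONS, 2000, rng, False))
    print("action forced: ", evaluate(ACTIONS, 2000, rng, True))
```

With `force_first_action=False` (the behavior described above), the two averages differ only by sampling noise. With `force_first_action=True`, the estimate for `good` should exceed the one for `bad` by roughly the first-step reward.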


The action evaluation in PO-rollout #80
