When equilibrated data is not provided in stored_data to a PreequilibratedSimulation workflow, it triggers error "A protocol with the same id already exists in this workflow" #698

@lilyminium

Describe the bug

A PreequilibratedSimulation workflow sources its equilibrated data from a file storage system (named stored_data by default). When stored_data is empty, running a PreequilibratedSimulation workflow with multiple protocols [?] triggers the error "A protocol with the same id already exists in this workflow".

To Reproduce

Run a re-fit and forget to copy over stored_data or provide it as a path to LocalFileStorage.
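A quick way to rule this out before launching is to check that the storage directory actually has something in it. This is only a sketch using the standard library: the helper name stored_data_is_populated is hypothetical, and it assumes the storage root is a plain directory on disk (stored_data by default, as above). It does not check that the contents are valid equilibrated data.

```python
from pathlib import Path


def stored_data_is_populated(root="stored_data"):
    """Return True if `root` is an existing directory with at least one entry.

    Hypothetical pre-flight helper, not part of Evaluator. It only checks
    that the directory is non-empty, not that its contents are usable.
    """
    root = Path(root)
    return root.is_dir() and any(root.iterdir())


if __name__ == "__main__":
    if not stored_data_is_populated("stored_data"):
        print("stored_data is missing or empty; copy it over before re-fitting")
```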

Output

Link to gist, also my own error below:

  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/server/server.py", line 697, in _handle_connections
    self._handle_stream(connection, connection.getpeername())
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/server/server.py", line 678, in _handle_stream
    self._handle_job_submission(connection, address, message_length)
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/server/server.py", line 617, in _handle_job_submission
    self._launch_batch(batch)
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/server/server.py", line 541, in _launch_batch
    current_layer.schedule_calculation(
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/layers/layers.py", line 416, in schedule_calculation
    futures = cls._schedule_calculation(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/layers/workflow.py", line 229, in _schedule_calculation
    workflow_graph, provenance = cls._build_workflow_graph(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/layers/workflow.py", line 155, in _build_workflow_graph
    workflow_graph.add_workflows(*workflows)
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/workflow/workflow.py", line 913, in add_workflows
    workflow.replace_protocol(original_protocol, new_protocol, True)
  File "/data/homezvol3/lilyw7/miniforge3/envs/evaluator-050/lib/python3.11/site-packages/openff/evaluator/workflow/workflow.py", line 487, in replace_protocol
    raise ValueError(
ValueError: ('A protocol with the same id already exists in this workflow: ', 'dens_2188924364196997512|unpack_data')

Computing environment (please complete the following information):

  • Operating system
  • Output of running conda list

Additional context

What's supposed to happen is that the protocol unpack_data_mixture just errors when it runs. However, this error is triggering on workflow setup, before protocols get executed (or even passed to the dask scheduler).

(thinking out loud below)
Why are these protocols mergeable? I think mergeability checks the inputs, and when simulation_data_path is undefined that confuses Evaluator. simulation_data_path is the only input to UnpackStoredEquilibrationData, so it's guaranteed to be confusing when one or more boxes has no simulation_data_path. I think tests haven't caught this because, to keep them small and minimal, I've only tested the case where only one box is missing.
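To make the hypothesis above concrete, here is a toy model of how it could play out. None of this is Evaluator code: ToyProtocol, ToyWorkflow, add_to_graph, build and the UNDEFINED sentinel are all made up for illustration, and the merge rule (merge when inputs compare equal) is my guess at the behaviour, not the real implementation.

```python
from dataclasses import dataclass, field

UNDEFINED = object()  # stand-in for an input that was never set


@dataclass
class ToyProtocol:
    id: str
    inputs: dict = field(default_factory=dict)

    def can_merge(self, other):
        # Assumed rule: identical inputs means "same work", so merge.
        return self.inputs == other.inputs


class ToyWorkflow:
    def __init__(self, protocols):
        self.protocols = {p.id: p for p in protocols}

    def replace_protocol(self, old, new):
        # Refuse to rename onto an id that is already in this workflow.
        if new.id != old.id and new.id in self.protocols:
            raise ValueError(
                f"A protocol with the same id already exists in this workflow: {new.id}"
            )
        del self.protocols[old.id]
        self.protocols[new.id] = new


def add_to_graph(graph, workflow):
    # Merge each protocol into the graph, or add it if nothing matches.
    for protocol in list(workflow.protocols.values()):
        match = next((g for g in graph if g.can_merge(protocol)), None)
        if match is None:
            graph.append(protocol)
        else:
            workflow.replace_protocol(protocol, match)


def build(paths):
    # One unpack protocol per box, each with a single input.
    workflow = ToyWorkflow(
        [
            ToyProtocol(f"unpack_data_{i}", {"simulation_data_path": path})
            for i, path in enumerate(paths)
        ]
    )
    graph = []
    add_to_graph(graph, workflow)
    return graph


if __name__ == "__main__":
    build(["box_a", UNDEFINED])  # one box missing: nothing merges, no error
    try:
        build([UNDEFINED, UNDEFINED])  # both missing: merged onto the same id
    except ValueError as error:
        print(error)
```

In this toy, a single missing box never looks identical to anything else, so setup succeeds. Two missing boxes have identical (undefined) inputs, get merged, and the second is renamed onto an id its own workflow already contains, which raises during setup rather than at run time.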

We could probably band-aid this with the same solution as #637 but for UnpackStoredEquilibrationData.
