Bug Report
Description
Hi, first of all, thanks to your whole team for the amazing tool you've provided to the community!
While working on multiple versions of one dataset, I tried to get metrics for each version and stumbled upon an issue specific to my setup: when I execute dvc repro on each value of my templated dataset_dirpath, all my checks pass, since the execution space is my workspace. However, when I use dvc exp run --queue, my workspace is copied into tmp/exps WITHOUT the gitignored files (including the data tracked with dvc), and the pipeline breaks.
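To make the failure easier to spot, here is a minimal sketch of the kind of check I mean. The function name check_dataset_dir is hypothetical and not part of DVC or of my repository; it only reports whether the dataset directory exists in the directory where the stage actually runs.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a stage command (not a DVC feature).
# It prints "ok: <dir>" when the dataset directory exists, and otherwise
# writes an explicit message with the current directory to stderr.
check_dataset_dir() {
    dir="$1"
    if [ -d "$dir" ]; then
        echo "ok: $dir"
        return 0
    fi
    echo "missing: $dir (cwd: $(pwd))" >&2
    return 1
}
```

Called as check_dataset_dir data/dataset_1 at the start of a stage, this should succeed in the workspace and fail in a copy that lacks the gitignored data.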
Reproduce
I've created a minimal repository to reproduce the issue:
git clone https://github.com/Gwenn-LR/dvc_exp_run_with_templated_input_dataset.git
cd dvc_exp_run_with_templated_input_dataset
dvc init
mkdir -p data/dataset_1 data/dataset_2
dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/dataset_1/data.xml
cp data/dataset_1/data.xml data/dataset_2/data.xml
dvc add data/dataset_1 data/dataset_2
dvc repro pipelines/default/dvc.yaml
# Works.
dvc exp run --queue -S "pipelines/default/params.yaml:dataset_dirpath=dataset_1, dataset_2" pipelines/default/dvc.yaml
dvc exp run --run-all
# Does not work!
Expected
From what I've found in multiple discussions on this repository, I know that those gitignored files are not copied because they could interfere with the Git status of the associated experiment. However, I think this behavior should not be ignored: it should either be handled or be clearly mentioned in the docs.
I know that the linked repository is not the simplest one, but its structure is meant to mirror a more complex project I am working on. My structure could probably be improved, and composition with hydra might also be a solution, but I would like a quick solution focused on the specific point of this issue.
It could "only" be an environment variable that lets me distinguish an execution with dvc repro from one with dvc exp run --queue, or even an answer like "It is not possible as it is". I'll be really glad to discuss all those improvements after a clear answer to this specific issue!
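To illustrate what I mean by such a switch, here is a minimal sketch. It assumes that a variable such as DVC_EXP_NAME is exported during dvc exp run and is absent under dvc repro; I have not verified that this holds for queued runs, so treat the variable name as an assumption. The helper name detect_run_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: guess which mode the stage is running in.
# Assumption (to be confirmed): DVC_EXP_NAME is set by `dvc exp run`
# and is not set when the pipeline is run with `dvc repro`.
detect_run_mode() {
    if [ -n "${DVC_EXP_NAME:-}" ]; then
        echo "exp"
    else
        echo "repro"
    fi
}
```

A stage script could then branch on the result, for example by skipping workspace-only checks when detect_run_mode prints exp.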
Thank you for your consideration.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 3.59.1 (pip)
-------------------------
Platform: Python 3.10.16 on Linux-6.11.0-25-generic-x86_64-with-glibc2.39
Subprojects:
	dvc_data = 3.16.10
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.11
Supports:
	http (aiohttp = 3.11.18, aiohttp-retry = 2.9.1),
	https (aiohttp = 3.11.18, aiohttp-retry = 2.9.1)
Config:
	Global: /home/leroch/.config/dvc
	System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-ubuntu--lv
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/30f0eb604189e8e37e4eb38ec5c8d890
Additional Information (if any):