For all the other tasks, we use a folder for each scene that is parsable by `ReconfusionParser` (see `seva/data_io.py`). It contains (1) a subdirectory containing all views; (2) a `transforms.json` file defining the intrinsics and extrinsics for each image; and (3) `train_test_split_*.json` files splitting the input and target views, with `*` indicating the number of input views.
We provide <a href="https://github.com/Stability-AI/stable-virtual-camera/releases/tag/data">in this release</a> several exemplar scenes for you to reference. Target views are available if the data come from academic sources; when target views are unavailable, we create dummy black images as placeholders (e.g., the `garden_flythrough` scene). The general data structure follows
```
<data_path>/
├── scene_1/
│   ...
```
```
python demo.py \
    ...
    --video_save_fps 10
```
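Before launching `demo.py`, a scene folder can be sanity-checked against the structure described above. A minimal sketch — `validate_scene` is a hypothetical helper for illustration; the real parsing is done by `ReconfusionParser` in `seva/data_io.py`:

```python
import json
from pathlib import Path


def validate_scene(scene_dir: str) -> dict:
    """Hypothetical sanity check mirroring the documented scene layout."""
    scene = Path(scene_dir)
    transforms = scene / "transforms.json"
    if not transforms.exists():
        raise FileNotFoundError("missing transforms.json (intrinsics/extrinsics)")
    splits = sorted(scene.glob("train_test_split_*.json"))
    if not splits:
        raise FileNotFoundError("missing train_test_split_*.json")
    meta = json.loads(transforms.read_text())
    # The `*` in each split filename encodes the number of input views.
    num_inputs = [int(p.stem.rsplit("_", 1)[-1]) for p in splits]
    return {"num_frames": len(meta.get("frames", [])), "num_inputs": num_inputs}
```

When several `num_inputs` values are returned, `--num_inputs <P>` selects among them.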
- `--num_inputs <P>` is only necessary if there are multiple `train_test_split_*.json` files in the scene folder.
- The above command works for datasets without a trajectory prior (e.g., DL3DV-140). When a trajectory prior is available for a benchmarking dataset (for example, the `orbit` trajectory prior for the CO3D dataset), we use the `nearest-gt` chunking strategy by setting `--use_traj_prior True --traj_prior orbit --chunk_strategy nearest-gt`. We find this leads to more 3D-consistent results.
- For all single-view conditioning test scenarios, we set `--camera_scale <camera_scale>`, with `<camera_scale>` sweeping 20 camera scales: `0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0`.
- In the single-view regime for the RealEstate10K dataset, we find increasing `cfg` helpful: we additionally set `--cfg 6.0` (`cfg` is `2.0` by default).
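The camera-scale sweep above can be scripted. A minimal sketch — only `--camera_scale` is taken from this guide; the remaining task-specific flags must be appended from your own command:

```python
# Generate the 20 camera scales 0.1, 0.2, ..., 2.0 without float drift.
scales = [round(0.1 * i, 1) for i in range(1, 21)]

# One demo.py invocation per scale; append your other flags before running.
commands = [f"python demo.py --camera_scale {s}" for s in scales]
```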
```
python demo.py \
    ...
    --replace_or_include_input True
```
- `--num_inputs <P>` is only necessary if there are multiple `train_test_split_*.json` files in the scene folder.
- We use the `interp` chunking strategy by default.
- For the evaluation on the ViewCrafter split (including the RealEstate10K, CO3D, and Tanks and Temples datasets), we find that zero-shot extending `T` to `25`, so that all input and target views fit in one forward pass, works better. This split also uses the original image resolutions, so we set `--T 25 --L_short 576`.
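To build intuition for why extending `T` can fit all views in one forward pass: each forward holds at most `T` frames, so target views that do not fit are split across chunks that all share the input views. The sketch below is a simplified greedy illustration only; the real `interp`/`interp-gt`/`nearest-gt` strategies in `demo.py` are more involved.

```python
def chunk_targets(input_ids, target_ids, T):
    """Greedy illustration: every chunk carries all input views plus as
    many target views as still fit within the T-frame budget."""
    budget = T - len(input_ids)
    if budget <= 0:
        raise ValueError("T must exceed the number of input views")
    return [
        input_ids + target_ids[i : i + budget]
        for i in range(0, len(target_ids), budget)
    ]
```

With `T=25`, a scene with, say, 3 input views and 20 target views fits in a single chunk, matching the zero-shot extension described above.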
```
python demo.py \
    ...
    --chunk_strategy interp-gt
```
- `--num_inputs <P>` is only necessary if there are multiple `train_test_split_*.json` files in the scene folder.
- The default `cfg` should be set to `3,2` (`3` being the `cfg` for the first pass and `2` the `cfg` for the second pass). Try increasing the first-pass `cfg` from `3` to higher values if you observe blurry areas (this usually happens for harder scenes with a fair amount of unseen regions).
- The default chunking strategy should be set to `interp-gt` (instead of `interp`; `interp` can work but is usually a bit worse).
- `--chunk_strategy_first_pass` is set to `gt-nearest` by default, so it can automatically adapt when $P$ is large (up to a thousand frames).
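The comma form of `--cfg` can be read as a per-pass pair. A minimal sketch of a parser mirroring the `3,2` convention above — `parse_cfg` is hypothetical, and the single-value fallback (one value applying to both passes) is an assumption:

```python
def parse_cfg(arg: str) -> tuple[float, float]:
    """Split "first,second" into per-pass cfg values; a lone value is
    assumed (here) to apply to both passes."""
    parts = [float(p) for p in arg.split(",")]
    return (parts[0], parts[1]) if len(parts) > 1 else (parts[0], parts[0])
```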
```
python demo.py \
    ...
    --chunk_strategy interp
```
- `--num_inputs <P>` is only necessary if there are multiple `train_test_split_*.json` files in the scene folder.
- The default `cfg` should be set to `3`.
- The default chunking strategy should be set to `interp` (`interp-gt` is also supported, but the results do not look as good).
- `T` can be overwritten by `--T <N>,21` (`<N>` being the extended `T` for the first pass and `21` the default `T` for the second pass). `<N>` is decided dynamically in the code but can also be updated manually. This is useful when you observe two very dissimilar adjacent anchors that make the interpolation in the second pass impossible. There are two ways: