make save_output_files create files with all comment lines at the start #1112

@ChrisHIV

Preprocessing with grep -v '^#' is one way around this issue, and a fix inside cmdstanr may not be simple if it depends on how these files are written during sampling. I'm raising it in case it does turn out to be simple inside cmdstanr.

The files created by save_output_files have comment lines in three places: at the start, between the column headers (parameter names) and the values, and at the end.
This is too much for data.table::fread() to handle: pending its long-awaited comment.char argument, it can only reliably skip lines that are grouped together at the start of the file. Since data.table::fread() is the go-to for huge CSV files, it would be nice if all the comment lines were put together at the start of the file, so that these files can be read as-is by fread.

Example

library(data.table)
library(cmdstanr)

code <- "
data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real m;
  real c;
  real sigma;
}
model {
  y ~ normal(m * x + c, sigma);
}
"
file <- write_stan_file(code)
model <- cmdstan_model(file)
samples <- model$sample(data = list(N = 1, x = 1, y = 1), iter_sampling = 10, iter_warmup = 10)
samples$save_output_files("~/", basename = "foo", timestamp = FALSE, random = FALSE)

df_ <- fread("~/foo-1.csv")

gives

Warning messages:
1: In fread("~/foo-1.csv") :
  Detected 3 column names but the data has 10 columns (i.e. invalid file). Added 7 extra default column names at the end.
2: In fread("~/foo-1.csv") :
  Stopped early on line 63. Expected 10 fields but found 1. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<# >>

and df_ is

         # 1        1       1    V4    V5    V6      V7        V8         V9       V10
       <num>    <num>   <num> <int> <int> <int>   <num>     <num>      <num>     <num>
 1: -4.76425 0.999885 2.74896     7   127     0 6.27136   184.340 -244.86300   95.1088
 2: -4.66059 0.999970 2.74896     6    63     0 5.03261   213.799 -171.11100   96.2315
 3: -5.09712 0.981220 2.74896     6    66     1 6.39704   166.980 -125.97500  158.4170
 4: -6.11659 0.999955 2.74896     7   127     0 6.23478   281.608   -7.43179  252.5850
 5: -6.35803 0.999807 2.74896     8   255     0 9.37554   242.192 -442.52700  538.0940
 6: -7.64953 0.999985 2.74896    10  1023     0 8.23180   444.349 -868.81500 2055.1400
 7: -7.33027 1.000000 2.74896    10  1023     0 8.37176  -251.299  922.23900 1348.7000
 8: -6.84893 1.000000 2.74896    10  1023     0 7.82562 -1589.190 1803.51000  917.7360
 9: -6.91315 0.999973 2.74896     8   511     0 7.26564 -1565.040 1840.16000  965.7080
10: -6.92680 0.999974 2.74896     9   831     0 7.81856 -2004.660 2241.34000  990.8020
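
Until the comment lines are grouped at the start, the grep-style preprocessing mentioned above can also be done in R itself. The following is only a minimal sketch: `fread_skip_comments` is a hypothetical helper name (not part of cmdstanr or data.table), and the mock file contents are made up to mimic the layout described above rather than copied from real CmdStan output. It reads the whole file into memory first, so it may not suit very large files.

```r
library(data.table)

# Hypothetical helper (an assumption, not a cmdstanr or data.table function):
# drop every line starting with '#', and any empty line, then pass the rest
# to fread() via its `text` argument.
fread_skip_comments <- function(path, ...) {
  lines <- readLines(path)
  keep <- !grepl("^#", lines) & nzchar(lines)
  fread(text = lines[keep], ...)
}

# Mock file with comments before the header, between the header and the
# values, and after the values. Contents are illustrative only.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "# model = foo",
  "lp__,m,c,sigma",
  "# Adaptation terminated",
  "-4.76,184.3,-244.8,95.1",
  "-4.66,213.8,-171.1,96.2",
  "# Elapsed Time: 0.01 seconds"
), tmp)

df_ <- fread_skip_comments(tmp)
print(df_)
```

Where a `grep` executable is available on the PATH, `fread(cmd = "grep -v '^#' ~/foo-1.csv")` should achieve the same thing without loading the file into R twice; that variant depends on the shell environment, so treat it as an untested alternative.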

Labels: feature (New feature or request)
