Commit 7832ddb

Merge pull request #201 from cmu-delphi/release-v1.12.1
Release v1.12.1
2 parents 5ae973e + 38e8776 commit 7832ddb

19 files changed, +176 -99 lines changed

config.toml

Lines changed: 25 additions & 5 deletions
@@ -38,29 +38,49 @@ relativeURLs = false
 weight = 2
 [[menu.main]]
 parent = "covidcast"
-name = "Map"
+name = "Map Overview"
 url = "/covidcast"
 weight = 1
+[[menu.main]]
+parent = "covidcast"
+name = "Timelapse"
+url = "/covidcast/timelapse"
+weight = 2
+[[menu.main]]
+parent = "covidcast"
+name = "Top 10"
+url = "/covidcast/top10"
+weight = 3
+[[menu.main]]
+parent = "covidcast"
+name = "Single Region"
+url = "/covidcast/single"
+weight = 4
 [[menu.main]]
 parent = "covidcast"
 name = "Surveys"
 url = "/covidcast/surveys"
-weight = 2
+weight = 5
 [[menu.main]]
 parent = "covidcast"
 name = "Survey Results"
 url = "/covidcast/survey-results"
-weight = 3
+weight = 6
+[[menu.main]]
+parent = "covidcast"
+name = "Export Data"
+url = "/covidcast/export"
+weight = 7
 [[menu.main]]
 parent = "covidcast"
 name = "Release Log"
 url = "/covidcast/release-log"
-weight = 4
+weight = 8
 [[menu.main]]
 parent = "covidcast"
 name = "Terms Of Use"
 url = "/covidcast/terms-of-use"
-weight = 5
+weight = 9
 [[menu.main]]
 identifier = "flu"
 name = "Flu and Other Diseases"

content/blog/2020-09-21-forecast-demo.Rmd

Lines changed: 18 additions & 21 deletions
@@ -111,20 +111,16 @@ We evaluate the following four models:
 
 $$
 \begin{aligned}
-&\text{Cases:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&\text{Cases + Facebook:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&\text{Cases + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&\text{Cases + Facebook + Google:} \\
-& h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
 \sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
 \end{aligned}
@@ -134,14 +130,15 @@ Here $d=7$ or $d=14$, depending on the target value
 (number of days we predict ahead),
 and $h$ is a transformation to be specified later.
 
-Informally, the first model bases its predictions of future case rates
-on the following three features:
+Informally, the first model, which we'll call the "Cases" model,
+bases its predictions of future case rates on the following three features:
 current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, "Cases + Facebook", additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, "Cases + Google", is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, "Cases + Facebook + Google",
+uses both Facebook and Google signals.
 For each model, in order to make a forecast at time $t_0$
 (to predict case rates at time $t_0+d$),
 we fit a linear model using least absolute deviations (LAD) regression,
@@ -293,8 +290,8 @@ is much bigger but still below 0.01.
 test](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test)
 (for paired data, as we have here) is more popular,
 because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
 (in our case, the difference in scaled errors),
 whereas the sign test does not, and thus we show results from the latter.
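The models named in the hunks above ("Cases", "Cases + Facebook", and so on) are all fit by least absolute deviations (LAD) regression, i.e. quantile regression at the median, which the post implements with quantile_lasso(..., tau = 0.5, lambda = 0) from the quantgen package. As a minimal self-contained sketch of the same idea, assuming CRAN's quantreg package and synthetic data standing in for the lagged case-rate features (an illustration, not the post's code):

library(quantreg)

# Synthetic stand-ins: y plays the role of h(Y_{l,t+d}); x0, x1, x2 play the
# roles of the lagged case-rate features h(Y_{l,t}), h(Y_{l,t-7}), h(Y_{l,t-14}).
set.seed(1)
n = 500
x = matrix(rnorm(3 * n), n, 3)
y = as.numeric(1 + x %*% c(0.5, 0.3, 0.2)) + rt(n, df = 3)  # heavy-tailed noise
dat = data.frame(y = y, x0 = x[, 1], x1 = x[, 2], x2 = x[, 3])

# LAD regression is median (tau = 0.5) quantile regression: it minimizes the
# sum of absolute residuals, so it is more robust to outliers than least squares.
fit = rq(y ~ x0 + x1 + x2, tau = 0.5, data = dat)
coef(fit)

With lambda = 0, quantgen's quantile_lasso() solves the same median-regression problem via a linear program, which is why the post's fits are LAD fits.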

content/blog/2020-09-21-forecast-demo.html

Lines changed: 60 additions & 63 deletions
@@ -14,14 +14,14 @@
 summary: |
 Building on our previous two posts (on our COVID-19 symptom surveys through
 Facebook and Google)
-this post offers a deeper dive into empirical analysis, examining whether the
-% CLI-in-community indicators from our two surveys can be used to improve
+this post offers a deeper dive into empirical analysis, examining whether the
+% CLI-in-community indicators from our two surveys can be used to improve
 the accuracy of short-term forecasts of county-level COVID-19 case rates.
 acknowledgements: |
 Delphi's forecasting effort involves many people from our
-modeling team, from forecaster design, to implementation, to evaluation. The
+modeling team, from forecaster design, to implementation, to evaluation. The
 broader insights on forecasting shared in this post certainly cannot be
-attributable to Ryan's work alone, and are a reflection of the work carried out
+attributable to Ryan's work alone, and are a reflection of the work carried out
 by all these team members.
 related:
 - 2020-09-18-google-survey
@@ -120,35 +120,32 @@ <h2>Problem Setup</h2>
 We evaluate the following four models:</p>
 <p><span class="math display">\[
 \begin{aligned}
-&amp;\text{Cases:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
-&amp;\text{Cases + Facebook:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) \\
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) \\
-&amp;\text{Cases + Google:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(G_{\ell,t-7j}) \\
-&amp;\text{Cases + Facebook + Google:} \\
-&amp; h(Y_{\ell,t+d})
-\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
+h(Y_{\ell,t+d})
+&amp;\approx \alpha + \sum_{j=0}^2 \beta_j h(Y_{\ell,t-7j}) +
 \sum_{j=0}^2 \gamma_j h(F_{\ell,t-7j}) +
 \sum_{j=0}^2 \tau_j h(G_{\ell,t-7j}).
 \end{aligned}
 \]</span></p>
 <p>Here <span class="math inline">\(d=7\)</span> or <span class="math inline">\(d=14\)</span>, depending on the target value
 (number of days we predict ahead),
 and <span class="math inline">\(h\)</span> is a transformation to be specified later.</p>
-<p>Informally, the first model bases its predictions of future case rates
-on the following three features:
+<p>Informally, the first model, which we’ll call the “Cases” model,
+bases its predictions of future case rates on the following three features:
 current COVID-19 case rates, and those 1 and 2 weeks back.
-The second model additionally incorporates the current Facebook signal,
-and the Facebook signal from 1 and 2 weeks back.
-The third model is exactly same but substitutes the Google signal
-instead of the Facebook one.
-Finally, the fourth model uses both Facebook and Google signals.
+The second model, “Cases + Facebook”, additionally incorporates the
+current Facebook signal, and the Facebook signal from 1 and 2 weeks back.
+The third model, “Cases + Google”, is exactly the same but substitutes the
+Google signal instead of the Facebook one.
+Finally, the fourth model, “Cases + Facebook + Google”,
+uses both Facebook and Google signals.
 For each model, in order to make a forecast at time <span class="math inline">\(t_0\)</span>
 (to predict case rates at time <span class="math inline">\(t_0+d\)</span>),
 we fit a linear model using least absolute deviations (LAD) regression,
@@ -217,17 +214,17 @@ <h2>Forecasting Code</h2>
 as.Date(max(time_value)),
 by = &quot;day&quot;)) %&gt;% ungroup()
 df = full_join(df, df_all, by = c(&quot;geo_value&quot;, &quot;time_value&quot;))
-
+
 # Group by geo value, sort rows by increasing time
-df = df %&gt;% group_by(geo_value) %&gt;% arrange(time_value)
-
+df = df %&gt;% group_by(geo_value) %&gt;% arrange(time_value)
+
 # Load over shifts, and add lag value or lead value
 for (shift in shifts) {
 fun = ifelse(shift &lt; 0, lag, lead)
 varname = sprintf(&quot;value%+d&quot;, shift)
 df = mutate(df, !!varname := fun(value, n = abs(shift)))
 }
-
+
 # Ungroup and return
 return(ungroup(df))
 }
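The hunk above touches append_shifts(), the helper that builds lagged and led copies of a signal within each county. For readers skimming the diff, here is a hedged, self-contained sketch of that dplyr pattern on a hypothetical toy data frame (note the sketch selects the shift function with if () ... else rather than the ifelse() seen above, since base R's ifelse() is unreliable for returning functions):

library(dplyr)

# Toy stand-in for the post's append_shifts(): within each geo_value, add a
# lagged copy (shift < 0) and a led copy (shift > 0) of `value`. The post
# uses shifts in multiples of +/-7 days; +/-1 keeps the toy output readable.
df = tibble(geo_value = rep(c("pa", "ny"), each = 5),
            time_value = rep(as.Date("2020-05-01") + 0:4, times = 2),
            value = rnorm(10))

shifts = c(-1, 1)
df = df %>% group_by(geo_value) %>% arrange(time_value)
for (shift in shifts) {
  fun = if (shift < 0) lag else lead
  varname = sprintf("value%+d", shift)  # makes column names "value-1", "value+1"
  df = mutate(df, !!varname := fun(value, n = abs(shift)))
}
df = ungroup(df)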
@@ -261,40 +258,40 @@ <h2>Forecasting Code</h2>
 case_num = 200
 geo_values = covidcast_signal(&quot;jhu-csse&quot;, &quot;confirmed_cumulative_num&quot;,
 &quot;2020-05-14&quot;, &quot;2020-05-14&quot;) %&gt;%
-filter(value &gt;= case_num) %&gt;% pull(geo_value)
+filter(value &gt;= case_num) %&gt;% pull(geo_value)
 
 # Fetch county-level Google and Facebook % CLI-in-community signals, and JHU
 # confirmed case incidence proportion
 start_day = &quot;2020-04-11&quot;
 end_day = &quot;2020-09-01&quot;
 g = covidcast_signal(&quot;google-survey&quot;, &quot;smoothed_cli&quot;) %&gt;%
-filter(geo_value %in% geo_values) %&gt;%
-select(geo_value, time_value, value)
-f = covidcast_signal(&quot;fb-survey&quot;, &quot;smoothed_hh_cmnty_cli&quot;,
+filter(geo_value %in% geo_values) %&gt;%
+select(geo_value, time_value, value)
+f = covidcast_signal(&quot;fb-survey&quot;, &quot;smoothed_hh_cmnty_cli&quot;,
 start_day, end_day) %&gt;%
-filter(geo_value %in% geo_values) %&gt;%
-select(geo_value, time_value, value)
+filter(geo_value %in% geo_values) %&gt;%
+select(geo_value, time_value, value)
 c = covidcast_signal(&quot;jhu-csse&quot;, &quot;confirmed_7dav_incidence_prop&quot;,
 start_day, end_day) %&gt;%
-filter(geo_value %in% geo_values) %&gt;%
+filter(geo_value %in% geo_values) %&gt;%
 select(geo_value, time_value, value)
 
-# Find &quot;complete&quot; counties, present in all three data signals at all times
+# Find &quot;complete&quot; counties, present in all three data signals at all times
 geo_values_complete = intersect(intersect(g$geo_value, f$geo_value),
 c$geo_value)
 
-# Filter to complete counties, transform the signals, append 1-2 week lags to
+# Filter to complete counties, transform the signals, append 1-2 week lags to
 # all three, and also 1-2 week leads to case rates
-lags = 1:2 * -7
+lags = 1:2 * -7
 leads = 1:2 * 7
-g = g %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
-mutate(value = trans(value * rescale_g)) %&gt;%
-append_shifts(shifts = lags)
-f = f %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
-mutate(value = trans(value * rescale_f)) %&gt;%
-append_shifts(shifts = lags)
+g = g %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
+mutate(value = trans(value * rescale_g)) %&gt;%
+append_shifts(shifts = lags)
+f = f %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
+mutate(value = trans(value * rescale_f)) %&gt;%
+append_shifts(shifts = lags)
 c = c %&gt;% filter(geo_value %in% geo_values_complete) %&gt;%
-mutate(value = trans(value * rescale_c)) %&gt;%
+mutate(value = trans(value * rescale_c)) %&gt;%
 append_shifts(shifts = c(lags, leads))
 
 # Rename columns
@@ -310,55 +307,55 @@ <h2>Forecasting Code</h2>
 
 # Use quantgen for LAD regression (this package supports quantile regression and
 # more; you can find it on GitHub here: https://github.com/ryantibs/quantgen)
-library(quantgen)
+library(quantgen)
 
 res_list = vector(&quot;list&quot;, length = length(leads))
 
 # Loop over lead, forecast dates, build models and record errors (warning: this
 # computation takes a while)
-for (i in 1:length(leads)) {
+for (i in 1:length(leads)) {
 lead = leads[i]; if (verbose) cat(&quot;***&quot;, lead, &quot;***\n&quot;)
-
+
 # Create a data frame to store our forecast results. Code below populates its
-# rows in a way that breaks from typical dplyr operations, done for efficiency
-res_list[[i]] = z %&gt;%
-filter(between(time_value, as.Date(start_day) - min(lags) + lead,
+# rows in a way that breaks from typical dplyr operations, done for efficiency
+res_list[[i]] = z %&gt;%
+filter(between(time_value, as.Date(start_day) - min(lags) + lead,
 as.Date(end_day) - lead)) %&gt;%
 select(geo_value, time_value) %&gt;%
-mutate(err0 = as.double(NA), err1 = as.double(NA), err2 = as.double(NA),
-err3 = as.double(NA), err4 = as.double(NA), lead = lead)
+mutate(err0 = as.double(NA), err1 = as.double(NA), err2 = as.double(NA),
+err3 = as.double(NA), err4 = as.double(NA), lead = lead)
 valid_dates = unique(res_list[[i]]$time_value)
-
+
 for (k in 1:length(valid_dates)) {
 date = valid_dates[k]; if (verbose) cat(format(date), &quot;... &quot;)
-
+
 # Filter down to training set and test set
 z_tr = z %&gt;% filter(between(time_value, date - lead - n, date - lead))
 z_te = z %&gt;% filter(time_value == date)
 inds = which(res_list[[i]]$time_value == date)
-
+
 # Create training and test responses
 y_tr = z_tr %&gt;% pull(paste0(&quot;case+&quot;, lead))
 y_te = z_te %&gt;% pull(paste0(&quot;case+&quot;, lead))
-
+
 # Strawman model
 if (verbose) cat(&quot;0&quot;)
 y_hat = z_te %&gt;% pull(case)
 res_list[[i]][inds,]$err0 = abs(inv_trans(y_hat) - inv_trans(y_te))
-
+
 # Cases only model
 if (verbose) cat(&quot;1&quot;)
 x_tr_case = z_tr %&gt;% select(starts_with(&quot;case&quot;) &amp; !contains(&quot;+&quot;))
 x_te_case = z_te %&gt;% select(starts_with(&quot;case&quot;) &amp; !contains(&quot;+&quot;))
-x_tr = x_tr_case; x_te = x_te_case # For symmetry wrt what follows
+x_tr = x_tr_case; x_te = x_te_case # For symmetry wrt what follows
 ok = complete.cases(x_tr, y_tr)
 if (sum(ok) &gt; 0) {
 obj = quantile_lasso(as.matrix(x_tr[ok,]), y_tr[ok], tau = 0.5,
 lambda = 0, lp_solver = lp_solver)
 y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))
 res_list[[i]][inds,]$err1 = abs(inv_trans(y_hat) - inv_trans(y_te))
 }
-
+
 # Cases and Facebook model
 if (verbose) cat(&quot;2&quot;)
 x_tr_fb = z_tr %&gt;% select(starts_with(&quot;fb&quot;))
@@ -386,7 +383,7 @@ <h2>Forecasting Code</h2>
 y_hat = as.numeric(predict(obj, newx = as.matrix(x_te)))
 res_list[[i]][inds,]$err3 = abs(inv_trans(y_hat) - inv_trans(y_te))
 }
-
+
 # Cases, Facebook, and Google model
 if (verbose) cat(&quot;4\n&quot;)
 x_tr = cbind(x_tr_case, x_tr_fb, x_tr_goog)
@@ -401,7 +398,7 @@ <h2>Forecasting Code</h2>
 }
 }
 
-# Bind results over different leads into one big data frame, and save
+# Bind results over different leads into one big data frame, and save
 res = do.call(rbind, res_list)
 save(list = ls(), file = &quot;demo.rda&quot;)</code></pre>
 </div>
@@ -1036,8 +1033,8 @@ <h2>Wrap-Up</h2>
 test</a>
 (for paired data, as we have here) is more popular,
 because it tends to be more powerful than the sign test.
-Applied here, it does indeed give smaller p-values pretty much across the board.
-However, it assumes symmetry of the distribution in question
+Applied here, it does indeed give smaller p-values pretty much across the
+board. However, it assumes symmetry of the distribution in question
 (in our case, the difference in scaled errors),
 whereas the sign test does not, and thus we show results from the latter.<a href="#fnref2" class="footnote-back">↩︎</a></p></li>
 <li id="fn3"><p>Delphi’s “production” forecasters are still based on relatively simple

content/covidcast/_index.md

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,10 @@
 ---
 title: COVIDcast
-layout: covidcast_app
 description: COVIDcast tracks and forecasts the spread of COVID-19. By Carnegie Mellon's Delphi Research Group.
+layout: covidcast_app
+app_mode: overview
+order: 1
+modeTitle: Map Overview
+icon: solid/map
 heroImage: /images/landing-page/hero-images/covidcast_withfill.jpg
 ---

content/covidcast/export.md

Lines changed: 3 additions & 0 deletions
@@ -3,5 +3,8 @@ title: COVIDCast Export Data
 linkTitle: Export Data
 description: Use COVIDcast data in your own analysis
 layout: covidcast_app
+app_mode: export
+order: 6
+icon: solid/download
 heroImage: /images/landing-page/hero-images/covidcast_withfill.jpg
 ---
