@@ -20,7 +20,7 @@ grand_parent: COVIDcast Epidata API
2020* **Earliest issue available:** July 29, 2020
2121* **Number of data revisions since May 19, 2020:** 1
2222* **Date of last change:** October 22, 2020
23- * **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
23+ * **Available for:** county, hrr, msa, state, HHS, nation (see [geography coding docs](../covidcast_geography.md))
2424* **Time type:** day (see [date format docs](../covidcast_times.md))
2525* **License:** [CC BY](../covidcast_licensing.md#creative-commons-attribution)
2626
6868p = \frac{100 x}{n}
6969$$
7070
71- We estimate p across 3 temporal-spatial aggregation schemes:
71+ We estimate p across 6 temporal-spatial aggregation schemes:
72+ - daily, at the county level;
7273- daily, at the MSA (metropolitan statistical area) level;
7374- daily, at the HRR (hospital referral region) level;
74- - daily, at the state level.
75+ - daily, at the state level;
76+ - daily, at the HHS (Department of Health and Human Services) region level;
77+ - daily, at the US national level.
7578
76- ** MSA and HRR levels** : In a given MSA or HRR, suppose $$ N $$ COVID tests are taken
77- in a certain time period, $$ X $$ is the number of tests taken with positive
78- results.
79+ #### Standard Error
7980
80- For raw signals:
81- - if $$ N \geq 50 $$ , we simply use :
81+ We assume the estimates for each time point follow a binomial distribution. The
82+ estimated standard error then is:
8283
8384$$
84- p = \frac{100 X}{N}
85+ \text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} }
8586$$
8687
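As a rough illustration of the standard error formula above, the following Python sketch computes the binomial standard error in percentage points. The function name `binomial_se_percent` is hypothetical and is not taken from the pipeline's code; it only restates the formula.

```python
import math


def binomial_se_percent(p, n):
    """Binomial standard error, in percentage points.

    p is the estimated percentage (0 to 100) and n is the number of tests.
    Hypothetical helper for illustration; it mirrors
    se = 100 * sqrt((p/100) * (1 - p/100) / n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    frac = p / 100.0
    return 100.0 * math.sqrt(frac * (1.0 - frac) / n)
```

For example, with `p = 50` and `n = 100`, the formula gives a standard error of 5 percentage points.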
87- For smoothed signals, before taking the temporal pooling average,
88- - if $$ N \geq 50 $$ , we also use:
88+ #### Smoothing
89+
90+ We apply two kinds of smoothing to produce the smoothed signals:
91+
92+ ##### Temporal Smoothing
93+ Smoothed estimates are formed by pooling data over time. That is, daily, for
94+ each location, we first pool all data available in that location over the last 7
95+ days, and we then recompute everything described in the two subsections above.
96+
97+ Pooling in this way makes estimates available in more geographic areas, as many areas
98+ report very few tests per day, but have enough data to report when 7 days are considered.
99+
100+ ##### Geographical Smoothing
101+
102+ **County, MSA, and HRR levels**: In a given county, MSA, or HRR, suppose $$ N $$ COVID tests
103+ are taken in a certain time period, and $$ X $$ of those tests return positive
104+ results.
105+
106+
107+ For smoothed signals, after temporal pooling,
108+ - if $$ N \geq 50 $$, we still use:
89109$$
90110p = \frac{100 X}{N}
91111$$
92- - if $$ 25 \leq N < 50 $$ , we lend $$ 50 - N $$ fake samples from its home state to shrink the
112+ - if $$ 25 \leq N < 50 $$, we borrow $$ 50 - N $$ fake samples from its parent state to shrink the
93113estimate to the state's mean, which means:
94114$$
95115p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50} \frac{X_s}{N_s} \right)
96116$$
97117where $$ N_s, X_s $$ are the number of COVID tests and the number of COVID tests
98- taken with positive results taken in its home state in the same time period.
118+ with positive results taken in its parent state in the same time period.
119+ The parent state is defined as the state containing the largest proportion of the population
120+ of this county, MSA, or HRR.
99121
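The two cases above can be sketched in Python as follows. This is a hedged illustration rather than the pipeline's code: the function name `smoothed_percent` and the constant `MIN_OBS` are hypothetical, and returning `None` when $$ N < 25 $$ (or when the parent state has no tests) is an assumption about how unreportable estimates are represented.

```python
MIN_OBS = 50  # sample size at which no borrowing is needed (from the text)


def smoothed_percent(x, n, x_state, n_state):
    """Percent positive for a county/MSA/HRR after geographical smoothing.

    x, n: positive tests and total tests in the location (after pooling).
    x_state, n_state: the same counts for the parent state.
    Returns None when no estimate is produced (an assumption for illustration).
    """
    if n >= MIN_OBS:
        return 100.0 * x / n
    if n >= MIN_OBS / 2:
        if n_state <= 0:
            return None
        weight = n / MIN_OBS  # share of the estimate from local data
        return 100.0 * (weight * x / n + (1.0 - weight) * x_state / n_state)
    return None
```

With 25 local tests, the local rate and the parent state's rate each receive weight 0.5; as $$ N $$ approaches 50, the weight on the state's rate goes to zero and the estimate matches the unsmoothed formula.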
100- ** State level** : the states with fewer than 50 tests are discarded. For the
101- rest of the states with sufficient samples,
122+ Counties with sample sizes smaller than 50 are merged into megacounties for
123+ the raw signals; counties with sample sizes smaller than 25 are merged into megacounties for
124+ the smoothed signals.
102125
126+ **State, HHS, and national levels**: locations with fewer than 50 tests are discarded. For the remaining locations,
103127$$
104128p = \frac{100 X}{N}
105129$$
106130
107- #### Standard Error
108-
109- We assume the estimates for each time point follow a binomial distribution. The
110- estimated standard error then is:
111-
112- $$
113- \text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} }
114- $$
115-
116- #### Smoothing
117-
118- Smoothed estimates are formed by pooling data over time. That is, daily, for
119- each location, we first pool all data available in that location over the last 7
120- days, and we then recompute everything described in the last two
121- subsections. Pooling in this way makes estimates available in more geographic
122- areas, as many areas report very few tests per day, but have enough data to
123- report when 7 days are considered.
124-
125131### Lag and Backfill
126132
127133Because testing centers may report their data to Quidel several days after they
@@ -142,13 +148,13 @@ This data source is based on data provided to us by a lab testing company. They
142148
143149### Missingness
144150
145- When fewer than 50 tests are reported in a state on a specific day, no data is
151+ When fewer than 50 tests are reported in a state, an HHS region, or the US as a whole on a specific day, no data is
146152reported for that area on that day; an API query for all reported states on that
147153day will not include it.
148154
149- When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
150- not enough samples can be filled in from the parent state, no data is reported
151- for that area on that day; an API query for all reported geographic areas on
155+ When fewer than 50 tests are reported in a county, HRR, or MSA on a specific day and,
156+ for smoothed signals specifically, not enough samples can be filled in from the parent state,
157+ no data is reported for that area on that day; an API query for all reported geographic areas on
152158that day will not include it.
153159
154160## Flu Tests