Merge pull request #839 from cmu-delphi/add_more_details_to_quidel_doc

krivard · web-flow · commit aba49137e323 · 2022-02-17T16:01:01.000-05:00
Update Quidel doc
diff --git a/docs/api/covidcast-signals/quidel.md b/docs/api/covidcast-signals/quidel.md
@@ -20,7 +20,7 @@ grand_parent: COVIDcast Epidata API
 * **Earliest issue available:** July 29, 2020 
 * **Number of data revisions since May 19, 2020:** 1
 * **Date of last change:** October 22, 2020
-* **Available for:** hrr, msa, state (see [geography coding docs](../covidcast_geography.md))
+* **Available for:** county, hrr, msa, state, HHS, nation (see [geography coding docs](../covidcast_geography.md))
 * **Time type:** day (see [date format docs](../covidcast_times.md))
 * **License:** [CC BY](../covidcast_licensing.md#creative-commons-attribution)
 
@@ -68,60 +68,66 @@ $$
 p = \frac{100 x}{n}
 $$
 
-We estimate p across 3 temporal-spatial aggregation schemes:
+We estimate p across 6 temporal-spatial aggregation schemes:
+- daily, at the county level;
 - daily, at the MSA (metropolitan statistical area) level;
 - daily, at the HRR (hospital referral region) level;
-- daily, at the state level.
+- daily, at the state level;
+- daily, at the HHS (Department of Health and Human Services) region level;
+- daily, at the US national level.
 
-**MSA and HRR levels**: In a given MSA or HRR, suppose $$N$$ COVID tests are taken
-in a certain time period, $$X$$ is the number of tests taken with positive
-results. 
+#### Standard Error
 
-For raw signals:
-- if $$N \geq 50$$, we simply use:
+We assume the estimates for each time point follow a binomial distribution. The
+estimated standard error then is:
 
 $$
-p = \frac{100 X}{N}
+\text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} } 
 $$
 
-For smoothed signals, before taking the temporal pooling average,
-- if $$N \geq 50$$, we also use:
+#### Smoothing
+
+We apply two kinds of smoothing to produce the smoothed signals:
+
+##### Temporal Smoothing
+Smoothed estimates are formed by pooling data over time. That is, daily, for
+each location, we first pool all data available in that location over the last 7
+days, and we then recompute everything described in the two subsections above. 
+
+Pooling in this way makes estimates available in more geographic areas, as many areas 
+report very few tests per day, but have enough data to report when 7 days are considered.
+
+##### Geographical Smoothing
+
+**County, MSA, and HRR levels**: In a given county, MSA, or HRR, suppose $$N$$ COVID tests
+are taken in a certain time period and $$X$$ is the number of those tests with positive
+results.
+
+
+For smoothed signals, after temporal pooling,
+- if $$N \geq 50$$, we still use:
 $$
 p = \frac{100 X}{N}
 $$
-- if $$25 \leq N < 50$$, we lend $$50 - N$$ fake samples from its home state to shrink the
+- if $$25 \leq N < 50$$, we borrow $$50 - N$$ fake samples from its parent state to shrink the
 estimate to the state's mean, which means:
 $$
 p = 100 \left( \frac{N}{50} \frac{X}{N} + \frac{50 - N}{50}  \frac{X_s}{N_s} \right) 
 $$
 where $$N_s, X_s$$ are the number of COVID tests and the number of COVID tests
-taken with positive results taken in its home state in the same time period.
+with positive results in its parent state in the same time period.
+The parent state is defined as the state containing the largest proportion of the population
+of the county, MSA, or HRR.
 
-**State level**: the states with fewer than 50 tests are discarded. For the
-rest of the states with sufficient samples,
+Counties with sample sizes smaller than 50 are merged into megacounties for 
+the raw signals; counties with sample sizes smaller than 25 are merged into megacounties for
+the smoothed signals.
 
+**State, HHS, and national levels**: locations with fewer than 50 tests are discarded. For the remaining locations,
 $$
 p = \frac{100 X}{N}
 $$
 
-#### Standard Error
-
-We assume the estimates for each time point follow a binomial distribution. The
-estimated standard error then is:
-
-$$
-\text{se} = 100 \sqrt{ \frac{\frac{p}{100}(1- \frac{p}{100})}{N} } 
-$$
-
-#### Smoothing
-
-Smoothed estimates are formed by pooling data over time. That is, daily, for
-each location, we first pool all data available in that location over the last 7
-days, and we then recompute everything described in the last two
-subsections. Pooling in this way makes estimates available in more geographic
-areas, as many areas report very few tests per day, but have enough data to
-report when 7 days are considered.
-
 ### Lag and Backfill
 
 Because testing centers may report their data to Quidel several days after they
@@ -142,13 +148,13 @@ This data source is based on data provided to us by a lab testing company. They
 
 ### Missingness
 
-When fewer than 50 tests are reported in a state on a specific day, no data is
+When fewer than 50 tests are reported in a state, an HHS region, or the US as a whole on a specific day, no data is
 reported for that area on that day; an API query for all reported states on that
 day will not include it.
 
-When fewer than 50 tests are reported in an HRR or MSA on a specific day, and
-not enough samples can be filled in from the parent state, no data is reported
-for that area on that day; an API query for all reported geographic areas on
+When fewer than 50 tests are reported in a county, HRR, or MSA on a specific day and, for smoothed signals
+specifically, not enough samples can be filled in from the parent state,
+no data is reported for that area on that day; an API query for all reported geographic areas on
 that day will not include it.
 
 ## Flu Tests
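
For reviewers, the estimation rules this change documents can be sketched as below. This is a minimal illustration, not the indicator's actual code. The function name `percent_positive`, its parameters, and the constants `MIN_OBS` and `POOL_MIN` are hypothetical names chosen for this example. The standard error uses the location's own $$N$$, as in the formula in the doc; whether the pipeline uses a different effective sample size after borrowing from the parent state is not stated here, so treat that as an assumption.

```python
import math
from typing import Optional, Tuple

MIN_OBS = 50   # minimum sample size for reporting without pooling (from the doc)
POOL_MIN = 25  # minimum sample size eligible for parent-state pooling (from the doc)


def percent_positive(
    x: int,
    n: int,
    x_state: Optional[int] = None,
    n_state: Optional[int] = None,
    smoothed: bool = False,
) -> Optional[Tuple[float, float]]:
    """Return (p, se) in percent, or None when the location is not reported.

    x, n             positive tests and total tests in the location
    x_state, n_state the same counts for the parent state; only used for
                     smoothed county/MSA/HRR estimates (hypothetical inputs)
    """
    if n >= MIN_OBS:
        # Enough samples: plain percentage.
        p = 100 * x / n
    elif smoothed and n >= POOL_MIN and x_state is not None and n_state:
        # 25 <= N < 50: borrow (50 - N) samples from the parent state,
        # shrinking the estimate toward the state's mean.
        p = 100 * (
            (n / MIN_OBS) * (x / n)
            + ((MIN_OBS - n) / MIN_OBS) * (x_state / n_state)
        )
    else:
        # Too few samples: no value is reported.
        return None

    # Binomial standard error, using the location's own N (an assumption).
    se = 100 * math.sqrt((p / 100) * (1 - p / 100) / n)
    return p, se
```

For state, HHS, and national estimates, call the function without parent-state counts, so any location with fewer than 50 tests returns `None`.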