The omicron variant of concern (VOC) has been circulating worldwide for about two months. I thought it would be interesting to compare its growth rate with that of the currently predominant delta variant.
On December 29, 2021, I downloaded metadata for 6,553,878 sequences from GISAID. An obvious question is whether the GISAID sequences are a representative sample of the COVID-19 situation worldwide. Here are the top ten countries by number of sequences submitted:
Country | Count
--- | ---
USA | 2077490
United Kingdom | 1586855
Germany | 309670
Denmark | 278104
Canada | 230413
Japan | 185716
France | 174125
Sweden | 138151
Switzerland | 102008
India | 99823
```r
library(dplyr)
library(tidyr)

# Location is "Region / Country / State / ..."; count sequences per country
temp <- meta_data %>%
  select(Location) %>%
  separate(Location, c("Region", "Country", "State"), sep = "\\s*/\\s*",
           extra = "drop", fill = "right") %>%
  count(Country, name = "Count")
arrange(temp, desc(Count)) %>% head(10)
```
The USA and the United Kingdom account for about 56% of all sequences. For the most part, the top ten are WEIRD countries (Western, Educated, Industrialized, Rich and Democratic). Note the absence of China. That's the data we have, so that's the data we use, even though it's not representative of the whole world.
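The 56% figure is simply the USA and United Kingdom counts from the table divided by the 6,553,878 sequences in the download. A quick check in R (the variable names here are just for illustration):

```r
# Counts for the two largest submitters, taken from the table,
# and the total number of sequences in the December 29, 2021 download.
usa_uk <- c(USA = 2077490, `United Kingdom` = 1586855)
total  <- 6553878

share <- sum(usa_uk) / total
round(100 * share, 1)  # about 55.9 percent
```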
Here are plots of the log counts for each variant of concern (VOC) and variant of interest (VOI), for the entire data set and for the USA.
Exponential Growth
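The regressions in this section assume that the daily count of sequences for a variant grows exponentially, so its natural logarithm (R's `log()`) is linear in the collection date. Under that model the fitted slope r is the daily growth rate, and the doubling time T_d follows directly:

```latex
\begin{aligned}
C(t) &= C_0 \, e^{r t} \\
\log C(t) &= \log C_0 + r\,t \\
C(t + T_d) = 2\,C(t) \;&\Longrightarrow\; e^{r T_d} = 2 \;\Longrightarrow\; T_d = \frac{\ln 2}{r}
\end{aligned}
```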
```r
delta_fit <- subsequence_fit(variants_2021_12_19, "VOC Delta", start = "2021-04-01", end = "2021-08-01", title = "Delta")
```
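The `subsequence_fit()` helper itself is not shown. Judging only from the `lm(formula = log(Count) ~ Collection.date, data = df)` call in the summaries, it restricts one variant to a date window and regresses log counts on date. The following is a hypothetical sketch of that idea, not the author's function: the name `subsequence_fit_sketch` and the input columns `Collection.date` (Date), `Variant` (character) and `Count` (sequences per day) are assumptions, and the plotting implied by the `title` argument is omitted.

```r
# HYPOTHETICAL helper: the author's subsequence_fit() is not shown in the post.
# Assumed input: one row per collection date and variant, with columns
# Collection.date (Date), Variant (character) and Count (sequences that day).
subsequence_fit_sketch <- function(variants, variant, start, end) {
  in_window <- variants$Variant == variant &
    variants$Collection.date >= as.Date(start) &
    variants$Collection.date <= as.Date(end)
  # Drop zero counts: log(0) is -Inf and would break the regression.
  df <- variants[in_window & variants$Count > 0, ]
  # Exponential growth is linear on the natural-log scale:
  # log(Count) = a + r * Collection.date, with r the daily growth rate.
  fit <- lm(log(Count) ~ Collection.date, data = df)
  list(data = df, fit = fit)
}

# Usage on synthetic data whose counts double every 5 days.
dates <- seq(as.Date("2021-11-01"), by = "day", length.out = 30)
toy <- data.frame(Collection.date = dates, Variant = "Toy",
                  Count = round(10 * 2^(seq_along(dates) / 5)))
res <- subsequence_fit_sketch(toy, "Toy", "2021-11-01", "2021-11-30")
log(2) / coef(res$fit)[["Collection.date"]]  # close to 5 days
```

Returning both the filtered data and the fit mirrors how the post accesses `delta_fit$fit`. Rounding the synthetic counts adds a little noise, so the recovered doubling time is close to, but not exactly, 5 days.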
```
> summary(delta_fit$fit)

Call:
lm(formula = log(Count) ~ Collection.date, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8220 -0.1270  0.0498  0.1784  0.5980 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -7.257e+02  1.296e+01  -55.99   <2e-16 ***
Collection.date  3.906e-02  6.902e-04   56.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2718 on 121 degrees of freedom
Multiple R-squared:  0.9636,  Adjusted R-squared:  0.9633 
F-statistic:  3202 on 1 and 121 DF,  p-value: < 2.2e-16
```
```
> summary(omicron_fit$fit)

Call:
lm(formula = log(Count) ~ Collection.date, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.20756 -0.17818  0.06492  0.28299  1.08133 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -4.024e+03  1.228e+02  -32.76   <2e-16 ***
Collection.date  2.125e-01  6.479e-03   32.80   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4839 on 38 degrees of freedom
Multiple R-squared:  0.9659,  Adjusted R-squared:  0.965 
F-statistic:  1076 on 1 and 38 DF,  p-value: < 2.2e-16
```
```
> log(2)/delta_fit$fit$coefficients[2]
Collection.date 
        17.7457 
> log(2)/omicron_fit$fit$coefficients[2]
Collection.date 
       3.261681 
```
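The doubling times are ln(2) divided by each fitted slope on `Collection.date`. As a sanity check, here they are recomputed from the rounded slopes printed in the two summaries (the vector name `r` is just for illustration); any tiny difference from the output comes only from that rounding. Keep in mind that the two fits cover different date windows (delta from April 1 to August 1, 2021; the omicron window is not shown).

```r
# Daily growth rates r (slopes on Collection.date) copied from the
# rounded estimates in the two lm summaries.
r <- c(delta = 3.906e-02, omicron = 2.125e-01)

# Doubling time in days: T_d = ln(2) / r (log() is the natural log in R).
doubling_days <- log(2) / r
doubling_days  # delta about 17.75, omicron about 3.26

# How many times faster omicron's log-count grows per day than delta's.
r[["omicron"]] / r[["delta"]]  # about 5.44
```

By this measure omicron's daily growth rate is roughly 5.4 times delta's, which is why its doubling time is about 3.3 days versus about 17.7 days for delta.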