Spring has actually arrived in upstate NY and along with it a small COVID-19 surge.
Lineages
In a previous post, I commented on the difficulty of getting a clear picture of the state of the COVID-19 pandemic from publicly available data. Sequence data and metadata from GISAID are a rich and valuable source of information about the progress of the pandemic. Unfortunately, sequence data tends to lag case data. In addition, as public testing declines, so do the opportunities to get viral material from infected individuals.
That said, I still wanted to see what kind of picture the GISAID data could provide for our local area. On April 20, 2022, I downloaded 10,298,748 records, each with 22 variables.
I searched the GISAID data collected from the five local counties for Pango lineages submitted so far during 2022. There were a total of 63 different lineages found, most with only a few counts.
Here's how I counted.
county_variants <- function(meta_data,
                            host = 'human',
                            state = 'New York',
                            counties = c('Albany', 'Columbia', 'Rensselaer', 'Saratoga', 'Schenectady'),
                            start = "2022-01-01",
                            end = NULL,
                            plot = TRUE,
                            title = NULL,
                            min_count = 10) {
  require(tidyverse)

  if(is.null(end)) {
    end <- Sys.Date()
  }

  # filter host, date range and state
  df <- meta_data %>%
    filter(Host == {{ host }}) %>%
    filter(Collection.date >= as.Date({{ start }}) & Collection.date <= as.Date({{ end }})) %>%
    filter(str_detect(Location, {{ state }})) %>%
    select(Collection.date, Pango.lineage, Location)

  df_county_variants <- data.frame()

  for(county in counties) {
    county_name <- paste(county, "County")

    # get county data and count Pango lineages for the county.
    df_county <- df %>%
      filter(str_detect(Location, {{ county_name }})) %>%
      select(Collection.date, Pango.lineage) %>%
      group_by(Pango.lineage) %>%
      count() %>%
      rename(Count = n) %>%
      mutate(County = {{ county }})

    df_county_variants <- bind_rows(df_county_variants, df_county)
  }

  if(plot) {
    # plot counts > min_count
    p <- df_county_variants %>%
      filter(Count > {{ min_count }}) %>%
      ggplot() +
      geom_bar(aes(x = Pango.lineage, y = Count, fill = County), stat = 'identity', width = 0.5) +
      labs(caption = paste('Minimum Count =', {{ min_count }})) +
      xlab('Pango Lineage')

    if(! is.null(title)) {
      p <- p + ggtitle(title)
    }

    print(p)
  }

  return(df_county_variants)
}
The code can be downloaded from GitHub.
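As a rough sketch of how the returned data frame could be summarized (for example, to get the number of distinct lineages and to see which ones have only a few counts), something like the following might work. The toy data, the variable names `n_lineages`, `lineage_totals` and `rare_lineages`, and the threshold of 10 are all assumptions made for illustration; they are not taken from the GISAID data or from the original code.

```r
library(dplyr)

# Toy stand-in for the data frame returned by county_variants():
# one row per (Pango.lineage, County) pair with its Count.
# The lineage names and counts are made up for illustration only.
df_county_variants <- tibble(
  Pango.lineage = c("BA.1", "BA.1.1", "BA.2", "BA.1", "BA.2", "BA.2.12.1"),
  Count         = c(40L, 25L, 12L, 30L, 8L, 3L),
  County        = c("Albany", "Albany", "Albany",
                    "Saratoga", "Saratoga", "Saratoga")
)

# Number of distinct lineages seen across all counties.
n_lineages <- n_distinct(df_county_variants$Pango.lineage)

# Total count per lineage, summed over counties, largest first.
lineage_totals <- df_county_variants %>%
  group_by(Pango.lineage) %>%
  summarise(Total = sum(Count), .groups = "drop") %>%
  arrange(desc(Total))

# Lineages at or below an assumed low-count threshold of 10.
rare_lineages <- lineage_totals %>%
  filter(Total <= 10)

print(n_lineages)
print(lineage_totals)
print(rare_lineages)
```

With real output from `county_variants()`, the same three steps would apply unchanged, since the function returns the `Pango.lineage`, `Count` and `County` columns used here.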