It's been a while since I looked at this data. I was updating my version of R and decided this would be a good exercise to test it, and also a chance to see which SARS-CoV-2 variants were on the rise.
I downloaded metadata from GISAID on April 20, 2023. The file contained records for 15,398,189 sequences. I took the Pango lineages of the top four emerging variants and used the method described here to plot the accumulation of sequences for each variant.
system.time(df_variants_usa <- get_selected_variants_ch("data/metadata.tsv", selected_variants = selected_variants, title = "Selected Variants USA 2023-04-20", start_date = as.Date("2022-09-01")))
Here's the plot.
You can find the code to produce the plot here.
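The linked code isn't reproduced in this post. As a rough illustration of what "accumulation of sequences" means, here is a minimal base R sketch. It is not the actual `get_selected_variants_ch` function: the helper name `accumulate_variants`, the column names `collection_date` and `pango_lineage`, the placeholder lineage names, and the toy rows are all assumptions made up for the example.

```r
# Toy stand-in for the metadata file. Column names, lineage names,
# and rows are hypothetical and chosen only for illustration.
toy <- data.frame(
  collection_date = as.Date(c("2022-09-01", "2022-09-01", "2022-09-02",
                              "2022-09-03", "2022-09-03", "2022-08-15")),
  pango_lineage = c("lineage_A", "lineage_B", "lineage_A",
                    "lineage_A", "lineage_B", "lineage_A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: running total of sequences per lineage by date.
accumulate_variants <- function(df, variants, start_date) {
  # Keep only the selected lineages on or after the start date
  keep <- df$pango_lineage %in% variants & df$collection_date >= start_date
  df <- df[keep, ]

  # Count sequences per (date, lineage) pair
  counts <- aggregate(n ~ collection_date + pango_lineage,
                      data = transform(df, n = 1L), FUN = sum)

  # Sort by lineage and date, then take the running total within each lineage
  counts <- counts[order(counts$pango_lineage, counts$collection_date), ]
  counts$cumulative <- ave(counts$n, counts$pango_lineage, FUN = cumsum)
  rownames(counts) <- NULL
  counts
}

result <- accumulate_variants(toy,
                              variants = c("lineage_A", "lineage_B"),
                              start_date = as.Date("2022-09-01"))
print(result)
```

Plotting `cumulative` against `collection_date`, with one line per lineage, would give an accumulation curve of the same general kind. With a file of roughly 15 million rows, reading and filtering will probably dominate the run time, so a real implementation would likely read the file in chunks or use a faster reader than this sketch assumes.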