COVID-19 Variants and Chunking
On November 22, 2022, I downloaded a metadata file containing 13,932,236 records from GISAID to track the growth of several emerging SARS-CoV-2 variants of interest.

First, the results. I plotted the growth of the variants BA.2.75, BQ.1, BQ.1.1, BQ.1.18, CL.1, and XBB in the USA from June 1, 2022, to the present. BQ.1 and BQ.1.1 are growing in the data. It is not yet clear whether they will have a large impact; we'll see after the Thanksgiving holiday.

Memory Issues

The metadata file is large: 9.8 GB on disk. Loading it into R on Windows takes about two and a half minutes (146 seconds elapsed), and the resulting data frame uses about 8 GB of memory. For the purposes of demonstration, I'm using Windows 10 22H2, RStudio 2022.07.2, and R 4.2.2.

> system.time(meta_data <- read_tsv('data/metadata.tsv', name_repair = 'universal'))
   user  system elapsed
 197.84    9.44  146.04
> object.size(meta_data)
8087686552 bytes

RStudio uses 9.6 GB once the data is loaded.

PS...
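One way to avoid holding all 13.9 million rows in memory is to stream the file in chunks and keep only the aggregates you need. Below is a minimal base-R sketch of that pattern, not necessarily how the plot above was produced.

It rests on several assumptions:

- `count_lineages_chunked()` is a hypothetical helper.
- The column names `Location` and `Pango lineage` are assumed GISAID metadata headers (before `name_repair`).
- The `North America / USA` location prefix is also assumed.

readr's `read_tsv_chunked()` supports the same chunk-at-a-time pattern with a faster parser.

```r
# Hypothetical helper: count sequences per Pango lineage for one country by
# streaming a large TSV in chunks instead of loading it all at once.
# ASSUMPTIONS: the file has "Location" and "Pango lineage" columns, and USA
# records have locations starting with "North America / USA".
count_lineages_chunked <- function(path, lineages,
                                   location_prefix = "North America / USA",
                                   chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # Read the header line once and keep the original column names
  header <- strsplit(readLines(con, n = 1L), "\t", fixed = TRUE)[[1]]

  # One running total per requested lineage, starting at zero
  counts <- setNames(integer(length(lineages)), lineages)

  repeat {
    # Pull the next block of raw lines; an empty result means end of file
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break

    # Parse only this block; quote = "" avoids surprises from stray quotes
    chunk <- read.delim(text = lines, header = FALSE, col.names = header,
                        check.names = FALSE, quote = "",
                        colClasses = "character")

    # Keep rows whose location starts with the assumed USA prefix
    loc <- chunk[["Location"]]
    usa <- !is.na(loc) & startsWith(loc, location_prefix)

    # Tabulate lineages of interest; other lineages become NA and are dropped
    tab <- table(factor(chunk[["Pango lineage"]][usa], levels = lineages))
    counts <- counts + as.integer(tab)
  }
  counts
}
```

Only one chunk (up to `chunk_size` lines) is parsed at a time, so peak memory grows with the chunk size rather than with the full row count. The trade-off is that base `read.delim()` is typically slower than `read_tsv()`, and per-week counts for a growth plot would need an extra grouping on the collection date.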