
Showing posts from September, 2022

September BA Update

President Biden says the pandemic is over. Maybe, but roughly 400 Americans are still dying each day from COVID-19, and excess deaths remain 10% above "expected." It seems SARS-CoV-2 is not done with us. See the wonderful Your Local Epidemiologist blog for details. In fact, I would like to put in a plug for Katelyn Jetelina's blog. It's one of the best sources I have encountered for explanations of the current epidemiological situation. On September 13, 2022, I downloaded a metadata file from GISAID containing 13,061,086 rows and 22 columns. For this post, I take a brief look at the progress of the BA variants in the US. It's hard to know whether the GISAID data gives a representative sample of the COVID-19 situation; as with other data sources, the actual numbers are likely underreported. Still, the data you have is the data that you have. Here's what the data shows for variants of concern. As we have seen before the Omicron variant is firm...
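The kind of BA-lineage counting described above can be sketched in a few lines of pandas. This is a minimal illustration, not the post's actual code: the column names (`date`, `pango_lineage`) and the tiny in-memory stand-in for the 13-million-row GISAID metadata file are assumptions.

```python
import io
import pandas as pd

# Tiny synthetic stand-in for the GISAID metadata download (the real
# file has 13,061,086 rows and 22 columns).  The column names below
# are assumptions -- check the header of your own download.
tsv = io.StringIO(
    "date\tpango_lineage\n"
    "2022-08-03\tBA.5\n"
    "2022-08-19\tBA.4.6\n"
    "2022-09-01\tBA.5\n"
    "2022-09-02\tBA.2.75\n"
)

meta = pd.read_csv(tsv, sep="\t", parse_dates=["date"])

# Keep the BA.* lineages and count them per month.
ba = meta[meta["pango_lineage"].str.startswith("BA")]
counts = (
    ba.groupby([ba["date"].dt.to_period("M"), "pango_lineage"])
      .size()
      .unstack(fill_value=0)
)
print(counts)
```

On the full metadata file the same `groupby`/`unstack` pattern gives a month-by-lineage table ready for plotting.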

Speed Demons

This post describes a little experiment in loading a large file with Julia, Python, and R. On September 13, 2022, I downloaded a metadata file from GISAID. The file has 13,061,086 rows and 22 columns. As files for a PC go, it's big. My usual approach to analyzing GISAID data is a combination of R for counting and plotting variables such as lineages over time, and Python, particularly BioPython, for manipulating sequence data. When using R, I tend to use the tidyverse tools for manipulating tabular data. Transforming dataframes by piping them through functions seems like a natural approach. Julia, on the other hand, is often fast enough that writing simple loops to manipulate data is feasible and can lead to simpler, more readable code. Loading a file with more than 13 million rows is slow in R. I wondered if Julia or Python/Pandas could do better. What follows is an unscientific exercise in reading a large tab-delimited file. All the tests were run on a PC with a...
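The Python/pandas leg of an experiment like this can be sketched as below. This is a hedged illustration, not the post's benchmark: it writes a small synthetic tab-delimited file (scale the row count up toward 13 million for a real test) and times pandas reading it back.

```python
import csv
import os
import tempfile
import time

import pandas as pd

# Write a synthetic TSV to stand in for the GISAID metadata file.
# Column names here are illustrative assumptions.
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, newline=""
) as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["strain", "date", "pango_lineage"])
    for i in range(100_000):  # increase toward 13M for a realistic timing
        writer.writerow([f"hCoV-19/{i}", "2022-09-13", "BA.5"])
    path = f.name

# Time the read.  engine="pyarrow" is often faster still, if installed.
start = time.perf_counter()
df = pd.read_csv(path, sep="\t")
elapsed = time.perf_counter() - start

print(f"Read {len(df):,} rows in {elapsed:.2f} s")
os.remove(path)
```

Timings from a toy run like this are, of course, as unscientific as the post admits; disk caching, the parser engine, and dtype inference all move the numbers.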