Data Wrangling with Julia, R, and Python
What we have is a data glut.
~ Vernor Vinge

It is a capital mistake to theorize before one has data.
~ Sherlock Holmes

GISAID provides a metafile describing the contents of its genome sequence files. The metafile is a tab-delimited table. The version I downloaded on April 15, 2021 contains 1,107,140 records, each with 22 columns. The data columns are:

 [1] "Virus.name"        "Type"              "Accession.ID"
 [4] "Collection.date"   "Location"          "Additional.location.information"
 [7] "Sequence.length"   "Host"              "Patient.age"
[10] "Gender"            "Clade"             "Pango.lineage"
[13] "Pangolin.version"  "Variant"           ...
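Before loading a table of this size into a data frame, you can check its record and column counts by streaming the file once. The sketch below is a minimal illustration that uses Python's standard `csv` module rather than a data-frame library. It assumes the first row is a header. The function name `summarize_tsv` is hypothetical, and the toy rows are placeholders, not GISAID data.

```python
import csv
import io
from typing import List, TextIO, Tuple


def summarize_tsv(handle: TextIO) -> Tuple[int, List[str]]:
    """Return (record count, column names) for a tab-delimited table.

    Assumes the first row is a header. Rows are read one at a time, so
    memory use stays flat even for a file with over a million records.
    """
    reader = csv.reader(handle, delimiter="\t")
    columns = next(reader, [])          # header row; [] if the file is empty
    n_records = sum(1 for _ in reader)  # count the remaining data rows
    return n_records, columns


# Hypothetical toy input: four of the metafile's column names with
# placeholder values, standing in for the real download.
sample = (
    "Virus.name\tType\tAccession.ID\tCollection.date\n"
    "virus_a\ttype_x\tID_0001\t2021-01-05\n"
    "virus_b\ttype_x\tID_0002\t2021-02-11\n"
)

n, cols = summarize_tsv(io.StringIO(sample))
print(n, len(cols))  # 2 4
print(cols)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle in place of the `StringIO` object. The path is yours to supply. If some fields contain stray double quotes, pass `quoting=csv.QUOTE_NONE` to `csv.reader` so that quotes are treated as ordinary characters.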