H5N1 Update

By Ahmed Mostafa, Elsayed M. Abdelwhab, Thomas C. Mettenleiter, and Stephan Pleschka - mdpi.com/1999-4915/10/9/497/htm, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=92987475

H5N1 (bird flu) continues to infect wild and domestic birds, cattle, cats, and humans. So far, luck has been with us and H5N1 hasn't become a serious threat to humans. That is, unless you count the price of eggs and the contribution that issue made to the current chaos and incompetence in Washington, DC.

On Feb. 28, 2025, I downloaded 6,623 H5N1 HA sequences in FASTA format from GISAID in order to look at the current state of mutations in the virus data. The analysis below is similar to posts here and here.

I read the sequences into a dataframe, with the fields of each FASTA header becoming the columns of the dataframe.

> df_2025_02_28 <- fasta2dataframe("data/gisaid_epiflu_sequence_HA_2025_02_28.fasta")

The fasta2dataframe function is described in this post.

fasta2dataframe filters out all sequences shorter than 1700 nucleotides. Next, I wrote the filtered sequences to a new FASTA file.

> library(LocaTT)
> write.fasta(names = df_2025_02_28$seq.name, sequences = df_2025_02_28$seq.text, file = "data/gisaid_epiflu_sequence_HA_2025_02_28_len.fasta")
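For readers who would rather do this step in Python, here is a minimal sketch of the same read, filter, and write sequence. It is not the fasta2dataframe function from the earlier post: the helper names (read_fasta, filter_by_length, write_fasta) are made up for illustration, and the only detail taken from this post is the 1700 cutoff.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=1700):
    """Keep records whose sequence has at least min_length characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def write_fasta(records, path):
    """Write (header, sequence) pairs, one sequence per line."""
    with open(path, "w") as handle:
        for header, sequence in records:
            handle.write(f">{header}\n{sequence}\n")
```

Calling write_fasta(filter_by_length(read_fasta(in_path)), out_path) chains the three steps, where in_path and out_path stand for whatever file names you use.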

The sequences were aligned with MAFFT.

time mafft --6merpair --maxambiguous 0.05 --preservecase --thread -1 --addfragments data/gisaid_epiflu_sequence_HA_2025_02_28_len.fasta data/HA_reference.fasta > data/gisaid_epiflu_sequence_HA_2025_02_28_aln.fasta
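Once the sequences share one coordinate system, the frequency of each nucleotide can be tallied column by column. The sketch below shows one way to do that. It is an illustration only and not taken from gisaid_H5N1_mutations.py: the function name column_frequencies is hypothetical, and dividing by the total number of sequences (with gaps and ambiguity codes counted as their own symbols) is my assumption about the denominator.

```python
from collections import Counter


def column_frequencies(aligned_sequences):
    """Return one {symbol: frequency} dict per alignment column.

    Assumes every sequence has the same aligned length. Gaps ('-') and
    ambiguity codes are counted as symbols in their own right.
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must all have the same length")
    total = len(aligned_sequences)
    frequencies = []
    # zip(*...) walks the alignment one column at a time.
    for column in zip(*aligned_sequences):
        counts = Counter(symbol.upper() for symbol in column)
        frequencies.append({s: n / total for s, n in counts.items()})
    return frequencies
```

With this layout, the most common symbol in a column and the runner-up are simply the two largest values in that column's dictionary.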

I made some modifications to simplify the Python program gisaid_H5N1_mutations.py. The updated program is available on GitHub. The first 10 rows of the output CSV file are shown below.

reference_position,alignment_pos,ref_freq,alt_freq,ref_nucleotide,codon_pos,ref_aa,aa_name,ref_codon,alt_nucleotide,alt_aa,alt_aa_name,alt_codon,mutation,synonomous
4,37,0.8040201005025126,0.1564070351758794,G,4,E,Glu,GAG,A,K,Lys,AAG,E4K,non_syn
6,39,0.7986809045226131,0.1727386934673367,G,4,E,Glu,GAG,A,E,Glu,GAA,E4E,syn
18,51,0.6105527638190955,0.37170226130653267,A,16,L,Leu,CTA,T,L,Leu,CTT,L16L,syn
31,64,0.7842336683417085,0.198178392,G,31,V,Val,GTT,A,I,Ile,ATT,V31I,non_syn
93,126,0.6526381909547738,0.3473618090452261,A,91,Q,Gln,CAA,G,Q,Gln,CAG,Q91Q,syn
123,156,0.7956972361809045,0.20100502512562815,T,121,T,Thr,ACT,C,T,Thr,ACC,T121T,syn
154,187,0.8285175879396985,0.17132537688442212,A,154,T,Thr,ACA,G,A,Ala,GCA,T154A,non_syn
171,206,0.5855841708542714,0.39777010050251255,A,169,L,Leu,CTA,C,L,Leu,CTC,L169L,syn
174,210,0.5860552763819096,0.41363065326633164,C,172,C,Cys,TGC,T,C,Cys,TGT,C172C,syn
177,213,0.5910804020100503,0.40891959798994976,C,175,D,Asp,GAC,T,D,Asp,GAT,D175D,syn
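To make the last few columns of these rows easier to read, here is a small sketch of how a single nucleotide change can be turned into a mutation label and a syn/non_syn call. It is not the code from gisaid_H5N1_mutations.py. The function name classify_substitution is hypothetical, and I am assuming, based on the rows above, that codon_pos is the 1-based nucleotide position where the codon starts and that the mutation label reuses that number. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(reference_position, ref_codon, alt_nucleotide):
    """Describe the effect of one nucleotide change inside a codon.

    reference_position is 1-based and counted from the start of the
    coding sequence (an assumption inferred from the table rows).
    """
    offset = (reference_position - 1) % 3       # 0, 1 or 2 within the codon
    codon_pos = reference_position - offset     # first nucleotide of the codon
    alt_codon = ref_codon[:offset] + alt_nucleotide + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return {
        "codon_pos": codon_pos,
        "alt_codon": alt_codon,
        "mutation": f"{ref_aa}{codon_pos}{alt_aa}",
        "synonomous": "syn" if ref_aa == alt_aa else "non_syn",
    }
```

For the first row, classify_substitution(4, "GAG", "A") gives alt_codon AAG and the label E4K, marked non_syn, which agrees with the table. The "synonomous" key keeps the spelling of the CSV column so the two can be compared directly.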

Luckily, once again there are no newly reported mutations that seem likely to increase the ability of the virus to invade human cells.

The code can be found on GitHub.
