By Ahmed Mostafa, Elsayed M. Abdelwhab, Thomas C. Mettenleiter, and Stephan Pleschka - mdpi.com/1999-4915/10/9/497/htm, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=92987475
H5N1, bird flu, continues to infect wild and domestic birds, cattle, cats, and humans. So far, luck has been with us and H5N1 hasn't become a serious threat to humans. That is, unless you consider the price of eggs and the contribution that issue made to the current chaos and incompetence in Washington DC.
On Feb. 28, 2025, I downloaded 6,623 H5N1 HA sequences in FASTA format from GISAID in order to look at the current state of mutations in the virus data. The analysis below is similar to posts here and here.
I read the sequences into a dataframe with the FASTA header information becoming the columns of the dataframe.
> df_2025_02_28 <- fasta2dataframe("data/gisaid_epiflu_sequence_HA_2025_02_28.fasta")
The fasta2dataframe function is described in this post.
fasta2dataframe filters out all sequences with length below 1700. Next I wrote the filtered sequences to a new FASTA file.
> library(LocaTT)
> write.fasta(names = df_2025_02_28$seq.name, sequences = df_2025_02_28$seq.text, file = "data/gisaid_epiflu_sequence_HA_2025_02_28_len.fasta")
I then used MAFFT to align the filtered sequences to the HA reference sequence.
time mafft --6merpair --maxambiguous 0.05 --preservecase --thread -1 --addfragments data/gisaid_epiflu_sequence_HA_2025_02_28_len.fasta data/HA_reference.fasta > data/gisaid_epiflu_sequence_HA_2025_02_28_aln.fasta
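With an alignment in hand, the frequency of the reference base and of the most common alternate base can be tallied column by column. The sketch below is one way to do that, not the code in gisaid_H5N1_mutations.py. The function name `column_frequencies` is hypothetical. It assumes the reference row may contain gaps (the command above does not pass `--keeplength`), that positions are 1-based, and that frequencies are divided by the total number of aligned sequences, gaps and ambiguous bases included.

```python
from collections import Counter


def column_frequencies(reference, sequences):
    """Summarize each reference position of an alignment.

    reference: the aligned reference row (may contain '-' gaps).
    sequences: aligned rows, all the same length as the reference.
    Hypothetical helper; the denominators are an assumption.
    """
    rows = []
    reference_position = 0
    total = len(sequences)
    for col, ref_base in enumerate(reference.upper()):
        if ref_base == "-":
            continue  # insertion relative to the reference: no reference position
        reference_position += 1
        counts = Counter(seq[col].upper() for seq in sequences)
        # Most common unambiguous base that differs from the reference base.
        alts = [(b, n) for b, n in counts.most_common()
                if b in "ACGT" and b != ref_base]
        alt_base, alt_count = alts[0] if alts else (None, 0)
        rows.append({
            "reference_position": reference_position,
            "alignment_pos": col + 1,  # 1-based alignment column
            "ref_nucleotide": ref_base,
            "ref_freq": counts[ref_base] / total,
            "alt_nucleotide": alt_base,
            "alt_freq": alt_count / total,
        })
    return rows


# Toy alignment: the third column is an insertion relative to the reference.
toy_reference = "AT-GC"
toy_sequences = ["AT-GC", "ATAGC", "AC-GC", "AC-G-"]
toy_rows = column_frequencies(toy_reference, toy_sequences)
```

In the toy alignment the inserted column is skipped, so the alignment column and the reference position drift apart after it, which is why both numbers are worth carrying along.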
I made some modifications to simplify the Python program gisaid_H5N1_mutations.py. The updated program is available on GitHub. The first 10 rows of the output CSV file are shown below.
reference_position | alignment_pos | ref_freq | alt_freq | ref_nucleotide | codon_pos | ref_aa | aa_name | ref_codon | alt_nucleotide | alt_aa | alt_aa_name | alt_codon | mutation | synonomous |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 37 | 0.8040201005025126 | 0.1564070351758794 | G | 4 | E | Glu | GAG | A | K | Lys | AAG | E4K | non_syn |
6 | 39 | 0.7986809045226131 | 0.1727386934673367 | G | 4 | E | Glu | GAG | A | E | Glu | GAA | E4E | syn |
18 | 51 | 0.6105527638190955 | 0.37170226130653267 | A | 16 | L | Leu | CTA | T | L | Leu | CTT | L16L | syn |
31 | 64 | 0.7842336683417085 | 0.198178392 | G | 31 | V | Val | GTT | A | I | Ile | ATT | V31I | non_syn |
93 | 126 | 0.6526381909547738 | 0.3473618090452261 | A | 91 | Q | Gln | CAA | G | Q | Gln | CAG | Q91Q | syn |
123 | 156 | 0.7956972361809045 | 0.20100502512562815 | T | 121 | T | Thr | ACT | C | T | Thr | ACC | T121T | syn |
154 | 187 | 0.8285175879396985 | 0.17132537688442212 | A | 154 | T | Thr | ACA | G | A | Ala | GCA | T154A | non_syn |
171 | 206 | 0.5855841708542714 | 0.39777010050251255 | A | 169 | L | Leu | CTA | C | L | Leu | CTC | L169L | syn |
174 | 210 | 0.5860552763819096 | 0.41363065326633164 | C | 172 | C | Cys | TGC | T | C | Cys | TGT | C172C | syn |
177 | 213 | 0.5910804020100503 | 0.40891959798994976 | C | 175 | D | Asp | GAC | T | D | Asp | GAT | D175D | syn |
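The codon and amino acid columns in each row follow from three inputs: the reference position, the reference codon, and the alternate nucleotide. The sketch below shows that derivation with the standard genetic code. It is not the code in gisaid_H5N1_mutations.py, and the function name `classify_substitution` is hypothetical. It assumes that reference_position is 1-based with position 1 as the first base of a codon, and that codon_pos is the position of the codon's first base, which is what the rows above suggest.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(reference_position, ref_codon, alt_nucleotide):
    """Derive the alternate codon and label a single-base substitution.

    Hypothetical helper. Assumes a 1-based reference_position where
    position 1 is the first base of the first codon.
    """
    offset = (reference_position - 1) % 3    # 0, 1 or 2 within the codon
    codon_pos = reference_position - offset  # position of the codon's first base
    alt_codon = ref_codon[:offset] + alt_nucleotide + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return {
        "codon_pos": codon_pos,
        "ref_aa": ref_aa,
        "alt_codon": alt_codon,
        "alt_aa": alt_aa,
        # The label uses codon_pos, matching the mutation column above.
        "mutation": f"{ref_aa}{codon_pos}{alt_aa}",
        "synonomous": "syn" if ref_aa == alt_aa else "non_syn",
    }


# First and eighth rows of the table above.
first_row = classify_substitution(4, "GAG", "A")
eighth_row = classify_substitution(171, "CTA", "C")
```

For the first row this gives the AAG codon and E4K, non_syn. For the eighth it gives CTC and L169L, syn. Both match the table.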
Luckily, there are again no newly reported mutations that seem likely to increase the ability of the virus to invade human cells.
The code can be found on GitHub.