Boltz-1 is pretty cool. It is an open-source deep learning model that predicts biomolecular structures from their sequences. According to the developers, Boltz-1 achieves AlphaFold3-level accuracy. They have released the training and inference code, model weights, datasets, and benchmarks under the MIT license. They're democratizing biomolecular modeling. You can read the introductory paper here and a press article about it here.
I downloaded Boltz-1 just two days ago, so this will not be an in-depth look at it. Maybe that will come later. Right now, I just want to try it out.
Downloading and installing Boltz-1 was easy; clone the GitHub repo and you are ready to go.
I used the reference H5N1 HA amino acid sequence that I used for this post. I extracted the sequence from the GenBank file.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
aa_from_gb.py - extract the amino acid sequence from a GenBank file
author: Bill Thompson
license: GPL 3
copyright: 2025-01-09
"""

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.Seq import Seq

def main():
    gb_file = "/home/bill/boltz/boltz/H5N1/data/HA_reference.gb"
    aa_file = "/home/bill/boltz/boltz/H5N1/data/HA_reference.fasta"

    with open(gb_file, "r") as handle:
        for record in SeqIO.parse(handle, "genbank"):
            # Each 'record' in this loop is a full GenBank record
            for feature in record.features:
                # We look for the 'CDS' feature, which usually contains the protein translation
                if feature.type == "CDS":
                    # Check if the 'translation' qualifier is present
                    if "translation" in feature.qualifiers:
                        protein_seq = feature.qualifiers["translation"][0]
                        protein_id = "A|protein"  # use Boltz format

                        # make a record for the amino acid sequence
                        aa_record = SeqRecord(Seq(protein_seq), id = protein_id, description = '')
                        SeqIO.write(aa_record, aa_file, 'fasta')

if __name__ == "__main__":
    main()
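Before handing the FASTA file to Boltz-1, it can be worth a quick check that the header and sequence look the way the script above intends. The following is only a minimal sketch using the standard library: the `CHAIN_ID|ENTITY_TYPE` layout is inferred from the `A|protein` id used above, the helper names `read_single_fasta` and `check_record` are hypothetical, and the example sequence is a made-up toy string, not the HA reference.

```python
# Sketch only: header layout CHAIN_ID|ENTITY_TYPE is an assumption inferred
# from the "A|protein" id above; helper names are hypothetical.

VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid codes


def read_single_fasta(text):
    """Return (header, sequence) from FASTA text holding one record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    header = lines[0][1:]          # drop the leading '>'
    sequence = "".join(lines[1:])  # join wrapped sequence lines
    return header, sequence


def check_record(header, sequence):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    fields = header.split("|")
    if len(fields) < 2 or not fields[0] or not fields[1]:
        problems.append("header is not of the form CHAIN_ID|ENTITY_TYPE")
    if not sequence:
        problems.append("sequence is empty")
    unexpected = sorted(set(sequence.upper()) - VALID_AA)
    if unexpected:
        problems.append("unexpected residue codes: " + "".join(unexpected))
    return problems


# Toy record for illustration only (not real HA data).
example = ">A|protein\nMKTAYIAKQR\nQISFVKSHFS\n"
header, sequence = read_single_fasta(example)
print(header, len(sequence), check_record(header, sequence))
```

To check the real file, the contents of `HA_reference.fasta` could be read with `open(...).read()` and passed to `read_single_fasta` in place of `example`.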
$ time boltz predict H5N1/data/HA_reference.fasta --use_msa_server --num_workers 12 --output_format pdb
Checking input data.
Running predictions for 1 structure
Processing input data.
  0%|          | 0/1 [00:00<?, ?it/s]Generating MSA for H5N1/data/HA_reference.fasta with 1 protein entities.
COMPLETE: 100%|████████████████████████████████████████████████| 150/150 [elapsed: 00:02 remaining: 00:00]
100%|███████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.48s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/bill/miniforge3/envs/boltz/lib/python3.13/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Predicting DataLoader 0: 100%|████████████████████████████████████████████| 1/1 [3:28:45<00:00, 0.00it/s]Number of failed examples: 0
Predicting DataLoader 0: 100%|████████████████████████████████████████████| 1/1 [3:28:45<00:00, 0.00it/s]
[2]+  Done                    emacs examples/prot.fasta &> /dev/null

real    211m4.420s
user    83m39.415s
sys     112m0.400s
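The `time` figures at the end of the run are easier to read in hours. Here is a small sketch that converts them, assuming the `NmS.SSSs` layout shown in the output above; the helper name `parse_duration` is hypothetical, and the three values are copied from the log.

```python
import re


def parse_duration(text):
    """Convert a shell `time` value such as '211m4.420s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the run above.
timings = {"real": "211m4.420s", "user": "83m39.415s", "sys": "112m0.400s"}
seconds = {name: parse_duration(value) for name, value in timings.items()}

for name, value in seconds.items():
    print(f"{name}: {value / 3600:.2f} h")

# Ratio of total CPU time (user + sys) to wall-clock time.
cpu_share = (seconds["user"] + seconds["sys"]) / seconds["real"]
print(f"CPU time / wall time: {cpu_share:.2f}")
```

The wall-clock time works out to roughly 3.5 hours, nearly all of it in the prediction step, which the progress bar reports as 3:28:45.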