Showing posts from July, 2023

Protein Classification Using Compression

A few days ago I saw a post on Hacker News, "Ziplm: Gzip-Backed Language Model". It described ziplm, a simple language model built on lossless compression. The Hacker News post led me to the paper "Low-Resource" Text Classification: A Parameter-Free Classification Method with Compressors by Jiang et al., which describes a text classification method based on compression programs such as gzip. The authors claim the results are competitive with non-pretrained deep learning methods, and on some datasets the method even outperforms BERT.

The most interesting feature of the paper is that it includes a complete Python implementation of the method in fourteen lines of code. Compared to the code of a deep learning language model like BERT, that is some impressive compression. Using a lossless compressor like gzip as a language model makes sense: lossless compression implements a simple form of "understanding" of the text. It tries to reference ...
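To make the idea concrete, here is a minimal sketch of the compression-based classification approach the paper describes: compute a normalized compression distance (NCD) between a test document and each training document using gzip, then take a k-nearest-neighbour vote. The function names (`ncd`, `classify`) and the tie-breaking details are my own; this is an illustration of the technique, not the paper's exact fourteen lines.

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings.

    If x and y share structure, compressing their concatenation
    costs little more than compressing the longer one alone.
    """
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(test_text, training_set, k=3):
    """Predict a label by k-nearest-neighbour vote over NCD.

    training_set is a list of (text, label) pairs.
    """
    distances = sorted((ncd(test_text, text), label)
                       for text, label in training_set)
    top_k = [label for _, label in distances[:k]]
    return max(set(top_k), key=top_k.count)
```

The whole "model" is the compressor itself: there are no parameters to train, which is why the paper calls the method parameter-free.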