Hyperdimensional Computing

I write because I don't know what I think until I read what I say.
- Flannery O'Connor

I program because I don't understand a subject until I see the code that I have written.
- me

Artificial neural networks (ANNs) are sometimes described by analogy to biological neurons, but their mode of operation is not much like that of biological brains. It's impossible to deny the impressive success of models like GPT-3 and GPT-4 (the models behind ChatGPT) or DALL-E 2. Unfortunately, training these models requires very large computing infrastructure available to only a few institutions. ChatGPT was trained using Microsoft's Azure cloud system. OpenAI has not disclosed how long it took to train GPT-3, which makes outside estimates difficult². However, Microsoft has built supercomputers for AI training and says that its latest supercomputer contains 10,000 graphics cards and over 285,000 processor cores². Training ChatGPT involves a combination of data preparation, model design, and machine learning algorithms, and can take several days to several weeks depending on the size and quality of the training data and the computing resources available¹. 

The data requirements for training something like GPT-3 are equally impressive. According to GPT Blogs, GPT-3 was trained on text datasets from the internet totaling 570 GB and roughly 300 billion words. By contrast, the adult human brain weighs on average about 1.5 kg (3.3 lb)². The weight of the brain varies between men and women, averaging about 1370 g in men and 1200 g in women². The brain is about 60% fat¹. For the average adult in a resting state, the brain consumes about 20 percent of the body's energy⁴. The brain's primary function - processing and transmitting information through electrical signals - is very expensive in terms of energy use⁴. An adult brain uses approximately 110 calories per pound per day¹. For an individual human, that's not much, but multiplied by 8 billion humans, keeping the world's brains functioning requires a great deal of calorie consumption.

Comparing the human brain to something like ChatGPT is a category mistake. Large language models are nothing like brains and are not even close in terms of functionality.

Hyperdimensional computing (HDC) is an emerging learning paradigm that computes with high-dimensional binary vectors. It combines very high-dimensional vector spaces (e.g., 10,000-dimensional) with a set of carefully designed operators to perform symbolic computations with large numerical vectors². HDC is attractive because of its energy efficiency and low latency, especially on emerging hardware¹. It is based on the observation that key aspects of human memory, perception, and cognition can be explained by the mathematical properties of hyperdimensional spaces comprising high-dimensional binary vectors known as hypervectors³.

I got the previous information from Bing's AI chat bot.

HDC is also called Vector-Symbolic Architecture (VSA).

In HDC, high-dimensional vectors are used to represent words or concepts. Using simple operations such as addition, multiplication, XOR, sign, and majority, these vectors can be combined to represent new concepts. The vectors are typically binary (0, 1) or bipolar (-1, 1) and can be stored and manipulated efficiently.

In what follows, I won't try to be efficient either in terms of memory or computing. I'll try to develop a few simple HDC operations and a couple of sample applications.

Hyperdimensional Vectors

A hyperdimensional vector (hdv) is simply a large vector. In our case, it will contain bipolar elements (-1 and 1) and will typically have 10,000 of them. I will use numpy ndarrays to implement the vectors.

What follows is inspired by Michiel Stock's GitHub tutorial and Pentti Kanerva's 2009 article Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors DOI 10.1007/s12559-009-9009-8. Both are well worth reading.

In HDC, concepts are assigned to random vectors. Creating a random high dimensional vector is simple.

import numpy as np

def hdv(N: int = 10_000) -> np.ndarray:
    return np.random.choice([-1, 1], size = N)

x = hdv()

In [3]: x
Out[3]: array([ 1,  1,  1, ..., -1,  1, -1])

High Dimensions Can Be Surprising

High-dimensional vectors (hdvs) have useful properties.

About half the elements of two random hdvs match.

x = hdv()
y = hdv()

sum(x == y)
Out[8]: 4906

Despite having about half of their elements in common, random high-dimensional vectors are effectively mutually orthogonal: matching elements contribute +1 to the dot product, mismatched elements contribute -1, and an even split leaves a dot product near zero. That is, the angle between two random hdvs is approximately 90 degrees. We can measure the similarity between two hdvs by computing the cosine of the angle between them.

\[\cos(x, y) = \frac{x \cdot y}{\left\| x \right\| \left\| y \right\|}\]

or in Python terms:

def cos(x: np.ndarray, y: np.ndarray) -> np.float64:
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

The cosine varies between 1 (identical vectors) and -1 (antiparallel vectors). Orthogonal vectors have a cosine of zero. Let's create some high- and low-dimensional vectors and see how the similarity varies.

import numpy as np
import matplotlib.pyplot as plt
from hdc import hdv, cos
import seaborn as sns
import pandas as pd

# compute a similarity matrix
def calc_hdv_distances(num_dimensions: int, num_samples: int = 1000) -> np.ndarray:
    hdvs = np.zeros(num_samples, dtype = object)
    for i in range(num_samples):
        hdvs[i] = hdv(N = num_dimensions)

    dist = np.zeros((num_samples, num_samples))
    np.fill_diagonal(dist, 1)   # every vector is identical to itself
    for i in range(num_samples - 1):
        for j in range(i + 1, num_samples):
            dist[i, j] = cos(hdvs[i], hdvs[j])
            dist[j, i] = dist[i, j]

    return dist

def main():
    dist10 = calc_hdv_distances(10)         # 10 dim vectors
    dist10_000 = calc_hdv_distances(10_000) # 10,000 dim vectors

    # plot the distribution of pairwise cosine similarities
    df = pd.DataFrame({"dim: 10": np.ndarray.flatten(dist10), 
                       "dim: 10_000": np.ndarray.flatten(dist10_000)})
    sns.histplot(data = df, bins = 30)
    plt.xlabel("cos")
    plt.show()

    sns.heatmap(dist10)
    plt.title("dim: 10")
    plt.show()

    sns.heatmap(dist10_000)
    plt.title("dim: 10_000")
    plt.show()

if __name__ == "__main__":
    main()



The high-d vector similarities are clustered around cos ~ 0, while the low-d vector similarities are spread throughout the range.
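
This concentration is predictable: for bipolar vectors, the cosine is the mean of N independent ±1 products, so its standard deviation is about 1/sqrt(N), roughly 0.32 for N = 10, 0.1 for N = 100, and 0.01 for N = 10,000. A quick empirical check (my own addition, reusing hdv and cos from above):

# spread of the cosine between pairs of random hdvs of different dimensions;
# for bipolar vectors the standard deviation should be about 1 / sqrt(N)
for n in (10, 100, 10_000):
    samples = [cos(hdv(N = n), hdv(N = n)) for _ in range(1000)]
    print(n, np.std(samples))   # roughly 0.32, 0.1, and 0.01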

We can see similar results in the heat maps.


The low-d inter-vector cosine values appear to be around -0.25 or 0.25. When the vectors are 100 elements long, the cosine values are somewhat smaller. By contrast, pairs of high-d vectors have cosine similarities of approximately zero, except for the comparisons of the vectors against themselves along the diagonal.

Operations on Hyperdimensional Vectors

There are three main operations in HDC: bundling (addition followed by the sign, or a majority vote for Boolean vectors), binding (element-wise multiplication, or XOR for Boolean vectors), and permutation (shifting).

Bundling combines several hdvs into a single hdv that is similar to each of its inputs. Binding combines two hdvs into a single hdv that is orthogonal to both. Shifting creates a new hdv from an existing one by circularly shifting its elements.

The Python implementations are simple.

def bundle(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # element-wise majority; positions where x and y disagree become 0
    return np.sign(x + y)

def bind(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # element-wise multiplication; the result is dissimilar to both inputs
    return x * y

def shift(x: np.ndarray, k: int = 1) -> np.ndarray:
    # circular shift of the elements by k positions
    return np.roll(x, k)

Binding and shifting are reversible: binding is its own inverse for bipolar vectors, and a shift can be undone by shifting back. Binding distributes over bundling. Binding and shifting preserve similarity.

x = hdv()
y = hdv()
z = hdv()

all(bind(x, bundle(y, z)) == bundle(bind(x, y), bind(x, z)))
Out[12]: True

cos(x, y) == cos(bind(x, z), bind(y, z))
Out[15]: True

cos(x, y) == cos(shift(x), shift(y))
Out[16]: True

The above examples are from here.
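
The reversibility is just as easy to check. Because the elements are ±1, binding a vector with itself gives the all-ones vector, so binding twice with the same vector recovers the original, and a shift is undone by shifting in the opposite direction:

all(bind(z, bind(z, x)) == x)      # True: binding with z twice recovers x
all(shift(shift(x), k = -1) == x)  # True: shifting back undoes the shift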

What is the Dollar of Mexico?

This example is from Kanerva's paper.

Humans are good at language. The question above doesn't make literal sense, but it does make figurative sense: what performs the same function in Mexico as the dollar does in the US?

Let's see how HDC might be used to answer this question.

First we generate some hdvs to represent the data.

U = hdv()  # USA
M = hdv()  # Mexico
D = hdv()  # dollar
P = hdv()  # peso
    
X = hdv()  # country
Y = hdv()  # currency

# records for US and Mexico combining the 
# individual currency and country
A = bundle(bind(X, U), bind(Y, D)) # US
B = bundle(bind(X, M), bind(Y, P)) # Mexico

We can ask what role the dollar plays in the US. Binding the dollar with the US record approximately extracts the currency variable, telling us that the dollar is the US currency.

# bind(D, A) = 
#     bundle(bind(D, bind(X, U)), bind(D, bind(Y, D))) = 
#     bundle(bind(D, bind(X, U)), Y) ~ 
#     Y

dollar_role_us = bind(D, A)  
print(cos(dollar_role_us, Y)) # dollar is currency of US
0.7107038764492565

The dollar is not the currency of Mexico, but we can find what plays a similar role in Mexico by binding the dollar's role in the US with the Mexico record.

# bind(dollar_role_us, B) ~ P
dollar_of_mexico = bind(dollar_role_us, B)

print(cos(dollar_of_mexico, P))  # peso is dollar of mexico
0.5000999900019995
print(cos(dollar_of_mexico, D))  # dollar is not currency of Mexico
-0.014197160851716099
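
The lookup also works in the other direction. As a quick check of my own, following the same algebra, the peso's role in Mexico, bound with the US record, should land near the dollar:

# what is the peso of the US?
peso_role_mexico = bind(P, B)            # ~ Y, the currency role
peso_of_us = bind(peso_role_mexico, A)   # ~ D
print(cos(peso_of_us, D))                # clearly positive, like the result above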

Protein Classification

This example comes from Michiel Stock. It's similar to examples of classifying human languages from here and here.

The data we will use is a FASTA file with 499 human protein sequences and 500 yeast sequences downloaded from GitHub.

For our purposes, we can think of proteins as strings of amino acids represented by letters.

amino_acids = ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y']

To represent a sequence, we will break it into overlapping trimers. Each trimer will be coded as an hdv. We can store hdvs for all 21^3 possible trimers in a dictionary.

def trimer_hdv(amino_acids: list[str], N: int = 10_000) -> dict[str, np.ndarray]:
    trimer_hdvs = dict()
    for aa1 in amino_acids:
        for aa2 in amino_acids:
            for aa3 in amino_acids:
                trimer_hdvs[aa1 + aa2 + aa3] = hdv(N = N)

    return trimer_hdvs 

trimer_hdvs = trimer_hdv(amino_acids)

Each sequence hdv will be the bundle of the hdvs of its overlapping trimers. For example, MGLVSS will be stored as the bundle of
MGL
 GLV
  LVS
   VSS
To encode a sequence, we simply add its trimer hdvs and take the sign. Each sequence's species will be stored in a separate array of strings.

from Bio.SeqRecord import SeqRecord

def embed_sequences(seqs: list[SeqRecord], 
                    trimer_hdvs: dict[str, np.ndarray], 
                    N: int = 10_000) -> tuple[np.ndarray, np.ndarray]:
    hdvs = np.zeros((len(seqs), N))
    seq_types = []
    for idx, seq in enumerate(seqs):
        # bundle the hdvs of the overlapping trimers
        for pos in range(len(seq) - 2):
            hdvs[idx, :] += trimer_hdvs[str(seq.seq[pos:pos + 3])]
        hdvs[idx, :] = np.sign(hdvs[idx, :])
        if seqs[idx].id.find("HUMAN") != -1:
            seq_types.append("HUMAN")
        else:
            seq_types.append("YEAST")

    return hdvs, np.array(seq_types, dtype = str)

hdvs, seq_types = embed_sequences(seq_recs, trimer_hdvs) # seq_recs is a list of BioPython SeqRecords

Note that summing all the trimer hdvs and then taking the sign is the n-ary version of the bundle operation: an element-wise majority vote.
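
To see that explicitly, here is a minimal n-ary bundle (bundle_all is just an illustrative name of my own, not part of the original code), along with a check that the result resembles each of its inputs but not an unrelated random hdv:

def bundle_all(vs: list[np.ndarray]) -> np.ndarray:
    # element-wise majority vote over all inputs; ties are left as 0
    return np.sign(np.sum(vs, axis = 0))

vs = [hdv() for _ in range(5)]
b = bundle_all(vs)
cos(b, vs[0])  # noticeably positive: the bundle is similar to each input
cos(b, hdv())  # approximately zero: unrelated to a fresh random hdv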

After embedding, we split the hdvs into a training and a test set. This function returns the indices for each set.

def get_training_test_index(length: int, training_pct: float = 0.8) -> tuple[list, list]:
    idx = np.random.choice(length, size = int(np.rint(training_pct * length)), replace = False)
    mask = np.full(length, True, dtype = bool)
    mask[idx] = False

    ids = np.array(list(range(length)), dtype = int)

    return list(ids[~mask]), list(ids[mask])

training_idx, test_idx = get_training_test_index(hdvs.shape[0])

From the training set for each species, we will create a prototype hdv by bundling the sequences.

def prototype(hdvs: np.ndarray, idx: list) -> np.ndarray:
    return np.sign(np.sum(hdvs[idx], axis = 0))

human_training_idx = [i for i in training_idx if seq_types[i] == "HUMAN"]
human_prototype = prototype(hdvs, human_training_idx)
yeast_training_idx = [i for i in training_idx if seq_types[i] == "YEAST"]
yeast_prototype = prototype(hdvs, yeast_training_idx)

We can make predictions on the test set by computing each sequence's cosine similarity to the human and yeast prototypes.

def prediction(human_prototype: np.ndarray,
               yeast_prototype: np.ndarray,
               x: np.ndarray) -> str:
    return "HUMAN" if cos(human_prototype, x) > cos(yeast_prototype, x) else "YEAST"

predictions = np.array([prediction(human_prototype, yeast_prototype, x) for x in hdvs[test_idx]], dtype = str)
print(np.mean(predictions == seq_types[test_idx]))

How well does it work?

$ time python src/proteins_hdc.py
0.845

real    0m7.568s
user    0m7.319s
sys     0m0.270s

Not too bad for such a simple model. To improve performance, we could try some form of retraining on the misclassified sequences or some of the incremental improvements described here.
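
As a rough sketch of what such retraining could look like (my own illustration using the names defined above, not code from Stock's tutorial, and starting from the thresholded prototypes for simplicity): keep un-thresholded accumulators for the two prototypes, and whenever a training sequence is misclassified, add its hdv to the correct class's accumulator and subtract it from the wrong one, then re-threshold with the sign.

def retrain(hdvs: np.ndarray, seq_types: np.ndarray, training_idx: list,
            human_proto: np.ndarray, yeast_proto: np.ndarray,
            epochs: int = 5) -> tuple[np.ndarray, np.ndarray]:
    # un-thresholded accumulators so small corrections are not lost to the sign
    human_acc = human_proto.astype(float)
    yeast_acc = yeast_proto.astype(float)
    for _ in range(epochs):
        for i in training_idx:
            pred = prediction(np.sign(human_acc), np.sign(yeast_acc), hdvs[i])
            if pred != seq_types[i]:
                # nudge the prototypes toward the correct class
                delta = hdvs[i] if seq_types[i] == "HUMAN" else -hdvs[i]
                human_acc += delta
                yeast_acc -= delta
    return np.sign(human_acc), np.sign(yeast_acc)

human_prototype, yeast_prototype = retrain(hdvs, seq_types, training_idx,
                                           human_prototype, yeast_prototype)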

All of the code is available on GitHub. The sequence data is available here.

====================================================
The introductory paragraphs were initially generated by Bing's AI chat bot and edited by me. Copying from an LLM is not the same as writing.

====================================================

Source: Conversation with Bing, 4/27/2023

(1) An Introduction to Hyperdimensional Computing for Robotics. https://link.springer.com/article/10.1007/s13218-019-00623-z.

(2) Understanding Hyperdimensional Computing for Parallel .... https://arxiv.org/abs/2202.04805.

(3) In-memory hyperdimensional computing | Nature Electronics. https://www.nature.com/articles/s41928-020-0410-3.

(4) Fulfilling Brain-inspired Hyperdimensional Computing with In ... - IBM. https://www.ibm.com/blogs/research/2020/06/in-memory-hyperdimensional-computing/.

=====================================================

Source: Conversation with Bing, 4/27/2023

(1) How Much Energy Does the Brain Use? - BrainFacts. https://www.brainfacts.org/Brain-Anatomy-and-Function/Anatomy/2019/How-Much-Energy-Does-the-Brain-Use-020119.

(2) How many calories does the brain consume? | Calories - Sharecare. https://www.sharecare.com/health/calories/brain-calories-at-rest.

(3) Power of a Human Brain - The Physics Factbook - hypertextbook. https://hypertextbook.com/facts/2001/JacquelineLing.shtml.

(4) Does Thinking Really Hard Burn More Calories? - Scientific American. https://www.scientificamerican.com/article/thinking-hard-calories/.

(5) We finally know why the brain uses so much energy. https://www.livescience.com/why-does-the-brain-use-so-much-energy.

=====================================================

Source: Conversation with Bing, 4/27/2023

(1) Brain size - Wikipedia. https://en.wikipedia.org/wiki/Brain_size.

(2) Brain Anatomy and How the Brain Works | Johns Hopkins Medicine. https://www.hopkinsmedicine.org/health/conditions-and-diseases/anatomy-of-the-brain.

(3) How Big Is a Human Brain? Brain Size and Brain Weight - Verywell Mind. https://www.verywellmind.com/how-big-is-the-brain-2794888.

(4) Human brain - Wikipedia. https://en.wikipedia.org/wiki/Human_brain.

=====================================================

Source: Conversation with Bing, 4/27/2023

(1) Training ChatGPT AI Required 185,000 Gallons of Water: Study. https://gizmodo.com/chatgpt-ai-water-185000-gallons-training-nuclear-1850324249.

(2) What are GPT-3 and ChatGPT by OpenAI? How do they work?. https://blog.illacloud.com/what-are-gpt-3-and-chatgpt-by-openai-how-do-they-work/.

(3) Introducing ChatGPT - openai.com. https://openai.com/blog/chatgpt.

(4) How many days did it take to train GPT-3? Is training a neural ... - Reddit. https://www.reddit.com/r/GPT3/comments/p1xf10/how_many_days_did_it_take_to_train_gpt3_is/.

(5) Product - OpenAI. https://openai.com/product. 
