The Analytic Garden: High Dimensional Vectors

Hyperdimensional computing recently made the news. This article gives some examples of hyper vector databases.

In a previous post, we looked at hyperdimensional computing (HDC) using high dimensional vectors. As we saw, orthogonality is an important property of high-d vectors. We said that two random (-1, 1) high-d vectors were orthogonal. What do we really mean by that?

In these lecture notes, Kothari and Arora show the following result,

\[P\left( {\left| {\cos ({\theta _{x,y}})} \right| > \sqrt {\frac{{\log (c)}}{N}} } \right) < \frac{1}{c}\].

What this is saying is that probability that the absolute value of the cosine of the angle between randomly generated high-d vectors is greater that a simple function of N and c. N is the dimension of the random vectors. c is an arbitrary constant. We can choose c to adjust the probability.

What is a good choice for c?

If we choose $c = {e^{0.01N}}$, then $\sqrt {\frac{{\log (c)}}{N}} = 0.1$.

In other words,

\[P\left( {\left| {\cos ({\theta _{x,y}})} \right| > 0.1} \right) < {e^{ - 0.01N}}\]

If N = 10,000, ${e^{ - 0.01N}}$ is a very small number, 3.7e-44. . The cosine of the angle between randomly chosen high d vectors is only greater than 0.1 with a very small probability.

What angle has a cosine of 0.1?

import numpy as np

np.arccos(0.1) * 180/np.pi
Out[159]: 84.26082952273322

Two randomly 10,000 dimension vectors have a vanishingly small probability of being outside of the range -84.3 to 84.2 degrees. They are orthogonal.

We could safely choose $\frac{{\sqrt c }}{2}$ random vectors and be confident that they are mutually orthogonal.

Let's see how varying N affects the probability that the cos of the angle between two high-d vectors is greater than 0.1.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def cos_prob(N, t = 0.1):
    c = np.exp(N * t**2)
    return N, 1/c, np.sqrt(c)/2

columns = ["N", "prob", "m"]
df = pd.DataFrame([cos_prob(n) for n in np.array(list(range(1, 11))) * 1000], columns = columns)

print(df)

sns.lineplot(x = df["N"], y = np.log(df["prob"]))
plt.ylabel("log prob")
plt.show()

N prob m

0 1000 4.539993e-05 7.420658e+01

1 2000 2.061154e-09 1.101323e+04

2 3000 9.357623e-14 1.634509e+06

3 4000 4.248354e-18 2.425826e+08

4 5000 1.928750e-22 3.600245e+10

5 6000 8.756511e-27 5.343237e+12

6 7000 3.975450e-31 7.930067e+14

7 8000 1.804851e-35 1.176926e+17

8 9000 8.194013e-40 1.746714e+19

9 10000 3.720076e-44 2.592353e+21

For N = 1000, the probability of the cosine of the angle between randomly chosen vectors being greater than 0.1 is 4.5e-05, but we only feel safe choosing 74 vectors. At N = 2000, the probability of being outside the range is 2e-09 and we can be confident in choosing over a million random vectors.

The log probability of being out side of the range -84.3 to 84.2 decreases linearly with increasing N.

We can adjust the maximum angle also. If we choose, $c = {e^{\frac{N}{4}}}$, $\sqrt {\frac{{\log (c)}}{N}} = 0.5$ or about 60 degrees, still orthogonal.

The Analytic Garden

High Dimensional Vectors

No comments:

Post a Comment

Labels

Contributors

wfmu