Hyperdimensional computing recently made the news. This article takes a closer look at one of its key properties: the near-orthogonality of random high-dimensional vectors.
In a previous post, we looked at hyperdimensional computing (HDC) using high-dimensional vectors. As we saw, orthogonality is an important property of high-d vectors. We said that two random ±1 high-d vectors were nearly orthogonal. What do we really mean by that?
In these lecture notes, Kothari and Arora show the following result:
\[P\left( {\left| {\cos ({\theta _{x,y}})} \right| > \sqrt {\frac{{\log (c)}}{N}} } \right) < \frac{1}{c}\].
This says that the probability that the absolute value of the cosine of the angle between two randomly generated high-d vectors exceeds a simple function of N and c is less than 1/c. Here N is the dimension of the random vectors, and c is an arbitrary constant that we can choose to adjust the probability.
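To get a feel for the bound, here is a small sketch (my own, not from the lecture notes) that evaluates the threshold $\sqrt{\log(c)/N}$ and the probability bound $1/c$ for a few illustrative values of N and c:

```python
import numpy as np

def threshold(N, c):
    # Threshold from the bound: sqrt(log(c) / N); the result says
    # P(|cos(theta)| > threshold) < 1/c.
    return np.sqrt(np.log(c) / N)

# Illustrative values of N and c (my own choices, not from the notes).
for N in (1_000, 10_000):
    for c in (1e2, 1e6):
        print(f"N={N:6d}  c={c:.0e}  threshold={threshold(N, c):.4f}  bound 1/c={1/c:.0e}")
```

Larger N tightens the threshold for a fixed c, and larger c shrinks the probability bound at the cost of a looser threshold.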
What is a good choice for c?
If we choose $c = {e^{0.01N}}$, then $\log(c) = 0.01N$, so $\sqrt {\frac{{\log (c)}}{N}} = \sqrt{0.01} = 0.1$.
In other words,
\[P\left( {\left| {\cos ({\theta _{x,y}})} \right| > 0.1} \right) < {e^{ - 0.01N}}\]
If N = 10,000, then ${e^{ - 0.01N}} = e^{-100}$ is a very small number, about 3.7e-44. The cosine of the angle between randomly chosen high-d vectors exceeds 0.1 only with vanishingly small probability.
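We can check this empirically. The sketch below (my own, not from the post) samples pairs of random ±1 vectors of dimension N = 10,000 and computes the cosine of the angle between them; none come anywhere near 0.1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10_000, 1_000

cosines = []
for _ in range(trials):
    # Random ±1 (bipolar) vectors, as in the previous post.
    x = rng.choice([-1.0, 1.0], size=N)
    y = rng.choice([-1.0, 1.0], size=N)
    # Cosine of the angle between x and y.
    cosines.append(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

cosines = np.abs(np.array(cosines))
print(f"max |cos| over {trials} pairs: {cosines.max():.4f}")
print(f"fraction exceeding 0.1:       {(cosines > 0.1).mean():.4f}")
```

In practice the observed cosines cluster around 0 with standard deviation roughly $1/\sqrt{N} = 0.01$, so exceeding 0.1 would be a ten-sigma event.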
What angle has a cosine of 0.1?
import numpy as np
np.arccos(0.1) * 180/np.pi
Out[159]: 84.26082952273322
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def cos_prob(N, t = 0.1):
    c = np.exp(N * t**2)
    return N, 1/c, np.sqrt(c)/2

columns = ["N", "prob", "m"]
df = pd.DataFrame([cos_prob(n) for n in np.arange(1, 11) * 1000],
                  columns = columns)
print(df)

sns.lineplot(x = df["N"], y = np.log(df["prob"]))
plt.ylabel("log prob")
plt.show()