Posts

Showing posts from April, 2023

High Dimensional Vectors

Hyperdimensional computing recently made the news. This article gives some examples of hypervector databases. In a previous post, we looked at hyperdimensional computing (HDC) using high dimensional vectors. As we saw, orthogonality is an important property of high-d vectors. We said that two random (-1, 1) high-d vectors were orthogonal. What do we really mean by that? In these lecture notes, Kothari and Arora show the following result: \[P\left( {\left| {\cos ({\theta _{x,y}})} \right| > \sqrt {\frac{{\log (c)}}{N}} } \right) < \frac{1}{c}\] This says that the probability that the absolute value of the cosine of the angle between two randomly generated high-d vectors exceeds a simple function of N and c is less than 1/c. Here N is the dimension of the random vectors and c is an arbitrary constant, which we can choose to adjust the probability. What is a good choice for c? If we choose $c = {e^{0.01N}}$, then $\sqrt {\frac{{\log (c)}}{N}} = 0.1$. In other ...
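The near-orthogonality claim is easy to check empirically. Here is a minimal Python sketch (the dimension and number of trials are arbitrary choices for illustration, not values from the lecture notes):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000   # vector dimension; with c = e^{0.01 N}, the bound sqrt(log(c)/N) = 0.1
trials = 100

max_abs_cos = 0.0
for _ in range(trials):
    # Two random (-1, 1) vectors
    x = rng.choice([-1, 1], size=N)
    y = rng.choice([-1, 1], size=N)
    cos = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    max_abs_cos = max(max_abs_cos, abs(cos))

# The typical |cos| is on the order of 1/sqrt(N) = 0.01,
# so the 0.1 bound is comfortably satisfied in practice.
print(max_abs_cos)
```

Running this, the largest observed |cos| across all trials stays well under the 0.1 threshold, consistent with the bound.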

Hyperdimensional Computing

I write because I don't know what I think until I read what I say. - Flannery O'Connor
I program because I don't understand a subject until I see the code that I have written. - me
Artificial neural networks (ANNs) are sometimes described by analogy to biological neurons. However, their mode of operation is not similar to that of biological brains. It's impossible to deny the impressive success of neural network models like GPT-3 and GPT-4 or DALL-E 2. Unfortunately, training these models requires very large computing infrastructure available to only a few institutions. ChatGPT was trained using Microsoft's Azure cloud system. OpenAI has not disclosed the length of time required to train GPT-3, complicating researchers' estimates. However, Microsoft has built supercomputers for AI training and says that its latest supercomputer contains 10,000 graphics cards and over 285,000 processor cores. The process of training ChatGPT involves a combination of data preparation, model des...

Emerging Variants for April

It's been a while since I looked at this data. I was updating my version of R and decided this would be a good exercise to test it and also to see what SARS-CoV-2 variants were on the rise. I downloaded metadata from GISAID on April 20, 2023. The file contained records for 15,398,189 sequences. I took Pango lineages for the top four emerging variants and used the method described here to plot the accumulation of sequences for the variants. system.time(df_variants_usa <- get_selected_variants_ch("data/metadata.tsv", selected_variants = selected_variants, title = "Selected Variants USA 2023-04-20", start_date = as.Date("2022-09-01"))) Here's the plot. You can find the code to produce the plot here.