# An Introduction to Hyperdimensional (HD) Computing

Rishikanth

Over the last few years a new paradigm of computing called Hyperdimensional (HD) Computing (we will use HD and HDC interchangeably) has been gaining popularity as a lightweight alternative for Machine Learning and various reasoning problems. A major part of my PhD research is to explore HD computing to enable lightweight Machine Learning applications. When I was getting started with this topic I was searching for a dumbed down explanation of HDC but couldn’t find any online. So I decided to write a couple articles to explain HDC in a simple manner with examples. I hope this article helps you understand HDC and allows you to better understand technical papers related to HDC.

Hyperdimensional (HD) computing or HDC is a brain-inspired computing paradigm that falls under the class of methods known as Vector Symbolic Architectures (VSA). Simply put, VSA methods deal with computations on large vectors (think 10,000) via a set of mathematical operations. HD computing operates on a vector space defined by high-dimensional sparse low-precision vectors via a set of carefully chosen operations on these vectors. An example of such a vector space is $$\{1, 0\}^D$$ where $$D = 10000$$. This paradigm of computing enjoys several salient features, such as high parallelizability, simple operations which can easily be accelerated, robustness to noise and low-precision (often binary) computing.

In this multi-part post series, I’ll detail various aspects of HD computing along with how to use HDC for Machine Learning applications. This first part provides a gentle introduction to HDC and touches upon the fundamental aspects of it.

## Origins of Hyperdimensional computing

HD computing draws inspiration from neuroscience based on biological computing models observed in various organisms. One of the most significant discoveries in the intersection of neuroscience and machine learning is the observation that biological organisms utilize sparse high-dimensional distributed representations as the data structure for reasoning.

A specific example is the olfactory system of the fruit-flies which uses a random linear mapping $$\phi(x)$$ to project a low dimensional input $$x \in \mathbb{R}^n$$ onto a point in high-dimensional space $$\phi(x) \in \mathbb{R}^d$$. Specifically in the fruit-fly, the olfactory input is a vector of dimension 50 $$(n = 50)$$ which is mapped to a 2000 dimension $$(d = 2000)$$ representation. This pattern has been observed in several other organisms which is a motivation behind VSA methods.

## Fundamentals of HD computing

HD computing is an alternate form of representation of data. While traditional computing paradigms represent data using 8-64 bit words, HD computing represents data as large vectors called hypervectors (For eg. 10000 bit word). Each attirbute, entity or variable is represented using these hypervectors. So why would such a representation be advantageous for us ? Let’s go through each characteristic one by one. As mentioned above, HD representations are high-dimensional, sparse, low-precision and distributed.

• low-precision: Each value of the hypervector is of very low-precision. In literature most hypervectors use $$\{-1, 1\}$$ or $$\{0, 1\}$$. Or values are typically quantized to arbitrary precision. This allows us to reduce storage requirements. Using binary representations allows us to significantly accelerate operations as bit-wise operations are faster compared to floating point or integer math.
• sparse: This is a characteristic of binary hypervectors. Usually at least 50% of the values are 0 which means we can again compress storage.
• high-dimensional: The high-dimensionality of hypervectors implicitly introduces redundancy of information. This allows us to tolerate noise on our representations as we can still recover useful information due to redundancy.
• distributed: This characteristic implies that the information is uniformly distributed across the entire hypervector. In contrast, conventional data representations such as integers use a more structured representaiton. For eg. for integers each bit represents a specific power of 2. The advantage is that if we were to loose a portion of the vector due to noise of any kind the effects are not catastrophic since no single value of the vector stores more information than the others. This is not true for integer or floating point. For eg. the number 117 is represented as 0b1110101. Now if we were to loose 25% information (2 bits are unknown - let’s replace them with 0) leaving us with 0b1110000 our value would change to 112. Now if we are to loose the first 2 bits instead, our value changes to 0b0010101 the value changes to 21 which is significantly different from 117. Thus, loosing the MSB leads to more catastrophic data loss. HD representation on the other hand doesn’t suffer from this as no particular bit holds any more information than the other as data is distributed uniformly along the entire vector allowing us to recover information even if we loose some portion of it due to noise.

Thus the high-dimensionality along with the distributed nature of the hypervectors makes HD representaitons highly robust to noise. In practice what this means is that we can significantly accelerate computing by working with approximate low-precision representations which are much cheaper compared to say floating point representations. Another nice benefit of having distributed representations is that each bit can be processed in parallel allowing for significant paralellization benefits.

An important property that HD relies on is the orthogonality of randomly sampled high-dimensional vectors. This means that if you were to randomly generate 2 vectors of very large dimensions, they are bound to be nearly orthogonal. Let’s do a simple experiment to prove this. We randomly sample 2 vectors and measure the hamming distance between these vectors and repeat the process a 1000 times. We use hamming distance as the metric because HD vectors are typically binary or ternary in nature. Below is the plot of the distribution of distances between any 2 randomly generated hypervectors. As you can see the distribution is concentrated around 0.48 - 0.51 with a peak at 0.50. This means that if we use two random hypervectors to represent entities, they differ by almost 5000 bits ! If we take a third random vector it differs from the first 2 by approx 5000 bits ! This means that random hypervectors are unrelated.

## Hyperdimensional operations

Having discussed the salient aspects of HD let’s now look at how to actually perform computing on these representations. Let’s say you have 3 variables and their corresponding values which you want to represent using HD.

Variables: Name, Country, Occupation
Values: Rishi, India, Student


The goal is to use HD to represent the variables and associate them with their corresponding values. The process of representing a data-structure using HD is often referred to as encoding in HD literature. The encoding function transforms a given data structure into it’s equivalent HD representation. Let’s walk through how this is done. Every variable or value is represented using a hypervector. We combine these different hypervectors systematically to represent data as a whole.

### Generating hypervectors

Like mentioned above, every single entity is represented by a hypervector. For every single unique entity, we simply sample a random hypervector and assign it. In our above example, each of the variables and values will be a randomly generated hypervector. For eg. the variable “Name” will be represented by H1, “Country” by H2. The value “Rishi” will be represented by H3 and so on. where H1, H2, H3 are all randomly sampled hypervectors. These representations are unique because each random hypervector is dissimilar from each other as demonstrated in the previous section. We essentially are never going to run out of random hypervectors to represent data !!

Depending on the VSA type you use there will be a specific distribution we will sample the hypervector from. For this article we will use the simplest of all which is just randomly sampling from a bernoulli distribution creating a hypervector whose elements are $$\{-1, 1\}$$. This can be done in code as follows:

import numpy as np

hypervec = 2 * np.random.binomial(1, 0.5, 10000) - 1
print(hypervec[:10])


output

[ 1  1 -1 -1 -1 -1 -1  1 -1 -1]


### Similarity metric

A lot of the computations of HD computing relies on measuring a notion of similarity between 2 hypervectors. For binary hypervectors, the similarity metric is usually the hamming distance. The hamming distance measure how many bits are different between 2 vectors. For eg. for 2 vectors \begin{aligned} A = [1, 0, 1, 1], B &= [0, 1, 1, 1] \\ \Gamma(A, B) = Hamming(A, B) &= 2 \end{aligned}

Another possible and very commonly used metric is the similarity similarity which measures the angular distance between 2 vectors. Mathemtiaclly it’s given by: \begin{aligned} \gamma(A, B) = Cosine(A, B) = \frac{A.B}{|A||B|} \end{aligned}

For the rest of this article we will be using the cosine similarity. The cosine similarity gives a value between $$[0,1]$$ with 1 indicating maximum similairty and 0 indicating the hypervectors are orthogonal.

### Binding

The binding operation is used to ‘bind’ or join items together such as assigning a value to a variable. Simply put the binding operation is the equivalent of assigning Name = Rishi. For ternary $$(\{-1 , 1\})$$ hypervectors, the binding operation is simply the elementiwse product of the 2 hypervectors. Continuing with our example, Name -> H1 and Rishi -> H3. To assign Name = Rishi we bind H1 and H2 with each other. Binding 2 hypervectors results in a new hypervector $$H_{bind}$$ that is dissimilar to both the origianl hypervectors. Let’s denote the set or space of hypervectors by $$\mathcal{H}$$. We can define the binding operation as: \begin{aligned} \bigotimes: \mathcal{H} \times \mathcal{H} \rightarrow \mathcal{H} \end{aligned}

where the $$\bigotimes$$ is a mapping over two hypervectors producing a third hypervector that is dissimilar to both the input hypervectors.

import numpy as np

def similarity(A, B):
dot = np.dot(A, B)
sim_score = dot / np.linalg.norm(A) / np.linalg.norm(B)

return sim_score

N = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Name
R = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Rishi

assign_R_to_N = N * R #binding Rishi to Name

print("distance between assign_R_to_N and N: ", similarity(assign_R_to_N, N))
print("distance between assign_R_to_N and R: ", similarity(assign_R_to_N, R))


Output:

distance between assign_R_to_N and N:  -0.009399999999999999
distance between assign_R_to_N and R:  -0.0078000000000000005


As you can see by the similarity measure, the bound vector is orthogonal to both N and R hypervectors. The binding operation is both commutative and assosicative. It is also distributive over bundling.

### Bundling

The bundling operation on the other hand takes as input 2 hypervectors and produces a third hypervector which is similar to both the input hypervectors. The bundling operator is a means of aggregating multiple entities into a single hypervector. For ternary hypervectors the bundling operator is simply the elementwise sum of the hypervectors. Mathemtically it’s represented as: \begin{aligned} \bigoplus: \mathcal{H} \times \mathcal{H} \rightarrow \mathcal{H} \end{aligned}

bundle_R_N = np.sign(N + R)

print("distance between bundle_R_N and N: ", similarity(bundle_R_N, N))
print("distance between bundle_R_N and R: ", similarity(bundle_R_N, R))


Output:

distance between bundle_R_N and N:  0.7073895673530959
distance between bundle_R_N and R:  0.7073895673530959


### Unbinding

The binding operator has an inverse operation call unbinding. Given a hypervector $$H_{bind}$$ that binds 2 hypervectors $$H_1, H_2$$, we would like to be able to retrieve the original hypervectors $$H_1$$ or $$H_2$$. We can do this using the unbinding operation. For this VSA type we are currently using, the unbinding operator is the same as the binding operator.. Meaning the binding operation $$*$$ is its self-inverse. So $$H_1 * H_1$$ yields an identity vector $$I$$. Since we are basically taking the elementwise square by multiplying with itself, the values all become 1 since our hypervector is $$\{-1, 1\}^d$$. We can quickly check this:

hypervec = 2 * np.random.binomial(1, 0.5, 10000) - 1
print((hypervec * hypervec)[:10])


output:

[1 1 1 1 1 1 1 1 1 1]


Let’s say we want to retrieve $$H_1$$ from $$H_{bind}$$. We unbind $$H_2$$ from $$H_{bind}$$ which gives us $$H_1$$. How does this work ? \begin{aligned} H_{bind} &= H_1 \bigotimes H_2 \\ H_{bind} \bigotimes H_2 &= H_2 \bigotimes H_1 \bigotimes H_2 \\ H_{bind} \bigotimes H_2 &= H_1 \bigotimes H_2 \bigotimes H_2 \\ H_{bind} \bigotimes H_2 &= H_1 \bigotimes I\\ H_{bind} \bigotimes H_2 &= H_1 \end{aligned}

Let’s check this in code now:

h1 = 2 * np.random.binomial(1, 0.5, 10000) - 1
h2 = 2 * np.random.binomial(1, 0.5, 10000) - 1
h_bind = h1 * h2

unbind_h1 = h_bind * h2
print(similarity(unbind_h1, h1))
print(similarity(unbind_h1, h2))


output

1.0
-0.008


Now if we unbind $$H_1$$ from $$H_{bind}$$, we can see using our distance metric that $$H_1$$ and $$unbind_\{h1\}$$ are highly similar. We have successfully retrieved $$H_1$$ from $$H_{bind}$$.

## An example of storing and retrieving data

Let’s go back to the original problem of storing records. Each record has the name of the person, country of origin and their occupation. Let’s say we want to store records of 2 different people as shown below:

1. Rishi, India, Student
2. Chandler, Scotland, Transponster


We want to be able to retrieve records and the information in it. For eg. if I were to query what is the occupation of Chandler, I want to be able to retrieve the answer as Transponster. Let’s step through how to do this with HD representations.

Each record has 3 variables. For creating a record, we assign the values to their corresponding variables. First lets represent each of the variables and values with hypervectors.

variables = {}
variables['N'] = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Name
variables['C'] = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Country
variables['O'] = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Occupation

names = {}
name['Rishi'] = 2 * np.random.binomial(1, 0.5, 10000) - 1
name['Chandler'] = 2 * np.random.binomial(1, 0.5, 10000) - 1

countries = {}
countries['India'] = 2 * np.random.binomial(1, 0.5, 10000) - 1
countries['USA'] = 2 * np.random.binomial(1, 0.5, 10000) - 1

occupations = {}
occupations['Student'] = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Rishi
occupations['Transponster'] = 2 * np.random.binomial(1, 0.5, 10000) - 1 # Rishi

#Sanity check
print(similarity(occupations['Student'], countries['India']))


output:

0.0068000000000000005


We have now represented each entity with a unique hypervecvtor. Now let’s build a record. We first bind each attribute to its value. We bind together three different pairs of hypervectors one each for assosicating name, country and occupation. This gives us 3 different hypervectors each representing one association for name, country and occupation. We bundle these 3 hypervectors to create a single hypervector that represents this record. Woahh that was a bit fast. Let’s check what’s going on here. Programmatically this can be done as below:

record_1_name = variables['N'] *  names['Rishi']
record_1_country = variables['C'] *  countries['India']
record_1_occupation = variables['O'] * occupations['Student']

record_1 = np.sign(record_1_name + record_1_country + record_1_occupation)


Note: After bundling we take the sign function to quantize our hypervectors back to $$\{-1, 1\}$$.

\begin{aligned} record_{name} &= N \bigotimes rishi\\ record_{country} &= C \bigotimes india\\ record_{occupation} &= O \bigotimes student\\ record_1 &= record_{name} \bigoplus record_{country} \bigoplus record_{occupation}\\ record_1 &= (N \bigotimes rishi) \bigoplus (C \bigotimes india) \bigoplus (O \bigotimes student) \end{aligned}

So if I want to find out the name of the person in record_1 how do we get it ?? Here’s where things get really cool. Essentially we created assosciations between the fields: name, country, occupation with their corresponding values: rishi, india, student. This created 3 different hypervectors $$record_{name}, record_{country}, record_{occupation}$$. We bundled or aggregated them together into a single hypervector $$record_1$$ which is similar to each of the 3 hypervectors.

Suppose we want to find the name of the person in $$record_1$$ we use the unbinding operator to retrieve the name. That would work like this:

\begin{aligned} record_1 \bigotimes N &= N \bigotimes\\ & \left( (N \bigotimes rishi) \bigoplus (C \bigotimes india) \bigoplus (O \bigotimes student) \right) \\ \end{aligned}

Recall that the binding operator is distributive over bundling. Hence we can re-write the above equation as

\begin{aligned} record1_{name} &= record_1 \bigotimes N\\ &= (N \bigotimes N \bigotimes rishi) \bigoplus (N \bigotimes C \bigotimes india)\\ & \bigoplus (N \bigotimes O \bigotimes student) \\ &= (I \bigotimes rishi) \bigoplus (N \bigotimes C \bigotimes india) \bigoplus\\ & (N \bigotimes O \bigotimes student) \\ &= rishi \bigoplus (N \bigotimes record_{country}) \bigoplus (N \bigotimes record_{occupation})\\ &= rishi \bigoplus noise_1 \bigoplus noise_2 \end{aligned}

So the $$N$$ hypervector is distributively bound with each of the 3 hypervectors that were bundled together. This reduces the first term to just the name $$rishi$$ as $$N$$ binds with itself producing the identity $$I$$ vector. The other 2 terms bind with $$N$$ to produce 2 noise terms. This noise hypervector carries no semantic meaning and is simply a hypervector that is dissimilar to $$record_{country}$$ and $$record_{occupation}$$. The final result is the actual name of the person we want bundled with two noise terms. Remember that bundling preserves the similarity between the original hypervectors. So it carries the information we want (which is the name of the person). However we still don’t know what the name is specifically. All we need to do now is to do a search. We compare the name we retrieved with each of the name hypervectors we had stored and output the name which has the maximum similarity with our retrieved hypervector $$record1_{name}$$. This process is referred to in literature as assosicative search

The entire process can be implemented in code as follows:

record1_name = record_1 * variables['N']

retrieved_name = None
max_sim = float("-inf")
for n in names.keys():
sim_score = similarity(names[n], record1_name)
if sim_score > max_sim:
max_sim = sim_score
retrieved_name = n

print("the name of the person 1 is: ", retrieved_name)


output

the name of the person 1 is:  Rishi


That was quite a bit we covered. Please take the time to re-read the above and try implementing it yourself in code to get a better understanding of what is going on here. You might think that this is a lot of work to simply store 3 variables and retrieve them. However, keep in mind that so far we have only been operating on ternary vectors which requires just 2 bits per dimension and all the operations are bit-wise operations which can be done in parallel. Since each of the operations operate elementwise with no dependency on neighboring bits or values, we can easily accelerate the process by computing them in parallel. Moreover, while I expand the derivation to explain the entire process simplies to a couple lines of code.

## Storing and retrieving multiple records

We saw a simple example of how to store and retrieve a single record. Now let’s look at how to store multiple records and how to retrieve a specific record. Continuing with our original example lets now store the record for Chandler. We again use the same process to create a record for chandler.

record_2_name = variables['N'] *  names['Chandler']
record_2_country = variables['C'] *  countries['USA']
record_2_occupation = variables['O'] * occupations['Transponster']

record_2 = np.sign(record_2_name + record_2_country + record_2_occupation)


\begin{aligned} record2_{name} &= N \bigotimes chandler\\ record2_{country} &= C \bigotimes usa\\ record2_{occupation} &= O \bigotimes transponster\\ record_2 &= record_{name} \bigoplus record_{country} \bigoplus record_{occupation}\\ record_2 &= (N \bigotimes chandler) \bigoplus (C \bigotimes usa)\\ & \bigoplus (O \bigotimes transponster) \end{aligned}

To store both the records $$record_1$$ and $$record_2$$ we simply bundle the two together. This way we preserve the similarity to both records at the same time allowing us to retrieve any record we want.

records = np.sign(record_1 + record_2)


\begin{aligned} records = record_1 \bigoplus record_2 \end{aligned}

Now say we want to find out Chandler’s occupation, we first retrieve the record corresponding to Chandler’s then from this record we retrieve his occupation. Let’s see how this is done. To first retrieve Chandler’s record we follow a similar procedure as above.

\begin{aligned} ref_{chandler} &= N \bigotimes chandler \\ record_{chandler} &= ref_{chandler} \bigotimes records\\ &= ref_{chandler} \bigotimes \left( record_1 \bigoplus record_2 \right) \\ &= \left(ref_{chandler} \bigotimes record_1\right) \bigoplus \\ & \left(ref_{chandler} \bigotimes N \bigotimes chandler \right) \\ &\qquad \bigoplus \left(ref_{chandler} \bigotimes record2_{country}\right)\\ &\qquad \bigoplus \left(ref_{chandler} \bigotimes record2_{occupation}\right) \\ &= \left(ref_{chandler} \bigoplus record_1 \right) \bigoplus\\ & \left(N \bigotimes chandler \bigotimes N \bigotimes chandler \right)\\ &\qquad \bigoplus \left(N \bigotimes chandler \bigotimes C \bigotimes usa \right) \bigoplus\\ &\left(N \bigotimes chandler \bigotimes O \bigotimes transponster \right)\\ &= noise \bigoplus \left(N \bigotimes N \bigotimes chandler \bigotimes chandler \right)\\ &\qquad \bigoplus \left(N \bigotimes chandler \bigotimes C \bigotimes usa \right) \bigoplus\\ &\left(N \bigotimes chandler \bigotimes O \bigotimes transponster \right)\\ &= noise \bigoplus I \bigoplus \left(N \bigotimes chandler \bigotimes C \bigotimes usa \right) \bigoplus\\ &\left(N \bigotimes chandler \bigotimes O \bigotimes transponster \right)\\ &= noise \bigoplus \left(N \bigotimes chandler \bigotimes C \bigotimes usa \right) \bigoplus\\ &\left(N \bigotimes chandler \bigotimes O \bigotimes transponster \right)\\ \end{aligned}

So now we have retrieved the record corresponding to Chandler’s. Next we want to find out the occupation of chandler. We again do this by unbinding as below:

\begin{aligned} \begin{split} ref_{chandler} &= N \bigotimes chandler \bigotimes O\\ chandler_{occupation} &= ref_{chandler} \bigotimes record_{chandler}\\ &= \left(N \bigotimes chandler \bigotimes O\right) \bigotimes\\ &\qquad \left(noise \bigoplus \left(N \bigotimes chandler \bigotimes C \bigotimes usa \right)\\ \bigoplus \left(N \bigotimes chandler \bigotimes O \bigotimes transponster \right)\right)\\ &= noise \bigoplus \left(N \bigotimes N \bigotimes chandler \bigotimes chandler \bigotimes C \bigotimes usa\right)\\ &\qquad \bigoplus \left(N \bigotimes N \bigotimes chandler \bigotimes chandler \bigotimes O \bigotimes transponster\right) \\ &= noise \bigoplus \left(I \bigotimes C \bigotimes usa \right) \bigoplus \left(I \bigotimes transponster\right)\\ &= noise \bigoplus noise_2 \bigoplus transponster \end{split} \end{aligned}

We create a reference vector with the specification of the record we would like to extract. We then try to do a search operation in our records to retrieve the specific record of interest. In this case Chandler’s. We then look for the specific field in this retrieved record to get the information we need. Now all we have to do is to do an associative search with our list of occupations to identify Chandler’s occupation. The entire process can be done in code as below:

ref_chandler = variables['N'] * names['Chandler']
record_chandler = records * ref_chandler
ref_chandler = variables['N'] * names['Chandler'] * variables['O']
chandlers_occupation = record_chandler * ref_chandler

occupation = None
max_sim = float("-inf")
for o in occupations.keys():
sim_score = similarity(occupations[o], chandlers_occupation)
if sim_score > max_sim:
max_sim = sim_score
occupation = o

print("Chandler's occupation is: ", occupation)


output

Chandler's occupation is:  Transponster


## Noise robustness of HD

We looked at how to store and retrieve information using HD computing. Before we wrap it up, I want to briefly demonstrate the robustness of HD computing to errors / noise on the representations. We will flip a certain percentage of the bits in the representation and try to repeat the above example of trying to retrieve Chandler’s occupation. Let’s dive in:

noise_mask = [-1] * int(0.2 * 10000) + [1] * int(0.8 * 10000)

ref_chandler = variables['N'] * names['Chandler']
record_chandler = noisy_records * ref_chandler
ref_chandler = variables['N'] * names['Chandler'] * variables['O']
chandlers_occupation = record_chandler * ref_chandler

occupation = None
max_sim = float("-inf")
for o in occupations.keys():
sim_score = similarity(occupations[o], chandlers_occupation)
if sim_score > max_sim:
max_sim = sim_score
occupation = o

print("Chandler's occupation is: ", occupation)


output

Chandler's occupation is:  Transponster


Despite corrupting 20% of the representation we are still able to accurately retrieve the information we need. This is a very useful property. For e.g, in FPGA and ASIC implementations the operating voltage has an effect on the accuracy of the computations. Lowering the operating voltage will introduce errors in the calculations. With HD computing since we can tolerate errors, the operating voltage can be reduced allowing for low-power computing. This opens up opportunities for accelerating computing and reducing power making HDC very ideal for edge computing applications. This is the primary reason HDC is becoming increasingly popular in the research community.

## Wrapping it up

Before we close I want to briefly mention that the operations of binding, unbinding and bundling vary with the type of VSA we use. This paper provides a detailed explanation and comparison of different types of VSAs along with their corresponding binding, unbinding and bundling operators. Depending on the VSA use some of the properties such as invertibility changes.

If you made it this far, thank you for reading my article. In this first part, I gave a gentle introduction to Hyperdimensional computing which I hope piqued your interest and or helped you understand what HD computing is. In subsequent posts I’ll cover more encoding methods to demonstrate how we can represent a wide variety of datastructures such as sequences, features for ML algorithms and even position / coordinates. I’ll also be covering how we can perform lightweight and fast ML using HDC.

In the mean time if you want to read up more on HDC I would suggest reading this paper. For the theoretically inclined folks, there’s this excellent paper which details the mathematical nuances of HDC including proofs and bounds for each of the characteristics we looked at here.

### References

[1]: Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. paper

[2]: Theoretical Foundations of Hyperdimensional Computing. paper