
Hashing
We use hash functions every single day as programmers. Objects in JavaScript,
dictionaries in Python, HashMap
in Java. Every language has a data structure
that allows us to map keys to values, and under the hood they all make use of
hash functions to do that. You may have also heard people talk about
"checksumming", perhaps to check the validity of a file you've downloaded.
These also use hash functions.
But what is a hash function, and how does it work?
In this post we're going to demystify what hash functions are. We're going to visualise what makes a good hash function, explore the use-cases of hash functions, discuss the dangers of bad hash functions, and at the end you're going to be able to have a go at writing your own hash function!
› What is a hash function?
Put simply, hash functions take data of any size and produce a number. They always produce the same number for the same input, and the number they produce is of a fixed size. When I say fixed size, I just mean there is a defined minimum and maximum value.
If we were to write the function signature in TypeScript, it would look like this:
function hash(input: string): number {
return 0;
}
That's all there is to it. You could use this as your hash function if you really wanted to, though you'd quickly run in to problems.
![]()
What if we wanted to hash something that isn't a string, like a file?
I've made a simplification. In reality, hash functions take a byte array as their input. To keep our code simple, and to make the examples easier to understand, we're going to assume all byte arrays can be represented as strings.
› What makes a good hash function?
The best hash functions are ones that have the fewest "collisions." A collision is when two different inputs produce the same output. Because we're taking input of any size and outputting a number with a fixed size, collisions can't be avoided. If we wrote a hash function that output a number between 0 and 8, and we passed it 9 distinct values, there will always be at least one collision.