Cryptographic hash functions

Posted by David Harding, Saïvann Carignan, Balaji Srinivasan

Tutorial

What you’ll learn

By the end of this tutorial you will know how to use SHA256 at the command line to verify data integrity. Being able to verify integrity is essential for building a blockchain that attackers cannot modify undetected.

Install 21

How hashes work

Whether you're mining with the 21 Bitcoin Computer or simply sending and receiving transactions on the Bitcoin network using 21, you're repeatedly creating and verifying cryptographic hashes. (We’ll simply call them “hashes” from this point on, but you should be aware that there are non-cryptographic hashes that don’t have the same essential security properties.)

Hashes are designed to convert an arbitrary amount of data to a short string in a fully reproducible way, so inputting the same data into the hash function always produces the same output string. Let’s see that in action:

## Save the text “hello world” in the file hello.txt
echo "hello world" > hello.txt

## Verify that “hello world” (without quotes) was saved correctly
cat hello.txt

## Get the SHA256 hash for “hello world”
sha256sum hello.txt

Anyone who runs the above commands precisely as shown should see the result:

a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447  hello.txt

This isn’t specific to your computer: anyone in the world, on any platform and any operating system, will get the same hash for the same bytes. That is because SHA256 is a mathematical function which maps the underlying byte representation of the file’s contents (a number) to a 32-byte array (another number) whose hexadecimal representation is a948904f2f0f479b8f8197694b30184b0d2ed1c1cd2a1ec0fb85d299a192a447.

In shorthand, we say that a948...a447 is the SHA256 hash of “hello world”. Strictly speaking, it is the hash of “hello world” followed by the newline character that echo appends; hashing the eleven characters alone gives a different result.
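The integrity check this tutorial is building toward can be sketched with the check mode (-c) of sha256sum. This is a minimal example assuming GNU coreutils; the scratch directory and the checksum file name hello.txt.sha256 are arbitrary choices made for illustration.

```shell
## Work in a scratch directory so no existing file is overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "hello world" > hello.txt

## Record the hash and file name in a checksum file
sha256sum hello.txt > hello.txt.sha256

## Re-hash the file and compare with the recorded hash; prints "hello.txt: OK"
sha256sum -c hello.txt.sha256

## Change a single character, then check again; this time the check fails
echo "hello World" > hello.txt
sha256sum -c hello.txt.sha256 || echo "hello.txt no longer matches its recorded hash"
```

The checksum file is tiny and can be shared separately from the data, so anyone holding a copy of hello.txt can run the same check.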

Different data, totally different hashes

As of this writing, the Bitcoin blockchain is over 50 gigabytes. Yet anyone can hash it and get a single 32-byte hash. By comparing that tiny hash with other people, everyone can be sure they have the same blockchain with the same record of transactions.
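To see the fixed output size on a smaller scale, here is a rough sketch that hashes a one-byte file and a file of about 10 MB. The file names and sizes are made up for illustration, and the large file is just zero bytes rather than real blockchain data.

```shell
workdir=$(mktemp -d)

## A one-byte input and a roughly 10 MB input
printf 'a' > "$workdir/small.bin"
head -c 10000000 /dev/zero > "$workdir/large.bin"

small_hash=$(sha256sum "$workdir/small.bin" | cut -d' ' -f1)
large_hash=$(sha256sum "$workdir/large.bin" | cut -d' ' -f1)

## Both hashes are 64 hex digits (32 bytes); prints "64 64"
echo "${#small_hash} ${#large_hash}"

## A copy of the large file has exactly the same hash as the original
cp "$workdir/large.bin" "$workdir/copy.bin"
copy_hash=$(sha256sum "$workdir/copy.bin" | cut -d' ' -f1)
if [ "$large_hash" = "$copy_hash" ]; then
    echo "the copy matches the original"
fi
```

Comparing two 64-character strings is enough to confirm that two copies match, however large the files are.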

That works because every time you hash the same input data with the same hash function, you get the same output hash. But hashes have another important property: whenever you hash different data, you get a completely different hash, even if the inputs differ by a single character. (Two inputs with the same SHA256 hash must exist in theory, but nobody knows how to find them.) The output is so different that there’s no way to predict what any part of it will be; in other words, it looks like random data. Let's give that a try:

## Hash three different strings
echo foo | sha256sum
echo bar | sha256sum
echo baz | sha256sum

Notice that each output hash is completely different, except for being the same length.
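The same effect shows up when two inputs differ by a single byte. As a small sketch, the commands below hash "foo" with and without the trailing newline that echo adds, then count how many of the 64 hex digits happen to agree. The helper name hash_of is hypothetical.

```shell
## Print only the hex hash of whatever arrives on standard input
hash_of() { sha256sum | cut -d' ' -f1; }

with_newline=$(echo foo | hash_of)        ## input bytes: "foo" plus a newline
without_newline=$(printf foo | hash_of)   ## input bytes: "foo" only

echo "$with_newline"
echo "$without_newline"

## Count the positions where the two hashes have the same hex digit
matches=0
i=1
while [ "$i" -le 64 ]; do
    a=$(printf '%s' "$with_newline" | cut -c "$i")
    b=$(printf '%s' "$without_newline" | cut -c "$i")
    if [ "$a" = "$b" ]; then
        matches=$((matches + 1))
    fi
    i=$((i + 1))
done
echo "$matches of 64 hex digits match"
```

Two unrelated random strings of hex digits would agree in about 1 position out of 16 by chance, so a handful of matches is what you would expect from hashes that look random.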

We described Bitcoin's use of Hashcash-style proof of work (PoW) in an earlier tutorial, but now that you know about hashes we can go into a little more detail. Hashcash creator Adam Back realized in 1997 that it was possible to use hash functions to prove that you did a certain amount of computational work.

For example, the main hash in Bitcoin produces a 256-bit random-looking output like those shown above, which means that its output is similar in randomness to flipping 256 coins in sequence and recording each heads as a 0 and each tails as a 1.

## A hash as bits (the hashes above are shown in hexadecimal)
10011011101010101010010100...01000011111011110111010
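If you want to see the bits of a real hash, each hex digit stands for exactly four bits. Here is one possible way to expand them in plain shell; the function name hex_to_bits is hypothetical, and the input "foo" is just an example.

```shell
## Expand a hex string into bits, four bits per hex digit
hex_to_bits() {
    printf '%s\n' "$1" | fold -w 1 | while read -r digit; do
        case "$digit" in
            0) printf 0000 ;; 1) printf 0001 ;; 2) printf 0010 ;; 3) printf 0011 ;;
            4) printf 0100 ;; 5) printf 0101 ;; 6) printf 0110 ;; 7) printf 0111 ;;
            8) printf 1000 ;; 9) printf 1001 ;;
            a|A) printf 1010 ;; b|B) printf 1011 ;; c|C) printf 1100 ;;
            d|D) printf 1101 ;; e|E) printf 1110 ;; f|F) printf 1111 ;;
        esac
    done
    echo
}

hash=$(echo foo | sha256sum | cut -d' ' -f1)
bits=$(hex_to_bits "$hash")

echo "$bits"
echo "${#bits} bits"   ## 64 hex digits x 4 bits each = 256 bits
```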

So if you create two hashes from two different inputs, you would expect (on average) one of them to start with a 0 bit (“heads”). That means a hash starting with a zero bit is proof that you did the computational work (“number of coin flips”) necessary to create (on average) two hashes.

Extending this, each additional zero bit doubles the expected work, so a hash starting with a zero byte (8 zero bits, or “8 heads”, in a row) is proof you did the computational work (on average) to create 2^8 = 256 hashes.

The nice thing about hashes rather than coin flips is that it’s possible for anyone else in the world to verify you did the work by simply hashing the same input data you used with the same hash function. It doesn’t matter whether it took 256 hashes or 256 trillion hashes to get your result; anyone else in the world can verify it by doing a single hash.
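The search-then-verify idea can be tried out with a toy loop. This is only a sketch: the message and the "message, space, counter" layout are made up for illustration, and Bitcoin itself hashes a block header twice with SHA256 and compares the result against a target rather than looking for a zero byte.

```shell
## Hypothetical message to attach the work to
data="hello world"

## Try counter values until the hash starts with a zero byte ("00" in hex)
nonce=0
max_tries=100000   ## safety cap so the loop always ends
while [ "$nonce" -lt "$max_tries" ]; do
    hash=$(printf '%s %s' "$data" "$nonce" | sha256sum | cut -d' ' -f1)
    case "$hash" in
        00*) break ;;   ## 8 zero bits in a row: stop searching
    esac
    nonce=$((nonce + 1))
done

echo "counter: $nonce"
echo "hash:    $hash"

## Verifying the result takes a single hash, however long the search took
check=$(printf '%s %s' "$data" "$nonce" | sha256sum | cut -d' ' -f1)
if [ "$check" = "$hash" ]; then
    echo "verified with one hash"
fi
```

Searches like this take about 256 tries on average, though any single search may finish sooner or later than that. Requiring more leading zeros makes the search longer without making the verification any slower.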

But creating hashes isn't enough on its own to make Bitcoin work. We also need a way for individuals to authenticate themselves so that they can send bitcoins to each other. That's why the next tutorial shows you how Bitcoin uses digital signatures to create cryptographically secure transactions.