A cryptographic hash is a short, fixed-length sequence of bytes (usually shown as hexadecimal digits) that is easily calculated from a much larger input sequence, such as a file downloaded from the internet. By recalculating the cryptographic hash of such a file and comparing it with an advertised cryptographic hash, one can check that the file has not changed. This allows the integrity of a file to be checked and any malicious or accidental changes to be spotted.
How can you trust the internet?
The Internet can be a hazardous place to download files from. How can you know that the file you are downloading has not been tampered with by others? How can you know that the file you have downloaded has not had an error introduced prior to becoming available on your computer? Cryptographic hashes, if used appropriately, allow you to be sure of both things.
In this article, we discuss what cryptographic hashes are, where you find them on the internet, how they are used in verifying a file’s contents, whether you should use them and how to use them.
What are cryptographic hashes?
Cryptographic hashes are created by applying a cryptographic hash function to an input sequence of arbitrary length, producing a much shorter, fixed-length output sequence: the cryptographic hash.
A cryptographic hash function has the following features:
- It is one-way: you can easily convert the input sequence to an output sequence, but it is impossible to convert the output sequence back to the input sequence (by impossible I mean so difficult that it cannot be done in any reasonable amount of time). Functions which exhibit this behaviour are called one-way functions; for hash functions this property is also known as preimage resistance.
- A small change in the input sequence produces a large change in the output sequence.
- It is computationally cheap, so it can be calculated quickly.
- It accepts an input sequence of arbitrary length.
- It produces a small, fixed-size output sequence.
- It is deterministic - the same input sequence produces the same output sequence every time.
- It is very difficult to find two different input sequences which, when hashed, give the same output sequence; this is known as collision resistance.
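To get a feel for why the length of the output matters for collision resistance (two different inputs sharing a hash), the following Python sketch deliberately weakens SHA-256 by keeping only its first 2 bytes (16 bits). With only 65,536 possible outputs, a collision typically turns up after a few hundred attempts, roughly the square root of the number of possible outputs (the "birthday bound"). The same search against the full 256-bit output would need around 2^128 attempts, far beyond any real computer. The helper names and the "input-N" strings are purely illustrative.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """Keep only the first n_bytes of SHA-256: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2):
    """Hash numbered inputs until two different ones share the same truncated hash."""
    seen = {}  # truncated hash -> the input that produced it
    counter = 0
    while True:
        candidate = f"input-{counter}".encode()
        h = truncated_hash(candidate, n_bytes)
        if h in seen:
            # Return the colliding pair and how many inputs were hashed in total.
            return seen[h], candidate, counter + 1
        seen[h] = candidate
        counter += 1


a, b, tries = find_collision(2)
print(a, b, truncated_hash(a, 2).hex(), tries)
```

Because every candidate is distinct and there are only 65,536 possible 16-bit outputs, the loop is guaranteed to stop within 65,537 inputs.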
Common examples of cryptographic hash functions are MD5, SHA-1, SHA-2 and SHA-3. MD5 and SHA-1 are no longer considered strong enough to generate cryptographic hashes for files, because practical collision attacks against both have been demonstrated; so, generally, you will see SHA-2 or SHA-3 algorithms being used.
The SHA-2 family has six variants: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256. These produce output sequences of 224, 256, 384, 512, 224 and 256 bits respectively. If you see SHA-256 referenced on a web page, it will nearly always refer to the SHA-256 variant of SHA-2. SHA-3 also comes in several output lengths but, in my experience, if SHA-3 is used you are more likely to see it written as something like SHA3-256. If you are in any doubt, check the small print or try both.
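A quick way to see these output lengths for yourself is Python's standard hashlib module. The sketch below reports the output size of several SHA-2 and SHA-3 variants and confirms that SHA-256 and SHA3-256, despite having the same length, give different hashes for the same input, which is why it matters which one a website means. It sticks to algorithms hashlib guarantees on every platform; SHA-512/224 and SHA-512/256 are only available when the underlying OpenSSL build provides them.

```python
import hashlib

# Algorithms guaranteed to be present in Python's hashlib on every platform.
# SHA-512/224 and SHA-512/256 may also work via hashlib.new("sha512_224") and
# hashlib.new("sha512_256"), but only if the underlying OpenSSL build provides them.
ALGORITHMS = ["sha224", "sha256", "sha384", "sha512", "sha3_256", "sha3_512"]


def digest_bits(names):
    """Map each algorithm name to its output length in bits (digest_size is in bytes)."""
    return {name: hashlib.new(name).digest_size * 8 for name in names}


print(digest_bits(ALGORITHMS))

# Same output length, different algorithm, completely different hash:
data = b"The quick brown fox jumped over the lazy sleeping dog"
print(hashlib.sha256(data).hexdigest() == hashlib.sha3_256(data).hexdigest())  # False
```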
For a simple example of how small changes in the input sequence can hugely change the cryptographic hash, see the following figure:
The preceding example shows how the cryptographic hash changes when the string “The quick brown fox jumped over the lazy sleeping dog” (a variant of the well-known pangram) has the ‘f’ of ‘fox’ changed to a ‘b’. The change in the hash is huge: the second hash bears no resemblance to the first.
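The same experiment can be reproduced in a few lines of Python. This sketch assumes SHA-256; the figure does not say which hash function it used, so the values printed here will not necessarily match it. It also counts differing bits: the two input strings differ in exactly one bit (‘f’ is 0x66 and ‘b’ is 0x62), yet around half of the 256 output bits typically change.

```python
import hashlib

original = "The quick brown fox jumped over the lazy sleeping dog".encode()
modified = "The quick brown box jumped over the lazy sleeping dog".encode()  # 'f' of fox -> 'b'


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(modified).digest()

print(differing_bits(original, modified))  # 1: 'f' (0x66) and 'b' (0x62) differ in one bit
print(h1.hex())
print(h2.hex())
# Typically around half of the 256 output bits flip.
print(differing_bits(h1, h2), "of", len(h1) * 8, "hash bits differ")
```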
Where do you find cryptographic hashes?
You may already have noticed long strings of hexadecimal digits, like those shown above, next to files offered for download on websites. These are cryptographic hashes of the files, calculated by the person who made them available for download. Normally the hash is accompanied by information indicating which hash function was used to produce it.
As an example of how a cryptographic hash can be presented on a website, the preceding figure shows the Download page for the popular FTP software FileZilla. There is a .exe file, which is the file to download, and a .exe.sha512 file, which is the “Checksum”, or cryptographic hash, of the .exe file. In this case I have already clicked on the little ‘I’ icon, which reveals the full SHA-512 hash.
You may be wondering why the SHA-512 checksum file above is 160 bytes long. This is because the file also contains the name of the downloaded file after the hash. The hash itself is 512 bits long, which is 64 bytes (512/8). Each byte is written as two hexadecimal characters, since each character encodes 4 bits; for example, 255 is 1111 1111 in binary and FF in hexadecimal. So the 64 bytes are shown as 128 characters, and the filename plus some whitespace brings the total to 160 characters.
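The arithmetic above can be checked with Python's hashlib. The sketch below shows that a SHA-512 hash is 64 raw bytes but 128 hexadecimal characters, and builds a checksum line in the "hash, two spaces, file name" style used by the GNU sha512sum tool. That layout, and the file name example-setup.exe, are assumptions for illustration; the exact spacing in FileZilla's file may differ, which is why its total is 160 rather than the 148 computed here. Note that Python prints hexadecimal in lower case; FF and ff are the same value, so comparisons should ignore case.

```python
import hashlib

digest = hashlib.sha512(b"example").digest()  # raw 512-bit hash
hex_digest = digest.hex()                     # the form shown on websites

print(len(digest))         # 64 (bytes = 512 bits / 8)
print(len(hex_digest))     # 128 (two hex characters per byte)
print(bytes([255]).hex())  # ff (lower case; FF and ff are the same value)


def checksum_line(hex_hash: str, filename: str) -> str:
    """Assumed layout: hash, two spaces, file name, newline (the GNU sha512sum style)."""
    return f"{hex_hash}  {filename}\n"


line = checksum_line(hex_digest, "example-setup.exe")  # hypothetical file name
print(len(line))  # 148 = 128 + 2 + 17 + 1
```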
How are they used to verify the contents of a file?
To verify that a downloaded file is still as intended you can follow the steps below:
- Find the hash on the website.
- Determine the algorithm used for hashing from the website.
- Use the tools discussed below to calculate the hash of the downloaded file.
- Compare this hash to the hash on the web site.
- If they are exactly the same then the file is good.
If the hashes do not match, you will need to re-download the file. If they match exactly, you can be confident that the file you downloaded has not been corrupted or altered since the hash was produced, and that it is exactly the file the publisher intended you to have.
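The steps above can be sketched in Python with the standard hashlib module. This is a minimal sketch, not a replacement for the operating-system tools described below; the function names file_hash and verify_file are hypothetical. The file is read in chunks so that large downloads, such as DVD images, do not need to fit in memory.

```python
import hashlib


def file_hash(path: str, algorithm: str = "sha512", chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str, algorithm: str = "sha512") -> bool:
    """Return True only if the file's hash exactly matches the published hash."""
    actual = file_hash(path, algorithm)
    # Websites may show hex in upper or lower case, and copy-and-paste can add whitespace.
    expected = expected_hex.strip().lower()
    return actual == expected
```

For example, verify_file("file.txt", "<published hash>") returns True only when the two hashes agree. Lower-casing and trimming the published value avoids false mismatches from copy-and-paste without weakening the check, since hexadecimal is case-insensitive.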
As with any web browsing, it is always best to perform these operations on secured websites, i.e. those whose URLs start with HTTPS, shown by a lock icon in most browsers. This matters because a hash obtained over an insecure connection could have been swapped along with the file.
Do I need to use them?
Nowadays, with most websites secured with HTTPS, it is less likely (although not impossible) that files on websites will be untrustworthy. However, when downloading large files, such as DVD images, or downloading over a less trustworthy medium, such as Tor or BitTorrent, it can be worthwhile checking the file's integrity using cryptographic hashes.
How do I check that a downloaded file is as expected on Windows, Mac and Linux?
On Windows, the easiest way to check the cryptographic hash of a file is to use the command-line utility “certutil”, following the steps below:
- Open a Command Prompt window; if you can't find it on the Windows menu, click the Windows icon in the bottom left and type cmd into the search box.
- Change directory to the directory containing the file you wish to check.
- Enter the command “certutil -hashfile -?” to display help for the operation, including the names you can use to select the hash algorithm; we will assume SHA512 in our example.
- To get the hash, enter the command “certutil -hashfile file.txt SHA512”.
This will calculate and display the SHA-512 cryptographic hash for the file ‘file.txt’.
On macOS (formerly OS X), use the built-in shasum command from a Terminal window. From Launchpad, click on the Other group and then on Terminal. In the Terminal window, change to the folder containing the file you wish to check and issue the command “shasum -a 512 file.txt”. If you wish to use another hash algorithm, type “shasum -h” to get help and make your selection, then retype the command with the correct -a option.
On Linux, use one of the built-in commands such as md5sum, sha1sum, sha224sum, sha256sum, sha384sum or sha512sum, for example “sha512sum file.txt”. For some other tools in Linux you can look at this article.
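If you would rather use one approach that behaves the same on Windows, Mac and Linux, Python's standard hashlib module can do the job. The sketch below goes a step further and reads a checksum file, such as FileZilla's .exe.sha512, which holds the hash followed by the file name, and checks each listed file, much as “sha512sum -c” does on Linux. The one-entry-per-line layout, with file names relative to the checksum file's folder, is an assumption, and check_checksum_file is a hypothetical helper name; a listed file that is missing raises FileNotFoundError.

```python
import hashlib
from pathlib import Path


def check_checksum_file(checksum_path: str, algorithm: str = "sha512") -> dict:
    """Check every 'hash  filename' line in a checksum file, in the spirit of sha512sum -c.

    Assumes one entry per line: hex hash, whitespace, then the file name,
    with the file name relative to the checksum file's folder.
    """
    base = Path(checksum_path).parent
    results = {}
    for line in Path(checksum_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.strip().split(maxsplit=1)
        name = name.lstrip("*")  # GNU tools mark binary-mode entries with a leading '*'
        h = hashlib.new(algorithm)
        with open(base / name, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        # Compare case-insensitively: FF and ff are the same hex value.
        results[name] = h.hexdigest() == expected.lower()
    return results
```

A result such as {"file.txt": True} means the file matches its published hash; any False entry means that file should be downloaded again.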
In this article, we have skated over what cryptographic hashes are, and why and how they can be used to help you verify the integrity of a file downloaded from the internet. I've given a brief example of where you might find such a hash, and the commands you can use on popular operating systems to perform the check. We've only just scratched the surface of cryptographic hashes; if this has piqued your interest, I would recommend heading over to Wikipedia and doing a bit more reading.
Mark Davison, Terzo Digital, February 2018