Breaking Blocks: How Hashing Holds the Blockchain Together
Blockchain security relies on one key principle: once data is recorded, it can't be altered without breaking the chain. This post dives into how blocks are structured, the magic of hashing, and how a small change can unravel everything. Plus, we build a simple blockchain block in TypeScript!
Introduction
If you've come from my previous post in this series, welcome back! Otherwise, you should read it: Build Your Own Blockchain: The First Step to Building a Decentralised Future.
My last post was quite wordy, and if you're here, you're hoping to get stuck into some of the more practical parts of making your own blockchain. That's understandable, but it's essential to understand the underlying principles when writing code rather than just copying and pasting. For this reason, I decided not to make and share a Git repo, as I didn't want to tempt people to just copy what I had written, but instead to understand the methods used to make a secure blockchain.
As a reminder, in the last post, we started to set up our environment. We're not aiming to make a fully-fledged, production-ready blockchain-based cryptocurrency, so I've forgone much of the setup I would usually include in a production project.
For this post, we will start by looking at a blockchain's minimal data structure and exploring our first cryptography topic: hashing.
What is a block?
I'm going to go out on a limb here and say that you've probably guessed that a blockchain is made up of blocks. Unlike sweetbreads, which are neither sweet nor bread, the clue is in the name.
Within a blockchain, each block exists to hold some data. In the case of cryptocurrency, this data will consist of transactions. But for it to be a chain, the blocks need to be linked together sequentially; each block must be in order. This is also true for financial transactions. You want each transaction to be recorded in the order it happened; otherwise, you might find yourself spending money that isn't there anymore.
To make this work, our block has to have a few different properties:
- Index: This is the block's position in the chain and can be represented as a number.
- Timestamp: This is the time the block was created.
- Data: For now, this can be any serialisable data, but in the future, it will be our transactions.
- Previous Block Hash: We'll get to hashing soon, but for now, it refers to the previous block and is important for linkage.
- Hash: The hash for this block.
The index isn't entirely necessary, strictly speaking, but I'm keeping it as a fast way to make it clear to the reader what order my blocks are in.
This structure ensures that our blocks can't be changed unnoticed once they are part of the chain, but you're probably wondering how. Time to talk about hashing. A hash function takes some data and produces a short, fixed-size fingerprint of it, usually represented as a string. In a blockchain, hashes are used to confirm linkage: if I include the hash of my previous block in the data I hash for my new block, the two blocks are linked. Changing data in an old block changes its hash, which then no longer matches the hash recorded in the next block, so every following block would need a new hash too and the whole thing unravels. Hashing protects against bad actors who wish to edit the blockchain after the fact.
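To make the linkage idea concrete, here is a minimal sketch. It assumes Node's built-in SHA-256 as a stand-in hash (we get to choosing a hash properly later in this post), and `MiniBlock`, `hashOf`, `makeBlock` and `isChainValid` are hypothetical names invented for this illustration, not part of the project code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal block shape, for illustration only.
interface MiniBlock {
  data: string;
  previousHash: string;
  hash: string;
}

// Hash the block's data together with the previous block's hash.
function hashOf(data: string, previousHash: string): string {
  return createHash("sha256").update(previousHash + "|" + data).digest("hex");
}

function makeBlock(data: string, previousHash: string): MiniBlock {
  return { data, previousHash, hash: hashOf(data, previousHash) };
}

// A chain is valid when every stored hash matches a fresh recomputation
// and every block points at the hash of the block before it.
function isChainValid(chain: MiniBlock[]): boolean {
  return chain.every((block, i) => {
    const hashMatches = block.hash === hashOf(block.data, block.previousHash);
    const prev = i > 0 ? chain[i - 1] : undefined;
    const linkMatches = prev === undefined || block.previousHash === prev.hash;
    return hashMatches && linkMatches;
  });
}

const first = makeBlock("first", "0");
const second = makeBlock("second", first.hash);
const third = makeBlock("third", second.hash);
const chain = [first, second, third];
console.log(isChainValid(chain)); // true

// Edit the first block and recompute only its own hash.
const forged = makeBlock("first (edited)", "0");
const tamperedChain = [forged, second, third];
console.log(isChainValid(tamperedChain)); // false: second.previousHash no longer matches
```

The forged block is internally consistent, but the second block still records the old hash, so the edit is detected unless every later block is rebuilt as well.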
What is hashing?
In short, hashing is a destructive mathematical function. When you hash data, you put it through a transformative process that loses information and produces output of a fixed length. There are a few things to consider for a basic hash:
- Irreversible: Reversing a hash function should be practically impossible due to its destructive nature.
- Deterministic: The same input to the same hashing function should always produce the same hash.
- Collision space: It should be implausible that similar (but not the same) data results in the same hash.
Let's look at each of these a bit closer. For a hash to be irreversible, it needs to be destructive; we need to lose data. Consider this basic mathematical function:
`f(x) = 0 * x`
No matter what value I put into the function, the result is always 0. Therefore, if I have the hash 0, I cannot determine the original data because I have destroyed it. This is a terrible example because the collision space is 1: every input produces the same hash.
Let's write a new hash as a TypeScript function. You can overwrite the code from the previous post in the index.ts file:
function simpleHash(x: number): number {
  // Extract four 8-bit chunks from x
  const a = x & 0xff;
  const b = (x >> 8) & 0xff;
  const c = (x >> 16) & 0xff;
  const d = (x >> 24) & 0xff;
  // Apply transformations: scale the chunks, XOR them together, mask to 16 bits
  const result = (a ^ (b * 3) ^ (c * 7) ^ (d * 11)) & 0xffff;
  return result;
}
In this code, we split `x` into four 8-bit segments. We leave the first as it is, multiply the other three by the primes 3, 7 and 11, and combine everything with the bitwise `XOR` operator to mix the bits. The result is masked to 16 bits, whatever was passed into the function. So it's still destructive, but now we have a larger collision space.
If we run this with a few different tests, we can see a variety of outputs. Remember to transpile your TypeScript before running.
➜ minimal-blockchain node dist/index.js
┌─────────┬────────────┬────────┐
│ (index) │ input │ output │
├─────────┼────────────┼────────┤
│ 0 │ 0 │ 0 │
│ 1 │ 1 │ 1 │
│ 2 │ 42 │ 42 │
│ 3 │ 12345 │ 169 │
│ 4 │ 3735928559 │ 4068 │
└─────────┴────────────┴────────┘
I will always get the same outputs from these inputs. Therefore, my function is deterministic.
As you can see from the data, we now get a variety of outputs. We're only considering a total of 32 bits from our input (four 8-bit chunks), and we're reducing this even further to just 16 bits, meaning the function is destructive.
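For reference, a table like the one above can be produced with a small harness along these lines. This is a sketch rather than the exact code I ran: `inputs` and `rows` are names made up for the example, and `simpleHash` is repeated so the snippet runs on its own.

```typescript
// Same simpleHash as above, repeated so this snippet is self-contained.
function simpleHash(x: number): number {
  const a = x & 0xff;
  const b = (x >> 8) & 0xff;
  const c = (x >> 16) & 0xff;
  const d = (x >> 24) & 0xff;
  return (a ^ (b * 3) ^ (c * 7) ^ (d * 11)) & 0xffff;
}

// Sample inputs taken from the table above.
const inputs = [0, 1, 42, 12345, 3735928559];

// Pair each input with its hash and print the result as a table.
const rows = inputs.map((input) => ({ input, output: simpleHash(input) }));
console.table(rows);
```

Running it twice gives the same table both times, which is the determinism property in action.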
But "Leo, I can clearly see that 0, 1, and 42 all just output as themselves. Any value that fits in 8 bits is going to hash as itself!" I can hear you say.
Well, you'd be right. But consider what happens if you run the number 297 through the hash? Or 556? What about the number 1,448,697,834? All of these result in the output 42. So you can't say with any certainty which value you started with.
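You don't have to take my word for it. The sketch below brute-forces the first few inputs that hash to a given target; `findPreimages` and its `limit` parameter are hypothetical helpers written for this example, and `simpleHash` is repeated so the snippet is self-contained.

```typescript
// Same simpleHash as above, repeated so this snippet is self-contained.
function simpleHash(x: number): number {
  const a = x & 0xff;
  const b = (x >> 8) & 0xff;
  const c = (x >> 16) & 0xff;
  const d = (x >> 24) & 0xff;
  return (a ^ (b * 3) ^ (c * 7) ^ (d * 11)) & 0xffff;
}

// Scan upwards from 0 and collect the first `count` inputs whose hash equals `target`.
// `limit` stops the scan so an unreachable target cannot loop forever.
function findPreimages(target: number, count: number, limit = 1_000_000): number[] {
  const found: number[] = [];
  for (let x = 0; x < limit && found.length < count; x++) {
    if (simpleHash(x) === target) found.push(x);
  }
  return found;
}

console.log(findPreimages(42, 4)); // [ 42, 297, 556, 803 ]
console.log(simpleHash(1448697834)); // 42
```

Given only the hash 42, there is no way to tell which of these inputs produced it.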
This leads to the next point, collision space. In our example, the result is masked to 16 bits. That means our algorithm can produce at most 2^16 hashes, that is, 65,536 possible values. (In practice it's fewer still: the largest term, 255 × 11 = 2,805, fits in 12 bits, so no output ever reaches 4,096.) That might seem like a lot, but at the time of writing, the Bitcoin blockchain has nearly 900k blocks. A small collision space means it would be easy to manipulate the data in the block to find a hash collision. If we think back to how hashes are used to link the chain, if I could change the data in an old block without changing its hash, I could commit crypto fraud!
I don't know about you, but that sounds like a bit of fun. Let's commit crypto fraud!
Finding a collision
To make a chain, I need at least two blocks. I'm going to use my simple hash function to create the hashes, and my first block, the genesis block, will just have the hash of 0 as its previous hash. We'll need to change our function slightly to accept an object as input by first converting it into a numeric value.
function simpleHash(input: any): number {
  // Convert input to a string
  const str = JSON.stringify(input);
  // Convert string to an array of characters
  const chars = str.split("");
  // Take the numeric value of each string character and add them together
  const x = chars.reduce((acc, char) => acc + char.charCodeAt(0), 0);
  // Extract 8-bit chunks from x
  const a = x & 0xff;
  const b = (x >> 8) & 0xff;
  const c = (x >> 16) & 0xff;
  const d = (x >> 24) & 0xff;
  // Apply transformations
  const result = (a ^ (b * 3) ^ (c * 7) ^ (d * 11)) & 0xffff;
  return result;
}
And now for our blocks:
[
  {
    index: 0,
    timestamp: 0,
    data: '{"to":"Leo","from":"Miles","amount":1}',
    previousHash: 48,
    hash: 11
  },
  {
    index: 1,
    timestamp: 1,
    data: '{"to":"Miles","from":"Leo","amount":1}',
    previousHash: 11,
    hash: 3
  }
]
In our very simple and small blockchain, we have two transactions. In the first, Miles sends Leo (that's me) 1 coin, and in the next transaction, I refund Miles 1 coin. Then, we can assume more blocks are added afterwards.
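If you'd like to reproduce those hash values yourself, the sketch below builds both blocks. It assumes each hash is calculated over the block without its own `hash` field, and that the genesis block's previous hash is simply `simpleHash(0)`; the variable names are made up for this example.

```typescript
// Object-accepting simpleHash from above, repeated so this snippet is self-contained.
function simpleHash(input: unknown): number {
  const str = JSON.stringify(input);
  // Sum the character codes of the serialised input.
  const x = str.split("").reduce((acc, char) => acc + char.charCodeAt(0), 0);
  const a = x & 0xff;
  const b = (x >> 8) & 0xff;
  const c = (x >> 16) & 0xff;
  const d = (x >> 24) & 0xff;
  return (a ^ (b * 3) ^ (c * 7) ^ (d * 11)) & 0xffff;
}

// The genesis block has no real predecessor, so it uses the hash of 0.
const genesisPreviousHash = simpleHash(0); // 48

const genesis = {
  index: 0,
  timestamp: 0,
  data: JSON.stringify({ to: "Leo", from: "Miles", amount: 1 }),
  previousHash: genesisPreviousHash,
};
const genesisHash = simpleHash(genesis); // 11

// The second block links back by storing the genesis block's hash.
const refund = {
  index: 1,
  timestamp: 1,
  data: JSON.stringify({ to: "Miles", from: "Leo", amount: 1 }),
  previousHash: genesisHash,
};
const refundHash = simpleHash(refund); // 3

console.log([
  { ...genesis, hash: genesisHash },
  { ...refund, hash: refundHash },
]);
```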
Note that the first block had a hash value of 11, and the second block had a hash value of 3. I want to cheat Miles out of some coins, so I will find a way to change the first block to give me more coins while keeping the hash the same. I can run a simple loop to brute-force the block creation until I get a block with a hash of 11 and a high transaction value.
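// The genesis block from above, without its hash field
const genesisBlock = {
  index: 0,
  timestamp: 0,
  data: '{"to":"Leo","from":"Miles","amount":1}',
  previousHash: 48,
};
const genesisBlockHash = simpleHash(genesisBlock); // 11
let i = 2;
let block = { ...genesisBlock };
while (true) {
  // Miles stays the sender and I stay the recipient; only the amount changes
  block = { ...block, data: JSON.stringify({ to: "Leo", from: "Miles", amount: i }) };
  const hash = simpleHash(block);
  if (hash === genesisBlockHash) {
    console.log("Found the block", block);
    console.log("Hash", hash);
    break;
  }
  i++;
}
Running this produces:
➜ minimal-blockchain node dist/index.js
Found the block {
  index: 0,
  timestamp: 0,
  data: '{"to":"Leo","from":"Miles","amount":100049}',
  previousHash: 48
}
Hash 11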
With that, Miles sent me 100,049 coins in the first transaction! My hash is still 11, meaning it matches the previous hash recorded in the next block, and I now have a blockchain that confirms the transaction. If I distribute my blockchain to other people, Miles will have been defrauded of a lot of coins!
Better Hashing
Ok, so we've acknowledged that my hash function is a bit crap. But it should have illustrated some basic concepts and got you thinking about trust and security. There are plenty of hashing algorithms out there, all with a variety of use cases. Hashes aren't only used in blockchain but are generally used for verification purposes, like holding the hash of passwords in a database or providing a hash for software to show the code hasn't been tampered with. To find a good hash, we need to extend our list of properties:
- Deterministic: The same input will always produce the same hash.
- Fast Computation: Generating a hash should be computationally efficient.
- Preimage Resistance: AKA Irreversible. It should be infeasible to determine the input from its hash.
- Small Changes Cause Large Differences (Avalanche Effect): A minor alteration in the input should result in a drastically different hash.
- Collision Resistance: It should be infeasible to find two different inputs that produce the same hash.
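The avalanche effect is the easiest of these to see for yourself. The sketch below compares how many output bits change when the input moves from 42 to 43, first with our `simpleHash` and then with SHA-256 from Node's built-in crypto module. `countBits`, `differingBits` and `sha256` are helper names I've made up for the example.

```typescript
import { createHash } from "node:crypto";

// Numeric simpleHash from earlier, repeated so this snippet is self-contained.
function simpleHash(x: number): number {
  const a = x & 0xff;
  const b = (x >> 8) & 0xff;
  const c = (x >> 16) & 0xff;
  const d = (x >> 24) & 0xff;
  return (a ^ (b * 3) ^ (c * 7) ^ (d * 11)) & 0xffff;
}

// Count the 1 bits in a non-negative integer.
function countBits(n: number): number {
  let bits = 0;
  for (let v = n; v > 0; v >>>= 1) bits += v & 1;
  return bits;
}

// Count how many bits differ between two equal-length buffers.
function differingBits(a: Buffer, b: Buffer): number {
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    total += countBits(a.readUInt8(i) ^ b.readUInt8(i));
  }
  return total;
}

const sha256 = (s: string): Buffer => createHash("sha256").update(s).digest();

// simpleHash: the outputs 42 and 43 differ in a single bit.
const simpleDiff = countBits(simpleHash(42) ^ simpleHash(43));
// SHA-256: expect roughly half of the 256 output bits to differ.
const shaDiff = differingBits(sha256("42"), sha256("43"));

console.log({ simpleDiff, shaDiff });
```

A one-bit change in, one-bit change out is exactly what we don't want: it tells an attacker how the output responds to the input.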
Some well-known hash functions include MD5, SHA-2 and bcrypt.
MD5 is generally used for data integrity checks. It produces a 128-bit hash and is very fast to generate. However, its collision resistance is effectively broken: practical collisions have been demonstrated. Also, because it's so quick to generate, brute-forcing it would be pretty straightforward. It wouldn't be ideal for our blockchain because, with more time and computing resources, we could still manipulate blocks and rob Miles for even more coins!
SHA-2 comes in several sizes, ranging from 224-bit to 512-bit. It's not as fast to generate as MD5, making brute forcing take longer, and the larger sizes are more collision-resistant. SHA-2 is often used for passwords, though it was not meant for that purpose: it is still fast, and it has no built-in salt support (Note to self: Do a future post on password security). The simplest explanation of a salt is extra data you add when hashing so that identical data results in a different hash; the salt is stored alongside the hash so it can be used again when verifying. This protects against pre-computed hash tables.
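Here is a minimal sketch of what salting does, using SHA-256 and random bytes from Node's built-in crypto module. The `saltedHash` helper and the example password are made up for this illustration, and this is only meant to show the idea; it is not a recommended way to store real passwords.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Hypothetical helper: hash the salt and the password together.
function saltedHash(password: string, salt: string): string {
  return createHash("sha256").update(salt + password).digest("hex");
}

// Two users pick the same password but get different random salts.
const saltA = randomBytes(16).toString("hex");
const saltB = randomBytes(16).toString("hex");
const hashA = saltedHash("hunter2", saltA);
const hashB = saltedHash("hunter2", saltB);

// Different salts give different hashes for the same password...
console.log(hashA === hashB); // false
// ...while the same salt and password always reproduce the stored hash.
console.log(saltedHash("hunter2", saltA) === hashA); // true
```

Because an attacker would need a separate pre-computed table for every possible salt, the tables stop being worth building.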
Finally, we have bcrypt. It's 192-bit in size, but it's intentionally slow to generate. It comes with built-in salt support, which defeats pre-computed tables, and its slowness makes it difficult to brute force. bcrypt is great for passwords but too much for our blockchain. So, we should go with SHA-2, specifically the 256-bit version, because I like powers of 2.
Merkle Trees
One problem with blockchain is that you must regularly verify the whole chain. The more blocks on the chain, the longer this takes, as every block's hash has to be recomputed. A Merkle tree is a data structure used in blockchains to verify the integrity of large sets of data efficiently. It organises transactions in a tree, where each leaf node is a transaction hash and each parent node is the hash of its children. The root hash represents all of the block's transactions and is included in the block header. We're not bothering with this right now in our minimalistic blockchain. Still, if we wanted efficient blockchain verification, this would be the way to do it, as it lets you check that a transaction belongs to a block using only a handful of hashes rather than downloading the entire blockchain.
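We won't use it in this series yet, but a Merkle root can be sketched in a few lines. This is a simplified illustration and not how any particular blockchain implements it: it hashes hex strings with a single round of SHA-256, and it assumes the convention of pairing a leftover node with itself when a level has an odd number of nodes. `sha256` and `merkleRoot` are names made up for the example.

```typescript
import { createHash } from "node:crypto";

const sha256 = (input: string): string =>
  createHash("sha256").update(input).digest("hex");

// Hash every transaction, then repeatedly hash pairs until one root remains.
function merkleRoot(transactions: string[]): string {
  if (transactions.length === 0) return sha256("");
  let level = transactions.map(sha256);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      // Assumption: an unpaired last node is paired with itself.
      const right = i + 1 < level.length ? level[i + 1] : left;
      next.push(sha256(left + right));
    }
    level = next;
  }
  return level[0];
}

const txs = ["Miles->Leo:1", "Leo->Miles:1", "Miles->Leo:5"];
const root = merkleRoot(txs);
console.log(root);

// Changing any single transaction changes the root.
const tamperedRoot = merkleRoot(["Miles->Leo:1", "Leo->Miles:1", "Miles->Leo:500"]);
console.log(root === tamperedRoot); // false
```

The root is a single fixed-size value no matter how many transactions sit underneath it, which is what makes it practical to keep in a block header.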
Building a Simple Block in TypeScript
Ok, you've seen a simple block, and we've messed around with breaking our simple hash function. But now, let's create a viable block class for our project.
Step 1: Implement the Block Class
Create a file block.ts and add the following code:
import { createHash } from "crypto";
export class Block {
  index: number;
  timestamp: number;
  data: any;
  previousHash: string;
  hash: string;
  constructor(index: number, timestamp: number, data: any, previousHash: string = "") {
    this.index = index;
    this.timestamp = timestamp;
    this.data = data;
    this.previousHash = previousHash;
    // The hash is computed last so that it covers every other field
    this.hash = this.calculateHash();
  }
  // SHA-256 over the serialised fields, returned as a hex string.
  // Note: the fields are concatenated without a separator, which is fine for this demo.
  calculateHash(): string {
    const hash = createHash("sha256");
    hash.update(
      JSON.stringify(this.index) +
        JSON.stringify(this.timestamp) +
        JSON.stringify(this.data) +
        JSON.stringify(this.previousHash),
    );
    return hash.digest("hex");
  }
}
// Example usage: the genesis block, relying on the default empty previousHash
const block1 = new Block(0, Date.now(), { to: "Leo", from: "Miles", amount: 1 });
console.log(block1);
Step 2: Run the Code
Now transpile the code and run it with node dist/block.js to see the example usage:
➜ minimal-blockchain node dist/block.js
Block {
index: 0,
timestamp: 1739016385097,
data: { to: 'Leo', from: 'Miles', amount: 1 },
previousHash: '',
hash: '01839c80b778d595d9b66eea80cc596d9b253b5e205f98e800b06f9300324d0a'
}
You should see a printed representation of the block, including its computed hash.
Step 3: Demonstrate Hash Sensitivity
Rerun the code. Because the timestamp comes from Date.now(), it will have changed since the last run. Notice how this small change completely alters the hash, demonstrating the avalanche effect.
➜ minimal-blockchain node dist/block.js
Block {
index: 0,
timestamp: 1739016482597,
data: { to: 'Leo', from: 'Miles', amount: 1 },
previousHash: '',
hash: 'f8f158245c8c63ae89081472274eeb975e0840f648b01da24113b25648cc9b3f'
}
See, just by changing the timestamp value, the hash output is completely different.
Conclusion
In this post, we explored the fundamental structure of blockchain data and the role hashing plays in keeping it secure. We built a simple Block class in TypeScript to demonstrate how changing data affects the hash, highlighting the tamper-resistant nature of blockchain.
In the next post, we will extend this foundation by chaining blocks together, introducing validation mechanisms, and discussing blockchain integrity checks. Stay tuned!