What is cryptocurrency mining

Asymmetric cryptography

Asymmetric cryptography is a black box algorithm that can be used for signing. (By black box I mean that I won't go into detail how it works, but obviously, these algorithms are public)

The signature algorithm works like that:

  • You have two "keys" --- these "keys" are just binary data. One key is named private key second is named public key.
  • You can't derive private key from public key (and usually you can't derive public key from private key).
  • You have some data you need to sign.
  • To sign this data you need private key. This produces a signature which is just another piece of binary data.
  • Anyone that knows public key, signature and data can validate that this signature was performed using private key.

What is a crypto coin

For most crypto coins coin is just an assertion that says, everyone holding private key for this public key can spend this coin.

This might be a little bit more complex, for example, sometimes you might want to create coins that, eg. require three signatures from three people to be spent.

What is a double spend attack

Let's say I buy some VPN Access using my crypto coins. To do this I prepare a a transaction that says:

  • I hereby give these coins to public address of VPN service (transaction A)

The problem is I can as well prepare a transaction that says:

  • I hereby give the same coins to, let's say, my grandmother (transaction B)

From the network's point of view, both these transactions have valid signatures, however, they can't be both valid, as in a stable monetary system people can't spend their money more than once.

One may attempt to solve it using: "which transaction entered the network first" (actually, this approach is used, eg. by banks). However in the case of crypto coins, we explicitly don't want to have some central authority that validates transactions. The whole system needs to be distributed. In this case, a malicious attacker can send the first transaction to a node in Japan, and the second transaction to a node in the USA. In this case, network latency will cause roughly half of the network to assume transaction A was first, and the other half will assume transaction B was first.

Creating distributed systems that have a consistent view of the world is hard. There is even a "Cap Theorem", that tells us that having strong consistency in distributed, fault tolerant system is impossible.

One of the methods to get eventual consistency, and prevent double spend attack is called Proof of Work (PoW).

Proof of Work (or mining)

Let's say there is a computationally intensive puzzle that has the following properties:

  • Amount of computation required to solve this puzzle can be tweaked;

  • Solving this puzzle involves guessing the solution and then verifying this solution works;

    This property is here basically to allow organizations with different computing powers to participate in solving this puzzle.

    Let's say we have two organizations, first is called A Team, and second B Team. A team has twice more computing power than B team.

    If this puzzle required fixed amount of computation to solve, A Team would always solve it first. We don't want this property! In this case, we'd want A Team solving the puzzle in 66% of cases, and B Team in 33% of cases.

I'll give you example construction of such puzzle later.

Mining works as follows:

  • Actually each transaction contains a "network fee" this is a fee that can be spent by miner;
  • Miner takes a block of transactions, these transactions need to be all valid, and coherent both with themselves and previous blocks.
  • Miner may then create a transaction that gives him N coins out of thin air (in case of Bitcoin this extra reward will go down)
  • Block contains also the reference to the previous mined block.
  • This block of transactions is then sealed by solving this computationally intensive puzzle.

When miner solves this puzzle, his block is published, and the network may (or may not) accept it.

Note that since each block has a reference to the previous one, blocks form a chain (called blockchain).

Network clients accept blocks that are:

  • Part of the longest chain of blocks (part of the longest blockchain);
  • Are coherent with previous blocks;

This protects against double spend attack --- if there are contradicting transactions in the network miners will only select a subset of these transactions that is coherent.

The puzzle needs to be computationally expensive so an attacker can't easily create blockchain out of thin air (right now mining Bitcoin takes about 2 promiles of electricity worldwide --- and much more than 2 promiles of computing power), and this is only to mine new blocks. To create blockchain from scratch you'd need to have much more power.

To control the network (that is to erase it's most recent history) you'd need to control 51% of computing power dedicated to mining (which is not really impossible given how much mining has concentrated).

The difficulty of the puzzle needs to be adaptable, as we want to have predictable time between each mined block, in case more computing power is used for mining, the difficulty of the puzzle is increased, and if less power is dedicated for mining, the difficulty is decreased.

Hashing functions

Cryptographically secure hash functions are very peculiar things.

Long story short, these are functions that:

  • Take arbitrarily long data input (eg. a movie)
  • Produce output of known length (this is a hash)

Also, they have the following additional properties:

  • There is no, computationally cheap, algorithm that allows us to produce input that has a given hash;
  • It's hard to create two inputs of that have the same hash; (This property really follows from the first one, but it's good to have it explicit)
  • Hashes are distributed uniformly --- that is --- before you evaluate this function on given input every hash value is equally probable;
  • If a single bit changes in input, on average half of output bits change;

Note

Hash functions are also used in peer-to-peer networks. Torrent files essentially contain a hash of file you download.

Your client then downloads file contents from random strangers from the Internet, then compute the hash of the file, if file hash matches you can be sure that the file (you downloaded from random untrustworthy strangers from the internet) is, in fact, the same file someone uploaded to the network long time ago.

This is an overly simplified description of p2p networks :) but you get the the gist of it.

Typical Proof Of Work puzzle

Typical proof of work puzzle works along the lines:

  • You have a list of transactions (which is really some binary data)
  • You add the hash of the last block to the list of transactions and compute resulting hash. This hash is the hash of the current block. Let's call this block_id.

The puzzle is as follows:

  • You compute a hash of block_id and a nonce;
  • Nonce is some random data (so miner is allowed to either use a random nonce, or just try all possible nonces in sequence)
  • You select nonce so N first bytes of resulting hash are zeroed;

N can be used to adjust the difficulty of the puzzle;

Since there is no way to predict what hash will be produced, and hash function produces an uniform distribution of results, you just need to try a lot of nonces.

This puzzle allows teams of different computing powers to compete with each other, if each of them tries out different nonces first one that gets lucky wins.