MVCC in postgresql

Note

This is based on my lecture on the databases, which (is still available in Polish)

There are two ways to implement proper transaction isolation:

  • First is by using locking. In this padagrim whenever transaction reads data a lock is issued, and any write to that data will wait until reading transaction finishes (this might be simplified).
  • Second is by using MVCC — that is multi version concurrency. It works as follows: each transaction sees database in a state at the time the transaction, so reads and writes don’t need to wait for each other (there is a problem with write skew anomaly, which is solved by the postgresql 9.1 and newer.

How does MVCC work

Start of the transaction

When transaction starts following things happen (or may happen depending on isolation level):

  • Transaction is assigned a txid a transaction ID, transaction id’s are ordered 32 bit integers (that may wrap around at some point in time, but Postgres handles it).
  • txid``s of all committed transactions are stored (possibly in a more efficient way than storing all ``txids)

Data constraints

Each row contains couple of magic columns:

xmin
This is the txid of transaction that inserted this row
xmax
This is the txid of transaction that deleted this column
cmin, cmax
Index of statement in transaction that added/deleted that row

Basic operations

INSERT
When a transaction inserts a row it xmin is set to txid of this transaction.
DELETE
When a transaction deletes a row it just sets xmax to it’s txid
UPDATE
Updates are replaced with a delete and insert pair.

Data visibility

Row is visible for transaction txid if (all statements must be true):

  • It’s xmin < txid (row was inserted by a transaction before this one).
  • Transaction xmin is commited (in case of Read Commited isolation level), or xmin was commited before start of current transaction (other isolation levels)
  • It’s xmax is empty or xmax > txid (row was deleted by a transaction that started after this one).

In case of transaction that issue multiple statements cmin, cmax are used for example to have a cursor that consistently iterates over a table, even if the same transaction alters the table.

VACUUM

Data can’t be deleted from disk immediately in databases using MVCC, because ongoing transactions might not ‘see’ the delete, and still need to access deleted row. Data can be deleted only after all transactions with txid lower row’s than xmax have either committed or have been rolled back.

Postgresql does this (more or less) automatically, but you might call VACUUM by hand if you need to reclaim space (this space will not necessarily be freed to the OS, rather it will be accessible for new inserts).

Sources