MVCC in postgresql¶
Note
This is based on my lecture on the databases, which (is still available in Polish)
There are two ways to implement proper transaction isolation:
- First is by using locking. In this padagrim whenever transaction reads data a lock is issued, and any write to that data will wait until reading transaction finishes (this might be simplified).
- Second is by using MVCC — that is multi version concurrency. It works as follows: each transaction sees database in a state at the time the transaction, so reads and writes don’t need to wait for each other (there is a problem with write skew anomaly, which is solved by the postgresql 9.1 and newer.
How does MVCC work¶
Start of the transaction¶
When transaction starts following things happen (or may happen depending on isolation level):
- Transaction is assigned a
txid
a transaction ID, transaction id’s are ordered 32 bit integers (that may wrap around at some point in time, but Postgres handles it). txid``s of all committed transactions are stored (possibly in a more efficient way than storing all ``txids
)
Data constraints¶
Each row contains couple of magic columns:
xmin
- This is the
txid
of transaction that inserted this row xmax
- This is the
txid
of transaction that deleted this column cmin
,cmax
- Index of statement in transaction that added/deleted that row
Basic operations¶
- INSERT
- When a transaction inserts a row it
xmin
is set totxid
of this transaction. - DELETE
- When a transaction deletes a row it just sets
xmax
to it’stxid
- UPDATE
- Updates are replaced with a delete and insert pair.
Data visibility¶
Row is visible for transaction txid
if (all statements must be true):
- It’s
xmin < txid
(row was inserted by a transaction before this one). - Transaction
xmin
is commited (in case of Read Commited isolation level), orxmin
was commited before start of current transaction (other isolation levels) - It’s
xmax
is empty orxmax > txid
(row was deleted by a transaction that started after this one).
In case of transaction that issue multiple statements cmin
, cmax
are
used for example to have a cursor that consistently iterates over a table,
even if the same transaction alters the table.
VACUUM¶
Data can’t be deleted from disk immediately in databases using MVCC, because
ongoing transactions might not ‘see’ the delete, and still need to access
deleted row. Data can be deleted only after all transactions with txid
lower
row’s than xmax
have either committed or have been rolled back.
Postgresql does this (more or less) automatically, but you might call
VACUUM
by hand if you need to reclaim space (this space will not necessarily
be freed to the OS, rather it will be accessible for new inserts).
Sources¶
- BRIUCE MOMJIAN
MVCC Unmasked
: http://momjian.us/main/writings/pgsql/mvcc.pdf (I have also cached itlocally
, due to permissive CC license) - Serializable Snapshot Isolation on Postgresql Wiki,
(due to even more permissive license it is also cached
locally
. - My old lectures: http://db.fizyka.pw.edu.pl/~bzdak/bazy_danych_ed_20/wyklad10/wyk10.html#mvcc-w-postgresq
- Wikibook I have collected
(in case some future problems with Wikipedia
you can try version hosted here
).