Some remarks about databases

Note

I was refreshing my knowledge on the broad subject of databases, mostly by reading Wikipedia articles, which resulted (apart from notes attached here) in creating this wikipedia book (in case some future problems with Wikipedia you can try version hosted here).

Read anomalies Anomalies

What can happen if your database doesn’t serialize transactions properly:

Dirty read

When one transaction sees uncommitted data of another one

Short example:

We have two transactions T1 and T2.

  • T1 reads value of a row R
  • T2 writes new value of row R
  • T1 sees a change in row value
Non-Repeatable Read

When value of a row changes during a transaction.

Short example:

We have two transactions T1 and T2.

  • T1 reads value of a row R
  • T2 writes new value of row R
  • T2 commits <- Difference from Dirty Read
  • T1 sees a change in row calue
Phantom read

When a result of query changes during the transaction.

Note

Database without Non-Repeatable Reads cam still suffer from phantom reads, non-repeatable reads apply when other transaction updates or deletes records while phantom also apply to inserted rows.

Short example:

  • T1 executes a query: SELECT AVG(value) FROM account WHERE ...GROUP BY ...
  • T2 executes: INSERT .... INTO account
  • T2 commits
  • T1 executes a query: SELECT AVG(value) FROM account WHERE ...GROUP BY ... and gets different results.
Write Skew

This is an anomaly for MVCC databases, which occours each transaction works on a snapshot of database, and don’t see changes done by each other.

Note

This anomaly is not defined in SQL standard, as SQL standard had locking databases in mind.

Short example:

Let’s consider a banking application, with following constraint: balance of account must be non-negative, but we don’t store it explicitly, it is just calculated by SELECT SUM(t.value) FOR account .... INNER JOIN transaction as t ON ...

  • At the start of the transaction balance of account A is 100PLN
  • Transaction T1 starts
  • Transaction T2 starts
  • Transaction T1 adds a transaction for A that withdrawns 100PLN (which is valid as constraint is held)
  • T1 Commits
  • Transaction T2 does the same (which still is valid as constraint is held inside a snapshot for T2 — constraint is not held “outside of” this snapshot).

Note

Wikipedia says that you might resolve this issue by explicitly writing to a dummy row just to force write-write confilct.

Transaction isolation Levels

Serializable

Highest transaction isolation. Transactions are executed as if they were executed serially.

No read anomalies can occur.

In database that uses locks it requires to put locks on:

  • Every row you write to
  • Every row you read
  • Range lock for every query (for example lock all records for which specified condition is true).

These locks are held until the end of the transaction.

Snapshot Isolation

This transaction isolation level works as follows: Each transaction sees database in a (consistent) state the database was at the beginning of the transaction (with any changes it did).

This is very different from Serializable, as it allows “Write Skew” anomalies.

Note

This isolation level is not defined in SQL standard.

You can define Serializable isolation on top of Snapshot Isolation relatively easy by:

  • This can be done relatively easy by the DMBS, see this article:

    Cahill, M. J., Röhm, U., & Fekete, A. D. (2009). Serializable isolation for snapshot databases. ACM Transactions on Database Systems, 34(4), 1–42. doi:10.1145/1620585.1620587

  • Writing proper code that introduces artificial write conflicts between data.

    Note

    These conflicts are artificial, because they exist only to introduce write conflicts, which will abort transaction that tries to write to the database as the second one.

Repeatable Read

This isolation level reading a row will always produce the same value, even of these rows were changed by other transactions. However query results are can change during the transaction (especially for aggregate queries).

In database that uses locks it requires to put locks on:

  • Every row you write to
  • Every row you read

These locks are held until the end of the transaction.

Read Committed

In this isolation level transaction doesn’t see changes made by another uncommitted transactions, however it may see changes made by commited transactions.

In database that uses locks it requires to put locks on:

  • Every row you write to
  • Every row you read

Write locks are held until the end of the transaction, however read locks are released after each select.

Read Uncommitted
In this level you can see uncommitted data sent by saved by other transactions.

Serializability

We have a some set of concurrent transactions, these transactions are serializable, if one can produce a schedule containing these transactions executed serially in some order.

Serializability is important because:

If DBMS checks if database is in a consistent state after each of the transactions, and transactions are serializable — it means that the database is in a consitent state after all transactions.