Post

Interview Question: How are Database Transactions Implemented?

Photo by Christina @ wocintechchat.com on Unsplash

I once got asked this question in an interview for a fullstack developer intern position, and I was curious on the best way to answer it, so I did a bit of research afterwards.

In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) is called a transaction.

For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

Transactions are used to ensure data consistency and recoverability in the event of system failures.

So how are transactions implemented? There are two popular families of techniques: write-ahead logging and shadow paging.

For both of them, locks must be acquired on:

  • All information to be updated
  • All information to be read (depending on the level of isolation)

Write-Ahead Logging (WAL)

Basically, changes to the database are first recorded in a log before they are applied to the actual data pages.

  1. Changes to the database are first recorded in a log before they are applied to the actual data pages.
  2. Before modifying a data page, the system writes a log record that contains the details of the change. This log is written to a durable storage medium (e.g., disk) before the actual data page is updated.
  3. During system recovery, the log is used to redo or undo transactions to bring the database to a consistent state.
  • Concurrency: Allows for concurrent execution of transactions, as the log ensures that changes are recorded before being applied to the database.
  • Overhead: Requires additional I/O operations for writing log records, which can introduce some overhead, especially for write-intensive workloads.
  • Durability: Provides strong durability guarantees, as the log records are durable before any changes to the data pages are made.

Shadow Paging

Shadow paging maintains a shadow copy (or snapshot) of the entire database and switches between the old and new versions to implement transactions.

  • Principle: Shadow paging maintains a shadow copy (or snapshot) of the entire database and switches between the old and new versions to implement transactions.
  • Copy: When a change is made, a new copy of the entire database is created, which becomes the current version, and the old version is retained.
  • Recovery: During recovery, the system can simply switch back to the last consistent snapshot (the shadow) to undo any uncommitted transactions.
  • Concurrency: Not inherently designed for concurrent execution of transactions, as multiple transactions might need their own shadow copies, which can be space-inefficient.
  • Overhead: Creating full copies of the database for each change can be space-intensive and time-consuming.
  • Durability: Provides durability when a snapshot is created, but changes might not be durable until a new snapshot is taken.

An alternative to locking is multiversion concurrency control, in which the database provides each reading transaction the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks, i.e., writing transactions do not block reading transactions, and readers do not block writers. Going back to the example, when user A’s transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data. One implementation, namely snapshot isolation, relaxes the isolation property.

Conclusion

Choosing between WAL and Shadow Paging depends on the specific requirements and trade-offs of the application:

  • WAL is more suitable for systems requiring high concurrency and where durability is crucial, but it incurs some write overhead.
  • Shadow Paging can be efficient for read-intensive workloads but might be less suitable for write-intensive systems due to the overhead of creating and maintaining multiple copies of the database.

In practice, many modern database management systems use a combination of these techniques and other mechanisms to achieve a balance between concurrency, durability, and efficiency.

This post is licensed under CC BY 4.0 by the author.