-
Notifications
You must be signed in to change notification settings - Fork 503
Log Manager
The job of the LogManager
in the system is to receive buffers of LogRecord
from transactions and persist them in memory so that the changes can be recovered in the event of a crash. It does this by first serializing. The LogManager
does this by serializing the LogRecord
into a recoverable format (described below) into 4K byte buffers. These buffers are then written into the log file, and the log file is periodically persisted.
In the future, the LogManager
will also be responsible for shipping logs over the network to replicas.
The LogManager
has Start()
and StopAndPersist()
to bring it up and shut it down. Decoupling this logic from the constructor and destructor (respectively) allows one to start and stop a single log manager repeatedly without having to initialize a new LogManager
object. As the name suggests, StopAndPersist()
forces a processing of all unserialized logs, as well as persisting the log file on disk.
The main entry point for the LogManager
is through AddBufferToFlushQueue()
. Transactions will pass their (potentially partially) full buffers of LogRecord
through this method. After it is called, the LogManager
owns the buffers, and transactions should no longer reference these buffers.
The LogManager
also exposes a ForceFlush()
method for testing. This forces the disk writer task to persist the log file.
Finally, the bulk of the work of the LogManager is handled by two tasks:
These tasks are managed through the DedicatedThreadRegistry
.
The LogSerializerTask
is responsible for serializing the LogRecord
received from transactions into a format that can be processed by log consumers. These serialized records are written into buffers, which are handed over to consumers when full. We also extract the commit callbacks from commit records so that they can be called by the DiskLogConsumerTask
when it guarantees the commit log is persisted in memory. In the case of the DiskLogConsumerTask
, this means a format that can be written to disk, and then can be used to reconstruct the entire LogRecord
during recovery.
The LogSerializerTask
periodically serializes the logs by sleeping in a loop. The amount of time we sleep is a settings defined interval. The reason serialization is done in the background and not as soon as logs are received is to reduce the amount of time a transaction spends interacting with the log manager
A LogRecord
is serialized into the following recoverable format depending on the record type:
----------------------------------------------------------------------------------
| redo record size (32-bit) | record type (8-bit) | txn_begin_timestamp (64-bit) |
----------------------------------------------------------------------------------
| database_oid (32-bit) | table_oid (32-bit) | TupleSlot (64-bit) |
----------------------------------------------------------------------------------
| num_cols (16-bit) | col_id1 (32-bit) | col_id2 (32-bit) | ... |
----------------------------------------------------------------------------------
| null_bitmap (variable-bit) |
----------------------------------------------------------------------------------
| val1 | val2 | val3_varlen_size (32-bit) | val3_varlen_content | val4 | ... |
----------------------------------------------------------------------------------
Notes:
- The size of the serialization of
null_bitmap
is the "byte-alligned ceiling" ofnum_cols
. Ifnum_cols = 5
, thennull_bitmap
is 8-bit. Ifnum_cols = 32
, thennull_bitmap
is 32-bit, etc. - As can be seen in the last row, var-len's sizes and content are interleaved with non-varlen values. The size of non-varlen attributes is determined using the
storage::BlockLayout
. - NULL values are determined using the
null_bitmap
and are not serialized out.
----------------------------------------------------------------------------------
| redo record size (32-bit) | record type (8-bit) | txn_begin_timestamp (64-bit) |
----------------------------------------------------------------------------------
| database_oid (32-bit) | table_oid (32-bit) | TupleSlot (64-bit) |
----------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------
| redo record size (32-bit) | record type (8-bit) | txn_begin_timestamp (64-bit) | commit_timestamp (64-bit) |
--------------------------------------------------------------------------------------------------------------
The job of the DiskLogConsumerTask
is to take logs serialized by the LogSerializerTask
and persist them on disk. It is important to differentiate between writing to the log file and persisting to disk. Whenever the task thread's wait is interrupted, it will always write the buffers to the log file, but it will only persist the log file if some condition is met. The task will interrupt its wait and write the buffers to the log file if any of the following conditions are met:
- Someone calls
ForceFlush()
on the log manager - We have received a new buffer through the consumer queue
- The task is being shut down
- The wait times out (see #1 below)
After writing the log buffers to the file, the task will then persist the file if any of the following conditions are met:
- A settings defined amount of time has passed (default: 20ms)
- A settings defined threshold of memory written has been written since the last persist (default: 1MB)
- Someone calls
ForceFlush()
on the log manager - The task is being shut down
To understand the flow of the log manager, below is an example of how a LogRecord
goes from a transaction to persisted on disk:
- The
LogManager
receives buffers containing records from transactions via theAddBufferToFlushQueue
method, and adds them to the serializer task's flush queue - The
LogSerializerTask
will periodically process and serialize buffers in its flush queue and hand them over to the consumer queue. - When a buffer of logs is handed over to a consumer, the consumer will wake up and process the logs. In the case of the
DiskLogConsumerTask
, this means writing it to the log file. - The
DiskLogConsumerTask
will persist the log file when the conditions mentioned in theDiskLogConsumerTask
section are met. - When the persist is done, the
DiskLogConsumerTask
will call the commit callbacks for any CommitRecords that were just persisted.
In the future, the LogManager
will also be in-charge of sending logs over the network to replicas. This will most likely entail having a new ReplicaLogConsumerTask
that mirrors the DiskLogConsumerTask
, except instead of writing to disk, it will send the logs over network.
Carnegie Mellon Database Group Website