-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
MDEV-36301 SET GLOBAL innodb_log_file_disabled, innodb_log_group_home_dir #4014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
3459e02
to
2cc94dc
Compare
5fc1dba
to
c4095de
Compare
I am seeing a crash here is the stacktrace:- Assertion:- mysqld: /home/saahil/MDEV-36301/storage/innobase/log/log0log.cc:743: log_t::resize_start_status log_t::resize_start(os_offset_t, const char *, void *): Assertion `disabled == (writer == skip_write_buf)' failed.
Attaching mysql.log file and rr trace path /home/saahil/.local/share/rr/mysqld-0 sofia machine- 172.20.0.163 |
Incorrect inputs getting executed for the variable :- MariaDB [test]> set global innodb_log_file_disabled =ON , innodb_log_file_disabled =-0; |
created a jira bug ticket: https://jira.mariadb.org/browse/MDEV-36803 |
This is something in the option parsing code outside InnoDB. I assume that -0 would be interpreted as |
storage/innobase/srv/srv0start.cc
Outdated
bool log_t::resize_rename() noexcept | ||
{ | ||
std::string old_name{get_log_file_path("ib_logfile101")}; | ||
std::string old_name{get_log_file_path("ib_logfile101", log_sys.resize_dir)}; | ||
std::string new_name{get_log_file_path()}; | ||
|
||
if (IF_WIN(MoveFileEx(old_name.c_str(), new_name.c_str(), | ||
MOVEFILE_REPLACE_EXISTING), | ||
!rename(old_name.c_str(), new_name.c_str()))) | ||
return false; | ||
|
||
const int err= IF_WIN(int(GetLastError()), errno); | ||
std::string &remove= err == IF_WIN(ERROR_NOT_SAME_DEVICE, EXDEV) | ||
? old_name : new_name; | ||
IF_WIN(DeleteFile(remove.c_str()), unlink(remove.c_str())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized SET GLOBAL innodb_log_group_home_dir
must actually retain new_name
in log_sys.resize_dir
. Attempting the atomic rename is fine, but the subsequent steps need to be different:
- Whether or not the atomic renaming succeeded, rename the file further to
ib_logfile0
inlog_sys.resize_dir
if it is notnullptr
. - If the initial atomic renaming failed with
errno==EXDEV
and the subsequent renaming succeeded, we will have twoib_logfile0
, both of which are valid for recovery. Delete the originalib_logfile0
and declare success.
9170459
to
9ba9164
Compare
db846e8
to
1700b6f
Compare
The parameter innodb_log_spin_wait_delay will be deprecated and ignored, because there is no spin loop anymore. Thanks to commit 685d958 and commit a635c40 multiple mtr_t::commit() can concurrently copy their slice of mtr_t::m_log to the shared log_sys.buf. Each writer would allocate their own log sequence number by invoking log_t::append_prepare() while holding a shared log_sys.latch. This function was too heavy, because it would invoke a minimum of 4 atomic read-modify-write operations as well as system calls in the supposedly fast code path. It turns out that with a simpler data structure, instead of having several data fields that needed to be kept consistent with each other, we only need one Atomic_relaxed<uint64_t> write_lsn_offset, on which we can operate using fetch_add(), fetch_sub() as well as a single-bit fetch_or(), which reasonably modern compilers (GCC 7, Clang 15 or later) can translate into loop-free code on AMD64. Before anything can be written to the log, log_sys.clear_mmap() must be invoked. log_t::base_lsn: The LSN of the last write_buf() or persist(). This is a rough approximation of log_sys.lsn, which will be removed. log_t::write_lsn_offset: An Atomic_relaxed<uint64_t> that buffers updates of write_to_buf and base_lsn. log_t::buf_free, log_t::max_buf_free, log_t::lsn. Remove. Replaced by base_lsn and write_lsn_offset. log_t::buf_size: Always reflects the usable size in append_prepare(). log_t::lsn_lock: Remove. For the memory-mapped log in resize_write(), there will be a resize_wrap_mutex. log_t::get_lsn_approx(): Return a lower bound of get_lsn(). This should be exact unless append_prepare_wait() is pending. log_get_lsn(): A wrapper for log_sys.get_lsn(), which must be invoked while holding an exclusive log_sys.latch. recv_recovery_from_checkpoint_start(): Do not invoke fil_names_clear(); it would seem to be unnecessary. In many places, references to log_sys.get_lsn() are replaced with log_sys.get_flushed_lsn(), which remains a simple std::atomic::load(). Reviewed by: Debarun Banerjee (cherry picked from commit acd071f)
log_t::append_prepare_wait(): Do not attempt to read log_sys.write_lsn because it is not protected by log_sys.latch but by write_lock, which we cannot hold here. The assertion could fail if log_t::write_buf() is executing concurrently, and it has not yet executed log_write_buf() or updated log_sys.write_lsn. Fixes up commit acd071f (MDEV-21923) (cherry picked from commit 84dd243)
Assertion found on RQG testing InnoDB: Assertion failure in file /data/Server/MDEV-36301A/storage/innobase/buf/buf0buf.cc line 1192
RR trace is present on SDP machine:- /data/results/1747808608/TBR-2280_rr/ |
Hitting this Assertion on latest commit :-origin/MDEV-36301 8ce4b1a 2025-05-22T10:33:56+03:00
RR trace is present on SDP machine- /data/results/1748001679/TBR-2277_1 |
4560600
to
2ca4fef
Compare
|
|
|
I filed MDEV-36886 for this regression of #3925. |
At this point, the memory-mapped log is being re-enabled, but it is currently disabled and we can’t trust |
innodb_log_file_disabled: A new Boolean settable global parameter, for disabling the InnoDB redo log. When set, the server is not crash safe. innodb_log_group_home_dir: Allow the value to be changed with SET GLOBAL, as long as the log remains in the same file system or innodb_log_file_disabled=ON was set. innodb_log_update(): A common function for implementing SET GLOBAL innodb_log_file_size, innodb_log_file_disabled, innodb_log_checkpoint_now, innodb_log_group_home_dir. log_sys.buf_size_requested: The configured innodb_log_buffer_size. While the log is disabled, we may set log_sys.buf_size (the actual size of log_sys.buf) differently. log_sys.disabled: The current setting of innodb_log_file_disabled. log_t::append_prepare(): Instead of referring to file_size or capacity() for mmap=true, always refer to buf_size. When log_sys.disabled holds, log_sys.buf may be much smaller than log_sys.file_size. Its size is always reflected by log_sys.buf_size. log_t::attach(): Handle log_sys.disabled. log_t::disable(): Implements SET innodb_log_file_disabled=ON. Even if the log used to be in persistent memory, here we will set up dummy log_sys.buf and log_sys.flush_buf so that the dummy writes will appear to use file based writes. log_t::skip_write_buf(): A dummy log writer implementation that is used when log_sys.disabled holds and the log is not being resized. log_t::resize_abort(): When the log remains disabled, "persist" all the log and advance the group_lock and flush_lock to the current LSN, just like log_t::disable() does. log_t::resize_start(): Handle log_sys.disabled, that is, SET GLOBAL innodb_log_file_disabled=OFF when the redo log had previously been disabled. If we are on persistent memory, we will "fake" the dummy log_sys.buf to appear as memory-mapped as well, so that log_t::resize_write() and log_t::resize_write_buf() can assume that both log files are of the same type (memory-mapped or file-based). The dummy log_sys.flush_buf will be stored in log_sys.resize_flush_buf in that case. When moving from memory-mapped to a regular log file, we will allocate log_sys.checkpoint_buf. It will be freed in log_t::close(). log_t::resize_rename(): When innodb_log_group_home_dir is changed between file systems, handle the non-atomic replacement of the log file in a special way. For a moment, a recoverable ib_logfile0 will exist in both locations. log_t::resize_write(): Detect memory-mapped log by !resize_log.is_opened(). If the memory-mapped log is being re-enabled, log_sys.resize_flush_buf may temporarily store the value of a dummy log_sys.flush_buf. During any log resizing or disabling or enabling, both buf and resize_buf must appear to be either file-based or memory-mapped. log_t::write_buf(), log_write_up_to(): Handle the special case that disable() executed or resize_start() started re-enabling the log while we were waiting for flush_lock or write_lock. In the latter case, log_sys.writer will be changed from log_t::skip_write_buf to log_writer_resizing during the execution of log_write_up_to(). log_t::persist(): Skip the writes if the log is disabled, that is, a memory-mapped log is in the process being re-enabled. In this case, we cannot trust log_sys.file_size, and the pmem_persist() could be invoked on an invalid address range. log_resize_acquire_group(): Acquier the group locks for write and flush. log_resize_release_group(): Release the group locks for write and flush. log_resize_acquire(): Return whether the group locks were elided. log_write_and_flush_prepare(), log_write_and_flush(): Protect also the log_sys.is_mmap() case with write_lock and flush_lock, in order to prevent a race condition between mtr_t::commit_file() and log_t::disable(). log_t::persist(): Remove the assertions about not holding write_lock or flush_lock. We will hold them during DDL operations.
Core dump is present on SDP machine |
We have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find my initial comments on the feature/testcase.
# restart: --innodb-force-recovery=6 | ||
SET GLOBAL innodb_log_checkpoint_now=1; | ||
Warnings: | ||
Warning 138 InnoDB doesn't force checkpoint when innodb-force-recovery=6. | ||
# restart: --innodb-read-only=1 | ||
SET GLOBAL innodb_log_checkpoint_now=1; | ||
Warnings: | ||
Warning 138 InnoDB doesn't force checkpoint when innodb-read-only=1. | ||
# restart |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warnings seem useful. Should we remove them ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warnings are not being localized via errmsg-utf8.txt
; they are always in English. I wanted to make innodb_log_checkpoint_now
to be able to use the same interface once MDEV-36828 has been fixed. The changes related to innodb_log_checkpoint_now
can surely be reverted.
SET GLOBAL innodb_log_file_disabled=ON; | ||
ALTER TABLE t ADD c2 CHAR FIRST; | ||
INSERT INTO t(a) SELECT * from seq_1_to_10000; | ||
SET STATEMENT max_statement_time=1e-6 FOR | ||
SET GLOBAL innodb_log_file_disabled=OFF; | ||
DROP TABLE t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we crash the server, restart and do select count(*) ? The intent is to check that the data is recovered properly after innodb_log_file_disabled is set to OFF.
- We should also test the case of a crash when innodb_log_file_disabled is ON and on restart. I think there should be an error message and server should refuse normal startup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Sure, this is worth adding.
- At the start of the test, the
eval $check_no_innodb
that InnoDB will refuse to start if it had been shut down or killed whileinnodb_log_file_disabled=ON
was in effect. It does not matter which one.
What we could add is a check that a startup with innodb_force_recovery=6
is possible. Executing that after a server kill would lead to nondeterministic results, that is, sometimes the database might appear OK, sometimes not. But hopefully it should not crash, no matter what. For example, the root page could be pointing to child pages that had not been written yet, or the sibling links between pages could be broken, if only some of the pages had been written from the buffer pool. This would be interesting and worthwhile to test.
--source include/restart_mysqld.inc | ||
SELECT * FROM t; | ||
DROP TABLE t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also specify and test the scenario of default and slow shutdown (innodb_fast_shutdown=0,1) when innodb_log_file_disabled=ON. Do we expect it to work ? In MySQL it works as all pages are flushed and no redo recovery is involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should work, and the test step is worth adding. Due to b07920b, InnoDB would only start up with innodb_force_recovery=6
, which makes it read-only. We could check the consistency of the data with CHECK TABLE
and SELECT
and expect it to be correct after the server was shut down, as long as innodb_fast_shutdown=2
is not being used.
SHOW VARIABLES LIKE 'innodb_log_file_size'; | ||
Variable_name Value | ||
innodb_log_file_size 5242880 | ||
SET GLOBAL innodb_log_file_size=6291456; | ||
SET GLOBAL innodb_log_file_disabled=ON; | ||
SET GLOBAL innodb_log_group_home_dir='log'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comment here to capture our discussion. Changing innodb_log_group_home_dir
dynamically would cause the server restart to fail unless the configuration file is also changed by user. It could be an hindrance to usability making it virtually unusable unless the configuration change is temporarily done (and reverted back subsequently) for some testing purpose. It should be documented well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is right. I edited the description of MDEV-36301 to try to make this clearer.
@@ -0,0 +1 @@ | |||
--skip-innodb-undo-tablespaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to add a comment on why we need to disable undo tablespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. The reason is that this test is bootstrapping a new InnoDB data directory, and it assumes that we have innodb_undo_tablespaces=0
. I originally developed this test for the 10.11 branch where that is the default setting. I didn’t investigate how to make this work with the new default innodb_undo_tablespaces=3
, because I thought that with fewer files it is simpler.
SET GLOBAL innodb_log_file_disabled=ON; | ||
--list_files $datadir ib_logfile* | ||
--list_files $datadir/log ib_logfile* | ||
SET GLOBAL innodb_log_group_home_dir='log'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a concurrent mariabackup is copying the redo log files ? Are we synchronizing with backup locks. Let's define the behaviour and test.
This comment is relevant for both innodb_log_file_disabled
and innodb_log_group_home_dir
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When mariadb-backup --backup
is running concurrently with any change of log related parameters, it will continue to monitor the old ib_logfile0
that it opened when the backup was started. That file will stop to receive any updates as soon as the checkpoint completes and the ib_logfile101
is supposed to take over. So, mariadb-backup --backup
will basically hang. This problem exists ever since 177345d was implemented. As far as I can tell, the problem could only be addressed by making the mariadbd
process responsible for copying (or replicating) the log when a backup is running.
For SET GLOBAL innodb_log_file_disabled=ON
, the trouble starts immediately: log_t::disable()
will unlink the ib_logfile0
and redirect log_sys.writer
to the dummy function. Any processes that are holding an open file handle to the old log file will be able to read it, but that file will cease to receive any updates.
|
|
|
Description
innodb_log_file_disabled
: A new Boolean settable global parameter, for disabling the InnoDB redo log. When set, the server is not crash safe.innodb_log_group_home_dir
: Allow the value to be changed withSET GLOBAL
, as long as the log remains in the same file system orinnodb_log_file_disabled=ON
was set.innodb_log_update()
: A common function for implementingSET GLOBAL innodb_log_file_size, innodb_log_file_disabled, innodb_log_checkpoint_now, innodb_log_group_home_dir
.log_sys.buf_size_requested
: The configuredinnodb_log_buffer_size
. While the log is disabled, we may setlog_sys.buf_size
(the actual size) differently.log_sys.disabled
: The current setting ofinnodb_log_file_disabled
.log_t::attach()
: Handlelog_sys.disabled
.log_t::disable()
: ImplementsSET innodb_log_file_disabled=ON
. Even if the log used to be in persistent memory, here we will set up dummylog_sys.buf
andlog_sys.flush_buf
so that the dummy writes will appear to use file based writes.log_t::skip_write_buf()
: A dummy log writer implementation that is used whenlog_sys.disabled
holds and the log is not being resized.log_t::resize_abort()
: When the log remains disabled, "persist" all the log, just likelog_t::disable()
does.log_t::resize_start()
: Handlelog_sys.disabled
, that is,SET GLOBAL innodb_log_file_disabled=OFF
when the redo log had previously been disabled. If we are on persistent memory, we will "fake" the dummylog_sys.buf
to appear as memory-mapped as well, so thatlog_t::resize_write()
andlog_t::resize_write_buf()
can assume that both log files are of the same type (memory-mapped or file-based). The dummylog_sys.flush_buf
will be stored inlog_sys.resize_flush_buf
in that case. When moving from memory-mapped to a regular log file, we will allocatelog_sys.checkpoint_buf
. It will be freed inlog_t::close()
.log_t::resize_rename()
: Wheninnodb_log_group_home_dir
is changed between file systems, handle the non-atomic replacement of the log file in a special way. For a moment, a recoverableib_logfile0
will exist in both locations.log_resize_release_group()
: Release the group write and flush locks.log_resize_acquire()
: Return whether the group locks were elided.How can this PR be tested?
See the modified test cases. Additionally,
SET GLOBAL innodb_log_group_home_dir
was tested between file systems.Basing the PR against the correct MariaDB version
main
branch.For now, this includes the dependencies #3925 and #4038, because they have not been merged into the
main
branch yet.PR quality check