fix unnecessary header overwrites #253

Open
wants to merge 187 commits into
base: REL_14_STABLE_neon

Commits on Nov 21, 2022

  1. [smgr_api] [community] smgr_api.patch

    Make smgr API pluggable. Add smgr_hook that can be used to define custom smgrs.
    Remove smgrsw[] array and smgr_sw selector. Instead, smgropen() loads
    f_smgr implementation using smgr_hook.
    
    Also add smgr_init_hook and smgr_shutdown_hook.
    And a lot of mechanical changes in smgr.c functions.
    
    This patch is proposed to community: https://commitfest.postgresql.org/33/3216/
    
    Author: anastasia
    lubennikovaav committed Nov 21, 2022
    e53d24b
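
The hook-based selection described in the smgr_api commit above can be sketched in standalone C. This is a minimal illustration, not the patch itself: the f_smgr struct is cut down to one callback, and smgr_select, smgr_standard, remote_smgr_hook and the unsigned relation argument are hypothetical names and types chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for PostgreSQL's f_smgr callback table. */
typedef struct f_smgr
{
    const char *name;
    int         (*smgr_nblocks) (unsigned rel);
} f_smgr;

/* Hook signature (assumed): pick an implementation for a relation. */
typedef const f_smgr *(*smgr_hook_type) (unsigned rel);

/* NULL means no extension has installed a custom storage manager. */
smgr_hook_type smgr_hook = NULL;

/* Built-in implementation, standing in for md.c. */
int md_nblocks(unsigned rel) { (void) rel; return 0; }
const f_smgr smgr_md = { "md", md_nblocks };

const f_smgr *smgr_standard(unsigned rel) { (void) rel; return &smgr_md; }

/* What smgropen() does in this sketch: ask the hook, not an smgrsw[] array. */
const f_smgr *smgr_select(unsigned rel)
{
    const f_smgr *impl = smgr_hook ? smgr_hook(rel) : smgr_standard(rel);

    assert(impl != NULL);
    return impl;
}

/* Example extension: route every relation except 0 to a custom smgr. */
int remote_nblocks(unsigned rel) { return (int) rel * 2; }
const f_smgr smgr_remote = { "remote", remote_nblocks };

const f_smgr *remote_smgr_hook(unsigned rel)
{
    return rel == 0 ? smgr_standard(rel) : &smgr_remote;
}
```

An extension would assign smgr_hook = remote_smgr_hook when it is loaded; until then smgr_select() keeps returning the built-in implementation.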
  2. [contrib/zenith] contrib_zenith.patch

    Add contrib/zenith that handles interaction with remote pagestore.
    To use it, add 'shared_preload_libraries = zenith' to postgresql.conf.
    
    It adds a protocol for network communications - see libpagestore.c;
    and implements smgr API.
    
    Also it adds several custom GUC variables:
    - zenith.page_server_connstring
    - zenith.callmemaybe_connstring
    - zenith.zenith_timeline
    - zenith.wal_redo
    
    Authors:
    Stas Kelvich
    Konstantin Knizhnik
    Heikki Linnakangas
    lubennikovaav committed Nov 21, 2022
    3bfbae8
  3. [walredo] zenith_wal_redo.patch

    Add a WAL redo helper for zenith: an alternative postgres operation mode that replays WAL at the pageserver's request.
    
    To start postgres in wal-redo mode, run postgres with the --wal-redo option.
    It requires the zenith shared library and zenith.wal_redo.
    
    Author: Heikki Linnakangas
    lubennikovaav committed Nov 21, 2022
    2f7e01f
  4. lastWrittenPageLSN.patch

    Save lastWrittenPageLSN in XLogCtlData to know what pages to request from remote pageserver.
    
    Authors:
    Konstantin Knizhnik
    Heikki Linnakangas
    lubennikovaav committed Nov 21, 2022
    3200b62
  5. Fix GetPage requests right after replaying CREATE DATABASE

    In the test_createdb test, we created a new database, and created a new
    branch after that. I was seeing the test fail with:
    
        PANIC:  could not open critical system index 2662
    
    The WAL contained records like this:
    
        rmgr: XLOG        len (rec/tot):     49/  8241, tx:          0, lsn: 0/0163E8F0, prev 0/0163C8A0, desc: FPI , blkref #0: rel 1663/12985/1249 fork fsm blk 1 FPW
        rmgr: XLOG        len (rec/tot):     49/  8241, tx:          0, lsn: 0/01640940, prev 0/0163E8F0, desc: FPI , blkref #0: rel 1663/12985/1249 fork fsm blk 2 FPW
        rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn: 0/01642990, prev 0/01640940, desc: RUNNING_XACTS nextXid 541 latestCompletedXid 539 oldestRunningXid 540; 1 xacts: 540
        rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/016429C8, prev 0/01642990, desc: CHECKPOINT_ONLINE redo 0/163C8A0; tli 1; prev tli 1; fpw true; xid 0:541; oid 24576; multi 1; offset 0; oldest xid 532 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 540; online
        rmgr: Database    len (rec/tot):     42/    42, tx:        540, lsn: 0/01642A40, prev 0/016429C8, desc: CREATE copy dir 1663/1 to 1663/16390
        rmgr: Standby     len (rec/tot):     54/    54, tx:          0, lsn: 0/01642A70, prev 0/01642A40, desc: RUNNING_XACTS nextXid 541 latestCompletedXid 539 oldestRunningXid 540; 1 xacts: 540
        rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01642AA8, prev 0/01642A70, desc: CHECKPOINT_ONLINE redo 0/1642A70; tli 1; prev tli 1; fpw true; xid 0:541; oid 24576; multi 1; offset 0; oldest xid 532 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 540; online
        rmgr: Transaction len (rec/tot):     66/    66, tx:        540, lsn: 0/01642B20, prev 0/01642AA8, desc: COMMIT 2021-05-21 15:55:46.363728 EEST; inval msgs: catcache 21; sync
        rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/01642B68, prev 0/01642B20, desc: CHECKPOINT_SHUTDOWN redo 0/1642B68; tli 1; prev tli 1; fpw true; xid 0:541; oid 24576; multi 1; offset 0; oldest xid 532 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
    
    The compute node had correctly replayed all the WAL up to the last
    record, and opened up. But when you tried to connect to the new
    database, the very first requests for the critical relations, like
    pg_class, were made with request LSN 0/01642990. That's the last
    record that's applicable to a particular block. Because the database
    CREATE record didn't bump up the "last written LSN", the getpage
    requests were made with too old LSN.
    
    I fixed this by adding a SetLastWrittenLSN() call to the redo of
    database CREATE record. It probably wouldn't hurt to also throw in a
    call at the end of WAL replay, but let's see if we bump into more
    cases like this first.
    
    This doesn't seem to be happening with page server as of 'main'; I was
    testing with a version where I had temporarily reverted all the recent
    changes to reconstruct control file, checkpoints, relmapper files
    etc. from the WAL records in the page server, so that the compute node
    was redoing all the WAL. I'm pretty sure we need this fix even with
    'main', even though this test case wasn't failing there right now.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    1f4e2b5
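
The fix in the CREATE DATABASE commit above can be illustrated with a small standalone sketch. It assumes the "last written LSN" behaves as a monotonically increasing maximum and that the redo routine passes the record's own LSN; set_last_written_lsn, get_request_lsn and redo_create_database are hypothetical stand-ins, not the functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* 64-bit LSN, as in PostgreSQL */

/* Hypothetical stand-in for the value kept in XLogCtlData. */
XLogRecPtr last_written_lsn = 0;

/* Advance the tracked LSN; it never moves backwards (assumption). */
void set_last_written_lsn(XLogRecPtr lsn)
{
    if (lsn > last_written_lsn)
        last_written_lsn = lsn;
}

/* LSN that a GetPage request would carry in this sketch. */
XLogRecPtr get_request_lsn(void)
{
    return last_written_lsn;
}

/* Simplified redo of a database CREATE record located at record_lsn. */
void redo_create_database(XLogRecPtr record_lsn)
{
    assert(record_lsn != 0);
    /* ... copying the template database directory would happen here ... */

    /* The fix: pages of the new database are only valid from this LSN on. */
    set_last_written_lsn(record_lsn);
}
```

With the LSNs from the WAL dump above, a request issued after replaying the CREATE record at 0/01642A40 would carry that LSN instead of the older 0/01642990.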
  6. handle_eviction_of_non_wal_logged_pages.patch

    Some operations in PostgreSQL are not WAL-logged at all (e.g. hint bits)
    or delay WAL-logging until the end of the operation (e.g. index build).
    So if such a page is evicted, we will lose the update.
    
    To fix it, we introduce the PD_WAL_LOGGED bit to track whether the page was WAL-logged.
    If the page is evicted before it has been WAL-logged, then zenith smgr creates an FPI for it.
    
    Authors:
    Konstantin Knizhnik
    anastasia
    lubennikovaav committed Nov 21, 2022
    f1cd5c6
  7. [walproposer] wal_proposer.patch

    Add WalProposer background worker to broadcast WAL stream to Zenith WAL acceptors
    
    Author: Konstantin Knizhnik
    lubennikovaav committed Nov 21, 2022
    78c598d
  8. persist_unlogged_tables.patch

    Ignore unlogged table qualifier. Add respective changes to regression test outputs.
    
    Author: Konstantin Knizhnik
    lubennikovaav committed Nov 21, 2022
    dad11c7
  9. fix_pg_table_size.patch

    Request relation size via smgr function, not just stat(filepath).
    lubennikovaav committed Nov 21, 2022
    79457f9
  10. [walredo] fix_gin_redo.patch

    Author: Konstantin Knizhnik
    lubennikovaav committed Nov 21, 2022
    0d37741
  11. [walredo] fix_brin_redo.patch

    Author: Konstantin Knizhnik
    lubennikovaav committed Nov 21, 2022
    6e94b78
  12. 3f64084
  13. wallog_t_ctid.patch

    lubennikovaav committed Nov 21, 2022
    a71dc6f
  14. 2c8cb7c
  15. 7677d32
  16. 8184a39
  17. Bring back change that got lost in refactoring. silence ReadBuffer_co…

    …mmon error. TODO: add a comment, why this is fine for zenith.
    lubennikovaav committed Nov 21, 2022
    da19cde
  18. [contrib/zenith] [refer #225] if insert WAL position points at the en…

    …d of WAL page header, then return it back to the page origin
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    34e7d89
  19. [walproposer] Create replication slot for walproposer to avoid loss …

    …of WAL at compute node
    
    + Check for presence of replication slot
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    0be1b1b
  20. fec5731
  21. [walproposer] Fix breaking out of WalProposerPoll and WaitEventSetWai…

    …t inside.
    
    WAL proposer (as bgw without BGWORKER_BACKEND_DATABASE_CONNECTION) previously
    ignored SetLatch, so once caught up it got stuck inside WalProposerPoll forever.
    
    Further, WaitEventSetWait didn't have a timeout, so we didn't try to reconnect
    when all connections were dead either. Fix that.
    
    Also move break on latch set to the end of the loop to attempt
    ReconnectWalKeepers even if latch is constantly set.
    
    Per test_race_conditions (Python version now).
    arssher authored and lubennikovaav committed Nov 21, 2022
    2488a80
  22. [walproposer] Make it possible to start postgres without reading chec…

    …kpoint from WAL
    
    + Check for presence of zenith.signal file to allow skipping reading the checkpoint record from WAL
    
    + Pass prev_record_ptr through zenith.signal file to postgres
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    5dc9a35
  23. 822921a
  24. [walredo] Add basic support for Seccomp BPF mode

    This patch aims to make our bespoke WAL redo machinery more robust
    in the presence of untrusted (in other words, possibly malicious) inputs.
    
    Pageserver delegates complex WAL decoding duties to postgres,
    which means that the latter might fall victim to carefully designed
    malicious WAL records and start doing harmful things to the system.
    To prevent this, it has been decided to limit possible interactions
    with the outside world using the Secure Computing BPF mode.
    
    We use this mode to disable all syscalls not in the allowlist.
    Please refer to src/backend/postmaster/seccomp.c to learn more
    about the pros & cons of the current approach.
    
    + Fix some bugs in seccomp bpf wrapper
    
    * Use SCMP_ACT_TRAP instead of SCMP_ACT_KILL_PROCESS to receive signals.
    * Add a missing variant of select() syscall (thx to @knizhnik).
    * Write error messages to an fd stderr's currently pointing to.
    funbringer authored and lubennikovaav committed Nov 21, 2022
    c5783b9
  25. [smgr_api] [contrib/zenith] 1. Do not call mdinit from smgrinit() bec…

    …ause it causes a memory leak in wal-redo-postgres
    
    2. Add check for local relations to make it possible to use DEBUG_COMPARE_LOCAL mode in SMGR
    
    + Call smgr_init_standard from smgr_init_zenith
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    6b975a6
  26. [walproposer] [contrib/zenith] support zenith_tenant

    This patch adds support for the zenith_tenant variable. It has a format
    similar to zenith_timeline. It is used in the callmemaybe query to pass the
    tenant to pageserver and in the ServerInfo structure passed to the wal acceptor.
    LizardWizzard authored and lubennikovaav committed Nov 21, 2022
    5ad2aab
  27. [walproposer] Remove graceful termination of COPY during walproposer …

    …recovery.
    
    Rust's postgres_backend is currently too simplistic to handle it properly: reading
    happens in a separate thread which just ignores CopyDone. Instead, the writer thread
    must become aware of termination and send CommandComplete. Also, the reading socket must
    be transferred back to postgres_backend (or the connection terminated completely
    after COPY). Let's do that after more basic safekeeper refactoring and right now
    cover this up to make tests pass.
    
    ref #388
    arssher authored and lubennikovaav committed Nov 21, 2022
    2d4ad90
  28. [walproposer] [contrib/zenith] [refer #395] Do not align start replicat…

    …ion position in wal_proposer to segment boundary
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    2c5dbdf
  29. [test] Add contrib/zenith_test_utils with helpers for testing and deb…

    …ugging.
    
    Now it contains only one function test_consume_xids() for xid wraparound testing.
    lubennikovaav committed Nov 21, 2022
    2f693e6
  30. d5b54a7
  31. [contrib/zenith] Use authentication token passed as environment varia…

    …ble in connections
    
    to pageserver. Token is passed as cleartext password.
    LizardWizzard authored and lubennikovaav committed Nov 21, 2022
    26b19c0
  32. [contrib/zenith] Fix race condition while WAL-logging page, leading t…

    …o CRC errors.
    
    zenith_wallog_page() would call log_newpage() on a buffer, while holding
    merely a shared lock on the page. That's not cool, because another backend
    could modify the page concurrently. We allow changing hint bits while
    holding only a shared lock, and changes on FSM pages, at least. See comments
    in XLogSaveBufferForHint() for discussion of this problem.
    
    One instance of the race condition that I was able to capture on my laptop
    happened like this:
    
    1. Backend A: needs to evict an FSM page from the buffer cache to make
       room for a new page, and calls zenith_wallog_page() on it. That is
       done while holding a share lock on the page.
    
    2. Backend A: XLogInsertRecord() computes the CRC of the FPI WAL record
       including the FSM page
    
    3. Backend B: Updates the same FSM page while holding only a share lock
    
    4. Backend A: Allocates space in the WAL buffers, and copies the WAL
       record header and the page to the buffers.
    
    At this point, the CRC that backend A computed earlier doesn't match the
    contents that were written out to the WAL buffers.
    
    The update of the FSM page in backend B happened from there (fsmpage.c):
    
    	/*
    	 * Update the next-target pointer. Note that we do this even if we're only
    	 * holding a shared lock, on the grounds that it's better to use a shared
    	 * lock and get a garbled next pointer every now and then, than take the
    	 * concurrency hit of an exclusive lock.
    	 *
    	 * Wrap-around is handled at the beginning of this function.
    	 */
    	fsmpage->fp_next_slot = slot + (advancenext ? 1 : 0);
    
    To fix, make a temporary copy of the page in zenith_wallog_page(), and
    WAL-log that. Just like XLogSaveBufferForHint() does.
    
    Fixes neondatabase/neon#413
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    6116229
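
The copy-then-log idea from the race condition fix above can be sketched as follows. This is a standalone illustration under stated assumptions: toy_checksum stands in for the real WAL CRC, and WalRecord, SKETCH_BLCKSZ and wallog_page_copy are hypothetical names, not code from zenith_wallog_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* PostgreSQL's default block size */

/* Simplified WAL record: a checksum followed by the page image. */
typedef struct WalRecord
{
    uint32_t      checksum;
    unsigned char payload[SKETCH_BLCKSZ];
} WalRecord;

/* Toy checksum; a stand-in for the CRC computed over real WAL records. */
uint32_t toy_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t   i;

    for (i = 0; i < len; i++)
        sum = sum * 31 + data[i];
    return sum;
}

/* Snapshot the shared page first, then checksum and log only the snapshot. */
void wallog_page_copy(const unsigned char *shared_page, WalRecord *out)
{
    unsigned char copy[SKETCH_BLCKSZ];

    assert(shared_page != NULL && out != NULL);
    memcpy(copy, shared_page, SKETCH_BLCKSZ);
    out->checksum = toy_checksum(copy, SKETCH_BLCKSZ);
    /* A concurrent writer may change shared_page here; copy is unaffected. */
    memcpy(out->payload, copy, SKETCH_BLCKSZ);
}
```

The private copy may still capture a page in the middle of a concurrent hint-style update, but the checksum and the logged bytes are always computed from the same snapshot, so they cannot disagree.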
  33. [walproposer] Rework walkeeper protocol to use libpq (#60)

    The majority of work here is going to be heavily cleaned up soon, but
    it's worth giving a brief overview of the changes either way.
    
    * Adds libpqwalproposer, serving a similar function to the existing
      libpqwalreceiver -- to provide access to libpq functions without
      causing problems from directly linking them.
    
    * Adds two new state components, giving (a) the type of libpq-specific
      polling required to move on to the next protocol state and (b) the
      kind of socket events it's waiting on. (These are expected to be
      removed or heavily reworked soon.)
    
    * Changes `WalProposerPoll` to make use of a slightly more specialized
      `AdvancePollState`, which has been completely reworked.
    sharnoff authored and lubennikovaav committed Nov 21, 2022
    ee4f564
  34. 14996ae
  35. zenith_regression_tests.patch

    Add alternative output for tablespace test, because tablespaces are not supported in zenith yet
    lubennikovaav committed Nov 21, 2022
    8d0ab18
  36. 2e3ae85
  37. Basic safekeeper refactoring and bug fixing.

    On the walproposer side,
    
    - Change the voting flow so that the acceptor reports its epoch along with giving
      the vote, not before it; otherwise it might get immediately stale. #294
    - Adjust to using separate structs for disk and network.
    
    ref #315
    arssher authored and lubennikovaav committed Nov 21, 2022
    de54e5a
  38. Rename VCL to epochStartLsn and restart_lsn to truncate_lsn.

    epochStartLsn is the LSN since which new proposer writes its WAL in its epoch,
    let's be more explicit here.
    
    In several places it also actually meant something we call *commit_lsn* -- the
    latest LSN known to be reliably committed (it constantly moves within one WAL
    proposer).
    
    truncate_lsn is the LSN still needed by the most lagging safekeeper. restart_lsn is
    terminology from pg_replication_slots, but here we don't really have 'restart';
    hopefully the word 'truncate' makes it clearer.
    arssher authored and lubennikovaav committed Nov 21, 2022
    c6c8e33
  39. [refer #27] Implement shared relsize cache to improve zenith performa…

    …nce.
    
    Cache the relfilenode size returned by zenith_nblocks() and also update it when the relation is extended.
    Don't update it from zenith_write() or zenith_wallog_page(), since there is no guarantee that these functions won't be called for a page that is not the last one.
    
    It can be configured with zenith.relsize_hash_size GUC parameter.
    Set it to 0 to disable caching.
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    9985388
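
A minimal sketch of the relation size cache behaviour described in the commit above. It is not the shared-memory hash table from the patch: the direct-mapped array, the 64-slot capacity, and the relsize_get, relsize_set and relsize_extend names are assumptions for illustration. Only the rule that a size of 0 disables caching comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RELSIZE_SLOTS 64        /* hypothetical fixed capacity */

typedef struct RelSizeEntry
{
    bool     used;
    uint32_t relnode;
    uint32_t nblocks;
} RelSizeEntry;

RelSizeEntry relsize_cache[RELSIZE_SLOTS];

/* Mirrors the GUC in spirit: 0 disables caching entirely. */
int relsize_hash_size = RELSIZE_SLOTS;

/* Look up the cached size; returns false on a miss or when disabled. */
bool relsize_get(uint32_t relnode, uint32_t *nblocks)
{
    RelSizeEntry *e = &relsize_cache[relnode % RELSIZE_SLOTS];

    assert(nblocks != NULL);
    if (relsize_hash_size == 0 || !e->used || e->relnode != relnode)
        return false;
    *nblocks = e->nblocks;
    return true;
}

/* Store the size returned by the (remote) nblocks call. */
void relsize_set(uint32_t relnode, uint32_t nblocks)
{
    RelSizeEntry *e = &relsize_cache[relnode % RELSIZE_SLOTS];

    if (relsize_hash_size == 0)
        return;
    e->used = true;
    e->relnode = relnode;
    e->nblocks = nblocks;
}

/* On extension to block blkno, grow the cached size; never shrink it. */
void relsize_extend(uint32_t relnode, uint32_t blkno)
{
    uint32_t cached;

    if (relsize_get(relnode, &cached) && blkno + 1 > cached)
        relsize_set(relnode, blkno + 1);
}
```

Only the extend path touches the cached value after the initial lookup, which matches the reasoning above: a plain page write may target any block, so it says nothing about where the relation ends.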
  40. Cleanup walproposer changes from #60

    Closes #66. Mostly corresponds to cleaning up the states we store. Goes
    back to single states for each WalKeeper, and we perform blocking writes
    for everything but sending the WAL itself.
    
    A few things have been factored out into libpqwalproposer for
    simplicity - like handling the nonblocking status of the connection
    (even though it's only changed once).
    sharnoff authored and lubennikovaav committed Nov 21, 2022
    6862e55
  41. 951bc22
  42. Ask pageserver only with LSN's aligned on record boundary.

    Now pageserver tracks only last_record_lsn and ignores
    last_valids_lsn. We can cause a deadlock at start or extreme slowness
    during normal operation if we call get_page with the LSN of an incomplete
    record.
    
    Patch by @knizhnik
    kelvich authored and lubennikovaav committed Nov 21, 2022
    686272d
  43. [refer #506] Correctly initialize all fields of WAL page header for f…

    …irst WAL record of started compute node
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    999db4a
  44. Add --sync-safekeepers starting standalone walproposer to sync safeke…

    …epers (#439).
    
    It is intended to solve the following problems:
    
    a) Chicken-or-the-egg one: compute postgres needs data directory
       with non-rel files that are downloaded from pageserver by calling
       basebackup@LSN. This LSN is not arbitrary, it must include all
       previously committed transactions and defined through consensus
       voting, which happens... in walproposer, a part of compute node.
    
    b) Just warranting such LSN is not enough, we must also actually commit
       it and make sure there is a safekeeper who knows this LSN is
       committed so WAL before it can be streamed to pageserver -- otherwise
       basebackup will hang waiting for WAL. Advancing commit_lsn without
       playing consensus game is impossible, so speculative 'let's just poll
       safekeepers, learn start LSN of future epoch and run basebackup'
       won't work.
    
    Currently --sync-safekeepers is considered completed when 1) at least a majority
    of safekeepers and 2) *all* safekeepers with a live connection to walproposer
    switch to the new epoch and advance commit_lsn, allowing basebackup to proceed. 2)
    limits availability, but that's because currently we don't have a mechanism
    defining which safekeeper should stream WAL into pageserver.
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    5bed2be
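
The completion rule for --sync-safekeepers stated in the commit above can be written down as a small predicate. This is a sketch only: SafekeeperState and sync_safekeepers_done are hypothetical names, and "synced" here lumps together switching to the new epoch and advancing commit_lsn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-safekeeper view held by the walproposer. */
typedef struct SafekeeperState
{
    bool connected;             /* live connection to walproposer */
    bool synced;                /* switched to new epoch, commit_lsn advanced */
} SafekeeperState;

/* True when both conditions from the commit message hold:
 * 1) at least a majority of all safekeepers are synced, and
 * 2) every safekeeper with a live connection is synced. */
bool sync_safekeepers_done(const SafekeeperState *sk, size_t n)
{
    size_t n_synced = 0;
    size_t i;

    assert(sk != NULL && n > 0);
    for (i = 0; i < n; i++)
    {
        if (sk[i].synced)
            n_synced++;
        else if (sk[i].connected)
            return false;       /* condition 2 violated */
    }
    return n_synced >= n / 2 + 1;   /* condition 1 */
}
```

With three safekeepers, two synced plus one disconnected counts as done, while two synced plus one connected but lagging does not, which is the availability cost the message mentions.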
  45. Update Dockerfile

    ololobus authored and lubennikovaav committed Nov 21, 2022
    610efb1
  46. d4b420d
  47. Always advance truncateLsn to commitLsn, keeping it on record boundary.

    And take initial value from freshly created slot position. Thus proposer always
    starts streaming from the record beginning; it simplifies WAL decoding on
    safekeeper.
    arssher authored and lubennikovaav committed Nov 21, 2022
    12bf0bf
  48. Minor logging editing.

    arssher authored and lubennikovaav committed Nov 21, 2022
    83654ed
  49. Fix walproposer starting streaming point.

    Send *all* entries (from the beginning, i.e. truncateLsn) to everyone but donor
    who doesn't need recovery at all and will receive only new entries. This can be
    optimized to avoid sending data which is already persisted (and correct), but
    previous such optimization was incorrect.
    arssher authored and lubennikovaav committed Nov 21, 2022
    d219291
  50. Mark all recovery messages as received by the donor.

    I forgot to do that in 42316a8. Fixes a segfault related to an attempt to send the
    (garbage collected) message a second time, and queue advancement when the donor doesn't
    restart.
    arssher authored and lubennikovaav committed Nov 21, 2022
    6ba6889
  51. 52ccea9
  52. Optimize walproposer starting streaming point.

    Safekeepers who are in the same epoch as the donor definitely have correct WAL, so
    we can send to them starting from their flushLsn. This required some additional fuss
    due to the convention of always starting streaming at the record boundary.
    arssher authored and lubennikovaav committed Nov 21, 2022
    91392d3
  53. Silence compiler warnings:

        contrib/zenith/libpagestore.c: In function ‘zenith_connect’:
        contrib/zenith/libpagestore.c:125:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
          125 |  const char **keywords = malloc((noptions + 1) * sizeof(*keywords));
              |  ^~~~~
    
        src/backend/tcop/zenith_wal_redo.c:294:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
          294 |  bool enable_seccomp = true;
              |  ^~~~
    
    In passing, also move the 'n_synced' local variable closer to where
    it's used.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    1143ea3
  54. Remove unused functions for reading non-rel pages.

    These could be used to fetch SLRUs and other non-relation things from the
    page server. But we don't do that, and have no plans in the near future.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    41e00a9
  55. Misc cleanup in the code that communicates with the page server.

    - Remove unused 'system_id' field from ZenithRequest.
    - Remove unused 'loaded' variable.
    - Remove unused functions to pack pageserver->client messages, and to unpack
      client->pageserver messages.
    - Fix printing the response in debug message (was printing the request
      twice)
    - Avoid the overhead of converting request/response to string, unless
      the debug message is really going to be printed
    - Formatting fixes.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    f5d1d4d
  56. Improve the protocol between Postgres and page server.

    - Use different message formats for different kinds of response messages.
    - Add an Error response message, for passing errors from page server to
      Postgres. An Error response now results in an ereport(ERROR).
    - Add a flag to requests, to indicate that we actually want the latest
      page version on the timeline, and the LSN is just a hint that we know
      that there haven't been any modifications since that LSN. It is currently
      always set to 'true', but once we start supporting read-only replicas,
      they would set it to false.
    
    This changes the network postgres<->page server protocol, so this needs
    corresponding changes on the page server side.
    
    Also refactor and fix the zm_to_string() function. The ZenithMessageStr
    array was broken, because the array indices didn't match the
    ZenithMessageTag enum values.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    ae2c2d2
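
The ZenithMessageStr problem mentioned in the commit above (array positions drifting away from enum values) is a common C pitfall. One way to avoid it, shown here as a sketch and not necessarily what the commit does, is to index the name table with C99 designated initializers. The tag names and values below are hypothetical; the real ZenithMessageTag values are not listed in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags with a gap between requests and responses. */
typedef enum
{
    T_SketchExistsRequest = 0,
    T_SketchNblocksRequest = 1,
    T_SketchGetPageRequest = 2,
    T_SketchErrorResponse = 100
} SketchMessageTag;

/* Each name is tied to its enum value, so reordering cannot break it.
 * Slots that are not named stay NULL. */
const char *const sketch_tag_names[] = {
    [T_SketchExistsRequest] = "ExistsRequest",
    [T_SketchNblocksRequest] = "NblocksRequest",
    [T_SketchGetPageRequest] = "GetPageRequest",
    [T_SketchErrorResponse] = "ErrorResponse"
};

/* Bounds-checked lookup that tolerates gaps and unknown tags. */
const char *sketch_tag_name(int tag)
{
    size_t n = sizeof(sketch_tag_names) / sizeof(sketch_tag_names[0]);

    /* The array is sized by its highest designated index. */
    assert(n == (size_t) T_SketchErrorResponse + 1);
    if (tag < 0 || (size_t) tag >= n || sketch_tag_names[tag] == NULL)
        return "unknown";
    return sketch_tag_names[tag];
}
```

A plain positional initializer would have put "ErrorResponse" at index 3, which is exactly the kind of mismatch the commit describes.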
  57. aa7dba9
  58. Fix a badly worded comment

    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    e21c7f8
  59. 517d219
  60. Catch walkeeper ErrorResponse in PQgetCopyData

    PQgetCopyData can sometimes indicate that the copy is done if the
    backend returns an error response. So while we still expect that the
    walkeeper never sends CopyDone, we can't expect it to never produce
    errors.
    sharnoff authored and lubennikovaav committed Nov 21, 2022
    770546f
  61. Use buffered I/O for reading commands from stdin.

    Whatever the bug mentioned in the FIXME comment was with buffered I/O,
    it has been fixed now. This greatly reduces the amount of CPU time spent
    in WAL redo.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    207ebd6
  62. Replace fread() with plain read() and a hand-written buffer.

    The fread() call required allowing the 'fstat' syscall in the seccomp
    configuration, and apparently on some platforms also 'newfstatat', as
    Max reported this error:
    
        Sep 28 15:56:55.522 ERRO wal-redo-postgres: ---------------------------------------
        Sep 28 15:56:55.522 ERRO wal-redo-postgres: seccomp: bad syscall 262
        Sep 28 15:56:55.522 ERRO wal-redo-postgres: ---------------------------------------
    
    I'm wary of allowing 'newfstatat': it seems to open too much attack
    surface, since it allows access to files by filename. Maybe it's OK, but
    I'm not sure. There isn't any fundamental reason why we'd need to call
    it, and I don't know why glibc's fread() wants to. So let's avoid the
    trouble by writing our own simple buffer over plain read().
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    214affd
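The "simple buffer over plain read()" idea can be sketched as follows. This is an illustration only, not the code from the commit: the struct, the function names, and the 8 kB buffer size are assumptions. The point is that the only syscall issued is read(), so nothing like fstat or newfstatat has to be allowed in the seccomp filter.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define EXAMPLE_BUF_SIZE 8192	/* assumed size, chosen for illustration */

typedef struct
{
	int			fd;
	size_t		pos;			/* next unread byte in data[] */
	size_t		len;			/* number of valid bytes in data[] */
	char		data[EXAMPLE_BUF_SIZE];
} ExampleReader;

void
example_reader_init(ExampleReader *r, int fd)
{
	assert(r != NULL);
	r->fd = fd;
	r->pos = 0;
	r->len = 0;
}

/*
 * Copy up to 'count' bytes into 'dst', refilling the buffer with read()
 * as needed. Returns 'count' on success, a short count if EOF is reached
 * first, or -1 on a read() error.
 */
ssize_t
example_read_exact(ExampleReader *r, void *dst, size_t count)
{
	size_t		done = 0;

	assert(r != NULL && dst != NULL);
	while (done < count)
	{
		size_t		avail;
		size_t		chunk;

		if (r->pos == r->len)
		{
			/* Buffer is empty: refill it with a single read() call. */
			ssize_t		n = read(r->fd, r->data, sizeof(r->data));

			if (n < 0)
			{
				if (errno == EINTR)
					continue;	/* interrupted, retry */
				return -1;
			}
			if (n == 0)
				break;			/* EOF */
			r->pos = 0;
			r->len = (size_t) n;
		}
		avail = r->len - r->pos;
		chunk = (count - done < avail) ? count - done : avail;
		memcpy((char *) dst + done, r->data + r->pos, chunk);
		r->pos += chunk;
		done += chunk;
	}
	return (ssize_t) done;
}
```

A caller would initialize one reader on STDIN_FILENO and then pull fixed-size message headers and bodies through example_read_exact().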
  63. Store unlogged tables locally, and replace PD_WAL_LOGGED.

    The smgr implementation needs to distinguish between unlogged/temp and
    regular 'permanent' relations, but the smgr API doesn't currently include
    that information. Add a 'relpersistence' field to SmgrRelationData, and
    as an argument to smgropen(). However, not all callers of smgropen()
    have a relcache entry at hand, so we allow some operations to pass 0,
    meaning 'unknown'.
    
    Now that we can store unlogged tables locally, use the same machinery
    to handle the buffered GiST and SP-GiST index builds. They populate the
    index by inserting all the tuples, and use the shared buffer cache while
    they do that. They don't WAL-log the pages while they do that, they log
    the whole relation as a separate bulk operation after the build has
    finished. That poses a problem for Zenith, where smgrwrite() is a no-op
    and we rely on WAL-logging to reconstruct the pages. Solve that problem by
    storing the pages locally in the compute node, like an unlogged relation,
    until the index build finishes and all the pages have been WAL-logged.
    To do that, the smgr needs to know when the caller is an unlogged build
    operation like that, so add functions to the Smgr API for that.
    
    With this commit, we no longer generate an FPI record whenever a rel is
    extended with an all-zeros page. See github issue #482. That greatly
    reduces the amount of WAL generated during bulk loading.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    8ea9a5d
  64. Fix queue cleanup in proposer (#93)

    The queue was advanced past truncateLsn when quorumLsn matched the end of a WAL record in the middle of a queue message. Fix cleanup of unreceived messages.
    
    Co-authored-by: Arseny Sher
    2 people authored and lubennikovaav committed Nov 21, 2022
    7c1df47
  65. Support read-only nodes

    This changes the format of the 'zenith.signal' file. It is now a
    human-readable text file, with one line like "PREV LSN: 0/1234568", or
    "PREV LSN: none" if the prev LSN is not known, or "PREV LSN: invalid" if
    starting up in read-write is not allowed.
    
    Also, if 'zenith.signal' is present, don't try to read the checkpoint
    record from the WAL. Trust the copy in pg_control, instead.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    e49f138
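The three forms of the 'zenith.signal' line described above could be parsed roughly like this. It is a sketch under assumptions: the enum, the function names, and the tolerance for a trailing newline are hypothetical, and the real parser may be stricter or laxer. The LSN is read in the usual PostgreSQL text form, two 32-bit hexadecimal halves separated by a slash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for this sketch. */
typedef enum
{
	PREV_LSN_KNOWN,				/* "PREV LSN: 0/1234568" */
	PREV_LSN_NONE,				/* "PREV LSN: none" */
	PREV_LSN_INVALID,			/* "PREV LSN: invalid" (no read-write startup) */
	PREV_LSN_PARSE_ERROR
} ExamplePrevLsnKind;

/* True if 'value' is exactly 'word', optionally followed by a newline. */
static int
example_value_is(const char *value, const char *word)
{
	size_t		n = strlen(word);

	return strncmp(value, word, n) == 0 &&
		(value[n] == '\0' || value[n] == '\n');
}

ExamplePrevLsnKind
example_parse_prev_lsn(const char *line, uint64_t *lsn)
{
	static const char prefix[] = "PREV LSN: ";
	const char *value;
	unsigned int hi;
	unsigned int lo;
	int			consumed = 0;

	assert(line != NULL && lsn != NULL);
	*lsn = 0;
	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return PREV_LSN_PARSE_ERROR;
	value = line + sizeof(prefix) - 1;

	if (example_value_is(value, "none"))
		return PREV_LSN_NONE;
	if (example_value_is(value, "invalid"))
		return PREV_LSN_INVALID;

	/* %n records how many characters were consumed, to reject trailing junk. */
	if (sscanf(value, "%X/%X%n", &hi, &lo, &consumed) == 2 &&
		(value[consumed] == '\0' || value[consumed] == '\n'))
	{
		*lsn = ((uint64_t) hi << 32) | lo;
		return PREV_LSN_KNOWN;
	}
	return PREV_LSN_PARSE_ERROR;
}
```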
  66. c38d759
  67. Fix compiler warning.

    warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
      364 |  WalMessage *msgQueueAck = msgQueueHead;
    arssher authored and lubennikovaav committed Nov 21, 2022
    0ae859c
  68. a18aa84
  69. Initialize FSM/VM pages through buffer cache

    To prevent loading them from pageserver.
    
    Author: Konstantin Knizhnik with my extension to VM as well.
    ololobus authored and lubennikovaav committed Nov 21, 2022
    56a1290
  70. Turn off back pressure by default

    ololobus authored and lubennikovaav committed Nov 21, 2022
    9409140
  71. ShutdownConnection instead of ResetConnection in more places.

    At least currently, the risk of a busy loop (e.g. due to bugs) is much higher than
    the benefit of additional availability from reconnecting immediately; add an interval
    between reconnection attempts.
    arssher authored and lubennikovaav committed Nov 21, 2022
    f0d67ef
  72. 3de0121
  73. Handle keepalives while receiving WAL in recovery.

    Since c310932, the safekeeper sometimes sends them.
    
    ref #843
    arssher authored and lubennikovaav committed Nov 21, 2022
    bf17fe6
  74. Fix truncateLsn update (#101)

    truncateLsn is now advanced to `Min(walkeeper[i].feedback.flushLsn)`, taking epochs into account.
    petuhovskiy authored and lubennikovaav committed Nov 21, 2022
    f9d1682
  75. [walproposer] Get rid of SAB_Error after rebase

    Also see 1632ea4 for details.
    ololobus authored and lubennikovaav committed Nov 21, 2022
    dce3816
  76. Add term history to safekeepers.

    See corresponding zenith commit.
    arssher authored and lubennikovaav committed Nov 21, 2022
    ba79713
  77. 0476da8
  78. Use max_replication_apply_lag instead of max_replication_write_lag.

    Move backpressure throttling from XlogInsert to ProcessInterrupts(), so that write operations are throttled outside of a critical section.
    lubennikovaav committed Nov 21, 2022
    2f9af0c
  79. Forward pageserver connection string to safekeeper

    This is needed for implementation of tenant rebalancing. With this
    change safekeeper becomes aware of which pageserver is supposed to be
    used for replication from this compute.
    
    This also changes how the auth token is substituted into the
    connection string. It is now substituted during config variable
    parsing, so it is available to both the smgr pageserver connection and
    the walproposer safekeeper connection.
    LizardWizzard authored and lubennikovaav committed Nov 21, 2022
    a7a4127
  80. 198ac63
  81. Stop building docker images in this repo.

    Docker images are now built in the zenith repo, since that gives us a
    sequential version number that allows us to compare compute/storage
    versions.
    kelvich authored and lubennikovaav committed Nov 21, 2022
    0ecc14e
  82. [walproposer] Async WAL append (#105)

    Implement async wp <-> sk protocol, send WAL messages ahead of feedback replies.
    
    New SS_ACTIVE state is introduced instead of former SS_SEND_WAL / SS_SEND_WAL_FLUSH / SS_RECV_FEEDBACK.
    petuhovskiy authored and lubennikovaav committed Nov 21, 2022
    145a439
  83. Fix walsender to work with zenith style standbyReply that sends non-zero flushLsn.

    Clean up backpressure defaults.
    lubennikovaav committed Nov 21, 2022
    703b9b8
  84. dd07782
  85. Reorder walproposer code in a more natural order (#112)

    Now functions in walproposer.c go in chronological order
    petuhovskiy authored and lubennikovaav committed Nov 21, 2022
    3795081
  86. Simplify walproposer code (#114)

    * Clean up walproposer states
    
    * Migrate AsyncReadFixed to AsyncReadMessage
    
    * Handle flushWrite a bit better
    
    * Update SS_ACTIVE event set in single place
    
    Now the event set is updated only at the end of HandleActiveState, after
    all handler code has executed.
    
    * Add comment on SS_ACTIVE write event
    
    * Add TODO for SS_ACTIVE DesiredEvents
    petuhovskiy authored and lubennikovaav committed Nov 21, 2022
    29e6817
  87. 47cd68a
  88. walproposer renames (#116)

    * Rename walkeeper to safekeeper
    
    * Rename message variables as request/response
    petuhovskiy authored and lubennikovaav committed Nov 21, 2022
    63d15a8
  89. 3588035
  90. Add max_replication_write_lag

    knizhnik authored and lubennikovaav committed Nov 21, 2022
    595838c
  91. Do not throttle wal sender

    knizhnik authored and lubennikovaav committed Nov 21, 2022
    c84ad00
  92. 27354e7
  93. Silence excessively noisy logging from walproposer.

    In passing, switch a few places to ereport() instead of elog(), to
    avoid the overhead of constructing the string when it's not logged.
    
    Fixes neondatabase/neon#1066
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    200e7e0
  94. Extend replication protocol with ZenithFeedback message.

    Add an extensible ZenithFeedback part to AppendResponse messages.
    Pass value sizes together with keys in the ZenithFeedback message.
    
    Add standby_status_update fields into ZenithFeedback.
    Get rid of the diskConsistentLsn field in AppendResponse, because it is now sent via ZenithFeedback.
    Fix calculation of diskConsistentLsn and instanceSize: take values from the latest reply from the pageserver.
    lubennikovaav committed Nov 21, 2022
    5013a30
  95. f816a4d
  96. Use local relation cache for smgr_exists

    refer  #1077
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    a25c83f
  97. e8fd3b6
  98. Implement cluster size quota for zenith compute node.

    Use GUC zenith.max_cluster_size to set the limit.
    
    If the limit is reached, extend requests will throw an out-of-space error.
    When the current size is too close to the limit, throw a warning.
    
    Do not apply the size quota to the autovacuum process.
    
    Add pg_cluster_size() function in zenith extension.
    lubennikovaav committed Nov 21, 2022
    55171b3
  99. Revert "Use local relation cache for smgr_exists"

    This reverts commit 45dd891.
    
    It introduced a stable test_isolation failure. There was an idea that adding
    strict backpressure settings would help, as the absence of this commit could behave
    as natural backpressure, but that didn't help. No better fix is immediately
    available, so let's revert until this is sorted out.
    
    ref neondatabase/neon#1238
    ref neondatabase/neon#1239
    arssher authored and lubennikovaav committed Nov 21, 2022
    fb17879
  100. Change the unit of cluster size limit GUC to MB, and other fixes.

    The GUC is a 32-bit integer, so if the base unit is bytes, the max
    limit you can set is only 2 GB. Furthermore, the web console assumed
    that the unit is in MB, and set it to 10000 meaning 10 GB, but in
    reality it was set to just 10 kB.
    
    Remove the WARNINGs related to cluster size limit. That was probably
    supposed to be DEBUG5 or something, because it's extremely noisy
    currently. You get the WARNING for *every block* when a relation is
    extended.
    
    Some kind of a WARNING when you approach the limit would make sense,
    but it's difficult to do in a sensible way with WARNINGs from the
    server. Firstly, most applications will ignore WARNINGs, in which case
    they don't accomplish anything. If an application forwards them to the
    user, that's not great either unless the application user happens to
    be the DBA. If you're lucky, the WARNINGs end up in an application log
    and the DBA is alerted, but printing the message for every relation
    extension is too noisy for that too. An email alert would probably be
    best, outside Postgres.
    
    Also don't enforce the limit when extending a temporary or unlogged
    relation. They don't count towards the cluster size limit, so it seems
    weird to error out on them. And reword the error message a bit.
    
    Fixes neondatabase/neon#1233
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    5da70d2
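The unit arithmetic behind the change to MB can be checked with a small sketch. The function names are made up for illustration; the only facts used are that the GUC is a signed 32-bit integer and that a PostgreSQL "MB" unit is 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_BYTES_PER_MB ((uint64_t) 1024 * 1024)

/* Largest limit, in bytes, that a signed 32-bit GUC can express per unit. */
uint64_t
example_max_limit_bytes(uint64_t unit_bytes)
{
	assert(unit_bytes > 0);
	return (uint64_t) INT32_MAX * unit_bytes;
}

/* Interpret a raw GUC value under a given base unit. */
uint64_t
example_limit_in_bytes(int32_t guc_value, uint64_t unit_bytes)
{
	assert(guc_value >= 0 && unit_bytes > 0);
	return (uint64_t) guc_value * unit_bytes;
}
```

With a base unit of bytes, the maximum is 2147483647 bytes (just under 2 GB), and the console's value of 10000 means 10000 bytes. With a base unit of MB, the same 10000 means 10485760000 bytes, roughly the 10 GB that was intended.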
  101. Improve error handling while connecting to page server.

    If anything goes wrong while establishing a connection, don't leak the
    socket.
    
    Also, if you get an error while sending the GetPage request, kill the
    connection. It's not clear what state it's in, so better to reconnect.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    085a631
  102. 3e146b4
  103. bd2996b
  104. Initialize pgxactoff for walproposer

    refer #1244
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    5575578
  105. 9acffbe
  106. Fix more compiler warnings.

    arssher authored and lubennikovaav committed Nov 21, 2022
    f74a9d9
  107. Remove dead code in handling ZenithFeedback part of an AppendResponse.

    The constructed StringInfoData 'z' variable wasn't used for anything, we
    passed the original 's' StringInfo directly to ParseZenithFeedbackMessage.
    That's fine, but let's remove the dead code.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    df7a0a1
  108. Expose reading a relation page at a specific LSN (#131)

    * Expose reading a relation page at a specific LSN
    
    * Addressing comments
    antons-antons authored and lubennikovaav committed Nov 21, 2022
    876296d
  109. 2be5289
  110. Fix zenith_test_utils linkage on macOS

    Use a function pointer to perform cross-extension calls.
    kelvich authored and lubennikovaav committed Nov 21, 2022
    94601e5
  111. Add warning for unrecognized GUCs with zenith prefix

    refer #1262
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    d3ccbc1
  112. 4a9cee9
  113. Use local relation cache for smgr_exists

    refer  #1077
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    a16ff76
  114. Populate relsize cache when relation is created.

    Postgres can perform an smgrnblocks() call on the relation right after
    creating it, and we don't update the last-written LSN on smgrcreate().
    
    Perhaps we should update last-written LSN, instead. This isn't
    bulletproof.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    8d98cb0
  115. Fix pg_table_size() on a view

    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    a2c2c69
  116. Don't set commitLsn to truncateLsn.

    It might jump back (on compute) this way, which is not fatal but violates sanity
    checks.
    arssher authored and lubennikovaav committed Nov 21, 2022
    78cfb83
  117. ec76f4f
  118. Enable dumping corrupt WAL segments (#145)

    * Enable dumping corrupt WAL segments
    
    Add the ability to dump a WAL segment with corrupt page headers and records:
    - skips over missing/broken page headers
    - skips over misformatted log records
    - allows dumping log records from a particular file starting from an
      optional offset (without the need for carefully crafted input)
    antons-antons authored and lubennikovaav committed Nov 21, 2022
    c2e6401
  119. Don't hold walproposer WAL in memory (#141)

    WAL is no longer held in memory, to prevent OOM in the compute. The in-memory queue is removed because it's not needed anymore. When streaming, WAL is now read directly from disk. Every safekeeper has a separate XLogReader. walproposer will now read as much WAL as it can for a single AppendRequest message, which can help with recovering lagging safekeepers. Because Recovery needs to save WAL for streaming, walproposer can now write WAL to disk, and `--sync-safekeepers` mode will create the pg_wal directory if needed. Replication slot `restart_lsn` is now synced with `truncate_lsn` to prevent truncation of disk WAL until needed.
    petuhovskiy authored and lubennikovaav committed Nov 21, 2022
    f1834ee
  120. Add --sysid parameter to initdb

    knizhnik authored and lubennikovaav committed Nov 21, 2022
    a3c9478
  121. Give up connection attempt to safekeeper after timeout.

    Forces prompt reconnection when packets are dropped, e.g. after turning an EC2
    instance off.
    
    ref neondatabase/neon#1491
    arssher authored and lubennikovaav committed Nov 21, 2022
    b43e732
  122. Avoid redundant memory allocation and synchronization in walredo (#144)

    * Avoid redundant memory allocation and synchronization in walredo
    
    * Address review comments
    
    * Reduce number of temp buffers and size of inmem file storage for wal redo postgres
    
    * Misc cleanup
    
    Add comments on 'inmem_smgr.c', remove superfluous copy-pasted comments,
    pgindent.
    
    Co-authored-by: Heikki Linnakangas
    2 people authored and lubennikovaav committed Nov 21, 2022
    14a5bc8
  123. Fix missed include for InRecovery (#149)

    * Fix missed include for InRecovery
    
    * Fix missed include for InRecovery (used only in debug version with --enable-cassert)
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    1acb625
  124. Avoid "bad syscall 39" on assertion failure in WAL redo process.

    ExceptionalCondition calls getpid(), which is currently forbidden by
    seccomp. You only get there if something else went wrong, but the "bad
    syscall" error hides the underlying cause of the error, which makes
    debugging hard.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    c309bac
  125. 4ce564a
  126. dc03bcd
  127. Turn Assertion into elog(ERROR), to help with debugging.

    This error is happening in the 'pg_regress' test in the CI, but not on
    my laptop. Turn it into an ERROR, so that we get the error context and
    backtrace of it.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    8a65bd4
  128. Fix errors in WAL redo about relpersistence mismatch.

    In the WAL redo process, even "permanent" buffers are stored in the
    local buffer cache. Need to pass RELPERSISTENCE_PERMANENT to smgropen()
    in that case.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    fa222b9
  129. Don't log 'last written LSN ahead of flushed'.

    That's a valid case, as edited comment says.
    
    neondatabase/neon#1303
    arssher authored and lubennikovaav committed Nov 21, 2022
    fe1cf28
  130. Perform inmem_smgr cleanup after processing each record (#154)

    * Perform inmem_smgr cleanup after processing each record
    
    * Prevent eviction of wal redo target page
    
    * Prevent eviction of wal redo target page from temp buffers
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    8b6fa25
  131. Avoid extending relation in the WAL redo process.

    It's a waste of time, and otherwise you can run into the MAX_PAGES limit.
    
    Fixes neondatabase/neon#1615
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    6fd7a41
  132. Send timeline_start_lsn in Elected and receive it in VoteResponse messages.

    To support remembering it on safekeeper. Currently compute doesn't know initial
    LSN on non-first boot (though it could get it from pageserver in theory), so we
    rely on safekeepers to fetch it back.
    
    While changing the protocol, also add node_id to AcceptorProposerGreeting.
    arssher authored and lubennikovaav committed Nov 21, 2022
    6e49f22
  133. Verify basebackup LSN against consensus LSN in walproposer.

    If not, such basebackup (clog etc) is inconsistent and must be retaken.
    
    Basebackup LSN is taken by exposing xlog.c RedoStartLSN in shmem.
    
    ref neondatabase/neon#594
    arssher authored and lubennikovaav committed Nov 21, 2022
    0a064cf
  134. Implement pg_database_size():

    - extend zenith pageserver API to handle new request type;
    - add dbsize_hook to intercept db_dir_size() call.
    lubennikovaav committed Nov 21, 2022
    71ed60b
  135. Shut down instance on basebackup LSN mismatch.

    To force making basebackup again.
    arssher authored and lubennikovaav committed Nov 21, 2022
    0903362
  136. 598d318
  137. zenith_test_utils extension: add neon_xlogflush()

    This function is to simplify complex WAL generation in neondatabase/neon#1574
    
    `pg_logical_emit_message` is the easiest way to get a big WAL record, but:
    * If it's transactional, it gets `COMMIT` record right after
    * If it's not, WAL is not flushed at all. The function helps here, so we
      don't rely on the background WAL writer.
    
    I suspect the plain `xlogflush()` name may collide in the future, hence the prefix.
    yeputons authored and lubennikovaav committed Nov 21, 2022
    b920251
  138. Reduce noise in the logs from inmem_write()

    I'm seeing a lot of these warnings from B-tree SPLIT records:
    
        WARNING:  inmem_write() called for 1663/12990/16397.0 blk 2630: used_pages 0
        CONTEXT:  WAL redo at 1/235A1B50 for Btree/SPLIT_R: level 0, firstrightoff 368, newitemoff 408, postingoff 0
    
    That seems OK, replaying a split record legitimately accesses many buffers:
    the left half, the right half, left sibling, right sibling, and child.
    
    We could bump up 'temp_buffers' (currently 4), but I didn't do that
    because it's also good to get some test coverage for
    inmem_smgr.c.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    fa4064e
  139. 7fb8953
  140. Improve error messages on seccomp loading errors.

    At neondatabase/neon#1783 (comment),
    Kirill saw a case where the WAL redo process failed to open /dev/null.
    That's pretty weird, and I have no idea what might be causing it, but
    with this patch we'll at least get a little more detail if it happens
    again. This will print the OS error (with %m) if it happens, and also
    distinguish between the two error cases that previously both emitted
    the 'failed to open a test file' error.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    4f970da
  141. Rename contrib/zenith to contrib/neon. Rename custom GUCs:

    - zenith.page_server_connstring -> neon.pageserver_connstring
    - zenith.zenith_tenant -> neon.tenant_id
    - zenith.zenith_timeline -> neon.timeline_id
    - zenith.max_cluster_size -> neon.max_cluster_size
    lubennikovaav committed Nov 21, 2022
    383181f
  142. 7913ea5
  143. Fix basebackup LSN comparison in walproposer.

    as basebackup LSN always skips over page header
    arssher authored and lubennikovaav committed Nov 21, 2022
    fdc02cd
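To illustrate why a page header matters for the comparison: if one LSN sits exactly on a WAL page boundary and the other points just past that page's header, they refer to the same position in the record stream but differ numerically. The sketch below is not the walproposer code. It assumes the default 8 kB WAL page size, 16 MB segments, and the 24-byte short and 40-byte long page header sizes of typical 64-bit builds; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults; real builds may be configured differently. */
#define EXAMPLE_XLOG_BLCKSZ		8192u
#define EXAMPLE_WAL_SEG_SIZE	(16u * 1024u * 1024u)
#define EXAMPLE_SHORT_PHD_SIZE	24u
#define EXAMPLE_LONG_PHD_SIZE	40u

/*
 * If 'lsn' is on a page boundary, move it past the page header: the long
 * header on the first page of a segment, the short header otherwise.
 */
uint64_t
example_skip_page_header(uint64_t lsn)
{
	if (lsn % EXAMPLE_XLOG_BLCKSZ != 0)
		return lsn;
	if (lsn % EXAMPLE_WAL_SEG_SIZE == 0)
		return lsn + EXAMPLE_LONG_PHD_SIZE;
	return lsn + EXAMPLE_SHORT_PHD_SIZE;
}

/* Compare after normalizing the consensus LSN the same way. */
int
example_lsn_matches(uint64_t basebackup_lsn, uint64_t consensus_lsn)
{
	assert(basebackup_lsn % EXAMPLE_XLOG_BLCKSZ != 0);	/* already past a header */
	return example_skip_page_header(consensus_lsn) == basebackup_lsn;
}
```

For example, a consensus LSN of 0/1000000 (a segment boundary) would match a basebackup LSN of 0/1000028 under these assumptions.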
  144. ae0afcc
  145. Do not allocate shared memory for wal_redo process (#165)

    * Do not allocate shared memory for wal_redo process
    
    * Add comment
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    5afaed3
  146. Add check for NULL for malloc in InternalIpcMemoryCreate (#173)

    * Add check for NULL for malloc in InternalIpcMemoryCreate
    
    * apply pgindent
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    eeaaead
  147. d027915
  148. fd9a599
  149. ad78fc7
  150. Misc cleanup in libpagestore.c.

    - Fix typos
    - Change Zenith -> Neon in the ZENITH_SMGR tag that's printed in
      user-visible error messages, and in various function names and comments
      that are not user-visible.
    - pgindent
    - Remove comment about zm_to_string() leaking memory. It doesn't.
    - Re-word some error messages to match PostgreSQL error message style guide
    - Cleanup logging style
    - Don't print JWT token to log
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    0d5f52a
  151. Large last written lsn cache (#177)

    Maintain a cache of the last written LSN for each relation segment (8 MB).
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    eaae857
  152. Add uuid-ossp to the supported extensions (#181)

    * Add uuid-ossp to the supported extensions
    
    Also update compile flags to `-O2`, trading compile time for PostgreSQL performance, and remove --enable-cassert.
    MMeent authored and lubennikovaav committed Nov 21, 2022
    118847d
  153. Update last written LSN for gin/gist index metadata (#182)

    * Update last written LSN for gin/gist index metadata
    
    * Replace SetLastWrittenLSN with a family of SetLastWrittenLSNFor* functions
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    b092941
  154. Revert "Update last written LSN for gin/gist index metadata (#182)" (#183)
    
    This reverts commit 7517d1c.
    
    Revert "Large last written lsn cache (#177)"
    
    This reverts commit 595ac69.
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    4c77785
  155. Fix uuid-ossp build

    kelvich authored and lubennikovaav committed Nov 21, 2022
    9424cb0
  156. 3916814
  157. Eliminate UnkonwnXLogRecPtr and always use InvalidXLogRecPtr instead (#192)
    
    * Eliminate UnkonwnXLogRecPtr and always use InvalidXLogRecPtr instead
    
    * Remove GetMinReplicaLsn function
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    46385c5
  158. Init wal redo buffer for fpi (#194)

    * Initialize wal_redo_buffer after applying record with FPI
    
    refer #1915
    
    * Update comment
    
    * Update src/backend/tcop/zenith_wal_redo.c
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    
    * Update src/backend/tcop/zenith_wal_redo.c
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    2 people authored and lubennikovaav committed Nov 21, 2022
    55cb46e
  159. Stamp XLP_FIRST_IS_CONTRECORD only if we start writing with page offset.

    Without this patch, on bootstrap XLP_FIRST_IS_CONTRECORD was always set in the
    header of the page where WAL writing continues. This confuses WAL decoding on
    safekeepers, making it think decoding starts in the middle of a record, leading
    to:
    
    2022-08-12T17:48:13.816665Z ERROR {tid=37}: query handler for 'START_WAL_PUSH postgresql://no_user:@localhost:15050' failed: failed to run ReceiveWalConn
    
    Caused by:
        0: failed to process ProposerAcceptorMessage
        1: invalid xlog page header: unexpected XLP_FIRST_IS_CONTRECORD at 0/2CF8000
    arssher authored and lubennikovaav committed Nov 21, 2022
    7444287
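The rule in the title can be illustrated with a small Python sketch. It reads "with page offset" as "the start position is not page-aligned", assumes the default WAL page size (XLOG_BLCKSZ = 8192), and uses hypothetical helper names (`parse_lsn`, `first_page_flags`). The actual fix is C code in this fork and is not reproduced here. The value 0x0001 for XLP_FIRST_IS_CONTRECORD is the one defined in PostgreSQL's xlog_internal.h.

```python
XLOG_BLCKSZ = 8192                 # assumed: PostgreSQL's default WAL page size
XLP_FIRST_IS_CONTRECORD = 0x0001   # page-header flag value from xlog_internal.h


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' hex (e.g. '0/2CF8000') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def first_page_flags(start_lsn: int) -> int:
    """Flags for the first page header when WAL writing starts at start_lsn."""
    if start_lsn % XLOG_BLCKSZ != 0:
        # Writing starts mid-page: mark the page as beginning with a
        # continuation record.
        return XLP_FIRST_IS_CONTRECORD
    # Writing starts exactly on a page boundary: no flag.
    return 0
```

The LSN from the error message above, 0/2CF8000, is page-aligned, so under this rule its page header would carry no XLP_FIRST_IS_CONTRECORD flag.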
  160. Pull 99% of walproposer code into extension. (#188)

    * Pull 99% of walproposer code into extension.
    
    * Annotate nbytes to show it's used for asserts only, fixing one more warning.
    
    * Fix makefiles:
    
    - Include neon extensions into contrib Makefile
    - Configure libpqwalproposer more like other extensions
    
    * Add comment about lack of PG timelines, and make StartReplication static again.
    
    * Fix some compiler warnings in vendor/postgres, and pull libpqwalproposer into vendor/neon
    
    * Fix issue with makefile that didn't get caught in the normal test envs.
    MMeent authored and lubennikovaav committed Nov 21, 2022
    f700c2b
  161. Use ECR for image (#195)

    * Use ECR for image
    
    * Keep arg consistent across dockerfiles
    
    Co-authored-by: Rory de Zoete <[email protected]>
    2 people authored and lubennikovaav committed Nov 21, 2022
    f733e09
  162. 0c46560
  163. 9d22aa6
  164. 698ff58
  165. Move backpressure throttling implementation to neon extension (#203)

    * Move backpressure throttling implementation to neon extension and add a function for monitoring throttling time
    
    * Update src/include/miscadmin.h
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    2 people authored and lubennikovaav committed Nov 21, 2022
    5e7bd7d
  166. 4c869e4
  167. 3d11198
  168. 63500f4
  169. 59e82e8
  170. Update expected output for sysviews test because of changed default value of enable_seqscan_prefetch
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    784557d
  171. eff6e84
  172. 27a018f
  173. Pin pages with speculative insert tuples to prevent their reconstruction because spec_token is not WAL-logged (#221)

    * Pin pages with speculative insert tuples to prevent their reconstruction because spec_token is not WAL-logged
    
    refer #2587
    
    * Undo Neon trick in heap_xlog_insert, which is no longer needed after pinning the page for speculative insert
    
    * Update src/backend/access/heap/heapam.c
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    
    * Move ReleaseBuffer to the end of heap_finish_speculative function
    
    * Update src/backend/access/heap/heapam.c
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    
    Co-authored-by: Heikki Linnakangas <[email protected]>
    2 people authored and lubennikovaav committed Nov 21, 2022
    04aa290
  174. Fix shared memory initialization for last written LSN cache (#224)

    * Fix shared memory initialization for last written LSN cache
    
    Replace (from,till) with (from,n_blocks) for SetLastWrittenLSNForBlockRange function
    
    * Fast exit from SetLastWrittenLSNForBlockRange for n_blocks == 0
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    efdcfa3
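A Python sketch of the (from, n_blocks) calling convention and the early exit for an empty range. The cache layout (a dict keyed by block number) and the keep-the-maximum rule are assumptions made for illustration. The real SetLastWrittenLSNForBlockRange is C code operating on shared memory, and its signature is not reproduced here.

```python
def set_last_written_lsn_for_block_range(cache, lsn, start, n_blocks):
    """Record lsn for blocks start .. start + n_blocks - 1."""
    if n_blocks == 0:
        return  # fast exit: nothing to update
    for blkno in range(start, start + n_blocks):
        # Assumed rule: never move a block's LSN backwards.
        cache[blkno] = max(cache.get(blkno, 0), lsn)
```

Passing a count rather than an end block leaves no question about whether the upper bound is inclusive, and an empty range is simply `n_blocks == 0`.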
  175. 80e5204
  176. Move walredo process code under pgxn in the main 'neon' repository.

    - Refactor the way the WalProposerMain function is called when started
      with --sync-safekeepers. The postgres binary now explicitly loads
      the 'neon.so' library and calls the WalProposerMain in it. This is
      simpler than the global function callback "hook" we previously used.
    
    - Move the WAL redo process code to a new library, neon_walredo.so,
      and use the same mechanism as for --sync-safekeepers to call the
      WalRedoMain function, when launched with --walredo argument.
    
    - Also move the seccomp code to neon_walredo.so library. I kept the
      configure check in the postgres side for now, though.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    c236bea
  177. Misc cleanup, mostly to reduce unnecessary differences with upstream.

    Fix indentation, remove unused definitions, resolve some FIXMEs.
    hlinnaka authored and lubennikovaav committed Nov 21, 2022
    380877c
  178. Optimize prefetch patterns in both heap seqscan and vacuum scans. (#227)

    Previously, we called PrefetchBuffer [NBlkScanned * seqscan_prefetch_buffers]
    times in each of those situations, but now only NBlkScanned.
    
    In addition, the prefetch mechanism for the vacuum scans is now based on
    blocks instead of tuples - improving the efficiency.
    MMeent authored and lubennikovaav committed Nov 21, 2022
    12aac2d
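The difference in PrefetchBuffer call counts described in this commit can be sketched by counting requests in a toy scan. This is Python for illustration only: the function and variable names are hypothetical, `distance` stands in for seqscan_prefetch_buffers, and the real change is C code in the heap scan and vacuum paths.

```python
def naive_prefetch_calls(n_blocks, distance):
    """Old pattern: at every block, re-request the next `distance` blocks."""
    calls = 0
    for blk in range(n_blocks):
        for _ahead in range(blk + 1, min(blk + 1 + distance, n_blocks)):
            calls += 1
    return calls


def windowed_prefetch_calls(n_blocks, distance):
    """New pattern: remember how far we have requested, never re-request."""
    calls = 0
    next_to_prefetch = 0  # first block that has not been requested yet
    for blk in range(n_blocks):
        target = min(blk + 1 + distance, n_blocks)
        # Never prefetch the block we are reading right now or earlier ones.
        next_to_prefetch = max(next_to_prefetch, blk + 1)
        while next_to_prefetch < target:
            calls += 1
            next_to_prefetch += 1
    return calls
```

For a 100-block scan with a distance of 8, the naive pattern issues 764 requests while the windowed one issues 99, one per block after the first.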
  179. Fix prefetch issues in parallel scans and vacuum's cleanup scan (#234)

    Parallel seqscans didn't take their parallelism into account when determining
    which block to prefetch, and vacuum's cleanup scan didn't correctly determine
    which blocks would need to be prefetched, and could get into an infinite loop.
    MMeent authored and lubennikovaav committed Nov 21, 2022
    38009a2
  180. Use prefetch in pg_prewarm extension (#236)

    * Use prefetch in pg_prewarm extension
    
    * Change prefetch order as suggested in review
    knizhnik authored and lubennikovaav committed Nov 21, 2022
    da50d99

Commits on Nov 23, 2022

  1. PG14: Prefetch cleanup (#242)

    * Update prefetch mechanisms:
    
    - **Enable enable_seqscan_prefetch by default**
    - Store prefetch distance in the relevant scan structs
    - Slow start sequential scan, to accommodate LIMIT clauses.
    - Replace seqscan_prefetch_buffer with the relations' tablespaces'
      *_io_concurrency; and drop seqscan_prefetch_buffer as a result.
    - Clarify enable_seqscan_prefetch GUC description
    - Fix prefetch in pg_prewarm
    - Add prefetching to autoprewarm worker
    - Fix an issue where we'd incorrectly not prefetch data when hitting a table wraparound. The same issue also resulted in assertion failures in debug builds.
    - Fix parallel scan prefetching - we didn't take into account that parallel scans have scan synchronization, too.
    MMeent committed Nov 23, 2022
    c6be492
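The "slow start" item in the list above can be pictured as a prefetch distance that ramps up as the scan proceeds, so a query that stops early (for example because of a LIMIT clause) has requested few pages it never reads. A minimal Python sketch: the doubling rule and the function name are assumptions, since the commit message does not say how the distance grows.

```python
def prefetch_distances(n_blocks, max_distance):
    """Return the prefetch distance used at each block of a scan."""
    distances = []
    distance = 1 if max_distance > 0 else 0
    for _ in range(n_blocks):
        distances.append(distance)
        # Assumed ramp-up: double until the configured maximum is reached.
        distance = min(distance * 2, max_distance)
    return distances
```

With a maximum of 8, a scan that ends after two blocks has only used distances 1 and 2, while a long scan reaches the full distance by its fourth block.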

Commits on Nov 24, 2022

  1. 8b158b8
  2. Maintain last written LSN for each page to enable prefetch on vacuum,… (#244)
    
    * Maintain last written LSN for each page to enable prefetch on vacuum, delete and other massive update operations
    
    * Move PageSetLSN in heap_xlog_visible before MarkBufferDirty
    knizhnik committed Nov 24, 2022
    06edb5a

Commits on Dec 5, 2022

  1. Prefetch cleanup: (#247)

    - Prefetch the pages in index vacuum's sequential scans
       Implemented in NBTREE, GIST and SP-GIST.
       BRIN does not have a 2nd phase of vacuum, and both GIN and HASH clean up
       their indexes in a non-seqscan fashion: GIN scans the btree from left to
       right, and HASH only scans the initial buckets sequentially.
    MMeent committed Dec 5, 2022
    299bf4f

Commits on Dec 7, 2022

  1. Fix uninitialized variable in spgvacuum.c (#250)

    The compiler warning was correct: the uninitialized variable had the potential to disable prefetching.
    MMeent committed Dec 7, 2022
    544ec69

Commits on Dec 8, 2022

  1. c22aea6

Commits on Dec 22, 2022

  1. fix unnecessary header overwrites

    Use $(INSTALL_DATA) to copy header files, as the more recent v15
    branch does. This helps avoid unnecessary rebuilds of postgres_ffi in neon.
    
    Cc: neondatabase/neon#1873
    koivunej committed Dec 22, 2022
    8465340