Commit

fix: removed gzip backup compression due to backpressure memory issues
titanism committed Feb 20, 2024
1 parent f8de546 commit b32b521
Showing 32 changed files with 147 additions and 103 deletions.
2 changes: 1 addition & 1 deletion app/controllers/web/my-account/download-alias-backup.js
@@ -195,7 +195,7 @@ async function downloadAliasBackup(ctx) {
Bucket: `${config.env}-${dashify(
_.camelCase(alias.storage_location)
)}`,
Key: `${alias.id}.sqlite.gz`
Key: `${alias.id}.sqlite`
}),
{ expiresIn: 3600 }
);
2 changes: 1 addition & 1 deletion app/models/aliases.js
@@ -114,7 +114,7 @@ const Aliases = new mongoose.Schema({
// - $id-tmp-wal.sqlite (WAL)
// - $id-tmp-shm.sqlite (SHM)
//
// - $id.sqlite.gz (R2 backup) <--- excluded for now
// - $id.sqlite (R2 backup) <--- excluded for now
//
storage_used: {
type: Number,
25 changes: 10 additions & 15 deletions app/views/docs/best-quantum-safe-encrypted-email-service/index.md
@@ -88,7 +88,7 @@ We are the only 100% open-source and privacy-focused email service provider that
IMAP->>You: Success!
```

5. [Compressed backups of your encrypted mailboxes](#backups) are made daily. You can also request a new backup at any time or download the latest backup from <a href="/my-account/domains" target="_blank" rel="noopener noreferrer" class="alert-link">My Account <i class="fa fa-angle-right"></i> Domains</a> <i class="fa fa-angle-right"></i> Aliases. If you decide to switch to another email service, then you can easily migrate, download, export, and purge your mailboxes and backups at anytime.
5. [Backups of your encrypted mailboxes](#backups) are made daily. You can also request a new backup at any time or download the latest backup from <a href="/my-account/domains" target="_blank" rel="noopener noreferrer" class="alert-link">My Account <i class="fa fa-angle-right"></i> Domains</a> <i class="fa fa-angle-right"></i> Aliases. If you decide to switch to another email service, then you can easily migrate, download, export, and purge your mailboxes and backups at anytime.


## Technologies
@@ -176,34 +176,29 @@ We accomplish two-way communication with [WebSockets](https://developer.mozilla.

### Backups

> **tldr;** Compressed backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from <a href="/my-account/domains" target="_blank" rel="noopener noreferrer" class="alert-link">My Account <i class="fa fa-angle-right"></i> Domains</a> <i class="fa fa-angle-right"></i> Aliases.
> **tldr;** Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from <a href="/my-account/domains" target="_blank" rel="noopener noreferrer" class="alert-link">My Account <i class="fa fa-angle-right"></i> Domains</a> <i class="fa fa-angle-right"></i> Aliases.
For backups, we simply acquire a write lock, run a WAL checkpoint via `wal_checkpoint(PASSIVE)`, and then copy the file.
For backups, we simply run the SQLite `VACUUM INTO` command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the [SHA-256](https://en.wikipedia.org/wiki/SHA-2) hash has changed on the file as compared to the most recent backup.

Backups are stored if no existing backup is detected or if the [SHA-256](https://en.wikipedia.org/wiki/SHA-2) hash has changed on the file as compared to the most recent backup.
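
The change-detection rule above can be sketched with Node's built-in `node:crypto` module. This is only a minimal illustration, not the project's code: `hashFile`, `shouldStoreBackup`, and the `previousHash` argument are hypothetical names, and looking up the hash of the most recent backup is assumed to happen elsewhere.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Hash the file chunk by chunk so the whole backup is never held in memory.
async function hashFile(filePath) {
  const hash = crypto.createHash('sha256');
  for await (const chunk of fs.createReadStream(filePath)) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Store the backup when no previous hash exists or the content has changed.
// `previousHash` is a hypothetical input (hex string or undefined).
async function shouldStoreBackup(filePath, previousHash) {
  const hash = await hashFile(filePath);
  return { hash, store: !previousHash || previousHash !== hash };
}
```

Returning the computed hash alongside the decision lets a caller attach it to the stored object without reading the file a second time.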
Note that we use the `VACUUM INTO` command as opposed to the built-in `backup` command because if a page is modified during a `backup` command operation, then it has to start over. The `VACUUM INTO` command will take a snapshot. See these comments on [GitHub](https://github.com/benbjohnson/litestream.io/issues/56) and [Hacker News](https://news.ycombinator.com/item?id=31387556) for more insight.

Additionally we use `VACUUM INTO` as opposed to `backup`, because the `backup` command would leave the database unencrypted for a brief period until `rekey` is invoked (see this GitHub [comment](https://github.com/m4heshd/better-sqlite3-multiple-ciphers/issues/46#issuecomment-1468018927) for insight).

The Secondary will instruct the Primary over the `WebSocket` connection to execute the backup – and the Primary will then receive the command to do so and will subsequently:

1. Connect to your encrypted mailbox.
2. Acquire a write lock.
3. Run a WAL checkpoint via `wal_checkpoint(PASSIVE)`.
4. Copy the file to a temporary location.
4. Run the `VACUUM INTO` SQLite command.
5. Ensure that the copied file can be opened with the encrypted password (safeguard/dummyproofing).
6. Compress the resulting backup file with `gzip`.
7. Upload it to Cloudflare R2 for storage (or your own provider if specified).
6. Upload it to Cloudflare R2 for storage (or your own provider if specified).
7. Compress the resulting backup file with `gzip`.
8. Upload it to Cloudflare R2 for storage (or your own provider if specified).
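
For context on the `gzip` step listed above, here is a minimal sketch of compressing a backup file to disk with Node's `stream/promises` `pipeline`, which forwards backpressure between the read, gzip, and write stages and rejects if any stage fails. It is an assumption-laden illustration rather than the project's implementation: `gzipFile` and its arguments are hypothetical, and whether this pattern would avoid the memory issues named in the commit title has not been verified.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');
const { createGzip } = require('node:zlib');

// Compress `source` into `destination` without buffering the whole file.
// pipeline() pauses the reader whenever a downstream stage is saturated
// and destroys every stream if one of them errors.
async function gzipFile(source, destination) {
  await pipeline(
    fs.createReadStream(source),
    createGzip(),
    fs.createWriteStream(destination)
  );
}
```

Writing the compressed output to a temporary file first gives the upload a plain file stream to read from, at the cost of extra disk space and I/O.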

Remember that your mailboxes are encrypted – and while we have IP restrictions and other authentication measures in place for WebSocket communication – in the event of a bad actor, you can rest assured that unless the WebSocket payload has your IMAP password, it cannot open your database.

Only one backup is stored per mailbox at this time, but in the future we may offer point-in-time-recovery ("[PITR](https://en.wikipedia.org/wiki/Point-in-time_recovery)").

Previously we ran the SQLite `VACUUM INTO` command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.

However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.

The `VACUUM INTO` command as opposed to the built-in `backup` command because if a page is modified during a `backup` command operation, then it has to start over. The `VACUUM INTO` command will take a snapshot. See these comments on [GitHub](https://github.com/benbjohnson/litestream.io/issues/56) and [Hacker News](https://news.ycombinator.com/item?id=31387556) for more insight.

Additionally `VACUUM INTO` was better than `backup`, because the `backup` command would leave the database unencrypted for a brief period until `rekey` is invoked (see this GitHub [comment](https://github.com/m4heshd/better-sqlite3-multiple-ciphers/issues/46#issuecomment-1468018927) for insight).

### Search

Our IMAP servers support the `SEARCH` command with complex queries, regular expressions, and more.
56 changes: 16 additions & 40 deletions helpers/parse-payload.js
@@ -7,7 +7,6 @@ const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { Buffer } = require('node:buffer');
const { createGzip } = require('node:zlib');
const { isIP } = require('node:net');
const { Headers, Splitter, Joiner } = require('mailsplit');

@@ -753,13 +752,9 @@ async function parsePayload(data, ws) {
// run a checkpoint to copy over wal to db
tmpDb.pragma('wal_checkpoint(PASSIVE)');

//
// NOTE: we no longer run VACUUM because we already have
// autovacuum enabled and this is incredibly
// memory intensive on larger mailboxes
//
// TODO: vacuum into instead (same for elsewhere)
// vacuum temporary database
// tmpDb.prepare('VACUUM').run();
tmpDb.prepare('VACUUM').run();

// TODO: unlock the temporary database

@@ -770,17 +765,12 @@ async function parsePayload(data, ws) {
logger.fatal(err, { payload });
}

//
// NOTE: we no longer run VACUUM because we already have
// autovacuum enabled and this is incredibly
// memory intensive on larger mailboxes
//
// vacuum database
// await this.wsp.request.call(this, {
// action: 'vacuum',
// timeout: ms('5m'),
// session: { user: payload.session.user }
// });
await this.wsp.request.call(this, {
action: 'vacuum',
timeout: ms('5m'),
session: { user: payload.session.user }
});

response = {
id: payload.id,
@@ -1518,6 +1508,7 @@ async function parsePayload(data, ws) {
// run a checkpoint to copy over wal to db
db.pragma('wal_checkpoint(PASSIVE)');

// TODO: vacuum into instead (same for elsewhere)
// vacuum database
db.prepare('VACUUM').run();
}
@@ -1762,10 +1753,9 @@ async function parsePayload(data, ws) {
db.pragma('wal_checkpoint(PASSIVE)');

// create backup
// NOTE: we don't use `VACUUM INTO` anymore because it is extremely memory intensive
// const results = db.exec(`VACUUM INTO '${tmp}'`);
// logger.debug('results', { results });
await fs.promises.copyFile(storagePath, tmp);
const results = db.exec(`VACUUM INTO '${tmp}'`);

logger.debug('results', { results });

let backup = true;
let err;
@@ -1917,7 +1907,7 @@ async function parsePayload(data, ws) {
if (!_.isDate(new Date(payload.backup_at)))
throw new TypeError('Backup at invalid date');

// only allow one backup at a time
// only allow one backup at a time and once every hour
const backupLock = await this.lock.waitAcquireLock(
`${payload.session.user.alias_id}-backup`,
ms('30m'), // expires after 30m
@@ -1964,7 +1954,7 @@ async function parsePayload(data, ws) {
_.camelCase(payload.session.user.storage_location)
)}`;

const key = `${payload.session.user.alias_id}.sqlite.gz`;
const key = `${payload.session.user.alias_id}.sqlite`;

if (config.env !== 'test') {
let res;
@@ -2015,11 +2005,9 @@ async function parsePayload(data, ws) {
db.pragma('wal_checkpoint(PASSIVE)');

// create backup
// NOTE: we don't use `VACUUM INTO` anymore because it is extremely memory intensive
// const results = db.exec(`VACUUM INTO '${tmp}'`);
// logger.debug('results', { results });
await fs.promises.copyFile(storagePath, tmp);
const results = db.exec(`VACUUM INTO '${tmp}'`);

logger.debug('results', { results });
backup = true;

// open the backup to ensure that encryption still valid
@@ -2061,24 +2049,12 @@ async function parsePayload(data, ws) {
if (err.name !== 'NotFound') throw err;
}

// gzip the backup
// await S3.send(
// new PutObjectCommand({
// ACL: 'private',
// Body: fs.createReadStream(tmp).pipe(createGzip()),
// Bucket: bucket,
// Key: key,
// Metadata: {
// hash
// }
// })
// );
const upload = new Upload({
client: S3,
params: {
Bucket: bucket,
Key: key,
Body: fs.createReadStream(tmp).pipe(createGzip()),
Body: fs.createReadStream(tmp),
Metadata: { hash }
}
});
2 changes: 2 additions & 0 deletions helpers/setup-pragma.js
@@ -62,6 +62,8 @@ async function setupPragma(db, session, cipher = 'chacha20') {
// <https://www.sqlite.org/pragma.html#pragma_secure_delete>
db.pragma('secure_delete=ON');

//
// NOTE: we still run a manual vacuum every 24 hours
//
// turn on auto vacuum (for large amounts of deleted content)
// <https://www.sqlite.org/pragma.html#pragma_auto_vacuum>
22 changes: 9 additions & 13 deletions jobs/cleanup-sqlite.js
@@ -220,19 +220,15 @@ const mountDir = config.env === 'production' ? '/mnt' : tmpdir;
throw err;
}

//
// NOTE: we no longer run VACUUM because we already have
// autovacuum enabled and this is incredibly
// memory intensive on larger mailboxes
//
// await wsp.request(
// {
// action: 'vacuum',
// timeout: ms('5m'),
// session: { user }
// },
// 0
// );
// eslint-disable-next-line no-await-in-loop
await wsp.request(
{
action: 'vacuum',
timeout: ms('5m'),
session: { user }
},
0
);
} catch (err) {
logger.error(err);
}
5 changes: 4 additions & 1 deletion locales/ar.json
@@ -7305,5 +7305,8 @@
"command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.": "الأمر كل ساعة أثناء معالجة أوامر IMAP، مما يعزز كلمة المرور المشفرة من اتصال IMAP في الذاكرة.",
"However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.": "ومع ذلك، كان هذا الأسلوب يستهلك الكثير من الذاكرة، وكان يتسبب في حدوث أخطاء في نفاد الذاكرة أثناء إنتاج صناديق البريد الكبيرة.",
"Additionally": "بالإضافة إلى ذلك",
"was better than": "كان أفضل من"
"was better than": "كان أفضل من",
"Backups of your encrypted mailboxes": "النسخ الاحتياطية لصناديق البريد المشفرة الخاصة بك",
"Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from": "يتم عمل نسخ احتياطية لصناديق البريد المشفرة الخاصة بك يوميًا. يمكنك أيضًا طلب نسخة احتياطية جديدة على الفور أو تنزيل أحدث نسخة احتياطية في أي وقت من",
"command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the": "الأمر كل يوم أثناء معالجة أوامر IMAP، مما يعزز كلمة المرور المشفرة من اتصال IMAP في الذاكرة. يتم تخزين النسخ الاحتياطية في حالة عدم اكتشاف نسخة احتياطية موجودة أو في حالة ظهور خطأ"
}
5 changes: 4 additions & 1 deletion locales/cs.json
@@ -7305,5 +7305,8 @@
"command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.": "příkaz každou hodinu během zpracování příkazů IMAP, který využívá vaše šifrované heslo z připojení IMAP v paměti.",
"However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.": "Tento přístup byl však příliš náročný na paměť a způsoboval chyby způsobené nedostatkem paměti při výrobě velkých poštovních schránek.",
"Additionally": "dodatečně",
"was better than": "bylo lepší než"
"was better than": "bylo lepší než",
"Backups of your encrypted mailboxes": "Zálohy vašich šifrovaných poštovních schránek",
"Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from": "Zálohy vašich šifrovaných poštovních schránek se provádějí denně. Můžete také okamžitě požádat o novou zálohu nebo si kdykoli stáhnout nejnovější zálohu",
"command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the": "příkaz každý den během zpracování příkazů IMAP, které využívá vaše zašifrované heslo z připojení IMAP v paměti. Zálohy se ukládají, pokud není detekována žádná existující záloha nebo pokud"
}
5 changes: 4 additions & 1 deletion locales/da.json
@@ -7043,5 +7043,8 @@
"command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.": "kommando hver time under IMAP-kommandobehandling, som udnytter din krypterede adgangskode fra en IMAP-forbindelse i hukommelsen.",
"However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.": "Denne tilgang var imidlertid for hukommelsesintensiv og forårsagede fejl i hukommelsen i produktionen til store postkasser.",
"Additionally": "Derudover",
"was better than": "var bedre end"
"was better than": "var bedre end",
"Backups of your encrypted mailboxes": "Sikkerhedskopier af dine krypterede postkasser",
"Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from": "Sikkerhedskopier af dine krypterede postkasser laves dagligt. Du kan også øjeblikkeligt anmode om en ny backup eller downloade den seneste backup når som helst fra",
"command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the": "kommando hver dag under IMAP-kommandobehandling, som udnytter din krypterede adgangskode fra en IMAP-forbindelse i hukommelsen. Sikkerhedskopier gemmes, hvis der ikke findes nogen eksisterende sikkerhedskopi, eller hvis"
}
5 changes: 4 additions & 1 deletion locales/de.json
@@ -6343,5 +6343,8 @@
"command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.": "Befehl jede Stunde während der IMAP-Befehlsverarbeitung, wobei Ihr verschlüsseltes Passwort von einer speicherinternen IMAP-Verbindung genutzt wird.",
"However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.": "Dieser Ansatz war jedoch zu speicherintensiv und führte bei der Produktion großer Postfächer zu Speichermangelfehlern.",
"Additionally": "Zusätzlich",
"was better than": "war besser als"
"was better than": "war besser als",
"Backups of your encrypted mailboxes": "Backups Ihrer verschlüsselten Postfächer",
"Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from": "Es werden täglich Backups Ihrer verschlüsselten Postfächer erstellt. Sie können auch sofort ein neues Backup anfordern oder jederzeit das neueste Backup herunterladen",
"command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the": "Befehl jeden Tag während der IMAP-Befehlsverarbeitung, die Ihr verschlüsseltes Passwort aus einer speicherinternen IMAP-Verbindung nutzt. Backups werden gespeichert, wenn kein vorhandenes Backup erkannt wird oder wenn die"
}
5 changes: 4 additions & 1 deletion locales/en.json
@@ -7076,5 +7076,8 @@
"command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.": "command every hour during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection.",
"However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.": "However this approach was too memory intensive, and was causing out of memory errors in production for large mailboxes.",
"Additionally": "Additionally",
"was better than": "was better than"
"was better than": "was better than",
"Backups of your encrypted mailboxes": "Backups of your encrypted mailboxes",
"Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from": "Backups of your encrypted mailboxes are made daily. You can also instantly request a new backup or download the latest backup at anytime from",
"command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the": "command every day during IMAP command processing, which leverages your encrypted password from an in-memory IMAP connection. Backups are stored if no existing backup is detected or if the"
}