
Options for importing initial database #98

Open
gbradley opened this issue Apr 5, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@gbradley

gbradley commented Apr 5, 2024

Hopefully this isn't too niche a use-case...

Harbor has automated 95% of the work needed for us to quickly send preview sites to clients. The one step that I still have to do (and can't easily delegate) is importing initial data after the site has been provisioned.

As it currently stands, one option would be to write something custom inside FORGE_DEPLOY_SCRIPT; this would need to check whether a specific DB table is empty and, if so, import the DB either by seeding or by importing from a SQL file. Either way it would likely mean committing a command into the Laravel app that's only ever called inside Harbor environments.
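
Roughly, that workaround might look like this in the workflow (an untested sketch; the artisan command name is made up, and it leans on the FORGE_* variables Harbor already passes to the deploy script):

      - name: Provision
        env:
          FORGE_DEPLOY_SCRIPT: |
            cd $FORGE_SITE_PATH
            git pull origin $FORGE_SITE_BRANCH
            $FORGE_PHP artisan migrate --force

            # Hypothetical command committed to the app that prints the row
            # count of a marker table, so we only import into an empty DB.
            if [ "$($FORGE_PHP artisan app:initial-data-count)" = "0" ]; then
              $FORGE_PHP artisan db:seed --force
              # ...or pipe a SQL file into mysql here instead
            fi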

While it would work in theory, a nicer alternative might be to expose some additional environment variables, for example:

FORGE_DB_SEED

This would be a boolean which, if true, would run php artisan db:seed after the site is deployed for the first time. Useful when you want to fill the preview site with test data that's already committed in your repo.

FORGE_DB_SQL

This would be a path to a SQL file on the server; if present, it would be imported after the database was created. Useful when the preview site needs more realistic data that isn't committed in the repo (due to size, data security, etc.).
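
For illustration, I'd imagine setting them alongside Harbor's existing variables in the workflow, along these lines (values are just examples):

      - name: Provision
        env:
          # Seed from the repo's seeders after the first deployment...
          FORGE_DB_SEED: true
          # ...or import a SQL file that already exists on the server:
          # FORGE_DB_SQL: /home/forge/initial.sql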

In terms of implementation I suppose both of these would need to build and then execute a custom site command, before polling for the outcome.


Is this something others have come across, or are there alternative ideas? If it seems like something useful, I'm happy to look into it.

@mehrancodes
Owner

This looks really handy for DB seeding. I also got a similar request here recently, which I've noted down below, but I'm not sure how tough it'll be or whether it's even doable right now.

In the real world there can be some difficulties with deployment. For example, in my project there is no way to seed the testing database: a GitHub Action creates a weekly artifact containing a truncated, anonymized version of the database. Currently I have a script in the project that connects to the GitHub API, downloads the artifact and imports it into the database. That means setting a GitHub API key directly in the project, but I think that job should be delegated to the workflow. Unfortunately I don't know whether the Forge API lets you upload a file for use in the deployment script. In summary (a workflow sketch for the download step follows this list):

  • Harbor should be able to download a database from another GitHub Action run using artifacts
  • Harbor should upload the database file to the server
  • The deployment script should use the database file and import it.
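
For the first point, the artifact download could probably live in the Harbor workflow itself via actions/download-artifact, which (since v4) can pull artifacts from a different workflow run. An untested sketch, with placeholder names and a run id that would have to be looked up first:

      - name: Download anonymized database artifact
        uses: actions/download-artifact@v4
        with:
          name: anonymized-db                      # placeholder artifact name
          path: db
          github-token: ${{ secrets.GITHUB_TOKEN }}
          repository: ${{ github.repository }}
          run-id: ${{ env.EXPORT_RUN_ID }}         # id of the weekly export run, resolved beforehand

Getting the file from the runner onto the Forge server is the part that would still need Harbor's (or the Forge API's) help, as noted above.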

@mehrancodes mehrancodes added the enhancement New feature or request label Apr 5, 2024
@neilgilmour

This would be really useful for me too. Currently I have an SQL file stored on the server and an artisan command that restores it, so for each preview site that is set up, the deploy script runs the artisan command.

@gbradley
Author

@neilgilmour Does your command import the DB only on the first run, or does it restore on every deployment? I can see the use-case for both, so I'll think about how that might be configured.

@neilgilmour

neilgilmour commented Apr 12, 2024

Good question @gbradley. I haven't quite worked out whether to run harbor only when a pull_request is opened and rely on auto-deploy, or to run harbor each time the head_ref is updated. For us that happens frequently as we work on the feature being PR'd.

For now I'm not using Forge's auto-deploy, and harbor runs if: contains(fromJson('["opened", "reopened", "edited", "synchronize", "ready_for_review"]'), github.event.action). And normally I only want the db to be restored once.
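
In workflow terms that guard sits at the job (or step) level, roughly like this (the job name and step are just placeholders):

  provision-preview:
    if: contains(fromJson('["opened", "reopened", "edited", "synchronize", "ready_for_review"]'), github.event.action)
    runs-on: ubuntu-latest
    steps:
      - name: Run Harbor
        run: echo "Harbor provisioning steps go here"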

Digging through the harbor source I saw $this->markSiteAsNewlyMade() which is set during createSite(). I'm doing something vaguely similar I think - my db command is as follows;

    use Illuminate\Support\Facades\App;
    use Illuminate\Support\Facades\Config;
    use Illuminate\Support\Facades\Process;
    use Illuminate\Support\Facades\Storage;

    public function handle()
    {
        if (!App::environment(['local', 'preview', 'staging'])) {
            return 1;
        }

        if (
            /**
             * This checks if the command has already run. If it has
             * we skip it. This is useful because we don't want
             * to re-install the db every time we push code.
             **/
            $this->hasAlreadyRun() &&

            /**
             * This allows me to force the command to re-install
             * if I set an env var. I'm doing this in GitHub
             * actions by checking the commit message for
             * "['re-install db']".  I'm sure there's
             * a more elegant way to do this but...
             * ¯\_(ツ)_/¯
             **/
            !Config::boolean('app.reinstall_db_anonymous')
        ) {
            $this->info('Already run, skipping');

            return 0;
        }

        $this->info('** Seeding Dummy data **');
        $file = '/home/forge/staging.sql.gz';
        $password = config('database.connections.mysql.password');
        $credentials = config('database.connections.mysql');

        $command = collect([
            "gunzip < {$file}",
            '|',
            'mysql',
            '-u',
            $credentials['username'],
            !empty($password) ? "-p{$password}" : '',
            '-P',
            $credentials['port'],
            isset($credentials['host']) ? '-h ' . $credentials['host'] : '',
            $credentials['database'],
        ])
            ->filter()
            ->implode(' ');
        $import = Process::timeout(300)->run($command);

        $this->info($import->output());
        $this->error($import->errorOutput());

        if ($import->successful()) {
            $this->info('Success');

            // Here we create a file to indicate that the
            // command has already run for this site.
            Storage::put('db-anonymous-installed', '');

            return 0;
        }

        if ($import->failed()) {
            $this->error('Restore failed');
        }

        return 1;
    }

    private function hasAlreadyRun(): bool
    {
        return Storage::exists('db-anonymous-installed');
    }

@gbradley
Author

Given the above, I'm now thinking along the lines of these variables and behaviour (a sketch of the workflow usage follows the lists below):

FORGE_DB_IMPORT_ON_DEPLOYMENT

This variable would control when the import is run:

  • default is false; import would only run on site provision
  • if true, import would additionally run on deployment

FORGE_DB_IMPORT

This variable would control how the import is performed:

  • default is boolean false, i.e. no import
  • if boolean true, Harbor will seed the database via artisan
  • if string, would point to a SQL, .zip or .gz file on the server, and would import from that file
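
As a rough sketch of the workflow side (nothing here exists yet; values are just examples):

      - name: Provision
        env:
          # Re-run the import on every deployment rather than only on provision:
          FORGE_DB_IMPORT_ON_DEPLOYMENT: true
          # Seed via artisan...
          FORGE_DB_IMPORT: true
          # ...or point at a SQL / .zip / .gz file already on the server:
          # FORGE_DB_IMPORT: /home/forge/staging.sql.gz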

I'd be interested in how @neilgilmour controls the env file based on the commit message, as that could be used to dynamically set FORGE_DB_IMPORT_ON_DEPLOYMENT (I tried that on GitHub Actions without success).

As for downloading a DB file via GitHub Action artifacts and uploading it to Forge, I'm not sure. That sounds like a use-case for some kind of custom hooks system where users could tell Harbor to run commands at certain points in the pipeline - something like #100 .

@mehrancodes
Owner

Having these variables would cover both DB seeding and DB importing perfectly.

I think @neilgilmour might use something like if: contains(github.event.head_commit.message, '[re-install db]'), but I haven't tested it yet. I like the idea of being able to refresh the database whenever we need to :)

@neilgilmour

Unfortunately github.event.head_commit.message doesn't show up for pull requests, so I had to do something a bit more convoluted;

We have to check out the code, and then we have a few options for getting the commit message. I opted to check out the head_ref rather than the whole repo. I only use this on pull requests, so in my case this is safe;

jobs:
  get-commit-message:
    runs-on: ubuntu-latest
    outputs:
      message: ${{ steps.get-msg.outputs.message }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Echo commit msg
        id: get-msg
        run: |
          msg=$(git log -1 --pretty=format:%s)
          echo "message=$msg" >> "$GITHUB_OUTPUT"

You can see I store it in the job's outputs so I can use it later on. You can then use it in the if: section of a job/step like this;

  check-message:
    runs-on: ubuntu-latest
    needs: get-commit-message
    steps:
      - name: Run when string exists
        if: ${{ contains(needs.get-commit-message.outputs.message, '[test]') }}
        run: echo 'found'
      - name: Run when string is absent
        if: ${{ ! contains(needs.get-commit-message.outputs.message, '[test]') }}
        run: echo 'not found'

But what I'm quite enjoying is using it inside the deploy script. We can either do the logic inside the deploy script itself, or use the fact that the workflow parses the ${{ }} blocks, so what ends up being used as the deploy script on the server is the result of those expressions. That means we can use GitHub's ternary syntax ${{ expression && 'echo "true"' || 'echo "false"' }} like this;

      - name: Provision
        env:
          FORGE_DEPLOY_SCRIPT: |
            cd $FORGE_SITE_PATH
            git pull origin $FORGE_SITE_BRANCH

            # more deploy stuff here, and then...
            # we can run an artisan command based on the PR opened, synchronize etc
            ${{ github.event.action == 'opened' && '$FORGE_PHP artisan db:install' || 'echo "Not opened, so not installing db"' }}

            # we could check the commit message for tags...
            ${{ contains(needs.get-commit-message.outputs.message, '[maintenance]') && '$FORGE_PHP artisan down' }}
            ${{ contains(needs.get-commit-message.outputs.message, '[refresh]') && '$FORGE_PHP artisan migrate:fresh --seed --force' }}

You get the idea!

@gbradley
Author

Thanks @neilgilmour, I did think it wouldn't be straightforward!

I have made some progress on this feature, will try to do some testing over the next few days.
