
Options for importing initial database #98

Open
gbradley opened this issue Apr 5, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@gbradley

gbradley commented Apr 5, 2024

Hopefully this isn't too niche a use-case...

Harbor has automated 95% of the work needed for us to quickly send preview sites to clients. The one step that I still have to do (and can't easily delegate) is importing initial data after the site has been provisioned.

As it currently stands, one option would be to write something custom inside FORGE_DEPLOY_SCRIPT; this would need to check whether a specific DB table is empty and, if so, import the DB either by seeding or by importing from a SQL file. Either way it would likely mean committing a command into the Laravel app that's only ever called inside Harbor environments.
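
Roughly, that workaround might look like this in the workflow (an untested sketch; the artisan command name is made up, and it leans on the FORGE_* variables Harbor already passes to the deploy script):

      - name: Provision
        env:
          FORGE_DEPLOY_SCRIPT: |
            cd $FORGE_SITE_PATH
            git pull origin $FORGE_SITE_BRANCH
            $FORGE_PHP artisan migrate --force

            # Hypothetical command committed to the app that prints the row
            # count of a marker table, so we only import into an empty DB.
            if [ "$($FORGE_PHP artisan app:initial-data-count)" = "0" ]; then
              $FORGE_PHP artisan db:seed --force
              # ...or pipe a SQL file into mysql here instead
            fi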

While it would work in theory, a nicer alternative might be to expose some additional environment variables, for example:

FORGE_DB_SEED

This would be a boolean which, if true, would run php artisan db:seed after the site is deployed for the first time. Useful when you want to fill the preview site with test data that's already committed in your repo.

FORGE_DB_SQL

This would be a path to a SQL file on the server; if present, it would be imported after the database was created. Useful when the preview site needs more realistic data that isn't committed in the repo (due to size, data security, etc.).
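
For illustration, I'd imagine setting them alongside Harbor's existing variables in the workflow, along these lines (values are just examples):

      - name: Provision
        env:
          # Seed from the repo's seeders after the first deployment...
          FORGE_DB_SEED: true
          # ...or import a SQL file that already exists on the server:
          # FORGE_DB_SQL: /home/forge/initial.sql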

In terms of implementation I suppose both of these would need to build and then execute a custom site command, before polling for the outcome.


Is this something others have come across, or are there alternative ideas? If it seems like something useful, I'm happy to look into it.

@mehrancodes
Owner

This looks really handy for DB seeding. I also got a similar request here recently, which I've noted down below, but I'm not sure how tough it'll be or whether it's even doable right now.

In the real world there can be some difficulties with deployment. For example, in my project there is no way to seed the testing database: a GitHub Action creates a weekly artifact containing a truncated, anonymized version of the database. Currently I have a script in the project that connects to the GitHub API, downloads the artifact and imports it into the database. That means setting a GitHub API key directly in the project, but I think that job should be delegated to the workflow. Unfortunately I don't know whether the Forge API lets you upload a file for use in the deployment script. In summary (a workflow sketch for the download step follows this list):

  • Harbor should be able to download a database from another GitHub Action run using artifacts
  • Harbor should upload the database file to the server
  • The deployment script should use the database file and import it.
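
For the first point, the artifact download could probably live in the Harbor workflow itself via actions/download-artifact, which (since v4) can pull artifacts from a different workflow run. An untested sketch, with placeholder names and a run id that would have to be looked up first:

      - name: Download anonymized database artifact
        uses: actions/download-artifact@v4
        with:
          name: anonymized-db                      # placeholder artifact name
          path: db
          github-token: ${{ secrets.GITHUB_TOKEN }}
          repository: ${{ github.repository }}
          run-id: ${{ env.EXPORT_RUN_ID }}         # id of the weekly export run, resolved beforehand

Getting the file from the runner onto the Forge server is the part that would still need Harbor's (or the Forge API's) help, as noted above.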

@mehrancodes mehrancodes added the enhancement New feature or request label Apr 5, 2024
@neilgilmour

This would be really useful for me too. Currently I have an SQL file stored on the server and an artisan command that restores it, so for each preview site that is set up, the deploy script runs the artisan command.

@gbradley
Author

@neilgilmour Does your command import the DB only on the first run, or does it restore on every deployment? I can see the use-case for both, so I'll think about how that might be configured.

@neilgilmour

neilgilmour commented Apr 12, 2024

Good question @gbradley. I haven't quite worked out whether to run harbor only when a pull_request is opened and rely on auto-deploy, or to run harbor each time the head_ref is updated. For us that happens frequently as we work on the feature being PR'd.

For now I'm not using Forge's auto-deploy, and harbor runs if: contains(fromJson('["opened", "reopened", "edited", "synchronize", "ready_for_review"]'), github.event.action). And normally I only want the db to be restored once.
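
In workflow terms that guard sits at the job (or step) level, roughly like this (the job name and step are just placeholders):

  provision-preview:
    if: contains(fromJson('["opened", "reopened", "edited", "synchronize", "ready_for_review"]'), github.event.action)
    runs-on: ubuntu-latest
    steps:
      - name: Run Harbor
        run: echo "Harbor provisioning steps go here"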

Digging through the harbor source I saw $this->markSiteAsNewlyMade() which is set during createSite(). I'm doing something vaguely similar I think - my db command is as follows;

    use Illuminate\Support\Facades\App;
    use Illuminate\Support\Facades\Config;
    use Illuminate\Support\Facades\Process;
    use Illuminate\Support\Facades\Storage;

    public function handle()
    {
        if (!App::environment(['local', 'preview', 'staging'])) {
            return 1;
        }

        if (
            /**
             * This checks if the command has already run. If it has
             * we skip it. This is useful because we don't want
             * to re-install the db every time we push code.
             **/
            $this->hasAlreadyRun() &&

            /**
             * This allows me to force the command to re-install
             * if I set an env var. I'm doing this in GitHub
             * actions by checking the commit message for
             * "['re-install db']".  I'm sure there's
             * a more elegant way to do this but...
             * ¯\_(ツ)_/¯
             **/
            !Config::boolean('app.reinstall_db_anonymous')
        ) {
            $this->info('Already run, skipping');

            return 0;
        }

        $this->info('** Seeding Dummy data **');
        $file = '/home/forge/staging.sql.gz';
        $password = config('database.connections.mysql.password');
        $credentials = config('database.connections.mysql');

        $command = collect([
            "gunzip < {$file}",
            '|',
            'mysql',
            '-u',
            $credentials['username'],
            !empty($password) ? "-p{$password}" : '',
            '-P',
            $credentials['port'],
            isset($credentials['host']) ? '-h ' . $credentials['host'] : '',
            $credentials['database'],
        ])
            ->filter()
            ->implode(' ');
        $import = Process::timeout(300)->run($command);

        $this->info($import->output());
        $this->error($import->errorOutput());

        if ($import->successful()) {
            $this->info('Success');

            // Here we create a file to indicate that the
            // command has already run for this site.
            Storage::put('db-anonymous-installed', '');

            return 0;
        }

        if ($import->failed()) {
            $this->error('Restore failed');
        }

        return 1;
    }

    private function hasAlreadyRun(): bool
    {
        return Storage::exists('db-anonymous-installed');
    }

@gbradley
Author

Given the above, I'm now thinking along the lines of these variables and behaviour (a sketch of the workflow usage follows the lists below):

FORGE_DB_IMPORT_ON_DEPLOYMENT

This variable would control when the import is run:

  • default is false; import would only run on site provision
  • if true, import would additionally run on deployment

FORGE_DB_IMPORT

This variable would control how the import is performed:

  • default is boolean false, i.e. no import
  • if boolean true, Harbor will seed the database via artisan
  • if string, would point to a SQL, .zip or .gz file on the server, and would import from that file
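
As a rough sketch of the workflow side (nothing here exists yet; values are just examples):

      - name: Provision
        env:
          # Re-run the import on every deployment rather than only on provision:
          FORGE_DB_IMPORT_ON_DEPLOYMENT: true
          # Seed via artisan...
          FORGE_DB_IMPORT: true
          # ...or point at a SQL / .zip / .gz file already on the server:
          # FORGE_DB_IMPORT: /home/forge/staging.sql.gz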

I'd be interested in how @neilgilmour controls the env file based on the commit message, as that could be used to dynamically set FORGE_DB_IMPORT_ON_DEPLOYMENT (I tried that on GitHub Actions without success).

As for downloading a DB file via GitHub Action artifacts and uploading it to Forge, I'm not sure. That sounds like a use-case for some kind of custom hooks system where users could tell Harbor to run commands at certain points in the pipeline - something like #100 .

@mehrancodes
Owner

Having these variables would cover both DB seeding and DB importing perfectly.

I think @neilgilmour might use something like if: contains(github.event.head_commit.message, '[re-install db]'), but I haven't tested it yet. I like the idea of being able to refresh the database whenever we need to :)

@neilgilmour

Unfortunately github.event.head_commit.message doesn't show up for pull requests, so I had to do something a bit more convoluted;

We have to check out the code, and then we have a few options for getting the commit message. I opted to check out the head_ref rather than the whole repo. I only use this on pull requests, so in my case this is safe;

jobs:
  get-commit-message:
    runs-on: ubuntu-latest
    outputs:
      message: ${{ steps.get-msg.outputs.message }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}
      - name: Echo commit msg
        id: get-msg
        run: |
          msg=$(git log -1 --pretty=format:%s)
          echo "message=$msg" >> "$GITHUB_OUTPUT"

You can see I store it in the job's outputs so I can use it later on. You can then use it in the if: section of a job/step like this;

  check-message:
    runs-on: ubuntu-latest
    needs: get-commit-message
    steps:
      - name: Run when string exists
        if: ${{ contains(needs.get-commit-message.outputs.message, '[test]') }}
        run: echo 'found'
      - name: Run when string is absent
        if: ${{ ! contains(needs.get-commit-message.outputs.message, '[test]') }}
        run: echo 'not found'

But what I'm quite enjoying is using it inside the deploy script. We can either do the logic inside the deploy script itself, or use the fact that the workflow parses the ${{ }} blocks, so what ends up being used as the deploy script on the server is the result of those expressions. That means we can use GitHub's ternary syntax ${{ expression && 'echo "true"' || 'echo "false"' }} like this;

      - name: Provision
        env:
          FORGE_DEPLOY_SCRIPT: |
            cd $FORGE_SITE_PATH
            git pull origin $FORGE_SITE_BRANCH

            # more deploy stuff here, and then...
            # we can run an artisan command based on the PR opened, synchronize etc
            ${{ github.event.action == 'opened' && '$FORGE_PHP artisan db:install' || 'echo "Not opened, so not installing db"' }}

            # we could check the commit message for tags...
            ${{ contains(needs.get-commit-message.outputs.message, '[maintenance]') && '$FORGE_PHP artisan down' }}
            ${{ contains(needs.get-commit-message.outputs.message, '[refresh]') && '$FORGE_PHP artisan migrate:fresh --seed --force' }}

You get the idea!

@gbradley
Author

Thanks @neilgilmour, I did think it wouldn't be straightforward!

I have made some progress on this feature, will try to do some testing over the next few days.
