Local development

Instructions for getting a local version of DIALS for development purposes. By following these instructions, you will have a fully functional version of DIALS (including the data finding, data downloading and ingesting, frontend rendering, and API) that runs locally on your computer, useful for debugging an issue or developing new features.

Note: make sure you make any modifications to this repository in your own fork, and then open a pull request to merge them into the production repository; do not make changes directly in the production repository.

Get a local version of the DIALS repository

First, make your own fork of the DIALS repository on GitHub. Then, make a local clone on your computer using the usual git clone, for example:

git clone https://github.com/<YOUR GITHUB USERNAME>/dials.git

Get some data used for testing

The first step is to get some DQMIO data that will be used by your local instance of DIALS. There are essentially two methods: the first is mounting the production CMS-Store path (CERN's T2 disk storage) so it can be read by your local instance; the second is copying a small number of DQMIO files to your local machine. Whichever method you choose, you will need a grid certificate for querying DBS (the central CMS file database, basically the backend behind DAS). Even if you use locally copied files, the grid certificate is still required because the dataset metadata will be queried anyway. Check here how to generate a certificate. You can put the resulting usercert.pem and userkey.pem in a location of your choice and provide the path as an environment variable (see instructions further below).
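
For reference, a typical way to produce usercert.pem and userkey.pem from a .p12 certificate bundle downloaded from the CERN CA looks like the sketch below; the bundle name and target location are placeholders, and the linked instructions remain the authoritative procedure.

# myCertificate.p12 is a placeholder for the bundle downloaded from the CERN CA
openssl pkcs12 -in myCertificate.p12 -clcerts -nokeys -out usercert.pem
openssl pkcs12 -in myCertificate.p12 -nocerts -nodes -out userkey.pem
chmod 644 usercert.pem
chmod 400 userkey.pem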

DIALS will execute an indexing pipeline, querying from DBS all available datasets and all available files within each dataset. The dataset index just contains the names and some metadata of the available datasets, so querying it is not a problem. However, the file index is used to trigger file ingestion jobs, implying that your local DIALS instance will attempt to download and/or ingest a huge number of DQMIO files. To avoid running out of space, you can provide a dummy DBS response to the indexing pipeline. This dummy response contains just a few files, which should be enough for testing and debugging. An example can be found in etl/mocks/dbs.json. To activate it, you have to provide the path to this file as an environment variable (see instructions further below).

The file ingestion jobs will try to load the DQMIO files specified in the DBS response (production or mocked); if you do not have EOS mounted locally or sample files copied, this pipeline will fail with FILE_NOT_AVAILABLE.

Accessing DQMIO data from EOS by mounting it locally

The first way of accessing the data is by mounting the appropriate EOS directory locally.

The following command will mount the production data directory from EOS in read-only mode:

mkdir -p ./etl/mocks/DQMIO_for_DIALS_local_dev
sshfs -o default_permissions,ro <YOUR-LXPLUS-USER>@lxplus.cern.ch:/eos/cms ./etl/mocks/DQMIO_for_DIALS_local_dev

In case you need to unmount (turning off the computer or losing the connection to lxplus will unmount automatically), you can run the following command:

umount ./etl/mocks/DQMIO_for_DIALS_local_dev

(Note the use of umount rather than unmount.)

Note: this approach can give issues if you use Docker for running DIALS (see below), since the mount does not seem to be visible inside the Docker container. Therefore, the second approach, discussed below, is recommended.

Accessing DQMIO data by making a local copy

Instead of mounting the production DQMIO data, you can set up a directory that behaves exactly like production. To use this approach, simply copy the contents of the folder /eos/project-m/mlplayground/public/DQMIO_for_DIALS_local_dev into a new ./etl/mocks/DQMIO_for_DIALS_local_dev folder, e.g. as follows:

scp -r <YOUR-LXPLUS-USER>@lxplus.cern.ch:/eos/project-m/mlplayground/public/DQMIO_for_DIALS_local_dev ./etl/mocks

Note: you should make sure that your dummy DBS response file is in sync with the files you actually copy to the local directory. They are in sync at the time of writing, but you might need to make modifications if you are using files other than the ones already set in the example.
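
A quick cross-check, assuming the dummy DBS response references the DQMIO files by their .root file names, is to compare the two sides:

# list the locally copied DQMIO files
find ./etl/mocks/DQMIO_for_DIALS_local_dev -name '*.root'
# list the .root file names mentioned in the dummy DBS response
grep -o '[^"]*\.root' ./etl/mocks/dbs.json | xargs -n1 basename | sort -u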

Building and running a local DIALS instance using a Docker container

There are two broad approaches for setting up the environment and running a local version of DIALS: the first involves setting up the environment yourself, which can be a bit of a mess but is easier to work with once it is set up correctly; the second uses a Docker container instead, which is easier to set up but a little trickier to interact with while developing. The latter approach is detailed here, the former in the next section.

Installing Docker

The advantage of using Docker is that you don't need many packages or other dependencies. You will, however, need the Python packages pyyaml and python-decouple, which you can install using pip install pyyaml python-decouple.

For installing Docker, follow the steps here: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository. Use the instructions under ‘Install using the apt repository’. If you already have docker installed, you can skip this step, or follow the instructions for upgrading instead of installing.

Then, follow some additional steps here to avoid typing sudo every time: https://docs.docker.com/engine/install/linux-postinstall/. It seems necessary to reboot the computer for these changes to take effect.
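
For convenience, these are the relevant commands from the linked post-installation guide (the guide itself is authoritative):

sudo groupadd docker
sudo usermod -aG docker $USER
# log out and back in (or reboot) so the group membership is re-evaluated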

Setup the backend environment variables

Create a .env file inside the backend folder following the template file for native environment here or for docker environment here, and update the placeholders.

  • Note: you need to fill in the application secrets; request them from the application maintainers.

  • Note: optionally, you can also modify DJANGO_WORKSPACES, for example if you are only interested in a single workspace for your purposes.
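
As a rough illustration only, a minimal backend .env for a native setup could contain the entries below; all of these variables appear elsewhere in this guide, but the linked template is authoritative and contains additional entries (including the secrets) that you still need to fill in.

DJANGO_DATABASE_URI=postgres://postgres:postgres@localhost:5432
DJANGO_REDIS_URL=redis://localhost:6379/3
DJANGO_WORKSPACES={"tracker": "cms-dqm-runregistry-offline-tracker-certifiers"}
DJANGO_DEFAULT_WORKSPACE=tracker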

Setup the ETL environment variables

Create a .env file inside the etl folder following the template file for native environment here or for docker environment here, and update the placeholders.

  • Note: MOCKED_DBS_FPATH is optional; if you do not set it, the application will try to ingest all available files in DBS.
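
Similarly, purely as an illustration, a minimal etl .env for a native setup could contain the entries below (all of them appear elsewhere in this guide; the grid certificate paths and the remaining entries must be taken from the linked template):

CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
CELERY_REDBEAT_URL=redis://localhost:6379/2
DATABASE_URI=postgresql://postgres:postgres@localhost:5432
DATABASES=tracker
MOCKED_DBS_FPATH=/path/to/your/clone/etl/mocks/dbs.json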

Building and launching the Docker container

The etl, backend and frontend each ship a Dockerfile that can be used for local development. Furthermore, the DIALS repository ships the script gencompose-self-contained.py to automatically generate a docker-compose file based on the environment variables (e.g. related to how many workspaces you want to use for development). You can optionally specify a path where your local DIALS instance will store its database. This is useful to avoid re-downloading and/or re-ingesting the files every time you launch your DIALS instance; instead, it will read the database stored there during a previous session.

./scripts/gencompose-self-contained.py --pg-persistent-path /mnt/dials-pg-data

From the repository's root directory, you can start all services by first building the images, then initializing the database, and then starting everything:

docker compose build
docker compose up dials-init
docker compose up

Note: in some cases, Permission denied errors related to the userkey.pem file might show up when starting the indexing pipeline (see instructions below), even though the userkey.pem file is correctly set and publicly readable. If this occurs, check your user ID and group ID with echo $(id -u) and echo $(id -g) respectively. If they are not equal to the standard value (1000), replace the docker compose build above with the modified command below:

docker compose build --build-arg UID=$(id -u) --build-arg GID=$(id -g)

After running docker compose up, you should see a whole bunch of messages in the terminal. Once you start seeing messages ending in Events of group {task} enabled by remote., the launch is complete and DIALS is up and running! You can additionally check that DIALS is running correctly by running docker ps in a separate terminal. If everything went well, you should see something like this:

CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS                   PORTS                                       NAMES
81765d44592e   dials_frontend   "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes             0.0.0.0:3000->3000/tcp, :::3000->3000/tcp   dials-frontend
fe3bfd7c084c   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-csc-bulk
fec0501e6d1c   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes             0.0.0.0:5555->5555/tcp, :::5555->5555/tcp   dials-flower
15d46df20bae   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-csc-priority
684bf5b1c590   dials_backend    "python manage.py ru…"   3 minutes ago   Up 3 minutes             0.0.0.0:8000->8000/tcp, :::8000->8000/tcp   dials-backend
75129249fca6   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-jetmet-bulk
376651d7e7aa   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-common-indexer
5457add93b70   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-egamma-priority
84d7579e8e55   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-ecal-priority
a5c5c169ad25   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-beat-scheduler
1ddefde9b55b   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 2 minutes                                                         dials-tracker-priority
a9a531ec4ba9   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-hcal-bulk
991418bbb585   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-ecal-bulk
dc2ceb05851a   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-private-bulk
3559615921e2   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-hcal-priority
dbeaf5400e28   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-egamma-bulk
69cf337c5220   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-jetmet-priority
dc7a3eb6f4ac   dials_etl        "bash -c 'celery --a…"   3 minutes ago   Up 3 minutes                                                         dials-tracker-bulk
67b69b897c86   postgres         "docker-entrypoint.s…"   22 hours ago    Up 3 minutes (healthy)   0.0.0.0:5432->5432/tcp, :::5432->5432/tcp   postgresql-local
d429b32d6206   redis            "docker-entrypoint.s…"   22 hours ago    Up 3 minutes (healthy)   0.0.0.0:6379->6379/tcp, :::6379->6379/tcp   redis-local

Interacting with the Docker container

Stopping the container

To kill all processes, press Ctrl+C in the terminal running docker compose up.

Starting the data extraction process

In principle, the indexing, downloading and ingestion procedure is automatically launched at the start of every hour. However, for testing purposes, one can force the indexing by running the trigger-indexing script from inside the docker container with:

docker exec -it dials-flower bash -c 'python3 cli.py indexing -s'

Monitoring the queues

Tasks can be monitored through Flower. Open a web browser and enter localhost:5555 in the address bar. The webpage will ask for a login; the default username and password for local development are both admin.

Interacting with your local DIALS web interface

Open a web browser and enter localhost:3000 in the address bar.

Interacting with your local DIALS API

Open a web browser and enter localhost:8000 in the address bar.

Cleaning PG database

docker compose up dials-purge

Cleaning Redis database

If you stop the containers before all tasks in the Celery queues have finished, you may need to clear Redis before restarting the ETL from scratch. This can be done quickly by flushing the database using redis-cli inside the container:

docker exec -it redis-local bash
redis-cli
flushall
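
The same flush can also be done non-interactively in a single command:

docker exec -it redis-local redis-cli flushall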

Removing all generated containers

docker compose down
docker compose --profile=donotstart down

Removing all generated images

docker images dials\* -q | xargs docker rmi

Building and running a local DIALS instance natively (i.e. without Docker)

Setting up a local environment

The etl and backend use Python ^3.10.13; third-party dependencies are managed by poetry ^1.7.1, and note that the etl has a hard dependency on ROOT ^6.30/02. After installing these dependencies you can run poetry install --no-root to install all the etl and backend dependencies specified in pyproject.toml. Then you should configure pre-commit by running poetry run pre-commit install; this will ensure code standardization.

The frontend uses Node.js ^20.11.0, and third-party dependencies are managed by yarn, which can be installed using npm install -g yarn. Then you can run yarn install to install the frontend dependencies specified in package.json. Note that the frontend will not work if the code does not comply with the eslint configuration; to fix any style problems you can run yarn run lint.
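
Putting the setup steps above together as shell commands (a sketch only; it assumes the etl and backend directories each carry their own pyproject.toml, so adjust if the project layout differs):

# Python side: etl and backend dependencies plus the pre-commit hooks
cd etl && poetry install --no-root && poetry run pre-commit install && cd ..
cd backend && poetry install --no-root && cd ..
# Frontend side
npm install -g yarn
cd frontend && yarn install && cd ..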

Note: if you will be using Docker, the above setup steps are not needed. On the other hand, you will need the pyyaml package to generate the docker compose file.

Running PostgreSQL

Since the main application only communicates with the database through the PostgreSQL DBMS (i.e. it does not touch the database files directly), running the DBMS decoupled from the main application is less stressful and faithfully simulates the production environment. It is much easier to run Postgres using Docker, and with the -v flag we can bind-mount the data stored inside the container onto the host in order to have a persistent database across development sessions. You can find more information about the postgres container here.

docker run -d \
    --name postgresql_local \
    --restart always \
    -e POSTGRES_USER=postgres \
    -e POSTGRES_PASSWORD=postgres \
    -v /mnt/postgresql_local_docker_data:/var/lib/postgresql/data \
    -p 5432:5432 \
    postgres

Running Redis

The same arguments used for running PostgreSQL locally also hold for running Redis (our in-memory database acting as message broker for our job queues). Differently from PostgreSQL, in development there is no real need for a persistent store, so we will launch the container ephemerally. You can find more information about the redis container here.

docker run -d \
    --restart always \
    --name redis_local \
    -p 6379:6379 \
    redis

Setup the environment variables

Since we are running postgres and redis through Docker, outside the Docker network in which the application will execute, we just need to update some environment variables.

Backend

Refer to the backend environment variables in docker section and update the following variables:

DJANGO_REDIS_URL=redis://localhost:6379/3
DJANGO_DATABASE_URI=postgres://postgres:postgres@localhost:5432

ETL

Refer to the etl environment variables in docker section and update the following variables:

CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/1
CELERY_REDBEAT_URL=redis://localhost:6379/2
DATABASE_URI=postgresql://postgres:postgres@localhost:5432

Running the ETL

From within the repository's root directory or the etl directory you can use the start-dev.sh script, or the poe task poe start-etl, to start the entire ETL stack in one command.

Note that before starting the ETL natively you need to set up the database; to do this from within etl you can run alembic upgrade head. If you need to clean the database you can run alembic downgrade -1.

Note: if running the commands separately, you should execute them inside the etl directory.
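
Put together, a typical native ETL start could look like the sketch below (the location of start-dev.sh is assumed here to be the etl directory; adjust the path if it lives elsewhere):

cd etl
alembic upgrade head   # create/upgrade the database schema
./start-dev.sh         # or: poe start-etl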

Running the Backend

From within the repository's root directory or the backend directory you can use the start-dev.sh script, or the poe task poe start-api, to start the entire backend stack in one command.

Note: if running the commands separately, you should execute them inside the backend directory.
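
Analogously, a sketch for the backend (again assuming start-dev.sh sits in the backend directory):

cd backend
./start-dev.sh   # or: poe start-api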

Running the Frontend

Inside the frontend directory you can use the script yarn run start to start the react-scripts development server.

Choose how many workspaces to execute

The ETL part of this application is memory hungry: each Celery queue consumes at least 200 MiB of RAM at idle. By default we use one queue for indexing and one queue for the beat scheduler; each workspace needs 2 queues (bulk and priority), and for the N unique primary datasets that all workspaces depend on we have 2*N downloading queues (bulk and priority).

It is possible to run 5 workspaces with 9 unique primary datasets locally if you have at least 8 GiB of RAM free (keep in mind that memory consumption can be greater while ingesting the data, because the application reads some DQMIO ROOT files). If you don't have this much available, you can test the application with fewer workspaces and primary datasets.
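
As a rough worked example under that counting, 5 workspaces and 9 unique primary datasets give 1 + 1 + 2*5 + 2*9 = 30 queues, i.e. roughly 30 × 200 MiB ≈ 6 GiB of idle memory, which is consistent with the 8 GiB recommendation above.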

Let's say you want to test only the tracker workspace with only the ZeroBias dataset; you'll need to modify the following environment variables:

Backend

DJANGO_WORKSPACES={"tracker": "cms-dqm-runregistry-offline-tracker-certifiers"}
DJANGO_DEFAULT_WORKSPACE=tracker

ETL

DATABASES=tracker

And you'll also need to update the etl.config.json file:

{
  ...,
  "workspaces": [
    {
      "name": "tracker",
      "primary_datasets": [
        {
          "dbs_pattern": "/ZeroBias/*Run202*/DQMIO",
          "dbs_instance": "global"
        }
      ],
      "me_startswith": [
        "PixelPhase1/",
        "SiStrip/",
        "Tracking/TrackParameters/highPurityTracks/pt_1/GeneralProperties/TrackEtaPhi_ImpactPoint_GenTk"
      ],
      "bulk_ingesting_queue": "tracker-bulk",
      "priority_ingesting_queue": "tracker-priority"
    }
  ]
  ...
}

If you are very limited in RAM you can also decrease the ingestion chunk size in the same file (beware that the ingestion will be slower when you decrease the chunk size):

{
  ...,
  "common_chunk_size": 1000,
  "th2_chunk_size": 200,
  ...
}

CERN's Keycloak QA environment

The QA server is a very useful test environment for CERN's authentication service, but you can't reach it if you are outside CERN's network. It is therefore important to always tunnel your connection through lxtunnel, since the QA authentication server is only accessible from within CERN. For doing that you can use sshuttle, a “poor man’s VPN” solution which works on macOS and Linux. It uses SSH tunnelling to transparently redirect certain parts of your traffic to the internal network.

This is the command I use (I save it in my zshrc file):

tunnel_to_cern () {
	sshuttle --dns -v -r lxtunnel.cern.ch 137.138.0.0/16 128.141.0.0/16 128.142.0.0/16 188.184.0.0/15 --python=python3
}

The lxtunnel alias resolves to the following ssh config:

Host lxtunnel
        HostName lxtunnel.cern.ch
        User <your-cern-username>
        GSSAPITrustDNS yes
        GSSAPIAuthentication yes
        GSSAPIDelegateCredentials yes

More information on tunneling to CERN can be found here and here.

Beware that if you are running with Docker, the Docker network will not go through the sshuttle tunnel, so you'll need to run all the containers in host network mode (this does not work on Mac). You can generate a specific docker compose file for this with the script gencompose-network-host.py. Note that you'll need to start the frontend with the QA script: yarn run start:qa.
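
Putting the pieces together, a sketch of the host-network workflow (assuming gencompose-network-host.py lives in the scripts directory next to gencompose-self-contained.py):

./scripts/gencompose-network-host.py
docker compose build
docker compose up dials-init
docker compose up
# in a separate terminal, start the frontend against the QA authentication
cd frontend && yarn run start:qa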