
Bring your own tool

Maria Paola Ferri edited this page Oct 28, 2024 · 13 revisions

Virtual Research Environments (VREs) integrate tools and pipelines to support a research community. We offer the possibility to integrate your application into one of these analytical platforms. Please read through this documentation and contact us with any questions or suggestions you may have.

Why?

The open Virtual Research Environment (openVRE) offers a number of benefits for developers willing to integrate their tools into the platform:

  • An open-access, publicly available platform
  • A full web-based workbench with user support utilities, an OIDC authentication service, and data handlers for local and remote datasets or repositories.
  • Visibility for your tool and organization, with ownership recognition, tailored web forms, help pages and customized viewers.
  • The possibility to add extra value to your tool by complementing it with other related tools already in the platform.
  • Complete control of your tool through the administration panel, for monitoring, logging and managing it.

Requirements

The application or pipeline to be integrated should:

  • Be free and open-source software
  • Be containerized (tested with Docker and Singularity)
  • Run in non-interactive mode on a Linux-based operating system

How does it work?

Integrating your application as a new VRE tool involves a few steps. Once it is integrated, the VRE controls the whole tool execution cycle. It:

  1. Automatically builds the job-tuning form on the website with the parameter fields and input files of the tool.
  2. Validates input files and parameters (format and data type filtering, maximum/minimum values, etc.).
  3. Stages in the required input files into the tool working directory on the compute host (if required).
  4. Schedules the tool on the cloud/HPC backend in a scalable manner.
  5. Monitors and logs tool progress during the execution.
  6. Stages out output files from the run working directory (if required).
  7. Registers the output files resulting from the execution on the website.
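The VRE performs these steps on the server side. To make step 2 concrete, here is a minimal sketch of type and range checking for a single argument. The specification keys (`name`, `type`, `minimum`, `maximum`) and the function name are hypothetical. The real constraints come from the tool specification stored by the VRE, whose schema may differ.

```python
from typing import Any, Dict, List

# Hypothetical mapping from spec "type" values to accepted Python types.
_ACCEPTED_TYPES = {"integer": (int,), "number": (int, float), "string": (str,)}


def validate_argument(spec: Dict[str, Any], value: Any) -> List[str]:
    """Return a list of error messages for `value`; an empty list means valid.

    `spec` is a hypothetical argument description such as
    {"name": "min_lenght", "type": "integer", "minimum": 0, "maximum": 10000}.
    """
    name = spec.get("name", "<unnamed>")
    accepted = _ACCEPTED_TYPES.get(spec.get("type"), (object,))

    # bool is a subclass of int in Python, so reject it explicitly for numeric types.
    if not isinstance(value, accepted) or (isinstance(value, bool) and int in accepted):
        return [f"{name}: {value!r} is not of type {spec.get('type')}"]

    errors: List[str] = []
    # Range checks only make sense for numeric values.
    if isinstance(value, (int, float)):
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: {value} is below the minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: {value} is above the maximum {spec['maximum']}")
    return errors
```

For example, with the spec above, `validate_argument(spec, 50)` returns an empty list, while `-1`, `"50"` or `True` each produce one error message.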

How to bring in a new tool?

Essentially, the VRE needs the tool developer to provide three elements:

  • (1) The application code of your tool;
  • (2) The VRE RUNNER, including the adaptation of VRE_Tool.py;
  • (3) A list of descriptive metadata fields annotating the tool (e.g., input file requirements, descriptions).

The following steps describe how to achieve it.

  • Step 1: Initial tool description (expected list of inputs and outputs)
  • Step 2: Prepare a VRE RUNNER script for your application
  • Step 3: Annotate and submit the new VRE tool
  • Step 4: Test and debug the new tool from the VRE user interface
  • Step 5: (optional) Prepare a web page to display a summary report on each execution
  • Step 6: Provide documentation for the new tool

STEP 1: Initial tool description

In this step, you produce a couple of JSON files containing an initial, very basic description of the tool. These JSON files are used in the following steps to locally test the integration of the VRE_RUNNER with the application itself. The files, which you produce manually, contain the list of arguments, input files and output files that the tool will have once it is integrated in the VRE. Each input file requires a local path pointing to a test input file. Each output file requires a local path where the result is to be produced. These files correspond to the VRE job execution files. In production, the VRE server generates them for each execution that a user starts from the web interface.

VRE job execution files:

  • Run configuration file (e.g., config.json): contains the list of input files selected by the user for a particular run, the values of the arguments, and the list of expected output files.
  • Input files' metadata file (e.g., in_metadata.json): contains the metadata of the input files listed in config.json, including information such as the absolute file path.

These two files are the standardized inputs of the VRE RUNNER installed in the Docker image.

It is also handy to have a shell script (e.g., test.sh) containing the RUNNER's command line, with the two files above passed in as arguments.

There are two ways of creating these test files:

  • Manual approach:

    Manually generate the two files following the corresponding JSON schemas, using some examples as reference.

  • VRE web interface approach:

    Use the tool developer's admin panel to create these files. The user interface includes web forms for editing and validating a JSON document that gathers data about the input files and arguments. If you provide data about your local development environment (e.g., working directories or the location of test input files), the VRE generates a config.json and an in_metadata.json for download.

    • Where: in the left navigation menu, Admin → My Tools → Development → (+) Add new tool
    • Requirements: a user account with "tool developer" rights. Sign up on the VRE website and contact us to have the required permissions granted to your account.
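For the manual approach, the sketch below writes a pair of job execution files for a local test run. It assumes a simplified, hypothetical layout. In config.json, `input_files` entries point to metadata records through an `id_<name>` identifier, and `arguments` includes `execution`. Treating `execution` as the working directory is an assumption based on the template's DEFAULT_KEYS. In in_metadata.json, each entry is a record with `_id` and `file_path`. The real files must follow the VRE JSON schemas, so treat this as scaffolding only.

```python
import json
from pathlib import Path
from typing import Dict


def write_test_job_files(test_dir: Path, work_dir: Path,
                         inputs: Dict[str, Path], arguments: Dict[str, object]) -> None:
    """Write simplified config.json and in_metadata.json files for a local test run."""
    test_dir.mkdir(parents=True, exist_ok=True)
    config = {
        # Each input refers to an in_metadata.json entry through a hypothetical identifier.
        "input_files": [{"name": name, "value": f"id_{name}"} for name in inputs],
        # 'execution' holds the working directory (assumed meaning of this default key).
        "arguments": [{"name": "execution", "value": str(work_dir)}]
        + [{"name": key, "value": value} for key, value in arguments.items()],
        # Left empty here; add entries if the output paths are known in advance.
        "output_files": [],
    }
    in_metadata = [
        # Absolute paths, so the RUNNER can find the files from any working directory.
        {"_id": f"id_{name}", "file_path": str(path.resolve())}
        for name, path in inputs.items()
    ]
    (test_dir / "config.json").write_text(json.dumps(config, indent=2))
    (test_dir / "in_metadata.json").write_text(json.dumps(in_metadata, indent=2))
```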

Note: the schemas are being adapted to each VRE project. If the list of accepted values for data-type or file-type does not cover your use case, just contact us and we will extend the supported metadata.


STEP 2: Prepare a VRE RUNNER script for your application

VRE RUNNERs are pieces of code that act as adapters between the VRE server and each of the integrated applications or pipelines. Ultimately, the RUNNER should:

  1. Consume the VRE job execution files that are generated when a user submits a new job from the web interface;
  2. Run the wrapped application or pipeline locally (the command that would be the CMD of the tool's Docker image);
  3. Generate the list of output files, which the VRE server uses to register and display the files in the user's workspace.

To prepare the RUNNER, the easiest option is to take a RUNNER template as reference and use it as a skeleton to wrap your own application. The template includes a couple of Python classes that parse and load the VRE job execution files into Python dictionaries. It also includes a method that you can customize to call the application, module or pipeline to be integrated.
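You should use the template's own classes as they are. Purely to illustrate what "parse and load into Python dictionaries" involves, here is a minimal sketch. It assumes a simplified layout in which config.json input entries reference in_metadata.json records by `_id`, and those records carry a `file_path`. The function name `load_job_files` is hypothetical and is not part of the template.

```python
import json
import os
from typing import Any, Dict, Tuple


def load_job_files(config_path: str,
                   in_metadata_path: str) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (input_files, arguments) as plain dictionaries.

    input_files maps each input name to the absolute path of its file, looked up
    in in_metadata.json through the identifier stored in config.json.
    """
    with open(config_path) as fh:
        config = json.load(fh)
    with open(in_metadata_path) as fh:
        metadata = json.load(fh)

    # Index the metadata records by identifier for direct lookup.
    path_by_id = {entry["_id"]: entry["file_path"] for entry in metadata}

    input_files = {}
    for item in config.get("input_files", []):
        file_id = item["value"]
        if file_id not in path_by_id:
            raise KeyError(f"input '{item['name']}' refers to unknown file id {file_id!r}")
        input_files[item["name"]] = os.path.abspath(path_by_id[file_id])

    # Arguments become a simple name -> value mapping.
    arguments = {item["name"]: item["value"] for item in config.get("arguments", [])}
    return input_files, arguments
```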

Step-by-step

  1. Fork or clone the repository of the RUNNER template in your local development environment.

    RUNNER template repository and documentation: https://github.com/inab/vre_template_tool
  2. (optional) Run the demo example. The RUNNER template is initially configured to "wrap" an application called demo. It demonstrates the overall flow of a VRE RUNNER.

  3. Include your own job execution files in the repository. You can copy the JSON files generated in STEP 1 into the test/ folder of the repository to replace the basic demo example. They should contain the input files and arguments for a test execution of your tool. If you run the RUNNER again as above, it will now fail, because the RUNNER still expects the arguments and input files of the demo example.

    Make sure that the absolute path of the working directory and the input files defined in these JSON files are accessible.

  4. Implement the run method of VRE_Tool.py so that it executes the application, module or pipeline to be integrated. The input file locations and argument values defined in the job execution files are passed to the run method as parameters.

class myTool(Tool):
    DEFAULT_KEYS = ['execution', 'project', 'description']
    PYTHON_SCRIPT_PATH = "/home/../seqio_tool/extract_sequences.py"  # path to your application script (appended to self.parent_dir in cmd)

$PYTHON_SCRIPT_PATH points directly to your application script. Make sure the path is consistent with where the script is actually located.

Path consistency

Before running the final, Dockerized version of your VRE tool, make sure that the path used in your Dockerfile can be easily reached from the $WORK_DIR of the vre_tool. This path, /home/vre_template_tool/, never changes in the VRE tool, so keep it in mind when changing $PYTHON_SCRIPT_PATH.
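One way to catch path mistakes early is to build the script path exactly as the SeqIO cmd example does (`self.parent_dir + self.PYTHON_SCRIPT_PATH`) and check that the file exists before launching the job. The helper below is a hypothetical sketch and is not part of the template. Note that `os.path.normpath` collapses `..` lexically, so `/home/../seqio_tool/...` appended to the parent directory becomes `<parent_dir>/seqio_tool/...`. If the unnormalized string is passed straight to the operating system, it only resolves when `<parent_dir>/home` exists.

```python
import os


def resolve_script_path(parent_dir: str, script_path: str) -> str:
    """Build the script path as parent_dir + script_path, normalize it,
    and fail early with a clear message if the file does not exist."""
    full_path = os.path.normpath(parent_dir + script_path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError(
            f"Application script not found at {full_path}; check PYTHON_SCRIPT_PATH "
            "against the paths used in your Dockerfile"
        )
    return full_path
```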

You also need to specify the inputs and arguments in this code. By default, the template handles one input file and one argument. This is how the runToolExecution section of VRE_Tool_Template.py has been modified to handle the inputs and arguments of the SeqIO tool:

try:
    # Get input files (keys must match the input names defined in config.json)
    input_file_1 = input_files.get('fasta_file')
    if not os.path.isabs(input_file_1):
        input_file_1 = os.path.normpath(os.path.join(self.parent_dir, input_file_1))

    input_file_2 = input_files.get('ids_file')
    if not os.path.isabs(input_file_2):
        input_file_2 = os.path.normpath(os.path.join(self.parent_dir, input_file_2))

    # TODO: add more input files here if your tool needs them

    # Get arguments (the key must match the argument name defined in config.json)
    argument_1 = self.arguments.get('min_lenght')
    if argument_1 is None:
        errstr = "min_lenght must be defined."
        logger.fatal(errstr)
        raise Exception(errstr)
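Because argument values arrive from JSON, it can help to check and convert them in one place. The helper below is a hypothetical sketch. It uses the standard `logging` module rather than the template's logger, and it raises `ValueError` rather than a bare `Exception`. With it, `get_required_argument(self.arguments, 'min_lenght', int)` would replace the manual check and would also reject non-numeric values.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def get_required_argument(arguments: Dict[str, Any], name: str,
                          cast: Callable[[Any], Any] = str) -> Any:
    """Return arguments[name] converted with `cast`, or raise ValueError."""
    value = arguments.get(name)
    if value is None:
        errstr = f"{name} must be defined."
        logger.critical(errstr)
        raise ValueError(errstr)
    try:
        # e.g. cast=int turns "50" into 50 and rejects "abc"
        return cast(value)
    except (TypeError, ValueError) as exc:
        errstr = f"{name} has an invalid value {value!r}: {exc}"
        logger.critical(errstr)
        raise ValueError(errstr) from exc
```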

Finally, change the cmd list in the same code section to match the requirements of your script. This command is run every time a user launches a job.

In the template version:

cmd = [
    'bash', '/home/my_demo_pipeline.sh', output_file_path
]

In the example SeqIO tool:


cmd = [
    'python3',
    self.parent_dir + self.PYTHON_SCRIPT_PATH,  # extract_sequences.py
    input_file_1,  # fasta file
    input_file_2,  # ids file
    output_file_path,
    str(argument_1)  # min_lenght; command-line arguments must be strings
]
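The template takes care of launching `cmd`. As a standalone illustration of what that involves, this hypothetical sketch runs the list with `subprocess.run`. It converts every element to a string, because an integer `min_lenght` would otherwise make subprocess raise `TypeError`. It also raises on a non-zero exit status, so that a failed run is not reported as successful.

```python
import subprocess
from typing import List


def run_command(cmd: List[object]) -> str:
    """Run the tool command and return its standard output.

    check=True raises subprocess.CalledProcessError on a non-zero exit status;
    the captured stderr is available on the exception for logging.
    """
    args = [str(part) for part in cmd]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```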

Remember to rename VRE_Tool_Template.py to VRE_Tool.py (the name to use in the Dockerfile) so that it can be run and tested.

  5. The RUNNER is ready when the wrapped application runs properly and the output files are generated at the location specified in output_files[].file.path. These paths are usually defined in the config.json file. If the name and number of the output files cannot be known before the execution, extend the VRE_Tool.run method to set the file.path attribute in the output_files dictionary instead. The RUNNER writes it into the output files' metadata file (e.g., out_metadata.json).

    Make sure your output files are generated in the root of the working directory

  6. Save your RUNNER in a publicly available Git repository. As with the template RUNNER, document the installation and include some test datasets. Also cover the installation of the wrapped application itself: extra modules, dependencies, libraries, etc. VRE administrators will eventually install this repository in the VRE cloud.
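When the name and number of output files are only known after execution, the run method has to build the output entries itself. A minimal sketch follows. The function name and the extension filter are hypothetical. The simplified `{"name": ..., "file": {"path": ...}}` entry layout is an assumption modeled on the `output_files[].file.path` attribute, and the template defines the actual structure it writes to out_metadata.json.

```python
import os
from typing import Dict, List, Tuple


def collect_output_files(work_dir: str,
                         extensions: Tuple[str, ...] = (".fasta", ".txt")) -> List[Dict]:
    """Build output entries for result files found in the root of work_dir.

    Subdirectories are ignored, since output files are expected in the root
    of the working directory. The extension filter is a hypothetical example.
    """
    entries = []
    with os.scandir(work_dir) as it:
        # Sort by name so the output order is deterministic.
        for entry in sorted(it, key=lambda e: e.name):
            if entry.is_file() and entry.name.endswith(extensions):
                entries.append({
                    "name": os.path.splitext(entry.name)[0],
                    "file": {"path": os.path.abspath(entry.path)},
                })
    return entries
```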


STEP 3: Annotate and submit the new VRE tool

Once the RUNNER successfully executes the application in your local development environment, it is time to request the registration of the new tool in the corresponding VRE server. To do so, some descriptive metadata about the new application is required, e.g., tool descriptions and titles, ownership, references, keywords, etc.

Again, two approaches are supported:

  • Manual approach:

    Generate the tool specification file, using some examples as reference, to fully annotate the new tool.

    Integrate the tool into the corresponding MongoDB section, following the example here.

    In /openVRE/tools, create a new directory by copying tool_skeleton, and give it the name of the tool (the same ID used in MongoDB).

    Modify the input.php file (especially $tool_id) according to the requirements of the tool (more inputs, more arguments).

    Modify the index.html file in /openVRE/tools/$your_tool/assets/home/ so that your tool is consistent with its MongoDB entry.

    Save your tool specification file in your repository and send everything to the VRE administrators. They will validate the data and register the tool in the VRE.

  • VRE web interface approach:

    Go to the tools' developer administration panel and fill in the missing information for the tool entry generated in STEP 1 when preparing the job execution files.

    • Where:
      1. In the left navigation menu, go to Admin → My Tools → Development and find the row corresponding to your tool in preparation.
      2. Fill in the last two columns:
        • bring us your code: URL of your RUNNER's repository created in STEP 2
        • Define Tool: edit the template JSON document online. Choose a title for your tool, a description, etc. All this information will be displayed to users in the web application.
      3. Click the Submit button. This sends an email to the VRE administrators, who will validate the data and register the tool in the VRE.
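The manual approach ties together three things: the MongoDB entry, the directory under /openVRE/tools, and that directory's assets/home/index.html. A quick consistency check before sending everything to the administrators can save a round trip. This is a hypothetical sketch. In particular, it assumes that the specification file stores the tool ID under `_id`, so adapt the key to your actual document.

```python
import json
import os


def check_tool_directory(tools_root: str, spec_path: str) -> str:
    """Check that the tool directory matches the tool identifier in the spec.

    Assumes the identifier is stored under "_id" in the specification JSON.
    Returns the tool directory on success.
    """
    with open(spec_path) as fh:
        tool_id = json.load(fh).get("_id")
    if not tool_id:
        raise ValueError(f"{spec_path} has no '_id' field")

    # Directory copied from tool_skeleton and named after the tool ID.
    tool_dir = os.path.join(tools_root, tool_id)
    if not os.path.isdir(tool_dir):
        raise FileNotFoundError(f"missing tool directory {tool_dir}")

    # Home page that must be kept consistent with the MongoDB entry.
    index_html = os.path.join(tool_dir, "assets", "home", "index.html")
    if not os.path.isfile(index_html):
        raise FileNotFoundError(f"missing {index_html}")
    return tool_dir
```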

After approval, the tool will be accessible on the web application in test mode, i.e., only tool developers and administrators will be able to find and run the tool at the VRE.


STEP 4: Test and debug the new tool from the VRE

As a tool developer, you can run your tool from the VRE like any other tool, but some extra information is available to ease monitoring and debugging:

  • Extra Job information: access to the complete metadata of jobs, including identifiers, full paths, submitted command lines and the configuration JSONs
  • Tool Panel Administration: the right to enable or disable your tool, together with some statistics on the runs carried out at the VRE

STEP 5: Prepare a custom report viewer for each execution

If none of the available visualizers displays your results appropriately, you can create a dynamic HTML page. It will be framed within the platform in the Run Folder > View Results tab of the workspace.
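As a starting point, a report page can be generated at the end of the run. The sketch below writes a small standalone HTML table summarizing the run. The file name `report.html`, the summary fields and the function name are hypothetical. How the page is wired into the View Results tab is defined by the VRE, not by this code. Values are escaped with `html.escape` so that arbitrary tool output cannot inject markup.

```python
import html
import os
from typing import Dict


def write_summary_report(work_dir: str, summary: Dict[str, object],
                         filename: str = "report.html") -> str:
    """Write a minimal standalone HTML summary page into work_dir and return its path."""
    # One table row per summary item, with keys and values HTML-escaped.
    rows = "\n".join(
        f"<tr><th>{html.escape(str(key))}</th><td>{html.escape(str(value))}</td></tr>"
        for key, value in summary.items()
    )
    page = (
        "<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\">"
        "<title>Run summary</title></head>\n<body>\n<h1>Run summary</h1>\n"
        f"<table>\n{rows}\n</table>\n</body>\n</html>\n"
    )
    path = os.path.join(work_dir, filename)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(page)
    return path
```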

STEP 6: Provide documentation for the new tool

  • Tool logo: minimum resolution of 400 x 400 px
  • Sample datasets
  • Help pages