docs: review of documenting using pymapdl on clusters (#3466) #3506

Merged 4 commits on Oct 23, 2024
Changes shown from 2 commits
2 changes: 1 addition & 1 deletion doc/changelog.d/3466.documentation.md
@@ -1 +1 @@
-feat: passing tight integration env vars to mapdl
+docs: documenting using pymapdl on clusters
65 changes: 32 additions & 33 deletions doc/source/user_guide/hpc/pymapdl.rst
@@ -2,26 +2,24 @@
.. _ref_hpc_pymapdl_job:

=======================
-PyMAPDL on HPC Clusters
+PyMAPDL on HPC clusters
=======================


Introduction
============

PyMAPDL communicates with MAPDL using the gRPC protocol.
-This protocol offers many advantages and features, for more information
-see :ref:`ref_project_page`.
+This protocol offers the many advantages and features described in
+:ref:`ref_project_page`.
-One of these features is that it is not required to have both,
-PyMAPDL and MAPDL processes, running on the same machine.
-This possibility open the door to many configurations, depending
-on whether you run them both or not on the HPC compute nodes.
-Additionally, you might to be able interact with them (``interactive`` mode)
+One of these features is that it is not required to have both
+PyMAPDL and MAPDL processes running on the same machine.
+This possibility opens the door to many configurations, depending
+on whether or not you run them both on the HPC compute nodes.
+Additionally, you might be able to interact with them (``interactive`` mode)
or not (``batch`` mode).
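
As a brief illustration of this decoupling, a PyMAPDL script can connect to an
MAPDL instance that is already running on another machine. This is only a
minimal sketch; the IP address and port are placeholders, and it assumes an
MAPDL gRPC server is already listening there:

.. code-block:: python

    from ansys.mapdl.core import Mapdl

    # Connect to an MAPDL gRPC server that is already running remotely.
    # Replace the IP address and port with the values for your environment.
    mapdl = Mapdl("192.168.0.12", port=50052)
    print(mapdl)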

-Currently, the supported configurations are:
-
-* :ref:`ref_pymapdl_batch_in_cluster_hpc`
+For information on supported configurations, see :ref:`ref_pymapdl_batch_in_cluster_hpc`.


Since v0.68.5, PyMAPDL can take advantage of the tight integration
@@ -31,26 +29,26 @@ to that job.
For instance, if a SLURM job has allocated 8 nodes with 4 cores each,
then PyMAPDL launches an MAPDL instance which uses 32 cores
spawning across those 8 nodes.
-This behaviour can turn off if passing the environment variable
-:envvar:`PYMAPDL_ON_SLURM` or passing the argument `detect_HPC=False`
-to :func:`launch_mapdl() <ansys.mapdl.core.launcher.launch_mapdl>`.
+This behavior can be turned off by setting the :envvar:`PYMAPDL_ON_SLURM`
+environment variable or by passing the ``detect_HPC=False`` argument
+to the :func:`launch_mapdl() <ansys.mapdl.core.launcher.launch_mapdl>` function.
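
For illustration, a minimal sketch of disabling this detection from a script,
using the ``detect_HPC`` argument mentioned above:

.. code-block:: python

    from ansys.mapdl.core import launch_mapdl

    # Launch MAPDL without the SLURM tight-integration detection, so the
    # job's node and core allocation is not picked up automatically.
    mapdl = launch_mapdl(detect_HPC=False)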


.. _ref_pymapdl_batch_in_cluster_hpc:

Submit a PyMAPDL batch job to the cluster from the entrypoint node
==================================================================

-Many HPC clusters allow their users to login in a machine using
-``ssh``, ``vnc``, ``rdp``, or similar technologies and submit a job
+Many HPC clusters allow their users to log into a machine using
+``ssh``, ``vnc``, ``rdp``, or similar technologies and then submit a job
to the cluster from there.
-This entrypoint machine, sometimes known as *head node* or *entrypoint node*,
+This entrypoint machine, sometimes known as the *head node* or *entrypoint node*,
might be a virtual machine (VDI/VM).

In such cases, once the Python virtual environment with PyMAPDL is already
set and is accessible to all the compute nodes, launching a
-PyMAPDL job from the entrypoint is very easy to do using ``sbatch`` command.
-Using ``sbatch`` command, the PyMAPDL runs and launches an MAPDL instance in
+PyMAPDL job from the entrypoint node is very easy to do using the ``sbatch`` command.
+When the ``sbatch`` command is used, PyMAPDL runs and launches an MAPDL instance in
the compute nodes.
No changes are needed on a PyMAPDL script to run it on an SLURM cluster.

@@ -61,10 +59,10 @@ First the virtual environment must be activated in the current terminal.
user@entrypoint-machine:~$ export VENV_PATH=/my/path/to/the/venv
user@entrypoint-machine:~$ source $VENV_PATH/bin/activate

-Once the virtual environment has been activated, you can launch any Python
-script if they do have the proper Python shebang (``#!/usr/bin/env python3``).
+Once the virtual environment is activated, you can launch any Python
+script that has the proper Python shebang (``#!/usr/bin/env python3``).

-For instance, to launch the following Python script ``main.py``:
+For instance, assume that you want to launch the following ``main.py`` Python script:

.. code-block:: python
:caption: main.py
@@ -80,21 +78,21 @@

mapdl.exit()

-You can just run in your console:
+You can run this command in your console:

.. code-block:: console

(venv) user@entrypoint-machine:~$ sbatch main.py

-Alternatively, you can remove the shebang from the python file and use a
+Alternatively, you can remove the shebang from the Python file and use a
Python executable call:

.. code-block:: console

(venv) user@entrypoint-machine:~$ sbatch python main.py

-Additionally, you can change the amount of cores used in your
-job, by setting the :envvar:`PYMAPDL_NPROC` to the desired value.
+Additionally, you can change the number of cores used in your
+job by setting the :envvar:`PYMAPDL_NPROC` environment variable to the desired value.

.. code-block:: console

@@ -107,8 +105,8 @@ You can also add ``sbatch`` options to the command:
(venv) user@entrypoint-machine:~$ PYMAPDL_NPROC=4 sbatch main.py


-For instance, to launch a PyMAPDL job which start a four cores MAPDL instance
-on a 10 CPU SLURM job, you can use:
+For instance, to launch a PyMAPDL job that starts a four-core MAPDL instance
+on a 10-CPU SLURM job, you can run this command:

.. code-block:: console

@@ -118,13 +116,13 @@ on a 10 CPU SLURM job, you can use:
Using a submission script
-------------------------

-In case you need to customize more your job, you can create a SLURM
-submission script to submit a PyMAPDL job.
+If you need to customize your PyMAPDL job further, you can create a SLURM
+submission script for submitting it.
In this case, you must create two files:

- Python script with the PyMAPDL code
- Bash script that activates the virtual environment and calls the
-Python script.
+Python script

.. code-block:: python
:caption: main.py
@@ -156,9 +154,9 @@ In this case, you must create two files:
# Set env vars
export MY_ENV_VAR=VALUE

-# Activating Python virtual environment
+# Activate Python virtual environment
source /home/user/.venv/bin/activate
-# Calling Python script
+# Call Python script
python main.py

To start the simulation, you use this code:
@@ -170,7 +168,7 @@ To start the simulation, you use this code:
In this case, the Python virtual environment does not need to be activated
before submission since it is activated later in the script.

-The expected output of the job is
+The expected output of the job follows:

.. code-block:: text

@@ -182,3 +180,4 @@ Python script.
This bash script performs tasks such as creating environment variables,
moving files to different directories, and printing to ensure your
configuration is correct.

6 changes: 3 additions & 3 deletions doc/source/user_guide/hpc/settings.rst
@@ -10,13 +10,13 @@ Requirements
Using PyMAPDL in an HPC environment managed by SLURM scheduler has certain
requirements:

-* **An Ansys installation must be accessible from all the compute nodes**.
+* **An Ansys installation must be accessible from all the compute nodes.**
This normally implies that the ``ANSYS`` installation directory is in a
shared drive or directory. Your HPC cluster administrator
should provide you with the path to the ``ANSYS`` directory.

* **A compatible Python installation must be accessible from all the compute
-nodes**.
+nodes.**
For compatible Python versions, see :ref:`ref_pymapdl_installation`.
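
For illustration, a quick way to check both requirements from a compute node is
shown in this sketch. The shared installation path is only a placeholder, and it
assumes that ``srun`` is available on your cluster:

.. code-block:: console

    user@machine:~$ srun --nodes=1 ls /shared/ansys_inc/v242
    user@machine:~$ srun --nodes=1 python3 --version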

Additionally, you must perform a few key steps to ensure efficient job
@@ -123,6 +123,7 @@ then you can run that script using:

This command might take a minute or two to complete, depending on the amount of
free resources available in the cluster.

On the console, you should see this output:

.. code-block:: text
@@ -132,4 +133,3 @@ On the console, you should see this output:

If you see an error in the output, see :ref:`ref_hpc_troubleshooting`,
especially :ref:`ref_python_venv_not_accesible`.

51 changes: 25 additions & 26 deletions doc/source/user_guide/hpc/troubleshooting.rst
@@ -7,8 +7,8 @@ Troubleshooting

Debugging jobs
--------------
-- Use ``--output`` and ``--error`` directives in batch scripts to captures
-standard output and error messages to specific files.
+- Use ``--output`` and ``--error`` directives in batch scripts to capture
+standard output and error messages to specific files:

.. code-block:: bash

@@ -23,7 +23,8 @@ Debugging jobs

- Check SLURM logs for error messages and debugging information.
- It is also a good idea to print the environment variables in your bash script, using
-``printenv``. Additionally, you can filter them using ``grep``.
+the ``printenv`` *bash* command.
+Additionally, you can filter its output using the ``grep`` *bash* command.

.. code-block:: bash

@@ -41,7 +42,7 @@ Debugging jobs
- Use PyMAPDL logging to print out valuable information (see the sketch after this list). To activate it, see
:ref:`ref_debug_pymapdl`.

-- In case you need more help, visit :ref:`ref_troubleshooting`.
+- If you need more help, see :ref:`ref_troubleshooting`.
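
As an illustration of the logging approach mentioned in the list above, here is
a minimal sketch (the log file name is only an example):

.. code-block:: python

    from ansys.mapdl.core import LOG

    # Raise the global PyMAPDL log level and also write the log to a file.
    LOG.setLevel("DEBUG")
    LOG.log_to_file("pymapdl_debug.log")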


.. _ref_python_venv_not_accesible:
@@ -50,14 +51,14 @@ Python virtual environment is not accessible
--------------------------------------------
If there is an error while testing the Python installation, it might mean
that the Python environment is not accessible to the compute nodes.
-For example, given the following *bash* script `test.sh`:
+For example, assume you have the following `test.sh` *bash* script:

.. code-block:: bash

source /home/user/.venv/bin/activate
python -c "from ansys.mapdl import core as pymapdl; pymapdl.report()"

-The following output is shown after running in the terminal:
+The following output is shown after running this script in the terminal:

.. code-block:: console

@@ -68,18 +69,18 @@ The following output is shown after running in the terminal:
File "<string>", line 1, in <module>
ImportError: No module named ansys.mapdl

-As the output shows, PyMAPDL could not be found, meaning that either:
+As the output shows, PyMAPDL could not be found, indicating one of the following problems:

* The virtual environment does not have PyMAPDL installed.
See :ref:`ref_install_pymapdl_on_hpc`.

-* Or the script did not activate properly the virtual environment
+* The script did not properly activate the virtual environment
(``/home/user/.venv``).

-For the second reason, there could be a number of reasons.
+The second problem can occur due to a number of reasons.
One of them is that the system Python distribution used to create
the virtual environment is not accessible from the compute nodes
-due to one of these reasons:
+because of one of these situations:

- The virtual environment has been created in a directory that is
not accessible from the nodes. In this case, your terminal might
@@ -92,21 +93,20 @@ due to one of these reasons:
bash: .venv/bin/activate: No such file or directory

Depending on your terminal configuration, the preceding error might be
-sufficient to exit the terminal process, or not.
-If not, the execution continues, and the subsequent ``python`` call is
-executed using the default python executable.
-It is very likely that the default ``python`` executable does not have
-PyMAPDL installed, hence the ``ImportError`` error showed preceding might
+sufficient to exit the terminal process. If it is not, the execution continues,
+and the subsequent ``python`` call is executed using the default Python executable.
+It is very likely that the default Python executable does not have
+PyMAPDL installed. Hence the ``ImportError`` error might
appear too.

- The virtual environment has been created from a Python executable that is
not available to the compute nodes. Hence, the virtual environment is not
activated.
-For example, you might be creating the virtual environment Using
+For example, you might be creating the virtual environment using
Python 3.10, but only Python 3.8 is available from the compute nodes.
You can test which Python executable the cluster is using by starting an
interactive session in a compute node with this code to list all commands
-which starts with ``python``:
+that start with ``python``:

.. code-block:: console

@@ -116,12 +116,11 @@ due to one of these reasons:
.. the approach to solve this comes from:
https://stackoverflow.com/questions/64188693/problem-with-python-environment-and-slurm-srun-sbatch

-It should be noticed the preceding approach assumes that all the nodes have similar
-configuration, hence all of them should have the same Python installations
+It should be noted that the preceding approach assumes that all the nodes have similar
+configurations. Hence, all of them should have the same Python installations
available.

-It is also convenient to be aware that environment variable modules can be
-used to activate Python installations.
+You can also use environment variable modules to activate Python installations.
For more information, see :ref:`ref_envvar_modules_on_hpc`.
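
For illustration, a typical sequence on a cluster that uses environment modules
might look like the following sketch. The module name is only an example; check
``module avail`` on your system:

.. code-block:: console

    user@machine:~$ module avail python
    user@machine:~$ module load python/3.10
    user@machine:~$ python --version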


@@ -158,10 +157,10 @@ In certain HPC environments the possibility of installing a different Python
version is limited for security reasons.
In such cases, the Python distribution available in the Ansys installation
can be used.
-This Python distribution is a customized Python (CPython) version for Ansys
-products use only.
-Its use is **discouraged** except for very advanced users and special use
-cases.
+This Python distribution is a customized Python (CPython) version for use only by Ansys
+products.
+Its use is **discouraged** unless you are a very advanced user or have a special use
+case.

This Python distribution is in the following directory, where
``%MAPDL_VERSION%`` is the three-digit Ansys version:
@@ -178,7 +177,7 @@ For example, here is the directory for Ansys 2024 R2:


In Ansys 2024 R1 and later, the unified installer includes CPython 3.10.
-Earlier versions include CPython 3.7
+Earlier Ansys versions include CPython 3.7
(``/commonfiles/CPython/3_7/linx64/Release/python``).

Because the Ansys installation must be available to all
11 changes: 5 additions & 6 deletions doc/source/user_guide/mapdl.rst
@@ -1099,20 +1099,19 @@ Environment variables
There are several PyMAPDL-specific environment variables that can be
used to control the default behavior of PyMAPDL or the launching of MAPDL.

-It should be mentioned that these environment variables do not have
+These environment variables do not have
priority over the arguments given in the corresponding functions.
-For instance:
+Consider this command:

.. code-block:: console

user@machine:~$ export PYMAPDL_PORT=50052
user@machine:~$ python -c "from ansys.mapdl.core import launch_mapdl; mapdl=launch_mapdl(port=60053)"

-The preceding command launches an MAPDL instance on the port 60053,
-because the argument ``port`` has priority over the environment
-variable :envvar:`PYMAPDL_PORT`.
+This command launches an MAPDL instance on port 60053
+because the ``port`` argument has priority over the :envvar:`PYMAPDL_PORT`
+environment variable. The following table describes these environment variables.

-These are described in the following table:

+---------------------------------------+----------------------------------------------------------------------------------+
| :envvar:`PYMAPDL_START_INSTANCE` | Override the behavior of the |
13 changes: 7 additions & 6 deletions src/ansys/mapdl/core/launcher.py
@@ -110,6 +110,7 @@
"replace_env_vars",
"version",
"detect_HPC",
"detect_HPC",
"set_no_abort",
"force_intel"
# Non documented args
@@ -1153,15 +1154,15 @@ def launch_mapdl(
export PYMAPDL_MAPDL_VERSION=22.2

detect_HPC: bool, optional
-Whether detect if PyMAPDL is running on an HPC cluster or not. Currently
-only SLURM clusters are supported. By detaul, it is set to true.
-This option can be bypassed if the environment variable
-``PYMAPDL_ON_SLURM`` is set to "true". For more information visit
+Whether to detect if PyMAPDL is running on an HPC cluster. Currently
+only SLURM clusters are supported. By default, it is set to true.
+This option can be bypassed if the ``PYMAPDL_ON_SLURM``
+environment variable is set to "true". For more information, see
:ref:`ref_hpc_slurm`.

kwargs : dict, optional
-These keyword arguments are interface specific or for
-development purposes. See Notes for more details.
+These keyword arguments are interface-specific or for
+development purposes. For more information, see Notes.

set_no_abort : :class:`bool`
*(Development use only)*