From 4830a0786213b0dc15053bb2f55c37fba1a953ce Mon Sep 17 00:00:00 2001
From: Anna Shors
Date: Tue, 10 Dec 2024 13:39:05 -0800
Subject: [PATCH] docs: add eval documentation (#428)

Signed-off-by: ashors1
---
 docs/user-guide/aligner-algo-header.rst        |  4 +-
 docs/user-guide/evaluation.rst                 | 39 +++++++++++++++++++
 .../nlp/data/sft/remove_long_dialogues.py      |  2 +-
 3 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 docs/user-guide/evaluation.rst

diff --git a/docs/user-guide/aligner-algo-header.rst b/docs/user-guide/aligner-algo-header.rst
index 15114dc02..a9e029784 100644
--- a/docs/user-guide/aligner-algo-header.rst
+++ b/docs/user-guide/aligner-algo-header.rst
@@ -1,4 +1,6 @@
 .. important::
    Before starting this tutorial, be sure to review the :ref:`introduction ` for tips on setting up your NeMo-Aligner environment.
 
-   If you run into any problems, refer to NeMo's `Known Issues page `__. The page enumerates known issues and provides suggested workarounds where appropriate.
\ No newline at end of file
+   If you run into any problems, refer to NeMo's `Known Issues page `__. The page enumerates known issues and provides suggested workarounds where appropriate.
+
+   After completing this tutorial, refer to the :ref:`evaluation documentation <nemo-aligner-eval>` for tips on evaluating a trained model.
\ No newline at end of file
diff --git a/docs/user-guide/evaluation.rst b/docs/user-guide/evaluation.rst
new file mode 100644
index 000000000..0922905a8
--- /dev/null
+++ b/docs/user-guide/evaluation.rst
@@ -0,0 +1,39 @@
+.. include:: /content/nemo.rsts
+
+.. _nemo-aligner-eval:
+
+Evaluate a Trained Model
+@@@@@@@@@@@@@@@@@@@@@@@@
+
+After training a model, you may want to run evaluation to understand how the model performs on unseen tasks. You can use Eleuther AI's `Language Model Evaluation Harness <https://github.com/EleutherAI/lm-evaluation-harness>`_
+to quickly run a variety of popular benchmarks, including MMLU, SuperGLUE, HellaSwag, and WinoGrande.
+A full list of supported tasks can be found `here `_.
+
+Install the LM Evaluation Harness
+#################################
+
+Run the following commands inside of a NeMo container to install the LM Evaluation Harness:
+
+.. code-block:: bash
+
+   git clone --depth 1 https://github.com/EleutherAI/lm-evaluation-harness
+   cd lm-evaluation-harness
+   pip install -e .
+
+
+Run Evaluations
+###############
+
+A detailed description of running evaluation with ``.nemo`` models can be found in Eleuther AI's `documentation `_.
+Single- and multi-GPU evaluation is supported. The following example evaluates a ``.nemo`` file from NeMo-Aligner on the ``hellaswag``, ``super-glue-lm-eval-v1``, and ``winogrande`` tasks using 8 GPUs.
+Note that unzipping your ``.nemo`` file before running evaluation is recommended, but not required.
+
+.. code-block:: bash
+
+   mkdir unzipped_checkpoint
+   tar -xvf /path/to/model.nemo -C unzipped_checkpoint
+
+   torchrun --nproc-per-node=8 --no-python lm_eval --model nemo_lm \
+       --model_args path='unzipped_checkpoint',devices=8,tensor_model_parallel_size=8 \
+       --tasks hellaswag,super-glue-lm-eval-v1,winogrande \
+       --batch_size 8
diff --git a/examples/nlp/data/sft/remove_long_dialogues.py b/examples/nlp/data/sft/remove_long_dialogues.py
index 680f91606..95208f440 100644
--- a/examples/nlp/data/sft/remove_long_dialogues.py
+++ b/examples/nlp/data/sft/remove_long_dialogues.py
@@ -25,7 +25,7 @@
 Usage:
   python3 remove_long_dialogues.py \
     --tokenizer_path \
-    --tokenizer_type sentencepiece
+    --tokenizer_type sentencepiece \
     --dataset_file \
     --output_file \
     --seq_len
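Reviewer note: when ``lm_eval`` is run with an ``--output_path``, it also writes its per-task metrics to a JSON file. As a rough sketch of post-processing those numbers, the snippet below flattens a results dictionary into sorted rows; the ``"results"`` / ``"acc,none"`` key layout mirrors recent harness versions but is an assumption here, and the sample metrics are made-up illustrative values.

```python
import json

# Hypothetical excerpt of a results file written by `lm_eval --output_path ...`.
# The exact schema varies across harness versions; this layout is an assumption.
sample = json.loads("""
{
  "results": {
    "hellaswag":  {"acc,none": 0.57, "acc_norm,none": 0.75},
    "winogrande": {"acc,none": 0.71}
  }
}
""")

def summarize(results: dict) -> list[tuple[str, str, float]]:
    """Flatten {"results": {task: {metric: value}}} into sorted (task, metric, value) rows."""
    rows = []
    for task, metrics in results["results"].items():
        for metric, value in metrics.items():
            if isinstance(value, (int, float)):
                # Metric keys look like "acc,none"; keep only the metric name.
                rows.append((task, metric.split(",")[0], float(value)))
    return sorted(rows)

for task, metric, value in summarize(sample):
    print(f"{task:<12} {metric:<10} {value:.3f}")
```

A helper like this makes it easy to diff metrics across checkpoints (e.g. before and after alignment) without re-reading raw JSON by hand.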