# Evaluation of metrics for Speech Translation
To install from source:

```sh
git clone https://github.com/alan-turing-institute/ARC-M4ST
cd ARC-M4ST
python -m pip install .
```
The code used to generate the results for the main experiments, on the WMT, DEMETR, and CallHome datasets, is in the `scripts` directory; each experiment has a README describing its data sources and main scripts. The directory also contains a fourth, exploratory experiment on the potential impact of transcription errors.
Metric classes for ChrF, BLEU, COMET, MetricX, and BLASER-2.0 are defined in `src/m4st/metrics`. All of them provide a `get_scores` method that takes an instance of `TranslationDataset`, which wraps lists of predictions and, optionally, sources, references, and languages (see the docstring of the `TranslationDataset` class).
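As a minimal sketch of how that interface fits together: the `TranslationDataset` and `get_scores` names come from this repo, but the stand-in dataset class and the toy character-overlap metric below are illustrative only, not the package's actual implementations (use the real classes in `src/m4st/metrics` instead).

```python
from __future__ import annotations

from dataclasses import dataclass


# Illustrative stand-in for m4st's TranslationDataset: predictions are
# required; sources, references, and languages are optional.
@dataclass
class TranslationDataset:
    predictions: list[str]
    sources: list[str] | None = None
    references: list[str] | None = None
    languages: list[str] | None = None


# Toy reference-based metric with the same get_scores shape as the repo's
# metric classes: it returns one score per prediction/reference pair.
class CharOverlapMetric:
    def get_scores(self, dataset: TranslationDataset) -> list[float]:
        scores = []
        for pred, ref in zip(dataset.predictions, dataset.references):
            pred_chars, ref_chars = set(pred), set(ref)
            # Jaccard similarity over character sets, as a crude
            # placeholder for a real metric such as ChrF.
            overlap = len(pred_chars & ref_chars)
            union = len(pred_chars | ref_chars)
            scores.append(overlap / max(union, 1))
        return scores


dataset = TranslationDataset(
    predictions=["the cat sat"],
    references=["a cat sat"],
)
print(CharOverlapMetric().get_scores(dataset))
```

With the real metric classes, the call pattern is the same: build a `TranslationDataset` from your predictions (plus whatever inputs the metric needs) and pass it to `get_scores`.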
The notes (`notes.typ`) can be compiled with Typst:

```sh
brew install typst
typst compile notes.typ
```
To use the Ollama client, which is one way to corrupt sentences randomly, install Ollama (https://ollama.com) and run the server.
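For orientation, a request to a locally running Ollama server uses its documented REST API (`/api/generate` on port 11434 by default). The sketch below only builds such a request; the prompt wording and model name are illustrative and not taken from this repo's client.

```python
import json
import urllib.request

# Ollama's one-shot generation endpoint; the server listens on
# localhost:11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_corruption_request(
    sentence: str, model: str = "llama3"
) -> urllib.request.Request:
    """Build (but do not send) a request asking a model to corrupt a sentence.

    The prompt and default model name here are illustrative, not the repo's.
    """
    payload = {
        "model": model,
        "prompt": f"Introduce one random error into this sentence: {sentence}",
        "stream": False,  # ask for a single JSON response, not a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_corruption_request("The quick brown fox jumps over the lazy dog.")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` additionally requires the Ollama server to be running and the chosen model to have been pulled (e.g. `ollama pull llama3`).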
Distributed under the terms of the MIT license. Some of the datasets and metric implementations used are covered by different permissive licenses; see the relevant files and their documentation.